frontline learning research 5 special issue ‘learning through networks’ (2014): 1-3
issn 2295-3159
corresponding author: anoush margaryan, anoush.margaryan@gcu.ac.uk
doi: http://dx.doi.org/10.14786/flr.v2i2.123

introduction to the special issue ‘learning through networks’

anoush margaryan, caledonian academy, glasgow caledonian university

article received 26 june 2014 / revised 9 july 2014 / accepted 11 july 2014 / available online 15 july 2014

this special issue examines the role of networks in professionals’ learning. by networks, we specifically mean personal professional networks, which may or may not be mediated by digital technology such as social media. the special issue is based on a symposium, ‘learning through networks’, held at the 2013 conference of the european association for research in learning and instruction (earli) in munich, germany.

a special issue capturing current empirical and conceptual research in this area is timely. the importance of the social dimension of learning, in particular of learning from the experience of others, is firmly established in the literature. many different types of social formation have been studied within the learning sciences: groups, teams, communities, collectives and, increasingly, networks (dron and anderson, 2007; mccormick, fox, carmichael and procter, 2011). recently, the concept of ‘networked expertise’ (hakkarainen, palonen, paavola, & lehtinen, 2008) has been put forward to characterise learning and development in professional contexts. yet, despite the growing recognition of the importance of networks in learning, it is not well understood what precisely is learned through networks, how it is learned, and what environmental factors (organisational, social, structural or technological) enable or constrain learning through networks (littlejohn and margaryan, 2013).
identifying and analysing the mechanisms and factors of learning through networks is vital to our understanding of contemporary professional learning. comprising contributions from researchers in four european countries (germany, the netherlands, finland and the uk), this special issue brings together examples of emergent empirical and conceptual research on learning through networks and proposes recommendations to stimulate future research and development in this area. contextualized in different settings and using complementary approaches, the contributions collectively address the following overarching questions:

1. what is learned through networks, and how is it learned?
2. what is the potential of applying a social network perspective to understanding the nature of learning through networks?
3. what key factors (individual, structural, organisational, technological) impact professionals’ learning through networks, and how do they impact it?

there are five contributions in this special issue. three of them report new empirical data. these papers draw on social network analysis (sna) and/or semi-structured interviews to examine the structure of networks of professionals and to identify what professionals learn through these networks. pataraia, falconer, margaryan, littlejohn and fincher examine personal networks of academics teaching in universities, analyzing the types of interactions that academics engage in and the implications of these interactions for their professional learning and the improvement of their teaching practice. two further papers, one by hytonen, palonen and hakkarainen and one by rehm, gijselaers and segers, examine the impact of individuals’ hierarchical positions within networks upon their opportunities for learning and knowledge sharing.
these three empirical contributions are supplemented by a conceptual review paper by vaessen, van den beemt and de laat, which draws on a synthesis of the workplace learning, hrd, organisational and management science, and learning science literatures to analyse key organisational factors impacting upon learning through networks and to propose how the informal networked learning practices of professionals can be integrated within formal organisational structures. the special issue concludes with a commentary, in which de laat and strijbos abstract and synthesize the key themes arising from the contributions and outline a range of recommendations and directions for future research and development in the field.

in his opening editorial in the first issue of frontline learning research, lehtinen (2013) outlined the rationale: “…to develop a journal which would explicitly support innovative theoretical and methodological thinking and increase dynamics in the field.” (p.1). this special issue offers a number of innovative methodological and theoretical insights and ideas, discussed in detail in de laat and strijbos (this issue). first, it contributes much-needed empirical evidence about individuals’ networked learning practices at a range of levels, from ego-networks and sub-networks to whole networks, elucidating the configurations and contents of these networks and their value for learning, development and the improvement of professional practice. second, the special issue provides examples of the application of social network analysis to study learning ties within a network rather than only the structure and dynamics of the network, generating new directions for future research. we hope you benefit from the contributions assembled here.

acknowledgments

i would like to thank frontline learning research, and in particular erno lehtinen, for the opportunity to publish this special issue, and inneke berghmans and eva vanhee for their excellent editorial support.
i am very grateful to the anonymous reviewers who closely engaged with the papers, providing feedback to the authors at short notice. last but not least, thank you to the authors who contributed to this special issue.

references

dron, j., & anderson, t. (2007). collectives, networks and groups in social software for e-learning. in proceedings of world conference on e-learning in corporate, government, healthcare, and higher education, quebec. [online] http://www.editlib.org/index.cfm/files/paper_26726.pdf
hakkarainen, k., palonen, t., paavola, s., & lehtinen, e. (2008). communities of networked expertise. bingley, uk: emerald.
lehtinen, e. (2013). frontline research in an accessible and flexible way. frontline learning research, 1(1), 1-2.
littlejohn, a., & margaryan, a. (2013). technology-enhanced professional learning: processes, practices and tools. london/new york: routledge.
mccormick, r., fox, a., carmichael, p., & procter, r. (2011). researching and understanding educational networks. london/new york: routledge.

frontline learning research 2 (2013) 99-101
issn 2295-3159
corresponding author: michael schneider, university of trier, www.educational-psychology.uni-trier.de, m.schneider@uni-trier.de, and peter edelsbrunner, eth zurich, www.ifvll.ethz.ch, peter.edelsburnner@ifv.gess.ethz.ch
http://dx.doi.org/10.14786/flr.v1i2.74

modelling for prediction vs. modelling for understanding: commentary on musso et al. (2013)

peter edelsbrunner (a), michael schneider (b)
(a) eth zurich, switzerland
(b) university of trier, germany

article received 11 september 2013 / accepted 12 december 2013 / available online 20 december 2013

abstract

musso et al. (2013) predict students’ academic achievement with high accuracy one year in advance from cognitive and demographic variables, using artificial neural networks (anns).
they conclude that anns have high potential for theoretical and practical improvements in the learning sciences. anns are powerful statistical modelling tools, but they can mainly be used for exploratory modelling. moreover, the output generated from anns cannot be fully translated into a meaningful set of rules because they store information about input-output relations in a complex, distributed, and implicit way. these problems hamper systematic theory-building as well as the communication and justification of model predictions in practical contexts. modern-day regression techniques, including (bayesian) structural equation models, have advantages similar to those of anns but without the drawbacks. they are able to handle numerous variables, non-linear effects, multi-way interactions, and incomplete data. thus, researchers in the learning sciences should prefer more theory-driven and parsimonious modelling techniques over anns whenever possible.

keywords: artificial neural networks; black box; student achievement; statistical modelling

musso, kyndt, cascallar, and dochy (2013) conducted a study in which the statistical modelling technique of artificial neural networks (anns) was used to predict the academic achievement of university students a year in advance. the measures used were attention, working memory, learning strategies, and demographic variables. the results were precise estimations of each student’s achievement tercile after their first year at university. this is an impressive success, demonstrating the usefulness of anns as a statistical modelling tool.

the study raises an important question about which statistical methods researchers in the learning sciences should prefer. should anns replace conventional statistical methods such as multiple regression, discriminant analysis, and structural equation modelling?
the potential of anns cannot be denied, especially as a tool to examine predictive patterns in complex systems. however, musso and colleagues overestimate the ability of anns in their application to the learning sciences. they do not mention shortcomings of anns, while overemphasizing shortcomings of competing conventional methods. anns are limited in at least two important ways.

first, the construction of ann models such as those used by musso et al. is highly exploratory, apart from the choice of relevant input and output variables (günther, pigeot, & bammann, 2012; scarborough & somers, 2006). the connection weights, which determine how an ann transforms input into output patterns, are not specified by the researchers or based on theory. they are set to random values and changed gradually by an optimisation algorithm. this process usually involves thousands of iterations until each input pattern leads to the desired output pattern in the training data set. anns thus cannot be fully compared with conventional methods, since the latter aim at confirming or disconfirming pre-specified relations and interactions. in other words, the research question should determine whether the exploratory nature of anns is adequate, or whether a conventional, confirmatory model should be the method of choice.

second, connection weights cannot be codified into a coherent set of rules that delineate the process by which anns transform input patterns into output patterns. anns typically have a high number of connections between neurons (e.g., 300 in ann1 by musso et al.). the transformation of input into output patterns is determined by non-linear, multi-way interactions of these connection weights. recent research has attempted to increase the interpretability of anns, for example with the help of visualizations for complex interactions (e.g., cortez & embrechts, 2013; intrator & intrator, 2001).
however, the basic problem of how non-linear interactions between hundreds of variables can be understood and communicated in meaningful terms has not yet been solved, causing anns to be frequently characterised as “black boxes” (cf. benitez, castro, & requena, 1997). while one can assess how well an ann works, it is difficult to comprehensively explain why it performs well or not (scarborough & somers, 2006). to interpret their results, musso and colleagues list an importance parameter for each predictor, but these parameters do not explain interaction effects or non-linear relations among the variables. in addition, it is difficult to integrate the results of anns across studies and to generalise from samples to underlying populations, due to the lack of output parameters such as standard errors and error probabilities.

the exploratory and opaque nature of anns impedes theory development and limits their practical application. each relation in a statistical model should ideally correspond to a matching relation in an educational or psychological theory that justifies and explains the assumed statistical relation. by fitting a series of statistical models that differ in theoretically relevant aspects, researchers can compare competing theories and revise assumptions that are not in line with the empirical data (kaplan, 1990). this is not possible with anns because the input-output relations are implicitly coded and distributed over all connection weights, preventing researchers from mapping elements of an ann and elements of a theory onto each other (luger, 2009, p. 680).

the results obtained from ann models are also of limited use for solving real-life problems.
this limitation can be illustrated by a situation in which diagnosticians would have to tell certain high school students that, despite achieving satisfactory levels in their current academic performance, they cannot be admitted to college because an ann predicts low academic performance in the future. in justifying the results, the diagnosticians would have to admit that they can neither explain how the different predictors statistically combine nor describe the causal processes that will contribute to the anticipated decrease in the students’ achievement. these limitations are unsatisfactory from diagnostic, educational, and public policymaking perspectives.

conventional methods represent more parsimonious and theory-driven alternatives to anns because they use smaller numbers of parameters, which enhances the interpretability of results. like anns, modern regression techniques can account for non-linear relations (bates & watts, 2007) and complex interactions between variables (aiken & west, 1991). structural equation models are built on regression techniques and allow a simultaneous analysis of numerous variables. these models can be estimated by methods that are robust to missing data and non-normal distributions, account for hierarchical data structures, and identify heterogeneous sub-populations in mixture models (hoyle, 2012). bayesian structural equation models in particular represent a strong advancement in modelling non-linear relations, assessing unspecified relations and handling highly non-normal and hierarchical data (song & lee, 2012). in contrast to anns, these modelling techniques require explicit theoretical assumptions about the relationships between the variables, and they allow for explicit tests of these assumptions. this might limit their predictive power compared to anns, but it aids theory-building, hypothesis testing, and the communication of model results in practical applications.
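the point that a regression-type model can capture non-linear relations and interactions while staying interpretable can be illustrated with a small sketch. the example is not taken from musso et al. (2013) or from the regression literature cited above: the predictors x and z, the simulated data, the coefficient values, and the helper names design_row, solve_linear_system and fit_ols are all hypothetical, and only the python standard library is used. it fits an ordinary least squares model with a quadratic term and an interaction term, so that each estimated coefficient corresponds to one named, pre-specified relation.

```python
import random


def design_row(x, z):
    """One row of the design matrix: intercept, x, z, x squared, x times z."""
    return [1.0, x, z, x * x, x * z]


def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical simulated data: x and z are made-up predictors, and the outcome
# is generated without noise from assumed coefficients, purely for illustration.
rng = random.Random(0)
true_beta = [1.0, 0.5, -0.3, 0.2, 0.8]
data = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(200)]
rows = [design_row(x, z) for x, z in data]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in rows]

beta_hat = fit_ols(rows, y)
names = ["intercept", "x", "z", "x^2", "x*z"]
for name, b in zip(names, beta_hat):
    print(f"{name:>9}: {b: .3f}")
```

because every term in the design matrix is chosen by the researcher, one simple way to compare theoretically distinct models in this sketch would be to drop or add a term (for example the x*z interaction) and refit. a real analysis would also need noisy data, standard errors and model fit statistics, which are omitted here.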
keypoints

- artificial neural networks are powerful statistical tools for pattern recognition and prediction.
- artificial neural networks transform input patterns into output patterns by non-linear multi-way interactions between simulated neurons that are governed by information that is stored in connection weights in an implicit and distributed way.
- this “black box” nature of artificial neural networks hampers the systematic testing of theories and the communication of results in practical settings.
- more conventional regression-type models can also handle non-linear relations, interaction effects, a high number of variables, correlated errors, missing values, and non-normal distributions.
- artificial neural network analysis cannot replace conventional statistical methods in the learning sciences but may be applicable in specific cases.

references

aiken, l. s., & west, s. g. (1991). multiple regression: testing and interpreting interactions. newbury park, ca: sage.
bates, d. m., & watts, d. g. (2007). nonlinear regression analysis and its applications (2nd ed.). hoboken, nj: wiley.
benitez, j. m., castro, j. l., & requena, i. (1997). are artificial neural networks black boxes? ieee transactions on neural networks, 8, 1156-1164. doi:10.1109/72.623216
cortez, p., & embrechts, m. j. (2013). using sensitivity analysis and visualization techniques to open black box data mining models. information sciences, 225, 1-17. doi:10.1016/j.ins.2012.10.039
günther, f., pigeot, i., & bammann, k. (2012). artificial neural networks modeling gene-environment interaction. bmc genetics, 13(1), 37. doi:10.1186/1471-2156-13-37
hoyle, r. h. (ed.). (2012). handbook of structural equation modeling. new york: guilford press.
intrator, o., & intrator, n. (2001). interpreting neural-network results: a simulation study. computational statistics & data analysis, 37, 373-393. doi:10.1016/s0167-9473(01)00016-0
kaplan, d. (1990). evaluating and modifying covariance structure models: a review and recommendation. multivariate behavioral research, 25, 137-155. doi:10.1207/s15327906mbr2502_1
luger, g. f. (2009). artificial intelligence: structures and strategies for complex problem solving (6th ed.). boston, ma: pearson education.
musso, m. f., kyndt, e., cascallar, e. c., & dochy, f. (2013). predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks. frontline learning research, 1, 42-71. retrieved from http://journals.sfu.ca/flr/index.php/journal/article/view/13
scarborough, d., & somers, m. j. (2006). neural networks in organizational research: applying pattern recognition to the analysis of organizational behavior (pp. 137-144). washington, dc: american psychological association.
song, x. y., & lee, s. y. (2012). basic and advanced bayesian structural equation modeling: with applications in the medical and behavioral sciences. chichester, uk: john wiley & sons.
frontline learning research, special issue, vol. 8 no. 5 (2020) 1-4
issn 2295-3159

the happy victimizer pattern in adulthood – state of the art and contrasting approaches: introduction to the special issue

eveline gutzwiller-helfenfinger (a), karin heinrichs (b)
(a) university of duisburg-essen, germany
(b) university of education upper austria, austria

keywords: happy victimizer phenomenon; moral cognitions; moral emotions; adulthood

corresponding author email: eveline.gutzwiller-helfenfinger@uni-due.de
doi: https://doi.org/10.14786/flr.v8i5.681

introduction

the happy victimizer phenomenon (hvp) refers to the stable finding that young children attribute positive emotions like happiness to a rule transgressor despite judging the transgression as wrong (arsenio & kramer, 1992; arsenio & lover, 1995; nunner-winkler, 1999, 2012; nunner-winkler & sodian, 1988), whereas older children attribute negative emotions like shame or guilt. various studies suggest that the hvp disappears in the course of (moral) development (for reviews, see for example arsenio, gold, & adams, 2006; krettenauer, malti & sokol, 2008).

in the moral developmental literature, various, partly interrelated explanations and interpretations of this phenomenon have been suggested. the classical explanation offered by nunner-winkler and sodian (1988) implies that moral cognitions, that is, making and justifying moral judgments, evolve before moral motivation. the attribution of positive emotions to a rule transgressor is seen as indicating a lack of moral motivation. based on the assumption that (negative) moral emotions like guilt indicate that the self not only knows a moral rule but also feels committed to it (malti, gummerum, keller, & buchmann, 2009), the hvp can be interpreted to the effect that a lack of negative moral emotion attributions coincides with a lack of moral commitment.
another, transition-oriented explanation (arsenio et al., 2006; lagattuta, 2005; krettenauer et al., 2008) views the hvp as a developmental transition based on a dis-integration of moral rule knowledge and moral motivation (as assessed by moral emotion attributions and justifications thereof): whereas young children already possess appropriate moral rule knowledge, their development of moral motivation, that is, prioritising moral values over hedonistic needs, is delayed. according to this understanding, the hvp would be restricted to (early) childhood. this position, however, is challenged by recent empirical evidence suggesting that the hvp can also be found in adolescence and adulthood and even seems to be widespread (e.g., heinrichs, minnameier, gutzwiller-helfenfinger, & latzko, 2015; krettenauer, asendorpf, & nunner-winkler, 2013; krettenauer & eichler, 2006; nunner-winkler, 2007).

therefore, the question arises whether the hvp actually does disappear in the course of sociomoral development. this question is essential: if the hvp represents a transitional stage affecting all (or at least the vast majority of) children and adolescents, then the occurrence of the hvp in adolescence and adulthood must represent either a developmental delay or even a deviation. first longitudinal findings do not yet offer a clear picture (krettenauer et al., 2013). however, cross-sectional research can be used to address the very basic question of whether the patterns consisting of moral judgment, emotion attribution and respective justifications found in adulthood actually do represent the happy victimizer phenomenon as documented in children. moreover, there are indications that patterns of moral decision-making may also differ according to the specific context or situation referred to (e.g., bienengräber, 2011). we may therefore assume that happy victimizing in adulthood represents a phenomenon that is distinct from the phenomenon studied in children.
we therefore suggest that in adolescents and adults, it is more appropriate to speak of happy victimizer patterns, that is, patterns of moral reasoning and emotion attributions which, while having some similarities with the hvp on the surface, carry different meanings and represent more complex moral functioning. this special issue presents new ideas to explain patterns of moral decision-making in adolescence and adulthood. in paper 1 by heinrichs, gutzwiller-helfenfinger, latzko, minnameier, and döring, the current state of the art in happy victimizer research is presented, with a focus on the core theoretical and methodological issues involved in trying to disentangle the phenomenon and the pattern. in papers 2, 3, and 4, empirical evidence is presented on three levels, each level being addressed in at least one of the papers. the levels refer to (a) the emergence of the happy victimizer pattern in adolescence and adulthood; (b) the personal determinants of the happy victimizer pattern; and (c) situational variations in the manifestation of the pattern. moreover, each paper represents a specific theoretical perspective towards studying the pattern. each of these approaches has different pedagogical implications. thus, the action-theoretical explanation (heinrichs, kärner, & reinke) assumes that cognitive control strategies, in particular moral disengagement strategies, play an important role in forming an intention (as the first phase in the process of acting) in the case of an ambivalence or tension between cognitive, emotional, and motivational states. forming an intention to act requires a decision on the part of the individual as well as his/her commitment to a specific course of action. 
the emotion development perspective (gutzwiller-helfenfinger & latzko) is grounded in the expectation that adults who display the specific judgment-attribution-justification pattern differ from adults not displaying it with respect to their justifications of emotion attributions. the cognitive-structural explanation of the hvp (minnameier) postulates that it can be reconstructed as a specific moral judgment structure (cf. minnameier, 2012) which is applied in specific situations that can be modelled game-theoretically. in their commentary on this special issue, gertrud nunner-winkler and beate sodian, the researchers who first investigated the role of moral motivation in the course of children’s moral development, critically discuss the ideas and approaches presented and evaluate the relative merit of the respective positions in explaining the occurrence of happy victimizing in adolescence and adulthood.

gaining deeper insights into the happy victimizer and happy victimizing contributes to a better understanding of children’s, adolescents’ and adults’ social, emotional and moral (what we call “sociomoral”) learning and development. sociomoral literacy, that is, successfully engaging in meaningful, positive and caring relationships, is both a prerequisite for and a consequence of successful teaching and learning processes at school (malti, häcker, & nakamura, 2009) and in other learning environments, and is especially important in a globalized society (latzko & malti, 2010). if teachers are to foster students’ sociomoral competencies in diverse classrooms and schools, where multiple, sometimes divergent and contradictory social, religious, and moral values are present and often clash, they need a deeper understanding of children’s and adolescents’ moral functioning.
thus, both recent curricula and professional standards for teachers have made explicit reference to the necessity of fostering positive social relationships, openness and tolerance towards diversity, conflict resolution, and democratic values (e.g., australian institute for teaching and school leadership, 2011; kultusministerkonferenz, 2014), all of which are based on sociomoral competencies.

references

arsenio, w. f., & kramer, r. (1992). victimizers and their victims: children's conceptions of the mixed emotional consequences of moral transgressions. child development, 63, 915-927. https://doi.org/10.1111/j.1467-8624.1992.tb01671.x
arsenio, w. f., & lover, a. (1995). children’s conceptions of socio-moral affect: happy victimizers, mixed emotions, and other expectancies. in m. killen & d. hart (eds.), morality in everyday life (pp. 87-130). new york: cambridge university press.
arsenio, w. f., gold, j., & adams, e. (2006). children's conceptions and displays of moral emotions. in m. killen & j. g. smetana (eds.), handbook of moral development (pp. 581-610). new jersey: erlbaum.
australian institute for teaching and school leadership. (2011). australian professional standards for teachers. melbourne: education services australia.
bienengräber, t. (2011). situierung oder segmentierung? – zur entstehung einer differenzierten moralischen urteilskompetenz [situationism or segmentation? – on the formation of differentiated moral judgment competence]. zeitschrift für berufs- und wirtschaftspädagogik, 107(4), 499-519.
gutzwiller-helfenfinger, e., & latzko, b. (2020). happy victimizing in emerging adulthood: reconstruction of a developmental phenomenon? frontline learning research, 8(5), 47-69. https://doi.org/10.14786/flr.v8i5.382
heinrichs, k., kärner, t., & reinke, h. (2020). an action-theoretical approach to the ‘happy victimizer’ pattern – exploring the role of moral disengagement strategies on the way to action. frontline learning research, 8(5), 24-46. https://doi.org/10.14786/flr.v8i5.386
heinrichs, k., minnameier, g., gutzwiller-helfenfinger, e. & latzko, b. (2015). „don’t worry, be happy“? – das happy-victimizer-phänomen im berufs- und wirtschaftspädagogischen kontext [the happy victimizer phenomenon in a vocational and business educational context]. zeitschrift für berufs- und wirtschaftspädagogik, 111(1), 31-55.
heinrichs, k., gutzwiller-helfenfinger, e., latzko, b., minnameier, g. & döring, b. (2020). happy-victimizing in adolescence and adulthood – empirical findings and further perspectives. frontline learning research, 8(5), 5-23. https://doi.org/10.14786/flr.v8i5.385
krettenauer, t., & eichler, d. (2006). adolescents' self-attributed emotions following a moral transgression: relations with delinquency, confidence in moral judgment, and age. british journal of developmental psychology, 24, 489-506. https://doi.org/10.1348/026151005x50825
krettenauer, t., asendorpf, j. b., & nunner-winkler, g. (2013). moral emotion attributions and personality traits as long-term predictors of antisocial conduct in early adulthood: findings from a 20-year longitudinal study. international journal of behavioral development, 37(3), 192-201. https://doi.org/10.1177/0165025412472409
krettenauer, t., malti, t., & sokol, b. (2008). the development of moral emotions and the happy victimizer phenomenon: a critical review of theory and applications. european journal of developmental science, 2, 221-235. doi: 10.3233/dev-2008-2303
kultusministerkonferenz. (2014). standards für die lehrerbildung: bildungswissenschaften [standards for teacher education: educational sciences]. berlin: sekretariat der kultusministerkonferenz.
lagattuta, k. h. (2005). when you shouldn’t do what you want to do: young children’s understanding of desires, rules, and emotions. child development, 76, 713-733. https://doi.org/10.1111/j.1467-8624.2005.00873.x
latzko, b., & malti, t. (eds.) (2010). children’s moral emotions and moral cognition – developmental and educational perspectives. new directions for child and adolescent development, no. 129. https://doi.org/10.1002/cd.v2010:129
malti, t., gummerum, m., keller, m., & buchmann, m. (2009). children’s moral motivation, sympathy, and prosocial behavior. child development, 80, 442-460. https://doi.org/10.1111/j.1467-8624.2009.01271.x
malti, t., häcker, t., & nakamura, y. (2009). sozial-emotionales lernen in der schule [socioemotional learning in schools]. zurich, switzerland: pestalozzianum verlag.
minnameier, g. (2020). explaining happy victimizing in adulthood – a cognitive and economic approach. frontline learning research, 8(5), 70-91. https://doi.org/10.14786/flr.v8i5.381
nunner-winkler, g. (2012). moral [morality]. in e. schneider & u. lindenmann (eds.), entwicklungspsychologie (pp. 521-541). weinheim: beltz.
nunner-winkler, g. (2007). development of moral motivation from childhood to early adulthood. journal of moral education, 36(4), 399-414. https://doi.org/10.1080/03057240701687970
nunner-winkler, g. (1999). development of moral understanding and moral motivation. in f. e. weinert & w. schneider (eds.), individual development from 3 to 12 (pp. 253-292). cambridge: cambridge university press.
nunner-winkler, g. & sodian, b. (1988). children’s understanding of moral emotions. child development, 59, 1323-1338. doi: 10.2307/1130495

frontline learning research 1 (2013) 72-80
corresponding author: jenni salminen, department of education, university of jyväskylä, p.o.box 35, jyväskylä, finland 40014, jenni.e.salminen@jyu.fi, t +358-40-805 4032, f +358-14-260 1761.
doi case study on teachers’ contribution to children’s participation in finnish preschool classrooms during structured learning sessions jenni elina salminen a a university of jyväskylä, finland article received 15 may 2013 / revised 27 june 2013 / accepted 14 august 2013 / available online 27 august 2013 abstract the main aim of this study was to identify different teaching practices and explore the types of opportunities that they provide for children’s participation in four different finnish preschool classrooms for 6-year olds during structured learning sessions. observational data of four preschool teachers were analyzed according to the principles of qualitative content analysis. three themes of teachers’ practices were identified, which described the key practices through which teachers influence children’s participation, namely, through discussion and conversations; by referring to shared rules and managing the classroom; and through demonstrating pedagogical sensitivity and understanding towards children’s active participation. further, each teacher was observed implementing these practices in a unique combination in their classrooms, thus, creating different opportunities for participation. the four teachers showed a constructive, enabling, reserved or restrictive/unbalanced stance towards children’s participation. the results of this study highlight the importance of teachers’ pedagogically sensitive attitude as the key to children’s participation. given that the advantages of participation to learning and development are well established, the results also point to a need to evaluate the prevailing pedagogy and practices more closely from the perspective of participation. keywords: case-study; participation; preschool; teacher–child interactions; teaching practices. 1. 
introduction extensive research has suggested that one of the best ways to support learning is through encouraging active participation of children already in early childhood classroom contexts (e.g., pramling-samuelsson & sheridan, 2003; hännikäinen & rasku-puttonen, 2010). this study set out to explore teachers’ contribution to children’s participation, i.e., children’s right to experience respect and confidence in partnership with adults (cockburn, 2005; emilson & folkesson, 2006) in finnish preschool classrooms for 6-year-old children. according to the sociocultural approach, interactional processes are the key elements for learning and development (vygotsky, 1978; mercer & littleton, 2007), so participation is also enabled in the interaction between teacher and children. participation demands that the teacher values a child’s own ways of experiencing, understanding and exploring the world (pramling-samuelsson & sheridan, 2003), and that he or she is able to consider these practices as an important part of learning. further, genuine respect shown towards children by teachers has a significant impact on the relationships they build with children in care and educational backgrounds (laevers, 2005). thus, participation in educational settings can be seen to be contingent upon teachers’ decisions and ideas. through their professional role, teachers are the central figures in determining the learning opportunities available to children (hännikäinen, de jong, & rubinstein reich, 1997; pianta, 1999) and also how those children are encouraged to participate. according to recent studies, the essential features that encourage children to participate are present when the teacher’s interest comes close to children’s own views (emilson & folkesson, 2006), when rules are negotiated and shared (e.g.
bohn, roehrig, & pressley, 2004; hännikäinen, 2005) and when teachers provide children with a feeling of being part of the group and of being listened to (hännikäinen & rasku-puttonen, 2010; johansson & sandberg, 2010). in a previous study by salminen et al. (2013b), the contribution of teachers to the social life within preschool classrooms (i.e. for 6-year-olds) was explored through a ‘best-practices’ perspective. some of the practices that enhanced children’s participation included supporting children’s constructive and respectful friendships, working according to shared social rules in group contexts, allowing individual children certain levels of leadership, and inviting children to contribute to simple decision-making processes (salminen et al., 2013b). the inspiration for the current study was to extend these earlier findings, in particular those relating to participation. thus, i sought to investigate the naturally occurring variation among a smaller sample of four finnish preschool teachers by identifying teachers’ key practices and exploring the unique combinations of these practices that can be seen to provide ample support and opportunities for children to participate in different classroom contexts. in the field of participation studies, emilson and folkesson (2006) have studied how teachers’ control, in terms of classification and framing, affects children’s participation. the current study aimed to widen the perspective from teacher control to classroom interaction more broadly, since participation occurs in a socially shared network of interactions between adults and children. further, the aim was to identify the ways in which teachers may affect children’s participation, either by enhancing or preventing it, during structured learning sessions.
this was necessary, since a majority of the formal learning sessions (i.e., content-driven, purposeful sessions of about 45 minutes in length) in finnish preschool classrooms are constructed around teacher-led formats (e.g., hujala et al., 2012; salminen et al., 2013b). two related research questions were addressed. (1) what are the key practices by which finnish preschool teachers enable or disable children’s participation in a variety of classroom situations? (2) which combinations of teacher support do these key practices create for children’s participation in four different preschool classrooms? 2. methods 2.1 data the data for this study were collected as part of the large-scale ‘first steps’ follow-up study (lerkkanen et al., 2006). four finnish preschool teachers were selected as informants from the total of 49 of those participating in the ‘first steps’ follow-up study. in a previous study by salminen et al. (2012), the original 49 teachers were divided into four subgroups on the basis of observed classroom quality, as assessed with the classroom assessment scoring system (class; pianta, laparo, & hamre, 2008), utilizing the mixture modelling procedure of the mplus 5.0 statistical package. class is designed to measure the classroom level variables (i.e., observed indicators of classroom quality) in three domains: (1) emotional support, (2) classroom organization, and (3) instructional support, by rating each aspect numerically from 1 to 7. the profiles from which the cases of the current study were selected can be summarised as follows: profile 1 – highest quality (prevalence 53%); profile 2 – medium emotional, organizational, and instructional quality (prevalence 29%); profile 3 – medium to low emotional and instructional quality, medium organizational quality (prevalence 12%); and profile 4 – lowest quality (prevalence 6%).
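as a rough illustration of how the four profiles map onto the sample, the prevalences above can be converted into approximate teacher counts. this is a minimal sketch, assuming that each prevalence is a share of the 49 observed teachers and that the shares were rounded to whole percentages; the resulting counts are approximations rather than figures reported in the study, and the names `TOTAL_TEACHERS`, `PREVALENCE` and `approximate_counts` are illustrative only.

```python
# assumption: the reported prevalences are shares of the 49 observed teachers
TOTAL_TEACHERS = 49

# prevalence of each classroom quality profile, as reported in the text
PREVALENCE = {
    "profile 1": 0.53,  # highest quality
    "profile 2": 0.29,  # medium quality in all three domains
    "profile 3": 0.12,  # medium to low emotional/instructional quality
    "profile 4": 0.06,  # lowest quality
}


def approximate_counts(total, shares):
    """Convert proportional shares into whole-number counts by rounding."""
    return {name: round(total * share) for name, share in shares.items()}


counts = approximate_counts(TOTAL_TEACHERS, PREVALENCE)
for name, n in counts.items():
    print(f"{name}: about {n} of {TOTAL_TEACHERS} teachers")
```

under these assumptions the rounded counts come to roughly 26, 14, 6 and 3 teachers, which add up to the full sample of 49, so even the smallest profile would contain more than one candidate from which a single case could be drawn.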
teachers for this study were selected to represent each of the four subgroups in order to investigate the maximum variation in practices among teachers as well as their relative representativeness throughout the whole dataset. further, a previous study by salminen et al. (2013a) partially utilized the same data of four teachers (with the exception of one teacher) in a case analysis that explored teachers’ instructional teaching practices. results from this work indicated that even the teachers at the higher end of the quality continuum employed only relatively low levels of the practices known to emphasise the role of active participation in children’s learning of deeper thinking skills. this was an important justification for further exploring the data for these four teachers, this time more specifically from the perspective of participation. the qualitative observational data were collected through classroom observations in spring 2007, simultaneously with the live class observations. the observations were conducted on two different days during the morning assembly (i.e., times of more formal educational activities in the morning, before lunch, and nap time) and all of the teachers carried an mp3-player that recorded all teacher–child interactions. the length of each recording was, on average, 53 minutes. all of the recordings were transcribed, resulting in 53 pages of transcribed text for the analysis of this study. 2.2 context and the participants of the study before beginning formal schooling at the age of 7 years, finnish children have a statutory right to receive a preschool education free of charge for 1 year. the core curriculum for preschool education (2000) serves as a binding guideline for preschool education throughout the country. nearly 100% of finnish 6-year-old children attend preschool education (statistics finland, 2012; taguma, litjens, & makowiecki, 2012) despite its voluntary nature.
all of the four teachers were finnish-speaking females, working in preschool classrooms with typical equipment and materials under the national guidelines provided by the core curriculum for preschool education (2000). of the four, diana and berta worked in larger groups of 22 and 24 children, respectively, with teacher’s aides in their classrooms, whereas cecilia and anna both worked in groups of seven children, with no teacher’s aides. however, in finnish preschool classrooms it is typical to divide large groups of children into smaller groups for the more formal learning sessions. hence, during the observed and recorded sessions, both diana and berta were working with a smaller group of children (i.e., 8–10 children each). 2.3 data analysis data were analysed according to the principles of qualitative content analysis (patton, 2002; graneheim & lundman, 2004). the observational data for the four teachers were combined and analysed from the perspective of the practices through which teachers aimed to engage children in daily activities. these practices emerged during interactional episodes of varying lengths, and these episodes (each containing one or several meaningful interactional verbal and non-verbal expressions) were determined as the units of analysis for this study. the analytical process is illustrated in table 1. the first analytical interest of the study was in identifying certain commonalities in the practices of all four teachers. the episodes (i.e., units of analysis) were first combined into eight categories, which provided overarching concepts through which teachers’ practices could be further classified. each of the categories conceptualized teachers’ practices in relation to children’s participation without seeking individual patterns between teachers, but rather, by drawing together the practices at a more general level.
second, the categories were revised and further combined into broader themes (i.e., pedagogical sensitivity and understanding; discussion and conversations; rules and management), which provided common and more generic denominators for the practice categories identified before. thus, these themes were generated on the basis of the practices that arose from the data of all four teachers, and can be seen to generally represent the key practices through which teachers either encourage or prevent participation of children during the structured learning sessions within this sample. table 1. identifying teachers’ key practices: describing the analytical process. as the three themes represented general ways in which to deal with children’s participation, the second analytical interest was to relate the three themes (i.e., key practices) to each of the four individual teachers in order to determine which personal combinations of key practices characterized each of them. at this stage of the analysis i re-examined each teacher’s daily interaction with the children using the aspects provided by the three themes, and examples of individual ways to support children’s participation were gathered (e.g., how this particular teacher uses rules and management and discussions, and establishes sensitivity in relation to children’s participation). as a result, each teacher was seen to represent a unique combination of the key practices, which created different opportunities for children’s participation. each teacher case was assigned a descriptive name according to the teacher’s prevailing stance towards children’s participation, namely: diana – constructive stance towards participation; cecilia – enabling stance towards participation; berta – reserved stance towards participation; anna – restrictive/unbalanced stance towards participation. these teacher cases and examples of the key practices will be introduced in the following paragraphs in detail. 3.
results teachers’ key practices were displayed in unique combinations. these combinations created different learning environments and, thus, affected how children were encouraged to actively be part of a group, activities and the social network of their classrooms. the following results individually present the four teachers according to their unique combination of the key practices (i.e., combinations of pedagogical sensitivity and understanding; discussion and conversations; rules and management). 3.1 diana diana’s classroom was characterized by a constructive stance towards the children’s participation. this teacher was warm and respectful towards the children nearly all the time, establishing high pedagogical sensitivity. this was apparent as diana was well aware of the children’s needs and abilities, and she aimed to keep them engaged with the particular exercise or activities provided (e.g., by saying, “sam please tell the others”, or, “jonah, do you think you could tell what the number of the exercise at hands is?”, as well as, “please, alice, come here and help me to look for the missing syllable”). there were clearly established shared rules in the classroom, and as a result, teaching formed a logical and understandable entity that the children could easily follow, enjoy and participate in. the ways in which diana involved children in daily routines and activities consisted of subtle and delicate reminders of rules such as saying, “children, please listen, let’s listen to mandy for a moment more”, or by whispering softly, “raise your hand if you want to say something”. diana made an attempt to listen to children’s ideas: there were discussions on both academic and social issues. during these discussions diana made it easy for children to find a way to join in. for instance, she asked questions in a very whole-hearted manner, as if not only to hear the children but as if she was honestly pondering the same questions herself. 
for example, diana commented, “i really enjoyed the warmth of the sunshine today” and then asked, “but what do you think it has done to the snow outside?” when the teacher positioned herself at the children’s level like this, it evoked very natural and easy participation from the children, and several such interactions occurred throughout the observed sessions. this type of behaviour, combined with the provision of frequent opportunities for children to take turns to answer, for example in a show and tell, or to assist the teacher in performing tasks, showed that diana was highly persistent and able in keeping children engaged in activities. her attitude towards the children’s ideas and comments showed she was aiming to understand what the children thought and were telling her. despite the fact that participation was occurring in a goal-oriented, teacher-led format all this time, children’s participation was nevertheless constructive (i.e., children were taken seriously and the classroom agenda was built on their active role). 3.2 cecilia cecilia’s classroom was characterized by an enabling stance towards children’s participation. cecilia repeatedly made children feel like she was listening to them and understood them (e.g., “i know you like these types of exercises, although they are a bit difficult”), indicating the teacher’s pedagogical sensitivity. as she sensitively listened to the children, she was also able to monitor their needs and progress most of the time. however, every now and then she missed children’s hints. there were also clearly established and shared rules in the classroom, which neither cecilia nor the children had to be reminded of, and which made participation easier and also contributed to the coherence of the group. cecilia discussed subjects openly with the children throughout the observed sessions. she was, for instance, using children’s daily lives and own experiences efficiently as a tool to engage children in discussions.
cecilia’s enabling stance towards participation was apparent when she used inviting questions during the learning sessions (e.g., “if you need to know what’s happening around the world, what types of sources of information can you think of?”, or, “today we are discussing of newspapers, do your mom or dad read the newspaper?”) as well as comments aimed at participation of individual children (e.g., “would you like to try to read this aloud andy?”). both diana and cecilia shared similar practices and personal warmth towards the children. however, throughout the observed sessions cecilia’s practices concerning children’s participation were slightly inconsistent; of the two teachers, cecilia’s attitude was less effective for truly understanding the children’s point of view. this was apparent in that, although cecilia provided children with opportunities to participate, she did not use children’s activity to construct the ideas to aid further learning as diana did and, thus, cecilia’s stance towards participation was enabling rather than constructive. 3.3 berta berta’s classroom was characterized by a reserved stance towards children’s participation. berta showed signs of ambivalent pedagogical sensitivity, since she seemed to be highly responsive towards children’s needs and aimed to achieve participation of the whole group, especially so during exercises and tasks (e.g., “roger’s answer was ‘a hat’. do you [saying to other children] think that roger’s answer was correct?”), but at other times she was less concerned about the children’s perspectives or about truly finding out their thoughts and ideas. the use of rules and management was structured, as berta was very efficient in teaching and managing the classroom. her teaching was logical and it was easy for children to comprehend. for instance, berta said, “you may come here and choose the word that corresponds with the picture, please use the pin and place the word beside the picture”.
berta was talking to the children nearly all the time; however, she was restricting children’s participation in discussions by giving children rather short turns, and as a result the children usually only gave answers to the teacher’s questions or produced a few words or short sentences (e.g., “with which letter does the word peruna [potato] begin?”, or, “you are right, this is the face of the person, but could you be a bit more specific? which part of the face is the correct answer?”). as a consequence, berta’s reserved stance towards participation was most clearly apparent in the use of highly structured tasks that allowed only very few chances for children’s ideas or discussions to be used as a valuable way for children to learn and interact. 3.4 anna anna’s classroom was characterized by a restrictive/unbalanced stance towards children’s participation. anna had occasional difficulties in monitoring the behaviour, needs and academic performance of the children. she was probably more aware of the children’s academic skills and needs (e.g., inviting children to goal-oriented tasks by using hints, or providing individual additional tasks) than of their emotional needs (e.g., being unable to soothe restless children and assist them to participate in on-going activities), thus establishing lower and unbalanced pedagogical sensitivity towards children’s emotional needs. the classroom in general was somewhat disorganized, since anna’s practices were inefficient in managing her classroom. anna discussed topics with children, but due to their misbehaviour it was difficult to create an equal and content-driven discussion, and a majority of her time was used to discuss managerial issues. she was, in a sense, forced to cut down children’s turns at the expense of organization to be able to continue working.
for instance, anna said, “it is not your turn to speak now”, or, “you are not allowed to speak until you sit quietly and still”, as well as, “you need to step outside unless you can’t be quiet”. as a consequence, autonomous opportunities were not provided to children and children’s participation was discontinuous or even restricted. 4. discussion in relation to the first research question, analysis of the four teacher cases indicated key practices in the four classrooms (i.e., pedagogical sensitivity and understanding; discussion and conversations; rules and management) that were related to children’s participation in preschool classrooms. in addition, the teacher cases showed four different combinations of teacher support, which created unique opportunities for children to participate in both the on-going activities and the social network of their classrooms. these can be discussed further as a response to the second research question. diana and cecilia had established a combination of (1) teacher’s pedagogical sensitivity and understanding towards children’s needs, (2) utilizing constructive and shared rules, and (3) involving children in conversations, whereas berta and anna provided fewer opportunities for children to participate. it was noted that both berta and anna had managerial practices that restricted active participation, but for very different reasons. the reason why children’s participation was infrequent in anna’s classroom was that management took too much time because the rules were not clear or shared, whereas in berta’s classroom, which was highly structured, participation did not occur on children’s terms and was thus reserved in nature.
these observations indicate the importance of constructive classroom management and organization for children’s participation (see also emilson & folkesson, 2006): neither a lack of behavioural control nor too highly structured management is a good way of enhancing children’s participation. berta and diana shared similar well-managed rules in their classrooms, but the warmth and pedagogical sensitivity were different for these two teachers. diana was perceptive, and identified children’s needs, whereas berta was more concerned with working according to the plans she had made. in a previous study, sandberg and eriksson (2010) highlighted the importance of the intensive respectful discussions between teacher and children in encouraging children’s participation. in addition, constructive and coherent rules and management provide support for working as a group (bohn, roehrig, & pressley, 2004; hännikäinen, 2005). in light of the results of the present study, it seems that neither the intensive respectful discussions between teacher and children nor coherent rules and management alone can create practices that enhance children’s participation within these preschool classrooms. in order to be meaningful, participation requires a teacher’s pedagogical awareness and respectful attitude (pramling-samuelsson & sheridan, 2003). this attitude enables teachers to see children’s participation as an important and usable way of learning in preschool. this is of great significance, since being a part of the group is one of the most meaningful things from children’s perspective too (e.g., einarsdottir, 2010). my study indicates that teachers enhance children’s participation through the simple daily routines and pedagogical choices that they make, an idea that is supported by hännikäinen & rasku-puttonen (2010).
however, the findings of this study also showed aspects that may hinder participation in classrooms and, unfortunately, such aspects included typical ways of working in preschool classrooms during formal content-driven and teacher-led learning sessions. such practices included working in a predominantly teacher-led format with relatively little control offered to children in deciding or determining what to do, or providing classroom management and rules that are too strict to allow frequent participation. within the classrooms studied, it was teachers’ determination and open-minded stance towards participation that seemed to make a positive difference. further studies are needed to widen the perspective from teachers’ practices to include child interviews or child observations, since in its current form this study cannot suggest how children experienced or perceived the different classroom environments and practices. moreover, it is noteworthy that participation takes different forms depending on the age of the children in the group as well as the cultural expectations (e.g., national curricula, legislation) addressed within an educational setting. hence, it is necessary to raise a scientific discussion of the importance of children’s participation, and conduct studies in a variety of countries and contexts to gain deeper knowledge and understanding of how participation is experienced and what enhances it in different educational settings. the findings introduce exemplary practices for preschool education and for the discussion about the importance of teachers’ role in enhancing the active role of children in preschool classrooms. the results may provide both practical and educational implications for teachers in their daily work with children by promoting preschool teachers’ awareness of the role of teaching practices and teacher–student interactions for children’s participation.
since not all teachers were able to fully support children’s participation, this issue should be addressed more carefully in future research and teacher training, and also from the children’s perspective. key points observational data of four finnish preschool teachers were analysed according to the principles of qualitative content analysis. three themes indicated teachers’ key practices which were related to children’s participation: namely, discussion and conversation; rules and management; pedagogical sensitivity and understanding. combinations of key practices created ample opportunities for children to participate in each classroom. teachers showed a constructive, enabling, reserved or restrictive/unbalanced stance towards children’s participation. acknowledgements this study has been carried out in the centre of excellence in learning and motivation research and is financed by the academy of finland (no. 213486 for 2006–2011) and other grants from the same funding agency (no. 213353 for 2005–2008, and no. 125811 for 2008–2009). the author gratefully acknowledges the personal support of prof. maritta hännikäinen and dr. pirjo-liisa poikonen in preparing the manuscript. references bohn, c. m., roehrig, a. d., & pressley, m. (2004). the first days of school in the classrooms of two more effective and four less effective primary-grades teachers. the elementary school journal, 104, 269–287. cockburn, t. (2005). children’s participation in social policy: inclusion, chimera or authenticity? social policy and society, 4, 109–119. core curriculum for preschool education in finland. (2000). esiopetuksen opetussuunnitelman perusteet [the core curriculum for preschool education 2000]. helsinki, finland: national board of education. retrieved from http://www.oph.fi/download/123162_core_curriculum_for_pre_school_education_2000.pdf einarsdottir, j. (2010). children’s experiences of the first year of primary school.
european early childhood education research journal, 18(2), 163–180. emilson, a., & folkesson, a.-m. (2006). children’s participation and teacher control. early child development and care, 176, 219–238. graneheim, u. h., & lundman, b. (2004). qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. nurse education today, 24, 105–112. hujala, e., backlund-smulter, t., koivisto, p., parkkinen, h., sarakorpi, h., suortti, o., … korkeakoski, e. (2012). esiopetuksen laatu 2012 [the quality of pre-primary education]. publications by the finnish education evaluation council 61. jyväskylä, finland. retrieved from http://www.edev.fi/img/portal/1354/julkaisu_61.pdf hännikäinen, m. (2005). rules and agreements and becoming a preschool community of learners. european early childhood education research journal, 13, 97–110. hännikäinen, m., de jong, m., & rubinstein reich, l. (1997). our heads are the same size! a study of quality of the child’s life in nordic day care centres. educational information and debate 107. malmö, sweden: malmö university school of education. hännikäinen, m., & rasku-puttonen, h. (2010). promoting children’s participation: the role of teachers in preschool and primary school learning sessions. early years: an international journal of research and development, 30, 147–160. johansson, i., & sandberg, a. (2010). learning and participation: two interrelated key-concepts in the preschool. european early childhood education research journal, 18, 229–242. laevers, f. (2005). the curriculum as a means to raise the quality of early childhood education. implications for policy. european early childhood education research journal, 13(1), 17–29. lerkkanen, m.-k., niemi, p., poikkeus, a.-m., poskiparta, e., siekkinen, m., & nurmi, j.-e. (2006). the first steps study (alkuportaat). unpublished data. university of jyväskylä, finland. mercer, n., & littleton, k. (2007).
dialogue and the development of children’s thinking: a sociocultural approach. new york, ny: routledge. patton, m. q. (2002). qualitative research & evaluation methods (3rd ed.). thousand oaks, ca: sage. pianta, r. c. (1999). enhancing relationships between children and teachers. washington, dc: american psychological association. pianta, r. c., laparo, k. m., & hamre, b. k. (2008). the classroom assessment scoring system. manual, pre-k. baltimore, md: paul h. brookes. pramling-samuelsson, i., & sheridan, s. (2003). delaktighet som värdering och pedagogik [participation as valuation and pedagogy]. pedagogisk forskning i sverige, 8(1–2), 70–84. salminen, j., lerkkanen, m.-k., poikkeus, a.-m., siekkinen, m., pakarinen, e., hännikäinen, m., poikonen, p.-l., & rasku-puttonen, h. (2012). observed classroom quality profiles of kindergarten classrooms in finland. early education and development, 23, 654–677. salminen, j., hännikäinen, m., poikonen, p.-l., & rasku-puttonen, h. (2013a). a descriptive case analysis of instructional teaching practices in finnish preschool classrooms. journal of research in childhood education, 27, 127–152. salminen, j., hännikäinen, m., poikonen, p.-l., & rasku-puttonen, h. (2013b). teachers’ contribution to the social life in finnish preschool classrooms during structured learning sessions. early child development and care. doi:10.1080/03004430.2013.793182. sandberg, a., & eriksson, a. (2010). children’s participation in preschool – on the conditions of the adults? preschool staff’s concepts of children’s participation in preschool everyday life. early child development and care, 180, 619–631. statistics finland. (2012). suomen virallinen tilasto (svt): esi- ja peruskouluopetus [official finnish statistics: preschool and primary education]. helsinki: tilastokeskus. retrieved from http://www.tilastokeskus.fi/til/pop/index.html taguma, m., litjens, i., & makowiecki, k. (2012). quality matters in early childhood education and care: finland.
organisation for economic co-operation and development (oecd). retrieved from http://www.oecd.org/edu/preschoolandschool/49985030.pdf vygotsky, l. s. (1978). mind in society: the development of higher psychological processes. cambridge, ma: harvard university press. frontline learning research 6 (2014) 46-55 issn 2295-3159 corresponding author: susan goldman (susan.goldman@gmail.com) doi: http://dx.doi.org/10.14786/flr.v2i4.117 46 | f l r perspectives on learning: methodologies for exploring learning processes and outcomes susan r. goldman learning sciences research institute university of illinois, chicago, usa article received 27 may 2014 / accepted 2 december 2014 / available online 23 december 2014 abstract the papers in this special issue were initially prepared for an earli 2013 symposium that was designed to examine methodologies in use by researchers from two sister communities, learning and instruction and learning sciences. the four papers reflect a common ground in advances in conceptions of learning since the early days of the “cognitive revolution” in the 1960s. this commentary shows the interdependence between advances in theory and advances in methodologies. four shifts in conceptions of learning are described. that these shifts are evident in the work of both communities suggests a blurring of the boundaries between the two. keywords: learning, collaboration, mixed methods the papers in this special issue were initially prepared for an earli 2013 symposium that was designed to examine methodologies in use by researchers from two sister communities, learning and instruction and learning sciences. a main goal was to explore what these methodologies might reveal about underlying conceptions of learning and potential common ground across the two communities. indeed, the four papers, taken together, reflect a common ground in advances in conceptions of learning since the early days of the “cognitive revolution” in the 1960s.
the papers depict highly systematic, thoughtful, and rigorous approaches to studying learning as it is happening whether individually, in small groups or large; in classrooms, in workplaces, or in labs; face to face or virtually; in one moment in time or over extended periods. they also illustrate the interdependence between advances in theory and advances in methodologies. during the 30-year period from 1960 to 1990, the majority of studies of learning took place in one location, the laboratory, and looked at individual cognition as a function of an operationally defined and restricted set of variables, in one time frame with occasional return visits. although some researchers were examining learning in the context of tasks students might be asked to do in school, many of the “learning” situations were set up as experiments that were highly constrained to maintain experimental control over “extraneous” variables; as well, the tasks were frequently “toy” problems that had little relevance to classrooms or other contexts outside of cognitive theory and the academic settings in which the research was being conducted. consequently, it was difficult to see how findings from the lab could possibly have relevance to everyday learning. emphases were on manipulating characteristics of the materials, the task, or both and observing how people “solved” the tasks and their success at doing so, with some emphasis on understanding how they had completed the tasks. some of the findings emerging from that research shaped instructional studies in which people were instructed to summarize sections of text, underline main ideas, break down tasks into subtasks before solving them, group words based on taxonomic categories to improve memory, and similar heuristics. published studies of this sort attest to the success of these approaches in building a cognitive theory of learning and problem solving that superseded extant behaviourist/empiricist views (cf. greeno, collins, & resnick, 1996).
there was, however, little uptake of these theories by educational practitioners.

in addition to changes in the conceptualization of learning and the kinds of questions being asked about learning, data analytic methodologies have come a long way since the 1960s. some of the (older) readers of this article will remember the days of doing anovas by hand, with “advances” marked by programmable wang calculators, and mainframe programs that automated the process. of course, you had to batch process jobs, submitting stacks of punch cards containing the data (all the time living in fear that you would drop the deck and have to start again making sure they were in the right order) and then wait for the print out. turnaround varied from 10 minutes to 24 hours. over the past 30 years there have been huge advances in the technologies and data analytic applications available on devices that are small enough to carry around the way we once carried pads of paper and notebooks (not electronic ones). these technologies have expanded the ways we dare to think about analyzing our data, enabled us to collect and make sense of new forms of data, and automated or semi-automated analysis methods that we used to do solely by hand.

currently, the learning sciences and learning and instruction communities operate with theoretical frameworks on learning that reflect more complex views of learning in four major ways. we now understand and attempt to study learning
1. in multiple and iteratively designed environments
2. over multiple time scales
3. occurring in social groups of multiple and collaborating individuals
4. with effects evident at multiple levels ranging from behavioural to neural.

as a set, the papers in this special issue reflect these shifts in conceptions of learning and its investigation and provide us with methodological tools that enable us to rigorously investigate learning processes and outcomes despite the greater complexity of doing so.
there is evidence of one or more of these shifts in conceptions of learning among researchers who identify with learning sciences, as well as among those who identify with learning and instruction, suggesting something of a convergence, or at least a blurring of the boundaries of the two communities. 1. learning in multiple and iteratively designed environments design-based research marked a pivotal shift in perspectives on learning and its study in classrooms. svihla (this issue) provided an excellent description of the goals of this research approach. up until the time that this approach to research on learning was introduced in the early 90s by ann brown (1992) and allan collins (1992), educational research in schools typically took the form of relatively short-term experiments that involved comparisons of the effects of different methods or materials on various cognitive skills. the studies were usually conducted by the researchers. teachers “cooperated” with the researchers in terms of providing access to their students for the duration of the study but were otherwise minimally involved in contributing to the instructional design or the materials. a major goal of this research was ascertaining which instructional methods were better than others for achieving largely cognitive objectives such as more accurate mathematics performance, better memory for new vocabulary, and better comprehension of text. accordingly, assessments were designed to measure changes in students’ performance as a function of having participated in the study either in the “experimental treatment” or the “control” group. along with these types of cognitive studies there were similarly designed studies that examined the impact on individuals of having worked in cooperative groups (e.g., johnson & johnson, 1999; see for review webb & palincsar, 1996). 
as svihla described, the goals of design-based research (dbr) reflected a fundamental shift to an emphasis on studying learning processes in situ as both social and interactional (collins, 1992). learning processes were studied in the context of designed learning environments developed through collaborations of researchers and practitioners and based on principles that constituted a learning theory. enactments of designs were objects of study for purposes of understanding how learning occurred; the understandings that emerged from close study of, and reflection on, the interactions and student work informed iterative refinement of the learning theory principles and designs. svihla does an excellent job of depicting the ways in which dbr has developed since its initial introduction. suffice to say it made apparent the need for methodologies to capture processes occurring over multiple time scales and among individuals in social configurations. 2. learning processes over multiple time scales complex views of learning make it clear that processes occur over time, with different learning processes occurring at different time scales. molenaar (this issue) provided an excellent rationale for the need for temporal analysis methods. she described a variety of the issues involved in shifting from a focus on whether a particular construct has been learned or not to a focus on how that construct is learned, what that learning looks like at different time scales, and indeed what constructs are conceptualized as emerging over longer versus shorter time frames (cf. lemke, 2000). she referenced a variety of constructs that we now think of as emerging over events and across time but that used to be thought of as personality traits (e.g., motivation, persistence). she discussed various computational tools that can aid in segmenting, coding, and relating different time scales. these methodologies are critical to doing the analyses needed to understand learning over different time scales.
a variety of issues face us as individual researchers and as a community as we apply methodologies for temporal analysis: what units of time are appropriate for particular constructs of interest, especially when multiple time scales operate in parallel? how do we determine the time scale most appropriate for tracing the emergence of a construct over time? or alternatively, how do we capture the interrelationships between events occurring over time but at different time scales? of potentially many patterns of events that might be extracted by pattern detection software, how do we determine which are psychologically meaningful and at what scale of time they are meaningful? equally necessary are new forms of representation that can assist us in conveying our findings to the broader community. molenaar (this issue) presented one form of representation. figure 1 illustrates a different form of representation in which we plotted the discourse moves of three students comprising a small group engaged in a science investigation (radinsky, goldman, doherty, & ping, 2010). this particular figure shows the moves for the first day of the investigation. we plotted similar representations for each day and then used the graphs to identify regions where there were clusters of moves across the three students that suggested there were interesting dialogic discourses occurring. we then “dove” into these segments of the discourse to determine the character of the “argumentation” in which the students were engaged and whether the claims and evidence being offered were similar or different later in the investigation versus earlier. as well, we considered how participation and roles in the discourse revealed dimensions of identity and positioning with respect to disciplinary competence. this form of representation was a useful analytic tool and with some refinement might be a useful way to represent the time course of argument development (radinsky, et al., 2010). 
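as a rough illustration of the idea of locating regions where moves cluster across the three students, the following python sketch bins time-stamped discourse moves into fixed windows and flags windows in which enough moves occur and all students contribute. this is not the procedure used by radinsky et al. (2010), who read the regions off plotted graphs; the function name `dense_windows`, the 60-second window, the thresholds, and the sample events are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def dense_windows(events, window=60, min_moves=4, min_speakers=3):
    """Return start times (in seconds) of fixed windows with clustered moves.

    events: list of (seconds_from_start, student_id) tuples, one per discourse move.
    A window is flagged when it holds at least `min_moves` moves contributed
    by at least `min_speakers` different students (hypothetical criteria).
    """
    buckets = defaultdict(list)
    for seconds, student in events:
        buckets[int(seconds // window)].append(student)

    flagged = []
    for index in sorted(buckets):
        speakers = buckets[index]
        if len(speakers) >= min_moves and len(set(speakers)) >= min_speakers:
            flagged.append(index * window)
    return flagged

# Hypothetical moves by students "A", "B", and "C" during one session.
events = [
    (5, "A"), (12, "B"), (20, "C"), (41, "A"),                # busy first minute
    (70, "A"), (95, "A"),                                     # one student only
    (130, "B"), (135, "C"), (140, "A"), (150, "B"), (170, "C"),
]

print(dense_windows(events))  # start times of candidate segments to inspect
```

the flagged windows are only candidates; deciding whether the discourse in them is dialogic or argumentative still requires the kind of close qualitative reading described above.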
molenaar highlighted the need to conceptualize different dimensions of time in order to define important temporal characteristics. she cited papers by bloome, et al. (2009) and lemke (2000) as informing this discussion. in addition, bakhtin’s (1981) framings of time in relation to discourse, meaning, and learning will be a useful resource. we also need to (re)connect learning and development. indeed, the move from the more traditional educational research paradigms to dbr and related learning sciences methodologies creates a convergence between research on development and research on learning. that is, one distinction between development and learning had traditionally been the time frame over which phenomena of interest emerged. those that occurred over multiple years were called developmental, e.g., oral language; those over minutes or hours, learning, e.g., declarative knowledge such as “c – a – t” spells cat; or propositions such as earli is a professional research organization. arguably, a second distinction was whether the phenomenon emerged with or without formal instruction, the latter being deemed developmental phenomena and the former learning. for example, children develop oral language but learn to read with explicit instruction. because developmental psychologists have long been concerned with the study of change over time, they have developed techniques that examine change over relatively longer periods of time such as growth analysis (willett, 1989), as well as techniques that look at moment to moment change, such as sequential analysis (bakeman & quera, 1995) and microgenetic analysis (e.g., kuhn, 1995; siegler & stern, 1998). these methods are rich resources for examining learning over multiple time scales. 3.
learning in social groups of multiple and collaborating individuals a core assumption of the learning sciences is that learning is social and interactional and takes place through situated activity (brown, collins, & duguid, 1989). collins, brown, and newman (1989) labeled the approach cognitive apprenticeship, reflecting the importance of observing the habits of mind as well as the actions of the more knowledgeable others in the community (cf. vygotsky, 1978). hence, discourse about activity and interaction with others engaged in the activity became a central focus for understanding learning. researchers with intellectual roots in a variety of disciplines have long relied on discourse among participants in a joint activity as a window into knowledge building processes of groups as well as individuals, and, along with gestures, into processes of learning through joint activity (gee, 1992; goodwin, 1994; hutchins, 1995; lave & wenger, 1991; sawyer, 2006; scardamalia & bereiter, 1991; schegloff, 1991, 2007). video and audio recordings have typically provided the raw data and various types of very time-consuming and intensive qualitative analyses have been used by researchers to provide evidence for claims about learning outcomes and processes. frequently and understandably given the labor-intensive nature of these analyses, the evidence provided in any one empirical report has tended to be based on relatively small data sets or corpora. many computer-supported collaborative learning (cscl) environments make available written traces of learning interactions that can also be mined to understand learning processes and outcomes for individuals and for groups. although initially these were also analyzed by humans using processes similar to those used for coding discourse that was transcribed from video and audio recordings, a number of computer-assisted methods have been developed that make the work of coding less time-consuming. 
stegmann (this issue) argued that to understand the mechanisms that produced enhanced learning outcomes in cscl, three hypotheses needed to be tested. these have been conceptualized as a “triangle of hypotheses:” “(a) instructional/technological support facilitates learning activities; (b) facilitated learning activities have positive effects on learning outcomes; and (c) mediated by learning activities, instructional/technological support has a positive effect on learning outcomes.” (stegmann, this issue, p. #, citing wecker, stegmann & fischer, 2013; fig. 1). it could be argued that these three hypotheses can be thought of as constituting an activity system (engeström, 1987, 2001) in which tools (technology support), activities, and performances of individuals and groups exist in interaction with one another and over time. stegmann and colleagues argue that conceptualizations other than experimental designs are needed to establish relationships between learning tools, activities, and outcomes. they propose the use of nomological nets to ensure that direct and mediating relationships between the tools and outcomes can be tested. nomological nets specify what constructs are indexed to which observables over what time frames. as well, interrelationships among constructs are specified. empirical evidence derived from collaborative activities constitutes input to revisions and refinements of theoretically grounded nomological nets. these revisions may reflect mediational variables that become evident through in-depth analyses of the discourse, of changes in the interactions and discourse over time, and at different “units” of analysis (e.g., individual, dyad, small groups, entire activity system). stegmann (this issue) argued that the in-depth analyses required to “test” initially specified nomological nets should take advantage of statistical techniques designed to detect patterns of interactions as they occur over time.
these techniques require some form of quantified information; therefore, qualitative analyses need to be quantified. fortunately, there are a number of computational algorithms that can assist researchers in doing so. importantly, these systems assist researchers in parsing the input as well as counting instances of particular codes and discovering repeating sequences of codes. the construct specification required by nomological nets is one way of ensuring that codes, sequences of codes, and recurring patterns relate to theoretically meaningful constructs. thus nomological nets can assist researchers in determining whether “discovered” patterns have psychological validity and practical utility. construct specification in nets can also assist with an additional “sticky wicket” in efficient yet automated detection of meaningful patterns of interactions. essentially, over what time frame are patterns of interaction to be detected and at what levels? that is, if a pattern is detected in a series of successive turns does that pattern then become a “unit” that can act as input to a subsequent pattern analysis effort? how are such patterns related to constructs in the net? one might envision a series of intermediate level patterns being inferred from turn by turn coding. these intermediate levels (essentially patterned sequences of turns) are the units upon which further pattern detection analyses are conducted. determining and optimizing appropriate time scales over which patterns of code sequences are constituted depends on understanding the intentions and assumptions of specific designed learning environments, particularly what and when specific processes are expected; why, and how they support expected outcomes. although patterns and sequences can be detected automatically it will take human interpretive lenses and socio-cognitive theories to specify the constructs these index and their meaningfulness in the context of learning.
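the two automated steps named above, counting instances of codes and discovering repeating sequences of codes, can be sketched minimally in python. the sketch assumes turns have already been coded by hand; the code labels ("question", "claim", "evidence", "challenge") and the function names are hypothetical, and simple n-gram counting stands in for whatever pattern detection algorithm a given tool actually uses.

```python
from collections import Counter

def count_codes(codes):
    """Count how often each code occurs in a turn-by-turn coded sequence."""
    return Counter(codes)

def repeating_sequences(codes, n=2, min_count=2):
    """Return contiguous runs of n codes that occur at least min_count times."""
    runs = Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))
    return {run: count for run, count in runs.items() if count >= min_count}

# Hypothetical coded turns from one small-group discussion.
turns = ["question", "claim", "evidence",
         "question", "claim", "evidence",
         "challenge", "claim"]

print(count_codes(turns))
print(repeating_sequences(turns, n=2))
print(repeating_sequences(turns, n=3))
```

a repeating run found at one level (for example a question, claim, evidence triple) could be relabeled as a single unit and fed back into the same functions, which is one way to picture the intermediate-level patterns discussed above. whether any such run indexes a meaningful construct remains a theoretical judgment that counting cannot make.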
the issue of levels is relevant not only to pattern detection but to learning in general, as reflected in the fourth aspect of a more complex view of learning. 4. the effects of learning are evident at multiple levels ranging from behavioural to neural. learning is “visible” at different levels. de smedt (this issue) is to be applauded for emphasizing the need for alignment between the level and topical focus of research questions and the methods selected to investigate the questions. he pointed out that if the research question is targeted at the macrolevel, behavioral methods would be most appropriate. cognitive neuroscience methods become appropriate for research questions focused on microlevel processes. the two most common cognitive neuroscience methods are electroencephalography (eeg) and functional magnetic resonance imaging (fmri). eeg methods provide temporal information about when particular processes are taking place and fmri methods provide spatial information about where in the brain processes are taking place. cognitive theories provide needed links between behavioral and neural levels. furthermore, echoing the point made above – that socio cognitive theory needs to guide the interpretation of patterns in interactions, de smedt (this issue) called for detailed cognitive theory of learning phenomena to provide needed links between behavioral and neural levels. he cited the cognitive theories and the behavioral data on which they are based as critical for interpretations of the information that results from the application of cognitive neuroscience methods. to demonstrate his claims, de smedt (this issue) illustrated three ways in which cognitive neuroscience methods elucidate mathematical instruction. these are convincing demonstrations of the value added of obtaining data on the same phenomenon at multiple levels and coordinating findings across levels. 
predictions can be pursued across levels by postulating what should be the case at one level based on manifestations at another level. the value added of using multiple methods to examine learning at multiple levels is not restricted to mathematics. for example, in the area of language acquisition, researchers have used eeg methods to establish predictive relationships between phonemic and word-level development. specifically, infants below six months of age are sensitive to phonetic contrasts in all languages; between six and 10 months, a perceptual narrowing process occurs that results in sensitivity to only those phonetic contrasts that matter in their native language. eeg methods have established that better neural discrimination of native language phonetic contrasts is associated with faster vocabulary development (kuhl & rivera-gaxiola, 2008). kuhl and colleagues have also used neural activation patterns to determine that the perceptual narrowing process occurs several months later for infants reared in two-language homes compared to those reared in monolingual homes. finally, neural indicators of phoneme learning demonstrate that social interaction plays a critical role in language acquisition (kuhl, 2007). in each of these cases, evidence from the neural level provides measures of far greater precision than could be obtained behaviorally. 5. summary and challenges more complex views of learning require more complex methodologies for addressing key questions about learning and the conditions that support it, including explicit instruction. the four papers in this special issue illustrate methodologies that assist with capturing the iterative design-based research process, learning processes and outcomes that occur at different time scales and levels, and make possible the formulation and testing of hypotheses that relate different levels to one another.
the papers present examples of ways in which these methodologies are augmenting the knowledge base for understanding learning as it occurs across individuals as well as within individuals. as such they make valuable contributions to the field. moving forward, there are a number of areas that need attention in terms of further theoretical and methodological development. briefly, more emphasis needs to be devoted to formative assessment that provides opportunities to better facilitate instructional processes and outcomes. this includes the design and testing of tools for capturing learning interactions that are classroom, teacher, and student friendly. such tools would enable students and teachers to reflect on their learning processes as well as outcomes at much finer levels of detail than is currently feasible. ideally, researchers would develop and test various technology-based tools for accomplishing these goals and would then engage in “user testing” of tools that travel outside research labs and into the hands of teachers and learners. a type of tool that would be helpful in this process is one that enables visualizations of the ebb and flow of learning processes across people and across time. finally, the learning sciences community has tended to design within specific disciplines and fields; the learning and instruction community has tended to test principles and variables thought of as general across all learning situations. neither perspective has as yet come to grips with the tension between generalist and discipline-specific views of learning nor the limitations of each view.
what is needed are studies that 1) embrace a disciplinary perspective but that also situate that discipline in the context of epistemological orientations and inquiry methods that have been adopted and developed within other disciplinary communities; and 2) examine the “fit” of principles, constructs, and explanatory mechanisms suggested by cognitive, developmental, and social psychological research to learning phenomena observed in designed learning environments. studies of the first type will advance our understanding of the general and idiosyncratic aspects of learning in different disciplines. studies of the second type will advance our understanding of explanatory mechanisms that have traction across a wide versus narrow band of learners and situations of learning. there are also aspects of learning processes and outcomes that need far more systematic and sustained research over shorter and longer time scales. specifically, we need to conduct systematic research on relationships among persistence, engagement, identity, learning processes and outcomes, within and across formal and informal contexts of learning. some of this research is currently being conducted; more of it needs to be conducted. the methodologies discussed in these papers can be synergistic with respect to tackling these challenges.

keypoints
contemporary views of learning depart from solely “in the head” views of learning.
learning occurs in multiple and iteratively designed environments over multiple time scales.
learning occurs in social groups of multiple and collaborating individuals with effects evident at multiple levels ranging from behavioral to neural.
new methodologies are needed to capture the processes and outcomes of this complex perspective on learning.

acknowledgments
the writing of this paper was supported, in part, by the institute of education sciences, u.s. department of education, through grant r305f100007 to university of illinois at chicago.
the opinions expressed are those of the authors and do not represent views of the institute or the u.s. department of education.

references
bakhtin, m. m. (1981). forms of time and of the chronotope in the novel: notes toward a historical poetics. in m. m. bakhtin, the dialogic imagination: four essays, trans. by caryl emerson and michael holquist (pp. 8 – 258). austin, tx: university of texas press.
bakeman, r., & quera, v. (1995). analyzing interaction: sequential analysis with sdis and gseq. ny: cambridge university press.
bloome, d., beierle, m., grigorenko, m., & goldman, s. r. (2009). learning over time: uses of intercontextuality, collective memories, and classroom chronotopes in the construction of learning opportunities in a ninth-grade language arts classroom. language and education, 23(4), 313-334.
brown, a. l. (1992). design experiments: theoretical and methodological challenges in creating complex interventions in classroom settings. the journal of the learning sciences, 2, 141-178.
brown, j. s., collins, a., & duguid, p. (1989). situated cognition and the culture of learning. educational researcher, 18, 32-42.
collins, a. (1992). toward a design science of education. in e. scanlon & t. o’shea (eds.), new directions in educational technology (pp. 15-22). berlin: springer-verlag.
collins, a., brown, j. s., & newman, s. e. (1989). cognitive apprenticeship: teaching the craft of reading, writing, and mathematics. in l. b. resnick (ed.), knowing, learning, and instruction: essays in honor of robert glaser (pp. 453-494). hillsdale, nj: lawrence erlbaum associates.
de smedt, b. (this issue). advances in the use of neuroscience methods in research on learning and instruction. frontline learning research.
engeström, y. (1987). learning by expanding: an activity-theoretical approach to developmental research. helsinki: orienta-konsultit.
engeström, y. (2001). expansive learning at work: toward an activity theoretical reconceptualization.
journal of education and work, 14(1), 133-156.
fischer, f., kollar, i., stegmann, k., & wecker, c. (2013). towards a script theory of guidance in computer-supported collaborative learning. educational psychologist, 49(1), 56-66.
garcia-sierra, a., rivera-gaxiola, m., percaccio, c. r., conboy, b. t., romo, h., klarman, l., ortiz, s., & kuhl, p. k. (2011). bilingual language learning: an erp study relating early brain responses to speech, language input, and later word production. journal of phonetics, 39, 546-557. http://www.sciencedirect.com.proxy.cc.uic.edu/science/article/pii/s0095447011000660
gee, j. p. (1992). the social mind: language, ideology, and social practice. ny: bergin & garvey.
goodwin, c. (1994). professional vision. american anthropologist, 96, 606–633.
greeno, j. g., collins, a. m., & resnick, l. b. (1996). cognition and learning. in d. c. berliner & r. c. calfee (eds.), handbook of educational psychology (pp. 15-46). new york: macmillan.
johnson, d. w., & johnson, r. t. (1999). making cooperative learning work. theory into practice, 38, 67-73.
hutchins, e. (1995). cognition in the wild. cambridge, ma: mit press.
kuhl, p. k. (2007). is speech learning “gated” by the social brain? developmental science, 10, 110–120.
kuhl, p. k., & rivera-gaxiola, m. (2008). neural substrates of language acquisition. annual review of neuroscience, 34, 511-534.
kuhn, d. (1995). microgenetic study of change: what has it told us? psychological science, 6, 133-139.
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge, england: cambridge university press.
lemke, j. l. (2000). across the scales of time: artifacts, activities, and meanings in ecosocial systems. mind, culture and activity, 7(4), 273–290.
molenaar, i. (this issue). advances in temporal analysis in learning and instruction. frontline learning research.
radinsky, j. l., goldman, s. r., doherty, r., & ping, r. (2010, june). small group argumentation with visual data: negotiating what is seen and what it means. paper presented at the annual meeting of the american educational research association, denver, co.
sawyer, r. k. (2006). analyzing collaborative discourse. in r. k. sawyer (ed.), cambridge handbook of the learning sciences (pp. 187-204). ny: cambridge university press.
scardamalia, m., & bereiter, c. (1991). higher levels of agency for children in knowledge building: a challenge for the design of new knowledge media. journal of the learning sciences, 1, 37-68.
schegloff, e. a. (1991). conversation analysis and socially shared cognition. in l. b. resnick, j. m. levine, & s. d. teasley (eds.), perspectives on socially shared cognition (pp. 150-171). washington, dc: american psychological association.
schegloff, e. a. (2007). sequence organization in interaction: a primer in conversation analysis. cambridge: cambridge university press.
siegler, r. s., & stern, e. (1998). conscious and unconscious strategy discoveries: a microgenetic analysis. journal of experimental psychology: general, 127, 377-397.
stegmann, k. (this issue). advances in the analysis of computer-supported collaborative learning processes. frontline learning research.
svihla, v. (this issue). advances in design-based research. frontline learning research.
vygotsky, l. s. (1978). mind in society: the development of higher psychological processes. cambridge: harvard university press.
webb, n. m., & palincsar, a. s. (1996). group processes in the classroom. in d. c. berliner & r. c. calfee (eds.), handbook of educational psychology (pp. 841-873). ny: macmillan library reference.
wecker, c., stegmann, k., & fischer, f. (2012). lern- und kooperationsprozesse: warum sind sie interessant und wie können sie analysiert werden? [learning and cooperation processes in case-based learning.
interesting issues and analysis approaches] report: zeitschrift für weiterbildungsforschung, 35(3), 30-41. doi:10.3278/rep1203w
willett, j. b. (1989). some results on reliability for the longitudinal measurement of change: implications for the design of studies of individual growth. educational and psychological measurement, 49, 587-602.

figure 1. representation of the discourse moves of three students engaged in a science inquiry task. (cf. radinsky, j. l., goldman, s. r., doherty, r., & ping, r. (2010, june). small group argumentation with visual data: negotiating what is seen and what it means. paper presented at the annual meeting of the american educational research association, denver, co.)

frontline learning research 6 (2014) 35-45 issn 2295-3159 corresponding author: vanessa svihla, organization, information & learning sciences, university of new mexico, albuquerque nm 87131, usa. email: vsvihla@unm.edu doi: http://dx.doi.org/10.14786/flr.v2i4.114

advances in design-based research

vanessa svihla, university of new mexico, usa

article received 26 may 2014 / revised 1 november 2014 / accepted 11 november 2014 / available online 23 december 2014

abstract

design-based research (dbr) is a core methodology of the learning sciences. historically rooted as a movement away from the methods of experimental psychology, it is a means to develop “humble” theory that takes into account numerous contextual effects for understanding how and why a design supported learning. dbr involves iterative refinement of both designs for learning and theory; this process is illustrated with retrospective analysis of six dbr cycles. calls for educational research to parallel medical research have led learning scientists to strive for more specific standards about what constitutes dbr and what makes it desirable, especially regarding robustness and rigor. a recent trend in dbr involves efforts to extend the reach through scalability.
these developments potentially endanger the designerly nature of dbr by orienting focus toward generalizability, meaning researchers must be vigilant in their pursuit of understanding how and why learning occurs in complex contexts.
keywords: design-based research; learning sciences; research methods

1. overview of design-based research
design-based research (dbr) is a core methodology of the learning sciences. begun as a movement away from experimental psychology, dbr was proposed as a means to study learning amidst the "blooming, buzzing confusion" of classrooms (brown, 1992, p. 141). it is a way to develop theory that takes into account numerous contextual effects for understanding how and why a design supports learning; these theories are "humble in that they target domain-specific learning processes" (cobb, confrey, disessa, lehrer, & schauble, 2003, p. 9). dbr involves iterative refinement of both designs for learning and theory (brown, 1992; collins, 1992; the design-based research collective, 2003). this paper outlines the methodological standards for conducting dbr, illustrated with an example, and describes recent advances.

2. methodological standards for conducting design-based research
dbr is sometimes conflated with mixed methods or action research; this, paired with calls for educational research to parallel medical research, has led learning scientists to strive for more specific standards about what constitutes dbr and what makes it desirable, especially regarding robustness and rigor. this section details current methodological standards for conducting dbr.

2.1. a collaborative effort conducted in context
dbr is typically conducted by a team of researchers, designers, and practitioners, with intensive planning and debriefing sessions throughout the process. the process is not wholly researcher-driven; practitioners generally have greater ownership of it.
working collaboratively, they identify a practical problem (reeves, 2006) that is then investigated through literature review, learning theory, and question posing. the intervention instantiates this learning theory into the design. because learning is understood to be a process, and because dbr seeks understanding of how learning occurs, process data are prioritized in dbr, such as video records and artifacts of student work. this approach allows researchers to be opportunistic when something surprising or emergent occurs. the notion that emergence plays a central role in dbr is a shift away from the more positivistic origins in which variables are well-known a priori (collins, 1992). dbr allows for intervention while still valuing the importance of social interaction rather than social isolation (collins, joseph, & bielaczyc, 2004). this resonates with the basic belief held by learning scientists that learning is a fundamentally social, interactional process. by occurring in classrooms rather than in laboratories, dbr also allows for testing of designs and theory that address "the complexity that is a hallmark of educational settings" (cobb et al., 2003, p. 9). the challenge is to apply lessons learned in context to a broader range of settings (barab & squire, 2004). because dbr may not be replicated in the classical sense, given strong ties to context, it is critical to share the design along with thick description (barab & squire, 2004).

2.2. iterative cycles refine the design and the theory
because of the contextual nature of dbr, some view dbr as a means to generate, but not validate, conjectures about learning (sandoval, 2004); however, because such conjectures are made visible in designs for learning, they become testable through iterative refinement. simply conducting one study in the field does not qualify as dbr, although it may be reported as one cycle in a longer dbr effort.
iterative refinement across contexts allows conjectures to become robust (disessa & cobb, 2004) by placing theory "in harm’s way" (cobb et al., 2003, p. 10).

the development of interactive learning assessments (ilas) illustrates the iterative refinement process (mckay, cantarero, svihla, yakes jimenez, & castillo, 2014; phillips et al., 2009; svihla et al., 2010; svihla, phillips, et al., 2009; svihla, vye, et al., 2009; svihla et al., 2013; yakes et al., 2013). ilas place the learner in an authentic, professional role giving advice to virtual clients. ilas were first developed in response to a call for high school biology assessments that did not pause learning, but instead assessed students as they learned; more specifically, we aimed to assess how students used resources to solve problems that were new to them. because this call came from an organization interested in using our designs for all schools in one state, we faced early challenges; our design decisions were driven by the need for scalability. this led us to seek school partners to test our designs, but meant that we neglected some of the contextual influences that are typical of dbr. initially, we did not involve instructors in the design process extensively, but we did debrief with them to inform redesign. we partnered with subject matter experts (e.g., a genetic counselor or dietitian) who helped ensure the problems reflected authentic professional practices, as this was central to our humble theory.

our designs for and theory of learning evolved through six iterations (figure 1 & table 1). we initially provided authentic, real-world problems posed by virtual clients and access to resources as a way to support students to solve complex problems. students took on the role of interns and gave counsel to virtual (avatar) clients. our first design succeeded in supporting learning, but was too open-ended to be a useful assessment at scale.
beginning with iteration 2, we designed more specified sequences of questions and provided feedback from a virtual supervisor. we found the ilas supported learning and provided useful data for assessment, but the student experience was too linear and scaffolded. we moved to a new setting, a university nutrition program seeking to provide students with ways to learn about professional practices prior to internships as a means to recruit and retain diverse students (svihla et al., 2013). with this different motivation driving our work, we sought to bring instructors more centrally into the role of designers of cases. to offset the linear feel of the cases, we sought to support greater agency, providing opportunities to make choices among story-like branches. instructors found it cumbersome to design such cases. instead of distancing the instructors from the design process, we changed how we instantiated agency into the design, creating short story-like loops; students could explore as many or as few of the loops as they liked. in these versions, students learned content and professional practices, and they enjoyed the opportunity to explore further according to their level of interest.

retrospective analysis of dbr cycles provides an opportunity to "see order, pattern, and regularity" in messy, complex settings (disessa & cobb, 2004, p. 84) and supports the development of "useful, generalizable theories" (edelson, 2002, p. 112). this analysis includes considering the conditions for success (dede, 2004) and highlights the need to report failures (o'neill, 2012). retrospective analysis of the six iterations – across varied contexts (rural, urban; high school, university; biology, nutrition) – highlights areas where our theory is robust: students consistently learned by taking on real roles and solving challenges posed by virtual clients.
this hinged on our ability to place students in roles they could understand; when the role was further from their experience, the addition of vignettes of the virtual supervisor explaining the role bridged this gap. the distance between student experience and professional role also affected feedback given to students. for high school students, it was hard to design feedback that did not seem schoolish, lowering the authenticity. in contrast, the university students found the opportunity to see an expert answer and compare it to their own answers to be an authentic learning activity.

figure 1. refinement of humble theory of learning instantiated in interactive learning assessments

table 1. design-based iterations in the development of interactive learning assessments
iteration | participants and setting | role taken by students and problem | implementation | main findings | sample design decisions
1 | biology students (n=34) at a rural southern us high school | as a conservation geneticist, student advises developers on conservation of two bird species | one case completed as think aloud task with researcher | students learned about genetics and saw what they were learning as relevant | increase scaffolding, create diagnostic yet authentic multiple choice questions
2 | biology students (n=24) at a rural southern us high school | as an intern genetic counselor, student counsels couple worried about potential for having a baby with sickle cell disease | one case completed as think aloud task with researcher | students didn’t understand what an internship was, but did learn genetics content from the case | provide explicit guidance about internship
3 | biology students (n=48) at an affluent west coast us suburban high school | as an intern genetic counselor, student counsels couple worried about potential for having a baby with sickle cell disease | one case completed in class session | students who moved quickly from reading the problem to searching for information struggled; teacher unsure how/when to use case | add generate ideas step and reflective prompts, make less linear; add teacher-as-designer
4 | undergraduate nutrition students (n=15) at a southwestern us research university | as an intern dietitian, student counsels family about nutrition needs of child with down syndrome | one case completed as an online assignment | students learned and retained content; designing branches was burdensome for instructor | replace branches with loops
5 | graduate nutrition students (n=14) at a southwestern us research university | as an intern dietitian, student counsels pregnant woman about gestational diabetes | one case completed as an online assignment, one in-class discussion session | students learned and retained content; instructor could design and teach with the case | create more cases
6 | undergraduate nutrition students (n=25) at a southwestern us research university | as an intern dietitian, student counsels a range of clients on various nutrition topics | seven cases completed in place of class meetings, plus seven in-class discussion sessions | students learned and retained content; instructor developed more student-centered practice | investigate ways to make branching design feasible

3. extensibility of dbr: design-based implementation research (dbir)
in the earlier example of ilas, the initial goal was to help bring about statewide systemic change by providing a new way to embed assessment within learning. this driver necessitated changes to traditional dbr. when we changed settings, we also changed the role of the instructors from informants and consumers to designers of cases; this shift reflected our goal to help bring about smaller scale yet systematic change within a university program. in the first set of high school iterations, instructors were uncertain about how to use the cases.
in the first iterations in the university setting in which the cases were designed by instructors, the cases were treated as homework, supplemental to in-class lectures. in the most recent iteration, the same instructors replaced lectures with the cases and further supplemented them with discussion (mckay et al., 2014). what we first viewed as a better assessment tool evolved into a tool for instructors to test their ideas about learning, resulting in more learner-centered teaching.

3.1. design-based implementation research
recently, others have similarly sought ways to expand the reach of dbr, such as through "implementation paths" that could lead the way to scaling a design (bielaczyc, 2013), seeking to develop learning theory that can be adapted to contexts (barab & squire, 2004), and design-based implementation research (dbir; fishman, penuel, allen, cheng, & sabelli, 2013; penuel & fishman, 2012). dbir includes "(a) a focus on persistent problems of practice from multiple stakeholders’ perspectives; (b) a commitment to iterative, collaborative design; (c) a concern with developing theory related to both classroom learning and implementation through systematic inquiry; and (d) a concern with developing capacity for sustaining change in systems" (penuel, fishman, haugan cheng, & sabelli, 2011, p. 331).

in one example of dbir, researchers partnered with four districts to develop a theory of action around improving mathematics instruction (cobb, jackson, smith, sorum, & henrick, 2013); the partnership lasted four years through cycles of data collection and analysis focused on the strategies as implemented. in each cycle, they documented the intended strategies, recorded how they were actually enacted, and made recommendations based on analysis.
in order to support and maintain the relationship between researchers and practitioners, the team used two means of data collection and analysis: first, they prioritized providing usable evidence for the districts to evaluate the impact of their policies; second, they iteratively tested their theory of action to refine it. in addition to being guided by and refining a theory of action, they created an interpretive framework; this tool was used to evaluate and guide design decisions prior to, during and after implementation. following the four cycles of implementation, they began retrospective analysis to further test and refine their theory of action.

this example highlights many parallels with dbr, including collaborative and contextual work with a focus on refining design and theory through iterative refinement and retrospective analysis. it also highlights the different scale at which dbir is conducted, involving many districts, schools, and classrooms, and a focus on creating sustainable change. by working at this scale, the research is more easily generalizable; by testing conjectures across four districts, they were able to learn about strategies that were effective across districts given specific conditions. because the target of their design was tied to how districts could support improved mathematics instruction, they were able to identify and refine strategies that were ineffective. for instance, school leaders had been receiving content-independent professional development to guide their feedback to mathematics teachers; however, this process uncovered that they were not able to distinguish between high and low quality enactments of the mathematics. by recommending school leaders instead receive content-based professional development, they were able to design a sustainable, lasting change.

dbir researchers emphasize the practical nature of their work, from problem to design to theory (dolle, gomez, russell, & bryk, 2013).
this approach takes a broader view of the context and attends to usability by jointly considering how to change larger entities or systems (e.g., school districts) and how to support their ability to sustainably adapt designs (penuel & fishman, 2012). dbir has only begun to be taken up, bringing focus on scalability and sustainability, while respecting teachers and avoiding trying to "teacher-proof" the materials of reform, for instance, through productive adaptation.

3.2. productive adaptation
one approach to dbir is in teachers’ productive adaptations of curricula; this means staying faithful to the original intent of the design, reproducing "invariant principles" across sites while being responsive to local contexts (kirshner & polman, 2013). in particular, focusing on maintaining or increasing – rather than reducing – the complexity and students’ engagement can support productive adaptations (debarger, choppin, beauvineau, & moorthy, 2013). dialogic interactions between teachers and researchers can support productive adaptations (kirshner & polman, 2013), but deliberate support – and spaces – are needed to ensure these are frequent enough and sustained (donovan, snow, & daro, 2013). related to this, it is also important to attend to power relationships and ownership of problems of practice; researchers bring different cultural norms and may have status not afforded to practitioners. deliberately viewing this as a cultural exchange, in which researchers and practitioners can trade ideas, can mitigate these challenges (penuel, coburn, & gallagher, 2013). in some cases, district support for any particular program, professional development, or curriculum may be taken as another in a sequence of top-down mandates, and therefore meet with resistance at school sites (borko & klingner, 2013). this highlights the importance of attending to influences across levels in the system in which research is occurring.
because of this systems approach, not all dbir research occurs within schools or formal settings; though less common, dbir research has been conducted in communities, as a means to identify issues that might prevent youth from being successful and address them in creative, cross-institutional ways (mclaughlin & london, 2013). such approaches are important because dbr has been critiqued for not sufficiently attending to equity and social justice (confrey, 2005), though some work has sought this out (e.g., barab, dodge, thomas, jackson, & tuzun, 2007).

4. are dbr and dbir designerly?
although design-based, not all dbr and dbir work appears to be designerly (cross, 2001), that is, explicitly applying a design process by seeking needs, optimizing the design, and evaluating a solution in light of identified needs (edelson, 2002). because the targets of dbr are designs for learning and theories of learning, potential needs may be found both in review of research and in the world. needs are sometimes implicit and the design process left to the reader’s imagination (e.g., "the tool was designed to scaffold learning of argumentation"). aiming at scalability can strip the contextualist, designerly aspects from dbr, but committing to novel usability, and therefore a focus on context, can mitigate this. dbir focuses on design at scale, which would suggest a less designerly approach; yet, the emphasis on working in partnership with practitioners to support sustained change has helped focus dbir research on worldly needs. as these methods continue to evolve and incorporate bigger systems and big data, there are many opportunities for looking across streams of related data, such as logfiles and videos. these offer ways to evaluate the influence and refinement of designs for learning and of learning theories that are contextual and adaptive to the systems in which they reside.
4.1. credibility of design-based research
concerns have been raised previously about the credibility of educational research in general (levin & o'donnell, 1999; national research council, 2002), urging researchers to employ methodologies influenced by medical research. in such approaches, tests of efficacy (whether the treatment works under optimal conditions) and effectiveness (whether the treatment works under real world conditions) "are often conflated" (sloane, 2008, p. 625). influenced by this, discussions about dbr have focused on robustness, rigor and validity, grounded in experimental perspectives, an odd choice given the contextual, qualitative work that is commonly done with dbr. however, trustworthiness and credibility – as applied in qualitative methods – have also been considered (barab & squire, 2004), resulting in other ways to evaluate dbr: methodological alignment means the "research methods we use actually test what we think they are testing" (hoadley, 2004, p. 203). edelson holds that dbr should not be evaluated by the same standards as traditional approaches because the goals differ; instead, "novelty and usefulness" of the theory developed should be applied (2002, p. 118).

4.2. new types of data
with the increasing popularity of big data and the relatively common use of technology-enhanced learning, some have included these new types of data in dbr studies. for instance, complex statistical modeling has recently taken the traditional place of qualitative approaches (markauskaite, 2010; markauskaite & reimann, 2008), with proponents arguing that this approach avoids selection and confirmation bias. others remain skeptical about finding usable evidence of learning from big data, citing examples of contextual, interactional "in-room" events that are not logged automatically (stevens, 2013); such events may explain successes and failures of designs in important ways.
as an example, a long period of activity on a logfile might indicate a range of activities: a student spending a long time diligently reading the screen; a student absent from the activity, wandering the class out of boredom; or a teacher interaction in response to a reflective question by the student. these tell us very different things about how the design is or is not supporting learning, and do not, on average, provide useful design information. to deal with this issue, others rely on a combination of video and logfiles. for instance, researchers first analyzed classroom and video data to redesign a feedback feature that students rarely used (segedy, kinnebrew, & biswas, 2012). they then analyzed data from students’ interactions with the technology using hidden markov modeling, to evaluate the impact of their design decisions, leading to further refinement of both the design and theory guiding their work. similarly, in our research, we have leveraged data from logfiles, field notes, student performance, and videos of implementations to test and inform design decisions (svihla & linn, 2012a, 2012b). for instance, based on review of video and logfiles and student performance, we chose to add a step to an instructional unit to support students to interpret interactive visualizations, but we feared students might use a guess-and-check approach as a result. by examining logfiles, we found that most students revisited an earlier step seeking information, rather than guessing. this led us to more closely examine logfiles for particular patterns of activities, such as revisiting steps from earlier activities. though the primary theory guiding that work was well developed, the instantiation of it in the particular context and for the particular curricular goals was not, resulting in a much more humble, localized version that incorporated new ideas about how students revisit prior curricula to support their learning. 
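the kind of logfile check described above, asking whether students went back to an earlier step rather than guessing, can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the analysis used in the cited studies: the event format (student id, step number, seconds since start), the function name summarize_revisits, and the idea of a single target step are all hypothetical.

```python
from collections import defaultdict


def summarize_revisits(events, target_step):
    """Summarize, per student, what happened after first reaching target_step.

    events: iterable of (student_id, step_number, seconds_since_start).
    This tuple layout is an assumption made for illustration only.
    """
    by_student = defaultdict(list)
    for student, step, seconds in events:
        by_student[student].append((seconds, step))

    summary = {}
    for student, visits in by_student.items():
        visits.sort()  # order each student's events by time
        steps = [step for _, step in visits]
        if target_step not in steps:
            continue  # this student never reached the step of interest
        first = steps.index(target_step)
        after = steps[first + 1:]
        earlier = sorted({s for s in after if s < target_step})
        summary[student] = {
            "revisited_earlier": bool(earlier),   # went back for information
            "earlier_steps": earlier,             # which earlier steps
            "returns_to_target": after.count(target_step),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical log: s1 goes back to step 2 before retrying step 3;
    # s2 reaches step 3 and never looks back.
    log = [
        ("s1", 1, 0), ("s1", 2, 60), ("s1", 3, 120),
        ("s1", 2, 150), ("s1", 3, 200),
        ("s2", 1, 0), ("s2", 2, 30), ("s2", 3, 90),
    ]
    for student, row in summarize_revisits(log, target_step=3).items():
        print(student, row)
```

a summary like this only flags candidate patterns. as the discussion of in-room events suggests, video or field notes would still be needed to interpret what a revisit or a long pause actually meant.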
keypoints
design-based research (dbr) is the core methodology of the learning sciences
the purpose of dbr is to develop designs for learning and learning theory through iterative refinement and retrospective analysis
typically, dbr involves qualitative data; recently, some researchers have begun using "big data" to make design refinements and build theory
design-based implementation research (dbir) is a recent trend that involves scaling dbr to support change in larger systems, such as school districts

acknowledgments
the author would like to acknowledge support from the usda/nifa hispanic-serving institutions (hsi) education grants program (#2012-38422-19836).

references
barab, s. a., dodge, t., thomas, m. k., jackson, c., & tuzun, h. (2007). our designs and the social agendas they carry. the journal of the learning sciences, 16(2), 263-305. doi: 10.1080/10508400701193713
barab, s. a., & squire, k. (2004). design-based research: putting a stake in the ground. journal of the learning sciences, 13(1), 1-14. doi: 10.1207/s15327809jls1301_1
bielaczyc, k. (2013). informing design research: learning from teachers' designs of social infrastructure. journal of the learning sciences, 22(2), 258-311. doi: 10.1080/10508406.2012.691925
borko, h., & klingner, j. (2013). supporting teachers in schools to improve their instructional practice. national society for the study of education yearbook, 112(2), 274-297.
brown, a. l. (1992). design experiments: theoretical and methodological challenges in creating complex interventions in classroom settings. the journal of the learning sciences, 2(2), 141-178. doi: 10.1207/s15327809jls0202_2
cobb, p., confrey, j., disessa, a. a., lehrer, r., & schauble, l. (2003). design experiments in educational research. educational researcher, 32(1), 9-13. doi: 10.3102/0013189x032001009
cobb, p., jackson, k., smith, t., sorum, m., & henrick, e. (2013).
design research with educational systems: investigating and supporting improvements in the quality of mathematics teaching and learning at scale. national society for the study of education yearbook, 112(2).
collins, a. (1992). toward a design science of education. in e. scanlon & t. o’shea (eds.), new directions in educational technology (pp. 15-22). berlin: springer-verlag.
collins, a., joseph, d., & bielaczyc, k. (2004). design research: theoretical and methodological issues. journal of the learning sciences, 13(1), 15-42. doi: 10.1207/s15327809jls1301_2
confrey, j. (2005). the evolution of design studies as methodology. the cambridge handbook of the learning sciences, 135-151.
cross, n. (2001). designerly ways of knowing: design discipline versus design science. design issues, 17(3), 49-55. doi: 10.1162/074793601750357196
debarger, a. h., choppin, j., beauvineau, y., & moorthy, s. (2013). designing for productive adaptations of curriculum interventions. national society for the study of education yearbook, 112(2).
dede, c. (2004). if design-based research is the answer, what is the question? a commentary on collins, joseph, and bielaczyc; disessa and cobb; and fishman, marx, blumenthal, krajcik, and soloway in the jls special issue on design-based research. journal of the learning sciences, 13(1), 105-114. doi: 10.1207/s15327809jls1301_5
disessa, a. a., & cobb, p. (2004). ontological innovation and the role of theory in design experiments. journal of the learning sciences, 13(1), 77-103. doi: 10.1207/s15327809jls1301_4
dolle, j., gomez, l. m., russell, j., & bryk, a. s. (2013). more than a network: building professional communities for educational improvement. national society for the study of education yearbook, 112(2), 443-463.
donovan, m. s., snow, c., & daro, p. (2013). the serp approach to problem-solving research, development, and implementation. national society for the study of education yearbook, 112(2), 400-425.
edelson, d. (2002).
design research: what we learn when we engage in design. journal of the learning sciences, 11(1), 105-121. doi: 10.1207/s15327809jls1101_4
fishman, b., penuel, w. r., allen, a., cheng, b. h., & sabelli, n. (2013). design-based implementation research: an emerging model for transforming the relationship of research and practice. national society for the study of education yearbook, 112(2), 136-156.
hoadley, c. m. (2004). methodological alignment in design-based research. educational psychologist, 39(4), 203-212. doi: 10.1207/s15326985ep3904_2
kirshner, b., & polman, j. l. (2013). adaptation by design: a context-sensitive, dialogic approach to interventions. national society for the study of education yearbook, 112(2), 215-236.
levin, j. r., & o'donnell, a. m. (1999). what to do about educational research's credibility gaps? issues in education, 5(2), 177-229. doi: 10.1016/s1080-9724(00)00025-2
markauskaite, l. (2010). digital media, technologies and scholarship: some shapes of eresearch in educational inquiry. the australian educational researcher, 37(4), 79-101. doi: 10.1007/bf03216938
markauskaite, l., & reimann, p. (2008). enhancing and scaling-up design-based research: the potential of e-research. paper presented at the proceedings of the 8th international conference on international conference for the learning sciences-volume 2.
mckay, t., cantarero, a., svihla, v., yakes jimenez, e., & castillo, t. (2014, june 23-27). becoming a professional through virtual practice. paper presented at the 11th international conference of the learning sciences (icls2014), boulder, co.
mclaughlin, m., & london, r. a. (2013). taking a societal sector perspective on youth learning and development. national society for the study of education yearbook, 112(2), 192-214.
national research council. (2002). scientific research in education. washington, dc: the national academies press.
o'neill, d. k. (2012).
designs that fly: what the history of aeronautics tells us about the future of design-based research in education. international journal of research & method in education, 35(2), 119-140. doi: 10.1080/1743727x.2012.683573
penuel, w. r., coburn, c. e., & gallagher, d. j. (2013). negotiating problems of practice in research-practice design partnerships. national society for the study of education yearbook, 112(2), 237-255.
penuel, w. r., & fishman, b. j. (2012). large-scale science education intervention research we can use. journal of research in science teaching. doi: 10.3102/0013189x11421826
penuel, w. r., fishman, b. j., haugan cheng, b., & sabelli, n. (2011). organizing research and development at the intersection of learning, implementation, and design. educational researcher, 40(7), 331-337. doi: 10.3102/0013189x11421826
phillips, r., gawel, d. j., svihla, v., brown, m., vye, n. j., & bransford, j. d. (2009). new technology supports for authentic science inquiry, practice, and assessment in the classroom. paper presented at the aera, san diego.
reeves, t. c. (2006). design research from a technology perspective. educational design research, 1(3), 52-66.
sandoval, w. a. (2004). developing learning theory by refining conjectures embodied in educational designs. educational psychologist, 39(4), 213-223. doi: 10.1207/s15326985ep3904_3
segedy, j., kinnebrew, j., & biswas, g. (2012). supporting student learning using conversational agents in a teachable agent environment. in j. van aalst, k. thompson, m. j. jacobson & p. reimann (eds.), the future of learning: proceedings of the 10th international conference of the learning sciences (icls 2012) – volume 2, short papers, symposia, and abstracts (pp. 251-255). sydney, australia: isls.
sloane, f. c. (2008). randomized trials in mathematics education: recalibrating the proposed high watermark. educational researcher, 37(9), 624-630. doi: 10.3102/0013189x08328879
stevens, r. (2013, 6/12-6/14).
big data, interaction analysis, and everything in between. paper presented at the games, learning, society 9.0, madison, wi.
svihla, v., gawel, d. j., brown, m., moore, a., vye, n. j., & bransford, j. d. (2010). 21st century assessment: redesigning to optimize learning. in k. gomez, l. lyons & j. radinsky (eds.), learning in the disciplines: proceedings of the 9th international conference of the learning sciences (icls) (vol. 2, pp. 474-475). chicago, il: international society of the learning sciences.
svihla, v., & linn, m. c. (2012a). a design-based approach to fostering understanding of global climate change. international journal of science education, 34(5), 651-676. doi: 10.1080/09500693.2011.597453
svihla, v., & linn, m. c. (2012b). distributing practice: challenges and opportunities for inquiry learning. in j. van aalst, k. thompson, m. j. jacobson & p. reimann (eds.), the future of learning: proceedings of the 10th international conference of the learning sciences (icls 2012) – volume 1, full papers (pp. 371-378). sydney, australia: isls.
svihla, v., phillips, r., gawel, d. j., vye, n. j., brown, m., & bransford, j. d. (2009). a tool for 21st century learning and assessment. in a. dimitracopoulou, c. o'malley, d. suthers & p. reimann (eds.), cscl practices: proceedings of the 8th international conference on computer supported collaborative learning (cscl 09) (vol. 2, pp. 46-48). rhodes, greece: international society of the learning sciences.
svihla, v., vye, n. j., brown, m., phillips, r., gawel, d. j., & bransford, j. d. (2009). interactive learning assessments for the 21st century. education canada, 49(3), 44-47.
svihla, v., yakes, e., castillo, t., cantarero, a., valdez, i., & dominguez, n. (2013). interactive learning assessment: providing context and simulating professional practices. proceedings of games, learning, society 9.
the design-based research collective. (2003).
design-based research: an emerging paradigm for educational inquiry. educational researcher, 32(1), 5–8. doi: 10.3102/0013189x032001005 yakes, e., cantarero, a., mckay, t., svihla, v., castillo, t., valdez, i., & hertel, j. (2013). interactive learning assessment: simulating professional practices. nacta journal, 57(supplement 1). morena -esteva et al publication frontline learning research vol.6 no. 3 (2018) 72 84 issn 2295-3159 application of mathematical and machine learning techniques to analyse eye tracking data enabling better understanding of children’s visual cognitive behaviours enrique garcia moreno-estevaa, sonia l. j. whiteb, joanne m. woodc, alex a. black c adepartment of education, university of helsinki, finland b faculty of education, queensland university of technology, australia c faculty of health, queensland university of technology, australia article received 9 may 2018/ revised 16 september/ accepted 17 october/ available online 7 december abstract in this research, we aimed to investigate the visual-cognitive behaviours of a sample of 106 children in year 3 (8.8 ± 0.3 years) while completing a mathematics bar-graph task. eye movements were recorded while children completed the task and the patterns of eye movements were explored using machine learning approaches. two different techniques of machine-learning were used (bayesian and k-means) to obtain separate model sequences or average scanpaths for those children who responded either correctly or incorrectly to the graph task. application of these machine-learning approaches indicated distinct differences in the resulting scanpaths for children who completed the graph task correctly or incorrectly: children who responded correctly accessed information that was mostly categorised as critical, whereas children responding incorrectly did not. 
there was also evidence that the children who were correct accessed the graph information in a different, more logical order, compared to the children who were incorrect. the visual behaviours aligned with different aspects of graph comprehension, such as initial understanding and orienting to the graph, and later interpretation and use of relevant information on the graph. the findings are discussed in terms of the implications for early mathematics teaching and learning, particularly in the development of graph comprehension, as well as the application of machine learning techniques to investigations of other visual-cognitive behaviours. graph interpretation; eye tracking; machine learning; mathematics education; gaze metrics info corresponding author: enrique.garciamoreno-esteva@helsinki.fi doi: https://doi.org/10.14786/flr.v6i3.365 1. introduction eye tracking is rapidly becoming an established technique for investigating the cognitive processes involved in learning mathematics and other subjects. the use of eye tracking makes it possible to make implicit visual-cognitive behaviours explicit, in order to better understand learning processes and subsequently inform educational practice (lai, et al., 2013). previous methods, such as interviewing and think aloud protocols, have been adopted to understand the different approaches and strategies used in a range of problem-solving tasks, however, there are limitations to these approaches. first, interviews undertaken following task completion are reliant on a participant accurately remembering and recalling specific steps. second, think aloud protocols assume the participant has the cognitive flexibility to think out loud while also engaging in the problem-solving task (rosenzweig, krawec, & montague, 2011). for example, to understand how young children engage in mathematics problem-solving tasks, the additional cognitive demand of think aloud protocols could potentially have an adverse impact on task performance. 
indeed, kotsopoulos and lee (2012) used modified think aloud protocols and real time naturalistic analysis of students completing mathematical problem-solving tasks, and reported that problem-solving often broke down in the early stages of understanding the task. the limitations of many of these approaches led van gog, paas, van merriënboer and witte (2005) to highlight the need for research methods that capture different problem-solving processes, and to support the use of eye tracking methods for more complex tasks that involve a sequence of cognitive steps.

an extensive body of eye tracking research has focused on the link between visual gaze and information processing. for example, how a student looks at a diagram is influenced by their preliminary intuitions and the conceptions activated by the task context (knoblich, ohlsson & raney, 2001). eye tracking research has revealed that more experienced problem-solvers (experts) can identify task relevant visual information more rapidly than less experienced individuals (novices), and that their visual attention (eye fixation scanpaths) tends to be more focused on relevant than irrelevant regions of the visual stimulus (gegenfurtner, lehtinen & säljö, 2011; tsai, hou, lai, liu, & yang, 2012); these objective findings were corroborated by self-reported accounts of participants completing the task (tsai, et al., 2012). another study reported that novices display significantly more shifts in visual attention than experts and have longer gaze sequences (or scanpaths) for a given problem-solving task (kim, aleven & dey, 2014). furthermore, psychology research indicates that the order of fixations affects cognitive functions, such as memory (e.g. bochynska & laeng, 2015; rinaldi, brugger, bockisch, bertolini, & girelli, 2015).
bochynska and laeng (2015) used eye tracking in a visuospatial memory recognition task and found that both the spatial information and the order of fixations were important for visuospatial memory formation, with increased accuracy in trials where the elements were presented serially, in the same order as in the participant’s original fixation scanpath.

the current research utilised machine learning techniques to make sense of the order structure of the multiple scanpaths of primary school children while they completed a graph problem-solving task, in order to better understand the visual cognitive behaviours involved. the rationale for using machine learning analysis is that it allows us to examine the sequential (temporal) structure of the data. this is unlike the approach adopted in most eye tracking studies and allows novel insight into the underlying behaviours of children while completing these graph tasks. while there are other methods to examine the temporal structure of data, we felt our approach was the most suitable for our research purpose. our method is also faster and more automated than many other traditional methods.

in order to interpret the scanpaths, our approach used the sequential framework of children's data comprehension described by curcio (2010), which comprises understanding, interpretation, and prediction with data. the first level of comprehension, ‘understanding’, requires reading of the information explicitly stated in the presented data (e.g. graph). the second level of comprehension, ‘interpretation’, requires reading between the data and integration of the presented information; this level of comprehension requires specific skills, such as comparison or computation (e.g. addition, subtraction, multiplication, division). the third level of comprehension, ‘prediction’, requires reading beyond the data and the use of existing knowledge to make inferences/predictions from the data.
when making predictions, the information is neither explicitly nor implicitly presented in the graph. the bar graph task used in the present study required each child to engage in the first two levels of data comprehension: reading the question and basic details of the graph (understanding), and then reading between the different elements of information (interpretation) in order to complete the computation and arrive at the correct solution. most importantly, with graph interpretation, both visual and cognitive integration is required (ratwani, trafton & boehm-davis, 2008), where integration of relevant visual attributes on the graph, such as labels, pattern recognition, or other spatial features contribute to higher order visual clusters of information, that are compared to create a coherent representation and response. the aim of this research was thus to use mathematical and machine learning based analysis of eye tracking data to better understand the visual and cognitive behaviours associated with the completion of a graph task in a sample of children in year 3. using a desktop eye tracking system, children completed a mathematics task that involved comprehension of a bar graph. the scanpaths and the accuracy of individual responses to the task were analysed to identify the visual cognitive behaviours when completing the graph task, for example, what happens when children are confronted with such a task, which features children look at, and the order in which relevant information is accessed. the research also aimed to determine whether the gaze patterns for children who correctly or incorrectly completed the task were different, and if so, to identify the characteristics of the different visual cognitive behaviours. such information will enable more reliable inferences about the implicit visual cognitive behaviours of children when engaging in a graph problem-solving task and other cognitive tasks. 2. 
method 2.1 participants participants included 106 year 3 children (58 females, 48 males; mean age 8.8 ± 0.3 years) who completed a graphical mathematics task. children were from three primary schools in south east queensland, australia. data collection occurred in the last half of the school year, and the graph task formed part of a larger study that involved eye tracking while children completed a series of mathematics and reading tasks. all data collection occurred in a quiet room near the respective classrooms. the study was approved by university human research ethics committee that operates within the australian national statement on ethical conduct in human research. approval to conduct research in queensland state schools was also granted by the queensland government, department of education and training. 2.2 task the design of the graph task was based on the year 3 australian mathematics curriculum where children are interpreting and comparing data displays (acara, 2016). as presented in figure 1, the graph task included: a) a bar graph, where the height of each bar indicated the number of hours worked by sarah during a given week; b) a labelled coordinate system, where the x-axis had the week number labels, and the y-axis had numbers corresponding to hours; c) a sentence indicating sarah’s hourly wage; d) another sentence indicating the question related to sarah’s wages in week 3. 2.3 apparatus a screen-based tobii eye tracker (tx300) operating at 300 hz recorded the eye movements of the children as they completed the graph task. the task was presented on a 23 inch screen and participants sat comfortably (without restraint) at a working distance of approximately 60cm. the calibration used the tx300 nine point calibration procedure, with re-calibration conducted for any points where calibration was recorded as poor (denoted in red). only when the calibration procedure was completed for all nine points (denoted in green) was the eye tracking task started. 
events were detected using a dispersion based algorithm for detecting fixations. a fixation was defined as static eye movements with gaze positions remaining within a visual angle of 1.6° for at least 100 milliseconds (tobii technology, 2014). for the graph task, the 106 participants had a mean tracking percentage of 87.8 ± 9.8%. the output of the eye tracking device is typically a sequence of coordinate pairs relative to the scene that the participant is viewing, together with a time stamp when the locus of gaze is positioned at a given coordinate pair. 2.4 data extraction the visual stimulus (the graph on the screen; figure 1) was subdivided into regions known as areas of interest (aoi). initially, aoi sequences were generated from the sequence of fixations, as determined by the dispersion-based algorithm. the first step involved determining which were the most critical aois and labelling them as a1 (wage information (part of sentence below the graph)), a2 (week number (part of sentence below the graph)), a3 (week 3 bar), and a4 (number region containing the number of hours corresponding to week 3). these aois were considered as the minimum number of areas that needed to be fixated to complete the graph task successfully. the other aois were labelled as somewhat critical (b) and less critical (c). the categorisation of b areas as somewhat critical was derived as part of an iterative process, incorporating typical classroom practice and qualitative inspection of the scanpaths. typical classroom practice, or the procedural frameworks used to support children engaging with graphs, usually includes reading the title and axis labels as an initial orientation to the graph; these areas were therefore labelled as somewhat critical b areas. 
furthermore, qualitative inspection of scanpaths revealed that when reading the sentences below the graph, some children failed to read the full sentence and did not access the most critical information (a1 and a2), but did read the first part of each of the two sentences. to enable identification of this type of incomplete reading behaviour, the sentences were separated into two different areas, indicating somewhat critical (b) and most critical (a) information. finally, the less critical c areas were classified because they represented procedural checking that may be promoted as part of classroom practice, for example, systematically checking all bars on the bar graph before providing a response. figure 1. the graph task visual stimulus partitioned into areas of interest (aois). in this diagram, dashed lines indicate the original stimulus size and solid lines indicate the alignment of displacement zones for each aoi. the size of each aoi regardless of whether they were categorised as a, b or c, was calculated using two criteria; first, the aoi had to allow for vertical and horizontal displacement errors in the visual scanpath, and second, the aois could not overlap (holmqvist, et al., 2011). while some studies involving children (e.g. sasson & elison, 2012) have implemented a two degree vertical and horizontal displacement zone centred over relevant visual stimuli (one degree above and one degree below the stimulus), this was not possible with the current graph task as it would have resulted in substantial overlap of aois. thus wherever possible, a 1.9 degree vertical and 0.5 degree horizontal displacement zone was centred over the stimulus and in the event of overlap, the horizontal displacement zone was reduced to 0.15 degrees, which occurred for the following aois: a1 and b1, a2 and b2. 
to avoid overlap between c2 and b5 aois, b5 had a 1.5 degree centred vertical displacement, whereas the c2 vertical displacement zone could not be centred over the stimulus, so the c2 aoi was defined as 0.5 degree above and 0.95 degree below the stimulus. to distinguish fixations on the y-axis, a4 and b4 had a 1.9 degree vertical displacement, and a larger 1 degree horizontal displacement.

the data sequences that were obtained initially were based on fixations and then transformed into dwell sequences by coalescing contiguous identical elements in the sequence. thus, for example, if in a fixation sequence we have b2, a1, a1, c3, this would give rise to the dwell sequence b2, a1, c3. in our analysis, sequences of dwells were used. the data included 106 data sequences or scanpaths (the inputs; both terms are used here, reflecting usage in the literature), one for each child, and the corresponding 106 answers to the graph question (the outputs), which were categorised as correct (1) or incorrect (0). the data sequences (sequences of aoi names) could thus be grouped into two classes, one being the sequences of children who responded correctly to the task, and the other being the sequences of children who responded incorrectly.

2.5 machine learning techniques

machine learning approaches were used to find a representative sequence of aois that would provide qualitative information about the visual cognitive characteristics associated with task performance. two methods are described that provide means by which an average scanpath can be extracted from a set of multiple sequences or scanpaths obtained from children performing the same task. method one applies a naïve bayesian approach to calculate the most probable vector for each class (using two possible alternatives to managing variable feature vector length), and method two applies a single iteration of a k-means approach to calculate the central vector for each class using an edit distance metric.
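the conversion from fixation sequences to dwell sequences described in section 2.4 can be sketched in a few lines. this is a minimal illustration, not the authors' code; the function name `fixations_to_dwells` is hypothetical, and aoi labels are assumed to be plain strings.

```python
from itertools import groupby


def fixations_to_dwells(fixation_aois):
    """Collapse each run of contiguous identical AOI labels into one dwell."""
    # groupby yields one (label, run) pair per run of equal neighbours.
    return [label for label, _run in groupby(fixation_aois)]


# The example from the text: b2, a1, a1, c3 becomes b2, a1, c3.
print(fixations_to_dwells(["b2", "a1", "a1", "c3"]))  # ['b2', 'a1', 'c3']
```

only adjacent repeats are merged, so a later return to the same aoi is kept as a separate dwell.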
the use of these techniques will form the basis for future research that characterises the relevant scanpaths of children to potentially guide identification of their ability to effectively perform graph tasks.

2.5.1 method one – most probable vector for each class

the naïve bayes model or classifier “learns” from the data sequences, which are known as feature vectors, a commonly used term in the machine learning literature (murphy, 2012). a data sequence can be defined as a finite list of representational feature objects or values, which can be names, numbers or symbols, as long as the possible set of representational names or values is finite. in the case of scanpaths, the feature values are the aoi names. the classes in the model are the children who either correctly or incorrectly completed the graph task. learning a naïve bayes model consists of calculating the class conditional probabilities of the feature values. for each class and each feature, the number of occurrences of the feature value is divided by the number of feature vectors in the class (whether the child completed the task correctly or incorrectly) in question. this yields the class conditional probability for the possible values of a given feature in the given class. the “classifier” obtained from the model consists of probability distributions of the feature values determined by the class conditional probabilities for the values of each feature in each class. for example, suppose that we have three feature vectors (each one in brackets) in a class: (a, b, c, a, b), (b, b, c, c, b), and (c, a, b, c, a). then, the class conditional probabilities for the second feature (the second position of each vector, which holds b, b and a respectively) are, for feature value a, 1/3; for feature value b, 2/3; and for feature value c, 0 because it does not appear in the second feature position.
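the worked example above can be reproduced with a short sketch. it assumes that each position in a sequence is treated as one feature and that counts are divided by the number of vectors in the class, as the text describes; the function name `class_conditional_probabilities` is hypothetical, and exact fractions are used only to make the result easy to read.

```python
from collections import Counter
from fractions import Fraction


def class_conditional_probabilities(vectors):
    """Return one {feature value: probability} table per position."""
    n_vectors = len(vectors)
    longest = max(len(v) for v in vectors)
    tables = []
    for position in range(longest):
        # Vectors too short to reach this position add nothing to the counts.
        counts = Counter(v[position] for v in vectors if position < len(v))
        tables.append({value: Fraction(n, n_vectors) for value, n in counts.items()})
    return tables


example_class = [list("abcab"), list("bbccb"), list("cabca")]
second = class_conditional_probabilities(example_class)[1]
# Values that never occur at a position are absent from its table (probability 0).
print(second.get("a", 0), second.get("b", 0), second.get("c", 0))  # 1/3 2/3 0
```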
for a given sequence or feature vector in a given class, all the class conditional probabilities are multiplied and the product is multiplied by the class prior probability. in our analysis we assumed what is called a uniform prior, that is, a prior probability value of ½ for each class. the final products, one for each class, are the joint probabilities. to obtain the most probable feature vector for a class (i.e., the most probable or average scanpath or data sequence for the given class) the feature value that has the highest probability of occurring in a feature and class, as given by the class conditional probabilities, is selected for each feature and class. in the above example, the resulting vector would be (a, b, c, c, b). in the first position, all values have probability 1/3 so we pick one at random, say a. in the second through fifth positions, b, c, c, and b have probability 2/3 whereas the other symbols in each position have probability 1/3 or 0. the resulting list, or vector, of most probable feature values is the most probable feature vector (or average scanpath or data sequence) for that class. the vectors obtained for each class provide qualitative information regarding the children’s task performance according to the previously established criterion, which indicates whether the graph task was completed correctly or incorrectly. the feature vectors can have different lengths, which can lead to biases; there are two alternatives for managing variable feature vector lengths. the first way to deal with this length variability is to consider those feature vectors in a class which are shorter than the longest one in that class, as feature vectors containing missing data. the class conditional probabilities are calculated as outlined in the example. 
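selecting the most probable feature value at each position can be sketched as follows. this is an illustration under stated assumptions rather than the authors' implementation: shorter vectors are treated as having missing data at later positions, and ties are broken alphabetically instead of at random so that the output is reproducible. the function name `most_probable_vector` is hypothetical.

```python
from collections import Counter


def most_probable_vector(vectors):
    """Pick, for every position, the value that occurs most often there."""
    longest = max(len(v) for v in vectors)
    result = []
    for position in range(longest):
        counts = Counter(v[position] for v in vectors if position < len(v))
        # All probabilities at a position share one denominator, so the
        # largest count is also the largest class conditional probability.
        # Assumption: ties go to the alphabetically first label.
        best = min(counts, key=lambda value: (-counts[value], value))
        result.append(best)
    return result


example_class = [list("abcab"), list("bbccb"), list("cabca")]
print(most_probable_vector(example_class))  # ['a', 'b', 'c', 'c', 'b']
```

with the alphabetical tie-break, the first position resolves to a, which happens to match the random pick used in the text's example.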
if a feature is considered as a coordinate or location in a vector, for a given feature and class, the probability of a feature value is the number of times that value occurs in any of the feature vectors, at the given location, divided by the total number of feature vectors in the given class. if a feature vector is too short to contain a value at the given location, then it does not contribute to the numerator of the preceding quantity, although it is still counted in the denominator.

the alternative approach to handling the length variability is to turn all the feature vectors in a given class into vectors of the same length by “padding” the shorter vectors in the class with a dummy value to the right, up to the length of the longest feature vector in the class. padding a short vector to the right means that, after the last value of the short feature vector, a new padded value is added to the end of the list of feature values until it is the same length as the longest feature vector in the class. the “padding feature value” instances are different to all original feature values occurring in the data feature vectors. as an example, if the longest feature vector has a length of 5 (a, b, a, c, a), and we have a feature vector of a length of 3 (a, b, c), then, in positions 4 and 5 of the short feature vector we introduce the padding feature value, say x, so that instead of (a, b, c) we end up with (a, b, c, x, x). the class conditional probabilities are calculated as before, noting that in some cases, the most probable feature value might turn out to be the padding feature value. in the case of our analysis we used the first method to deal with length variability because, among other more technical reasons, it allowed us to have strings that contain only aoi names.

2.5.2. method two – the central vector for each class

the second overall method of analysis involves taking a measure of the “distance” between one feature vector and another one with the levenshtein edit distance metric.
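the levenshtein edit distance, and one way of using it to choose a central vector, can be sketched as follows. the distance function is the standard dynamic-programming formulation with unit costs. the selection step rests on an assumption: the central vector is taken to be one of the observed sequences (a medoid), which may differ from how the authors computed theirs. both function names are hypothetical.

```python
def edit_distance(seq_a, seq_b):
    """Minimum number of deletions, insertions and replacements (unit cost)."""
    previous = list(range(len(seq_b) + 1))  # distances from the empty prefix
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement or match
        previous = current
    return previous[-1]


def central_vector(sequences):
    """Assumed medoid: the observed sequence with the smallest total distance."""
    return min(sequences,
               key=lambda cand: sum(edit_distance(cand, other) for other in sequences))


toy_class = [["b1", "a1", "a3"], ["b1", "a1", "a2", "a3"], ["c1", "a1", "a3"]]
print(central_vector(toy_class))  # ['b1', 'a1', 'a3']
```

because whole aoi names are compared as list items, replacing a1 with a2 counts as one edit rather than a character-level change.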
the edit distance between two feature vectors is given as the minimum number of feature deletions, insertions, and replacements that are required to transform one of the vectors into the other one. the method then works by finding a feature vector, which is called a central feature vector, that minimises the total edit distance between the central feature vector and each of the other feature vectors. methods to compute edit distance and thus a central feature vector are well understood in the computer science or mathematics literature (see skiena, 2010 for a review). these central feature vectors also reflect, on average, what is happening with all the sequences, and should be generally similar to those obtained with method one (the most probable vector).

3. results

the results of method one show that classifying multiple scanpaths, particularly dwell area of interest (daoi) sequences, with naïve bayes works extremely well (where a dwell is a continuous series of fixations within the same aoi). this in itself is informative, as it means that the elements of a daoi sequence are not necessarily correlated, that is, after looking at any one particular aoi, the observer may look at any other aoi with equal likelihood. both method one and method two allowed us to produce “virtual scanpaths” (the most probable vector and the central vector scanpaths) that demonstrate what children are doing when they are solving the problem correctly or incorrectly.

3.1 using a naïve bayes classifier to obtain most probable vectors

using a naïve bayes classifier we were able to get a perfect classification, where our training error rate was zero. since predictive analysis was not our goal, we did not write our own cross validation routine. however, with a commercial package that handles missing data very differently (mathematica, wolfram research), we obtained a training error rate of .15 and a “leave one out” cross validation error rate of .27.
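the difference between a training error rate and a leave-one-out error rate can be illustrated with a small positional naïve bayes sketch. everything here is an assumption made for illustration: the function names, the toy sequences, the uniform prior of 1/2 passed as a default, and the choice to give unseen values a probability of 0 with no smoothing. it does not reproduce the authors' software or the mathematica routine.

```python
from collections import Counter


def fit_class(vectors):
    """Per-position probability tables for one class (missing-data treatment)."""
    longest = max(len(v) for v in vectors)
    tables = []
    for pos in range(longest):
        counts = Counter(v[pos] for v in vectors if pos < len(v))
        tables.append({val: n / len(vectors) for val, n in counts.items()})
    return tables


def joint_probability(tables, vector, prior=0.5):
    """Prior times the product of the class conditional probabilities."""
    probability = prior
    for pos, value in enumerate(vector):
        table = tables[pos] if pos < len(tables) else {}
        probability *= table.get(value, 0.0)  # assumption: unseen value -> 0
    return probability


def classify(model, vector):
    """Label with the largest joint probability (ties go to the first label)."""
    return max(model, key=lambda label: joint_probability(model[label], vector))


def training_error(data):
    """Error when the model is fitted and evaluated on the same sequences."""
    model = {label: fit_class(vectors) for label, vectors in data.items()}
    pairs = [(label, v) for label, vectors in data.items() for v in vectors]
    return sum(classify(model, v) != label for label, v in pairs) / len(pairs)


def leave_one_out_error(data):
    """Refit without each sequence in turn, then classify the held-out one."""
    wrong = total = 0
    for label, vectors in data.items():
        for i, held_out in enumerate(vectors):
            reduced = dict(data)
            reduced[label] = vectors[:i] + vectors[i + 1:]
            model = {lab: fit_class(vs) for lab, vs in reduced.items()}
            wrong += classify(model, held_out) != label
            total += 1
    return wrong / total


# Hypothetical toy data; each class needs at least two sequences.
toy = {
    "correct": [["b1", "a1", "a2", "a3"], ["b1", "a1", "a3"],
                ["b1", "a1", "a2", "a3", "a4"]],
    "incorrect": [["c1", "b2", "c2"], ["c2", "b2", "a1", "c1"],
                  ["c1", "c2", "b2"]],
}
print(training_error(toy), leave_one_out_error(toy))
```

a training error of zero is easier to reach than a low leave-one-out error, because in the first case every sequence has already contributed to the probability tables used to classify it.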
this suggested that the aois in the gaze sequences are uncorrelated along the sequences and obey the naïve bayes assumption. what makes the use of a naïve bayes classifier attractive to us is the possibility of generating “virtual scanpaths” that are “most probable”. since for each feature (each coordinate of the scanpath sequences) and for each class we obtain a probability distribution of the possible aois, the most probable one can be selected. this provides us with an exemplary sequence that provides good qualitative data regarding what the children are doing while they solve the task, depending on whether they do so correctly or incorrectly. on average, the lengths of the scanpaths of children in either group are largely similar (69 ± 31 aois for children who responded correctly, vs 77 ± 39 for children who responded incorrectly), although they are slightly shorter for children who completed the task correctly. the qualitative information provided by the average scanpaths becomes even more evident with further manipulation and post-processing of the average scanpaths. if we merge the aois which are contiguous and identical in the average scanpath into a single instance, and remove dwells on blank spaces, the average scanpaths become much shorter, and this varies between the two classes. the correct children’s post processed (merged) average scanpath is as follows: {c1,b1,a1,b2,a2,b2,a2,a3,b2,a3,b2,a3,a1,a3,a1,a3,a1,a2,a1,a3,a1,a3,a1} and the incorrect children’s post processed (merged) average scanpath is as follows: {c2,c1,b1,a1,b1,a1,b1,b2,b1,b2,a1,b2,c2,b2,a1,c2,b2,a3,b2,c2,a3,a1,a3,a1,c1,c2,b2,a2,a3,b2,a1, b2,a1,a3,c2,b3,c2,b2,a3,a2} as can be observed, the incorrect children’s merged average scanpath is almost twice as long (40 aois) as that of correct children (23 aois). it is notable that a4 does not feature in the merged average scanpath for either the correct or incorrect children. 
this is likely to indicate that while children may have accessed a4 in their scanpath, there is not one clear position in the participants’ sequences where a4 features, therefore it does not appear in the average scanpath. the percentage of overall aoi accessed in the merged average scanpath are presented in figure 2. this shows that the gaze of the incorrect children wandered more than the gaze of correct children. it is also interesting to note that the incorrect children spread their attention relatively evenly across the a, b, and c areas, whereas the correct children exhibited greater visual attention (i.e. highest percentage of dwells) on the more critical a areas. figure 2. percentage of aoi access in the merged average scanpath, comparing correct and incorrect children. a aois: most critical, b aois: somewhat critical, c aois: less critical. examining the merged average scanpath as a sequence of dwells in areas a, b and c, demonstrates a more detailed characterisation of the visual cognitive behaviours (figure 3). correct children more rapidly accessed and maintained attention on the a areas. whereas children who were incorrect did not exhibit the same rate of attention on a areas, but rather appeared to be uncertain as to where to focus their visual attention, with many shifts between a, b and c areas with a larger number of dwells. figure 3. merged average scanpath, showing which aoi were accessed by correct and incorrect children over time. 
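the share of a, b and c areas in the two merged average scanpaths reported in section 3.1 can be recomputed directly from the listed sequences. this is a simple tally of aoi labels, offered as a sketch; the values plotted in figure 2 may have been derived differently, and the function name `category_percentages` is hypothetical.

```python
from collections import Counter

# Merged average scanpaths as listed in section 3.1.
correct_merged = ("c1,b1,a1,b2,a2,b2,a2,a3,b2,a3,b2,a3,"
                  "a1,a3,a1,a3,a1,a2,a1,a3,a1,a3,a1").split(",")
incorrect_merged = ("c2,c1,b1,a1,b1,a1,b1,b2,b1,b2,a1,b2,c2,b2,a1,c2,b2,a3,b2,c2,"
                    "a3,a1,a3,a1,c1,c2,b2,a2,a3,b2,a1,b2,a1,a3,c2,b3,c2,b2,a3,a2").split(",")


def category_percentages(scanpath):
    """Percentage of entries in each category, taken from the first letter."""
    counts = Counter(aoi[0] for aoi in scanpath)
    return {cat: 100 * counts[cat] / len(scanpath) for cat in "abc"}


for name, path in (("correct", correct_merged), ("incorrect", incorrect_merged)):
    shares = category_percentages(path)
    print(name, len(path), {cat: round(pct, 1) for cat, pct in shares.items()})
```

on these sequences, roughly three quarters of the entries for the correct group fall in a areas, against two fifths for the incorrect group.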
3.2 finding the central data item in each class and what it reveals

the average scanpath derived by method two (central data item) determined that correct children had the following sequence: {a1,b5,b1,b1,b1,a1,a1,a1,c2,a1,b2,b2,b2,b2,a2,b2,a2,a2,a1,b3,c2,c1,a3,a3,a1,a3,b3,a4,a3,a4,a3,a3,b2,a2,a1,a1} while the sequence for incorrect children was as follows: {c1,b1,b1,a1,a1,b2,b2,b2,b2,b2,b2,b2,a3,c1,c2,b3,a3,a4,c1,b5,b2,a2,b2,b2,b2,b2,a2,c3,c1,a4,c2,a3,b1,a1,a2,a1} the average scanpath sequences for correct and incorrect children have the same sequence length (36 values). in terms of percentage of dwells in the most critical areas, incorrect children, in contrast to the correct children, spent a lot of time looking at the less critical areas (b and c) (figure 4), which is similar to the results of method one (the most probable vector).

figure 4. percentage of aoi access using the central feature vector, comparing correct and incorrect children. a aois: most critical, b aois: somewhat critical, c aois: less critical.

the central feature vectors allowed us to learn how the critical areas were accessed. if we remove all of the non-critical (b and c) areas, leaving only the a areas, a distinction between the two vectors which is crucial to our understanding of the children’s task behaviour becomes evident. following this step, we obtained the following sequences without the non-critical areas. correct response sequence: {a1,a1,a1,a1,a1,a2,a2,a2,a1,a3,a3,a1,a3,a4,a3,a4,a3,a3,a2,a1,a1} incorrect response sequence: {a1,a1,a3,a3,a4,a2,a2,a4,a3,a1,a2,a1}. figure 5 summarises the sequence of dwells on most critical areas (a1-a4) and other non-critical areas (b-c) for the modified sequences.
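the reduction of the central vectors to their most critical areas, and the order in which each critical area is first reached, can be sketched as follows. the sequences are copied from section 3.2; the helper names `critical_only` and `first_access_order` are hypothetical, and the first-access summary is an added illustration rather than a measure used by the authors.

```python
correct_central = ("a1,b5,b1,b1,b1,a1,a1,a1,c2,a1,b2,b2,b2,b2,a2,b2,a2,a2,"
                   "a1,b3,c2,c1,a3,a3,a1,a3,b3,a4,a3,a4,a3,a3,b2,a2,a1,a1").split(",")
incorrect_central = ("c1,b1,b1,a1,a1,b2,b2,b2,b2,b2,b2,b2,a3,c1,c2,b3,a3,a4,"
                     "c1,b5,b2,a2,b2,b2,b2,b2,a2,c3,c1,a4,c2,a3,b1,a1,a2,a1").split(",")


def critical_only(scanpath):
    """Keep only the most critical (a) areas, preserving their order."""
    return [aoi for aoi in scanpath if aoi.startswith("a")]


def first_access_order(scanpath):
    """Order in which each critical area is reached for the first time."""
    seen = []
    for aoi in critical_only(scanpath):
        if aoi not in seen:
            seen.append(aoi)
    return seen


print(first_access_order(correct_central))    # ['a1', 'a2', 'a3', 'a4']
print(first_access_order(incorrect_central))  # ['a1', 'a3', 'a4', 'a2']
```

the filtered lists match the correct and incorrect response sequences given in the text, and the first-access order makes the late arrival at a2 in the incorrect group easy to see.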
looking at the modified sequences, it is evident that area a4, which includes the values along the y-axis of the task graph, is infrequently visited by incorrect children, and, as anybody who learns math as a student or as a professional is aware, knowing what is being counted is critical to understanding and solving graph problems correctly. moreover, the correct children’s inspection of the most critical areas is in a predictable order (a1, a2, a3, a4). conversely, a less predictable order was evident in the modified sequence of the incorrect children. figure 5. modified average scanpath, showing the sequence in which the most critical aois (a1-a4) and other non-critical areas (b-c) were accessed by correct and incorrect children over time. discussion in this paper we discuss two novel methods of analysing eye tracking data to better understand the visual and cognitive behaviours associated with completion of a graph task in a sample of children in year 3. each method produced two strings (one for the correct children and one for the incorrect) which were compared. the initial findings are significant because they demonstrate how the methods developed by our research group can assist in characterising the visual behaviour of children who correctly or incorrectly answered the graph task. the results of both methods of analysis reveal differences in gaze patterns between the correct and incorrect groups. this discussion will summarise the key findings for each method, compare the different analytic approaches and characterise the visual cognitive behaviours identified. method one used a naïve bayes classifier to obtain most probable vectors. the perfect classification training rate with our software of 100% suggests that the eye tracking data for this task obeyed the naïve bayes assumption, and that the aois in the scanpath were uncorrelated, and an average scanpath can plausibly be determined. 
we believe that this is likely to be task dependent; however, further research is required in order to fully understand this issue. we did not do cross-validation with our software, but used a commercial package (mathematica, wolfram research), which showed a training error rate of .15 and a leave one out cross validation error rate of .27, which, given the small amount of data and relatively large number of features, is still very good. we are optimistic that with more data, this rate would drop.

with method one, comparisons of the average number of dwells in the sequences of each class identified a slightly higher number of dwells for the incorrect children, compared to the correct children (77 vs 69 aois, respectively). this higher number of dwells for the incorrect children is in accord with the findings of kim et al. (2014), who found that novices display significantly more shifts in visual attention than experts and have longer gaze sequences (or scanpaths) for a given problem-solving task. in our study, the correct children exhibited fewer dwells and fewer shifts in visual attention than their counterparts who were incorrect. this is further evidenced in the merged average scanpath, where correct children had a higher percentage of dwells in the most critical a areas, whereas the incorrect children had dwells distributed across the most critical and less critical (a-c) areas. interestingly, these patterns reflect those of gegenfurtner et al. (2011) and tsai et al. (2012) regarding the behaviours of more experienced problem solvers, who were shown to identify task relevant features of visual information more rapidly than less experienced individuals, and whose visual attention focused more on relevant than irrelevant regions of the visual stimulus. our research was also able to characterise the sequence of rapid dwells on relevant/critical areas in the children who completed the graph task correctly, as shown in figure 3.
method two used the central data item to determine the average scanpath, as well as a procedure to isolate the sequence of dwells on the most critical a areas. this isolation of the sequence of dwells on a areas was revealing in terms of understanding whether there was a logical sequence in accessing the most critical information. the children who responded correctly followed a logical sequence of dwells (based on our initial categorisation of which areas were most or least critical) that progressed from a1 to a2 to a3, then a4 (figure 5). conversely, the children who responded incorrectly accessed a2 relatively late in the sequence. a2 represented the area surrounding the main question 'how much did she earn in week 3?' with a2 specifically representing 'week 3'. delays in accessing this information may result in poorly focused attention to the relevant aspects of the graph, resulting in the child not knowing which bar they should be using in their calculation. relating this to curcio's (2010) framework, it is the first level of data comprehension that is delayed or out of sequence for the children who responded incorrectly. the initial understanding and reading of information in a logical sequence is the essential foundation to guide subsequent visual attention for interpreting and integrating the relevant graphical information and successfully completing the task. this finding is consistent with kotsopoulos and lee (2012), who reported that student problem-solving most often broke down in the early stages of understanding a mathematics problem-solving task. it may be that if initial understanding or orientation is not achieved early, the visual dwells become more frequent and variable in location, as children look for information to help them understand the requirements of the task. it may also be that the positioning of the question, either above or below the graph, would affect the visual behaviours and task response.
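the two steps of method two can be sketched as follows. this is an illustration under stated assumptions rather than the procedure used in the study: the distance between two scanpaths is assumed here to be the levenshtein edit distance, the central data item is taken to be the scanpath with the smallest total distance to all others, and the toy scanpaths and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two AOI sequences
    (assumed metric; the choice of distance is an illustration)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def central_item(scanpaths):
    """Scanpath with the smallest summed distance to all others
    (the first such scanpath is returned if there is a tie)."""
    return min(scanpaths,
               key=lambda s: sum(edit_distance(s, t) for t in scanpaths))


def critical_sequence(scanpath, critical=("A1", "A2", "A3", "A4")):
    """Keep only dwells on the most critical AOIs, preserving their order."""
    return [aoi for aoi in scanpath if aoi in critical]


# Toy scanpaths (invented) for one group of children.
toy_paths = [
    ["A1", "B", "A2", "A3", "A4"],
    ["A1", "A2", "A3", "A4"],
    ["A1", "B", "C", "A2", "A3", "A4"],
]
average_scanpath = central_item(toy_paths)
modified_sequence = critical_sequence(average_scanpath)
```

because the central data item is always one of the observed scanpaths, the resulting average is a sequence that a child actually produced. filtering it down to the a areas then makes the order of access to critical information directly comparable between groups.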
in comparing the results of method one and method two, there are similarities and differences. both methods produce an average scanpath in which the correct children have a larger percentage of dwells on a areas, whereas incorrect children distribute their dwells across a, b and c areas, suggesting that they may not be identifying the most critical areas. the average scanpath derived using the most probable vector indicated that children who provided correct responses had a slightly shorter average scanpath (69 aois) compared to the children who were incorrect (77 aois). this distinction was not evident in the average scanpath derived using the central data item, where both groups had an average scanpath of 36 aois. a further difference between the two methods was associated with dwells on a1-a4 aois. the average scanpath (central vector) included dwells on all of the most critical areas (a1-a4) and implied a logical dwell sequence, whereas the average scanpath (most probable vector) did not include an a4 dwell. as noted in the results for method one (most probable vector), the omission of a4 from the average scanpath could be an indicator of variation in how children accessed this scale information, so that it did not appear using the most probable vector method. for example, children may have counted up the gridlines within the body of the graph to determine the two hours worked in week 3. alternatively, children may have looked more generally at the scale of the y-axis, but not necessarily specifically at a4. the source of this variation would need to be investigated with eye tracking data from another task. there are limitations to both methods and in the characterisation of the average scanpath for children who responded correctly and incorrectly.
as with all averaging procedures, there is some loss of individual detail; however, the aim of this study was to characterise the visual cognitive behaviours of those children who completed the task correctly versus those children who were incorrect. method one, using a naïve bayes classifier to determine the most probable vector, might be limited by the data used in the analysis unless, as was the case in our study, the eye tracking data obeys the naïve bayes assumption of an uncorrelated aoi sequence. future research should replicate this task, or a close variant, in order to obtain test data to validate the models produced here. overall, this research has applied novel machine learning and mathematical analyses to characterise the visual cognitive behaviours of year 3 children engaged in a mathematics graph task. the resulting characterisations support the importance of initial understanding of the presented task and identification of the most critical information. while identification of the most critical information may be part of typical classroom practice, this reinforcement, and the logical sequence of visual access to the most critical information, may be beneficial for children who feel less confident.
keypoints
machine learning techniques take the analysis of eye tracking data to a new level. in particular, they allow for understanding of phenomena in the dimension of time in relation to the order structure of data.
spending time looking at critical areas and accessing critical areas in a logical order is important for completing the task successfully.
spending too much time looking at less critical areas, and making more dwells, is indicative of probable failure to successfully complete the task.
for teaching purposes, it is important to identify critical areas and to break down the process of problem-solving into recognisable steps which can be carried out procedurally.
acknowledgments
this research was financially supported by the ian potter foundation (ref: 20140415). thank you to the schools, teachers, parents and children for their interest and involvement in this research. egme wishes to acknowledge the ongoing discussions with nora mcintyre, at the psychology in education research centre at the university of york, about the possible measures discussed here and about classifying gaze patterns according to the subject's state of mind; and the ongoing support of prof. markku s. hannula at the faculty of educational sciences of the university of helsinki during this research. sw was supported by an australian research council decra (de160100830) during the preparation of this manuscript.
references
australian curriculum, assessment and reporting authority (2016). the australian curriculum: mathematics, v8.2. retrieved from http://www.australiancurriculum.edu.au/
bochynska, a., & laeng, b. (2015). tracking down the path of memory: eye scanpaths facilitate the retrieval of visuospatial information. cognitive processing, 16(suppl. 1), 159-163. doi:10.1007/s10339-015-0690-0
curcio, f. r. (2010). developing data-graph comprehension in grades k-8 (3rd edition). reston, va: the national council of teachers of mathematics.
gegenfurtner, a., lehtinen, e., & säljö, r. (2011). expertise differences in the comprehension of visualizations: a meta-analysis of eye-tracking research in professional domains. educational psychology review, 23(4), 523-552. doi:10.1007/s10648-011-9174-7
holmqvist, k., nystrom, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2011). eye tracking: a comprehensive guide to methods and measures. new york, ny: oxford university press.
kim, s., aleven, v., & dey, a. k. (2014, april). understanding expert-novice differences in geometry problem solving tasks: a sensor-based approach. paper presented at the chi '14 extended abstracts on human factors in computing systems, toronto, ontario, canada. doi:10.1145/2559206.2581248
knoblich, g., ohlsson, s., & raney, g. e. (2001). an eye movement study of insight problem solving. memory & cognition, 29(7), 1000-1009. doi:10.3758/bf03195762
kotsopoulos, d., & lee, j. (2012). a naturalistic study of executive function and mathematical problem-solving. the journal of mathematical behavior, 31, 196-208. doi:10.1016/j.jmathb.2011.12.005
lai, m. l., tsai, m.-j., yang, f.-y., hsu, c.-y., liu, t.-z., lee, s. w.-y., lee, m.-h., chiou, g.-l., liang, j. c., and tsai, c.-c. (2013). a review of using eye-tracking technology in exploring learning from 2000 to 2012. educational research review, 10, 90-115. doi:10.1016/j.edurev.2013.10.001
murphy, k. p. (2012). machine learning: a probabilistic perspective. cambridge: mit press.
ratwani, r. m., trafton, j. g., & boehm-davis, d. a. (2008). thinking graphically: connecting vision and cognition during graph comprehension. journal of experimental psychology: applied, 14, 36-49. doi:10.1037/1076-898x.14.1.36
rinaldi, l., brugger, p., bockisch, c. j., bertolini, g., & girelli, l. (2015). keeping an eye on serial order: ocular movements bind space and time. cognition, 142, 291-298. doi:10.1016/j.cognition.2015.05.022
rosenzweig, c., krawec, j., & montague, m. (2011). metacognitive strategy use of eighth-grade students with and without learning disabilities during mathematical problem solving: a think-aloud analysis. journal of learning disabilities, 44, 508-520. doi:10.1177/0022219410378445
sasson, n. j., & elison, j. t. (2012). eye tracking young children with autism. journal of visualized experiments, 61, e3675. doi:10.3791/3675
skiena, s. (2010). the algorithm design manual (2nd edition). new york: springer science+business media.
tobii technology. (2014). user manual: tobii tx300 eye tracker, revision 2. sweden: tobii technology.
tsai, m.-j., hou, h.-t., lai, m.-l., liu, w.-y., & yang, f.-y. (2012). visual attention for solving multiple-choice science problem: an eye-tracking analysis. computers & education, 58(1), 375-385. doi:10.1016/j.compedu.2011.07.012
van gog, t., paas, f., van merriënboer, j. j., & witte, p. (2005). uncovering the problem-solving process: cued retrospective reporting versus concurrent and retrospective reporting. journal of experimental psychology: applied, 11(4), 237-244. doi:10.1037/

frontline learning research vol. 3 no. 3 special issue (2015) 68-80
issn 2295-3159
corresponding author: dr ellen boeren, university of edinburgh, moray house school of education, simon laurie house, holyrood road, edinburgh eh8 8aq. phone: 0044-(0)131-651 6233, email: ellen.boeren@ed.ac.uk
doi: http://dx.doi.org/10.14786/flr.v3i3.186
mentoring: a review of early career researcher studies
ellen boeren a, irina lokhtina-antoniou b, yusuke sakurai c, chaya herman d, lynn mcalpine e
a university of edinburgh, uk; b university of leicester, uk; c university of tokyo, japan; d university of pretoria, south africa; e university of oxford, uk
article received 14 june 2015 / revised 14 july 2015 / accepted 16 july 2015 / available online 12 october 2015
abstract
this paper reviews 23 journal articles on 'mentoring' in the context of early career researchers (ecrs), defined as those in academia with less than 10 years of experience from the start of their phd. achieving a better understanding of mentoring is important since, within the higher education context, new dynamics have created expectations towards more supportive mechanisms for ecrs. in order to better understand the benefits of mentoring for ecrs' careers and psychosocial well-being, it is important to understand (1) the core definitions of mentoring used in research, (2) the research methodologies that are applied to research mentoring, (3) the empirical evidence showing the value of mentoring and (4) the remaining gaps for which future research will be needed.
results of the review lead to the following conclusions: there is much research to do, first, to better inform our conceptualization of ecr mentoring and, second, to better understand the value of ecr mentoring support. a research agenda is outlined.
keywords: mentoring; early career researchers; review paper
boeren et al 69 | f l r
1. introduction
this paper presents the results of a review of mentoring papers that appeared in leading higher education journals in the past ten years. over the past years, new dynamics have emerged in the context of higher education globally that have created both expectations and aspirations towards supportive mechanisms for early career researchers' (ecrs) professional development. in this paper, ecrs are defined as researchers in academia with less than 10 years of experience from the start of their phd studies, congruent with the definition used by the european commission.
why is mentoring an important topic in relation to ecrs? internationally, ecrs in academia face challenges regarding access to resources and supportive interactions, as well as a lack of transparent career prospects (european commission, 2011). related to this is the underlying pressure experienced by ecrs in terms of their opportunities for research and development (sauermann & roach, 2012; åkerlind, 2005; vitae, 2011) and international mobility (jepsen et al., 2014; mellors-bourne et al., 2013; kehm, 2007) required to enhance their career prospects and secure stable positions. moreover, academic workplaces have been transformed, which in turn has lengthened the learning trajectories of ecrs (bonetta, 2011) and made them in some respects more complex (shuster, 2009). the above reports have pointed out the learning challenges ecrs perceive in developing their intellectual independence and scholarly profiles (gardner, 2008 for doctoral students; laudel & glaser, 2008 for postdocs).
these reports also make relatively frequent mention of the value of mentors and mentoring (mullen & forbes, 2000; hemmings, 2012), as do reports of institutional practices to support ecrs (debowski, 2012). in this context, mentoring can broadly be situated in an array of complex supportive mechanisms, including co-working and networking, that lead to ecrs' personal development, adaptation and integration as members of their scholarly community (e.g., baker et al., 2014). on the one hand, this includes informal mentoring through interactions between academics at different career stages. on the other hand, this includes formal mentoring programmes organised and structured at the institutional level. the role of mentoring in relation to ecrs corresponds to the more general literature on mentoring, which focuses on 'career' and 'psychosocial' functions as the two major functions of support between mentors and mentees, contributing to (1) increasing the chances for promotion and higher salaries and building a network of professional collaborators (career function), as well as (2) achieving higher levels of confidence and social skills (psychosocial function) (ragins & kram, 2007). not only the specific function of mentoring, but also the organisation in which mentoring takes place might have a significant effect on how mentoring is carried out and which outcomes of mentoring are experienced. it is the specific higher education context we are interested in, and how mentoring is discussed in the higher education literature as regards ecrs.
four specific aims were formulated for the review. first of all, we wondered to what extent ecr mentoring was conceptually constituted, since a review on mentoring spanning 30 years of formal mentoring programs in the fields of education, business and medicine (ehrich et al., 2004) noted the absence of conceptual frameworks.
so, we undertook to explore the conceptual tools and definitions used in the post-ehrich literature on ecr mentoring, starting from 2005. secondly, not only were we interested in the definitions and conceptual frameworks used by scholars, but we also wished to document the methodological tools they used to measure the impact of mentoring as they defined it. thirdly, we also analysed the extent to which empirical evidence would provide insight into how best to support ecrs' development, including what to avoid, since ehrich et al. (2004) had reported some negative consequences related to, for instance, the lack of training of mentors or mismatch of expertise or personality. there was also some evidence, again from non-higher education contexts (eby, 2008), that the effects of mentoring could be quite small. this led us to explore the nature of the evidence that would suggest mentoring could be a solution for the existing problems with career development and retention of ecrs, in particular whether mentoring could be used as a tool that contributes to ecrs' professional development, including their competence (linden et al., 2013) and professional confidence in a range of key academic practices. finally, this review analysis aimed to identify gaps in the current literature and to explore the recommendations scholars have made for future research. in other words, our goal was to provide a research agenda for further inquiry into mentoring in relation to ecrs.
2. research question
our overall question was 'what does the ecr literature-research say about mentoring?' specific research questions, summarizing the aims in the previous four paragraphs, were:
• what is the range of ways in which ecr mentoring is defined or conceptually presented?
• what methodological tools are used, i.e., the range of ways in which mentoring is measured?
• what evidence (or counter evidence) is there of the value of ecr mentoring?
• to what extent does the literature point to future research?
the answers to these questions provided a means to assess the extent to which mentoring was a robust workable construct in examining ecr experience.
3. method
scope of review: as we undertook the study, we noted two related fields of study on mentoring, mentioned in the introduction:
• the informal field of learning and acquiring research skills from interactions with more experienced researchers usually working in the same context
• the structured, institutionalized programs of mentoring which are designed to support the needs of special groups like women, newly hired staff, or minority groups, if they were directed at ecrs.
search process: in terms of the scope of the review, we decided to include only journal articles published in the isi top ranked higher education journals in the past 10 years (2005-2014), as these are supposed to be the most influential ones in the field. papers had to be published after the ehrich et al. review, that is, after 2004. we reviewed papers that appeared in higher education, journal of higher education, research in higher education, review of higher education, and studies in higher education. these journals were likely the ones that higher education researchers, developers and policy makers would go to in seeking information about ecr mentoring. we also included the international journal for academic development and international journal for researcher development, since these two journals are highly referenced in the field of academic development and thus need to be taken into account in an academic development-related review exercise. while the review has been limited to these journals, we feel confident in having made a sound selection of the major journals in the field. the keywords 'mentor(ing)' combined with 'early career researchers', 'post-docs', 'doctoral students', and variants had to appear in the title and/or the abstract of the article.
we distributed the search task amongst the group of authors, with the search producing 23 papers. the distribution of papers according to journals can be found in table 1; full references of the 23 papers are included in the reference list at the end of the paper.
analysis: the analysis framework drew upon boote and beile's (2005) literature review scoring rubric, which they developed based on hart's (1999) previous work. the five review categories boote and beile constructed are (1) coverage (reasons for inclusion or exclusion), (2) synthesis (state of the field, ambiguities, new perspectives), (3) methodology (methodologies and research techniques), (4) significance (practical and scholarly significance) and (5) rhetoric (level of coherence and structure). boote and beile's work was specifically undertaken to increase scholars' awareness of the literature review stage of a research project and is well-cited. in order to synthesize the selected literature, we created an excel file with the following expanded sub-categories of all but the last category in boote and beile's (2005) rubric: (1) nature of the article: empirical or theoretical, (2) gap identified by authors, (3) question or purpose of the article, (4) conceptual framework for study, (5) pedagogical intervention (if there was one), (6) data collection method, (7) sample and nature of participants, (8) country, disciplines, (9) key empirical findings, if any, (10) conceptual representation of results, (11) practical and pedagogical implications (given our interest in ecr development), (12) suggestions for future research, (13) core references used by authors, and (14) reviewer's notes/critique.
in general, it can be argued that the sequence from exploring the literature, identifying a gap, spelling out research questions, explaining methodology, explaining and discussing results, and drawing conclusions with recommendations for future policy and practice, is perceived as the standard structure in which social sciences journal articles are written (see shon, 2012). this structure is also reflected in the sequence of our four research questions, focussing on (1) conceptual frameworks and definitions, (2) methodological approaches, (3) empirical evidence and (4) recommendations for future research. the articles were distributed across the group of authors. we each read and then summarized the papers; each author separately wrote a description of the emerging findings and his/her interpretation of them. these were used by two of the authors to create a first draft of the findings and conclusions. the draft was then reviewed and edited by the other authors.

table 1: papers included in review (journals listed in alphabetic order)
journal | year | authors | title | country | article keywords
higher education | 2014 | lechuga | a motivation perspective on faculty mentoring: the notion of "non-intrusive" mentoring practices in science and engineering | us | faculty mentoring motivation discipline
higher education | 2014 | van der weijden, belder, van arensbergen & van den besselaar | how do young tenured professors benefit from a mentor? effects on management, motivation and performance. | the netherlands | mentorship academic careers research management human resources motivation performance
higher education | 2011 | bell & treleaven | looking for professor right: mentee selection of mentors in a formal mentoring program. | australia | academic development flexible mentoring mentor-mentee choice pairing process
higher education | 2011 | lechuga | faculty-graduate student mentoring relationships: mentors' perceived roles and responsibilities. | us | faculty graduate students mentoring higher education
higher education | 2011 | scaffidi & berman | a positive postdoctoral experience is related to quality supervision and career mentoring, collaborations, networking and a nurturing research environment. | australia | postdocs mentoring collaborations networking research environment
international journal for academic development | 2012 | saito | when a practitioner becomes a university faculty member: a review of literature on the challenges faced by novice ex-practitioner teacher educators. | - | professional development faculty member ex-practitioner teacher educator
international journal for academic development | 2012 | weaver, robbie, kokonis & miceli | collaborative scholarship as a means of improving both university teaching practice and research capability. | australia | academic development mentoring scholarship of teaching and learning
international journal for academic development | 2011 | cox | the impact of communities of practice in support of early-career academics. | us | early-career academics academic development program transformative learning community of practice faculty learning community
international journal for academic development | 2011 | remmik, karm, haamer & lepp | early-career academics' learning in academic communities. | estonia | early career academics professional learning professional identity community of practice
international journal for academic development | 2010 | hubball, clarke & poole | ten-year reflections on mentoring sotl research in a research-intensive university. | canada | mentoring scholarship of teaching and learning (sotl) sotl research outcomes
international journal for academic development | 2009 | foote & solem | toward better mentoring for early career faculty: results of a study of us geographers. | us | early career faculty mentoring doctoral education
international journal for academic development | 2008 | kamvounias, mcgrath-champ & yip | 'gifts' in mentoring: mentees' reflections on an academic development program. | australia | mentee mentor mentoring gift
international journal for academic development | 2005 | mathias | mentoring on a programme for new university teachers: a partnership in revitalizing and empowering collegiality. | uk | -
international journal for researcher development | 2014 | baker, pifer & griffin | mentor-protégé fit. | us | mentoring mentor-protégé fit doctoral education student–faculty mentoring relationships academic identity
international journal for researcher development | 2014 | browning, thompson & dawson | developing future research leaders. | australia | early career researchers researcher development evaluation research leaders track record
studies in higher education | 2013 | gilmore, maher, feldon & timmerman | exploration of factors related to the development of science, technology, engineering, and mathematics graduate teaching assistants' teaching orientations. | us | graduate teaching assistant teaching orientation teacher beliefs graduate student education graduate student development graduate student mentoring
studies in higher education | 2011 | lindén, ohlin & brodin | mentorship, supervision and learning experience in phd education. | sweden | mentorship phd students phd supervision learning outcomes professional development
studies in higher education | 2010 | hopwood | doctoral experience and learning from a sociocultural perspective. | uk | doctoral education doctoral practices academic practice sociocultural perspectives doctoral study
studies in higher education | 2008 | kamler | rethinking doctoral publication practices: writing from and beyond the thesis. | australia | -
the journal of higher education | 2012 | noy & ray | graduate students' perceptions of their advisors: is there systematic disadvantage in mentorship? | us | -
the journal of higher education | 2009 | patton | my sister's keeper: a qualitative examination of mentoring experiences among african american women in graduate and professional schools | us | -
the review of higher education | 2014 | main | gender homophily, ph.d. completion, and time to degree in the humanities and humanistic social sciences. | us | -
the review of higher education | 2013 | o'meara, knudsen & jones | the role of emotional competencies in faculty-doctoral student relationships. | us | -

4. results
as stated above, the main aim of this paper is to generate insight into the current academic literature on ecr mentoring; the findings are structured around the four research questions.
4.1 what is the range of ways in which ecr mentoring is defined or conceptually presented?
in order to answer this question, we first explored the nature of the articles, the gaps identified by the authors and the specific research questions in these papers, as these elements could be expected to be related to the conceptual frameworks and definitions authors had drawn upon in developing their research study.
nature of articles: our initial search of the journals confirmed ehrich et al.'s (2004) outcome that while mentoring was frequently referred to, it was rarely studied. in fact, we found more articles that referred to mentoring than articles that studied mentoring as the core focus of their research project. of the 23 articles that studied mentoring and formed the basis of the review, 19 were empirical studies, four of which (mathias, 2005; kamvounias et al., 2008; hubball et al., 2010; bell & treleaven, 2011) evaluated programs that had a mentoring element. four articles were non-empirical in nature (baker et al., 2014; cox, 2013; saito, 2013; weaver et al., 2013).
gap identified: in examining the 'gap' that the authors were attempting to address, we noted that a definitional representation of mentoring was rare. for instance, most of the studies addressing doctoral experience used as a starting point that the supervisor (referred to as advisor in the us) was equivalent to a mentor, though baker et al. (2014) noted the ambiguity in the roles of supervisor, advisor, and mentor. so, while half of the studies explicitly named the 'gap' as the need to understand mentoring better, most appeared to assume a shared understanding of mentoring between authors and readers, with the focus of each study mainly directed to a given situation in a specific context, e.g., support for teaching assistants.
question/purpose: the research questions underlying the studies were formulated to answer questions about (1) experiences of mentoring and (2) mentoring relationships.
the bulk of the studies (kamler, 2008; patton, 2009; hopwood, 2009; lechuga, 2011; baker et al., 2012; noy & ray, 2012; gilmore et al., 2013; linden et al., 2013; main, 2014) addressed mentoring in the context of doctoral education, answering a wide range of research questions in relation to career advice, teaching and supervisory relationships. this group was followed by a substantial minority on mentoring related to teaching development with reference to ecrs, though not necessarily defining who they were in terms of their length of research or academic experience. lastly, only one addressed postdoctoral experience (scaffidi & berman, 2011). in general, across all papers reviewed, two main purposes were thus found. (1) many articles focused on gaining better insight into the way ecrs experience mentoring and whether they benefit from it in terms of their own learning process and professional development. examples include patton's article (2009) on experiences of african american women in academia, kamler's research (2008) on mentoring experiences in relation to academic writing, and mathias' paper (2005) on specific mentoring experiences in relation to participation in the postgraduate certificate of academic practice (the uk's officially recognised higher education teaching qualification). (2) another cluster of papers focused on the specific relationships that are built between mentors and mentees. for instance, lechuga (2011) focused on mentors' responsibilities and the relationships they built with graduate students. kamvounias et al.'s research (2008) explored the idea of 'gifts' in mentoring, and how mentees want to give something back to their mentors. research by o'meara et al. (2013) explored the 'emotional landscape' of relationships between mentors/advisors and doctoral students.
conceptual frame and core references: having identified the nature, gaps and purposes of these articles, the next step was to explore the conceptual frameworks used by these authors. in general, conceptual framing of the studies in relation specifically to mentoring was minimal. rather, papers tended to draw on general theories of learning and faculty development largely rooted in socio-constructivist perspectives, e.g., communities of practice (cox, 2013), learning (linden et al., 2013), emotional competence (o'meara et al., 2013), scholarship of teaching (gilmore et al., 2013; weaver et al., 2013), or were firmly empirical (e.g., mathias, 2005; browning et al., 2014). two empirical studies stood out for their efforts to frame mentoring: van der weijden et al. (2014) and linden et al. (2013). linden et al. (2013) used a typology of learning outcomes related to mentoring in the business context (lankau and scandura, 2007). van der weijden et al. (2014) drew on the meta-analysis of mentoring programs in a range of fields referred to earlier (ehrich et al., 2004). as well, baker et al. (2014) in their conceptual paper proposed a model based on the notion of professional, relational and personal fit, rather than similarity, between student and supervisor. given the diversity of stances taken in these studies, it was hard to discover a consistent pattern of common core references to conceptions of mentoring. to conclude, as a general answer to this research question, the most striking finding of this analysis was a confirmation of the findings in the earlier non-higher education review (ehrich et al., 2004): the generally under-conceptualized nature of mentoring in empirical studies on ecr experience. we would encourage researchers undertaking future studies of ecr mentoring to explicitly explore the value of different conceptual frameworks of mentoring, perhaps beginning with ehrich et al.'s meta-analysis.
4.2 what methodological tools are used, i.e., the range of ways in which mentoring is measured? in order to answer this research question, we explored the specific settings in which data were collected, by which we mean the nature of the participants, their disciplines and the national location of the study, as well as the ways in which data were collected and analyzed, distinguishing principally between quantitative and qualitative methodologies. country/disciplines/participants: a majority of the papers represented research in english-speaking countries, with more in north america and australia than in the uk. there were ten north american studies (nine us – gilmore et al., 2013; foote & solem, 2009; main, 2014; noy & ray, 2012; baker et al., 2014; o'meara et al., 2013; cox, 2011; lechuga, 2014; patton, 2009; one canadian – hubball et al., 2010), eight australian and five eu studies (three in continental europe: sweden – linden et al., 2013; the netherlands – van der weijden et al., 2014; estonia – remmik, 2011; and two in the uk: mathias, 2005; hopwood, 2010). as to disciplinary context, four focused on stem fields (scaffidi & berman, 2011; gilmore et al., 2013; van der weijden et al., 2014; lechuga, 2014), one focused specifically on humanities disciplines (main, 2014) and the remainder represented participants from a range of disciplines, although most within the social sciences. data collection/participants: as to the methods, qualitative and mixed methods were used more often than solely quantitative methods. all but one of the quantitative studies were based on surveys, while main (2014) conducted an analysis of pre-existing large data sets. as for the qualitative studies, the four papers evaluating programs used semi-structured interviews, student work, program documents and sometimes focus groups (hubball et al., 2010; bell & treleaven 2011; kamvounias et al., 2008; mathias, 2005).
the other qualitative studies were based on semi-structured interviews, with one also using focus groups, and another interviews over time (kamler, 2008). of the two mixed methods studies (foote & solem, 2009; gilmore et al., 2013), one used interviews which were analyzed both quantitatively and qualitatively; the other used interviews, followed by a survey. participant numbers in the qualitative studies tended to be quite small, though foote & solem (2009) used focus groups with 46 ecrs and hopwood (2010) 33 in focus groups and interviews. the quantitative studies also varied considerably in size, from 86 (van der weijden et al., 2014) to several thousand (main, 2014). as a general answer to this research question, we concluded that while a mix of qualitative and quantitative approaches was used, most studies (regardless of the methods) were based on one-time data collection with small numbers of participants. one study (kamler, 2008) stood out in studying participant experience longitudinally, which we view as an innovative approach that might be emulated in future studies. further, the tools used in the studies were rarely designed to capture experience related to the specific mentoring activities under study. we suggest future studies could develop tools to better capture the experience of specific elements of mentoring. lastly, the majority of studies were based on self-report; future research might move beyond this way of collecting data. 4.3 what evidence (or counter evidence) is there of the value of ecr mentoring? evidence of the value of mentoring was searched for in the results and conclusion sections of the papers under review. apart from the nature of the results, we also explored the way in which they were formulated, in order to search for a conceptual representation of the results, which could form a strong conceptual basis for future research.
findings (key findings, if any, and conceptual representation of results): while most papers reported positive experiences and relationships in mentoring, it is important to recognise the influence of a range of factors on mentoring: e.g. mathias (2005) concluded that mentoring provided within a postgraduate course resulted in several positive experiences, though much of the effect depended on the successful match between mentor and mentee. still, given the research was undertaken in different contexts and within different disciplines, it is difficult to draw an overall conclusion which indicates either a positive or negative effect of mentoring. for example, lechuga (2014) proposes that some mentoring relations that are acceptable in the social sciences may be considered “intrusive” in science and engineering. still, at first sight, the cumulative results of these studies would appear to confirm the earlier non-higher education literature reviews: eby et al. (2008) argued that the effects of mentoring seem quite small, and that mentoring does not always lead to positive experiences; ehrich et al. (2004) that mentoring can, in fact, have negative consequences. some authors, like patton (2009), have already reflected on the lack of robust conceptual frameworks emerging from studies of mentoring, such as the emphasis on the paternal, male representation of mentoring and lack of critical studies on the topic. we agree that a more critical attitude is required among researchers in the field. as a general answer to this research question, and as noted earlier, there was a stronger focus on the positive outcomes of mentoring than on any negative ones. given that the earlier reviews also noted this, a key aspect of any future research needs to be a careful search for possible negative effects, as well as an assessment of whether the effects are worth the time and money invested.
to continue to propagate the notion that mentoring is important for ecr success without sufficient evidence seems counter-productive. further analysis of both short-term and long-term effects of mentoring on ecr development as well as the existing challenges is recommended. furthermore, apart from trying to position mentoring as something ‘positive versus negative’, it might be worthwhile to control for a wide range of other factors such as age, gender, subject, type of university, etc., in order to account for other direct or indirect effects of mentoring experiences and relationships. 4.4 to what extent does the literature point to future research? the papers pointed towards the future in two ways. on the one hand, several papers made recommendations for future policy and practice, which were mainly concentrated around actions to increase the importance of mentoring and awareness among mentors of what good mentoring consists of. both foote and solem (2009) and gilmore et al. (2013) reflected on the notion of inclusiveness and involvement of mentors in their mentoring practices with students. baker et al. (2014) recommended having more advanced reflections on the fit between doctoral students and the supervisor in order to increase the effectiveness of mentoring. on the other hand, apart from reflecting on what needed to happen in future mentoring, some papers formulated recommendations for future research, such as the need for a better understanding of what is causing good mentoring (kamvounias et al., 2008; noy & ray, 2012; o’meara et al., 2013; baker et al., 2014; van der weijden et al., 2014), as well as the need to enlarge research in terms of countries and disciplines (patton, 2009; bell & treleaven, 2011). as a general answer to this research question, we suggest there is room for conducting further research on mentoring, in a broader and more diversified way than it has been conducted until now. 5.
significance we undertook this review to assess the state of the literature on ecr mentoring in the past 10 years, post-ehrich (2004) review, to provide a base for future inquiry. we also wanted to consider possible policy implications since the eu has created an imperative for institutions to address the career development needs of post-docs/early career researchers within the bologna strategy, the result of which has been a proliferation of institutional mentoring. we conclude there is much research still to be done that can better inform our conceptualization and implementation of ecr mentoring support and the development of mentoring programs. below we make specific recommendations for future research drawn from our review. a future research agenda we suggest a key goal is a more robust conceptualisation of ecr mentoring including a well-defined representation of the learning process. it should include at a minimum, starting with the recommendations at the top of this list:
• examine the theoretical awareness of mentoring (including organisational and individual obstacles) that exists among ecrs and their mentors. these findings would help to capture the complexity of ecr mentoring support.
• study ecrs who are actively engaged in structuring informal learning situations that meet their specific needs at different times, and who, as mentees, are free to choose one or more mentors. this fits with the idea of mentoring as support towards self-organisation and self-development, in which early career researchers gain the skills to grow towards independence (gardner, 2008; laudel & glaser, 2008).
• further examine the effects of mentorship relationships by looking at different elements of mentoring (especially the functions of mentoring), since there is evidence of some negative consequences related to the mismatch of expertise and/or personality (e.g. ehrich et al. 2004).
• seek evidence of how the literature on mentoring is connected with growing ecr confidence and competence as independent researchers and scholars; this would mean linking mentoring to the range of abilities essential for ecrs to develop in relation to research, teaching, management, leadership, intercultural skills, publishing, media use, and expectations regarding social engagement (e.g., debowski, 2012).
• explore the potential of a trans-organizational conceptualization of mentoring that addresses transfer across institutions and countries, as mobility and intercultural learning form important aspects of early career researcher experience today (horta, 2009).
• consider ways to link mentoring for ecr development to other fields including education (not higher education), business and medicine (ehrich et al., 2004) in order to build up the required resource base and decide on suitable strategies and benchmarks.
• examine organizational measures for supportive mentoring systems by integrating formal learning with informal mentoring to support cooperation between less and more experienced researchers and thus encourage collegial relationships across scholarly communities in activities such as publishing, research organization, data collection.
this review analysis was conducted post-ehrich review, exploring the period from 2005 to 2014.
overall, we consider ehrich et al.’s earlier assessment of the education and business reports on mentoring programs to still hold true – the need to: a) attend to mentors’ experiences (our analyses showed a relatively weaker focus on mentors, and implied the need to examine both mentees’ and mentors’ experiences); b) pay more attention to negative outcomes (few authors reported negative outcomes of mentoring, and it is necessary to understand how the system malfunctions in those cases); c) move beyond the data that have generally been collected (self-report process data; for example, observation approaches have not been used in the articles we reviewed) and seek evidence of impact on actual behaviour performance. and, in order to assess the value of mentoring, attention should be given to opportunities to collect longitudinal data.
keypoints
• more research is recommended to better inform the conceptualization of mentoring
• a future research agenda needs to explore formal as well as informal aspects of mentoring
• it would be recommended to explore mentoring using a range of research methods
references
åkerlind, g. (2005). postdoctoral researchers: roles, functions and career prospects. higher education research & development, 24(1), 21-40. doi:10.1080/0729436052000318550
bonetta, l. (august 26, 2011). postdocs: striving for success in a tough economy. science careers from the journal science.
debowski, s. (2012). strategic research capacity building: investigating higher education researcher development strategies in the united kingdom, united states and new zealand: the winston churchill memorial trust of australia.
eby, l., allen, t., evans, s., ng, t., & dubois, d. (2008). does mentoring matter? a multidisciplinary meta-analysis comparing mentored and non-mentored individuals. journal of vocational behaviour, 72(2), 254-267. doi:10.1016/j.jvb.2007.04.005
ehrich, l., hansford, b., & tennent, l. (2004).
formal mentoring programs in education and other professions: a review of the literature. educational administration quarterly, 40(4), 518-540. doi:10.1177/0013161x04267118
gardner, s. (2008). "what's too much and what's too little?": the process of becoming an independent researcher in doctoral education. journal of higher education, 79(3), 326-350. doi:10.1353/jhe.0.0007
hansford, e.c., ehrich, l.c. & tennent, l. (2004). formal mentoring programs in education and other professions: a review of the literature. educational administration quarterly, 40(4), 518-540. doi:10.1177/0013161x04267118
jepsen, d.m., sun, j. j-m., budhwar, p.s., klehe, u-c., krausert, a., raghuram, s. & valcour, m. (2014). ‘international academic careers: personal reflections’. the international journal of human resource management, 25(10), 1309–1326. doi:10.1080/09585192.2013.870307
hemmings, b. (2012). sources of research confidence for early career academics: a qualitative study. higher education research and development, 31(2), 171-184. doi:10.1080/07294360.2011.559198
kehm, b. (2007). the changing role of graduate and doctoral education as a challenge to the academic profession: europe and north america compared, in kogan, m. & teichler, u. (eds.) key challenges to the academic profession, unesco forum on higher education research and knowledge, 111-124.
laudel, g., & glaser, j. (2008). from apprentice to colleague: the metamorphosis of early career researchers. higher education, 55, 387-406. doi:10.1007/s10734-007-9063-7
lindén, j., ohlin, m. & brodin, e.m. (2013). mentorship, supervision and learning experience in phd education. studies in higher education, 38(5), 639-662. doi:10.1080/03075079.2011.596526
mellors-bourne, r., metcalfe, j., & pollard, p. (2013). what do researchers do? early career progression of doctoral graduates. retrieved from http://www.vitae.ac.uk/cms/files/upload/what-do-researchers-doearly-career-progression-2013.pdf.
mullen, c., & forbes, s. (2000). untenured faculty: issues of transition, adjustment and mentorship. mentoring & tutoring: partnership in learning, 8(1), 31-46. doi:10.1080/713685508
ragins, b.r. & kram, k.e. (2007). the roots and meaning of mentoring. in b.r. ragins & k.e. kram (eds). the handbook of mentoring at work: theory, research, and practice (pp. 3-15). thousand oaks: sage.
sauermann, h., & roach, m. (2012). science phd career preferences: levels, changes and advisor encouragement. plos one, 7(5), e36307. doi:10.1371/journal.pone.0036307
schuster, s. (2009). bambed commentary: post-phd education. biochemistry and molecular biology education, 37(6), 381-382. doi:10.1002/bmb.20337
shon, p.c.h. (2012). how to read journal articles in the social sciences. a very practical guide for students. london: sage.
the european commission (2011). towards a european framework for research careers, [on-line], available at www.ec.europa.eu/pdf/research_policies/towards_a_european_framework_for_research_careers_final.pdf (accessed 18 december 2014)
vitae. (2011). principal investigators and research leaders survey. london, uk.
papers reviewed
baker, v. l., pifer, m. j., & griffin, k. a. (2014). mentor-protégé fit. international journal for researcher development, 5(2), 83-98. http://dx.doi.org/10.1108/ijrd-04-2014-0003
bell, a., & treleaven, l. (2011). looking for professor right: mentee selection of mentors in a formal mentoring program. higher education, 61(5), 545-561. doi:10.1007/s10734-010-9348-0
browning, l., thompson, k., & dawson, d. (2014). developing future research leaders. international journal for researcher development, 5(2), 123-134. http://dx.doi.org/10.1108/ijrd-08-2014-0019
cox, m. d. (2011). the impact of communities of practice in support of early-career academics. international journal for academic development, 18(1), 18-30. doi:10.1080/1360144x.2011.599600
foote, k. e., & solem, m. n. (2009).
toward better mentoring for early career faculty: results of a study of us geographers. international journal for academic development, 14(1), 47-58. doi:10.1080/13601440802659403
gilmore, j., maher, m. a., feldon, d. f., & timmerman, b. (2013). exploration of factors related to the development of science, technology, engineering, and mathematics graduate teaching assistants' teaching orientations. studies in higher education, 39(10), 1910-1928. doi:10.1080/03075079.2013.806459
hopwood, n. (2010). doctoral experience and learning from a sociocultural perspective. studies in higher education, 35(7), 829-843. doi:10.1080/03075070903348412
hubball, h., clarke, a., & poole, g. (2010). ten-year reflections on mentoring sotl research in a research-intensive university. international journal for academic development, 15(2), 117-129. doi:10.1080/13601441003737758
kamler, b. (2008). rethinking doctoral publication practices: writing from and beyond the thesis. studies in higher education, 33(3), 283-294. doi:10.1080/03075070802049236
kamvounias, p., mcgrath-champ, s., & yip, j. (2008). ‘gifts’ in mentoring: mentees' reflections on an academic development program. international journal for academic development, 13(1), 17-25. doi:10.1080/13601440801962949
lechuga, v. (2011). faculty-graduate student mentoring relationships: mentors’ perceived roles and responsibilities. higher education, 62(6), 757-771. doi:10.1007/s10734-011-9416-0
lechuga, v. (2014). a motivation perspective on faculty mentoring: the notion of “non-intrusive” mentoring practices in sciences and engineering. higher education, 68, 909-926. doi:10.1007/s10734-014-9751-z
lindén, j., ohlin, m., & brodin, e. m. (2011). mentorship, supervision and learning experience in phd education. studies in higher education, 38(5), 639-662. doi:10.1080/03075079.2011.596526
main, j. b. (2014). gender homophily, ph.d. completion, and time to degree in the humanities and humanistic social sciences.
the review of higher education, 37(3), 349-375. doi:10.1353/rhe.2014.0019
mathias, h. (2005). mentoring on a programme for new university teachers: a partnership in revitalizing and empowering collegiality. international journal for academic development, 10(2), 95-106. doi:10.1080/13601440500281724
noy, s., & ray, r. (2012). graduate students' perceptions of their advisors: is there systematic disadvantage in mentorship? the journal of higher education, 83(6), 876-914.
o’meara, k., knudsen, k., & jones, j. (2013). the role of emotional competencies in faculty-doctoral student relationships. the review of higher education, 36(3), 315-347. doi:10.1353/rhe.2013.0021
patton, l. d. (2009). my sister’s keeper: a qualitative examination of mentoring experiences among african american women in graduate and professional schools. the journal of higher education, 80(5), 510-537. doi:10.1353/jhe.0.0062
remmik, m., karm, m., haamer, a., & lepp, l. (2011). early-career academics’ learning in academic communities. international journal for academic development, 16(3), 187-199. doi:10.1080/1360144x.2011.596702
saito, e. (2012). when a practitioner becomes a university faculty member: a review of literature on the challenges faced by novice ex-practitioner teacher educators. international journal for academic development, 18(2), 190-200. doi:10.1080/1360144x.2012.692322
scaffidi, a. k., & berman, j. e. (2011). a positive postdoctoral experience is related to quality supervision and career mentoring, collaborations, networking and a nurturing research environment. higher education, 62(6), 685-698. doi:10.1007/s10734-011-9407-1
van der weijden, i., belder, r., van arensbergen, p., & van den besselaar, p. (2014). how do young tenured professors benefit from a mentor? effects on management, motivation and performance. higher education, 1-13. doi:10.1007/s10734-014-9774-5
weaver, d., robbie, d., kokonis, s., & miceli, l.
(2012). collaborative scholarship as a means of improving both university teaching practice and research capability. international journal for academic development, 18(3), 237-250. doi:10.1080/1360144x.2012.718993
frontline learning research vol. 5 no. 2 (2017) 24-35 issn 2295-3159
the measurement of collaborative culture in secondary schools: an informal subgroup approach
chloé meredith a*, nienke m. moolenaar b, charlotte struyve a, machteld vandecandelaere a, sarah gielen a & eva kyndt a
a ku leuven, belgium b utrecht university, the netherlands
article received 28 november / revised 17 january / accepted 21 january / available online 28 march
abstract research on teacher collaboration underlines the importance of a collaborative culture for teachers’ functioning. however, while scholars usually regard collaborative culture as a school team characteristic, this study argues that subgroups may be more meaningful units of analysis to conceptualize and assess teachers’ perceptions of collaborative culture. based on the assumption that collaborative culture is developed, expressed, and maintained in frequent work-related interactions, this study hypothesizes that collaborative culture is not homogenously spread over the school but rather varies between informal subgroups. data from 760 flemish teachers were examined using social network analysis and consensus analyses. the results provided evidence that perceptions of collaborative culture are more homogeneous within informal subgroups that are characterized by frequent interactions than within the entire school team. this finding stresses the importance of assessing the meaningful unit of analysis for collective-level and socially-constructed concepts, such as collaborative culture. moreover, the benefits and potential of a social network approach to identify (socially stable) subunits within the school team are illustrated.
keywords: collaborative culture, informal subgroups, social network analysis, secondary schools * corresponding author: chloé meredith, faculty of psychology and educational sciences, ku leuven. dekenstraat 2 – bus 3772, 3000 leuven, belgium. chloe.meredith@kuleuven.be doi: http://dx.doi.org/10.14786/flr.v5i2.283 1. introduction several studies have described the benefits of having a collaborative culture in schools. as a result, teachers have repeatedly been advised to move away from traditional norms of isolation and individuality and move towards greater collaboration (louis & kruse, 1995; marks & louis, 1997). teachers are encouraged to discuss problems and offer different viewpoints, leading to constructive collaboration and consensus (barczak, lassk, & mulki, 2010). in line with research in various organizations, hargreaves (1994) and leithwood, leonard, and sharatt (1998) indicated that cultures characterized by collegiality and collaboration are generally those that promote feelings of professional involvement, efficacy and satisfaction. collaborative culture can be regarded as a part of the organizational culture (peterson & beard, 2004; sveiby & simons, 2002). organizational culture has been described as the shared, and often taken for granted, values, norms and practices within an organization (schein, 2010). in previous studies, collaborative culture has often been conceptualized and assessed as a characteristic of the entire organization, in this case the school (e.g., strahan, 2003; waldron & mcleskey, 2010). however, it can be questioned whether the school is always the most meaningful unit to conceptualize and assess collaborative culture. a precondition for and significant characteristic of organizational culture is that values, norms and practices are shared by a significant portion of organizational members (yang, 2007).
these shared values, norms, and practices are developed, expressed and maintained in the frequent communication of organizational members (dumay, 2009). however, within a school, especially in larger schools, interactions are often concentrated in subgroups (firestone & pennell, 1993; frank, 1995). as a result, subgroup members may develop their own norms, values and practices concerning collaboration, resulting in subcultures within the school (soeters, 1988). if we want to investigate collaborative culture as a working condition that affects teachers, it is crucial to identify the relevant unit of analysis. in order to make correct inferences, this unit of analysis needs to reflect the actual collaborative culture one works in. this study therefore focuses on the conceptualization and assessment of collaborative culture and investigates whether subgroups are more meaningful units of analysis than the entire school team. as culture is assumed to result from frequent work-related interactions among teachers, subgroups are identified by means of social network analysis. in this way, this study not only contributes to the assessment of socially-constructed concepts, such as collaborative culture, but also offers a different, innovative perspective on the identification of (socially stable) subunits within the school. 2. conceptual framework 2.1. conceptualizing collaborative culture collaborative culture can be defined as the shared values, norms and practices on the matter of teamwork and communication. flores (2004) referred to collaborative cultures in schools as the working relationships, which are spontaneous, voluntary, evolutionary, and development-oriented, wherein the stance of working together becomes part of the personality of the school. collaboration can then be regarded as a process in which teachers come together to discuss, share knowledge, coach each other, reflect on common experiences and build the curriculum together (lieberman, 1990, 1995). 
by doing this, teachers create a culture of acceptance for mutual support and collaboration (louis, marks, & kruse, 1996), wherein a norm of collegiality becomes a part of the working stance (little, 1982; nias, 2005). this ‘sharedness’, or homogeneity, of values, norms, and practices is developed, expressed, and performed in the day-to-day interactions of individuals working in the same context, facing the same challenges and goals (dumay, 2009; harris, 1994; van maanen & schein, 1979). however, in most schools, frequent interactions with all school team members are practically impossible and as a result, culture is dispersed over different parts (sackmann, 1992). this dispersion within an organization does not necessarily imply that a culture is heterogeneous, but rather that there are subgroups who have their own culture (adkins & caldwell, 2004). previous studies already provided evidence that members of subgroups share beliefs and values, and exhibit similar actions based on these beliefs and values (ashforth & mael, 1989; harris, 1994). moreover, social psychologists and sociologists indicated that individuals perceive and evaluate their organization based on what happens in their close social neighborhood, meaning the people with whom they engage in frequent interactions (lock & crawford, 1999; sackmann, 1992). this is important, especially because culture is manifested in and often measured by the perceptions of individuals (hofstede, 1998). in other words, teachers who frequently interact will not only develop and maintain shared values, norms and practices concerning collaboration, they will also perceive and evaluate collaboration in the school based on what happens in their social neighborhood. previous studies have already indicated that subgroups can be found within the school team (frank, 1996).
frank (1995) argued that subgroups can be regarded as the crucial link between individuals and organizations, and may therefore be more meaningful units to conceptualize and measure collaborative culture. 2.2. assessing collaborative culture: an informal subgroup approach previous studies often focused on formal subgroups to determine the boundaries of subunits within the school (e.g., busher & blease, 2000; visscher & witziers, 2004). this approach is based on the assumption that interactions among teachers follow the boundaries of this formal structure. however, along with the recent rise of interest in social networks in organizations, studies have started focusing on informal subgroups that are characterized by actual social interactions, and do not necessarily follow the same constellations as their formal counterparts (frank, penuel, & krause, 2015). given the assumption that culture is constructed and evaluated in frequent work-related interactions, it is crucial to identify subgroups that are characterized by frequent interactions. therefore, this study adopts an informal subgroup approach in order to conceptualize and measure collaborative culture. an informal subgroup approach takes the work-related social network of members into account to subdivide the school into subgroups (frank, 1995, 1996). as a result, these informal subgroups can be regarded as stable social units within the broader organization (frank & yasumoto, 1998). we therefore expect that these subgroups may be more meaningful units of analysis to assess collaborative culture, in comparison to the whole school team. to empirically substantiate this assumption, the following research hypothesis is tested: perceptions of collaborative culture are more homogenous within informal subgroups than within the entire school team. 3. method 3.1.
participants data for this study were collected in secondary schools in flanders (northern part of belgium with approximately 6,500,000 inhabitants), using an online survey. the “school team questionnaire” was part of the liso-project (dutch acronym for student careers in secondary education) and was administered in 20 secondary school teams. in this study, a ‘school team’ consisted of all teachers who actually worked on the same secondary school campus (i.e., in the same geographical unit). all teachers of the 20 secondary schools received an invitation to fill out the online questionnaire that addressed the working conditions in the school. however, we could only include secondary schools that had a response rate of at least 75%, which is the threshold to reliably identify informal subgroups using social network analysis (borgatti, carley, & krackhardt, 2006; kossinets, 2006). in 13 schools, a response rate of 75% or more was reached, resulting in a dataset of 760 teachers. respondents in our dataset were on average 40.65 years old (sd = 10.62), had 11.65 years of experience (sd = 9.68) in their current school and 64% of them were female. 3.2. instrument and analytic strategy 3.2.1. determining informal subgroups in order to determine informal subgroups within the secondary school, we included a sociometric question in the survey. as we wanted to grasp the day-to-day work-related interactions of teachers, we decided to focus on the information network and asked the following question: “whom do you go to for class-related information? (for instance, for teaching materials and methods, learning, content and classroom management)”. a roster with all teachers’ names was presented and participants could indicate which of their colleagues they went to and how frequently (presented in 8 categories ranging from ‘once a year’ to ‘daily’). teachers could indicate an unlimited number of colleagues.
in order to identify informal subgroups, we used a social network approach. social network analysis makes it possible to investigate the patterns of interactions between actors (scott, 1991). it focuses on a set of individuals and the relationships connecting them (wasserman & faust, 1994). several authors indicated that social network analysis helps to unravel social phenomena that can partly be explained by social structure (e.g., wellman & frank, 2001). based on the patterns and frequency of work-related interactions, informal subgroups can be identified. in this study, the approach of frank (1996) was used to determine non-overlapping cohesive subgroups. a non-overlapping approach, meaning that teachers could be a member of only one subgroup, was selected, as an overlapping approach makes it difficult to establish “an inside and an outside” of a social unit (abbott, 1995, p. 872). in order to identify these non-overlapping cohesive subgroups, the kliquefinder software was used (frank, 1996). the procedure implemented in kliquefinder is based on the exponential random graph model (ergm) framework. the algorithm within this software identifies subgroups by iteratively reassigning actors in order to see if the probability that two actors interact increases if they are members of the same subgroup. actors are assigned to a cohesive subgroup once the maximum probability of having a tie with other subgroup members is reached (for more information about the software and algorithm, see frank, 1995).

3.2.2. evaluating homogeneity of collaborative culture

in order to test the hypothesis that perceptions of collaborative culture are more homogenous within the informal subgroup than within the entire school team, we included a scale to assess collaborative culture. schein (1985) indicated that actual practices are the most visible and tangible aspects of organizational culture, and that underlying assumptions, values and norms surface in these practices.
we selected six items of the scale of leonard (2002) addressing the perceptions of teachers on collaborative culture. we only included the items that addressed the actual collaboration within the school, such as ‘in my school, teacher collaboration is strong’ and ‘in my school, teaching is a team activity, rather than an individual activity’. the cronbach’s alpha of this subscale was satisfactory (α = .85). based on a confirmatory factor analysis, the fit of the scale was found to be acceptable (cfi = .954, tli = .968, srmr = .042, rmsea = .04). homogeneity of perceptions was operationalized by combining two complementary approaches to assess within-group agreement, namely 1) a consensus-based approach, using the intraclass correlation coefficient (icc(1)) as a measure of homogeneity; and 2) a disconsensus-based approach, calculating the average deviation index (admj) as a measure of lack of homogeneity. at the same time, these measures give information on the aggregation of individual perceptions to the proposed level of analysis (raudenbush & bryk, 2002).

a) a consensus-based approach to homogeneity of perceptions.

first, we assessed the icc(1). this icc(1) reflects the agreement between any pair of individuals within the same group (mcgraw & wong, 1996). the outcome is the proportion of total variance in a variable that can be explained by membership in a group (raudenbush & bryk, 2002). in other words, a low icc(1) indicates that only a small proportion of the variance can be explained by membership of the informal subgroup, and variance within the subgroup is high. the icc(1) is interpreted as an effect size with values of .01, .10 and .25 respectively indicating a small, medium and large effect (bliese, 2000; raudenbush & bryk, 2002).
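The two agreement measures named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: ICC(1) is computed with the one-way ANOVA estimator, (MSB - MSW) / (MSB + (k - 1) * MSW), with k adjusted for unequal group sizes, and the average deviation index is taken as the mean absolute deviation from the item mean, averaged over the items of the scale. Function names and the toy data are hypothetical.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way ANOVA (assumed estimator, see lead-in).

    groups -- dict mapping a group id to the list of individual scores
              in that group; at least two groups are required.
    """
    scores = [x for members in groups.values() for x in members]
    n_total, n_groups = len(scores), len(groups)
    grand_mean = mean(scores)
    ss_between = sum(len(m) * (mean(m) - grand_mean) ** 2 for m in groups.values())
    ss_within = sum((x - mean(m)) ** 2 for m in groups.values() for x in m)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Group size term, adjusted so that unequal group sizes are handled.
    k = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def average_deviation_index(responses):
    """Average deviation index over the items of a scale for one group.

    responses -- list of respondents, each a list of item scores.
    Step 1: mean absolute deviation from the item mean, per item.
    Step 2: average of those item-level deviations.
    """
    n_items = len(responses[0])
    item_deviations = []
    for j in range(n_items):
        item_scores = [r[j] for r in responses]
        item_mean = mean(item_scores)
        item_deviations.append(mean(abs(x - item_mean) for x in item_scores))
    return mean(item_deviations)
```

With real data, `groups` would hold each teacher's scale score keyed by informal subgroup (or by school), so the same function can be applied at both levels being compared.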
the advantage of using the icc(1) to reflect subgroup homogeneity of perceptions is that it corrects for group size and provides group-level properties that are not biased by either group size or the number of groups in the sample (bliese, 2000).

b) a disconsensus-based approach to homogeneity of perceptions.

admj is the degree of within-group disconsensus. it reflects the within-group variability, so high admj scores indicate within-group dispersion. its calculation consists of two steps. first, the average deviation for each scale item (j) is computed. second, the average deviation across the six items of the scale is calculated. this measure provides an alternative to the highly criticized within-group agreement index (rwg) of james, demaree, and wolf (1984), because it does not require an a priori specification of a null response range and it takes the metric of the original response scale into account (gonzález-romá, peiró, & tordera, 2002).

both measures were calculated for informal subgroups and school teams to compare the homogeneity of perceptions of collaborative culture in subgroups and schools. these measures reflect to what extent teachers perceive collaborative culture in a similar or dissimilar way. a high degree of correspondence is needed to draw inferences about organizational working conditions, such as culture, and to aggregate perceptual scores to the level of the organization or organizational unit (james & jones, 1974; schneider, 1975). aggregation is of interest because, and is only justified when, the mean score goes beyond individual perceptions and reflects an actual organizational condition (payne, fineman, & wall, 1976). in other words, in order to investigate collaborative culture as a characteristic of the organization or organizational subgroup, perceptions need to reflect sufficient agreement.

4. results

4.1. the identification of informal subgroups

based on the analyses in kliquefinder, the 13 schools were found to consist of 136 informal subgroups. on average, there were 11.33 subgroups within a school team (min = 5, max = 25, sd = 6.15), containing an average of 6.31 members (sd = 1.99). within these subgroups, the average age was 40.67 years (sd = 6.13), average experience was 11.78 years (sd = 5.36), and on average 62.35% (sd = .33) of the subgroup members were female. for all schools, cluster p-values were below .01, indicating that teachers were significantly more likely to interact frequently with colleagues who were part of the same identified subgroup.

4.2. cultural consensus within schools and informal subgroups

the results of homogeneity of collaborative culture indicated that the intraclass correlation for the subgroup level (icc(1) = .21) is almost twice as high as for the school level (icc(1) = .11), reflecting lower variance within the informal subgroup than within the school team. this means that agreement is higher within the informal subgroup. within-group disconsensus results showed that within-group dispersion was lower on the subgroup level (admj = .66) in comparison to the school team level (admj = .77). these results provide support for our research hypothesis, namely that perceptions of collaborative culture are more homogenous within the informal subgroup in comparison to the entire school team. in other words, the perceptions of collaborative culture are more similar among teachers within the same informal subgroup than among teachers within the same school, even when correcting for group size. as a high degree of correspondence is necessary to make inferences about organizational features, the informal subgroup seems a more appropriate unit to conceptualize and assess collaborative culture.

5. discussion

over the last decades, the importance of a collaborative culture within organizations, such as secondary schools, has been advocated by several researchers. in many cases, collaborative culture is regarded as a school feature, with the whole school team being the meaningful unit of analysis. however, based on the assumption that culture is developed, maintained and evaluated in the day-to-day communication among organizational members, this study hypothesized that the subgroup might be a more meaningful unit of analysis for both the conceptualization and measurement of collaborative culture. to test this hypothesis, a social network approach was adopted to identify stable social units that are characterized by frequent work-related interactions among their members. thereafter, we compared the homogeneity of perceptions of collaborative culture in these informal subgroups and the entire school team. in what follows, several theoretical and methodological implications are discussed and limitations and suggestions for future research are indicated.

5.1. theoretical and methodological implications

5.1.1. collaborative culture as an informal subgroup characteristic

the main goal of this study was to address the conceptualization and measurement of collaborative culture in secondary schools. our findings indicated that the perceptions of the school’s collaborative culture are more homogenous within the informal subgroup than within the entire school team. in other words, members of an informal subgroup evaluate collaborative culture more similarly than all members of the entire school team. linking these findings to the conceptual framework, two important, interrelated conclusions can be drawn. first, the fact that there is more homogeneity within subgroups suggests that there are, to some extent, collaborative subcultures within secondary school teams.
second, our findings provide evidence for earlier conclusions that teachers perceive and evaluate the organization based on what happens in their close social neighborhood (van maele, moolenaar, & daly, 2015). based on this, it can be concluded that informal subgroups are more meaningful units of analysis to conceptualize and assess collaborative culture, in comparison to the entire school team. in secondary schools, it is often impossible to communicate, and consequently collaborate, with all other school team members. this conclusion stresses the importance of identifying the appropriate level of assessment, especially for socially-constructed concepts. this conclusion is not solely relevant for the concept of collaborative culture and the context of secondary schools. the finding that culture is indeed developed, maintained and evaluated in informal subgroups, characterized by frequent interactions, could also be relevant for other collective-level concepts, such as other types of organizational culture, collective efficacy or organizational trust. moreover, the reasoning behind the importance of (informal) subgroups might also be applicable in other organizations. researchers studying organizations should therefore be aware that collective-level concepts are, to some extent, socially constructed and, as a result, differ between subgroups within the organization. in line with frank (1995), we argue that subgroups can be regarded as the crucial link between individuals and organizations.

5.1.2. using social network analysis to identify informal subgroups

in order to identify subgroups, this study adopted a social network approach. this approach made it possible to identify subgroups that are characterized by frequent informal work-related interactions, and can therefore be considered as stable social units (frank & zhao, 2005).
while previous studies relied on the boundaries of formal structure (e.g., departments or teams) to identify subgroups (e.g., ball & lacy, 1984; hargreaves & hargreaves, 2006; siskin, 1991), this study used the work-related information seeking network of teachers. this approach not only has the advantage that it captures meaningful subunits that are characterized by actual interactions; it also makes it possible to apply a uniform approach to all schools in the sample, even if they differ in formal structure or the meaning that is given to it. for instance, two of the schools in our sample indicated that they did not have the usual ‘subject department structure’. four schools indicated that next to subject department structure, they had other formal subunits, such as grade-level teams or working groups. finally, three schools mentioned that the subject department structure was a purely formal constellation that did not reflect the actual interactions and subunits within the school. a social network approach helps to overcome the issues of identifying the (most) meaningful subunit. the interest in social networks has increased exponentially during the last three decades (lazega & snijders, 2016). based on the assumption that ‘relationships matter’, several researchers have already gained a thorough understanding of the structure and content of teachers’ professional relationships (e.g., coburn & russell, 2008; daly, moolenaar, bolivar, & burke, 2010; moolenaar, sleegers, & daly, 2012). this study provided further insight into the social structure of secondary schools and contributed to the understanding of how social-structural and cultural aspects of secondary schools are intertwined. moreover, the results showed that research focusing on school team features benefits from a social network approach, as it provides the possibility to identify stable social units within the broader organization.

5.2. limitations and suggestions for future research

first, several limitations can be formulated on the identification of informal subgroups. based on theoretical ideas and empirical findings, this study adopted an informal subgroup approach to conceptualize and measure collaborative culture. to identify our informal subgroups, data on the information seeking network of teachers were used. the selection of the criterion ‘seeking information’ was based on the reasoning that interactions that are characterized by ‘seeking work-related information’ are one of the most general work-related interactions. as we were interested in subgroups that were characterized by day-to-day work-related interactions, this criterion seemed to fit our purpose best. beforehand, we checked whether this criterion already reflected collaborative culture itself by calculating the correlation between subgroup density, namely the extent to which subgroup members are connected by information ties, and perceptions of collaborative culture in the subgroup. our results showed that the two measures were not significantly related (r = .13, p > .05). this indicated that subgroups with more information seeking interactions are not necessarily perceived as having a higher collaborative culture. in other words, the criterion of ‘seeking information’ (and the determination of subgroups) is not related to the perception of collaborative culture itself. it seems that more profound interactions are necessary to establish and measure collaborative culture by means of interactions. future research could include different types of work-related interactions to further look into the identification of informal subgroups and the measurement of collaborative culture by means of social networks. also, to identify our informal subgroups, the non-overlapping cohesive subgroup approach of frank (1995, 1996) was adopted.
we chose this approach as it has been applied and validated in a wide array of network studies (e.g., foster-fishman, berkowitz, lounsbury, jacobson, & allen, 2001; frank & yasumoto, 1998; mcfarland, 2001). however, methodologists have developed and employed various other techniques to identify cohesive subgroups based on the extent of interaction between actors (borgatti & everett, 2000). future research could explore and compare other ways of attributing members to subgroups. finally, we treated these subgroups as autonomous entities with their own norms, values and beliefs. however, future research could also address the spillover effects of interactions between subgroups, taking the larger social structure of the school into account. the inclusion of both the subgroup and school as units of analysis seems promising to capture important working conditions that affect the functioning of teachers.

second, limitations concerning the measurement of collaboration can be identified. the scale is aimed at measuring a school-related concept and the items of the scale are formulated assuming that the school is the reference group. the approach adopted in this study can be justified by the assumption that teachers’ perceptions are based on what happens in their social neighborhood. however, in future research, a more accurate measurement of collaborative culture could use the (formal or informal) subgroup as reference group. this could make it possible to distinguish between school collaborative culture and subgroup collaborative culture. for instance, the research of adkins and caldwell (2004) distinguished between subgroup culture and organizational culture. moreover, as schein (1985) indicated that actual practices are the most visible and tangible aspects of organizational culture, it would be interesting to measure collaborative culture by means of observing the actual collaborative practices that take place in the school.
this would not only lead to higher response rates, but also reduce potential response biases, such as social desirability bias, which can be present in self-report instruments such as online questionnaires (krumpal, 2013).

third, it would be interesting to include other features of the subgroup to investigate the presence of collaborative culture. for instance, the research of barczak, lassk, and mulki (2010) found that team trust impacts collaborative culture. in addition, the consequences of collaborative culture could be further researched, as well as other types of organizational culture.

fourth, as in most survey-based research, our data collection was characterized by incomplete data. missing data can be a significant problem in social network analysis because, as noted, individuals are considered interdependent through the social ties in their social context (kossinets, 2006). in order to achieve high response rates in each school, several strategies were adopted to convince teachers to fill out the questionnaire. first, teachers were informed about the questionnaire by means of mails, posters and brochures. in this communication, it was emphasized that all responses would be anonymized, that each response was crucial to gain more insight into teacher careers and the working conditions in schools, and that general conclusions would be fed back to policy agencies. second, response rates were followed up and several reminders were sent out. finally, school leaders were motivated to convince their team to fill out the questionnaire. if the school team achieved a response rate of more than 90%, an anonymized feedback report was provided. as only schools with a response rate of 75% or more were included, missing data was limited.
in studies adopting a classic statistical approach, and where standard sampling is used to draw a representative sample from a population, there are special techniques available to correct parameter estimates for imperfect response rates (rubin & little, 2002). however, in the case of social network analysis, these kinds of treatments are questionable, and although methods for dealing with such issues have been proposed, researchers agree that, at this point in time, listwise deletion is the most appropriate approach (robins & alexander, 2004).

finally, our results showed that socially constructed working conditions, more specifically collective-level concepts that are developed and sustained in collegial interactions, cannot simply be regarded as organizational-level concepts. through interactions, members of subgroups develop and maintain shared values, norms and practices. at the same time, this social neighborhood defines how teachers perceive the school and its working conditions. future research therefore needs to pay attention to working conditions that may potentially differ between subgroups within the school. when collective-level concepts are at stake, it is crucial to identify the meaningful level of analysis. only in this way can correct inferences about the importance of these working conditions be made. the main innovation of this paper is the adoption of a social network approach to identify meaningful subunits and measure collaborative culture. we provided an example of how social network analysis can provide more insight and methodological advancements in educational research.
keypoints an informal subgroup approach was adopted to determine meaningful subunits within secondary school teams informal subgroups, characterized by frequent work-related interactions, are identified by means of social network analysis perceptions of collaborative culture are more homogenous within informal subgroups compared to on the entire school team’s aggregated perception on collaborative culture. the importance of identifying the meaningful unit of analysis for collective-level and sociallyconstructed concepts is stressed acknowledgments this study was conducted within the framework of the policy research center on study and school careers and was financed by the flemish ministry of education (belgium). the conclusions of the study do not necessarily reflect the views (of (and do not commit) the financing body. references abbott, a. (1995). things of boundaries. social research, 62(4), 857–882. adkins, b., & caldwell, d. (2004). firm or subgroup culture: where does fitting in matter most? journal of organizational behavior, 25(8), 969-978. doi: 10.1002/job.291 ashforth, b. e., & mael, f. (1989). social identity theory and the organization. academy of management review, 14(1), 20-39. doi: 10.5465/amr.1989.4278999 ball, s. j. & lacy, c. (1984) subject disciplines as the opportunity for group action: a measure critique of subject subcultures. in a. hargreaves & p. woods (eds.), classrooms and staffrooms: the sociology of teachers and teaching (pp. 232-244). milton keynes, uk: open university press. barczak, g., lassk, f., & mulki, j. (2010). antecedents of team creativity: an examination of team emotional intelligence, team trust and collaborative culture. creativity and innovation management, 4, 332-345. doi: 10.1111/j.1467-8691.2010.00574.x bliese, p. d. (2000). within-group agreement, non-independence, and reliability: implications for data aggregation and analysis. in k.j. klein & s.w. 
kozlowski (eds.), multilevel theory, research, and methods in organizations (pp. 349-381). san francisco, ca: jossey-bass. borgatti, s. p., carley, k. m., & krackhardt, d. (2006). on the robustness of centrality measures under conditions of imperfect data. social networks, 28, 124-136. doi: 10.1016/j.socnet.2005.05.001 borgatti, s. p., & everett, m. g. (2000). models of core/periphery structures. social networks, 21(4), 375395. doi: 10.1016/s0378-8733(99)00019-2 busher, h., & blease, d. (2000). growing collegial cultures in subject departments in secondary schools: working with science staff. school leadership & management, 20(1), 99-112. doi: 10.1080/13632430068905 coburn, c. e., & russell, j. l. (2008). district policy and teachers’ social networks. educational evaluation and policy analysis, 30, 203-235. doi: 10.3102/0162373708321829 meredith et al | f l r 33 cohen, s. g., & bailey, d. e. (1997). what makes teams work: group effectiveness research from the shop floor to the executive suite. journal of management, 23(3), 239-290. doi: 10.1016/s01492063(97)90034-9 daly, a. j., moolenaar, n. m., bolivar, j. m., & burke, p. (2010). relationships in reform: the role of teachers' social networks. journal of educational administration, 48, 359-391. doi: 10.1108/09578231011041062 dumay, x. (2009). origins and consequences of schools’ organizational culture for student achievement. educational administration quarterly, 45(4), 523-555. doi: 10.1177/0013161x09335873 firestone, w. a., & pennell, j. r. (1993). teacher commitment, working conditions, and differential incentive policies. review of educational research, 63(4), 489-525. doi: 10.3102/00346543063004489 flores, m. a. (2004). the impact of school culture and leadership on new teachers' learning in the workplace. international journal of leadership in education, 7, 297-318. doi: 10.1080/1360312042000226918 foster-fishman, p. g., berkowitz, s. l., lounsbury, d. w., jacobson, s., & allen, n. a. (2001). 
building collaborative capacity in community coalitions: a review and integrative framework. american journal of community psychology, 29(2), 241-261. doi: 10.1023/a:1010378613583 frank, k. a. (1995). identifying cohesive subgroups. social networks, 17, 27-56. doi: 10.1016/03788733(94)00247-8 frank, k. a. (1996). mapping interactions within and between cohesive subgroups. social networks, 18, 93119. doi: 10.1016/0378-8733(95)00257-x frank, k. a., penuel, w. r., & krause, a. (2015). what is a “good” social network for policy implementation? the flow of know-‐‑how for organizational change. journal of policy analysis and management, 34(2), 378-402. doi: 10.1002/pam.21817 frank, k. a., & yasumoto, j. y. (1998). linking action to social structure within a system: social capital within and between subgroups. american journal of sociology, 104, 642-686. doi: 10.1086/210083 frank, k. a., & zhao, y. (2005). subgroups as meso-level entities in the social organization of schools. in l.v. hedges & b. schneider (eds.), social organization of schooling. new york, ny: sage publications. gonzález-romá, v., peiró, j. m., & tordera, n. (2002). an examination of the antecedents and moderator influences of climate strength. journal of applied psychology, 87(3), 465-473. doi: 10.1037//00219010.87.3.465 hargreaves, a. (1994). changing teachers, changing times: teachers' work and culture in the postmodern age. new york, ny: teachers’ college press. hargreaves, d. h., & hargreaves, d. (2006). social relations in a secondary school. london, uk: routledge & kegan paul. doi: 10.4324/9780203001837 harris, s. g. (1994). organizational culture and individual sensemaking: a schema-based perspective. organization science, 5(3), 309-321. doi: 10.1287/orsc.5.3.309 hofstede, g. (1998). attitudes, values and organizational culture: disentangling the concepts. organization studies, 19(3), 477-493. doi: 10.1177/017084069801900305 james, l. r., demaree, r. g., & wolf, g. (1984). 
estimating within-group interrater reliability with and without response bias. journal of applied psychology, 69(1), 85-98. doi: 10.1037/0021-9010.69.1.85 james, l. r., & jones, a. p. (1974). organizational climate: a review of theory and research. psychological bulletin, 81(12), 1096-1112. doi: 10.1037/h0037511 kossinets, g. (2006). effects of missing data in social networks. social networks, 28, 247-268. doi: 10.1016/j.socnet.2005.07.002 krumpal, i. (2013). determinants of social desirability bias in sensitive surveys: a literature review. quality & quantity, 47(4), 2025-2047. doi: 10.1007/s11135-011-9640-9 lazega, e., & snijders, t.a.b. (2016). multilevel network analysis for the social sciences. cham, ch: springer. doi: 10.1007/978-3-319-24520-1 leithwood, k., leonard, l., & sharratt, l. (1998). conditions fostering organizational learning in schools. educational administration quarterly, 34, 243-276. doi: 10.1177/0013161x98034002005 meredith et al | f l r 34 leonard, l. j. (2002). schools as professional communities: addressing the collaborative challenge. iejll: international electronic journal for leadership in learning, 6(17). lieberman, a. (1990). schools as collaborative cultures: creating the future now. bristol, uk: the falmer press. lieberman, a. (1995). practices that support teacher development: transforming conceptions of professional learning. phi delta kappan, 76, 591-596. little, j. w. (1982). norms of collegiality and experimentation: workplace conditions of school success. american educational research journal, 19, 325-340. doi: 10.3102/00028312019003325 lock, p., & crawford, j. (1999). the relationship between commitment and organizational culture, subculture, leadership style and job satisfaction in organizational change and development. leadership and organizational development journal, 20, 365-373. doi: 10.1108/01437739910302524 louis, k. s., & kruse, s. d. (1995). professionalism and community: perspectives on reforming urban schools. 
thousand oaks, ca: sage publications ltd. louis, k. s., marks, h. m., & kruse, s. (1996). teachers’ professional community in restructuring schools. american educational research journal, 33, 757-798. doi: 10.3102/00028312033004757 marks, h. m., & louis, k. s. (1997). does teacher empowerment affect the classroom? the implications of teacher empowerment for instructional practice and student academic performance. educational evaluation and policy analysis, 19, 245-275. doi: 10.3102/01623737019003245 mcfarland, d. a. (2001). student resistance: how the formal and informal organization of classrooms facilitate everyday forms of student defiance. american journal of sociology, 107(3), 612-678. doi: 10.1086/338779 mcgraw, k. o., & wong, s. p. (1996). forming inferences about some intraclass correlation coefficients. psychological methods, 1(1), 30-46. doi: 10.1037/1082-989x.1.4.390 moolenaar, n. m., sleegers, p. j., & daly, a. j. (2012). teaming up: linking collaboration networks, collective efficacy, and student achievement. teaching and teacher education, 28, 251-262. doi: 10.1016/j.tate.2011.10.001 nias, j. (2005). why teachers need their colleagues: a developmental perspective. in d. hopkins (ed.), the practice and theory of school improvement (pp. 223-237). dordrecht, nl: springer. doi: 10.1007/14020-4452-6 payne, r. l., fineman, s., & wall, t. d. (1976). organizational climate and job satisfaction: a conceptual synthesis. organizational behavior and human performance, 16(1), 45-62. doi: 10.1016/00305073(76)90006-4 peterson, t. o., & beard, j. w. (2004). workspace technology's impact on individual privacy and team interaction. team performance management: an international journal, 10(7/8), 163-172. doi: 10.1108/13527590410569887 raudenbush, s. w., & bryk, a. s. (2002). hierarchical linear models: applications and data analysis methods (vol. 1). thousand oaks, ca: sage. robins, g., & alexander, m. (2004). 
small worlds among interlocking directors: network structure and distance in bipartite graphs. computational & mathematical organization theory, 10(1), 69-94. doi: 10.1023/b:cmot.0000032580.12184.c0 rubin, d. b., & little, r. j. (2002). statistical analysis with missing data. new york, ny: john wiley & sons. sackmann, s. a. (1992). culture and subcultures: an analysis of organizational knowledge. administrative science quarterly, 37(1), 140-161. doi: 10.2307/2393536 schein, e. h. (1985). defining organizational culture. classics of organization theory, 3, 490-502. schein, e. h. (2010). organizational culture and leadership (vol. 2). san francisco, ca: john wiley & sons. schneider, b. (1975). organizational climate: individual preferences and organizational realities revisited. journal of applied psychology, 60(4), 459-465. doi: 10.1037/h0076919 scott, j. (1991) social network analysis: a handbook. london, uk: sage. doi: 10.4135/9781446294413 meredith et al | f l r 35 siskin, l. s. (1991). departments as different worlds: subject subcultures in secondary schools. educational administration quarterly, 27, 134-160. doi: 10.1177/0013161x91027002003 soeters, j. (1988). organisatiecultuur: inhoud, betekenis en veranderbaarheid [organizational culture: content, meaning and changeability]. in j.j. swanink (ed.), werken met organisatiecultuur: de harde gevolgen van de zachte factor [working with organizational culture: the harsh effects of the soft factor] (pp. 15-27). vlaardingen: nederlands studie centrum. strahan, d. (2003). promoting a collaborative professional culture in three elementary schools that have beaten the odds. the elementary school journal, 104(2), 127-146. doi: 10.1086/499746 sveiby, k. e., & simons, r. (2002). collaborative climate and effectiveness of knowledge work-an empirical study. journal of knowledge management, 6, 420-433. doi: 10.1108/13673270210450388 van maanen, j., & schein, e. h. (1979). towards a theory of organizational socialization. 
frontline learning research 3 (2014) 64-77
issn 2295-3159

corresponding author: siân e. jones, department of psychology, social work, and public health, oxford brookes university, gipsy lane, ox3 0bp, +44 (0)1865 48371, sianjones@brookes.ac.uk
http://dx.doi.org/10.14786/flr.v2i1.80

bullying and belonging: teachers’ reports of school aggression

siân emily jones (a), antony s.r. manstead (b), andrew g. livingstone (c)
(a) oxford brookes university, united kingdom
(b) cardiff university, united kingdom
(c) university of exeter, united kingdom

article received 17th january 2014 / revised 24th february 2014 / accepted 3rd march 2014 / available online 25th april 2014

abstract

research on bullying has confirmed that social identity processes and group-based emotions are pertinent to children’s responses to bullying. however, such research has been done largely with child participants, has been quantitative in nature, and has often relied on scenarios to portray bullying. the present paper departs from this methodology by examining group processes in qualitative reports of bullying provided by teachers. fifty-one teachers completed an internet-based survey about a bullying incident at a school where they worked. thematic analysis of survey responses concerned two core themes in the reports: (a) children ganging up on another child and (b) children sticking together to protect each other. there was evidence that children act in specific ways, in line with social identity processes, in order to support or resist bullying. there was also evidence that teachers understand bullying to be a group phenomenon. the implications of these findings for anti-bullying interventions are discussed.

keywords: bullying; teachers; group processes; social identity

1. introduction

bullying can happen in any setting where power relations exist (smith & brain, 2000). of particular concern in this paper is bullying in schools, because research indicates that bullying is a common experience for schoolchildren. for example, research with a representative sample shows that 28% of students aged 12-18 years reported being bullied during the school year (roberts, zhang, truman, & snyder, 2012). the effects of bullying are serious: targets may suffer higher rates of anxiety, depression, physical health problems, and social maladjustment (espelage, low, & de la rue, 2012). such negative consequences may last into
adulthood (e.g., hunter, mora-merchan, & ortega, 2004; olweus, 1994). as these effects touch both perpetrators and targets (gini & pozzoli, 2009) and those who witness it (nishina & juvonen, 2005), it is important to reduce incidences of bullying.

the finding that those who witness bullying are susceptible to negative consequences points to the ways in which bullying may be understood as a group process. indeed, recent research supports a framing of bullying in these terms. since the publication of atlas and pepler’s (1998) observational study, which revealed that peers were present in 85% of all bullying episodes on a school playground, a burgeoning research literature has confirmed that it is helpful to regard bullying as a group process. for example, espelage, holt, and henkel (2003) used peer nomination techniques (for a review see hymel, vaillancourt, mcdougall, & renshaw, 2002) to identify peer groups of middle school children, and followed them longitudinally for a year. they found that members of peer groups that engaged in bullying increased their own bullying behaviours over time. additionally, using peer nomination techniques as part of the participant-role approach, it has been shown that peers may form groups that work collectively to resist bullying: sainio, veenstra, huitsing, and salmivalli (2011) found that targets who had one or more classmates defending them when they were bullied were less anxious, less depressed, and had higher self-esteem than undefended targets, even when the frequency of the bullying incidents was taken into account.

in line with the above research findings, in recent years the zeitgeist in terms of responses to bullying in schools has changed from a focus at the level of the individual to interventions focused at the school level (for a review of school/class-wide interventions, see horne, stoddard, & belle, 2007). horne et al.
(2007) note that a common feature of these group-level interventions is that they work at the whole school or class level, as well as targeting those directly affected by a bullying incident. as such, these interventions focus on social skills training of individuals, but do not address the peer/friendship group dynamics identified by researchers, and discussed in greater detail below. indeed, although much research has been directed at a group-level understanding of children’s responses to bullying, comparatively little research has looked at the group-level nature of teachers’ responses. in light of this, this paper aims to look at how groups are represented in teachers’ responses to bullying.

1.1 a social identity account of bullying

empirical work looking at bullying as a group process has used social identity theory (sit; tajfel & turner, 1979) as a means of understanding why children might work in groups to (a) bully, and (b) overcome bullying. this theory proposes that a person’s group memberships are an important part of their identity – their social identity – and, as a consequence, group members will try to enhance their own self-esteem by seeking to maintain a positive image of their group. the more strongly one identifies with a given group membership, the more likely one is to act on behalf of the (positive image of) the group; in other words, the more likely one is to enhance one’s social identity. the group image is epitomised, according to sit, by a set of group norms to which its members are expected to adhere (turner, 1999). as such, group members are likely to be rewarded for adherence to group norms, or rejected by the group when they fail to adhere to them (morrison, 2006).
building on this, it was hypothesized (e.g., jones, haslam, york, & ryan, 2008; jones, livingstone, & manstead, 2011, 2012; nesdale, 2007) that bullying might be a set of behaviours that is motivated by social identity processes, including levels of ingroup identification, and adherence to group norms. in line with this hypothesis, a number of studies have indicated the role of social identity processes in maintaining bullying. these studies have been mainly conducted using the minimal group paradigm (tajfel, billig, bundy, & flament, 1971), in which children are assigned to a group at random (but ostensibly on the basis of some activity, such as a dot-estimation task) and their responses to hypothetical intergroup events are recorded (see dunham, baron & carey, 2011, for a review of minimal group research with children).

ojala and nesdale (2004) demonstrated that children understand the need for group members to behave normatively, even if this involves bullying. they gave children scenarios to read, and found that children understood that story characters who engaged in bullying would be rejected by a group with an anti-bullying norm, but accepted by a group with a pro-bullying norm. evidence from jones et al. (2008), using the minimal group paradigm, showed that children encouraged to identify with a perpetrating group in a scenario concluded that one bullying child from that group was deserving of punishment for a bullying incident, whereas third party group members concluded that the whole of the perpetrating group was punishable. furthermore, nesdale, durkin, maass, kiesner, and griffiths (2008) showed, in a minimal group study, that children’s intentions to engage in bullying were greater when they were assigned to a group that had a norm of outgroup-disliking, rather than a norm for outgroup-liking. in later research, jones et al.
(2011) showed that children who identify highly with a target feel more anger on behalf of that target – they “stick together” with a target of bullying, while children who identify with a bullying group express more pride – and want to be friends with the bullying children. thus, social identity processes might account for children’s responses to bullying, in terms of a need to maintain a positive ingroup image, and to adhere to ingroup norms.

1.2 teachers’ responses to bullying

despite research showing that group processes might be involved in bullying, little research effort has been spent examining teachers’ awareness of processes underlying bullying (nesdale & pickering, 2006). this lack of research attention is problematic in light of a study by whitney and smith (1993), which found that fewer than half of teachers intervened when a pupil was being bullied. this is despite the fact that it is a recommended government policy for children to be actively encouraged to talk to adults about bullying, to see that it is stopped (department for children, schools and families, 2007). more worryingly, teacher intervention in bullying decreases in likelihood as pupils get older (o’moore, kirkham, & smith, 1998), and incidences of bullying increase with age (horne et al., 2007).

one possible reason for lack of intervention is lack of awareness or understanding of a situation as bullying. fekkes et al. (2005) showed that a substantial number of both teachers and parents were unaware that the child was being bullied; for classmates this figure was lower. teachers did not speak to bullies, only to the bullied children. children indicate that verbal and psychological bullying is more prevalent than physical bullying, yet few teachers recognize these incidents or identify them as bullying (hazler, miller, carney, & green, 2001).
boulton (1997), investigating teachers’ definitions of and attitudes towards bullying, found that one in four teachers did not regard name-calling, spreading rumours or social exclusion as bullying. khoury-kassabri (2009) argued that in many cases school staff do not have the ability to determine who the victims and bullies are, and do not make an effort to distinguish each student’s role in the bullying situation. thus, a student’s involvement in bullying, in whatever role, is associated with being verbally or physically punished by teachers. also, in some instances, students who are involved in violent acts (even as victims) are perceived as disrupting the learning process, which might increase the probability of being punished.

yoon and kerber (2003) investigated teacher attitudes via their responses to various bullying scenarios. they found that when teachers were unaware of the extent of bullying or when they did not consider the behaviour to be serious, they exhibited passive attitudes towards bullying and did not intervene or did not do so effectively. because non-physical acts of bullying are easier to hide, teachers must be aware of the symptomatic bullying behaviours (yoon & kerber, 2003). in a study by nicolaides, toda, and smith (2002), trainee teachers were reasonably accurate in their estimates of the frequency of bullying in school and the extent of teacher intervention. they were unaware that self-reports of victimization decline with age. in addition, they believed that girls and boys were equally likely to be bullies and that bullies have low self-esteem and lack social skills. these trainee teachers saw their role as instrumental in reducing bullying in the classroom. whether or not they will be effective in that role is contingent on a number of factors. further to this, bauman and del rio (2005) used a questionnaire assessing knowledge, attitudes and beliefs about bullying on a sample of 82 trainee teachers in the united states.
participants had some accurate knowledge as well as some beliefs and attitudes that would not be consistent with effective teacher behaviours towards students involved in bullying. only 6 per cent mentioned repetitive behaviour and 28 per cent included power imbalance in their definitions. these are the two elements that are unique to bullying vis-à-vis aggression. boulton et al. (2014) found that willingness to intervene by teachers corresponded to the type of bullying portrayed. in a similar vein, hazler et al. (2001) reported that teachers frequently label any physical conflict as bullying, even when it is not, and show less concern for and intent to intervene in situations with the potential for social or emotional harm. the teachers were interested in further training.

teachers’ views and beliefs about bullying inform their anti-bullying action or inaction. yoon and kerber’s (2003) research shows that teachers are less likely to intervene in bullying if they are unsympathetic to victims or believe that getting involved is unnecessary. kochenderfer-ladd and pelletier (2008) showed that avoidant beliefs (“children would not be bullied or picked on if they avoided mean children”) were predictive of separating students, which was then associated both directly and indirectly (via reduced revenge seeking) with lower levels of peer victimization. teachers who held normative beliefs (“bullying is normative behaviour that helps children learn social norms”) about bullying were not likely to intervene. holt and keyes (2004) found that 27% of teachers agreed with the statement, ‘a little teasing doesn’t hurt’.

research suggests that teachers are aware of the group-level nature of bullying. yubero and navarro (2006) found that teachers believed that girls employ bullying tactics planned in advance, with the objective of creating unease in their relationships “in order to obtain a more advantageous position within their group.” (p.
499, emphasis ours). a vignette study by nesdale and pickering (2006) examined the impact on teachers’ reactions to children’s aggression of three variables, two of which were related to the aggressors and one was related to the teachers. teachers each read a scenario that described an aggressive episode committed by a group of boys against a boy from another class. the aggressors were either good or bad children, who were either popular or unpopular with their classroom peers. in addition, the scenario manipulated the teachers’ social identity, in terms of the strength of their identification with the class to be either high or low. analysis of the teachers’ ratings revealed a consistent negative response from the teachers towards the aggressors versus the victim. however, the teachers’ responses were also influenced by the aggressors’ goodness and popularity, and the teachers’ class identification.

1.3. the present study

given this, and that empirical research shows that social identity processes are relevant to bullying, it seems timely to explore whether teachers’ narratives about bullying include mention of the role of groups. we sought to examine teachers’ accounts of school bullying, with a particular focus on the way in which bullying involving more than two children was described. owing to the paucity of previous research on teachers’ perceptions of bullying, this study was exploratory in nature. we used qualitative research methods as a means to explore the way in which teachers represented bullying episodes among pupils, and as a way of investigating the content of the bullying episodes and the approaches that were used to deal with them. qualitative research methods thus enabled us to consider a range of bullying episodes in order to determine whether there was any evidence that the group processes that have been investigated empirically are echoed in teachers’ reports of school bullying.
accordingly, teachers were invited to complete an internet-based survey of their experiences of children’s bullying at a school where they had worked. through a series of open-ended questions, they were asked to recall the details of a bullying incident.

2. methods

2.1 data collection and participants

following ethical approval, teachers were invited to take part in an online survey (hosted by survey monkey). to encourage participation, links to the survey were hosted on anti-bullying sites, social networking sites, and on discussion forums aimed at teachers. one hundred and fifty-six teachers responded to the questionnaire. responses from 51 teachers (25% of the total number of respondents) were sufficiently complete (i.e., these participants had answered, in a meaningful way, at least one open-ended question concerning the bullying incident) to be included in analyses. of these, 32 were female and 15 were male (four unknown). thirteen teachers taught at primary schools and 35 at secondary schools (three unknown). all teachers taught at state schools. in the interests of anonymity, no further demographic information about participants was gathered.

2.2 children and schools

participants provided data concerning the children involved in the bullying incident and the schools in which these incidents took place.

2.2.1 age of children

bullying incidents were reported among children ranging from 6-7 years old to 17-18 years old. bullying was most frequently reported among 11-13 year-olds (14 cases) and was not reported among 4-6 year-olds. this information is reported in figure 1.

figure 1. the number of bullying incidents reported by participants as a function of age group.

2.2.2. size

the modal school size was over 1000 pupils (n = 13), while the modal class size was 20-29 pupils (n = 22). bullying incidents were most frequently reported in this sample in schools with over 1000 students where the class size was between 20-29 pupils.
2.3 questionnaire items

three questionnaire items concerned the details of a bullying incident that had occurred at a school in which participants had worked. open-response questions asked for details about (1) the reporting of the bullying incident, (2) the nature of the bullying, and (3) the extent to which children involved in the bullying were familiar to each other. following this were closed questions about the age of the children involved, sex of the teacher, school type, school and class size, and about whether the school had an anti-bullying policy.

[figure 1 (bar chart): frequency of cases (0-7) by age group of children, from 4-5 years to 17-18 years]

2.4 data analysis strategy

all usable data from open-response items were transferred to nvivo, and then submitted to a thematic analysis. two themes used to inform the analysis were guided by the extant research (see jones et al., 2011) on social identity processes: (1) children ganging up on another child (condoning and joining in the bullying) and (2) children sticking together with the target (supporting the target and/or reporting the bullying). the analysis first involved organizing the data into categories according to the number of perpetrators involved. of the 51 incidents reported, seven involved only two children (one perpetrator and one target) and 44 cases involved more than one perpetrator. because the focus is on group processes in bullying, subsequent analyses concentrated on the latter 44 cases. data from these cases were coded under descriptive categories, such as “school journey” or “cyberbullying”, in order to reduce the data to analyzable form (coffey & atkinson, 1996). extracts from the data were coded for each category to ensure that later abstractions would ‘fit’ the data (strauss & corbin, 1998).
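the sorting and tallying steps described above can be sketched in a few lines of python. this is only an illustrative sketch, not the authors’ nvivo workflow: the `Incident` record, the `THEME_OF` mapping, and the three toy records are hypothetical and are not data from the study. it shows one way to set aside single-perpetrator cases and then count how many remaining cases carry at least one code under each primary theme.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical mapping from descriptive categories to the two primary themes.
THEME_OF = {
    "multiple perpetrators": "ganging up",
    "multiple methods": "ganging up",
    "multiple places": "ganging up",
    "support from peers": "sticking together",
    "support from school": "sticking together",
}


@dataclass
class Incident:
    """One reported bullying incident with its descriptive codes (illustrative)."""
    participant: str
    n_perpetrators: int
    codes: set = field(default_factory=set)


def group_incidents(incidents):
    """Keep only incidents that involve more than one perpetrator."""
    return [i for i in incidents if i.n_perpetrators > 1]


def tally_themes(incidents):
    """Count, per theme, how many incidents carry at least one of its codes."""
    counts = Counter()
    for incident in incidents:
        # A set ensures each incident counts at most once per theme.
        themes = {THEME_OF[c] for c in incident.codes if c in THEME_OF}
        counts.update(themes)
    return counts


# Toy records for illustration only; these are NOT data from the study.
toy = [
    Incident("t1", 1, {"multiple methods"}),
    Incident("t2", 4, {"multiple perpetrators", "multiple methods", "support from peers"}),
    Incident("t3", 2, {"multiple places", "support from school"}),
]

group_cases = group_incidents(toy)
theme_counts = tally_themes(group_cases)
print(len(group_cases), dict(theme_counts))
```

counting each case at most once per theme is a design choice made here so that the tallies read as “number of cases categorized in this theme”; a count of coded extracts would instead update the counter once per code.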
these descriptive categories were then arranged around the two primary themes, reflecting the nature of the bullying and the processes involved in reporting it, as indicated in the teachers’ reports. illustrative extracts of each primary theme are reported below.

3. results

3.1 primary themes

the following primary themes were examined in analysis of the teachers’ reports: (1) children ganging up on another child, and (2) children sticking together. these are outlined in figure 2, and in more detail below, along with illustrative extracts. in parentheses immediately following each extract is the participant number, participant sex, and the age of the children involved in the bullying.

figure 2. themes and sub-themes in the data (number of cases categorized in this theme in parentheses).
ganging up: multiplicity of perpetrators (26); multiplicity of methods (28); multiplicity of place (7)
sticking together: support from peers (33); support from school (41)

3.1.1 ganging up

particularly common in teachers’ accounts of bullying involving more than one perpetrator was the way in which children were seen as ‘ganging up’ on their target. this theme could be divided into three subthemes. the first concerned the multiplicity of the perpetrators doing the bullying:

“i discovered that a group of girls in my class were bullying one particular child ... there were about 7 or 8 involved altogether” (p30, female, 10-11 years old).
“a year 8 boy [was] repeatedly called homophobic names by a number of class peers” (p22, female, 12-13 years old).
“the [bullying] group involved two girls and four boys” (p4, male, 12-14 years old).
in a few instances the ganging up by multiple perpetrators was directed at a group-level characteristic in the target, like race or sexuality:

“one boy at a lunch table directed the word "nigger" at one of our black students…the white students at the table had been directing racial comments at the black student for quite some time” (p44, female, 12-13 years old).
“there was a case when teenagers were harassing a student who was perceived to be gay” (p45, female, 12-13 years old).
“a boy repeatedly called homophobic names by a number of class peers” (p22, female, 12-13 years old).

the majority of the bullying occurred between perpetrators and a target who were members of the same class group, and who were sometimes described as close friends before the bullying started, but who would then gang up on a target:

“they appeared to be good friends at the start of the year and sat next to each other in class. they certainly had several classes together” (p25, female, 11-12 years old).
“bullying between girls that had been friends ... the main three girls had been close friends” (p2, female, 15-16 years old).
“same class, close friends” (p3, female, 11-12 years old).
“…the target student had previously been good friends with the bullies... children involved were in some of the same classes” (p8, female, 17-18 years old).
“same class... child being bullied was friends with those showing bullying behaviour” (p14, male, 9-11 years old).

ganging up was also apparent in the multiplicity of methods (the second sub-theme) that were used to bully the target according to many reports:

“name-calling, nasty comments, bringing student to tears, getting others to ignore student, hiding student’s possessions” (p28, male, 13-14 years old).
“the bullying was mostly gossiping, rumour-spreading and withdrawing friendships (also encouraging others to withdraw friendships)” (p2, female, 15-16 years old).
“bullying included name-calling, throwing small objects [and] trying to split up friendship groups” (p19, unknown, 11-13 years old).

among the reports, it was uncommon for one ‘type’ of bullying to be administered to a target. also prevalent was that bullying occurred not just at school, but in multiple places (the third sub-theme):

“bullying began in school and then moved to outside school and through e-mail and im [instant messaging]” (p29, female, 12-14 years).
“bullying spilled over into extra-curricular activities” (p14, male, 9-11 years old).
“the bullying took place mostly at home but intimidation followed in school” (p31, unknown, 17-18 years old).
“this happened in school and continued out of school” (p32, female, 12-13 years old).
“happened in school halls at first but carried over to homes” (p37, female, 14-16 years old).

the effects of ‘ganging up’ were seen in the emotional experiences of the targets, as reported by the teachers:

“the target had been devastated by the bullying.” (p4, male, 12-14 years old)
“they [parents] said he was very distressed and did not want to return to class as he was too afraid.” (p5, female, 15-16 years old)
“name calling (about appearance)... is what upset the girl.” (p10, male, 11-12 years old)

thus, bullying is construed as a set of activities whereby a group of children ‘gang up’ on another child, as illustrated by the multiplicity of the perpetrators involved, the acts that take place, the spaces they take place in, and the way in which children can turn upon former friends, with negative emotional reactions sometimes directly induced by the perpetrators, and often evident in the targets’ responses.

3.1.2 sticking together

in parallel with ‘ganging up’ on the part of the perpetrators, in the majority of cases children who found themselves to be the target of bullying were supported by their peers.
peers often showed solidarity with the target, independently of support of adults, in reporting the bullying to a teacher:

“children (friends of the bullied) approached me and told me about what had happened, giving me names of the bullies, also of other children who could corroborate their story. [t]hey had not approached any other teachers or informed their parents” (p19, unknown, 11-13 years old).
“a child reported the bullying – a friend of the child reported it” (p3, female, 11-12 years old).
“his friend (not the target) reported to me an incident of verbal and physical bullying of the pupil” (p17, female, 13-14 years old).
“five of the boy’s friends were all supportive of the bullying claims and spoke to the teacher about it” (p26, female, 14-15 years old).

peers also encouraged targets to report bullying for themselves, because they saw the bullying behaviour as illegitimate:

“she was supported by a small number of peers who had encouraged her to complain and felt her treatment was unfair” (p10, male, 11-12 years old).

in one case alternative friendship groups were effective in dissipating negative effects of bullying:

“[he] found a different friendship group that seemed to be more effective than the school intervention” (p20, male, 11-12 years old).

there is evidence, then, that some children who are aware of bullying going on in their class appraise the situation as unfair, and work together as a group to ‘stick by’ the target in order to overcome the bullying. beyond this, there was evidence in the teachers’ responses that the school stuck together to deal with the bullying, often in line with a whole school policy:

“in this instance i spoke to the whole class as well as the girls involved. i also did my next class assembly on bullying so that it was kept in the forefront of their minds” (p30, female, 10-11 years old).
“whole year group received a number of anti-homophobia forum theatre and in-class support resources” (p22, female, 12-13 years old).
“there was a whole year 7 assembly on cyberbullying and how it was easy for comments to have an effect. there was also a pse [personal and social education] session on cyberbullying that linked in with this” (p25, female, 11-12 years old).
“in all the tutor groups we reminded students about the college’s zero tolerance policy towards bullying” (p9, female, 16-17 years old).

thus, not only children, but staff members were seen here to “stick together” to promote an anti-bullying message to pupils.

4. discussion

the vast majority of cases that were reported by teachers for this research involved more than a two-person perpetrator-target dyad. the data presented above provided a more nuanced picture of the ways in which social identity processes might be relevant to the problem of school bullying than that provided by previous experimental work (e.g., jones et al., 2011; nesdale et al., 2008), which has focused on strength of identification and group norms. specifically, it emerged that bullying in groups has a substantial intragroup dynamic, with bullying sometimes occurring among former friends. this bullying took multiple forms, and happened in multiple spaces. despite this, there was evidence that children work together in groups to overcome bullying.

4.1 social identity and bullying

this research lends support to a social identity-based account of bullying. there was evidence in the teachers’ accounts that children form groups in order to bully, and that bullying is based on characteristics of group membership (e.g., sexuality, race). there was also evidence that children form supportive groups around targets of bullying, and that children are encouraged to identify with school-level group norms surrounding peer victimization.
these findings are thus in line with scenario-based research (e.g., jones et al., 2011, 2012; nesdale et al., 2008) showing children’s tendency to follow group norms surrounding bullying, and to identify with, and behave in line with, their friendship groups.

a novel insight for research looking at social identity processes in bullying is that bullying occurs between children who were former friends. situations were described by teachers whereby two or more children would target someone who was previously perceived to be part of their friendship group. notwithstanding possible misconceptions by teachers regarding friendship groups, or that this sample was self-selected, and likely to be unrepresentative of all bullying incidents in a school, or specific time period, this finding is consistent with recent research by mishna, wiener, and pepler (2008), whose interview data showed that children were sometimes targeted by their friends. this finding prompted the authors to pose further research questions concerning how friendships might become bullying relationships, as well as how children deal with such bullying.

from a social identity perspective, one might also ask about the group dynamics entailed in such bullying. jetten, branscombe, spears, and mckimmie (2003) coined the term peripheral group members to describe new group members, or those who represent the group’s prototype less well. it may be the case that the children who are bullied from within friendship groups are peripheral group members who want to become closer to the friendship group, but are bullied because they are unsure of the norms of that group. or, relatedly, is bullying within groups a way of policing friendship group norms, such that those who are bullied are those members who fail to conform to such norms?
alternatively, is it the case that each friendship group contains multiple alliances between children such that the group is made up of one superordinate, and several subordinate groups, between which bullying occurs? these are all questions that could be addressed in future research.

4.2 teachers’ views

this study shows that teachers are aware of a group-level nature to bullying. here, teachers identified the targets and perpetrators of the bullying, as well as the “group of girls and boys” who surround and support the perpetrators and targets. additionally, the teachers recognized that targets were often supported by friends in reporting what had happened. this is in line with yubero and navarro (2006), who found that teachers also showed awareness of the “relational” nature of bullying. the findings are also consistent with those of nesdale and pickering (2006) in showing that social identity concerns, regarding the school’s norms about bullying (seen in their adherence to school policy), often came to the fore. indeed, teachers’ responses to the bullying seemed overwhelmingly to stem from a need to ensure that key messages concerning bullying were understood at a group level: extensive group-level interventions were executed, in order to reinforce anti-bullying messages. nonetheless, the question regarding the extent to which these work in harmony with or at cross purposes to other aspects of the school’s ethos remains open. it is not clear whether the anti-bullying strategies noted above are part of a coherent norm-based strategy, or an ad-hoc reaction to the bullying. thus, from a social identity perspective, it would be interesting to consider more carefully, and in a larger-scale study, with a representative sample of teachers, the processes of formation, dissemination, and acceptance of school-wide anti-bullying norms among school pupils and staff.
4.3 practical implications

the research reported here has implications both for research into bullying and for practice. for researchers, it is apparent that one bullying episode is not always of a single type (e.g., verbal bullying, physical bullying, emotional bullying, or cyberbullying) as classified in the literature (e.g., rigby, 2007). although rigby recognized that these forms of bullying may co-occur, scenario-based research, such as jones, manstead and livingstone’s (2009) work on cyberbullying, or hitti, mulvey, rutland, abrams, and killen’s (in press) work on social exclusion, has typically focused on just one form of bullying. it may be advisable in future research to represent various forms of bullying as happening concurrently, in order to represent more accurately the ways in which children ‘gang up’ on a peer. similarly, given the evidence reported above that children often show a supportive response to targets of bullying, this type of reaction could be investigated in scenario-based research: specifically, when there are children in support of a target, and children in support of a perpetrator, what determines bystanders’ reactions? it should also be noted that previous research, to our knowledge, has only focused on one understanding of these scenarios (i.e., what teachers or children think). here, we assessed teachers’ views. it would have been interesting to triangulate these with children’s or parents’ views about these same instances of bullying. this would have compromised anonymity, but would certainly be feasible in the context of scenario-based research. resolving mismatches and omissions in reporting of bullying could provide another route to intervention. at a practical level, this study points to a potential avenue for intervention in terms of teachers’ responses to bullying.
while the bullying described frequently happened among groups of children, current interventions do not focus on the group dynamics among perpetrating children that might have led to and sustained the bullying. thus, future interventions could seek to raise teachers’ awareness of group dynamics, as outlined by social identity research, and of the (group-based) emotional responses of children other than the target. in this way, teachers might be better attuned to the group dynamics of the classroom and thereby be better positioned to ‘nip bullying in the bud’ before it escalates.

4.4 conclusions

the main aim in this research was to explore how teachers described bullying episodes in which they have been involved, with a particular focus on the role of the group in perpetrating, dealing with and stopping these bullying episodes. the qualitative analysis employed here was well suited to this aim. although it does not allow us to make conclusive statements regarding the broader picture of group bullying, for example concerning how commonly bullying episodes involve the group, or the specific characteristics of those children who are involved in group bullying, it does permit exploration of the content of bullying episodes. previous scenario-based research had shown that social identity concerns may be relevant to bullying. what is evident from the present study is that children bully in groups and work together to resist bullying. the teachers’ reports also provide insight into the specific activities that children engage in to bully or support other children. the research could therefore be used as a basis for (a) helping teachers to understand better the nature of bullying, and (b) helping researchers to represent the group processes that children engage in, in a more realistic and more nuanced way, in their empirical work.

keypoints

bullying may be understood as a group phenomenon.
social identity theory gives a framework for how peer group processes might maintain or resist bullying.
much work on bullying in groups has been scenario-based experimental research, while interventions work at the school or class, rather than at the friendship group level.
this study asks teachers for accounts of bullying.
the teachers provided rich accounts of bullying that evidence the group processes that might undergird its support or resistance, and which point to ways in which bullying might be addressed at the peer (friendship) group level.

acknowledgements

the first author gratefully acknowledges support from the economic and social research council (award number: pta-031-2006-00548). the third author would like to thank the leverhulme trust (ecf/2007/0050) for their support. we are also grateful to guida de abreu for her comments on an earlier draft of this manuscript, and to the children who took part in this research, and to the school, teachers, and parents who allowed them to do so.

references

atlas, r. s., & pepler, d. j. (1998). observations of bullying in the classroom. journal of educational research, 92, 86–89. doi: 10.1080/00220679809597580
bauman, s., & del rio, a. (2005). knowledge and beliefs about bullying in schools: comparing pre-service teachers in the united states and the united kingdom. school psychology international, 26(4), 428-442. doi: 10.1177/0143034305059019
boulton, m. j. (1997). teachers' views on bullying: definitions, attitudes and ability to cope. british journal of educational psychology, 67(2), 223-233. doi: 10.1111/j.2044-8279.1997.tb01239.x
boulton, m. j., hardcastle, k., down, j., simmonds, j., & fowles, j. a. (2014). a comparison of pre-service teachers’ responses to cyber versus traditional bullying scenarios: similarities and differences and implications for practice. journal of teacher education, 65(2), 145-155. doi: 10.1177/0022487113511496
coffey, a., & atkinson, p. (1996).
making sense of qualitative data: complementary strategies. thousand oaks, ca: sage.
department for children, school, and families (2007). safe to learn: embedding anti-bullying work in schools. retrieved on 03/31/2011 from: http://www.teachernet.gov.uk/publications
dunham, y., baron, a. s., & carey, s. (2011). consequences of "minimal" group affiliations in children. child development, 82(3), 793-811. doi: 10.1111/j.1467-8624.2011.01577.x
espelage, d. l., low, s., & de la rue, l. (2012). relations between peer victimization subtypes, family violence, and psychological outcomes during early adolescence. psychology of violence, 2, 313-24.
fekkes, m., pijpers, f. i. m., & verloove-vanhorick, s. p. (2005). bullying: who does what, when and where? involvement of children, teachers and parents in bullying behavior. health education research: theory and practice, 20, 81-91.
hazler, r. j., miller, d. l., carney, j. v., & green, s. (2001). adult recognition of school bullying situations. educational research, 43, 133–46.
hitti, a., mulvey, k. l., rutland, a., abrams, d., & killen, m. (2013). when is it ok to exclude a member of the ingroup?: children’s and adolescents’ social reasoning. social development. doi: 10.1111/sode.12047
horne, a. m., stoddard, j. l., & bell, c. d. (2007). group approaches to reducing aggression and bullying in school. group dynamics: theory, research, and practice, 11(4), 262-271. doi: 10.1037/1089-2699.11.4.262
holt, m. k., & keyes, m. a. (2004). teachers’ attitudes towards bullying. in d. l. espelage & s. m. swearer (eds.), bullying in american schools: a social-ecological perspective on prevention and intervention (pp. 121–40). mahwah, nj: erlbaum.
hunter, s. c., mora-merchán, j. a., & ortega, r. (2004). the long-term effects of coping strategy use in the victims of bullying. the spanish journal of psychology, 7(1), 3-12.
hymel, s., vaillancourt, t., mcdougall, p., & renshaw, p. d. (2002). peer acceptance and rejection in childhood. in p. k. smith & c. h.
hart (eds.), blackwell handbook of childhood social development (pp. 265–284). malden, ma: blackwell.
jetten, j., branscombe, n. r., spears, r., & mckimmie, b. m. (2003). predicting the paths of peripherals: the interaction of identification and future possibilities. personality and social psychology bulletin, 29, 130-140. doi: 10.1177/0146167202238378
jones, s. e., haslam, s. a., york, l., & ryan, m. k. (2008). rotten apple or rotten barrel? social identity and children’s responses to bullying. british journal of developmental psychology, 26(1), 117–132. doi: 10.1348/026151007x200385
jones, s. e., manstead, a. s. r., & livingstone, a. g. (2012). fair-weather or foul-weather friends? group identification and children’s responses to bullying. social psychological and personality science, 3(4), 414-420. doi: 10.1177/1948550611425105
jones, s. e., manstead, a. s. r., & livingstone, a. g. (2011). ganging up or sticking together: group processes and children’s responses to bullying. british journal of psychology, 102(1), 71-96. doi: 10.1348/000712610x502826
jones, s. e., manstead, a. s. r., & livingstone, a. g. (2009). birds of a feather bully together: group processes and children’s responses to bullying. british journal of developmental psychology, 27, 853-873. doi: 10.1348/026151008_390267
khoury-kassabri, m. (2009). the relationship between staff maltreatment of students and bully-victim group membership. child abuse & neglect, 33, 914-923.
kochenderfer-ladd, b., & pelletier, m. (2008). teachers' views and beliefs about bullying: influences on classroom management strategies and students’ coping with peer victimization. journal of school psychology, 46, 431-453. doi: 10.1016/j.jsp.2007.07.005
mishna, f., wiener, j., & pepler, d. (2008). some of my best friends: experiences of bullying within friendships. school psychology international, 29(5), 549-573.
doi: 10.1177/0143034308099201
morrison, b. (2006). school bullying and restorative justice: toward a theoretical understanding of the role of respect, pride and shame. journal of social issues, 62(2), 371-392. doi: 10.1111/j.1540-4560.2006.00455.x
nesdale, d. (2007). peer groups and children's school bullying: scapegoating and other group processes. european journal of developmental psychology, 4, 388-392. doi: 10.1080/17405620701530339
nesdale, d., durkin, k., maass, a., kiesner, j., & griffiths, j. (2008). effects of group norms on children's intentions to bully. social development, 17, 889–907. doi: 10.1111/j.1467-9507.2008.00475.x
nesdale, d., & pickering, k. (2006). teacher’s reactions to children’s aggression. social development, 15(1), 109-127. doi: 10.1111/j.1467-9507.2006.00332.x
nicolaides, s., toda, y., & smith, p. k. (2002). knowledge and attitudes about school bullying in trainee teachers. british journal of educational psychology, 72, 105–18. doi: 10.1348/000709902158793
ojala, k., & nesdale, d. (2004). bullying and social identity: the effects of group norms and distinctiveness threat on attitudes towards bullying. british journal of developmental psychology, 22, 19–35. doi: 10.1348/026151004772901096
olweus, d. (1994). annotation: bullying at school: basic facts and effects of a school based intervention program. journal of child psychology and psychiatry and allied disciplines, 35, 1171–1190.
o'moore, m., kirkham, c., & smith, m. (1998). bullying in schools in ireland: a nationwide study. irish educational studies, 17, 255–271.
rigby, k. (2007). bullying in schools and what to do about it (updated, revised). melbourne: australian council for education research.
roberts, s., zhang, j., truman, j., & snyder, t. d. (2012). indicators of school crime and safety: 2011 (pub no. nces 2012-002/ncj 236021). washington, dc: u.s. department of education and u.s. department of justice.
retrieved from http://nces.ed.gov/pubs2012/2012002.pdf
sainio, m., veenstra, r., huitsing, g., & salmivalli, c. (2011). victims and their defenders: a dyadic approach. international journal of behavioral development, 35, 144-151. doi: 10.1177/0165025410378068
smith, p. k., & brain, s. (2000). bullying in schools: lessons from two decades of research. aggressive behavior, 26, 1-9. doi: 10.1002/(sici)1098-2337
strauss, a., & corbin, j. (1998). basics of qualitative research: techniques and procedures for developing grounded theory. thousand oaks, ca: sage.
tajfel, h., billig, m. g., bundy, r. p., & flament, c. (1971). social categorization and intergroup behavior. european journal of social psychology, 1, 149-177.
tajfel, h., & turner, j. (1979). an integrative theory of intergroup conflict. in w. g. austin & s. worchel (eds.), the social psychology of intergroup relations (pp. 7-24). monterey, ca: brooks cole.
turner, j. c. (1999). some current issues in research on social identity and self-categorization theories. in n. ellemers, r. spears, & b. doosje (eds.), social identity: context, commitment, content (pp. 6-34). oxford: blackwell.
whitney, i., & smith, p. k. (1993). a survey of the nature and extent of bullying in junior/middle and secondary schools. educational research, 35, 3–25.
yoon, j. s., & kerber, k. (2003). bullying: elementary teachers’ attitudes and intervention strategies. research in education, 69, 27–35.
yubero, s., & navarro, r. (2006). student’s and teachers’ views of gender-related aspects of aggression. school psychology international, 27, 488-512. doi: 10.1177/0143034306070436

schindler et al. publication frontline learning research vol. 7 no.
2 (2019) 23-39 issn 2295-3159

effectiveness of self-generation during learning is dependent on individual differences in need for cognition

julia schindler a, simon schindler b, marc-andré reinhard b
a university of würzburg, germany
b university of kassel, germany

article received 3 september 2018 / revised 18 february / accepted 15 april / available online 7 may

abstract

self-generated information is better recognized and recalled than read information. this so-called generation effect has been replicated several times for different types of stimulus material, different generation tasks, and retention intervals. the present study investigated the impact of individual differences in learners’ disposition to engage in effortful cognitive activities (need for cognition, nfc) on the effectiveness of self-generation during learning. learners low in nfc usually avoid getting engaged in cognitively demanding activities. however, if these learners are explicitly instructed to use elaborate learning strategies such as self-generation, they should benefit more from such strategies than learners high in nfc, because self-generation stimulates cognitive processes that learners low in nfc usually tend not to engage in spontaneously. using a classical word-generation paradigm, we not only replicated the generation effect in free and cued recall but showed that the magnitude of the generation effect increased with decreasing nfc in cued recall. results are consistent with our assumption that learners higher in nfc engage in elaborate processing even without explicit instruction, whereas learners lower in nfc usually avoid cognitively demanding activities. these learners need cognitively demanding tasks that require them to switch from shallow to elaborate processing to improve learning.
we conclude that self-generation is beneficial regardless of the nfc level, but our study extends the existing literature on the generation effect and on nfc by showing that self-generation can be particularly useful for balancing the learning disadvantage of students lower in nfc.

keywords: desirable difficulties; generation effect; incidental learning; intentional learning; need for cognition

corresponding author: julia.schindler@uni-wuerzburg.de
doi: 10.14786/flr.v7i2.407

1. introduction

students often assume that learning strategies that are perceived as easy and effortless (e.g., rehearsal, rereading, or underlining) are highly effective. however, extant research suggests that under certain conditions learning is more effective when learners intentionally make the learning process more difficult (bjork and bjork 2011). specific difficulties, such as distributed learning sessions compared to massed learning (e.g., cepeda et al. 2006), interleaving topics (e.g., dunlosky et al. 2013), testing new knowledge (e.g., roediger and karpicke 2006), and self-generation of information (e.g., mcdaniel et al. 1988; slamecka and graf 1978), stimulate processes which are beneficial to the learning process. such difficulties often result in long-lasting memory for the learned material and make it easier to apply the acquired knowledge to new situations. thus, they are termed desirable difficulties (bjork 1994). the so-called generation effect (learners recall self-generated information better than read information) has been investigated extensively (for a meta-analysis, see bertsch et al. 2007). several extant studies used the classical word-generation paradigm by slamecka and graf (1978; see also e.g., mcdaniel et al. 1988). in these studies, learners were presented with word pairs consisting of a context word and a target word. in the read condition, learners read an associated word pair (saddle – horse).
in the generate condition, they completed fragmented target words (saddle – h_ _ _ _) with the aid of the context word and a specific encoding rule (e.g., find the associated word). in subsequent learning tests, learners recalled and recognized generated target words better than read target words. the learning advantage of generated over read information has been replicated, for example, for different learning measures (recognition, cued recall, and free recall, e.g., mcdaniel et al. 1988; slamecka and graf 1978), for target and context words (mcdaniel and waddill 1990), for sentences (graf 1980, 1981), texts (e.g., doctorow et al. 1978; mcdaniel et al. 1986; mcdaniel et al. 2002), and numbers (e.g., gardiner and rowley 1984). it has been shown for immediate and delayed recall (e.g., schweickert et al. 1994; slamecka and fevreiski 1983), for within- vs. between-subject designs (fiedler et al. 1992), and for different generation rules (slamecka and graf 1978, exp. 1 and 2). empirical studies like these suggest that self-generation might be a useful supplement to commonly used learning strategies in education. despite the extensive body of extant research on the generation effect, its general conditions of occurrence still need further clarification. mcdaniel and butler (2010) pointed out that self-generation is not necessarily beneficial for every learner and not appropriate for every type of learning material and criterial task. instead, they assume that complex interactions between the type of generation task, learner characteristics, learning material, and criterial task need to be considered when using self-generation to improve learning (see also mcdaniel and einstein 1989, 2005; einstein et al. 1990).
the central claim of mcdaniel and butler’s contextual framework is that “desirable difficulties are those that stimulate processing that is not redundant with the processing spontaneously engaged by the learner (which […] will depend on learner characteristics, materials, or both) and that matches the demands of the criterial task” (p. 179). in other words, self-generation can improve learning only when the generation task stimulates cognitive processes that go beyond the processes individual learners engage in spontaneously during learning or beyond processes encouraged by specific learning-material characteristics. one widely researched learner characteristic shown to differentially affect the degree to which learners spontaneously engage in cognitive processing is a learner’s need for cognition. hence, need for cognition, in turn, is likely to affect the effectiveness of self-generation during learning.

2. need for cognition and the generation effect

need for cognition (nfc) can be defined as a learner’s individual disposition to engage in effortful cognitive activities and to enjoy thinking and being cognitively challenged (cacioppo and petty 1982; cacioppo et al. 1984; cacioppo et al. 1986). high nfc is associated with thorough processing of arguments and argument quality (cacioppo et al. 1983; cacioppo et al. 1986), with thorough processing of task-relevant information (fleischhauer et al. 2014; reinhard 2010, exp. 1 & 2; reinhard and dickhäuser 2009; verplanken et al. 1992), and thorough processing of learning materials (sadowski and gülgöz 1996). moreover, individuals high in nfc use more efficient learning strategies than learners low in nfc (cazan and indreica 2014), they tend to have better self-control during learning (bertrams and dickhäuser 2009; cazan and indreica 2014), and they are more willing to tackle difficult tasks (see et al. 2009; weißgerber et al. 2018).
based on these findings, it is not surprising that individuals high in nfc recall learned information better than individuals low in nfc (cacioppo et al. 1983; kardash and noel 2000). they are more likely to solve complex problems or tasks (coutinho 2006; coutinho et al. 2005; nair and ramnarayan 2000) and they perform better on learning tests (heijne-penninga et al. 2010; sadowski and gülgöz 1996). consequently, individual differences in nfc are associated with academic achievement (luong et al. 2017). high nfc was found to be associated with course achievements (bertrams and dickhäuser 2009; sadowski and gülgöz 1996), university gpa (grade point average) (grass et al. 2017), course grades mediated by difficulty of learning material (leone and dalton 1988), and performance in exams mediated by self-regulated learning and deep information processing (cazan and indreica 2014). in sum, nfc seems to be directly or indirectly related to learner characteristics relevant for academic success and to different forms of academic performance and achievement measures (for a review see jebb et al. 2016; see also the meta-analyses by richardson et al. 2012 and von stumm and ackerman 2013). learners low in nfc are ‘cognitive misers’ (cacioppo et al. 1986; cacioppo et al. 1996) who usually avoid getting engaged in cognitively demanding activities. consistent with these findings, low nfc learners are found to be less willing to use elaborate learning strategies such as desirable difficulties in self-regulated learning than high nfc learners (weißgerber et al. 2018). in other words, learners low in nfc do not expend more cognitive resources on learning than necessary.
however, if learners low in nfc are explicitly instructed to use elaborate learning strategies such as self-generation, they should benefit more from such strategies than learners high in nfc, because self-generation stimulates cognitive processes that learners low in nfc usually tend not to engage in spontaneously (mcdaniel and butler 2010). learners high in nfc, however, should readily engage in effortful cognitive processing of learning materials even when this is not explicitly required by the task (e.g., when just reading a text). for these learners, self-generation should contribute only weakly to their already elaborate processing. in sum, we assume that self-generation requires learners low in nfc to switch from less demanding shallow processing to elaborate processing, whereas learners high in nfc constantly use more elaborate processing strategies (see e.g., kardash and noel 2000). consequently, self-generation (compared to reading) should improve learning for learners low in nfc more strongly than for learners high in nfc.

2. the present study

using a modified version of the classical word-generation paradigm by slamecka and graf (1978) and mcdaniel et al. (1988), the aim of the present study was to investigate the extent to which the effectiveness of self-generation in learning varies as a function of individual differences in nfc. learners were presented with word pairs consisting of a context word and a target word. half of the presented word pairs consisted of incomplete target words which the learner needed to complete. (1) we aimed to replicate the generation effect, that is, we expected better recall for successfully generated target words than for read target words. (2) we expected to find a more pronounced generation effect for low nfc learners compared to high nfc learners. participants were also randomly assigned to one of two different learning settings.
extant research found that the generation effect is more strongly pronounced in incidental than in intentional learning settings (bertsch et al. 2007). thus, to optimally investigate the expected interaction of learning condition (generation vs. reading) and individual differences in nfc, half of the participants were not informed about the learning test. however, learning in educational contexts is often intentional, for example when teachers and students purposefully use learning strategies to prepare for a learning test or to enhance the students’ learning outcome. thus, demonstrating that the effectiveness of self-generation differs as a function of individual differences in learners’ nfc not only in an incidental but also in an intentional learning setting would be highly relevant for adopting self-generation practices to applied educational contexts. hence, half of the participants were assigned to an intentional learning setting.

3. method

3.1 participants

participants were 121 undergraduates, 19 graduate students, and 3 non-students recruited at the campus of the university of kassel (germany). they came from varying disciplines with only 17 participants being undergraduate (n = 8) or graduate psychology students (n = 9). none of them surmised the exact purpose of our study. of the 143 participants in total (75 female, 68 male), 128 were native speakers of german. the age ranged from 17 to 55 with a mean age of 23.87 (sd = 4.73). all participants provided their written consent and were reimbursed with 5€ for their participation.

3.2 materials and procedure

participants were tested individually or in groups of two to six in a laboratory. tasks and stimuli were presented on notebook computers.

word-generation task. each participant was presented with 36 german word pairs in total consisting of a context word (e.g., kokon/ cocoon) and a semantically associated target word (e.g., raupe/ caterpillar).
each target word belonged to one of six categories: fruit, body parts, clothing, animals, insects, and musical instruments. six target words from each category were presented. half of the word pairs were complete (kokon – raupe), whereas the other half of the word pairs contained a fragmented target word (kokon – r_u_e) that participants were required to complete (varied within subjects). the number of slots indicated the number of missing letters in the generate condition. each word pair was presented in the middle of the notebook screen for a duration of 7 seconds with a 3-second interval between trials. participants were instructed to record the read and generated target words on a sheet of paper. half of them were instructed to memorize the target words for a later test (intentional learning setting, n = 71), and the other half was naive about the test (incidental learning setting, n = 72). each word pair occurred equally often in the generate and the read condition across participants. to ensure a balanced presentation of word pairs in both conditions, the 36 word pairs were divided into four blocks of nine word pairs. two blocks (18 word pairs) were presented in the generate and two blocks in the read condition for each participant. each block was paired equally often with each of the other three blocks in both conditions, which resulted in six stimulus lists. participants were randomly assigned to one of the six lists. presentation order of learning condition (generate-read vs. read-generate) was balanced across participants. word pairs were presented in randomized order within each learning condition. after the presentation of the final word pair, the experimenter collected the sheets of paper with the written target words.

distractor task. after the word-generation task, participants completed a computerized questionnaire on sleeping habits adopted from horne and ostberg (1975), which took participants about 5 minutes to complete.

free and cued recall.
following the distractor task, participants were asked to recall as many target words as possible within 5 minutes (free recall). after the free recall task, context words were presented for an additional 5 minutes in random order, and participants were asked to provide the target word to each context word (cued recall).

need for cognition. participants completed the german 33-item need for cognition scale by bless et al. (1994). they read short statements (e.g., i really enjoy finding new solutions to problems; i prefer my life to be filled with puzzles that i must solve) and answered on a 7-point likert scale ranging from 1 (completely disagree) to 7 (completely agree). internal consistency of the nfc scale was high (cronbach’s α = .88). a mean nfc score was calculated for each participant (m = 4.84; sd = .68; min = 3.09; max = 6.27).

additional measures. for purposes unrelated to the present study, we administered the personal and global belief in a just world scale (dalbert 1999) and the academic self-concept scale (dickhäuser et al. 2002).

control measures. participants in the incidental learning group were asked to indicate whether they had expected a test and prepared for it. in addition, all participants reported on a 7-point likert scale the extent to which they found completing the fragmented target words difficult (ranging from 1 – not difficult at all to 7 – very difficult), and the extent to which they were motivated to identify the fragmented target words in the learning phase and also to recall the target words in the test phase (ranging from 1 – not motivated at all to 7 – highly motivated). given that nfc is an indicator of an individual’s disposition to engage in effortful cognitive activities (e.g., cacioppo and petty 1982), nfc was expected to correlate with learners’ self-reported task-specific motivation but not with self-reported generation difficulty.
finally, participants were asked to report any additional strategies they used (such as grouping target words into semantic categories or rehearsal) to memorize the target words during the learning phase. given extant findings that learners higher in nfc use more efficient learning strategies than learners lower in nfc (cazan and indreica 2014), we assumed that learners higher in nfc would use not only more strategies than learners lower in nfc but also more elaborate learning strategies. sociodemographic data were collected via an additional questionnaire. 4. results control measures. as expected, learners’ nfc correlated significantly with learners’ self-reported motivation for generating target words (r = .24, p = .004) and with their self-reported motivation to recall the target words in the test phase (r = .19, p = .02) but not with self-reported generation difficulty (r = -.14, p = .10). in the intentional learning group, 16 of the 71 learners reported the use of elaborate learning strategies such as grouping target words into categories (fruit, insects, body parts etc.). five learners reported the use of mnemonic strategies such as rehearsal or rereading of their recorded target words, and 50 learners reported not having used specific learning strategies. despite the unannounced test, 10 of the 72 learners in the incidental learning group reported to have noticed the semantic categories of the target words and that they tried to make use of the categories to generate target words, 2 rehearsed or reread the target words once, and 60 used no specific processing strategy. no differences in nfc were found among learners who reported the use of elaborate learning strategies, mnemonic strategies, and no strategies. forty-nine of the 72 incidental learners reported not having expected a test. there was no significant difference in the learners’ nfc between those who did and those who did not expect the unannounced learning test. 
although 23 incidental learners checked the box “yes, i did expect a learning test” in the final questionnaire, 19 of these participants reported in the open answer field on strategy use that they had not used any learning strategy at all (i.e., they did not even try to memorize the word pairs) or that they did not exactly prepare for a learning test, even if they surmised that the words would be important somehow later in the study. we will return to the four remaining incidental learners who had expected and prepared for the unannounced learning test when we report free and cued recall accuracy. accuracy of recording target words. in total, 90.1% of the generated and read target words were recorded correctly in the learning phase. in the read condition, 96.7% were recorded correctly. in the generate condition, 83.5% were generated successfully. no differences in generation accuracy were found between learning settings, and the accuracy did not decrease with decreasing nfc. although learners lower in nfc reported less motivation for generating target words than learners higher in nfc, they made no more errors generating target words than learners higher in nfc. 4.1 free and cued recall accuracy data analysis procedure. we estimated generalized linear mixed models (glmms) with a logit link function (dixon 2008) for free and cued recall accuracy as dependent variables. one word pair from list 3 (0.4% of the data) was excluded from the analysis, because of a technical error in displaying the word pair. the models were estimated and tested with the software packages lme4 (bates et al. 2014) and lmertest for r (kuznetsova et al. 2014). the number of possible iterations of the optimizer was increased to 100,000 to account for the models’ complexity. all significance tests were based on a type i error probability of .05. separate models were estimated for free recall accuracy and cued recall accuracy.
to test for differences in learners’ free and cued recall performance as a function of learning condition, learning setting, and individual differences in nfc, learning condition and learning setting were included as contrast-coded predictor variables (learning condition: -1=read, 1 = generate; learning setting: -1 = incidental, 1 = intentional) and nfc as continuous grand-mean centered predictor variable in the glmms with free and cued recall accuracy as dependent variables. two-way and three-way interaction effects were estimated for all variables. in addition, the intercept and all main and interaction effects were estimated for target words that were recorded correctly in the learning phase (90.1% of the data). from a theoretical perspective, estimating learning outcomes for incorrectly recorded target words, which could not have been learned properly, would be pointless. moreover, from an applied educational perspective, teachers who want to use self-generation to improve student learning must ensure that their students are capable of generating the information from the planned lessons (mcdaniel and butler 2010) and that the critical information can be generated successfully. to this aim, accuracy of recording target words was included as another dummy-coded predictor variable with correctly recorded target words being the reference category (0 = correctly recorded target words, 1 = incorrectly recorded target words). finally, because participants and word pairs were sampled from a larger population, intercepts for persons and word pairs were allowed to vary randomly. descriptive statistics are provided in table 1. the parameter estimates for the fixed and random effects are provided in table 2. in the following sections, we focus on the main and interaction effects of learning condition, learning setting, and nfc for correctly recorded target words only. 
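the predictor coding described above can be sketched in python as follows. this is a minimal illustration rather than the authors’ analysis script (the models were fitted with lme4 in r): the function names are hypothetical, the coefficients passed to `recall_probability` are placeholders and not the estimates in table 2, and the random intercepts and the recording-accuracy dummy are omitted for brevity. only the contrast codes and the grand mean of 4.84 come from the text.

```python
import math

CONDITION_CODES = {"read": -1, "generate": 1}
SETTING_CODES = {"incidental": -1, "intentional": 1}


def code_trial(condition, setting, nfc, nfc_grand_mean):
    """Return contrast-coded predictors and their interaction terms."""
    cond = CONDITION_CODES[condition]
    sett = SETTING_CODES[setting]
    nfc_c = nfc - nfc_grand_mean            # grand-mean centering
    return {
        "condition": cond,
        "setting": sett,
        "nfc": nfc_c,
        "condition:setting": cond * sett,
        "condition:nfc": cond * nfc_c,
        "setting:nfc": sett * nfc_c,
        "condition:setting:nfc": cond * sett * nfc_c,
    }


def recall_probability(intercept, coefs, row):
    """Inverse logit of the fixed-effects linear predictor."""
    eta = intercept + sum(coefs.get(name, 0.0) * value
                          for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Placeholder coefficients for illustration only (not the values in table 2).
example_coefs = {"condition": 0.5, "nfc": 0.2, "condition:nfc": -0.1}
row = code_trial("generate", "intentional", nfc=5.52, nfc_grand_mean=4.84)
p = recall_probability(-1.0, example_coefs, row)
```

with -1/1 contrast codes and a centered covariate, the intercept describes an average learner averaged across conditions and settings, and each main effect is evaluated at the mean of the other predictors, which is what makes the lower-order terms interpretable in the presence of interactions.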
table 1 descriptive statistics for free recall and cued recall for generated and read target words and nfc during incidental and intentional learning (n=143) table 2 fixed effects and variance components in the glmm for free recall accuracy and cued recall accuracy free recall accuracy. the glmm analysis with free recall accuracy as dependent variable revealed a significant main effect of learning condition (β = 0.51, z = 14.34, p < .001) indicating that learners recalled significantly more generated than read target words. moreover, the analysis revealed a significant two-way-interaction of learning setting and nfc (β = 0.13, z = 2.20, p = .03). recall for target words increased significantly with increasing nfc when learning was intentional (β = 0.18, z = 2.45, p = .01) but not when learning was incidental (see figure 1). figure 1. estimated probability for accurately recalling target words (free recall) for incidental and intentional learning: simple slopes for need for cognition and differences between learning settings estimated at three different levels of need for cognition. in sum, we replicated the generation effect (hypothesis 1), although we found no significant differences in the magnitude of the generation effect as a function of individual differences in nfc (hypothesis 2). our findings additionally indicate that learners higher in nfc prepared better for the announced learning test than learners lower in nfc. cued recall accuracy. the glmm analysis with cued recall accuracy as dependent variable revealed significant main effects of learning condition (β = 0.85, z = 19.69, p <.001) and nfc (β = 0.22, z = 2.44, p = .01). both main effects were further qualified by a significant two-way interaction of learning condition and nfc (β = -0.09, z = -2.03, p = .04). 
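to give the learning-condition coefficients reported above a more tangible scale, the following python sketch converts them into odds ratios. because learning condition is contrast-coded (-1 = read, 1 = generate), the generate and read conditions differ by two units, so the log-odds difference is 2β. the function name is hypothetical, and the interpretation assumes the other predictors are held at zero (nfc at the grand mean, averaged across learning settings); the published coefficients are rounded, so the results are approximate.

```python
import math


def condition_odds_ratio(beta):
    """Odds ratio (generate vs. read) for a -1/1 contrast-coded predictor."""
    return math.exp(2 * beta)


# Coefficients for learning condition as reported in the text.
reported = {"free recall": 0.51, "cued recall": 0.85}
odds_ratios = {task: condition_odds_ratio(b) for task, b in reported.items()}

for task, ratio in odds_ratios.items():
    print(f"{task}: odds ratio of about {ratio:.2f}")
```

under these assumptions the odds of recalling a generated target word are roughly 2.8 times (free recall) and 5.5 times (cued recall) the odds of recalling a read target word.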
although all learners recalled generated target words significantly better than read target words (generation effect), the simple effect of learning condition was more strongly pronounced for learners lower in nfc (nfc minus 1 sd: β = 0.93, z = 15.11, p < .001) than for learners higher in nfc (nfc plus 1 sd: β = 0.76, z = 12.61, p < .001). in other words, learners lower in nfc benefited more from generating target words than learners higher in nfc (see figure 2). moreover, the simple slope for nfc was significant in the read condition (β = 0.31, z = 3.25, p = .001) but not in the generate condition. that is, improved test performance with increasing nfc was found only in the read condition, whereas individual differences in nfc did not affect cued recall accuracy in the generate condition (figure 2). figure 2. estimated probability for accurately recalling generated and read target words (cued recall): simple slopes for need for cognition and differences between learning conditions estimated at three different levels of need for cognition. in sum, these findings are consistent with hypotheses 1 and 2. we replicated the generation effect and learners lower in nfc benefited more from the generation effect than learners higher in nfc. moreover, these findings revealed no differences between learning settings. finally, to test if our results were driven or distorted by test expectancy in the incidental learning setting, the four learners who reported having expected and prepared for a learning test by memorizing the presented word pairs were treated as intentional learners in additional analyses of free and cued recall accuracy (intentional learning setting: n = 75, incidental learning setting: n = 68). this did not change the results in terms of levels of significance. 5. discussion the aim of the present study was to investigate the moderating effect of individual differences in learners’ nfc on the generation effect.
we expected (1) to replicate the generation effect and (2) to find a more strongly pronounced generation effect for learners lower in nfc than for learners higher in nfc. the results of the present study supported hypothesis 1 and corroborated hypothesis 2 for cued recall accuracy as dependent variable. learners recalled more generated target words than read target words with both free and cued recall accuracy as dependent variables. these results are consistent with extant empirical research on the beneficial effects of generation (for an overview, see bertsch et al. 2007). however, our study extends the existing literature on the generation effect, because our findings showed that individual differences in learners’ nfc moderate the magnitude of the generation effect. as expected, the analysis of cued recall accuracy revealed that the generation effect was more strongly pronounced for learners lower in nfc than for learners higher in nfc. that is, learners lower in nfc benefited significantly more from generating target words than learners higher in nfc. this finding is consistent with the idea that desirable difficulties such as self-generation are beneficial when they stimulate cognitive processes that learners tend not to engage in spontaneously (e.g., see mcdaniel and butler 2010). our finding that learners lower in nfc recalled fewer read target words than learners higher in nfc indicates that learners lower in nfc are cognitive misers who processed the read word pairs more shallowly than learners higher in nfc. however, the same learners recalled generated target words as accurately as the learners higher in nfc. we assume that especially the learners lower in nfc benefited from the generation task (as indicated by a more strongly pronounced generation effect compared to learners higher in nfc), because self-generation stimulated more elaborate cognitive processing of the word pairs.
in other words, self-generation required them to switch from shallow cognitive processing to more elaborate processing (kardash and noel 2000). however, with increasing nfc, learners increasingly engaged in elaborate cognitive processing of the learning material (even without explicit instruction) as indicated by increasingly improved recall of read target words. for these learners, self-generation becomes increasingly redundant to the extent that they already show elaborate processing independent of the specific task. consequently, learners higher in nfc benefit less from the generation task than learners lower in nfc. it is noteworthy that in the generate condition, there was no effect of nfc. learners lower in nfc recalled as many target words as learners higher in nfc. in other words, self-generation helped the learners lower in nfc to close the learning gap on learners higher in nfc. the finding that cued recall accuracy for read target words improved with increasing nfc is consistent with extant empirical findings showing that learners high in nfc recall information better than learners low in nfc (e.g., cacioppo et al. 1983; heijne-penninga et al. 2010; sadowski and gülgöz 1996). however, our findings suggest that this disadvantage of learners lower in nfc can be balanced by using generative tasks that stimulate elaborate cognitive processing. in contrast to cued recall, hypothesis 2 was not supported by the free recall data. although we replicated the generation effect for free recall accuracy as dependent variable, individual differences in nfc were not found to moderate the magnitude of the generation effect. plausible explanations are the different task requirements of free and cued recall and how they match the kind of processing in the learning phase (e.g., see the contextual framework, mcdaniel and butler 2010 or transfer-appropriate processing, morris et al. 1977).
to successfully generate a target word in the learning phase, learners were required to establish a mental link between the context word and the target word. this mental link could then be used as a scaffold to retrieve the generated target words from memory when context words were provided as cues in the cued recall task. in free recall, however, the mental links established during the generation task could not serve as scaffolds for target word retrieval, because no context words were provided (note, in this context, that the elaborate processing of the target word still improves learning in the generate compared to the read condition). this idea is supported by the finding that learners recalled about twice as many target words in cued recall compared to free recall (see descriptive statistics in table 1). note that learners at all nfc levels should have established elaborate mental links between a context word and the target word in the generate condition (because it was required by the task). in contrast, only learners high in nfc should have established such links between context and target words in the read condition. this interaction between learning condition and nfc, however, can only be seen when the criterial task draws upon these established mental links, that is, in cued but not in free recall. the idea of self-generation as scaffold to enhance memory for the target word by constructing a mental bridge between context and target word might suggest that self-generation is a kind of epistemic action (“an external [i.e., not solely mental] action that an agent performs to change his or her own computational state” in contrast to pragmatic actions “whose primary function is to bring the agent closer to his or her physical goal”, kirsh and maglio 1994, pp. 514–515; see also kirsh 2006).
from this perspective, one could argue that self-generation is an external action that alters the environment (here the learning material) and, thereby, adds to problem solving (here target-word memory). as was demonstrated for epistemic actions (kirsh and maglio 1994; maglio and kirsh 1996), it is only in hindsight that the benefit of the additional and putatively unnecessary generation task becomes evident. in contrast to epistemic actions, however, desirable difficulties do not reduce working memory load, the number of cognitive steps involved in processing, or the probability of processing errors (see kirsh and maglio 1994). instead, desirable difficulties are characterized by increasing cognitive effort in a way that is beneficial to learning. they are, by definition, no reduction of complexity. in this way, self-generation is clearly distinct from epistemic actions. the participants in our study were randomly assigned to one of two learning settings – an incidental and an intentional learning setting. learning in educational contexts is often intentional, for example when teachers and students purposefully use learning strategies to prepare for a test or to enhance the students’ learning outcome. hence, demonstrating that the two-way interaction of learning condition and nfc shows in an intentional learning setting would further corroborate the practical relevance of our findings. as expected, the finding that the generation effect was more strongly pronounced for learners lower in nfc than for learners higher in nfc (cued recall) did not differ between learning settings. neither the main effect of learning setting nor the interaction effects of learning setting with learning condition and nfc became significant. this result suggests that the compensatory effect of self-generation on target word memory of learners lower in nfc occurs independently of the learning setting in cued recall.
in free recall, we found that learners higher in nfc recalled more target words than learners lower in nfc when learning was intentional. this finding suggests that learners higher in nfc voluntarily invested more cognitive resources in preparing for a test than learners lower in nfc (even when test performance had no actual consequences for their studies). this interpretation is consistent with extant studies demonstrating that learners higher in nfc are more willing to tackle difficult tasks than learners lower in nfc (see et al. 2009; weißgerber et al. 2018). we assume that (in addition to establishing mental links between context and target words) they might have tried to explicitly memorize the target words to be prepared for later recall. since free recall (in contrast to cued recall) assesses context-free retrieval of target words, deeper processing of target words in the learning phase led to increased recall accuracy for learners higher in nfc independent of learning condition. in sum, the different findings for free and cued recall obtained in our study can be explained by different task requirements of both criterial tasks and how each of them matched the kind of processing in the learning phase. finally, extant studies showed that learners higher in nfc use more efficient learning strategies than learners lower in nfc (cazan and indreica 2014). hence, we assumed that learners higher in nfc would use more elaborate learning strategies than learners lower in nfc. however, participants’ self-reported use of elaborate, less elaborate, and no additional learning strategies did not vary as a function of individual differences in nfc. a likely explanation for this finding is that 7 seconds of word-pair presentation and 3 seconds of inter-stimulus interval are too short a time for most participants to properly apply additional learning strategies, let alone elaborate ones.
this might be different for more complex learning material such as texts or algebraic word problems and remains to be investigated in future research. the findings reported in this study should be interpreted with possible limitations in mind. in everyday life, learners usually deal with learning material that is much more complex than isolated word pairs. moreover, most of the time, learners are unaware of the kind of criterial task for which to prepare, and when preparing for a test or exam, retention intervals are usually longer (several days or weeks) than just a few minutes as in most laboratory studies on the generation effect. despite these limitations, the findings of the present study have important theoretical and practical implications. the finding that learners recalled three times as many target words in the generate condition as they recalled in the read condition (see descriptive statistics, table 1) strongly suggests that self-generation might be a useful supplement to commonly used learning strategies in education. the findings of the present study, however, also indicate that educators should be prepared to find individual differences in the effectiveness of generative activities depending on learners’ characteristics. we demonstrated for the first time that the generation effect differs as a function of individual differences in nfc. for those high in nfc, self-generation contributes comparatively little to learning. it can, however, be highly beneficial for learners low in nfc (both in incidental and intentional learning settings). this suggests that self-generation can be used to systematically improve learning for those who are likely to fall behind their peers due to low engagement in effortful cognitive processing.
since nfc is easily and quickly assessed in single learning settings as well as in classrooms, learners low in nfc and, thus, in particular need of cognitively demanding learning instructions can (and should) be effortlessly identified. another important practical implication of our study is that the compensatory effect of self-generation becomes visible only when the generation task matches the requirements of the criterial task. when adopting self-generation as a learning strategy in educational contexts (e.g., school classes, educational books, or computerized learning environments), teachers, authors, and programmers should ensure that the generation task encourages cognitive processes relevant to the test, exam, or task for which the learners prepare (mcdaniel and butler 2010). the results of the present study raise some interesting future research questions. first, future research needs to replicate and extend the reported findings with more complex and naturalistic learning materials (e.g., math problems or expository texts), in more naturalistic settings (such as classrooms or in collaborative action learning), with different types of generation and criterial tasks and longer retention intervals. moreover, when using more complex learning material such as texts, other learner characteristics such as working memory, reading ability, creativity, learning goals, and openness to ideas should be considered alongside nfc to account for variance that these components might share with nfc and to account for possible moderating effects to further optimize the use of self-generation in everyday learning settings. finally, the interaction of self-generation and nfc raises not only the question of which further learner characteristics might affect the effectiveness of self-generation, but also of which other desirable difficulties might be affected by individual differences in nfc. a possible candidate to look at might be the testing effect.
access to and alteration of knowledge structures during learning, relearning, and retesting might be differential for learners high in nfc and those low in nfc. 6. conclusion the present study replicated the generation effect with a version of the classical word-generation paradigm by slamecka and graf (1978) and mcdaniel et al. (1988). learners recalled generated target words better than read target words. moreover, our study demonstrated for the first time that individual differences in learners’ nfc moderate the effectiveness of self-generation during learning. learners lower in nfc benefited significantly more from generating target words than learners higher in nfc when retrieval cues were provided in the test phase. this finding corroborates mcdaniel and butler’s (2010) explanation that desirable difficulties such as self-generation are only beneficial when they stimulate cognitive processes that learners tend not to engage in spontaneously. we assume that learners higher in nfc voluntarily engage in elaborate cognitive information processing even without explicit instruction, whereas learners lower in nfc need a cognitively demanding task that requires them to switch from shallow to elaborate cognitive processing to improve learning. the reported findings suggest that using self-generation in educational contexts is beneficial for learners at all levels of nfc, but it could be systematically used to improve learning for learners with a weak disposition to engage in cognitively demanding learning processes. keypoints desirable difficulty generation effect incidental learning intentional learning need for cognition acknowledgments the research presented in this article was supported by the federal state of hessen and its loewe research initiative desirable difficulties in learning (loewe: landes-offensive zur entwicklung wissenschaftlich-ökonomischer exzellenz [state offensive for the development of scientific and economic excellence]). 
we would like to thank our student assistants for assisting in data collection and coding. researchers who are interested in the stimulus material are invited to send an e-mail to the first or the second author. references bates, d., maechler, m., bolker, b., walker, s., christensen, r. h. b., & singmann, h. (2014). lme4: linear mixed-effects models using eigen and s4 [software]. r-package version 1.1-6. retrieved may 1, 2014 from: http://cran.r-project.org/package=lme4 bertrams, a., & dickhäuser, o. (2009). high-school students' need for cognition, self-control capacity, and school achievement: testing a mediation hypothesis. learning and individual differences, 19(1), 135–138. doi:10.1016/j.lindif.2008.06.005 bertsch, s., pesta, b. j., wiscott, r., & mcdaniel, m. a. (2007). the generation effect: a meta-analytic review. memory & cognition, 35(2), 201–210. doi:10.3758/bf03193441 bjork, r. a. (1994). memory and metamemory considerations in the training of human beings. in j. metcalfe & a. p. shimamura (eds.), metacognition: knowing about knowing (pp. 185–205). cambridge: mit press. bjork, e. l., & bjork, r. a. (2011). making things hard on yourself, but in a good way: creating desirable difficulties to enhance learning. in m. a. gernsbacher, r. w. pew, l. m. hough, & j. r. pomerantz (eds.), psychology and the real world: essays illustrating fundamental contributions to society (pp. 56–64). new york: worth publishers. bless, h., wänke, m., bohner, g., fellhauer, r. f., & schwarz, n. (1994). need for cognition: eine skala zur erfassung von engagement und freude bei denkaufgaben [need for cognition: a scale measuring engagement and happiness in cognitive tasks]. zeitschrift für sozialpsychologie, 25, 147–154. cacioppo, j. t., & petty, r. e. (1982). the need for cognition. journal of personality and social psychology, 42(1), 116–131. doi:10.1037/0022-3514.42.1.116 cacioppo, j. t., petty, r. e., & kao, c. f. (1984). the efficient assessment of need for cognition.
journal of personality assessment, 48(3), 306–307. doi:10.1207/s15327752jpa4803_13 cacioppo, j. t., petty, r. e., kao, c. f., & rodriguez, r. (1986). central and peripheral routes to persuasion: an individual difference perspective. journal of personality and social psychology, 51(5), 1032–1043. doi:10.1037/0022-3514.51.5.1032 cacioppo, j. t., petty, r. e., & morris, k. j. (1983). effects of need for cognition on message evaluation, recall, and persuasion. journal of personality and social psychology, 45(4), 805–818. doi:10.1037/0022-3514.45.4.805 cazan, a.-m., & indreica, s. e. (2014). need for cognition and approaches to learning among university students. procedia social and behavioral sciences, 127, 134–138. doi:10.1016/j.sbspro.2014.03.227 cepeda, n. j., pashler, h., vul, e., wixted, j. t., & rohrer, d. (2006). distributed practice in verbal recall tasks: a review and quantitative synthesis. psychological bulletin, 132(3), 354–380. doi:10.1037/0033-2909.132.3.354 coutinho, s. a. (2006). the relationship between the need for cognition, metacognition, and intellectual task performance. educational research and reviews, 1(5), 162–164. coutinho, s. a., wiemer-hastings, k., skowronski, j. j., & britt, m. a. (2005). metacognition, need for cognition and use of explanations during ongoing learning and problem solving. learning and individual differences, 15(4), 321–337. doi:10.1016/j.lindif.2005.06.001 dalbert, c. (1999). the world is more just for me than generally: about the personal belief in a just world scale’s validity. social justice research, 12(2), 79–98. doi:10.1023/a:1022091609047 dickhäuser, o., schöne, c., spinath, b., & stiensmeier-pelster, j. (2002). die skalen zum akademischen selbstkonzept: konstruktion und überprüfung eines neuen instrumentes [the academic self-concept scales: construction and evaluation of a new instrument]. zeitschrift für differentielle und diagnostische psychologie, 23(4), 393–405. doi:10.1024//0170-1789.23.4.393 dixon, p. (2008).
models of accuracy in repeated-measures designs. journal of memory and language, 59(4), 447–456. doi:10.1016/j.jml.2007.11.004 doctorow, m., wittrock, m. c., & marks, c. (1978). generative processes in reading comprehension. journal of educational psychology, 70(2), 109–118. doi:10.1037/0022-0663.70.2.109 dunlosky, j., rawson, k. a., marsh, e. j., nathan, m. j., & willingham, d. t. (2013). improving students’ learning with effective learning techniques: promising directions from cognitive and educational psychology. psychological science in the public interest, 14(1), 4–58. doi:10.1177/1529100612453266 einstein, g. o., mcdaniel, m. a., owen, p. d., & coté, n. c. (1990). encoding and recall of texts: the importance of material appropriate processing. journal of memory and language, 29(5), 566–581. doi:10.1016/0749-596x(90)90052-2 fiedler, k., lachnit, h., fay, d., & krug, c. (1992). mobilization of cognitive resources and the generation effect. the quarterly journal of experimental psychology, section a, 45(1), 149–171. doi:10.1080/14640749208401320 fleischhauer, m., miller, r., enge, s., & albrecht, t. (2014). need for cognition relates to low-level visual performance in a metacontrast masking paradigm. journal of research in personality, 48, 45–50. doi:10.1016/j.jrp.2013.09.007 gardiner, j. m., & rowley, j. m. c. (1984). a generation effect with numbers rather than words. memory & cognition, 12(5), 443–445. doi:10.3758/bf03198305 graf, p. (1980). two consequences of generating: increased inter and intraword organization of sentences. journal of verbal learning and verbal behavior, 19(3), 316–327. doi:10.1016/s0022-5371(80)90248-0 graf, p. (1981). reading and generating normal and transformed sentences. canadian journal of psychology/revue canadienne de psychologie, 35(4), 293–308. doi:10.1037/h0081193 grass, j., strobel, a., & strobel, a. (2017). cognitive investments in academic success: the role of need for cognition at university.
frontiers in psychology, 8, 790. doi:10.3389/fpsyg.2017.00790 heijne-penninga, m., kuks, j. b. m., hofman, w. h. a., & cohen-schotanus, j. (2010). influences of deep learning, need for cognition and preparation time on open- and closed-book test performance. medical education, 44(9), 884–891. doi:10.1111/j.1365-2923.2010.03732.x horne, j. a., & ostberg, o. (1976). a self-assessment questionnaire to determine morningness-eveningness in human circadian rhythms. international journal of chronobiology, 4, 97–110. jebb, a. t., saef, r., parrigon, s., & woo, s. e. (2016). the need for cognition: key concepts, assessment, and role in educational outcomes. in a. lipnevich, f. preckel, & r. d. roberts (eds.), psychosocial skills and school systems in the twenty-first century: theory, research, and applications. the springer series on human exceptionality (pp. 115–132). new york: springer. doi:10.1007/978-3-319-28606-8_5 kardash, c. a. m., & noel, l. k. (2000). how organizational signals, need for cognition, and verbal ability affect text recall and recognition. contemporary educational psychology, 25(3), 317–331. doi:10.1006/ceps.1999.1011 kirsh, d. (2006). distributed cognition. a methodological note. pragmatics and cognition, 14(2), 249–262. doi:10.1075/pc.14.2.06kir kirsh, d., & maglio, p. (1994). on distinguishing epistemic from pragmatic action. cognitive science, 18, 513–549. doi:10.1016/0364-0213(94)90007-8 kuznetsova, a., brockhoff, p. b., & christensen, r. h. b. (2014). lmertest: tests for random and fixed effects for linear mixed effect models (lmer objects of lme4 package). r-package version 2.0-6. retrieved in june 2014 from: http://cran.r-project.org/web/packages/lmertest/index.html leone, c., & dalton, c. h. (1988). some effects of the need for cognition on course grades. perceptual and motor skills, 67(1), 175–178. doi:10.2466/pms.1988.67.1.175 luong, c., strobel, a., wollschläger, r., greiff, s., vainikainen, m.-p., & preckel, f. (2017).
need for cognition in children and adolescents: behavioral correlates and relations to academic achievement and potential. learning and individual differences, 53, 103–113. doi:10.1016/j.lindif.2016.10.019 maglio, p. p. & kirsh, d. (1996). epistemic action increases with skill. in g. w. cottrell (ed.), proceedings of the eighteenth annual conference of the cognitive science society (pp. 391–396). mahwah, nj: erlbaum. mcdaniel, m. a., & butler, a. c. (2010). a contextual framework for understanding when difficulties are desirable. in a. s. benjamin (ed.), successful remembering and successful forgetting: a festschrift in honor of robert a. bjork (pp. 175-198). new york: taylor and francis. doi:10.4324/9780203842539 mcdaniel, m. a., & einstein, g. o. (1989). material-appropriate processing: a contextualist approach to reading and studying strategies. educational psychology review, 1(2), 113–145. doi:10.1007/bf01326639 mcdaniel, m. a., & einstein, g. o. (2005). material appropriate difficulty: a framework for determining when difficulty is desirable for improving learning. in a. f. healy (ed.), experimental cognitive psychology and its applications (pp. 73–85). washington, dc: american psychological association. doi:10.1037/10895-006 mcdaniel, m. a., einstein, g. o., dunay, p. k., & cobb, r. e. (1986). encoding difficulty and memory: toward a unifying theory. journal of memory and language, 25(6), 645–656. doi:10.1016/0749-596x(86)90041-0 mcdaniel, m. a., hines, r. j., & guynn, m. j. (2002). when text difficulty benefits less-skilled readers. journal of memory and language, 46(3), 544–561. doi:10.1006/jmla.2001.2819 mcdaniel, m. a., & waddill, p. j. (1990). generation effects for context words: implications for item-specific and multifactor theories. journal of memory and language, 29(2), 201–211. doi:10.1016/0749-596x(90)90072-8 mcdaniel, m. a., waddill, p. j., & einstein, g. o. (1988). a contextual account of the generation effect: a three-factor theory. 
journal of memory and language, 27(5), 521–536. doi:10.1016/0749-596x(88)90023-x morris, c. d., bransford, j. d., & franks, j. j. (1977). levels of processing versus transfer appropriate processing. journal of verbal learning and verbal behavior, 16(5), 519–533. doi:10.1016/s0022-5371(77)80016-9 nair, k. u., & ramnarayan, s. (2000). individual differences in need for cognition and complex problem solving. journal of research in personality, 34(3), 305–328. doi:10.1006/jrpe.1999.2274 reinhard, m.-a (2010). need for cognition and the process of lie detection. journal of experimental social psychology, 46(6), 961–971. doi:10.1016/j.jesp.2010.06.002 reinhard, m.-a., & dickhäuser, o. (2009). need for cognition, task difficulty, and the formation of performance expectancies. journal of personality and social psychology, 96(5), 1062–1076. doi:10.1037/a0014927 richardson, m., abraham, c., & bond, r. (2012). psychological correlates of university students’ academic performance: a systematic review and meta-analysis. psychological bulletin, 138(2), 353–387. doi:10.1037/a0026838 roediger, h. l., & karpicke, j. d. (2006). test-enhanced learning: taking memory tests improves long-term retention. psychological science, 17(3), 249–255. doi:10.1111/j.1467-9280.2006.01693.x sadowski, c. j., & gülgöz, s. (1996). elaborative processing mediates the relationship between need for cognition and academic performance. the journal of psychology, 130(3), 303–307. doi:10.1080/00223980.1996.9915011 schweickert, r., mcdaniel, m. a., & riegler, g. (1994). effects of generation on immediate memory span and delayed unexpected free recall.the quarterly journal of experimental psychology, section a, 47(3), 781–804. doi:10.1080/14640749408401137 see, y. h. m., petty, r. e., & evans, l. m. (2009). the impact of perceived message complexity and need for cognition on information processing and attitudes. journal of research in personality, 43(5), 880–889. doi:10.1016/j.jrp.2009.04.006 slamecka, n. 
j., & fevreiski, j. (1983). the generation effect when generation fails. journal of verbal learning and verbal behavior, 22(2), 153–163. doi:10.1016/s0022-5371(83)90112-3 slamecka, n. j., & graf, p. (1978). the generation effect: delineation of a phenomenon.journal of experimental psychology: human learning and memory, 4(6), 592–604. doi:10.1037/0278-7393.4.6.592 verplanken, b., hazenberg, p. t., & palenéwen, g. r. (1992). need for cognition and external information search effort. journal of research in personality, 26(2), 128–136. doi:10.1016/0092-6566(92)90049-a von stumm, s. & ackerman, p. l. (2013). investment and intellect: a review and meta-analysis. psychological bulletin, 139(4), 841–869. doi:10.1037/a0030746 weißgerber, c. s., reinhard, m.-a., & schindler, s. (2018). learning the hard way: need for cognition influences attitudes towards and self-reported use of desirable learning difficulties. educational psychology, 38(2), 176–202. doi:10.1080/01443410.2017.1387644 microsoft word catrysse et al_publication.docx frontline learning research vol.4 no. 1 (2016) 1-16 issn 2295-3159 mapping processing strategies in learning from expository text: an exploratory eye tracking study followed by a cued recall catrysse leena1, gijbels davida, donche vincenta, de maeyer svena, van den bossche pieta, gommers lucib a university of antwerp, belgium b university of st. gallen, switzerland article received 22 july / revised 12 december / accepted 12 december / available online 27 january abstract this study starts from the observation that current empirical research on students’ processing strategies in higher education has mainly focused on the use of self-report instruments to measure students’ general preferences towards processing strategies. in contrast, there is a rather limited use of more direct and online observation techniques to uncover differences in processing strategies at a task specific level. 
we based our study on one of the most influential studies in the domain of students’ approaches to learning (sal) (marton, dahlgren, säljö, & svensson, 1975). in our exploratory experiment we used eye tracking followed by a cued recall to investigate how students use processing strategies in learning from expository text. nineteen university students participated in the experiment. results suggested that students in the deep condition did not look longer at the essentials in the text compared with students in the surface condition, but that they processed them more deeply. in our sample, students in the surface condition looked longer at facts and details and also reported repeating these facts and details more often. we suggest that the combination of eye tracking followed by a cued recall is a promising tool to investigate students’ processing strategies since not all differences in processing strategies are reflected in overt eye movement behaviour. the current methodology allows researchers in the domain of sal to complement and extend the present knowledge base that has accumulated through years of research with self-report questionnaires and interviews on students’ general preferences towards processing strategies.

keywords: processing strategies; expository text; eye tracking; cued recall; higher education

1 corresponding author: catrysse leen, faculty of social sciences, department of training and education sciences, research group edubron. gratiekapelstraat 10, 2000 antwerpen, belgium. e-mail: leen.catrysse@uantwerpen.be doi: http://dx.doi.org/10.14786/flr.v4i1.192

catrysse et al 2 | f l r

1. introduction

learning from text is one of the most essential skills in our modern society and the ability to understand challenging texts is an important key to success in education and beyond (mason, tornatora, & pluchino, 2013; mcnamara, 2004; moss, schunn, schneider, mcnamara, & vanlehn, 2011).
one of the research traditions interested in how students learn from text is the domain of student approaches to learning (sal) (gijbels, donche, richardson, & vermunt, 2014; lonka, olkinuora, & mäkinen, 2004; richardson, 2000). research in the sal domain is founded on the seminal studies by marton and his colleagues in the 1970s in sweden (marton et al., 1975). they investigated how students went about reading academic texts in experimental situations by conducting retrospective interviews (marton et al., 1975; richardson, 2000). a distinction was made between deep processing strategies and surface processing strategies, which has been influential in the later development of self-report questionnaires to quantify individual differences in students’ processing strategies (biggs, 1987; entwistle & mccune, 2004). until now, empirical studies in the sal field have mainly focused on the use of self-report instruments such as interviews and questionnaires to uncover differences in students’ general preferences towards processing strategies. although these offline measures are claimed to be reliable and valid at this general level, many authors argue that the results are poor indicators of the actual processing at a task-specific level (perry & winne, 2006; samuelstuen & braten, 2007; veenman, 2005; veenman, bavelaar, de wolf, & van haaren, 2014). recently, there has been a plea for the use of more direct and online measurement tools when it comes to describing students’ processing strategies (richardson, 2013). in the present study we will therefore use eye tracking, followed by a cued recall, to map individual differences in cognitive processing. eye tracking provides a unique opportunity to study processing strategies at a level of detail that no other measures can provide (lai et al., 2013; van gog & jarodzka, 2013).
in what follows we will describe how different processing strategies can be manipulated in experimental designs by the assessment demands, and how eye tracking followed by a cued recall can be useful to investigate differences in processing strategies.

2. uncovering differences in processing strategies

processing strategies refer to cognitive activities a student applies whilst studying (vermunt & vermetten, 2004). in general, two main types of processing strategies are described in the literature, namely deep and surface processing strategies (gijbels et al., 2014). research in the sal domain showed that deep processors try to comprehend what the author wants to say about a certain topic, try to understand the overall meaning of the text, try to relate the message to a wider context and to prior knowledge, identify the main ideas and adopt a critical angle to the conclusion. in contrast, surface processors direct their attention towards learning the text itself, focus more on specific comparisons, focus on the parts of the text in sequence, memorize details and definitions, remember introductory sentences and list points (biggs & tang, 2007; entwistle & ramsden, 1982; marton et al., 1975; richardson, 2000).

2.1. processing strategies and task demands

in the 1960s, rothkopf (1966) introduced the concept of mathemagenic activities, which refers to activities that stimulate students to actively engage in learning. the use of adjunct questions in written texts is one example of these mathemagenic activities. one possible type of adjunct question is the inserted post question, which is placed within the text and follows the text passage containing the information needed. these questions result in a change in the processing strategy on subsequent text passages. they steer students’ attention to a specific type of information in the text (hamaker, 1986; rothkopf, 1966).
similarly, researchers in the sal domain agree that one of the most salient contextual variables to influence processing strategies is the assessment method (baeten, kyndt, struyven, & dochy, 2010; gielen, dochy, & dierick, 2003; marton et al., 1975; scouller, 1998; scouller & prosser, 1994; segers, nijhuis, & gijselaers, 2006). research showed that how students learn is influenced by their initial preference for a processing strategy (baeten et al., 2010), but they can shift between deep and surface processing strategies according to the assessment demands, also known as the backwash effect of assessment (baeten et al., 2010; gielen et al., 2003; segers et al., 2006). in contrast to adjunct question research (hamaker, 1986; rothkopf, 1966), research in the sal domain evaluated the effect of the assessment method at the end of a text or study process, without inserting questions in the text or interrupting the study process. in the experiments of marton et al. (1975), students were asked to read three texts and to prepare for answering some questions on the content after reading them. the questions they received after the first two texts were the only indication of how to behave while reading the third text. students in the deep condition received questions at a deep level (e.g., making a summary statement), while students in the surface condition received reproduction-oriented questions. after studying the third text, a semi-structured interview was conducted to gather data on the effect of the experimental manipulation on the levels of processing. the results of the interviews suggested that students tended to adopt the intended level of processing (marton et al., 1975; richardson, 2000). this was the first study in the sal domain to confirm that students’ levels of processing can be manipulated by appropriate questions or prompts.
it shows that the level of processing depends on the expected form of assessment (richardson, 2000). another study by scouller and prosser (1994) suggested that the assessment method influences processing strategies. their research showed that multiple-choice questions led to more surface processing strategies. scouller (1998) also investigated how students perceived two assessment methods, namely a multiple-choice examination and an assignment essay, and which processing strategies they used. the findings were in line with scouller and prosser (1994): the multiple-choice examination was perceived as assessing lower levels of intellectual abilities and students indicated that they engaged in more surface processing strategies. an assignment essay was perceived as testing higher-level intellectual abilities and students engaged in more deep processing strategies. finally, a study by segers et al. (2006) showed that students who perceive the demands of assessment on a deep level, that is, as requiring them to demonstrate a thorough understanding and integration of knowledge, are more likely to employ deep processing strategies. in contrast, students who perceive the demands of assessment on a surface level, that is, as requiring passive acquisition and reproduction of details, are expected to employ more surface processing strategies such as rote learning and concentrating on facts and details.

2.2. processing strategies and eye tracking

online measures to map cognitive processing strategies include the think aloud method, observation of behaviour and eye movement measurement (schellings, 2011; veenman, 2011). the think aloud method provides a rich source of data, but it is intrusive and can alter the processing itself (ericsson & simon, 1993; veenman, 2005). the main limitation of the observation of behaviour is that it cannot detect covert cognitive processes (veenman, 2005).
according to hyönä and lorch (2004), eye tracking is an attractive method for studying cognitive processing strategies in comparison with other online measures because eye tracking collects several indices of processing simultaneously and does not disrupt normal processing. there are two theoretical assumptions that make the relation between eye movement and cognitive processing clear: the immediacy assumption and the eye-mind assumption (just & carpenter, 1980). the immediacy assumption states that information processing is not postponed and takes place when the information is encountered. the eye-mind assumption states that eye movements are closely linked to the focus of attention as students process the information in the text. therefore, eye movements can be used to trace cognitive processing when learning from text (hyönä, lorch, & rinck, 2003; just & carpenter, 1980). in eye tracking research, the movement of the eyeball is recorded and these movements are related to a stimulus. this allows us to investigate to what parts of the text a student allocates visual attention and for how long (holmqvist et al., 2011; van gog & jarodzka, 2013). a distinction is made between two main measures, namely fixations and saccades. during fixations the eye is almost completely still and information can be extracted from the text. in contrast, during saccades the focus of visual attention is moved to another location and the eye is rapidly moving between fixations; as a result, students are not able to extract information from text during saccades (holmqvist et al., 2011; lai et al., 2013; van gog & jarodzka, 2013). although eye tracking methodology seems a promising tool to investigate students’ processing strategies, we could not find studies that examine eye movement behaviour that results from using different cognitive processing strategies such as deep and surface processing.
in a related research field, namely research in reading comprehension, the eye tracking methodology has already been adopted (hyönä, lorch, & kaakinen, 2002; ponce & mayer, 2014; rayner, 1998). more specifically, the perspective driven text comprehension framework states that the allocation of visual attention is influenced by the reading perspective and this reading perspective shapes the cognitive processing in learning from text (kaakinen & hyönä, 2005, 2007, 2010; kaakinen, hyönä, & keenan, 2002). a reading perspective refers to the mental frame from which the reader approaches a text and this perspective makes parts of the text more important to the reader than others (hyönä et al., 2003; kaakinen & hyönä, 2007). kaakinen and hyönä (2007) gave the example that when you read a travel guide in order to find information about a specific country (e.g., finland), you will approach the text with a specific reading perspective. this reading perspective is thus content related. in contrast, processing strategies correspond to the different aspects of the learning material on which the learner focuses (richardson, 2000). so students with different processing strategies focus on the same content but search for other types of information (e.g., facts and details vs. essences) (schellings, van hout-wolters, & vermunt, 1996). research that investigates the influence of reading perspective on eye movements showed that more time is spent on relevant words or facts in the text than on irrelevant words or facts (kaakinen & hyönä, 2007; kaakinen et al., 2002). next to that, relevant words attracted more refixations than irrelevant words (kaakinen & hyönä, 2007). research by kaakinen and hyönä (2005) indicated that the extra time spent on relevant information is used to rehearse this information in order to encode it to memory.
particularly relevant for research on learning from text is that these refixations reflect purposeful and effortful strategic eye behaviour (ariasi & mason, 2010). eye tracking is an interesting method to investigate cognitive processes, but to reduce the amount of inferences required by the researcher, eye movement data should be combined with other data such as verbal reports (hyönä, 2010; van gog & jarodzka, 2013). recent studies have already applied the think aloud method to obtain verbal reports on students’ processing strategies during reading and learning from text (dinsmore & alexander, 2012, 2015). concurrent reporting while learning from text can affect the eye movement patterns, and therefore cued retrospective reporting offers a valuable alternative in combination with eye tracking (van gog & jarodzka, 2013). besides recording the eye movements, the eye tracking software allows replaying the records of eye movements. using this eye movement pattern as a memory cue may help learners to recover how they encoded and interpreted elements in the text (hyönä, 2010; penttinen, anto, & mikkilä-erdmann, 2012; van gog, paas, & van merrienboer, 2005). because of the small delay between processing the text and the presentation of the memory cue, students are still able to report on their cognitive processes (veenman, 2005, 2011). for this reason we chose to use cued retrospective reporting to triangulate with eye movement measures.

3. present study

our study aims to extend current research on processing strategies by using eye tracking methodology followed by a cued recall to map differences in processing strategies. this more direct and online way of measuring processing strategies allows us to learn more about the actual processing behaviour of students while learning from expository text.
as stated above, processing strategies shape what information is looked for in a text and what information is perceived as relevant (kaakinen & hyönä, 2005). next to that, research using self-report measures suggested that deep processors focus more on essences and surface processors focus more on details and definitions (biggs & tang, 2007; entwistle & ramsden, 1982; marton et al., 1975; richardson, 2000). based on findings from research on perspective driven text comprehension (kaakinen & hyönä, 2008) and the sal domain (lonka et al., 2004), we suggest the following hypotheses for students in the deep condition (after receiving guiding questions at a deep level) and students in the surface condition (after receiving reproduction-oriented questions):

a) hypothesis 1: students in the deep condition focus their attention longer on the essentials (e.g., key phrases and words) in the text compared to students in the surface condition.
b) hypothesis 2: students in the deep condition return more often to essences compared to students in the surface condition.
c) hypothesis 3: students in the surface condition focus their attention longer on facts and details (e.g., names) compared to students in the deep condition.
d) hypothesis 4: students in the surface condition return more often to facts and details compared to students in the deep condition.

4. method

4.1. participants

twenty-eight students (age range: 18-25) enrolled at the university of antwerp (belgium) participated on a voluntary basis. participants were randomly assigned to either the deep condition (dc, n = 14) or the surface condition (sc, n = 14). unfortunately, data of nine respondents could not be used due to equipment failure and problems with eye tracking calibration. therefore, data of 19 students were considered in the statistical analyses (table 1). all participants had normal or corrected-to-normal vision and dutch was their native language.
table 1
participant characteristics

              dc    sc
n             12     7
gender
  male         5     5
  female       7     2

4.2. materials

in order to test our hypotheses, we based our experimental design on the seminal studies by marton et al. (1975). in their experiments they induced either a deep or surface processing strategy by giving students questions after they studied an academic text. in our experiment, students were asked to study a series of three expository texts (± 800 words) on a topic they were not familiar with, namely research on happiness. the texts were taken from the dutch version of ‘the world book of happiness’ (bormans, 2010). after processing each text they received a number of evaluation questions on the preceding text (figure 1). students in the deep condition received questions at a deep level (e.g., give a summary of the text). in contrast, students in the surface condition received reproduction-oriented questions (e.g., in which country was the research discussed in the text conducted?). so in both conditions students processed the same learning content, but received different questions. in the original study, marton et al. (1975) interviewed and tested the students after the third text and concluded that in the surface condition, students adopted more surface processing strategies while students in the deep condition adopted more deep processing strategies. similarly, in our study we analysed the eye tracking data and cued recalls from the third text.

figure 1. experimental design.

4.3. eye tracking

eye movements were collected using the tobii tx300 eye tracker (dark pupil tracking), manufactured by tobii technology (stockholm, sweden). it is integrated into a 23-inch tft monitor with a maximum resolution of 1920 x 1080 pixels. the camera samples data at the rate of 300 hz and registration was binocular. tobii tx300 does not require a head stabilization system and allows for more freedom of head movement (37 x 17 cm).
gaze accuracy is 0.4° and gaze precision is 0.15°, as reported by the hardware producer. the eye tracker latency is between 1.0 and 3.3 milliseconds. data were recorded with tobii-studio (3.2) software. before starting the experiment, students were seated about 60 cm from the screen for the eye tracking calibration. a five-point calibration procedure was used in which students needed to track five red calibration dots on a plain, grey background. areas of interest (aoi’s) define regions in the text that the researcher is interested in gathering data about (holmqvist et al., 2011). with regard to our hypotheses we are interested in key phrases and keywords for the deep condition and in details and facts for the surface condition. six volunteers (master students in educational sciences) read the text in a pilot study to determine the key phrases and keywords. in total 15 deep aoi’s (e.g., a topic sentence with summary statements) and three surface aoi’s (e.g., name of a country) were marked. only parts of the text were defined as aoi’s, so the text was not fully covered by aoi’s. the total size of the text was 1490 x 1087 pixels, the smallest aoi was 47 x 31 pixels and the biggest aoi was 684 x 71 pixels. the complete text could be seen on the screen, so scrolling was not needed. in line with hyönä et al. (2002), first pass fixation time, look back fixation time and total fixation time were analysed at the level of aoi’s. an overview of the definitions is given in table 2 (holmqvist et al., 2011; hyönä et al., 2003). students were able to process the text in a self-paced manner and therefore we calculated relative duration measures. next to that, aoi’s differed in size because some contained phrases, while others consisted of only words. therefore, aoi measures were normalized by calculating the reading depth measure (holmqvist et al., 2011; holmqvist & wartenberg, 2005; holsanova, holmqvist, & rahm, 2006).
this reading depth measure is defined by the total time spent in an aoi per cm² and is an indication of how densely an aoi is processed. so for the three measures described in table 2 we calculated relative measures and reading depth measures.

table 2
overview of eye tracking measures and their definitions

measure                     definition
first pass fixation time    the time spent in an aoi when it was visited for the first time. a visit can consist of multiple fixations. it reflects early processing and object recognition.
look back fixation time     the duration of all the regressions back to an aoi. it reflects delayed processing, for example to integrate information.
total fixation time         the time spent in an aoi during the whole trial; it is the sum of the first pass fixation time and the look back fixation time in that aoi.

the fixation indices were calculated for either the group of deep aoi’s or the group of surface aoi’s. we used the tobii fixation filter for fixation identification, which is an implementation of a classification algorithm proposed by olsson (2007). it uses a velocity threshold (35 pixels/window) and a distance threshold (35 pixels). for all the measures, the means and standard deviations were calculated. to compare students in both conditions, we used non-parametric tests due to the small sample sizes (van gog et al., 2005). therefore the medians together with the first and third quartile were calculated as well. relative measures and reading depth measures for the eye movement measures were compared for students in both conditions using mann-whitney u tests. we reported the exact two-tailed significance. also in line with van gog et al. (2005), we used a less stringent significance level of 0.10 to avoid type ii error and to increase power.

4.4. cued recall

after the eye tracking experiment, a cued recall was conducted.
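the fixation measures of table 2 and the two normalisations described in section 4.3 can be illustrated with a minimal python sketch. it is not the tobii-studio computation used in the study. the `Fixation` record, the aoi labels and the fixation sequence are hypothetical, durations are assumed to be in milliseconds, the relative measure is assumed to be a percentage of the total reading time, and the pixel size is assumed to follow from a 23-inch, 1920 x 1080 monitor with square pixels.

```python
from dataclasses import dataclass
from math import hypot
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Fixation:
    """Hypothetical fixation record: the aoi it landed in (None if outside
    every aoi) and its duration in milliseconds (assumed unit)."""
    aoi: Optional[str]
    duration_ms: float


def aoi_fixation_times(fixations: Iterable[Fixation], aoi: str) -> Tuple[float, float, float]:
    """Return (first pass, look back, total) fixation time for one aoi.

    A visit is a run of consecutive fixations inside the aoi. The first visit
    counts as first pass time, every later visit as look back time.
    """
    first_pass = 0.0
    look_back = 0.0
    visits = 0
    inside = False
    for fix in fixations:
        if fix.aoi == aoi:
            if not inside:          # a new visit starts here
                visits += 1
                inside = True
            if visits == 1:
                first_pass += fix.duration_ms
            else:
                look_back += fix.duration_ms
        else:
            inside = False          # leaving the aoi ends the visit
    return first_pass, look_back, first_pass + look_back


def relative_measure(time_ms: float, total_reading_time_ms: float) -> float:
    """Time in an aoi as a percentage of total reading time (assumed scaling)."""
    return 100.0 * time_ms / total_reading_time_ms


def cm_per_pixel(diagonal_inch: float = 23.0, res_x: int = 1920, res_y: int = 1080) -> float:
    """Pixel size in cm, assuming square pixels that fill the whole diagonal."""
    return diagonal_inch * 2.54 / hypot(res_x, res_y)


def reading_depth(time_ms: float, width_px: float, height_px: float, pixel_cm: float) -> float:
    """Time spent in an aoi per cm² of aoi area."""
    area_cm2 = (width_px * pixel_cm) * (height_px * pixel_cm)
    return time_ms / area_cm2


# made-up fixation sequence, for illustration only
example = [
    Fixation("key_phrase", 200.0),
    Fixation("key_phrase", 150.0),
    Fixation(None, 300.0),
    Fixation("country_name", 100.0),
    Fixation("key_phrase", 250.0),
    Fixation(None, 100.0),
]
first_pass, look_back, total = aoi_fixation_times(example, "key_phrase")
reading_time = sum(f.duration_ms for f in example)
print(first_pass, look_back, total)
print(relative_measure(total, reading_time))
print(reading_depth(total, 684, 71, cm_per_pixel()))
```

dividing by the reading time makes self-paced readers comparable, and dividing by the aoi area makes a single word comparable with a whole key phrase. both scalings are kept as separate functions so that either can be swapped for the definition actually used by the analysis software.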
after processing the third text, the experimenter informed students that they would watch the replay of eye movements of the third text together. the cued recall was conducted by using gaze videos produced by tobii-studio software (3.2). in the cued recall, a video showed the text and a moving red dot representing the point of fixation. the bigger the dot, the longer the fixation lasted. students saw their gaze videos at the same speed they processed the text. the interviewer instructed students to watch the video and to tell the interviewer what they were thinking while processing the text. the interviewer also stated that she would occasionally stop the video and ask questions about the reading process, such as ‘here you fixated a lot, what were you doing?’ or ‘here you are going back in the text, what were you doing?’.

table 3
coding scheme for the cued recall analysis

strategy and example                          dc            sc
surface processing                            66 (65.3%)    45 (100%)
  rereading: i tried to understand that part so i was rereading it.
  skimming: now i am reading it again and just scanning for important words in the text.
  guessing meaning word in context: that was gnp, i was wondering what the meaning of that word was.
  rehearsing: those countries, i was trying to remember them.
  connecting to prior text: i realise that i go back a lot in the text and that is because i am trying to link parts of the text.
  connecting to the research task: i guess the first paragraph was going to give an overview about the rest of the task, so i thought that was important.
  detecting mistakes in the text: i was looking at the ‘n’ that was missing in that word.
deep processing                               35 (34.7%)    0 (0%)
  questioning: i was wondering what they meant with that phrase.
  paraphrasing: first, they name something and then you know a summation is coming. second, they talk about cross national comparisons, …
  connecting to personal experiences: you try to process the text critically and to take your own findings and personal experience into account.
  interpreting and elaborating: what i do most of the time is reading the text and then trying to analyse what i just read. in this way i get a better picture of what the text is about.

the cued recalls were transcribed from the audiotapes. next to that, we linked comments of the cued recalls to the part of the text that was discussed. the cued recalls were coded based on an initial set of ten codes developed in a study by dinsmore and alexander (2015). specifically, comments were coded as either a surface or deep processing strategy (table 3). after coding the interviews deductively, we added one extra code in the surface processing category, namely “detecting mistakes in the text”. transcripts were coded with the qualitative analysis software package nvivo 10. two judges (authors lc and lg) coded the cued recalls and an inter-rater agreement of 73% was reached, which is considered substantial. we compared the number of coded utterances in each condition between the two categories (table 3). we first analysed the data on a general level and looked for differences between students in both conditions. we also analysed the data at a more fine-grained level to see whether the reported strategies are linked to aoi’s and to examine differences at the aoi level between groups.

5. results

table 4 shows the means and standard deviations. for most of the measures on the essentials, standard deviations are higher in the deep condition than in the surface condition. this may be an indication that students in the deep condition differ more from each other. when we look at the cued recall results of students in the deep condition, some students pointed out that they sometimes took a pause to integrate processed information instead of looking back.
this may also be an indication that there are two types of students in the deep condition: on the one hand students who process information immediately and take a pause to integrate information, and on the other hand students who need to look back to parts in the text to integrate this information and to encode it to memory.

“sometimes i keep staring at the text, because i try to visualize it for myself” (r7, dc)
“i sometimes have the feeling that when i am staring at a word that i am not processing that word but that i am just taking a moment to think about what i have read” (r3, dc)
“sometimes i have the feeling that i am staring at something in the text, to process the things i just read before” (r5, dc)

the most reported processing strategy in the cued recalls is surface processing, and more specifically rereading. students in both groups indicated that they reread parts of the text the most. only students in the deep condition reported both deep and surface processing strategies; students in the surface condition reported only surface processing strategies. deep processing strategies are reported on a more general level and are not linked to certain phrases, paragraphs or aoi’s in the text.

“when you know you will need to answer questions after reading the text, you try to read the text critically and i always try to take into account my personal experiences and findings.” (r5, dc)
“i first think about what i read in the text, before i proceed with the next part.
i try to make a summary for myself of what i read in the previous parts.” (r3, dc)

table 4
means and standard deviations

measure | essentials dc m | essentials dc sd | essentials sc m | essentials sc sd | facts and details dc m | facts and details dc sd | facts and details sc m | facts and details sc sd
fpft r | 2.69 | 2.22 | 3.23 | 1.48 | 0.40 | 0.21 | 0.38 | 0.25
fpft rd | 45.53 | 23.77 | 57.50 | 25.19 | 80.60 | 24.49 | 79.02 | 51.30
lbft r | 12.65 | 3.63 | 11.22 | 2.69 | 1.09 | 0.49 | 1.94 | 0.77
lbft rd | 285.66 | 192.02 | 200.75 | 46.39 | 268.20 | 236.77 | 401.22 | 168.88
tft r | 15.35 | 3.01 | 14.46 | 2.85 | 1.50 | 0.57 | 2.32 | 0.84
tft rd | 331.19 | 189.72 | 285.25 | 43.45 | 348.80 | 234.62 | 480.24 | 185.68

fpft = first pass fixation time; lbft = look back fixation time; tft = total fixation time; r = relative measure; rd = reading depth measure.

we compared the total reading time of students in both conditions with a mann-whitney u test, but no significant difference was found (u = 41, p = 0.97). thus, students in both conditions spent, on average, a similar amount of time processing the text.

5.1. essentials in the text

table 5 shows the medians and quartiles for the essentials in the text for students in both conditions. we conducted mann-whitney u tests on all these measures but no significant differences were found between students in both groups.

table 5
first quartile, median and third quartile for relative measures and reading depth measures.

measure | dc q1 | dc mdn | dc q3 | sc q1 | sc mdn | sc q3 | mann-whitney u | p
fpft r | 1.53 | 1.92 | 2.86 | 2.13 | 2.57 | 4.43 | 27 | 0.227
fpft rd | 27.65 | 41.14 | 45.17 | 41.18 | 48.55 | 75.67 | 30 | 0.340
lbft r | 10.93 | 12.92 | 15.88 | 9.39 | 10.60 | 12.69 | 56 | 0.261
lbft rd | 173.39 | 215.14 | 354.49 | 170.57 | 178.85 | 215.59 | 53 | 0.385
tft r | 13.97 | 15.63 | 18.26 | 12.42 | 13.02 | 16.41 | 53 | 0.385
tft rd | 209.29 | 261.89 | 396.90 | 235.15 | 257.08 | 267.00 | 45 | 0.837

results from the cued recalls indicate that students in both the deep and the surface condition reread essentials in the text. a reason for rereading is that they did not really understand essential parts of the text. the motivation to better understand these essential parts in the text is only reported by students in the deep condition.
these results suggest that students in the deep condition reread these parts at a deeper level to get a better understanding.

“i am rereading a lot, i read something fast and then i think whether i understood it and no i did not, so then i go back again” (r3, dc)
“i was trying to understand that part better, so that is why i was rereading it over and over again” (r2, dc)

both groups indicated skimming the text after reading it for the first time to look back at the essential parts of the text.

“what i often do when i finished reading, is rereading only the essential parts of the text” (r4, dc)
“i am just scanning quickly to see if i missed important words in the text” (r16, sc)

a final finding from the cued recall results is that both groups guessed the meaning of keywords in context when they did not understand the word. overall, cued recall results are in line with results from eye tracking, in that no big differences are found between both groups when processing essential parts in the text.

“here, that was a difficult word, elitist, i tried to understand the meaning in the text” (r8, dc)
“some keywords i do not know, i need to think about them or see the context to understand them” (r19, sc)

5.2. facts and details in the text

table 6 shows the medians and quartiles for facts and details in the text for students in both conditions. students in the surface condition spent relatively more time on facts and details when they looked back at them and also during the whole experiment. in addition, these students read the facts and details in more depth than students in the deep condition, both when they looked back at them and during the whole experiment.

table 6
first quartile, median and third quartile for relative measures and reading depth measures.

measure | dc q1 | dc mdn | dc q3 | sc q1 | sc mdn | sc q3 | mann-whitney u | p
fpft r | 0.23 | 0.38 | 0.54 | 0.19 | 0.39 | 0.54 | 46 | 0.773
fpft rd | 66.34 | 78.06 | 102.04 | 38.77 | 82.39 | 112.36 | 44 | 0.902
lbft r | 0.74 | 1.04 | 1.41 | 1.74 | 2.28 | 2.41 | 17 | 0.036
lbft rd | 160.56 | 178.28 | 270.98 | 329.09 | 423.92 | 529.68 | 20 | 0.068
tft r | 1.01 | 1.47 | 1.95 | 2.23 | 2.62 | 2.86 | 15 | 0.022
tft rd | 241.65 | 276.45 | 351.52 | 440.90 | 479.24 | 611.01 | 18 | 0.045

cued recall results showed that students in the surface condition repeated facts and details in the text, while students in the deep condition did not. other coding categories did not show a link with processing facts and details in the text. again we can see a clear link between the eye movement measures and the results from the cued recalls.

“the names of those countries, i really tried to remember those” (r14, sc)
“those four countries, i memorized them” (r19, sc)
“i tried to remember the name of the author, i thought that would be important” (r17, sc)

6. conclusion and discussion

this exploratory study aimed at extending current research on processing strategies during learning from expository text. research in the sal domain is mostly based on students’ self-reports of processing strategies at a general level in which the context of learning is not taken into account (dinsmore & alexander, 2012; gijbels et al., 2014). by looking at the actual processing behaviour of students while learning from expository text, this study makes a first preliminary contribution to the field by using a more direct and online measurement tool at a task specific level that takes the context explicitly into account. it is the first experimental study to explore students’ cognitive processing strategies at a task specific level using objective online measures. most of the research using online measures is based on the think aloud method, which can alter the processing itself (veenman, 2005).
by using eye tracking methodology followed by a cued recall this problem is circumvented, in that this method does not require students to manage the cognitive load of task completion and the self-reporting of strategies at the same time (samuelstuen & braten, 2007).

in our study we manipulated the task demands to steer processing strategies. results from the cued recalls indicated that this manipulation was successful, as students in the deep condition reported a combination of surface and deep processing strategies, while students in the surface condition only reported surface processing strategies. this is in line with previous research that showed that demands on a deep level, to demonstrate a thorough understanding, lead to more deep processing strategies, whereas demands on a surface level, aimed at the passive acquisition of facts and details, lead to more surface processing strategies (marton et al., 1975; richardson, 2000; scouller, 1998; scouller & prosser, 1994; segers et al., 2006). results of the cued recalls indicated that students in both conditions processed facts and details and essential parts in the text, but that they did so in a different way. these results are similar to results from think aloud studies in which processing strategies were examined without manipulating task demands (dinsmore & alexander, 2012, 2015; penttinen et al., 2012).

based on the eye movement data, we cannot confirm the first and second hypotheses, which stated that students in the deep condition focus their attention longer on essentials in the text compared to students in the surface condition and that they return to them more often. both groups of students spent time on processing the essentials in the text. although we could not find differences between groups based on their eye movement data, results from the cued recalls indicated that students in the deep condition reread the essentials in the text to understand them better.
this motivation to better understand these parts is related to a deep way of processing (biggs & tang, 2007; entwistle & ramsden, 1982). students in the surface condition did not report this motivation. these descriptive findings indicate that students in our sample processed the text in a different way but more substantive research is needed to further explore found differences in overt eye movement behaviour. in contrast with research from the angle of perspective driven text comprehension, these essential parts do not seem to be perceived as more relevant by students in the deep condition (kaakinen & hyönä, 2005, 2007, 2008), they are just processed in a more deep way. another interesting finding from the cued recall results is that some students in the deep condition indicated that they took a pause at some places in the text to integrate the processed information instead of actively looking back. other students in the deep condition reported actively looking back at these essential parts in the text. also the higher standard deviations for students in the deep condition may be an indication of these differences. it is in line with other research that shows that building the necessary links to incorporate text information to the developing memory representation can be achieved mentally or can result in overt behaviour in which students actively reread essential parts (hyönä et al., 2003; kaakinen & hyönä, 2008). so, based on these preliminary findings, we suggest that some deep processors actively return back to essentials to encode it to memory, while others take a pause to integrate the new information without looking back to this information. further research is needed to confirm these findings. regarding the third and fourth hypothesis, the results indicated that students in the surface condition indeed looked longer at facts and details and returned more back to them. 
it seems that students in the surface condition switch to strategic processing by paying more attention to relevant parts, namely facts and details (kaakinen & hyönä, 2007). research by kaakinen and hyönä (2005) showed that the extra time spent on relevant information is used to rehearse this information in order to encode it to memory. results from both eye tracking and cued recalls indicate that, in the surface condition, facts and details are repeated more often in order to encode them into memory (kaakinen & hyönä, 2007). only students in the surface condition reported repeating facts and details, while students in the deep condition did not report learning activities like that.

although our findings suggest that eye tracking followed by a cued recall is a fruitful way to investigate processing strategies, we want to stress the preliminary nature of this study because of some limitations. an important limitation of this study is the small sample size. due to equipment failure or problems with eye tracking calibration, the sample size decreased at the onset of this study. because of this smaller sample size we decided to use non-parametric tests and used a cued recall to deepen the results. we also raised the significance level to increase power due to the smaller sample size (van gog et al., 2005). the findings from this study can serve as a baseline for further research in which larger samples can be used to increase power without adjusting the significance level. another limitation of this study is that we used a between groups design. reading times and online processing strategies can vary among adult readers (hyönä et al., 2002; kaakinen & hyönä, 2008). we therefore suggest that further research use a within groups design in which students use both processing strategies, to take this individual variability into account.
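the non-parametric comparison mentioned in this section (the mann-whitney u test) can be sketched in plain python as follows. this is a minimal sketch under stated assumptions, not the authors' analysis script: the function names (`midranks`, `mann_whitney_u`, `exact_two_sided_p`) and the sample values are hypothetical, ties are handled with average ranks, and the two-sided p-value is obtained by exhaustively relabelling the observations, which is only practical for small groups.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (u_a, u_b); the two always sum to len(group_a) * len(group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


def exact_two_sided_p(group_a, group_b):
    """Exact permutation p-value based on the smaller of the two U values.

    Every way of assigning len(group_a) of the pooled observations to the
    first group is enumerated, so this is only feasible for small samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    observed = min(mann_whitney_u(group_a, group_b))
    offset = n_a * (n_a + 1) / 2
    hits = total = 0
    for indices in combinations(range(n_a + n_b), n_a):
        u_a = sum(ranks[i] for i in indices) - offset
        if min(u_a, n_a * n_b - u_a) <= observed:
            hits += 1
        total += 1
    return hits / total


if __name__ == "__main__":
    # Hypothetical relative look-back times, NOT the study's raw data.
    deep_condition = [0.8, 1.0, 1.4, 1.1]
    surface_condition = [1.7, 2.3, 2.4, 0.9]
    print(mann_whitney_u(deep_condition, surface_condition))
    print(exact_two_sided_p(deep_condition, surface_condition))
```

the resulting p-value would then be compared with whichever significance level is chosen for the study; the sketch leaves that threshold to the caller rather than assuming one.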
another way to understand the significance of individual variability is to include control variables such as reading ability, interest in the topic and prior knowledge about the topic (fox, 2009; mason et al., 2013). by increasing the sample size and using a within subjects design, more complex statistical analyses can be conducted to confirm our preliminary findings. in this way it will be possible to make more generalized statements regarding processing strategies as measured by eye tracking. a last limitation is that students needed to process the text on a computer screen to make eye tracking possible. this does not reflect the natural setting in which students habitually process learning contents.

despite the limitations, this study was able to show that eye tracking followed by a cued recall is a promising tool to examine students’ processing strategies. an important finding from our study is that it is valuable to combine eye tracking with a cued recall, because differences in processing strategies do not always lead to overt eye movement behaviour (hyönä et al., 2003; kaakinen & hyönä, 2008). by using a cued recall we were able to uncover differences in processing strategies that were not reflected in eye movement behaviour. based on our preliminary findings, the combination of eye tracking and a cued recall seems to be a promising tool to further investigate cognitive processing strategies when learning from text. students in the deep condition do not seem to look longer at essentials and do not seem to return to them more often, but they processed them more deeply than students in the surface condition. results suggest that students in the surface condition looked longer at facts and details and returned to them more often. this first exploratory eye tracking study in the sal domain is an important illustration of how processing strategies can be further examined beyond the use of self-report questionnaires.
in our opinion it would be worthwhile to use this innovative eye tracking methodology in multi-method designs to triangulate it with often used self-report measures to look for convergent or divergent validity. in our study we steered students’ processing strategies by task demands. although research indicated that it is possible to influence processing strategies by manipulating this contextual variable (baeten et al., 2010; gielen et al., 2003; marton et al., 1975; scouller, 1998; scouller & prosser, 1994; segers et al., 2006), it would be interesting to combine it with these self-report measures in order to examine a more natural way of processing behaviour. in addition, using multiple sources of data is important to develop a comprehensive understanding of how we can adequately measure students’ processing strategies. eye tracking methodology followed by a cued recall in the sal domain can also deepen the conceptual underpinnings of what constitutes deep and surface processing of learning contents.

keypoints

eye tracking followed by a cued recall is a promising tool to uncover differences in students’ processing strategies while learning from expository text.
students in the deep condition do not look longer at the essentials, but they process them more deeply by trying to understand these parts better.
students in the surface condition look longer at facts and details and try to rehearse these parts.

references

ariasi, n., & mason, l. (2010). uncovering the effect of text structure in learning from a science text: an eye-tracking study. instructional science, 39(5), 581-601. doi: 10.1007/s11251-010-9142-5
baeten, m., kyndt, e., struyven, k., & dochy, f. (2010). using student-centred learning environments to stimulate deep approaches to learning: factors encouraging or discouraging their effectiveness. educational research review, 5(3), 243-260. doi: 10.1016/j.edurev.2010.06.001
biggs, j. (1987). student approaches to learning and studying. research monograph. melbourne: australian council for educational research.
biggs, j., & tang, c. (2007). teaching for quality learning at university. open university press / mcgraw-hill education.
bormans, l. (2010). geluk. the world book of happiness. tielt: lannoo.
dinsmore, d. l., & alexander, p. a. (2012). a critical discussion of deep and surface processing: what it means, how it is measured, the role of context, and model specification. educational psychology review, 24(4), 499-567. doi: 10.1007/s10648-012-9198-7
dinsmore, d. l., & alexander, p. a. (2015). a multidimensional investigation of deep-level and surface-level processing. the journal of experimental education, 1-32. doi: 10.1080/00220973.2014.979126
entwistle, n., & mccune, v. (2004). the conceptual bases of study strategy inventories. educational psychology review, 16(4), 325-345. doi: 10.1007/s10648-004-0003-0
entwistle, n., & ramsden, p. (1982). understanding student learning. new york: nichols publishing company.
ericsson, k. a., & simon, h. a. (1993). protocol analysis. verbal reports as data. massachusetts: massachusetts institute of technology.
fox, e. (2009). the role of reader characteristics in processing and learning from informational text. review of educational research, 79(1), 197-261. doi: 10.3102/0034654308324654
gielen, s., dochy, f., & dierick, s. (2003). evaluating the consequential validity of new modes of assessment: the influence of assessment on learning, including pre-, post- and true assessment effects. in m. segers, f. dochy & e. cascallar (eds.), optimising new modes of assessment: in search of qualities and standards (pp. 37-54). the netherlands: kluwer academic publishers.
gijbels, d., donche, v., richardson, j. t. e., & vermunt, j. d. (eds.). (2014). learning patterns in higher education. dimensions and research perspectives. london: routledge.
hamaker, c. (1986). the effects of adjunct questions on prose learning. review of educational research, 56(2), 212-242. doi: 10.3102/00346543056002212
holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2011). eye tracking: a comprehensive guide to methods and measures. oxford; new york: oxford university press.
holmqvist, k., & wartenberg, c. (2005). the role of local design factors for newspaper reading behaviour: an eye-tracking perspective. lund university cognitive studies (vol. 127). lund university.
holsanova, j., holmqvist, k., & rahm, h. (2006). entry points and reading paths on newspaper spreads: comparing a semiotic analysis with eye-tracking measurements. visual communication, 5(1), 65-93. doi: 10.1177/1470357206061005
hyönä, j. (2010). the use of eye movements in the study of multimedia learning. learning and instruction, 20(2), 172-176. doi: 10.1016/j.learninstruc.2009.02.013
hyönä, j., & lorch, r. f. (2004). effects of topic headings on text processing: evidence from adult readers’ eye fixation patterns. learning and instruction, 14(2), 131-152. doi: 10.1016/j.learninstruc.2004.01.001
hyönä, j., lorch, r. f., & kaakinen, j. k. (2002). individual differences in reading to summarize expository text: evidence from eye fixation patterns. journal of educational psychology, 94(1), 44-55. doi: 10.1037//0022-0663.94.1.44
hyönä, j., lorch, r. f., & rinck, m. (2003). eye movement measures to study global text processing. in j. hyönä, r. radach & h. deubel (eds.), the mind's eye: cognitive and applied aspects of eye movement research. amsterdam: elsevier science.
just, m. a., & carpenter, p. a. (1980). a theory of reading: from eye fixations to comprehension. psychological review, 87(4), 329-354. doi: 10.1037/0033-295x.87.4.329
kaakinen, j. k., & hyönä, j. (2005). perspective effects on expository text comprehension: evidence from think-aloud protocols, eyetracking, and recall. discourse processes, 40(3), 239-257. doi: 10.1207/s15326950dp4003_4
kaakinen, j. k., & hyönä, j. (2007). perspective effects in repeated reading: an eye movement study. memory & cognition, 35(6), 1323-1336. doi: 10.3758/bf03193604
kaakinen, j. k., & hyönä, j. (2008). perspective-driven text comprehension. applied cognitive psychology, 22, 319-334. doi: 10.1002/acp.1412
kaakinen, j. k., & hyönä, j. (2010). task effects on eye movements during reading. journal of experimental psychology: learning, memory and cognition, 36(6), 1561-1566. doi: 10.1037/a0020693
kaakinen, j. k., hyönä, j., & keenan, j. m. (2002). perspective effects on online text processing. discourse processes, 33(2), 159-173. doi: 10.1207/s15326950dp3302_03
lai, m.-l., tsai, m.-j., yang, f.-y., hsu, c.-y., liu, t.-c., lee, s. w.-y., . . . tsai, c.-c. (2013). a review of using eye-tracking technology in exploring learning from 2000 to 2012. educational research review, 10, 90-115. doi: 10.1016/j.edurev.2013.10.001
lonka, k., olkinuora, e., & mäkinen, j. (2004). aspects and prospects of measuring studying and learning in higher education. educational psychology review, 16(4), 301-323. doi: 10.1007/s10648-004-0002-1
marton, f., dahlgren, l. o., säljö, r., & svensson, l. (1975). the göteborg project on non-verbatim learning. göteborg: university of göteborg.
mason, l., tornatora, m. c., & pluchino, p. (2013). do fourth graders integrate text and picture in processing and learning from an illustrated science text? evidence from eye-movement patterns. computers & education, 60(1), 95-109. doi: 10.1016/j.compedu.2012.07.011
mcnamara, d. s. (2004). sert: self-explanation reading training. discourse processes, 38(1), 1-30.
moss, j., schunn, c. d., schneider, w., mcnamara, d. s., & vanlehn, k. (2011). the neural correlates of strategic reading comprehension: cognitive control and discourse comprehension. neuroimage, 58(2), 675-686. doi: 10.1016/j.neuroimage.2011.06.034
olsson, p. (2007). real-time and offline filters for eye tracking. kth royal institute of technology.
penttinen, m., anto, e., & mikkilä-erdmann, m. (2012). conceptual change, text comprehension and eye movements during reading. research in science education, 43(4), 1407-1434. doi: 10.1007/s11165-012-9313-2
perry, n. e., & winne, p. h. (2006). learning from learning kits: gstudy traces of students' self-regulated engagements with computerized content. educational psychology review, 18, 211-228. doi: 10.1007/s10648-006-9014-3
ponce, h. r., & mayer, r. e. (2014). an eye movement analysis of highlighting and graphic organizer study aids for learning from expository text. computers in human behavior, 41, 21-32. doi: 10.1016/j.chb.2014.09.010
rayner, k. (1998). eye movements in reading and information processing: 20 years of research. psychological bulletin, 124(3), 372-422. doi: 10.1037//0033-2909.124.3.372
richardson, j. t. e. (2000). researching student learning. buckingham: open university press and srhe.
richardson, j. t. e. (2013). research issues in evaluating learning pattern development in higher education. studies in educational evaluation, 39(1), 66-70. doi: 10.1016/j.stueduc.2012.11.003
rothkopf, e. z. (1966). learning from written instructive materials: an exploration of the control of inspection behavior by test-like events. american educational research journal, 3, 241-249. doi: 10.3102/00028312003004241
samuelstuen, m. s., & braten, i. (2007). examining the validity of self-reports on scales measuring students' strategic processing. british journal of educational psychology, 77(2), 351-378. doi: 10.1348/000709906x106147
schellings, g. l. m. (2011). applying learning strategy questionnaires: problems and possibilities. metacognition and learning, 6(2), 91-109. doi: 10.1007/s11409-011-9069-5
schellings, g. l. m., van hout-wolters, b., & vermunt, j. d. (1996). individual differences in adapting to three different tasks of selecting information from texts. contemporary educational psychology, 21, 423-446. doi: 10.1006/ceps.1996.0029
scouller, k. m. (1998). the influence of assessment method on students' learning approaches: multiple choice question examination versus assignment essay. higher education, 35, 453-472. doi: 10.1023/a:1003196224280
scouller, k. m., & prosser, m. (1994). students' experiences in studying for multiple choice question examinations. studies in higher education, 19(3), 267-279. doi: 10.1080/03075079412331381870
segers, m., nijhuis, j., & gijselaers, w. (2006). redesigning a learning and assessment environment: the influence on students' perceptions of assessment demands and their learning strategies. studies in educational evaluation, 32, 223-242. doi: 10.1016/j.stueduc.2006.08.004
van gog, t., & jarodzka, h. (2013). eye tracking as a tool to study and enhance cognitive and metacognitive processes in computer-based learning environments. in r. azevedo & v. a. w. m. m. aleven (eds.), international handbook of metacognition and learning technologies. new york: springer.
van gog, t., paas, f., & van merrienboer, j. j. g. (2005). uncovering expertise-related differences in troubleshooting performance: combining eye movement and concurrent verbal protocol data. applied cognitive psychology, 19(2), 205-221. doi: 10.1002/acp.1112
veenman, m. v. j. (2005). the assessment of metacognitive skills: what can be learned from multi-method designs? in c. artelt & b. moschner (eds.), lernstrategien und metakognition. implikationen für forschung und praxis (pp. 77-99). münster: waxmann.
veenman, m. v. j. (2011). alternative assessment of strategy use with self-report instruments: a discussion. metacognition and learning, 6(2), 205-211. doi: 10.1007/s11409-011-9080-x
veenman, m. v. j., bavelaar, l., de wolf, l., & van haaren, m. g. p. (2014). the on-line assessment of metacognitive skills in a computerized learning environment. learning and individual differences, 29, 123-130. doi: 10.1016/j.lindif.2013.01.003
vermunt, j. d., & vermetten, y. j. (2004). patterns in student learning: relationships between learning strategies, conceptions of learning, and learning orientations. educational psychology review, 16(4), 359-384. doi: 10.1007/s10648-004-0005-y

frontline learning research vol. 7 no. 1 (2019) 51-64
issn 2295-3159

how do doctoral students in stem fields engage in scientific knowledge practices?

jenna vekkaila a, viivi virtanen a, jani kukkola b, liezel frick c, kirsi pyhältö a

a centre for university teaching and learning (hype), university of helsinki, helsinki, finland
b department of philosophy, history and arts, university of helsinki, finland
c department of curriculum studies, stellenbosch university, stellenbosch, south africa

article received 1 august 2018 / revised 25 november / accepted 31 january / available online 22 february

abstract

knowledge creation is at the core of scientific endeavour. as early career researchers, doctoral students take part in knowledge creation through engaging in various knowledge practices, make their original contribution to knowledge, and become experts in their particular domain. however, our understanding of what doctoral knowledge practices entail is still insufficient. for this study, a total of 34 doctoral students from stem fields, including natural sciences, bio- and environmental sciences and medicine, were interviewed to gain a better understanding of the kinds of knowledge practices in which doctoral students in the sciences engage. the data were collected with semi-structured interviews, which were qualitatively content analysed. the results showed that the participants mostly described activities that were established everyday knowledge practices of the researcher community (75 %), whereas practices that were innovative (25 %), entailing transformation of the current practices and developing new ones, were less often reported.
moreover, the practices were typically collective, involving the students, their supervisors or other members of their research groups (67 %). further investigation showed that the participants were typically actively engaged in knowledge practices (79 %) rather than just adapting existing ones (13 %). perceiving oneself as a bystander was even less typical (8 %). the significance of this study lies in exploring doctoral students’ self-reported knowledge practices in stem fields, and demonstrates that they perceive themselves as actively and collaboratively engaged in creating knowledge. keywords: doctoral training; doctoral student; qualitative study; knowledge practice; stem fields info corresponding author email: jenna.vekkaila@helsinki.fi doi 10.14786/flr.v7i1.393 1. introduction knowledge creation is at the core of scientific endeavour. doctoral students are key players in knowledge creation within any university or discipline since they contribute to the endeavour by producing an original contribution in the form of doctoral dissertation, and by extending the knowledge boundaries of a particular discipline (see e.g., the united kingdom quality assurance agency for higher education, 2008). therefore, they should also be a key interest to both universities and disciplinary communities that stand to benefit from such advances in knowledge. knowledge creation takes place through knowledge practices, entailing various disciplinary research activities such as data collection, analysis, article writing, elaboration of concepts and theories, planning a research project, and presenting research. 
in stem fields (the abbreviation stem, referring to science, technology, engineering and mathematics, will be used in this article) such practices are suggested to be typically collective (hakkarainen et al., 2014): doctoral research in stem fields is typically focused on solving shared research problems related to a supervisor’s research projects, pursuing article-based dissertations that consist of co-authored internationally refereed journal articles, and working intensively in relatively strong researcher communities, including several doctoral students, postdocs, and academic staff. yet, not all the researcher communities in the stem fields embrace collective knowledge practices, nor do all doctoral students have similar access to such practices even if they exist in their communities. accordingly, in order to create an optimal learning environment for knowledge creation for doctoral students in stem fields, a better understanding of the knowledge practices, and of the ways in which the students engage in these practices during their studies, is needed.

the study aims to contribute to bridging the gap in the literature in the field by exploring the kinds of knowledge practices in which doctoral students in stem fields engage during their studies. the knowledge practices are explored in the framework of socio-constructivist views of learning (see e.g. sfard, 1998; paavola, lipponen, & hakkarainen, 2004) by drawing on the seminal work on “knowledge building” by nonaka and takeuchi (1995), engeström (1999), and bereiter (2002).
1.1 knowledge practices as key for knowledge creation

knowledge creation is a socially embedded endeavour (john-steiner, 2000), rooted in the researcher community, which typically comprises supervisors, other senior scholars, post-doctoral researchers, doctoral students, and both national and international researcher networks (mcalpine & norton, 2006; pyhältö & keskinen, 2012) sharing the same object of activity and knowledge artefacts such as research interest, frameworks, and/or methods. this has several consequences. knowledge creation takes place in the researcher communities via knowledge practices, which are the socially created ways in which scientists think, interact, and engage in their day-to-day work (brew et al., 2011; mcalpine & åkerlind, 2010) while carrying out research enquiries. such practices entail, for instance, various methods employed in research, frameworks utilised, research designs carried out, and scientific writing genres applied. as a result, doctoral student learning is highly embedded in the knowledge practices, not only in terms of knowledge acquisition (a mental process of individual learning) and knowledge participation (a process of being socialised into an epistemic community), but also in terms of the deliberate process of creating new knowledge that has the potential to transform the student’s ways of thinking and behaving (hakkarainen et al., 2004, 2013).

prior empirical research on doctoral research knowledge practices is very limited. the few prior studies indicate that doctoral students engage to different extents in different kinds of knowledge practices, and that researcher communities differ in this regard. hakkarainen and his colleagues (2013), for instance, showed that doctoral students in cutting edge research groups in medicine and in natural sciences were most typically engaged in collective inquiry efforts.
in a more recent study on leaders of national centers of excellence in the sciences, it was shown that professors aimed at cultivating the pursuit of collectively shared research objects and of externally reviewed co-authored journal articles, and were focused on collective supervision (hakkarainen et al., 2014). the findings imply that the best, cutting-edge research communities often aim to deliberately involve doctoral students in their collective knowledge practices – including the co-construction of goals, reciprocal monitoring and planning of research, and the shared regulation of joint cognitive processes in complex problem-solving (e.g., hadwin & oshige, 2011; volet, vauras, & salonen, 2009), co-authoring, hard work and intensive training – so that they become members of the research communities (florence & yore, 2004; kamler, 2008; hakkarainen et al., 2013). through sustained engagement, new doctoral students are gradually socialized into the knowledge practices that at their best allow them to work at the frontiers of knowledge and transform their ways of thinking and behaving (holmes, 2004). a great deal of this kind of learning takes place through horizontal (between peers; see fenge, 2012) and vertical (between newcomers and senior researchers) knowledge sharing. engaging in the cutting-edge knowledge practices allows doctoral students to co-evolve and co-develop along with their research problems and co-authored articles, and eventually to ‘author themselves’ as full members of top researcher communities (holland et al., 1998). however, knowledge practices should not be seen as a singular construct. at least a distinction between established knowledge practices (commonly known in the community, which everyone needs to master) and innovative practices (which are typically novel or recently transformed) can be made (hakkarainen et al., 2004).
moreover, the practices may vary from individual to collective, and from routine practices related to supporting knowledge building to more fluid and innovative practices, which foster the solving of emergent and novel problems (e.g., hakkarainen et al., 2013) that mediate progress towards new scientific discoveries. the practices can also be more or less explicit and intentional. established knowledge practices are more often tacit than newly developed innovative practices: the former are well mastered by the established members of the researcher community, whereas the latter often still require extra effort to maintain. in addition, the practices are not static in nature; instead, they constantly and intentionally evolve and change in the interplay between individuals and their communities (lave & wenger, 1991). learning about these practices and how to participate in them is essential for becoming a scientist (e.g., becher & trowler, 2001). the knowledge practices are to a certain extent context dependent. in stem fields, solving complex problems through laboratory or fieldwork often requires intensive group-based collaborative knowledge practices (cumming, 2009; delamont & atkinson, 2001), and expertise is distributed among researchers at different career phases. this is especially typical in large-scale research projects that involve many staff members and a variety of research instruments (e.g., furner, 2003). moreover, researcher groups often develop their own set of distinctive knowledge practices that evolve over time. the knowledge practices of the researcher community determine to a great extent not only the quality of their research outputs, but also what the doctoral students learn during their studies, and the overall quality of the doctoral experience. in addition, individual variation between the students in how they engage in these practices is likely to occur.
accordingly, in order to understand the influence such practices may have on the students, we need to explore what kinds of knowledge practices doctoral students engage in, and how they engage in them.

1.2 doctoral student engagement in knowledge practices

doctoral students themselves can engage differently in the knowledge practices provided by the researcher communities (hopwood, 2010; mathieson, 2011). they can, for instance, adopt, adapt, or withdraw from the practices, and their involvement or lack thereof may eventually modify the practices, i.e. they display agentic behaviours (hopwood, 2010; pyhältö & keskinen, 2012). this includes working with others to expand the “object of activity” by recognizing the motives and resources of others, interpreting them, and aligning one’s own responses to these interpretations with the responses of others involved while expanding knowledge in terms of the doctoral project (pyhältö & keskinen, 2012). because a sense of agency, while internal, is always constructed in a physical, social, and cultural context, the researcher community can either promote or hinder doctoral students’ sense of agency (o’meara, terosky, & neumann, 2008). variation across doctoral students in their experienced ability to exercise their agency is based on a variety of personal, social, and organizational resources and demands at hand (o’meara & campbell, 2011). therefore, an important aspect of developing relational agency is having an opportunity to participate in and contribute to (greeno, 2006; lipponen & kumpulainen, 2011; pyhältö, pietarinen & soini, 2012) the knowledge practices of the researcher community (hancock, hughes, & walsh, 2017). this requires creating the kinds of practices in which doctoral students are seen and treated as accountable researchers.
however, the students’ active and responsive collaboration with the researcher community, which makes it possible to expand understanding and create new knowledge, cannot be taken for granted (pyhältö & keskinen, 2012). hence, students can display various degrees of agency in the knowledge practices provided by the researcher community, ranging from an active and interactional agent to an obedient employee whose task is to learn “the rules of the game” and carry out tasks given by the senior members of the researcher community. the ability of doctoral students to participate in knowledge creation has been shown to be determined both by individual attributes, such as their motivation, skills, and ability to carry out agentic behaviour (jazvac-martek, chen, & mcalpine, 2011; mcalpine & amundsen, 2009; see also bandura, 2001; hadwin & oshige, 2011), and by researcher community attributes, such as the way in which doctoral students are introduced to the community, the quality and quantity of supervisory and researcher community support provided, and the nature of practices in the given community (e.g., delamont & atkinson, 2001; gardner, 2007; golde, 2010; jazvac-martek et al., 2011). at best, students are involved from the beginning of their doctoral processes in knowledge practices that are focused on knowledge objects and that enhance both knowledge and the associated practices (hakkarainen et al., 2004; walker et al., 2008). it has been suggested that in order for doctoral students to create new ideas, they first need a foundation for their creative actions – that is, they must master the existing frameworks, their rules and limits (frick & brodin, 2014). thus, engaging in shared and innovative knowledge practices enables doctoral students to surpass their individual limitations and create new ideas (e.g., walker et al., 2008).
this further results in changes both in the relationship between the researchers and their working environment, and in shared knowledge objects (e.g., hakkarainen et al., 2004; lave & wenger, 1991). yet, pyhältö & keskinen (2012) found that doctoral students in behavioural sciences, humanities and medicine rarely displayed agentic behaviours within their researcher communities. prior studies imply that the knowledge practices of scientific communities play a central role in the process of learning to become a scientist, yet our understanding of the nature and function of knowledge practices, especially among stem field doctoral students, is insufficient.

2. aim of the study

since no research (empirical or non-empirical) exists on doctoral students’ knowledge practices in stem areas, the aim of this study was to gain a better understanding of the kinds of knowledge practices in which doctoral students in stem fields engage during their doctoral process. in order to reach the aim, the study addressed two complementary research questions: firstly, the kinds of knowledge practices reported by the doctoral students were identified, and secondly, the ways in which students participated in these practices were explored.

3. methods

3.1 finnish doctoral education in stem fields

finnish doctoral education in the sciences (pyhältö, stubb, & tuomainen, 2011) is based on the european model. conducting doctoral thesis research is embedded in the activities of the research community. the doctorate involves a dissertation and its public defence. it is complemented with coursework (60-80 ects in total) that is based on personal study plans, typically including international conferences and some methodological studies. doctoral education in finland is outlined in more detail by pyhältö, nummenmaa, soini, stubb, and lonka (2012). our study includes science, medicine, and bio- and environmental sciences.
from the perspective of academic research, they can all be viewed as natural sciences in which research is based on empirical evidence from observation and experimentation, with mathematics as a crucial partner. in finland, physics, chemistry, and biology form the content of the entrance examination for medicine, and in the research-intensive university, master’s students from biology often do their doctorates in medicine. hence, in this paper we use the abbreviation stem with medicine included. in finnish science, medicine, and bio- and environmental sciences, the most common type of doctoral thesis is a summary of articles. each doctoral student is required to publish three to five articles in peer-reviewed international journals. the articles are often co-authored with the supervisors. doctoral students in these fields usually work on their phds full time, and the typical completion time varies from four to six years. the key characteristics of doctoral education in the faculties of bio- and environmental sciences, medicine and science are reported in table 1. there are some differences between the faculties. in science, most doctoral students conduct their work alone, whereas in medicine and bio- and environmental sciences the work is usually conducted in a research group. further, the science students are less often enrolled in doctoral programs than their colleagues in the other two faculties. yet the graduation time is shorter in science and medicine than in bio- and environmental sciences. the original survey data for the analysis were collected in 2011 with a broad range of disciplines included (pyhältö, stubb, & tuomainen, 2011), and in light of that, the three faculties share a quite similar system of doctoral education. however, the differences in research group status in particular may have an effect on the knowledge practices identified from the doctoral students’ interviews.
table 1. the structure of doctoral education in the faculties under study: doctoral students’ (n) membership of doctoral program and research group, typical form of conducting thesis, and typical graduation time (see original data; pyhältö, stubb, & tuomainen, 2011).

3.2 participants

a total of 34 doctoral students from stem fields (7 participants from the natural sciences, 7 participants from medicine, and 20 participants from the bio- and environmental sciences) participated in the study. they were all conducting their research and theses at a large research-intensive finnish university. all the participants had a master’s degree; most of the participants (n=29) were full-time doctoral students and five were part-time. all the participants were pursuing a summary of articles, but they were in different phases of their doctoral process. five were at the beginning of the doctoral process, meaning that they were typically launching their research projects, collecting or analysing data, or writing their first or second article. nine of the participants were in the middle part of the process, which typically included data analysis and writing a third or fourth article. most of the participants (n=16) were in the last part of the process, which typically meant finalizing the last articles and the summary of the articles. four participants had already defended their doctoral theses. all the participants were interviewed on a voluntary basis.

3.3 data collection

data were collected by employing semi-structured interviews (e.g., kvale, 2007). the interview protocol was designed to investigate the doctoral students’ experiences of their thesis processes and their views of themselves within these processes (stubb, pyhältö, & lonka, 2014). all interviews were conducted by members of the authors’ research group. the interviews lasted from 22 minutes to almost three hours. the interviews were recorded and transcribed verbatim.
3.4 analysis

the interview data were qualitatively content analysed (e.g., creswell, 2012) by relying on an abductive strategy (e.g., morgan, 2007). hence, when categorising the data, observations and prior understanding based on theories were repeatedly assessed in relation to each other by combining data-grounded (harry, sturges, & klingner, 2005; mills, bonner, & francis, 2006) and theory-guided analysis strategies (creswell, 2012) in order to acquire the most accurate possible understanding of doctoral students’ experiences of knowledge practices. the analysis included four complementary phases. at first, all text segments related to knowledge practices were identified. these included all doctoral students’ expressions of conducting research work alone or together with other researchers. the criteria for the text segments, which were coded as experiences of knowledge practices, were that they involved a description of research activities and the object of activity (e.g., data collection, analysis, article writing, elaboration of concepts and theories, planning a research project, presenting research). the analysis resulted in 192 text segments from 34 interviews that were included in the further analysis. the units ranged from a couple of sentences to a dozen sentences. secondly, the knowledge practices identified in the first phase were coded according to the quality of the practices into two mutually exclusive categories by applying a model proposed by hakkarainen et al. (2004): 1) established practices, including text segments in which the practices are reported to be commonly known in the community, or practices that everyone needs to adopt; and 2) innovative practices, including text segments in which the practices are reported to be new or modified from existing practices. thirdly, the knowledge practices were further categorised based on whether they were described as individual or collective.
the analysis yielded two categories: 1) individual knowledge practices, consisting of descriptions of working alone with the research; and 2) collective knowledge practices, consisting of reports of community-based activities in which two or more researchers are involved. in the fourth analytical phase, all the knowledge practices were coded further into three categories according to how the doctoral students described their roles in the practices: 1) active, containing expressions of being an intentional participant who can affect the activities; 2) adaptation, containing descriptions of being a passive participant in the practice or doing activities that someone else has ordered them to do; and 3) bystander, containing reports of not being involved in, or having an unorganized perception of, one’s role in the practice. the analysis was conducted by the first, second and third authors. the categories derived from the analysis were critically assessed by the research group at the end of each analysis phase in order to enhance the trustworthiness and credibility of the analysis and results (e.g., miles & huberman, 1994). in the few cases of disagreement, a consensus on the final categorization was reached through discussion amongst the researchers. to increase the reliability of our analysis, parallel coding was carried out independently by two co-authors on 67 % of the data (a total of 129 text segments). the inter-rater agreement was 100 % in the first phase, 81 % in the second phase, 95 % in the third phase, and 74 % in the fourth phase. in the cases of disagreement (particularly in phases 2 and 4), we relied on the coding of the co-author who had a background in stem-field research, since we presumed that she was more familiar with the knowledge practices of the stem fields. in the findings section, we provide direct quotations from the participants’ descriptions, translated from finnish to english.
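as an illustration of the parallel coding check described above, the following minimal sketch computes the share of segments on which two coders agree. the role labels and the eight example segments are hypothetical and are not data from the study. cohen’s kappa is added only as a common chance-corrected complement; the study itself reports percentage agreement, and the function names are assumptions made for this example.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of segments to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement (Cohen's kappa) for two coders."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Agreement expected if both coders labelled independently at their own base rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical role codes for eight text segments (NOT data from the study).
coder_1 = ["active", "active", "adaptation", "bystander",
           "active", "adaptation", "active", "active"]
coder_2 = ["active", "active", "adaptation", "active",
           "active", "bystander", "active", "active"]

print(f"agreement: {percent_agreement(coder_1, coder_2):.0%}")
print(f"kappa: {cohen_kappa(coder_1, coder_2):.2f}")
```

in this made-up example the coders agree on six of eight segments; kappa is lower than the raw agreement because one category dominates both coders’ labels, which is one reason the two measures are sometimes reported together.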
the quotations were selected to illustrate the particular category as well as to highlight the differences between the categories. for each category, there were several potential illustrative quotations available from each discipline. the most comprehensive quotations were chosen from each category while ensuring that all disciplines were equally represented.

4. results

the doctoral students described a variety of knowledge practices (f=192). the practices ranged from individual work with research instruments to dialogues about theories and observation, as well as shared problem solving and making new discoveries. the reported practices also differed in terms of how established or innovative they were, as described by the participants. furthermore, the participants described their role in the reported practices in varying ways.

4.1 established and innovative knowledge practices

the majority of all the knowledge practices reported were established everyday practices cultivated by the researcher community (75 %). such practices typically involved mastering and using research instruments and methods, and defining and planning the research topics and processes. the students also described practices related to scientific writing and publishing. one of the participants described such a practice in the following way: in the beginning, my time was spent grasping the laboratory practices and that sort of stuff. and i do use all of them quite diversely, the different laboratory techniques, i mean. and they are demanding; i didn’t even learn them at first. so, they do require a slow and steady pace to get the hang of them. (medicine 2) occasionally, the students described established practices resulting in a discovery. the established practices resulting in discoveries had often been cultivated and sustained by the researcher communities for a long time.
in these cases, the method itself was established and well-known, but it was used in a way that resulted in originality, as the following excerpt shows: in a way, the techniques i use in that [study] are the ones that have been the practice in many laboratories for a long time, but in many places these are no longer used. still, our group has trusted that this is the way to go… in the end we took a real risk and it turned out to be fruitful, and of course that was really motivating. (bio 2) sometimes the participants reported practices that were innovative, entailing the transformation of current practices and the development of new ones. these practices were less typical (25 %) than established practices. innovative practices are of utmost interest where knowledge creation is concerned. they typically emerged in situations where established practices did not work or did not provide solutions for the problems faced. accordingly, they were characterised by learning from errors. the innovative practices were either reformed or modified from established ones, or were new practices created for solving novel problems. these practices were typically related to developing research ideas and theoretical observations, solving empirical problems and mastering research techniques, as well as getting results and making discoveries. the new ways of working were often associated with aiming at, or actually constructing, new knowledge: a new research idea, theoretical observation, or scientific discovery. occasionally, the students faced research-related problems and found solutions on their own: so i have been testing different techniques as a kind of pioneer work, as there hasn’t been anyone in the research group who’s used these methods… i have, kind of, made these tools up for myself and that is the reason why it has taken such a long time… and i often find myself at a dead end.
(bio 18)

4.2 collective and individual knowledge practices

the participants described that the knowledge practices involved not only the students themselves, but also others, typically their supervisors, peers or other researchers from their researcher groups. hence, the reported practices were mostly collective (67 %), reflecting the fact that research was often carried out in research groups (table 1). the students reported that engaging in conceptual discussion and working with theoretical ideas, defining and planning their research work, as well as mastering research techniques and writing a publication were the kinds of practices that involved their supervisors, other senior researchers and peers. for instance, the supervisors and colleagues were often active in providing suggestions and guidelines for their students in choosing their research topics, as well as planning their research processes: depending on the article i’ve been working on, many of my colleagues have collaborated with me… discussing how to do this and this. depending on the research questions, the procedure has been different with different people. so maybe this says something about the multidisciplinarity we’ve had, having all these different people with their different viewpoints involved in a single project. (natural science 4) further, while the collective practices were also typically described as established activities, there were, interestingly, more descriptions of innovative activities among collective practices than among individual practices. in stem fields, solving complex problems through laboratory or field research often requires intensive group-based collaborative research practices, so that not only knowledge creation but also researcher development is highly embedded in such practices.
one participant expressed how he had started to develop his own research ideas and increasingly became involved in dialogues throughout the doctoral process: in the beginning, it was mostly the supervisors who had the ideas, that we could do things this or that way. but the longer it took, the more i got into the practicalities and learned to deal with them. after that, i’ve been able to think more about what i want to research next, and to bring more and more of my own ideas to the brainstorming. (medicine 3) participants described a third of all the reported practices as individual (33 %), such as working on their own and learning to use research methods, instruments and devices through individual study or experimentation. the students also described their individual responsibilities and the challenges they faced, such as experiences of being without support, as one participant describes: so if the group does something for the first time. i feel that it’s sort of my responsibility. and it slightly burdens me, because i haven’t got any training for that. and the supervisors, they are clearly not able to help. then you feel quite alone. (medicine 2)

4.3 the engagement of doctoral students in knowledge practices

further investigation showed that the students typically experienced being actively engaged in knowledge practices (79 %). hence, they perceived themselves as active actors and intentional participants who were able to affect activities and make decisions in the practice at hand. this is key for cultivating relational agency, both in terms of engaging in knowledge creation in order to deliver original research output and in terms of becoming a full member of the researcher community.
active engagement was described in established and innovative, collective and individual practices related to using and mastering research methods, instruments, or techniques, as well as working with conceptual and theoretical problems, and developing and sharing ideas: i just went through and compared the comments, and there was this sort of eureka moment i had, that maybe all i need to do is to decide for myself. i realized that i have my own opinion about where this should go, and it was very close to what one of my supervisors thought as well. but, it was also against the view of my other supervisor. but then again, in the end, i just made my own decision. and i came to the conclusion that even in the so-called ‘hard sciences’ there really isn’t always exactly one truth to follow. (natural science 3) in the following, another participant expresses how he had an active role and control over his research work: i’ve been given a lot of room for my own self-guidance, and my own thoughts and implementations. and i’ve never really had any difficulties in getting my own thoughts about what i wanted to do heard. so in that regard it’s been quite rewarding, and i’ve been given the opportunity to do plenty of different kinds of things. (bio 18) doctoral students less frequently described adapting to existing activities and ideas. such experiences were only occasionally reported among all the instances of knowledge practices (13 %). a characteristic of these experiences was that the students considered themselves to be passive participants who were doing activities and work that someone else wanted or had ordered them to do. accordingly, developing relational agency is neither easy nor a self-evident result of carrying out doctoral research.
if doctoral students are not given the opportunity to participate in and contribute actively to the knowledge practices of their researcher community, including the opportunity to experiment and even to fail, their opportunities for learning to become a researcher are also limited. in some cases, the students believed that the way others carried out the activities was not meaningful for them. such roles were often described in association with established and collective practices. furthermore, these descriptions were typically related to planning the research topic and process, developing research ideas, as well as writing an article. one of the participants expressed his adaptive role in choosing and conducting his doctoral research in the following way: i was told that i should be working on this doctoral dissertation topic. they needed a candidate for it. and i just went and started in that project where they had the opening for one more student and was stuck there. i could not choose the topic myself, but partly there was kind of pressure to have this topic that was worth four academic articles. and i need those four. but in this situation, it is not that i can just creatively come up with something to research. something i’d find interesting to look further into by myself. but it’s just not possible. if the thing you’re working on doesn’t sound [to the supervisors] like it’s going to be good enough, then it’s not worth spending your time on, and you would be told to work on other stuff. kind of from the top down. (medicine 4) doctoral students rarely considered themselves bystanders (8 %), in other words, observers who were left outside the practice. yet experiencing oneself as a bystander can be considered highly problematic, since it limits the doctoral student’s learning both in terms of becoming an independent researcher and in terms of delivering original research output.
in the cases where this did occur, they described seemingly unclear and unorganised perceptions of their role in the practice. the bystander role was typically associated with established and collective practices related to, for instance, defining and discussing the research topic and plan, designing the research questions or writing a publication. one participant described his role as a bystander in the following way: then, our clinician actually wrote the paper because i did not have the right clinical background for that. (bio 6)

5. discussion

our results show that established knowledge practices played an important role in cultivating doctoral students’ insights into their research and in developing creative thoughts and behaviours, enabling them to define a problem space and to solve problems within it. such practices are typically well tested and cultivated over a long period of time by the researcher community, and hence provide a grounding for its knowledge creation. engaging doctoral students in these baseline knowledge practices is key both for their becoming full members of the community and for teaching them about the research and disciplinary practices, as mastery of existing ideas and tools is often a precondition for creativity. the established knowledge practices served as the basis for making discoveries and, hence, for the creation of new knowledge (see also sternberg & lubart, 1999). accordingly, the established knowledge practices provided a vehicle for introducing doctoral students to the researcher communities and engaging them in those communities, i.e. socialising the student as a novice into the academic community (becher & trowler, 2001). they also provided a starting point for researcher development.
in addition, the existence and extent of the innovative practices reported in our dataset is encouraging, since it implies that doctoral students are contributing to novel ways of working and to the transformation of their respective fields of study (trafford & leshem, 2009; wellington, 2012). engaging in such practices also provides an opportunity to learn from mistakes and to use them to further cultivate the established practices. more importantly, it allows doctoral students to learn how to develop new collective knowledge practices in order to create knowledge in their field. doctoral students are frequently found to face academic isolation and a lack of academic connections (see for example ali & kohun, 2006; austin, 2009) that hinder their progress. our results on knowledge practices suggest that the doctoral students in the stem fields engaged primarily in collective practices. this is partly explained by the fact that the majority of the participants engaged in intensive research group collaborations and conducted article-based dissertations that typically included articles co-authored with senior members of the group. however, the result cannot be reduced to research group status, since the majority of students in the sciences reported that they did not carry out their dissertation work in a research group. accordingly, what matters seems to be less the structure than the quality of the knowledge practices in which doctoral students engage. it follows that students can be actively engaged in collective practices even though they are not formally carrying out their dissertation work in a group. this finding is in accordance with our prior results on medical, humanities, and behavioural science doctoral students, which showed that more than half of the students perceived themselves as members of a scholarly community and its practices.
no statistically significant differences were detected in the previous study even though in medicine the majority of the students worked in a research group and carried out article-based dissertations, while in the humanities the students were more likely to follow a monograph dissertation format and were not formally engaged in research groups (pyhältö, stubb, & lonka, 2009). even though working collectively does not guarantee that doctoral students will not experience feelings of isolation, the data presented in this article can be considered encouraging, since the advantages of collaboration for productivity, and thus also for knowledge creation, have been emphasised (see for example becher & trowler, 2001). the results also suggest that the doctoral students typically considered themselves active participants in knowledge practices instead of mere adapters or bystanders. this provides a good grounding for cultivating the doctoral students’ relational agency within their researcher communities in terms of knowledge practices that may contribute to eventual knowledge creation. given that research groups and collective work are typical in stem areas, this finding is not surprising. yet, it partly contradicts some of our earlier findings suggesting that only a minority of doctoral students in humanities, behavioural sciences, and medicine perceived themselves as active relational agents in their own researcher communities (pyhältö & keskinen, 2012). however, our results also imply that not all the students enjoy equal opportunities to exercise relational agency. moreover, it is important to note that active engagement in knowledge practices in their researcher groups does not guarantee an active role in other activities of the group or in other researcher communities.
as the stakes are high for doctoral students to complete their studies in a timely manner, our findings about doctoral students’ active role imply that doctoral education provided an engaging learning environment for knowledge creation for the majority of our participants (see also frick, 2010). however, taking on the role of adapter or bystander should not be viewed with outright suspicion, as mastery that supports knowledge creation requires an understanding of existing knowledge and an immersion in the field before such a field can be extended or transformed through new and original work (dewett et al., 2005; sternberg & lubart, 1999). yet it can be considered highly problematic if students spend the majority of their time as either adapters or bystanders, in which case they may become stuck in these roles or resort to mimicking others’ knowledge work rather than creating their own contribution and thereby transforming the field (kiley, 2009). the significance of this study lies in exploring doctoral students’ self-reported knowledge practices in stem fields and in showing that they typically perceive themselves as actively and collaboratively engaged in practices that transform their respective fields of study. moreover, the study indicates that doctoral knowledge creation embedded in knowledge practices in the studied stem areas is not only an individual cognitive endeavour. instead, it is also a collective process, which takes place in a broader scientific community and is not limited to conducting a doctoral dissertation in a research group.

5.1 methodological considerations

the strength of the chosen qualitative design was that it enabled a multifaceted and deep investigation of doctoral students’ experiences of knowledge practices. in addition, the multiphase analysis enabled investigation of the knowledge practices from various perspectives.
however, one problem with the retrospective approach used is that it is susceptible to the memory effect (cox & hassard, 2007), which may make it difficult for participants to recall their experiences (kvale, 2007). at the same time, use of the retrospective approach ensured that the participants had a chance to deeply reflect on their experiences and recall the most significant past events (kvale, 2007). the majority of the participants were in the middle or in the last part of the doctoral process and, because of their experience, they had had more opportunities to be involved in and gain experience with various kinds of knowledge practices. the interview data were collected from 34 doctoral students in the stem fields from a large research-intensive university in finland. because of the distinctive features of the disciplines included (e.g., lindblom-ylänne et al., 2006) and the limited sample size, the results should be generalised to other fields and other countries with caution. knowledge creation practices evolve over time and, hence, further research is needed to explore the knowledge practices among researcher communities from different domains and from a longitudinal perspective.

5.2 implications for doctoral education

our results indicate that doctoral students can have active roles and be intentional participants in various scientific knowledge practices. further, the findings suggest that active engagement in knowledge practices can be enabled by supporting doctoral students to influence or direct the surrounding knowledge creation activities. this requires further developing strategies that promote the intentional participation of students in scientific activities and practices (pyhältö & keskinen, 2012; zhao & kuh, 2004).
active engagement can be supported through environments that enable doctoral students to share their knowledge and expertise with others, take more responsibility for and ownership of their research activities, and perceive themselves as contributing members of their community (e.g., dunlap, 2006; mcalpine & amundsen, 2009). for instance, the more experienced members of the researcher communities, such as supervisors, senior researchers and post-doctoral fellows, could support and encourage doctoral students to take increasingly more ownership and responsibility for planning, monitoring and evaluating the everyday practices of knowledge creation. such practices, according to our results, could be planning and conducting actual research work, theoretical problem solving, and dialogues on research ideas. supporting the active role of doctoral students in knowledge practices is likely to be an investment in the quality of future academic work. at best, active doctoral students will become autonomous scientists who create new, high-quality knowledge.

keypoints

the study aims to contribute to the doctoral education literature by exploring the kinds of knowledge practices in which doctoral students in the stem fields engage during their studies.
the significance of this study lies in exploring doctoral students’ self-reported knowledge practices.
this study demonstrates that the doctoral students perceive themselves as actively and collaboratively engaged in the knowledge practices.
the study concludes that active engagement in knowledge practices can be enabled by supporting doctoral students to influence or direct the surrounding knowledge creation activities.

references

ali, a., & kohun, f. (2006). dealing with isolation feelings in is doctoral programs. international journal of doctoral studies, 1(1), 21–33. austin, a.e. (2009).
cognitive apprenticeship theory and its implications for doctoral education: a case example from a doctoral program in higher and adult education. international journal for academic development, 14(3), 173–183. https://doi.org/10.1080/13601440903106494 bandura, a. (2001). social cognitive theory: an agentic perspective. annual review of psychology, 52, 1–26. https://doi.org/10.1146/annurev.psych.52.1.1 becher, t. & trowler, p. r. (2001). academic tribes and territories. intellectual enquiries and the culture of disciplines (2nd ed.). open university press, buckingham. bereiter, c. (2002). education and mind in the knowledge age. mahwah, nj: lawrence erlbaum associates. brew, a., boud, d., & namgung, s. u. (2011). influences on the formation of academics: the role of the doctorate and structured development opportunities. studies in continuing education, 33(1), 51–66. https://doi.org/10.1080/0158037x.2010.515575 cox, j. w., & hassard, j. (2007). ties to the past in organization research: a comparative analysis of retrospective methods. organization, 14(4), 475–497. https://doi.org/10.1177/1350508407078049 creswell, j. (2012). qualitative inquiry & research design. choosing among five approaches (3rd ed.). london: sage publishers. cumming, j. (2009). the doctoral experience in science: challenging the current orthodoxy. british educational research journal, 35(6), 877–890. https://doi.org/10.1080/01411920902834191 delamont, s., & atkinson, p. (2001). doctoring uncertainty: mastering craft knowledge. social studies of science, 31(1), 87–107. https://doi.org/10.1177/030631201031001005 dewett, t., shin, s. j., toh, s. m., & semadeni, m. (2005). doctoral student research as a creative endeavour. college quarterly, 8(1), 1–20. dunlap, j. c. (2006). the effect of a problem-centered, enculturating experience on doctoral students’ self-efficacy. interdisciplinary journal of problem-based learning, 1(2), 19–48. https://doi.org/10.7771/1541-5015.1025 engeström, y. (1999). 
activity theory and individual and social transformation. in y. engeström, r. miettinen, & r. punamäki (eds.), perspectives on activity theory. learning in doing: social, cognitive and computational perspectives (pp. 19–38). cambridge: cambridge university press. fenge, l. a. (2012). enhancing the doctoral journey: the role of group supervision in supporting collaborative learning and creativity. studies in higher education, 37(4), 401–414. https://doi.org/10.1080/03075079.2010.520697 florence, m. k., & yore, l. (2004). learning to write like a scientist: co-authoring as an enculturation task. journal of research in science teaching, 41(3), 637–668. https://doi.org/10.1002/tea.20015 frick, b. l. (2010). creativity in doctoral education: conceptualising the original contribution. in c. nygaard, n. courtney & c.w. holtham (eds.), teaching creativity – creativity in teaching. oxfordshire: libri publishing. frick, b. l. & brodin, e. m. (2014). developing expert scholars: the role of reflection in creative learning. in e. shiu (ed.), creativity research: an interdisciplinary and multidisciplinary research handbook (pp. 312–333). london: routledge. furner, j. (2003). little book, big book: before and after little science, big science: a review article, part i. journal of librarianship and information science, 35(2), 115–125. https://doi.org/10.1177/0961000603352006 gardner, s. k. (2007). “i heard it through the grapevine”: doctoral student socialization in chemistry and history. higher education, 54(5), 723–740. https://doi.org/10.1007/s10734-006-9020-x golde, c. m. (2010). entering different worlds. socialization into disciplinary communities. in s. k. gardner & p. mendoza (eds.), on becoming a scholar. socialization and development in doctoral education (pp. 79–95). virginia, usa: stylus publishing, llc. greeno, j. g. (2006). authoritative, accountable positioning and connected, general knowing: progressive themes in understanding transfer. 
the journal of learning sciences, 15, 537–547. http://dx.doi.org/10.1207/s15327809jls1504_4 hadwin, a., & oshige, m. (2011). self-regulation, coregulation and socially shared regulation: exploring perspectives of social in self-regulated learning theory. teachers college record, 113(2), 240–264. hakkarainen, k., hytönen, k., makkonen, j., seitamaa-hakkarainen, p., & white, h. (2013). interagency, collective creativity, and academic knowledge practices. in a. sannino & v. ellis (eds.), learning and collective creativity: activity-theoretical and sociocultural studies (pp. 77–95). london: routledge. hakkarainen, k., palonen, t., paavola, s., & lehtinen, e. (2004), communities of networked expertise. professional and educational perspectives . amsterdam: elsevier. hakkarainen, k. p., wires, s., keskinen, j., paavola, s., pohjola, p., lonka, k., & pyhältö, k. (2014). on personal and collective dimensions of agency in doctoral training: medicine and natural science programs. studies in continuing education, 36(1), 83–100. https://doi.org/10.1080/0158037x.2013.787982 hancock, s., hughes, g., & walsh, e. (2017). purist or pragmatist? uk doctoral scientists’ moral positions on the knowledge economy. studies in higher education, 42(7), 1244–1258. https://doi.org/10.1080/03075079.2015.1087994 harry, b., sturges, k. m., & klingner, j. k. (2005). mapping the process: an exemplar of process and challenge in grounded theory analysis. educational researcher, 34(2), 3–13. https://doi.org/10.3102/0013189x034002003 holland, d., lachiocotte, w., skinner, d., & cain, c. (1998). identity and agency in cultural worlds. cambridge, ma: harvard university press holmes, l. (2004). challenging the learning turn in education and training. journal of european industrial training, 28(8/9), 625–638. https://doi.org/10.1108/03090590410566552 hopwood, n. (2010). a sociocultural view of doctoral students’ relationships and agency. studies in continuing education, 32(2), 103–117. 
https://doi.org/10.1080/0158037x.2010.487482 jazvac-martek, m., chen, s., & mcalpine, l. (2011). tracking the doctoral student experience over time: cultivating agency in diverse spaces. in l. mcalpine & c. amundsen (eds.), doctoral education: research-based strategies for doctoral students, supervisors and administrators (pp. 17–36). netherlands: springer. john-steiner, v. (2000). creative collaboration. oxford university press. kamler, b. (2008). rethinking doctoral publication practices: writing from and beyond the thesis. studies in higher education, 33(3), 283–294. https://doi.org/10.1080/03075070802049236 kiley, m. (2009). identifying threshold concepts and proposing strategies to support doctoral candidates. innovations in education and teaching international, 46(3), 293–304. https://doi.org/10.1080/14703290903069001 kvale, s. (2007). doing interviews. london: sage publications. lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge: university press. lindblom-ylänne, s., trigwell, k., nevgi, a., & ashwin, p. (2006). how approaches to teaching are affected by discipline and teaching context. studies in higher education, 31(3), 285–298. https://doi.org/10.1080/03075070600680539 lipponen, l., & kumpulainen, k. (2011). acting as accountable authors: creating interactional spaces for agency work in teacher education. teaching and teacher education, 27, 812–819. http://dx.doi.org/10.1016/j.tate.2011.01.001 mathieson, s. (2011). developing academic agency through critical reflection: a sociocultural approach to academic induction programmes. international journal for academic development, 16(3), 243–256. https://doi.org/10.1080/1360144x.2011.596730 mcalpine, l., & amundsen, c. (2009). identity and agency: pleasures and collegiality among the challenges of the doctoral journey. studies in continuing education, 31(2), 109–125. https://doi.org/10.1080/01580370902927378 mcalpine, l., & norton, j. (2006). 
reframing our approach to doctoral programs: an integrative framework for action and research. higher education research & development, 25(1), 3–17. https://doi.org/10.1080/07294360500453012 mcalpine, l., & åkerlind, g. (2010). academic practice in a changing international landscape. in l. mcalpine & g. åkerlind (eds.), becoming an academic. international perspectives (pp. 1–15). united kingdom: palgrave macmillan. miles, m. b., & huberman, a. m. (1994). qualitative data analysis (2nd ed.). thousand oaks, ca: sage publications. mills, j., bonner, a., & francis, k. (2006). the development of constructivist grounded theory. international journal of qualitative methods, 5(1), 25–35. https://doi.org/10.1177/160940690600500103 morgan, d. l. (2007). paradigms lost and pragmatism regained: methodological implications of combining qualitative and quantitative methods. journal of mixed methods research, 1(1), 48–76. https://doi.org/10.1177/2345678906292462 nonaka i, takeuchi h. (1995). the knowledge creating company. oxford university press: new york. o’ meara, k., terosky, a.l. & neumann, a. (2008). faculty careers and work lives: a professional growth perspective. ashe higher education report, 34 (3). san fransisco, ca: jossey-bass. o’ meara, k, & campbell, c. m. (2011). faculty sense of agency in decisions about work and family. the review of higher education, 34 (3), 447–476. http://dx.doi.org/10.1353/rhe.2011.0000 paavola, s., lipponen, l., & hakkarainen, k. (2004). models of innovative knowledge communities and three metaphors of learning. review of educational research, 74(4), 557–576. https://doi.org/10.3102/00346543074004557 pyhältö, k., & keskinen, j. (2012). doctoral students’ sense of relational agency in their scholarly communities. international journal of higher education, 1(2), 136–149. pyhältö, k., nummenmaa, a. r, soini, t., stubb, j., & lonka, k. (2012). research on scholarly communities and development of scholarly identity in finnish doctoral education. 
in s. ahola & d. m. hoffman (eds.), higher education research in finland. emerging structures and contemporary issues (pp. 337–357). jyväskylä: jyväskylä university press. pyhältö, k., pietarinen, j., & soini, t. (2012). do comprehensive school teachers perceive themselves as active professional agents in school reforms? journal of educational change, 13(1), 95–116. pyhältö, k., stubb, j., & lonka, k. (2009). developing scholarly communities as learning environments for doctoral students. international journal for academic development, 14(3), 221–232. https://doi.org/10.1080/13601440903106551 pyhältö, k., stubb, j., & tuomainen, j. (2011). international evaluation of research and doctoral education at the university of helsinki to the top and out to society. summary report on doctoral students’ and principal investigators’ doctoral training experiences. retrieved from http://wiki.helsinki.fi/display/evaluation2011/survey+on+doctoral+training sfard, a. (1998). on two metaphors for learning and the dangers of choosing just one. educational researcher, 27(2), 4–13. https://doi.org/10.3102/0013189x027002004 sternberg, r. j., & lubart, t. i. (1999). the concept of creativity: prospects and paradigms. in r.j. sternberg (ed.), handbook of creativity (pp. 3–15). cambridge: cambridge university press. stubb, j., pyhältö, k., & lonka, k. (2014). conceptions of research: the doctoral student experience in three domains. studies in higher education, 39(2), 251–264. https://doi.org/10.1080/03075079.2011.651449 trafford, v. & leshem, s. (2009). doctorateness as a threshold concept. innovations in education and teaching international, 46(3), 305–316. https://doi.org/10.1080/14703290903069027 united kingdom quality assurance agency for higher education (2008). the framework for higher education qualifications in england, wales and northern ireland. mansfield: linney direct. volet, s., vauras, m., & salonen, p. (2009).
self- and social regulation in learning contexts: an integrative perspective. educational psychologist, 44(4), 215–226. https://doi.org/10.1080/00461520903213584 walker, g. e., golde, c. m., jones, l., conklin bueschel, a., & hutchings, p. (2008). the formation of scholars. rethinking doctoral education for the twenty-first century. san francisco, usa: jossey-bass. wellington, j. (2012). searching for doctorateness. studies in higher education, 38(10), 1490–1503. https://doi.org/10.1080/03075079.2011.634901 zhao, c., & kuh, g. d. (2004). adding value: learning communities and student engagement. research in higher education, 45(2), 115–138.

harteis et al., frontline learning research vol. 6 no. 3 (2018) 37–56, issn 2295-3159

do we betray errors beforehand? the use of eye tracking, automated face recognition and computer algorithms to analyse learning from errors

christian harteis a, christoph fischer a, torben töniges b, britta wrede b
a paderborn university, germany
b technical faculty, bielefeld university, germany

article received 14 may / revised 25 september / accepted 26 september / available online 7 december

abstract

preventing humans from committing errors is a crucial aspect of man-machine interaction and systems of computer assistance. it is a basic implication that those systems need to recognise errors before they occur. this paper reports an exploratory study that utilises eye-tracking technology and automated face recognition in order to analyse test persons’ emotional reactions and cognitive load during a computer game and learning through trial and error. computer algorithms based on machine learning and big data were tested that identify particular patterns of test persons’ gaze behaviour and facial expressions that antecede errors in a computer game. the results show that emotions and learning from errors are positively correlated and that gaze behaviour and facial expressions inform about the errors that follow.
however, the algorithms still need to be improved through further studies to be suitable for daily use. this research is innovative in its use of mathematical formulae to operationalise learning through errors and the use of computer algorithms to predict errors in human behaviour in trial-and-error situations.

keywords: face recognition; eye tracking; emotions; learning from errors

corresponding author: christian.harteis@upb.de
doi: https://doi.org/10.14786/flr.v6i3.370

1. introduction: research problem

working life is becoming increasingly complex and challenging, particularly through technological development and digitalisation that aim to enable flexible work processes. as a result, the organisation of work is changing, as well as – in consequence – working tasks and working tools. these become more difficult and the need for efficacy may generate time pressures. under these conditions, the risk of errors arises. estimates of the amount of worktime spent on errors in enterprises differ, but range as high as half of the entire worktime (hofman & frese, 2011). on the one hand, systems engineering can strive to develop intelligent systems that prevent humans from committing errors. on the other hand, research into complex systems has revealed that it is not possible for human activity to avoid errors completely. hence, learning from errors becomes a relevant issue because it is at least possible to avoid the repetition of errors (harteis & bauer, 2014). there is no contradiction in simultaneously trying to develop systems that prevent errors (as far as possible) and postulating learning from errors, because both issues are interrelated. rather, understanding how to learn from errors is a precondition for developing man-machine interaction that assists in error avoidance. hence, the main research problem addressed here is how to understand learning from errors in order to provide safety through error prevention in man-machine interaction.
theoretically, learning from errors has the following preconditions, none of which is trivial in the context of work (bauer & mulder, 2013; oser & spychiger, 2005): (a) the error has to be identified, (b) feedback to the acting person has to occur and (c) reflection and cause analyses have to result in the creation of negative knowledge – that is, knowledge about how things are not shaped and how processes do not work (gartmeier, bauer, gruber, & heid, 2008; oser, naepflin, hofer, & aerni, 2012). in addition, during these processes, the individual concernment of the failing person has to occur. “concernment refers to the emotional reaction, in which the error embarrasses the actor in a certain way. such an emotional reaction adds value to the experience of the error situation” (harteis & bauer, 2014, p. 710). this added value attaches sufficient importance to the error to initiate the cause analysis – subjectively unimportant events can easily be neglected – and adds authority to the knowledge resulting from reflecting on the cause analyses. this ultimately supports appropriate storage in memory by amplifying the episodic memory and, thus, supports learning from the error in a way that prevents its repetition (oser et al., 2012). however, while it is known that concernment resulting in emotional engagement is a crucial precondition for learning from errors, there is so far no evidence about the kind of valence of emotions (e.g. positive or negative) that best supports learning from errors. to sum up: any kind of emotional reaction in an error situation can be considered a basic precondition for learning from errors. it was oser who identified situations that almost – but not finally – ended up with errors as incidents of interest for their suggestion of a sense of failure (oser, müller, obex, volery, & shavelson, 2018; oser & obex, 2015). of course, in order to prevent errors, it is important not only to learn from errors but also from incidents in which errors nearly occur.
in general, the crucial moment of learning from near misses is the emotional reaction that arises when somebody realises that an error is about to occur or that an error almost happened. cause analysis and reflection upon the incident play a similar role here as they do for learning from errors. however, investigations into this sense of failure revealed that emotional reactions that accompany (almost) error situations do not necessarily occur as a reaction to the incident itself but may arise shortly before the error occurs (oser & volery, 2012). whereas emotional reactions after an error tend toward embarrassment, cognitive load is considered to be the reason for an emotional reaction before an (almost) error situation (de jong, 2010). cognitive load refers to the working load within the limited capacity of the short-term memory (sweller, 1994). when cognitive load becomes too big, the actor’s capacity for information processing and problem perception decreases so that errors become probable. to investigate the research problem stated above, several issues have to be considered: emotional reactions and cognitive load are important phenomena in relation to the occurrence of errors or near misses and are important indicators that can help to prevent upcoming errors. the challenge, of course, lies in how best to operationalise and measure these phenomena. empirical research on learning from errors and near misses has to date applied self-reporting methods, that is, interviews and questionnaires. investigating error situations in work contexts is particularly challenging because companies tend to avoid publishing business processes. there are studies investigating employees’ attitudes towards errors at work (e.g. hetzner, gartmeier, heid, & gruber, 2011) which make use of standard self-report questionnaires (e.g. 
the error orientation questionnaire – rybowiak, garst, frese, & batinic, 1999) and there are studies investigating ways of dealing with error situations in daily working life which apply self-report questionnaires or interviews (e.g. harteis, bauer, & gruber, 2008). the current state of research thus has to acknowledge the following problems: studies operating with questionnaires focus either on general attitudes towards errors or they introduce a constructed error situation (e.g. through case stories or vignettes) and ask for potential reactions. neither option provides any insight into how a person actually behaves and reacts in a real error situation. in addition, whether the constructed situation causes a similar emotional engagement to real situations remains a matter for speculation. studies focusing on incidents that participants actually experienced usually ask test persons for episodes in which an error occurred and ask them to describe how the people concerned dealt with this situation. however, it is difficult to relate different cases described by different test persons to each other because the cases themselves represent error situations of different dimensions, because it is unclear how representative the described cases are for the test persons’ (work) environment and because the descriptions are probably subjectively biased. by their nature, self-reports feature only those mental and emotional processes that test persons are aware of and can remember. those studies therefore neglect whatever may remain unconscious or cannot be recalled. obviously, there is a research gap in the studies that investigate learning from errors, requiring a study that (a) on the one hand reliably provides stable and repeatable error situations for an entire sample, (b) does not depend on subjective biases and memory performances, and (c) is able to grasp unconscious emotional reactions. 
this study aims to test particular online measures of emotional reactions and cognitive load.

2. research questions and study context

in order to reach the research aims, research questions were formulated that address theoretical issues and issues of online measurement. since field studies have to accept the problems described above, a laboratory setting appears appropriate to establish conditions that are identical for all test persons and exactly repeatable. admittedly, a laboratory environment lacks authenticity. however, it should be acceptable as long as test persons develop concernment when failing during the experiment. hence, the present study used a regular jump-and-run computer game (give up 2 by armor games) and controlled for test persons’ involvement.

2.1 research questions

the research questions for this study can be separated into thematic (rq 1 and rq 2) and methodological (rq 3 and rq 4) questions.

rq 1. are there emotional reactions and indicators for cognitive load to be found that precede errors? the hypothesis to be tested is that emotional reactions and/or cognitive load precede errors. an answer to this question is relevant for the intention to anticipate errors before their occurrence.

rq 2. is the quality of learning related to emotional reactions? the hypothesis to be tested is that better learners show stronger emotional reactions than worse learners do. this question tests oser’s theory about the importance of concernment for learning from errors.

rq 3. are there appropriate online measures that indicate emotional reactions? this study also aims to test particular online measures for their suitability for educational research questions.

rq 4. is there a specific measure for the quality of learning from errors? a computer game with exactly repeatable conditions allows the combination of indicators for learning from errors and degree of difficulty.
2.2 description of the computer game

give up 2 is a computer game in which the player operates a figure that needs to overcome obstacles and dangers on various levels of increasing difficulty. two kinds of error can occur in this game: (a) the player fails to overcome the obstacle with the figure or (b) the figure gets attacked by weapons and dies. in that case, the player has to start from the beginning of the respective level again. the course of action always remains the same, i.e. each player faces the same conditions.

video 1. demonstration of the computer game

this game makes it possible to place different test persons in comparable problem settings that provoke errors. it provides a competitive scenario that should motivate the test persons to perform as well as possible.

3. methods of data collection, analyses and challenges

this main section of the paper describes the procedures of data collection, data preparation and data analysis. first, a flowchart illustrates the sequence of data processing. then, the system configuration and the sample description provide insight into the way the data collection was realised (3.1). the raw data were then prepared (3.2) for further analyses, exploring the predictors of errors (3.3) and learning from errors (3.4). the description of the analyses applied here also comprises a discussion of the challenges, because novel approaches were tested. figure 1 shows a flowchart of the procedures that were applied for this investigation. they comprise regular and well-established approaches to eye and face analysis utilising particular parameters and procedures that will be explained within this section. the flowchart presents the sequence showing how the data were prepared and analysed.

figure 1. flow chart of data acquisition and processing.

for eye analyses, pupil diameters were used to grasp cognitive load, and face analyses used standard tools (i.e.
the openface toolkit and affectiva framework) that apply particular parameters to identify emotional reactions. these data were aggregated to temporal bindings per item, which were then used for machine learning. the result of the data-mining process is a trained classifier for the prediction of an error.

3.1 test procedure

data collection was realised with the remote eye-tracking system smi red250, using the software versions experiment center 3.7.60 and begaze 3.7.42, and a video camera (logitech c922 pro stream) installed at the top of the stimulus screen within a laboratory, with a headrest and steady artificial lighting, i.e. robust laboratory conditions with no brightness differences between test persons (holmqvist et al., 2011). additionally, the stimulus itself did not vary much in luminance. thirty-eight test persons with varying experience in computer gaming voluntarily took part in the experiment.

table 1. sample description.

before starting the experiment, the test persons filled in the consent form, received a video introduction to the game and were asked to confirm that they understood the game and the operation of the system. they also completed a questionnaire before and after the experiment describing their gaming experience and their engagement with the game measured by the personal involvement inventory (zaichkowsky, 1994). table 1 indicates that the test persons were sufficiently engaged. the test persons played the game for five minutes on the stimulus presentation screen using three keys of a keyboard, and they were observed by a video camera and remote eye-tracking system. the following in situ data were generated and utilised:

game video. via screen recording, a video of the game was generated, including all inputs of the test persons during the game.
facial video. the video camera recorded the test person’s face during the game.
pupil diameter.
the remote eye-tracking system constantly recorded the test person’s right pupil diameter during the game. changes in pupil diameter serve as indicators of cognitive load (szulewski, kelton, & howes, 2017).

3.2 data preparation

the synchronisation of these data was realised by time stamps implemented through the eye-tracking software. the challenge for subsequent analyses was to derive meaningful information from these data. therefore, they were further edited. as a first step, comprehensive annotations were added to the game video:

errors. every event in which a test person failed (i.e. being hit by an object or failing to overcome an obstacle) was marked as an ‘error’.
successes. every event in which a test person succeeded in avoiding an object or overcoming an obstacle was marked as a ‘success’.

3.3 analyses exploring predictors of errors (emotional reactions and cognitive load)

the face videos were analysed by applying the affectiva framework (mcduff, el kaliouby, & picard, 2015) and the openface toolkit (baltrušaitis, robinson, & morency, 2016). based on millions of facial recordings and facial images, these frameworks are able to extract crucial facial landmarks, such as eyebrows or mouth contours (see left side of video 2). these extracted points on each image are used to derive the head pose, gaze and facial action units (aus). the facial action coding system (facs, ekman & friesen, 1978) is a taxonomy for classifying various facial behaviours (e.g. au1: inner brow raise), and the frameworks used here were able to classify up to 17 of these aus (wolf, 2015). the affectiva framework also combines different aus to build up expression classes that are more abstract, such as emotional valence (i.e. the positive or negative nature of an emotion) and engagement (the expressiveness of the emotions).
these frameworks, particularly in combination, provide derived data on crucial landmarks of the face representing emotional reactions as well as their valence and engagement.

the next step of analysis aimed at identifying the precursors of errors. data-mining procedures were applied that utilised the derived data from the emotional reactions, and a binary classification was modelled. all recorded videos were divided into snippets of 300 milliseconds in length with an overlap of 150 milliseconds. the positive class within the classification was modelled as ‘error prediction’ and all snippets occurring before errors were marked. the negative class was modelled as ‘all the rest’ and all other data were assigned accordingly. to avoid noisy data, the negative class was filtered by removing those video snippets that followed an error event. the data were split into training (75%) and testing (25%) sets, each set containing the same percentage of positive and negative data. for each frame of the 300-millisecond snippets, the following 28 items were extracted by openface and affectiva:

17 aus, i.e. facial expressions.
valence.
engagement.
gaze angle x and gaze angle y. extracted in radians and averaged for both eyes. a person looking from the left to the right results in a change of gaze angle x, while a person looking from up to down results in a change of gaze angle y. if a person is looking straight ahead, both angles will be close to 0 (see figure 2).
normalised pupil diameter. if the surrounding lighting conditions are steady, the pupil diameter (see figure 3) can be used as an indicator of cognitive workload: the wider the diameter, the higher the workload of the person (beatty, 1982; krejtz, duchowski, niedzielska, biele, & krejtz, 2018; laeng, sirois, & gredebäck, 2012; szulewski, kelton, & howes, 2017).
head pose tx, ty, tz (location of the head).
head rotation rx (pitch), ry (yaw), rz (roll) (see figure 4).

figure 2. gaze angles.

figure 3. pupil diameter.
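the snippet construction and labelling described above can be sketched in python as follows. this is only an illustration under assumptions: the exact rule for deciding which snippets count as "before an error" and how long after an error snippets are discarded is not given in the text, so the one-step look-ahead and the hypothetical parameter post_error_ms are placeholders, as are all names.

```python
from dataclasses import dataclass

WINDOW_MS = 300  # snippet length given in the text
STEP_MS = 150    # 300 ms window with 150 ms overlap -> 150 ms step


@dataclass
class Snippet:
    start_ms: int
    end_ms: int
    label: int  # 1 = precedes an error, 0 = all the rest


def make_snippets(duration_ms, error_times_ms, post_error_ms=300):
    """Cut a recording into overlapping snippets and label them."""
    snippets = []
    for start in range(0, duration_ms - WINDOW_MS + 1, STEP_MS):
        end = start + WINDOW_MS
        # Assumption: a snippet is positive if an error occurs
        # within one step after the snippet ends.
        if any(end <= e < end + STEP_MS for e in error_times_ms):
            snippets.append(Snippet(start, end, 1))
            continue
        # Assumption: snippets that overlap an error, or start within
        # post_error_ms after it, are removed from the negative class.
        if any(end > e and start < e + post_error_ms for e in error_times_ms):
            continue
        snippets.append(Snippet(start, end, 0))
    return snippets


if __name__ == "__main__":
    for s in make_snippets(1500, [900]):
        print(s)
```

a stratified 75/25 split, as mentioned in the text, would then be applied to the resulting list so that both sets keep the same share of positive snippets.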
figure 4. head rotation.

figure 5. example of a linear regression for items within 300ms frame.

to take temporal dynamics into account, these items were further processed. for each of the 28 items, the temporal and stochastic variations were extracted. figure 5 shows an exemplary schematic representation: the red crosses represent items occurring within the respective timeframe. for each item, the following features were extracted:

linear regression parameters (l0, l1)
maximum value (max)
minimum value (min)
mean
standard deviation (std)

this results in 6 features per item, totalling 168 features, which were used as inputs for the machine-learning procedure. genetic programming (olson, urbanowicz, andrews, lavender, kidd & moore, 2016) was used to train an optimised machine-learning pipeline on the training set that would subsequently be used to evaluate the testing set. the pipeline consisted of multiple steps, such as feature selection, preprocessing, model selection and parameter optimisation. ultimately, the learned classifier could be used to classify unknown video snippets with the aim of identifying the precursors of errors. while this description of procedures presents the general rationale of the analyses, combinations and sets of features were also varied in order to explore answers to the research questions raised above.

video 2. demonstration of aggregated data.

3.4 analyses exploring learning from errors

at this point, two kinds of derived data had been generated: errors and successes on the one hand, and emotional reactions on the other. as a third kind of derived data, the quality of learning from errors had to be identified. disciplines that investigate learning through a formalised lens have developed the idea of calculating a learning curve to indicate quality of learning (jaber, 2016; yelle, 1979).
applied to the computer game, the learning curve (orange) can be defined by indicating the errors and successes (y-axis) of a specific task for all attempts (x-axis) at solving the task on the different game levels (separated by vertical lines; see figure 6).

figure 6. example of a learning curve.

as a computer game requires a quite specific kind of learning of one particular skill (i.e. mastering the task), a formalised perspective for differentiating qualities of learning appears appropriate. an overall comparison of the ratio of successes and errors would provide a measurement of the overall performance of the player but would grant no insight into skill development. in order to grasp this, it is necessary to evaluate the performance progression over time. changes in performance, indicated by changes in the slope of the learning curve, can be expressed as an angle (in figure 6: α and β). these angles indicate improvement if they have a positive value and a decline if they have a negative value. the design of the computer game provides a marked increase in playing difficulty at levels 2 and 6. each time the difficulty increases, the player has to readjust their behaviour and thereby improve their skill. hence, at levels 2 and 6, we can expect relatively more errors, compared to successes, than at levels 3, 4 and 5. the introduction of a new task problem provides the player with an opportunity to learn or improve their skill. the learning curve is thus steeper at levels 3, 4 and 5 than at levels 2 and 6. a possible approach to calculating learning quality contrasts the angles of the learning curve between levels 2 and 5 and between levels 5 and 6. level 5 represents the last easy playing level; the learning curve is considered to be steepest here. levels 2 and 6 represent difficult playing levels.
looking at the absolute slope of the learning curve would be biased in favour of test persons who started with an already high skill level; however, it is not the intention to measure a test person’s absolute skill level but their skill development. hence, the slope at level 2 defines the baseline skill the test person shows after the first increase in difficulty. the following three levels of a similar difficulty provide the test persons with opportunities to improve on the lower level of difficulty. the difference between the slopes at levels 2 and 5 (angle α) represents the skill increase (or decrease) during this phase of the game. however, focusing on this difference alone would be biased in favour of test persons who performed very weakly at level 2 but who improved at level 5. as the change in skill is also relevant for difficult tasks, it is necessary to consider the development during level 6 – represented in the difference between the slope at levels 5 and 6 (angle β). a consideration of both angles corrects the bias of angle α towards a weak performance at level 2 combined with a good performance at level 5. hence, the derived data on learning quality considers the relative increase in successes between levels 2 and 5 in contrast to the relative increase in errors between levels 5 and 6. to make this measurable, the two angles α and β are calculated, where α is the angle between the slopes of levels 2 and 5 and β is the angle between the slopes of levels 5 and 6. 
the quality of learning can thus be defined by the following formula:

learning = α + β

hence, at this step of data preparation, the following derived data were available in order to answer the research questions:

performance: errors and successes
emotional reactions: valence and engagement
cognitive load: changes in pupil diameter
quality of learning: learning = α + β

the methodological challenges were to overcome the weaknesses of previous research on learning from errors as discussed in the section above. one of the major concerns raised there was that previous research does not inform about factual learning from errors. of course, a computer game provides quite a specific scenario for learning from errors. however, the major advantage is that it provides stable and repeatable conditions for all test persons. the gaming situation provides an opportunity to establish experimental conditions that appropriately motivate test persons to perform as well as possible while allowing them to commit errors without serious consequences. hence, the computer game provides an experimental scenario in which actual reactions to error can be observed. there is a further concern about the reliability and validity of data on learning from errors. in this experiment, particular online measurements were implemented to indicate learning from errors. it is a challenge, of course, to derive meaningful data from data which are themselves extracts from raw data. the quality of these derived data will be discussed in later sections.

4. results

the presentation of the results follows the sequence of research questions listed above. besides a t-test for group comparisons, the quality of the trained classifier resulting from data-mining and machine-learning procedures will be illustrated by using standard big data indicators, namely, receiver operating characteristic (roc) curves (fawcett, 2006) and confusion matrices (congalton, 1991).
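since roc curves are used in the following to judge the trained classifier, a minimal python sketch may help to show how such a curve and its area can be obtained by sweeping a decision threshold over classifier scores. the function names and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs
    obtained by lowering the decision threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Equal scores share one threshold, so they are processed together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    labels = [0, 0, 1, 1]            # 1 = error, 0 = no error (toy data)
    scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical classifier scores
    print(auc(roc_points(labels, scores)))
```

in this sketch, scores that carry no information (all equal) give an area of 0.5, and a classifier that ranks every error above every non-error gives an area of 1.0.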
rq 1: are there emotional reactions and indicators for cognitive load to be found that precede errors?

to answer this question, the previously described method was used to train a classifier on all features of the training set. the genetic programming revealed that the best result can be achieved with a gradient boosting classifier (friedman, 2001). the corresponding roc curve (see figure 7) shows the performance of the trained classifier. the roc curve visualises the diagnostic capability of a classifier: the true positive rate (sensitivity; i.e. an error is correctly predicted) is plotted against the false positive rate (probability of false error predictions) while the decision threshold of the classifier is varied.

figure 7. roc curve of the optimal classifier.

the dashed line represents the baseline of random guessing (i.e. the pure chance of a wrong or correct error prediction is 0.5). an optimal classifier (i.e. each prediction is correct) would yield an roc curve with an area of 1.0. the classifier trained here yielded an roc curve with an area of 0.85. hence, with the current training set, it was possible to train a classifier able to predict errors sufficiently well.

changes in pupil diameters were considered as an indicator of cognitive load. the 300ms timeframes before errors and those before successes were examined, and linear regressions for the changes in the test persons’ pupil diameters before errors and before successes were calculated for each test person individually. consequently, all test persons’ beta-coefficients can be put into a t-test for independent samples distinguishing errors and successes. table 2 presents the results of this t-test.

table 2 two-sided t-test comparing changes in pupil diameters before errors and successes.
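the per-person regression and the subsequent group comparison can be sketched in python as follows. this is a minimal illustration under assumptions: the text does not state which variant of the independent-samples t-test was used, so the pooled-variance form is assumed here, and the function names and toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def slope(times_ms, values):
    """Least-squares slope (beta coefficient) of values over time."""
    t_bar, v_bar = mean(times_ms), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times_ms)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times_ms, values))
    return sxy / sxx


def independent_t(a, b):
    """Student's t statistic and degrees of freedom for two
    independent samples (assumption: pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Toy pupil diameters (mm) sampled within one 300 ms timeframe.
    print(slope([0, 100, 200, 300], [3.0, 3.1, 3.2, 3.3]))
    # Toy beta coefficients before errors vs. before successes.
    print(independent_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

the p-value would then be read from a t-distribution with the returned degrees of freedom, which is left out here to keep the sketch free of external libraries.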
on average, changes in pupil diameter in error situations were significantly larger than changes in success situations.

rq 2: is the quality of learning related to emotional reactions?

since the quality of learning was operationalised through the formula developed above, a median split was applied to distinguish two groups of test persons: better learners and worse learners. for this calculation, only those n = 19 test persons could be considered who finished level 6 in the game and whose face recognition was successful. on this basis, a t-test reveals differences in emotional reactions (i.e. engagement and valence).

table 3 two-sided t-test between better and worse learners.

table 3 shows the results of the t-test that confirm the theoretical expectations: better learners show significantly stronger emotional reactions in terms of engagement and valence. this means that better learners show emotional reactions of greater intensity than worse learners do, and that their emotions tend more towards the positive than those of worse learners.

rq 3: are there appropriate online measures that indicate emotional reactions?

to answer this research question, the features that were used for training the error classifier can be analysed in more detail. an importance ranking of all features was calculated, based on chi-square statistical analyses between each feature and class. a chi-square test measures the dependence between stochastic variables. for classification, this test can be used to obtain a measure of dependence between the features and the two classes of the classifier. the features that are most likely to be independent of the class receive a low score and the features that are most likely to depend on the class receive a high score. the more strongly a feature depends on the class, the better this particular feature is for use in classification – in our case, for predicting an error.
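a chi-square based ranking of this kind can be sketched in python as follows. the sketch uses one common formulation for scoring features against class labels, which assumes non-negative feature values (for example after min-max scaling); whether the study used exactly this formulation is not stated, and the feature names and values are hypothetical.

```python
def chi2_score(feature, labels):
    """Chi-square statistic between one non-negative feature and the class.
    Observed = feature mass per class; expected = feature mass
    distributed according to the class frequencies."""
    total = sum(feature)
    if total == 0:
        return 0.0
    n = len(labels)
    score = 0.0
    for c in sorted(set(labels)):
        observed = sum(x for x, y in zip(feature, labels) if y == c)
        expected = total * sum(1 for y in labels if y == c) / n
        score += (observed - expected) ** 2 / expected
    return score


def rank_features(table, labels, top=20):
    """Return the `top` features with the highest chi-square score."""
    scores = {name: chi2_score(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = snippet precedes an error (toy data)
    table = {
        "au45_mean": [1.0, 1.0, 0.0, 0.0],     # hypothetical feature names
        "head_tx_mean": [1.0, 0.0, 1.0, 0.0],
    }
    for name, score in rank_features(table, labels):
        print(name, score)
```

a feature whose values are spread evenly over both classes scores 0 in this sketch, whereas a feature concentrated in one class scores high and moves to the top of the ranking.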
figure 8 shows the 20 most important features for the prediction of errors.

figure 8. ranking of most important features for the prediction of errors.

it is remarkable here that only eye blink (au45), gaze (gaze angle x), cognitive load (pupil diameter), nose wrinkle (au09) and lip stretcher (au20) are represented in this top ranking. this means that those five items bear the majority of information required for the classification of errors. this calculation of the importance of singular features for the prediction of errors reveals that not only can emotional reactions be considered as relevant precursors of errors but also that pupil diameters can be interpreted as indicators of cognitive load (i.e. significant changes within 300 milliseconds before an error occurs). hence, a combination of pupil diameter features and facial video features contributes to the improvement of the classification of errors.

in order to assess the increase in quality through this combination, the described machine-learning pipeline was trained in two different ways: option 1 considers all 168 features and option 2 considers all features except the 6 pupil diameter features. figure 9 shows confusion matrices for both options (option 1, left side; option 2, right side).

figure 9. confusion matrices.

each confusion matrix is a four-field table. the x-axis marks the prediction of an event (0 = no error prediction; 1 = error prediction) and the y-axis marks the actual outcome of an event (0 = no error; 1 = error). the first and fourth quadrants indicate correct predictions, while the second and third indicate incorrect predictions: the upper left quadrant refers to true negative predictions, the upper right quadrant to false positive ones, the lower left quadrant to false negative ones, and the lower right quadrant to true positive predictions. the figure also shows the absolute number of cases (#) and the probability of correct/incorrect predictions.
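the four-field layout described above can be reproduced with a few lines of python. the sketch follows the stated orientation (rows = actual outcome, columns = prediction); normalising the counts per actual class is an assumption about how the probabilities in the figure were obtained, and the toy data are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Rows = actual outcome (0, 1); columns = prediction (0, 1).
    m[0][0] = true negatives,  m[0][1] = false positives,
    m[1][0] = false negatives, m[1][1] = true positives."""
    m = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m


def row_rates(m):
    """Share of each prediction within each actual class
    (assumption: probabilities are normalised per row)."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in m]


if __name__ == "__main__":
    actual = [0, 0, 0, 1, 1]     # 1 = error occurred (toy data)
    predicted = [0, 1, 0, 1, 0]  # 1 = error predicted (toy data)
    counts = confusion_matrix(actual, predicted)
    print(counts)
    print(row_rates(counts))
```

with this layout, the lower left cell of the rate table is the share of actual errors that the classifier failed to predict, and the upper right cell is the share of error-free events for which an error was predicted.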
the comparison of both options reveals that the predictions in the first quadrant (i.e. the correct prediction of no error occurring) and the fourth quadrant (i.e. the correct prediction of an error) are slightly better if the pupil diameter features are also considered. in addition, the probability of missing an error (prediction of no error occurring but error occurs – third quadrant) can be reduced (from 38% down to 32%) if pupil diameters are used as well. hence, considering pupil diameter features in addition to facial expression features slightly improves the overall score and the practical usage of the entire classification. the better performance can also be seen in the roc analysis. the roc curve of the classifier using all features can be seen above (figure 7). the roc curve of the classifier excluding the pupil diameter features is shown in figure 10.

figure 10. roc curve for classifier excluding pupil diameter features.

the roc curve area for the classifier that excludes pupil diameter features is 0.80, whereas the roc curve area for the classifier considering all features is 0.85 (see figure 7). for the purpose of this study, this combination of indicators resulting from online measurement can be considered an appropriate measure of emotional reactions. it is important to emphasise that we did not strive to distinguish between different emotions but were simply interested in any kind of emotional reaction.

rq 4: is there a specific measure for the quality of learning from errors?

as the computer game provides similar tasks of increasing difficulty, it is plausible to assume that test persons improve over time through trial and error. as a crucial measure of the change in performance (i.e. learning), the sum of angles α and β of the learning curve was chosen (see figure 6). table 4 presents the descriptive statistics of the test persons’ performance.

table 4 descriptive statistics for learning.

the test persons varied substantially in their performances.
in total, four test persons scored positively, ten test persons scored negatively and five test persons scored zero. for this particular setting, the learning curve indicates if and how an individual improves or develops during the course of a computer game of increasing difficulty. it indicates successful and unsuccessful attempts and thus can be considered a measure of the quality of learning from errors during the game.

5. critical reflection on data quality and validity

the study comprises two variables: learning from errors and emotional reactions. the discussion and critical reflection on data quality and validity will focus on each variable separately. finally, a reflection follows on the relevance of data generated by a computer game for gaining insight into learning from errors.

5.1 data on learning from errors

the construal of learning from errors follows quite a specific approach: the quality of learning – indicated through a learning score – was constructed as individual development along several trial-and-error attempts within a regular jump-and-run computer game. the scores on learning appear to tend towards the negative side (see negative mean within table 4), which requires a careful interpretation and must not be confused with a decrease in knowledge or negative learning. on the one hand, and as described above, the angles that result in the learning score were deliberately chosen because they refer to moments when the game’s difficulty increased substantially (at levels 2 and 6). it is part of a regular performance to fail initially with an increase of difficulty and then to adapt to the new level of difficulty. such regular performance results in a negative turn in the learning curve. on the other hand, only those test persons could be considered for the analyses who were finally able to complete this level of increased difficulty.
this implies that even the test person with the lowest learning score in the sample was able to master this task of increased difficulty – albeit while making the highest number of errors within the sample. this measurement of learning faces two major limitations. first, test persons failing to master level 6 within the limited time could not be included in the measurement of the learning curve because the relation of successes and failures within level 6 could be determined only if a test person completed this level successfully within the given 5 minutes. such a time-based cut-off results in a flawed learning curve for the last level. future research attempts may allow as much time as a test person needs to cope with the increased difficulty. second, very good test persons who master all challenges without any failures show a learning score of 0, because the learning score reflects individual development. test persons who perform consistently well and thus do not show any difference in their error and success rates between the different levels do not develop in the sense of the measurement applied here. a learning score of 0 can be considered a ceiling effect because the task was not challenging enough for these test persons. hence, the data on learning from errors here indicate individual development during the run of a computer game which presumed that test persons fail occasionally during the run of the game. individuals who performed constantly at the same (high or low) level received a learning score of 0. the learning score does not therefore provide information about the quality of performance but about the quality of individual development; that is the crucial aspect of learning from errors in the context of this setting and the important focus of the online data used here. 
of course, the data would provide additional potential for focusing on learning from errors by directly connecting incidents when a test person initially fails and later succeeds during the game. however, since the focus of the study was on exploring opportunities to predict errors before they occur, the decision was made to include as much data as possible for the machine training. considering combinations of initial failures and subsequent successes would have decreased the number of observed cases dramatically. in addition, there would also be alternative data available that reflect a test person’s quality of performance (e.g. score, time, number of successes or failures), but such information does not tell us anything about learning from errors.

5.2 data on emotional reactions

without doubt, the face is an important means of expressing emotional reaction. the recorded video data made it possible to identify a variety of aus that indicate emotional reactions. these indicators are based on analyses of fixpoints based on computer algorithms. in real face-to-face communications, the human mind is capable of processing the fine nuances of facial expressions unconsciously in order to interpret reactions appropriately. however, observational studies with human observers would probably not be able to provide those kinds of data reliably. hence, the kind of facial analyses provided in this study can be considered to be an advance. this study – as already mentioned – did not aim at differentiating between different kinds of emotional reaction, however. this can be seen as a limitation, but for the context of the research questions raised here, this limitation does not have an impact on the data quality either for analysing learning from errors or for predicting errors because negative emotional reactions (e.g. fear) can limit human behaviour and situational perception in a similar way to positive reactions (e.g. euphoria).
as the results reveal, the features considered here for developing a classifier for errors are sufficient to predict an error before it occurs – in the context of the video game that was part of the investigation. the choice of features resulted from an exploratory procedure of data mining that searched for relevant patterns in the training set and then confirmed the choice in the test set of the sample. it is difficult to judge if the resulting values of a 70% correct error prediction, 80% correct prediction of non-occurring errors and an roc curve area of 0.85 are sufficient to justify applying these instruments in the technical context of man-machine interaction. the acceptability of values probably depends on the range of application (e.g. high security areas or back-up systems). however, given that the classifier quite often predicted an error even though no error occurred (20%, in total > 4,500 cases, see figure 9), it would still seem inappropriate for application in real contexts. one explanation might be that test persons showed emotional reactions but still managed to avoid making an error in the video game. further machine-learning procedures may help to reduce this kind of wrong prediction.

5.3 relevance of data generated through a computer game

learning from errors during a computer game may be seen as very different from learning from errors in real life, particularly workplace settings. indeed, learning from errors at work occurs within an organisational error culture (putz, schilling, & kluge, 2012) that cannot be transferred to a laboratory setting. however, learning during work and learning during a computer game share important similarities: in both situations, learning occurs as a by-product of the intention to reach a goal. in both cases, there is no curriculum and no instruction that guides the acting but simply the intention to reach the goal successfully. learning from errors in real-life contexts varies widely across occasions and individuals.
hence, it is difficult to identify the general characteristics of learning from errors empirically because situations are not comparable enough. a computer game, by contrast, provides stable conditions across all test persons and makes it possible to observe learning processes by looking at the tasks at hand. it should therefore provide enough insights into general processes related to learning from errors that can claim relevance to learning from errors in real-life contexts.

6. conclusions

first, the findings reveal on the one hand that non-specified emotional reactions precede test persons’ failures to overcome an obstacle or avoid an attack. second, the findings reveal that test persons who show a beneficial pattern of emotional reactions – that is, a higher extent of engagement and a positive valence – achieve higher learning scores than do test persons with an unfavourable pattern of emotional reactions. hence, they confirm oser’s theory about the importance of emotions for learning from errors (oser & spychiger, 2005; oser & volery, 2012). for the field of learning from errors in working life, this finding reveals the importance of the organisational culture (schein, 2004) and team climate in workplaces (edmondson, 1999). these require the appropriate social conditions that accept emotional reactions without generating disadvantages for the failing person. conditions that fail to provide such an environment tend to provoke the concealment, disregard and thus repetition of errors (marsick & watkins, 2003). on the other hand, given the extent to which the results fit into theoretical patterns, the findings also indicate that the tested way of gathering online data is a promising one. certainly, the procedures applied in this study have potential for improvement, as discussed above. as long as there are no repeat studies applying the same or similar measures, we do not know much about the validity of such measures.
the experiences from this study suggest the need to repeat it under improved circumstances in two respects. first, a time limit is to be avoided; all test persons should receive as much time as they require to master level 6 of this game – as long as it appears reasonable to expect that each test person is able to cope with the difficulties of level 6 within a reasonable time. second, a repeat of this study should use a larger sample that would make it possible to connect initial failure with subsequent success directly in order to permit a focus on and analysis of concrete cases of learning from errors. in addition, a repeat of this study with a larger sample would provide an appropriate set of data to test the quality of the classifier found herein. keypoints emotional reactions and cognitive load precede errors. measures of automated face recognition generate data coherent with literature on emotions. the combination of face recognition and eye-tracking data can be used to predict errors before they occur. online measurements confirm keypoints of the theory of learning from errors. references baltrusaitis, t., robinson, p., & morency, l. p. (2016). openface. an open source facial behavior analysis toolkit. in wacv (ed.), 2016 ieee winter conference on applications of computer vision (pp. 1-10). lake placid: ieee. doi: 10.1109/wacv.2016.7477553 bauer, j., & mulder, r. h. (2013). engagement in learning after errors at work: enabling conditions and types of engagement. journal of education and work, 26(1), 99–119. beatty, j. (1982). task-evoked pupillary responses, processing load, and the structure of processing resources. psychological bulletin, 91(2), 276-292. congalton, r. g. (1991). a review of assessing the accuracy of classifications of remotely sensed data. remote sensing of environment,37(1), 35-46. de jong, t. (2010). cognitive load theory, educational research, and instructional design: some food for thought. instructional science , 38(2), 105-134. 
edmondson, a. (1999). psychological safety and learning behavior in work teams. administrative science quarterly,44(2), 350-383. ekman, p., & friesen, w. (1978). facial action coding system: a technique for the measurement of facial movements . sunnyvale: consulting psychologist press. fawcett, t. (2006). an introduction to roc analysis. pattern recognition letters,27(8), 861-874. friedman, j. h. (2001). greedy function approximation: a gradient boosting machine. the annals of statistics, 29(5), 1189-1232. gartmeier, m., bauer, j., gruber, h., & heid, h. (2008). negative knowledge: understanding professional learning and expertise. vocations and learning: studies in vocational and professional education, 1 (2), 87–103. harteis, c., & bauer, j. (2014). learning from errors at work. in s. billett, c. harteis & h. gruber (eds.), international handbook of research in professional and practice-based learning (pp. 699-732). dordrecht: springer academics. harteis, c., bauer, j., & gruber, h. (2008). the culture of learning from mistakes: how employees handle mistakes in everyday work. international journal of educational research, 47(4), 223–231. hetzner, s., gartmeier, m., heid, h., & gruber, h. (2011). error orientation and reflection at work. vocations and learning: studies in vocational and professional education , 4(1), 25-39. hofman, d. a., & frese, m. (2011). errors, error taxonomies, error prevention, and error management: laying the groundwork for discussing errors in organisation. in d. a. hofmann & m. frese (eds.), errors in organisations(pp. 1–43). london: routledge. holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2011). eye tracking. a comprehensive guide to methods and measures. oxford: oxford university press. jaber, m. y. (ed.). (2016). learning curves: theory, models, and applications. boca raton: crc press. krejtz, k., duchowski, a. t., niedzielska, a., biele, c., & krejtz, i. (2018). 
eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. plos one, 13(9), e0203629. laeng, b., sirois, s., & gredebäck, g. (2012). pupillometry. a window to the preconscious? perspectives on psychological science, 7(1), 18-27. marsick, v. j., & watkins, k. e. (2003). demonstrating the value of an organization’s learning culture: the dimension of the learning organization questionnaire. advances in developing human resources, 5(2), 132-151. mcduff, d., el kaliouby, r., & picard, r. w. (2015, september). crowdsourcing facial responses to online videos. in ieee (ed.), affective computing and intelligent interaction (acii), 2015 (pp. 512-518). piscataway township: ieee. olson r.s., urbanowicz r.j., andrews p.c., lavender n.a., kidd l.c., & moore j.h. (2016). automating biomedical data science through tree-based pipeline optimization. in g. squillero & p. burelli (eds.), applications of evolutionary computation. evoapplications 2016. lecture notes in computer science (pp. 123-137). cham: springer. oser, f., müller, s., obex, t., volery, t., & shavelson, r. j. (2018). rescue an enterprise from failure: an innovative assessment tool for simulated performance. in o. zlatkin-troitschanskaia, m. toepper, h. a., c. lautenbach & c. kuhn (eds.), assessment of learning outcomes in higher education (pp. 123-144). springer, cham. oser, f., näpflin, c., hofer, c., & aerni, p. (2012). towards a theory of negative knowledge (nk): almost-mistakes as drivers of episodic memory amplification. in j. bauer & c. harteis (eds.), human fallibility. the ambiguity of errors for work and learning (pp. 53–70). dordrecht: springer. oser, f., & obex, t. (2015). gains and losses of control: the construct “sense of failure” and the competence to “rescue an enterprise from failure”. empirical research in vocational education and training, 7(1), 3. oser, f., & spychiger, m. (2005). lernen ist schmerzhaft. weinheim: beltz. oser, f., & volery, t. (2012).
"sense of failure" and "sense of success" among entrepreneurs: the identification and promotion of neglected twin entrepreneurial competencies. bern: skbf. putz, d., schilling, j., & kluge, a. (2012). measuring organizational climate for learning from errors at work. in j. bauer & c. harteis (eds.), human fallibility (pp. 107-123). dordrecht: springer. rybowiak, v., garst, h., frese, m., & batinic, b. (1999). error orientation questionnaire (eoq): reliability, validity, and different language equivalence. journal of organizational behavior, 20, 527-547. schein, e. h. (2004). organizational culture and leadership. san francisco: jossey-bass. sweller, j. (1994). cognitive load theory, learning difficulty, and instructional design. learning and instruction, 4(4), 295-312. szulewski, a., kelton, d., & howes, d. (2017). pupillometry as tool to study expertise in medicine. frontline learning research, 5(3), 55-65. wolf, k. (2015). measuring facial expression of emotion. dialogues in clinical neuroscience, 17(4), 457-462. yelle, l. e. (1979). the learning curve: historical review and comprehensive survey. decision sciences, 10(2), 302-328. zaichkowsky, j. l. (1994). the personal involvement inventory: reduction, revision, and application to advertising. journal of advertising, 23(4), 59-70.
smets et struyven publication frontline learning research vol.6 no.
2 (2018) 66-80 issn 2295-3159
aligning with complexity: system-theoretical principles for research on differentiated instruction
wouter smets a, katrien struyven b, c
a karel de grote university college, belgium
b vrije universiteit brussel, belgium
c uhasselt, belgium
article received 23 november 2017 / revised 17 july / accepted 31 august / available online 1 october
abstract
differentiated instruction is a teaching philosophy and practice that deals with responding appropriately to student heterogeneity. in order to gain a deep understanding of this complex concept, research methodology is challenged to use appropriate data collection and data analysis. the aim of this paper is to reflect on how systems theory may be used as ontological and epistemic grounding for research on differentiated instruction. three challenges for this research are presented: to focus on the interplay between the individual and complex collective behaviour; to acknowledge the external influences in research design; and to describe patterns of non-linear causality and emergence. three design principles for research on differentiated instruction are presented to address these challenges: organic design, interaction and reflectivity. by using these principles, we believe research on differentiated instruction would be aligned with the theoretical foundations of the concept.
keywords: differentiated instruction – system theory – complexity – emergence – nestedness – forest-tree perspective
corresponding author: wouter.smets@kdg.be doi: https://doi.org/10.14786/flr.v6i2.340
1 the methodological need for a forest-tree perspective
in an increasingly diverse world the call for teachers to provide instruction that caters for different learning needs sounds ever clearer. teachers are expected to design instruction that takes diversity among students seriously (schleicher, 2013).
in recent years, considerable research effort has gone into finding and studying new ways of teaching that respond to diversity in heterogeneous classes (gay, 2002; ware, 2006). a central challenge for teaching in heterogeneous classes is to adapt teaching strategies which are designed, organised and assessed at the classroom level, taking into account the apparent student diversity. theoretically this is described by bronfenbrenner’s (1977) analytical stance that calls for a naturalistic perspective on psychological research. bronfenbrenner argues that, in order to understand the complexity of education, phenomena must be studied from diverse perspectives. his ecology of human development is defined as: “the scientific study of the progressive, mutual accommodation, throughout the life span, between a growing human organism and the changing immediate environments in which it lives, as this process is affected by relations obtaining within and between these immediate settings, as well as the larger social contexts, both formal and informal, in which the settings are embedded.” (p. 514). bronfenbrenner uses different systemic levels to describe interactions between individuals and their surroundings. he defines the microsystem as a “complex of relations between the developing person and […] the immediate setting containing that person” (p. 514). phenomena related to students’ individual experiences are thus described as the individual level within a learning ecosystem, whereas phenomena related to dynamics among (groups of) students are described as the micro-level of the learning ecosystem. interactions between a teacher and students are also situated at the micro-systemic level. other aspects which may influence learning (such as school culture or leadership) are described as the meso- or macro-level.
scholarly research is challenged by bronfenbrenner’s approach to take different perspectives in order to study the complexity of the phenomena: the study of the phenomena with an exclusive focus on one systemic level would thus be seen as reductionist. a metaphor proposed by jacobson and kapur (2012) describes the scope of the challenge to teach in heterogeneous classes. they suggested that, in order to increase our understanding of learning environments, not solely individual phenomena or phenomena at a collective level must be studied, but rather the combination of both. metaphorically, they suggest, not solely a forest perspective must be taken, nor solely a tree perspective, but rather a forest-tree perspective. with regard to teaching in heterogeneous classes, jacobson and kapur’s (2012) metaphor stresses the vital role of distinguishing individual perspectives of particular students from the collective behaviour within the microsystem. hence, also within the microsystem, it should not be assumed that all students act and react similarly to changing conditions. the forest-perspective addresses the microsystem of a particular class. meanwhile, a focus on the learning of individual students would be seen as a tree-perspective.
figure 1: the forest-tree perspective: multiple individuals within a learning ecosystem
in this reflective study the concept of differentiated instruction takes a central place, as this concept tends to merge the perspective of teaching to heterogeneous groups, the micro-level of the learning ecosystem, with the perspective of the learning of heterogeneous individuals. it essentially takes heterogeneity of classes as a given fact and, therefore, assumes that each teaching process needs to be not only based on the targeted learning goals, but also on the apparent student diversity.
notwithstanding the great potential of differentiated instruction for the sciences of teaching and learning, we believe that research on it is faced with important challenges. a key methodological challenge for educational science is how to design research which merges the forest and tree perspectives into a forest-tree perspective. at present this forest-tree perspective is largely absent in research on differentiated instruction. thus, bronfenbrenner’s naturalistic approach remains a challenge for this research. much research on differentiated instruction is currently being conducted. in the next section we provide more details of the construct. two main research focuses may be discerned in research on the topic. first, some studies focus on the role of teachers in a differentiated classroom. these studies, as a consequence, do not address the role of students in a differentiated classroom, nor do they focus on teacher-student interactions (de neve, devos, & tuytens, 2015). second, other studies on differentiated instruction focus on the learning outcomes of (groups of) students for a particular type of strategy (e.g. chen, yang, & hsiao, 2016; van klaveren, vonk, & cornelisz, 2017). we do not contest the added value of such perspectives on differentiated instruction; however, it is argued in this study that both these approaches are reductionist. the epistemic assumptions that underpin such approaches focus on parts of the teaching process instead of documenting interactions within the microsystem throughout the teaching process. it is argued in this study that the central idea of differentiated instruction lies in the responsive act(s) of a teacher which link the chosen teaching strategy to the given heterogeneity of the group. in order to grasp the full complexity of this idea, we argue for a system-theoretical epistemology and hence for the use of research designs which are methodologically aligned with it.
insofar as system-theoretical concepts are already in use in various fields of educational research, we discuss their usefulness ontologically, epistemologically and methodologically to underpin research on differentiated instruction. to do so, three design principles are proposed at the end of this study: organic design, interaction and reflectivity. in what follows, more details are provided on the concept of differentiated instruction in order to be able to reflect on the methodological challenges the concept poses to research.
2 differentiated instruction
given the fundamental characteristics of differentiated instruction, it may be noticed that the concept entails difficulties for scholars trying to study it. differentiated instruction has been proposed as a teaching philosophy and practice that intends to maximise learning outcomes of all students in a class by responding to students’ different learning needs, namely their readiness level, interest or learning profile (tomlinson, 2000). the added value of tomlinson’s approach is that it merges a microsystem-perspective of heterogeneous classes with the perspective of individual students with particular characteristics. hence, she implicitly uses the aforementioned forest-tree perspective: teachers are supposed to teach a heterogeneous class (the forest); however, their instructional design is supposed to be adaptive, and thus based on students’ individual perspectives (the tree). in doing so, differentiated instruction encompasses other strategies that focus on specific types of student heterogeneity, such as culturally responsive teaching (gay, 2002) or inclusive education (schumm & vaughn, 1995). tomlinson accepts heterogeneity as a given fact without detailing the origins of it. differentiated instruction rests on the constant responsiveness of the teaching to these perpetually changing characteristics.
tomlinson’s work at the beginning of the 21st century (tomlinson, 2000, 2001; tomlinson et al., 2003) pioneered instructional design to a broad range of student differences. a lot of enthusiasm has arisen for her practice-oriented publications. the idea is not to address any target-specific type of diversity such as learning disabilities or students at risk of academic failure, rather differentiated instruction intends to foster learning by giving the proper attention to heterogeneity in the broad sense. students’ readiness levels, interests and learning profiles are the three large, overlapping and constantly changing categories of heterogeneity used to adapt instruction. as differentiated instruction stresses the role of tailoring instruction to students’ characteristics, tomlinson’s ideas were initially picked up primarily by scholars in the activity theory tradition (shabani, khatib, & ebadi, 2010; wass & golding, 2014). yet now, the concept of differentiated instruction is also commonly applied in more cognitive scholarly approaches which focus, for instance, on the learning effect of differentiated instruction (e.g. deunk, doolaard, smale-jacobse, & bosker, 2015; prast, weijer-bergsma, kroesbergen, & luit, 2015). following tomlinson, teachers must carefully consider which teaching strategy is appropriate at which particular moment for a particular group of students. multiple teaching strategies are used, based on flexible grouping, to build learning paths respecting the unique group composition. a cyclical approach to teaching is proposed in which it is the teacher’s responsibility to engage in ongoing assessment and to adapt instructional design based on that. it is a ‘key principle that assessment and instruction are inseparable’ for differentiated teaching (tomlinson, 2000, p. 20). while practising differentiated instruction, teachers use the output of a learning sequence as the starting point of a subsequent one. 
depending on students’ readiness, teachers build on prior knowledge, or on previously acquired strategies and schemes. teachers may also respond to differences in students’ interests or learning profiles. this relationship between input and output is a fundamental characteristic of the instructional design of differentiated instruction. it marks the cyclical – responsive – character of the approach. further on, we elaborate on how this cyclical character of differentiated instruction challenges research design. dweck’s (2008) growth mindset theory is often proposed as an essential characteristic that guides teaching in a differentiated classroom (coubergs, struyven, vanthournout, & engels, 2017). this theory stresses the importance of an incremental theory of intelligence or, in other words, the idea of the potential growth of students’ talents in order to maximise learning outcomes (rattan, savani, chugh, & dweck, 2015). teachers responding to the learning needs of their students will need a growth mindset in order to fulfil the learning potential of all students. teaching in a differentiated classroom seems to be closely tied to such a growth mindset, as a teacher needs to see and develop the student’s growth potential. moreover, students’ own growth mindsets also need to be developed, in order for them to realise their full learning potential. in summary, differentiated instruction stands for responding to students’ heterogeneity based on a cyclical process of formative assessment and observation. the principles of growth mindset are used to guide teachers’ practice in a differentiated classroom. building on these intertwined characteristics we believe that scholarly study of the concept cannot solely rely upon classic reductionist empirical epistemology. in the following paragraph, we detail insights of systems theory, which provides a useful conceptual framework to ground research on differentiated instruction.
further, we elaborate how systems theory can be used to build the needed ontological and epistemic foundations to study the concept of differentiated instruction.
3 systems theory and complexity
systems theory has its roots in physics and environmental sciences (von bertalanffy, 1968). for decades, efforts have been made to use its concepts and metaphors to describe comparable patterns across different scientific fields (luhmann, 2013). essential for systems theory is the notion of open systems, which stands opposed to closed systems. open systems have interactions with external surroundings. often open systems are described as complex in the sense that the properties of a system as a whole cannot necessarily be derived from the properties of individual components within a system (prigogine, 1980). by conceptualising a classroom as a learning ecosystem (see §1) we are able to understand better the complexity in education (bakker & montesano montessori, 2016). the ‘open’ character of this systemic approach lies in acknowledging the interactions between the microsystem and other systemic levels. systems theory has built a reputation for helping to understand counterintuitive phenomena. its ambition is to fully acknowledge the complexity of phenomena. by doing so it stands in opposition to more reductionist scientific approaches which aim at more specific insights (sawyer, 2002). morrison (2008) stresses it is vital to see complexity theory as a collection of ideas, metaphors and concepts to describe and not to prescribe educational phenomena. thus, with this reluctance towards prescriptive ambitions, systems theory seems to stand in a postmodernist ontological and epistemological tradition. it may therefore be criticised as relativist (morrison, 2008). we agree that some system-theoretical notions (e.g. emergence or nonlinearity, see further) make insights into causal relationships hard to achieve and, as a consequence, limit the prescriptive ambitions of educational science.
however, describing patterns of change may in itself be a sufficient added value to gain deeper understanding of teaching and learning in a differentiated classroom, and thus to legitimise a system-theoretical perspective.
table 1 systems theory compared to modernistic approaches
jacobson, kapur, and reimann (2016) proposed a framework that conceptualises the role of complexity in the learning sciences: the complex systems conceptual framework of learning (cscfl). this framework intends to reframe the traditional situated versus cognitive debate among educational scholars. for decades scholars have discussed ontological and epistemological issues on how learning processes must be interpreted (derry & steinkuehler, 2003). the discussion on the primacy of the cognitive (anderson, reder, & simon, 1996) or situated perspectives (engeström, 2001; greeno, 1997) has so far not yielded a sustainable consensus (jacobson et al., 2016). the cscfl adds a new perspective to this debate based on systems theory: it intends to harmonise both views, building on notions of complexity science. two central domains of the framework are: (1) complex collective behaviour in systems, and (2) behaviour of individual agents in systems. each of these domains is characterised by concepts which illustrate the complex character of learning. complex collective behaviour of agents or elements within a system follows the idea of self-organisation or emergence. this means that dynamics within and between systems are sensitive to initial conditions and are nonlinear. the notion of emergence is pivotal for systems theory; it is used to describe patterns of change. sawyer described it as an ‘attempt to bridge the micro-macro divide’ (2005, p. 210). using this concept of emergence, systems theory describes counterintuitive phenomena. resnick (1996) famously referred to the emergence of traffic jams or termite constructions.
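the traffic-jam example can be made concrete with a toy model. the following python sketch is an illustration only and is not taken from resnick (1996) or from jacobson et al. (2016): it uses elementary cellular automaton rule 184, a standard minimal traffic model, as a stand-in. every car follows the same local rule (move one cell forward if the cell ahead is free). the names step, moving_cars and simulate, the road length and the starting positions are all assumptions chosen for the illustration. no car is programmed to form a jam, yet on a crowded road a persistent jam appears at the collective level.

```python
def step(road):
    """One synchronous update of rule 184 on a circular road.

    road is a list of 0 (empty cell) and 1 (car); cars drive towards
    higher indices and the last cell wraps around to the first.
    """
    n = len(road)
    new = [0] * n
    for i in range(n):
        ahead = (i + 1) % n
        behind = (i - 1) % n
        if road[i] == 1 and road[ahead] == 1:
            new[i] = 1  # car is blocked and stays where it is
        elif road[i] == 0 and road[behind] == 1:
            new[i] = 1  # car from the cell behind moves in
    return new


def moving_cars(road):
    """Count the cars that have a free cell directly ahead of them."""
    n = len(road)
    return sum(1 for i in range(n) if road[i] == 1 and road[(i + 1) % n] == 0)


def simulate(road, steps):
    """Return the number of moving cars at each of the given time steps."""
    flows = []
    for _ in range(steps):
        flows.append(moving_cars(road))
        road = step(road)
    return flows


# Hypothetical starting positions: same individual rule, different density.
sparse = [1, 1, 1, 0, 0, 0, 0, 0]   # 3 cars on 8 cells
crowded = [1, 1, 1, 1, 1, 1, 0, 0]  # 6 cars on 8 cells

print(simulate(sparse, 6))   # -> [1, 2, 3, 3, 3, 3]
print(simulate(crowded, 6))  # -> [1, 2, 2, 2, 2, 2]
```

on the sparse road the initial queue dissolves and all three cars end up moving, whereas on the crowded road only two of the six cars can move at any step, although the individual rule is identical in both cases. the jam is a property of the collective, not of any single car.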
“strong emergence presents a direct challenge to determinism (the idea that given one set of circumstances there is only one logical outcome). with strong emergence, what emerges is always radically novel” (osberg & biesta, 2007, p. 34). the notion of emergence gives an insight into how non-linear patterns of interaction influence the relationships among individual agents of systems and how complex collective behaviour emerges out of it. these patterns are then fed by positive or negative feedback loops which result in these sometimes unexpected outcomes. essentially, jacobson et al. (2016) describe learning not as something that is ontologically determined but rather as emergent. understanding learning and teaching through this prism evidently challenges research methodology. the idea of nestedness (burns & knox, 2011) is used to describe interactions between individual agents within systems and other systemic agents at, for instance, the meso- or macro-level. interactions within a classroom or interactions with external influences are, therefore, crucial to grasp why dynamics may be different among systems. nestedness refers to the common idea of the contextual nature of learning (greeno, 1997). however, jacobson et al. (2016) use the term nestedness, which is more common in systems theory across many disciplines. an essential consequence of systems theory is that teaching and learning must be understood as interaction between elements or agents in a system. this organic view sees the interactions in a class as fundamentally related. this contrasts with a mechanistic world view in which all elements of a system can be understood separately. a major question is therefore whether elements of the teaching process should be studied separately or not. following systems theory, agents within a system are in continuous interaction with the system in which they are active. it is therefore crucial to see the role of complex collective behaviour and of individual agents in systems.
if learning must be interpreted as a complex phenomenon for which these characteristics are genuinely valid, this poses a tremendous challenge for our concept of the role of a teacher in it. davis and sumara (2007) described the shift from a mechanical view on teaching to a more organic one. damsa and jornet (2016) comparably argued to reframe learning as ‘collective achievements of whole ecosystems’ (p. 39). some distinctive properties, compared with mechanical management, are that knowledge in organic systems is said to be structured anywhere in the system, compared to top-down knowledge structures. communicative relations are horizontal rather than vertical in organic systems. and individual tasks within organic systems are said to be continuously adjusted and refined, compared to mechanical structures where tasks are specialised and differentiated. resnick (1996) argued for a decentralised concept of social institutions in order to account for self-organisation and emergence. ‘from the perspective of complexity multidimensional relationships and dynamic interactions among agents and elements, rather than predictable linear effects, are responsible for patterns and phenomena’ (cochran-smith, ell, ludlow, grudnoff, & aitken, 2014, p.5). davis and sumara (2007) comparably argue to reposition the role of teachers and teaching: teaching is not to be understood any more by what a teacher does or intends. tomlinson’s ideas on the responsiveness of teachers and reliance on formative assessment as grounds for differentiated instruction align with this repositioning of teaching and learning. differentiated instruction is here not a linear type of instructional design initiated by the teacher. the responsiveness of differentiated instruction is characterised by a decentralised concept of teaching. it accounts for the nestedness of learning and for patterns of emergence at a level of complex collective behaviour. 
moreover, it accounts for interactions within and across (open) systems.
4 system-theoretical grounding for research on differentiated instruction
the concept of differentiated instruction is multifaceted. it invites teachers to adopt a growth mindset. it involves formative assessment to gather data on student heterogeneity and it is essentially characterised by the act of responding to these differences. systems theory provides a useful theoretical framework to understand more deeply the complexity of differentiated instruction. we believe the notions of nestedness and emergence (combined with non-linearity) are of particular value for the study of differentiated instruction. in the following paragraph we link the fundamental properties of the concept of differentiated instruction to systems theory. we argue that, to fully acknowledge the complexity of differentiated instruction, empirical data are needed which are grounded on systems theory. in this section, three methodological challenges are presented which could increase our understanding of teaching in a differentiated classroom. although the challenges partially overlap, we present them in three different sections.
4.1 focus on the interplay between the individual and complex collective behaviour
generalisation is often thought to be one of the main quality criteria in educational sciences (hammersley, 1997). with regard to teaching effectiveness, many educational scholars tend to generalise the validity of their ideas (cohen, manion, & morrison, 2007). cochran-smith et al. (2014) have noticed that these claims of generalisation in the educational sciences are challenged by complexity science. building on this claim, we add that, in a differentiated classroom, generalised claims cannot account for all deviant profiles of individuality. therefore, scholarly research on differentiated instruction is challenged to describe the interplay between individual behaviour and complex collective behaviour.
an exclusive focus on one type of agent in the differentiated classroom does not permit study of the interplay between all systemic agents. empirical data with an exclusive focus on the individual level (the role of teachers or students in a differentiated classroom) or on the microsystem level (instructional design) may have important added value for the debate on differentiated instruction. however, they do not permit comprehensive insight into its complexity. to gain such insight, the interplay between individual agents and complex collective behaviour within systems also needs to be studied. for differentiated instruction, this would imply the systematic study of the responsive act of teaching in a heterogeneous class, that is, analysing its impact at both student and teacher level. such studies would add substantially to our understanding of the responses within a differentiated teaching process. monitoring both the individual impact on students and the collective behaviour of a group of students is therefore an important challenge for research on differentiated instruction. rarely, however, do studies seek to understand the link between phenomena at an individual level and the management of collective behaviour, which essentially is the teaching process in a differentiated class. as differentiated instruction essentially intends to maximise learning opportunities for all students in a class by taking their individual characteristics into account, a randomly composed research sample that undergoes a homogenised treatment, or that aims at reaching a common goal, is exactly the opposite of what differentiated instruction stands for. differentiated instruction essentially intends not only to take into account the average student in a class (the microsystem level) but also to acknowledge deviant or changing profiles and characteristics of individual students (the individual perspective).
this core characteristic puts scholarly research on differentiated instruction at odds with classic randomised research designs, which aim at generalised research conclusions. methodologically, these assertions lead us to plead for empirical data that inform on the interactions between agents in a learning system. with its responsive approach, differentiated instruction stands in an interactionist tradition (mead, 1934). building on blumer (1973), a long research tradition has focused on studying interactions in detail in order to understand the relationship between the social and the individual. classic interactionist methodology that documents 1-on-1 interactions is now critiqued for not (fully) accounting for the complexity of the interactions (e.g. sawyer, 2005). from a systems perspective, interactions between all systemic agents must be documented in order to gain deep insight into the interplay between the individual and complex collective behaviour. with regard to differentiated instruction, such an approach should lead to documenting, at least, the interactions among students and the teacher-student interactions in a differentiated classroom.

4.2 interdependence with other systems

the actual heterogeneity of a class changes throughout the year. students who drop out may influence opportunities for collaborative learning. moreover, newly arrived students may lack sufficient prior knowledge to participate in learning activities. less visible changes in the class can also influence the actual learning process, for example when motivation or other personal characteristics change as a result of external influences. if a student experiences anything interesting or emotional in his or her personal life (e.g. a trip to a foreign country, the death of a relative, an unusual encounter), this could, in a differentiated classroom, be a relevant take-off point for instruction.
this nestedness of differentiated instruction is described by tomlinson as follows: “teachers who care about their students as individuals accept the difficult task of trying to identify the interests students bring to the classroom with them” (tomlinson, 2001, p. 53). empirical methodology that intends to control data collection and data analysis cannot account for this nestedness. classic research designs make an effort to control the variables they study, and hence to wipe out the effects of external influences (sansone, morf, & panter, 2004). from a mechanistic point of view, these influences are to be avoided or neglected. by controlling for external bias, external validity of research conclusions may be increased (tipton, 2013). it needs to be questioned, however, whether (semi-)controlled research conditions can yield conclusions that are externally valid for situations in which external variables have considerable influence. it was precisely in response to the lack of external validity of experimental research that bronfenbrenner (1977) grounded his naturalistic approach to research. a systemic perspective on data collection intends to account for external influences instead of neglecting them. it could be assumed that research conclusions have stronger external validity when external influences are not ignored but seen as part of the complex reality. in system-theoretical terminology this would be described as accounting for the nestedness of systemic patterns. the philosophy of differentiated instruction implies that influences at the meso- or macro-level on the teaching process are taken into account. therefore it would be useful to build on the tradition that argues for the relevance of this stance. cultural-historical activity theory argues that this addresses the challenges and possibilities of inter-organisational learning (engeström, 2001).
moreover, a multi-systemic approach intends to stress cross-boundary relationships of agents in educational systems (akkerman & van eijck, 2013; bronkhorst & akkerman, 2016). in order to obtain a deep understanding of differentiated instruction, research must aim to describe concisely the nestedness of teaching and learning in a differentiated classroom. methodological choices for empirical research on the matter need to be informed by this nestedness. building on this argument, some scholars ask for increased attention to the researcher’s role, in the form of a reflexive methodology (alvesson & sköldberg, 2009), to acknowledge this nestedness.

4.3 non-linearity and emergence

the character of differentiated instruction implies that teachers respond to student diversity. the cyclical process of teaching, learning, adapting teaching and further learning is essential to it. if teaching is not (only) to be seen as an activity with linear causal consequences, but as the work of an agent (the teacher, the student) within a system that adds to the emergence of complex collective behaviour (such as learning and interactions between learners), then research is challenged to study the dynamics of the relationship between these two perspectives. these patterns are non-linear: they are cyclical in the sense that feedback mechanisms are at work. to illustrate the importance of non-linearity for differentiated instruction, we elaborate here on the role of mindset theory. if, indeed, a growth mindset is a central concept that facilitates the application of differentiated instruction both for teachers and students, then the influence of this factor should always be taken into account in studies on differentiated instruction. many studies have described spectacular results based on growth mindset interventions (blackwell, trzesniewski, & dweck, 2007; dweck, 2015; rattan et al., 2015).
linear mechanistic research interventions have studied whether a growth mindset affects learning by isolating the growth mindset factor from other motivational components in the learning process. in a systems-theoretical perspective on research methodology, however, this factor may not be isolated from other factors. a systemic approach will, in consequence, not study whether there is an effect of a growth mindset, but describe how a growth mindset affects learning. it could, for instance, be hypothesised that feedback mechanisms between a growth mindset and goal setting or tenacity of students result in the emergence of learning. stated differently: from a systems-theoretical perspective, the influence of a growth mindset on other aspects of the teaching process must not be isolated; rather, it must be studied how feedback mechanisms influence the outcomes of the learning process. if research is designed to be static or linear, it is restricted in its scope to describe patterns of emergence between systemic agents. systems theory, however, contrasts with linear-causal thinking, given the assumption that, as a result of feedback mechanisms, the outcomes of processes are unpredictable (brown, 2016; osberg & biesta, 2007). therefore, patterns of non-linear causality also need to be studied in order to understand fully the complexity of differentiated instruction. to incorporate feedback mechanisms in research designs means to challenge research to study patterns of non-linearity and emergence. as long as a mechanistic view on teaching and learning is adopted, this cyclical approach to learning does not necessarily pose problems for research design. with its focus on ongoing assessment and adaptive instructional design, differentiated instruction holds an iterative view on teaching. it acknowledges differences in learning pace between students and, consequently, adjusts the teaching process for students depending on the pace at which learning occurs.
the use of formative assessment is seen as fundamental to document students’ learning needs and, hence, to provide optimal learning chances for all students in the classroom (coubergs et al., 2017; hall, 2006; tomlinson, 2015). building on ideas of non-linearity and emergence, this cyclical approach to learning would provide important new insights for scholarly research on differentiated instruction. classic planned experimental design-based research is difficult to align with these concepts of non-linearity or emergence. data collection that is open to emergence is needed to mirror the complexity of learning processes in a differentiated classroom. as long (2001) states: “intervention is an on-going transformational process that is constantly re-shaped by its own internal organizational and political dynamic and by the specific conditions it encounters or (…) creates” (p. 27). this implies that data collection goes beyond describing linear patterns of change and includes more complex patterns of change within educational systems. describing these mechanisms at work is a major challenge for research on differentiated instruction.

5. design principles for research on differentiated instruction

generalised knowledge on the micro-level of a classroom stands at odds with the concept of differentiated instruction. moreover, ideas of emergence and nestedness provide fundamental challenges for research designs on differentiated instruction. in this section we propose three design principles for this research that aim at aligning research design with the philosophy and practice of differentiated instruction as proposed by tomlinson. examples of existing empirical research are added to illustrate these principles. although these examples all refer to learning in heterogeneous settings, not all of them explicitly refer to the construct of differentiated instruction as proposed by tomlinson.

principle 1: organic design.
understanding the complexity of teaching in a differentiated classroom implies a holistic focus on the interaction between agents and components of systems instead of a mechanical design that isolates particular agents or components. this means that what happens at the level of these agents or components is not necessarily seen as representative of a higher (meso-)level. only a holistic analysis can bring about the necessary understanding of teaching and learning in a differentiated classroom. jacobson et al. (2016) claim that the concept of emergence must necessarily be considered when reflecting on causal relationships with regard to teaching and learning. moreover, certainly with regard to differentiated instruction, processes of emergence must be central in empirical data collection. interventions that are open to non-linear patterns of change are needed for this purpose. applied to research on differentiated instruction, this would mean, for instance, studying the feedback mechanisms related to a growth mindset that are at work within a classroom. the following example illustrates how non-linear interventions could be designed in order to study these types of patterns. jafari and hashim (2012) described the use of advance organisers to improve listening skills in english as a foreign language. an experimental intervention was designed to document students’ learning progress. advance organisers were administered to a treatment group of students. however, depending on their actual learning progress, the strategy was differentiated. monitoring the learning progress of subgroups of students (higher or lower performing) permitted the authors to assess the strategy at this subgroup level. repeated formative assessment was used to document students’ learning progress and, hence, to inform the further development of the intervention. in addition to monitoring students’ learning progress, this study also gathered qualitative data on the affective outcomes of the chosen strategy.
again, these data were related to the achievement levels of the subgroups of students. this type of intervention design closely represents the instructional design as it would be applied in a differentiated classroom. by extending the intervention for students who needed more practice or more extended direct teacher instruction, this study permitted the authors to gain insight into the structure of the learning process of diverse types of students within a group. referring to jacobson and kapur’s (2012) metaphor of the forest-tree perspective, we believe that this type of study approaches the idea of merging both perspectives on teaching and learning in a differentiated classroom. the teachers set a targeted goal for a heterogeneous group of students. however, patterns of change – learning – are monitored at the level of subgroups of students in order to gain insight into how learning emerges at the level of these subgroups. the organic nature of this intervention lies in the fact that it acknowledges diverse needs of students in its data collection (cognitive and affective, high and low performing). unfortunately, no data were provided in this study on how dynamics among students added to the emergence of learning at an individual or collective level. evidently, data collection that provides more detailed insight into learning at the individual level of students would come even closer to this system-theoretical principle.

principle 2: interaction.
studies on differentiated teaching must aim at matching the perspective of heterogeneous groups with learning at the level of the individuals within them. as a consequence, interactions between these levels must be monitored. responsiveness being one of the main characteristics of differentiated instruction, this element must necessarily occupy a central position in research design.
this means that the students’ individual and collective characteristics are used as a basis for teaching and that the teachers’ response to these depends on formative assessment of students. students’ initial characteristics are pre-assessed and their progress is monitored using formative assessment. understanding how responsiveness of teaching is related to students’ individual learning is therefore a major challenge for empirical research. a study by martin-beltran, guzman, and chen (2017) describes how teachers differentiate discourse in order to foster collaboration between linguistically diverse students. this study is a typical example of interaction in the sense that it studies the interaction between teachers and their students. it draws upon system-theoretical principles in the sense that it intends to describe the complexity of discourse that teachers use in order to cater for diversity in their classes. these patterns of interaction are essential to understand how differentiated instruction materialises in everyday practice. recently, the study of interactions within learning systems has attracted a lot of attention due to research designs in which interactive software allows detailed documentation of the learning processes of students. jacobson, kapur, so, and lee (2011) describe, for instance, how systems of hypermedia learning environments work when different types of scaffolding are provided. they collect data through interactive software. they argue that performance on problem-solving transfer tasks is determined by the different types of scaffolding provided. building on the interactions between software, individual students and the scaffolds provided, systemic patterns could be described.

principle 3: reflexivity.
studies on differentiated instruction must acknowledge the interdependence of systems by adopting reflexivity.
a more reflexive attitude of researchers is needed in order to achieve more transparency with regard to diverse external or internal dynamics that lie outside the control of researchers (tracy, 2010). building on the idea that control of all external influences is not achievable, we suggest increasing reflexivity about conditions that lie outside the researcher’s control. a research design in which all necessary factors are controlled seems unachievable with regard to teaching in a differentiated classroom. therefore, instead of controlling all potentially disturbing variables, a researcher’s reflexive stance is needed to account for the systems’ interdependence. the notion of reflexivity encompasses different sorts of reflection on how the choices of a particular research design influence its results. alvesson and sköldberg (2009) suggest that this notion should cover not only reflection on the choices made with regard to the systematics of data collection and the techniques and procedures of data analysis, but also reflection on the interpretative and political-ideological character of research. with regard to differentiated instruction, where the responsive character of teaching always requires teachers and, hence, also educational researchers to make difficult choices, we believe reflexivity to be the most credible option to foster transparency in educational research. a study by pilten (2016) comes close to what is meant by the concept of reflexivity. it documents the experiences of seventeen turkish elementary school teachers with the implementation of differentiated reading instruction. their experiences are limited and they implement differentiated instruction reluctantly. although participants in this study often see a potential advantage in the idea of differentiated instruction, most of them classify differentiated instruction as impracticable and thus hard to implement.
interestingly, the author of this study chose to reflect on the validity and reliability of the findings in the method section. by doing so, the author openly reflects on the extent to which the findings are credible. the phenomenological approach allows the author to dig deep into the complexity of the participants’ teaching practice. building on the aforementioned ideas on open systems, it appears that the implementation of differentiated instruction in essence always relies on dynamics between open systems at the micro-level and other systemic levels. in this case, the implementation of differentiated instruction by the participants of this study could be mediated by external factors. this is why reflection on validity is desirable. reflecting on the way in which controlled conditions have been achieved and on potential inter-systemic relations should be at the heart of the methodological sections of studies on differentiated instruction. in the aforementioned example of pilten’s (2016) study, we believe that reflections on, for instance, growth mindset could have further strengthened the reflexivity component of this study.

6. limitations

we have sought to retheorise empirical research on differentiated instruction, drawing upon system-theoretical epistemic and ontological positions. a major critique of systems theory is that no consensus exists (yet) on the conceptualisation of some of its central concepts. according to fenwick, complexity science remains “slippery, heterogeneous and contested” (2012, p. 110). most importantly, we notice a certain ambiguity in descriptions of how non-linearity and emergence are related to each other. in addition, we believe that some of the concepts that are commonly used in systems theory are sometimes conceptualised differently in more traditional educational approaches.
we have built on the cscfl, which provides terminology that is accessible for scholars in both cognitive and situated learning traditions. our choice to draw on the classic systems-theoretical terminology of this framework does not imply a positioning in favour of, or against, conceptualisations such as situated learning theory or any other research tradition. we want to broaden and refine research on differentiated instruction further, but not by disputing any approach. rather, by showing complexity, we argue for the added value of a system-theoretical stance. finally, it may be noticed that we have used a human-centred interpretation of systems theory. as systems theory originates from physics, its consequences cannot be strictly limited to human beings. we believe our choice to interpret differentiated instruction with a dominant human interactionist focus can be justified with reference to the existing literature (tomlinson et al., 2003). it could, however, be worthwhile to reinterpret differentiated instruction by tracing its socio-materiality more clearly. research on differentiated instruction has a tendency to couple learning and teaching with a strictly human-centred ontology. fenwick (2012) argues, however, against the tendency to focus on human learning figures: “complexity science urges a re-focusing on the relations that produce things, not the things themselves” (p. 111). several scholars have treated material conditions in which differentiated instruction is enacted (gaitas & martins, 2017; keuning et al., 2017). they see them as fostering or inhibiting teaching practice. future research could determine to what extent these material conditions actually shape the nature of differentiated teaching and learning.

7. conclusion

the concept of differentiated instruction describes a philosophy and an approach to teaching that adapts to diversity in heterogeneous classroom settings (tomlinson, 2015).
the complexity of the concept challenges scholarly research on it: differentiated instruction seeks to practise a responsive approach to teaching in which a variety of differences among students are addressed. a range of strategies is used for flexible grouping of students. moreover, a growth mindset is adopted in order to maximise the learning of all students. system-theoretical insights are needed to describe concisely, and to understand deeply, teaching and learning in a differentiated classroom. the notions of non-linearity and emergence, and the concept of nestedness, challenge scholarly study on differentiated instruction to broaden and refine research methodology. they help to understand the complex interplay between individual and collective behaviour in a differentiated classroom. moreover, they provide insight into the role of interdependence with other systems that mediate learning in a differentiated classroom. methodology that draws upon the description of human interactions, or that includes interventions that are open to non-linearity or emergence, may be used to underpin empirical research on differentiated instruction. moreover, scholars need to reflect on conditions that lie outside their control during data collection. based on these ideas of systems theory, we suggest three design principles for research on differentiated instruction: organic design, interaction and reflexivity. organic design could apply a holistic focus to differentiated instruction. a focus on interactions could draw attention to the role of responsiveness in the construct. reflexivity is needed in order to account for conditions that lie outside the control of studies that focus on differentiated instruction.
keypoints

differentiated instruction is a complex teaching concept that needs research aligned with this complexity
educational research on it is challenged to use theoretical foundations that align with this complexity, both ontological and epistemological
three methodological design principles are proposed to align scholarly research on differentiated instruction with the notions of non-linearity and emergence, and nestedness
these principles are: organic design, interaction and reflexivity

references

akkerman, s. f., & van eijck, m. (2013). re-theorising the student dialogically across and between boundaries of multiple communities. british educational research journal, 39(1), 60-72. doi:10.1080/01411926.2011.613454
alvesson, m., & sköldberg, k. (2009). reflexive methodology. new vistas for qualitative research (2nd ed.). london: sage.
anderson, j. r., reder, l. m., & simon, h. a. (1996). situated learning and education. educational researcher, 25(4), 5-11. doi:10.3102/0013189x025004005
bakker, c., & montesano montessori, m. (2016). complexity in education. from horror to passion. rotterdam: sense.
blackwell, l. s., trzesniewski, k. h., & dweck, c. s. (2007). implicit theories of intelligence predict achievement across an adolescent transition: a longitudinal study and an intervention. child development, 78(1), 246-263. doi:10.1111/j.1467-8624.2007.00995.x
blumer, h. (1973). a note on symbolic interactionism. american sociological review, 38(6).
bronkhorst, l. h., & akkerman, s. f. (2016). at the boundary of school: continuity and discontinuity in learning across contexts. educational research review, 19, 18-35. doi: https://doi.org/10.1016/j.edurev.2016.04.001
brown, b. (2016). a systems thinking perspective on change processes in a teacher professional development programme. journal of education, 66, 37-64.
burns, a., & knox, j. (2011). classrooms as complex adaptive systems: a relational model. teaching english as a second or foreign language, 15(1).
chen, s. c., yang, s. j. h., & hsiao, c. c. (2016). exploring student perceptions, learning outcome and gender differences in a flipped mathematics course. british journal of educational technology, 47(6), 1096-1112. doi:10.1111/bjet.12278
cochran-smith, m., ell, f., ludlow, l., grudnoff, l., & aitken, g. (2014). the challenge and promise of complexity theory for teacher education research. teachers college record, 116(5). doi:10.1007/s10833-012-9183-4
cohen, l., manion, l., & morrison, k. (2007). research methods in education. oxford: routledge.
coubergs, c., struyven, k., vanthournout, g., & engels, n. (2017). measuring teachers’ perceptions about differentiated instruction: the di-quest instrument and model. studies in educational evaluation, 53, 41-54. doi:10.1016/j.stueduc.2017.02.004
damsa, c., & jornet, a. (2016). revisiting learning in higher education: framing notions redefined through an ecological perspective. frontline learning research, 4(4), 39-47.
davis, b., & sumara, d. (2007). complexity science and education: reconceptualizing the teacher’s role in learning. interchange, 38(1), 53-67. doi:10.1007/s10780-007-9012-5
de neve, d., devos, g., & tuytens, m. (2015). the importance of job resources and self-efficacy for beginning teachers' professional learning in differentiated instruction. teaching and teacher education, 47, 30-41. doi:10.1016/j.tate.2014.12.003
derry, s. j., & steinkuehler, c. a. (2003). cognitive and situative theories of learning and instruction. in l. nadel (ed.), encyclopedia of cognitive science (pp. 800–805). london: nature.
deunk, m., doolaard, s., smale-jacobse, a., & bosker, r. (2015). differentiation within and across classrooms: a systematic review of studies into the cognitive effects of differentiation practices. retrieved from groningen:
dweck, c. s. (2008). mindset: the new psychology of success. new york: ballantine.
dweck, c. s. (2015). growth mindset. british journal of educational psychology, 85(2), 242-245.
doi:10.1111/bjep.12072
engeström, y. (2001). expansive learning at work: toward an activity theoretical reconceptualization. journal of education and work, 14(1), 133-156. doi:10.1080/13639080020028747
fenwick, t. (2012). tracing the socio-material: emerging approaches to theory and research in adult education. in t. fenwick, r. edwards, & p. sawchuk (eds.), emerging approaches to educational research. london: routledge.
gaitas, s., & martins, m. a. (2017). teacher perceived difficulty in implementing differentiated instructional strategies in primary school. international journal of inclusive education, 21(5), 544-556. doi:10.1080/13603116.2016.1223180
gay, g. (2002). preparing for culturally responsive teaching. journal of teacher education, 53(2), 106-116.
greeno, j. g. (1997). on claims that answer the wrong questions. educational researcher, 26(1), 5-17. doi: http://doi.org/10.3102/0013189x026001005
hall, t. s., nicole; meyer, anne. (2006). differentiated instruction and implications for udl implementation [press release].
hammersley, m. (1997). educational research and teaching: a response to david hargreaves' tta lecture. british educational research journal, 23(2), 141-161.
jacobson, m., & kapur, m. (2012). learning environments as emergent phenomena: theoretical and methodological implications of complexity. in d. h. jonassen & l. s. (eds.), theoretical foundations of learning environments (2nd ed.). london: routledge.
jacobson, m., kapur, m., & reimann, p. (2016). conceptualizing debates in learning and educational research: toward a complex systems conceptual framework of learning. educational psychologist, 51(2), 210-218. doi:10.1080/00461520.2016.1166963
jacobson, m., kapur, m., so, h., & lee, j. (2011). the ontologies of complexity and learning about complex systems. instructional science, 39(5), 763-783. doi:10.1007/s11251-010-9147-0
jafari, k., & hashim, f. (2012).
the effects of using advance organizers on improving efl learners' listening comprehension: a mixed method study. system, 40(2), 270-281. doi:10.1016/j.system.2012.04.009
keuning, t., geel, m. v., frèrejean, j., merriënboer, j. v., dolmans, d., & visscher, a. j. (2017). differentiëren bij rekenen: een cognitieve taakanalyse van het denken en handelen van basisschoolleerkrachten. pedagogische studiën, 94, 160-181.
long, n. (2001). development sociology: actor perspectives. london: routledge.
luhmann, n. (2013). introduction to systems theory. cambridge: polity press.
martin-beltran, m., guzman, n. l., & chen, p. j. j. (2017). "let's think about it together:" how teachers differentiate discourse to mediate collaboration among linguistically diverse students. language awareness, 26(1), 41-58. doi:10.1080/09658416.2016.1278221
mead, g. (1934). mind, self, and society. chicago: university of chicago press.
morrison, k. (2008). educational philosophy and the challenge of complexity theory. educational philosophy and theory, 40(1), 19-34. doi: https://doi.org/10.1111/j.1469-5812.2007.00394.x
osberg, d., & biesta, g. j. j. (2007). beyond presence: epistemological and pedagogical implications of ‘strong’ emergence. interchange, 38(1), 31-51. doi:10.1007/s10780-007-9014-3
pilten, g. (2016). a phenomenological study of teacher perceptions of the applicability of differentiated reading instruction designs in turkey. educational sciences-theory & practice, 16(4), 1419-1451. doi:10.12738/estp.2016.4.0011
prast, e. j., weijer-bergsma, e. v. d., kroesbergen, e. h., & luit, j. e. h. v. (2015). readiness-based differentiation in primary school mathematics: expert recommendations and teacher self-assessment. frontline learning research, 3(2), 90-116.
prigogine, i. (1980). from being to becoming: time and complexity in the physical sciences. new york: w. h. freeman & co.
rattan, a., savani, k., chugh, d., & dweck, c. s. (2015).
leveraging mindsets to promote academic achievement: policy recommendations. perspectives on psychological science, 10(6), 721-726. doi:10.1177/1745691615599383 resnick, m. (1996). beyond the centralized mindset. journal of the learning sciences, 5(1), 1-22. sansone, c., morf, c. c., & panter, a. t. (2004). the sage handbook of methods in social psychology. london: sage. sawyer, k. (2002). emergence in psychology: lessons from the history of non-reductionist science. human development, 45, 2-28. sawyer, k. (2005). social emergence: societies as complex systems. cambridge: cambridge university press. schleicher, a. e. (2013). preparing teachers and developing school leaders for the 21st century. lessons from around the world. retrieved from paris: schumm, j. s., & vaughn, s. (1995). getting ready for inclusion: is the stage set? learning disabilities research and practice, 10(3), 169-179. shabani, k., khatib, m., & ebadi, s. (2010). vygotsky's zone of proximal development: instructional implications and teachers' professional development. english language teaching, 3(4), 237-248. tipton, e. (2013). improving generalizations from experiments using propensity score subclassification: assumptions, properties, and contexts. journal of educational and behavioral statistics, 38(3), 239-266. doi:10.3102/1076998612441947 tomlinson, c. a. (2000). the differentiated classroom: responding to the needs of all learners. alexandria: association for supervision and curriculum development. tomlinson, c. a. (2001). differentiating instruction in mixed-ability classrooms (2nd ed.). alexandria: association for supervision and curriculum development. tomlinson, c. a. (2015). teaching for excellence in academically diverse classrooms. society, 52(3), 203-209. doi:10.1007/s12115-015-9888-0 tomlinson, c. a., brighton, c., hertberg, h., callahan, c. m., moon, t. r., brimijoin, k., . . . reynolds, t. (2003).
differentiating instruction in response to student readiness, interest, and learning profile in academically diverse classrooms: a review of literature. journal for the education of the gifted, 27(2-3), 119-145. tracy, s. (2010). qualitative quality: eight “big-tent” criteria for excellent qualitative research. qualitative inquiry, 16(10), 837-852. van klaveren, c., vonk, s., & cornelisz, i. (2017). the effect of adaptive versus static practicing on student learning evidence from a randomized field experiment. economics of education review, 58, 175-187. doi: https://doi.org/10.1016/j.econedurev.2017.04.003 von bertalanffy, l. (1968). organismic psychology and systems theory. worcester: clark university press. ware, f. (2006). warm demander pedagogy culturally responsive teaching that supports a culture of achievement for african american students. urban education, 41(4), 427-456. doi:10.1177/0042085906289710 wass, r., & golding, c. (2014). sharpening a tool for teaching: the zone of proximal development. teaching in higher education, 19(6), 671-684.
frontline learning research 5 special issue ‘learning through networks’ (2014) 38-55 issn 2295-3159
corresponding author: martin rehm, department of educational media & knowledge management, university duisburg-essen, forsthausweg 2, 47057 duisburg, germany, phone: +49-203-3794323, email: martin.rehm@uni-due.de
doi: http://dx.doi.org/10.14786/flr.v2i2.85
effects of hierarchical levels on social network structures within communities of learning
martin rehm a, wim gijselaers b, mien segers b
a university duisburg-essen, germany b maastricht university, the netherlands
article received 12 february 2014 / revised 14 may 2014 / accepted 15 may 2014 / available online 15 july 2014
abstract
facilitating an interpersonal knowledge transfer among employees constitutes a key building block in setting up organizational training initiatives.
with practitioners and researchers looking for innovative training methods, online communities of learning (col) have been promoted as a promising methodology to foster this kind of transfer. however, past research has only provided limited data from actual organizations and largely neglected characteristics that constitute a major obstacle to such collaborative processes, namely participants’ hierarchical levels. the current study addresses these shortcomings by providing empirical evidence from 25 col of an online training program, provided for 249 staff members of a global organization. using social network analysis, we are able to show significant differences in participants’ network behaviour and position based on their hierarchical rank. this translates into higher in- and out-degree network ties, as well as centrality scores, among participants from higher up the hierarchical ladder. finally, based on a longitudinal analysis of all indicated network measures, our results indicate that the main trend develops predominantly during the first half of the training program. by incorporating these insights into the implementation of future col, it is not only possible to anticipate participants’ behaviour. our findings also allow conclusions to be drawn about how collaborative activities within col should be designed and facilitated, in order to provide participants with a valuable learning experience.
keywords: social learning networks; longitudinal analysis; centrality; hierarchical levels
1. introduction
researchers have stipulated that organizations are transactive knowledge systems, where the vast majority of knowledge is stored in the heads of individual employees (cross, borgatti, & parker, 2001). consequently, it has been suggested that facilitating an interpersonal knowledge transfer among employees constitutes a key building block in setting up organizational training initiatives (argote & ingram, 2000).
this notion is further supported by researchers who suggested that knowledge is being created while collaborating in social networks composed of diverse groups of people (e.g. hakkarainen, palonen, paavola, & lehtinen, 2004; paavola, lipponen, & hakkarainen, 2004). in practice, this process of connecting people greatly builds upon the extensive use of electronic communication tools, such as asynchronous discussion forums. these types of communication channels have been proposed by scholars to effectively enable the establishment and development of new ways in which training can build upon networked communities (e.g. venkatraman, 1994). yet, organizations cannot assume that once a technology is introduced and the appropriate structure has been designed the rest will follow. instead, previous research has established that for social (learning) networks to achieve their intended goals, a clear understanding is needed of how existing organizational structures influence not only the adoption of electronic communication tools, but also their implementation (zack & mckenney, 1995). with practitioners and researchers starting to increasingly look for new approaches to design and implement organizational training programs (yamnill & mclean, 2001), online collaborative learning has received a growing amount of attention in recent years (brower, 2003). in the context of this study, we consider (online) collaborative learning as a setting where “[participants] are working in groups on a shared task or problem, in which they are expected to have equal contributions and participation” (de laat, lally, simons, & wenger, 2006, p. 103). one promising methodology that has been developed within this framework is the concept of online communities of learning (col). being defined as groups of people “engaging in collaborative learning and reflective practice involved in transformative learning” (paloff & pratt, 2003, p. 
17), col have been proposed to foster the effective exchange of knowledge and experience between members of an organization’s workforce (e.g. stacey, smith, & barty, 2004). moreover, online communities, like col, have been considered as an almost ready-made laboratory for analysing collaboration in social (learning) networks over time (haythornthwaite, 2001). in order to conduct these types of analysis, numerous researchers have suggested social network analysis (sna) as a valuable tool for describing and understanding whether and how members of a (learning) network interact with each other (e.g. daradoumis, martínez-monés, & xhafa, 2004; de laat, lally, lipponen, & simons, 2007). according to aviv, erlich, ravid and geva (2003) a social network can be defined as “a group of collaborating (and/or) competing entities that are related to each other” (p. 4). sna has been used to analyse various networks from several academic domains, ranging from social sciences, communication studies and economics to computer networks and various other fields (aviv et al., 2003). moreover, garton and colleagues (2006) specifically suggest using sna methods in the context of online learning networks. when considering their structure and development, and following the seminal work of erdös and rényi (1960), social networks should evolve according to the concept of random graph theory. in essence, the underlying supposition of this theory is that while some participants of a network might get in touch with more people than others, on average everyone should have made the same number of contacts, similar to a random distribution of connections. in other words, all participants of a network should have an equal chance of making connections (rienties, tempelaar, giesbers, segers, & gijselaers, 2012). however, if everyone did indeed have equal chances of getting connected with others, why can we then observe so many biased networks in the real world (barabási, 2003)?
more specifically, based on numerous studies of newly emerging online communities, researchers have found that a small minority of participants (15%) is gravitating around the centre of their community’s activity, while a considerably larger group (40%) is barely engaging in communication with their colleagues (e.g. cross, laseter, parker, & velasquez, 2006). in order to explain these observed patterns, some researchers have referred to the fact that communication is an inherently social act (pearce, 1976). new tools and methodologies can only reach their full potential if organizers fully understand how existing social relationships influence communication patterns and participants’ behaviour therein (wellman, 2001). moreover, de laat and lally (2003) stipulated that the social and contextual frameworks in which the learning takes place have a considerable influence on how participants behave and perform within online learning networks. furthermore, the nature of social networks, as well as their development over time, is significantly affected by the background characteristics of their individual members (e.g. barabasi & albert, 1999). yet, past research has largely been concerned with the static features of online communities (panzarasa, opsahl, & carley, 2009). while this offers preliminary insights on the overall processes that take place within these communities, it lacks a more refined picture of how social relationships might develop over time (e.g. aviv et al., 2003; haythornthwaite, 2001). additionally, the vast majority of research has neglected a particular background characteristic that can have a severe effect on the underlying learning processes, namely participants’ hierarchical levels (carley, 1992; griffith & neale, 2001; romme, 1996). the present study addresses these shortcomings by providing empirical evidence from 25 col of an online training program that was provided for 249 staff members of a global organization.
each col consisted of 7 – 13 participants and was centred on asynchronous discussion forums, where participants from different parts of the organization’s hierarchical ladder collaboratively enhanced their knowledge and skills. in order to analyse whether participants’ network behaviour was influenced by their hierarchical level, social network analysis (sna) was employed. based on the resulting findings of our study, organizers of col will be able to anticipate (groups of) individuals holding crucial positions and design actions targeted at participants who tend to be situated more towards the fringe of the network (hatala, 2006). moreover, incorporating our findings into the design and implementation strategies of future col will allow a more refined setup that contributes to employees’ learning experience and can foster the knowledge creation within an entire organization.
2. effects of hierarchical levels on social network structures within col
one of the key elements of online (learning) communities is that they allow for an open dialogue between participants (amin & roberts, 2006). yet, when considering the findings and experiences from real-life communities within organizations, there is increasing evidence that information flows are constrained by underlying organizational structures, such as departments, units and hierarchical levels (e.g. cross, laseter, parker, & velasquez, 2004). one possible explanation for this finding has been put forth by authors like drazin (1990), who stipulated that professionals might not join communities with the intention of learning. instead, individuals would primarily engage in discussion with colleagues in order to secure their role, and gain access to and control over information. holmqvist (2009) indicated that all organizational learning processes are subject to the influence of a dominant individual or group of individuals.
similarly, van der krogt (1998) postulated that “[…] powerful work actors will attempt to influence both the work and the learning network” (p. 170). furthermore, yates and orlikowski (1992) argued that top management will spend more time proactively setting the tone, as they are concerned with losing control of online groups, which could potentially feed through to the real world. considering the role of middle management, bird (1994) advocated that they would act as a “nexus between the real and the ideal” (p. 333). in practice this would result in members of this hierarchical level “translating” information from one level to the next, providing clarifications and elaborating on shared information. focusing on the lower end of the hierarchical ladder, edmondson (2002) has shown that lower level management is particularly concerned about how colleagues perceive them and their work. consequently, they tend to limit their interaction with colleagues from higher hierarchical levels. additionally, members of this group have been suggested to be more passive in discussions within training programs (nembhard & edmondson, 2006). fox (2000) has described this situation as being “caught in a dilemma” (p. 856). on the one hand, individuals would like to establish a reputation of being knowledgeable. on the other hand, they also need to consider the existing rules of conduct. sutton and colleagues (2000) follow this notion and propose that members from lower hierarchical levels will mainly try to blend in while not upsetting the status quo. in practice, this then translates into activities such as flattering, where lower level management frequently contacts their colleagues from higher hierarchical levels (bird, 1994). regarding the overall structure of a (learning) network, it has been established that the position of individuals within such a network is related to their access to valued resources (e.g.
ibarra & andrews, 1993; sparrowe, liden, wayne, & kraimer, 2001). casciaro (1998) noted that occupying high-level positions within an organization provides individuals with an intrinsic attraction to lower level management. studying three research centres of an italian university, the author implied that, given their position within the organization, higher level management has privileged access to (vital) information and knowledge sources that are relevant for all employees. moreover, this power can create a type of vortex, where lower level management is trying to get connected and, over time, stay in contact with higher level management (krackhardt, 1990). additionally, borgatti and cross (2003) have argued that lower level management, with only constrained access to valued resources, will be less likely to be contacted for information. as a result, they should hold more peripheral network positions. johnson-cramer, parise and cross (2007) have found empirical evidence for this argument. in their study of a consumer electronics company, they were able to show that higher level management held more central positions in the organization’s information sharing network. in contrast, lower level management primarily occupied positions at the outer fringe of the same network. based on these considerations, and taking into account the suggestions of previous studies that called for more longitudinal research (e.g. haythornthwaite, 2001), we formulate three research hypotheses:
hypothesis 1 (h1): over time, participants’ propensity to actively contact other colleagues will be positively influenced by their hierarchical level.
hypothesis 2 (h2): over time, participants’ ability to attract connections from other colleagues will be positively related to their hierarchical level.
hypothesis 3 (h3): over time, the higher a participant’s hierarchical level, the higher her degree of centrality within col.
3.
organisational setting
the data was collected from an online training program that aimed at enhancing the capacity and skills of a global organization’s staff, operating in the sector of economic development. overall, the organization has more than 7,000 employees, operates in 126 countries worldwide, and has its headquarters located in north america. the training program was delivered twice over a time-span of 14 weeks and covered five pre-defined content modules on the general topic of economics. operating in a fast changing environment, where new analyses and solutions are needed to address old problems, the organization wanted to embrace these developments by training their management staff accordingly. participants engaged in two types of learning activities, namely self-study and collaborative learning. the self-study element included (multimedia) learning materials, such as web lectures and online quizzes. during the collaborative learning activities, which constituted the backbone of the training program, participants discussed real-life tasks via asynchronous discussion forums. the forums were nested in dedicated col that consisted of 10 – 15 randomly assigned participants. each of the five content modules had a separate task, which was discussed within dedicated forums in chronological order. participation in these forums was obligatory and assessed by academic staff members, who facilitated the col. more specifically, a team of two academic staff members was assigned to one col each. these facilitators graded participants’ contributions, facilitated the discussions, and provided technical assistance. in practice, this could take the form of encouraging discussions and notifying participants when the communication departed too much from the intended focus of the discussion. before engaging with their assigned col, all facilitators were trained on how to work with col and received elaborate guidelines for all collaborative learning activities.
additionally, regular meetings were scheduled where facilitators could discuss their experiences and streamline their behaviour and actions towards participants. next to the obligatory, content-driven discussion forums, participants also had the opportunity to exchange private information and socialize via a so-called “café-talk” forum. upon successful completion, participants could attain a certificate of participation, together with academic credits that were based on the european credit transfer and accumulation system (ects).
4. method
4.1 participants and sampling
overall, 337 participants were randomly assigned to 30 col. however, the present study analyses a subset of 25 col and 249 participants (73.88%). the underlying reason for this smaller subset is twofold. on the one hand, we had incomplete datasets for some participants. on the other hand, we discovered that some col were biased, in the sense that not all applicable hierarchical levels were represented. consequently, we dropped the applicable col from the analyses. the remaining 25 col had an average of 9.96 members (sd = 1.72, range = 7 – 13), the average age was 43.92 (sd = 7.33, range = 27 – 58), 54.61 percent of the participants were female, and more than 80 nationalities were represented. the educational backgrounds of participants were categorized into master’s (71.37 %), phd (14.51 %), bachelor’s (7.26 %) and other degrees (6.85 %). particular examples of the latter category included health sciences and international law. following the official job categories of the organization in question, participants could be subdivided into “low” (n = 82, 32.93 %), “middle” (n = 93, 37.35 %) and “high” hierarchical levels (n = 74, 29.71 %).
4.2 data collection procedure
following the work of daradoumis and colleagues (2004), and based on the collected log-files and user statistics from the underlying discussion forums, we subdivided the data according to two different types of network links, namely indirect and direct links. indirect links refer to passive connections that took the form of reading a colleague’s contributions, but not replying to them. this type of activity was separately recorded in the log-files and captured via read-networks. in case a participant actively reacted to another col member’s contribution and replied, this established a direct link, created another applicable entry in the log-file, and was included in reply-networks. based on this distinction it was then possible to make inferences about the type of learning actions underlying a certain network connection.
4.3 data collection instruments
participants reported their own hierarchical level via the training’s official registration form. the indicated options were subject to the organization’s official job categories. based on the target group of the training program, three main categories were identified, namely “low”, “middle” and “high” hierarchical levels. generally, representatives of the “low” group were associated with project level work, contributing to sub-parts of the overall product. members of the “middle” group were leaders of such projects. finally, participants from the “high” group were responsible for departments and often entire regions in which the organization was operating.
4.4 data analysis procedure
the analyses of this study focus on data from individual participants. however, these participants were distributed over different col. depending on the specific composition of a particular col, with respect to participants’ hierarchical levels, this could have led to different dynamics and results. as a result, the validity of comparing across different learning networks might have been reduced.
hence, in order to account for possible differences in group compositions across col, we employed the shannon equitability index (magurran, 1988). the index ranges from 0 to 1 and indicates the percentage share of diversity in relation to the maximal possible diversity within a given col. focusing on participants’ hierarchical levels as a source of diversity, the average score for the investigated 25 col was .44 (sd = .05, range = .35 – .55). based on this value and the low standard deviation, we concluded that the col represented a comparable sample for our analysis. all network statistics were computed with the help of ucinet 6.357 (borgatti, everett, & freeman, 2002). the visualization of an exemplary col network, in terms of sociograms, was conducted with the help of the incorporated visualization software netdraw (borgatti, 2002). the underlying data was based on the log-files and user statistics from the discussion forums within the different col. in order to determine the basic nature of the networks’ structure, we measured the col network density scores. the density measure is based on the number of actual ties, divided by the number of possible ties within a col. consequently, it provides an indication of how well-connected participants within a particular col are (hanneman & riddle, 2005). the number and nature of an individual’s network connections were determined via the concept of freeman degree centrality, including in- and out-degree measures. in-degree network connections indicate how often and by how many colleagues a particular individual was contacted from within a col. more specifically, in the context of the reply-networks, the measure captures how often an individual has been replied to by their colleagues. when considering the read-networks, it reveals how frequently an individual’s contributions were read by her colleagues.
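the equitability index and the density measure described above can be illustrated with a minimal sketch. it assumes the textbook definitions (shannon entropy divided by its maximum, and distinct directed ties divided by n(n-1) possible ties); the study does not spell out its exact computation, so the sketch need not reproduce the reported values. the function names, the example counts and the tie list are hypothetical.

```python
import math


def shannon_equitability(counts, n_categories=None):
    """Shannon equitability E = H / ln(S), with H = -sum(p_i * ln(p_i)).

    counts: participants per hierarchical level within one col.
    n_categories: number of levels that could occur (assumption: defaults
    to len(counts)). E is 0 when only one level is present and 1 when all
    levels are equally represented.
    """
    total = sum(counts)
    s = n_categories if n_categories is not None else len(counts)
    if total == 0 or s < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(s)


def density(ties, n_members):
    """Directed density: distinct ties present / possible ties n * (n - 1)."""
    distinct = {(s, r) for s, r in ties if s != r}  # ignore self-ties, duplicates
    possible = n_members * (n_members - 1)
    return len(distinct) / possible if possible else 0.0


# Hypothetical col with 10 members: 3 "low", 4 "middle", 3 "high".
print(shannon_equitability([3, 4, 3]))
# Hypothetical reply-network with 3 members and 3 directed ties.
print(density([("a", "b"), ("b", "a"), ("a", "c")], 3))
```

multiplying the density by 100 gives a percentage scale.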
generally, a high number of in-degree connections has been attributed to prominent participants within (learning) networks, with whom others would like to be connected (hanneman & riddle, 2005). therefore, this constituted our main variable to check our second research hypothesis. the out-degree measure accounts for all those links that originate from a focal individual and summarizes how often that individual contacted her colleagues within the col. when distinguishing between reply- and read-networks, the out-degree captures how often a participant has replied to their colleagues and read their contributions, respectively. scholars have often equated a high level of out-degree connections with influential participants, who are able and willing to shape discussions (hanneman & riddle, 2005). consequently, this measure formed the basis for testing the validity of our first research hypothesis. for the analysis of our third research hypothesis, we combined the results of the previous analyses. more specifically, taking into account that we were dealing with multiple col, we determined participants’ overall centrality on the basis of the normalized number of in- and out-degree ties, which allowed us to control for the different sizes of the individual col (hanneman & riddle, 2005). in contrast to the more general, nominal network measures, these particular values provided more profound insights on how an individual’s network ties affected their overall network position within their col. in order to test for the parametric assumption of normality of the data’s distribution, kolmogorov-smirnov tests (k-s) were conducted. the results revealed a violation of the normality assumption for all measured variables, which translated into statistically significant k-s results at the .01 level. consequently, non-parametric tests were used to examine the research hypotheses. more specifically, correlations were determined with the spearman’s rho measure (rs).
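as an illustration of the in- and out-degree measures and their normalization by col size, the following sketch derives them from a list of (sender, receiver) pairs such as could be extracted from a forum log. it is not the ucinet procedure used in the study: it assumes binary ties (repeated replies to the same colleague count once) and divides by n - 1, the number of other members. the function name and the example data are hypothetical.

```python
def degree_measures(ties, members):
    """Return raw and normalized in-/out-degree for each col member.

    ties: iterable of (sender, receiver) pairs, e.g. "a replied to b".
    members: all members of the col, including inactive ones.
    Assumption: ties are treated as binary, so duplicates count once.
    """
    out_neighbours = {m: set() for m in members}
    in_neighbours = {m: set() for m in members}
    for sender, receiver in ties:
        if sender == receiver:
            continue  # self-ties carry no information here
        out_neighbours[sender].add(receiver)
        in_neighbours[receiver].add(sender)

    others = len(members) - 1  # maximum number of possible contacts
    measures = {}
    for m in members:
        in_deg = len(in_neighbours[m])
        out_deg = len(out_neighbours[m])
        measures[m] = {
            "in_degree": in_deg,
            "out_degree": out_deg,
            "norm_in_degree": in_deg / others if others else 0.0,
            "norm_out_degree": out_deg / others if others else 0.0,
        }
    return measures


# Hypothetical reply log of a four-member col.
replies = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")]
for member, values in degree_measures(replies, ["a", "b", "c", "d"]).items():
    print(member, values)
```

because the normalized values lie between 0 and 1 regardless of group size, they can be compared across col with 7 to 13 members.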
in order to assess whether differences in the chosen network measures between the different hierarchical levels could be observed, we employed kruskal-wallis tests (h). jonckheere-terpstra tests (j-t) were used to identify whether the potential main effect, as assessed by h, exhibited any possible linear trends. the results of this provided valuable information on how the different hierarchical levels differed in their network measures. the occurrence of possible patterns within the underlying h-test results was determined by post-hoc mann-whitney (u) tests. being designed to only measure differences between two independent conditions, the u-test results were corrected by the bonferroni method. as a result, our adjusted critical value of significance was .016 for this part of the analysis. in order to cater for the longitudinal nature of the data and to test for any possible changes in participants’ network measures over time, a range of wilcoxon signed rank tests were used. the chosen points in time for the longitudinal study were based on previous studies that conducted similar research on networked learning within teacher education (de laat et al., 2007). the authors of these studies chose the beginning, the middle and the end phases of an online (learning) community. in the context of this study, we decided to subdivide the overall duration of the underlying col of 14 weeks into six time intervals of about two weeks each. this allowed us to capture a short “transition period”, during which the focus of the discussions changed from one content module to the next. during this timeframe, participants rounded up the discussion of the previous module and started preparing for the next one. following the work of de laat and colleagues (2007), out of the six time intervals, we then considered intervals 1 (beginning), 3 (middle) and 6 (end) for our analysis. finally, we also estimated the effect size of our findings.
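the post-hoc procedure described above can be sketched as follows: pairwise mann-whitney tests between the three hierarchical levels, a bonferroni-adjusted threshold (.05 divided by three comparisons, roughly .0167, reported in the text as .016), and an effect size approximated as r = z / sqrt(n). the sketch is a simplified stand-in for the statistical software used in the study: it relies on the normal approximation without tie or continuity correction, and all function names and example values are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U, z, two-sided p, r) based on the normal approximation.

    Simplification: no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)  # report the smaller U, so z is never positive
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))
    r = z / math.sqrt(n1 + n2)  # effect size approximation
    return u, z, p, r


def pairwise_posthoc(groups, alpha=0.05):
    """Run all pairwise U-tests against a Bonferroni-adjusted threshold."""
    pairs = list(combinations(groups, 2))
    adjusted_alpha = alpha / len(pairs)
    results = {}
    for a, b in pairs:
        u, z, p, r = mann_whitney(groups[a], groups[b])
        results[(a, b)] = {"U": u, "z": z, "p": p, "r": r,
                           "significant": p < adjusted_alpha}
    return adjusted_alpha, results


# Hypothetical out-degree values per hierarchical level.
example = {"low": [1, 2, 2, 3, 1, 2],
           "middle": [2, 3, 4, 3, 2, 4],
           "high": [5, 6, 4, 7, 6, 5]}
threshold, outcome = pairwise_posthoc(example)
print(threshold)
for pair, stats in outcome.items():
    print(pair, stats)
```

for real analyses with many ties, a tie-corrected routine from an established statistics package would be preferable to this approximation.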
however, the vast majority of effect size measures are only suitable for parametric data (snyder & lawson, 1993). consequently, we followed the suggestion of rosenthal (1991) and approximated the effect size (r) on the basis of the u-results. this measure takes on values from 0 to 1, where small, medium and large effects are associated with .10, .30 and .50, respectively (cohen, 1992).
4.5 control measures
although the focus of this research is on the impact of hierarchical levels, we acknowledge that this aspect might only explain parts of possible observed differences between participants. consequently, we controlled for age, gender, educational background, prior knowledge, culture and motivation for attending the training, which have been suggested to influence online collaborative learning. with respect to age, some researchers have suggested that older employees tend to participate less in online training activities (e.g. garavan, carbery, o'malley, & o'donnell, 2010). additionally, other empirical studies have been able to show that age similarity had the potential to trigger emotional conflicts within groups, resulting in lower participation rates (pelled, eisenhardt, & xin, 1999). regarding gender, im and lee (2004) stipulated that if males dominate women in a regular face-to-face environment, this is also likely to carry over to an online environment. in contrast, joinson (2001) was able to show that online training environments had an equalizing effect on participants. when considering participants’ educational background and prior knowledge, previous studies have highlighted the potential impact of participants’ prior knowledge on their behaviour within learning initiatives (dochy & mcdowell, 1997). even more so, there has been a growing consensus that individuals’ prior knowledge constitutes an important variable in participants’ activity patterns (dochy, segers, & buehl, 1999).
if a participant already possesses a considerable amount of prior knowledge about a certain topic, it can be expected that she will be more comfortable in contributing to discussions, thereby positively influencing her general activity and performance levels. participants’ cultural background has also been suggested to have an impact on participants’ behaviour (jehn & bezrukova, 2004). more specifically, researchers like pelled and colleagues (1999) suggested that some cultures tend to exhibit more competitive behaviours than others. hence, representatives of a more competitive culture are also more likely to proactively engage in conversations, trying to shape discussions and thereby achieve higher potential benefits. finally, numerous studies have highlighted the importance of motivation for participants’ behaviour within the context of online learning (e.g. rienties, tempelaar, van den bossche, gijselaers, & segers, 2009). for example, yang and colleagues (2006) conducted research in online learning environments and discovered that motivation was positively related to how learners perceive each other. consequently, when participants share a similar level of motivation when starting a training program, they tend to “get along” better, which in turn affects their network behaviour (e.g. they connect more often). in this study, participants’ age, gender, educational background and culture, as assessed by participants’ country of birth, were self-reported as part of the training program’s official registration form. for educational background, participants were asked to indicate their highest attained educational degree, including bachelor, master, phd and other (e.g. vocational training). prior knowledge was measured via a diagnostic test, consisting of 25 multiple choice questions. all five pre-defined content modules were assessed based on five dedicated questions each.
these questions were created by academic experts and related to the working environment of the participants. the response rate for the test was 88.76 % and the internal consistency of participants' answers was acceptable (cronbach's α = .81) (cortina, 1993). participants' motivation for attending the training was approximated based on a previously developed instrument (rienties et al., 2009; rienties, tempelaar, waterval, rehm, & gijselaers, 2006). the questionnaire consisted of 24 questions, subdivided into four categories, and was administered with a 7-point likert scale ranging from 1 (not true for me at all) to 7 (completely true for me). the applicable categories for this study were (the number of questions is reported in brackets): “reasons to join the training” (6), and “expectations and goals” (10). the response rate was 88.51 % and the internal consistency was again acceptable (cronbach's α = .95) (cortina, 1993).

5. results

overall, while the vast majority of posts were placed in the forums of the five content modules (86%), only a few contributions were shared in the “café-talk” forums (14%). in order to visualise the underlying data, figure 1 depicts the final read- and reply-network of an exemplary col. a first glance already indicated a great amount of divergence between these two types of networks. participants were highly connected and exhibited very similar communication patterns with respect to their reading behaviour (fig. 1a). however, considerable differences prevailed regarding whether and how participants replied to each other (fig. 1b). furthermore, a closer look at the figure also revealed a first preliminary sign that participants' behaviour and network position were related to their hierarchical level within the organization. an overall picture of the longitudinal nature of our data is depicted in figure 2, which captures the average density values of the col across time.
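The density values plotted in Figure 2 can be illustrated with a minimal Python sketch. It assumes, as is common in social network analysis, that density is the share of all possible directed ties that are actually present, expressed as a percentage; the exact formula used in the study is not stated in this section, and the function name `density_percent` and the toy data are hypothetical.

```python
def density_percent(ties, n_participants):
    """Share of possible directed ties that are present, in percent.

    Assumption: a tie (source, target) means that `source` read (or replied
    to) a post by `target`. Self-ties and duplicate ties are ignored.
    """
    if n_participants < 2:
        return 0.0
    distinct = {(s, t) for s, t in ties if s != t}
    possible = n_participants * (n_participants - 1)
    return 100.0 * len(distinct) / possible


# Hypothetical community of four participants (a, b, c, d).
read_ties = [("a", "b"), ("b", "a"), ("a", "c"),
             ("c", "a"), ("d", "a"), ("b", "c")]
reply_ties = [("a", "b"), ("b", "a")]

print(density_percent(read_ties, 4))   # 6 of 12 possible ties are present
print(density_percent(reply_ties, 4))  # 2 of 12 possible ties are present
```

Computing this value per time interval and averaging across communities would give a curve of the kind described for Figure 2, under the assumptions above.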
as can be seen from figure 2, the average density per time interval of the read-networks is about ten times higher than that of the reply-networks. yet, while the average density of the read-networks declined over time, the reply-networks increased in terms of density. nonetheless, at the end of the col, the average density for the read-networks remained considerably higher at a value of 62.27 (range = 26.36 – 86.36), as compared to a final value of 11.54 (range = 0 – 28.21) for the reply-networks.

figure 1. read (a) and reply (b) network of an exemplary community of learning. the layout of the figure has been determined using iterative metric multidimensional scaling. the different hierarchical levels are denoted as: “low” – light circle; “middle” – grey square; “high” – dark diamond.

figure 2. longitudinal data on average density scores for the communities of learning.

5.1 hypotheses 1 & 2

table 1 summarizes the results of participants' overall in- and out-degree network ties for both types of networks. as can be seen from the table, none of the measures for the read-networks was statistically significant, which led us to reject research hypotheses 1 and 2 for this type of network. in contrast, our kruskal-wallis tests clearly indicated significant differences between hierarchical levels in the degree to which participants either replied to their colleagues or attracted replies from others. moreover, the jonckheere-terpstra tests showed a clear trend: the numbers of both in- and out-degree ties were positively related to participants' hierarchical level. additionally, an investigation of the underlying patterns revealed that the observed differences were especially pronounced between the “low” and “high” groups (in-degree: u = 2,261.50, p < .01; out-degree: u = 2,338.00, p < .05), which is also reflected in the observed effect sizes (r_in-degree = -.23; r_out-degree = -.20).
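The effect sizes reported above are derived from the U statistic. The Python sketch below assumes the commonly used conversion r = Z / sqrt(N), where Z comes from the large-sample normal approximation of U without a tie correction and N is the combined size of the two groups. The group sizes are not given in this section, so the data and the names `mann_whitney_u` and `effect_size_r` are purely illustrative.

```python
from math import sqrt


def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: pairs in which a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def effect_size_r(u, n_a, n_b):
    """Approximate r = Z / sqrt(N) from U (normal approximation, no tie correction)."""
    mean_u = n_a * n_b / 2.0
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return z / sqrt(n_a + n_b)


# Hypothetical numbers of reply ties for two hierarchical groups.
low_group = [1, 2, 3, 4]
high_group = [5, 6, 7, 8]

u_low = mann_whitney_u(low_group, high_group)
r = effect_size_r(u_low, len(low_group), len(high_group))
print(u_low, round(r, 3))
```

The sign of r only reflects which group U is computed for; its magnitude is what is compared against the .10, .30 and .50 benchmarks.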
the results of our longitudinal analysis are presented in table 2. as participants' behaviour within the read-networks did not show any statistically significant differences, these networks were omitted from the analysis. our results indicated a significant increase of in- and out-degree ties for the “middle” and “high” groups over the entire duration of the col. the “low” group did not exhibit a common, noticeable trend. moreover, the evidence indicated that the increases for the “middle” and “high” groups were mainly situated in the first half of the col. during the second half, only members of the “high” group showed significant signs of continued contact-seeking with their colleagues. taken together, these findings indicate that, over time, higher level management was contacted more frequently than lower level management (h1). moreover, our evidence also supported the supposition that over the duration of the col, participants from higher hierarchical levels were more likely to actively contact other col members than lower level management (h2).

table 1 results of kruskal-wallis and jonckheere-terpstra tests for (nominal) in- and out-degree network measures

table 2 results of wilcoxon signed-rank test for (nominal) in- and out-degree measures (reply-networks).

5.2 hypothesis 3

similarly to the previous findings, we again found no significant differences between hierarchical levels within the read-networks. however, as can be seen from table 3, our results for the reply-networks again painted a different picture. more specifically, the kruskal-wallis tests revealed significant differences in in- and out-degree centrality measures between hierarchical levels. another set of jonckheere-terpstra tests was then conducted to determine a possible underlying trend. the results showed that whether participants held a central position within their network was significantly and positively influenced by their hierarchical level.
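To make the trend test concrete, the following Python sketch computes a Jonckheere-Terpstra statistic for groups ordered from “low” to “high”. It is an illustration under stated assumptions (large-sample normal approximation, no tie correction in the variance, one-sided alternative of an increasing trend), not the software procedure used in the study; the function name `jonckheere_terpstra` and the toy data are hypothetical.

```python
from itertools import combinations
from math import erf, sqrt


def jonckheere_terpstra(groups):
    """Return (J, z, one-sided p) for samples ordered from lowest to highest level."""
    # J counts, over every pair of groups (lower, higher), how often a value
    # in the higher group exceeds a value in the lower group; ties count 0.5.
    j_stat = 0.0
    for lower, higher in combinations(groups, 2):
        for x in lower:
            for y in higher:
                if y > x:
                    j_stat += 1.0
                elif y == x:
                    j_stat += 0.5

    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Mean and variance of J under the null hypothesis (no tie correction).
    mean_j = (n * n - sum(s * s for s in sizes)) / 4.0
    var_j = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (j_stat - mean_j) / sqrt(var_j)
    # Upper-tail probability of the standard normal distribution.
    p_one_sided = 0.5 * (1.0 - erf(z / sqrt(2.0)))
    return j_stat, z, p_one_sided


# Hypothetical centrality scores for the "low", "middle" and "high" groups.
low, middle, high = [1, 2, 3], [4, 5, 6], [7, 8, 9]
j_stat, z, p = jonckheere_terpstra([low, middle, high])
print(j_stat, z, p)
```

Unlike the Kruskal-Wallis test, which only asks whether the groups differ, this statistic uses the ordering of the groups, which is why it can indicate a direction of the effect.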
in order to determine the pattern of the main effect, we conducted another range of mann-whitney tests. similarly to hypotheses 1 and 2, the most pronounced difference was again found between the “low” and “high” groups (in-degree: u = 2,202.50, p < .01; r_centrality-in = -.23; out-degree: u = 2,234.50, p < .05; r_centrality-out = -.24). for the longitudinal analysis, based on the described results, we again decided to focus on the reply-networks. table 4 summarizes the main results of the applicable analyses. as in the case of the more general network statistics, we did not find any significant results for the “low” group. in contrast, participants from the “middle” and “high” groups attained higher in- and out-degree centrality measures throughout the duration of the col. however, the main acceleration for this development again appeared to be situated in the first half of the col. taking into account that the read-networks again did not yield any significant results, we did not find any support in these networks for the notion that, over time, higher level management will hold more central positions in their col network, compared to their colleagues from lower positions (h3). however, based on the statistically significant findings for the reply-networks, we accepted our third research hypothesis for these types of col networks.

table 3 results of kruskal-wallis and jonckheere-terpstra tests for (normalized) in- and out-degree network measures

table 4 results of wilcoxon signed-rank test for (normalized) network measures (reply networks).

5.3 control measures

the investigation of whether participants differed in terms of age, gender, educational background, prior knowledge, culture, or motivation for attending the training, depending on their hierarchical level, revealed no significant results. however, we also conducted a separate correlation analysis, where we investigated any possible, underlying relations between all variables included in this study.
as can be seen from table 5, in terms of our dependent and control variables, participants' hierarchical level was positively correlated with age. a closer look at the control variables revealed that age (reply-networks: in-degree), gender (read-networks: in-degree) and prior knowledge (read-networks: out-degree) were positively correlated with some of the network measures. hence, in order to incorporate this finding in our analysis, we conducted a separate partial correlation analysis between hierarchical levels and the chosen network measures, while holding age, gender and prior knowledge constant. the results are presented in table 6. while hierarchical levels continued to be significantly correlated with network measures, a more refined picture emerged. more specifically, the potential influence of hierarchical levels now seemed to be mainly applicable to the out-degree measures. moreover, the partial correlation analysis showed this to be true for both the reply- and read-networks. consequently, when interpreting the main results of this research, these findings need to be taken into account. moreover, a closer look at the results also revealed that all measured network statistics were highly and significantly correlated with each other. in other words, if an individual participant attained a high number of in-degree ties, for example, she would also be very likely to initiate a high number of out-degree ties and achieve a comparatively high degree of centrality within her col. as we have been able to show that hierarchical levels have a statistically significant effect on each one of these measures, this provided additional support for our supposition that hierarchical levels have a significant impact on network structures within col.

6. discussion

the purpose of this study was to determine whether and to what extent participants' hierarchical levels influence the network structures of col.
we thereby were able to address a number of shortcomings in current research and contributed to the discussion about how existing organizational structures can affect training initiatives. in order to investigate the relationship between hierarchical levels and network structures, we employed social network analysis and conducted a longitudinal study to test our research hypotheses.

in the context of the investigated read-networks, we did not find any evidence for individuals' hierarchical levels influencing their network behaviour. however, when considering the reply-networks, our results clearly indicated that higher level management attracted more attention, contacted more colleagues, and attained more central positions within their col, as compared to their colleagues from lower level positions. additionally, based on our longitudinal analyses of all network measures, we were able to show that the overall impact generally increased over time, and in particular during the first half of the training program.

table 5 overview of correlation coefficients between hierarchical level, control variables and network measures.

table 6 correlation coefficients for hierarchical levels and network measures (controlling for age, gender and prior knowledge).

in terms of the read-networks, which capture passive connections between participants (daradoumis et al., 2004), this can be considered as a preliminary indication that col have the potential to stimulate an interpersonal knowledge transfer among participants (argote & ingram, 2000). however, the density scores across the different col varied considerably. moreover, while the average overall density score of 62.27 can be regarded as reasonable, there still remains a considerable gap to be filled in order to achieve a situation where “everyone reads everything”.
regarding the reply-networks, we were able to validate our second research hypothesis, which stated that over time, participants' ability to attract connections from other colleagues will be positively related to their hierarchical level (h2). this supports the work of krackhardt (1990), who suggested the existence of a vortex that allows higher level management to attract more attention and connections from their colleagues. additionally, our evidence suggested that higher level management will proactively set the tone in online discussions (h1), which confirms the work of yates and orlikowski (1992). we were also able to show that higher level management held central positions, while lower level management was located more towards the fringe of their col (h3) (borgatti & cross, 2003).

finally, when conducting longitudinal analyses of the underlying data, our results indicated that the observed general patterns increased over the duration of the col (e.g. bird, 1994; sutton et al., 2000). additionally, this positive trend was particularly pronounced during the first half of the training program, which appears to act as a kind of “initiation phase”. however, we also discovered that this trend was not statistically significant for the “low” group. this finding can be considered as support for the work of nembhard and edmondson (2006), who suggested that members of this group generally tend to be more passive in discussions within training programs. additionally, it could also be attributed to the importance of the “initiation phase”. once members from the “middle” and “high” groups have established their comparatively more central role within their col, it seems as if the “low” group is content with the situation.
alternatively, it could also be that members of the “high” group convey such an “imposing message”, trying to lead the group and becoming (more) central to the discussions, that representatives of the “low” group would rather not change their behaviour and become more active.

furthermore, when reinvestigating the potential influence of hierarchical levels on the chosen network measures, while incorporating our control variables, an even more refined picture emerged. our results indicate that age, gender and prior knowledge seem to have a mediating role in determining participants' network measures. more specifically, participants' hierarchical background mainly affected their out-degree behaviour, i.e. the degree to which they reply to colleagues in discussions. additionally, this effect was applicable for both the reply- and the read-networks, which suggests two main conclusions for higher level management. first, members of this group really try to set the tone and actively try to shape the discussions. second, higher level management more carefully followed the discussions by reading the contributions of their colleagues from lower hierarchical levels.

considering these findings, we can draw conclusions about how collaborative learning activities within col should be designed and facilitated, in order to provide participants with a valuable learning experience. for example, acknowledging the considerable influence of hierarchical levels, organizers can devise targeted interventions that increase the potential benefits of col (cross et al., 2006). more specifically, higher level management could be stimulated to actively draw upon the input of their colleagues, thereby allowing participants from lower level management to gradually move towards the centre of the col network. in practice, this could be achieved via two possible approaches.
on the one hand, facilitators could try to foster a (more) active exchange of information between members of these two opposite parts of the organization. the potential benefit of this approach would be that connections between participants would be initiated and supported by an external party. this in turn could relax underlying norms and regulations that govern how members from different hierarchical levels communicate with each other. alternatively, participants could be asked to complete assignments that build upon a type of mentoring system. with higher level management occupying more central positions, these participants could take their colleagues from lower hierarchical levels “by the hand” and actively include them in the discussions. this could create a pull-effect, whereby participants who generally tend to occupy positions towards the fringe of a learning network are drawn closer towards the centre. this not only has the potential to make them a more integral part of the col; it would also provide them with better opportunities to share their knowledge and insights. using the analogy of kozlowski and colleagues (2009), they could thereby more easily contribute their piece to the puzzle, which can enhance the success of the entire organization.

finally, considering the longitudinal findings of our research, we have highlighted the importance of the “initiation phase” within col. during the beginning stages of the learning process, participants get to know each other's background characteristics, including professional experience and prior knowledge. additionally, participants will also exchange, either directly (as part of their introduction to the col) or indirectly (by making appropriate references), information about their hierarchical levels. this in turn will significantly influence their behaviour towards each other throughout the col.
consequently, facilitators of such communities should pay specific attention to this initiation process, in order to be able to intervene in the discussions where necessary and assist the central participants in engaging the entire group in the discussions.

7. conclusions

7.1 limitations

the current study exhibits two main limitations that should be taken into account when considering our results. first, we have based our social network statistics purely on observed links between participants. in contrast, previous studies have also commonly incorporated familiarity measures in the context of social network analysis (e.g. krackhardt, 1990). these measures make it possible to control for the degree to which participants might already be acquainted with each other. this in turn could influence participants' comfort level and thereby affect their behaviour within col. second, connections between participants did not take into account the content of the shared information. consequently, network ties between individual participants might have reflected personal commonalities that have no direct link with the actual content of the training and are therefore difficult to control for by organizers of similar initiatives.

7.2 future research

building upon the findings of this study, future research should conduct (hierarchical) multilevel regression modelling (goldstein, 1995). our results indicate that age, gender, and prior knowledge also had an effect on participants' network behaviour. consequently, in order to incorporate these findings and to further contribute to our understanding of whether and how hierarchical levels are transferred into the network structures of col, future studies should consider modelling a larger set of explanatory variables simultaneously. moreover, future research should conduct a content analysis (ca) of the underlying discussion forums within col.
this approach is widely accepted as a way to assess the quality of learning processes and outcomes (de laat & lally, 2003) and makes it possible to draw a more refined picture of the actual level of content and knowledge that has been exchanged between participants. moreover, by mapping the ca results against the findings of an sna, it would be possible to provide detailed insights about who has been in contact with whom, what they talked about, and whether this has had an impact on their network position (de laat et al., 2007).

additionally, future research should incorporate the role of facilitators into the analysis of col. previous research has suggested that online learning communities must be cherished and protected in order to become an effective educational resource (paloff & pratt, 2003). in other words, facilitators' involvement can have a considerable influence on how learning networks develop and evolve over time (anderson, rourke, garrison, & archer, 2001). yet, although a considerable amount of research has already investigated how online facilitation can affect learning processes, the vast majority of these studies have focused on the context of higher education (berge, 1995; de laat et al., 2006; garrison, anderson, & archer, 2010) and largely neglected the field of training within organizations. by investigating the role of facilitators in col, it would be possible to provide profound insights that can serve as a springboard for facilitators to design and implement an effective teaching strategy for col. consequently, the quality of the learning process could be further augmented.

keypoints

we assess the impact of hierarchical levels on online learning networks.
the higher the hierarchical level, the higher the connectedness of participants.
the higher the hierarchical level, the higher the centrality of participants.
our findings are particularly strong for the first half of the networks' duration.

references

amin, a., & roberts, j.
(2006). communities of practice: varieties of situated learning. eu network of excellence dynamics of institutions and markets in europe (dime). http://www.dime-eu.org/files/active/0/amin_roberts.pdf
anderson, t., rourke, l., garrison, d. r., & archer, w. (2001). assessing teaching presence in a computer conferencing context. journal of asynchronous learning networks, 5(2), 1-17.
argote, l., & ingram, p. (2000). knowledge transfer: a basis for competitive advantage in firms. organizational behavior and human decision processes, 82(1), 150-169. doi: 10.1006/obhd.2000.2893
aviv, r., erlich, z., ravid, g., & geva, a. (2003). network analysis of knowledge construction in asynchronous learning networks. journal of asynchronous learning networks, 7(3), 1-23.
barabási, a.-l. (2003). linked: how everything is connected to everything else and what it means for business, science, and everyday life. from http://quijote.biblio.iteso.mx/dc/ver.aspx?ns=000149871
barabási, a.-l., & albert, r. (1999). emergence of scaling in random networks. science, 286(5439), 509-512.
berge, z. l. (1995). facilitating computer conferencing: recommendations from the field. educational technology, 15(1), 22-30.
bird, a. (1994). careers as repositories of knowledge: a new perspective on boundaryless careers. journal of organizational behavior, 15(4), 325-344. doi: 10.1002/job.4030150404
borgatti, s. p. (2002). netdraw: graph visualization software. harvard, ma: analytic technologies.
borgatti, s. p., & cross, r. (2003). a relational view of information seeking and learning in social networks. management science, 49(4), 432-445.
borgatti, s. p., everett, m. g., & freeman, l. c. (2002). ucinet for windows: software for social network analysis. harvard, ma: analytic technologies.
brower, h. h. (2003). on emulating classroom discussion in a distance-delivered obhr course: creating an on-line learning community. academy of management learning & education, 2(1), 22-36.
carley, k. (1992).
organizational learning and personnel turnover. organization science, 3(1), 20-46.
casciaro, t. (1998). seeing things clearly: social structure, personality, and accuracy in social network perception. social networks, 20(4), 331-351. doi: 10.1016/s0378-8733(98)00008-2
cohen, j. (1992). a power primer. psychological bulletin, 112, 155-159.
cortina, j. m. (1993). what is coefficient alpha? an examination of theory and applications. journal of applied psychology, 78(1), 98-104.
cross, r., borgatti, s. p., & parker, a. (2001). beyond answers: dimensions of the advice network. social networks, 23(3), 215-235. doi: 10.1016/s0378-8733(01)00041-7
cross, r., laseter, t., parker, a., & velasquez, g. (2004). assessing and improving communities of practice with organizational network analysis. paper presented at the network roundtable at the university of virginia, virginia.
cross, r., laseter, t., parker, a., & velasquez, g. (2006). using social network analysis to improve communities of practice. california management review, 49(1), 32-60.
daradoumis, t., martínez-monés, a., & xhafa, f. (2004). an integrated approach for analysing and assessing the performance of virtual learning groups. in groupware: design, implementation and use (vol. 3198, pp. 289-304). springer berlin / heidelberg.
de laat, m., & lally, v. (2003). complexity, theory and praxis: researching collaborative learning and tutoring processes in a networked learning community. instructional science, 31(1-2), 7-39.
de laat, m., lally, v., lipponen, l., & simons, r.-j. (2007). investigating patterns of interaction in networked learning and computer-supported collaborative learning: a role for social network analysis. computer-supported collaborative learning, 2, 87-103.
de laat, m., lally, v., simons, r.-j., & wenger, e. (2006). a selective analysis of empirical findings in networked learning research in higher education: questing for coherence. educational research review, 1(2), 99-111.
dochy, f., & mcdowell, l. (1997). assessment as a tool for learning. studies in educational evaluation, 23(4), 279-298.
dochy, f., segers, m., & buehl, m. m. (1999). the relation between assessment practices and outcomes of studies: the case of research on prior knowledge. review of educational research, 69(2), 145-186. doi: 10.3102/00346543069002145
drazin, r. (1990). professionals and innovation: structural-functional versus radical-structural perspectives. journal of management studies, 27(3), 245-263. doi: 10.1111/j.1467-6486.1990.tb00246.x
edmondson, a. c. (2002). the local and variegated nature of learning in organizations: a group-level perspective. organization science, 13(2), 128-146. doi: 10.1287/orsc.13.2.128.530
erdös, p., & rényi, a. (1960). on the evolution of random graphs. publications of the mathematical institute of the hungarian academy of sciences, 5, 17-61.
fox, s. (2000). communities of practice, foucault and actor-network theory. journal of management studies, 37(6), 853-867.
garavan, t. n., carbery, r., o'malley, g., & o'donnell, d. (2010). understanding participation in e-learning in organizations: a large-scale empirical study of employees. international journal of training and development, 14(3), 155-168. doi: 10.1111/j.1468-2419.2010.00349.x
garrison, d. r., anderson, t., & archer, w. (2010). the first decade of the community of inquiry framework: a retrospective. internet and higher education, 13(1-2), 5-9. doi: 10.1016/j.iheduc.2009.10.003
garton, l., haythornthwaite, c., & wellman, b. (2006). studying online social networks. journal of computer-mediated communication, 3(1), 0-0. doi: 10.1111/j.1083-6101.1997.tb00062.x
goldstein, h. (1995). multilevel statistical models. sydney: edward arnold.
griffith, t. l., & neale, m.
a. (2001). information processing in traditional, hybrid, and virtual teams: from nascent knowledge to transactive memory. research in organizational behavior, 23, 379-421. doi: 10.1016/s0191-3085(01)23009-3
hakkarainen, k., palonen, t., paavola, s., & lehtinen, e. (2004). communities of networked expertise: professional and educational perspectives. amsterdam: elsevier.
hanneman, r. a., & riddle, m. (2005). introduction to social network methods. riverside, ca: university of california.
hatala, j. p. (2006). social network analysis in human resource development: a new methodology. human resource development review, 5(1), 45-71. doi: 10.1177/1534484305284318
haythornthwaite, c. (2001). exploring multiplexity: social network structures in a computer-supported distance learning class. information society, 17(3), 211-226.
holmqvist, m. (2009). complicating the organization: a new prescription for the learning organization? management learning, 40(3), 275-287. doi: 10.1177/1350507609104340
ibarra, h., & andrews, s. b. (1993). power, social influence, and sense making: effects of network centrality and proximity on employee perceptions. administrative science quarterly, 38(2), 277-303. doi: 10.2307/2393414
im, y., & lee, o. (2004). pedagogical implications of online discussion for preservice teacher training. journal of research on technology in education, 36(2), 155-170.
jehn, k. a., & bezrukova, k. (2004). a field study of group diversity, workgroup context, and performance. journal of organizational behavior, 25(6), 703-729.
johnson-cramer, m. e., parise, s., & cross, r. l. (2007). managing change through networks and values. california management review, 49(3), 85-109.
joinson, a. n. (2001). self-disclosure in computer-mediated communication: the role of self-awareness and visual anonymity. european journal of social psychology, 31(2), 177-192. doi: 10.1002/ejsp.36
kozlowski, s. w. j., chao, g. t., & jensen, j. m. (2009).
building an infrastructure for organizational learning: a multilevel approach. in e. salas & s. w. j. kozlowski (eds.), learning, training, and development in organizations. new york, ny, united states of america: routledge.
krackhardt, d. (1990). assessing the political landscape: structure, cognition, and power in organizations. administrative science quarterly, 35(2), 342-369. doi: 10.2307/2393394
magurran, a. e. (1988). ecological diversity and its measurement. princeton, nj, usa: princeton university press.
nembhard, i. m., & edmondson, a. c. (2006). making it safe: the effects of leader inclusiveness and professional status on psychological safety and improvement efforts in health care teams. journal of organizational behavior, 27(7), 941-966. doi: 10.1002/job.413
paavola, s., lipponen, l., & hakkarainen, k. (2004). models of innovative knowledge communities and three metaphors of learning. review of educational research, 74(4), 557-576. doi: 10.3102/00346543074004557
paloff, r., & pratt, k. (2003). the virtual student: a profile and guide to working with online learners. san francisco: jossey-bass.
panzarasa, p., opsahl, t., & carley, k. m. (2009). patterns and dynamics of users' behavior and interaction: network analysis of an online community. journal of the american society for information science and technology, 60(5), 911-932. doi: 10.1002/asi.21015
pearce, w. b. (1976). the coordinated management of meaning: a rules-based theory of interpersonal communication. in g. r. miller (ed.), explorations in interpersonal communication (pp. 17-35). beverly hills, ca: sage publications.
pelled, l. h., eisenhardt, k. m., & xin, k. r. (1999). exploring the black box: an analysis of work group diversity, conflict, and performance. administrative science quarterly, 44(1), 1-28. doi: 10.2307/2667029
rienties, b., tempelaar, d., giesbers, b., segers, m., & gijselaers, w. (2012).
a dynamic analysis of social interaction in computer mediated communication: a preference for autonomous learning. interactive learning environments. rienties, b., tempelaar, d., van den bossche, p., gijselaers, w., & segers, m. (2009). the role of academic motivation in computer-supported collaborative learning. computers in human behavior, 25(6), 1195-1206. doi: 10.1016/j.chb.2009.05.012 rienties, b., tempelaar, d., waterval, d., rehm, m., & gijselaers, w. (2006). remedial online teaching on a summer course. industry and higher education, 20(5), 327-336. romme, a. g. l. (1996). a note on the hierarchy-team debate. strategic management journal, 17(5), 411417. rosenthal, r. (1991). meta-analytic procedures for social research. newbury park, ca: sage. snyder, p., & lawson, s. (1993). evaluating results using corrected and uncorrected effect size estimates. journal of experimental education, 61(4), 334-349. sparrowe, r. t., liden, r. c., wayne, s. j., & kraimer, m. l. (2001). social networks and the performance of individuals and groups. academy of management journal, 44(2), 316-325. doi: 10.2307/3069458 sutton, r., neale, m. a., & owens, d. (2000). technologies of status negotiation: status dynamics in email discussion groups: stanford university, graduate school of business. van der krogt, f. j. (1998). learning network theory: the tension between learning systems and work systems in organizations. human resource development quarterly, 9(2), 157-177. doi: 10.1002/hrdq.3920090207 venkatraman, n. (1994). it-enabled business transformation: from automation to business scope redefinition. sloan management review, 35(2), 73-87. wellman, b. (2001). computer networks as social networks. computers and science, 293(5537), 20312034. doi: 10.1126/science.1065547 yamnill, s., & mclean, g. n. (2001). theories supporting transfer of training. human resource development quarterly, 12(2), 195. doi: 10.1002/hrdq.7 m. rehm et al. 
yang, c.-c., tsai, i., kim, b., cho, m.-h., & laffey, j. m. (2006). exploring the relationships between students' academic motivation and social ability in online learning environments. the internet and higher education, 9(4), 277-286. yates, j., & orlikowski, w. j. (1992). genres of organizational communication: a structurational approach to studying communication and media. academy of management review, 17(2), 299-326. doi: 10.2307/258774 zack, m. h., & mckenney, j. l. (1995). social-context and interaction in ongoing computer-supported management groups. organization science, 6(4), 394-422. doi: 10.1287/orsc.6.4.394

frontline learning research vol. 5 no. 3 special issue (2017) 139-154
issn 2295-3159

using the ‘expert performance approach’ as a framework for improving understanding of expert learning

a. mark williams (a)*, bradley fawver (a), & nicola j. hodges (b)

(a) university of utah, usa
(b) university of british columbia, canada

* corresponding author: a.m. williams, department of health, kinesiology, and recreation, college of health, university of utah, 250 s. 1850 e. rm 200, salt lake city, ut 84112. email: mark.williams@health.utah.edu. doi: http://dx.doi.org/10.14786/flr.v5i3.267

article received 15 august / revised 14 december / accepted 23 march / available online 14 july

abstract

the expert performance approach, initially proposed by ericsson and smith (1991), is reviewed as a systematic framework for the study of ‘expert’ learning. the need to develop representative tasks to capture learning is discussed, as is the need to employ process-tracing measures during acquisition to examine what actually changes during learning. we recommend the use of realistic retention and transfer tests to infer what has been learned, so that the effects of various interventions on learning may be evaluated.
a focus on individual differences in learning within groups of expert performers is considered as a way to identify the characteristics of more efficient and effective learners. the identification and study of expert (or good) learners will enhance our understanding of skill acquisition and how this may be promoted using instructional interventions and practice opportunities. although these ideas are predicated on our research on perceptual-cognitive expertise in sport, we argue that they have general merit beyond this domain. the challenge for scientists is to generate new knowledge that helps those involved in developing learners who can acquire and refine skills more efficiently and effectively across professional domains.

keywords: perceptual-cognitive training; skill acquisition; representative task-design; process-tracing measures; deliberate practice

1. introduction

over recent decades there has been significant growth of interest in the study of expert performance, with a specific focus on perceptual-cognitive expertise (for up-to-date reviews, see baker & farrow, 2015; ericsson, hoffman, aaron, & williams, in press; farrow, baker & macmahon, 2013; hodges & williams, 2012). the growth of interest in topics such as anticipation and decision making has spanned multiple domains including sport (williams & abernethy, 2012), medicine (mcrobert, causer, vasiliadus, watterson, & williams, 2013), aviation (kennedy, taylor, reade, & yesavage, 2010), and automobile driving (stahl, donmez, & jamieson, 2016). the typical finding is that experts can be differentiated from less expert or novice counterparts based on a number of domain-specific, perceptual-cognitive skills.
these skills include the ability to pick up biological motion information from the movements of others (e.g., abernethy & zawi, 2007), a capacity to identify familiarity or patterns in structured displays (e.g., north, ward, ericsson, & williams, 2011), and a more refined knowledge of likely situational or event probabilities (e.g., ward, ericsson, & williams, 2013). systematic differences have been reported in the gaze behaviours underpinning these perceptual-cognitive skills, with the specific strategies employed being task and context specific (see roca, ford, & williams, 2013). in contrast, evidence pointing towards individual differences in measures of basic visual and cognitive functions (i.e., domain-generic skills) is relatively weak or at best mixed (voss, kramer, basak, prakash, & roberts, 2010). the suggestion is that expertise arises through adaptations that are mostly specific to the target domain of expertise (ericsson & kintsch, 1995).

in this paper, we consider how research on expertise in sport, particularly related to perceptual-cognitive expertise, could impact how we study expert performance in general and ‘expert’ or good learning more specifically. we argue that a distinction between expert performance and expert learning could provide significant insight into the processes underpinning skill acquisition.

although the number of published reports on expertise has grown substantially, there remain significant shortcomings in the existing literature. first, while the vast majority of researchers have used the traditional expert-novice paradigm to examine differences as a function of performance, few published papers have focused on expertise within the context of learning, as opposed to expertise in terms of performance on the task itself. numerous questions remain unanswered. is it true that expert performers are always expert learners?
what is the relationship between performance and the ability or receptiveness to acquire skills efficiently and effectively? are there individual differences with respect to how skills are acquired and refined that distinguish individuals even within skill groups? what is the relationship between skill acquisition processes, for component skills, and eventual skilled performance/expertise? in this review article, the merits of using the ‘expert performance approach’ as a framework to study expert learning rather than, or as well as, expert performance are highlighted. generally, research into skill development and learning has lacked a systematic framework (including methods and measures) for the study of perceptual-cognitive expertise and, equally importantly, its acquisition. while the expert performance approach is not new, having originally been proposed more than two decades ago (see ericsson & smith, 1991), it does offer a framework to study how skills are learned as well as performed. we acknowledge that this approach puts a focus on the acquisition of new skills, rather than their refinement, with the supposition that intentional learning and factors related to skill learning are likely different to the background corrections (bernstein, 1967/1996) and refinements in skill that might take place at a more implicit/non-conscious or subcortical level. in the following sections, the three stages outlined in the expert performance approach are introduced with some examples of how the framework has been used successfully to evaluate expertise across domains. in the second half of the article, the focus shifts to examining how the expert performance approach may be used to evaluate expert learning rather than expert performance. 
our intention is that some of the questions, issues and ideas we raise in this paper will encourage those working in the learning sciences to adopt this approach (or at least consider these issues) as they pursue further research in this area.

2. the expert performance approach

the expert performance approach was first presented by ericsson and smith in 1991. the three-stage approach to studying expert performance is highlighted in figure 1, with each stage described in detail below.

figure 1. the expert performance approach proposed by ericsson and smith (1991). adapted from williams and ericsson (2005).

2.1 stage 1 – capturing expert performance: what factors discriminate?

in the first stage, the aim is to develop a representative task that enables expert performance to be captured in a reliable and objective manner. performance may be captured under controlled conditions in the laboratory or using appropriate measurement systems in the field (williams & ericsson, 2005). the goal is to develop tests that discriminate at an empirical level those who are skilled from those less skilled on the task(s). the stable and reliable aspects of performance are captured in a repeated manner. initially, researchers focused primarily on cognitive tasks representative of performance in domains such as chess or mathematics. more recently, there have been efforts to determine representative tasks for both perceptual-cognitive and perceptual-motor skills (williams, ford, eccles, & ward, 2011). for example, in sport, these latter efforts have focused either on capturing performance in the field using player and motion tracking systems or on recreating performance situations using film- or virtual-reality simulations (williams & abernethy, 2012). figure 2 presents a typical example of a film-based simulation set-up designed to capture anticipation and decision making in tennis.
the advantage with such methods is that they enable performance on the task to be measured accurately such that groups of participants varying in expertise may be compared under standardised and reproducible test conditions.

2.2 stage 2 – identifying mechanisms underpinning expert performance: how do experts perform better?

in the second stage, the aim is to identify the processes and mechanisms underpinning superior performance. process-tracing measures such as eye movement recording and think-aloud verbal protocols are employed to examine the perceptual-cognitive processes underlying expert performance (e.g., mcrobert, ward, eccles, & williams, 2011; roca, ford, mcrobert, & williams, 2013). neurophysiological techniques, such as transcranial magnetic stimulation (tms), electroencephalography (eeg) and functional magnetic resonance imaging (fmri), can be employed to identify and relate areas of brain activity to performance (e.g., balser et al., 2014; dennis, rowe, williams, & milne, in press; wright, bishop, jackson, & abernethy, 2010, 2011). psycho-physiological measures such as pupil diameter, inter-beat heart rate intervals, electromyography (emg), and galvanic skin response may be recorded to examine changes in mental effort or stress/anxiety as a function of skill on the task (e.g., vater, roca, & williams, 2015). biomechanical measurement tools can also be employed to record kinetic and kinematic variables that give insight into movement planning and execution processes (e.g., müller, brenton, dempsey, harbaugh & reid, 2015; smeeton & williams, 2012). finally, various behavioural manipulations, such as dual-task paradigms, can be used to determine the type of processes engaged during anticipation and decision making (e.g., mulligan, lohse & hodges, 2016a,b) or the cognitive and attentional load associated with a task or skill level (e.g., broadbent, causer, williams, & ford, in press).
a combination of different process-tracing measures may be needed to cross-validate findings and to provide a more complete picture of the important processes that mediate expert performance. the challenge is to identify how experts demonstrate superior performance on the task compared to less expert individuals, with the overall aim of enhancing conceptual and empirical understanding of performance processes.

2.3 stage 3 – facilitating the acquisition of expert performance: why has expertise developed?

the aim in the final stage is to improve understanding of how adaptations occur during the acquisition of expertise and to use the knowledge acquired to develop training interventions that help facilitate the more rapid acquisition of expertise. with regard to understanding how these adaptations occur, the prototypical approach has been to use qualitative and quantitative methods, such as questionnaires, interviews and practice logs, to probe practice histories. the deliberate practice theory (ericsson et al., 1993) is usually used as the framework around which empirical questions are generated and related (e.g., see ward, hodges, starkes, & williams, 2007; ford & williams, 2012). pre- to post-test intervention designs have also been used to study the efficacy of short- to medium-term training interventions on performance and learning. such interventions have focused on determining whether the practice behaviours of experts differ from less skilled individuals and how these behaviours relate to current understanding of best practice methods (e.g., coughlan, ford, mcrobert, & williams, 2014; hodges, edwards, luttin, & bowcock, 2011). other researchers have studied how the perceptual-cognitive skills that underpin anticipation and decision making can be trained using simulation- or field-based interventions (e.g., see broadbent, causer, williams, & ford, 2015a; williams, ward, & chapman, 2003; williams, ward, knowles, & smeeton, 2002).
the general intention is to develop interventions that facilitate the more rapid and robust acquisition of skills, based on knowledge of what good learners or elite performers do, as well as understanding of how and why this works.

figure 2. an illustration of a film-based simulation used to evaluate anticipation and decision making in tennis. the participant is required to anticipate the location and type of shot played by the opponent located at the net and to decide on an appropriate response. verbal or motor responses are recorded (both speed and accuracy), with the latter from pressure sensitive mats placed on the floor around the participant.

3. applying the expert performance approach to the study of expert learners

the expert performance approach has been used to study differences between expert and less expert or novice performers. however, the approach has not been used systematically to evaluate how skilled individuals acquire (and refine) skills. there are many factors which are likely to affect the rate of acquisition and retention of skills, including the dispositional characteristics of the performer and exposure to previous learning experiences (e.g., hodges et al., 2014), existing skill sets (e.g., hodges et al., 2011), motivation (wulf, shea, & lewthwaite, 2010), and potentially, individual differences related to intelligence or other factors (e.g., ackerman, 1987). the expert performance approach offers a framework to address issues concerning how experts learn and engage in practice to acquire new skills, to identify the characteristics that define good learners with respect to factors such as efficiency in rate of skill acquisition and potentially transferability of skills to new conditions, and to assess relationships between current performance and learning.
as such, by studying how expert performers learn and what factors distinguish the best learners from the worst, we might learn more about best practice principles for more elite performers and about processes of learning in general that could impact conceptual understanding and applied interventions.

3.1 how can we capture learning and how do we define ‘expert’ and ‘less expert’ learners?

learning is differentiated from performance during practice on the grounds that the latter is evaluated through observation of current behaviour, whereas estimates of learning usually involve both a measure of skill retention (to measure the longevity of the change in performance) and an assessment of transfer to probe what has been learned and the robustness of learning (schmidt & lee, 2013). typically, learning is assessed at least 24 hours after the end of practice (i.e., after a period of sleep consolidation; walker et al., 2002), under equitable conditions for all groups (such as in the absence of feedback). thus far, however, limited attention has been paid to the efficiency of learning and how it differs within or between individuals. the general focus has been on testing how various interventions facilitate, or not, learning at the group rather than individual level. in fact, the process of collating mean scores across both trials and individuals and summating these largely eliminates what may be interesting variability in the data, which may be a functional component of learning (davids, bennett, & newell, 2006).

the rate of learning may be negatively related to long-term retention. for example, in short-term adaptation studies, where individuals practice making novel aiming movements in visually-rotated conditions, there is evidence for both fast and slow adaptation processes, with the slow process being more robust over time and arguably predictive of better learning (smith, ghazizadeh & shadmehr, 2006).
this slow process is more implicit in nature, showing little sensitivity to error, in comparison to the fast process that is more explicitly-driven and highly responsive to errors. in a study of sequence learning, where individuals practiced three different sequences in a serially repeating order, there was evidence that slower learners, or the ones who spent more time in what was termed the cognitive phase of skill acquisition, had better retention (wadden, hodges, de asis, neva & boyd, 2017). these data seem to support an efficiency-effectiveness trade-off in learning, which is underscored by current conceptualizations of practice and learning relating to the need for ‘challenge points’ during practice (guadagnoli & lee, 2004) and the promotion of desirable difficulties (bjork & bjork, 2011). however, there are individuals who show both fast acquisition and good retention, and “good” learners do not always adhere to established principles of practice. for example, when learners are allowed to schedule their own practice, a useful method to assess characteristics of good or expert learners, the best learners are not always the ones who adhere to good practice principles (such as high contextual interference between attempts of different skills). as long as practice is self-determined, low levels of switching (i.e., contextual interference) between to-be-acquired skills can produce good learning, without accuracy costs in acquisition (hodges et al., 2011, 2014; keetch & lee, 2007). also, there has been resistance to methods which have a negative impact on performance in the short-term (in order to benefit retention), as these can discourage change, demotivate learners and of course carry a cost in terms of amount or duration of practice (especially pertinent when practice time is limited or safety concerns are at play; see lee & wishart, 2005).
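the fast and slow adaptation processes referred to above (smith, ghazizadeh & shadmehr, 2006) are commonly formalised as a two-state model, in which each process retains a fraction of its previous state and learns a fraction of the current error. the sketch below is a minimal illustration of that idea only; the parameter values, the function names and the unit-size perturbation are assumptions chosen for demonstration and are not taken from the studies cited.

```python
# Illustrative two-state (fast + slow) adaptation model.
# All parameter values are assumptions chosen for demonstration only.

def simulate_two_state(n_trials, perturbation=1.0,
                       a_fast=0.60, b_fast=0.20,
                       a_slow=0.99, b_slow=0.02):
    """Return a list of per-trial (fast, slow, total) adaptation states."""
    fast, slow = 0.0, 0.0
    history = []
    for _ in range(n_trials):
        history.append((fast, slow, fast + slow))
        error = perturbation - (fast + slow)     # part of the perturbation not yet corrected
        fast = a_fast * fast + b_fast * error    # learns quickly, forgets quickly
        slow = a_slow * slow + b_slow * error    # learns slowly, retains well
    return history


def decay(state, retention, n_trials):
    """State remaining after n_trials with no error signal to learn from."""
    return state * retention ** n_trials


history = simulate_two_state(n_trials=200)
fast_end, slow_end, total_end = history[-1]
print(f"end of practice: fast={fast_end:.3f} slow={slow_end:.3f} total={total_end:.3f}")
print(f"after 20 no-feedback trials: fast={decay(fast_end, 0.60, 20):.5f} "
      f"slow={decay(slow_end, 0.99, 20):.3f}")
```

under these assumed parameters the fast process dominates the first few trials, whereas the slow process carries most of what survives a period without feedback, which is one way of picturing why rapid gains in acquisition need not translate into retention.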
systematic investigations of experts acquiring new skills (or studying good learners who are able to circumvent practice time) may help to give us insights into this efficiency-effectiveness relationship and any individual differences that might impact these variables.

in order to enhance measurement sensitivity, it is necessary to consider the variability in learning across participants to determine how individuals differ in the amount of learning that has occurred (perhaps in response to varying instructions or demonstrations). such an analysis is important if we are to better understand the subtle differences that may exist between ‘expert’ and ‘less expert’ learners. one approach would be to classify performers based on the amount of learning that has occurred by looking at either absolute retention, change in performance (from pre- to retention tests) or learning efficiency, with respect to the relationship or ratio between rate of acquisition and retention. with respect to this latter measure, those with the best ratio (i.e., efficient and effective) would be classified as more ‘expert’ learners (or the best within the group). if a sufficiently large sample is used, more and less expert learning groups may be created using a quartile- or median-split approach. alternatively, regression analyses may be used, with all participants included, to examine changes in performance and how these ultimately may be linked to changes in process measures. such an approach has been used previously to stratify participants into high and low performing individuals based on their perceptual-cognitive expertise (e.g., bourne, bourne, bennett, hayes, & williams, 2011; savelsbergh, van der kamp, williams, & ward, 2006; williams, ward, bell-walker, & ford, 2011), yet, thus far, the approach has not been employed to stratify participants based on their proficiency in learning a skill.
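one way the classification described above might be operationalised is sketched below. the data, the participant identifiers and the efficiency index (retained gain per practice trial needed to reach a criterion) are hypothetical assumptions for illustration; the text does not prescribe a specific formula, and other ratios between rate of acquisition and retention would be equally defensible.

```python
from statistics import median

# Hypothetical data: (learner id, pre-test score, retention score,
# practice trials needed to reach a performance criterion).
learners = [
    ("p01", 40, 70, 120),
    ("p02", 42, 60, 60),
    ("p03", 38, 75, 200),
    ("p04", 45, 80, 90),
    ("p05", 41, 55, 150),
    ("p06", 39, 66, 75),
]


def learning_efficiency(pre, retention, trials_to_criterion):
    """Retained gain per practice trial (one possible efficiency index)."""
    return (retention - pre) / trials_to_criterion


# Efficiency score for every learner.
scores = {pid: learning_efficiency(pre, ret, trials)
          for pid, pre, ret, trials in learners}

# Median split: learners above the median are labelled 'more expert'.
cutoff = median(scores.values())
groups = {pid: ("more expert learner" if score > cutoff else "less expert learner")
          for pid, score in scores.items()}

for pid in sorted(groups):
    print(pid, round(scores[pid], 3), groups[pid])
```

a median split discards information by dichotomising a continuous score, so with small samples the regression alternative mentioned above, which keeps every participant's efficiency score as a continuous predictor or outcome, may be preferable.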
it is often difficult to assess performance directly in many domains, making it hard to measure the amount of improvement that occurs during learning. when performance can be evaluated using standard units of assessment (e.g., time and distance), it is relatively straightforward to evaluate learning. yet, in domains such as team sports, the military and emergency room medicine, performance is hard to measure as it is not expressed through a single unit of measurement. in such scenarios, performance is typically made up of several individual components that could interact. as such, particularly when the components are more independent, a sensitive assessment could be provided through measurement of an isolated component of performance. for example, a specific test of decision making could be designed, such that performance on this component can be isolated for assessment in the laboratory. a within-task criterion (i.e., actual score on the test) may then be used to identify those who are high or low on decision making, rather than general performance in the domain. an advantage of this method is that the efficiency and effectiveness of learning on a specific component of performance may be identified, allowing identification of participants who are more or less able to learn and improve on that component.

3.2 identifying the processes and mechanisms that mediate expert learning

when conducting traditional experimental work on skill learning, the vast majority of researchers have relied almost exclusively on changes in outcome measures of performance. process-tracing measures, although used, have been employed far less frequently. the difficulty in relying on outcome measures is that we often have limited understanding of what actually changes during learning and neglect the fact that positive changes in process may not necessarily be (immediately) reflected in changes in outcomes (schmidt & lee, 2013).
eysenck and calvo (1995), in their processing efficiency theory, present a conceptual account of the effects of anxiety on performance. the model differentiates between changes in processing efficiency and processing effectiveness. as participants become anxious, there is often no immediate change in effectiveness, but there may be an increase in the amount of mental effort or resources that need to be devoted to the task, thereby decreasing performance efficiency. at higher levels of anxiety, the cognitive resources needed for task execution may exceed available resource capacity, leading to declines in both efficiency and effectiveness. we argue that the process of learning may function in a similar way, such that the efficiency and effectiveness of learning may not necessarily be highly correlated. a call is therefore made for the use of process-tracing measures in conjunction with outcome measures in skill acquisition research. it could be argued that the collection of process-measures to study learning may be as important as the adoption of measures of retention and transfer proved to be in the motor learning literature in the late 1980s (williams & ericsson, 2007). we need to better differentiate between the process and product of learning and to ascertain the efficiency and effectiveness gains (or trade-offs) that may be obtained through different interventions.

visual gaze has been used as an index of attention during performance on a specific task (e.g., anticipation), but thus far, only in a few published reports have gaze behaviours been recorded before and after an intervention or practice phase (for exceptions, see alder, ford, causer, & williams, 2016; breslin, hodges, williams, kramer, & curren, 2007; causer, holmes, & williams, 2011).
measures of gaze can help establish whether more stable or efficient patterns of visual search behaviour in some learners (i.e., longer duration fixations on information rich areas of a display) inform what information is being acquired, how improvements in performance are attained, and/or explain “learning” in the absence of performance effects. such an approach is especially important when studying expert learning, where subtleties in processes may be more evident following practice than (positive) changes in behavioural outcomes. moreover, there is a paucity of research where verbal reports have been gathered on a pre-test as well as on subsequent retention and transfer tests (for an exception, see coughlan et al., 2014).

whilst kinematic measures are commonly employed on simple tests of motor skill learning, detailed motion analysis is less common for the acquisition of more complex skills, such as those involved in sport. researchers have tended to focus on changes in motor control by measuring variation in single measures (e.g., range of motion, linear velocity) rather than in global motor coordination patterns (e.g., conjugate cross correlations and angle-angle plots as measures of coordination; see carling, reilly, & williams, 2009). the use of neurophysiological measures such as fmri, tms or eeg is reported more frequently as a process measure of skill learning in simple, constrained laboratory-based tasks of motor skill (e.g., key press sequencing or single limb adaptation to novel environments; for reviews see hardwick, rottschy, miall, & eickhoff, 2013; lohse, wadden, boyd, & hodges, 2014). however, the recording of neurophysiological measures on pre- and retention/transfer tests following the acquisition of more complex skills has been rare (for an exception, see bezzola, merillat, gaser, & jancke, 2011).
the difficulty is that while we have a reasonable understanding of the effectiveness of different instructional interventions at the outcome level, only limited knowledge has been generated with regard to how changes in process-measures accompany changes in outcome. an interesting question is the extent to which process-tracing measures (such as visual search) may predict changes in learning across and within individuals. such research would help us better understand the processes underpinning expert learning and how these relate to the development of mechanisms that promote expertise.

3.3 tracing the development of expert learners

an extensive body of research now exists focusing on deliberate practice and its contribution to expert performance. the findings remain somewhat controversial, with wide-ranging views regarding the variance in performance across individuals accounted for by hours accumulated in deliberate practice (see ericsson et al., in press; hambrick, altmann, oswald, meinz, & gobet, 2014; mcnamara, hambrick, & oswald, 2014). perhaps the most interesting issue is that variability in the number of hours accumulated across individuals has been largely unexplored. numerous researchers have reported extremely large standard deviations in the number of hours athletes accumulate in different types of practice activities during development, implying that practice (at least given issues in measurement) is not the sole factor in developing expertise (e.g., see ford et al., 2012; hopwood, macmahon, farrow & baker, 2016). the difficulty with the majority of the existing research on deliberate practice is that it tends to be correlational in nature, rather than prospective or intervention-based. the data may largely be describing the social and cultural backgrounds surrounding a particular cohort or domain, rather than highlighting causal factors which promote excellence.
if the deliberate practice framework is to continue to make a valuable contribution to the learning sciences, better efforts are needed to link engagement in specific types of practice activities with specific improvements in related components of performance. the need to be able to draw firmer inferences about causality may necessitate moving away from retrospective, historical accounts of practice (i.e., the ‘bean counting’ approach) towards more prospective approaches combining some of the tenets of deliberate practice theory with more traditional, quasi-experimental designs. a published report by coughlan, williams, and ford (2014) nicely illustrates how the tenets of deliberate practice may be studied under controlled conditions involving a traditional learning design, including measures of transfer and retention, and process measures of learning.

the need remains to specify not just how much deliberate practice occurs amongst learners, but also how deliberate the learners are during practice itself. self-regulated learners take control of their learning environment and engage in specific learning strategies to improve (zimmerman, 2008). the motivation to develop deeper understanding of key concepts and subject matter leads to long-term retention and a greater ability to transfer skills across domains. similarly, mental toughness, grit and resilience may mediate expert learning as performers must develop new and innovative ways to overcome obstacles and continue to challenge themselves (hodges, ford, hendry, & williams, in press). do expert learners simply respond better to challenges in the learning environment or do they create those challenges to push themselves to new heights?

although research using the deliberate practice framework and cross-sectional, expert-novice type comparisons may be criticised, shortcomings equally exist with more ‘traditional’ learning designs that rely on an experimental approach.
the majority of researchers who study motor-skill learning have employed novice participants who often have no or very little skill (or interest) in the chosen task. novel tasks with little applicability to real-world scenarios are often used in efforts to equate individuals before practice, acquisition periods are relatively short, and a retention and/or transfer test is sometimes not even included. moreover, experience and skill are often confounded in the choice of participants, such that, for example, participants high in skill are typically highly experienced, making it impossible to dissociate these two factors. the absence of research with more skilled performers is notable, particularly involving real-world tasks. no published reports exist using more and less expert learners. what is needed is a refinement in the methods used by the different camps of researchers and a greater emphasis on using mixed-methods, combining retrospective, experimental and prospective designs and employing process-tracing measures. the advantages and disadvantages of retrospective practice history profiling, traditional learning studies and prospective designs are highlighted in table 1.

table 1. the advantages and disadvantages associated with using retrospective practice history profiles, traditional experimental approaches to learning as well as prospective designs (adapted from williams & ericsson, 2005).
| retrospective practice history profiling | traditional pre- to post-test designs | prospective designs |
| --- | --- | --- |
| provides a general description of the types of activities performers have engaged in to become expert | enables the validity of specific instructional interventions and practice schedules to be examined under controlled conditions leading to inferences regarding causality | can answer questions about why differences exist across individuals and enable causality judgements |
| strong emphasis on identifying the macro rather than micro structure of practice | strong emphasis on the micro-structure of practice, rather than macro-level | allows subtle measurement of changes in performance with practice that might not be seen in typical behavioural measures |
| limited attempts to identify the specific practice activities engaged in and to link these to changes in specific components of performance | overreliance on simplistic and novel laboratory tasks leading to concerns regarding generalizability of findings (to the real-world and to experts) | allows tracking of specific activities within a targeted group of individuals |
| few attempts to use the approach in conjunction with experimentally-based, pre-post-test designs | short-term interventions with limited follow-up regarding long-term retention of skills | assessment of short-term interventions is permitted over long periods of time |
| absence of control groups (matched for age and experience) | applicability of findings somewhat dependent on strengths of transfer tests employed, particularly if stressors such as fatigue and anxiety are missing | prospective designs allow for better reliability and validity in measures/conclusions |
| lack of focus on individual differences and large standard deviations in data | often an absence of retention tests, or the use of short retention periods, and inappropriate use of, or absent, transfer tests | age can be factored into longitudinal designs, such that comparisons can be made across age groups and across individuals over time |
| not possible to imply causality from such data | majority of published reports involve novice participants with limited focus on expert performers and on expert learners | prospective designs afford a better focus on changes within individuals and causality statements |
| concerns with validity and reliability of retrospective estimates of practice hours | lack of research studying how instructional variables, practice variables and learner’s skill interact, leading to a limited, reductionist approach to exploring skill acquisition | multiple measures of process and outcome can be integrated to identify efficiency-effectiveness trade-offs and how these change over time |

an area of study that has attracted a growing body of research in recent years relates to the use of video and other forms of simulation that enable opportunities for anticipation and decision making to be isolated under controlled conditions in the laboratory, greatly increasing the opportunity for repetition and engagement in deliberate practice (for a recent review, see broadbent et al., 2015). the prototypical approach involves filming an action sequence (e.g., the serves of opponents in tennis) from the perspective of the player and then replaying the action, which may be occluded at various time intervals before, at or after ball-racket contact. the task for the participant would be to anticipate where and what type of serve the opponents would employ before deciding on the correct return shot. such an approach would be used for the pre- and post-test, whereas a transfer test would likely involve serves from opponents not previously observed, and potentially some form of on-court data collection. the intervention period would typically require players to view other servers and to receive instruction related to the pick-up of key postural cues to facilitate anticipation.
feedback regarding task performance would be provided (for examples, see abernethy, schorer, jackson, & hagemann, 2012; smeeton, williams, hodges, & ward, 2005; williams et al., 2002). while there remain limitations with existing work on perceptual-cognitive training, notably with regard to the measurement of skill transfer, the paradigm has been used to examine the conceptual underpinnings of a range of issues in the learning sciences, including: practice scheduling (broadbent, causer, ford, & williams, 2015b; hodges et al., 2011); focus of attention (abernethy, schorer, jackson, & hagemann, 2012); the effectiveness of imagery (smeeton, hibbert, stevenson, cumming, & williams, 2013); perceptual cueing (ryu, kim, abernethy, & mann, 2012); and training under pressure (alder et al., 2016). although there have been attempts to look at individual differences in perceptual-cognitive skills and determine how well these predict performance (e.g., mangine et al., 2014), such an approach has not been used to examine how elite athletes improve or respond to practice over a relatively short intervention (for an exception, see faubert, 2013). in terms of differentiating across skill groups, the evidence is mixed regarding the ability of general perceptual-cognitive skills, such as attention or spatial-iq, to distinguish better from worse athletes (for a review, see voss et al., 2010). with respect to learning, this general abilities approach may be informative, especially if combined with sport-specific measures of skill. for example, do the athletes who show better recall or recognition for patterns of play specific to their sport also respond better to practice manipulations designed to improve perceptual-cognitive skill, such as recognizing deceptive plays?
whilst there is anecdotal evidence for “good” learners in sports, the reasons for this receptivity have not been explored. studying variables such as current skill level, playing experience, and attentional or cognitive capacities might therefore provide insight into the characteristics most closely related to continued learning and to overcoming the challenges associated with necessary technique changes (such as in the case of injury or changing demands in the sport).

in addition to the relatively short acquisition periods that are employed in laboratory-based research on motor learning (e.g., 45-60 min, see williams, ward, & chapman, 2003), there is an absence of longitudinal work. consequently, we have a limited idea of how skill learning progresses and changes over time or the extent to which the typically observed changes are durable and lasting. we know very little about the factors that differentiate someone who learns these skills effectively and efficiently from those who record smaller changes in performance either over time or as a result of the training intervention. the absence of any longitudinal work also makes it difficult to judge whether these measures of specific task components (e.g., perceptual-cognitive skills) have any predictive utility from a talent identification perspective. if someone scores well on these tests at an early age, does this suggest that performance will continue to be high relative to others in an older age grouping? moreover, are there key time windows for the acquisition of these skills and how is this linked to general development in young children? do individuals that demonstrate ‘exceptional talent’ learn skills more efficiently following brief exposure or is the development of perceptual-cognitive skills non-linear, varying from one stage of development to the next, making prediction difficult?

4. conclusions

in this article, the expert performance approach, originally introduced by ericsson and smith (1991), was reviewed as a systematic framework for the study of expert learning. over recent decades, this approach has helped to stimulate research in the area of perceptual-cognitive expertise. in contrast to the growing research on expert performance across numerous domains, as well as significant research focusing on the practice history profiles of experts and novices, there remains a need for research on how expert learners continue to learn new skills and refine existing ones. we advocate a more systematic approach to the study of (motor) learning in general, based on process-tracing measures and identification of individual differences related to effectiveness and efficiency. someone who acquires, retains and transfers skill better than another individual may perhaps be categorised as a skilled or more expert learner. the absence of research on the above topic largely arises because the prototypical approach has been to use participants classified based on their current level of performance rather than their expertise in learning (which of course requires some formative assessment). this latter approach necessitates the selection of participants (typically retrospectively) based on the level of performance change observed over time on a particular component of performance. a strong focus on identifying individual differences in learning is essential, rather than relying on group means over practice blocks. the use of process-tracing measures during acquisition will improve understanding of how learning takes place rather than the amount of learning that has occurred, as is the case when relying solely on outcome scores.
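the contrast between group means and individual learning profiles can be made concrete with a small sketch. the data, the function names and the choice of a least-squares slope as a proxy for rate of learning (efficiency) and of retained gain as a proxy for effectiveness are all illustrative assumptions, not measures taken from the studies reviewed here.

```python
from statistics import mean


def learning_rate(scores):
    """Least-squares slope of score against practice-block index (0, 1, 2, ...)."""
    xs = list(range(len(scores)))
    x_bar, y_bar = mean(xs), mean(scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def learner_profile(acquisition, retention):
    """Summarise one learner: efficiency (rate) and effectiveness (retained gain)."""
    return {
        "rate": learning_rate(acquisition),
        "retained_gain": retention - acquisition[0],
    }


# Hypothetical accuracy scores (%): four practice blocks plus one retention test.
data = {
    "p1": ([50, 60, 70, 80], 78),
    "p2": ([50, 52, 54, 56], 55),
    "p3": ([50, 65, 60, 75], 62),
}

profiles = {pid: learner_profile(acq, ret) for pid, (acq, ret) in data.items()}

# Group-mean curve: the summary that hides the spread between learners.
group_curve = [mean(block) for block in zip(*(acq for acq, _ in data.values()))]
group_rate = learning_rate(group_curve)

for pid, profile in profiles.items():
    print(pid, profile)
print("group rate:", round(group_rate, 2))
```

in this made-up example the group-mean slope sits between a fast learner and a slow one and describes neither, which is the kind of information lost when only averaged outcome scores are reported.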
we suggest that the expert performance approach has considerable potential in offering a framework to identify expert learners across domains and that it offers a systematic, guiding framework for better understanding how experts have learned the skills needed to perform at high levels in their respective domains. a stronger focus on the science of learning is key to enhancing knowledge of how skills are acquired and how we can then promote more efficient and effective skill acquisition across many professional domains.

keypoints

- the expert performance approach presents a systematic framework for examining how expert learners acquire and refine skills across domains.
- more effort is needed to identify individuals who demonstrate exceptional learning on specific skills (as defined by efficiency and effectiveness in attainment) when compared to norms, if we are to develop more refined methods to accelerate skill learning across domains.
- a stronger focus is needed on exploring individual differences in learning and on using process-tracing measures to evaluate how learning progresses, rather than on an existing overreliance on outcome scores, averaged across individuals.
- measures of learning efficiency (rate of learning, number of practice trials or days of practice) and effectiveness (i.e., accuracy, consistency, skill quality, speed) are needed to gain a more accurate picture of how experts learn across many domains of professional activity.

references

abernethy, b., schorer, j., jackson, r. c., & hagemann, n. (2012). perceptual training methods compared: the relative efficacy of different approaches to enhancing sport-specific anticipation. journal of experimental psychology: applied, 18, 143. http://psycnet.apa.org/doi/10.1037/a0028452
abernethy, b., & zawi, k. (2007). pickup of essential kinematics underpins expert perception of movement patterns. journal of motor behavior, 39, 353-367.
http://dx.doi.org/10.3200/jmbr.39.5.353-368
ackerman, p. l. (1987). individual differences in skill learning: an integration of psychometric and information processing perspectives. psychological bulletin, 102, 1, 3. http://dx.doi.org.ezproxy.lib.utah.edu/10.1037/0033-2909.102.1.3
alder, d., ford, p. r., causer, j., & williams, a. m. (2016). the effects of high- and low-anxiety training on the anticipation judgements of elite performers. journal of sport & exercise psychology, 38, 93-104. http://dx.doi.org/10.1123/jsep.2015-0145
balser, n., lorey, b., pilgramm, s., stark, r., bischoff, m., zentgraf, k., williams, a. m., & munzert, j. (2014). prediction of human actions: expertise and task-related effects on neural activation of the action observation network. human brain mapping, 35, 4016-4034. doi:10.1002/hbm.22455
baker, j., & farrow, d. (2015). routledge handbook of sport expertise. london: routledge.
bezzola, l., merillat, s., gaser, c., & jancke, l. (2011). training-induced neural plasticity in golf novices. the journal of neuroscience, 31, 35, 12444-12448. https://doi.org/10.1523/jneurosci.1996-11.2011
bjork, e. l., & bjork, r. a. (2011). making things hard on yourself, but in a good way: creating desirable difficulties to enhance learning. in m. a. gernsbacher, r. w. pew, l. m. hough, & j. r. pomerantz (eds.), psychology and the real world: essays illustrating fundamental contributions to society (pp. 56-64). new york: worth publishers.
breslin, g., hodges, n.j., williams, a.m., kremmer, j., & curren, w. (2006). the role of intra- and interlimb relative motion information in modeling a novel motor skill. human movement science, 25, 6, 753-752. http://dx.doi.org/10.1016/j.humov.2006.04.002
broadbent, d. p., causer, j., williams, a. m., & ford, p. r. (2015a).
perceptual-cognitive skill training and its transfer to expert performance in the field: future research directions. european journal of sport science, 15, 322-331. http://dx.doi.org/10.1080/17461391.2014.957727
broadbent, d. p., causer, j., ford, p. r., & williams, a. m. (2015b). contextual interference effect on perceptual-cognitive skills training. medicine and science in sports and exercise, 47, 6, 1243-1250. http://dx.doi.org/10.1249/mss.0000000000000530
broadbent, d. p., causer, j., williams, a. m., & ford, p. r. (in press). the role of cognitive effort and error processing in the contextual interference effect during perceptual-cognitive skills training. journal of experimental psychology: human perception & performance. http://psycnet.apa.org/doi/10.1037/xhp0000375
bourne, m., bennett, s.j., hayes, s., & williams, a.m. (2011). the dynamical structure of handball penalty shots as a function of target location. human movement science, 30, 1, 40-5. http://dx.doi.org/10.1016/j.humov.2010.11.001
carling, c., reilly, t.p., & williams, a.m. (2009). the handbook of soccer match analysis. taylor & francis: london.
causer, j., holmes, p.s., & williams, a.m. (2011). quiet eye training in elite performers. medicine & science in sport & exercise, 43, 1042-1049. doi: 10.1249/mss.0b013e3182035de6
chiviacowsky, s., & wulf, g. (2002). self-controlled feedback: does it enhance learning because performers get feedback when they need it? research quarterly for exercise and sport, 73, 408-415. http://dx.doi.org/10.1080/02701367.2002.10609040
coughlan, e., williams, a.m., mcrobert, & ford, p. (2014). a novel test of deliberate practice theory: how experts learn. journal of experimental psychology: learning, memory & cognition, 40, 449-458. http://psycnet.apa.org/doi/10.1037/a0034302
davids, k., bennett, s.j., & newell, k.m. (2006). movement system variability. human kinetics: champaign, illinois.
dennis, d., rowe, r., williams, a.m., & milne, e. (2017).
the role of cortical sensorimotor oscillation in action anticipation. neuroimage, 146, 1102-1114. http://dx.doi.org/10.1016/j.neuroimage.2016.10.022
ericsson, k. a., & kintsch, w. (1995). long-term working memory. psychological review, 102, 211-245. http://psycnet.apa.org/doi/10.1037/0033-295x.102.2.211
ericsson, k. a., & smith, j. (1991). prospects and limits of the empirical study of expertise: an introduction. in k. a. ericsson & j. smith (eds.), towards a general theory of expertise: prospects and limits (pp. 1-38). new york: cambridge university press.
ericsson, k. a., & williams, a. m. (2007). capturing naturally occurring superior performance in the laboratory: translational research on expert performance. journal of experimental psychology: applied, 13(3), 115-123. http://psycnet.apa.org/doi/10.1037/1076-898x.13.3.115
ericsson, k. a., krampe, r. t., & tesch-römer, c. (1993). the role of deliberate practice in the acquisition of expert performance. psychological review, 100, 363-406. http://psycnet.apa.org/doi/10.1037/0033-295x.100.3.363
ericsson, k. a., hoffman, r., aaron, k., & williams, a. m. (eds.) (in press). the cambridge handbook of expertise (second edition). cambridge university press.
eysenck, m. w., & calvo, m. g. (1992). anxiety and performance: the processing efficiency theory. cognition and emotion, 6, 409-434.
http://dx.doi.org/10.1080/02699939208409696
farrow, d., baker, j., & macmahon, c. (2013). developing sport expertise: researchers and coaches put theory into practice (2nd edition). routledge/taylor and francis.
faubert, j. (2013). professional athletes have extraordinary skills for rapidly learning complex and neutral dynamic visual scenes. scientific reports, 3, 1154. doi:10.1038/srep01154
ford, p. r., & williams, a. m. (2012). the developmental activities engaged in by elite youth soccer players who progressed to professional status compared to those who did not. psychology of sport & exercise, 13, 349-352. http://dx.doi.org/10.1016/j.psychsport.2011.09.004
ford, p. r., hodges, n. j., & williams, a. m. (2013). expert sports performance and its development. in beyond “talent or practice?”: the multiple determinants of greatness (edited by b. kaufman), 391-414. oxford: oxford university press.
ford, p. r., carling, c., garces, m., marques, m., miguel, c., farrant, a., & williams, a.m. (2012). the developmental activities of elite soccer players aged under-16 years from brazil, england, france, ghana, mexico, portugal and sweden. journal of sports sciences, 30, 1653-1663. http://dx.doi.org/10.1080/02640414.2012.701762
hambrick, d. z., altmann, e. m., oswald, f. l., meinz, e. j., & gobet, f. (2014). facing facts about deliberate practice. frontiers in psychology, 5, 751. doi:10.3389/fpsyg.2014.00751
hardwick, r. m., rottschy, c., miall, r. c., & eickhoff, s. b. (2013). a quantitative meta-analysis and review of motor learning in the human brain. neuroimage, 67, 283-297. http://dx.doi.org/10.1016/j.neuroimage.2012.11.020
hodges, n. j., edwards, c., luttin, s., & bowcock, a. (2011). learning from the experts: gaining insights into best practice during the acquisition of three novel motor skills. research quarterly for exercise & sport, 82, 178-187. http://dx.doi.org/10.1080/02701367.2011.10599745
hodges, n. j., lohse, k. r., wilson, a., lim, s. b., & mulligan, d.
(2014). exploring the dynamic nature of contextual interference: previous experience affects current practice but not learning. journal of motor behavior, 46(6), 455-467. http://dx.doi.org/10.1080/00222895.2014.947911
hodges, n. j., ford, p. r., hendry, d. j., & williams, a. m. (in press). getting gritty about practice and success: motivational characteristics of great performers. progress in brain research. http://dx.doi.org/10.1016/bs.pbr.2017.02.003
hodges, n. j., & williams, a. m. (2012). skill acquisition in sport: research, theory and practice (second edition). routledge: london.
hopwood, m., macmahon, c., farrow, d., & baker, j. (2016). is practice the only determinant of sporting expertise? revisiting starkes (2000). international journal of sport psychology, 47(1), 631-651.
keetch, k. m., & lee, t. d. (2007). the effect of self-regulated and experimenter-imposed practice schedules on motor learning for tasks of varying difficulty. research quarterly for exercise and sport, 78(5), 476-486. http://dx.doi.org/10.1080/02701367.2007.10599447
kennedy, q., taylor, j. l., reade, g., & yesavage, m. d. (2010). age and expertise effects in aviation decision making and flight control in a flight simulator. aviation, space and environmental medicine, 81, 489-497. https://doi.org/10.3357/asem.2684.2010
lee, t. d., & wishart, l. r. (2005). motor learning conundrums (and possible solutions). quest, 57(1), 67-78.
http://dx.doi.org/10.1080/00336297.2005.10491843
lohse, k. r., wadden, k., boyd, l. a., & hodges, n. j. (2014). motor skill acquisition across short and long time scales: a meta-analysis of neuroimaging data. neuropsychologia, 59, 130-141. http://dx.doi.org/10.1016/j.neuropsychologia.2014.05.001
mangine, g.t., hoffman, j.r., wells, a.j., gonzalez, a.m., townsend, j.r., jajtner, a.r., beyer, k.s., bohner, j.d., pruna, g.j., fragala, m.s., & stout, j.r. (2014). visual tracking speed is related to basketball-specific measures of performance in nba players. journal of strength conditioning research, 28(9), 2406-2414. doi: 10.1519/jsc.0000000000000550
mcnamara, b. n., hambrick, d. z., & oswald, f. c. (2014). deliberate practice and performance in music, games, sports, education, and professions: a meta-analysis. psychological science, 25, 1608-1618. doi:10.1177/0956797614535810
mcrobert, a., causer, j., vasiliadus, j., watterson, l., & williams, a. m. (2013). contextual information influences diagnosis accuracy and decision-making in simulated emergency medicine emergencies. british medical journal: quality & safety, 22, 478-484. http://dx.doi.org/10.1136/bmjqs-2012-000972
müller, s., brenton, j., dempsey, a. r., harbaugh, a. g., & reid, c. (2015). individual differences in highly skilled visual perceptual-motor striking skill.
attention, perception, & psychophysics, 77(5), 1726-1736. doi:10.3758/s13414-015-0876-7
mulligan, d., lohse, k. r., & hodges, n. j. (2016a). an action-incongruent secondary task modulates prediction accuracy in experienced performers: evidence for motor simulation. psychological research, 80, 496-509. doi:10.1007/s00426-015-0672-y
mulligan, d., lohse, k.r., & hodges, n.j. (2016b). evidence for dual mechanisms of action prediction dependent on acquired visual-motor experiences. journal of experimental psychology: human perception and performance, 42, 1615-1626. doi: 10.1037/xhp0000241
north, j. s., ward, p., ericsson, a., & williams, a. m. (2011). mechanisms underlying skilled anticipation and recognition in a dynamic and temporally constrained domain. memory, 19(2), 155-168. http://dx.doi.org/10.1080/09658211.2010.541466
roca, a., ford, p. r., mcrobert, a. p., & williams, a. m. (2013). perceptual-cognitive skills and their interaction as a function of task constraints in soccer. journal of sport & exercise psychology, 35, 144-155. http://dx.doi.org/10.1123/jsep.35.2.144
ryu, d., kim, s., abernethy, b., & mann, d. l. (2012). guiding attention aids the acquisition of anticipatory skill in novice soccer goalkeepers. research quarterly for exercise & sport, 84, 252-262. http://dx.doi.org/10.1080/02701367.2013.784843
savelsbergh, g.j.p., van der kamp, j., williams, a.m., & ward, p. (2005). anticipation and visual search behavior in expert soccer goalkeepers. a within-group comparison. ergonomics, 48(11-14), 1686-1697. http://dx.doi.org/10.1080/00140130500101346
schmidt, r., & lee, t. (2013). motor learning and performance, 5th edition: from principles to application. champaign, il: human kinetics.
smeeton, n. j., & williams, a. m. (2012). the role of movement exaggeration in the anticipation of deceptive soccer penalty kicks. british journal of psychology, 103, 539-555. doi: 10.1111/j.2044-8295.2011.02092.x
smeeton, n. j., williams, a. m., hodges, n. j., & ward, p.
(2005). the relative effectiveness of various instructional approaches in developing anticipation skill. journal of experimental psychology: applied, 11, 98-110. http://psycnet.apa.org/doi/10.1037/1076-898x.11.2.98
smeeton, n. j., hibbert, j. r., stevenson, k., cumming, j., & williams, a. m. (2013). can imagery facilitate improvements in anticipation behavior? psychology of sport & exercise, 14, 200-210. http://dx.doi.org/10.1016/j.psychsport.2012.10.008
smith, m. a., ghazizadeh, a., & shadmehr, r. (2006). interacting adaptive processes with different timescales underlie short-term motor learning. plos biol, 4(6), e179. http://dx.doi.org/10.1371/journal.pbio.0040179
stal, p., donmez, b., & jamieson, g. a. (2016). supporting anticipation in driving through attentional and interpretational in-vehicle displays. accident, analysis and prevention, 91, 103-113. http://dx.doi.org/10.1016/j.aap.2016.02.030
vater, c., roca, a., & williams, a. m. (2015). effects of anxiety on anticipation and visual search in dynamic, time-constrained situations. sport, exercise, & performance psychology. advance online publication. http://dx.doi.org/10.1037/spy0000056
voss, m. w., kramer, a. f., basak, c., prakash, r. s., & roberts, b. (2010). are expert athletes ‘expert’ in the cognitive laboratory? a meta-analytic review of cognition and sport expertise. applied cognitive psychology, 24(6), 812-826. doi:10.1002/acp.1588
wadden, k.p., hodges, n.j., de asis, k.l., neva, j.l., & boyd, l.a. (2017).
individualized challenge point practice informs motor sequence learning: more time in an early phase of practice benefits later retention. plosone.
walker, m. p., brakefield, t., morgan, a., hobson, j. a., & stickgold, r. (2002). practice with sleep makes perfect: sleep-dependent motor skill learning. neuron, 35(1), 205-211. http://dx.doi.org/10.1016/s0896-6273(02)00746-8
ward, p., & williams, a.m. (2003). perceptual and cognitive skill development in soccer: the multidimensional nature of expert performance. journal of sport & exercise psychology, 25(1), 93-111. http://dx.doi.org/10.1123/jsep.25.1.93
ward, p., ericsson, k. a., & williams, a. m. (2013). complex perceptual-cognitive expertise in a simulated task environment. journal of cognitive engineering & decision making, 7(3), 231-254. doi:10.1177/1555343412461254
ward, p., hodges, n.j., williams, a.m., & starkes, j. (2007). the role of deliberate practice in the development of expert performers. high ability studies, 18, 119-153. http://dx.doi.org/10.1080/13598130701709715
wright, m. j., bishop, d., jackson, r. c., & abernethy, b. (2010). functional mri reveals expert-novice differences during sport-related anticipation. neuroreport, 21, 94-98. doi:10.1097/wnr.0b013e328333dff2
wright, m. j., bishop, d., jackson, r. c., & abernethy, b. (2011). cortical fmri activation to opponents’ body kinematics in sport-related anticipation: expert-novice differences with normal and point-light video. neuroscience letters, 500(3), 216-221. http://dx.doi.org/10.1016/j.neulet.2011.06.045
williams, a. m., & abernethy, b. (2012). anticipation and decision-making: skills, methods and measures. in g. tenenbaum, r. c. eklund & a. kamata (eds.), measurement in sport & exercise psychology (pp. 191-202). champaign, il: human kinetics.
williams, a. m., & ericsson, k. a. (2007). perception, cognition, action and skilled performance. journal of motor behavior, 39(5), 338-340. doi:10.3200/jmbr.39.5.338-340
williams, a. m., & ericsson, k.
a. (2005). perceptual-cognitive expertise in sport: some considerations when applying the expert performance approach. human movement science, 24(3), 283-307. http://dx.doi.org/10.1016/j.humov.2005.06.002
williams, a.m., ward, p., & chapman, c. (2003). training perceptual skill in field hockey: is there transfer from the laboratory to the field? research quarterly for exercise & sport, 74, 98-104. http://dx.doi.org/10.1080/02701367.2003.10609068
williams, a. m., ford, p. r., eccles, d. w., & ward, p. (2011). perceptual-cognitive expertise in sport and its acquisition: implications for applied cognitive psychology. applied cognitive psychology, 25, 432-442. doi:10.1002/acp.1710
williams, a. m., north, j. s., & hope, e. r. (2012). identifying the mechanisms underpinning recognition of structured sequences of action. the quarterly journal of experimental psychology, 65, 1975-1992. http://dx.doi.org/10.1080/17470218.2012.678870
williams, a.m., ward, p., bell-walker, j., & ford, p. (2012). discovering the antecedents of anticipation and decision making skill. british journal of psychology, 103, 393-411. doi: 10.1111/j.2044-8295.2011.02081.x
williams, a. m., ward, p., knowles, j. m., & smeeton, n. j. (2002). anticipation skill in a real-world task: measurement, training, and transfer in tennis. journal of experimental psychology: applied, 8, 259-270. http://psycnet.apa.org/doi/10.1037/1076-898x.8.4.259
wulf, g., shea, c., & lewthwaite, r. (2010).
motor skill learning and performance: a review of influential factors. medical education, 44, 75-84. doi:10.1111/j.1365-2923.2009.03421.x
zimmerman, b.j. (2008). investigating self-regulation and motivation: historical background, methodological developments, and future prospects. american educational research journal, 45, 166-183. doi:10.3102/0002831207312909

harteis et al publication
frontline learning research vol.6 no. 2 (2018) 57-71 issn 2295-3159

do we betray errors beforehand? the use of eye tracking, automated face recognition and computer algorithms to analyse learning from errors

christian scharinger a
a leibniz-institut für wissensmedien, germany

article received 14 may 2018 / revised 25 september / accepted 26 september / available online 7 december

abstract

during the last decade the combined recording of eye-tracking data and electroencephalographic (eeg) data has led to the methodology of fixation-related potentials analysis (frp). this methodology has been increasingly and successfully used to study eeg correlates in the time domain (i.e., event-related potentials, erps) of cognitive processing in free viewing situations like text reading or natural scene perception. in essence, fixation onset serves as the time-locking event for epoching and analysing the eeg data. in this article the methodology of fixation-related frequency band power analysis (frbp) is proposed and conceptually outlined to study cognitive load and affective variations in learners during free viewing situations of multimedia learning materials (i.e., combinations of textual and pictorial elements). the eeg alpha frequency band power at parietal electrodes may serve as a valid measure of cognitive load, whereas the frontal alpha asymmetry may serve as a measure of affective variations. i will introduce and motivate the measures and the methodology and discuss methodological challenges and potential ways to overcome them.
the methodology is frontline for learning research, first, because to date the eeg has rarely been used to study design effects of multimedia learning materials and, second, because fixation-related eeg data analysis has rarely focussed on the frequency domain (i.e., frbp). despite methodological challenges still to be solved, frbp may provide a more in-depth picture of cognitive processing during multimedia learning than eye-tracking data or eeg data in isolation and thus may help to clarify the effects of multimedia design decisions.

keywords: eeg; eye-tracking; fixation-related eeg data analysis; eeg alpha frequency band power; multimedia

corresponding author mail: c.scharinger@iwm-tuebingen.de
doi: https://doi.org/10.14786/flr.v6i3.373

1. introduction

there is general agreement in instructional psychology that an adequate design of multimedia learning material (i.e., combinations of text and picture) is crucial for learning success (e.g., mayer, 2009). this is because the design of multimedia learning material can alter the amount of additional, extraneous cognitive load (cl) imposed on the learner, either, in the case of "good" design, by freeing working memory resources, or, in the case of "bad" design, by depleting working memory resources, potentially leading to an overload situation that hampers successful learning (see theoretical accounts like the cognitive load theory, sweller, van merrienboer, & paas, 1998; or the cognitive theory of multimedia learning, mayer, 2009). various multimedia design principles have been described that may alter learners' extraneous cl (mayer & fiorella, 2016; mayer & moreno, 2003). however, it remains an open research question how exactly cl is influenced by certain multimedia elements and multimedia design decisions.
for example, how exactly adding decorative pictures to learning materials (i.e., texts) influences cognitive (and affective) processing and consequently alters learning outcomes is still a matter of debate (for a comprehensive review see rey, 2012). such decorative pictures (or graphical elements) adjacent to textual information in multimedia learning materials that are only loosely content-related (e.g., a picture of a lightning stroke adjacent to a text describing the meteorological formation of thunderstorms) have been termed pictorial seductive details (harp & mayer, 1998). pictorial seductive details have resulted in mixed effects on learning outcomes, ranging from beneficial effects (schneider, nebel, & rey, 2016), to no effects (park & lim, 2007), and even detrimental effects (harp & mayer, 1998; mayer & fiorella, 2016). while the beneficial effects might be explained by affective processes with the pictures positively altering the learners' motivational state (knörzer, brünken, & park, 2016; lenzner, schnotz, & müller, 2013; magner, schwonke, aleven, popescu, & renkl, 2014; schneider et al., 2016), detrimental effects might be explained by increased extraneous cl due to the additional, yet irrelevant pictorial information (mayer & fiorella, 2016; mayer & moreno, 2003), and due to effects of distraction away from and interruption of the learning process (i.e., schema construction; schneider, dyrna, meier, beege, & rey, 2017). clearly, in order to unravel potential reasons for the disparate effects of pictorial seductive details, adequate process measures (i.e., online measures) are necessary to better understand the effects of multimedia design elements on cognitive and affective processing. the electroencephalogram (eeg) and more precisely the eeg alpha and theta frequency band power might serve as such adequate, promising process measures.
especially, the methodology of fixation-related eeg frequency band power analysis (frbp) might allow studying the eeg data in ecologically valid task settings, that is, in free viewing situations of multimedia learning material, and may allow identifying which multimedia elements (i.e., text or picture) alter cl to what extent. in the next section, i will introduce the eeg alpha (and theta) frequency band power as potential and promising measures of cl and affective processing. i will then describe the frbp methodology on a conceptual level and address open methodological challenges as well as possible approaches to overcome them. note that the main purpose of the current article is to conceptually propose the frbp analysis as a promising, new methodological approach for multimedia research by providing an overview of the eeg frequency band power as a measure of cl and affective processing and by making readers aware of potentials as well as weaknesses of the frbp methodology, yet without discussing practicalities and single methodological challenges in depth. the article may nevertheless serve as a valid and helpful primer for future research using the frbp methodology in the context of multimedia materials.

2. eeg frequency band power as a measure of cognitive load and affective processing

identifying adequate process measures still is an important and general matter of debate in instructional psychology (brünken, seufert, & paas, 2010; paas, tuovinen, tabbers, & van gerven, 2003). traditionally, most research focused on outcome measures like learning success, retention, or transfer of the learned knowledge. subjective rating scales are used to assess the learners' invested effort during learning (e.g., hart & staveland, 1988; klepsch, schmitz, & seufert, 2017; paas, 1992) with effort thought to be directly related to cl and hence working memory load (schnotz & kürschner, 2007).
one general drawback of subjective rating scales is that they only allow assessing cl in hindsight and for longer time periods. whether participants report an averaged impression of cl or certain load-peaks remains elusive (schmeck, opfermann, van gog, paas, & leutner, 2015). more importantly, motivational factors might confound the subjective ratings of cl (schnotz et al., 2009). consequently, the use of objective, online process measures of cl during learning has been proposed (antonenko, paas, grabner, & van gog, 2010; brünken, plass, & leutner, 2003; korbach, brünken, & park, 2017; paas et al., 2003). for example, performance in a parallel secondary task can be used to assess the current cl of the primary task (brünken et al., 2003; park & brünken, 2015). however, the secondary task may unintentionally interfere with the primary task. in contrast, physiological measures like pupil dilation or the eeg, and, more specifically, the eeg alpha (8 – 13 hz) frequency band power at parietal electrodes and the theta (4 – 6 hz) frequency band power at frontal electrodes may serve as valid measures of cl overcoming the aforementioned limitations (antonenko et al., 2010; beatty & lucero-wagoner, 2000). in addition, the frontal alpha asymmetry (faa; smith, reznik, stewart, & allen, 2017) might serve as an index of affective effects in multimedia task materials. the human eeg is typically recorded via several (e.g., 32 or 64) electrodes that are placed on the scalp at different positions consistently defined by the (extended) international 10-20 system (jasper, 1958) using electrode caps with predefined slots for the electrodes (see figure 1 for an exemplary electrode layout). after applying electrode gel to reduce the electrical impedance between the electrodes and the skin, the raw eeg can be measured.
the eeg is the amplified recording of the small electrical currents generated within the brain by the summed post-synaptic electrical potentials of large neuronal assemblies consisting of several millions of neurons (pyramidal cells of the cortex) that are oriented in parallel and synchronously active (for reviews see cohen, 2017; jackson & bolger, 2014; olejniczak, 2006). the typical sampling rate of an eeg recording device is 500 or 1000 hz (i.e., one measurement point every two or each millisecond, respectively). the eeg thus reflects the oscillatory activity of specific neuronal populations with excellent time resolution, yet the spatial resolution is rather low (in the range of centimeter; olejniczak, 2006). figure 1. schematic head (nose up) with exemplary electrode layout. highlighted are representative electrode positions for measuring cl (blue color) or affective processing (red color). note. electrodes outside the head are due to the projection of 3d locations onto 2d space. after some data preprocessing steps (e.g., filtering, artefact removal; for guidelines see picton, 2000), the recorded eeg data can be analyzed in the time domain and/or in the frequency domain. for analyses in the time domain, parts of the eeg (i.e., epochs) time-locked to certain events (e.g., stimulus-onsets) are averaged across several trials (i.e., repetitions of a specific event) to increase the signal-to-noise ratio. this procedure results in so-called event-related potential (erp) curves, that is, positive and negative deflections (i.e., components) that have been linked quite specifically to certain cognitive processes (for a comprehensive review see münte, urbach, düzel, & kutas, 2000). for analyses in the frequency domain, the spectrum is calculated for the eeg epochs of interest (e.g., using fast-fourier transforms, fft, or wavelet analysis; cohen, 2014). 
this results in power values for the different frequencies contained in the eeg signal (and also phase information, which however is beyond the scope of the current article; the interested reader may refer to bastiaansen, mazaheri, & jensen, 2012; cohen, 2014). the more neurons are synchronously active (i.e., 'firing') at a specific frequency in response to an event, the higher is the measured power (i.e., amplitude to the square) at this specific frequency. therefore, an increase in frequency band power after an event is generally termed event-related synchronization (ers), whereas a decrease in frequency band power is termed event-related desynchronization (erd; pfurtscheller & lopes da silva, 1999). the amount of change in erd or ers in relation to an event can be expressed as percentage of change using the erd/ers%-formula given in pfurtscheller & lopes da silva (1999; see also antonenko et al., 2010). five frequency bands have traditionally been differentiated in the eeg reflecting functionally different oscillatory neuronal activity: slow oscillatory activity in the delta (< 3 hz) range, oscillatory activity in the theta (4 – 6 hz), alpha (8 – 13 hz), and beta (13 – 24 hz) range, and fast oscillatory activity in the gamma (> 40 hz) range (for reviews see bastiaansen et al., 2012; krause, 2003). in the context of multimedia research oscillatory activity (and hence, power) in the theta and alpha frequency band is of specific interest as will be detailed below. slow (delta) and fast (gamma) oscillatory activity are rather prone to artefacts (e.g., slow drifts or muscle activity) requiring highly controlled lab settings and rather simple, basic tasks and task materials (see, e.g., bastiaansen et al., 2012), thus being rather inadequate for multimedia research. 
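the spectral steps described above can be illustrated with a short sketch. it is only an assumption-laden illustration, not the procedure of any cited study: the function names `band_power` and `erd_ers_percent`, the hann taper, and the synthetic 10 hz signals are hypothetical choices, and the sign convention (negative values for erd, positive values for ers) is one of the conventions in use.

```python
import numpy as np

def band_power(epoch, sfreq, band):
    """Mean spectral power of a 1-D EEG epoch within band = (low_hz, high_hz)."""
    epoch = np.asarray(epoch, dtype=float)
    epoch = epoch - epoch.mean()                      # remove the DC offset
    windowed = epoch * np.hanning(epoch.size)         # taper to limit spectral leakage
    spectrum = np.fft.rfft(windowed)                  # FFT of the real-valued signal
    power = np.abs(spectrum) ** 2 / epoch.size        # squared amplitude per frequency bin
    freqs = np.fft.rfftfreq(epoch.size, d=1.0 / sfreq)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[in_band].mean()

def erd_ers_percent(test_power, reference_power):
    """Percentage change of band power relative to a reference interval.

    Assumed convention: negative = desynchronization (ERD), positive = ERS.
    """
    return (test_power - reference_power) / reference_power * 100.0

# Synthetic illustration (not real EEG): a 10 Hz oscillation whose amplitude halves.
sfreq = 500.0                                         # sampling rate in Hz
t = np.arange(500) / sfreq                            # one second of samples
reference = 2.0 * np.sin(2 * np.pi * 10 * t)
test = 1.0 * np.sin(2 * np.pi * 10 * t)
alpha = (8.0, 13.0)
theta = (4.0, 6.0)
change = erd_ers_percent(band_power(test, sfreq, alpha),
                         band_power(reference, sfreq, alpha))
```

because power is the squared amplitude, halving the amplitude of the synthetic oscillation reduces alpha power to a quarter, so `change` comes out at about -75, which under the assumed convention would be read as an alpha erd.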
oscillatory activity in the beta band has mainly been attributed to reflect activity of the motor cortex (pfurtscheller, zalaudek, & neuper, 1998), but might also be interesting for studying cognitive processes (engel & fries, 2010). importantly, the eeg alpha and theta frequency band power might be used as reliable process measures of cl in multimedia research. it has been consistently observed that the eeg alpha frequency band power at parietal electrodes decreases for increasing cl whereas the frontal-central eeg theta frequency band power increases for increasing cl (gevins & smith, 2000; kretzschmar et al., 2013; palomäki, kivikangas, alafuzoff, hakala, & krause, 2012; pesonen, hämäläinen, & krause, 2007; scharinger, soutschek, schubert, & gerjets, 2015; 2017). functionally, the alpha erd (i.e., the decreasing alpha power) has been interpreted to index cortical activity related to attentional and memory processing (klimesch, 1999; krause, 2003). according to klimesch (1999) the alpha frequency band might be functionally divided further in an upper alpha (10 – 13 hz) frequency band that might mainly be related to (semantic) memory processing and a lower alpha (8 – 10 hz) frequency band that might mainly be related to attentional processes. neural activity in the eeg theta frequency band has been associated predominantly with processes of working memory and cognitive control (itthipuripat, wessel, & aron, 2013; nigbur, ivanova, & stürmer, 2011; sauseng et al., 2002; sauseng, griesmayr, freunberger, & klimesch, 2010). note however, that the functional relationship between eeg frequency band power and specific cognitive processes is not as clear and well established as the relationship between certain erp components of the eeg and cognitive processes (krause, 2003). 
however, in contrast to erps that require a highly controlled, artificial task environment, especially eeg alpha frequency band power has been shown to reliably index cl also in complex task materials like those of instructional psychology (antonenko & niederhauser, 2010; gerlic & jausovec, 2001; scharinger, kammerer, & gerjets, 2015; 2016). importantly, changes in the eeg frequency band power index fluctuations of cl with high temporal acuity. furthermore, apart from indexing cl the eeg alpha frequency band power might also be used to assess emotional and motivational aspects of stimulus-processing. the frontal alpha asymmetry (faa) may serve as such a measure. the faa is a relational measure of the eeg alpha (8 – 13 hz) frequency band power over the left-frontal hemisphere compared to the right-frontal hemisphere, commonly calculated as the difference between corresponding electrodes (e.g., f4 minus f3; smith, reznik, stewart, & allen, 2017). the faa has its origin in cortical activity of the prefrontal cortex. the prefrontal cortex plays an important role in affective processing and emotion regulation showing specific left and right hemispheric lateralization effects for emotions and affective stimulus content (demaree, everhart, youngstrom, & harrison, 2005; dixon, thiruchselvam, todd, & christoff, 2017). although still matter of debate, greater left than right hemispheric cortical activity has been associated with approach motivation and hence predominantly emotionally positively connoted stimuli, whereas greater right hemispheric cortical activity has been associated with withdrawal motivation and hence predominantly emotionally negatively connoted stimuli (ahern & schwartz, 1985; harmon-jones, gable, & peterson, 2010). most eeg studies so far assessed the faa during rest conditions after emotion induction procedures (e.g., coan & allen, 2004). 
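as a minimal sketch of the faa difference score described above: the text gives the score as f4 minus f3; taking the natural logarithm of the alpha power values before subtracting is a commonly used convention that is assumed here, and the helper names and synthetic signals are hypothetical.

```python
import numpy as np

def alpha_power(signal, sfreq, band=(8.0, 13.0)):
    """Mean power in the alpha band of a 1-D signal (Hann-tapered FFT)."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    power = np.abs(np.fft.rfft(signal * np.hanning(signal.size))) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    return power[(freqs >= band[0]) & (freqs <= band[1])].mean()

def frontal_alpha_asymmetry(f3, f4, sfreq):
    """ln(alpha power at F4) minus ln(alpha power at F3); the log is an assumed convention."""
    return np.log(alpha_power(f4, sfreq)) - np.log(alpha_power(f3, sfreq))

# Synthetic illustration (not real EEG): more alpha at the right-frontal site.
sfreq = 500.0
t = np.arange(1000) / sfreq                     # two seconds of samples
f3 = 1.0 * np.sin(2 * np.pi * 10 * t)           # left-frontal electrode, less alpha
f4 = 2.0 * np.sin(2 * np.pi * 10 * t)           # right-frontal electrode, more alpha
faa = frontal_alpha_asymmetry(f3, f4, sfreq)    # ln(4), because power scales with amplitude squared
```

since alpha power is usually taken to be inversely related to cortical activity, a positive score as in this synthetic case would be read as relatively greater left than right frontal activity.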
yet, recent research indicated that the faa might be also used during task performance when emotional stimuli are presented for a brief period of time (schöne, schomberg, gruber, & quirin, 2016; weinreich, stephani, & schubert, 2016). thus, potentially, the faa might be used to study affective aspects of multimedia. whether the faa can be used in such a way for complex multimedia materials has however to be studied further. the methodology of fixation-related eeg data analysis may allow analyzing the eeg frequency band power in free viewing situations of multimedia task materials when certain elements (i.e., areas of interest, aois) are fixated and thus may allow to assess how cl (or in case of the faa, affective processing) is altered by specific multimedia elements like pictorial seductive details. in the next section, i will give a brief overview on the methodology, then conceptually pointing out methodological challenges that one has to be aware of. 3. fixation-related eeg data analysis eye-tracking is increasingly used in research on instructional multimedia materials to study underlying cognitive processes during learning (e.g., eitel, scheiter, & schüler, 2012; hyönä, 2010; jarodzka, holmqvist, & gruber, 2017; mayer, 2010; schüler, 2017; van gog & scheiter, 2010; for a recent overview on the literature see alemdag & cagiltay, 2018). eye-tracking data shows the movement of the eyes, basically differentiating between saccades (i.e., the eyes quickly moving) to elements of the visual scenery and fixations (i.e., the eyes practically at rest) on elements of the visual scenery (for a comprehensive discussion of different eye-tracking patterns see holmqvist & andersson, 2017). it is generally agreed on, that during saccades the visual information intake is not possible (e.g., kok & jarodzka, 2017a), whereas it is possible (and most of the time takes place) during fixations. 
it has been shown that the fixation patterns vary depending on the task materials and the given task (e.g., in picture viewing, yarbus, 1967, or in reading, strukelj & niehorster, 2018; for reviews see kowler, 2011; rayner, 1998; 2009), indicating a plausible link between fixations and cognitive processing (just & carpenter, 1976; 1980). note, however, that this link might not always be straightforward. visual attention might be slightly ahead of what is fixated (depending on the task, up to 250 ms; deubel, 2008). in addition, due to peripheral viewing, especially in reading, more elements than those fixated might be processed (baccino & manunta, 2005; dimigen, kliegl, & sommer, 2012; rayner, 2009). conversely, what is fixated might not always be (consciously) processed. this might be the case during periods of mind-wandering (foulsham, farley, & kingstone, 2013), or if visual elements are not task relevant (as indicated in change blindness paradigms; e.g., triesch, ballard, hayhoe, & sullivan, 2003). nevertheless, fixations have been used as a valid proxy reflecting the individuals' structure of the information intake (and processing) in free viewing situations and hence as triggers for defining epochs in time for which the eeg data can be analysed. in free viewing or free reading situations no specific stimulus-onset exists. however, the fixation-onset can be used as the event to which the eeg data are time-locked and for which they are epoched and analysed (depending on the concrete research question, some studies also used saccade-onset as time-locking event; cf. dimigen, sommer, hohlfeld, jacobs, & kliegl, 2011).
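time-locking to fixation-onset, as just described, can be sketched as follows. this is a simplified illustration under stated assumptions: the function name `fixation_locked_epochs`, the epoch limits of -100 ms to 500 ms, and the random data are hypothetical, and the eeg and fixation onsets are assumed to be already on a common time axis.

```python
import numpy as np

def fixation_locked_epochs(eeg, sfreq, fixation_onsets_s, tmin=-0.1, tmax=0.5):
    """Cut epochs from eeg (channels x samples) around each fixation onset (in seconds).

    Returns an array of shape (n_epochs, n_channels, n_times).
    Onsets whose epoch would extend beyond the recording are skipped.
    """
    eeg = np.asarray(eeg, dtype=float)
    start_offset = int(round(tmin * sfreq))
    stop_offset = int(round(tmax * sfreq))
    epochs = []
    for onset in fixation_onsets_s:
        onset_sample = int(round(onset * sfreq))
        start, stop = onset_sample + start_offset, onset_sample + stop_offset
        if start < 0 or stop > eeg.shape[1]:
            continue                                  # epoch would leave the recording
        epochs.append(eeg[:, start:stop])
    if not epochs:
        return np.empty((0, eeg.shape[0], stop_offset - start_offset))
    return np.stack(epochs)

# Synthetic illustration: 2 channels, 10 s of noise sampled at 500 Hz.
rng = np.random.default_rng(0)
sfreq = 500.0
eeg = rng.standard_normal((2, 5000))
onsets = [0.05, 1.0, 2.3, 9.9]                        # first and last fall too close to the edges
epochs = fixation_locked_epochs(eeg, sfreq, onsets)
average = epochs.mean(axis=0)                         # averaging across fixations, per channel
```

averaging the epochs across fixations corresponds to the time-domain analysis; for a frequency-domain analysis the spectrum would instead be calculated for each epoch before aggregating.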
most studies so far have studied the eeg in the time domain, that is, fixation-related potentials (frps), for example in reading research (dimigen, et al., 2011; frey, lemaire, vercueil, & guérin-dugué, 2018; henderson, luke, schmidt, & richards, 2013; hutzler et al., 2007; kliegl, dambacher, dimigen, & sommer, 2014; kornrumpf, dimigen, & sommer, 2017; kornrumpf, niefind, sommer, & dimigen, 2016; léger et al., 2014; niefind & dimigen, 2016; weiss, knakker, & vidnyánszky, 2016), natural scene perception (giannini, alexander, nikolaev, & van leeuwen, 2018; simola, le fevre, torniainen, & baccino, 2015; simola, torniainen, moisala, kivikangas, & krause, 2013), visual search (brouwer, hogervorst, oudejans, ries, & touryan, 2017; kamienkowski, ison, quiroga, & sigman, 2012; kaunitz et al., 2014; ries, touryan, ahrens, & connolly, 2016; winslow et al., 2010), decision making (frey et al., 2013), and human-computer interaction (léger et al., 2014). despite methodological challenges (see below) these studies concurrently report frp-effects comparable to classical erp-effects (e.g., n400-like effects for word predictability; dimigen et al., 2011), thus validating the basic feasibility and meaningfulness of fixation-related eeg data analyses. only few studies so far have analysed the eeg data in the frequency domain using frbp. for example, scharinger and colleagues (scharinger et al., 2015, experiment 1) used frbp to study cl during hypertext-like reading, comparing several parts of a text that defined two different areas of interest (aois): aois of parts of the text where participants simply read and aois of parts of the text where participants additionally had to perform hyperlink-like selection processes. as hypothesized, the cl was higher when participants had to perform hyperlink-like selection processes in addition to purely text reading, indicated by decreased eeg alpha frequency band power. 
interestingly, this result was also confirmed by the pupil dilation data, with the pupil showing a larger diameter (i.e., higher cl) for parts of the text with hyperlink-like selection processes as compared to purely text reading. in a second experiment (scharinger et al., 2015, experiment 2) the results could be replicated using classical response-locked eeg data analysis instead of fixation-related eeg data analysis, thus underlining the validity of the fixation-related eeg data analysis methodology. another study by scharinger and colleagues (scharinger et al., 2016) used frbp to study cl (as indexed by the parietal eeg alpha frequency band power) in free viewing and evaluating of search engine result pages. this study indicated that a perfect hit (i.e., a semantically and lexically matching search result for a specific search query) resulted already during initial fixations in decreased eeg alpha frequency band power as compared to search results that are no, or no perfect matches for a given search query. this has been interpreted as indicating that the best hit is recognized early in time and potentially then more thoroughly processed (resulting in increased cl) as compared to other, less fitting search results. finally, vignali and colleagues (vignali, himmelstoss, hawelka, richlan, & hutzler, 2016) used frbp to study semantic violations in sentences in free reading situations. they observed decreased lower-beta band (13-18 hz) power for fixations of semantically unrelated words as compared to semantically related words, also indicating higher cl. to sum up, studies so far indicated the principal feasibility and validity of the methodology of fixation-related eeg data analysis. combining eye-tracking and the eeg in research on instructional multimedia materials seems to be promising for two reasons. 
first, it allows studying the eeg during learning with multimedia task materials in task settings of high ecological validity (i.e., 'realistic' task materials, with texts and pictures presented simultaneously on one screen). without this methodology, text and pictures would have to be presented separately in time (i.e., in rather artificial sequences) to create stimulus-onsets for the eeg data analysis. second, the eeg data might be used as an additional measure for triangulating the meaning of observed eye-movement patterns as has been proposed for verbal data (kok & jarodzka, 2017b). for example, the eeg might help differentiating whether longer fixations might indicate increased cl (könig et al., 2016; reichle & reingold, 2013) or individuals' expertise (bertram, helle, kaakinen, & svedström, 2013; reingold & sheridan, 2011), or it might help differentiating whether learners are still working on a (difficult) task or mind-wandering (foulsham et al., 2013). yet, some methodological challenges remain that one has to be aware of.

4. methodological challenges of fixation-related eeg data analysis and potential solutions

there are some challenges that one has to be aware of when analysing fixation-related eeg data. first, the eeg and eye-tracking data have to be synchronized. this can be done by regularly sending triggers (e.g., each second) during data recording to both recording devices (i.e., the eeg and the eye-tracker). based on these triggers the two data streams can then be synchronized offline using for example the toolbox eeglab (delorme & makeig, 2004) with the eye-eeg plugin (dimigen et al., 2011). the synchronisation of the eeg and eye-tracking data includes matching the sampling rates of both data streams (which is for the eeg typically at 500 hz or 1000 hz and for current remote eye-tracking devices at 120 hz or 250 hz), either by up- or down-sampling.
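the trigger-based synchronisation and the matching of sampling rates can be sketched as follows. this is not the implementation of the eye-eeg plugin; it is a generic illustration that assumes a linear relation between the two device clocks, and all names (`fit_clock_mapping`, `gaze_on_eeg_timeline`) and the synthetic values are hypothetical.

```python
import numpy as np

def fit_clock_mapping(trigger_times_et, trigger_times_eeg):
    """Least-squares line mapping eye-tracker time to EEG time, from shared triggers."""
    slope, intercept = np.polyfit(trigger_times_et, trigger_times_eeg, deg=1)
    return slope, intercept

def gaze_on_eeg_timeline(gaze_times_et, gaze_values, mapping, eeg_times):
    """Express gaze timestamps on the EEG clock, then interpolate to the EEG sample times."""
    slope, intercept = mapping
    gaze_times_eeg = slope * np.asarray(gaze_times_et, dtype=float) + intercept
    return np.interp(eeg_times, gaze_times_eeg, gaze_values)

# Synthetic illustration: the eye-tracker clock starts 2.5 s later and drifts slightly.
trigger_eeg = np.arange(10, dtype=float)              # one trigger per second on the EEG clock
trigger_et = (trigger_eeg - 2.5) / 1.0001             # the same triggers on the eye-tracker clock
mapping = fit_clock_mapping(trigger_et, trigger_eeg)

gaze_times_et = trigger_et[0] + np.arange(2251) / 250.0   # eye tracker sampling at 250 Hz
gaze_x = 100.0 + 10.0 * gaze_times_et                     # made-up horizontal gaze position
eeg_times = np.arange(4500) / 500.0                       # EEG sampling at 500 Hz
gaze_x_500 = gaze_on_eeg_timeline(gaze_times_et, gaze_x, mapping, eeg_times)
```

here the 250 hz gaze signal is up-sampled to the 500 hz eeg timeline by linear interpolation; down-sampling the eeg instead would be the alternative mentioned in the text and would normally require low-pass filtering first.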
once the eeg data and the eye-tracking data are synchronized there remain several challenges that have to be taken into consideration. these include the correction of eye-movement artefacts, dealing with overlapping eeg data segments due to different and rather short fixation durations, and the selection of an adequate baseline (for a comprehensive discussion of these challenges and potential solutions the interested reader may refer to baccino, 2011; dimigen et al., 2011; nikolaev et al., 2016; the purpose of the current article is to make readers aware of potentials as well as weaknesses of the methodology, yet without discussing single methodological challenges in depth). eye-movements (i.e., saccades and blinks) alter the electric fields around the eyes and consequently confound the raw eeg, especially at frontal electrodes (iwasaki et al., 2005). thus, eye-movement artefacts may either mask the eeg correlates of interest or, worse, may induce a systematic error. for example, in multimedia the eye-movement patterns vary between viewing of textual and pictorial elements. thus, when interested in fixation-related eeg data for multimedia, that is, when comparing eeg data for text reading and picture viewing, the eeg may be confounded by different eye-movement patterns. however, several methodologies exist to clean the raw eeg data from eye-movement artefacts (for reviews see croft & barry, 2000; islam, rastegarnia, & yang, 2016). for example, independent component analysis (ica) can be used to reliably identify and correct for eye-movement artefacts (chaumon, bishop, & busch, 2015; delorme, sejnowski, & makeig, 2007; jung et al., 2000; zhou & gotman, 2009). noteworthy, especially in the context of fixation-related eeg data analysis the use of ica has been shown to result in adequately cleaned eeg, outperforming other methodologies like standard regression-based data cleaning (henderson et al., 2013; hutzler et al., 2007). 
while a variety of methodologies exists for cleaning eeg data from eye-movement artefacts, the varying and rather short length of fixations is another challenge for fixation-related eeg analysis that one has to be aware of. for example, during reading typical fixations last on average between 200 and 250 ms (e.g., dimigen et al., 2011). this is problematic when analysing the eeg data in the time domain as later components in the frp of a current fixation (e.g., the n400) might be overlapped (i.e., confounded) by early components of the frp of the following fixation. in frp analysis very short fixations are therefore excluded from analysis (e.g., fixations < 80 ms; frey et al., 2018). several statistical methods (e.g., regression-based models) have been proposed to deal with the potentially confounding effect of overlapping eeg data segments (baccino, 2011; dimigen et al., 2011; frey et al., 2018; nikolaev, pannasch, ito, & belopolsky, 2014). it has also been proposed to compare only those data sequences of comparable fixation-patterns between task conditions of interest to minimize potentially confounding effects due to different fixation durations and hence differently overlapping eeg segments (nikolaev et al., 2016). while this procedure might be feasible for task conditions that are quite comparable (e.g., both within the domain of reading), for multimedia task materials with texts and pictures it might be impossible to adequately match the fixations used for analysis due to the different fixation patterns for reading and picture viewing. moreover, in frbp analysis overlapping eeg data epochs due to short fixation durations might frequently occur. this is because the length of the eeg data epoch defines the possible frequency resolution of the calculated spectrum. for example, in the theta frequency range (4 – 6 hz) one oscillation lasts up to 250 ms (i.e., at 4 hz).
thus, when interested in the theta frequency band power an analysis window of at least 250 ms would be necessary. commonly, an analysis window including several oscillations of the specific frequency of interest is recommended for calculating the spectrum (i.e., an analysis window of at least 500 ms length). consequently, one has to be aware of the constraint that frbp analysis is seldom suitable for analysing the eeg data for single fixations (unless their duration is long enough). yet for research on multimedia task materials interested in cl this constraint might not be of too much relevance, as the aois of interest (i.e., parts of the text versus the pictures) might generally comprise several fixations that one could summarize, resulting in eeg data epochs long enough for analysis (see figure 2). importantly, one still would be able to differentiate between first and later visits of an aoi. nevertheless, a comparison of the eeg in classical sequential stimulus presentation paradigms (e.g., in multimedia research by presenting text and pictures sequentially) with fixation-related paradigms might be necessary to further validate the reliability of the frbp analysis when used in new experimental paradigms or for new, complex multimedia task materials. figure 2. left part: schematic illustration of the problem of overlapping eeg data segments (marked by the lightning symbol) in case of short fixations and a frbp analysis based on single fixations (i.e., using very small aois). right part: schematic illustration of eeg data segments aligned to larger aois which would be typically used in multimedia research. longer eeg segments could be used without overlapping. note. the eeg and eye-tracking data has been artificially combined and does not show data of a real person. 
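summarizing consecutive fixations within one aoi into longer segments, as suggested above, might look like the following sketch. the data structure, the function names, and the example fixations are hypothetical; the 500 ms minimum is the window length mentioned in the text, and a segment here simply spans from the onset of the first fixation to the end of the last one, so intervening saccades are included.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    onset: float      # fixation onset in seconds
    duration: float   # fixation duration in seconds
    aoi: str          # label of the fixated area of interest

def aoi_visits(fixations):
    """Merge runs of consecutive fixations on the same AOI into visits.

    Each visit records its AOI, start, end, and a visit counter (1 = first visit).
    """
    visits, counts = [], {}
    for fix in fixations:
        end = fix.onset + fix.duration
        if visits and visits[-1]["aoi"] == fix.aoi:
            visits[-1]["end"] = end                   # extend the ongoing visit
        else:
            counts[fix.aoi] = counts.get(fix.aoi, 0) + 1
            visits.append({"aoi": fix.aoi, "start": fix.onset,
                           "end": end, "visit": counts[fix.aoi]})
    return visits

def long_enough(visits, min_window_s=0.5):
    """Keep only visits that span at least the minimum analysis window."""
    return [v for v in visits if v["end"] - v["start"] >= min_window_s]

def frequency_resolution(window_s):
    """Spacing of frequency bins (Hz) for an analysis window of the given length."""
    return 1.0 / window_s

# Made-up fixation sequence: text, then the picture, then back to the text.
fixations = [
    Fixation(0.00, 0.22, "text"), Fixation(0.25, 0.20, "text"),
    Fixation(0.48, 0.24, "text"), Fixation(0.75, 0.30, "picture"),
    Fixation(1.10, 0.21, "text"), Fixation(1.34, 0.35, "text"),
]
visits = aoi_visits(fixations)
usable = long_enough(visits)
```

in this made-up sequence the two text visits are long enough for a 500 ms window, whereas the single 300 ms picture fixation is not; the visit counter keeps first and later visits of an aoi distinguishable.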
another, albeit easily addressable, challenge of overlapping eeg segments in fixation-related eeg data analysis is posed by specific eeg correlates at the very beginning of a stimulus (e.g., when the text for reading is shown on the screen for the first time). to avoid stimulus-presentation associated event-related eeg correlates potentially masking fixation-related eeg correlates, the first 700 ms of stimulus presentation (i.e., the first few fixations) should be excluded from eeg data analysis (dimigen et al., 2011). finally, defining an adequate baseline for eeg data analysis is not trivial in free viewing situations (dimigen et al., 2011; nikolaev et al., 2016). in classical eeg data analysis typically a pre-stimulus baseline is used for baseline correction in order to reduce slow drifts (e.g., due to fatigue) in the eeg epochs used for analysis (picton, 2000). as in fixation-related eeg data analysis the pre-fixation baseline is contaminated by saccadic activity, it has been proposed to alternatively use a short time-frame directly after fixation-onset as baseline (e.g., baccino, 2011) or to use a global, pre-stimulus baseline (i.e., a baseline at the beginning of the task; nikolaev et al., 2016). to date, there is no clear recommendation of what time interval is best suited as baseline in fixation-related eeg data analysis. choosing an adequate baseline may largely depend on the concrete task design. potentially, in frbp it might also be possible to report absolute power values (i.e., to avoid using a baseline) if the experimental conditions (i.e., the aois for which the eeg data is analyzed and compared) are fully permuted with respect to spatial and temporal positions (i.e., when spatial or timing issues can be excluded as potential confounds of the data).

5. conclusions

despite the remaining challenges of using the methodology of fixation-related eeg data analysis, the methodology clearly is at the frontline of learning research.
eeg (alpha and theta) frequency band power may help gaining deeper insight in the cognitive processing of multimedia elements (i.e., text and picture and resulting cl or affective effects). thus, frbp may substantially contribute to a better understanding of multimedia design effects like pictorial seductive details and consequently may foster better instructional design.

keypoints

eeg alpha frequency band power allows assessing cognitive load (and potentially affective effects) during learning with multimedia task materials.
the methodology of fixation-related frequency band power analysis (frbp) allows studying the eeg frequency band power in free viewing situations.
frbp thus allows comparing cognitive load when different multimedia elements are fixated (e.g., text versus picture).
the methodology may thus provide a deeper understanding of multimedia design effects like the pictorial seductive detail effect.

references

ahern, g. l., & schwartz, g. e. (1985). differential lateralization for positive and negative emotion in the human brain: eeg spectral analysis. neuropsychologia, 23(6), 745–755. https://doi.org/10.1016/0028-3932(85)90081-8
alemdag, e., & cagiltay, k. (2018). a systematic review of eye tracking research on multimedia learning. computers & education, 125(july), 413–428. https://doi.org/10.1016/j.compedu.2018.06.023
antonenko, p., & niederhauser, d. s. (2010). the influence of leads on cognitive load and learning in a hypertext environment. computers in human behavior, 26(2), 140–150. https://doi.org/10.1016/j.chb.2009.10.014
antonenko, p., paas, f., grabner, r., & van gog, t. (2010). using electroencephalography to measure cognitive load. educational psychology review, 22(4), 425–438. https://doi.org/10.1007/s10648-010-9130-y
baccino, t. (2011). eye movements and concurrent event-related potentials: eye fixation-related potential investigations in reading. in s. liversedge, i. gilchrist, & s. everling (eds.), oxford handbook of eye movements (pp.
857–870). oxford, uk: oxford university press.
baccino, t., & manunta, y. (2005). eye-fixation-related potentials: insight into parafoveal processing. journal of psychophysiology, 19(3), 204–215. https://doi.org/10.1027/0269-8803.19.3.204
bastiaansen, m., mazaheri, a., & jensen, o. (2012). beyond erps: oscillatory neuronal dynamics. in s. j. luck & e. s. kappenman (eds.), the oxford handbook of event-related potential components (pp. 31–50). oxford, uk: oxford university press.
beatty, j., & lucero-wagoner, b. (2000). the pupillary system. in j. t. cacioppo, l. g. tassinary, & g. berndtson (eds.), handbook of psychophysiology (2nd ed., pp. 142–162). cambridge, uk: cambridge university press.
bertram, r., helle, l., kaakinen, j. k., & svedström, e. (2013). the effect of expertise on eye movement behaviour in medical image perception. plos one, 8(6), e66169. https://doi.org/10.1371/journal.pone.0066169
brouwer, a.-m., hogervorst, m. a., oudejans, b., ries, a. j., & touryan, j. (2017). eeg and eye tracking signatures of target encoding during structured visual search. frontiers in human neuroscience, 11, 1–11. https://doi.org/10.3389/fnhum.2017.00264
brünken, r., plass, j. l., & leutner, d. (2003). direct measurement of cognitive load in multimedia learning. educational psychologist, 38(1), 53–61. https://doi.org/10.1207/s15326985ep3801_7
brünken, r., seufert, t., & paas, f. (2010). measuring cognitive load. in j. l. plass, r. moreno, & r. brünken (eds.), cognitive load theory (pp. 181–202). cambridge: cambridge university press. https://doi.org/10.1007/978-1-4419-8126-4_6
chaumon, m., bishop, d. v. m., & busch, n. a. (2015). a practical guide to the selection of independent components of the electroencephalogram for artifact correction. journal of neuroscience methods, 250, 47–63. https://doi.org/10.1016/j.jneumeth.2015.02.025
coan, j. a., & allen, j. j. b. (2004). frontal eeg asymmetry as a moderator and mediator of emotion. biological psychology, 67(1–2), 7–49.
https://doi.org/10.1016/j.biopsycho.2004.03.002 cohen, m. x. (2014). analyzing neural time series data: theory and practice.cambridge, ma: mit press. https://doi.org/10.1007/s13398-014-0173-7.2 cohen, m. x. (2017). where does eeg come from and what does it mean? trends in neurosciences, 40(4), 208–218. https://doi.org/10.1016/j.tins.2017.02.004 croft, r. j., & barry, r. j. (2000). removal of ocular artifact from the eeg: a review. neurophysiologie clinique/clinical neurophysiology, 30 (1), 5–19. https://doi.org/10.1016/s0987-7053(00)00055-1 delorme, a., & makeig, s. (2004). eeglab: an open source toolbox for analysis of single-trial eeg dynamics including independent component analysis. journal of neuroscience methods, 134(1), 9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009 delorme, a., sejnowski, t., & makeig, s. (2007). enhanced detection of artifacts in eeg data using higher-order statistics and independent component analysis. neuroimage,34(4), 1443–1449. https://doi.org/10.1016/j.neuroimage.2006.11.004 demaree, h. a., everhart, d. e., youngstrom, e. a., & harrison, d. w. (2005). brain lateralization of emotional processing: historical roots and a future incorporating "dominance". behavioral and cognitive neuroscience reviews, 4(1), 3–20. https://doi.org/10.1177/1534582305276837 deubel, h. (2008). the time course of presaccadic attention shifts. psychological research,72(6), 630–640. https://doi.org/10.1007/s00426-008-0165-3 deubel, h., & schneider, w. x. (1996). saccade target selection and object recognition: evidence for a common attentional mechanism. vision research, 36(12), 1827–1837. https://doi.org/10.1016/0042-6989(95)00294-4 dimigen, o., kliegl, r., & sommer, w. (2012). trans-saccadic parafoveal preview benefits in fluent reading: a study with fixation-related brain potentials. neuroimage, 62(1), 381–393. https://doi.org/10.1016/j.neuroimage.2012.04.006 dimigen, o., sommer, w., hohlfeld, a., jacobs, a. m., & kliegl, r. (2011). 
coregistration of eye movements and eeg in natural reading: analyses and review. journal of experimental psychology. general, 140(4), 552–572. https://doi.org/10.1037/a0023885 dixon, m. l., thiruchselvam, r., todd, r., & christoff, k. (2017). emotion and the prefrontal cortex: an integrative review. psychological bulletin, [epub], 1–61. https://doi.org/10.1037/bul0000096 eitel, a., scheiter, k., & schüler, a. (2012). the time course of information extraction from instructional diagrams. perceptual and motor skills, 115(3), 677–701. https://doi.org/10.2466/22.23.pms.115.6.677-701 engel, a. k., & fries, p. (2010). beta-band oscillations signalling the status quo? current opinion in neurobiology, 20(2), 156–165. https://doi.org/10.1016/j.conb.2010.02.015 foulsham, t., farley, j., & kingstone, a. (2013). mind wandering in sentence reading: decoupling the link between mind and eye. canadian journal of experimental psychology/revue canadienne de psychologie expérimentale , 67(1), 51–59. https://doi.org/10.1037/a0030217 frey, a., ionescu, g., lemaire, b., lópez-orozco, f., baccino, t., & guérin-dugué, a. (2013). decision-making in information seeking on texts: an eye-fixation-related potentials investigation. frontiers in systems neuroscience, 7(39). https://doi.org/10.3389/fnsys.2013.00039 frey, a., lemaire, b., vercueil, l., & guérin-dugué, a. (2018). an eye fixation-related potential study in two reading tasks: reading to memorize and reading to make a decision. brain topography, 1–21. https://doi.org/10.1007/s10548-018-0629-8 gerlic, i., & jausovec, n. (2001). differences in eeg power and coherence measures related to the type of presentation: text versus multimedia. journal of educational computing research, 25 (2), 177–195. http://dx.doi.org/10.2190/ydwy-u3fj-4ly4-lynd gevins, a., & smith, m. e. (2000). neurophysiological measures of working memory and individual differences in cognitive ability and cognitive style. cerebral cortex, 10(9), 829–839. 
https://doi.org/10.1093/cercor/10.9.829 giannini, m., alexander, d. m., nikolaev, a. r., & van leeuwen, c. (2018). large-scale traveling waves in eeg activity following eye movement. brain topography, 1–15. https://doi.org/10.1007/s10548-018-0622-2 harmon-jones, e., gable, p. a., & peterson, c. k. (2010). the role of asymmetric frontal cortical activity in emotion-related phenomena: a review and update. biological psychology, 84(3), 451–462. https://doi.org/10.1016/j.biopsycho.2009.08.010 harp, s. f., & mayer, r. e. (1998). how seductive details do their damage: a theory of cognitive interest in science learning. journal of educational psychology,90(3), 414–434. https://doi.org/10.1037/0022-0663.90.3.414 hart, s. g., & staveland, l. e. (1988). development of nasa-tlx (task load index): results of empirical and theoretical research. in p. a. hancock & n. meshkati (eds.), human mental workload(pp. 139–183). amsterdam, nl. https://doi.org/10.1016/s0166-4115(08)62386-9 henderson, j. m., luke, s. g., schmidt, j., & richards, j. e. (2013). co-registration of eye movements and event-related potentials in connected-text paragraph reading. frontiers in systems neuroscience, 7(28). https://doi.org/10.3389/fnsys.2013.00028 holmqvist, k., & andersson, r. (2017). eye tracking: a comprehensive guide to methods, paradigms, and measures . lund, sweden: lund eye-tracking research institute. hutzler, f., braun, m., võ, m. l.-h., engl, v., hofmann, m., dambacher, m., leder, h., & jacobs, a. m. (2007). welcome to the real world: validating fixation-related brain potentials for ecologically valid settings. brain research, 1172, 124–9. https://doi.org/10.1016/j.brainres.2007.07.025 hyönä, j. (2010). the use of eye movements in the study of multimedia learning. learning and instruction, 20(2), 172–176. https://doi.org/10.1016/j.learninstruc.2009.02.013 islam, m. k., rastegarnia, a., & yang, z. (2016). methods for artifact detection and removal from scalp eeg: a review. 
neurophysiologie clinique/clinical neurophysiology, 46 (4–5), 287–305. https://doi.org/10.1016/j.neucli.2016.07.002 itthipuripat, s., wessel, j. r., & aron, a. r. (2013). frontal theta is a signature of successful working memory manipulation. experimental brain research, 224(2), 255–262. https://doi.org/10.1007/s00221-012-3305-3 iwasaki, m., kellinghaus, c., alexopoulos, a. v., burgess, r. c., kumar, a. n., han, y. h., lüders, h. o., & leigh, r. j. (2005). effects of eyelid closure, blinks, and eye movements on the electroencephalogram. clinical neurophysiology, 116(4), 878–885. https://doi.org/10.1016/j.clinph.2004.11.001 jackson, a. f., & bolger, d. j. (2014). the neurophysiological bases of eeg and eeg measurement: a review for the rest of us. psychophysiology, 51(11), 1061–1071. https://doi.org/10.1111/psyp.12283 jarodzka, h., holmqvist, k., & gruber, h. (2017). eye tracking in educational science : theoretical frameworks and research agendas. journal of eye movement research, 10(1), 1–18. https://doi.org/10.16910/jemr.10.1.3 jasper, h. h. (1958). the ten-twenty electrode system of the international federation. electroencephalography and clinical neurophysiology, 10, 371–375. jung, t., makeig, s., humphries, c., lee, t., mckeown, m. j., iragui, i., & sejnowski, t. j. (2000). removing electroencephalographic aretfacts by blind source seperation. psychophysiology,37(2), 163–178. https://doi.org/10.1111/1469-8986.3720163 just, m. a., & carpenter, p. a. (1976). eye fixations and cognitive processes. cognitive psychology, 8(4), 441–480. https://doi.org/10.1016/0010-0285(76)90015-3 just, m. a., & carpenter, p. a. (1980). a theory of reading: from eye fixations ot comprehension. psychological review, 87(4), 329–354. https://doi.org/10.1037/0033-295x.87.4.329 kamienkowski, j. e., ison, m. j., quiroga, r. q., & sigman, m. (2012). fixation-related potentials in visual search: a combined eeg and eye tracking study. journal of vision, 12(7), 4–4. 
https://doi.org/10.1167/12.7.4 kaunitz, l. n., kamienkowski, j. e., varatharajah, a., sigman, m., quiroga, r. q., & ison, m. j. (2014). looking for a face in the crowd: fixation-related potentials in an eye-movement visual search task. neuroimage, 89, 297–305. https://doi.org/10.1016/j.neuroimage.2013.12.006 klepsch, m., schmitz, f., & seufert, t. (2017). development and validation of two instruments measuring intrinsic, extraneous, and germane cognitive load. frontiers in psychology, 8, 1–18. https://doi.org/10.3389/fpsyg.2017.01997 kliegl, r., dambacher, m., dimigen, o., & sommer, w. (2014). oculomotor control, brain potentials, and timelines of word recognition during natural reading. in current trends in eye tracking research(pp. 141–155). cham: springer international publishing. https://doi.org/10.1007/978-3-319-02868-2_10 klimesch, w. (1999). eeg alpha and theta oscillations reflect cognitive and memory performance: a review and analysis. brain research reviews, 29(2–3), 169–195. https://doi.org/10.1016/s0165-0173(98)00056-3 knörzer, l., brünken, r., & park, b. (2016). facilitators or suppressors: effects of experimentally induced emotions on multimedia learning. learning and instruction, 44, 97–107. https://doi.org/10.1016/j.learninstruc.2016.04.002 könig, p., wilming, n., kietzmann, t. c., ossandon, j. p., onat, s., ehinger, b., gameiro r. r., & kaspar, k. (2016). eye movements as a window to cognitive processes. journal of eye movement research, 9(5), 1–16. https://doi.org/10.16910/jemr.9.5.3 kok, e. m., & jarodzka, h. (2017a). before your very eyes: the value and limitations of eye tracking in medical education. medical education, 51(1), 114–122. https://doi.org/10.1111/medu.13066 kok, e. m., & jarodzka, h. (2017b). beyond your very eyes: eye movements are necessary, not sufficient. medical education, 51(11), 1190–1190. https://doi.org/10.1111/medu.13384 korbach, a., brünken, r., & park, b. (2017). 
measurement of cognitive load in multimedia learning: a comparison of different objective measures. instructional science. https://doi.org/10.1007/s11251-017-9413-5 kornrumpf, b., dimigen, o., & sommer, w. (2017). lateralization of posterior alpha eeg reflects the distribution of spatial attention during saccadic reading. psychophysiology, 54(6), 809–823. https://doi.org/10.1111/psyp.12849 kornrumpf, b., niefind, f., sommer, w., & dimigen, o. (2016). neural correlates of word recognition: a systematic comparison of natural reading and rapid serial visual presentation. journal of cognitive neuroscience, 28(9), 1374–1391. https://doi.org/10.1162/jocn_a_00977 kowler, e. (2011). eye movements: the past 25 years. vision research, 51(13), 1457–1483. https://doi.org/10.1016/j.visres.2010.12.014 krause, c. m. (2003). brain electric oscillations and cognitive processes. in k. hugdahl (ed.), neuropsychology and cognition. experimental methods in neuropsychology (21st ed., pp. 111–130). boston, ma: kluwer academic publishers group. kretzschmar, f., pleimling, d., hosemann, j., füssel, s., bornkessel-schlesewsky, i., & schlesewsky, m. (2013). subjective impressions do not mirror online reading effort: concurrent eeg-eyetracking evidence from the reading of books and digital media. plos one, 8(2), e56178. https://doi.org/10.1371/journal.pone.0056178 léger, p. m., titah, r., sénecal, s., fredette, m., courtemanche, f., labonte-lemoyne, él., & de guinea, a. o. (2014). precision is in the eye of the beholder: application of eye fixation-related potentials to information systems research. journal of the association for information systems, 15, 651–678. http://dx.doi.org/10.17705/1jais.00376 lenzner, a., schnotz, w., & müller, a. (2013). the role of decorative pictures in learning. instructional science, 41(5), 811–831. https://doi.org/10.1007/s11251-012-9256-z magner, u. i. e., schwonke, r., aleven, v., popescu, o., & renkl, a. (2014). 
triggering situational interest by decorative illustrations both fosters and hinders learning in computer-based learning environments. learning and instruction,29, 141–152. https://doi.org/10.1016/j.learninstruc.2012.07.002 mayer, r. e. (2009). multimedia learning(2nd ed.). new york, ny: cambridge university press. mayer, r. e. (2010). unique contributions of eye-tracking research to the study of learning with graphics. learning and instruction, 20(2), 167–171. https://doi.org/10.1016/j.learninstruc.2009.02.012 mayer, r. e., & fiorella, l. (2016). principles for reducing extraneous processing in multimedia learning: coherence, signaling, redundancy, spatial contiguity, and temporal contiguity principles. in r. mayer (ed.), the cambridge handbook of multimedia learning(pp. 279–315). cambridge: cambridge university press. https://doi.org/10.1017/cbo9781139547369.015 mayer, r. e., & moreno, r. (2003). nine ways to reduce cognitive load in multimedia learning.educational psychologist, 38(1), 43–52. https://doi.org/10.1207/s15326985ep3801_6 münte, t. f., urbach, t. p., düzel, e., & kutas, m. (2000). event-related brain potentials in the study of human cognition and neuropsychology. handbook of neuropsychology, 1, 1–97. niefind, f., & dimigen, o. (2016). dissociating parafoveal preview benefit and parafovea-on-fovea effects during reading: a combined eye tracking and eeg study. psychophysiology, 53(12), 1784–1798. https://doi.org/10.1111/psyp.12765 nigbur, r., ivanova, g., & stürmer, b. (2011). theta power as a marker for cognitive interference. clinical neurophysiology, 122 (11), 2185–2194. https://doi.org/10.1016/j.clinph.2011.03.030 nikolaev, a. r., meghanathan, r. n., & van leeuwen, c. (2016). combining eeg and eye movement recording in free viewing: pitfalls and possibilities. brain and cognition, 107, 55–83. https://doi.org/10.1016/j.bandc.2016.06.004 nikolaev, a. r., pannasch, s., ito, j., & belopolsky, a. v. (2014). 
eye movement-related brain activity during perceptual and cognitive processing. frontiers in systems neuroscience(8). https://doi.org/10.3389/fnsys.2014.00062 olejniczak, p. (2006). neurophysiologic basis of eeg. journal of clinical neurophysiology,23(3), 186–189. https://doi.org/10.1097/01.wnp.0000220079.61973.6c paas, f. g. (1992). training strategies for attaining transfer of problem-solving skill in statistics: a cognitive-load approach. journal of educational psychology,84(4), 429–434. https://doi.org/10.1037/0022-0663.84.4.429 paas, f., tuovinen, j. e., tabbers, h., & van gerven, p. w. m. (2003). cognitive load measurement as a means to advance cognitive load theory. educational psychologist, 38(1), 63–71. https://doi.org/10.1207/s15326985ep3801_8 palomäki, j., kivikangas, m., alafuzoff, a., hakala, t., & krause, c. m. (2012). brain oscillatory 4-35 hz eeg responses during an n-back task with complex visual stimuli. neuroscience letters, 516 (1), 141–145. https://doi.org/10.1016/j.neulet.2012.03.076 park, b., & brünken, r. (2015). the rhythm method: a new method for measuring cognitive load. an experimental dual-task study. applied cognitive psychology, 29(2), 232–243. https://doi.org/10.1002/acp.3100 park, s., & lim, j. (2007). promoting positive emotion in multimedia learning using visual illustrations sanghoon park and jung lim. journal of educational multimedia & hypermedia, 16 (2), 141–162. pesonen, m., hämäläinen, h., & krause, c. m. (2007). brain oscillatory 4-30 hz responses during a visual n-back memory task with varying memory load. brain research, 1138, 171–177. https://doi.org/10.1016/j.brainres.2006.12.076 pfurtscheller, g., & lopes da silva, f. h. (1999). event-related eeg/meg synchronization and desynchronization: basic principles. clinical neurophysiology, 110(11), 1842–1857. https://doi.org/10.1016/s1388-2457(99)00141-8 pfurtscheller, g., zalaudek, k., & neuper, c. (1998). 
event-related beta synchronization after wrist, finger and thumb movement. electroencephalography and clinical neurophysiology, 109 (2), 154–160. https://doi.org/10.1016/s0924-980x(97)00070-2 picton, t. w. (2000). guidelines for using human event-related potentials to study cognition: recording standards and publication criteria. psychophysiology, 37(2), 127–152. https://doi.org/10.1111/1469-8986.3720127 rayner, k. (1998). eye movements in reading and information processing: 20 years of research. psychological bulletin, 124(3), 372–422. https://doi.org/10.1037/0033-2909.124.3.372 rayner, k. (2009).eye movements and attention in reading, scene perception, and visual search. quarterly journal of experimental psychology (vol. 62). https://doi.org/10.1080/17470210902816461 reichle, e. d., & reingold, e. m. (2013). neurophysiological constraints on the eye-mind link. frontiers in human neuroscience, 7(july), 1–6. https://doi.org/10.3389/fnhum.2013.00361 reingold, e. m., & sheridan, h. (2011). eye movements and visual expertise in chess and medicine. in s. p. liversedge, i. d. gilchrist, & s. everling (eds.), the oxford handbook of eye movements(pp. 528–550). oxford university press. https://doi.org/10.1093/oxfordhb/9780199539789.013.0029 rey, g. d. (2012). a review of research and a meta-analysis of the seductive detail effect. educational research review, 7 (3), 216–237. https://doi.org/10.1016/j.edurev.2012.05.003 ries, a. j., touryan, j., ahrens, b., & connolly, p. (2016). the impact of task demands on fixation-related brain potentials during guided search. plos one, 11(6), e0157260. https://doi.org/10.1371/journal.pone.0157260 sauseng, p., griesmayr, b., freunberger, r., & klimesch, w. (2010). control mechanisms in working memory: a possible function of eeg theta oscillations. neuroscience and biobehavioral reviews, 34 (7), 1015–1022. https://doi.org/10.1016/j.neubiorev.2009.12.006 sauseng, p., klimesch, w., gruber, w., doppelmayr, m., stadler, w., & schabus, m. 
(2002). the interplay between theta and alpha oscillations in the human electroencephalogram reflects the transfer of information between memory systems. neuroscience letters, 324(2), 121–124. https://doi.org/10.1016/s0304-3940(02)00225-2 scharinger, c., kammerer, y., & gerjets, p. (2015). pupil dilation and eeg alpha frequency band power reveal load on executive functions for link-selection processes during text reading. plos one, 10(6), e0130608. https://doi.org/10.1371/journal.pone.0130608 scharinger, c., kammerer, y., & gerjets, p. (2016). fixation-related eeg frequency band power analysis: a promising neuro-cognitive methodology to evaluate the matching-quality of web search results? in c. stephanidis (ed.), hci international 2016 posters’ extended abstracts, part i (pp. 245–250). cham, switzerland: springer international publishing. https://doi.org/10.1007/978-3-319-40548-3_41 scharinger, c., soutschek, a., schubert, t., & gerjets, p. (2015). when flanker meets the n-back: what eeg and pupil dilation data reveal about the interplay between the two central-executive working memory functions inhibition and updating. psychophysiology, 52(10), 1293–1304. https://doi.org/10.1111/psyp.12500 scharinger, c., soutschek, a., schubert, t., & gerjets, p. (2017). comparison of the working memory load in n-back and working memory span tasks by means of eeg frequency band power and p300 amplitude. frontiers in human neuroscience, 11(6). https://doi.org/10.3389/fnhum.2017.00006 schmeck, a., opfermann, m., van gog, t., paas, f., & leutner, d. (2015). measuring cognitive load with subjective rating scales during problem solving: differences between immediate and delayed ratings. instructional science,43(1), 93–114. https://doi.org/10.1007/s11251-014-9328-3 schneider, s., dyrna, j., meier, l., beege, m., & rey, g. d. (2017). how affective charge and text–picture connectedness moderate the impact of decorative pictures on multimedia learning. journal of educational psychology. 
https://doi.org/10.1037/edu0000209 schneider, s., nebel, s., & rey, g. d. (2016). decorative pictures and emotional design in multimedia learning. learning and instruction, 44(march), 65–73. https://doi.org/10.1016/j.learninstruc.2016.03.002 schnotz, w., fries, s., & horz, h. (2009). motivational aspects of cognitive load theory. in m. wosnitza, s. a. karabenick, a. efklides, & p. nenniger (eds.), contemporary motivation research. from global to local perspectives (pp. 69–96). göttingen, germany: hogrefe & huber. schnotz, w., & kürschner, c. (2007). a reconsideration of cognitive load theory. educational psychology review, 19(4), 469–508. https://doi.org/10.1007/s10648-007-9053-4 schöne, b., schomberg, j., gruber, t., & quirin, m. (2016). event-related frontal alpha asymmetries: electrophysiological correlates of approach motivation. experimental brain research, 234(2), 559–567. https://doi.org/10.1007/s00221-015-4483-6 schüler, a. (2017). investigating gaze behavior during processing of inconsistent text-picture information: evidence for text-picture integration. learning and instruction, 49, 218–231. https://doi.org/10.1016/j.learninstruc.2017.03.001 simola, j., le fevre, k., torniainen, j., & baccino, t. (2015). affective processing in natural scene viewing: valence and arousal interactions in eye-fixation-related potentials. neuroimage, 106, 21–33. https://doi.org/10.1016/j.neuroimage.2014.11.030 simola, j., torniainen, j., moisala, m., kivikangas, m., & krause, c. m. (2013). eye movement related brain responses to emotional scenes during free viewing. frontiers in systems neuroscience, 7(41). https://doi.org/10.3389/fnsys.2013.00041 smith, e. e., reznik, s. j., stewart, j. l., & allen, j. j. b. (2017). assessing and conceptualizing frontal eeg asymmetry: an updated primer on recording, processing, analyzing, and interpreting frontal alpha asymmetry. international journal of psychophysiology, 111, 98–114. 
https://doi.org/10.1016/j.ijpsycho.2016.11.005 strukelj, a., & niehorster, d. c. (2018). one page of text: eye movements during regular and thorough reading, skimming, and spell checking. journal of eye movement research, 11(1), 1–22. https://doi.org/10.16910/jemr.11.1.1 sweller, j., van merrienboer, j. j. g., & paas, f. g. w. c. (1998). cognitive architecture and instructional design. educational psychology review, 10(3), 251–296. https://doi.org/10.1023/a:1022193728205 triesch, j., ballard, d. h., hayhoe, m. m., & sullivan, b. t. (2003). what you see is what you need. journal of vision, 3(1), 86–94. https://doi.org/10.1167/3.1.9 van gog, t., & scheiter, k. (2010). eye tracking as a tool to study and enhance multimedia learning. learning and instruction,20 (2), 95–99. https://doi.org/10.1016/j.learninstruc.2009.02.009 vignali, l., himmelstoss, n. a., hawelka, s., richlan, f., & hutzler, f. (2016). oscillatory brain dynamics during sentence reading: a fixation-related spectral perturbation analysis. frontiers in human neuroscience, 10, 1–13. https://doi.org/10.3389/fnhum.2016.00191 weinreich, a., stephani, t., & schubert, t. (2016). emotion effects within frontal alpha oscillation in a picture oddball paradigm. international journal of psychophysiology, 110, 200–206. https://doi.org/10.1016/j.ijpsycho.2016.07.517 weiss, b., knakker, b., & vidnyánszky, z. (2016). visual processing during natural reading. scientific reports, 6, 1–16. https://doi.org/10.1038/srep26902 winslow, b., carpenter, a., flint, j., wang, x., tomasetti, d., johnston, m., & hale, k. (2010). combining eeg and eye tracking: using fixation-locked potentials in visual search. journal of eye movement research, 6(4), 1–11. https://doi.org/10.16910/jemr.6.4.5 yarbus, a. l. (1967). eye movements and vision. new york, ny, usa: plenum press. https://doi.org/10.1016/0028-3932(68)90012-2 zhou, w., & gotman, j. (2009). automatic removal of eye movement artifacts from the eeg using ica and the dipole model. 
progress in natural science, 19(9), 1165–1170. https://doi.org/10.1016/j.pnsc.2008.11.013 frontline learning research 1 (2013) 24 41 issn 2295-3159 corresponding author: ming fai pang, the university of hong kong, pangmf@hku.hk, t (852) 28592428, f (852) 28585649 http://dx.doi.org/10.14786/flr.v1i1.16 24 | f l r meanings are acquired from experiencing differences against a background of sameness, rather than from experiencing sameness against a background of difference: putting a conjecture to the test by embedding it in a pedagogical tool ference marton a , ming fai pang b a university of gothenburg, sweden b the university of hong kong, hong kong sar, china article received 26 march 2013 / revised 31 may 2013 / accepted 30 june 2013 / available online 27 august 2013 abstract in helping learners to make a novel meaning their own, such as when helping children to understand what a word means or teaching students a new concept in school, we frequently point to examples that share the aimed-at meaning but differ otherwise. this type of approach rests on the assumption that novel meanings can be acquired through the experience of sameness against a background of difference. this paper argues that this assumption is unfounded and that the opposite is the case: we make novel meanings our own through the experience of differences against a background of sameness. we put this conjecture to the test in an experimental study by embedding it in a computer game and the results support the conjecture. . keywords: variation theory; phenomenography; discernment; critical experiment 1. the conjecture this paper is about a conjecture and how it is put to the test. the conjecture is actually the title of the paper and we first briefly describe the theory that elaborates its implications, together with some previous results. after that we report on a study which is meant to be a critical test of it. 
we call this conjecture, and the system of corollaries that it implies, somewhat immodestly the variation theory (of learning) (marton, forthcoming; marton & tsui, 2004).

1.1 the origin of meaning
it is commonly believed that a child, or an adult for that matter, can learn the meaning of a word by observing a number of examples of what the word refers to that share this meaning but differ in other ways. for example, we point to a dog and say “dog”, point to another dog and say “dog”, point to a third dog and say “dog”, and then expect the child to understand what the word “dog” means (refers to), i.e., a certain kind of animal. in an experimental context such a learning event could look like the following: “…children might be shown a red fuzzy triangle labeled ‘wug’, a blue bumpy triangle labeled ‘wug’, a green scratchy triangle labeled ‘wug’ and then at the test be asked to pick out a ‘wug’ (a yellow squishy triangle) among two or three objects” (vlach et al., 2008).

now, if a child has noticed previously that there are different geometric forms, of which the triangle is one, it is most likely that she will see that the three things are different but that they are all triangles, regardless of whether she has learned that they are called “triangle”. hence she will identify the yellow squishy triangle in the test as a “wug”. but if she has never noticed triangles previously, or geometric forms in general, she will not see any triangles at all. in consequence, she will not be able to see what the different cases have in common. there is no way of learning the idea of triangle in such an experimental context if you have not come across that idea earlier. but if you have, you might be able to learn that triangles are called “wug” in the actual context. in the same way, no child can learn that dogs are a kind of animal without coming across animals other than dogs.
the idea of triangle derives from how it differs from other geometric forms, and the idea of dog derives from how it differs from other animals.

providing different examples of the same thing is not only the most common method of helping young children to build a vocabulary, but probably also the most common method of teaching concepts, principles, and problem-solving methods in school. stigler and hiebert (1999) describe such an approach as the typical way of teaching mathematics in u.s. schools, and the highly authoritative volume how people learn urges teachers to provide “…many examples in which the same concept is at work” (bransford, brown, & cocking, 2000, p. 20).

looking at cases that are the same in one respect but differ in others to determine what they have in common is called induction. according to fodor (1980), this is the only idea that exists to explain how novel meanings (concepts) are learned, and it simply does not work, for the reasons already cited. it follows then that there is no explanation of how we learn, find, create, or appropriate new meanings. hence, by default, fodor concludes that meanings (concepts) are innate.

in our view, however, even if the concept (meaning) of “dog” were innate, you would never be able to separate that meaning from the meaning of “animal” if you had never encountered any animals other than dogs. regardless of whether dogs were then called dogs or animals, the meaning of “dog” would be exactly the same as the meaning of “animal”. hence, you still would not have acquired the meaning of “dog”, nor of “animal” for that matter. the meaning of dog has to be learned, and this happens by coming across dogs, as well as other animals. similarly, if we lived in an entirely green world, then we would be unable to notice the greenness of everything. hence, whether or not concepts (meanings) are innate, we must encounter alternatives to them if we are to be able to notice and grasp these concepts.
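the argument above can be illustrated with a small sketch. it is a toy formalisation added for illustration, not the authors' model: each example is represented as a hypothetical mapping from feature names to values, and the hypothetical helper `varying_features` simply reports which features take more than one value across a set of examples. under this simplifying assumption, a feature that never varies (such as shape in the “wug” examples) offers no difference to notice, whereas adding a single case that differs in that feature makes it a dimension of variation.

```python
from typing import Dict, List, Set

# Hypothetical representation for illustration: feature name -> value.
Example = Dict[str, str]


def varying_features(examples: List[Example]) -> Set[str]:
    """Return the features that take more than one value across the examples."""
    features = set().union(*(e.keys() for e in examples))
    return {f for f in features if len({e.get(f) for e in examples}) > 1}


# The three "wug" examples: all triangles, differing in color and texture.
wugs = [
    {"shape": "triangle", "color": "red", "texture": "fuzzy"},
    {"shape": "triangle", "color": "blue", "texture": "bumpy"},
    {"shape": "triangle", "color": "green", "texture": "scratchy"},
]

# Shape is the same in every example, so only color and texture vary.
print(sorted(varying_features(wugs)))  # ['color', 'texture']

# Adding one case that differs in shape makes shape a varying feature too.
with_circle = wugs + [{"shape": "circle", "color": "red", "texture": "fuzzy"}]
print(sorted(varying_features(with_circle)))  # ['color', 'shape', 'texture']
```

the sketch only tracks which features vary in the input; it says nothing about how learners actually perceive or discern them, which is the empirical question the paper addresses.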
awareness of a particular number presupposes awareness of other numbers (or at least one other number), and awareness of a particular color presupposes awareness of other colors (or at least one other color). you cannot possibly understand what chinese is simply by listening to different people speaking chinese if you have never heard another language, and you cannot possibly understand what virtue is by inspecting different examples of the same degree of virtue. nor can you understand what a “linear equation” is by looking only at linear equations. you cannot arrive at a novel meaning through induction, but you can through contrast. in induction, the focused meaning, i.e., the one that you are trying to help another to make his or her own (e.g. “chinese”), is kept invariant, while the other features of the same entity (e.g. words) vary. in contrast, it is just the other way around: the focused meaning (e.g. language) varies, while other features (e.g. words) are invariant. instead of saying different words in the same language (chinese), you say the same word in different languages (one of which is chinese).

inductive learning is a frequent research topic, not least in the field of machine learning (e.g. michalski, 1983). as the conjecture being put to the test in our study is about how novel meanings are acquired, and as it states that they are not acquired through induction, we will leave those studies aside here.

1.2 earlier attempts to put the conjecture to the test
most work on variation theory has been carried out in the form of learning studies, the inspiration for which is the japanese lesson study, which came to wider attention through the publication of stigler and hiebert’s (1999) best-selling book the teaching gap.
in this type of study, a group of teachers teaching a particular subject at a particular level together choose an object of learning (something to be learned) that is vitally important for students’ continued learning and that has earlier been found to present difficulties for them. the teachers plan a lesson together, and one of them carries it out, usually in his or her own class, while the others observe. afterwards, the group analyzes and discusses what happened in the classroom.

the learning study is a hybrid form of lesson study and design experiment. it is a theory-based research undertaking whose important components include exploration of students’ ways of making sense of the object of learning before and after the lesson(s). a learning study usually comprises three cycles, each building on the conclusions of the previous one. finally, a learning study is documented, frequently in publishable form. while lesson study is primarily an arrangement for in-service training of the participating teachers, learning study is primarily teachers’ research, the results of which are supposed to be widely shared with other teachers. the variation theory of learning has so far been the theoretical point of departure for the studies carried out. the model was originally developed right after the turn of the millennium in hong kong, and subsequently spread to other countries, notably to sweden. our estimate is that nearly 1000 such studies have been carried out by now (cf. lo, 2009).

the main (quantitative) results of the studies published to date can be summarized as follows. in nearly all of the studies, students’ results were better after the lesson(s) than before (lo, pong, & chik, 2005). (although this may appear self-evident, it is not. unfortunately, there are many school lessons in which students learn nothing, or at least not what the teacher had hoped they would.) students with weaker learning prerequisites usually learn the most.
hence, not only does the average rise, but the spread diminishes (lo et al., 2005). in cases in which what the students had learned was observed not only immediately after the lesson but also on a later occasion, the results were often found to be better at the later time (thus indicating a content-specific “learning to learn” effect) (holmqvist, gustavsson & wernberg, 2008). results on national achievement tests increased for classes that had participated in several learning studies, an effect that in all likelihood was mediated by changes in teachers’ regular ways of teaching (maanula, 2011). when the same object of learning is dealt with in a learning study and in a lesson study by groups of equally well-qualified teachers, the quality of learning turns out to be strikingly higher in the former (marton & pang, 2006, 2008; pang, 2010; pang & marton, 2003, 2005, 2007). when the three cycles of a learning study are compared, the results from the third are usually better than those from the second, and those from the second are usually better than those from the first (lo, 2009). john elliott, one of the founders of the “action research” movement in education, has evaluated two large-scale learning study projects carried out in hong kong. he concluded: “the evaluation gathered convincing evidence of the positive impact of the process on teachers’ and students’ learning …. learning study is focused on realizing new kinds of pedagogical roles. from the evidence gathered in this evaluation it has enormous potential in this respect” (elliott, 2004).
it seems, in other words, that the learning study approach has been something of a success story. what about our conjecture? has it been supported in learning study research? in our learning studies, every lesson was initially planned to be consistent with variation theory, and hence consistent with our conjecture.
differences between cycles were related to differences between different interpretations of the same ideas. although this approach may be a good way to improve lessons, it is not really suitable for testing a theoretical conjecture. accordingly, we carried out a few studies using comparison groups, controlling for the assumed generally positive effects of the co-operative lesson study model. two groups of teachers, randomly selected for the two conditions (i.e., a learning study and a lesson study condition), agreed on a particular object of learning. together, they explored their students’ understanding of that object, and planned a lesson on the basis of what they found and on their previous experience of teaching the same object of learning. one of the teachers then carried out the lesson, while the others observed. after the lesson, the group again explored students’ understanding of the object of learning, and the lesson was analyzed in light of the results. a researcher was present as a resource person during both the discussions and lessons. the only difference between the two conditions was that in the learning study group, the researcher introduced variation theory, which he did not do in the lesson study group. although he participated in the discussions in both groups, he tried to act in a reactive rather than active (initiating) manner. the focus of the studies was a comparison of students’ results under the two conditions in relation to a comparison of the patterns of variation and invariance brought about in those conditions (marton & pang, 2006, 2008; pang & marton, 2003, 2005, 2007). the results showed dramatic differences to the advantage of the learning study (and hence the theory on which it is based). however, as these patterns were controlled by the teachers and by the students, these comparisons of course had to be post hoc.
to sharpen the comparison of patterns of variation and invariance, the researcher must be able to ascertain exactly what patterns are being compared. in quasi-experimental comparisons, such as that described here, there are usually no consecutive cycles. even if a researcher tries to be as blind to the two conditions as possible, we can hardly claim that he or she has succeeded completely. in our case, the “theory group” may have had an advantage beyond that originating in the theory itself. furthermore, the comparisons were made between the conditions in terms of the patterns of variation and invariance observed by the researcher, which means that they were post hoc, as noted, and hence the matter of empirical support for variation theory is not entirely straightforward.
1.3 there are no teaching experiments
a fair number of studies have been published in recent years in which the outcomes of learning have been found to be systematically related to the patterns of variation and invariance inherent in the conditions of learning. the lived object of learning (learning outcome) in these studies has generally been found to be related to the enacted object of learning (teaching and classroom interaction) in ways entirely consistent with our conjecture. the outcomes of learning, and differences therein, can be made sense of in terms of the patterns of variation and invariance or the differences in these patterns that are inherent in the conditions of learning (see, for instance, fraser, allison, coombes, case, & linder, 2006; fraser & linder, 2009; linder, fraser & pang, 2006; marton & pang, 2006, 2008; pang, linder & fraser, 2006; pang & lo, 2012; pang & marton, 2003, 2005, 2013). if lessons are to provide stronger evidence, then they must be defined in advance, and their effects on learning must also be predicted in advance.
kullberg (2010) carried out an interesting study in which she instructed teachers to teach particular objects of learning in terms of the critical features identified and patterns of variation and invariance employed in previous successful studies. the teachers were familiar with variation theory, according to which critical features and patterns of variation and invariance are powerful tools for communicating ways of handling a certain object of learning. even when kullberg’s (2010) results supported her expectations, however, there were several cases in which the enacted pattern of variation and invariance differed from that expected. although in some cases, the teacher had failed to open up dimensions of variation to make it possible for the students to discern certain critical features, in others, the students opened up dimensions of variation that they were not supposed to under their specific condition, but that were critical for learning. in such cases, the class was meant to serve as a control, and the unpredicted changes may have strengthened or weakened the results.
1.4 a critical experiment
the only possible way to ensure that what is being compared is what we want it to be seems to be to build a pattern of variation and invariance into pedagogical tools: texts, tasks, examples, illustrations, problems, and the like. variation and invariance as far as the conditions of learning are concerned can then be defined in terms of the relationships between the constituent parts of the pedagogical tools that are used. a study of this kind was carried out by ki and marton (2003). they investigated how non-native speakers of cantonese could be helped to learn to attend to both the tonal and segmental (the sound but not the tone) aspects of cantonese words simultaneously to identify their meanings. cantonese is a tonal language in which the distinctions between six tones are of vital importance.
the difficulty that speakers of non-tonal languages have when they try to learn it is not so much their inability to distinguish between two juxtaposed tones (stagray & downs, 1993) as their inability to link variation in pitch at the word level to variation in word meanings. variation in pitch exists in all languages, but its significance in non-tonal languages is at the sentence level rather than the word level. learning to pay attention to differences in pitch at the word level as a cue to differences in word meanings requires reorganization of the attentional field. ki and marton (2003) employed a set of nine words grouped in two ways. in the first, they were grouped to constitute three triplets, each characterized by one tone (the same within each triplet, but differing from the other two). in the second, they were grouped to constitute three triplets, each characterized by one segment and three different tones (see figure 1).
         segmental1   segmental2   segmental3
tone1    word11       word12       word13
tone2    word21       word22       word23
tone3    word31       word32       word33
figure 1. three triplets characterized by one segment and three different tones (if read by column) and by one tone and three different segments (if read by rows).
the participants’ task was to learn to identify the meaning of the word they heard by selecting its english equivalent. if we consider each triplet as a sub-task, then to be able to come up with the meaning of each word, a participant must be able to differentiate between the three words. if the three words in the sub-task have a tone in common, then the participant must learn to distinguish between the three different segments and link them to the three different meanings.
if, instead, the three words in the sub-task have a segment in common, then the participant must learn to distinguish between the three different tones and link them to the three different meanings. hence, when the segments vary, you learn segments, and when the tones vary, you learn tones. the two ways of grouping the words can be seen as a comparison between two patterns of variation and invariance, that is, as induction and contrast from the point of view of tones. if we believe that language learners learn tones (i.e., differentiate between them) best if we offer them different examples of the same tone, then we group words into triplets, within which each has the same tone but a different segment, and ask learners to compare them. if we believe instead, as our conjecture suggests, that meaning (in this case, “the meaning of tones”) derives from variation, then we group the words into triplets, within which the tones differ but the segment is the same, and ask learners to compare them. in ki and marton’s (2003) study, the participants clearly learned to distinguish words more effectively by means of tones in the condition in which the tones were varied during the lesson and the segment remained the same, than in the condition in which the tone was invariant and the segments varied. the study thus demonstrated that learning is more effective under the contrast condition than under the induction condition, as predicted by our main conjecture (see also guo & pang, 2011). this was the first critical experiment in which it was put to the test.
1.5 another way of putting the conjecture to the test
above, we have argued in agreement with fodor (1978, 1980) that induction is the most common means of trying to help others to acquire novel meanings, but it is certainly not the only one. in our own studies of the teaching and learning of economics (pang & marton, 2003, 2005; marton & pang, 2006, 2008), we found that teachers frequently used neither induction nor contrast.
they differed from the teachers using variation theory by not only varying the focused aspect but also varying the unfocused aspect. the teachers not using variation theory actually used more variation than the teachers using variation theory. the comparison between induction and contrast mentioned above, which was the first critical test of the conjecture, can be illustrated in the following form (i = invariant, v = varying):
induction                             contrast
focused aspect   unfocused aspect     focused aspect   unfocused aspect
i                v                    v                i
in relation to the tone learning experiment described in the previous section, induction means that the participants learn one tone at a time in three different runs. in each run the tone is the same in every task, while the segments (the unfocused aspect) vary. in the case of contrast, there are three runs too, but in each run the segment is the same in every task, while the tone (the focused aspect) varies. this is one way of putting the conjecture to the test. contrast is aligned to the theory, induction is not. but what if the object of learning is cantonese words (and not only tones)? then we have two focused aspects (tonal and segmental). according to the theory, they should vary one at a time. but there is a third aspect, not new for the learners, hence unfocused. this is the meaning aspect of the words represented by pictures and english words in the experiment. it is not independent from the other two aspects: when one or both vary, the meaning varies too. the second way of putting the conjecture to the test is to compare the case when the two focused aspects vary one at a time, followed by both varying simultaneously (to bring the different aspects of the words together), with the case of having the focused aspects varying simultaneously from the beginning. the former pattern of variation and invariance is consistent with the conjecture, the latter is not.
this is exactly the comparison that ki, ahlberg and marton (2006) carried out (see figure 2), demonstrating that the participants in the condition that was consistent with the main conjecture learned better than those in the condition that was not. moreover, the conjecture was built into the pedagogical tools they used in the study, a computer-administered program that afforded variation, invariance, and feedback to the participants. in the first experiment, one of the aspects, tone, was considered focused (what is to be learned) and the other, segment, was considered unfocused. in this second critical experiment in which the conjecture was put to the test, both aspects were considered focused (they had to be learned).
tone   segment   meaning        tone   segment   meaning
v      i         v              v      v         v
i      v         v              v      v         v
v      v         v              v      v         v
figure 2. comparing patterns of variation and invariance consistent with (left) and not consistent with (right) the conjecture.
discerning an aspect amounts to separating it from other aspects. two aspects can be distinguished from each other if one varies and the other is invariant. furthermore, if there are two focused aspects that learners are expected to learn to discern, then they should be varied one at a time, rather than simultaneously. if we want these learners to relate the two aspects, then we should vary them simultaneously, but only after they have been discerned. in the second experiment carried out by ki, ahlberg and marton (2006), there was a third aspect, meaning, that was assumed to be recognized by the learners (they were expected to make sense of the pictures representing the meanings). this aspect is a function of the other two aspects and cannot be kept invariant when any of the other aspects vary; nor does it interfere with the experience of variation in the other aspects, of which it is a function.
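the rule just stated can also be written down as a small check. the following is a minimal python sketch and not part of the original study: the function name `is_consistent` and the encoding of each step as a mapping from aspect to "v" (varying) or "i" (invariant) are illustrative assumptions. it tests only one reading of the conjecture, namely that every focused aspect must have varied on its own before any step in which several focused aspects vary together. the unfocused aspect (meaning) is carried along but not tested, since it is a function of the other two.

```python
# illustrative encoding of figure 2; "v" = varying, "i" = invariant.
# the names below are hypothetical and chosen for this sketch only.
FOCUSED = ("tone", "segment")

consistent = [
    {"tone": "v", "segment": "i", "meaning": "v"},
    {"tone": "i", "segment": "v", "meaning": "v"},
    {"tone": "v", "segment": "v", "meaning": "v"},
]
not_consistent = [{"tone": "v", "segment": "v", "meaning": "v"}] * 3


def is_consistent(steps, focused=FOCUSED):
    """return True if every focused aspect has varied alone before
    any step in which two or more focused aspects vary together."""
    separated = set()  # focused aspects already varied on their own
    for step in steps:
        varying = [aspect for aspect in focused if step[aspect] == "v"]
        if len(varying) == 1:
            separated.add(varying[0])
        elif len(varying) > 1 and not set(varying) <= separated:
            return False
    return True


print(is_consistent(consistent))      # True
print(is_consistent(not_consistent))  # False
```

the check is deliberately narrow: it says nothing about how well learners actually learn under either pattern, only whether a planned sequence follows the ordering that the conjecture prescribes.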
the first critical experiment showed that letting the focused aspect (that which is to be learned) vary, while keeping the unfocused aspect (that which has already been learned) invariant, yields better learning than keeping the focused aspect invariant and letting the unfocused aspect vary. the conjecture was thus supported. the second critical experiment showed that letting two focused aspects vary one at a time, and then varying both simultaneously, yields better learning than letting both focused aspects vary from the beginning, even when an unfocused aspect, which is a function of the two focused aspects, varies at the same time. the conjecture was supported again. it was put to the test in a third critical experiment in a study reported in the next section. in this case, in addition to the two focused aspects (demand and supply) and the unfocused aspect being a function of the two (price), there was an additional unfocused aspect involved (good), independent of the two focused aspects, which according to the theory was supposed to remain invariant. again, two patterns of variation and invariance (one consistent with, and one not consistent with, our main conjecture) were compared in terms of their effect on learning. and the conjecture was supported once again. this is the empirical contribution of the present paper.
2. the study
2.1 understanding pricing
the point of departure for this study was an earlier study in which 10-year-old children were taught to discern price as a function of demand and supply (lo, lo-fu, chik & pang, 2005). that study, in turn, built on an earlier study of qualitatively different ways of understanding price and pricing (dahlgren, 1978). in both of these studies, with minor differences, it was found that most children and many adults see price as a function of the attributes of goods. for instance, if something is expensive, then it is because it is big, beautiful, tastes good, etc.
price is thus seen as an attribute of the good in question, and linked to its other attributes, not as a function of market conditions (notably demand and supply), as economics tells us that it is. some see price as a function of demand only, and others as a function only of supply. for others still, price is a function of both demand and supply, or rather of the relationship between the two, which is roughly in accordance with the canonical conceptualization of price in classical, liberal economics. we use the expression “learning to see something in a certain way” as synonymous with “making a novel meaning your own” or “appropriating a meaning.” all three refer to the capability to discern certain aspects of a phenomenon and focus on them simultaneously. what then are “ways of seeing something”? they are categories of description used to depict the various appearances of something or the different ways in which it is experienced (or its different meanings). the research specialization of phenomenography (marton, 1981; marton & booth, 1997; marton & pang, 2008) is the study of categories of description depicting appearances, experiences, and meanings. it posits that if a learner exhibits a certain way of seeing something, then this does not imply that he or she has that way of seeing (as a mental representation, for instance). what then does it imply? it implies that he or she is seeing or has seen a particular phenomenon in a particular way under particular circumstances. further, the fact that he or she is so seeing implies that he or she is able or has been able to see that particular phenomenon in that particular way under the given particular circumstances. accordingly, what we might wish to explore is the extent to which the same person can see the same phenomenon in the same way under different circumstances.
if he or she can, then this could be interpreted as demonstrating that he or she has separated the particular way of seeing this particular phenomenon from the particular circumstances. becoming an “expert” frequently amounts to being able to see particular phenomena in particular ways under widely varying circumstances (cf. chi, feltovich, & glaser, 1981; goodwin, 1994; marton & booth, 1997, p. x; sandberg, 1994). hence, phenomenography does not tell you what individuals’ ways of seeing something are. it tells you how their ways of seeing something vary (between people under the same circumstances and/or within people under different circumstances). the different categories of description together constitute the outcome space (of how the particular phenomenon might be experienced). as previously mentioned, studies have established four categories of description that together constitute the outcome space of the experience of price.
2.2 making it possible to learn to see price in a more powerful way
are the different ways of seeing price equally powerful? we do not believe that we can always or even most of the time find a universal ordering of how valid and powerful different ways of seeing the same thing are. in a planned economy, and according to marxist economics, for example, price is not a function of demand and supply. however, we can delimit a set of contexts and settle for ordering the different options within that set. we could thus argue that it is better to enable learners to see something in an additional way that we believe to be powerful in certain contexts than not to do so. accordingly, we may try to help learners to see something in a new way, that is, in a way that they have previously been unable to. although we can certainly try, we can never be certain of success. at best, we can ascertain that this new way of seeing might have been instilled, that is, that under the conditions given, it is possible that the learners learned to discern certain critical features, which is exactly what lo et al. (2005) did in five primary school classes (grade 4) in hong kong in the context of a learning study. the aim of each lesson in this study was the same: to enable the students to see price as a function of demand and supply in novel situations. after the lesson, a novel question was used to probe their way of seeing price.
2.3 the enacted object of learning
a double-lesson was used to help the students to learn to discern demand and supply, and the relationship between the two, as determinants of price. during the lesson, the students formed groups and participated in an auction of four items (a mechanical dinosaur, a doll, a dinosaur card, and a stationery set). the auction was repeated several times, with variations. to encourage the students to focus on and discern the critical aspects of supply and demand separately, changes were made in supply (by varying the number of items available) while demand was kept invariant, and then changes were made in demand (by varying the purchasing power through changes in the auction money afforded the groups) while supply was kept invariant. after each auction, they were asked what would be a reasonable price for a new, limited-edition mechanical dinosaur if people had more money to spend. after the groups had written their answers on a worksheet, the teacher engaged the class in a discussion of the case of supply going down and demand going up. did the teachers who took part in this study achieve their goal? if so, to what extent did they do so? as can be seen from table 1, their attempts were not especially successful, with the possible exception of class 4b (see the frequencies for category d, considered the canonical conception here).
table 1
distribution of conceptions in pre- and post-tests in learning study carried out by lo et al. (2005)
                              class 4a            class 4b            class 4c            class 4d            class 4e
conceptions of price          pretest   posttest  pretest   posttest  pretest   posttest  pretest   posttest  pretest   posttest
a. attributes of the good     6.1%      15.2%     7.7%      0.0%      0.0%      9.7%      17.9%     0.0%      6.5%      9.7%
b. demand                     39.4%     45.5%     64.0%     28.2%     77.5%     71.0%     50.0%     78.6%     51.5%     48.4%
c. supply                     0.0%      6.0%      2.6%      7.7%      3.2%      3.2%      10.7%     3.6%      19.4%     6.5%
d. demand and supply          3.0%      9.1%      10.3%     61.5%     3.2%      12.9%     7.1%      14.2%     9.7%      22.6%
e. other noneconomic reasons  3.0%      0.0%      7.7%      2.6%      0.0%      0.0%      3.6%      0.0%      3.2%      0.0%
unclassified                  48.5%     24.2%     7.7%      0.0%      16.1%     3.2%      10.7%     3.6%      9.7%      12.8%
rather than ask whether (and why or why not) seeing price in terms of demand and supply is too difficult for 10-year-old children, we are more eager to understand the striking difference in results between class 4b and the other classes. (as is shown in table 1, while the frequency of the target conception (d) increased after the lesson from 10.3 to 61.5% in class 4b, it increased from about 6 to about 15% in the other classes.) did something happen in this class that did not in the others? or did something happen in all of the classes except 4b? prompted by the same curiosity, lo et al. (2005) did indeed come up with an interpretation for the discrepancy in their results: the necessary conditions for discerning a simultaneous variation in demand and supply were present only in class 4b, which was the only class in which the unfocused aspect (the good, i.e. the item for auction) was invariant throughout the entire sequence of variation and invariance in the focused aspects (demand and supply). differences of this kind (the focused aspect varying and the unfocused aspect remaining invariant versus both aspects varying) have also been found in two other studies, and in both cases were linked to rather dramatic differences in what the participants had learned (i.e., according to the outcome measures) (marton & pang, 2006; pang & marton, 2003).
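the contrast between class 4b and the other classes can be recomputed directly from the category d row of table 1. the short python sketch below is an illustration only: the helper name `mean_excluding` is hypothetical, and the unweighted mean across classes is an assumption, because class sizes are not reported here.

```python
# category d ("demand and supply") percentages from table 1,
# stored as (pretest, posttest) for each class.
category_d = {
    "4a": (3.0, 9.1),
    "4b": (10.3, 61.5),
    "4c": (3.2, 12.9),
    "4d": (7.1, 14.2),
    "4e": (9.7, 22.6),
}


def mean_excluding(data, excluded):
    """unweighted mean (pretest, posttest) over all classes except
    `excluded`; an assumption, since class sizes are not given."""
    rows = [values for name, values in data.items() if name != excluded]
    pre = sum(p for p, _ in rows) / len(rows)
    post = sum(q for _, q in rows) / len(rows)
    return pre, post


others_pre, others_post = mean_excluding(category_d, "4b")
gain_4b = category_d["4b"][1] - category_d["4b"][0]

print(f"other classes: {others_pre:.2f}% -> {others_post:.2f}%")
print(f"class 4b gain: {gain_4b:.1f} percentage points")
```

under that assumption the other four classes move from just under 6% to just under 15%, in line with the approximate figures quoted in the text.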
the conjecture that we want to put to the test here has two component parts: what is expected to vary in sequence (the focused aspects) and what is expected to remain invariant (the unfocused aspect) throughout. in the study reported here, we wanted to compare two conditions: one consistent with the second component part (the unfocused aspect remaining invariant throughout) and one not consistent with it. could we replicate the findings of the aforementioned study (lo et al., 2005), which served as our point of departure, with the same difference built into pedagogical tools? figure 3 shows the comparison carried out.
demand   supply   meaning   good        demand   supply   meaning   good
v        i        v         i           v        i        v         v
i        v        v         i           i        v        v         v
v        v        v         i           v        v        v         v
figure 3. comparing patterns of variation and invariance, consistent (left) and not consistent (right) with the conjecture.
2.4 design of the study
to reduce the number of factors that could affect the outcome, we tried to build the pattern of variation and invariance (which we assumed to be necessary) into the task structure of the learning resources in such a way that the entire experiment would be an interaction between students and the auction game tool: the computer. students were invited to attempt to achieve the object of learning by using two different computerized learning resources during an independent learning session that lasted approximately one and a half hours and was held in the multi-media learning center of the participating school. in line with lo et al.’s (2005) study, in both learning resources, the economic principle to be dealt with was the determination of the market price through the interaction of supply and demand. an auction game was used to embody the variation in the dimensions of supply and demand.
to test whether it is crucial to keep the auction item in question invariant, so as to enable students to focus on and discern the critical aspect of the interaction between supply and demand more readily and effectively, the two learning resources were identical in all respects but one: one resource made use of the same product (i.e., boxes of candy) throughout the auction game, whereas the other featured different products within and across each round. seventy-eight grade 4 students from four classes of one school in hong kong participated in the study. within each class, students were randomly divided into two groups, with each given one of the two learning resources. to minimize the teacher effect, learning took place in an autonomous manner, with the students involved playing the computerized auction game on their own, although the researcher gave a five-minute summary at the end of the session to remind the students of the key learning points. (note that it was impossible for the researcher to know under which condition each student was working. the only difference between the two conditions was that students in the same multi-media learning center received one or the other of the two versions of the learning resource, with the distribution of the two completely randomized.)
to obtain students’ existing understanding of the object of learning before they engaged with the learning resources and to form a baseline for comparison of the learning outcomes of the two groups, a pretest was administered to all students. then, immediately after the independent learning session, they were required to complete a post-test to allow evaluation of their mastery of the object of learning. in both tests, the students were asked to consider a problem relating to a real-life scenario embodying the principle in question, i.e., the interaction of supply and demand in determining the market price of a good.
they were also asked to elaborate upon the factors they had considered in setting that price. the questions in the pre- and post-tests were essentially identical, except that the product in question varied. mirroring lo et al.’s (2005) study, a hot dog and a box of biscuits were used. students who were asked a question about the hot dog in the pre-test were asked about a box of biscuits in the post-test, and vice versa. the pre- and post-test questions were as follows:
have you ever tried the hot dogs (biscuits) sold in the school shop? do you know how much they cost? maybe you know or you don’t know. anyway, just for your information, hot dogs are (a box of biscuits is) now sold at hk$5. suppose that you are the new owner of the shop. what price would you set for a hot dog (box of biscuits)? would you set the current price, or a different price? what would you consider when you set the price?
the students’ answers were analyzed and described in terms of the aforementioned set of four categories of understanding.
2.5 the learning resources
to build a relevance structure (marton & booth, 1997, p. 143) that would enable students to appropriate the object of learning, they were given the task of bidding on goods for an upcoming new year’s celebration through the computerized auction game. in the first round, students were introduced to the basic rules and operation of the game. each student was given hk$400 in auction money and asked to bid for and thus try to obtain as many items as possible from the nine being auctioned, which were displayed on screen with their base prices shown. each round of the auction came to an end after three minutes or once the student had used up all of his or her money, whichever came first.
the average prices of the goods auctioned were then calculated and shown to the student so that he or she could associate possible changes in those prices with changes in the conditions of each round of the auction, such as the amount of auction money provided, the number of goods to be auctioned, or both. as previously noted, the only difference between the two learning resources was that for the “different goods” group the nine items, which included different kinds of snacks such as potato chips, chocolate bars, biscuits, and so on, differed both within each round and between rounds, whereas for the “same goods” group the nine items were all the same, i.e., every item was a box of candy.

in the second round (see figures 4 and 5), to bring students’ focal awareness to bear upon the dimension of demand, demand was deliberately varied (by varying students’ purchasing power by changing the amount of auction money they were given) while the supply of goods was kept invariant. each student’s auction money was cut by hk$200, thus diminishing their purchasing power and demand for goods. the supply of goods for auction, however, remained invariant, with the number of items kept at nine. everything was identical for both learning resources except that the nine items for auction remained invariant in the “same goods” design (the same nine boxes of candy as in the first round), whereas the type of goods varied in the “different goods” design, changing from the nine kinds of snacks in the first round to nine kinds of soft drink in the second.

[figure 4. same goods design (round 2). screen for day 2, group a: a record of the average price of items for auction over days 1 to 4, listing the amount of auction money (hk$400 on day 1, hk$200 on day 2), the number of items auctioned, and the average price of items for auction, followed by the prompt: “now the shop has closed. are you happy with the items that you have obtained? reflect on the auctions on days 1 and 2 and complete the following task. task: compare today’s average item price with yesterday’s. what have you found?”]

[figure 5. different goods design (round 2). screen for day 2, group b, with the same record and prompt as in figure 4.]

in the third round, to help students to shift their focal awareness to the dimension of supply, supply was deliberately varied while demand was kept invariant. the number of items for auction was reduced from nine to seven, whereas the amount of auction money remained the same (hk$200). the only, but critical, difference between the two learning resources was that all seven items in the “same goods” design remained boxes of candy, whereas the seven items used in the “different goods” design now differed from those in the two previous rounds, with participants being asked to consider different kinds of balls in this round.

unlike the classroom study carried out by lo et al. (2005), we introduced a fourth auction round in which variation was introduced in both the demand and supply of goods in a simultaneous manner. our purpose was to help students to focus on both dimensions in determining the market price of a good. to this end, the auction money given to students was increased from hk$200 to hk$400, and the number of items to be auctioned was reduced from seven to six. this round thus involved a simultaneous variation in the supply of goods and variation in purchasing power (demand for the goods), the aim of which was to enable students to discern the critical aspects of experiencing price and pricing.
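the pattern of variation and invariance across the four rounds can be laid out compactly. the python sketch below is only an illustration: the amounts of auction money and the numbers of items are taken from the description above, whereas the class and function names are hypothetical, and the use of money per item as a crude stand-in for the average price is our assumption, not part of the auction game or of the study's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuctionRound:
    day: int
    auction_money: int  # hk$ given to each student (stands in for demand)
    items: int          # number of items auctioned (stands in for supply)


# conditions of the four rounds as described in the text
ROUNDS = [
    AuctionRound(1, 400, 9),
    AuctionRound(2, 200, 9),
    AuctionRound(3, 200, 7),
    AuctionRound(4, 400, 6),
]


def direction(before, after):
    """Label the change between two values."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "invariant"


def compare(prev, curr):
    """Describe what varies and what stays invariant between two rounds."""
    return {
        "demand": direction(prev.auction_money, curr.auction_money),
        "supply": direction(prev.items, curr.items),
        # assumption: money per item as a rough proxy for the average price
        "price_proxy": direction(
            prev.auction_money / prev.items,
            curr.auction_money / curr.items,
        ),
    }


CHANGES = [compare(a, b) for a, b in zip(ROUNDS, ROUNDS[1:])]

for (a, b), change in zip(zip(ROUNDS, ROUNDS[1:]), CHANGES):
    print(f"day {a.day} -> day {b.day}: {change}")
```

the listing makes the design visible at a glance: one dimension varies while the other is kept invariant in rounds two and three, and both vary together in round four. the proxy says nothing about how students actually bid.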
as before, the only difference between the two learning resources was that the six items in the “same goods” design remained the same, whereas a new set of items (six different kinds of decorations) was introduced in the “different goods” design. lastly, similar to the procedure in the earlier study by lo et al. (2005), students were asked a question (for instructional purposes) about what would happen to the price if the supply were increased and purchasing power decreased. in the current study, they were invited to predict the direction of change in the market price, that is, whether the price would go up or down, if the amount of auction money was decreased from hk$400 to hk$100 while the number of items to be auctioned increased from six to 11.

as noted, the learning session concluded with a five-minute summary delivered by the researcher to remind students of the key learning points in the computerized learning resources. he simply read the following powerpoint slides to the two groups of students at the same time.

1. (slide 1) “compare the auction game on days one and two. as the auction money given to you on day two was less than that on day one, your income decreased. when your income decreased, your purchasing power also decreased. this made your demand for goods decrease. as the supply of goods on day two was the same as that on day one, the average price of goods was lower.”
2. (slide 2) “compare the auction game on days two and three. as the auction money given to you on day three was the same as that on day two, your income and purchasing power remained unchanged. your demand for goods also remained unchanged. as the cost of production, such as the prices of raw materials, electricity, and labor increased, the supply of goods decreased. as a result, the average price of goods on day three was higher than that on day two.”
3. (slide 3) “compare the auction game on days three and four.
as the auction money given to you was more than that on day three, your income and purchasing power increased, and your demand for goods also increased. at the same time, the increase in the cost of production made the supply of goods decrease. as demand increased and supply decreased, the average price of goods on day four was higher than that on day three.”
4. (slide 4) “the price of a good is determined by its supply and demand. the supply of a good is affected by its cost of production, such as the prices of raw materials, electricity, and labor, whereas the demand for a good is affected by people’s income and purchasing power. when businesses set the price of a good, they need to consider the factors affecting supply and demand at the same time.”

3. results and findings

the results presented in table 2 show that students who belonged to the group using the learning resource with the “same goods” design outperformed their counterparts using the learning resource with the “different goods” design in the post-test, in which a statistically significant difference was observed between the two groups (χ² = 10.36, p = 0.03 (< 0.05); effect size = 0.32). (note, in particular, the relative frequencies for the target understanding category d in the post-tests for the two conditions.)

table 2. distribution of conceptions, pre- and post-test. cells give occurrence (percentage). “same goods” design group: 40 students; “different goods” design group: 38 students.

| conception of price | same goods, pre-test | same goods, post-test | different goods, pre-test | different goods, post-test |
|---|---|---|---|---|
| a. attributes of the good | 10 (25.0%) | 1 (2.5%) | 4 (10.5%) | 2 (5.3%) |
| b. demand | 14 (35.0%) | 11 (27.5%) | 14 (36.8%) | 15 (39.5%) |
| c. supply | 5 (12.5%) | 9 (22.5%) | 2 (5.3%) | 5 (13.2%) |
| d. demand and supply | 5 (12.5%) | 16 (40.0%) | 5 (13.2%) | 6 (15.8%) |
| e. other noneconomic reasons | 6 (15.0%) | 3 (7.5%) | 13 (34.2%) | 10 (26.3%) |
| total | 40 (100%) | 40 (100%) | 38 (100%) | 38 (100%) |

pre-test: χ² = 7.92 (df = 4), p = 0.09 (i.e., p > 0.05)
post-test: χ² = 10.36 (df = 4), p = 0.03 (i.e., p < 0.05)

we can see in table 2 that while the frequency of the target conception (d) increased after the lesson from 5 to 16 (of 40) under conditions consistent with the conjecture, it increased from 5 to 6 (of 38) under conditions not consistent with the conjecture.

4. conclusions

only in a restricted sense was this study a replication of the investigation by lo et al. (2005). we wanted to find out if invariance or variation in an unfocused aspect can really have such a strong impact on the learning of the focused aspects as was interpreted to be the case in the previous study. the question could be answered in the affirmative and the conjecture was thus supported.

it should be noted that in both the original and follow-up studies, variation was restricted. when the supply was invariant, demand went down (instead of going up in one case and down in another), and when demand was invariant, the supply went down (instead of going up in one case and down in another). in the last round, only one of the four combinations of demand (up/down) and supply (up/down) was realized. we decided not to include all four combinations as it would have made the task too difficult for such young participants. in all of the circumstances considered, it is of course possible that the results would have differed had the students been exposed to more of the possible differences among the patterns of variation and invariance.

quite a few of the students in the comparison group managed to learn to discern the critical features of pricing, even though the good was not invariant. in general, even if there is variation in several dimensions, learners may be able to block out all dimensions but one, that on which they happen to focus.
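for readers who wish to check the post-test statistic reported with table 2, the following python sketch recomputes a pearson chi-square test of independence from the post-test occurrences in that table. it is a minimal illustration under stated assumptions: the function names are hypothetical, only the standard library is used, and the closed-form tail probability applies only to an even number of degrees of freedom (here df = 4). it does not reproduce the authors' own analysis procedure or the reported effect size.

```python
import math

# post-test occurrences from table 2, categories a to e
same_goods = [1, 11, 9, 16, 3]        # 40 students
different_goods = [2, 15, 5, 6, 10]   # 38 students


def chi_square(rows):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


def upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


stat, df = chi_square([same_goods, different_goods])
p_value = upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

with the post-test counts, the statistic agrees with the reported value of 10.36 at df = 4, and the tail probability falls below 0.05.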
there is an interesting twist concerning how this question appeared in the experiment, however. as noted, in the target group, a number of items of the same type of good were offered in each round at the same base price. in the comparison group, the same number of items as in the target group were offered at the same base price. further, whereas the target group considered the same type of good both within and between rounds, the comparison group considered different goods in each round. the difference between the conditions was illusory, however: all the relevant factors (the number of items available, the amount of money participants had, and the base price of goods) were exactly the same. the only element that differed was the irrelevant labels placed on the goods. the only thing those in the comparison group had to do was to separate what was relevant for their decisions from what was not, and bracket the latter. if all of them had done so, then the conditions for the two groups would have been the same. as we can see from the differences in outcome, however, this was not the case. our finding that the comparison group was affected by the irrelevant differences in the item labels implies that quite a few learners in that group failed to separate relevant information from irrelevant information, and therefore failed to see the former. the main contribution of this study is the support it provides for our conjecture: if both the focused and unfocused aspects of the object of learning vary, then it is more difficult to discern the focused aspects and relate them to one another than if the unfocused aspect remains invariant while the focused aspects vary. we found this to be true in the current study, even though the unfocused aspect was completely redundant. however, the conjecture also addresses the question of how we can acquire new meanings (or how we can learn to see certain things in certain ways). 
as mentioned earlier, fodor (1978, 1980), and others, claim that there is no answer to this question and, in fact, there cannot be any. meanings are innate. however, we argue that regardless of whether meanings (concepts) are innate, or of the sense in which they are (or are not) innate, we have to learn to discern them as aspects of the world around us, and for this to happen, there are necessary conditions. these necessary conditions are specific to particular meanings and to learners’ particular experiential history. they can be formulated in terms of patterns of variation and invariance among instances that do and do not have that particular meaning. our conjecture is thus very straightforward, as is the way in which it can be put to the test. we simply have to create the necessary conditions in one case and ensure that they are absent in another, as pang and marton (2003) did in their aforementioned study. then, we can compare the two cases and determine whether, as expected, all participants in the first case learn the target meaning, whereas none of those in the second do. if these are indeed the results, then the conjecture is strengthened. obviously, this is not what happened. even if we can demonstrate that contrast is more powerful than induction as far as the learning of new meanings is concerned, we cannot demonstrate that new meanings cannot be learned through induction. after all, some learners seem to learn in that condition too, and certainly not all learners will learn even if all possible steps are taken to make it possible for them to do so. the relationship between what is learned, on the one hand, and the conditions of learning, on the other, is stochastic rather than deterministic. but why is this so? 
returning to the experiment reported in this paper, beyond the fact that the target principle (price as a function of the relationship between demand and supply) was made explicit to both groups, there is a more general answer to the foregoing question. our conjecture concerns the pattern of variation and invariance as experienced by the learner, whereas a pattern of variation and invariance that can be controlled by the researcher refers to the patterns as seen by the researcher. what might the relationship between the two look like? one condition of experiencing variation is that there is variation to be experienced. making sure that this condition is met is the first step toward making learning possible (which in our view is what teaching is all about). however, variation can also be experienced because of previous experiences. experienced variation is thus not necessarily the experience of what is present in the learning situation as seen by the observer. on the other hand, even if there is variation, it is not necessarily experienced by all learners. in conclusion, when comparing two randomly selected groups, we would expect more learners to experience variation if it is present than if it is not. however, experiencing variation not only concerns the variation to be experienced in a relevant dimension; it also presupposes invariance in other dimensions. in other words, variation can be experienced only against a background of invariance. in this sense, experienced variation is a function of invariance, and, as previously stated, experienced variation is also a function of variation. our intention with the present study was to illustrate that learning (in the sense of the discernment of the necessary features of a phenomenon) is a function of experienced variation (by the learner), which is a function of both variation and invariance (as seen by the observer). this we did, with a focus on the latter (invariance).
we have thus shown that introducing redundant information (different goods) that is correlated with a variation in critical aspects (a change in demand and supply) significantly reduces the likelihood of learners being able to discern the critical features of the object of learning. a seemingly subtle difference between two conditions, both representing 90 minutes of pedagogical effort, proved to play a key role in what the students managed to learn.

keypoints

this paper addresses one of the oldest unsolved mysteries of learning: how do we make novel meanings our own? the answer suggested is: by discerning, separating and bringing together the critical aspects of what we learn about. a critical aspect can be discerned and separated through the experience of variation in that aspect against the background of invariance in other respects. we have put our conjecture to the test by embedding it in a pedagogical tool. the conjecture was supported.

acknowledgments

the research reported here was financially supported by the swedish research council. we also want to thank the two reviewers of our paper for their excellent input.

references

chi, m. t. h., feltovich, p. j., & glaser, r. (1981). categorization and representation of physics problems by experts and novices. cognitive science, 5(2), 121-152. doi: 10.1207/s15516709cog0502_2
dahlgren, l. o. (1978). effects of university education on the conception of reality. reports from the institute of education, university of goteborg. gothenburg: institute of education, university of gothenburg.
elliott, j. (2004). the independent evaluation of the pips project. hong kong: hong kong institute of education.
fodor, j. a. (1978). the language of thought. hassocks, sussex: harvester.
fodor, j. a. (1980). fixation of belief and concept acquisition. in m. piattelli-palmarini (ed.), language and learning: the debate between jean piaget and noam chomsky (pp. 142-162). london: routledge & kegan paul.
fraser, d., allison, s., coombes, h., case, j., & linder, c. (2006). using variation to enhance learning in engineering. the international journal of engineering education, 22(1), 102-108.
fraser, d., & linder, c. (2009). teaching in higher education through the use of variation: examples from distillation, physics and process dynamics. european journal of engineering education, 34(4), 369-381.
goodwin, c. (1994). professional vision. american anthropologist, 96(3), 606-633. doi: 10.1525/aa.1994.96.3.02a00100
guo, j. p., & pang, m. f. (2011). learning a mathematical concept from comparing examples: the importance of variation and prior knowledge. european journal of psychology of education, 26(4), 495-525. doi: 10.1007/s10212-011-0060-y
holmqvist, m., gustavsson, l., & wernberg, a. (2008). variation theory: an organizing principle to guide design research in education. in a. e. kelly, r. a. lesh & j. y. baek (eds.), handbook of design research methods in education: innovations in science, technology, engineering, and mathematics learning and teaching (pp. 111-130). new york: routledge.
ki, w. w., ahlberg, k., & marton, f. (2006). computer-assisted perceptual learning of cantonese tones. paper presented at the 14th international conference on computers in education, beijing, china: asia-pacific society for computers in education (apsce).
ki, w. w., & marton, f. (2003). learning cantonese tones. paper presented at the earli biennial conference 2003, padova, italy.
kullberg, a. (2010). what is taught and what is learned: professional insights gained and shared by teachers of mathematics. doctoral dissertation, university of gothenburg, acta universitatis gothoburgensis, göteborg.
linder, c., fraser, d., & pang, m. f. (2006). using a variation approach to enhance physics learning in a college classroom. the physics teacher, 44(9), 589-592.
lo, m. l. (2009). building a teacher learning network for developing the ability to teach for learning. paper presented at the 13th biennial conference of earli, amsterdam, the netherlands.
lo, m. l., lo-fu, y. w., chik, p. m. p., & pang, m. f. (2005). two learning studies. in m. l. lo, w. y. pong & p. m. p. chik (eds.), for each and everyone: catering for individual differences through learning studies (pp. 75-116). hong kong: hong kong university press.
lo, m. l., pong, w. y., & chik, p. m. p. (eds.). (2005). for each and everyone: catering for individual differences through learning studies. hong kong: hong kong university press.
maunula, t. (2011). resultat från nationella prov i matematik m m. available from tuula.maunula@telia.com (unpublished manuscript).
marton, f. (1981). phenomenography—describing conceptions of the world around us. instructional science, 10(2), 177-200. doi: 10.1007/bf00132516
marton, f. (forthcoming). necessary conditions of learning. new york: routledge.
marton, f., & booth, s. (1997). learning and awareness. mahwah, n.j.: l. erlbaum associates.
marton, f., & pang, m. f. (2006). on some necessary conditions of learning. journal of the learning sciences, 15(2), 193-220. doi: 10.1207/s15327809jls1502_2
marton, f., & pang, m. f. (2008). the idea of phenomenography and the pedagogy for conceptual change. in s. vosniadou (ed.), international handbook of research on conceptual change (pp. 533-559). london: routledge.
marton, f., & tsui, a. b. m. (2004). classroom discourse and the space of learning. mahwah, nj: lawrence erlbaum associates.
michalski, r. (1983). a theory and methodology of inductive learning. artificial intelligence, 20, 111-161.
pang, m. f. (2010). boosting financial literacy: benefits from learning study. instructional science, 38(6), 659-677. doi: 10.1007/s11251-009-9094-9
pang, m. f., linder, c., & fraser, d. (2006). beyond lesson studies and design experiments: using theoretical tools in practice and finding out how they work. international review of economics education, 5(1), 28-45.
pang, m. f., & lo, m. l. (2012). learning study: helping teachers to use theory, develop professionally, and produce new knowledge to be shared. instructional science, 40(3), 589-606. doi: 10.1007/s11251-011-9191-4
pang, m. f., & marton, f. (2003). beyond “lesson study”: comparing two ways of facilitating the grasp of some economic concepts. instructional science, 31(3), 175-194. doi: 10.1023/a:1023280619632
pang, m. f., & marton, f. (2005). learning theory as teaching resource: enhancing students’ understanding of economic concepts. instructional science, 33(2), 159-191. doi: 10.1007/s11251-005-2811-0
pang, m. f., & marton, f. (2007). the paradox of pedagogy. the relative contribution of teachers and learners to learning. iskolakultura, 1(1), 1-29.
pang, m. f., & marton, f. (2013). interaction between the learners’ initial grasp of the object of learning and the learning resource afforded. instructional science. doi: 10.1007/s11251-013-9272-7
sandberg, j. (1994). human competence at work: an interpretative approach. göteborg, sweden: bas.
stagray, j. r., & downs, d. (1993). differential sensitivity for frequency among speakers of a tone and a non-tone language. journal of chinese linguistics, 21(1), 143-163.
stigler, j. w., & hiebert, j. (1999). the teaching gap: best ideas from the world's teachers for improving education in the classroom. new york: free press.
vlach, h. a., sandhofer, c. m., & kornell, n. (2008). the spacing effect in children's memory and category induction. cognition, 109, 163-167.

frontline learning research vol. 9 no.
3 (2021) 52-68 issn 2295-3159
corresponding author: maria hvid stenalt, department of science education, university of copenhagen, denmark. email: mhs@ind.ku.dk doi: https://doi.org/10.14786/flr.v9i3.697

digital student agency: approaching agency in digital contexts from a critical perspective

maria hvid stenalt
department of science education, university of copenhagen, denmark

article received 3 august 2020 / article revised 12 april 2021 / accepted 1 july / available online 30 july

abstract

developing student agency is a critical aspect of higher education and, in particular, digital education. in this sense, the capacity to understand what constitutes agency in digital contexts of education and evaluate students’ digital agency is now crucial. in contrast to traditional approaches to student agency in digital contexts that subsume technologies to educational intentions, media research has illustrated a more complex interplay between humans and technology. drawing on this insight, the paper argues for a more critical disposition to digital student agency, wherein relational, cultural, and technological dynamics are central to agency. specifically, the article proposes a framework for digital student agency that distinguishes five critical domains to student agency in digital contexts: (1) agentic possibility, (2) digital self-representation, (3) data uses, (4) digital sociality, and (5) digital temporality. the article concludes by outlining the implications of the framework for educational practice and academic research around student agency and student learning. specifically, adopting the framework implies changes in how we investigate student agency in digital contexts and enables critical investigations of student-centred teaching practices.

keywords: student agency, networked publics, learning, educational technologies, digital education

1. introduction

cultivating individuals’ capacity to intervene in and transform given frames of action is key in higher education strategies and teaching that seeks to develop human beings’ capacity to act agentically (damşa et al., 2010; klemenčič, 2017; oecd, 2018). as research exploring student agency in higher education increases, recognition is growing that integration of agency-supportive practices and learning environments is essential for the cultivation of agentic students in higher education (jääskelä, poikkeus, et al., 2020; marín et al., 2020; toom et al., 2017).

in many ways, the prevalence of digital technologies in contemporary higher education offers new grounds for supporting student agency. digitalisation plays a vital role in expanding the range of course delivery formats, from campus-based delivery to fully online, hybrid, or blended courses, facilitating flexibility on the learner’s part (kirkwood & price, 2011). moreover, online courses can present students with consequential choices and adapt ‘both the course curriculum and assessment to accommodate those choices’ (lindgren & mcdaniel, 2012, p. 346). this expansion of student choice and redistribution of initiative from educational institutions to the learner is seen to support student agency (bandura, 2002; irvine et al., 2013). another example is learning analytics. as jääskelä, heilala, et al. (2020) explained, collecting and analysing educational data can provide feedback on student progress, promote students’ agentic awareness, and tailor education to students’ needs. moreover, using technologies to facilitate student-centred learning and teaching links to agency. as defined by klemenčič et al. (2020, p. 33), student-centred education concerns ‘the capability of students to participate in, influence, and take responsibility for their learning pathways and environments, in order to achieve the expected learning outcomes’.
providing students with opportunities for active participation and involvement in their own learning is central to much use of digital technologies such as student response systems that enable immediate student feedback and digital platforms that support collaborative processes. crucially, such teaching practices are discursively linked to the empowerment of students (starkey, 2019). while several studies investigating digital technologies include student agency, only a few pay attention to underpinning their utilisation of the concept theoretically (marín et al., 2020). some studies have provided a clear theoretical account of student agency, but these studies tend to subsume the digital to educational intentions and settings. for example, irvine, code, and richards (2013) framed multi-access learning as an opportunity for student choice, enabling face-to-face students and distance-learning students to access course materials and other participants and personalise learning. in this study, the context of the course and social processes were presented in terms of delivery formats, overall student characteristics, and concerns of the teacher. lindgren and mcdaniel (2012) explored whether online learning can be improved by employing narratives and student agency as prominent design features. similar to irvine et al. (2013), the specific interplay between agency and the context of action was subsumed in representations of learning activities, the steps involved in completing assignments, and the specific technologies used. in a study by luo et al. (2019), the role of student agency in a course utilising a flipped learning format was presented and analytically considered in relation to the educational instructions.
hamilton and friesen (2013) describe this approach to technology as technological instrumentalism, where the technology is ‘interpreted in light of this or that pedagogical framework or principle and measured against how well they correspond in practice to that framework or principle, and technologies are neutral means employed for ends determined independently by their users’ (hamilton & friesen, 2013, p. 3). as bayne (2015) has reasoned, the disassembly of technology from social activity overlooks the epistemological consequences of technology use. crucially, it isolates the social and meaning-making aspects of digital education from the material (fenwick & landri, 2012). studies that focus on digital agency (passey et al., 2018; shonfeld et al., 2017) emphasise agency as a requirement for and through education. more specifically, digital agency refers to having the necessary digital competencies, digital confidence, and digital accountability to control and adapt to the digital world as an individual. however, this framing of digital agency pays little attention to agency in education and how the digital affects humans.

2. aim

the relationship between agency and digital contexts of learning has been given little attention in research addressing student agency and digital technology. a tendency has been to subsume the digital to educational intentions, frame digital agency as competencies required to control the digital world, or skip definitions of agency. these perspectives neither consider how the digital affects agency in contexts of learning nor offer support for developing agency-supportive teaching practices. to overcome this gap, this paper sets out an alternative conceptualisation of the digital in relation to student agency.
it builds upon insights from media research and frames student actions in digital contexts as socially, technologically, and contextually configured and shaped through student negotiations of agentic power and will. thus, this theoretical paper combines understandings of the digital from media research with versions of agency from student agency research to cultivate empirical investigations that account for the agency-structure interplay in digital contexts of education and consider the possible implications of technology use. also, by drawing more explicitly on media research, it becomes possible to add a relational and participatory dimension to agency that allows us to identify and analyse the conflicts that might lead students to maladaptive practices to protect themselves from forms of digital participation. examples from research and discussions of implications are included to illustrate the theoretical approach suggested. the starting point of the paper is not to develop a new definition of agency as such but to broaden understandings of agency as a primarily individual phenomenon with a more relational understanding of agency (burkitt, 2016; stenalt, 2021), building on the premises of the digital. moreover, while emerging into an arena with blurry boundaries between human and nonhuman agency (fenwick & landri, 2012; leonardi, 2010), the paper focuses on students as the primary research object and uses conceptualisations of human agency as a stepping stone. lastly, the paper should not be seen as attempting to present a complete framework for understanding and supporting student learning in digital contexts. instead, it is intended to serve as a supplement to research and designs for learning. with this in mind, the remainder of the article proceeds in four iterative stages. first, it maps distinct approaches to and dimensions of agency. second, key dynamics of digital engagement from media research are presented. 
third, the key ideas are used to develop a theoretical framework that includes five domains and suggests a nuanced way of approaching the topic of digital student agency. fourth, having worked through the different domains, the paper discusses possible implications for practice and research.

3. what is student agency?

defining student agency is by no means straightforward because student agency has been conceptualised in various ways within higher education literature (nieminen et al., 2021). in constructing the outset of this research, this paper draws on the frameworks typically adopted in higher education research (jääskelä et al., 2016; klemenčič, 2015). in the context of higher education, a sociological approach typically pays attention to the dualistic interplay between humans and structure and the ways power structures and structural factors impact human agency (hitlin & long, 2009). agency is understood as something that actors always have; however, the possibility of fulfilling personal goals is conditional on external structures. higher education studies conceptualising agency in this way tend to foreground individual students’ pursuit of personal, educational success within a context of macrosocial structures. for instance, calitz et al. (2016) examined patterns of unequal participation for working-class first-generation students at a south african university, applying sen’s (2005) capability approach to make sense of the relationship between a person and the social forces that can hinder or enable them to convert resources into capabilities. such forces include physical or mental
arkoudis and tran (2007) applied positioning theory and notions of moral agency (harré & van langenhove, 1999) to move away from static and stereotyped descriptions of chinese international students studying in australian higher education and toward understandings that depict these students as actively involved in meaning-making. analysing how students intentionally position themselves in relation to their lecturers and university expectations, these authors highlighted issues of mismatching between institutional expectations and students’ struggles. sociocognitive studies rest on a nondualistic, agentic perspective. in it, human development involves an agent intentionally influencing their life circumstances through individual psychological processes of development or knowledge acquisition (bandura, 2006). bandura (2006) argued, ‘through cognitive self-regulation, humans can create visualised futures that act on the present; construct, evaluate, and modify alternative courses of action to secure valued outcomes; and override environmental influences’ (p. 164). studies proposing this conceptualisation of agency foreground the psychological exercise of self-management through self-reflection, self-regulation, and self-efficacy (bandura, 2001, 2006; zimmerman, 1995). self-management is understood to depend on students’ ability to generate goals for their engagement through cognitive representations of desired future states that match their individual strengths and preferences (zimmerman, 1995). self-efficacy comprises students’ beliefs about their capability to succeed in what is required of them (bandura, 2006). 
bandura (2006) described the sources of self-efficacy to be (a) enactive mastery experiences (actual performances); (b) social comparison and modelling based on observation of others and their successes and failures (vicarious experiences); (c) forms of persuasion, both verbal and otherwise if within realistic bounds; and (d) interpretation of one’s own physiological and affective states. for instance, malmberg and hagger (2009) investigated supportive and instructional agency beliefs, defined as the perceived ability to facilitate others’ learning. nye et al. (2011) framed agency from a sociocognitive position as a willingness to engage with the curriculum. by contrast, studies that use a sociocultural framework focus on how individuals use the resources they have available in their sociocultural context (eteläpelto, 2017). jääskelä, poikkeus, et al. (2020) defined a subject-centred sociocultural understanding of student agency as ‘a student’s experience of having access to or being empowered to act through personal, relational, and participatory resources, which allow him/her to engage in purposeful, intentional, and meaningful action and learning in study contexts’ (p. 2). sociocultural frameworks have been used to explore course-related experiences of agency (jääskelä et al., 2016) and to map students’ agency profiles and their connection to students’ perceptions of teaching practices (jääskelä, poikkeus, et al., 2020). some scholars study agency from a life-course perspective, directed towards students’ professional futures (soini et al., 2015; toom et al., 2017), and pay attention to students’ past, present, and future. 
this research refers to emirbayer and mische’s (1998) understanding of agency as ‘the temporally constructed engagement by actors of different structural environments—the temporal relational contexts of action—which, through the interplay of habit, imagination, and judgment, both reproduces and transforms those structures in interactive response to the problems posed by changing historical situations’ (p. 970). the interplay in a specific context is teased out by exploring how students enter into relationships with surrounding people, places, meanings, and events, and by examining students’ actual interactions with their context. the temporal perspective, in particular, has been adopted in recent research into the development of professional agency (soini et al., 2015; toom et al., 2017). in addition, it underpins merrill’s (2014) study, in which agency is understood within contexts and time: individuals engage in changing their future. harris et al. (2018) discussed student agency’s temporal nature in the context of assessment, where student resistance to assessment or assessment decisions might draw on past experiences.

3.1. limitations in current approaches to student agency

even if existing studies address significant aspects of student agency, some argue that research remains limited by typically investigating student agency through only some of the aspects related to agency (jääskelä, heilala, et al., 2020; jääskelä et al., 2016). instead, jääskelä et al. (2016) argued that a holistic operationalisation of agency includes personal, relational, and participatory domains. the personal domain of agency connects to individuals’ resources and disposition towards being agentic. in particular, it involves self-efficacy and competence beliefs. the relational domain of agency refers to relational resources for learning such as the learning climate, peer support, and power relations.
participatory resources involve the contextual dimension of agency, comprising the extent to which students perceive themselves to have the opportunity to participate, influence, and make choices in learning and within the learning context. while the holistic operationalisation of agency might be thought of as supporting the complexity of the construct, the framework remains limited in explaining how, for example, opportunities for active participation emerge and how peers become resources for learning. more specifically, in terms of the purpose of this article, the framework is less helpful in understanding how the digital becomes a resource for students, and how the digital affects students and vice versa. further, others argue that there is a need to be more sensitive towards ‘the intentional projects of individual and collective agents, and how these projects are enabled and constrained’ (ashwin, 2012, p. 21). as klemenčič (2015) has described, ‘student agency is conceptualised as a process of student actions and interactions during studentship, which encompasses variable notions of agentic orientation (“will”), the way students relate to past, present and future in making choices of action and interaction, and of agentic possibility (“power”), that is their perceived power to achieve intended outcomes in a particular context of action and interaction’ (p. 16). against this concern, student agency research needs to analytically differentiate agentic possibilities from agentic orientation and account for how agentic possibilities emerge to students and how student orientation influences student actions. this paper contends that this issue requires critical thought – particularly in light of the optimistic accounts of the ways digital technologies benefit student learning (garrison & kanuka, 2004) and the tendency to frame digital learning activities from an educational perspective.

4. central dynamics of digital engagement

there is clearly a need to cultivate a better sense of agency within digital contexts and of the ways the distinct digital features influence students’ agency. this paper turns to media research to theorise how the digital configures the environment in a way that has the potential to shape students’ engagement. the research of interest examines social network sites. while networking socially or for professional purposes may not dominate the use of digital technologies in formal educational contexts, digital technologies in education and social network sites serve many of the same functions. they allow people to connect and share content with people other than close friends and family, and they help people gather for a purpose.

4.1 networked publics

within social network sites, people are expected to act as networked individuals, maintaining various networks of people and resources that can be navigated as required to meet specific needs (boyd, 2010; wellman et al., 2003; wellman & rainie, 2013). as baym and boyd (2012) explained, people who use social media ‘juggle multiple layers and kinds of audiences, bringing into being multiple and diverse kinds of publics, counterpublics, and other emergent social arrangements’ (pp. 321–322). boyd (2010) makes the important observation that it is useful to think of such sites and the practices unfolding there as networked publics. as stated:

‘networked publics are publics that are restructured by networked technologies. as such they are simultaneously (1) the space constructed through networked technologies and (2) the imagined collective that emerges as a result of the intersection of people, technology, and practice’ (boyd, 2010, p. 39).

according to livingstone (2005), networked publics are constructed by a space and collection of people or bounded by a shared performance or object.
while the terminology emphasises networks, networking is not the primary purpose of many social sites (boyd & ellison, 2007), and people might not use digital technologies to connect with others (gourlay, rodríguez-illera, barberà, et al., 2021).

4.2 sharing of profiles: locus of interaction and self-representation

most social network sites allow participants to generate personal profiles, and boyd (2010) argues that profiles act as the locus of interaction and represent the individual. following this line of thinking, mccosker (2017) has described social sites as performative spaces where people actively construct their identities and profiles for an audience. because individuals’ profiles are objects of self-representation, attention, comparison, negotiation, and remix (papacharissi, 2011), constantly editing or remixing oneself is an essential online practice (papacharissi, 2012). in addition to being a site of self-management, profiles are also a site of external control. indeed, self-representation depends on the content and activity selected by the specific media (mccosker, 2017). as illustrated by mccosker (2017), one’s name, username, profile image, ‘about’ information, and relationships (follows, followers) appear to be a standardised basis for personal identifiers. in contrast, data on education, birthday, age, and other websites are less common features in profiles. with external structures restricting self-representation, users often engage in actions of resistance or playfulness to overcome reductive versions of themselves (mccosker, 2017). hence, research has found that performing identity and sociality online involves a constant balancing of social benefits with privacy costs (papacharissi, 2012). thus, while networking sites enable individuals to enact social roles, play and resistance strategies are central to staging a digital profile that conceals the aspects of oneself that the individual would not like to be shared.
as digital profiles and data become contested, it is also becoming critical to consider personal privacy. we can think of privacy or expressive privacy as the protection of acts of speech or activity that express self-identity or personhood (ess, 2015). according to ess (2015), a space of expressive privacy is required if individuals are to reflect and critique alternatives. mechanisms implemented in social media allow users to control access to the data they generate to some extent. profiles and user-generated data may be open source, available for many to access and potentially use, or closed, accessible only to the student or the teacher. yet, privacy settings are enacted in several ways in digital contexts, beyond settings of open/closed or individual/all, such as the ‘following mechanism’ of reciprocal or nonreciprocal kind: either mutual acceptance of each other must be in place to read each other’s contributions, or one-way following is accepted.

4.3 data-based content: persistent, replicable, scalable, and searchable

according to boyd (2010), networked individuals are also challenged to manage the persistence, replicability, scalability, and searchability of their profiles and data in digital environments. the first property, persistence, suggests that individuals’ contributions, such as text or expressions, are easily captured and stored. in fact, many systems operate on persistence by default, making previous unmediated moments of communication persistent. second, boyd (2010) argued that technology has increased the replicability of content. because of this and the ease with which one can modify original content, what is original and replicated is hard to discern. third, boyd (2010) mentioned scalability as an affordance of networking sites. scalability is the potential to enhance the distribution of content or who has access to it. however, what is amplified through broad distribution is not always what the content owner would have chosen.
lastly, boyd emphasises searchability as an affordance, based on the premise that technology use leaves digital traces that can be linked to a person or an object. these characteristics imply that profiles and individuals’ activities are difficult to erase and easy to share with known and unknown others. participants’ digital contributions to interactions, then, persist and can easily be retrieved, duplicated, and redistributed across various contexts. when considering the ways the audiences of personal data emerge to humans, this becomes increasingly important. as boyd (2010) stated, ‘in unmediated spaces, it is common to have a sense for who is present and can witness a particular performance. the affordances of networked publics change this’ (p. 49). not knowing one’s audience makes it difficult to make a contribution that considers others’ reactions.

4.4 affect: blurring of relational investment and tools for engagement

a crucial underpinning of social media is affective investment on the users’ part. as people engage with digital technologies, they are frequently offered choices for expressing emotions and social bonds. indeed, what makes social networking sites unique is that they allow individuals to connect with others and articulate and enact sociality through affect and affective technical features (boyd & ellison, 2007). technical features such as comments, views, or likes are referred to as social buttons (gerlitz & helmond, 2013). social buttons comprise affective statements such as ‘great’ or affective states such as ‘feeling amused’ as reactions to others’ comments, which are shared with a particular group of users. hence, they allow users not only to share or recommend content but also to share social connections and affect.
affective features such as those mentioned transform user affect and spontaneous responses into comparable acts of engagement and signs of social connections that structure digital performance and are critical to social networking sites (boyd, 2010; gerlitz & helmond, 2013). this includes how personal data might act as objects of memories or mementoes on social sites (lupton, 2020). as noted by lupton (2020), data which are shared are archived, and can be revived to connect people with their past connections and activities. yet, while affective interactional acts are critical in terms of maintaining or increasing users’ digital engagement, the so-called like economy ‘is facilitating a web of positive sentiment in which users are constantly prompted to like, enjoy, recommend and buy as opposed to discuss or critique’ (gerlitz & helmond, 2013, p. 1362). thus, a characteristic of networked sites is how networked communication entails a ‘panoply of affective attachments: articulations of desire, seduction, trust and memory; sharp jolts of anger and interest; political passions; investments of time, labor, and financial capital; and the frictions and pleasures of archival practices’ (paasonen et al., 2015, p. 1). as paasonen (2018) reasoned, ‘if time, attention and data are the price that people pay, or that which they hand over in order to access social media, then affective ripples are part of that which is gained in return’ (pp. 9–10).

4.5 time: when does something happen online?

networked interactions or exchanges can also be seen as underpinned by temporalities that structure activities and influence people’s abilities to access and share data. while measures of time such as clocks and calendars dominate, in many instances, time in digital contexts is difficult to pinpoint. ‘network time’ has been proposed by hassan (2007) as a way of opening up the idea of many temporal possibilities.
hassan (2007) saw the internet as an inherently asynchronous space where nothing occurs simultaneously. instead, the internet offers multiple spectra of temporalities. operating systems might respond to input at high speed and immediately, but there will always be some temporal lag. the internet connection might be slow or fail, and we can spend seconds, minutes, or hours waiting for content to appear. moreover, the digital space includes different time zones, which challenges the idea of time consistency. network time, then, is referred to as ‘a digitally compressed clock-time’ (hassan, 2003, p. 233), which is time that has exploded into a million different time fractions.

due to this tension, it makes sense to interrogate time less regarding its logical meaning and more in its existential meaning (bennett & burke, 2018; lash, 2001). here, hassan (2007) used adam’s (2008) notion of timescapes to illustrate the range of experiences of time possible. the concept of timescapes captures ‘that we cannot embrace time without simultaneously encompassing space and matter, that is, without embodiment in a specific and unique context’ (adam, 2008, p. 1). following adam’s work, time is constituted by seven elements: (a) time frame, referring to a bounded unit with a beginning and an end; (b) temporality, referring to the unfolding of time and the direction it takes; (c) timing, referring to something happening at a specific time; (d) tempo, referring to the speed at which something happens; (e) duration, referring to considerations of how long something takes; (f) sequence, referring to the order of things such as actions; and (g) temporal modalities, referring to when something happens (in the past, present, or future). crucially, these forms of time are linked to digital behaviour and sense-making.
4.7 implications

while student agency research has focused on the individual aspects of human actions and subsumed the digital context to educational intentions, media studies offer significant insight into digital behaviour as socially, culturally, and technologically dependent. this is not to say that technologies used in education have the same precise features as social network technologies or that the purpose of technology use is the same. instead, this paper suggests taking the elements into account as central dynamics forming part of engagement in the digital world to shed light on why students engage the way they do in digital interactions. thus, the value of constructing digital technologies in education as social network sites is analytical. it directs our attention to practices as being informed by the dynamics of networked publics. as such, it involves paying attention to the interweaving of technology and humanity and how these are interconnected with other practices and relations (markham, 2018), rather than focusing on the digital as a tool (focusing on cultural practices in or of digital contexts; markham, 2018) or a medium (viewing the digital context as a cultural space in which one can be present and feel absorbed; markham, 2018). against this background, approaches to digital student agency need to include critical understandings of the way agency is constructed and constrained. this involves the observable settings and features and how the interplay between social, cultural and technological aspects emerges to students.

5. digital student agency: a critical framework

the paper continues to sketch out a digital student agency framework, which considers the complex construct of student agency in relation to the distinct digital features. the proposed framework involves five domains: agentic possibility, digital self-representation, data uses, digital sociality, and digital temporalities (see table 1).
it is important to stress that each domain is critical in orientation, developed to identify and understand the ways student agency is constrained. this coincides with selwyn’s (2010) argument that we need nuanced and thick descriptions of technology use. given the framework’s critical nature, it is not intended to present a normative ideal for how a digital context for learning should be to facilitate student agency. for example, it does not state that online peer feedback should be conducted in a certain way, using a specific technology with specific features. being critical means exploring and understanding the implications that settings and the cultural and social context might have on student agency. some practical examples of how each domain might be approached are described as actions.

table 1
‘digital student agency’ framework (disa)

domain: agentic possibility
key question: what power do students have to achieve the intended outcomes in the particular context of action and interaction?
actions:
• identifying sources and resources of agency in the interaction
• analysing the ways sources emerge to students during the interaction
• determining how the object of engagement links to student trajectories
• identifying student possibility to influence the object of engagement and the interaction required
• evaluating the level of access that students have to influence the object of engagement

domain: digital self-representation
key question: how can students manage and adapt their self-representation in the digital context of learning?
actions:
• identifying how and where options for managing students’ profiles are constructed and processed
• analysing how students’ profiles are visible to others (peers, teachers, managers)
• exploring the implications

domain: data uses
key question: how are student data circulated or recirculated?
actions:
• identifying how and where student contributions are generated and circulated
• identifying how students can manage their data
• identifying who has access to student data and when
• identifying how students’ data can be used, including purposes extending the original intent
• evaluating the implications of the uses

domain: digital sociality
key question: how is sociality constructed, and how can students manage sociality?
actions:
• identifying the means that are available for communication and cultivating a sense of sociality
• identifying how and where sociality is constructed
• determining how students can manage whom they interact with, how they interact, and the purpose of the interaction
• analysing the role of socialisation in terms of how it affects the interaction or the outcome of the interaction

domain: digital temporalities
key question: how are student actions constructed in terms of time?
actions:
• identifying the digital temporalities and analysing the underpinning of time: is it structured by individual students, groups of students, teaching staff, or a digital system?
• analysing how time materialises to students digitally
• exploring the implications of time from a student perspective

the first domain mentioned in the framework – agentic possibility – is critical to identifying the possibilities for cultivating agency in the specific context of action. it pays attention to students’ effective opportunities and freedom to do what they reason to make sense within education. it also includes assessing the sources of agency available in the particular context. sources refer here to the broad array of internal dispositions, relational, and contextual sources or resources known from mainstream agency frameworks. while the sources cultivate student agency in higher education contexts (jääskelä, poikkeus, et al., 2020; toom et al., 2017), research has also made clear that technology and educational instructions can hinder access to sources of agency (stenalt, 2021).
moreover, ashwin and mcvitty (2015) have emphasised that educational planning of student engagement includes stratified and directive access to knowledge and influence. thus, the first domain is the impetus for discussions of how the educational purpose and guidelines configure student agency.

the second domain – digital self-representation – involves identifying the particular ways students are placed in digital interactions and the type of self-representation possible in the specific exchange. while digital system profiles are relatively easy to locate, students might perceive other data types to represent them. for example, students might be identified through the following:
• contributions with their signature
• an alias
• the order of appearance
• pictures
• oral expressions of authorship in-class supplementing the digital contribution
• familiarity with students’ style of expression

once the profiles have been identified, we can begin to think about how students can control their self-representation and explore how it might influence student behaviour. while social media often enable users to adjust their privacy settings (waterloo et al., 2018), these choices are likely to be distributed to platform managers or teachers within educational contexts. students, then, are managed as a collective group and assumed to have the same relationship with each of their peers. following bayne et al. (2019), students might turn to acts of resistance if they cannot control their self-representation. similarly, research has found limited options for self-representation to decrease student engagement (stenalt, 2021).

the third domain – data uses – examines the life of student data. for example, does data stick around in the sense that others can access the information? does it have a value for students in extending the particular interaction and interaction outcomes?
because learning activities aligned with the formal assessment are of high importance to students (biggs & tang, 2011), critical academic data that disappear might lead to student frustration. automated personalisation of educational offerings using students’ learning data as the stepping stone can be seen by students to reduce them to numbers and restrict their access to resources (tsai et al., 2020).

it is crucially important to address the conflicting beliefs and opportunities available for social learning in digital contexts. understanding the type of sociality that an interaction fosters or requires is the fourth domain’s concern. the domain also supports explorations of expected affective investment in relation to successful participation and students’ actual affective investment – allowing insight into the way sociality is constituted. for example, successful collaboration is seen to require coregulation (volet et al., 2009), which involves ‘individuals’ various attempts to affect each other’s motivation, emotional state, cognitive actions, etc. for their own purpose or others’ benefits, or alternatively to coordinate their actions for a shared purpose’ (järvenoja et al., 2013, p. 35). while social and communicative activities are essential for maintaining a positive group climate (janssen et al., 2012), such opportunities and features might not be sufficiently devised digitally. at the same time, research has found that students prefer technologies that ease the logistics of their life, rather than technologies that support collaborative work (henderson et al., 2017). this challenges ideals of digital collaboration. relatedly, educational framings of interactions may stress social learning but provide limited opportunities for students to affect the object of engagement. here, we can draw on ashwin and mcvitty’s (2015) model to explore the potential impact.
in light of this, decoding the imagined collective, the actual collective, and the role of the collective should be an essential element of understanding digital educational spaces. the last domain – digital temporalities – enables consideration of the temporalities involved in the context and how they affect students. following from hassan and others, students’ relation to others and content in digital contexts is not a universal thing but instead processes that develop from being engaged and shaped by structures that include the educational instructions and technology used. as a simple example, a blog post is, by default, typically invisible for others during the writing of the post. the invisibility ceases after data has been published, where it is structured in relation to the logic of a calendar. it remains visible to students until the month or week changes or until a sufficient number of other data entries are made. while the data visually disappears as time moves on, it can be retrieved later on through search mechanisms. in contrast, entries by students through kahoot, which can be used to facilitate quizzes with students in the same room, fall into sequences, are visible for all students for a limited time (the duration of a sequence), and cannot be retrieved later on by the individual student. taking a broader perspective, technologies and educational choices produce different relationships to time and content. time settings also risk labelling certain groups of students rather than helping them achieve their potential. students, for example, who struggle to manage their time and meet deadlines in an online course with various activities might see themselves as lacking self-discipline and less capable of studying (bennett & burke, 2018). for students with small children at home, an online course might be more challenging to attend than a course at the campus, reducing the proposed benefit of providing student flexibility (kirkwood & price, 2014). 
additionally, the temporality of digital interactions can affect students’ approaches to learning. lash (2001), for example, has described how technological forms of life that are sped up can result in content being devalued within hours or days. what lash (2001) points to is how experiences of limited time or moving fast forward can lead to a higher degree of student insecurity because their attention may become directed to the consequences of the present for the future rather than looking to the past to explain the present. reeve and jang (2006) found that limited time to engage and a teacher monopolising time correlated negatively with student experience of autonomy. this, at least, invites us to consider how the speed of digital interactions and the distribution of time relate to student self-regulation and self-efficacy.

6. future approaches to student agency

recent events such as covid-19 and the turn towards distance education have increased the awareness of digital technologies for education (williamson et al., 2020). due to the challenges of making online teaching feel meaningful and relevant to students, approaches that help develop digital student agency are needed. in light of this, the paper offers three recommendations.

first, the paper suggests that the digital should not be assumed to support agency by default. rather than subsuming agency to technical affordances or pedagogical guidelines, the digital should be confronted as something constituted by several interrelated digital domains nested in social relations. key to this, the framework offers a disposition that encourages critical investigations of students’ agency possibilities in digital contexts, how they may emerge to students, and how they affect students’ actions.
following this, the framework moves beyond simply stating online tools and educational intentions when exploring digital contexts of education to in-depth investigations of the dynamics configuring digital environments. specifically, the framework enables explorations of the invitational quality of the digital (adams & thompson, 2016) and decentring the object of inquiry (pink et al., 2017), allowing researchers to give an account of the mode of being constructed through digital technologies and the way agency is constrained. taking the domains into account, it becomes clear how digital technologies can mediate our being in the world and direct our attention in a certain way, affecting how we come to know the content (rosenberger, 2017a, 2017b). thus, technology use fundamentally involves substantive interventions for the context into which it is embedded. in this light, the digital is neither a neutral mediator of information nor independent of the system into which it is adopted. the use of the domains allows us to raise a range of questions to explore the interplay between human-technology in educational contexts, including the following:
• how do educational practices constrain student agency?
• what forms of settings do students see as appropriate in cultivating their agency and making connections with others to learn with digital technology?
• how do students make sense of data produced and shared online?
• what forms experiences of time, and how do temporalities affect students’ engagement?

second, the logic of digital engagement suggests that agency in digital contexts needs to be understood less as an individual phenomenon and more as a relational phenomenon to balance digital ways of being.
the relational approach to agency points to the notion of ‘relational agency’ (burkitt, 2016), whereby agency describes ‘people producing particular effects in the world and on each other through their relational connections and joint actions, whether or not those effects are reflexively produced’ (burkitt, 2016, p. 323). the key point of relational agency is not to understand the digital from a single student perspective but to understand that sense-making of the digital is associated with the different forms of relationships being confronted and constituted in the digital context. as burkitt (2016) stated, ‘it is not simply relations, and the objects and entities produced in relations, that is the focus of study; more specifically, it is social relations and the mode of life humans produce through them, including material culture and technology, that relational sociologists need to bring into the analysis’ (p. 331). in light of this, the framework can build an understanding of the forms of power affecting student agency and how to organise learning that considers the challenges to student agency in digital contexts.

third, the complexity of agency in digital contexts highlights the need to consider how technology-supported student-centred environments emerge to students. while jääskelä, poikkeus, et al. (2020) have identified a relationship between high levels of student agency and perceptions of courses as student-centred, more knowledge is needed of the features constituting a student-centred environment to develop models that can guide the design of such environments. for instance, to what degree does student-centredness depend on high levels of privacy, and how can privacy be materialised in designs? here, the framework can be used to help identify the underlying features of a digital student-centred environment from a student perspective.

7. conclusion

because digital education connects to student agency, it is important that we understand the various qualities and capabilities of digital contexts of learning and how they affect student agency. this article has outlined how approaches to understanding the digital in existing student agency research are too narrow and therefore miss out on important insight derived from the field of media research. instead, the paper proposes a digital student agency framework for developing a better understanding of the human-technology interplay that pays attention to the ways relational, cultural, and technological dynamics constitute agency. the proposed framework suggests understanding digital student agency through five domains formed by media research and research into student agency in higher education. the framework is an initial attempt at addressing the interplay between student agency and digital contexts of learning. therefore, the paper invites testing and critique of the framework and its domains. developing knowledge of digital ways of being in educational contexts is complex. in doing so, it makes sense to look at student agency as a basis for exploring the student perspective and working out realistic accounts of digital encounters and ways to support students in digital contexts. thus, the task proposed by this paper is not to better digital education and practices by promoting certain ideals of higher education teaching and learning but by considering the ways students’ trajectories and social connections might challenge students in mundane digital contexts. as digital ways of engaging and managing students become ubiquitous, the distinction between learning as private and learning as public will become blurry. the considerations presented by the framework will not be confined to specific digital teaching-learning interactions but will be part of students’ everyday life.
keypoints
the article:
• provides a critical approach to student agency in digital contexts of higher education
• expands current student agency research by moving beyond an agency-context dichotomy towards understandings of agency-context as interrelated
• offers a ‘digital student agency’ framework that distinguishes five significant domains: (1) agentic possibility, (2) digital self-representation, (3) data uses, (4) digital sociality, and (5) digital temporalities

references
adam, b. (2008). of timescapes, futurescapes and timeprints. in l. university (ed.), lüneburg talk web 070708.
adams, c., & thompson, t. l. (2016). attending to objects, attuning to things. in c. adams & t. l. thompson (eds.), researching a posthuman world (pp. 23-56). palgrave macmillan. https://doi.org/10.1057/978-1-137-57162-5_2
arkoudis, s., & tran, l. t. (2007). international students in australia: read ten thousand volumes of books and walk ten thousand miles. asia pacific journal of education, 27(2), 157-169. https://doi.org/10.1080/02188790701378792
ashwin, p. (2012). analysing teaching-learning interactions in higher education. accounting for structure and agency. london, new york, continuum.
ashwin, p., & mcvitty, d. (2015). the meanings of student engagement: implications for policies and practices. in a. curaj, l. matei, r. pricopie, j. salmi, & p. scott (eds.), the european higher education area between critical reflections and future policies (pp. 343-359). springer open. https://doi.org/10.1007/978-3-319-20877-0_23
bandura, a. (2002). growing primacy of human agency in adaptation and change in the electronic era. european psychologist, 7(1), 2. https://doi.org/10.1027/1016-9040.7.1.2
bandura, a. (2006). toward a psychology of human agency. perspectives on psychological science, 1(2), 164-180. https://doi.org/10.1111/j.1745-6916.2006.00011.x
baym, n. k., & boyd, d. (2012). socially mediated publicness: an introduction. journal of broadcasting & electronic media, 56(3), 320-329. https://doi.org/10.1080/08838151.2012.705200
bayne, s. (2015). what's the matter with ‘technology-enhanced learning’? learning, media and technology, 40(1), 5-20. https://doi.org/10.1080/17439884.2014.915851
bayne, s., connelly, l., grover, c., osborne, n., tobin, r., beswick, e., & rouhani, l. (2019). the social value of anonymity on campus: a study of the decline of yik yak. learning, media and technology, 44(2), 92-107. https://doi.org/10.1080/17439884.2019.1583672
bennett, a., & burke, p. j. (2018). re/conceptualising time and temporality: an exploration of time in higher education. discourse: studies in the cultural politics of education, 39(6), 913-925. https://doi.org/10.1080/01596306.2017.1312285
biggs, j., & tang, c. (2011). teaching for quality learning at university (4th ed.). open university press.
boyd, d. (2010). social network sites as networked publics: affordances, dynamics, and implications. in z. papacharissi (ed.), a networked self (pp. 47-66). routledge.
boyd, d. m., & ellison, n. b. (2007). social network sites: definition, history, and scholarship. journal of computer-mediated communication, 13(1), 210-230. https://doi.org/10.1111/j.1083-6101.2007.00393.x
burkitt, i. (2016). relational agency: relational sociology, agency and interaction. european journal of social theory, 19(3), 322-339. https://doi.org/10.1177/1368431015591426
calitz, t. m. l., walker, m., & wilson-strydom, m. (2016). theorising a capability approach to equal participation for undergraduate students at a south african university. perspectives in education, 34(2), 57-69. https://doi.org/10.18820/2519593x/pie.v34i2.5
damşa, c. i., kirschner, p. a., andriessen, j. e., erkens, g., & sins, p. h. (2010). shared epistemic agency: an empirical study of an emergent construct. the journal of the learning sciences, 19(2), 143-186. https://doi.org/10.1080/10508401003708381
emirbayer, m., & mische, a. (1998). what is agency? american journal of sociology, 103(4), 962-1023. https://doi.org/10.1086/231294
ess, c. (2015). new selves, new research ethics. in h. fossheim & h. ingierd (eds.), internet research ethics (pp. 48-76). cappelen damm akademisk.
eteläpelto, a. (2017). emerging conceptualisations on professional agency and learning. in m. goller & s. paloniemi (eds.), agency at work: an agentic perspective on professional learning and development (1st ed., pp. 183-201). springer. https://doi.org/10.1007/978-3-319-60943-0_10
fenwick, t., & landri, p. (2012). materialities, textures and pedagogies: socio-material assemblages in education. pedagogy, culture & society, 20(1), 1-7. https://doi.org/10.1080/14681366.2012.649421
garrison, d. r., & kanuka, h. (2004). blended learning: uncovering its transformative potential in higher education. the internet and higher education, 7(2), 95-105. https://doi.org/10.1016/j.iheduc.2004.02.001
gerlitz, c., & helmond, a. (2013). the like economy: social buttons and the data-intensive web. new media & society, 15(8), 1348-1365. https://doi.org/10.1177/1461444812472322
gourlay, l., rodríguez-illera, j. l., barberà, e., bali, m., gachago, d., pallitt, n., jones, c., bayne, s., hansen, s. b., hrastinski, s., jaldemark, j., themelis, c., pischetola, m., dirckinck-holmfeld, l., matthews, a., gulson, k. n., lee, k., bligh, b., thibaut, p., vermeulen, m., nijland, f., vrieling-teunter, e., scott, h., thestrup, k., gislev, t., koole, m., cutajar, m., tickner, s., rothmüller, n., bozkurt, a., fawns, t., ross, j., schnaider, k., carvalho, l., green, j. k., hadžijusufović, m., hayes, s., czerniewicz, l., knox, j., & networked learning editorial collective. (2021). networked learning in 2021: a community definition. postdigital science and education. https://doi.org/10.1007/s42438-021-00222-y
hamilton, e., & friesen, n. (2013). online education: a science and technology studies perspective/éducation en ligne: perspective des études en science et technologie. canadian journal of learning and technology/la revue canadienne de l’apprentissage et de la technologie, 39(2).
harré, r., & van langenhove, l. (1999). positioning theory: moral contexts of intentional action. blackwell oxford.
harris, l. r., brown, g. t., & dargusch, j. (2018). not playing the game: student assessment resistance as a form of agency. the australian educational researcher, 45(1), 125-140. https://doi.org/10.1007/s13384-018-0264-0
hassan, r. (2003). network time and the new knowledge epoch. time & society, 12(2-3), 226-241. https://doi.org/10.1177/0961463x030122004
hassan, r. (2007). 24/7: time and temporality in the network society. stanford university press.
henderson, m., selwyn, n., & aston, r. (2017). what works and why? student perceptions of ‘useful’ digital technology in university teaching and learning. studies in higher education, 42(8), 1567-1579. https://doi.org/10.1080/03075079.2015.1007946
hitlin, s., & long, c. (2009). agency as a sociological variable: a preliminary model of individuals, situations, and the life course. sociology compass, 3(1), 137-160. https://doi.org/10.1111/j.1751-9020.2008.00189.x
irvine, v., code, j., & richards, l. (2013). realigning higher education for the 21st century learner through multi-access learning. journal of online learning and teaching, 9(2), 172.
janssen, j., erkens, g., kirschner, p. a., & kanselaar, g. (2012). task-related and social regulation during online collaborative learning. metacognition and learning, 7(1), 25-43. https://doi.org/10.1007/s11409-010-9061-5
järvenoja, h., volet, s., & järvelä, s. (2013). regulation of emotions in socially challenging learning situations: an instrument to measure the adaptive and social nature of the regulation process. educational psychology, 33(1), 31-58. https://doi.org/10.1080/01443410.2012.742334
jääskelä, p., heilala, v., kärkkäinen, t., & häkkinen, p. (2020). student agency analytics: learning analytics as a tool for analysing student agency in higher education. behaviour & information technology, 1-19. https://doi.org/10.1080/0144929x.2020.1725130
jääskelä, p., poikkeus, a.-m., häkkinen, p., vasalampi, k., rasku-puttonen, h., & tolvanen, a. (2020). students’ agency profiles in relation to student-perceived teaching practices in university courses. international journal of educational research, 103. https://doi.org/10.1016/j.ijer.2020.101604
jääskelä, p., poikkeus, a. m., vasalampi, k., valleala, u. m., & rasku-puttonen, h. (2016). assessing agency of university students: validation of the aus scale. studies in higher education, 1-19. https://doi.org/10.1080/03075079.2015.1130693
kirkwood, a., & price, l. (2011). enhancing learning and teaching through technology: a guide to evidence-based practice for academic developers. higher education academy. http://oro.open.ac.uk/32489/
kirkwood, a., & price, l. (2014). technology-enhanced learning and teaching in higher education: what is ‘enhanced’ and how do we know? a critical literature review. learning, media and technology, 39(1), 6-36. https://doi.org/10.1080/17439884.2013.770404
klemenčič, m. (2015). what is student agency? an ontological exploration in the context of research on student engagement. in m. klemenčič, s. bergan, & r. primožič (eds.), student engagement in europe: society, higher education and student governance (pp. 11-29). council of europe higher education series no. 20. strasbourg: council of europe publishing.
klemenčič, m. (2017). from student engagement to student agency: conceptual considerations of european policies on student-centered learning in higher education. higher education policy, 30(1), 69-85. https://doi.org/10.1057/s41307-016-0034-4
klemenčič, m., pupinis, m., & kirdulytė, g. (2020). mapping and analysis of student-centred learning and teaching practices: usable knowledge to support more inclusive, high-quality higher education (neset analytical report). publications office of the european union. http://dx.doi.org/10.2766/67668
lash, s. (2001). technological forms of life. theory, culture & society, 18(1), 105-120. https://doi.org/10.1177/02632760122051661
leonardi, p. m. (2010). digital materiality? how artifacts without matter, matter. first monday, 15(6). https://doi.org/10.5210/fm.v15i6.3036
lindgren, r., & mcdaniel, r. (2012). transforming online learning through narrative and student agency. educational technology & society, 15(4), 344-355.
livingstone, s. (2005). on the relation between audiences and publics. in s. livingstone (ed.), audiences and publics: when cultural engagement matters for the public sphere (2nd ed., pp. 17-41). intellect books.
luo, h., yang, t., xue, j., & zuo, m. (2019). impact of student agency on learning performance and learning experience in a flipped classroom. british journal of educational technology, 50(2), 819-831. https://doi.org/10.1111/bjet.12604
lupton, d. (2020). data selves: more-than-human perspectives. polity press.
malmberg, l. e., & hagger, h. (2009). changes in student teachers' agency beliefs during a teacher education year, and relationships with observed classroom quality, and day-to-day experiences. british journal of educational psychology, 79(4), 677-694. https://doi.org/10.1348/000709909x454814
marín, v. i., de benito, b., & darder, a. (2020). technology-enhanced learning for student agency in higher education: a systematic literature review. interaction design and architecture(s) journal ixd&a, 45, 15-49.
markham, a. n. (2018). ethnography in the digital internet era. in n. k. denzin & y. s. lincoln (eds.), sage handbook of qualitative research (5th ed., pp. 650-668). sage.
mccosker, a. (2017). data literacies for the postdemographic social media self. first monday, 22(10). https://doi.org/10.5210/fm.v22i10.7307
merrill, b. (2014). determined to stay or determined to leave? a tale of learner identities, biographies and adult students in higher education. studies in higher education, 40(10), 1859-1871. https://doi.org/10.1080/03075079.2014.914918
nieminen, j. h., tai, j., boud, d., & henderson, m. (2021). student agency in feedback: beyond the individual. assessment & evaluation in higher education, 1-14. https://doi.org/10.1080/02602938.2021.1887080
nye, a., hughes-warrington, m., roe, j., russell, p., deacon, d., & kiem, p. (2011). exploring historical thinking and agency with undergraduate history students. studies in higher education, 36(7), 763-780. https://doi.org/10.1080/03075071003759045
oecd. (2018). the future of education and skills: education 2030. oecd education working papers. oecd paris, france.
papacharissi, z. (2011). conclusion: a networked self. in z. papacharissi (ed.), a networked self (pp. 304-318). routledge.
papacharissi, z. (2012). without you, i'm nothing: performances of the self on twitter. international journal of communication, 6, 18.
passey, d., shonfeld, m., appleby, l., judge, m., saito, t., & smits, a. (2018). digital agency: empowering equity in and through education. technology, knowledge and learning, 23(3), 425-439. https://doi.org/10.1007/s10758-018-9384-x
paasonen, s. (2018). affect, data, manipulation and price in social media. distinktion: journal of social theory, 19(2), 214-229. https://doi.org/10.1080/1600910x.2018.1475289
paasonen, s., hillis, k., & petit, m. (2015). networks of transmission: intensity, sensation, value. in k. hillis, s. paasonen, & m. petit (eds.), networked affect (pp. 1-24). cambridge: mit press.
pink, s., sumartojo, s., lupton, d., & heyes labond, c. (2017). empathic technologies: digital materiality and video ethnography. visual studies, 32(4), 371-381. https://doi.org/10.1080/1472586x.2017.1396192
reeve, j., & jang, h. (2006). what teachers say and do to support students' autonomy during a learning activity. journal of educational psychology, 98(1), 209-218. https://doi.org/10.1037/0022-0663.98.1.209
rosenberger, r. (2017a). the ict educator’s fallacy. foundations of science, 22(2), 395-399. https://doi.org/10.1007/s10699-015-9457-4
rosenberger, r. (2017b). notes on a nonfoundational phenomenology of technology. foundations of science, 22(3), 471-494. https://doi.org/10.1007/s10699-015-9480-5
selwyn, n. (2010). looking beyond learning: notes towards the critical study of educational technology. journal of computer assisted learning, 26(1), 65-73. https://doi.org/10.1111/j.1365-2729.2009.00338.x
sen, a. (2005). human rights and capabilities. journal of human development, 6(2), 151-166. https://doi.org/10.1080/14649880500120491
shonfeld, m., passey, d., appleby, l., judge, m., saito, t., smits, a., khablan, s., & starkey, l. (2017). digital agency to empower equity in education: summary report. in rethinking learning in a digital age. edusummit 2017 (pp. 39-45).
soini, t., pietarinen, j., toom, a., & pyhältö, k. (2015). what contributes to first-year student teachers’ sense of professional agency in the classroom? teachers and teaching, 21(6), 641-659. https://doi.org/10.1080/13540602.2015.1044326
starkey, l. (2019). three dimensions of student-centred education: a framework for policy and practice. critical studies in education, 60(3), 375-390. https://doi.org/10.1080/17508487.2017.1281829
stenalt, m. h. (2021). researching student agency in digital education as if the social aspects matter: students’ experience of participatory dimensions of online peer assessment. assessment & evaluation in higher education, 46(4), 644-658. https://doi.org/10.1080/02602938.2020.1798355
toom, a., pietarinen, j., soini, t., & pyhältö, k. (2017). how does the learning environment in teacher education cultivate first year student teachers' sense of professional agency in the professional community? teaching and teacher education, 63, 126-136. https://doi.org/10.1016/j.tate.2016.12.013
tsai, y.-s., perrotta, c., & gašević, d. (2020). empowering learners with personalised learning approaches? agency, equity and transparency in the context of learning analytics. assessment & evaluation in higher education, 1-14. https://doi.org/10.1080/02602938.2019.1676396
volet, s., vauras, m., & salonen, p. (2009). self-and social regulation in learning contexts: an integrative perspective. educational psychologist, 44(4), 215-226. https://doi.org/10.1080/00461520903213584
waterloo, s. f., baumgartner, s. e., peter, j., & valkenburg, p. m. (2018). norms of online expressions of emotion: comparing facebook, twitter, instagram, and whatsapp. new media & society, 20(5), 1813-1831. https://doi.org/10.1177/1461444817707349
wellman, b., quan-haase, a., boase, j., chen, w., hampton, k., díaz, i., & miyata, k. (2003). the social affordances of the internet for networked individualism. journal of computer-mediated communication, 8(3). https://doi.org/10.1111/j.1083-6101.2003.tb00216.x
wellman, b., & rainie, l. (2013). if romeo and juliet had mobile phones. mobile media & communication, 1(1), 166-171. https://doi.org/10.1177/2050157912459505
williamson, b., eynon, r., & potter, j. (2020). pandemic politics, pedagogies and practices: digital technologies and distance education during the coronavirus emergency. learning, media and technology, 45(2), 107-114. https://doi.org/10.1080/17439884.2020.1761641
zimmerman, b. j. (1995). self-regulation involves more than metacognition: a social cognitive perspective. educational psychologist, 30(4), 217-221. https://doi.org/10.1207/s15326985ep3004_8

frontline learning research vol.3 no.
3 special issue (2015) 5-22 issn 2295-3159
1 corresponding author: lesley andres, 2125 main mall, university of british columbia, vancouver, bc v6t 1z4, phone +1 604 822 8943, fax +1 604 822 4244, email lesley.andres@ubc.ca, doi: http://dx.doi.org/10.14786/flr.v3i3.177

drivers and interpretations of doctoral education today: national comparisons
lesley andres a,1, søren s. e. bengtsen b, liliana del pilar gallego castaño c, barbara crossouard d, jeffrey m. keefer e, kirsi pyhältö f
a university of british columbia, canada
b aarhus university, denmark
c university of caldas, colombia
d university of sussex, uk
e new york university, usa
f university of oulu and university of helsinki, finland
article received 17 may 2015 / revised 17 may 2015 / accepted 16 june 2015 / available online 14 august 2015

abstract
in the last decade, doctoral education has undergone a sea change with several global trends increasingly apparent. drivers of change include massification and professionalization of doctoral education and the introduction of quality assurance systems. the impact of these drivers, and the forms that they take, however, are dependent on doctoral education within a given national context. this paper is frontline in that it contributes to the literature on doctoral education by examining the ways in which these global trends and drivers are being taken up in policies and practices by various countries. we do so by comparing recent changes in each of the following countries: canada, colombia, denmark, finland, the uk, and the usa. each country case is based on national education policies, policy reports on doctoral education (e.g., oecd and eu policy texts), and related materials. we use the same global drivers to examine educational policies of each country. however, depending on each national context, these drivers are framed in considerably different ways. this raises questions about (1) their comparability at a global level and (2) the universality of the phd.
also, we find that this global-local nexus reveals unresolved tensions within the national doctoral educational frameworks.
keywords: doctoral education, higher education policy, massification, professionalization, quality assurance

andres et al | f l r 6

1. introduction
globally, research and researchers are viewed increasingly as critical to social and economic competitiveness and societal health (e.g., uk council for science and technology, 2007; european commission, 2014). it follows that over the past quarter of a century, the education of future researchers, principally through doctoral education, has become increasingly valued. as doctoral education shifts from the periphery (e.g., available to a small elite) to a more mainstream trajectory of the total educational experience, it is undergoing a sea change. several global trends and related drivers of such changes can be identified. the forms that the drivers take, however, and their impacts, are dependent on the specific contexts of doctoral education in a given national context. our paper contributes to the literature on doctoral education by examining the ways in which these global trends and drivers are being taken up in policies and practices by various countries. depending on priorities, path dependencies, and openness to change, global trends play out in different ways in given countries. however, countries are also influenced by wider historical, economic, and cultural geopositioning. in this paper, we highlight how drivers and trends have manifested themselves in individual countries. using a six-country comparative case study approach (canada, colombia, denmark, finland, the uk, and the usa), we address the following question: what recent changes in doctoral education, in relation to the three drivers and trends identified above, can be identified in each country? to address this question, document-based cases of doctoral education in each country are presented below.
particular attention has been paid to the identification of the most recent policy changes in doctoral education and the ways in which the changes are taken up. drawing on our analysis of each context, we conclude by proposing future research agendas for examining doctoral education. the case countries were selected because they present different cultural geopositionings and traditions of doctoral education, ranging from the more structured and coursework-based model of the usa to the less structured model in nordic countries. also, the cases vary in the extent to which the higher education system in a given country is teaching-oriented (for example, colombia is highly teaching-oriented and finland more research-oriented), in their emphasis on performance-based management (e.g., the uk and usa have highly performance-based systems, denmark, finland and canada are in the middle, and colombia is at the other end), and in whether a country’s higher education system is in the process of developing (colombia), recently developed (finland and denmark) or well developed (canada, usa and uk) (shin, 2010; shin & jung, 2014). we begin with an overview of the key drivers, followed by country-specific descriptions.

2. global drivers of doctoral education
core global drivers affecting doctoral education have been identified in the research literature (e.g., kehm, 2006) and in various policy reports (oecd, 2010; 2014; department for education and skills, 2003). these trends include massification of doctoral education, professionalization of doctoral education and careers, and the development of various quality assurance systems.

1.1. massification of doctoral education
worldwide, the number of doctoral students and the number of doctoral degree holders have increased significantly.
since 2000, the number of those who have earned doctoral degrees in oecd countries has risen by 38%, from 154,000 new graduates in 2000 to 213,000 new doctoral graduates in 2009 (auriol, misu & freeman, 2013; oecd, 2014). on average, 1.6% of young people in oecd countries had earned doctoral degrees in 2009, compared to 1% in 2000 (oecd, 2014). although graduation rates for women in 2012 (1.5%) at the doctoral level are still somewhat lower than those of men (1.7%), in several countries the proportion of women expected to graduate is larger, based on the increased number of women currently undertaking doctoral studies (oecd, 2014). massification of doctoral education has also increased researcher mobility. in 2010, about 3.6 million students worldwide were enrolled as international students in tertiary education (auriol, misu, & freeman, 2013), and it is assumed that this number will continue to grow (moguerou & di pietrogiacomo, 2008; rizen & marconi, 2011). in addition to producing a highly educated work force, rapid increases in the number of doctoral degree holders have resulted in an unequal balance across disciplines. for instance, the number of doctoral degree holders in the majority of oecd countries is significantly higher in natural sciences than in humanities (oecd, 2014). also, there is considerable variation in gender representation of doctoral degree holders across countries, and these figures mask substantial differences in the gender balance across different disciplines. at the phd level, education, health, and welfare and the humanities continue to be female dominated; male phds are predominant in science, mathematics and computing, and particularly in engineering, manufacturing, and construction (oecd, 2012). there is some evidence that outside of academia, labour markets have not been able to fully absorb these highly qualified individuals (kehm, 2006).
in general, however, high employment rates of between 93 and 99% have been reported among individuals possessing doctoral degrees. in most countries, employment rates of male doctoral degree holders slightly exceed those of females, and male doctoral degree holders have higher earnings than their female counterparts (auriol, misu, & freeman, 2013).

1.2. professionalization of doctoral education
considering the rise in the number of doctoral degree holders, it is evident that not all will be able to pursue careers in academia, nor should they be assumed to desire this. based on a comparison of oecd countries, doctoral degree holders in the natural sciences and engineering are more likely to be engaged in research, while social scientists are likely to find more opportunities in non-research occupations (auriol, misu, & freeman, 2013). given that research skills are also now seen as being valuable to a broad range of employment sectors, a current driver is therefore the perceived need to better prepare doctoral students to work outside of academia by placing stronger emphasis on the acquisition of “generic skills” in doctoral education (eua, 2009; 2010; fiske, 2011; gilbert, balatti, turner & whitehouse, 2004; oecd, 2012). doctoral degree holders are considered to have the potential to contribute to economic growth, advancement, and diffusion of knowledge and technologies, and to solve societal and environmental problems (auriol, schaaper & felix, 2012). research, particularly in engineering, sciences, and medicine, is expected to result in innovations that will increase national competitiveness. also, researchers are expected to participate in turning scientific discoveries into patents and innovations. hence, fostering an entrepreneurial culture by instilling the skills and attitudes needed for creative enterprises is suggested to be a central part of 21st century researcher competence (oecd, 2010).
this emphasis is driven by (1) an increased number of doctoral students, (2) an agenda to create “free flow of knowledge,” (3) accountability demands, such as reducing the time spent earning the degree, and (4) the goal of lowering levels of attrition among doctoral students. for instance, in europe the berlin communiqué (2003) and the bucharest communiqué (2012) have espoused the professionalization of doctoral education, including an emphasis on learning generic skills. yet a comparison of 19 oecd countries shows that government policies typically emphasise general researcher development, employability of researchers in academia, and improving research work rather than explicitly transferable skills in doctoral education (oecd, 2012). somewhat paradoxically, this agenda sits alongside a perceived need to develop more comparable and structured doctoral programs, which suggests increasing standardisation and routinisation of programs of study.

1.3. quality assurance
in knowledge-based economies, knowledge production has become a commoditized and strategic resource (fernandez-zubieta & guy, 2010; kehm, 2006). the impact of global competition has resulted in a greater emphasis on evaluating the quality of research (adras, 2011). frequent evaluation is seen as a means to meet the demands of greater transparency to the public and accountability of research organizations (edler, georhiou, blind & uyrra, 2012). many western countries have adopted higher education policies such as systematic benchmarking and research evaluation of universities, including doctoral education, as a means of quality assurance (e.g., buela-casal, gutierrez-matinez, bermudez-sanchez & vadillo-munzo, 2007). principal methods used in quality assurance are peer review, high volume bibliometric data (geuna & martin, 2003), or a combination of these methods (e.g., informed peer review).
quality assurance has resulted in the burgeoning of global ranking schemes that have contributed to the intensification of institutional hierarchies. also, strategic alliances and competitive advantages – among market areas, countries, universities, and even individuals – have become increasingly important assets in research. as knowledge producers, doctoral students are recognized as increasingly important societal and economic assets. a downside of this is that practices such as poaching highly qualified people who travel abroad from developing countries to earn doctoral credentials are on the increase (auriol, schaaper & felix, 2012; oecd, 2014).

3. research design
the paper focuses on the exploration of global drivers of doctoral education and their local manifestation by using a comparative case study strategy (yin, 2012). each country case is based on national education policies, policy reports on doctoral education (e.g., oecd and eu policy texts), and related materials. an analysis of similarities and differences in recent changes in doctoral education in each country (hsieh & shannon, 2005) showed that changes related to massification, professionalization and quality assurance were most frequently reported. accordingly, our comparison focuses on addressing these three trends.

4. country cases
the trends have unfolded in different ways in each country. hence, each country case was analysed according to its most predominant trends. to provide readers with a systematic overview, we conclude this analysis by summarising the findings in table 1.

4.1. canada
in recent years, it has been recognized that canada needs more individuals educated at the phd level. according to the conference board of canada (2014), “highly skilled people [i.e., phd graduates] are key to the creation, commercialization, and diffusion of innovation” (p. 1). yet, since 1998, canada has earned a “d” in the multi-country rankings of phd graduates.
in 2010, canada was ranked 15th out of 16 in terms of numbers of graduated phds. this suggests a need to move toward, rather than away from, massification of doctoral programs and graduates. however, coordinated efforts to change the course of phd education are difficult because of canada’s decentralized education system. in terms of phd studies, responsibility for education – including higher education – rests with the provinces. the primary influence of the federal government on increasing the number of phd graduates is through the awarding of doctoral scholarships. regarding phd funding, rather than providing moderate scholarships to many students, the current trend is to award “winner take all” super-scholarships to a select few (frank, 1999; tamburri, 2013). the federal government has moved away from a more equitable playing field to one of promoting academic “stars” housed in institutions of “excellence,” which seems to be at odds with the goal of increasing the number of phd graduates. other types of funding, for example, by the universities themselves (e.g., through teaching assistantships) and faculty research grants, are not guaranteed and are unevenly available across disciplines. hence, some students may spend their entire doctoral careers with little or no financial support. several drivers for the need to re-imagine the phd can be found in both policy documents and in the academic literature, including lengthy time to completion, limited or uneven funding opportunities, disappointing completion rates, allegedly antiquated forms of assessment (i.e., the traditional doctoral dissertation), oversupply in some disciplines, demand for skilled workers – highly qualified personnel (hqp) – in a knowledge society, and a poor employment outlook within academia (elgar, 2003; institute for the public life of arts and ideas, 2013; tamburri, 2013).
however, the demand for highly qualified personnel could be argued to be the strongest driver of change. the policy headlights appear to be aimed most strongly at changes that will produce labour market-ready workers – in other words, professionalisation of the phd – who will be employed outside of the tenure track framework. however, the discourse around preparation for the labour force and related “skill” acquisition is rather messy and often contradictory. labour market-ready skills can include critical thinking, creativity, and effective communication skills. others believe that internships, professional development programs, and partnerships with businesses and industry external to the university are needed to expand the skill repertoires of phd students. one recent report that emerged out of a re-imagining exercise provided the following criticism: “rather than simply supplementing the student experience with additional opportunities, doctoral programs need to re-think their pedagogical aims and methods at the most fundamental level” (ubc graduate and postdoctoral studies, 2014). however, the absence of national quality assurance mechanisms beyond implicit checks and balances within and among universities (e.g., comprehensive examinations, examination of the dissertation by external assessors) creates challenges for re-imagining exercises. in terms of massification of the phd within the canadian context, it is paradoxical that (1) more phd graduates are required; (2) scholarships are awarded to a small proportion of phd students; and (3) there appears to be a glut of phds in terms of employability within academia. hence, re-imagining the phd will be a long and contentious process in canada.
time will tell whether re-imagining the phd as just-in-time training for the workforce can in any way successfully supplant previous educational ideals such as newman’s notion of education as an end in itself or humboldt’s conceptualization of bildung – that is, cultivation of the entire individual.

4.2. colombia

in the 1990s, with the introduction of law 30 (general law of education, 1994), colombia experienced the second highest increase among latin american countries, at 150%, in university (undergraduate and graduate) attendance. however, this increase lagged behind the mean achieved by oecd countries in the same period; only 6% of the population in colombia continued their studies and entered phd programs. in the 1960s and 1970s, the need to promote doctoral studies was identified, and one of the government’s first attempts to ameliorate the problem took place in 1968 with the creation of the national institute to promote science and technology (colciencias). additionally, to assure the quality of future phd candidates from that time onward, the national ministry of education (república de colombia, ministerio de educación nacional, 2010) invested large amounts of money to train colombian doctoral students abroad. however, as in other countries, such a mobility policy generated considerable “brain drain” and most students remained in their host countries because of better professional opportunities. massification of doctoral programs has occurred in many developed countries. however, this was not the case in colombia as national doctorate programs only began to appear some decades ago. thus, between 1986 and 1990, only nine programs were in existence; between 1997 and 2001 this increased to 14 doctoral programs (national council of accreditation, cna, 2010). during that time only 2% of university professors held doctoral degrees.
in 2001, for instance, only 26 individuals had completed doctoral degrees in colombian universities; that is, a very low rate of only four graduates per 1,000,000 people. the world bank (2003) predicted that globalization and economic growth policies would positively affect growth, professionalization, and the development of tertiary education in colombia during 2001 and would lead to a greater number of people graduating with phds. in 2008, around 100 people had graduated from doctoral programs. today, there are 92 doctoral programs in colombia officially reported by the national council of accreditation, cna (national system of innovation in higher education, 2008; unesco-ibe, 2011), with more in natural sciences and mathematics, social sciences, education, and humanities than in engineering, health sciences, and economics (jaramillo, 2009). of these, 52% of doctoral programs are offered by private institutions. hence, doctoral studies are still available only for a small elite and the low availability of doctoral programs in some areas has led some professionals to choose a doctorate not with the goal of mastering an area related to their own field, but only in order to gain access to good jobs. additionally, there is a lack of employment opportunities after graduation because funds provided by government for financing state universities and the opening of places for full-time faculty are not enough to meet national demand. to assure the quality of programs, the government has adopted strategies such as creating and designing regulations (curricular, administrative and academic) and regulatory institutions (cesu, snies, cndm, cna, icfes, among others). however, with so many institutions assigned to assure quality, overlap of functions has the potential to interfere negatively with the flow and development of doctoral programs which differ a great deal from one another (brunner, 2001).
all of the work undertaken regarding colombian doctorate education has led to gradual and positive academic development. however, a tension exists between promoting the creation of more doctorates and not addressing the parallel necessity of creating opportunities for employment of alumni. the other tension has to do with giving more importance to the regulation of programs than to the preparation of academic communities to develop new ways to teach and conduct research.

4.3. denmark

in response to the rapid increase in doctoral students at danish universities during the 1990s, denmark created its first graduate schools in 1996. the university act of 2007 required the establishment of graduate schools at all danish universities. the purpose of mandatory graduate schools was to enhance the quality of doctoral education, including optimizing completion rates and standardizing doctoral education across universities (danish ministry of higher education and science, 2014). with the finance act of 2005 and the globalization agreement of 2006, the danish government decided to double the annual enrollment rate of doctoral students from 1,200 in 2003 to 2,400 in 2010. since then, universities have maintained high enrollment rates and today around 2,400 doctoral students are enrolled annually (danish ministry of higher education & science 2015a). the development of doctoral education in denmark is part of a wider european trend of more closely aligning research and doctoral education at the local universities with national and international “policy making and regulation through qualifications framework, benchmarking and evaluation” (fortes, kehm, & mayekiso, 2014, p. 100).
as in most of the nordic countries, doctoral education in denmark has been reformed recently, “involving a clear trend towards programmed teaching (a more heavy reliance on generic phd courses for example,) and all of the countries are participating in the bologna process for the creation of ehea, the european higher education area” (gudmundsson, 2008, p. 86), which is a body “meant to ensure more comparable, compatible, and coherent systems of higher education in europe” (european higher education area, 2015). as fortes, kehm and mayekiso (2014) point out, the tendency towards increase in “quality assurance at the european level should not be underestimated” (p. 100), given that policy making at the european level strongly influences and informs national policies on doctoral education in denmark. fortes, kehm and mayekiso highlight that despite the fact that the locus of doctoral education and its curricular content is a national issue, the european commission “acts as a true policy entrepreneur” (p. 100) by specifying agendas and encouraging regulation at the european level. in denmark, the ministry urges universities to ensure that their doctoral programs promote interdisciplinary training and the development of transferable skills, thus meeting the needs of the wider employment market (gudmundsson, 2008, p. 77). however, at the same time the ministry states that “[o]verregulation of doctoral programs should be avoided” as doctoral education is seen as “a source for human capital for research but is also an extremely important part of the research itself” (p. 77). the danish ministry of higher education and science foregrounds the importance of the european qualifications framework (eqf) and the discourse of lifelong learning with the aim of aligning the quality and level of doctoral education internationally (danish ministry of higher education & science 2015b).
with the eqf, it is possible to compare educational systems, increase mobility across borders, and more fully internationalize danish universities. this can be said to increase competition among universities, which is seen in the benchmarking systems and the global ranking systems in relation to which the danish universities navigate. the eqf’s effect on doctoral education in denmark has been to promote formalised generic skills and competences within research, development, and teaching at universities. the goals advanced by the ministry focus on “better quality and better cohesion in higher education; even more quality and relevance in research; increased use and dissemination of knowledge and technology; improve[ment] of internationalisation of higher education, research and innovation; increased innovation in businesses, public institutions and higher education, and effective administration of education support and grants” (european commission, 2014). this development points to some potential tensions, including a dual focus on wider employment for the market and development of the deep research skills necessary for academic environments specifically, together with an increased focus on internationalization and mobility while attempting to build strong research environments at home universities in denmark. also, the dual goal of increasing training programs and support systems to anchor doctoral education more closely to the home institutional structure and the wish to enhance mobility and independence of individual doctoral students creates another tension.

4.4. finland

massification of doctoral education has been driven by the needs of a knowledge economy and national innovation policy and has been promoted systematically by the ministry of education and culture (mec), which provides the primary source of funding for the universities in finland. accordingly, between the 1990s and 2010 the number of doctoral degrees completed annually tripled.
currently, about 1,600 doctoral degrees are awarded annually. the number of degrees completed yearly is highest in medicine, natural, and technical sciences. although half of doctoral degrees are awarded to women, there are still some gendered disciplinary differences (auriol, misu & freeman, 2013; kota-national data base, 2009; puhakka & rautapuro, 2013). doctoral education has become more mainstream and at the same time researcher mobility has become increasingly important in national doctoral education policy. one result is an increased number of international doctoral students. to promote this inflow, the mec provides financial support to universities to attract international doctoral students earning their degrees in finland. however, the proportion of foreigners in doctoral training is still relatively low (14.8%). also, the outflow of finnish doctoral students is slightly higher than the inflow of international doctoral students studying in finland (garam, 2013). the need to provide a highly skilled workforce for labour markets and the need to improve the quality of doctoral education have led to increasing professionalization of doctoral education (niemi et al., 2011; the graduate school working group, 2012). this resulted in the introduction of more structured forms of doctoral education, that is, the launching of a doctoral school system funded by the academy of finland (finnish ministry of education, 1997). however, by 2010 only about 50% of the doctoral student population studied in these selected doctoral schools. in 2011, a national graduate school system reform was implemented that reversed this and as a result, most universities adopted a single graduate school model to support systematic doctoral education. now all doctoral students belong to a doctoral school in their university and to one of the university’s doctoral programs.
there are no tuition fees, but funding for doctoral students is not automatically provided by, for example, universities, projects, or foundations. as a result, some students receive little or no financial support. despite taking a stance towards a more structured system, doctoral studies are still highly research intensive rather than course centred (niemi et al., 2011). to promote the attractiveness and predictability of researcher careers, a four-stage researcher career model (the first stage being completion of the doctoral degree, followed by a 2-5 year postdoctoral fellowship that paves the way for becoming an independent researcher, and finally professorships and research directorships in the final stage) has been introduced (academy of finland, 2010). also, a tenure track system that aims to promote the shift between stages three and four has been introduced. the employment rate of doctoral degree holders is extremely high, at 97.6% (treuthardt & nuutinen, 2012), and the majority (about 80%) work at universities or research institutions in finland (sainio, 2010; the graduate school working group, 2012). this may explain why, despite the emphasis on learning transferable skills in doctoral education policy documents (academy of finland, 2010; oecd, 2012), efforts to ensure and support work/life relevance have remained somewhat minor at universities (niemi et al., 2011). the bologna process and adaptation to the european qualifications framework (eqf) to increase the potential to promote international mobility and to facilitate equal participation in european doctoral programs (berlin communiqué, 2003; bucharest communiqué, 2012; european commission, 2014) have resulted in the enhancement of quality assurance in finnish doctoral education (the graduate school working group, 2012) and engagement in international benchmarking and global ranking systems.
quality assurance developments have included setting the target doctoral completion time at four years of full-time study; however, time to graduation has remained almost unchanged at six to seven years (sainio, 2010). another development is the launch of the finnish higher education evaluation council, which carries out audits of the quality systems of universities and assists universities in thematic and research evaluations, including doctoral education.

4.5. united kingdom

even before its inclusion in the bologna qualifications framework, uk doctoral education had emerged as an area of some interest to policy makers. this phenomenon can be related to the growing significance attached to the knowledge economy and to doctoral education as a training ground for professional researchers, both within and outside of the academy. although the data presented by the uk’s higher education statistics agency (hesa) on students and qualifiers (higher education statistics agency, n.d.) suggest that the number of doctoral graduates in the uk has tripled from 7,000 in 1994-5 to 22,000 in 2012-13, early concerns emerged during this period (e.g. harris, 1996; national committee of inquiry into higher education, 1997) about whether doctoral education was producing the highly skilled knowledge workers required by the knowledge economy, particularly in science, technology, engineering, mathematics, and medicine, that is, the so-called stemm subjects. doctoral education was considered to be overspecialised and not providing training in generic skills relevant to industry and commerce.
in addition to questions about whether its assessment mode (a doctoral thesis judged in a viva voce examination) was not only fair (morley 2004; morley et al., 2002) but also appropriate (park, 2007), given the wider range of skills acquisition expected within the doctorate, other concerns included low completion rates and lengthy completion times, low numbers entering stemm subjects, and gender biases in these disciplines (harris, 1996; institute of employment research, 2003). a key concern during this period has therefore been to intensify quality assurance of doctoral education. the quality assurance agency for higher education (qaa, 2004) introduced national guidelines regarding the frequency of doctoral supervision meetings, who can be a doctoral supervisor, the monitoring of student progress (overlapping uncomfortably with immigration-related monitoring of international students), and use of completion rates as a quality assurance measure. new institutional roles and practices (e.g., specialist consultants, specialized software for institutional monitoring of doctoral education, and new academic specialisations such as doctoral pedagogy) have evolved in response to these regulatory demands. uk doctoral education has also seen a strong emphasis on researcher training, framed in a discourse of individual skills and competences. a review by roberts (2002) was largely prompted by concerns about the supply of scientists and engineers and found that the phd provided “inadequate training – particularly in the more transferable skills” (p. 10). having been constituted in 2005 to evaluate the impact of the “roberts” funding stream that was then created to support such training, the sector working group on the evaluation of skills development of early career researchers, known as the “rugby team,” also promulgated the concept of “early career researcher” (ecr), defined as encompassing the first 10 years of a researcher’s postgraduate career (rugby team, 2006).
their work also informed the constitution of vitae, a nationally funded body that promotes but also shapes uk researcher training through instruments such as its “researcher development framework” (rdf) (vitae, 2010), a text that continues to reflect the language of skills and competences. vitae is now promoting its rdf to european audiences and more widely, projecting the uk as a leader in doctoral education provision. maintaining a high level of international postgraduate admissions (currently around one third of the annual intake) is a further important priority for heis (universities uk, 2014). uk research council support for doctoral research has also become more focused. whereas in the past, applicants from a wide range of universities could apply for doctoral studentships, these are now awarded through a national network of “doctoral training centres (dtcs),” accredited by the research councils to award “mres” degrees (a structured master’s degree devoted to research methods). in the social sciences, there are only 21 dtcs, so many universities (particularly “newer” universities) are excluded from accessing these studentships. this raises potential equity questions which require further research, as does the intensification of a research “training” agenda that aspires to incorporate a wider range of skills, but within a timeframe whose boundaries are more firmly regulated.

4.6. united states

with greater numbers pursuing doctorates than ever before, the notion of a traditional research phd is expanding. the federal survey of earned doctorates (sed) reported that there were 52,760 earned research doctorates (phds) awarded from 421 doctoral granting institutions in 2013. this represents a 3.5% increase from 2012; in 2012 the rate had increased 4.2% from the previous year. fifty-eight percent of earned doctorates were in science and engineering, with the remainder being in the social sciences, humanities, and education (national science foundation, 2014).
with these increases, fields such as the humanities continue to produce more doctorates than can be absorbed by available research careers (june, 2014; lederman, 2014). also, these figures mask the growth of professional or practice doctorates, including the edd (education, including educational administration), psyd (psychology), or dm (management). this double growth in doctorates exemplifies a massification of the credential, typified in disciplinary areas that require individuals to have doctoral credentials. this suggests an inflation of educational requirements with questionable value or unjustified educational costs, commonly without their mapping on to a societal or personal return on investment. although most formal educational institutions expect their researchers to have earned phds, this is not universally mandated. disciplinary bodies are beginning to acknowledge that the status quo of research doctorates solely for the purpose of preparing learners to continue on to academic rather than non-academic careers is problematic (neem, 2014). for example, the american historical association is seeking to broaden career options for those who will not be able to obtain academic positions; academic positions will eventually be one of only several potential career opportunities or directions (grafton & grossman, 2011; jaschik, 2014). the 2014 report of the modern language association has as its first recommendation the need to redesign doctoral programs away from only academic careers. the goal of the mla is to “align [careers] with the learning needs and career goals of current and future students and to bring degree requirements in line with the ever evolving character of our fields” (mla task force on doctoral study in modern language and literature, 2014, p. 13). this is increasingly addressed by university career placement offices that help research students find positions outside academia (patel, 2015).
lacking a central oversight body, doctoral regulations regarding program content, degree specifics, and university requirements are guided by 37,000 combinations of institutional, disciplinary, state, or national accreditation criteria (u.s. department of education office of postsecondary education (ope), n.d.). owing to the number of disciplinary certification bodies and proprietary information among programs, it is difficult at best to try to compare data across programs and degrees to determine successful outcomes, speak to activities of early career researchers, or even track career paths (national science foundation, n.d.; sinche, 2014). with ambiguous quality assurance, it should not be surprising that there is nearly a 50% rate of doctoral attrition, including those in a limbo of decade-long abd (all but dissertation / defended) student status (yesko, 2014). given that less than 30% of u.s. faculty now work with tenure or are full-time on a tenure track (mla task force on doctoral study in modern language and literature, 2014), the growing population of casual and adjunct instructors, specifically those with doctoral degrees, will further invite investigation over educational quality. endemic challenges of fairness in pay and labour related to the increase of faculty in temporary or contract positions result in time spent ensuring future teaching contracts rather than engaging in research or university / disciplinary service. given the pragmatic nature of american doctoral training, current efforts focused on saving money through defunding education while eliminating full-time permanent faculty by relying increasingly on contingent labour point to a challenging future.

5. discussion

from the individual country cases we have revealed three main issues: (1) what is happening on the ground, (2) the consequences of an increased formalization of doctoral education, and (3) the global-local nexus.

5.1. what is happening on the ground?
in keeping with our recognition of the necessary recontextualisation of any policy narrative, our comparative study points to the need to examine more fully “what is happening on the ground” in order to understand more adequately how the different global trends play out in the institutional environments in specific countries. our comparative analysis demonstrates that the links between global (international) and local (national, institutional) levels of doctoral education are not similar across countries. even though the countries considered in this paper do subscribe to the same global trends on the policy level, there are many differences on the national and institutional levels. this, we argue, makes comparisons among systems of doctoral education at the global level difficult and fraught with uncertainties and potential inequalities. more research should be undertaken into unlocking the potential for understanding more fully the diverse and complex nature of doctoral educational practice worldwide. to fully understand the character and consequences of global trends within doctoral education, one needs to take into account the level of integration that always takes place at the local level. not only do countries differ when it comes to interpreting and understanding the meaning and relevance of global drivers such as massification, professionalization, and quality assurance within doctoral education, but individual institutions (universities) also face the task of integrating the global drivers into their own specific educational contexts and frameworks.

5.2. formalization of doctoral education

as seen across the different country cases, even though there is a tendency to increase the numbers of doctoral programs, at the same time the aim is to consolidate them within doctoral “schools” and to enlarge the size of graduate schools within their institutions – thereby also increasing the level of formal training expected within doctoral education.
the first issue relates back to the global trends of massification and quality assurance, while the latter issue is linked to the global trend of professionalization of doctoral education. as visible in the country cases, as the number of doctoral students has increased over the years, this has been met with the response of structuring doctoral education more “tightly” organisationally and demanding more formal procedures for how to develop and evaluate the work done by both doctoral students and their supervisors. this has been described as the development of a generic doctoral curriculum (green 2009) and a “transdisciplinary doctorate” (willetts, mitchell, abeysuriya, & fam, 2012) which is promoted in order to ensure educational relevance for the job market and to safeguard the quality of doctoral education globally. the aim of foregrounding and developing the generic dimension of the phd across disciplines creates tension in relation to the desire to at the same time strengthen research environments at the disciplinary level, to maintain the strong disciplinary focus of the phd, and to resist its over-regulation (gudmundsson 2008).

5.3. the global-local nexus

despite presenting country cases at the level of global drivers of doctoral education, we are aware that even if similarities exist at the level of policy, how these policies play out at local levels will always involve a process of recontextualization (bernstein, 2001). during our discussion of the meaning of the global drivers seen from individual national perspectives, it becomes apparent that although some of the same discourses and semantics are being used across different countries, the national, or local, meanings vary greatly.
in addition to the shifts which recontextualisation necessarily involves, other factors which come into play include the size of the universities, the variation of gender, ethnicity, and age in student population, and the underlying political-economic conditions in each country. in a similar vein, teichler (2004) has pointed to the fact that “nations and strategic policies of national governments continue to play a major role in setting the frames for international communication, cooperation and mobility as well as for international competition. therefore, the frequent use of the term ‘globalization’ might be based on misunderstandings” (p. 21). for example, this is specifically seen in the variation across countries regarding the meaning and management of “massification.” in some countries, massification seems to imply that the specific country “opens up the gate” with the simple aim of increasing the total number of people with phd credentials, as in the cases of colombia, denmark and finland. however, we see in the cases of canada, the usa and the uk that massification is also about generating hierarchies within the doctoral system itself – creating a difference between the so-called elite and super-scholarship holders and the rest, thus pointing to equity issues within the phd system, which need further scrutiny. this calls for further research into what we call the “global-local nexus” of doctoral education. this nexus can be seen in several of the country cases where goals of increased internationalization of the doctorate and enhancing mobility among universities on a global scale stand alongside goals of strengthening research environments at the home institutions and the desire to allocate resources to enhance doctoral learning environments. also, this affects the very nature of the phd degree.
although the phd originated as a universal degree with universal credentials, the increasing focus on internationalization and mobility paradoxically makes visible how diverse, complex, and in some cases incomparable, the degree has become. promotion of doctoral student mobility and concomitant alignment of different research programs and structures of different doctoral schools have become exceedingly difficult and have the potential to create many problems and unwanted strain for individual doctoral students and universities alike. this calls for further discussion about whether the phd degree is still really a universal degree or if it has transformed into a culturally and regionally contextual educational phenomenon. notwithstanding these distinctions, national and local priorities are not always aligned, and the breadth of the doctoral experiences covered here, primarily those involving research doctorates, does not always transfer to professional doctorates, which in some national contexts may often focus on more local priorities.

the point emerging from this paper is that understandings of global and local levels of doctoral education are deeply linked, as global drivers saturate local doctoral education and supervision practice. a more in-depth understanding is needed of how this is played out at local institutional levels and also of if and how these local practices relate back to global and political levels of doctoral education. more specifically, further research about the following is required:
a) how global trends, drivers, and strategies for doctoral education play out in local national settings and how such global drivers are integrated locally in specific teaching and learning environments at specific universities;
b) more awareness and discussion about the “universality” of the phd degree.
in an era where mobility regarding doctoral education policy is on the agenda, more attention should be given to what is actually possible to transfer across national arenas;
c) what possibilities and challenges the infrastructures of graduate schools bring with them in relation to doctoral education. we need to examine the everyday workings of graduate schools to learn more about what forms of organisation are at work within the broader higher education system.

this paper focused on six national sets of policies regarding research doctorates. it was beyond the scope of this paper to address the varied complexities of doctoral study in other nations. without attending to all national and international trends, including those in australia, new zealand, and the asian and african regions, the scope of this study is necessarily limited. we hope that our attempt to initiate these discussions will serve the purpose of highlighting what can only be thought of as an expanding area of study.

table 1
summary of recent changes of doctoral education in canada, colombia, denmark, finland, uk and usa
columns: country; drivers (massification, professionalization, quality assurance)

canada
• aim to increase number of phds
• poor employment outlook within academia
• funding for elite students and institutions
• structuring doctoral fellowships to be in line with economic and social trends
• emphasis on labor market ready skill acquisition, internships, and partnerships with industry/business

colombia
• investment in educating phds abroad to increase number of degree holders
• rapid increase in the number of doctoral programs providing degrees
• 150% increase in number of phds
• launching doctoral programs
• increase in regulation and regulatory institutions
• defunding education
• cutting permanent places for faculty

denmark
• doubling the annual enrollment of doctoral students
• emphasizing learning of generic skills and interdisciplinarity
• investing in developing teaching at the university
• harmonizing doctoral degree according to european standards (i.e. third cycle of bologna process)
• launching graduate schools
• adopting benchmarking and ranking systems

finland
• increase in number of doctoral degree holders
• awarding universities for attracting international students completing the phds
• launching doctoral schools and programs
• harmonizing doctoral degrees according to european standards (adopting bologna qualifications)
• introducing four stage researcher career model and tenure track system
• launching international doctoral programs
• adopting benchmarking and international evaluation systems

uk
• increase in number of doctoral students
• development of professional doctorates
• development of legislation and charters to address inequalities related to gender and race
• emphasis on training in skills and competences
• providing national funding for generic skills training
• more structured preparation for phd entry degree
• contract researcher career system
• awarding national funding for phd scholarships through networks of accredited doctoral training centres
• stronger regulation of a host of doctoral education issues

usa
• increase in both professional and research doctorates
• aim to increase number of phds amongst african american and hispanic learners
• introducing professional doctorate degrees
• emphasizing labor market ready skills also in the training of research doctorates
• defunding education
• cutting permanent faculty
• increasing contingent labor force

keypoints
the national, or local, meanings of doctoral education vary greatly.
we question whether the phd degree is a universal degree or if it has transformed into a culturally and regionally contextual educational phenomenon.
comparison among systems of doctoral education on the global level is difficult and fraught with uncertainties and potential inequalities.

references
academy of finland. (2010).
get ahead in our career. get a doctorate. retrieved from http://aka.smartpage.fi/en/doctors_career/.
andras, p. (2011). research: metrics, quality, and management implications. research evaluation, 20(2), 90–106. doi: 10.3152/095820211x12941371876265.
auriol, l., misu, m., & freeman, r. a. (2013). careers of doctorate holders: analysis of labour market and mobility indicators. oecd science, technology and industry working papers, 2013/04. oecd publishing. doi: 10.1787/5k43nxgs289w-en.
auriol, l., schaaper, m., & felix, b. (2012). mapping careers and mobility of doctorate holders: draft guidelines, model questionnaire and indicators – third edition. oecd science, technology and industry working papers, 2012/07. oecd publishing. doi: 10.1787/5k4dnq2h4n5c-en.
berlin communiqué. (2003). realising the european higher education area. conference of ministers responsible for higher education in 33 european countries (september).
bernstein, s. (2001). the compromise of liberal environmentalism. new york: columbia university press.
brunner, j. j. (2001). globalización y el futuro de la educación: tendencias, desafíos, estrategias. análisis de prospectivas de la educación en américa latina y el caribe. santiago de chile: unesco. retrieved from http://www.rmm.mineduc.cl/usuarios/jvill1/file/futuroedunesco.pdf.
bucharest communiqué. (2012). making the most of our potential: consolidating the european higher education area. retrieved from http://www.ehea.info/uploads/%281%29/bucharest%20communique%202012%281%29.pdf.
buela-casal, g., gutiérrez-martínez, o., bermúdez-sánchez, m. p., & vadillo-muñoz, o. (2007). comparative study of international academic rankings of universities. scientometrics, 71(3), 349–365. doi: 10.1007/s11192-007-1653-8.
conference board of canada. (2014). how canada performs. retrieved from http://www.conferenceboard.ca/hcp/provincial/education/phd.aspx.
danish ministry of higher education and science (2014). official website.
retrieved from http://ufm.dk/en?set_language=en&cl=en
danish ministry of higher education and science (2015a). official website. retrieved from http://ufm.dk/uddannelse-og-institutioner/videregaende-uddannelse/universiteter/ph-duddannelse/ph-d-skoler.
danish ministry of higher education and science (2015b). official website. retrieved from http://ufm.dk/uddannelse-og-institutioner/anerkendelse-ogdokumentation/dokumentation/kvalifikationsrammer/europaeisk-kvalifikationsramme-eqf.
department for education and skills (dfes). (2003). the future of higher education. london: hmso.
edler, j., georghiou, l., blind, k., & uyarra, e. (2012). evaluating the demand side: new challenges for evaluation. research evaluation, 21, 33–47. doi: 10.1093/reseval/rvr002.
elgar, f. j. (2003). phd degree completion in canadian universities: final report. halifax: graduate students association of canada. retrieved from http://www.researchgate.net/profile/frank_elgar/publication/236595361_phd_degree_completion_in_canadian_universities_final_report/links/02e7e5182b86db33e0000000.pdf.
eua (european university association). (2009). collaborative doctoral education: university-industry partnerships for enhancing knowledge exchange. eua, brussels. retrieved from http://www.eua.be/eua-work-and-policy-area/research-and-innovation/doctoral-education/doc-careers/.
eua (european university association). (2010). salzburg ii recommendations: european universities’ achievements since 2005 in implementing the salzburg principles. eua, brussels. retrieved from http://www.eua.be/news/10-1028/eua_publishes_recommendations_for_continued_reform_of_doctoral_education.aspx.
european commission. (2014). european research area progress report. brussels, european commission. retrieved from http://ec.europa.eu/research/era/eraprogress_en.htm
european higher education area. (2015). official website. retrieved from http://www.ehea.info.
fernandez-zubieta, a. & guy, k. (2010).
developing the european research area: improving knowledge flows via researcher mobility. jrc scientific and technical reports. european commission. retrieved from http://erawatch.jrc.ec.europa.eu/erawatch/opencms/information/reports/countries/eu/report_0020.
finnish ministry of education (1997). tutkijakoulut suomessa 1995–1998. tutkijakouluissa annettavan opetuksen ja ohjauksen laadun arviointi [doctoral schools in finland 1995–1998.]. opetusministeriö, koulutus- ja tiedepolitiikan osasto.
fiske, p. (2011). what is a phd really worth? nature, 472, 381. doi: 10.1038/nj7343-381a.
fortes, m., kehm, b. m., & mayekiso, t. (2014). evaluation and quality management in europe, mexico, and south africa. in m. nerad & b. evans (eds.), globalization and its impacts on the quality of phd education (pp. 81–110). springer: rotterdam.
frank, r. h. (1999). higher education: the ultimate winner-take-all market? ithaca, ny. retrieved from http://digitalcommons.ilr.cornell.edu/cheri/2/.
garam, i. (2013). kansainvälinen liikkuvuus yliopistoissa ja ammattikorkeakouluissa 2013. tietoja ja tilastoja -raportti 2/2014. centre for international mobility cimo. retrieved from http://www.cimo.fi/instancedata/prime_product_julkaisu/cimo/embeds/cimowwwstructure/32368_tietoa_ja_tilastoja-raportti_2_2014.pdf.
general law of education. (1994). [law 115, 1994]. do: 41.214. colombian congress.
geuna, a., & martin, b. (2003). university research evaluation and funding: an international comparison. minerva, 41(4), 277–304. doi: 10.1023/b:mine.0000005155.70870.bd.
gilbert, r., balatti, j., turner, p., & whitehouse, h. (2004). the generic skills debate in research higher degrees. higher education research & development, 23(3), 375–388. doi: 10.1080/0729436042000235454
grafton, a. t., & grossman, j. (2011). no more plan b: a very modest proposal for graduate programs in history. historians.org.
retrieved from http://historians.org/publications-and-directories/perspectiveson-history/october-2011/no-more-plan-b.
green, b. (2009). challenging perspectives, challenging practices. doctoral education in transition. in boud, d. & lee, a. (eds.). changing practices of doctoral education. london & new york: routledge.
gudmundsson, h. k. (2008). nordic countries. in nerad, m., & heggelund, m. (eds.). toward a global phd? forces & forms in doctoral education worldwide. seattle & london: university of washington press.
harris, m. (1996). review of postgraduate education. report for higher educational funding council for england, committee of vice chancellors and principals, and standing conference of principals (bristol).
higher education statistics agency (n.d.). retrieved from https://www.hesa.ac.uk/component/datatables/. accessed 29 april 2015.
hsieh, h.-f., & shannon, s. e. (2005). three approaches to qualitative content analysis. qualitative health research, 15(9), 1277–1288. doi: 10.1177/1049732305276687.
institute for employment research. (2003). bulletin: women in science, engineering and technology. university of warwick: warwick.
institute for the public life of arts and ideas. (2013). white paper on the future of the phd in the humanities. montreal: mcgill university.
jaramillo s., h. (2009). la formación de posgrado en colombia. revista iberoamericana de ciencia tecnología y sociedad, 5(13), 131–155. retrieved from http://www.scielo.org.ar/scielo.php?script=sci_arttext&pid=s185000132009000200008&lng=es&nrm=iso. issn 1850-0013.
jaschik, s. (2014). a broader history ph.d. inside higher ed. retrieved from https://www.insidehighered.com/news/2014/03/20/historians-association-and-four-doctoral-programsstart-new-effort-broaden-phd.
june, a. w. (2014). doctoral degrees increased last year, but career opportunities remained bleak. the chronicle of higher education.
retrieved from http://chronicle.com/article/doctoral-degreesincreased/150421/
kehm, b. m. (2006). doctoral education in europe and north america. a comparative analysis. in u. teichler (ed.), the formative years of scholars. wenner-gren international series, vol. 83 (pp. 67–78). london: portland press.
kota-national data base. (2009). retrieved from https://kotaplus.csc.fi/online/transfer.do.
lederman, d. (2014). doctorates up, career prospects not. inside higher ed. retrieved from https://www.insidehighered.com/news/2014/12/08/number-phds-awarded-climbs-recipients-jobprospects-dropping.
mla task force on doctoral study in modern language and literature. (2014). report of the mla task force on doctoral study in modern language and literature, 1–41.
moguérou, p., & di pietrogiacomo, m. p. (2008). stock, career and mobility of researchers in the eu. jrc scientific and technical reports. european commission. retrieved from http://erawatch.jrc.ec.europa.eu/erawatch/opencms/information/reports/countries/eu/report_mig_0011
morley, l. (2004). interrogating doctoral assessment. international journal of educational research, 41(2), 91–97.
morley, l., leonard, d., & david, m. (2002). variations in vivas: quality and equality in british phd assessments. studies in higher education, 27(3), 263–273. doi: 10.1080/03075070220000653.
national committee of inquiry into higher education (the dearing report). (1997). higher education in the learning society. hmso: london.
national science foundation (nsf). (n.d.). early career doctorates project (forthcoming). nsf.gov. retrieved from http://www.nsf.gov/statistics/srvyecd/. accessed 1 november 2014.
national science foundation. (2014). doctorate recipients from u.s. universities 2012: survey of earned doctorates. national center for science and engineering statistics. retrieved from http://www.nsf.gov/statistics/srvydoctorates/.
neem, j. (2014). ministers, not m.b.a.s. inside higher ed.
retrieved from https://www.insidehighered.com/views/2014/10/03/humanities-phd-calling-not-vocational-trainingessay.
niemi, h., aittola, h., harmaakorpi, v., lassila, o., svärd, s., ylikarjula, j., hiltunen, k., & talvinen, k. (2011). tohtorikoulutuksen rakenteet muutoksessa: tohtorikoulutuksen kansallinen seuranta-arviointi [changing structures of doctoral education: national follow-up evaluation of doctoral education]. korkeakoulujen arviointineuvoston julkaisuja, 15.
oecd (2010). the oecd innovation strategy: getting a head start on tomorrow. paris: oecd.
oecd (2012). transferable skills training for researchers: supporting career development and research. oecd publishing. doi: 10.1787/9789264179721-en.
oecd (2014). education at a glance 2014: oecd indicators. oecd publishing. doi: 10.1787/eag-2014-en.
park, c. (2007). redefining the doctorate. higher education academy, york. retrieved from http://eprints.lancs.ac.uk/435/1/redefiningthedoctorate.pdf.
patel, v. (2015). new job on campus: expanding ph.d. career options. the chronicle of higher education. retrieved from http://chronicle.com/article/new-job-on-campusexpanding/151105/?key=sg16ivrgnicwm3bnzglanjhroh1snbwjanfjbcgkbllxfg%3d%3d
puhakka, a., & rautopuro, j. (2013). sumusta nousee riski – tieteentekijöiden liiton jäsenkysely [the finnish union of university researchers and teachers membership survey]. joensuu: grano. retrieved from http://tieteentekijoidenliitto.fi/materiaali/jasenkyselyraportit.
quality assurance agency for higher education (qaa). (2004). code of practice for the assurance of academic quality and standards in higher education. section 1: postgraduate research programmes. gloucester: qaa.
república de colombia, ministerio de educación nacional (men) (2010). consejo nacional de acreditación (cna). lineamientos para la acreditación de alta calidad de programas de maestría y doctorado. 2010, bogotá.
retrieved from http://cmsstatic.colombiaaprende.edu.co/cache/binaries/articles-186363_lineam_myd.pdf?binary_rand=7259.
ritzen, j., & marconi, g. (2011). internationalization in european higher education. international journal of innovation science, 3(2), 83–100. doi: 10.1260/1757-2223.3.2.83.
roberts, g. (2002). set for success. final report of sir gareth roberts review. london: hmso.
rugby team. (2006). evaluation of skills development of early career researchers: a strategy paper from the rugby team. uk grad programme roberts policy forum. birmingham: uk grad.
sainio, j. (2010). asiantuntijana työmarkkinoille. vuosina 2006 ja 2007 tohtorin tutkinnon suorittaneiden työllistyminen ja heidän mielipiteitään tohtorikoulutuksesta [as an expert to the labour market. employment of those receiving a doctorate in 2006 and 2007, and their opinions about doctoral education]. aarresaaren julkaisusarja. retrieved from https://www.aarresaari.net/uraseuranta/julkaisut.
shin, j. c. (2010). impacts of performance-based accountability on institutional performance in us. higher education, 60(1), 47–68. doi: 10.1007/s10734-009-9285-y.
shin, j. c., & jung, j. (2014). academic job satisfaction and job stress across countries in changing academic environments. higher education, 67, 603–620. doi: 10.1007/s10734-013-9668-y.
sinche, m. (2014). tracking ph.d. career paths. inside higher ed. retrieved from https://www.insidehighered.com/advice/2014/10/27/essay-importance-tracking-phd-career-paths.
tamburri, r. (2013). the phd is in need of revision. university affairs. retrieved from http://www.universityaffairs.ca/features/feature-article/the-phd-is-in-need-of-revision/.
teichler, u. (2004). the changing debate on internationalisation of higher education. higher education, 48(1), 5–26. doi: 10.1023/b:high.0000033771.69078.41.
the graduate school working group. (2012). towards quality, transparency and predictability in doctoral training.
the graduate school working group’s suggestions for doctoral training development. academy of finland. retrieved from http://www.aka.fi/en-gb/a/academy-offinland/academy-publications/other-publications/
the world bank. (2003). colombian tertiary education in the context of reform in latin america. colombia: tertiary education paving the way for reform vol. 1. policy briefing. the world bank, report no. 23935-co.
treuthardt, l., & nuutinen, a. (2012). the state of scientific research in finland 2012. publications of academy of finland 7/2012. retrieved from http://www.aka.fi/en-gb/a/decisions-and-impacts/thestate-of-scientific-research-in-finland/previous-reviews/the-state-of-scientific-research-in-finland20121/
ubc graduate and postdoctoral studies. (2014). re-imagining the phd: new forms and futures for graduate education. symposium summary report. vancouver: university of british columbia. retrieved from http://www.ligi.ubc.ca/sites/liu/files/news/ubcgraduatesymposiumreport2014.pdf.
uk council for science and technology. (2007). pathways to the future: the early career of researchers in the uk. cst, london.
unesco-ibe. (2011). world data on education vii ed. 2010/11. retrieved from http://www.ibe.unesco.org //.
universities uk. (2014). international students in higher education: the uk and its competition. london: uuk.
u.s. department of education office of postsecondary education (ope). (n.d.). the database of accredited postsecondary institutions and programs. retrieved from http://ope.ed.gov/accreditation. accessed 19 october 2014.
vitae. (2010). researcher development statement. cambridge: vitae and crac.
willetts, j., mitchell, c., abeysuriya, k., & fam, d. (2012). creative tensions. negotiating the multiple dimensions of a transdisciplinary doctorate. in lee, a. & danby, s. (eds.). reshaping doctoral education. international approaches and pedagogies. london & new york: routledge.
yesko, j. (2014). an alternative to abd.
inside higher ed. retrieved from https://www.insidehighered.com/advice/2014/07/25/higher-ed-should-create-alternative-abd-statusessay
yin, r. k. (2012). applications of case study research (3rd ed.). thousand oaks, ca: sage.

frontline learning research vol. 4 no. 4 special issue (2016) 1–6
issn 2295-3159
doi: http://dx.doi.org/10.14786/flr.v4i4.320

expanding conceptualizations for the study of learning

antti rajala (a), giuseppe ritella (a), kristiina kumpulainen (a), & louise wilkinson (b)
(a) university of helsinki, finland
(b) syracuse university, usa

1. the background and need for this special issue

this special issue is dedicated to expanding conceptualizations for the study of learning in contemporary education. ongoing social changes in the private, public, and economic spheres create new and various demands for learning and education. as the articles of this special issue propose, reasoning, critical thinking, imagination, and managing emotions in dealing with controversial issues have become increasingly important learning requirements in the pursuit of interests toward learning across diverse settings. also, the ways in which people learn to take part in such practices in contemporary contexts, both in formal education and everyday life, are shifting. for instance, novel kinds of digital tools create continuously evolving spaces for learning that transform social interactions and learning practices across contexts and time (ritella, ligorio, & hakkarainen, 2016). overall, ongoing changes in society, and the learning requirements they entail, challenge research communities to reconsider how to understand and advance learning in diverse settings. it is also increasingly recognised that there is a need to apply and further develop conceptual and methodological frameworks that are able to account for the complexity of learning in contemporary societal conditions (kumpulainen & erstad, 2016).
the five articles included here address several important and under-researched topics in contemporary learning and education. each article also proposes and elaborates on potential conceptual frameworks for expanding conceptualizations for the study of learning in the 21st century. before introducing the articles, we will describe the impetus for the publication of the special issue. after that we describe each of the contributions including their specific research topics and conceptual frameworks. we then outline and discuss some cross-cutting themes emerging from our reading of the articles and conclude by pointing out some key arguments made by the commentators to further the ongoing dialogue and research in the field.

2. the impetus for publishing the special issue

the impetus for this publication was a symposium, “evolving theoretical frameworks for studying collaboration in diverse 21st century learning contexts,” presented at the meeting of the earli special interest groups 10, 21, and 25 in padova, italy on august 27‒30, 2014. the conference, “open spaces for interaction and learning diversities,” was sponsored by earli and the university of padova. the original call for proposals emphasized the need to address the challenges that global movements and cross-cultural communication continue to pose for learning and education. the meeting was the joint effort of three special interest groups: sig 10, which represents researchers who study aspects of the field of social interaction in learning and instruction; sig 21 with an emphasis on learning and teaching in culturally diverse settings; and sig 25 with a focus on educational theory. in particular, the business meeting of sig 25 (educational theory) encouraged us to publish a collection that would explicitly address the issue of reconceptualizing learning.
subsequent to the earli symposium, we decided to propose a special issue of frontline learning research that would capture and extend the discussion. additional articles were commissioned and this special issue represents the culmination of this process. as guest editors, it was our aspiration from the beginning to both extend and catalyze further discussions about the future of research on learning and education in changing societies.

3. descriptions of the individual articles and commentaries

the special issue consists of five articles and two commentaries. the first article by tsafrir goldberg and baruch schwarz asserts that scant attention has been paid to the role of emotions in students’ reasoning. contemporary society is also characterized by tensions and social conflicts related to cultural and religious diversity, which carry interpretations of historical facts that are heavily loaded with emotions. goldberg and schwarz address these tensions in history education by introducing a framework for studying the role of emotions and identity for deliberative argumentation. the article provides an overall conceptual framework for the centrality of emotion in cognitive processes such as reasoning and argumentation, and it reports the results of an analysis of students’ responses to three different approaches to teaching history: (1) a conventional authoritative approach involving patriotic apologetic teaching, (2) an empathetic dual-narrative approach involving nonjudgmental listening to collective narratives and identifying with emotions and values, and (3) a critical disciplinary inquiry approach involving the critical analysis and synthesis of conflicting sources.
the article reports a study of peer discussions between jewish and arab students regarding the 1948 “war of independence.” the approaches resulted in different processes and outcomes where learners invoked identity, emotion, and perspective-taking; students became aware of their own biases, which limited, or eliminated in some cases, productive discussions. goldberg and schwarz propose that teachers consider ways of engaging students’ positive and negative emotions and construct authentic learning activities; in this way, students may harness their emotions and engage in critical classroom discussions of emotionally-charged topics in content areas such as history.

the article by jaakko hilppö, antti rajala, tania zittoun, kristiina kumpulainen, and lasse lipponen addresses the role of imagination in learning, an aspect of the cognitive process that has received scant attention in prior research. the article proposes an overall conceptual framework of the centrality of imagination in learning, and it reports the findings from a case analysis of finnish primary-school students in a science classroom. imagination, from their point of view, is characterized by a partial and temporary separation from the immediate, proximal experience of the social and material world into distal experiences, which are ultimately connected back to the present experience. the analysis of the single case illustrates several aspects of imagination as identified by their conceptual framework. these include encouraging students to break the constraints of time and space that are often superimposed on the teaching process in classrooms both in europe and in the u.s. their discussion illustrates how students make sense of scientific phenomena and how that sensemaking expands both in time and in space.
throughout this process, the students’ thinking becomes more refined and differentiated through what they call “loops of imagination,” referring to the back-and-forth movement between proximal and distal experiences. finally, they acknowledge that while the analysis of the single case is instructive, further research needs to be conducted to determine how the conceptual framework applies to varied instances of imagination.

the article by william penuel, daniela digiacomo, katie van horne, and ben kirshner argues that although equity-oriented efforts of expanding young people’s access and learning opportunities in science, technology, engineering, and math (stem) are laudable, to date, too little attention has been paid to gaining a more nuanced understanding of what it entails to develop and sustain an interest-driven stem-learning pathway across settings and over time among diverse youth. to overcome this limitation, the authors propose social-practice theory as a prominent theoretical lens to guide the study of learning pathways across varied contexts and over time. they contend that using social-practice theory in the analysis of youth learning is potentially transformative as it unpacks potential leverage points for transforming systems to enable broader participation in stem. in order to justify their argument, the authors apply social-practice theory to interpret the learning pathway of one adolescent, jerome, whom they followed as part of a longitudinal study of interest-related stem learning. in their analysis of jerome, the authors demonstrate how he pursued diverse concerns and became aware of new possibilities for action as he moved across different settings of practice and learned to adjust his contributions to the flow of ongoing activity to fit demands and structures of local institutions.
the analysis also powerfully illustrates how institutional structures of practice framed the choices jerome made about his participation, learning, and becoming in relation to stem.

the article by crina damsa and alfredo jornet shows how an ecological perspective can be used to redefine key concepts of research on learning in higher education. in the article, learning is conceptualized as an achievement of whole ecosystems and involves mutually transformative transactions between people and their sociomaterial environments. although the article builds on existing sociocultural, situative, and sociomaterial approaches to learning, it contributes new insights into conceptualizing learning by discussing the implications of the ecological premises underlying these approaches. by analyzing a video-based case study of an undergraduate course in web design and development, the article shows how “co-construction” in a collaborative student group can be characterized as an unfolding field of action in which intellectual agency intertwines with affective and performative relations. they also demonstrate how “knowledge resources and materials” form an ecology that becomes inseparably entangled with the students’ activities. finally, damsa and jornet discuss the limitations of the notions of transfer and boundary crossing in accounting for learning in new higher-education contexts where, they argue, students’ activity takes place within a “trans-contextual” ecology of learning in which no clear boundary exists between university and professional settings. the authors suggest practical implications for reorganizing higher education to address transformative potentials emerging in the students’ activities.

lastly, giuseppe ritella, beatrice ligorio, and kai hakkarainen introduce chronotope as a conceptual tool to examine if and how the organization of space and time might affect learning processes.
some previous research, the authors argue, demonstrates that spatial and temporal relations are important for learning, but our knowledge on this topic is still limited. a problem concerning the examination of space-time is that, as wegerif mentions in his commentary in this issue, its dynamics are often implicit, going beyond verbally articulated understanding, and thus its effects on learning are challenging to grasp. the authors claim that the emergence of chronotope as a scientific concept might help us to verbalize, and scientifically investigate, what is usually implicit, allowing reflection and dialogue on a dimension of learning that is often taken for granted but that seems to exert a silent influence on how we learn. in this sense, the aim of the article is not to conceptualize education exclusively in terms of space-time, but to suggest a theoretically founded way to examine how the variation of spatial and temporal relations might affect learning processes. three main features of chronotope are presented and discussed to explore its value as a scientific concept: the examination of the potential interdependency between space and time; the focus on the social negotiation of space-time; and the coordinated examination of material and discursive processes involved in the negotiation of space-time frames. using examples from their own empirical investigations and from the existing literature, the authors discuss how we can gain further insights into learning processes by examining the space-time relations of learning with chronotope as an analytic lens.
4. cross-cutting themes and emphases
a closer reading of the conceptualizations presented in the articles revealed three main themes, which we believe are central for the study of learning and educational practice in changing societies. below, we briefly introduce these themes and discuss how the articles addressed them.
4.1 theme 1: expanding time-space contexts of learning
first, some of the articles in this collection challenge the static framing of time and space that often underpins research on learning. these studies make visible the implicit spatial and temporal infrastructure on which learning and education rely and which is, in turn, being shaped by the processes of learning and education. common to these studies is that they conceptualize space-time contexts as dynamically intertwined with the activities of the people who are being studied. in this respect, ritella et al. introduce the concept of chronotope to create a nuanced theoretical account of how the digitalization of education reshapes the time and space relations of learning activities. they argue that new technological innovations and ongoing educational reforms transform the spatial and temporal organization of learning in ways that have profound implications for the study of learning. for instance, they use the “flipped learning” approach to exemplify how a new pedagogical approach combined with novel digital technology transforms the learning process by changing where and when school learning takes place. penuel et al. use the concept of learning pathway to account for learning as movement across settings of practice and over time while people pursue their interests. hilppö et al. show how primary school students use their imaginations to explore temporally and spatially distant phenomena relevant to the science topic they are studying.
4.2 theme 2: agency-driven learning
some of the contributions in this collection develop concepts that help to examine student agency in learning. agency is an emerging research topic in the learning sciences that accounts for acting upon and transforming activities and life circumstances (rajala, kumpulainen, & martin, 2016).
the focus on agency permits approaching the culture-learning interfaces in terms of both enculturation and transformation and balances the over-emphasis on the collective and reproductive dimensions of learning (kumpulainen & renshaw, 2007). damsa and jornet seek to redefine agency in knowledge creation in a way that avoids the dualism of the agent and the material world. in the ecological perspective, people are construed both as active in transforming their circumstances and as passive in being subject to the performative and affective relations in which they engage. ritella et al. echo the idea of agency as transformation by arguing that the time-space relations of an activity are amenable to change through the actions of the participants. they illustrate their argument by discussing a study of student teachers who managed the time-space contexts of their activities by arranging their bodies in space, searching for resources in the environment, and exploring physical and virtual space.
some of the articles address agency indirectly. the article by penuel et al. conceptualizes interest-driven science learning as involving agentic learners who direct their learning pathways across a range of settings. the article also discusses how structures of practice constrain agency and limit access in some settings. the article by hilppö et al. examines imagination, which can be considered a prerequisite for agency that allows people to distance themselves from the immediate constraints of action and imagine alternatives to the present circumstances (emirbayer & mische, 1998).
4.3 theme 3: new directions for the study of non-cognitive dimensions of learning
the dominant discourse on learning still concerns standardized measurement of cognitive learning outcomes. the articles of this collection challenge this discourse and make room for imagination, interest, and emotion as well as discussion of values in learning and education.
imagination is an under-researched topic that merits further attention. building on the pioneering work in cultural psychology of zittoun and gillespie (2016), hilppö et al. develop a framework for researching processes of imagination in science learning. this framework accounts for the back-and-forth movement of imagination between proximal and distal experiences in classroom discourse. penuel et al. highlight interest as a driving force in learning across contexts. in their conceptualization, interest is not seen as confined within an individual but as emergent in practice and shaped by the available resources and opportunities. goldberg and schwarz argue that emotions are often considered an obstacle to deliberation and reasoning in classrooms. however, they develop a framework for supporting engagement with emotions to promote critical and productive engagement with history topics. their article also makes an important contribution in starting to theorize how controversial and politically charged topics can be addressed in classroom situations. thus, their article contributes to a recent discussion in the learning sciences that pays more attention than before to the socio-political contexts of learning (politics of learning writing collective, 2017). this discussion can be considered a partial response to gert biesta’s (2010) critique that the dominant focus on learning in educational research has obscured the value dimension and promoted a view of education as a technical matter of effectiveness and efficiency.
5. conclusion
while addressing important research topics in contemporary education, the articles of this special issue introduce several potential frameworks for expanding conceptualizations for the study of learning in the 21st century.
the research foci and the conceptual framings that they discuss also point to the need for research communities in learning and education not only to engage in empirical research in novel settings, but also to reflect upon the theoretical frameworks that explicitly or implicitly inform research on learning. it is important to unpack the often-implicit assumptions, values, and educational purposes that underlie the theoretical frameworks and to consider the implications for what it means to learn in the 21st century.
in his commentary, rupert wegerif applauds the authors of this collection for making their theoretical assumptions visible and bringing them under scrutiny. at the same time, he points out quite rightly that there are always theoretical assumptions involved in research on learning that foreground some relevant educational phenomena and make others more difficult to consider. he cautions that new conceptualizations of learning are not valuable in themselves, and he challenges us to think about what is gained if we look at things using the conceptual frameworks proposed in the articles and why we should invest our energy specifically in these conceptual frameworks and not others.
the other commentators, jessica mckeown and cindy hmelo-silver, suggest that the proposed conceptual frameworks can be useful in inspiring new designs for promoting emergent learning that is often valued in contemporary educational settings. as a way forward, both of the commentaries suggest putting the conceptual frameworks advanced in the articles to rigorous empirical tests, for example, through experimental studies and design research.
as guest editors, we agree that the worth of the conceptual and theoretical frameworks presented and discussed will be determined by their potential to inform further empirical and interventionist research. yet, we also underscore that there is no straightforward way to determine “what works” in educational research.
learning is a normative concept that involves both analytical reasoning concerning the theoretical assumptions that lead educational research and value judgements concerning the purposes of education at large (biesta, 2010). thus, we urge further research and political discussion on how learning is conceptualized in the research community and in the larger society, in attempts to document and assess the value of educational interventions and programs in a scientifically sound manner.
references
biesta, g. (2010). good education in an age of measurement: ethics, politics, democracy. boulder/london: paradigm publishers.
emirbayer, m., & mische, a. (1998). what is agency? american journal of sociology, 103(4), 962–1023.
kumpulainen, k., & erstad, o. (2016). (re)searching learning across contexts: conceptual, methodological and empirical explorations. international journal of educational research.
kumpulainen, k., & renshaw, p. (2007). cultures of learning. international journal of educational research, 46(3), 109–115.
politics of learning writing collective. (2017). the learning sciences in a new era of us nationalism. cognition & instruction, 35(2).
rajala, a., martin, j., & kumpulainen, k. (2016). agency and learning: researching agency in educational interactions. learning, culture and social interaction, (10), 1–3.
ritella, g., ligorio, m. b., & hakkarainen, k. (2016). the role of context in a collaborative problem-solving task during professional development. technology, pedagogy and education, 25(3), 395–412.
frontline learning research 1 (2014) 1-21 issn 2295-3159
corresponding author: hilde haider, department of psychology, university of cologne, richard-strauss-str.
2 50931 cologne, germany, phone: +49-221-4704719, email: hilde.haider@uni-koeln.de
http://dx.doi.org/10.14786/flr.v2i1.37
how we use what we learn in math: an integrative account of the development of commutativity
hilde haider (a), alexandra eichler (a), sonja hansen (a), bianca vaterrodt (b), robert gaschler (c), peter a. frensch (b)
(a) university of cologne, germany
(b) humboldt-university berlin, germany
(c) university koblenz-landau, germany
article received 28 may 2013 / revised 6 january 2014 / accepted 16 january 2014 / available online 27 january 2014
abstract
one crucial issue in mathematics development is how children come to spontaneously apply arithmetical principles (e.g., commutativity). according to expertise research, well-integrated conceptual and procedural knowledge is required. here, we report a method composed of two independent tasks that assessed in an unobtrusive manner the spontaneous use of procedural and conceptual knowledge about commutativity. this allowed us to ask (1) in which grade students spontaneously apply this principle in different task formats and (2) in which grade they start to possess an integrated concept of commutativity. procedural and conceptual knowledge of 8- to 9-year-olds (163 second and 180 third graders) as well as 46 adult students was assessed independently and without any hint concerning commutativity. results indicated procedural as well as conceptual knowledge about commutativity for second graders. however, their procedural and conceptual knowledge was unrelated. an integrated relation between the two measures first emerged with some of the third graders and was further strengthened for adult students.
keywords: conceptual knowledge; procedural knowledge; commutativity; integrated concept.
1. introduction
one major skill in mathematics is the acquisition of adaptive expertise.
that is, students should be able to deliberately recognize those constraints that allow them to apply a certain mathematical principle (e.g., torbeyns, de smedt, ghesquière & verschaffel, 2009; verschaffel, luwel, torbeyns & van dooren, 2009). for instance, in the pisa mathematical literacy test students have to spontaneously apply mathematical principles in order to solve problems in real-world contexts. given the important role of self-guided learning and performance in the development of mathematical abilities and concepts, some recent studies have started to focus on spontaneous recognition of mathematical aspects in natural surroundings (e.g., hannula & lehtinen, 2005; hannula, lepola, & lehtinen, 2010; mcmullen, hannula-sormunen, & lehtinen, 2011).
an important question with regard to adaptive expertise is how students come to recognize that they can use a certain principle in order to facilitate calculation. in other words, what kind of knowledge underlies the ability to adaptively apply a mathematical principle spontaneously whenever it facilitates calculation? in the research on adaptive expertise, it is widely accepted that this ability is based not only on procedural knowledge (knowing how to apply a certain strategy), but also on conceptual knowledge (knowing when and why a certain principle applies). procedural and conceptual knowledge should be integrated and the resulting knowledge base should be abstract enough to ensure flexibility in knowledge application (e.g., anderson & schunn, 2000; baroody, 2003; gentner & toupin, 1986; haider & frensch, 1996; koedinger & anderson, 1990; star, 2005; verschaffel et al., 2009). as one example of linking concepts and procedures, this research has revealed that conceptual knowledge is important for guiding attention to task-relevant information in order to solve problems (e.g., baroody & rosu, 2006).
an abstract conceptual understanding might also be of particular importance when knowledge has to be transferred from one domain to another (e.g., goldstone & sakamoto, 2003; kaminsky, sloutsky & heckler, 2008). it supports flexible shortcut application when problems to which a principle applies are presented mixed with problems to which the principle does not apply. for instance, siegler and stern (1998) have shown that second graders relied less on inversion-based procedures when inversion problems (a + b – b) were randomly interspersed with control problems (a + b – c) compared to blocked presentation. mixed presentation of inversion and control problems hindered the use of inversion shortcuts. this suggests that younger children do not deliberately recognize the constraints important for applying the inversion principle. rather, they simply seem to know that the strategy applies to a certain class of problems. they likely cannot rely on a well-integrated, abstract understanding of the inversion principle (i.e., they have not yet developed adaptive expertise).
but how and when do children develop an integrated representation of basic mathematical principles? the first goal of the current study was to develop a method to unobtrusively measure the spontaneous application of procedural and conceptual knowledge, taking the commutativity principle as a test case. the second goal was to shed some light on the development of an abstract and well-integrated representation of the commutativity principle. with regard to the second goal, we pursued two different questions: (1) in which grade are students able to spontaneously apply commutativity knowledge in different task formats and (2) in which grade does performance expressed in the different tasks start to correlate with one another? for this purpose, we investigated the deliberate use of the commutativity principle in two different situations.
the term “deliberate” means that children did not receive any hint about the commutativity principle at all (e.g., torbeyns et al., 2009). in the first test children simply solved addition problems that sometimes allowed for a shortcut based on the commutativity principle (procedural knowledge). the second test was aimed at assessing conceptual knowledge. children were instructed to mark, without solving the problem, those problems that they believed could be solved without calculation. hence, this task required children to realize that the order of addends does not change cardinality. the correlation between these two independent tasks allowed us to gauge how well integrated children’s knowledge was. focusing on just two unobtrusive measures (one procedural and one conceptual), the current work can potentially lay the ground for developing multi-method approaches in the same spirit, while ensuring that multiple testing does not cue participants towards what the test situation is about.
we focused on the commutativity principle as it is one of the most basic properties in mathematics. it refers to the principle that changing the order of operands in addition and multiplication does not change the end result. it is known as a fundamental property of many binary operations. the commutativity of simple operations, such as the multiplication and addition of numbers, is usually acquired throughout elementary school. however, many mathematical proofs also depend on this property.
1.1 development of procedural and conceptual knowledge about commutativity
previous research in the field of developmental psychology has already shown that children acquire informal knowledge of commutativity as an arithmetic principle long before they enter school (e.g., baroody & gannon, 1984; baroody, ginsburg & waxman, 1983; canobi, reeve & pattison, 1998, 2002; cowan & renton, 1996; resnick, 1992; siegler & jenkins, 1989; sophian, harley & manos martin, 1995).
one potential reason for this early development is that at least the core property of commutativity, the order-irrelevance principle, applies to many non-numerical situations. for example, children may experience that some tasks require a certain sequence (e.g., putting on one’s clothes), whereas others do not (e.g., laying the table). even toddlers have many opportunities to learn that order does not affect the end result in some situations, but does in others.
order-irrelevance is also a core principle for counting (e.g., gelman, 1990; gelman & gallistel, 1978). learning to count requires children to learn, on the one hand, that the sequence of number words is relevant. on the other hand, the sequence in which the objects are counted is irrelevant. consequently, briars and siegler (1984) found that children need time to understand order-irrelevance in counting. furthermore, counting is the dominant skill through which preschool children learn to map concrete objects to numbers. counting is also one of the important precursors of addition. through counting, preschool children can learn order-irrelevance in a numerical manner before entering school. they thus have the chance to understand order-irrelevance not only in a non-numerical manner.
however, even though considerable interest in research on counting and addition principles emerged already in the 1980s and still continues (e.g., baroody, 1984; baroody & gannon, 1984; baroody et al., 1983; briars & siegler, 1984; canobi, reeve & pattison, 1998, 2002, 2003; fuson, 1988; gelman & gallistel, 1978; gelman & meck, 1983; resnick, 1992; sophian & adams, 1987; starkey & gelman, 1982), the central question has not yet been resolved: how and when do children acquire integrated knowledge representations, in the sense of true formal arithmetic principles? for instance, geary (2006) stated that it is not clear when children “explicitly understand commutativity as a formal arithmetical principle” (p. 791).
on the one hand, the difficulties in answering this question are due to the fact that researchers by no means agree upon the characteristics of procedural or conceptual knowledge that must be given in order to conclude that children possess an abstract mathematical concept (cf. star, 2004). concerning procedural knowledge, most researchers agree that it refers to the ability to apply a certain strategy when performing a mathematical task (e.g., hiebert & lefevre, 1986). conceptual knowledge or metastrategic competence (kuhn, garcia-mila, zohar & andersen, 1995) is often assumed to refer to children’s explicit understanding of a certain principle (i.e., why and when it is allowed to apply a certain strategy; e.g., baroody, feil & johnson, 2007; hiebert & lefevre, 1986; rittle-johnson, siegler & alibali, 2001).
on the other hand, there is no consensus on how best to assess procedural and conceptual knowledge. one frequently used approach to measure procedural knowledge is to ask children to solve addition problems and afterwards have them explain their strategies (e.g., baroody & gannon, 1984; baroody, ginsburg & waxman, 1983; bisanz & lefevre, 1992; canobi et al., 1998, 2002, 2003; cowan & renton, 1996). conceptual knowledge in these studies has, for example, been assessed by letting children observe a puppet solving problem pairs (see, e.g., baroody et al., 1983; canobi et al., 1998). if a child, when asked, stated that the puppet could know the answer to the second problem from looking at the previous one, he or she was asked for reasons and eventually prompted for more detailed explanations. this form of assessment implies that children are being informed about the underlying arithmetic principle; at the very least, they are made aware that different efficient strategies are applicable. such procedural and conceptual knowledge tests might guide children’s attention to the task-relevant information.
also, it is conceivable that they look at the problems more attentively when they are asked to verbalize their strategies. consequently, conclusions concerning the question whether a child possesses abstract conceptual knowledge may vary depending on the tests that were applied.
based on the above-mentioned forms of assessment, the empirical research on commutativity suggests that conceptual and procedural knowledge in this domain are moderately related (e.g., baroody et al., 1983; canobi, 2004; canobi et al., 1998). however, the findings do not allow us to rule out that the acquired conceptual knowledge of first or even second graders is still domain-specific rather than akin to an abstract concept representing the formal arithmetic principle of commutativity (e.g., bisanz, watchhorn, piatt & sherman, 2009; geary, hoard, byrd-craven & desoto, 2004; lefevre et al., 2006). therefore, investigating the spontaneous application of commutativity knowledge would complement and broaden this research.
in summary, the goal of our study was twofold: first, we aimed to develop a method to unobtrusively test for spontaneous application of procedural and conceptual commutativity knowledge. our second goal was to investigate the degree of integration of this spontaneously expressed procedural and conceptual knowledge of second and third graders. additionally, for purposes of comparison, we also tested adult students. as described above, knowledge about a mathematical principle such as the commutativity principle can be said to represent an integrated or abstract concept, in the sense of a true formal mathematical principle, when learners are able to apply their knowledge whenever task constraints permit. that is, learners should be able to deliberately recognize task properties allowing them to apply the mathematical principle irrespective of task context.
2. method
2.1. general method
we investigated the commutativity principle with three-element addition problems (e.g., 5 + 3 + 7 = ?; see footnote 1). these three-element problems are unfamiliar at least for younger students. since we wanted to investigate whether or not students would recognize the applicability of the commutativity principle without any further information about this principle, we needed less familiar problems. therefore, we accepted that three-element problems implicitly presuppose knowledge about associativity (e.g., canobi et al., 1998).
the three-element problems were always presented in blocks, one problem beneath the other. unbeknownst to the participating students, some problems consisted of the same addends as the preceding problem in a different order, and thus could be solved without calculation (commutative problems, hereafter). students received two different and completely independent task formats. the first task, the arithmetic task, consisted of two blocks. one block contained interspersed commutative problems, the other one did not. participants did not receive any information about the existence of these commutative problems. rather, they were simply asked to solve the two blocks of addition problems as fast and accurately as possible. if students are faster when working on the block that includes three-element commutative problems than on the block that does not contain such shortcut options, they can be said to possess procedural knowledge about commutativity.
footnote 1: some researchers use the term associativity instead of commutativity when an addition or multiplication problem has more than two addends or factors (geary et al., 2008). other researchers (canobi et al., 1998) refer to commutativity as the property that problems containing the same terms in a different order have the same answer independent of the number of terms, whereas associativity is the property that problems in which terms are decomposed and recombined in different ways have the same answer [(a + b) + c = a + (b + c)].
in the second task, the so-called judgment task, students were instructed to identify those problems which they believed need no calculation. they were explicitly told to refrain from calculating any problems. if students understand that the order of identical addends does not change the cardinality, they should be able to correctly mark the commutative problems. by virtue of this second task type, we were able to assess conceptual knowledge about commutativity without cueing the concept.
importantly and in contrast to former experiments (e.g., baroody et al., 1983; canobi, 2005), participants in our experiment did not receive any hint about the existence of commutative problems in either task. that is, they were not instructed to further explain their strategies. the rationale behind this procedure was that any instruction to think about the strategies used to solve the problems might trigger active search for regularities, thereby making it impossible to assess the spontaneously activated concept of commutativity. if students possess an abstract understanding of the commutativity principle, they should be able to recognize and rely on the relevant task characteristics in any task context and without any hint (e.g., bisanz et al., 2009; prather & alibali, 2009).
to the extent that children have acquired an abstract concept of commutativity, performance should correlate between our two tasks reflecting knowledge about this principle. procedural and conceptual knowledge about commutativity likely becomes iteratively more integrated in the first years of primary school.
we should thus find that the relation between procedural and conceptual knowledge is stronger in third graders than in second graders (e.g., lefevre et al., 2006).
2.2 participants
overall, 163 second graders (79 girls) with a mean age of 8 years 1 month (sd = 7 months) and 180 third graders (91 girls) with a mean age of 9 years 1 month (sd = 8 months) participated in the study. as a control condition, we also collected data from 46 students of the university of cologne (37 women) with a mean age of 23.6 years (sd = 5.2). children were recruited from six different elementary schools located in middle socioeconomic status suburbs of cologne. all children had their parents’ or guardians’ permission to participate in the study.
2.3 procedure and materials
the study consisted of two parts. in the first part, participants received the arithmetic task: one block with interspersed repetitions of addends in changed order in consecutive problems (commutativity block) and one block without such repetitions (control block). in the second part, participants were administered the judgment task. both tasks were designed as paper-pencil tests, and children and adult students were tested in groups of up to 25 participants in a classroom-like setting.
we generated three sets of 30 arithmetic problems with three addends between 2 and 9 (e.g., 3 + 6 + 8 = ?; maximum result was 24; 1 as an addend was not included). the problems in all three sets yielded at least approximately the same totals, and within a problem each numeral could only occur once. the 30 problems of each of the two blocks were distributed over five pages with six problems on each page. in the commutativity block, each page contained two pairs of commutative problems (i.e., one problem and its repetition with a different order of addends). in the control block no such commutative pairs occurred.
instead, participants received pairs of control problems which yielded the same results but were composed of different addends. in both blocks, participants were instructed to calculate the problems page by page from top to bottom.
the judgment task consisted of a total of 30 problems with 10 problems per page. on each page, three pairs were commutative pairs and the remaining four problems were filler problems. the first page was for practice only. participants were instructed to first solve all 10 problems from top to bottom on the page. afterwards they were asked to mark those problems that needed no calculation on that page. in particular, they were told that some of the problems need no calculation and that they should figure out for which of these problems they could have written down the result without calculation. after this practice page, participants were instructed to only judge on the next pages whether or not they needed to calculate the result for a problem, without actually attempting to solve it. therefore, all problems on pages 2 and 3 were presented without an equal sign. instead, there was a circle to the right of each problem and participants were told to mark this circle when they believed they did not need to calculate the result. again, students were instructed to work on the problems from the top to the bottom of each page. table 1 depicts examples of the problems in each of the two arithmetic blocks and the judgment task.
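the definition of a commutative problem used in the materials (same addends as the immediately preceding problem, in a different order) can be checked mechanically. the following is a minimal python sketch, not part of the original study; the function names `parse_addends` and `mark_commutative` are hypothetical, and the example pages are the ones printed in table 1.

```python
from collections import Counter


def parse_addends(problem):
    """Turn a string such as '4 + 9 + 8 =' into the list [4, 9, 8]."""
    return [int(token) for token in problem.replace("=", "").split("+")]


def mark_commutative(problems):
    """Return one flag per problem: True if it repeats the addends of the
    immediately preceding problem in a different order (hypothetical helper)."""
    flags = []
    previous = None
    for problem in problems:
        addends = parse_addends(problem)
        is_commutative = (
            previous is not None
            and Counter(addends) == Counter(previous)  # same addends
            and addends != previous                    # different order
        )
        flags.append(is_commutative)
        previous = addends
    return flags


# Example pages taken from table 1.
commutativity_page = ["3 + 5 + 4 =", "4 + 9 + 8 =", "4 + 8 + 9 =",
                      "6 + 2 + 5 =", "9 + 7 + 2 =", "2 + 7 + 9 ="]
control_page = ["5 + 3 + 4 =", "8 + 9 + 4 =", "6 + 7 + 8 =",
                "5 + 2 + 6 =", "2 + 7 + 9 =", "9 + 4 + 5 ="]
judgment_page = ["2 + 7 + 9", "9 + 5 + 4", "2 + 6 + 5", "6 + 5 + 2",
                 "8 + 7 + 5", "3 + 5 + 6", "6 + 5 + 3", "2 + 9 + 5",
                 "6 + 7 + 9", "9 + 6 + 7"]

commutativity_flags = mark_commutative(commutativity_page)
control_flags = mark_commutative(control_page)
judgment_flags = mark_commutative(judgment_page)
```

on these pages the sketch flags two problems in the commutativity block, none in the control block, and three on the judgment page, which matches the counts per page given in the text.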
table 1
examples of the problems presented on one page in the two arithmetic blocks (commutativity and control block) and the judgment task

arithmetic task                                 judgment task
commutativity block     control block
3 + 5 + 4 =             5 + 3 + 4 =             2 + 7 + 9
4 + 9 + 8 =             8 + 9 + 4 =             9 + 5 + 4
4 + 8 + 9 =             6 + 7 + 8 =             2 + 6 + 5
6 + 2 + 5 =             5 + 2 + 6 =             6 + 5 + 2
9 + 7 + 2 =             2 + 7 + 9 =             8 + 7 + 5
2 + 7 + 9 =             9 + 4 + 5 =             3 + 5 + 6
                                                6 + 5 + 3
                                                2 + 9 + 5
                                                6 + 7 + 9
                                                9 + 6 + 7

problems in bold indicate the commutative pairs of the respective task

each of the two arithmetic blocks was administered as a separate booklet, as was the judgment task. students only worked with a pencil and were not allowed to use an eraser. rather, to increase the reliability of the timing measure, they were told to cross out any errors and to write the correct answer right beside the problem.
an experimenter instructed all participants in the classroom. the experiment started with six arithmetic practice problems with three addends. the only goal of this phase was to familiarize the children with the task requirements. students were given 2 minutes to solve these six warm-up problems. then, the first of the two arithmetic blocks was presented. approximately half of the children (second graders and third graders) and all adults in the control condition received the commutativity block first, followed by the control block. the remaining participants started with the control block and subsequently received the commutativity block. the time limit was set to 3 minutes per block (1 minute for adult students) with a 1-minute break between blocks.
one minute after participants had finished the second arithmetic block, the judgment task was presented. participants were allowed 2 minutes (adult students again 1 minute) to calculate the problems on the practice page. afterwards they had the same amount of time for marking those problems they believed they could have solved without calculation.
after the practice phase, the same time limit was applied for the two subsequent pages, so that time did not suffice to calculate the problems and to concurrently mark those problems requiring no calculation. in addition, up to four additional experimenters observing small groups of children (up to six) ensured that they were not calculating the problems. after this last block, all children received some sweets. adult students in the control condition were debriefed about the study.

2.4 design

independent variables were grade (second versus third graders) and block type (commutativity versus control block in the arithmetic task). dependent variables in the arithmetic problem blocks were calculation time per problem in each of the two arithmetic blocks, as well as the number of correct results. calculation time was computed separately for each participant and each of the two arithmetic problem blocks by dividing the total time given for the block in the respective age group (3 minutes for second and third graders; 1 minute for adults) by the individual number of completed problems. for the judgment task, the dependent variables were the relative number of hits (correctly identified commutative problems) and false alarms (problems incorrectly identified as commutative problems), as well as the sensitivity index d’ from signal detection theory (i.e., the difference between the z-transformed hit rate and the z-transformed false alarm rate).

2.5 split-half reliability

in order to check whether our measures were reliable, we computed split-half reliabilities for each task type. that is, for each of the two age groups and the adults, we calculated correlations between the two arithmetic blocks (control and commutativity block) and between the second and third pages of the judgment task (the practice page of the judgment task was excluded). table 2 shows the spearman-brown corrected correlation coefficients separately for each age group and each task format (arithmetic task and judgment task).
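as a hedged illustration of the spearman-brown correction named above, the following python sketch steps up a correlation between two test halves, r, to the estimated reliability of the full-length test, 2r / (1 + r). the function name and the example value are assumptions made for illustration; they are not taken from the study's own analysis.

```python
def spearman_brown(r_half: float) -> float:
    """Step up a split-half correlation to the estimated full-length reliability."""
    if r_half <= -1.0:
        raise ValueError("the half-test correlation must be greater than -1")
    return 2.0 * r_half / (1.0 + r_half)


# hypothetical uncorrected correlation between two test halves
# (an illustrative value, not a result reported in the study)
r_uncorrected = 0.80
print(round(spearman_brown(r_uncorrected), 2))  # prints 0.89
```

the correction always moves a positive half-test correlation upwards, because each half contains only half of the items on which the full measure is based.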
table 2
spearman-brown corrected correlation coefficients for the arithmetic task and the judgment task for all participants and separately for the three age groups (arithmetic task: correlation between the number of computed problems in the commutativity block and the control block; judgment task: correlation between correct responses on the first and on the second pages of the test)

task and correlated measures                          all participants   grade 2   grade 3   adults
arithmetic task: number of computed commutative
  problems with number of computed control problems   .90                .88       .88       .95
judgment task: number of correct judgments on the
  first page with those on the second page            .82                .83       .78       .86

as can be seen from table 2, the correlation coefficients in each age group ranged between r = .78 and r = .95. thus, the two tasks used to assess participants’ procedural and conceptual knowledge seem to be reliable measures.

3. results

participants were excluded from further analyses if they completed fewer than 16 problems across the two arithmetic problem blocks (i.e., 2 standard deviations below the group means; 15 second graders, 12 third graders, and 1 adult). they were also excluded from further analyses if they solved all 30 problems in the control and the commutativity block, as calculation times could not be calculated for these participants (2 second graders, 23 third graders, and 8 adults). this led to 146 second graders, 145 third graders, and 37 adult students in the control condition. the following results section is divided into three parts. we first describe the results for the arithmetic problem blocks. second, we report the performance in the judgment task. lastly, we analyze the relation between these two tasks.
3.1 arithmetic task

as a preliminary analysis did not reveal substantial effects of the order of presentation (commutative problems first followed by control problems or vice versa), we collapsed the data for all participants within the groups of second and third graders. table 3 depicts the calculation times for problems in the commutativity and the control blocks per age group. mean calculation times suggest that second and third graders benefitted from the commutative problems whereas adult students did not.

table 3
calculation times per problem in the commutativity and the control block for each age group. the table shows the means and standard deviations (in seconds) for the different age groups as well as the lower and upper limits of the 95% confidence interval (ci; loftus & masson, 1994)

             commutativity block              control block
age group    m (sd)         95% ci            m (sd)         95% ci            n
grade 2      12.33 (3.74)   12.09, 12.66      13.28 (4.39)   12.95, 13.60      146
grade 3      9.55 (2.51)    9.34, 9.76        9.93 (3.12)    9.72, 10.14       145
adults       3.79 (1.04)    3.66, 3.92        3.63 (1.04)    3.50, 3.76        37

a 2 (age group) × 2 (block type: commutativity block vs. control block) analysis of variance (anova) with calculation time as dependent variable revealed significant main effects of age group (f[1, 289] = 66.23, mse = 20.64, p < .01, η² = .23), and of block type (f[1, 289] = 15.94, mse = 4.04, p < .01, η² = .06). the interaction between age group and block type was close to significance (f[1, 289] = 2.98, mse = 4.04, p = .088). planned contrasts revealed that only second graders significantly profited from the commutative problems (second graders: f[1, 289] = 16.32, mse = 4.04, p < .01, η² = .06; third graders: f[1, 289] = 2.59, mse = 4.04, p = .108). a separate t-test with block type as within-participants variable revealed that the adult control group did not show a significant benefit from commutative problems (t < 1). in addition, we also analyzed the percentage of correct responses in the commutativity and the control blocks.
table 4 presents the percentage of correct responses in the age groups for these two types of problems. as can be seen from table 4, the percentage of correct responses was higher for third graders than for second graders. accordingly, the 2 (age group) × 2 (block type) anova yielded a significant main effect of age group (f[1, 289] = 3.8, mse = 118.27, p < .05, η² = .01). no other effect was significant.

table 4
mean percent correct responses in the three age groups for the commutativity and control blocks. also depicted are standard deviations (in parentheses) and the lower and upper limits of the 95% confidence interval (ci; loftus & masson, 1994)

             commutativity block              control block
age group    m (sd)         95% ci            m (sd)          95% ci           n
grade 2      92.66 (9.85)   91.64, 93.68      93.11 (9.13)    92.08, 94.13     146
grade 3      94.33 (8.62)   91.64, 93.68      94.82 (8.31)    94.04, 95.60     145
adults       96.66 (9.08)   94.81, 98.51      95.94 (12.55)   94.09, 97.78     37

overall, the results up to this point show that third graders were faster and less error-prone than second graders. furthermore and more importantly, second graders showed a substantial benefit of commutative problems. third graders also tended to profit from these problems, but for them the effect was not significant. adults, by comparison, did not show such a benefit, probably due to a floor effect based on the simplicity of the problems (for similar results, see robinson & dubé, 2009; robinson & ninowski, 2003).

3.2 judgment task

for each student, we individually computed the hit rate, the false alarm rate, and the sensitivity index (d’) from signal detection theory (see footnote 2). table 5 depicts the means for these dependent measures separately for each of the two age groups and the adult students. as expected, the hit rate was higher than the false alarm rate in all age groups. accordingly, the sensitivity index d’ differed significantly from chance (all ts > 2.5, ps < .01).
this suggests that students were able to correctly identify at least some of the commutative problems. in addition, d’ was substantially higher in third graders than in second graders (t[289] = 2.18, p < .05, η² = .02).

footnote 2: separately for each student within the respective age groups, we computed the z-scores of his or her hit and false alarm rates. then, we individually computed the sensitivity index d’ from signal detection theory by subtracting the z-transformed false alarm rate from the z-transformed hit rate.

table 5
rates of hits and false alarms and d’ in each of the three age groups in the judgment task. standard deviations are given in parentheses.

age group    hits         false alarms   d’
grade 2      .70 (.28)    .36 (.33)      1.81 (2.35)
grade 3      .81 (.22)    .35 (.32)      2.41 (2.32)
adults       .82 (.25)    .11 (.18)      3.92 (2.33)

low sensitivity could result from two different sources: difficulty in identifying commutative problems (hit rate) or a tendency to mark problems other than the commutative ones (false alarm rate). therefore, we additionally analyzed the hit and false alarm rates in the two age groups. these analyses revealed that the higher sensitivity in grade 3 as compared to grade 2 was mainly due to a higher hit rate. second graders were less able to identify the commutative problems than third graders (t[289] = 3.44, p < .01, η² = .04). the false alarm rate did not differ significantly between these two age groups (t < 1). by contrast, as can be seen from table 5, the higher sensitivity in adults as compared to third graders resulted from a lower false alarm rate in adults, whereas the hit rate was almost identical in these two age groups. thus, older participants were better able to discriminate between commutative and control problems than younger participants. this finding from our cross-sectional age comparison suggests progress in conceptual knowledge with increasing age (as cohort differences are unlikely).
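the sensitivity index used in this section can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the authors' analysis code: the function name, the example counts, and in particular the clipping of extreme rates are hypothetical. the text does not say how hit or false alarm rates of exactly 0 or 1 were handled, and their z-scores are infinite, so the sketch bounds each rate by 1/(2n) and 1 - 1/(2n), one common convention.

```python
from statistics import NormalDist


def d_prime(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""

    def adjusted_rate(count: int, n: int) -> float:
        # assumption: keep the rate inside [1/(2n), 1 - 1/(2n)] so that its
        # z-score is finite; the study does not report which correction it used
        rate = count / n
        return min(max(rate, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf  # inverse of the standard normal cdf
    return z(adjusted_rate(hits, n_signal)) - z(adjusted_rate(false_alarms, n_noise))


# hypothetical student: 5 of 6 commutative problems marked (hits) and
# 2 of 14 other problems marked (false alarms); the counts are illustrative
print(round(d_prime(5, 6, 2, 14), 2))
```

a student who marks commutative and other problems at the same rate obtains d’ = 0, which is the chance level against which the group means were tested.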
3.3 relation between procedural and conceptual knowledge

the results reported up to this point are somewhat counterintuitive. even though third graders were better able to identify commutative problems in the judgment task, they seemed to rely less on a commutativity-based shortcut during calculation than second graders. in addition, for adults we found no benefit of commutative problems in the arithmetic task. thus, it seems that either the willingness to use more efficient arithmetic strategies or procedural knowledge of commutativity itself decreases with age (while conceptual knowledge increases). the last analyses of the relationship between procedural and conceptual knowledge might help to reconcile this picture. these analyses will answer the research question whether or not participants possess an integrated concept of commutativity. if so, we should find significant positive correlations between the use of the commutativity-based shortcut in the arithmetic task (procedural knowledge) and the ability to correctly identify the commutative problems in the judgment task (conceptual knowledge). for procedural knowledge, we used for each participant the average calculation time per problem in the control and the commutativity block of the arithmetic task as well as the difference between these two measures (i.e., savings; with positive values indicating shorter calculation times in the commutativity block). for conceptual knowledge, we used the hit rate, the false alarm rate, and the sensitivity measure d’. in a first analysis we calculated correlations across second and third graders. second, we calculated correlations within the two age groups and for the adults. table 6 depicts the correlations between procedural and conceptual knowledge.

table 6
correlation coefficients between procedural and conceptual knowledge, depicted for all second and third graders combined as well as separately for the three age groups. procedural knowledge is indicated by calculation times in seconds for commutative problems, control problems, and in addition for savings. hit rate, false alarm rate, and d’ indicate conceptual knowledge

                                            arithmetic task
group (n)                    judgment task  commutative problems  control problems  savings
second and third graders     hits           -0.23 **              -0.17 **          0.001
(n = 291)                    false alarms   -0.09                 -0.09             0.02
                             d’             -0.05                 -0.02             0.01
grade 2                      hits           -0.23 **              -0.18 **          -0.04
(n = 146)                    false alarms   -0.13                 -0.15             -0.03
                             d’             -0.09                 -0.04             -0.008
grade 3                      hits           -0.02                 0.06              0.11
(n = 145)                    false alarms   0.06                  -0.04             0.01
                             d’             0.05                  0.09              0.06
adults                       hits           -0.23                 0.03              0.39 *
(n = 37)                     false alarms   -0.15                 -0.24             -0.14
                             d’             -0.11                 0.13              0.36 *

** p < .01; * p < .05

as can be seen from table 6, the hit rate for the entire group correlated negatively with calculation time for commutative and control problems. that is, the faster second and third graders solved the arithmetic problems, the better they were able to identify commutative problems in the judgment task. savings in solution time due to commutative problems were not related to the ability to identify commutative problems, suggesting that the children’s knowledge about commutativity was not very well integrated. a closer look at the different age groups revealed, however, that adults showed the expected positive correlation between savings and sensitivity. adults who applied the commutativity-based shortcut in the arithmetic blocks were also those who were better able to identify the commutative problems in the judgment task. this correlation suggests that the tested adults do possess an integrated knowledge representation of the commutativity principle. in contrast, second and third graders’ procedural and conceptual knowledge were only weakly related at best. as table 6 additionally reveals, second graders’ hit rate correlated negatively with calculation time.
again, this correlation suggests that the faster second graders solved the arithmetic problems, the better they were able to identify the commutative problems in the judgment task. thus, second graders’ ability to discriminate between commutative and control problems was linked to more general calculation competencies rather than to their procedural knowledge about using the commutativity-based shortcut. third graders, by contrast, did not show any significant correlation between calculation performance and discrimination. overall, these findings suggest that only adults’ spontaneous application of commutativity knowledge is based on an integrated concept of the commutativity principle. in contrast, procedural and conceptual knowledge seem to be only weakly related in second and third graders. note that, alternatively, one could also argue that our assessments of procedural and conceptual knowledge are not sufficiently reliable (but see table 2). in order to further rule out this latter argument and to better understand the missing correlations between savings in the arithmetic task (procedural knowledge) and the sensitivity index in the judgment task (conceptual knowledge), we conducted a final fine-grained analysis for second and third graders. in the judgment task, the false alarm rate of second and third graders was rather high (approximately 40%; table 5) and varied widely between participants in both age groups. presumably, children with a high false alarm rate might have correctly recognized the commutative problems in the judgment task, but at the same time might have believed that easy-to-calculate problems (i.e., those with comparatively small addends) also needed no calculation. this might have inflated the false alarm rate and thus might have reduced the correlations between procedural and conceptual knowledge within second and third graders.
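the procedural measures entering the correlations in this section can be illustrated with a short python sketch. it is a hedged example only: the data are invented, all names are hypothetical, and the use of the pearson coefficient is an assumption, since the text speaks only of correlations. calculation time per problem is the block duration divided by the number of completed problems, and savings is the control-block time minus the commutativity-block time.

```python
from math import sqrt


def time_per_problem(block_seconds: float, n_completed: int) -> float:
    """Mean calculation time per problem: block duration / completed problems."""
    if n_completed <= 0:
        raise ValueError("at least one completed problem is required")
    return block_seconds / n_completed


def savings(control_time: float, commutative_time: float) -> float:
    """Positive values indicate shorter times in the commutativity block."""
    return control_time - commutative_time


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation (assumed here; the text only says 'correlation')."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# hypothetical children: (completed commutativity problems,
#                         completed control problems, d')
BLOCK_SECONDS = 180  # 3 minutes per block for second and third graders
children = [(15, 13, 2.1), (12, 12, 0.4), (18, 14, 3.0), (10, 11, -0.2)]

savings_per_child = [
    savings(time_per_problem(BLOCK_SECONDS, n_control),
            time_per_problem(BLOCK_SECONDS, n_commutative))
    for n_commutative, n_control, _ in children
]
d_primes = [d for _, _, d in children]
print(round(pearson_r(savings_per_child, d_primes), 2))
```

a positive coefficient in such an analysis would mean that children who gain more time on commutative problems also discriminate them better in the judgment task, which is the pattern expected for integrated knowledge.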
following up on these assumptions, we divided the second and third graders into three different groups according to their false alarm rate: children with no false alarms, children with up to 50% false alarms, and children with a false alarm rate higher than 50%. table 7 presents the number of participants within these three groups as well as the hit rates separately per grade.

table 7
mean hit rates for second and third graders with no (fa = 0), medium (0 < fa ≤ 50%), or high (fa > 50%) false alarm rate in the judgment task

             no false alarms      medium fa-rate       high fa-rate
             hit rate    n        hit rate    n        hit rate    n
grade 2      .81         41       .52         61       .85         44
grade 3      .88         44       .73         60       .84         41

as can be seen from table 7, for second and third graders the hit rate was high when the false alarm rate was either zero or high (i.e., some children marked only the commutative problems while others marked the commutative and many other problems). this might have caused the overall low correlations between procedural and conceptual knowledge within these two age groups. therefore, we re-analyzed the correlation between hit rates and d’ and arithmetic abilities separately for these three groups within second and third graders. table 8 presents these correlations. in both age groups, only those participants who produced high hit rates without incorrectly marking the filler problems also showed substantial correlations. however, second and third graders differed qualitatively with regard to these correlations.

table 8
correlations between procedural and conceptual knowledge for second and third graders with no, medium, or high false alarm rate. procedural knowledge is indicated by calculation times in seconds for commutative and control problems as well as savings. hit and false alarm rate (fa) indicate conceptual knowledge

                 no false alarms      medium fa-rate       high fa-rate
                 hits        fa       hits       fa        hits       fa
grade 2          (n = 41)             (n = 61)             (n = 44)
  commutative    -.50 **     -        -.11       .07       -.01       -.01
  control        -.39 **     -        -.11       -.10      .02        -.04
  savings        .03         -        -.11       -.15      .04        .04
grade 3          (n = 44)             (n = 60)             (n = 41)
  commutative    -.25        -        .04        .12       .04        -.22
  control        -.01        -        .07        .08       -.04       -.18
  savings        .31 *       -        .05        -.03      -.02       -.04

** p < .01; * p < .05

once again, the results suggest that the second graders’ ability to discriminate between commutative and control problems in the judgment task is mainly related to their general calculation skills rather than to their ability to rely on efficient calculation strategies (i.e., the commutativity-based shortcut strategy). by contrast, third graders with high discrimination abilities seem to already possess integrated procedural and conceptual knowledge, starting to form an abstract understanding of commutativity. they use this knowledge, on the one hand, to identify commutative problems in the context of control problems and, on the other hand, to increase efficiency in solving arithmetic problems.

4. discussion

with the current study we aimed at presenting an approach to unobtrusively measure the spontaneous usage of procedural and conceptual knowledge of the commutativity principle. apart from providing a basis to develop the method further (see below), the second goal of our study was to investigate the relation between procedural and conceptual knowledge about the commutativity principle in second and third graders. for this we asked (1) at which grade the different forms of commutativity knowledge can be detected and (2) at which grade they start to correlate with one another.
overall, our study yielded three main results: first, as expected, third graders showed higher general calculation proficiency (procedural knowledge) and more conceptual knowledge about commutativity than second graders. second, a solution time benefit based on the procedural use of the commutativity principle was only found for second graders. they calculated commutative problems faster than control problems. neither the calculation times of third graders nor those of adult students reflected a significant benefit from interspersed commutative problems. third, the correlation between (a) the benefit resulting from a commutativity-based shortcut and (b) conceptual knowledge of commutativity was rather weak in second and third graders. the relation seems to emerge in some of the third graders. the link was also present in the control group (adults). the second and third findings merit some further discussion before we come to the theoretical and practical implications. the second finding (i.e., that only second graders’ calculation performance reflected the exploitation of commutativity whereas that of third graders and adults did not) was somewhat surprising. interestingly, however, gaschler, vaterrodt, frensch, eichler, and haider (2013) found similar patterns of results with the identical arithmetic task. therefore, we assume that this finding is not due to a sample artefact. nevertheless, it does not fit the general claim that with experience, children become faster and more accurate at solving addition problems and also tend to use more sophisticated strategies, such as order-irrelevant, decomposition, and retrieval strategies (baroody et al., 1983; canobi et al., 1998, 2002, 2003; geary, brown & samaranayake, 1991; goldman, mertz & pellegrino, 1989; resnick, 1992; rittle-johnson & siegler, 1998; siegler, 1987; but see mcneil, 2007; robinson & dubé, 2009; robinson & ninowski, 2003; torbeyns et al., 2009). it also seems to contradict the results of baroody et al.
(1983), which show that approximately 80% of their third graders applied the commutativity-based shortcut to solve arithmetic problems (see also canobi et al., 2003). one obvious reason for these divergent findings might be that we used three-element addition problems which probably were hard for second graders but (due to the rather small addends) easy for third graders and adults. this may have caused second graders to rely on the more efficient commutativity-based shortcut strategy, whereas third graders and adults were fast in solving the problems anyway, so that they did not expect any gain from using the shortcut strategy. for instance, siegler and araya (2005) mentioned that participants are more likely to adopt solution strategies if they contribute to significant performance advantages. this argument is further supported by the results of gaschler et al. (2013), who found larger benefits when presenting problems with large rather than with small addends. a second reason might be that in former studies (e.g., baroody et al., 1983; canobi et al., 1998; farrington-flint, canobi, wood & faulkner, 2010) students were instructed to explain their strategy immediately after responding. by contrast, our participants received no hint about the existence of commutative problems. while in our study the use of any shortcut strategy was spontaneous, it is possible that the explanation required in baroody et al.’s study triggered students to apply the commutativity-based strategy. for instance, torbeyns et al. (2009) found less strategy application when students could spontaneously apply different strategies during calculation than when they were instructed to do so. in a similar vein, an as yet unpublished study from our labs revealed that second and third graders as well as adults substantially benefitted from instruction (compared to a non-instructed group).
participants who were reminded of the commutativity principle and alerted to the fact that commutative problems might occur showed a larger solution time advantage on commutative problems as compared to control problems. as students of all three age groups relied on the commutativity principle after being instructed accordingly, it seems justified to conclude that (with the exception of second graders) students in our study indeed did not profit much from spontaneously applying the commutativity-based shortcut strategy. concerning our second research question, we found that the second graders’ understanding of commutativity was unrelated to their use of commutativity-based shortcut strategies. first signs of an integrated concept (assessed by the correlation of procedural and conceptual knowledge measures) occurred in a small group of third graders and were substantial only for adult students. thus, the integration of procedural and conceptual knowledge seems to increase with age. however, it also suggests that second graders may have used the shortcut strategy without entirely understanding the commutativity principle. this finding seems at odds with the early onset assumption of, for example, baroody and gannon (1984). furthermore, canobi et al. (1998; 2002) had found that second graders’ conceptual knowledge (assessed by an explanation task) and procedural knowledge (assessed by solving addition problems) correlated moderately. however, as already discussed concerning baroody et al.’s (1983) findings, canobi (2009; canobi et al., 1998, 2002) assessed conceptual knowledge by asking participants to explain their strategies after they had solved an addition problem. thus, even though canobi et al. used different tasks for assessing procedural and conceptual knowledge, the knowledge assessed by their addition task might have resulted from a mixture of procedural and conceptual competencies (see also robinson & dubé, 2009).
this might have led to a higher correlation between both tests compared to a variant where spontaneous application of procedural and conceptual knowledge is independently assessed. alternatively, one might suspect that our measures of procedural and conceptual knowledge were unreliable. however, this is not likely as we did find satisfactory split-half reliabilities for all age groups and both task formats (see table 2). in addition, our results showed significant correlations (a) for all second graders between the calculation time and percentage of hits and (b) for at least some third graders and all adult students between savings due to the use of commutativity-based shortcuts and hits in the judgment task. therefore, it seems worthwhile to ask for further theoretical causes concerning our third finding.

4.1 theoretical implications

at first glance, the results seem to fit with a procedural-first development of commutativity (e.g., baroody et al., 2007; briars & siegler, 1984; siegler & stern, 1998). that is, second graders use the commutativity-based shortcut before having acquired an abstract understanding of the principle. it therefore appears that the development of conceptual knowledge progresses more slowly than that of procedural knowledge – at least as measured in this study and for commutativity (cf. canobi, 2004; canobi et al., 1998). however, in their review about relations between children’s understanding of mathematical concepts and their ability to execute arithmetic procedures, rittle-johnson and siegler (1998) provided ample evidence that with regard to commutativity, children first acquire conceptual knowledge before then applying corresponding strategies. in order to reconcile this conflict, we refer to the iterative model of the development of conceptual and procedural knowledge (e.g., resnick, 1992; rittle-johnson et al., 2001).
our findings suggest that second graders possess at least rudimentary conceptual knowledge of the commutativity principle, but their conceptual representation of the commutativity principle is less well integrated (with procedural knowledge) than that of third graders or adult students. this is in line with many findings in the field of mathematical development showing that already second graders possess conceptual knowledge about commutativity (for a review, see rittle-johnson & siegler, 1998). also, our sensitivity index d’ indicated such knowledge. however, the consistent use of this knowledge may still be reduced; that is, their competency to identify the relevant task properties for applying a certain shortcut strategy has not fully developed yet. therefore, they may need a certain external trigger in order to activate their knowledge about commutativity and the corresponding strategies. our instructions for the arithmetic and the judgment task did not provide any such trigger, which probably made it rather difficult, particularly for second graders and also for most of the third graders, to realize that they should rely on the commutativity principle in both tasks. consequently, it may be that some participants applied the commutativity-based shortcut strategy to solve the arithmetic problems, but did not use it in the judgment task, or vice versa. this does not imply that they first learn procedures before they acquire conceptual knowledge. rather, we assume that such a finding mainly reflects that children’s conceptual knowledge is not sufficiently integrated to spontaneously recognize that they could rely on the commutativity principle. in a similar vein, research on expertise (e.g., anderson & schunn, 2000; gentner & toupin, 1986; haider & frensch, 1996; koedinger & anderson, 1990) also shows that well-integrated and thus abstract conceptual knowledge is required to identify task-relevant information in order to solve problems and to flexibly transfer knowledge from one task domain to another (see also e.g., sloutsky & fisher, 2008; star & seifert, 2006). to summarize, we assume that the divergent findings concerning the development of an abstract understanding of the commutativity principle reflect the fact that after students have acquired some procedural and conceptual knowledge in this domain, this knowledge needs to be integrated. this integration of knowledge, we suspect, is done in an iterative way, which means that procedures are applied, which then refine the conceptual knowledge (rittle-johnson et al., 2001). the conceptual knowledge is then used to guide children’s attention to information which is needed to adaptively apply efficient strategies.

4.2 further improvements of the measurement of spontaneous application of a mathematical principle

with the current study, we took a first step to measure spontaneous usage of commutativity knowledge in a non-reactive way. participants worked on the paper-and-pencil tasks in a setting very similar to other tests in the classroom. in the arithmetic blocks, we asked for fast and correct solutions to the arithmetic problems and did not mention that regularities in the task material might be exploited for efficient task processing. we inferred procedural knowledge of the commutativity principle from the performance benefits on material containing identical addends in changed order in consecutive problems (as compared to material that did not contain such pairs of problems). probing for conceptual knowledge, we asked participants to indicate in which cases calculation was not necessary – again without hinting that it might be the commutativity principle that made calculation superfluous.
the rationale behind this procedure was that participants who had well-integrated knowledge of the commutativity principle should recognize the respective arithmetic problems and consequently should be able to relate them to the task demand (marking problems where calculation was not necessary). age-related changes in hits and false alarms in the judgment task suggested that this was indeed the case. similar indirect approaches to measuring knowledge have been developed in order to measure insight (cf. haider & rose, 2007). when investigating insight, it is likewise not feasible to directly ask participants again and again whether they have already discovered the regularity in the task material without providing them with a strong hint that such a regularity exists. one way to further improve the method would be to include control problems which also feature the same numbers as their predecessor problems in changed order but do not allow the commutativity principle to be applied (i.e., subtractions). such interspersed control problems could help to rule out superficial matching strategies (i.e., “same numbers = same result”) that do not capture the essence of commutativity. first explorations in our labs indicate that second graders do not confuse subtraction problems containing the same digits as a preceding addition with genuine commutative problems. furthermore, it would be interesting to implement our instruments within a multi-method approach, administering multiple measurements per person and construct (cf. prather & alibali, 2009). as a first step one would have to estimate to what extent repeated testing of spontaneous usage of a mathematical principle induces participants to recognize and use the principle, and thereby spoils the possibility of assessing spontaneous usage. paper-and-pencil-based testing in the classroom has the advantage that the test situation is similar to other tests the students take.
in a parallel line of research, we have started to employ eyetracking to obtain process measures related to commutativity knowledge (e.g., gaschler et al., 2013; godau, wirth, hansen, haider, & gaschler, in press). for instance, it is possible to quantify the extent to which a child searches for repetitions of addends in subsequent addition problems. however, when they are tested individually with an eyetracking system, children are aware that the measurement is about where they look and how they calculate. class-based testing in computer labs within schools might offer the possibility to obtain process measures while preserving the character of the assessment, so that spontaneous application of knowledge of the principle can still be measured.

4.3 theoretical conclusions and practical implications

recently, prather and alibali (2009; see also bisanz et al., 2009; schneider & stern, 2010) called for multifaceted knowledge assessment in the context of arithmetic development. our use of the arithmetic and judgment tasks in order to independently assess procedural and conceptual knowledge can be seen as a first step in this direction. complementing earlier work on commutativity knowledge (e.g., baroody & gannon, 1984; canobi et al., 2002), our findings suggest that, when second graders and third graders are not prompted to rely on the commutativity principle, second graders and most of the third graders show only weak signs of spontaneous application of commutativity and of interrelation of different forms of commutativity knowledge. this suggests that they do not possess well-integrated knowledge about commutativity in the sense of an abstract formal mathematical principle. our results suggest that even if children use procedures that suggest integrated conceptual knowledge about commutativity, the learning process is still far from having reached an endpoint.
rather, it continues to progress before leading to a well-integrated, abstract representation of the mathematical principle, such as the one our measures revealed in adults. as long as children do not possess such an abstract representation, they will not be able to flexibly and adaptively use the commutativity principle in different task contexts. accordingly, we suspect that increasing experience in the field of mathematics is needed in order to better integrate conceptual knowledge about various arithmetic principles. this might explain why transfer of knowledge from one context to another is often found to be rather weak (e.g., frensch & haider, 2008; kaminski, sloutsky & heckler, 2008; siegler & stern, 1998; sloutsky & fisher, 2008). therefore, helping students to develop well-integrated knowledge concepts should be one of the most important tasks education has to fulfill (see, e.g., geary et al., 2008; prather & alibali, 2009; verschaffel et al., 2009). in more practical terms, if children are taught the commutativity principle in the context of addition, they seem to learn that they can use this principle to avoid unnecessary labor. however, our results suggest that this does not mean that they concurrently acquire an idea of the abstract principle of cardinality. we suspect that many children only acquire a procedure (or a strategy) that they can easily apply to two-element addition problems. in order to help students to understand the abstract principle of commutativity, it might be worthwhile to activate students’ prior knowledge of this principle, such as the order-irrelevance principle they already use in counting. when introducing the commutativity principle in addition (or multiplication), it might be helpful to tell students that they have already used this principle in other contexts and to explain how and why it works in all these different situations.
this then might help them to understand the commutativity principle in a more abstract manner and probably also to understand task properties needed to correctly apply this principle. further, it may help to support children in recognizing the consequences of using alternative strategies in order to ensure representational redescription (e.g., baroody & gannon, 1984).

keypoints
procedural and conceptual commutativity knowledge increase with increasing age.
second graders show no signs of an integrated concept of commutativity.
first signs of an integrated concept of commutativity emerge in grade three.

acknowledgements
this research was supported by the german research foundation (dfg; hh1471/12-1). some of the results were presented at the kongress der deutschen gesellschaft für psychologie 2010 in bremen, germany. we thank annette bräutigam, yvonne radermacher, pia blase and ester jung for help with data collection.

references
anderson, j. r. & schunn, c. d. (2000). implications of the act-r learning theory: no magic bullets. in r. glaser (ed.), advances in instructional psychology: educational design and cognitive science (vol. 5, pp. 1-33). mahwah, nj: lawrence erlbaum associates publishers.
baroody, a. j. (1984). more precisely defining and measuring the order-irrelevance principle. journal of experimental child psychology, 38, 33-41. doi:10.1016/0022-0965(84)90017-1
baroody, a. j. (2003). the development of adaptive expertise and flexibility: the integration of conceptual and procedural knowledge. in a. baroody & a. dowker (eds.), the development of arithmetic concepts and skills. constructing adaptive expertise (1st ed., pp. 1–34). mahwah, nj: lawrence erlbaum associates publishers.
baroody, a. j., feil, y. & johnson, a. r. (2007). an alternative reconceptualization of procedural and conceptual knowledge. journal for research in mathematics education, 38, 115-131.
baroody, a. j. & gannon, k. e. (1984).
the development of the commutativity principle and economical addition strategies. cognition & instruction, 1, 321-339. doi:10.1207/s1532690xci0103_3
baroody, a. j., ginsburg, h. p. & waxman, b. (1983). children's use of mathematical structure. journal for research in mathematics education, 14, 156-168. doi:10.2307/748379
baroody, a. j., & rosu, l. (2006). adaptive expertise with basic addition and subtraction combinations: the number sense view. paper presented at the annual meeting of the american educational research association (april), san francisco, ca.
bisanz, j., & lefevre, j. (1992). understanding elementary mathematics. in j. d. campbell (ed.), the nature and origins of mathematical skills (pp. 113-136). oxford, england: north-holland. doi:10.1016/s0166-4115(08)60885-7
bisanz, j., watchorn, r. p. d., piatt, c., & sherman, j. (2009). on “understanding” children's developing use of inversion. mathematical thinking and learning, 11, 10–24. doi:10.1080/10986060802583907
briars, d. & siegler, r. s. (1984). a featural analysis of preschoolers' counting knowledge. developmental psychology, 20, 607-618. doi:10.1037/0012-1649.20.4.607
canobi, k. h. (2004). individual differences in children's addition and subtraction knowledge. cognitive development, 19, 81-93. doi:10.1016/j.cogdev.2003.10.001
canobi, k. h. (2005). children's profiles of addition and subtraction understanding. journal of experimental child psychology, 92, 220-246. doi:10.1016/j.jecp.2005.06.001
canobi, k. h. (2009). concept-procedure interactions in children's addition and subtraction. journal of experimental child psychology, 102(2), 131-149. doi:10.1016/j.jecp.2008.07.008
canobi, k. h., reeve, r. a. & pattison, p. e. (1998). the role of conceptual understanding in children's addition problem solving. developmental psychology, 34, 882-891. doi:10.1080/01443410903473597
canobi, k. h., reeve, r. a. & pattison, p. e. (2002). young children’s understanding of addition concepts.
educational psychology, 22, 513-532. doi:10.1080/0144341022000023608
canobi, k. h., reeve, r. a. & pattison, p. e. (2003). patterns of knowledge in children’s addition. developmental psychology, 39, 521–534. doi:10.1037/0012-1649.39.3.521
cowan, r. & renton, m. (1996). do they know what they are doing? children's use of economical addition strategies and knowledge of commutativity. educational psychology, 16, 407-420. doi:10.1080/0144341960160405
farrington-flint, l., canobi, k. h., wood, c. & faulkner, d. (2010). children’s patterns of reasoning about reading and addition concepts. british journal of developmental psychology, 28, 427–448. doi:10.1348/026151009x424222
frensch, p. a. & haider, h. (2008). transfer and expertise: the search for identical elements. in h. l. roediger, iii (ed.), cognitive psychology of memory. vol. [2] of learning and memory: a comprehensive reference (pp. 579-596). oxford: elsevier. doi:10.1016/b978-012370509-9.00177-7
fuson, k. c. (1988). children's counting and concepts of number. new york, ny: springer-verlag publishing.
gaschler, r., vaterrodt, b., frensch, p. a., eichler, a. & haider, h. (2013). spontaneous usage of different shortcuts based on the commutativity principle. plos one, 8(9), e74972. doi:10.1371/journal.pone.0074972
geary, d. c. (2006). development of mathematical understanding. in w. damon, r. m. lerner, & n. eisenberg (eds.), handbook of child psychology: social, emotional, and personality development (vol. 3, pp. 777–810). john wiley and sons.
geary, d. c., boykin, a. w., embretson, s., reyna, v., siegler, r., berch, d. b., et al. (2008). report of the task group on learning processes. in national mathematics advisory panel, reports of the task groups and subcommittees (pp. 4-1–4-211).
geary, d. c., brown, s. c. & samaranayake, v. a. (1991). cognitive addition: a short longitudinal study of strategy choice and speed-of-processing differences in normal and mathematically disabled children.
developmental psychology, 27, 787-797. doi:10.1037/0012-1649.27.5.787
geary, d. c., hoard, m. k., byrd-craven, j. & desoto, m. c. (2004). strategy choices in simple and complex addition: contributions of working memory and counting knowledge for children with mathematical disability. journal of experimental child psychology, 88, 121–151. doi:10.1016/j.jecp.2004.03.002
gelman, r. (1990). first principles organize attention to and learning about relevant data: number and the animate-inanimate distinction as examples. cognitive science, 14, 79-106.
gelman, r. & gallistel, c. r. (1978). the child's understanding of number. cambridge, ma: harvard university press.
gelman, r. & meck, e. (1983). preschoolers' counting: principles before skill. cognition, 13, 343-359. doi:10.1016/0010-0277(83)90014-8
gentner, d. & toupin, c. (1986). systematicity and surface similarity in the development of analogy. cognitive science, 10, 277-300. doi:10.1207/s15516709cog1003_2
godau, c., wirth, m., hansen, s., haider, h., & gaschler, r. (in press). from marbles to numbers: estimation influences looking patterns on arithmetic problems. psychology.
goldman, s. r., mertz, d. l. & pellegrino, j. w. (1989). individual differences in extended practice functions and solution strategies for basic addition facts. journal of educational psychology, 81, 481-496. doi:10.1037/0022-0663.81.4.481
goldstone, r. l., & sakamoto, y. (2003). the transfer of abstract principles governing complex adaptive systems. cognitive psychology, 46, 414-466. doi:10.1016/s0010-0285(02)00519-4
haider, h. & frensch, p. a. (1996). the role of information reduction in skill acquisition. cognitive psychology, 30, 304-337. doi:10.1006/cogp.1996.0009
haider, h. & rose, m. (2007). how to investigate insight: a proposal. methods, 42, 49–57. doi:10.1016/j.ymeth.2006.12.004
hannula, m. m., & lehtinen, e. (2005). spontaneous focusing on numerosity and mathematical skills of young children. learning and instruction, 15, 237-256.
doi:10.1016/j.learninstruc.2005.04.005
hannula, m. m., lepola, j., & lehtinen, e. (2010). spontaneous focusing on numerosity as a domain-specific predictor of arithmetical skills. journal of experimental child psychology, 107, 394-406. doi:10.1016/j.jecp.2010.06.004
hiebert, j., & lefevre, p. (1986). conceptual and procedural knowledge in mathematics: an introductory analysis. in j. hiebert (ed.), conceptual and procedural knowledge: the case of mathematics (pp. 1-27). hillsdale, nj: erlbaum.
kaminski, j. a., sloutsky, v. m., & heckler, a. f. (2008). learning theory: the advantage of abstract examples in learning math. science, 320, 454–455.
koedinger, k. r. & anderson, j. r. (1990). abstract planning and perceptual chunks: elements of expertise in geometry. cognitive science, 14, 511-550. doi:10.1207/s15516709cog1404_2
kuhn, d., garcia-mila, m., zohar, a., & andersen, c. (1995). strategies of knowledge acquisition. society for research in child development monographs, 60 (4), serial no. 245.
lefevre, j.-a., smith-chant, b. l., fast, l., skwarchuk, s.-l., sargla, e., arnup, j. s., et al. (2006). what counts as knowing? the development of conceptual and procedural knowledge of counting from kindergarten through grade 2. journal of experimental child psychology, 93, 285-303. doi:10.1016/j.jecp.2005.11.002
mcmullen, j. a., hannula-sormunen, m. m., & lehtinen, e. (2011). young children’s spontaneous focusing on quantitative aspects and their verbalizations of their quantitative reasoning. in b. ubuz (ed.), proceedings of the 35th conference of the international group for the psychology of mathematics education, 3, pp. 217-224. ankara, turkey: pme.
mcneil, n. m. (2007). u-shaped development in math: 7-year-olds outperform 9-year-olds on equivalence problems. developmental psychology, 43, 687–695. doi:10.1037/0012-1649.43.3.687
prather, r. w., & alibali, m. w. (2009).
the development of arithmetic principle knowledge: how do we know what learners know? developmental review, 29, 221–248. doi:10.1016/j.dr.2009.09.001
resnick, l. b. (1992). from protoquantities to operators: building mathematical competence on a foundation of everyday knowledge. in g. leinhardt, r. t. putnam & r. a. hattrup (eds.), analysis of arithmetic for mathematics teaching (pp. 373-429). hillsdale, nj: lawrence erlbaum associates, inc.
rittle-johnson, b. & siegler, r. s. (1998). the relation between conceptual and procedural knowledge in learning mathematics: a review. in c. donlan (ed.), the development of mathematical skills (pp. 75-110). east sussex, uk: psychology press.
rittle-johnson, b., siegler, r. s. & alibali, m. w. (2001). developing conceptual understanding and procedural skill in mathematics: an iterative process. journal of educational psychology, 93, 346-362. doi:10.1037/0022-0663.93.2.346
robinson, k. m. & dubé, a. k. (2009). children’s understanding of addition and subtraction concepts. journal of experimental child psychology, 103, 532–545. doi:10.1016/j.jecp.2008.12.002
robinson, k. m., & ninowski, j. e. (2003). adults’ understanding of inversion concepts: how does performance on addition and subtraction inversion problems compare to performance on multiplication and division inversion problems? canadian journal of experimental psychology, 57, 321-330. doi:10.1037/h0087435
schneider, m. & stern, e. (2010). the developmental relations between conceptual and procedural knowledge: a multimethod approach. developmental psychology, 46, 178–192. doi:10.1037/a0016701
siegler, r. s. (1987). the perils of averaging data over strategies: an example from children's addition. journal of experimental psychology: general, 116, 250-264. doi:10.1037/0096-3445.116.3.250
siegler, r. s. & araya, r. (2005). a computational model of conscious and unconscious strategy discovery. in r. v. kail (ed.), advances in child development and behavior (vol. 33) (pp.
1-42). oxford, uk: elsevier.
siegler, r. s. & jenkins, e. (1989). how children discover new strategies. hillsdale, nj: lawrence erlbaum associates, inc.
siegler, r. s. & stern, e. (1998). conscious and unconscious strategy discoveries: a microgenetic analysis. journal of experimental psychology: general, 127, 377-397. doi:10.1037/0096-3445.127.4.377
sloutsky, v. m. & fisher, a. v. (2008). attentional learning and flexible induction: how mundane mechanisms give rise to smart behaviors. child development, 79, 639-651. doi:10.1111/j.1467-8624.2008.01148.x
sophian, c. & adams, n. (1987). infants' understanding of numerical transformations. british journal of developmental psychology, 5, 257-264. doi:10.1111/j.2044-835x.1987.tb01061.x
sophian, c., harley, h. & manos martin, c. s. (1995). relational and representational aspects of early number development. cognition & instruction, 13, 253-268. doi:10.1207/s1532690xci1302_4
star, j. r. (2004, april). the development of flexible procedural knowledge in equation solving. paper presented at the annual meeting of the american educational research association, san diego.
star, j. r. (2005). reconceptualizing procedural knowledge. journal for research in mathematics education, 36, 404-411.
star, j. r. & seifert, c. (2006). the development of flexibility in equation solving. contemporary educational psychology, 31, 280-300. doi:10.1016/j.cedpsych.2005.08.001
starkey, p. & gelman, r. (1982). the development of addition and subtraction abilities prior to formal schooling in arithmetic. in t. p. carpenter, j. m. moser & t. a. romberg (eds.), addition and subtraction: a cognitive perspective (pp. 99–116). hillsdale, nj: erlbaum.
torbeyns, j., de smedt, b., ghesquière, p. & verschaffel, l. (2009). acquisition and use of shortcut strategies by traditionally schooled children. educational studies in mathematics, 71(1), 1-17.
verschaffel, l., luwel, k., torbeyns, j. & van dooren, w. (2009).
conceptualizing, investigating, and enhancing adaptive expertise in elementary mathematics education. european journal of psychology of education, 24(3), 335-359. doi:10.1007/bf03174765

frontline learning research 6 (2014) 26-45 issn 2295-3159
corresponding author: christoph könig, institute of educational science, university of regensburg, universitätsstrasse 31, d-93051 regensburg, germany. e-mail: christoph.koenig@ur.de
doi: http://dx.doi.org/10.14786/flr.v2i4.109

a change in perspective – teacher education as an open system
christoph könig a, regina h. mulder a
a university of regensburg, germany
article received 30 april 2014 / revised 5 june 2014 / accepted 11 september 2014 / available online 24 september 2014

abstract
teacher education is the environment for the learning and instruction of prospective teachers. its structure, components, and contents shape the development of relevant competences which enable prospective teachers to be effective in the classroom. but its relevance is questioned because respective research, characterised by inconclusive results, does not offer explanations about the reasons why certain teacher education programmes are more effective than others in the development of relevant competences. one reason for the lack of explanations can be found in the way research assesses the effectiveness of teacher education. this might be due to problems regarding the conceptualisations of teacher education, as well as to the inherent selection and nonrandom allocation problems in research on the relation between teacher education and student achievement. in this paper we respond to claims for an organisational perspective on teacher education and develop such a new perspective. accordingly, we provide these claims with an adequate theoretical foundation and develop an organisational model of teacher education based on open systems theory.
besides being one of the first integrative organisational models of teacher education, it is among the first models which illustrate the relations and interdependencies of systems, their different parts, and their different levels, and which enable researchers to investigate these interdependencies. the development of this model is further based on an alteration of the input variables of the concept of teacher quality. moreover, the model has consequences for the notion of teacher education effectiveness. we illustrate these changes, and discuss them and the model with respect to possible areas of further research.

keywords: teacher selection; teacher allocation; teacher education effectiveness; open system; positive matching

1. introduction

teacher education is the environment for the learning and instruction of prospective teachers. its structure, components, and contents shape the development of relevant competences which enable prospective teachers to be effective in the classroom. these competences comprise cognitive, motivational, volitional, and social abilities and skills necessary for effective teaching (weinert, 2001). but its relevance is questioned because respective research, characterised by inconclusive results, does not offer explanations about the reasons why certain teacher education programmes are more effective than others in the development of relevant competences (boyd, grossman, lankford, loeb, & wyckoff, 2009; harris & sass, 2011; yeh, 2009). one reason for the lack of explanations can be found in the way research assesses the effectiveness of teacher education. most studies compare graduates from different teacher education programmes with regard to differences in the achievement of students in schools; this approach has relatively high demands concerning methodology and conceptualisations of teacher education (boyd, grossman, hammerness, lankford, loeb, ronfeldt, & wyckoff, 2012; morge, toczek, & cakroun, 2010).
however, this dominant approach and the conceptualisations of teacher education in these studies do not fully grasp the complexity of teacher education, especially the interplay between different components and the learning and instruction of prospective teachers. four specific aspects illustrate the problems associated with the way research currently investigates teacher education effectiveness. the first two aspects are directly related to teacher education conceptualisations. first, many studies conceptualise teacher education as an individual teacher attribute. they use narrow sets of variables, for example the degree and certification status, as proxies for competences which teachers bring into the classroom (harris & sass, 2011). even structural features or policies of teacher education, for example the selection procedures or the structure of learning opportunities, are considered such individual teacher attributes (little & bartlett, 2010). these kinds of conceptualisations may not adequately reflect the relation between organisational aspects of teacher education and the behaviour of individuals, e.g. the use of learning opportunities by prospective teachers during initial teacher training. what happens at the level of the individual prospective teacher, that is, his learning processes, is embedded in the structure of teacher education. harris and sass (2011) labelled this aspect the “inherent selection problem”. second, most studies directly relate the aforementioned narrow sets of indicators for teacher education to the achievement of students in schools. however, as konold, jablonski, nottingham, kessler, byrd, imig, berry, and mcnergney (2008, p. 310) argue, “[…] there is little to be learned by examining the long jump between teacher characteristics and pupil learning. […]”. 
few studies take into account the full complexity of the relation between teacher education, teacher characteristics (such as their competences), teacher behaviour, and student achievement. the relation between teacher behaviour and student achievement is especially neglected (connor, son, hindman, & morrison, 2005). an effect size of 0.91 for teacher behaviour measured by classroom observations on student achievement, found by schacter and tum (2004), illustrates the importance of teacher behaviour. the ‘long jump’ disregards this relation, and does not take into account the distinction between teacher quality (characteristics teachers possess) and teaching quality (their teaching practice). thus, it hinders the identification of teacher characteristics which are important for effective teaching. the other two aspects are related to potential sources of bias in current estimates of the effectiveness of teacher education (harris & sass, 2011). third, one source of bias is the variation in the development of relevant competences across teacher education programmes (boyd, et al., 2009). this variation may not be attributed only to a better provision of opportunities to learn, but also to a better selection of prospective teachers (denzler & wolter, 2009). structural features of the selection procedures may shape unobserved characteristics of prospective teachers which influence their learning (kennedy, 1998). individual conceptualisations of teacher education lack explanatory power with regard to such organisational aspects. fourth, another source of bias is the nonrandom allocation of teachers to schools. a prominent manifestation of this problem is positive matching. students in schools with high socioeconomic status have better access to highly qualified teachers (in terms of paper qualifications), compared to students in schools with a lower socioeconomic status (luschei & carnoy, 2010; loeb, kalogrides, & beteille, 2012).
only a few studies investigate relevant structural features of the teacher labour market with regard to their influence on teacher distributions (goldhaber, 2007; winters, dixon, & greene, 2012). current individual conceptualisations of teacher education do not allow for explanations of the development of positive matching, because they address this problem when the allocation of teachers to schools has already happened. hence, it remains unknown why teachers bring their competences into schools and classrooms in such a systematic way. in this paper we address these issues and argue that, with a change in perspective on teacher education, some of them may be attenuated. this change in perspective is based on three premises: (1) a rearrangement of teacher education and teacher characteristics within the concept of teacher quality, accompanied by a clear distinction between teacher quality and teaching quality (goe & strickler, 2008); (2) an organisational approach that models teacher education as a system and focuses on structural features relevant for the selection of teacher education candidates and prospective teachers, for the development of relevant competences, and for the allocation of teachers to schools; (3) a change in the notion of teacher education effectiveness, which is due to the rearrangement of the teacher quality concept and the organisational approach to teacher education. the aim is to develop an organisational model of teacher education which allows researchers to take into account (1) the relation between teacher education and its context, as well as (2) the interplay between teacher education and prospective teachers. the development follows the ecological framework of teacher education proposed by zeichner and conklin (2008) and specifically focuses on the admission process and the institutional and labour market context of teacher education.
grossman and mcdonald (2008) identify these contexts as being important influences on the policy and practice of teacher education, and argue that in order to gain new insights research should incorporate these contextual conditions. moreover, the model provides a theoretical basis for explanations of learning and instruction of prospective teachers which is embedded in a teacher education system (zeichner, 2005). given the lack of research at the organisational level, we make use of system and organisational theories in order to characterise teacher education as a system. however, the reliance on these theories might be an advantage because, as grossman and mcdonald (2008) state, broadening the theoretical basis of research on teacher education might facilitate new insights and explanations of teacher education policy and practice. eventually, the model will provide researchers with a new theoretical basis for research in order to reach a better understanding of learning and instruction of prospective teachers, because it illustrates the connections between different (organisational and individual) levels and systems, as well as the interdependencies of individual and organisational learning. these new insights might further be used for policies aimed at the facilitation of learning and instruction of prospective teachers.

2. the prerequisite: rearranging components of the teacher quality concept

goe and strickler (2008) conceptualise teacher quality as a multidimensional concept consisting of three interrelated dimensions. they conceive of teacher qualifications (understood as degrees, majors, and other paper qualifications) and characteristics (such as their competences) as input variables, teacher behaviour as process variable, and teacher effectiveness as output variable which is commonly measured by standardised student test scores.
in accordance with other authors they emphasise that teacher quality and teaching quality are two different aspects, and that they should be modelled accordingly (goe & strickler, 2008; konold et al., 2008). however, as we already mentioned in the introduction, many studies on the relation between teacher education and student achievement disregard this distinction. the interrelations between the different concepts are as follows. teacher qualifications and characteristics (such as their competences) have an influence on the behaviour of teachers, that is, what they do and can do in the classroom (teaching quality). following weinert’s (2001) definition of competence, teacher characteristics constituting teacher quality comprise cognitive abilities and skills, for example knowledge about and mastery of subject-didactics and a repertoire and understanding of multiple models of teaching, as well as motivational, volitional, and social aspects such as commitment to a continued professional development after initial teacher training, love of children, collaboration with colleagues, and reflection over practice (hopkins, 2008). teacher quality translates into teaching quality. at the same time, with teaching being an experience good and social practice (jovanovic, 1979), teaching quality influences teacher quality. for example, reflection over practice, collaboration with colleagues, and a high commitment to continued professional development enable teachers to refine their practice and to further develop their competences after their initial teacher training. eventually, the interplay between teacher and teaching quality is an important influencing factor for student achievement and, consequently, directly related to student achievement. what becomes obvious is that teacher characteristics (such as their competence) have no direct relation to student achievement.
their effect on student achievement is mediated by the respective teacher behaviour, that is, they only have an indirect effect on student achievement. this indirect relation is also disregarded by many studies (for example marshall & sorto, 2012). differences in teacher characteristics may lead to differences in what teachers are able to do in the school and in the classroom, and in turn to differences in student achievement. as yet the specifics of these pedagogical mechanisms are unclear (baumert, kunter, blum, brunner, voss, jordan, klusmann, krauss, neubrand, & tsai, 2010). the unclear picture is due to a neglect of the indirect effect of teacher quality on student achievement. hence, a first prerequisite for the change in perspective on teacher education involves acknowledging this indirect relation. this is accompanied by shifting the focus to the relation between teacher and teaching quality. this may be a way to identify specific teacher characteristics which are relevant for effective teaching. teacher qualifications and characteristics are frequently used interchangeably. however, they are two distinct concepts. teacher qualifications are frequently used in studies as proxies for what the teacher did during initial teacher training (harris & sass, 2011). but teacher characteristics, such as their competence, are a consequence of teacher qualifications, that is, of what they did during initial teacher training. in other words, what teachers did during their initial teacher training, and why, has consequences for what they bring into the school and the classroom, and where. jackson (2010) showed that the quality of teacher-student matches accounts for up to 40 percent of what is usually attributed to a teacher effect on student achievement. hence, the second prerequisite involves a clear distinction between teacher qualifications and teacher characteristics.
however, with individual-level conceptualisations of teacher education, which mix up teacher qualifications and teacher characteristics, we cannot explain what a prospective teacher actually does during initial teacher training and why, where he ends up teaching, what he is able to do in the classroom, and ultimately how his behaviour affects student achievement. having teacher education disentangled from teacher characteristics, and having it identified as the starting point for the complex chain between the resulting teacher characteristics (such as their competence), teacher behaviour, and student achievement, we are now in a position to model teacher education as a system of structured learning opportunities, including structural elements governing the selection of prospective teachers and the allocation of teachers, which is embedded in multiple institutional contexts (zeichner, 2006).

3. a different perspective – teacher education as an open system

the organisational model of teacher education described in this section is based on open systems theory (katz & kahn, 1978). despite being a rather old model, to this date it remains “the most systematic introduction of open system concepts into organisation theory” (scott & davis, 2007, p. 90), and is furthermore the theoretical basis for much of current organisational research (schneider & somers, 2006; martz, 2013). katz and kahn (1978) were among the first to recognise the dependency between organisations and their environment, as well as the linkage between psychological and structural/economic aspects of organisations. compared to other currently used open system models, for example contingency theory (lawrence & lorsch, 1967), it is the aforementioned linkage between individual and organisation which makes open systems theory an appropriate framework for teacher education systems.
compared to current further developments of open system models, for example complex adaptive systems (stacey, 1995), open systems theory provides a more accessible framework due to the comprehensiveness of its core components. however, the main reason for choosing open systems theory was the fit of its theoretical propositions with the characteristics of teacher education systems (bess and dee (2008) also show the usefulness of this theory for educational organisations in their application of open systems theory to higher education). first, it explicitly takes into account the relations and exchanges between different systems. this is important because the teacher education system is not an isolated entity, but is embedded in multiple contexts, for example higher education and the teacher labour market (grossman & mcdonald, 2008). in this part of the framework we are able to model which individuals choose teacher training, and where teachers bring their characteristics (such as their competence) to the school and the classroom. second, it explicitly takes into account the dependencies and interplay of system and prospective teachers. this is important for modelling the use of available learning opportunities by prospective teachers. in this part of the framework we integrate what the prospective teacher does during initial teacher training.

3.1 teacher education from the point of view of open systems theory

an open teacher education system consists of a sequence of structured learning opportunities provided to prospective teachers within the system. the sequence and structure of the learning opportunities constitute an environment where the learning of prospective teachers is situated in a gradually growing participation in teaching practice (korthagen, 2010). the active use of these opportunities leads to the development of competences required for effective teaching.
the use of learning opportunities by prospective teachers is labelled, in open system terms, as patterned activities of individuals and describes the core of the interplay between system and prospective teachers (katz & kahn, 1978). thus, what happens within the teacher education system is seen as an active developmental process, rather than just a transmission of declarative knowledge (zeichner, 1983). what prospective teachers do, and how successful their professional development is during initial teacher training, depends on the characteristics they bring into the teacher education system. at the same time, the learning opportunities provided by the teacher education system require certain individual characteristics. if teacher education candidates or prospective teachers do not meet these requirements, the utilisation of learning opportunities, as a part of their professional development, becomes suboptimal and may even be discontinued prior to graduation (blömeke, 2009). thus, for an open teacher education system, control over entry is essential (which is also called boundary maintenance; scott & davis, 2007). the selection function plays a key role in this regard, and is defined as the selection and sorting of teacher education candidates and prospective teachers (musset, 2010; van de werfhorst & mijs, 2010). it is based on the characteristics of the candidates and prospective teachers. an optimal selection function avoids adverse selection in terms of characteristics which hinder a successful utilisation of learning opportunities as a part of the professional development of prospective teachers. given the connection of an open teacher education system to its context (scott & davis, 2007), we have to consider what happens immediately after initial teacher training. the degree to which prospective teachers successfully use the learning opportunities during initial teacher training influences the competence they bring into schools and classrooms.
this is a second component of the connection between teacher education and the education system. this allocation function is defined as the assignment of teachers to schools (parsons, 1951), which has long been based on the assumption that schools and teaching positions are equivalent across districts and regions (johnson & kardos, 2008). however, jackson (2010) showed that there are teacher-school combinations which lead to better student achievement. thus, it matters where teachers bring their competence into the classroom. an optimal allocation function provides teacher-school matches that minimise teacher turnover and attrition. in sum, the general characteristics of an open teacher education system closely resemble the three common functions of education systems, which constitute an input-transformation-output-model (kast & rosenzweig, 1972): the selection and sorting of candidates and prospective teachers (input/selection), the provision of learning opportunities for students situated in a gradually growing participation in teaching practice to develop relevant competences (transformation/instruction), and the allocation of qualified teachers to schools (output/allocation).

3.2 the selection and allocation functions

in order to establish and maintain the selection and allocation processes, the open teacher education system develops respective structural elements (katz & kahn, 1978; wang, coleman, coley, & phelps, 2003). these structural elements are arranged in subsystems governing the selection and sorting of prospective teachers, and the allocation of teachers to schools. these structural elements comprise institutional structures and administrative regulations for control over and socialisation of prospective teachers and teachers (maaz, hausen, mcelvany, & baumert, 2006). they allow screening out individuals when they do not meet the requirements of teacher education or a given teaching position in a school.
both functions are closely connected to the context of the teacher education system, because they govern the transitions of individuals into and out of initial teacher training. thus, the arrangements of structural elements can be understood as transition systems (van der velden & wolbers, 2007). as such, they are means for the teacher education system to react to policy changes in the immediate context, namely the education system and the teacher labour market. an example of such a reaction is a change in the selection mechanisms of a teacher education system given a shortage of teachers in the teacher labour market (blömeke, 2006).

3.2.1 general characteristics of the selection function

the selection function governs the admission of teacher education candidates at entry into, and the sorting of prospective teachers within, the teacher education system. by means of the aforementioned control and socialisation elements, the selection function provides information about (1) the aptitude of teacher education candidates for teaching, and (2) the success of prospective teachers in their use of learning opportunities. moreover, socialisation mechanisms initiate the transfer of professional role expectations and norms from teacher education to the prospective teacher and support the professional development of the prospective teacher (saks, uggerslev, & fassina, 2007). this information can be used by prospective teachers in order to judge their attitude to and aptitude for teaching. furthermore, it enables prospective teachers to reflect on their practice in order to determine how to improve their teaching. moreover, the information provided by the selection function also serves as relevant feedback for the system for admission and progression decisions, in order to reduce the variability in the use of learning opportunities, which is due to variability in individual characteristics (scott & davis, 2007).
with its control and socialisation mechanisms, the selection function serves both the prospective teachers and the teacher education system in determining if a given prospective teacher can progress to the next developmental stage. while the information provided by the function is at first only a rough estimate of how well a given candidate might do, the information becomes more detailed when the actual development of the prospective teacher is assessed. it is important to note that it is only possible to select individuals who (are able to) make themselves available (grodsky & jackson, 2009). thus, variability in individual characteristics can be found either in the candidate pool or among the prospective teachers. the structural elements constituting, and in turn influencing the success of the selection function, can be assigned to and described with three dimensions. first, the capacity of the teacher labour market influences the number and characteristics of the candidates. this comprises the accessibility of teacher education and the attractiveness of teaching. second and third, the comprehensiveness of available information about candidates and students and the level of integration of students into teaching influence the number and the characteristics of the prospective teachers.

3.2.2 structural elements of the selection function

we begin with the structural elements constituting the capacity of the teacher labour market. the theoretical rationale of the respective structural elements is based on rational choice and supply and demand models (sicherman & galor, 1990; ehrenberg & smith, 2011). given that initial teacher training is an educational choice among others, they postulate that individuals analyse educational alternatives by weighing costs against benefits. when the costs of a given educational alternative are higher than individual resources, individuals will opt for another alternative. rational choice models emphasise two core aspects relevant for the characteristics of the candidate pool: structure and status. based on these core aspects, the length and level of initial teacher training and the occupational status of teaching are structural elements of the capacity of the teacher labour market. while the influence of the length and level is ambiguous, a high occupational status of teaching attracts a greater number of teacher training candidates and increases the candidate pool. countries with a highly attractive teaching profession do not have teacher supply problems (schwille & dembele, 2007). however, with an increased candidate pool it is more likely that the variability in individual characteristics is increased as well. furthermore, characteristics of the student population affect the number of available teaching positions, that is, the demand for teachers. for example, an increased number of students in the education system affects the student-teacher ratio, which in turn influences teacher demand. while this aspect has no direct influence on the candidate pool, it affects the control mechanisms at entry into initial teacher training. educational decisions and the selection process are characterised by an asymmetric distribution of information (van der velden & wolbers, 2007). imperfect information about candidates and prospective teachers is problematic for systems, because they rely on signals (stiglitz, 1975). lack of information increases the risk of admitting and progressing teacher education candidates and prospective teachers who are not successfully using the learning opportunities, or who show insufficient development. hence, structural elements influencing the comprehensiveness of information available to the teacher education system are admission and assessment procedures, which are based on respective criteria. these criteria determine which individual characteristics are required for entry into initial teacher training and for teaching.
students with the required characteristics utilise learning opportunities successfully and are more likely to graduate. while the admission procedures are implemented in order to collect information about teacher education candidates, the assessment procedures are implemented in order to monitor prospective teachers with respect to their use of learning opportunities as part of their professional development. moreover, the assessment procedures serve as feedback and as an opportunity for the prospective teachers to reflect on their development and teaching practice. the comprehensiveness of information increases if the admission and assessment procedures exhibit certain characteristics. according to baartman, bastiaens, kirschner and van der vleuten (2006), the characteristics of such assessment procedures within a competence-based approach to teacher education comprise fitness for purpose, comparability and reproducibility of results, acceptability and transparency. moreover, the fairness, cognitive complexity, meaningfulness, and authenticity of the procedures are relevant, besides their costs and efficiency and their consequences (admission and progression decisions). admission procedures in particular are closely linked to the demand for teachers. the literature frequently discusses solutions to teacher shortages in the form of reduced entry requirements for initial teacher training (blömeke, 2006). the sequence, rigour, and the aforementioned quality characteristics of procedures and their criteria increase the comprehensiveness of information about candidates and prospective teachers. this is especially important when the candidate pool is large. socialisation mechanisms serve as means to help prospective teachers to take on new roles and simultaneously stress the social aspects of the learning processes. these structural elements reduce the uncertainty of students about the expectations and requirements of teaching when entering teacher education.
furthermore, the respective structural elements situate the learning of prospective teachers in a social environment, where they are guided and supported in their professional development (korthagen, 2010). one structural element is internal support. it gives access to structured forms of support, either with guidance by experienced teachers or sequenced in clearly defined courses. the other is field experience. it describes opportunities for field experiences prior to entering the teaching profession, and directly influences the transfer of professional role expectations and norms. the level of integration of the selection function is high when a prospective teacher receives frequent internal support, as well as several opportunities to gain relevant field experience. the structural elements of the selection function and their assignment to their respective dimensions are summarised in table 1.

3.2.3 general characteristics of the allocation function

the allocation function governs the transition of trained teachers from initial teacher training into the teaching profession. thus, it is related to the allocation of teachers to schools. by means of the aforementioned control and socialisation elements, the allocation function provides information (1) for schools about the characteristics of trained teachers, and (2) for trained teachers about characteristics of teaching positions in schools. the socialisation mechanisms initiate the transfer of school-specific role expectations and norms. they serve as information for schools about how well a trained teacher is able to integrate into the specific school context. this is relevant feedback for schools in order to make recruitment decisions. these decisions result in teacher-school matches (lankford & wyckoff, 2010).
similarly, the information is at first only a rough estimate of the characteristics of teachers, but becomes more detailed with an increasing amount of time between the first assignment and the definite recruitment decision (liu & johnson, 2006). due to varying success regarding the use of learning opportunities, variability in teacher competences is likely. for example, despite having obtained the same degree, trained teachers still can vary in their acquired cognitive, motivational, volitional, and social skills (van der velden & wolbers, 2007). thus, it is difficult for schools to distinguish between teachers who are suited for a given teaching position, and those who are not. hence, the structural elements constituting, and in turn influencing the success of the allocation function, can be assigned to and described with three dimensions. the first dimension is control over the recruitment process. this dimension includes the level of control, as well as the actual utilisation of the level of control with adequate recruitment procedures. a more direct control over recruitment, combined with various recruitment measures, may facilitate staffing (liu & johnson, 2006). the control over the recruitment process is directly connected with the second dimension, namely the comprehensiveness of information which is available to schools and teachers about each other. with an increased comprehensiveness of information it is possible to make more informed recruitment decisions. third, the level of integration of teachers into schools influences the smoothness of the transition into the specific teaching position.

3.2.4 structural elements of the allocation function

the starting point is the structural elements constituting the comprehensiveness of available information about teachers and their characteristics. similarly to the selection function, the allocation process is characterised by an asymmetric distribution of information (van der velden & wolbers, 2007).
signals for teachers’ characteristics and structural factors of the recruitment process attenuate the lack of information (stiglitz, 1975). lack of information about teacher characteristics increases the risk of recruiting the “wrong” teacher and increases the risk of teacher turnover. signals are provided by certification requirements which trained teachers have to fulfil. however, certification requirements and respective teacher test scores are only weak signals of teachers’ knowledge and skills (goldhaber, 2007). thus, another structural element for information about beginning teachers is probationary periods. with probationary periods, where teachers are monitored regarding their performance, the definite recruitment decision can be delayed, and more information about a teacher can be collected (staiger & rockoff, 2010). however, it is important to note that not only the length of the probationary period is relevant, but also its implementation. probationary periods may be successful only if they provide trained teachers with a well-established and supportive environment (oecd, 2011). examples of such aspects are faculty collaborative periods, meetings with supervisors, classroom assistance, or a reduced workload (ingersoll & strong, 2011), within which teachers are enabled to reflect on their practice. probationary periods may be combined with induction measures. in sum, the comprehensiveness of information is high if the allocation function includes certification requirements combined with elaborate probationary periods for teachers. however, the influence of the level of information on the allocation process depends on the control over the recruitment process. as mentioned before, control over the recruitment process comprises the level of and utilisation of this control. the level of control is indicated by the degree of school autonomy regarding recruitment decisions.
a direct control over recruitment decisions might facilitate the staffing of schools (liu & johnson, 2006). it may be hindered when there are central authorities or union regulations governing the recruitment process. such regulations may not adequately consider school-specific needs regarding personnel and can be understood as constraints interfering with school-based recruitment. thus, with respect to the level of control over recruitment decisions, a distinction can be made between school-based or local recruitment, recruitment controlled by regional or central authorities, and recruitment which is coordinated between local and central authorities. however, the level of control alone is not sufficient to characterise control over the recruitment process. several studies have found that although schools have a high degree of autonomy in staffing decisions, they only utilise a small set of recruitment procedures when recruiting teachers (balter & duncombe, 2008; staiger & rockoff, 2010). respective recruitment procedures might include, for example, interviews and supervised sample lessons. in sum, control over the recruitment process is adequate only if a school-based recruitment is complemented by a variety of recruitment procedures. at the same time, such control has a positive influence on the comprehensiveness of information about trained teachers (liu & johnson, 2006).
table 1. the functions, their dimensions, and their respective structural elements

function | dimension | structural elements
context | capacity of the teacher labour market | length of teacher education; level of teacher education; occupational status of teaching; student population
selection | comprehensiveness of information about candidates & prospective teachers | admission procedures; assessment procedures; admission criteria; assessment criteria
selection | level of integration of prospective teachers | internal support; field experiences
allocation | control over the recruitment process | school autonomy; union regulations; recruitment procedures
allocation | comprehensiveness of information about trained teachers | certification; probationary periods
allocation | level of integration of teachers into schools | teacher mentoring; teacher induction

socialisation mechanisms serve as means to help teachers to take on school-specific roles and norms. first, beginning teachers learn the requirements of a role or teaching position (functional aspect); second, they integrate into the social structure of the school (inclusion aspect). over time they get accustomed to the specific organisational characteristics and can adapt to them. similarly to the selection function, these structural elements reduce the uncertainty of teachers about expectations and requirements when they start teaching in a given school. moreover, they offer possibilities for teachers to reflect on their practice in order to improve their teaching. as such, the socialisation mechanisms are means to foster teacher professional development after initial teacher training (ingersoll & strong, 2011). structural elements related to the level of integration are teacher induction and teacher mentoring. they are means to acquaint teachers with the specific characteristics of a given school. teacher induction includes a formalised system to support teachers. teacher mentoring is personal guidance provided by a senior teacher at a school.
it varies from single meetings to formalised programmes involving frequent communications between teacher and mentor. teacher induction and mentoring also influence teacher retention, thus decreasing teacher shortages and turnover (wang, odell, & schwille, 2010). schools are more frequently required to provide teachers with school-specific learning opportunities (ingersoll & strong, 2011). the level of integration varies according to the comprehensiveness of induction and mentoring measures. the structural elements of the allocation function and their assignment to their respective dimensions are summarised in table 1.

4. a change in notion – a different view of teacher education effectiveness

we already mentioned that the change in perspective on teacher quality and teacher education requires a different notion of teacher education effectiveness. morge et al. (2010) distinguish three levels of validation of teacher education, depending on the specific outcome variable which is evaluated. the first level comprises teacher thinking and teacher knowledge as primary outcome. the effectiveness of teacher education is assessed by the level of cognitive and non-cognitive characteristics of teachers, that is, their knowledge and motivational, volitional, and social skills which they acquired during initial teacher training. however, at this first level the link between these characteristics and the instructional practice of teachers is not included (morge et al., 2010). the second level includes this link, i.e. the effectiveness of teacher education is assessed with respect to the behaviour of the teachers. while the first level only allows asking what teachers know, the second level extends this question to what they are able to do in the school and in the classroom. the third level further extends the concept of teacher education effectiveness.
here, teacher education effectiveness is a question of what teachers are able to do in schools and in the classroom, and how this affects student achievement. current notions of teacher education effectiveness involve primarily the third level of validation. however, with the narrow teacher education conceptualisations which directly relate distal variables to student achievement, we cannot expect to gain reliable estimates of the effect of teacher education on student achievement (konold et al., 2008). furthermore, we cannot investigate if teachers who participated in initial teacher training behave in ways which positively affect student learning (morge et al., 2010; konold et al., 2008). the organisational model of teacher education as an open system, however, may be a way to investigate this question. in this regard, a change in notion of teacher education effectiveness, that is, a focus on the second level of validation, might be a necessary step. in the following we illustrate this change in notion and focus. the starting point is teacher competence as outcome of teacher education. thus, we focus on the first level of teacher education validation. teacher competence depends on the utilisation of learning opportunities by prospective teachers. as already mentioned, the learning process situated in a gradually growing participation in teaching practice requires specific individual characteristics (tillema, 1994). teacher education is effective if it provides learning opportunities, based on specific curricula, which provide prospective teachers with the possibility to develop competences necessary for effective teaching.
given that the characteristics of prospective teachers depend on the effectiveness of the selection function in sorting them, the notion of teacher education effectiveness is extended: a teacher education system is only effective if (1) it provides prospective teachers with information about their development, with which they can reflect on their practice, and additionally if (2) the system screens out prospective teachers who are likely to fail. besides this individual outcome of teacher education, we also have an organisational outcome. a successful utilisation of learning opportunities by students implies higher success rates (gansemer-topf & schuh, 2006). hence, a comprehensive notion of teacher education effectiveness includes selection effects on the use of learning opportunities and, thus, the professional development of prospective teachers, and an organisational aspect in terms of success rates. moreover, the competences of prospective teachers are related to their teaching practice. in other words, teacher quality may only become visible through the associated teaching quality (mulder, messmann, & gruber, 2009). this means that in order to assess teaching quality it is necessary to consider the competences of the (prospective) teachers, and vice versa. classroom observations during initial teacher training, along with guided support by experienced teachers and room for reflection on their teaching practice, may facilitate an assessment of prospective teachers’ readiness to teach and teaching quality, given the consensus on effective teaching practices (akiba, letendre, & scribner, 2007). however, classroom observations require the teachers’ reflections on their teaching, that is, explications of the reasons why they did what they did. this may be a way to unravel the connection between teacher and teaching quality, and thus a possible clarification of the mechanisms with which teachers translate their competence into effective teaching.
including what a teacher is able to do in a real classroom in a school, and how this affects student achievement, in the concept of teacher education effectiveness is difficult. each school, even each classroom, is a unique social system (johnson & kardos, 2008). hence, specific contextual characteristics of schools, for example their facilities and equipment, or the leadership style of the principal, may influence how well teachers are able to translate their knowledge into effective teaching. moreover, the schools and classrooms into which teachers bring their characteristics depend on the specific characteristics of the allocation function. each teacher effect on student achievement involves a complex interplay between recruitment decisions, school and classroom characteristics, and the behaviour of the teacher in the school and in the classroom. given that it is still unclear how teachers translate their knowledge into effective teaching (baumert et al., 2010; croninger, rice, rathbun, & nishio, 2007), it is questionable whether an effect of teacher education on student achievement can be identified. as a consequence, the assessment of teacher education effectiveness remains a question of the development of competences necessary for effective teaching, and thus remains on the second level of validation.

figure 1. the organisational model of teacher education as an open system. rectangles depict the dimensions of the selection and allocation functions, as well as contextual conditions in the education system/teacher labour market. black arrows illustrate the transition of an individual through teacher education into schools, from teacher education candidate via prospective teacher to trained teacher in a school. gray arrows and boxes show the consequences of the use of learning opportunities by prospective teachers for their competence and success rates, and the consequences of specific teacher distributions (teacher turnover and positive matching).

from an organisational point of view it is nevertheless possible to relate the allocation function to specific manifestations of teacher distributions, such as the positive matching between teachers and schools. it is a peculiarity of allocation in the context of education systems that a successful allocation is not only a question of balancing supply and demand, but to a greater degree a question of students’ equal access to highly qualified teachers. hence we have an organisational indicator for the effectiveness of the allocation function: the degree to which its structural arrangement of elements attenuates positive matching of teachers to schools.

in sum, based on the changes in the teacher quality concept and the organisational perspective on teacher education as an open system, the notion of teacher education effectiveness receives a narrower, but more meaningful and distinct focus. the inclusion of organisational indicators for the effectiveness of the selection and allocation functions allows for an investigation of teacher education effectiveness on a different level. an interesting aspect in this regard is the relation between higher success rates of the teacher education system and the impact of the allocation function on positive matching, because higher success rates imply a higher number of teachers available for allocation. hence, the organisational model allows investigating the relation between the functions as well. the complete organisational model is visualised in figure 1.

5. discussion – the model’s value in research on teacher education

in this paper we addressed four shortcomings of current research on the relation between teacher education and student achievement, namely the conceptual, the complexity, the inherent selection, and the non-random allocation problem (konold et al., 2008; harris & sass, 2011). the aim was to develop an organisational model of teacher education which provides researchers with a new, alternative perspective on teacher education practice. this perspective enables researchers to investigate the relation between teacher education and its context (for example the teacher labour market and the education system), the interaction of different systemic levels, as well as the interdependencies of individual and organisational development.

the development was based on three specific premises. the first is an alteration of the input variables of the teacher quality concept. this involved clearly positioning teacher education as an antecedent of teacher characteristics, that is, teacher education directly influences teacher competences relevant for teaching. the second is a change in perspective away from teacher education as an individual teacher characteristic to a model of teacher education as an open system. within this model, we outlined the role of the selection function for prospective teachers’ professional development, and the role of the allocation function for different manifestations of the non-random allocation of teachers to schools, for example positive matching. third, as a consequence of the change in perspective, we illustrated an associated change in the notion of teacher education effectiveness. this concept was refocused on the development of competences of prospective teachers, and extended with two organisational indicators of effectiveness.
this narrower focus is necessary because of the complex interplay between school and classroom characteristics and what teachers are able to do in the school and in the classroom, which may hinder the identification of a definite teacher education effect on student achievement.

the relative underspecification of the learning opportunities in the model is intentional. in contrast to the elements of the selection and allocation functions, it is difficult to identify generic elements of learning opportunities which are comparable across institutional or national settings. although there is some convergence in the design of learning opportunities, there is still a great variety in elements of learning opportunities (paine & zeichner, 2012). moreover, research shows that some of the more generic characteristics such as the length and structure of teacher education are unrelated to teacher education effectiveness (zeichner, 2006). however, to keep the model useful for cross-country comparisons, for example, it must remain as generic as possible. the underspecification of the learning opportunities provided by a teacher education system might be interpreted as an opportunity for researchers to take into account country-specific characteristics of the learning opportunities in their own studies. hence, researchers are able to fill this gap in the model with characteristics of learning opportunities in their respective samples.

the model as a whole imposes high requirements on the collection, amount, and quality of data. this limitation applies to all aspects mentioned in this section. although recent international comparative studies such as teds-m and talis provide new databases, available data might not be sufficient to test the model as a whole.
thus, it might be more reasonable to concentrate on specific aspects of the model, such as the relation between selection and student characteristics, the relation between allocation and positive matching, or the relation between student teachers and their use of learning opportunities. nevertheless, the model outlined in this paper might serve as a foundation for more elaborate and comprehensive data collection in future studies on teacher education systems.

we hope that the organisational model of teacher education will provide a theoretical basis which initiates new research leading to new insights and a better understanding of teacher education policy and practice, especially with regard to the identification of teacher characteristics relevant for teaching, the selection of teacher education candidates and prospective teachers, and the positive matching between teachers and schools. in the following sections we discuss the usefulness of our model in the context of three possible areas of research.

5.1 identification of teacher characteristics relevant for effective teaching

we already mentioned in the introduction that research on the relation between teacher education and student achievement has been unsuccessful at identifying teacher characteristics relevant for effective teaching. besides the inherent selection problem, that is, the unobserved characteristics which influence what teachers did during their initial teacher training, this is further due to the distal conceptualisations of teacher education used in current studies. these conceptualisations, for example the certification status of teachers, are selected because of their relevance for policies concerning the teacher labour market (goldhaber, 2007; harris & sass, 2011).
however, these conceptualisations might gain meaning if the aforementioned unobserved characteristics are made observable, and their relations to effective teaching are established (this is in line with the focus on the second teacher education validation level described in section four). we argue that our model can provide a means to accomplish these tasks.

our model explicitly states relations between characteristics of prospective teachers and their use of learning opportunities provided by the system. with teaching being an experience good, the identification of relevant characteristics requires accurate information about what prospective teachers are able to do in the classroom, that is, classroom observations of prospective teachers which are supported by guided reflection on teaching practice (morge et al., 2010). these classroom observations and possibilities for reflection may be integrated in a more refined concept of the assessment procedures. the authenticity of the assessment procedures may be the core aspect with regard to the identification of relevant characteristics, because authentic assessment is a more direct way of assessing how well prospective teachers are able to translate the contents of their initial teacher training into effective teaching behaviour (darling-hammond & snyder, 2000). the performance scores derived from these observations, as well as information about the reflections of the teachers, may then be related to a set of characteristics prospective teachers possess.

the identification of relevant teacher characteristics further has positive consequences for the selection and sorting of prospective teachers during initial teacher training. the selection and sorting of teacher education candidates and prospective teachers are still based on rather gross measures, such as the grade point average or subject-specific grades in secondary education (blömeke, 2009).
with an increased authenticity of assessment in the context of the selection function, and with the associated more accurate information about prospective teachers, the identified characteristics can in turn be used as more refined and accurate admission and assessment criteria. hence, our model allows addressing the inherent selection problem not only on the individual, but also on the organisational level. it has to be noted that the identification and use of the identified characteristics is an iterative process and requires a significant amount of time, that is, longitudinal models. however, our model is flexible enough to allow for such extensions. the identified characteristics of prospective teachers may be of limited use for the identification of what a teacher is able to do in a school and in a real classroom, given school-specific contexts influencing their practice.

5.2 research on teacher distributions and the teacher body

given the explicit modelling of the allocation function, which is integrated into our model of teacher education as an open system, researchers can investigate consequences of different approaches to allocating teachers to schools. for example, it may be investigated how certification requirements affect the pool of teachers who choose to teach. there are already studies concerning this problem (for example angrist & guryan, 2008). however, they investigate this feature of the allocation function in isolation from other relevant features, and in isolation from the teacher labour market context. an isolated investigation of these features may not suffice for explanations of different teacher distributions. for example, boyd et al. (2012) conclude that, while some teacher education programmes produce teachers with higher student achievement gains than others, these effects are eliminated when their attrition rate is taken into account.
another example is the simulation study conducted by rothstein (2012). it showed that changing the quality of the teaching force through selection is only successful if at the same time teacher evaluation systems and increased teacher salaries are introduced. this illustrates the need for an integrated rather than isolated investigation of selection and allocation effects, which our model provides.

moreover, schools depend on the amount of available information about teachers in order to make informed recruitment decisions. these decisions seem to rely on only weak and noisy signals (goldhaber, 2007). thus, it is frequently argued that the acquisition of reliable specific information requires an assessment of teachers based on actual classroom performance (goldhaber & liddle, 2011). staiger and rockoff (2010) suggest that tenure should be delayed until a sufficient amount of information is collected. as long as indicators of teacher education do not adequately capture what teachers do during their initial teacher training (cf. the respective description in section 5.1), mismatches between teachers and schools are to be expected, and these lead to teacher turnover. in light of the change in the notion of teacher education effectiveness, a stronger reliance on actual classroom performance of teachers in the context of recruitment seems reasonable. our model allows for an investigation of the influence of different approaches to recruiting teachers and their relation to teacher turnover, taking into account contextual conditions of the teacher labour market. it has to be noted that the model in its current state captures only the structural prerequisites of recruitment decisions. however, our model can easily be extended to include the individual recruitment (or transfer) decisions of teachers and principals within the context of a given configuration of an allocation function.
the relations and research questions outlined in the previous sections may also be investigated by cross-country comparisons of teacher education systems, for example a comparison of credential-based and information-based allocation functions (van de werfhorst, 2011). comparisons of different approaches to allocating teachers to schools need to consider not only quantitative, but also qualitative aspects of, for example, recruitment procedures or probationary periods. these qualitative aspects include not only the variety of the different procedures, but also the actual utilisation of these procedures by principals, school boards, or other entities responsible for staffing decisions. thus, when collecting data, researchers should not rely only on institutional data provided by administrative datasets or official documents, because this might only cover the ‘espoused allocation’. in order to gain a complete picture of the qualitative aspects, it might be necessary to ask principals or school boards about the actual utilisation of the procedures in order to capture the ‘allocation in use’ (a similar distinction can be found in cannata, 2010). covering only one of the two may lead to biased estimates of the relation between allocation approaches and teacher distributions.

5.3 cross-country and cross-institutional comparisons of teacher education systems

it is important to consider that teacher education practice, as well as learning of prospective teachers during initial teacher training, depends on country-specific characteristics of teacher education systems and contextual conditions present in education systems and teacher labour markets (paine & zeichner, 2012). depending on the point of view, our model enables researchers to investigate not only cross-country, but also cross-institutional differences in teacher education practice.
cross-country, as well as cross-institutional analyses involve three overarching steps: (1) the choice and inclusion of contextual information in the model; (2) modelling the interrelation between functions, dimensions, or structural elements; and (3) modelling the interrelation between prospective teachers and the system.

in its current form, the model focuses on the general education system, or teacher labour market, as the immediate context of teacher education. it has to be kept in mind that this is not the only context teacher education is embedded in. depending on the researcher’s point of view, the institutional, political, or societal context might be considered the immediate context of teacher education (grossman & mcdonald, 2008). the choice of contextual information relates to the decision of the researcher to compare different teacher education programmes (for example, university-based versus school-based teacher education; concurrent versus consecutive), or to compare teacher education systems in different countries. when comparing teacher education programmes, the primary context is the institutional context. thus, respective information relates to higher education, for example the degree of integration of the teacher education programme into universities. when comparing teacher education systems, the primary context is the education system or teacher labour market. respective information relates, according to our model, to the supply and demand of teachers in the education system. it is possible to include contextual characteristics as background information, or as information about group membership in a multigroup model. for example, comparing teacher education programmes in this multigroup framework allows investigating the differential effect of teacher education variables across different educational levels (huang & moon, 2009).
for example, the importance of obtaining a degree for student achievement seems to differ across elementary, middle, and high school levels (phillips, 2010). the differential relevance is explained by the generalist/specialist distinction between elementary, middle, and high school teacher education; the importance of subject-specific degrees increases with education level, where teachers are more often trained to be specialists. hence, there seem to be differential effects of different teacher education programmes on teacher characteristics. other possibilities to include contextual information are cross-classification approaches or multilevel models, depending on the quality and detail of available data.

the interrelation of the functions, the dimensions of the functions, and even the structural elements constituting the functions might complicate cross-country or cross-institutional comparisons of teacher education systems. with these interrelations it becomes difficult to pinpoint the factors influencing the competence (development) of prospective teachers, as well as positive matching or, more generally, teacher-school matches. however, it can be argued that it is precisely this interrelation which makes a single influencing factor of teacher education effectiveness improbable. consequently, our model allows the investigation of the influence of configurations of functions, dimensions, and structural elements on the different aspects of teacher education effectiveness. this might be a more appropriate approach to research on teacher education, especially in light of the complex nature of teacher education systems.

these interrelations can be accounted for depending on the availability of data and on the focus on either outcomes or processes. with our characterisation of the selection and allocation functions, it is possible to construct empirical typologies of their structural arrangements.
in this case, the structural elements are treated as indicators of their respective dimensions. for example, the assessment procedures and their criteria are indicators of the comprehensiveness of information available about prospective teachers. in a similar manner, school autonomy, recruitment procedures, and union regulations are indicators of control over the recruitment process. based on the structural elements, composite measures can be constructed for each dimension. in a further step these composite measures can be used in latent class or cluster analyses in order to identify different approaches to selecting teacher education candidates and prospective teachers, as well as different approaches to allocating teachers. these different profiles can be investigated with regard to their associated organisational outcomes, that is, success rates of the teacher education system or different distributions of teachers in the education system. similar approaches have been taken in the context of institutional dimensions of education systems and the relation between education and labour market outcomes (hofman, hofman, & gray, 2008; bol & van de werfhorst, 2011).

another possibility for cross-country comparisons in a multigroup framework is focusing on processes rather than outcomes, that is, focusing on the interplay between use of learning opportunities and development of competence rather than on comparisons of mean competence levels. such questions are best suited to a multiple group structural equation modelling approach. the different configurations of both functions can be used as background variables to select countries with similar or different levels of information, integration, or labour market capacities. next, these countries can be compared with regard to differences in the relation between characteristics of prospective teachers, their use of learning opportunities, and the development of competences.
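the typology step described above (structural elements as indicators, one composite measure per dimension, then a cluster analysis) can be sketched in a few lines. the following is a minimal illustration only, not the authors' procedure: the six systems, the indicator scores, and the names `information_indicators` and `control_indicators` are hypothetical, each composite is simply the mean of z-standardised indicators, and a small k-means routine stands in for the latent class or cluster analysis mentioned in the text.

```python
import numpy as np


def composite(indicators):
    """Mean of z-standardised indicators (rows: systems, columns: structural elements)."""
    x = np.asarray(indicators, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return z.mean(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every current cluster centre.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster happens to be empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical scores (1 = low, 5 = high) for six teacher education systems.
information_indicators = [  # assessment procedures, assessment criteria
    [1, 2], [2, 1], [1, 1], [5, 4], [4, 5], [5, 5],
]
control_indicators = [  # school autonomy, recruitment procedures, union regulations
    [1, 2, 1], [2, 1, 2], [1, 1, 1], [4, 5, 5], [5, 4, 4], [5, 5, 4],
]

# One composite measure per dimension, then a two-profile typology.
profiles = np.column_stack([
    composite(information_indicators),
    composite(control_indicators),
])
labels = kmeans(profiles, k=2)
print(labels)
```

in an actual study the resulting profiles would then be related to organisational outcomes such as success rates or teacher distributions; with only a handful of systems, any cluster solution of this kind should be treated as exploratory.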
depending on the comprehensiveness of this learning model, differences in the relations are attributed to differences in the configuration of the functions.

interrelations may further be specified as interaction effects or cross-classifications of the structural elements in multilevel models. this might be suitable if the researcher wants not only to compare different teacher education systems or programmes, but also to identify the factors influencing, for example, the competence development of prospective teachers. the aforementioned multigroup model can be extended to a multigroup multilevel model. on the organisational level we have the specific structure and characteristics of the learning environment, cross-classified with characteristics of the selection function and contextual conditions in the education system. the individual level comprises, for example, characteristics of prospective teachers and information about their use of the learning opportunities. the different programmes or systems can easily be integrated into the multigroup approach by specifying the multilevel model for each educational level (for example elementary, middle, or high school level). the aforementioned relationships can then be compared across programmes or systems. any difference in coefficients across the groups informs us about the differential effect of teacher education on competence development across teacher education programmes or systems. with this approach, it is not necessary to keep contextual information constant, because it is directly included in the model. moreover, modern structural equation modelling programmes allow the specification of cross-level interactions.
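the comparison of coefficients across groups can be illustrated with a deliberately simplified stand-in for the multigroup model: the same regression of competence on the use of learning opportunities is fitted separately in each group, and the slopes are compared. this sketch uses ordinary least squares rather than a structural equation or multilevel model, ignores measurement error and significance testing, and all group names and values are hypothetical and noise-free.

```python
import numpy as np


def fit_line(use, competence):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(use), use])
    coef, *_ = np.linalg.lstsq(design, competence, rcond=None)
    return float(coef[0]), float(coef[1])


# Hypothetical use of learning opportunities (same scale in both groups).
use = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical competence scores for two teacher education programmes.
groups = {
    "programme_a": 2.0 + 0.5 * use,
    "programme_b": 1.0 + 1.0 * use,
}

# Fit the same model in every group, then compare the coefficients.
fits = {name: fit_line(use, competence) for name, competence in groups.items()}
slope_difference = fits["programme_b"][1] - fits["programme_a"][1]

for name, (intercept, slope) in fits.items():
    print(f"{name}: intercept={intercept:.2f}, slope={slope:.2f}")
print(f"difference in slopes: {slope_difference:.2f}")
```

a non-zero difference in slopes corresponds to what the text calls a differential effect; a full analysis would additionally test whether the difference is statistically reliable and would model the organisational level explicitly.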
with these interactions it is possible to investigate not only top-down processes (from the system to the prospective teacher) but also bottom-up processes (from the prospective teacher to the system), and thus to examine the relation between individual and organisational development more closely.

6. conclusion

to sum up, the organisational perspective on teacher education as an open system can contribute to existing research by raising awareness of the interrelations of the different parts of a teacher education system, and of the interplay between the system and individual prospective teachers. with its focus on the selection and sorting of teacher education candidates and prospective teachers, and on the allocation of teachers to schools in the education system, it offers a framework which facilitates a better understanding of these processes and their relation to teacher education effectiveness. additionally, it is flexible enough to allow for further developments and extensions, for example the continuing professional development of teachers once they are in the teaching profession, and offers a framework in which researchers are able to integrate their own studies and projects. in the end, the model may lead to substantive new insights which facilitate informed and effective policies in order to make teacher education practice more effective, both for prospective teachers and for the system itself.

keypoints

an organisational model of teacher education is developed.
the model illustrates the dependencies of teacher education and its context.
the model illustrates the interplay of individual and organisational development.
the model includes characterisations of the selection and allocation functions.
the model offers various opportunities for further research on teacher education.

references

akiba, m., letendre, g. k., & scribner, j. p. (2007). teacher quality, opportunity gap, and national achievement in 46 countries.
educational researcher, 36(7), 369-387. doi:10.3102/0013189x07308739
angrist, j., & guryan, j. (2008). does teacher testing raise teacher quality? evidence from state certification requirements. economics of education review, 27, 483-503. doi:10.1016/j.econedurev.2007.03.002
baumert, j., kunter, m., blum, w., brunner, m., voss, t., jordan, a., ... tsai, y.-m. (2010). teachers' mathematical knowledge, cognitive activation in the classroom, and student progress. american educational research journal, 47, 133–180. doi:10.3102/0002831209345157
bess, j. l., & dee, j. r. (2008). understanding college and university organization: theories for effective policy and practice; volume i: the state of the system. sterling, va: stylus publishing.
blömeke, s. (2006). struktur der lehrerausbildung im internationalen vergleich. ergebnisse einer untersuchung zu acht ländern. zeitschrift für pädagogik, 52, 393-416. http://nbn-resolving.de/urn:nbn:de:0111-opus-44668
blömeke, s. (2009). predicting educational and occupational success in teacher training and subject-specific degrees – on the predictive validity of cognitive and psycho-motivational selection criteria. zeitschrift für erziehungswissenschaft, 12, 82-110. doi:10.1007/s11618-008-0044-0
bol, t., & van de werfhorst, h. (2011). signals and closure by degrees: the education effect across 15 european countries. research in social stratification and mobility, 29, 119-132. doi:10.1016/j.rssm.2010.12.002
boyd, d., grossman, p., lankford, h., loeb, s., & wyckoff, j. (2009). teacher preparation and student achievement. educational evaluation and policy analysis, 31, 416-440. doi:10.3102/0162373709353129
boyd, d., grossman, p., hammerness, k., lankford, h., loeb, s., ronfeldt, m., & wyckoff, j. (2012). recruiting effective math teachers: evidence from new york city. american educational research journal, doi:10.3102/0002831211434579
cannata, m. (2010).
understanding the teacher job search process: espoused preferences and preferences in use. teachers college record, 112, 2889-2934. http://www.tcrecord.org id number: 16011, date accessed: 1/23/2014 4:13:37 pm
connor, c. m., son, s. h., hindman, a. h., & morrison, f. j. (2005). teacher qualifications, classroom practices, family characteristics, and preschool experience: complex effects on first graders’ vocabulary and early reading outcomes. journal of school psychology, 43, 343-375. doi:10.1016/j.jsp.2005.06.001
croninger, r. g., rice, j. k., rathbun, a., & nishio, m. (2007). teacher qualifications and early learning: effects of certification, degree, and experience on first-grade student achievement. economics of education review, 26, 312-324. doi:10.1016/j.econedurev.2005.05.008
darling-hammond, l., & snyder, j. (2000). authentic assessment of teaching in context. teaching and teacher education, 16, 523-545. doi:10.1016/s0742-051x(00)00015-9
denzler, s., & wolter, s. (2009). sorting into teacher education: how the institutional setting matters. cambridge journal of education, 39, 423-441. doi:10.1080/03057640903352440
ehrenberg, r. g., & smith, r. s. (2011). modern labor economics (11th edition). amsterdam: prentice hall.
gansemer-topf, a., & schuh, j. (2006). institutional selectivity and institutional expenditures. research in higher education, 47, 613-642. doi:10.1007/s11162-006-9009-4
goe, l., & strickler, l. (2008). teacher quality and student achievement: making the most of recent research. washington, dc: national comprehensive center for teacher quality. (eric document reproduction service no. ed520769)
goldhaber, d. (2007). everyone’s doing it, but what does teacher testing tell us about teacher effectiveness? journal of human resources, 42, 765-794. doi:10.3368/jhr.xlii.4.765
goldhaber, d., & liddle, s. (2011). the gateway to the profession: assessing teacher preparation programs based on student achievement. seattle: cedr.
grodsky, e., & jackson, e. (2009). social stratification in higher education. teachers college record, 111, 2347-2384. http://www.tcrecord.org id number: 15713, date accessed: 9/16/2013 3:53:50 pm
grossman, p., & mcdonald, m. (2008). back to the future: directions for research in teaching and teacher education. american educational research journal, 45, 184-205. doi:10.3102/0002831207312906
harris, d. n., & sass, t. r. (2011). teacher training, teacher quality and student achievement. journal of public economics, 95, 798-812. doi:10.1016/j.jpubeco.2010.11.009
hofman, r. h., hofman, w. h. a., & gray, j. m. (2008). comparing key dimensions of schooling: towards a typology of european school systems. comparative education, 44, 93-110. doi:10.1080/03050060701809508
hopkins, d. (2008). a teacher’s guide to classroom research. maidenhead: mcgraw-hill.
huang, f. l., & moon, t. r. (2009). is experience the best teacher? a multilevel analysis of teacher characteristics and student achievement in low performing schools. educational assessment evaluation and accountability, 21, 209-234. doi:10.1007/s11092-009-9074-2
ingersoll, r. m., & strong, m. (2011). the impact of induction and mentoring programs for beginning teachers: a critical review of the research. review of educational research, 81, 201-233. doi:10.3102/0034654311403323
jackson, c. k. (2010). match quality, worker productivity, and worker mobility: direct evidence from teachers. nber working paper 15990.
johnson, s. m., & kardos, s. m. (2008). the next generation of teachers: who enters, who stays, and why. in m. cochran-smith, s. feiman-nemser & d. j. mcintyre (eds.), handbook of research on teacher education (pp. 445-467). new york: routledge.
jovanovic, b. (1979). job matching and the theory of turnover. journal of political economy, 87, 972-990.
http://www.jstor.org/stable/1833078
kast, f. e., & rosenzweig, j. e. (1972). general systems theory: applications for organization and management. the academy of management journal, 15, 447-465. http://www.jstor.org/stable/255141
katz, d., & kahn, r. l. (1978). the social psychology of organizations. new york: wiley.
kennedy, m. (1998). learning to teach writing: does teacher education make a difference? new york: teachers college press.
konold, t., jablonski, b., nottingham, a., kessler, l., byrd, s., imig, s., … mcnergney, r. (2008). adding value to public schools – investigating teacher education, teaching, and pupil learning. journal of teacher education, 59, 300-312. doi:10.1177/0022487108321378
korthagen, f. j. a. (2010). situated learning theory and the pedagogy of teacher education: towards an integrative view of teacher behavior and teacher learning. teaching and teacher education, 26, 98-106. doi:10.1016/j.tate.2009.05.001
lankford, h., & wyckoff, j. (2010). teacher labor markets: an overview. in d. j. brewer & p. j. mcewan (eds.), economics of education (pp. 235-242). london: elsevier.
liu, e., & johnson, s. (2006). new teachers’ experiences of hiring: late, rushed, and information-poor. educational administration quarterly, 42, 324-360. doi:10.1177/0013161x05282610
loeb, s., kalogrides, t., & beteille, t. (2012). effective schools: teacher hiring, assignment, development, and retention. education finance and policy, 7, 269-304. doi:10.1162/edfp_a_00068
luschei, t., & carnoy, m. (2010). educational production and the distribution of teachers in uruguay. international journal of educational development, 30, 169-181. doi:10.1016/j.ijedudev.2009.08.004
little, j., & bartlett, l. (2010). the teacher workforce and problems of educational equity. review of research in education, 34, 285-328. doi:10.3102/0091732x09356099
maaz, k., hausen, c., mcelvany, n., & baumert, j. (2006). keyword: transitions in the educational system.
zeitschrift für erziehungswissenschaft, 9, 299-327. doi:10.1007/s11618-006-0053-9 c. könig & r.mulder 44 | f l r marshall, j. h., & sorto, a. m. (2012). the effects of teacher mathematics knowledge and pedagogy on student achievement in rural guatemala. international review of education, 58, 173-197. doi:10.1007/s11159-012-9276-6 martz, w. (2013). evaluating organizational performance: rational, natural, and open system models. american journal of evaluation, 34, 385-401. doi:10.1177/1098214013479151 morge, l., toczek, m-c., & chakroun, n. (2010). a training programme on managing science class interactions: its impact on teachers„ practises and on their pupils achievement. teaching and teacher education, 26, 415-426. doi:10.1016/j.tate.2009.05.008 musset, p. (2010). initial teacher education and continuing training policies in a comparative perspective: current practices in oecd countries and a literature review on potential effects. oecd working papers no. 48. paris: oecd publishing. oecd (2011). teachers matter: attracting, developing and retaining effective teachers. pointers for policy development. paris: directorate for education, education and training policy division. http://www.oecd.org/edu/school/48627229.pdf paine, l., & zeichner, k. (2012). the local and the global in reforming teaching and teacher education. comparative education review, 56, 569-583. doi:10.1086/667769 parsons, t. (1951). the social system. london: routledge. phillips, k. j. r. (2010). what does „highly qualified‟ mean for student achievement? evaluating the relationships between teacher quality indicators and at-risk students‟ mathematics and reading achievement gains in first grade. elementary school journal, 110, 464-493. http://www.jstor.org/stable/10.1086/651192 rothstein, j. (2012). teacher quality policy when supply matters. nber working paper 18419. saks, a., uggerslev, k., & fassina, n. (2007). 
socialization tactics and newcomer adjustment: a metaanalytic review and test of a model. journal of vocational behavior, 70, 413-446. doi:10.1016/j.jvb.2006.12.004 schacter, j., & thum, y. m. (2004). paying for high and low-quality teaching. economics of education review, 23, 411–430. doi:10.1016/j.econedurev.2003.08.002 schneider, m., & somers, m. (2006). organizations as complex adaptive systems: implications of complexity theory for leadership research. the leadership quarterly, 17, 351-365. doi:10.1016/j.leaqua.2006.04.006 schwille, j., & dembele, m. (2007). global perspective on teacher learning: improving policy and practice. paris: unesco international institute for educational planning. scott, w. r., & davis, g. f. (2007). organizations and organizing. rational, natural, and open system perspectives. upper sadle river: pearson. sicherman n., & galor o., (1990). a theory of career mobility. journal of political economy, 98, 169-192. http://www.jstor.org/stable/2937647 staiger, d. o., & rockoff, j. e. (2010). searching for effective teachers with imperfect information. journal of economic perspectives, 24, 97-118. doi:10.1257/jep.24.3.97 stiglitz, j. e. (1975). the theory of screening, education and the distribution of income. american economic review, 65, 283-300. http://www.jstor.org/stable/1804834 tillema, h. h. (1994). training and professional expertise: bridging the gap between new information and pre-existing beliefs of teachers. teaching and teacher education, 10, 601-615. doi:10.1016/0742051x(94)90029-9 van de werfhorst, h. g. (2011). skills, positional good or social closure? the role of education across structural-institutional labour market settings. journal of education and work, 24, 521-548. doi:10.1080/13639080.2011.586994 c. könig & r.mulder 45 | f l r van de werfhorst, h. g., & mijs, j. j. b. (2010). achievement inequality and the institutional structure of educational systems: a comparative perspective. 
annual review of sociology, 36, 407-428. doi:10.1146/annurev.soc.012809.102538 van der velden, r., & wolbers, m. h. j. (2007). how much does education matter and why? european sociological review, 23, 65-80. doi:10.1093/esr/jcl020 wang, a., coleman, a., coley, r., & phelps, r. (2003). preparing teachers around the world. princeton: educational testing service. wang, j., odell, s. j., & schwille, s. a. (2008). effects of teacher induction on beginning teachers„ teaching. a critical review of the literature. journal of teacher education, 59, 132-152. doi:10.1177/0022487107314002 weinert, f. e. (2001). concept of competence: a conceptual clarification. in d. s. rychen, & l. h. salganik (eds.), defining and selecting key competencies (pp. 45-65). seattle, wa: hogrefe & huber. winters, m. a., dixon, b. l., & greene, j. p. (2012). observed characteristics and teacher quality: impacts of sample selection on a value added model. economics of education review, 31, 19-32. doi:10.1016/j.econedurev.2011.07.014 yeh, s.s. (2009). the cost-effectiveness of raising teacher quality. educational research review, 4, 220232. doi:10.1016/j.edurev.2008.06.002 zeichner, k. (1983). alternative paradigms of teacher education. journal of teacher education, 34, 3-9. doi:10.1177/002248718303400302 zeichner, k. (2005). a research agenda for teacher education. in m. cochran smith, & k. zeichner (eds.), studying teacher education (pp. 737-761). mahwah: lawrence erlbaum. zeichner, k. (2006). studying teacher education programs: enriching and enlarging the inquiry. in c.f. conrad, & r.c. serlin (eds.), the sage handbook for research in education (pp. 79-95). thousand oaks: sage. zeichner, k., & conklin, h.g. (2008). teacher education programs as sites for teacher preparation. in m. cochran-smith, s. feiman-nemser, d. mcintyre, & k. demers (eds.), handbook of research on teacher education (pp. 269-289). new york: routledge. codepen 5minnameierpub frontline learning research special issue vol 8, no. 
5 (2020) 70 91 issn 2295-3159 explaining happy victimizing in adulthood – a cognitive and economic approach gerhard minnameier a a goethe university frankfurt am main, germany article received 1 june 2018 / revised 21 november 2019 / accepted 21 november / available online 1 july 2020 abstract while acknowledging the phenomenon of “happy victimizing” (hv), the classical explanation is questioned and challenged. hv is typically explained by a lack of moral motivation (mm) that is thought to develop in late childhood and adolescence. apart from empirical evidence for widespread hv in adulthood, there are also strong theoretical arguments against the classical explanation. firstly, there are arguments against the coherence of the very concept of mm. secondly, while the classical explanation focuses on internal drivers (in the sense of mm), the one proposed in the present paper focuses on the patterns of interaction. accordingly, hv may depend less on internalised values and individual motivation (whether in terms of moral internalism or moral externalism), and more on the “rules of the game” that are established in social interaction (or not). on this account, hv appears where higher order moral rules are not established and cannot be established, either due to the circumstances or due to the unwillingness (or incapability) to play by the rules of these higher order games (where “games” are to be understood in the game-theoretic sense). the ordinary one-shot prisoners’ dilemma is a case in point. it precludes promise-giving as well as other higher order moral regimes, but instead forces the agents into a conflict of interest, where everyone has to mind their own business. moreover, claiming that all players have to pursue their own self-interest can be understood as a moral rule of its own.
keywords: happy victimizer phenomenon; moral stages; moral reasoning; moral motivation; moral internalism and externalism; game theory; norms and conventions info corresponding email: minnameier@econ.uni-frankfurt.de doi: https://doi.org/10.14786/flr.v8i5.381 1. introduction it is well known and empirically well established that 4- to 6-year-old children have a marked propensity for the so-called happy victimizer pattern (hvp). this is the combination of violating a known moral rule and feeling good about it (nunner-winkler & sodian, 1988; arsenio, gold, & adams, 2006). while older children more often have mixed feelings (bad about acting immorally, good about what they get by doing so), the younger ones do not seem to feel any remorse, even though they know and accept the moral rules they violate. hvp was originally discovered using a projective method, where the children had to look at a series of pictures that illustrated a case of immoral behaviour. then they were asked whether the behaviour of the protagonist was ok or not. finally, they were asked how he or she felt and why. later on, the study was replicated asking the children directly what they would think and feel if they were in the protagonist’s shoes (keller, lourenço, malti, & saalbach, 2003). even in these circumstances, about 50 per cent of the participants showed hvp. the proposed explanation was straightforward. since children obviously understand and seem to have internalised the moral norms, they are said to have the moral knowledge necessary for moral action, but lack moral motivation (nunner-winkler & sodian, 1988; nunner-winkler 2007; 2013; arsenio et al., 2006; krettenauer, malti & sokol, 2008). moral motivation has even been thought to be the crucial ingredient that developed only gradually in late childhood and adolescence and prevented many from doing the right thing (nunner-winkler 1999; 2007; malti & krettenauer, 2013). however, what is moral motivation?
even though the concept is old, it is still difficult to grasp (brink, 1997; smith, 1994/2005; zangwill, 2003; minnameier, 2010; malti & krettenauer, 2013; wren 2013; heinrichs, minnameier, gutzwiller-helfenfinger, & latzko, 2015; rosati, 2016). furthermore, concerning its development, the proponents of moral motivation remain mostly silent. while nunner-winkler has thought that it develops gradually throughout childhood and adolescence (see above), others think it is there right from the start (malti & krettenauer, 2013, referring to warneken & tomasello, 2009, who have found strong evidence for pro-social behaviour among toddlers). what we know, however, is that hvp has turned out to be salient even among young adults, contrary to the classical explanation, according to which hvp should vanish in late childhood (nunner-winkler 2007; krettenauer, malti & sokol, 2008; minnameier & schmidt, 2013; heinrichs et al. 2015). beyond the narrow frame of research on hvp, there is also ample evidence for happy-go-lucky cheating and other forms of moral victimisation among adults (batson et al., 1999; 2002; ariely, 2012, rustichini, & villeval, 2014). in particular, it is well known that students of economics and business administration act more selfishly in various experiments than students of other subjects (marwell & ames, 1981; frey, 1986; carter & irons, 1991; frank et al., 1993; 1996; frey & meier, 2003; rubinstein 2006). further analyses have mainly focused on whether this points to a self-selection of self-interested individuals into economics-related studies or to an “indoctrination effect”, which means that economics is taught in a way that makes students more selfish and competitive. the differentiation was introduced by carter and irons (1991), who found evidence in favour of the self-selection hypothesis, based on the ultimatum game. however, selten and ockenfels (1998) as well as frank et al. (1993) found evidence for an indoctrination effect.
within this economic body of research, however, it has never been asked how people actually take their decisions and what is really appropriate in specific circumstances. after all, we know that people use moral principles in a situation-specific manner (krebs & denton, 2005; rai & fiske, 2011; minnameier, beck, heinrichs, & parche-kawik, 1999; minnameier & schmidt, 2013). hence, the question arises how economists and non-economists actually take their decisions and what is (more or less) appropriate. by the same token, hvp in general might be explainable as an action pattern that is morally justified (or at least justifiable) under specific conditions. thus, while the phenomena are almost crystal clear, the explanation is not. in section 2.1, i will summarise arguments against the received explanation from moral motivation that i have elaborated elsewhere in detail (minnameier 2010; see also 2012; 2013). on this account, the common understanding of moral motivation has to be turned around almost completely. what i suggest instead in the remainder of section 2 is to replace the false dichotomy of moral cognition and moral motivation by a more comprehensive theory of moral judgement and agency that includes processes of abduction, deduction and induction. this approach converges not only with the integrative account of rai and fiske (2011), but also with a “reason-based theory of rational choice” (dietrich & list, 2013a and b) that allows us to integrate cognitive moral psychology with rational choice theory (section 2.2). as it turns out, however, this theory of moral decision-taking has to be integrated into a game-theoretic view, because moral principles represent more than just personal values (section 2.3). in section 3 a study is presented that contains important evidence in favour of the reason-based approach, and section 4 contains an extensive discussion that also introduces further theoretical ramifications. section 5 concludes. 2. 
moral motivation and beyond 2.1. the problem of moral motivation the idea of moral motivation has different sources. for instance, kant needed it to explain why an individual ought not only to follow moral principles, but to engage in moral reasoning and develop moral principles in the first place (ameriks, 2006). this is why kant says that “(t)here is nothing it is possible to think of anywhere in the world, or indeed anything at all outside it, that can be held to be good without limitation, excepting only a good will” (2002/1785, p. 9 [4.393] ). in modern philosophy and psychology, the concept has been used to explain hvp-like behaviour (in terms of a lack of moral motivation). philippa foot (1972) kicked off the modern debate in philosophy observing that we can very well be indifferent to morality without being irrational (zangwill, 2003). and in moral psychology it was augusto blasi (1984) and james rest (1984) who held against kohlberg that moral reasons did not motivate moral action directly, but needed to be sided by judgements of responsibility (blasi) or moral motivation (rest). both strands of reasoning, the one in moral philosophy and the one in moral psychology, are directed against a view traditionally ascribed to socrates. socrates is said to have claimed that knowing what morality (or virtue) demands motivates virtuous action and that acting otherwise indicates the agent’s ignorance about the moral course of action (see e.g., brickhouse & smith, 2010, esp. chap. 3). advocates of moral externalism hold that one can very well know what the moral course of action would be, yet not be motivated and therefore act in a different way. moreover, they do not interpret this as a mere case of weakness of the will, in which agents would act against their own intentions and possibly be racked by remorse as a consequence. their view is that a moral judgement has to combine with a desire, where both are related only contingently. 
therefore, on the externalist account, moral judgement does not motivate by itself, but needs to be seconded by moral motivation (see e.g., rosati, 2016). this is precisely the way in which james rest, in the psychological camp, has defined moral motivation, i.e., to “select among competing value outcomes of ideals the one to act on; deciding whether or not to try to fulfil one’s moral ideal” (1984, 27). and he differentiates it from willpower, which means “to execute and implement what one intends to do” (ibid.). this is, by and large, the current state of affairs in moral psychology concerning the concept of moral motivation (see e.g. thoma & bebeau, 2013; nunner-winkler, 2013). against this view, i have argued elsewhere that this conceptual framework is inconsistent for two reasons (see minnameier, 2010; 2013). the first is that rest’s definition of moral motivation implies a judgement: if individuals are not driven against their will (which would indicate lack of willpower rather than lack of moral motivation), but select freely among competing value outcomes, this kind of selection requires a decision based on some criterion. in other words, moral motivation would not be independent from moral judgement, but would have to involve some kind of moral judgement resulting in an intention that motivates action. in fact, deciding whether to go for some personal benefit or to dispense with it for some other-regarding motive is even the paradigmatic case of a moral problem as such. therefore, the classical definition of moral motivation seems ill-conceived. the second point pertains to what makes moral motivation moral. how could moral motivation possibly be distinguished from other kinds of motivation, if not by an underlying reason? i see no other way to explain the moral aspect of moral motivation than to refer to some underlying reason. hence, moral motivation would have to be derived from the conviction of being morally obliged in some way. 2.2.
inferential moral reasoning and a reason-based theory of rational choice on the cognitive view, moral motivation in the sense of rest’s third component is not just eliminated, but rather replaced by a specific kind of judgement within a more comprehensive notion of moral judgement. this conception allows for three different parts of moral reasoning which play distinctive roles in the formation of a moral intention. in particular, this broader notion of moral reasoning comprises abduction, deduction, and induction, which in turn are based on charles s. peirce’s pragmatist theory of inferential reasoning (see minnameier, 2004; 2017). according to this approach, both the adoption and the application of moral principles are mediated by three characteristic inferences called “abduction”, “deduction” and “induction”. any kind of development is triggered by a negative, i.e., disconfirming, induction, which means that a certain principle that one previously adhered to fails to do the job in the given situation. for instance, if a child uses the golden rule (“do unto others as you would have others do unto you”) and now faces a decision at school about where to go for a class outing, this rule might not suffice. for even if everybody followed this principle, no unitary decision might be obtained. hence, instead of trying to square diverging individual interests, a decision has to be taken at the level of the group. majority voting would be a solution for this kind of problem, and this solution allows us to transcend individual interests in order to determine what is best for the group as a whole. moreover, group decisions call for loyalty on the part of the outvoted members. the shift onto the higher stage can be understood as being mediated, firstly, by a negative induction which leads to the insight that the golden rule fails to solve the problem at hand. secondly, a new principle has to be invented, which is captured by an abductive inference.
thirdly, deduction tells us what follows if the principle is applied to the situation at hand, in particular what kind of action would have to be taken. fourth, and finally, induction leads to the adoption (or rejection) of that principle as a guideline for the present situation, but equally for all situations of the same type. these inferential processes can also be assumed to mediate moral action, where moral principles are not invented ab ovo in a genuine developmental transformation, but merely activated in relevant situations (see minnameier, 2013). in those cases, the inferential reasoning may take an explicit or implicit (habitual) form, especially in well-rehearsed situations, for which habitual action schemes have been established. in the latter case, we may immediately know how to react without engaging in any (further) reasoning, but would have to be able to explicate our reasons on demand. the inferential approach meshes perfectly with a reason-based theory of rational choice (rbt) (dietrich & list, 2013a and b; minnameier 2016a). dietrich and list strongly criticise the behaviourist approach that is still prominent in economics today (see e.g. gul & pesendorfer, 2008) and which relies on the concept of “revealed preferences” (samuelson, 1938; 1948). this means that preferences are inferred from choice data together with rationality assumptions such as completeness and transitivity, to name just the most important. if you choose apples, where you could have had pears for the same price, this reveals that you prefer apples over pears. however, our “real” preferences are usually of a deeper psychological nature, so that we might buy a bunch of red roses to impress a beloved person, not because we find them particularly attractive and worth the price for what they are as such. therefore, dietrich and list formulate a rational choice theory in which motivating reasons (i.e., the underlying drivers which need not necessarily be made conscious) play a central role.
we could also call them the “fundamental preferences” that express what we are really striving for, as opposed to the concrete or instrumental preferences, like the one for red roses in the example. the latter are formed based on the former and the specific restrictions that hold in a given situation (like time, money, and so forth). if we take moral principles as fundamental preferences or “reasons” in terms of rbt, then the situation-specific adaptation of morality is theoretically tractable within a rational-choice-theoretic context. the situational cues provide information about the positive and negative restrictions (affordances and constraints), which we would have to understand as “beliefs”, because what counts for the explanation of behaviour is how the agent views the situation, not an objective account of it. the fundamental preferences and the beliefs determine the concrete preferences that we could also call “intentions” in the context of morality. this, in turn, would determine a rational action. however, people might fail to act according to their own intentions, especially for lack of willpower, which allows us to include “irrational choice” as well. in the inferential context, we can say, accordingly, that a moral problem perceived as a feature of the situation abductively leads to a moral principle that captures this problem. from this principle and the situational premises we can deductively derive one or several moral courses of action, which just follow from applying the principle to the situation. the last, inductive, step is to determine whether this solution of the moral problem can be appropriately implemented. sometimes, the price one has to pay, or the risk, might be too high, so that we might legitimately discard a certain course of action at this stage. however, if there are no such problems, we commit ourselves and so a moral intention is formed. this would then have to be carried out on pain of irrationality.
finally, i think this reconstruction is valid also for intuitive moral agency, because if we do not assume any kind of moral reason or principle to underlie the act, we could never distinguish self-interested action from moral action. therefore, even an intuitive helping behaviour that ought to be characterised as “moral” must imply a moral point of view (e.g., that the other person is in need of help and therefore should be helped). consequently, some kind of moral cognition would always have to be abducted, even in non-deliberative moral decision making (see minnameier, 2016b; 2017; hermkes, 2016). 2.3. morality in the game-theoretical context rational choice theory branches out into decision theory and game theory (see e.g., binmore, 2009, p. 25). the former relates to the choices a rational agent takes in a certain environment (“games against nature”), whereas the latter concerns the interaction with one or more other rational agents. on the reason-based account, moral agency is modelled within a decision-theoretic framework, and moral values are treated as personal values (fundamental preferences) according to which the individual wants to live. however, from such a point of view, moral judgements are (involuntarily) reduced to prudential judgements, and so is moral action reduced to prudential action, because whatever an agent does, it is always explained in terms of maximising utility with respect to satisfying fundamental (but still personal) preferences (whether we call them morals, values or virtues). this raises the question of what morality really is. this question is very topical also in the context of prosocial behaviour among toddlers and apes (for reviews see paulus, 2014; killen & smetana, 2015). it appears as if they have a sense of morality or justice. however, it may be a mistake to infer directly from prosociality to morality. 
first of all, agents may have altruistic or other-regarding orientations, but these would none the less be their own orientations. they might be explained in terms of vicarious experience caused by mirror-neuron activity. however, in whichever way these orientations are explained, the hard problem remains. the hard problem is that stretching morality to include mere prosociality would make the distinction between morality and prudence obsolete or impossible. conversely, if (first person) prosociality is distinguished from (third person) morality, then morality requires a third-person perspective (or meta-perspective) from which the differentiated perspectives of self and other(s) are looked upon and coordinated in some way. this idea of a hard problem can even be taken further. we typically conceive morality in terms of (other-regarding) personal values. people who have internalised such values are regarded as highly moral people in the sense that they are intrinsically motivated to act morally. conversely, those who are only extrinsically motivated appear to use morality instrumentally and are therefore considered not to be morally motivated. in other words, they are not moral agents, but only act as if they were moral for some selfish reasons. again, they seem to be guided by a prudential rationality rather than a moral rationality. however, what appears as truly moral on this account still runs into the hard problem, because if agents follow their personal values and try to maximise utility in this respect, they will always have to follow a prudential rationality, whatever the specific content of their values and however other-regarding they may be. this hard problem remains, because morality is reduced to a “decision-theoretic” problem rather than a “game-theoretic” one. however, morality is basically a social project, not a question of the good life for the individual, and hence not just a question of personal values.
if moral principles are to make any sense, they do not only address the agent as an individual, but all agents involved in a moral problem. moral rules are social rules, and these rules have to be accepted and heeded by the set of agents to whom the rules apply. a further common misunderstanding or misconception is this: if we think of morality in the game-theoretic sense, we often treat moral issues in terms of a zero-sum game. a zero-sum game is one in which a fixed sum of money or amount of goods is to be allocated. the overall sum neither increases nor decreases. in this very sense we think that rich people should transfer some of their riches to the poor, thereby leaving the total unchanged. on this account, those who give suffer a loss (for moral reasons), and the needy enjoy the benefit. however, apart from zero-sum games we have two other types, i.e., coordination games and cooperation games. coordination games are less problematic, because coordination is always beneficial for every agent. for instance, in some countries people drive on the right-hand side, in some they drive on the left-hand side. none the less there is no reason, say, for those from the continent to drive on the right while in the united kingdom, and those from the uk would have no incentive to drive on the left while on the continent (unless they wanted to commit suicide). where we drive is a question of “conventions”, because one just has to follow a uniform rule, and there seems to be no deeper reason, at least no moral one, why we should decide to drive on the right or on the left. thus, conventions are the solution concept for coordination games, and i think domain theorists like turiel (2002) and nucci (2008) could benefit from consulting game theory to distinguish conventional problems from moral problems. cooperation games are different. the prisoners’ dilemma is a classic example.
in this type of situation, “cooperation” would benefit everyone (as in a coordination game), but it does not constitute a stable equilibrium in the sense of a nash equilibrium. in the prisoners’ dilemma the agents have an incentive to defect, which takes them to a social trap in the form of a pareto-inferior nash equilibrium. their only way to overcome this deadlock is to invent a so-called institution, i.e., a rule backed by suitable sanctioning mechanisms that ensures cooperation. if the prisoners’ dilemma were not as restrictive as it is (where the agents are not allowed to communicate with each other), they would perhaps agree not to confess to the judge and threaten each other with the punishment of comrades outside the prison or with their own fierce revenge in case the other defects. however, such an institution is tantamount to a moral rule, in particular that of a mutual promise or contract. and the potential sanctions may not only be negative, but also positive (respect for one’s reliability and a mutual willingness to cooperate in the future). hence, moral principles can be straightforwardly reconstructed as solution concepts for so-called cooperation problems. i think this hits the point, and i also think that both the decision-theoretic interpretation and the understanding of morality in terms of a zero-sum logic are severely flawed. these misconceptions are at the heart of fatal errors about moral motivation and the related normative questions (see e.g. minnameier, 2013, and the discussion of empirical results below). 3. empirical evidence of hvp in the light of the reason-based approach 3.1.
review of a study on morality in the economic context here, i would like to summarise the data and results of a recent study on how economists and non-economists choose to act in different framings of the famous prisoners’ dilemma (minnameier, heinrichs, & kirschbaum, 2016), which also yields important insights about how prevalent hvp is among different groups and in different contexts, and on how it might be brought about. this is essential for the crucial question of whether those who show the pattern are really motivated immorally or not, and whether the self-interested behaviour is to be morally condemned and educationally tackled or is rather acceptable or even desirable as an adequate situation-specific response to the constraints the individual faces. as mentioned in the introduction, there is clear evidence that students of economics and business administration behave more selfishly than others, at least in certain situations. and by the same token we may assume that hvp is more prevalent among these students. however, not so much is known about the motivation of either self-interested or other-regarding behaviour. moral psychologists tend to ask why people act selfishly and fail to follow moral rules. economists wonder about other-regarding behaviour which they consider irrational. thus, there seems to be a need for clarification. the prisoners’ dilemma is a case in point. it has been used to measure moral motivation (see nunner-winkler, 2007), where cooperation indicates high moral motivation and defection low moral motivation. on this view, however, the moral course of action is the exact opposite of rational action in terms of an economic analysis. conversely, defection, which is the dominant strategy and thus rational from an economic point of view, is thought to indicate a lack of moral motivation. hence, you could either be “rational” or “moral”, but not both. 
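to make the economic reading concrete, the following python sketch checks three claims for the payoffs used in the study reviewed here (as reported in section 3.2): that defection is strictly dominant, that mutual defection is the only pure-strategy nash equilibrium, and that this equilibrium is pareto-inferior to mutual cooperation. the function names and the dictionary layout are illustrative assumptions and not part of the original study.

```python
# payoffs in euros as stated in the study's instructions:
# key = (own choice, other's choice), value = (own payoff, other's payoff)
# "a" is commonly called "cooperate", "b" is commonly called "defect"
PAYOFFS = {
    ("a", "a"): (50, 50),
    ("a", "b"): (5, 80),
    ("b", "a"): (80, 5),
    ("b", "b"): (20, 20),
}
STRATEGIES = ("a", "b")


def strictly_dominant(payoffs):
    """return the first player's strictly dominant strategy, or None."""
    for s in STRATEGIES:
        rivals = [t for t in STRATEGIES if t != s]
        # s is dominant if it pays strictly more than every rival strategy,
        # whatever the other player chooses
        if all(payoffs[(s, o)][0] > payoffs[(t, o)][0]
               for t in rivals for o in STRATEGIES):
            return s
    return None


def pure_nash_equilibria(payoffs):
    """list all strategy profiles from which no player gains by deviating alone."""
    equilibria = []
    for own in STRATEGIES:
        for other in STRATEGIES:
            first_ok = all(payoffs[(own, other)][0] >= payoffs[(t, other)][0]
                           for t in STRATEGIES)
            second_ok = all(payoffs[(own, other)][1] >= payoffs[(own, t)][1]
                            for t in STRATEGIES)
            if first_ok and second_ok:
                equilibria.append((own, other))
    return equilibria


def is_pareto_dominated(payoffs, profile):
    """true if some other outcome is at least as good for both and not identical."""
    u = payoffs[profile]
    return any(v[0] >= u[0] and v[1] >= u[1] and v != u
               for v in payoffs.values())


if __name__ == "__main__":
    print("dominant strategy:", strictly_dominant(PAYOFFS))
    print("pure nash equilibria:", pure_nash_equilibria(PAYOFFS))
    print("(b, b) pareto-dominated:", is_pareto_dominated(PAYOFFS, ("b", "b")))
```

the sketch treats the game purely in terms of monetary payoffs, which is exactly the reading on which cooperation looks irrational; it says nothing about the reasons or feelings that the participants report.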
luckily, the reason-based approach sketched above allows us to reconcile morality and rationality. based on it, we may assume that even though economists might generally be more self-interested than students of other subject matters (especially following the self-selection hypothesis), not all those who take self-interested decisions do so for lack of moral motivation; some may simply be more realistic and prudent in their decision-making. to test this assumption, we have carried out a study using the prisoners’ dilemma (pd) in the two framings already discussed above, i.e., we called it “wall street game” in one condition and “community game” in the other. additionally, we let participants express their feelings and reasons. our hypotheses in this context were the following (we use “economists” as a shortcut for “students of economics and business administration”, and “b&e education” for “business and economics education”):

1. compared with other students, economists (a) defect to a greater extent and (b) are less vulnerable to the framing (since they know they are in a pd and that defection is the dominant strategy).
2. accordingly, we expect (a) more happy victimizers than happy moralists among economists and (b) fewer happy victimizers than happy moralists among the students of other subjects.
3. we expect that at least in part these differences do not indicate differences in moral judgement and/or moral motivation, but merely different views of the situational constraints.
4. since we also have students of b&e education in our sample, we expect them to score at intermediate levels as compared with the other two groups.

3.2. sample and design

the sample consists of 481 undergraduates from two german universities (frankfurt am main and bamberg).
54 percent of the sample are bachelor students of economics and business administration (“economists”), 14 percent are bachelor students of b&e education, and 32 percent are teacher students with no economics-related subject (henceforth “teacher students”). 39 percent of our participants are male, 61 percent are female. the average age is 22.4 years.

the participants were presented with a classical pd in two different frames. for one half it was called “wall street game” (wg), for the other half it was called “community game” (cg) (see liberman et al., 2004; ellingsen et al., 2012). they were randomly assigned to one of the two conditions. of the 481 participants, 227 were in the cg-condition, 254 in the wg-condition (χ^2 = 1.516, p = 0.218). the description and instructions were exactly the same in both conditions with the only exception that the respective name of the game appeared three times in the instructions. they had to decide between two options, a and b, and were given the following information:

if both you and the other person choose a you both get €50.
if both you and the other person choose b you both get €20.
if you choose a and the other person chooses b you get €5 and the other person gets €80.
if you choose b and the other person chooses a you get €80 and the other person gets €5.

after having chosen one of the two possible strategies – which are commonly called “cooperate” (a) and “defect” (b) – they had to explain, first, why they decided the way they did. after this, they had to rate how they felt about the decision on a four-point likert-scale (good, rather good, rather bad, bad) and, finally, they had to explain their feelings. the data on the participants’ emotions allowed us to code the answers in the happy-victimizer framework (arsenio, gold & adams, 2006; nunner-winkler, 2007; 2013), where good or rather good feelings indicate “happiness” and bad or rather bad feelings indicate “unhappiness”.
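
as a minimal sketch (not part of the original study materials), the payoff structure given in the instructions can be checked in python to confirm that option b is the dominant strategy and that mutual defection is the only nash equilibrium, although it is pareto-inferior to mutual cooperation. the euro amounts are those stated above; all function and variable names are illustrative assumptions.

```python
# Payoffs in euros, taken from the instructions quoted in the text.
# Keys are (my_choice, other_choice); values are (my_payoff, other_payoff).
# "A" is commonly called "cooperate", "B" is commonly called "defect".
PAYOFFS = {
    ("A", "A"): (50, 50),
    ("B", "B"): (20, 20),
    ("A", "B"): (5, 80),
    ("B", "A"): (80, 5),
}
CHOICES = ("A", "B")


def best_response(other_choice):
    """Return the choice that maximises my payoff, given the other's choice."""
    return max(CHOICES, key=lambda mine: PAYOFFS[(mine, other_choice)][0])


def is_nash(mine, other):
    """True if neither player gains by deviating unilaterally."""
    my_payoff, other_payoff = PAYOFFS[(mine, other)]
    i_stay = all(my_payoff >= PAYOFFS[(alt, other)][0] for alt in CHOICES)
    other_stays = all(other_payoff >= PAYOFFS[(mine, alt)][1] for alt in CHOICES)
    return i_stay and other_stays


def is_pareto_dominated(profile):
    """True if another outcome is at least as good for both and differs."""
    mine, other = PAYOFFS[profile]
    return any(
        q != (mine, other) and q[0] >= mine and q[1] >= other
        for q in PAYOFFS.values()
    )


nash_equilibria = [profile for profile in PAYOFFS if is_nash(*profile)]
print("best response to A:", best_response("A"))
print("best response to B:", best_response("B"))
print("nash equilibria:", nash_equilibria)
print("(B, B) pareto-dominated:", is_pareto_dominated(("B", "B")))
```

because b is the best response to both a and b, defection is dominant regardless of what the other person does, which is the sense in which the text calls defection “rational” from an economic point of view.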
concerning the choice options, cooperation codes for “moralists” and defection for “victimizers”. thus we classify the answers as happy (hv) or unhappy victimizers (uv) and happy (hm) or unhappy moralists (um).

3.3. results

as for the framing in terms of wg and cg, no framing effect can be identified in the total sample (χ^2 = 0.112, df = 1, n = 481, p = 0.783). the same is true for economists (χ^2 = 0.007, df = 1, p = 1) and b&e education (χ^2 = 0.122, df = 1, p = 0.804), but there is a rather strong one among the non-economist teacher students (χ^2 = 5.374, df = 1, p = 0.029), with 80 percent cooperation in the cg-condition and only 63 percent in the wg-condition (see table 1). economists cooperate at rates of only 45 percent (cg) and 44 percent (wg), respectively. this is not surprising, since economists know the prisoners’ dilemma and have identified it as such. apart from the framing, however, the defection rate of economists is, in general, much higher than that of non-economists. students of b&e education score at intermediate levels.

table 1: framing, cooperation and defection in the pd

as table 1 also reveals, the study programme has a significant effect on the decisions taken (χ^2 = 24.799, df = 2, n = 476, p < 0.001). this effect remains if we control for gender (male: χ^2 = 10.988, df = 2, n = 187, p = 0.002, one-tailed; female: χ^2 = 13.208, df = 2, n = 289, p < 0.001, one-tailed). thus, hypotheses 1a and 1b are both strongly confirmed. and concerning hypothesis 4 we can state that students of b&e education are at an intermediate level with respect to the proportion of cooperation and defection.

perhaps even more importantly, we see a major difference in terms of agency (hv/uv/hm/um), with 55 percent hvs among economists, but only 31 percent among non-economists. conversely, 67 percent of the non-economists are hms, whereas only 36 percent of the economists are hms (see figure 1; χ^2 = 26.094, df = 6, n = 337, p < 0.001, based on fisher’s exact test).
if we control for gender, results are still statistically significant (male: χ^2 = 11.016, df = 6, n = 130, p = 0.044, one-tailed; female: χ^2 = 12.052, df = 6, n = 204, p = 0.026, one-tailed). hypothesis 2 is confirmed by these data, and so is hypothesis 4.

figure 1: the proportions of hv, uv, hm, and um.

another important result relates to the reasons the participants give for their choices and feelings. we classified them in terms of neo-kohlbergian stages, which are explained below in the discussion. three such types can be distinguished (see figure 2):

1. those who ignore or fail to see the conflict of interest inherent in the pd and just do what they think is best for all, i.e., to cooperate (stage 1c).
2. those who see the conflict of interest and argue that they just have to pursue theirs, which is to defect (stage 2a).
3. those who see the conflict of interest and wish to overcome it. they decide to cooperate, but also express that they would be ready to defect, should their partner be unwilling to cooperate (stage 2c).

figure 2: types of moral reasoning.

with respect to the third type, there is no difference between the groups. therefore, the overall difference in moral orientations is down to a trade-off between the other two types. economists have a comparatively strong tendency to argue that as an agent in this game they have to pursue their personal interest, whereas teacher students show a strong orientation towards what would be collectively rational (χ^2 = 29.548, df = 4, n = 338, p < 0.001). if we control for gender, however, we only get significant results for female participants (χ^2 = 17.653, df = 4, n = 205, p < 0.001; male: χ^2 = 7.648, df = 4, n = 130, p = 0.053, one-tailed). among economists, 60.2 percent are type 2, and 16.1 percent are type 1. conversely, among teacher students we find 32.2 percent type 2 and 44.8 percent type 1 participants.
the fact that there are no differences with respect to the more sophisticated third type is at least an indication that the participants might not differ in their moral preferences, but instead in their relevant beliefs. the strong defective orientation, coupled with the type-2-reasoning that economists reveal, may be due to the fact that they are highly aware of the situational constraint the pd poses. conversely, the teacher students seem to be oblivious to this very fact. they seem to focus on the common interest, ignoring that the pd models conflicting interests. in other words, the common interest should be to overcome this problem of conflicting interests, which is precisely what type-3-participants aim at. however, type-1-participants are unable (or unwilling) to see this basic problem. in the context of the differentiation between preferences and restrictions this means that they, as a matter of fact, conceive of quite different restrictions than participants of type 2 or type 3. inasmuch as the differences in agency derive from differences in beliefs rather than differences in basic moral preferences, the different groups might differ neither in their moral judgement competence nor in what might be called their moral motivation, but in their comprehension of situational constraints.

a further analysis, illustrated in figure 3, supports this view. here we only look at hvs and the reasons they give for their choices and feelings. some of them explicitly refer to the situational constraints and state that cooperation would have been better, but in the situation as it was, they just had to choose the other option. they clearly would have preferred to cooperate. they are not unhappy, however, because they think they have taken the right decision.
in quite a few participants we could identify this kind of reasoning, and we call them “strategic moralists” (sms), because they are morally motivated, but also focus on the prudential aspect of what kind of morality can be implemented in the present situation.

figure 3: strategic moralists and happy victimizers in the strict sense.

again, we see more strategic moralists among economists than among teacher students. this does not mean that economists are, generally speaking, not more selfish than others. hypothesis 3 only states that the stronger tendency towards an hv-like agency on the part of the economists is not only an effect of a higher level of selfishness, but in part down to different beliefs. this is confirmed by the data. furthermore, students of b&e education score on intermediate levels also in this respect. hypothesis 4 is thus confirmed throughout.

4. discussion: moral principles as social institutions

4.1. morality in the prisoners’ dilemma

above, i have made a claim for a game-theoretic understanding of morality, where moral rules have to be rules of a game that the agents play. if they are to be rules in the game-theoretic sense, however, they have to be self-enforcing (binmore, 2010).
that is, following moral rules must somehow pay – especially in “moral currencies” like respect, reputation and the like – and violating them must translate into costs, so that not complying with the rules is clearly irrational (at least for those who understand the rules). put in other words, a simple appeal to moral rules and precepts is worth nothing, if those rules cannot secure compliance by themselves (and if this fault is not compensated by some other rules). this is the case in the (one-shot) pd, where a morality of contract or the golden rule cannot be implemented, simply because these moral rules are either impossible to apply (striking a deal) or not enforceable (golden rule). the golden rule is not enforceable, because no signals of approval or disapproval, or even of the appropriateness of the golden rule, can be exchanged, which means that this “moral currency” is invalid in this case. this explains why such moral orientations are usually crowded out very quickly (for the remaining levels of cooperation see footnote 9).

however, this is not the whole story. if we take moral rules as rules of the game, we can make even more sense of the strategies that individuals choose in the game. first of all, even where individuals defect, they may still follow a moral rule, the rule that everyone has interests to pursue. this is one variant of kohlberg’s stage 2, which concerns the “awareness that each person has interests to pursue and that these may conflict” (colby & kohlberg, 1987, p. 26). defectors, whom we have classified as hvs, do not betray morality, but follow a specific morality that acknowledges the dignity of each person’s individual interest and that these interests may conflict. this is the “type 2” morality specified above. in such cases of conflict of interest (and in the absence of any means to mediate between these conflicting interests), agents are morally justified in pursuing their interests, but also have to accept others doing so.
this morality is employed by both hvs and sms, where the difference is that sms explicitly state that a higher order morality would not work. secondly, “type 1” morality seems naïve, because it ignores the conflict of interest and rather assumes that each individual’s interest in the other person’s well-being is strong enough to preclude defection. this is a kind of morality that would work among closely affiliated agents (e.g., family members and friends are typically supposed to care for each other and dispense with a personal profit or benefit to support the other). their moral appeals for solidarity would therefore fall flat. thirdly, “type 3” morality is employed by those who see a chance to coordinate with others, but this higher form of morality is almost certain to be crowded out, since individuals frequently express that they would switch to defection if their partners fail to cooperate.

fourthly, the situation would be different if we allowed for changes in the game. for instance, the pd could be played repeatedly. in this case, a tit-for-tat strategy is possible. if player 2 understands the intention of player 1’s choices, cooperation can easily emerge in this repeated interaction. player 1’s cooperation in the first round can be rewarded by player 2’s cooperation in the second round, and vice versa. conversely, player 1’s defection in one round can be punished by player 2’s defection in the following round. every cooperative choice can be understood as a promise to cooperate in the future, provided that the other player follows suit. hence, under these circumstances the morality of promise-giving can be implemented and generate benefits for both.

4.2. the economics of morality in the game-theoretic context

finally, we can generalise the kind of moral functioning just explained with respect to the pd (see also minnameier, 2018). the pd is one example of what game-theorists call a “cooperation game”.
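
since the pd serves as the running example of a cooperation game, the tit-for-tat argument from the end of section 4.1 can be illustrated with a minimal python sketch. it assumes, purely for illustration, that the one-shot payoffs from the study are reused in every round and that five rounds are played; the strategy functions and their names are hypothetical and not taken from the study.

```python
# Illustrative assumption: each round uses the study's one-shot payoffs (euros).
# Keys are (move_1, move_2); "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (50, 50),
    ("D", "D"): (20, 20),
    ("C", "D"): (5, 80),
    ("D", "C"): (80, 5),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then mirror the other player's previous move."""
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    """Defect in every round, regardless of what has happened so far."""
    return "D"


def play(strategy_1, strategy_2, rounds):
    """Play the repeated game and return the total payoffs of both players."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0
    for _ in range(rounds):
        move_1 = strategy_1(history_1, history_2)
        move_2 = strategy_2(history_2, history_1)
        payoff_1, payoff_2 = PAYOFFS[(move_1, move_2)]
        total_1 += payoff_1
        total_2 += payoff_2
        history_1.append(move_1)
        history_2.append(move_2)
    return total_1, total_2


print(play(tit_for_tat, tit_for_tat, rounds=5))
print(play(tit_for_tat, always_defect, rounds=5))
print(play(always_defect, always_defect, rounds=5))
```

in this sketch, a defector gains once against tit-for-tat and is then answered with defection in every following round, so over the five assumed rounds two tit-for-tat players each end up with more than the defector does. this is one way of reading the claim that each cooperative choice works as a promise backed by a sanction.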
perhaps counterintuitively, cooperation games are the situations in which cooperation typically fails if no institution is established, because the pareto-efficient point is not a nash-equilibrium. this formal structure of cooperation games is illustrated in figure 4.

figure 4: the basic structure of cooperation games.

the simplest version of such a game is the situation that thomas hobbes models in the leviathan: based on the “law of nature” according to which “every man has right to everything” (1651/2001, p. 65 [chap. 15, §2]) a “war of every one against every one” (1651/2001, p. 59 [chap. 14, §4]) is thought to ensue. this marks the very beginning of morality, where, e.g., children get into conflict about some resources like food or toys and they compete in trying to appropriate things. what they have to learn is to mutually respect property or rights of use (e.g. that the one who had it first has the right to use a certain item). however, before they learn this, they have to experience the social trap (i.e., the inefficient nash equilibrium) they reach in the bellum omnium contra omnes, for this is the problem that calls for innovation. at the same time, this problem allows them to take the perspective of the other individual who has the opposite point of view. both “players” understand that when they win something, the other one loses it (win-lose), and vice versa (lose-win). and eventually they learn that their conjoint activity produces a lose-lose-result. establishing the moral norm allows them to move into the win-win-zone.

since it applies to a cooperation game, a moral norm – as an “institution” – has to go with the possibility of sanctions. in the simplest case of morality in narrow social relationships, the agents sympathise with each other, want to make and keep friends and to win each other’s affections.
therefore, signs of fondness and attachment, like smiles and so on, function as positive sanctions, whereas repudiation and anger function as negative sanctions. if the sanctions work this way in a social relationship, the payoff matrix is changed accordingly (see figure 5).

figure 5: payoff matrix with premiums (+3) and discounts (-3) for sanctions.

if the institution, i.e., the moral rule, is understood by both players, defection is discounted by the costs for spoiling the personal relationship and incurring dislike. conversely, cooperation is enhanced by the returns in affection currency. the most important aspect of this change, however, is that the whole game changes decisively. it is no longer a cooperation game, but a so-called coordination game in which the pareto-efficient combination is also a nash-equilibrium. the main result of this analysis is, therefore, that moral rules allow us to turn cooperation games into coordination games, and in this very sense they become self-enforcing, as long as the players understand them and as long as the sanctions work. the latter explains why morality frequently vanishes in situations characterised by anonymity and social distance (see e.g., dana, weber, & kuang, 2007; andreoni & bernheim, 2009).

4.3. moral stages and appropriate sanctions

the economic rationale just sketched allows us to construct ever more complex kinds of social interaction and cooperation. this entails a succession of cooperation games that have to be turned into coordination games with the help of moral principles as the regulating institutions. another property of this succession of stages is their dialectic order. if mutual acceptance of property rights marks the beginning of moral reasoning, this means that the perspectives and legitimate claims of different individuals are treated equally, each in their own right.
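
the transformation described in section 4.2, where sanctions turn a cooperation game into a game whose pareto-efficient outcome is also a nash equilibrium, can be sketched in python as follows. the premium of +3 for cooperating and the discount of -3 for defecting are the values named in the caption of figure 5; the base payoffs are a hypothetical prisoners’ dilemma chosen only for illustration, since the actual matrix of figure 5 is not reproduced in the text, and the function names are likewise assumptions.

```python
# Hypothetical base game (NOT the matrix of figure 5, which is not given here).
# Keys are (row_move, column_move); values are (row_payoff, column_payoff).
BASE_GAME = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")

# Premium and discount as named in the figure 5 caption: each player's own
# payoff rises by 3 for cooperating and falls by 3 for defecting.
SANCTION = {"C": 3, "D": -3}


def with_sanctions(game):
    """Return a new payoff matrix with premiums and discounts applied."""
    return {
        (row, col): (row_pay + SANCTION[row], col_pay + SANCTION[col])
        for (row, col), (row_pay, col_pay) in game.items()
    }


def nash_equilibria(game):
    """List all pure-strategy profiles from which no player wants to deviate."""
    equilibria = []
    for (row, col), (row_pay, col_pay) in game.items():
        row_stays = all(row_pay >= game[(alt, col)][0] for alt in MOVES)
        col_stays = all(col_pay >= game[(row, alt)][1] for alt in MOVES)
        if row_stays and col_stays:
            equilibria.append((row, col))
    return equilibria


sanctioned_game = with_sanctions(BASE_GAME)
print("without sanctions:", nash_equilibria(BASE_GAME))
print("with sanctions:   ", nash_equilibria(sanctioned_game))
```

with these assumed numbers, mutual defection is the only equilibrium of the base game, whereas mutual cooperation is the only equilibrium once the sanctions are priced in. other base payoffs or weaker sanctions could leave both outcomes as equilibria, or leave defection in place, which corresponds to the condition in the text that the rule is self-enforcing only as long as the sanctions work.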
however, these perspectives are kept independent of each other, and the morality of property rights secures and underpins this independence. this principle ceases to fit, if individuals differ in their property rights. if one individual owns a certain item desired by the other(s), a sharing norm is what we need. therefore, we usually claim that one ought to share, at least with friends or with those who are dear to us. the sharing norm relates these perspectives reciprocally to each other. however, there is a simple form of sharing, where those who have something have to give to others and usually have to share in equal parts. and there is an advanced form of sharing, where relevant inter-individual differences (of deservingness in terms of need, effort, and so on) have to be taken into account. in this latter case, equal sharing is considered unjust, and a norm of equitable sharing has to be put in place.

this basic triad of types of morality follows from a dialectical approach advocated by the mature piaget in one of his last works (piaget & garcia, 1989), where he calls them “intra”, “inter”, and “trans”, respectively. empirical evidence is available today, e.g., from experiments with young children between three and six years of age (paulus & moore, 2014). another tenet of piaget’s final version of the mind’s architecture is that the “trans”-stage forms a complex unity which, however, can break up into a set of trans-stages as a further morally relevant aspect enters the scene that cannot be integrated into the present trans-stage. and so a higher level with a new stage triad opens up. in this very sense, the early morality, which is based on sympathy and altruism, gives way to a new and higher form of morality relating to the cases where helping someone else is futile and goes against one’s own interest. in early forms of morality, helping others is never in contradiction with one’s own interest, i.e., one enjoys helping others and caring for them.
however, if you ought to care for a competitor in business or in sport, or where someone, say, wants you to go for a walk together, when you yourself prefer to stay at home, this marks a true clash of interests. according to kohlberg, the morality of conflicting interests is captured by his stage 2 (colby & kohlberg, 1987). and since kohlberg had also invented the idea of sub-stages “a” and “b” for every stage – which he later changed into “types” (see footnote 11) – i label the first triad as stages 1a, 1b, and 1c, and the succeeding triad as stages 2a, 2b, and 2c, and so on. in this so-called neo-kohlbergian framework we find 9 stages altogether (see minnameier, 2014). they cannot be explained in detail here, but the first three of them are briefly described in table 2, together with the sanctions that work in their respective contexts (see footnote 12).

table 2: moral principles and sanctioning potentials for stages 1 to 3

since stage 1 is based on sympathy, sanctions are imposed in terms of affection (positive) and dislike (negative). this is the code in which moral discourse takes place at this stage (or the “moral currency” of stage 1). if moral discourse in terms of this stage – or any other stage – fails because of unwillingness or inability, one can always back out of “the game”, shift to a lower stage and thus play a different game. for instance, if others simply wouldn’t stop taking things away from me or using and possibly spoiling my property illicitly, i will have to fight back in some way. at stage 1 this always relates to the people with whom one wants or has to keep up (like parents, mates and so on). thus, the extended punishment at stage 1a means to shift to stage 0, i.e., to defend oneself by taking vengeance (this is the hobbesian state of nature).
similarly, at stage 1b one can ultimately stop sharing, so that everybody sticks with what they have (which is the principle of stage 1a), and at stage 1c one can always revert to strict reciprocity rather than an equitable or caring interaction.

at stage 2 the legitimacy of interests and the conflicts of interest that arise are the core problem. this stage applies to situations in which the agents are mutually disinterested, either with respect to the person of the other or with respect to a specific activity. thus, while siblings are generally quite interested in each other and are involved in a close relationship, they may none the less have diverging interests in terms of leisure time activities, and then it is legitimate for them to go their own sweet ways, as it were. accordingly, a conflict may arise, when one, e.g., wants to listen to loud music while the other has to prepare for an exam (the same of course applies to students sharing a flat). if no ways can be found in which each one can pursue their interests without encroaching upon the others’, one has to go separate ways. this is meant by “suspension” or “separation” as the extended punishment at stage 2a, where the agents are relegated to contexts where they do not interfere with each other anymore and only deal with those with whom they get on well and want to affiliate.

stages 2b and 2c provide forms of coordinating in conflicts of interest. mutual promises at stage 2b allow agents with different interests to strike deals so that everybody’s interest is furthered. any ordinary deal is an example. the pd is a situation in which the agents would like to strike such a deal to overcome the dilemma. but the rules of this game preclude this. therefore, agents have to revert to stage 2a and simply pursue their own interest (by defecting). in real life this can be a form of punishing those who go back on their promises.
in the pd it is morally just to act selfishly, because the conflict of interest cannot be resolved. stage 2b requires that both parties have something to trade. stage 2c goes beyond this, because here one makes a contract with oneself, determining what you would have yourself do, if you were the other person. however, this requires trustful relationships and that the favours you offer are paid back if the roles are reversed at some other time, when you are in need of assistance. in case of violation of this principle one could revert to 2b and demand immediate compensation for any service to be given.

at stage 3a individual interests are merged into group interests. in a way, stage 2c implies that one tries to please everybody and coordinate diverging individual interests in this way. however, there are situations in which this is impossible, e.g., if one works for a company where pleasing customers or suppliers too much means to spoil the company’s business. at this point it is important to think in terms of social units like companies, departments, families and clans, or peer groups and teams (in sport or at work). stage 3a relates to these social units and the roles one takes on in these contexts. role-related reputation is what one can gain, and disrepute may be the price one has to pay, if one fails to fulfil one’s tasks. in the limiting case, one is excluded from the group – either literally, e.g., by being laid off from a firm, or in the sense that group cohesion breaks apart. in the latter case, one can still treat each other with respect and rely on each other in terms of stage 2c, but not in terms of role-related duties and commitments.

stage 3b concerns inter-group relationships, with customs and generalised expectations as moral principles.
examples are fairness rules in sport, social “conventions” like what kinds of behaviour are expected from superiors and subordinates, and honest practices in commercial relations (e.g., whether a handshake is an obligation or not). where there are diverging views among honest people – i.e., “honest” in the sense of stage 3b – about what is decent and what not, an authority will have to decide, typically the leader or leading authority of the group, which can be a political community, a company or any other kind of organisation. that there is a leader or an authority who knows (or must know) what is right and may legitimately decide, is the core of stage 3c. at this stage, the verdict of an authority is believed to be sacrosanct.

at this point i end the description of moral stages. however, this is not to be mistaken as the endpoint. an authority as just described has to be legitimate, and what follows the acceptance of some kind of leadership is the question how we can determine whether certain laws, rules and forms of government are legitimate or not. this is the proper field of ethics that we would then enter. altogether, the neo-kohlbergian taxonomy of stages comprises 9 stages (minnameier, 2000; 2001; 2005).

finally, turning back to the three stages just described and to the way the sanctions work, i would like to point out that there are always two kinds of punishment available (see table 2): one is punishment within an institution, which relates to the proper meaning of the moral principle and reclaims that it be heeded. if the other does not play according to these rules, one can still revert to a lower stage and consequently play another moral game at this lower level. this precludes self-exploitation in situations where certain moral rules cannot be implemented for some reason.

5. conclusion

hvp does not seem to be a real problem.
when people illegitimately act in selfish ways, well-functioning social systems should be able to counter such behaviour by appropriate sanctions. and if these sanctions fail to work effectively, we have to adapt our tools, in particular by playing different (lower-stage) moral games with other kinds of sanctions (as explained in section 4.3). conversely, however, we will not solve such problems by trying to increase moral motivation in the sense criticised in this article (least of all among moral transgressors). there are situations where “happy victimizing” actually seems to be an appropriate behavioural orientation, in particular in strictly competitive situations, as when applying for a job or competing over a large order in business. in such situations, it is morally mandatory to pursue one’s self-interest (as long as one is playing fair). the same is true for situations in which contracts are either not fulfilled or not feasible (like, e.g., in the pd).

acknowledgements

i wish to thank two anonymous reviewers who have read the paper very thoroughly and carefully. they have helped me to improve it and correct remaining errors.

keypoints

the classical (morally externalistic) explanation based on moral motivation has been criticised and confronted with an internalistic alternative that incorporates a theory of inferential reasoning and a reason-based theory of rational choice.
the reason-based approach and moral functioning can also be interpreted in terms of game theory, where moral principles (or preferences) become preferences for games.
empirical evidence supports the reason-based approach, which is then extended to an overall theory of moral principles as institutions in the institutional-economic sense.
one main outcome of this analysis is that morality requires positive and negative sanctioning mechanisms that must be operative, if specific types of morality are to be upheld in specific contexts.
this new approach also revives kohlbergian moral theory and leads to a neo-kohlbergian theory of moral reasoning and moral functioning.

footnotes

1 as a world-wide citation standard, the two numbers refer to the volume and the page in the famous “akademie-textausgabe”.

2 brickhouse and smith show that, contrary to the received view, socratic moral psychology is not naïvely cognitivistic, but more in line with what is claimed in the present contribution.

3 this is an induction, because it is not just a formal deductive judgement based on deductive premises, but determines a belief about what is feasible and effective in real world interactions. furthermore, this belief includes the insight that the golden rule not only fails in the present situation, but would equally fail in all situations of the same kind as the present one. in this sense, any induction, even if it actually only addresses a single situation, is always generalizable, in principle. and this applies to positive (confirming) as well as negative (disconfirming) inductions.

4 note that already in gary s. becker’s economic approach to human behaviour (1976; 1993), preferences are conceived as very fundamental. however, he uses rational choice theory to explain any kind of (stable) human behaviour, i.e., he uses the principle of rationality as an explanatory tool, so that clearly nothing remains as irrational, since what is irrational on this account, is simply not rationalisable. weakness of will, however, is part of an explanation, and then this weakness is interpreted as a constraint on the agent’s actions, which rationalises even choices that run counter to the agent’s intentions.

5 we have a total of 587 participants in the study. however, 78 of them study something other than the three groups discussed here or have failed to specify the study programme. 28 have not completed the questionnaire and were therefore excluded from the analysis.
6 while the proportion of male and female participants is fairly balanced for economists (50/50 percent) and b&e education (42/58 percent), it is not for teacher students (21/79 percent). this study does not focus on the gender aspect. nonetheless, we control for gender in the subsequent analyses of moral orientations. in terms of age the differences are small (economists: 22.42 years; b&e education: 24.83 years; teacher students: 21.21 years) but significant (according to the kruskal-wallis test, since variances are heterogeneous).
7 in the analyses on gender differences, only 476 cases are included, because the remaining five have failed to indicate their gender.
8 since both kinds of reasoning complemented each other, we have taken them together as expressing their moral judgement.
9 if one player does not understand the rules, the game is not played (at least not with this player), because a game implies that the players know the rules. if they don’t, they may still play a game, but a different one.
10 of course, self-signalling is always possible, and this may be the mechanism that makes some people cooperate, even after having gained experience with the pd (ledyard, 1995). some take this as a strong moral self (blasi, 1984; 1995; bergman, 2002; 2004; krettenauer, 2013). however, since this is tantamount to self-exploitation, one is certainly not morally obliged to cooperate in the face of (a high risk of) others defecting, especially since such defection cannot be penalised in any way. hence, whether one should engage in one-sided cooperation is a question of personal value and prudential reasoning, but not a moral issue in the strict sense, even though we often associate it with morality.
11 kohlberg first introduced these forms as “sub-stages” (see e.g. 1984), but later treated them as mere “types”, because he noticed anomalies in the developmental sequence (colby & kohlberg, 1987).
from the point of view of the neo-kohlbergian taxonomy, however, these anomalies are integrated and therefore constitute no systematic problem anymore.
12 neo-kohlbergian stages roughly conform to kohlberg’s original stages (at least with respect to stages 1 through 5). however, there are a few important differences with respect to particular substages. for instance, the “golden rule” integrates conflicts of interests and is identified as stage 2c in the neo-kohlbergian framework, while kohlberg takes it as a form of stage 3.
13 kohlberg associated the golden rule with stage 3. however, here he seems mistaken, since the golden rule serves to balance individual interests, whereas – at least in the neo-kohlbergian framework – stage 3 is based on social units into which individual interests are merged. in this sense any stage 3 morality differs sharply from the golden rule, which, however, is integrated in this higher order of hierarchical complexity.
14 some separate strictly between moral issues and conventional issues (most prominently turiel, 1983; 2002; nucci, 2008). with respect to the systematic differentiation between “norms” and “conventions” in game theory (see above), i fully endorse this. however, social conventions can also function as norms in the sense and in the contexts discussed here. for further discussions of conventions functioning as norms see bicchieri (2006) and sugden (2010, where the latter explicitly discusses turiel’s and nucci’s approach).

references

ameriks, k. (2006). kant and motivational externalism. in h. f. klemme, m. kühn & d. schönecker (eds.), moralische motivation: kant und die alternativen (pp. 3-22). hamburg: meiner.
andreoni, j., & bernheim, d. b. (2009). social image and the 50–50 norm: a theoretical and experimental analysis of audience effects. econometrica, 77, 1607–1636. https://doi.org/10.3982/ecta7384
ariely, d. (2012). the (honest) truth about dishonesty: how we lie to everyone – especially ourselves.
new york: harper collins.
arsenio, w. f., gold, j., & adams, e. (2006). children’s conceptions and displays of moral emotion. in m. killen & j. g. smetana (eds.), handbook of moral development (pp. 581-609). mahwah, nj: erlbaum.
batson, c. d., thomson, e. r., & chen, h. (2002). moral hypocrisy: addressing some alternatives. journal of personality and social psychology, 83, 330-339. https://doi.org/10.1037/0022-3514.83.2.330
batson, c. d., thomson, e. r., seuferling, g., whitney, h., & strongman, j. a. (1999). moral hypocrisy: appearing moral to oneself without being so. journal of personality and social psychology, 77, 525-537. https://doi.org/10.1037/0022-3514.77.3.525
becker, g. s. (1976). the economic approach to human behavior. chicago, il: university of chicago press.
becker, g. s. (1993). nobel lecture: the economic way of looking at behavior. journal of political economy, 101, 385–409.
bergman, r. (2002). why be moral? a conceptual model from developmental psychology. human development, 45, 104-124.
bergman, r. (2004). identity as motivation: toward a theory of the moral self. in d. k. lapsley & d. narvaez (eds.), moral development, self, and identity (pp. 21-46). mahwah, nj: lawrence erlbaum associates.
bicchieri, c. (2006). the grammar of society: the nature and dynamics of social norms. cambridge, uk: cambridge university press.
binmore, k. (2009). rational decisions. princeton: princeton university press.
binmore, k. (2010). game theory and institutions. journal of comparative economics, 38, 245-252. https://doi.org/10.1016/j.jce.2010.07.003
blasi, a. (1984). moral identity: its role in moral functioning. in w. m. kurtines & j. l. gewirtz (eds.), morality, moral behavior, and moral development (pp. 128-139). new york: wiley.
blasi, a. (1995). moral understanding and the moral personality: the process of moral integration. in w. m. kurtines & j. l. gewirtz (eds.), moral development: an introduction (pp. 229-253). boston: allyn and bacon.
brickhouse, t.
c., & smith, n. d. (2010). socratic moral psychology. cambridge: cambridge university press.
brink, d. o. (1997). moral motivation. ethics, 108, 4-32.
carter, j. r., & irons, m. (1991). are economists different, and if so, why? journal of economic perspectives, 5(2), 171–177.
colby, a., & kohlberg, l. (1987). the measurement of moral judgment, vol. i: theoretical foundations and research validation. cambridge, ma: cambridge university press.
dana, j., weber, r. a., & kuang, j. x. (2007). exploiting moral wiggle room: experiments demonstrating an illusory preference for fairness. economic theory, 33, 67–80. doi: 10.1007/s00199-006-0153-z
dietrich, f., & list, c. (2013a). a reason-based theory of rational choice. noûs, 47, 104-134. https://doi.org/10.1111/j.1468-0068.2011.00840.x
dietrich, f., & list, c. (2013b). where do preferences come from? international journal of game theory, 42, 613–637. doi: 10.1007/s00182-012-0333-y
foot, p. (1972). morality as a system of hypothetical imperatives. philosophical review, 81, 305-316. doi: 10.2307/2184328
frank, r. h., gilovich, t., & regan, d. t. (1993). does studying economics inhibit cooperation? journal of economic perspectives, 7(2), 159-171.
frank, r. h., gilovich, t., & regan, d. t. (1996). do economists make bad citizens? journal of economic perspectives, 10(1), 187-192. doi: 10.1257/jep.10.1.187
frey, b. s. (1986). economists favour the price system. who else does? kyklos, 39(4), 537–563. https://doi.org/10.1111/j.1467-6435.1986.tb00677.x
frey, b. s., & meier, s. (2003). are political economists selfish and indoctrinated? evidence from a natural experiment. economic inquiry, 41, 448-462. https://doi.org/10.1093/ei/cbg020
gul, f., & pesendorfer, w. (2008). the case for mindless economics. in a. caplin & a. schotter (eds.), the foundations of positive and normative economics: a handbook (pp. 3-39). oxford: oxford university press.
heinrichs, k., minnameier, g., gutzwiller-helfenfinger, e., & latzko, b. (2015).
„don’t worry, be happy“? – das happy-victimizer-phänomen im berufs- und wirtschaftspädagogischen kontext. zeitschrift für berufs- und wirtschaftspädagogik, 111, 31-55.
hermkes, r. (2016). perception, abduction, and tacit inference. in l. magnani & c. casadio (eds.), model-based reasoning in science and technology – logical, epistemological, and cognitive issues (pp. 399-418). heidelberg: springer.
hobbes, t. (1651/2001). leviathan. south bend, in: infomotions.
kant, i. (2002/1785). groundwork for the metaphysics of morals (ed. and transl. by a. w. wood). new haven, ct: yale university press.
keller, m., lourenço, o., malti, t., & saalbach, h. (2003). the multifaceted phenomenon of “happy victimizers”: a cross-cultural comparison of moral emotions. british journal of developmental psychology, 21, 1-18. doi: 10.1348/026151003321164582
killen, m., & smetana, j. g. (2015). origins and development of morality. in m. e. lamb (ed.), handbook of child psychology and developmental science, vol. 3 (7th ed.; pp. 701-749). ny: wiley-blackwell. https://doi.org/10.1002/9781118963418.childpsy317
kohlberg, l. (1984). essays on moral development, vol. 2: the psychology of moral development. san francisco, ca: harper & row.
krebs, d. l., & denton, k. (2005). toward a more pragmatic approach to morality: a critical evaluation of kohlberg’s model. psychological review, 112, 629-649. https://doi.org/10.1037/0033-295x.112.3.629
krettenauer, t. (2013). moral motivation, responsibility and the development of the moral self. in f. oser, k. heinrichs & t. lovat (eds.), handbook of moral motivation: theories, models, applications (pp. 215-228). rotterdam: sense.
krettenauer, t., malti, t., & sokol, b. w. (2008). the development of moral emotion expectancies and the happy victimizer phenomenon: a critical review of theory and application. european journal of developmental science, 2, 221-235. doi: 10.3233/dev-2008-2303
ledyard, j. (1995). public goods: a survey of experimental research. in j.
kagel & a. roth (eds.), handbook of experimental economics (pp. 253–279). princeton: princeton university press.
malti, t., & krettenauer, t. (2013). the relation of moral emotion attributions to prosocial and antisocial behavior: a meta-analysis. child development, 84, 397-412. doi: 10.1111/j.1467-8624.2012.01851.x
marwell, g., & ames, r. (1981). economists free ride, does anyone else? journal of public economics, 15(3), 295–310.
minnameier, g. (2000). strukturgenese moralischen denkens: eine rekonstruktion der piagetschen entwicklungslogik und ihre moraltheoretischen folgen. münster: waxmann.
minnameier, g. (2001). a new stairway to moral heaven – a systematic reconstruction of stages of moral thinking based on a piagetian 'logic' of cognitive development. journal of moral education, 30, 317-337. https://doi.org/10.1080/03057240120094823
minnameier, g. (2005). developmental progress in ancient greek ethics. european journal of developmental psychology, 2, 71-99. https://doi.org/10.1080/17405620444000274a
minnameier, g. (2010). the problem of moral motivation and the happy victimizer phenomenon – killing two birds with one stone. new directions for child and adolescent development, 129, 55-75. https://doi.org/10.1002/cd.275
minnameier, g. (2012). a cognitive approach to the ‘happy victimiser’. journal of moral education, 41, 491-508. https://doi.org/10.1080/03057240.2012.700893
minnameier, g. (2013). deontic and responsibility judgments: an inferential analysis. in f. oser, k. heinrichs & t. lovat (eds.), handbook of moral motivation: theories, models, applications (pp. 69-82). rotterdam: sense.
minnameier, g. (2014). moral aspects of professions and professional practice. in s. billet, c. harteis & h. gruber (eds.), international handbook of research in professional and practice-based learning (pp. 57-77). berlin: springer.
minnameier, g. (2016a). rationalität und moralität – zum systematischen ort der moral im kontext von präferenzen und restriktionen.
zeitschrift für wirtschafts- und unternehmensethik, 17, 259-285.
minnameier, g. (2016b). abduction, selection, and selective abduction. in l. magnani & c. casadio (eds.), model-based reasoning in science and technology – logical, epistemological, and cognitive issues (pp. 309-318). heidelberg: springer.
minnameier, g. (2017). forms of abduction and an inferential taxonomy. in l. magnani & t. bertolotti (eds.), springer handbook of model-based reasoning (pp. 175-195). berlin: springer.
minnameier, g. (2018). reconciling morality and rationality – positive learning in the moral domain. in o. zlatkin-troitschanskaia, g. wittum & a. dengel (eds.), positive learning in the age of information (plato) a blessing or a curse? (pp. 347-361). wiesbaden: springer vs.
minnameier, g., & schmidt, s. (2013). situational moral adjustment and the happy victimizer. european journal of developmental psychology, 10, 253-268. doi: 10.1080/17405629.2013.765797
minnameier, g., beck, k., heinrichs, k., & parche-kawik, k. (1999). homogeneity of moral judgement? apprentices solving business conflicts. journal of moral education, 28, 429-443. https://doi.org/10.1080/030572499102990
minnameier, g., heinrichs, k., & kirschbaum, f. (2016). sozialkompetenz als moralkompetenz – theoretische und empirische analysen. zeitschrift für berufs- und wirtschaftspädagogik, 112, 636-666.
nucci, l. (2008). social cognitive domain theory and moral education. in l. nucci & d. narvaez (eds.), handbook of moral development and character education (pp. 291–309). oxford: routledge.
nunner-winkler, g. (1999). development of moral understanding and moral motivation. in f. e. weinert & w. schneider (eds.), individual development from 3 to 12 (pp. 253–292). cambridge, uk: cambridge university press.
nunner-winkler, g. (2007). development of moral motivation from childhood to early adulthood. journal of moral education, 36, 399-414. https://doi.org/10.1080/03057240701687970
nunner-winkler, g. (2013).
moral motivation and the happy victimizer phenomenon. in f. oser, k. heinrichs & t. lovat (eds.), handbook of moral motivation: theories, models, applications (pp. 267-288). rotterdam: sense.
nunner-winkler, g., & sodian, b. (1988). children’s understanding of moral emotions. child development, 59, 1323-1338. doi: 10.2307/1130495
paulus, m. (2014). the emergence of prosocial behavior: why do infants and toddlers help, comfort, and share? child development perspectives, 8, 77-81. https://doi.org/10.1111/cdep.12066
paulus, m., & moore, c. (2014). the development of recipient-dependent sharing behaviour and sharing expectations in preschool children. developmental psychology, 50, 914-921. https://doi.org/10.1037/a0034169
piaget, j., & garcia, r. (1989). psychogenesis and the history of science. new york: columbia university press.
rai, t. s., & fiske, a. p. (2011). moral psychology is relationship regulation: moral motives for unity, hierarchy, equality, and proportionality. psychological review, 118, 57-75. https://doi.org/10.1037/a0021867
rest, j. r. (1984). the major components of morality. in w. m. kurtines & j. l. gewirtz (eds.), morality, moral behavior, and moral development (pp. 24-38). new york: wiley.
rosati, c. s. (2016). moral motivation. in e. n. zalta (ed.), the stanford encyclopedia of philosophy, url = .
rubinstein, a. (2006). a sceptic’s comment on the study of economics. economic journal, 116, c1-c9. https://doi.org/10.1111/j.1468-0297.2006.01071.x
rustichini, a., & villeval, m. c. (2014). moral hypocrisy, power, and social preferences. journal of economic behavior & organization, 107, 10-24. https://doi.org/10.1016/j.jebo.2014.08.002
samuelson, p. a. (1938). a note on the pure theory of consumer’s behaviour. economica, 5, 61–71. doi: 10.2307/2548836
samuelson, p. a. (1948). consumption theory in terms of revealed preference. economica, 15, 243–253. doi: 10.2307/2549561
selten, r., & ockenfels, a. (1998). an experimental solidarity game.
journal of economic behavior & organization, 34(4), 517-539. https://doi.org/10.1016/s0167-2681(97)00107-8
smith, m. (1994/2005). the moral problem. oxford: blackwell.
sugden, r. (2010). is there a distinction between morality and convention? in m. baurmann, g. brennan, r. e. goodin & n. southwood (eds.), norms and values: the role of social norms as instruments of value realization (pp. 47-65). baden-baden: nomos.
thoma, s. j., & bebeau, m. j. (2013). moral motivation and the four component model. in f. oser, k. heinrichs & t. lovat (eds.), handbook of moral motivation: theories, models, applications (pp. 49-68). rotterdam: sense.
turiel, e. (1983). the development of social knowledge: morality and convention. new york: cambridge university press.
turiel, e. (2002). the culture of morality: social development, context, and conflict. new york: cambridge university press.
warneken, f., & tomasello, m. (2009). the roots of human altruism. british journal of psychology, 100, 455–471. https://doi.org/10.1348/000712608x379061
zangwill, n. (2003). externalist moral motivation. american philosophical quarterly, 40, 143-154.

frontline learning research special issue vol. 8 no. 5 (2020) 24-46 issn 2295-3159

an action-theoretical approach to the happy victimizer pattern – exploring the role of moral disengagement strategies on the way to action

karin heinrichs a, tobias kärner b & hannes reinke c
a university of education upper austria, austria
b university of hohenheim, stuttgart, germany
c university of bamberg, germany

article received 11 june 2018 / revised 3 may 2020 / accepted 7 may / available online 1 july

abstract

research in moral education demonstrates that the pattern referred to as happy victimising (hv) does not emerge only among children. adults also transgress moral rules and might feel good doing so; however, research reveals that the emergence of the hv pattern is context specific.
in contrast to findings among young children in whom the hv pattern was interpreted as a lack of motivation and thus a developmental stage, it is an open question as to what happy victimising in adulthood means and how such patterns affect intentions as an important step towards action. this paper offers an action-theoretical approach, allowing for reconstruction of the process of intention formation, as well as a systematic discussion of results from two separate lines of research: (1) research on patterns of moral decision-making, such as the hv, and (2) research on moral disengagement. additionally, a survey study provides insights into what intentions, emotion attributions, and moral disengagement strategies adults display in situations of low moral intensity, and whether they indicate consistent or contradictory patterns across situations. results indicate intra-personal consistency regarding patterns of moral decision-making, but also show there are participants who vary these patterns across situations. moral disengagement strategies were shown to have context-specific use, at least in regard to their subcategories. regarding education, this study encourages not only a focus on strengthening the moral self or autonomous moral judgement but also on paying attention to actions and person-situation interactions. this might be useful to implement environments that support reduced application of moral disengagement strategies.

keywords: moral transgression, happy victimizer, moral acting, moral disengagement

info
corresponding author email: karin.heinrichs@ph-ooe.at
doi: https://doi.org/10.14786/flr.v8i5.386

1. introduction

the frequency of economic scandals, such as diesel-gate, tax evasion, corruption, fraud, or fake-shops for breathing masks during the covid-19 pandemic, indicates that even people who seem to be friendly and empathic at first glance may transgress moral rules almost as frequently as others, depending on the situation, context, or one’s role.
at least occasionally, people do not follow conventional rules, moral standards, or principles and, simultaneously, ignore others’ perspectives in favour of fulfilling their own or their companies’ needs. while this may lead to the assumption that moral transgression and self-centeredness are basic human phenomena that emerge in adulthood, research also shows that moral education can be successful in developing socio-moral competencies (e.g., lind, 2019; weinberger & frewein, 2019) and creating a moral atmosphere. further, fostering empathy and social perspective-taking helps to increase prosocial behaviour (bandura, 2016; eisenberg, fabes, & spinrad, 2007; malti et al., 2016). from an educational perspective, it seems important to keep searching for effective ways to develop the competencies adults need to effectively deal with morally relevant situations in their everyday (working) lives. in routine as well as odd situations, people have to balance their own interests with those of others. thus, to discuss aims of moral education across the lifespan, results of empirical research should be considered that contribute to explaining how people act, what personal and situational factors determine whether someone acts in line with or contradictory to moral standards, and how such competencies can be developed up to adulthood and beyond. research on moral psychology shows that agents simultaneously know about moral rules and attribute positive emotions in cases of transgression. this pattern of ethical decision-making, here called ‘happy victimising’, was initially detected among children approximately four years old; however, recent research has shown that this pattern also emerges among adolescents and adults (e.g., heinrichs, gutzwiller-helfenfinger, latzko, minnameier, & döring, this issue; heinrichs, minnameier, latzko & gutzwiller-helfenfinger, 2015; nunner-winkler, 2007; 2013; minnameier, heinrichs, & kirschbaum, 2016; minnameier & schmidt, 2013).
furthermore, there is evidence of different manifestations of ethical decision-making and emotion attributions, such as the patterns of ‘happy victimizing’ (hv; transgressing moral rules, attributing positive emotions), ‘unhappy victimizing’ (uv; transgressing moral rules, attributing negative emotions), ‘happy moralizing’ (hm; obeying moral rules, attributing positive emotions), and ‘unhappy moralizing’ (um; obeying moral rules, attributing negative emotions). these patterns are at least applied in economically relevant situations, and emerge to varying extents, depending on the measurement methods used (gutzwiller-helfenfinger & latzko, this issue). they also vary intra-personally across situations, and can influence individual actions (döring, 2013; gasser, gutzwiller-helfenfinger, latzko, & malti, 2013). thus, these patterns of ethical decision-making might contribute to explaining adults’ deviant behaviours within different social, private, and work-related contexts. however, there is a lack of empirical evidence and theoretical foundations on how internal patterns of decision-making, moral judgments, and emotion attributions can be modelled to determine the processes of action in morally (and economically) relevant situations. this is important to study across different developmental stages; however, this article focuses on adults’ patterns of moral decision-making and emotion attributions as indicators of the valence of intentions, and thus as predictors for actions. to bridge the gap between judgment and action, we applied the process model of judging and acting (heinrichs, 2005). this model provides a theoretical framework to gain deeper insight into relevant situational and individual determinants of action processes. this model further offers a detailed reconstruction of the action formation process, from interpreting a perceived situation to implementing a behaviour.
in particular, it provides ideas on relevant steps in the first phase of acting, which concludes with intention formation. referring to this process model, one obstacle to forming a high-valence intention is a lack of self-commitment to one’s preferred way of behaving. this lack may often become apparent in situations with conflicting values or goals, or when an individual perceives ambivalence in cognitive or emotional states, and often appears in uv and um patterns. to overcome this lack of commitment and form a high-valence intention, the process model posits that people use cognitive control strategies (heinrichs, 2005). to specify what cognitive control strategies might be helpful in morally relevant situations where the agent has to choose between transgressing against or obeying a moral rule, it is necessary to refer to further research. therefore, we refer to bandura’s concept of ‘moral disengagement’ strategies (mds; bandura, 1990, 2016). bandura and colleagues suggest (volitional) control strategies, and indicate that these strategies deactivate self-sanctions that would normally support moral action. individuals using mds can thus make a choice other than the ‘moral’ course of action, as these self-regulation strategies enable agents to reach a state of emotional well-being while causing negative (severe) consequences for others (osofsky, bandura, & zimbardo, 2005). in specifying the role of mds in the process of forming an intention, however, it remains questionable whether and to what extent people really apply mds in certain situations; when they decide for or against a moral transgression; and whether they feel committed to choosing ‘victimising’ or ‘moral behaviours’. focusing on mds in particular situations seems important, as moral reasoning, moral emotions, and patterns of moral decision-making vary intra-personally across situations, and can therefore be considered results of person-situation interactions.
therefore, it might be fruitful to study how people use mds within the context of moral transgressions. bandura and colleagues mainly studied mds as individual tendencies across situations; thus, they looked for intra-personal consistency. in contrast, we focused on the application of mds in particular situations during the action process, particularly during the first step towards acting: the sub-process of forming an intention. we aimed to provide deeper insights into whether individuals who intend to make moral transgressions (uv, hv) also apply mds. if they use mds in terms of (moral) reasoning for their preferences, in line with the process model, it could be assumed that mds might have functioned during the process of forming intentions towards a preference of moral transgression, increasing the level of commitment, and the probability of acting in line with one’s intention. thus, this paper provides theoretical ideas and the first empirical data on the hv pattern in adulthood, exploring mds use in morally relevant situations. theoretically, the presented study refers to an action-based approach (heinrichs, 2005) that allows for the presentation of theoretical ideas on how patterns of moral decision-making and emotion attributions (hv, uv, hm, um), as well as mds, may affect intentions in morally relevant situations, and specifically in situations that provoke decisions to follow or break a moral rule. the results of this questionnaire study among students can provide insight into differences in intentions, attributed emotions, and frequency, as well as qualities and intra-personal differences, of mds use across morally relevant situations in a work-related context.

the paper is structured as follows. section 2.1 provides basic information on the state of research in terms of empirical findings on patterns of moral decision-making among adults.
in this context, a rationale is provided for why this study focuses on situations of low moral intensity that are assumed to provoke victimisation at a higher rate than dilemmas and that are omnipresent in everyday (working) life. section 2.2 explicates basic assumptions of an action-theoretical approach to reconstruct patterns of moral decision-making as potential intentions to act in morally relevant situations. we therefore chose situations in consumer and business contexts, as adolescents and adults are familiar with these situations in everyday or working life. furthermore, such situations might have the potential to elucidate inner conflicts related to transgressing against moral rules in favour of economic or personal interests. section 2.3 explains theoretical foundations and empirical findings related to mds and posits that mds may represent cognitive strategies that can be linked to intentions to obey or transgress against a moral rule. based on this theoretical foundation, a survey study was conducted to explore whether and to what extent students apply patterns of moral decision-making and emotion attributions across situations, as well as whether and to what extent students use mds when choosing whether to victimise others (sections 3 and 4). finally, implications for further research on decision-making, behaviour, and disengagement in morally relevant situations and implications for moral education are discussed (section 5).

2. theory

2.1 happy victimising in adulthood

the developmental psychologists nunner-winkler and sodian (1988) coined the term ‘happy victimizer phenomenon’, and explained it as a lack of moral motivation at an early stage of moral development.
as the rate of people displaying this ‘happy victimizer phenomenon’ decreases in later age groups (from eight years onwards), it was assumed that this pattern is caused by the absence of a link between cognition and emotion, and therefore might be overcome during later stages of moral development (krettenauer, malti, & sokol, 2008; nunner-winkler, 1993). however, further studies revealed that such patterns also emerged to a considerable extent in adolescence (döring, 2013; heinrichs et al., this issue) and even in adulthood (heinrichs et al., 2015; nunner-winkler, 2007; 2013; minnameier, heinrichs, & kirschbaum, 2016; minnameier & schmidt, 2013). yet empirical studies have revealed that these patterns of moral decision-making in adulthood do not characterise a person-specific method of moral judgment, applied consistently across situations, with hardly any exceptions, as was assumed in the ‘happy victimizer phenomenon’ of early childhood. moreover, in adulthood, these patterns are described as varying intra-personally across situations, to an extent that had not been expected based on previous theoretical assumptions. simultaneously, recent findings indicate a significant small or medium effect that still points to personal preferences towards patterns of moral decision-making (heinrichs et al., this issue; malti & krettenauer, 2013). thus, patterns of moral decision-making seem to result from person-situation interaction, but are more affected by situational determinants than developmental psychology had assumed. moreover, empirical research in moral and developmental psychology has previously confirmed that the proportion of moral decisions reflecting the hv pattern varies depending on situational cues, particularly on the degree of the moral conflict, or whether people are encouraged to make judgments using a self- or others’ perspective (keller, lourenço, malti, & saalbach, 2003; nunner-winkler, 2013; malti & krettenauer, 2013).
however, these situational variations were mostly discussed as being dependent on measurement methods (heinrichs et al., 2015; this issue; gutzwiller-helfenfinger & latzko, this issue; nunner-winkler, 2013). following empirical results on intra-personal variations in moral decision-making, in this paper, we do not use the term ‘happy victimizer phenomenon’, as previously described as emerging in earlier stages of childhood development. moreover, we differentiate ‘patterns’ of moral decision-making and emotion attributions (see above; heinrichs et al., this issue). it has been assumed that hv—as well as uv, hm, and um—displays intra-personal variations in moral decision-making and managing moral emotions across situations. however, until now, there has been no satisfying empirical evidence supporting this, but rather a need for research on the personal and situational determinants that trigger the use or intra-personal change in these patterns. furthermore, there is also a lack of theoretical approaches to explain hv in adulthood. however, there is evidence that these patterns are important insofar as they are linked to individual actions. empirical findings confirm that hv is a relevant pattern in the context of bullying (gasser et al., 2013), and characterises bullies or bully victims. moreover, the hv pattern is related to deviant adolescent behaviour (döring, 2013). research on counterproductive behaviour in organisational contexts indicates the relevance of individual moral values, judgments, and moral sensibility (moore, detert, treviño, baker, & mayer, 2012). thus, patterns of moral decision-making, such as hv, uv, hm, and um, might further contribute to explaining deviant behaviours among adults, within different social, private, and work-related contexts. therefore, they may also be relevant from the perspectives of moral education, vocational education, human resource development, and organisational behaviour. 
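the four patterns discussed in this section can be read as a 2×2 classification over two observations per situation: whether the agent decides to transgress the moral rule, and whether the emotion attributed to that decision is positive. the following python sketch only illustrates this coding scheme; the function name `classify_pattern`, its boolean parameters, and the example responses are assumptions made here and are not taken from the study's instruments.

```python
def classify_pattern(transgresses: bool, positive_emotion: bool) -> str:
    """Return 'hv', 'uv', 'hm', or 'um' for one decision in one situation."""
    # Second letter: victimizing (rule transgressed) or moralizing (rule obeyed).
    decision = "v" if transgresses else "m"
    # First letter: happy (positive emotion) or unhappy (negative emotion).
    emotion = "h" if positive_emotion else "u"
    return emotion + decision


# Hypothetical responses of one participant across three situations,
# given as (transgresses, positive_emotion) pairs.
responses = [(True, True), (True, False), (False, True)]
patterns = [classify_pattern(t, e) for t, e in responses]
print(patterns)
```

because the label is assigned per situation, the same person can receive different labels across situations, which corresponds to the intra-personal variation described above.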
2.2 an action-based approach to moral decision-making

this paper primarily contributes to theoretical progress in explaining determinants of the action process in morally relevant situations and, in particular, determinants of moral intentions. to this end, two lines of research are linked to each other: research on patterns of moral decision-making, like the hv pattern, and research on mds. the process model of acting (heinrichs, 2005) functions as a theoretical framework which can be used to reconstruct and specify the action processes, from constituting a situation to forming an intention, implementation, conduct, and evaluation. this model was developed as a theoretical framework integrating esser’s (1996) social psychological model of ‘definition of the situation’ and heckhausen’s rubicon model (gollwitzer, 1996; heckhausen, gollwitzer, & weinert, 1987), and is based on a set of assumptions. it allows for analysing the interaction between personal and situational determinants on the way from perceiving selected situational cues to forming an intention and behaviour (see figure 1). thus, in line with approaches to moral judgment and action in the post-kohlbergian tradition, it does not focus on the development of personal determinants, but points to applying patterns of decision-making, reasoning, or acting to morally relevant situations and contexts (krebs & denton, 2005; lapsley & narvaez, 2005; for a summary, see also heinrichs, 2010).

figure 1. the process model of acting: from constituting a situation to forming an intention (heinrichs, 2005)

the model’s basic assumptions are as follows (heinrichs, 2005): an action is determined by an initial situation constituted by the individual. if he or she experiences a difference between is and ought in respect to moral issues, he or she perceives a morally relevant ‘problem’ (see footnote 1). experiencing such a problem motivates further judgments and actions, and marks the important starting point of the action process.
insofar as actions are determined by a situation subjectively constituted at the beginning of the process, the constituted situation is determined by personal and situational conditions. the action process is reconstructed and theoretically divided into four phases: (1) forming an intention, (2) planning, (3) implementation/conduct, and (4) evaluation. the central output of the first phase is an intention. the agent is willing and feels committed to realising and achieving an aim. this means he or she has made a decision towards aims or actions. sometimes, the individual already has the aim linked to a concrete action plan; otherwise, the concrete action will have to be specified during the implementation phase. this intention might be formed if the agent has perceived a problem, if he or she feels confident he or she can find a solution (at least in the future), if he or she is motivated to contribute to solving the problem, and, moreover, if a status of self-commitment was developed and expresses the volitional power to overcome barriers during the implementation phase (see figure 1). this means that the intention, as a measurable product of inner processes, has to be connected to the status of commitment and to be of notable valence. regarding patterns of moral decision-making (hv, uv, um, hm), forming an intention is the first of four phases in the action process. if a person has experienced a morally relevant problem (defined as a subjectively perceived gap between is and ought), then this individual might perceive a tension between obeying a moral rule and fulfilling his or her personal needs, and between different aims or actions. he or she might experience an ambivalence between cognitive, emotional, or motivational states, particularly if he or she attributes negative emotions towards his or her preferred way of behaving (uv or um). if the person perceives an inner conflict, he or she might not yet feel committed to one aim or action. 
to form an intention and progress into action, he or she must then decide among alternatives. referring to action theory, cognitive control strategies, in the sense of volitional strategies, play a major role in increasing commitment (heckhausen, 1987; heinrichs, 2005; 2013; sokolowski, 1993; 1996). volitional strategies support dealing with inner conflict or ambivalence in such a way that, finally, a state of commitment to one out of several possible actions could be achieved. in relation to patterns of moral decision-making, this means that the person feels committed to obeying or transgressing against a moral rule. it is assumed that such a state of self-commitment can only be reached if the individual anticipates being able to cope with upcoming negative consequences or conflicts. as research shows that patterns such as hv emerge depending on the quality of a moral conflict, it can be assumed that there is a need to apply cognitive control strategies across situations that might vary according to the extent of moral intensity. a situation’s moral intensity depends on how an individual perceives elements of reality (see figure 1; for more details on the concept of moral intensity, see jones, 1991). facing some ‘situational’ prompts, individuals may experience (intense) internal conflicts and high moral intensity. this could be expected, for example, in moral dilemmas that studies in moral psychology—especially in the kohlbergian tradition—focused on, and that are supposed to seldomly emerge in everyday life. contrastingly, situations of low moral intensity are assumed to be more frequent. individuals perceive low moral intensity if they do not recognise an internal conflict, or if they quite easily make a decision towards one preferred action. 
many people may perceive low moral intensity, for example, when transgressing against a moral rule only has (mild) negative consequences such as increased economic costs or treating others slightly unfairly, rather than causing physical or psychological harm. in line with the process model of acting, we may assume that if people experience an inner conflict, forming an intention might be a matter of reflective (vs. intuitive) data processing. in moral conflicts and situations of high moral intensity, an individual might perceive a need for self-commitment and apply cognitive control strategies. conversely, in situations of low to moderate moral intensity, an individual might experience a smaller difference between is and ought. people may display tendencies towards one action or another more easily and intuitively, based on automatic (cognitive and affective) processes of moral motivation (haidt & craig, 2008; heinrichs, 2005; 2013; rothmund & baumert, 2014). however, even in cases of less conscious modes of data processing in which an individual has a clear preference for an action, for example in situations of low moral intensity or situations the individual has faced before, cognitive control strategies are assumed to play an important role in building commitment. taking situations of low intensity into account in research on moral actions may be important from an educational perspective. one possible scenario is that people who choose victimising in situations of low moral intensity may become used to it or even continue to transgress morally, even in situations of moderate or high moral intensity. contrastingly, people who accept victimising in situations of low moral intensity might switch to obeying moral rules in situations of higher moral intensity. however, studying the development of hv and its determinants is an important question for further research and not the aim of this paper. 
therefore, these considerations encouraged us to study patterns of moral decision-making in situations of low moral intensity as a first step and a matter of moral sensibility (thoma & bebeau, 2013; tirri, 1999). thus, the process model of acting theoretically allows one to specify the role of cognitive control strategies as part of forming an intention. it is also assumed that intentions may differ in valence, that is, in strength of commitment or in their volitional power. further, the volitional power of one’s intentions impacts how barriers need to be overcome during the implementation phase (heckhausen et al., 1987; heinrichs, 2005). however, the process model of acting is first limited to theoretically reconstructing a sequence of input and output of inner sub-processes. admittedly, an individual cannot be conscious of these psychological processes; instead, it is assumed that the individual is at least potentially aware of the content or results of subprocesses, such as intentions, emotion attributions, or cognitive control strategies, applied in a particular situation (nisbett-wilson-thesis; see neuweg, 1999). thus, identifying relevant content or output of subprocesses, as mentioned above, could serve to systematically develop hypotheses concerning the links between them as results of subprocesses of actions in certain situations, such as the thesis that people who decide to victimise others and feel happy may have applied control strategies and deactivated self-sanctions, particularly regarding a specific situation. moreover, the process model of acting does not provide concepts specifying different kinds or qualities of cognitive control strategies; however, self-regulation theory does. the concept of mds (bandura, 1990; 2002; 2016) focuses on mechanisms people apply when choosing non-moral actions, such as transgressing against moral rules or victimising others. 
thus, in this paper, mds are chosen to specify mechanisms that supposedly support people in decision-making when experiencing internal conflict or ambivalence. applying mds may support the formation of intentions, even to engage in ‘immoral’ behaviours, such as moral transgressions or victimising others. 2.3 moral disengagement strategies during the last two decades, mds have been studied in various contexts (bandura, 2016), including those related to situations of high moral intensity (e.g., mcalister, bandura, & owen, 2006) and lower moral intensity, such as in work-related contexts (moore et al., 2012), sports (boardley & kavussanu, 2007), or connected to leadership issues (detert, treviño, & sweitzer, 2008). empirical findings reveal that mds affect prosocial behaviours and transgressions in childhood, adolescence (bandura, caprara, barbaranelli, pastorelli, & regalia, 2001), and adulthood (detert et al., 2008; fida, paciello, tramontano, fontaine, barbaranelli, & farnese, 2015; moore et al., 2012; osofsky, bandura, & zimbardo, 2005). research reveals that mds as a personal trait impacts ethical and unethical behaviour in a wide range of morally relevant situations—not only in situations of high moral intensity, such as moral dilemmas, but also in situations of lower moral intensity (moore et al., 2012). bandura assumed that “self-sanctions keep conduct in line with internal standards” (bandura, 1990, p. 28). otherwise, “disengagement of moral self-sanctions enables people to compromise their moral standards and still retain their senses of moral integrity” (bandura, 2016, p. 2). he differentiated four loci of mds: locus of the behaviour, agent of action, outcomes of action, and recipients affected by action. each of these loci indicates strategies allowing an individual to ignore moral standards and follow non-moral values (bandura, 2016; osofsky et al., 2005). the findings clearly indicate that mds have the power to specifically explain unethical behaviour. 
however, mds are mostly measured by using a scale to understand the propensity as a trait or tendency to apply mds in adolescence (bandura et al., 2001), and an adapted version for adults (moore et al., 2012). findings have confirmed individuals’ propensity to use mds as a predictor of unethical behaviour. furthermore, bandura reported that mds was triggered by specific contextual factors (bandura, 2002; moore et al., 2012). nevertheless, there is still a lack of insight into what type of mds are applied in certain situations and whether mds preference varies across situations. according to the process model of acting as explained above, mds might be particularly important in cases of ambivalence or conflicting aims or intentions. additionally, mds may affect forming an intention not only during reflective data processing but is assumed also to function as a filter in information processing when an individual intuitively commits to a non-moral action. an individual might make a decision based on heuristics, habits, or routines in everyday life, especially in cases of lower ambivalence and lower moral intensity, or when he or she does not have the opportunity to reflect, or accepts a suboptimal solution (esser, 1996; heinrichs, 2005). additionally, it can be assumed, in line with the mds approach, that people with a high personal tendency towards mds might develop a set of justifications consistent with their moral self, to manage internal ambivalence and conflicts in these morally relevant situations. mds may function as cognitive control (or volitional) strategies and support forming an intention towards victimising or obeying moral rules. thus, the application of mds used in a particular situation may (sometimes) become visible if people are asked for the reasons behind their preferred actions. 
2.4 research questions

to summarise, this paper offers theoretical approaches intended to contribute to explaining patterns of moral decision-making, such as the hv pattern, drawing on the process model of acting (heinrichs, 2005) and the concept of mds (bandura, 2016). the theoretical considerations focus on a procedural perspective of acting, particularly on reconstructing how happy or unhappy people are who intend to break or obey moral rules and, thus, show patterns of intentions with varying valence. it is assumed that patterns of moral decision-making and emotion attributions represent results of processes determined by personal and situational conditions, and may vary interpersonally and situationally. in addition to this theoretical approach to reconstructing the hv pattern, this paper is intended to empirically explore whether situational stimuli of low moral intensity may provoke intrapersonal variation of hv or uv patterns across situations. moreover, it is intended to explore whether and to what extent adults apply mds in given situations. the following research questions are addressed:

(1) to what extent do adults apply victimising strategies in situations of low moral intensity?
(2) do patterns of moral decision-making and emotion attributions among adults vary intra-personally across situations of low moral intensity?
(3) to what extent do adults apply mds to justify victimisation in (different) situations of low moral intensity?

3. method

3.1 data collection and sample characteristics

in total, 587 university students from goethe university, frankfurt (germany; n = 201) and the university of bamberg (germany; n = 344) were surveyed using self-report questionnaires. thirty-four students were guest students from other universities and eight students did not report their university affiliation. on average, students had studied in total for 3.3 semesters (sd = 1.8, min. = 1, max. = 12).
our sample comprised 213 male and 364 female students (10 students did not provide information regarding gender), with a mean age of 22.3 (sd = 2.9) years. thus, participants were emerging adults. of the surveyed students, 28.3% were studying to become teachers, 47.4% were studying economics, and 11.6% were studying business education and educational management. participants filled in a self-report, paper-and-pencil questionnaire. questionnaires were distributed in different university courses (e.g., educational psychology in the subject of teacher education studies, basics of scientific work, business ethics); thus, we used convenience sampling. in the questionnaire, students were confronted with descriptions of morally relevant situations. the stimuli used in this study did not focus on extreme moral conflicts or dilemmas, such as the death penalty, but on situations of lower moral intensity that emerge in everyday life. negative consequences of victimising others were limited to economic effects, such as high costs or losing money, or neglecting values relevant to social interactions, like honesty, trust, or legality. in the given cases, bodily harm or even death were not relevant consequences of disobeying the rule. moreover, we were interested in whether adults deal with such situations of lower moral intensity using sophisticated heuristics, in line with mds. in response to open-ended questions, the students were asked to make decisions, anticipate their own emotions, and provide reasons for their decisions and emotions; more precisely, we asked for their intentions. this way of capturing the hv pattern has been described from a self-perpetrator perspective as “self-judgments” (keller et al., 2003; yuill, pearson, pearbhoy, & van den ende, 1996; see also heinrichs et al., this issue). patterns of moral decision-making (here representing patterns of intentions) were coded (hv, uv, um, hm).
to explore whether cognitive control strategies, for example mds, play a role in the action process, a content analysis of participants’ answers to open-ended questions regarding morally relevant decisions was conducted. the results are based on qualitative data (not the scale of mds). consistent with the concept of moral disengagement, data analysis was limited to participants who decided to transgress. applying mds is assumed to indicate perceived ambivalence, an output of moral decision-making in respect to selected situations. moreover, this operationalisation of ‘applied mds’ indicates that mds play a role in the action process and intention formation, either before committing to a particular action, or afterwards to justify a previously made decision.

3.2 operationalisation of constructs

3.2.1 patterns of moral decision-making

to identify the situation-specific patterns of moral decision-making in terms of hm, um, hv, and uv, we used descriptions of two hypothetical situations of low moral intensity. in this study, moral intensity varied only regarding one criterion reported by jones (1991): the quality of the relationship between perpetrator and victim. other criteria that could potentially cause situations to be perceived as differing in moral intensity remained consistent across the situations included in the survey. in situation 1 (‘travel costs’: an employee has to decide how to act when confronted with the temptation to claim travel costs without having had expenses), we chose a legal person (an organisation) as the victim, and in situation 2 (‘change’: a person receives too much change and has to decide whether or not to give the money back), we chose a natural person as the victim (for further description of the two situations, see the appendix). hm, um, hv, and uv were coded based on participants’ decisions as a relevant part of intentions. participants answered the question ‘what would you do’ (give the money back as the moral strategy vs.
keep the money as the victimising strategy; self-judgment perspective). additionally, they rated their corresponding emotional state (‘how would you feel?’) by choosing one of the following options: very good, rather good, rather bad, or very bad. the ratings were re-coded as ‘happy’ (very good and rather good) and ‘unhappy’ (very bad and rather bad). therefore, the hm pattern was operationalised as keeping a moral rule and feeling (very or rather) good, and the um pattern as keeping a moral rule and feeling (very or rather) bad. furthermore, the hv pattern was defined as violating a moral rule and feeling (very or rather) good, and the uv pattern as violating a moral rule and feeling (very or rather) bad.

3.2.2 mechanisms of moral disengagement

in the context of moral decision-making, participants were asked to provide reasons for their decisions. on that basis, answers to open-ended questions were coded to organise the given reasons via a coding scheme for mechanisms of mds, as adopted from the work of bandura, barbaranelli, caprara, and pastorelli (1996). in line with the idea of mds, only the answers of participants who decided to violate the moral rule (hv or uv) were considered. in total, the coding scheme consisted of the eight mechanisms of mds and a category ‘others’, as described in table 1. the reported coding scheme was the basis for coding participants’ reasons for their decisions. the coding categories were defined a priori on theoretical grounds and coding rules were determined, thus constituting the theoretical basis of our analysis (creswell, 2014; schreier, 2012). two independent researchers performed the coding. both were trained using sample codes. within a training round, the coders coded the participants’ responses. when codes did not match, the respective sense units were discussed and assigned to a category by reaching a consensus. the category system was then further differentiated and validated.
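the operationalisation in section 3.2.1 and the selection rule in section 3.2.2 can be illustrated with a minimal sketch. it assumes one decision and one emotion rating per participant and situation; the function names and the string labels for decisions are hypothetical and do not come from the study's materials, whereas the four rating options and the four pattern codes are those described above.

```python
# Minimal sketch of the pattern coding described in 3.2.1 and the
# selection rule in 3.2.2. Function names and the decision labels
# ("moral", "victimising") are hypothetical illustrations.

HAPPY_RATINGS = {"very good", "rather good"}    # re-coded as 'happy'
UNHAPPY_RATINGS = {"rather bad", "very bad"}    # re-coded as 'unhappy'


def code_pattern(decision: str, emotion: str) -> str:
    """Return 'hm', 'um', 'hv', or 'uv' for one participant and situation."""
    if decision not in ("moral", "victimising"):
        raise ValueError(f"unknown decision: {decision!r}")
    if emotion in HAPPY_RATINGS:
        prefix = "h"
    elif emotion in UNHAPPY_RATINGS:
        prefix = "u"
    else:
        raise ValueError(f"unknown emotion rating: {emotion!r}")
    suffix = "m" if decision == "moral" else "v"
    return prefix + suffix


def eligible_for_mds_coding(pattern: str) -> bool:
    """Only reasons given by transgressors (hv or uv) are coded for mds."""
    return pattern in ("hv", "uv")


if __name__ == "__main__":
    pattern = code_pattern("victimising", "rather good")
    print(pattern, eligible_for_mds_coding(pattern))  # hv True
```

missing answers are deliberately rejected with an error here; how the study handled them beyond excluding incomplete cases from the cross-tabulation is not stated.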
thus, the coding procedure was guided by the standard procedure of qualitative content analysis as described by mayring (2015). to assess the inter-rater reliability of the coding, 64 reasons given by the participants (33 cases of situation 1 and 31 cases of situation 2), corresponding to almost 23% of the overall applicable 280 cases, were coded by the two independent coders. cohen’s kappa was 0.613 for the cases using situation 1, and 0.713 for those using situation 2. for all 64 double-coded cases, cohen’s kappa reached 0.7. therefore, the situation-specific coding, as well as the overall coding, showed satisfactory inter-rater reliability that could be classified as “substantial” (range from 0.61 to 0.8) according to the ranges of kappa given by landis and koch (1977). cases coded in category ‘9 others’ were discussed individually by the two coders within consensus validation and, if possible, were assigned to one of the categories 1 to 8.

table 1 coding scheme for mechanisms of mds

4. results

4.1 variation of patterns of moral decision-making across situations

first, analyses were conducted to answer research questions 1 and 2. the results (see table 2) indicated that all patterns of moral decision-making can be found in the sample. most participants chose hm (situation 1: 66.4%; situation 2: 73.6%). however, 10.7% in situation 2 (n = 61) and up to 19.4% (n = 110) in situation 1 showed the hv pattern. thus, victimising patterns can be found in this sample of adults.

table 2 intra-personal connections and variations of patterns of moral decision-making

note. pearson χ² = 152.092, df = 9, p < 0.001; 19 participants had a missing value for at least one of the two situations.

regarding research question 2, cramér’s v (0.299; p < 0.001; pearson χ² = 152.092, df = 9, p < 0.001) indicated a moderate-sized intrapersonal consistency of patterns of moral decision-making.
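as a plausibility check, the reported cramér's v can be recomputed from the reported chi-square value using the standard formula v = sqrt(χ² / (n · (min(r, c) − 1))). the sketch below rests on two assumptions that are inferred rather than stated in the text: the number of valid cases is taken as 587 − 19 = 568, and the cross-table is taken to be 4 × 4 (four patterns per situation), which matches df = 9. the function name is hypothetical.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V from a Pearson chi-square statistic of an r x c table."""
    k = min(n_rows, n_cols) - 1
    return math.sqrt(chi2 / (n * k))


# Reported value: Pearson chi-square = 152.092 with df = 9.
CHI2 = 152.092
# Assumption: 587 surveyed students minus 19 with a missing value.
N_VALID = 587 - 19
# Assumption: 4 x 4 table (hm, um, hv, uv in each situation), so df = 3 * 3 = 9.
v = cramers_v(CHI2, N_VALID, n_rows=4, n_cols=4)
print(round(v, 3))  # 0.299
```

under these assumptions the recomputed value agrees with the reported v of 0.299.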
however, there were participants who changed their pattern of moral decision-making across situations. for example, more than 25% of the students who showed hm in situation 2 chose victimising in situation 1.

4.2 mechanisms of mds as reasons for violating a moral rule

in line with the theoretical assumption of mds, only those participants who chose victimising strategies were included in the content analyses to answer research question 3. the findings indicated that mds were applied in the situations presented by those participants who chose victimising (situation 1: n = 159; situation 2: n = 111). descriptive frequency analyses showed that all categories of mds were used when both situations are taken together, though some mechanisms were not used in situation 2. moreover, the frequency of the different mds varied across situational stimuli (see figures 2 and 3).

figure 2. mechanisms of mds as reasons for violating a moral rule: situation 1 (travel costs)

figure 3. mechanisms of mds as reasons for violating a moral rule: situation 2 (change)

5. discussion and conclusions

5.1 main findings and limitations

the process model of acting offers a framework to elucidate the role of cognitive control strategies, particularly mds, in forming intentions towards following or breaking a moral rule in situations of low moral intensity. further, it can provide theoretical progress and allows for a better understanding of patterns of moral decision-making and emotion attributions, such as hv, uv, um, and hm (aims and valence), in morally relevant situations. thus, this approach allows for the integration of perspectives of other theoretical approaches to hv in adulthood, as presented by minnameier (this issue) and gutzwiller-helfenfinger and latzko (this issue). the action-based perspective presented in this paper considers cognitive and emotional as well as volitional processes.
additionally, the present study offers empirical results underlying the theoretical assumptions of the action-based approach to the hv pattern in situations of low moral intensity. the results related to research questions 1 and 2 merely support the findings of former studies regarding adults’ patterns of moral decision-making, as adults decided to victimise, and victimising emerged as a result of person-situation interactions. the patterns showed significant intrapersonal consistency; however, at the same time some participants showed variations in patterns across situations (heinrichs et al., 2015). moreover, the results of the present study enrich former empirical research on patterns of moral decision-making in adulthood, particularly by focusing on patterns of intentions in situations of low moral intensity. additionally, mds were assessed in the context of moral transgressions, not as a personal tendency. qualitative content analysis provided codes of mds within answers to open-ended questions, and showed that students who chose victimising applied mds in the two given situations of low moral intensity, to a relevant extent. frequency analyses of the different categories of mds indicated that mds use differed in quality and quantity across situations and between participants who attribute positive or negative emotions (between participants who show patterns of uv and hv; figures 2 and 3). in future research, it could be interesting to collect data from a bigger sample to go beyond descriptive methods of analyses and test whether these differences in mds use can be confirmed as significant effects triggered by situational conditions or as a personal tendency of mds use. however, the present study does not provide valid empirical evidence, but rather empirical insights regarding patterns of moral decision-making (um, uv, hv, hm) and mds use. 
thus, our empirical approach obviously has various limitations that are discussed comprehensively, to point out the potential of the presented approach to gain theoretical and empirical progress in future research (lakatos, 1978). 5.1.1 limitations and further perspectives regarding methods and data collection the present data were collected to examine to what extent moral decision-making, emotion attribution, and mds use emerged as intra-personally consistent or varying across situations. assuming that patterns of moral decision-making, as well as mds use, are the results of person-situation interactions, only two situational stimuli were used for comparisons between different situational conditions. however, no personal determinants or traits were included to control for personal conditions. thus, the results only allow for developing a hypothesis that patterns of moral decision-making might show up with intra-personal consistency, to a particular extent, while also indicating there is intra-personal variation across situations that has not yet been explained. regarding mds use, the results were mostly limited to a descriptive level. mds use was coded based on the participants’ answers to open-ended questions, in line with the theoretical assumption that they would only be used if a person had also chosen ‘victimising’. that led to a reduced sample of the coded mds: 159 participants for situation 1, 111 participants for situation 2, and only 32 participants who expressed mds in both situations. thus, this study does not provide reliable data on intrapersonal stability or variation of mds use. intra-personal consistent use vs. variation of mds use across situations of low moral intensity should be studied in future investigations. additionally, the present study is limited to situations of low moral intensity, predominantly characterised as situations of temptation. 
thus, the results do not provide information on how people react in situations of high moral intensity. furthermore, decisions were measured using the self-judgment perspective (‘what would you do?’ and ‘how would you feel?’; heinrichs et al., this issue) as indicators of participants’ intentions (aims, valence). however, in future research, it could be fruitful to use other methods to detect whether and to what extent participants experience ambivalence or internal conflict, and to what extent they feel committed to one method of action. moreover, participants were asked how they would act and feel in hypothetical situations presented as text-based stimuli. thus, the decisions they made in this study do not necessarily correspond to the behaviour they would display in real life. furthermore, there is a need to develop more realistic settings of data collection, allowing for a better understanding of decision-making, emotions, and actions. this might be possible, for example, in the field or with experimental studies. the results of the present investigation might also differ from those studies, if the participants were encouraged to reflect on their intentions as related to stimuli presented in an interview and embedded in interpersonal conversation, or to come up with their own narratives telling their experiences or motives in obeying or breaking moral rules in their everyday lives. moreover, from the very beginning of hv research, the measurement procedure has been criticised for not really identifying emotions, but emotion attributions or emotion justifications. it would be important to develop sophisticated and valid measures of moral emotions in the context of patterns of moral decision-making (see heinrichs et al., 2015; gutzwiller-helfenfinger & latzko, this issue). 
5.1.2 limitations and further perspectives in regard to displaying the process of acting

reflecting on the present study, the results call for further research to provide deeper insights into the process of intention formation and further sub-processes of moral actions. it would be interesting to know whether and to what extent individuals differ in subjectively constituted problems or in assessing the moral intensity of situations, depending on their individual moral principles or values, on self- and social sanctions, and on non-moral values and needs. mds use and hv or uv patterns may emerge, at least in different forms, depending on whether ambivalence or internal conflict was perceived, or whether the individual managed to cope with his or her negative emotions (bandura, 2016). to build commitment to one preferred way of acting (victimising or moral acting), the agent has to activate mechanisms of self-regulation based on social or self-sanctions (bandura, 2016). nevertheless, we must admit that the way of measuring intentions, emotions, or applied mds regarding the given situations is far from validly displaying the sequence of inner sub-processes. in future research, specific experimental studies may offer, for example, data on forming an intention, attributing emotions, or justifying decisions under contrasting conditions.

5.1.3 limitations and further perspectives from a developmental perspective

furthermore, this paper mainly focused on the process of acting rather than on individual development. the present study only captures one measurement point and includes students representing emerging adults. to develop implications for moral education, it would be relevant to at least study the development of relevant personal determinants, such as mds. research on mds has provided interesting results from a developmental perspective. osofsky, bandura, and zimbardo (2005) reported a low but significant correlation between mds and age.
older participants reported higher levels of mds. however, a correlation between age and mds only provides superficial indications and calls for a deeper understanding of underlying processes. in respect to this, bandura offered a more sophisticated assumption, that the development of mds is in line with a change of preference from social sanctions in childhood and earlier years to self-sanctions among adults (bandura, 2016). this idea is quite consistent with the basic assumption in the kohlbergian tradition of studying moral development. kohlberg claims social sanctions to be important at the preconventional level. at the conventional level, not one person but a social group or system may sanction deviant or immoral behaviour. at the postconventional level the agent is assumed to have developed into a person with autonomous judgement and a strong ‘moral self’, committed to a hierarchy of values, and able to reflect on moral problems from a perspective of legitimacy. thus, the relevance of social sanctions seems to decrease, whereas self-sanctions might increase on the way to higher moral stages. however, there is empirical evidence that such an individual-constructivist (vs. social-constructivist) idea of moral development towards an autonomous moral self ignores phenomena such as situational impact on moral judgements and patterns of hv (as well as hm, um, uv) during adolescence and adulthood (beck & parche-kawik, 2004; heinrichs et al., 2015; krebs & denton, 2005).

5.2 conclusions

the action-theoretical approach presented in this paper provides a perspective to explain hv as a pattern emerging within the action process, particularly regarding immoral behaviour among adults. people choose to jeopardise their own standards and seem to feel happy about a decision in favour of a moral transgression if they manage to find ways to cope with inner conflicts and get strongly committed towards their intention.
this idea is in line with bandura, who presented the concept of moral disengagement as an indicator of a lack of self-regulation in cases of immoral behaviour. to conclude, the three approaches (the action-theoretical model, the hv pattern in adulthood as a pattern emerging during the process of acting, and the concept of moral disengagement) address the same basic question: how can people act in ways that contradict their moral principles, at least in some situations, without feeling distress or having a bad conscience? this study indicates that we have to acknowledge that breaking moral rules or standards is quite common, even among adults, at least as studied here in situations of lower moral intensity. on the other hand, findings have revealed that people do commit to fairness, sharing, and others’ well-being (fehr & fischbacher, 2004; fehr & schmidt, 1999, 2006). however, the results presented are far from being valid for drawing evidence-based conclusions concerning moral education. nevertheless, the approach reconstructs (happy or unhappy) victimising in an action-based perspective as a result of intention formation, and provides long-term perspectives for education that also differ from those discussed in this special issue on the hv pattern in adulthood and adolescence (gutzwiller-helfenfinger & latzko, this issue; minnameier, this issue). reducing mds use or fostering a reflective use of strategies like mds offers an approach that differs from common methods of moral education, such as supporting moral judgment competence with respect to cognitive development (minnameier, 2012), fostering moral expertise (narvaez & lapsley, 2005), or developing moral emotions (gutzwiller-helfenfinger & latzko, this issue). additionally, from an educational perspective, the results of the present study only pointed to the empirical ‘is’ in decision-making, intentions, emotion attributions, and moral disengagement in morally relevant situations.
there is an additional need to discuss aims, also considering norms, values, and legitimate curricula. it seems important to reflect on whether students should be encouraged to follow their moral ideals in certain situations, even if they must accept great (personal) disadvantages. perhaps it would be preferable to enable them to balance their own needs and those of others. minnameier would argue that sometimes, it is morally adequate to behave as a strategic moralist (minnameier, this issue; minnameier, heinrichs & kirschbaum, 2016). he states, for example, that to implement trustful cooperation in the long term, depending on the situational conditions, an individual may have to adapt his or her way of acting to the moral standards of partners or of the environment. sometimes it may be recommended to look for strategies (e.g., in cooperative games) that may lead to moral behaviours in the sense of an overarching moral aim, such as developing trustful cooperation and preventing others from pursuing only their own needs and getting rich at the expense of the individual in the long term. overall, there are considerably contrasting positions concerning the aims of moral education. it is a matter of controversy whether moral education should focus on struggling to promote autonomously judging moral agents and accept that some of them may end up as unhappy moralists (oser & reichenbach, 2005), or whether it would be better to foster individuals who are able to choose ways of acting that may lead to showing hv patterns when considering the situational conditions for implementing morality and acting as strategic moralists. however, there seems to be a common position underlying this discourse: ‘successfully engaging in meaningful, positive and caring relationships is both a prerequisite for and consequence of successful teaching and learning processes (malti, häcker, & nakamura, 2009).
moreover, meaningful relationships are especially important in a globalised society […]’ (gutzwiller-helfenfinger & heinrichs, this issue) to create a ‘moral atmosphere’ (kohlberg, 1984), supporting moral actions and the development of socio-moral competencies.

keypoints

adults tending towards moral transgression used various moral disengagement strategies (mds) across situations.
fostering a reflective use of mds might support formation of intentions with high valence at least in morally relevant situations of low moral intensity.

abbreviations

hm = happy moralizing pattern; hv = happy victimizing pattern; mds = moral disengagement strategies; um = unhappy moralizing pattern; uv = unhappy victimizing pattern

footnotes

1 the definition of problem used here mainly points to the individually constituted discrepancy between is and ought. in terms of problem-solving approaches (e.g. following dörner, 1979), it includes tasks and problems (for a more sophisticated discussion, see heinrichs, 2005).

references

bandura, a. (1990). mechanisms of moral disengagement in terrorism. in w. reich (ed.), origins of terrorism: psychologies, ideologies, theologies, states of mind (pp. 161–191). cambridge: woodrow wilson center press.
bandura, a. (2002). selective moral disengagement in the exercise of moral agency. journal of moral education, 31(2), 101–119. https://doi.org/10.1080/0305724022014322
bandura, a. (2016). moral disengagement: how people do harm and live with themselves. new york, ny: worth publishers.
bandura, a., barbaranelli, c., caprara, g. v., & pastorelli, c. (1996). mechanisms of moral disengagement in the exercise of moral agency. journal of personality and social psychology, 71(2), 364–374. https://doi.org/10.1037/0022-3514.71.2.364
bandura, a., caprara, g. v., barbaranelli, c., pastorelli, c., & regalia, c. (2001). sociocognitive self-regulatory mechanisms governing transgressive behaviour. journal of personality and social psychology, 80(1), 125–135.
https://doi.org/10.1037/0022-3514.80.1.125
beck, k., & parche-kawik, k. (2004). das mäntelchen im wind? zur domänenspezifität moralischen urteilens [the wave in the wind? towards domain specificity of moral judging]. zeitschrift für pädagogik, 50(2), 244–265.
boardley, i. d., & kavussanu, m. (2007). development and validation of the moral disengagement in sport scale. journal of sport and exercise psychology, 29(5), 608–628. https://doi.org/10.1123/jsep.29.5.608
creswell, j. w. (2014). research design: qualitative, quantitative, and mixed methods approaches (4th ed.). los angeles: sage.
detert, j. r., treviño, l. k., & sweitzer, v. l. (2008). moral disengagement in ethical decision-making: a study of antecedents and outcomes. journal of applied psychology, 93(2), 374–391. https://doi.org/10.1037/0021-9010.93.2.374
döring, b. (2013). the development of moral identity and moral motivation in childhood and adolescence. in k. heinrichs, f. oser, & t. lovat (eds.), handbook of moral motivation. theories, models, applications (pp. 289–305). rotterdam: sense publishers.
dörner, d. (1979). problemlösen als informationsverarbeitungsprozess [problem solving as information processing]. stuttgart: kohlhammer.
eisenberg, n., fabes, r. a., & spinrad, t. l. (2007). prosocial development. in n. eisenberg, w. damon, & r. m. lerner (eds.), handbook of child psychology: social, emotional, and personality development (pp. 646–718). john wiley & sons inc. https://doi.org/10.1002/9780470147658.chpsy0311
esser, h. (1996). die definition der situation [definition of the situation]. kölner zeitschrift für soziologie und sozialpsychologie, 48(1), 1–34.
fehr, e., & schmidt, k. m. (1999). a theory of fairness, competition, and cooperation. quarterly journal of economics, 114(3), 817–868. https://doi.org/10.1162/003355399556151
fehr, e., & schmidt, k. (2006). the economics of fairness, reciprocity and altruism: experimental evidence and new theories. in s.-c. kolm, & j. m.
ythier (eds.), handbook of the economics of giving, altruism and reciprocity (pp. 615–691). amsterdam: elsevier.
fehr, e., & fischbacher, u. (2004). third-party punishment and social norms. evolution and human behaviour, 25(2), 63–87. https://doi.org/10.1016/s1090-5138(04)00005-4
fida, r., paciello, m., tramontano, c., fontaine, r. g., barbaranelli, c., & farnese, m. l. (2015). an integrative approach to understanding counterproductive work behaviour: the roles of stressors, negative emotions, and moral disengagement. journal of business ethics, 130(1), 131–144. https://doi.org/10.1007/s10551-014-2209-5
gasser, l., gutzwiller-helfenfinger, e., latzko, b., & malti, t. (2013). moral emotion attributions and moral motivation. in k. heinrichs, f. oser, & t. lovat (eds.), handbook of moral motivation. theories, models and applications (pp. 304–320). rotterdam: sense publishers.
gollwitzer, p. m. (1996). rubikonmodell der handlungsphasen [rubicon model of action phases]. in j. kuhl, & h. heckhausen (eds.), motivation, volition und handlung (pp. 531–582). göttingen: hogrefe.
gutzwiller-helfenfinger, e., & heinrichs, k. (2020). the happy victimizer pattern in adulthood – state of the art and contrasting approaches: introduction to the special issue. frontline learning research, 8(5), 1–4. https://doi.org/10.14786/flr.v8i5.681
gutzwiller-helfenfinger, e., & latzko, b. (2020). happy victimizing in emerging adulthood: reconstruction of a developmental phenomenon? frontline learning research, 8(5), 47–69. https://doi.org/10.14786/flr.v8i5.382
haidt, j., & joseph, c. (2008). the moral mind: how five sets of innate intuitions guide the development of many culture-specific virtues, and perhaps even modules. in p. carruthers, s. laurence, & s. p. stich (eds.), the innate mind: foundations and the future, evolution and cognition (vol. 3; pp. 367–391). oxford: oxford university press.
heckhausen, h., gollwitzer, p. m., & weinert, f. e. (1987). jenseits des rubikon [beyond the rubicon].
berlin: springer.
heinrichs, k. (2005). urteilen und handeln – ein prozessmodell und seine moralpsychologische spezifizierung [a process model of judging and acting and its moral psychological specification] (vol. 12). frankfurt a. m.: peter-lang-verlag.
heinrichs, k. (2010). urteilen und handeln in der moralischen entwicklung [judging and action in a developmental perspective]. in b. latzko, & t. malti (eds.), moralentwicklung und moralerziehung in kindheit und adoleszenz (pp. 69–86). göttingen: hogrefe.
heinrichs, k. (2013). moral motivation in the light of action theory. in k. heinrichs, f. oser, & t. lovat (eds.), handbook of moral motivation. theories, models and applications (pp. 623–657). rotterdam: sense publishers.
heinrichs, k., gutzwiller-helfenfinger, e., latzko, b., minnameier, g., & döring, b. (2020). happy victimizing in adolescence and adulthood – empirical findings and further perspectives. frontline learning research, 8(5), 5–23. https://doi.org/10.14786/flr.v8i5.385
heinrichs, k., minnameier, g., latzko, b., & gutzwiller-helfenfinger, e. (2015). „don’t worry, be happy“? – das happy-victimizer-phänomen im berufs- und wirtschaftspädagogischen kontext [“don’t worry, be happy”? – the happy victimizer phenomenon in the context of vocational and business education]. zeitschrift für berufs- und wirtschaftspädagogik, 111(1), 32–55.
jones, t. m. (1991). ethical decision-making by individuals in organisations: an issue contingent model. academy of management review, 16(2), 366–395. https://doi.org/10.5465/amr.1991.4278958
keller, m., lourenço, o., malti, t., & saalbach, h. (2003). the multifaceted phenomenon of ‘happy victimizers’: a cross‐cultural comparison of moral emotions. british journal of developmental psychology, 21(1), 1–18. https://doi.org/10.1348/026151003321164582
kohlberg, l. (1984). the psychology of moral development: the nature and validity of moral stages (vol. ii). new york: harper & row.
krebs, d. l., & denton, k. (2005).
toward a more pragmatic approach to morality: a critical evaluation of kohlberg's model. psychological review, 112(3), 629–649. https://doi.org/10.1037/0033-295x.112.3.629
krettenauer, t., malti, t., & sokol, b. (2008). the development of moral emotions and the happy victimizer phenomenon: a critical review of theory and applications. european journal of developmental science, 2, 221–235. https://doi.org/10.3233/dev-2008-2303
lakatos, i. (1978). the methodology of scientific research programmes. cambridge, ma: cambridge university press.
landis, j. r., & koch, g. g. (1977). the measurement of observer agreement for categorical data. biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
lapsley, d. k., & narvaez, d. (2005). moral psychology at the crossroads. in d. k. lapsley, & c. power (eds.), character psychology and character education (pp. 18–35). notre dame: university of notre dame press.
latzko, b., & malti, t. (2010). children's moral emotions and moral cognition: towards an integrative perspective. new directions for child and adolescent development, 129, 1–10. https://doi.org/10.1002/cd.272
lind, g. (2019). moral ist lehrbar!: wie man moralisch-demokratische fähigkeiten fördern und damit gewalt, betrug und macht mindern kann [morality can be taught! how we can foster moral-democratic abilities and reduce violence, fraud and power]. berlin: logos.
malti, t., & krettenauer, t. (2013). the relation of moral emotion attributions to prosocial and antisocial behaviour: a meta‐analysis. child development, 84, 397–412. https://doi.org/10.1111/j.1467-8624.2012.01851.x
malti, t., häcker, t., & nakamura, y. (2009). sozial-emotionales lernen in der schule [socioemotional learning in schools]. zurich: pestalozzianum.
malti, t., averdijk, m., zuffianò, a., ribeaud, d., betts, l. r., rotenberg, k. j., & eisner, m. p. (2016). children’s trust and the development of prosocial behavior.
international journal of behavioral development, 40(3), 262–270. https://doi.org/10.1177/0165025415584628
mayring, p. (2015). qualitative inhaltsanalyse: grundlagen und techniken [qualitative content analysis: basics and techniques]. weinheim: beltz.
mcalister, a. l., bandura, a., & owen, s. v. (2006). mechanisms of moral disengagement in support of military force: the impact of sept. 11. journal of social and clinical psychology, 25(2), 141–165. https://doi.org/10.1521/jscp.2006.25.2.141
minnameier, g. (2020). explaining happy victimizing in adulthood – a cognitive and economic approach. frontline learning research, 8(5), 70–91. https://doi.org/10.14786/flr.v8i5.381
minnameier, g. (2012). a cognitive approach to the ‘happy victimizer’. journal of moral education, 41(4), 491–508. https://doi.org/10.1080/03057240.2012.700893
minnameier, g., heinrichs, k., & kirschbaum, f. (2016). sozialkompetenz als moralkompetenz – wirklichkeit und anspruch? [social competence as moral competence – reality and demand?]. zeitschrift für berufs- und wirtschaftspädagogik, 112(4), 636–666.
minnameier, g., & schmidt, s. (2013). situational moral adjustment and the happy victimizer. european journal of developmental psychology, 10(2), 253–268. https://doi.org/10.1080/17405629.2013.765797
moore, c., detert, j. r., treviño, l., baker, v. l., & mayer, d. m. (2012). why employees do bad things: moral disengagement and unethical organisational behaviour. personnel psychology, 65(1), 1–48. https://doi.org/10.1111/j.1744-6570.2011.01237.x
narvaez, d., & lapsley, d. k. (2005). the psychological foundations of everyday morality and moral expertise. in d. k. lapsley, & c. power (eds.), character psychology and character education (pp. 140–165). notre dame: university of notre dame press.
neuweg, g. h. (1999).
könnerschaft und implizites wissen: zur lehr-lerntheoretischen bedeutung der erkenntnis- und wissenstheorie michael polanyis [expertise and tacit knowledge – the relevance of michael polanyi’s theory of knowledge for teaching and learning]. münster: waxmann.
nunner-winkler, g. (2013). moral motivation and the happy victimizer phenomenon. in k. heinrichs, f. oser, & t. lovat (eds.), handbook of moral motivation. theories, models, applications (pp. 267–287). rotterdam: sense publishers.
nunner-winkler, g. (2007). development of moral motivation from childhood to early adulthood. journal of moral education, 36(4), 399–414. https://doi.org/10.1080/03057240701687970
nunner-winkler, g. (1993). die entwicklung moralischer motivation [the development of moral motivation]. in w. edelstein, g. nunner-winkler, & g. noam (eds.), moral und person (pp. 278–303). frankfurt am main: suhrkamp.
nunner-winkler, g., & sodian, b. (1988). children's understanding of moral emotions. child development, 59, 1323–1338. https://doi.org/10.2307/1130495
oser, f. k., & reichenbach, r. (2005). moral resilience – the unhappy moralist. in w. edelstein, & g. nunner-winkler (eds.), morality in context, advances in psychology (pp. 203–224). north holland: elsevier.
osofsky, m. j., bandura, a., & zimbardo, p. g. (2005). the role of moral disengagement in the execution process. law and human behaviour, 29(4), 371–393. https://doi.org/10.2307/1130495
rothmund, t., & baumert, a. (2014). shame on me: implicit assessment of negative moral self-evaluation in shame-proneness. social psychological and personality science, 5(2), 195–202. https://doi.org/10.1177/1948550613488950
schreier, m. (2012). qualitative content analysis in practice. los angeles: sage.
sokolowski, k. (1993). emotion und volition [emotion and volition]. göttingen: hogrefe.
sokolowski, k. (1996). wille und bewusstheit [will and consciousness]. in j. kuhl, & h.
heckhausen (eds.), enzyklopädie der psychologie: themenbereich c theorie und forschung, serie iv motivation und emotion, band 4 motivation, volition und handlung (pp. 485–530). göttingen: hogrefe.
thoma, s. j., & bebeau, m. j. (2013). moral motivation and the four component model. in k. heinrichs, f. oser, & t. lovat (eds.), handbook of moral motivation. theories, models and applications (pp. 49–67). rotterdam: sense publishers.
tirri, k. (1999). teachers' perceptions of moral dilemmas at school. journal of moral education, 28(1), 31–47. https://doi.org/10.1080/030572499103296
weinberger, a., & frewein, k. (2019). vake (values and knowledge education) als methode zur integration von werterziehung im fachunterricht in heterogenen klassen beruflicher schulen: förderung von kognitiven und affektiven zielen [vake (values and knowledge education): a measure for integrating values education in domain specific lessons in heterogeneous classes in vocational schools: fostering cognitive and affective goals]. in k. heinrichs & h. reinke (eds.), heterogenität in der beruflichen bildung. im spannungsfeld von erziehung, förderung und fachausbildung (pp. 181–194). reihe wirtschaft – beruf – ethik. bielefeld: wbv.

appendix

hypothetical situations for the assessment of moral decision-making

[introduction for the participants]

the following section shows various descriptions of situations. we would like to ask you to read through these situations carefully and to imagine yourself in the described situations. there are always exactly two alternative actions. please decide in favour of one of them by marking the appropriate alternative with a cross. afterwards, you will be asked to give reasons for your decision. please indicate the main reasons in any case. finally, you will be asked how you would have felt if you had acted as stated in each case.
situation 1—travel costs

please imagine the following situation: you are working for a large international company and attend a national meeting as part of your employment. by coincidence, a friend has to go to the same city as you do. he offers to give you a ride. you accept his offer willingly. during the return journey, you have a nice chat. nobody from your company knows or has noticed that you didn’t take your own car, and therefore have no expenses of your own. the travel expenses for this journey would be 50 euros. the form for the travel expenses report says that only real travel costs are refundable. the friend can settle up his travel expenses himself, so that you don’t have to refund him anything.

please put yourself into this situation and decide what you would do:
( ) you claim the travel expenses of 50 euros.
( ) you don’t claim the travel expenses of 50 euros.

please give reasons for your decision… [these statements from participants are the basis for coding the mechanisms of mds]

how would you feel if you had really acted like that? please mark only one possible answer.
( ) very good ( ) rather good ( ) rather bad ( ) very bad

situation 2—change

please imagine the following situation: due to the fact that you speak spanish very well, you go on holiday in spain. you have earned the money for your holiday by working in a factory. during a day trip by bus to a distant city, you buy a handmade wooden sculpture for your parents in a craft shop. it is 50 euros. you pay with a 200 euro bill, and leave the shop. as you count your change outside the shop, you notice that the seller has inadvertently given you back four 50 euro bills instead of 150 euros.

please put yourself in this situation and decide what you would do:
( ) you return 50 euros to the seller.
( ) you keep the 50 euros.
please give reasons for your decision… [these statements from participants are the basis for coding the mechanisms of mds]

how would you feel if you had really acted like that? please mark only one possible answer.
( ) very good ( ) rather good ( ) rather bad ( ) very bad

frontline learning research 6 (2014) 7-14 issn 2295-3159
corresponding author: bert de smedt, leopold vanderkelenstraat 32 box 3765, b-3000 leuven, belgium. e-mail: bert.desmedt@ppw.kuleuven.be
doi: http://dx.doi.org/10.14786/flr.v2i4.115

advances in the use of neuroscience methods in research on learning and instruction

bert de smedt a
a faculty of psychology and educational sciences, university of leuven, belgium

article received 27 may 2014 / revised 19 september 2014 / accepted 11 november 2014 / available 23 december 2014

abstract

cognitive neuroscience offers a series of tools and methodologies that allow researchers in the field of learning and instruction to complement and extend the knowledge they have accumulated through decades of behavioral research. the appropriateness of these methods depends on the research question at hand. cognitive neuroscience methods allow researchers to investigate specific cognitive processes in a very detailed way, a goal in some but not all fields of the learning sciences. this added value will be illustrated in three ways, with examples in the field of mathematics learning. firstly, cognitive neuroscience methods allow one to understand learning at the biological level. secondly, these methods can help to measure processes that are difficult to access by means of behavioral techniques. finally, and more indirectly, neuroimaging data can be used as an input for research on learning and instruction. this paper concludes by highlighting the challenges of applying neuroscience methods to research on learning and instruction.
frontline: cognitive neuroscience offers a series of tools and methodologies that allow researchers in the field of learning and instruction to complement and extend the knowledge they have accumulated through decades of behavioral research. the appropriateness of these methods depends on the research question at hand.

keywords: cognitive neuroscience; methods; mathematics learning; educational neuroscience

1. introduction

non-invasive brain imaging methods, such as event-related potentials (erp) or functional magnetic resonance imaging (fmri), represent a series of tools and methodologies that allow researchers in the field of learning and instruction to complement and extend the knowledge they have already accumulated through decades of behavioral research (e.g., cacioppo, berntson, & nusbaum, 2008; de smedt, ansari, et al., 2011). the potential application of these methods depends on the research question at hand. after a brief discussion of relevant brain imaging methods, i will use the field of mathematics learning to illustrate three ways in which these methods can be used in research on learning and instruction. this paper concludes by highlighting some challenges of applying such methods to research on learning and instruction.

2. neuroscience methods

when considering different methods that are used by neuroscientists to study the structure and function of the brain, it is important to point out that neuroscience is a very broad field that includes a variety of disciplines ranging from cellular and molecular neuroscience to cognitive neuroscience (e.g., squire et al., 2013). i restrict the focus here to cognitive neuroscience and its methods (ward, 2006), because this sub-field of neuroscience is the closest to research on learning and instruction, given its focus on the neural mechanisms that underlie human cognition and behavior.
a detailed description of these cognitive neuroscience methods is beyond the scope of this contribution and excellent introductions are provided by ward (2006) and dick, lloyd-fox, blasi, elwell, & mills (2014). sometimes, psychophysiological measures, such as skin conductance, heart rate or eye-movement data, are also denoted as neuroscience methods. although these methods tap into the nervous system, they are not direct measures of brain structure or function and therefore they are not considered here.

the transmission of information in the brain from one cell to the other occurs through electrical signals, and this electrical activity of the brain can be captured by methods such as electroencephalography (eeg), which requires a cap of electrodes to be mounted on the head of a participant, and magnetoencephalography (meg) (ward, 2006). on one hand, the advantage of these methods is that they can measure the activity of the brain in response to a particular stimulus (i.e. event-related activity) at a very accurate temporal scale and they are particularly suited to investigate when a process is taking place. on the other hand, a large number of stimuli of a particular type (typically a few dozen) are needed in order to reliably estimate the brain signal in response to that stimulus.

another series of methods are magnetic resonance imaging (mri) techniques, which use large magnetic fields and the magnetic properties of hydrogen atoms in brain tissue or in blood to visualize brain structure and brain function, respectively (ward, 2006). these data are acquired in a specific and very noisy environment, the mri scanner, in which participants have to lie still and are not allowed to move more than a few millimeters. this category of methods can investigate the structure of the brain, i.e. the gray or white matter, and how this structure is related to performance or changes as a result of learning. interesting examples are provided by supekar et al.
(2013), who showed that the size of the hippocampus predicted the performance gains in response to one-on-one math tutoring, and by keller and just (2009), who showed that intensive remedial reading instruction resulted in changes in white matter in poor readers. mri also allows us to investigate brain function, a technique that is called functional mri or fmri, which is one of the most common techniques used in cognitive neuroscience (ward, 2006). functional mri is an indirect way of assessing the brain’s activity and measures the level of oxygen in the blood. the assumption is that an increase in oxygen level is the result of the vascular system’s response to an increase in brain activity. mri methods are very accurate on a spatial scale and are particularly suited to investigate where in the brain a particular process is taking place. due to the practical constraints of the mri environment (e.g., noise, no movement), the type of tasks that participants can complete is limited, yet progress has been made in recent years in using more complex tasks, such as playing video games (anderson et al., 2011) and even face-to-face interaction (e.g., redcay et al., 2010).

it is crucial to point out that the measures reviewed above, i.e. signals indicating brain structure or function, can only be meaningfully interpreted by linking them to cognitive theories (e.g., cacioppo et al., 2008; de smedt, ansari, et al., 2011). furthermore, the collection of behavioral data represents a necessary step in most studies in cognitive neuroscience (e.g., ward, 2006). in all, a detailed cognitive theory of the phenomenon under investigation is crucial to design and interpret cognitive neuroscience data and to apply cognitive neuroscience methods to the field of learning and instruction.

3. application to research on learning and instruction

how can the cognitive neuroscience methods reviewed above advance the field of learning and instruction?
this depends on the research question at hand, and only some but certainly not all types of research questions in the field of learning and instruction might benefit from the use of cognitive neuroscience methods. stern and schneider (2010) provided a nice analogy for determining when these cognitive neuroscience tools and theories could be appropriate. they compared this issue with the use of a digital road map. when using a digital road map for looking at the field of learning and instruction, the appropriate resolution of the map depends on what the map viewer is looking for, alleys (micro-level) vs. highways (macro-level), and users can zoom in and out between different levels of resolution. some questions only focus on the broader context of learning (macro-level), as is the case in large-scale research on educational systems, and are at a low level of resolution. others aim to unravel the very specific cognitive processes that underlie learning, and this requires a map at very high resolution (micro-level). it is at this micro-level of understanding of such specific cognitive processes that cognitive neuroscience methods can be applied in the field of learning and instruction. i will use the field of mathematics learning to illustrate three ways in which cognitive neuroscience methods can be useful for research in learning and instruction (see also de smedt et al., 2010; de smedt, ansari, et al., 2011; de smedt & grabner, 2015).

3.1 understanding learning at the biological level

neuroimaging data allow us to examine at the biological level how people learn. such data can provide converging evidence for findings that have been obtained through psychological and educational research. this convergence of findings from different research methodologies has the potential to provide a better and more complete understanding of how typical and atypical learning takes place (e.g., de smedt, ansari, et al., 2011; lieberman, schreiber, & ochsner, 2003).
for example, how do people acquire and apply different strategies to solve elementary arithmetic problems, such as 5 + 9 or 4 × 3? decades of behavioral research have revealed that these problems are solved either by fact retrieval from declarative memory or by procedural strategies, such as counting, and developmental data indicate that children develop an increasing reliance on arithmetic fact retrieval, while the use of procedures to solve such elementary problems decreases over time (e.g., siegler, 1996). research in cognitive neuroscience is now beginning to understand how this learning of arithmetic is reflected at the neural level (e.g., arsalidou & taylor, 2011; zamarian, ischebeck, & delazer, 2009). in a series of studies, we have tried to investigate this issue with eeg (de smedt, grabner, & studer, 2009; grabner & de smedt, 2011; grabner & de smedt, 2012). in these studies, adults had to solve a series of addition, subtraction and multiplication problems while their brain activity was recorded with eeg, and they had to report verbally, on a trial-by-trial basis, the strategies they used to solve the presented problems. these studies had two aims. first, we wanted to verify whether these two types of strategies were reflected in different brain activity patterns and whether fact retrieval training resulted in changes in brain activity that reflected a shift in strategy use. second, we aimed to test whether cognitive neuroscience methods, such as eeg, can be used as a form of methodological triangulation to further validate the use of verbal report data. these data are typically used in behavioral research to investigate strategy use but their validity has been debated (e.g., kirk & ashcraft, 2001).
the eeg data revealed different patterns of activity for the two types of strategies: oscillations in the theta band (3–6 hz) were associated with fact retrieval whereas oscillations in the lower alpha band (8–10 hz) were related to procedural strategies (grabner & de smedt, 2011). when we trained participants in using fact retrieval strategies, we were able to show that the well-known behavioral shift from procedural strategies to fact retrieval as a function of training was also reflected in specific changes in brain activity, i.e. training-related activity increases in the theta band and decreases in the lower alpha band (grabner & de smedt, 2012). combining verbal strategy reports with reaction times on specific problem types and neuroimaging data allowed us to further examine the validity of these verbal reports. this type of methodological triangulation confirmed that verbal strategy reports are a valid way to capture strategies in mental arithmetic. in all, this convergence of findings obtained by different research methods at behavioral and biological levels provides a more solid empirical ground for our theories on strategy development.

3.2 measuring difficult-to-access processes

neuroimaging data can provide a level of analysis and measurement that cannot be accessed by behavioral data alone. examples of this application can be observed in the study of individual differences between learners and in understanding the origins of atypical development. de smedt, holloway, & ansari (2011) used fmri to investigate brain activity in 10-12-year-old children during addition and subtraction and compared children with low and average levels of arithmetical competence, who significantly differed in their performance on a standardized arithmetic fluency test. although the two groups of children did not differ in a simple calculation task at the behavioral level (i.e.
accuracy, speed) during the acquisition of the fmri data, the authors observed significant group differences in brain activity in the right intraparietal sulcus, a brain region that is known to play a key role in the processing of numerical magnitudes: children with low levels of arithmetical competence showed higher activation in this region during the solution of problems with a relatively small problem size. the interpretation of these data in the context of neurocognitive theories of numerical magnitude processing and arithmetic development (e.g., ansari, 2008; butterworth, varma, & laurillard, 2011) suggests the use of compensatory strategies and generates predictions that should be further explored in subsequent research. for example, it might be that the children with low arithmetical competence in the study of de smedt, holloway, et al. (2011) continued to rely to a greater extent on quantity-based strategies (such as counting or procedural calculation) on those problems that children with relatively higher arithmetical competence already retrieved from memory, a possibility that should be evaluated in subsequent research. in all, this indicates that brain imaging data can uncover subtle processing differences between groups of learners that may not be detected through the measurement of behavioral data alone, illustrating the high level of resolution that cognitive neuroscience methods are able to capture.

3.3 input for research on learning and instruction

studies in cognitive neuroscience can also have an indirect impact on research in learning and instruction, by drawing our attention to specific fine-grained cognitive processes that are implicated in different types of learning (see aue, lavelle, & cacioppo, 2009, for a similar rationale in the field of psychology). such data have the potential to generate new hypotheses that can be tested in research on learning and instruction.
for example, neuroimaging studies on how the brain processes numbers have revealed that the intraparietal sulci (ips) are consistently active whenever we have to perform numerical and arithmetical tasks and that this structure supports the processing of numerical magnitudes (e.g., ansari, 2008; dehaene, piazza, pinel, & cohen, 2003). brain imaging studies in children with developmental dyscalculia (dd), a learning disorder that is characterized by severe and persistent difficulties in acquiring mathematical competencies, point to structural and functional abnormalities in the ips in these children (e.g., butterworth et al., 2011; see price & ansari, 2013, for a review). this all suggests that the processing of numerical magnitudes is potentially a key to successful mathematical development and that this processing might be compromised in dd. this suggestion has fueled a large number of psychological and educational studies that have empirically confirmed this hypothesis at the behavioral level (see de smedt, noël, gilmore, & ansari, 2013, for a review), by consistently showing that individuals with dd have significant impairments in their ability to compare (symbolic) numbers. more broadly, these studies have also furthered our understanding of individual differences in typical mathematical development, as the ability to compare (symbolic) numbers is predictive of subsequent mathematical development (see de smedt et al., 2013, for a review). this research has influenced studies in the field of learning and instruction, through the development and evaluation of specific interventions (e.g., de smedt et al., 2013) and diagnostic instruments that can be used for the screening and early identification of at-risk children (nosworthy, bugden, archibald, evans, & ansari, 2013).
it is important to point out that even if these studies do not collect measures of brain activity or structure, they rely to some extent on insights gleaned from cognitive neuroscience studies. used in this way, cognitive neuroscience data might set the stage for new educational research and can, albeit indirectly, enhance our understanding of learning.

4. challenges

the application of cognitive neuroscience methods to research on learning and instruction also poses some challenges and caveats that one needs to be aware of and that are not specific to the domain of mathematics learning (see also ansari, de smedt, & grabner, 2012; de smedt & grabner, 2015). these challenges deal with the issue of external validity or generalizability as well as the scope of biological data and explanations (e.g., beck, 2010). it is important to point out that most of the existing studies in cognitive neuroscience involved adult participants and that these methods are not so easy to apply in children. this is because the acquisition of neuroimaging data is very sensitive to movement, and motion artefacts in children often have a negative impact on data acquisition. progress is being made in the reduction of such artefacts, for example by training children to keep still when such data are being collected (de bie et al., 2010). at a more theoretical level, cognitive neuroscience findings obtained in adult participants cannot be readily generalized to the developing brain and the learning of children and adolescents, as the human brain undergoes massive structural and functional changes throughout childhood and adolescence (ansari, 2010). the tasks used in most cognitive neuroscience studies are very elementary and differ from the rich and complex tasks that are typically solved in everyday learning environments and that are used in research on learning and instruction.
such complex tasks cannot be easily administered in cognitive neuroscience studies for various reasons. as indicated above, there are practical constraints related to the laboratory environment in which neuroimaging data are collected. in order to obtain reliable data on brain activity during a particular task, a large number of trials of the same task need to be presented. these tasks need to be very elementary, because the larger the number of cognitive processes in a particular task, the more difficult it will be to disentangle these cognitive processes physiologically. one way to resolve this is to correlate data acquired in very constrained laboratory settings with ecologically valid measures of learning (see price, mazzocco, & ansari, 2013, for an example). one important caveat deals with the scope of neuroscientific data and explanations. there might be an inappropriate belief that neuroscientific data are more convincing, informative and valid than behavioral data (beck, 2010). on the contrary, knowledge gained through cognitive neuroscience methods should be placed on the same level as data obtained by standard behavioral methods in learning and instruction. there should be no knowledge hierarchy, but an appreciation of multiple sources of data to better understand how learning takes place and how it can be fostered (de smedt, ansari, et al., 2011).

5. conclusion

the application of neuroscience methods to research on learning and instruction depends on the level of the research question. when the interest lies in very specific low-level processes, neuroimaging data have the potential to help us understand learning at the biological level, to measure processes that are difficult to access via behavioral data and to generate and test hypotheses for educational phenomena that can be subsequently investigated via behavioral research on learning and instruction.
keypoints

the application of neuroscience methods to research on learning and instruction depends on the level of the research question.
cognitive neuroscience methods allow researchers to investigate specific cognitive processes in a very detailed way, a goal in some but not all fields of the learning sciences.
neuroimaging data have the potential to help us understand learning at the biological level, to measure processes that are difficult to access via behavioral data and to generate hypotheses for subsequent research on learning and instruction.

acknowledgments

this work is partially supported by grant goa 2012/010.

references

anderson, j. r., bothell, d., fincham, j. m., anderson, a. r., poole, b., & qin, y. l. (2011). brain regions engaged by part- and whole-task performance in a video game: a model-based test of the decomposition hypothesis. journal of cognitive neuroscience, 23, 3983-3997. doi: 10.1162/jocn_a_00033
ansari, d. (2008). effects of development and enculturation on number representation in the brain. nature reviews neuroscience, 9, 278-291. doi: 10.1038/nrn2334
ansari, d. (2010). neurocognitive approaches to developmental disorders of numerical and mathematical cognition: the perils of neglecting the role of development. learning and individual differences, 20, 123-129. doi: 10.1016/j.lindif.2009.06.001
ansari, d., de smedt, b., & grabner, r. (2012). neuroeducation – a critical overview of an emerging field. neuroethics, 5, 105-117. doi: 10.1007/s12152-011-9119-3
arsalidou, m., & taylor, m. j. (2011). is 2 + 2 = 4? meta-analyses of brain areas needed for numbers and calculations. neuroimage, 54, 2382-2393. doi: 10.1016/j.neuroimage.2010.10.009
aue, t., lavelle, l. a., & cacioppo, j. t. (2009). great expectations: what can fmri research tell us about psychological phenomena? international journal of psychophysiology, 73, 10-16. doi: 10.1016/j.ijpsycho.2008.12.017
beck, d. m. (2010). the appeal of the brain in the popular press.
perspectives on psychological science, 5, 762-766. doi: 10.1177/1745691610388779
butterworth, b., varma, s., & laurillard, d. (2011). dyscalculia: from brain to education. science, 332, 1049-1053. doi: 10.1126/science.1201536
cacioppo, j. t., berntson, g. g., & nusbaum, h. c. (2008). neuroimaging as a new tool in the toolbox of psychological science. current directions in psychological science, 17, 62-67. doi: 10.1111/j.1467-8721.2008.00550.x
de bie, h. m. a., boersma, m., wattjes, m. p., adriaanse, s., vermeulen, r. j., oostrom, k. j., huisman, j., veltman, d. j., & delemarre-van de waal, h. a. (2010). preparing children with a mock scanner training protocol results in high quality structural and functional mri scans. european journal of pediatrics, 169, 1079-1085. doi: 10.1007/s00431-010-1181-z
de smedt, b., ansari, d., grabner, r. h., hannula, m. m., schneider, m., & verschaffel, l. (2010). cognitive neuroscience meets mathematics education. educational research review, 5, 97-105. doi: 10.1016/j.edurev.2009.11.001
de smedt, b., ansari, d., grabner, r. h., hannula-sormunen, m., schneider, m., & verschaffel, l. (2011). cognitive neuroscience meets mathematics education: it takes two to tango. educational research review, 6, 232-237. doi: 10.1016/j.edurev.2011.10.003
de smedt, b., & grabner, r. h. (2015). applications of neuroscience to mathematics education. in a. dowker & r. cohen-kadosh (eds.), the oxford handbook of mathematical cognition. oxford: oxford university press. doi: 10.1093/oxfordhb/9780199642342.013.48
de smedt, b., grabner, r. h., & studer, b. (2009). oscillatory eeg correlates of arithmetic strategy use in addition and subtraction. experimental brain research, 195, 635-642. doi: 10.1007/s00221-009-1839-9
de smedt, b., holloway, i. d., & ansari, d. (2011). effects of problem size and arithmetic operation on brain activation during calculation in children with varying levels of arithmetical fluency. neuroimage, 57, 771-781.
doi: 10.1016/j.neuroimage.2010.12.037
de smedt, b., noël, m. p., gilmore, c., & ansari, d. (2013). the relationship between symbolic and non-symbolic numerical magnitude processing skills and the typical and atypical development of mathematics: a review of evidence from brain and behavior. trends in neuroscience and education, 2, 48-55. doi: 10.1016/j.tine.2013.06.001
dehaene, s., piazza, m., pinel, p., & cohen, l. (2003). three parietal circuits for number processing. cognitive neuropsychology, 20, 487-506. doi: 10.1080/02643290244000239
dick, f., lloyd-fox, s., blasi, a., elwell, c., & mills, d. (2014). neuroimaging methods. in d. mareschal, b. butterworth, & a. tolmie (eds.), educational neuroscience (pp. 13-45). malden, ma: wiley-blackwell.
grabner, r. h., & de smedt, b. (2011). neurophysiological evidence for the validity of verbal strategy reports in mental arithmetic. biological psychology, 87, 128-136. doi: 10.1016/j.biopsycho.2011.02.019
grabner, r. h., & de smedt, b. (2012). oscillatory eeg correlates of arithmetic strategies: a training study. frontiers in psychology, 3(428), 1-11. doi: 10.3389/fpsyg.2012.00428
keller, t. a., & just, m. a. (2009). altering cortical connectivity: remediation-induced changes in the white matter of poor readers. neuron, 64, 624-631. doi: 10.1016/j.neuron.2009.10.018
kirk, e. p., & ashcraft, m. h. (2001). telling stories: the perils and promise of using verbal reports to study math strategies. journal of experimental psychology-learning memory and cognition, 27, 157-175.
lieberman, m. d., schreiber, d., & ochsner, k. n. (2003). is political cognition like riding a bicycle? how cognitive neuroscience can inform research on political thinking. political psychology, 24, 681-704. doi: 10.1046/j.1467-9221.2003.00347.x
nosworthy, n., bugden, s., archibald, l., evans, b., & ansari, d. (2013).
a two-minute paper-and-pencil test of symbolic and nonsymbolic numerical magnitude processing explains variability in primary school children's arithmetic competence. plos one, 8, e67918. doi: 10.1371/journal.pone.0067918
price, g. r., & ansari, d. (2013). dyscalculia. in o. dulac & m. lassonde (eds.), handbook of clinical neurology (pp. 241-244). london: elsevier.
price, g. r., mazzocco, m. m. m., & ansari, d. (2013). why mental arithmetic counts: brain activation during single digit arithmetic predicts high school math scores. journal of neuroscience, 33, 156-163. doi: 10.1523/jneurosci.2936-12.2013
redcay, e., dodell-feder, d., pearrow, m. j., mavros, p. l., kleiner, m., gabrieli, j. d. e., & saxe, r. (2010). live face-to-face interaction during fmri: a new tool for social cognitive neuroscience. neuroimage, 50, 1639-1647. doi: 10.1016/j.neuroimage.2010.01.052
siegler, r. s. (1996). emerging minds: the process of change in children's thinking. new york, ny: oxford university press.
squire, l. r., berg, d., bloom, f. e., du lac, s., ghosh, a., & spitzer, n. c. (2013). fundamental neuroscience (4th ed.). oxford, uk: academic press.
stern, e., & schneider, m. (2010). a digital road map analogy of the relationship between neuroscience and educational research. zdm – the international journal on mathematics education, 42, 511-514. doi: 10.1007/s11858-010-0278-1
supekar, k., swigart, a. j., tenison, c., jolles, d. d., rosenberg-lee, m., fuchs, l., & menon, v. (2013). neural predictors of individual differences in response to math tutoring in primary-grade school children. proceedings of the national academy of sciences, 110, 8230-8235. doi: 10.1073/pnas.1222154110
ward, j. (2006). the student's guide to cognitive neuroscience. new york, ny: psychology press.
zamarian, l., ischebeck, a., & delazer, m. (2009). neuroscience of learning arithmetic: evidence from brain imaging studies. neuroscience and biobehavioral reviews, 33, 909-925.
doi: 10.1016/j.neubiorev.2009.03.005

frontline learning research vol. 6 no. 2 (2018) 39-65 issn 2295-3159

teaching the problem-solving process in a progressive or a simultaneous way: a question of making sense?

vanessa hanin a, catherine van nieuwenhoven a
a université catholique de louvain, belgium

article received 27 september 2017 / revised 12 december / accepted 10 august / available online 30 august

abstract

over the past two decades, the perennial low success rates of elementary students in math problem solving and the difficulties experienced by teachers in helping their students with this type of task have become a prominent topic. in response, several instructional interventions aiming to develop an expert and reflexive approach to problem solving have been designed. however, these interventions are based on two contrasting teaching approaches, either teaching the components of the problem-solving process at the same time or teaching them one at a time. a meticulous analysis of the literature indicates that studies that have compared these two teaching approaches have focused primarily on undergraduate students. moreover, the approaches have mainly been assessed in terms of cognitive outcomes. yet, recent studies stress the importance of analyzing the cognitive, motivational and emotional processes involved in problem-solving learning together in order to gain a full understanding of the process. addressing these limitations is essential to enhance our understanding of problem-solving learning and to design more effective interventions. this paper focuses on this issue by investigating whether teaching the problem-solving process in all its complexity or one component at a time is preferable in terms of cognitive, motivational and emotional outcomes. this issue is handled for both novice and expert solvers. data were gathered among 267 upper elementary students.
findings showed that both teaching approaches support the short- and long-term acquisition of cognitive problem-solving strategies, regardless of the student’s profile. however, beneficial emotional and motivational outcomes occur only when the problem-solving process is taught in all its complexity, that is, when it makes sense for the learner. novice solvers made less use of the help-seeking strategy and persisted more.

keywords: mathematics problem-solving; emotion regulation strategy; heuristic strategy; novice and expert; teaching practices

corresponding author: vanessa.hanin@uclouvain.be doi: 10.14786/flr.v6i2.333

1. introduction

in a societal context that lays increasing emphasis on the need for analytical and complex task-resolution skills (depaepe, de corte, & verschaffel, 2010; ntcm, 2010), it is important to question their development and acquisition in the academic context. in terms of mathematics education, curriculum designers have stressed that students develop meaningful mathematical skills, motivate themselves and learn how to deal appropriately with situations encountered in their everyday life through problem-solving tasks (ntcm, 2010). however, both research studies (demonty, blondin, matoul, baye, & lafontaine, 2013; demonty & fagnant, 2014) and international tests (oecd, 2016) have brought to light students’ perennial low success rates in math problem solving. the alarming international test scores stimulated researchers to design educational interventions that aim to develop an expert and reflexive approach to problem solving. most educational research (blum, 2011; de corte, 2012; fagnant & jaegers, 2018; tzohar-rozen & kramarski, 2014) agrees that the development of an expert and reflexive approach to problem solving occurs through the mastery of specific heuristic strategies embedded in an overall metacognitive approach.
yet, different assumptions concerning the importance of facilitating realistic meaningful experiences, as opposed to facilitating cognitive processing, lead to different pedagogical approaches. on the one hand, research on the development of an expert and reflexive approach to mathematical problem solving conducted in the field of mathematics instruction stresses the importance of giving all students, whether they are experts or novices, realistic, meaningful, challenging and complex problem-solving tasks, supporting a “simultaneous” teaching approach (blum, 2011; depaepe et al., 2010; van dooren, verschaffel, greer, de bock, & crahay, 2010). on the other hand, studies conducted in the field of cognitive psychology and, more specifically, anchored within the framework of cognitive load theory, have shown that, for novice learners, it is preferable to teach one element at a time to avoid overwhelming their working memory (blayney, kalyuga, & sweller, 2010; pollock, chandler, & sweller, 2002). thus, if research in mathematics instruction supports approaching the problem-solving process in its full complexity, the work carried out in the field of cognitive psychology supports the opposite approach. in addition, there is a lack of understanding of the role of affective components and motivational aspects in the problem-solving process, as both of the traditions mentioned above focus (more or less widely) on cognitive aspects. such understanding therefore appears crucial for the integration of the two approaches. and this is especially the case given that recent studies have pointed out the necessity of considering cognitive, emotional and motivational dimensions when dealing with teaching and learning issues (ahmed, van der werf, kuyper, & minnaert, 2013; pekrun, 2014; tzohar-rozen & kramarski, 2014). 
the present study aims to extend previous research by overcoming these limitations and enhancing our understanding of the effects on cognitive, motivational and emotional dimensions of teaching the problem-solving process to upper elementary students. more precisely, this paper aims to test the effects of these two teaching approaches (“simultaneous” versus “gradual-all together”) on the frequency of use of heuristic strategies, the valence of the emotions felt, the kind of emotion regulation strategies used and the level of persistence. in addition, given that novice and expert solvers may differ regarding their learning process, a comparison between the two teaching approaches was made for these two learner profiles (muir, beswick, & williamson, 2008; zimmerman & campillo, 2003).

2. theoretical foundations and relevant empirical work

first, we examine the problem-solving process. then, the relation of cognitive load theory to the two teaching approaches studied in the present paper is described. subsequently, the role played by emotions in problem-solving tasks and the necessity of developing functional emotion regulation strategies are examined. this section closes with discussion of one important motivational dimension for problem-solving learning, that is, persistence.

2.1 the problem-solving process

nowadays, scholars acknowledge that the development of expertise in mathematical problem solving requires the reconceptualization of mathematical problems as exercises in mathematical modeling; that is, considering the statement of a problem as the description of a situation in everyday life that can be modeled mathematically (blum & niss, 1991; de corte, verschaffel, & masui, 2004; fagnant, demonty, & lejonc, 2003). mathematical modeling is viewed as a complex and cyclic process involving a number of phases.
in this regard, based on a thorough analysis of the existing literature in the problem-solving field (blum, 2011; fagnant & demonty, 2005; fuchs, fuchs, prentice, burch, hamlett, et al., 2003; lucangeli, tressoldi, & cendron, 1998), hanin and van nieuwenhoven (2016) identified eight heuristic strategies of particular importance in solving non-routine problems that delineate the problem-solving process (figure 1):
(1) building a representation of the problem, that is, specifying the relevant contextual and numerical information, the unknown (what we are looking for), and the relationships between these elements via a drawing, a table or a reformulation (situation model);
(2) estimating the answer a priori, that is, approximating the solution of the problem by identifying the kind of response that is needed (a measure, a price, etc.), by rounding the numbers, by imagining what the solution is not or by giving an order of magnitude;
(3) using one’s knowledge, namely, identifying the mathematical structure of the problem, asking whether one has already solved a similar problem and how one did it (identification of the mathematical model);
(4) planning, that is, breaking down the problem into steps;
(5) executing the necessary calculations, that is, translating each step of the solution plan into an appropriate mathematical operation and executing it to arrive at a final mathematical result;
(6) verifying the relevance of the operations chosen, ensuring compliance with instructions and checking the accuracy of the calculations;
(7) interpreting the outcome, that is, making sure that the solution makes sense with regard to the problem statement (plausibility of the solution) and that the solution is congruent with the a priori estimate;
(8) communicating the solution, once its interpretation is satisfactory.
the two teaching approaches compared in the present study are based on this problem-solving process.
it should also be noted that the linear arrangement only provides a timeline to be used in the teaching of the solving process; the problem-solving process itself must be seen as cyclical and as consisting of a back-and-forth between the different heuristic strategies.

figure 1. diagram of the problem-solving process

2.2 the contribution of cognitive load theory

cognitive load theory views the resolution of complex, non-routine mathematical problems as high in element interactivity, where the elements are the heuristic strategies (pollock et al., 2002; van merriënboer & sweller, 2005). in such tasks students must process the different heuristic strategies in working memory simultaneously for learning to occur, since these heuristic strategies are all tightly linked. as many elements must be processed in working memory simultaneously, there is high intrinsic cognitive load. yet we know that working memory is limited. usually the individual’s cognitive architecture constructs schemas in order to handle the problem of working memory overload. schemas organize a large number of elements and take their interactivity into account, while acting as a single element, that is, without overloading working memory. however, these schemas result from a preliminary construction, and therein lies the problem. the construction of such schemas is the result of the simultaneous processing of all of the elements and therefore includes the elements’ interactivity. however, due to working memory limitations, novice learners can only process a few of the elements in working memory simultaneously, which prevents the construction of schemas, that is, learning, from taking place. thus, as processing all of the elements simultaneously in working memory is not effective (in the case of elements that are high in interactivity), there remains the option of processing the elements one by one in working memory, thereby eliminating the interactions among them, but at the cost of a reduced understanding.
on this point, empirical evidence has shown that it is more beneficial for non-expert learners to begin with a “one-by-one element” approach and, once each element has been examined in isolation, to deal with the full set of elements in an interactive way (what we call the “gradual-all together” approach) rather than to be presented, from the outset, with the material in its full complexity (what we call the “simultaneous” approach) (pollock et al., 2002; van merriënboer & sweller, 2005). regarding expert learners, empirical studies have highlighted that they perform equally well regardless of the teaching approach used, because they already possess relevant problem-solving schemas (pollock et al., 2002).

2.3 the contribution of emotion theory

complex math tasks are known to generate negative emotions (hanin & van nieuwenhoven, 2018; op’t eynde, de corte, & mercken, 2004). however, tasks lacking sufficient challenge also generate negative emotions, mostly boredom (pekrun, goetz, daniels, stupnisky, & perry, 2010). thus, depending on the teaching approach implemented, both “novice” and “expert” problem-solvers might be affected by negative emotions. negative emotions are known to be detrimental to learning. more precisely, studies have reported that negative emotions foster the use of rigid, detail-oriented, and analytical approaches, divert a part of the available cognitive resources from the task, and promote external regulation (pekrun, 2014). not only do students feel negative emotions when dealing with complex math problems but, in addition, they do not regulate these emotions (de corte, depaepe, op’t eynde, & verschaffel, 2011). on this point, hanin et al. (2017) highlighted six strategies used by upper elementary school children to regulate their negative emotions when solving math problems.
“negative self-talk” involves focusing on the negative aspects of the situation, by dramatizing them, by constantly thinking them over or by convincing oneself that they are beyond one’s control. “dysfunctional avoidance” involves task avoidance, despite the fact that task completion is beneficial in the long run. “emotion expression” refers to the social sharing of one’s emotions. “task utility self-persuasion” involves convincing oneself of the personal utility of the task despite the fact that the task generates unpleasant emotions. “help seeking” concerns seeking peer and teacher assistance. finally, “brief attentional relaxation” involves releasing attention for a few seconds through distraction or relaxation. this strategy covers two sub-strategies, namely, a positive form of distraction and physical relaxation. of these six, students considered the first three to be maladaptive and the other three to be adaptive (hanin, gregoire, mikolajczak, fantini-hauwel, & van nieuwenhoven, 2017). however, although scholars agree on the crucial role played by emotions in problem-solving learning and performance, to our knowledge, no study so far has examined the effect of cognitive training programs on emotions and emotion regulation strategies.

2.4 the contribution of motivation theory

alongside emotions, a motivational variable of particular interest when solving mathematical problems is persistence (montague & applegate, 2000), that is, “the behavioral strength that fosters, despite the impediments encountered, the continuation of the actions required by the engagement” (brault-labbé & dubé, 2008, p. 731, our translation). studies conducted by montague and applegate (2000) suggested that task difficulty has a direct influence on middle school students’ persistence. they postulated that “some students "shut down" cognitively when they perceive problems as difficult or when information-processing demands seem excessive” (p. 225).
although little research has examined this concept in the context of compulsory schooling, on the basis of the above findings, we assume that learning all the heuristic strategies within the same problem-solving task, that is, as interactive, will raise more difficulties than learning them one at a time. consequently, the “simultaneous” approach might undermine students’ persistence, whereas the “gradual-all together” approach may have no effect or even support students’ persistence. nevertheless, no study has examined the effect on students’ persistence of specific cognitive teaching approaches to problem solving.

3. aims and hypotheses

this study investigates whether taking into account motivational and emotional dimensions leads to the same results as those observed in studies comparing the same two teaching methods but from a purely cognitive perspective. in other words, this study seeks to clarify the following question: “are the principles of cognitive load theory still valid when motivational and emotional dimensions come into play?”. this general issue is dealt with through two research aims. the first aim seeks to clarify whether it is better for learning to handle the problem-solving process in its full complexity (the “simultaneous” approach) or to teach one heuristic strategy at a time (the “gradual-all together” approach). studies undertaken within the framework of cognitive load theory have stressed that the “simultaneous” approach consumes much more of the student’s cognitive resources than the “gradual-all together” approach. moreover, studies have also shown that complex tasks give rise to negative emotions.
therefore, we expected that the “gradual-all together” group would display a better understanding of the heuristic strategies taught, persist longer in the face of difficulties, feel fewer negative emotions, use appropriate emotion regulation strategies and perform better in problem solving than the “simultaneous” group and the control group (traditional approach). the second aim examines whether it is appropriate to adopt a different teaching approach according to the learner’s level of expertise in problem solving. consistent with cognitive load theory, we hypothesized that the “gradual-all together” approach would lead to better learning outcomes for “novice” problem-solvers as compared to the “simultaneous” approach and the traditional approach. more precisely, we expected “novices” in the “gradual-all together” group to benefit more from the training program than their peers in the “simultaneous” and control groups. with regard to “expert” problem-solvers, as tasks lacking sufficient challenge are a source of negative emotions (pekrun, 2014; pekrun et al., 2010), we supposed that the “simultaneous” approach would suit them better. these hypotheses were tested through the implementation of a cognitive training program in mathematical problem-solving (section 4.2). as mentioned in the introduction, this comparison fits into a larger project. by examining the role of the motivational and emotional aspects of the problem-solving process, this paper also seeks to integrate otherwise disconnected lines of inquiry and, thereby, to contribute to advancing existing knowledge in the learning research field. it seems crucial to better understand the processes at work during mathematical problem-solving tasks in order to provide practitioners with the most adaptive solutions.

4. method

4.1 participants

a sample of 267 upper elementary students took part in the first part of the present study.
they came from seven french-speaking belgian schools located in different cities and had different socio-economic backgrounds and levels of performance. the “gradual-all together” group was made up of 86 students (m age = 10.8; sd = 0.89), of whom 48.8% were girls; the “simultaneous” group consisted of 79 students (m age = 10.5; sd = 0.66), of whom 53.2% were girls. the control group was composed of 102 students (m age = 10.7; sd = 0.87), of whom 45.1% were girls. for the second part of the present study, only students with a “novice” or “expert” problem-solver profile, defined on the basis of their problem-solving performance at time 1 (pretest), were sampled. students with an average score below 0.5 out of 1 were called “novice” problem-solvers whereas those with an average score above or equal to 0.8 out of 1 were labelled “expert” problem-solvers. this selection was consistent with the teachers’ assessments. in each group, 43 novice problem-solvers and 12 expert problem-solvers participated in this second study.

4.2 training program

the two teaching approaches examined in the present study are based on a similar training program aiming at developing among students an expert and reflexive approach to problem solving (description available in appendix a). as described above, the two teaching approaches tested differed in how the heuristic strategies were taught. the “gradual-all together” approach consisted of teaching one heuristic at a time. in each problem, one or two heuristics were addressed. it was only when all the heuristic strategies had been examined individually that the learner had to implement, within the same problem, the full set of heuristics, that is, to address them as interactive. conversely, in the “simultaneous” approach, the heuristic strategies were all taught in the first problem while solving it.
then, for each subsequent problem, the student had to apply the full set of strategies in a flexible way. during the same period, with a similar frequency, the students in the control group worked on the same problem statements. however, unlike the teachers of the “gradual-all together” and “simultaneous” groups, the teachers of the control group received no methodological instructions. the problems were therefore handled according to the teacher’s usual instructional practices for problem solving. the intervention extended over five weeks at the rate of two non-routine problems a week plus three weeks of pretest, post-test, and follow-up assessment (see table 1). in addition, note that, for the sake of ecological validity, the training programs were implemented by the regular classroom teachers.

table 1. illustration of the research design. note. w1 = week 1; w2 = week 2, etc.

as outlined by durlak and dupre (2008), one cannot interpret the results of the implementation of a training program without first ensuring that it has been delivered as planned. therefore, the fidelity of implementation of the training program was evaluated. first, a checklist that contained step-by-step instructions was provided for each lesson. as the teacher completed a step, he or she had to check it off. examination of the checklists showed that teachers completed 95% of the steps as prescribed. second, teachers were asked to keep a diary with their feelings, students’ responsiveness, potential amendments, and any other piece of information considered to be relevant. on this point, while the teachers reported being convinced of the relevance of the training program, they also pointed out that it was too intense. this observation came in particular from the teachers of the “simultaneous” group, who reported: “i think that what we did has had an impact, i mean i saw a change, changes in behavior, such as greater ease getting to work, greater autonomy, etc.
but, we were very restricted in terms of time and thus i think that it did not leave them the time to digest the information because it was too fast, i think that it was the main problem. this must be done in the longer term”. third, the teachers received a half-day’s training during which the components of the training program were outlined. a manual containing the description of the training program components and the lesson plans as well as detailed examples of anticipated correct representations, procedures and solutions was given to each teacher. finally, meetings were organized to exchange positive experiences and difficulties encountered as well as to take stock of the past lessons and to plan those to follow.

4.3 measures

first of all, it is important to point out that the three groups completed all of the measures presented below on three occasions: prior to the intervention, immediately after it, and then six weeks later.

problem-solving performance was assessed by means of a performance test made up of three non-routine problems. this test was designed on the basis of our expertise in mathematics teaching and on fagnant and demonty’s (2005) textbook. the students’ performance was appraised by a global score obtained by averaging their scores for the three items on a binary scale (0 = wrong answer, 1 = right answer). the test took on average 30 minutes to complete. it is noteworthy that for each measurement point, the same three mathematical structures were used to design the problem statements, and only the presentation of the problem was modified. in addition, students were asked to indicate all their reasoning and calculations on their sheet, not to erase anything, but to cross out if necessary.

heuristic strategies were measured on the basis of students’ written products.
more precisely, we scrutinized their pre-test, post-test, and follow-up tests for traces of the application of five of the eight heuristic strategies taught, namely, building a representation, using one’s knowledge, planning, checking the outcome and the procedure, and interpreting the outcome. a score for each heuristic strategy was computed according to a binary scale (0 = missing; 1 = present). note that the presence of each heuristic strategy was measured and not the accuracy. persistence was appraised by means of an adapted and translated version of the “attitude toward mathematics survey” scale (fredricks & mccolskey, 2012), one of the few existing instruments offering a persistence scale that is isolated from behavioral engagement. the adapted version measures persistence in mathematical problem solving through 8 items (e.g. “when i have trouble understanding a math problem, i re-examine it until i understand it”) rated on a 4-point likert scale (1 = (almost) never to 4 = (almost) always). the internal consistency of the global score was satisfactory to good (pretest: α = .65, posttest: α = .87, follow-up test: α = .81). the emotions experienced by the students while solving a mathematical problem were evaluated through a questionnaire presenting facial expressions. these latter included positive emotions (enjoyment, pride, relief), and negative emotions (boredom, fear, anger, hopelessness, shame, worry, frustration, and nervousness) most frequently experienced by elementary and secondary students when dealing with problem-solving tasks (op’t eynde et al., 2004; pekrun, goetz, & frenzel, 2005). students were asked to indicate to what extent they felt each emotion when solving a math problem using a 5-point likert scale (1 = never to 5 = always). 
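the internal-consistency coefficients reported for these scales are denoted α. assuming that they are cronbach’s alpha (the text gives only the symbol), the following python sketch shows how such a value is obtained from item-level responses. the function name and the toy responses are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(responses[0])                    # number of items in the scale
    item_columns = list(zip(*responses))     # transpose: one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers of five students to three items on a 4-point scale
toy = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 3, 4], [4, 4, 4]]
print(round(cronbach_alpha(toy), 2))
```

values close to 1 indicate that the items of a scale vary together across respondents, whereas values close to 0 indicate that they are largely unrelated.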
the internal consistency of the global score for positive emotions (pretest: α = .78, posttest: α = .74, follow-up test: α = .74) as well as that of the global score for negative emotions (pretest: α = .82, posttest: α = .82, follow-up test: α = .87) was good.

emotion regulation strategies were appraised using the children’s emotion regulation scale in mathematics (cers-m) designed and validated by hanin et al. (2017). this questionnaire has shown good psychometric properties with belgian upper elementary students. it consists of 14 items, rated on a 4-point likert scale (ranging from 1 = (almost) never to 4 = (almost) always) and targets six strategies used by 5th and 6th graders to regulate their emotions when solving math problems, namely, task utility self-persuasion (e.g. “even if i do not like solving math problems, i tell myself that it is important to do so in order to be able to understand them and thereby to succeed”), help-seeking (e.g. “i ask the teacher to help me to solve the problem”), brief attentional relaxation (e.g. “i put down my pencil for a few seconds and stretch my arms”), emotion expression (e.g. “i tell my neighbor that the problem makes me angry, sad, hopeless, or bored”), negative self-talk (e.g. “i tell myself that it is terrible not being able to solve the problem and that i am sure that it only happens to me”), and dysfunctional avoidance (e.g. “in order not to experience an unpleasant moment, i tell myself that i will solve the problem later”). the cers-m subscales showed acceptable internal consistency for the three measurement times in the present sample with scale reliabilities ranging from .64 to .84.

4.4 analysis

to achieve our two research goals, three procedures were carried out. first, mixed-model group (“gradual-all together” vs. “simultaneous” vs. control) x time (time 1 vs. time 2 vs.
time 3) repeated measures analyses of variance (anovas) were performed on each measure, with time as the within-subject factor and group as the between-subjects factor. this gave us a general overview of the variables for which the three groups were distinguishable. second, these first results were refined by examining, for all of the variables under consideration, short-term changes (between time 1 and time 2), long-term changes (between time 1 and time 3) and post-intervention changes (between time 2 and time 3). repeated measures anovas were therefore performed. third, for each of the variables presenting a significant short-term, long-term or post-intervention development, paired t-tests were performed in order to characterize the development of each group. we used the bonferroni correction for the t-tests in order to avoid a potential alpha error inflation (field, 2013). we report here the corrected p-values. in addition, in order to have a more accurate understanding of the effects, we reported the effect sizes. the latter were computed on the basis of cumming’s (2012) recommendations, that is, the calculation of cohen’s d to which we applied a correction to remove the effect size’s bias. cohen’s d was calculated using the standard formula.

5. results

5.1 question 1: in terms of learning, would it be better to teach non-routine problem solving according to a “simultaneous” approach or a “gradual-all together” approach?

in order to check for any baseline differences at pretest between the three groups, a univariate anova was performed for the variables under consideration. the three groups differed regarding only the emotion expression strategy (see appendix b).

5.1.1 overall effect of the problem-solving intervention

significant group x time interactions were found for persistence (f(2, 264) = 4.244, p = .002, partial η² = .06).
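the correction and effect-size steps described in section 4.4 can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the script used in the study: the bonferroni step assumes one family of paired t-tests per variable (for instance one test per group), the bias correction uses the common approximation 1 - 3/(4df - 1), and the standardizer (the standard deviation by which the mean difference is divided) is left to the caller because it is not specified here. the function names and numeric inputs are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def cohens_d_unbiased(mean_diff, standardizer, df):
    """Cohen's d with the approximate small-sample bias correction."""
    d = mean_diff / standardizer           # plain Cohen's d
    correction = 1 - 3 / (4 * df - 1)      # approximate correction factor
    return d * correction

# Hypothetical raw p-values for the three groups on one variable
raw_p = [0.001, 0.02, 0.5]
print(bonferroni(raw_p))

# Hypothetical mean change of 1.0 with a standardizer of 2.0 and df = 85
print(round(cohens_d_unbiased(mean_diff=1.0, standardizer=2.0, df=85), 3))
```

under this approximation the correction factor is about 0.99 for df = 85 and about 0.93 for df = 11, so the adjustment matters mainly for the small expert subgroups.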
with respect to heuristic strategies, findings revealed significant group x time interactions for three out of the five strategies measured, namely, building a representation of the problem (f(2, 264) = 25.778, p < .001, partial η² = .19), using one’s knowledge (f(2, 264) = 12.971, p < .001, partial η² = .10) and planning (f(2, 264) = 10.588, p < .001, partial η² = .09). regarding the emotion regulation strategies, two of them presented a significant group x time interaction, namely, help-seeking (f(2, 264) = 3.643, p = .006, partial η² = .04) and negative self-talk (f(2, 264) = 2.747, p = .028, partial η² = .03). however, it is noteworthy that while the effect sizes are moderate to large for the heuristic strategies, they are quite small for the emotion regulation strategies.

5.1.2 short- and long-term effects and change dynamics

with respect to short-term changes (between time 1 and time 2), the three groups differed for persistence; for three heuristic strategies, namely, building a representation of the problem, using one’s knowledge and planning; and for two emotion regulation strategies, help-seeking (significant trend) and negative self-talk (see appendix e). a finer analysis showed that, contrary to our expectations, only the “simultaneous” group stood out regarding the persistence variable (t(78) = -4.816, p < .001, d = .78) by displaying a significant increase. however, this finding must be put into perspective, given the relatively small effect size. with respect to heuristic strategies, the three groups presented a significant increase in the use of the strategy of building a representation of the problem. on this point, the “gradual-all together” group (t(85) = -10.184, p < .001, d = 1.57) and the “simultaneous” group (t(78) = -13.711, p < .001, d = 2.09) displayed larger effect sizes as compared to the control group (t(101) = -3.494, p = .003, d = .42).
with regard to the planning strategy, both the “gradual-all together” group (t(85) = -5.809, p < .001, d = .92) and the “simultaneous” group (t(78) = -6.538, p < .001, d = 1.02) displayed a significant improvement. the same results were observed regarding the strategy of “using one’s knowledge”, that is, there was a significant increase for both the “gradual-all together” group (t(85) = -5.706, p < .001, d = .99) and the “simultaneous” group (t(78) = -7.633, p < .001, d = 1.21). as far as emotion regulation strategies are concerned, it turned out that the “simultaneous” group resorted less frequently to the help-seeking strategy at time 2 than at time 1 (t(78) = 3.155, p = .006, d = .40). furthermore, while a significant group x time interaction was found regarding the negative self-talk strategy, fine-grained t-test analyses revealed that none of the three groups showed significant development.

with regard to long-term changes (between time 1 and time 3), four out of the six significant short-term differences found at time 2 compared with time 1 were also significant over the longer term, at time 3 (see appendix e). specifically, the same three heuristic strategies that showed short-term changes also presented significant increases at time 3. the same behavior was observed for the three strategies, that is, a significant increase for both the “gradual-all together” group (building a representation of the problem: t(85) = -7.671, p < .001, d = 1.08; using one’s knowledge: t(85) = -5.023, p < .001, d = .81; planning: t(85) = -4.815, p < .001, d = .77) and the “simultaneous” group (building a representation of the problem: t(78) = -12.972, p < .001, d = 1.88; using one’s knowledge: t(85) = -6.189, p < .001, d = .98; planning: t(85) = -5.251, p < .001, d = .79). finally, with regard to emotion regulation strategies, results indicated a significant decrease in the use of the help-seeking strategy by both the “gradual-all together” group (t(85) = 3.341, p = .003, d = .40) and the “simultaneous” group (t(78) = 5.440, p < .001, d = .69).
however, it is noteworthy that the effect sizes are rather small.

in order to capture solely what happened after the intervention and in doing so to distinguish clearly between short and long-term changes, the difference between time 2 and time 3 was investigated (see appendix e). on this point, it appeared that the “gradual-all together” group made substantially less use of the heuristic strategy of building a representation (t(85) = 3.428, p = .003, d = .38). nevertheless, the effect sizes are very tenuous. finally, a significant decrease in the use of the emotion expression strategy by the “gradual-all together” group must also be noted (t(85) = 2.621, p = .033, d = .25).

5.1.3 discussion

our first research question investigated whether it is better for learning to teach the problem-solving process in its full complexity, that is, to teach all of the problem-solving heuristic strategies within the same problem-solving task (the “simultaneous” approach), or to teach one heuristic strategy at a time and only after that to implement them simultaneously within the same task (the “gradual-all together” approach). findings suggest that solving two non-routine mathematical problems weekly without specific methodological instructions is already effective, as reflected in the control group’s increased use between time 1 and time 2 of the ‘building a representation of the problem’ heuristic strategy. however, this change did not persist after the end of the training program (no significant difference between time 1 and time 3). in this respect, hanin and van nieuwenhoven (2016) have shown that this heuristic is one of the few to be traditionally taught in math classrooms, which is confirmed by the descriptive statistics at time 1 (see appendix b). consequently, this strategy is not new for the students; they have already practiced it.
on this point, scholars have shown that repeated practice and confrontation with the learning material promote knowledge and skills assimilation and internalization (anderson, 1981; piaget, 1978). thus, when students become familiar with the representation heuristic strategy, which occurs faster thanks to previous practice, they may no longer feel the need to write their representation down and may prefer to do it mentally. teaching problem solving according to a “gradual-all together” approach appeared to be more effective at the cognitive, metacognitive and emotional levels than the traditional approach. at the cognitive level, results highlighted a short-term significant increase in the use of three heuristic strategies, namely, building a representation of the problem, using one’s knowledge, and planning. while the improvement in the two last strategies persisted over time, a decrease in the use of the ‘building a representation of the problem’ heuristic strategy was recorded between time 2 and time 3. this observation supports the familiarization-internalization assumption put forward earlier. in effect, through the training program proposed, not only did students have the opportunity to implement this strategy many times, they also received information about how to implement it, when it is the most convenient strategy, and what it is used for (the www & h rule; veenman, van hout-wolters, & afflerbach, 2006). this method of instruction is known to promote students’ understanding and familiarization with the learning material (tzohar-rozen & kramarski, 2017; veenman et al., 2006). as a result, we may wonder why the other two strategies, namely, using one’s knowledge and planning, were not also internalized by the students. in this regard, let us mention that the ‘building a representation of the problem’ strategy is relevant for each problem. therefore, students had implemented it on many occasions. 
moreover, descriptive statistics at time 1 (see appendix b) showed that of the three heuristic strategies, ‘building a representation of the problem’ was the only one that had really been exploited in the traditional classrooms before the beginning of the training program. consequently, students had had more opportunities to practice it. conversely, the planning strategy will not be invoked if the problem requires only one or two stage(s) to be solved. with respect to the ‘using one’s knowledge’ strategy, as claimed by fuchs et al. (2003), this will be fully used only at the point when the learner has encountered problems presenting different mathematical structures, has had the opportunity to identify these structures and has attempted these types of problems several times in order to build a typology of the problems’ mathematical structures. at this stage, it is thus more the premises of the strategy that are observed. at the emotional level, the help-seeking strategy appeared to be used to a lesser extent in the long run by the students in the “gradual-all together” group. such a finding would suggest that the “gradual-all together” approach, through appropriate tooling and scaffolding, fostered among students a more responsible and autonomous approach to problem solving, in other words, a self-regulated approach. in effect, in such an approach, students rely less on their teacher and peers (help-seeking strategy) (allal, 2007). nonetheless, contrary to our supposition, the “gradual-all together” approach did not enhance students’ level of persistence. the “simultaneous” approach to problem solving stood out as beneficial not only in terms of the cognitive and metacognitive aspects but also at the emotional and motivational levels. 
in effect, a short-term increase in the use of three heuristic strategies, namely, building a representation of the problem, using one’s knowledge, and planning, as well as maintenance over time of the level reached were observed among the students of the “simultaneous” group. thus, unlike their counterparts in both the control and the “gradual-all together” groups, the students in the “simultaneous” group seemed to still need to write down their representation of the problem at time 3. this observation, although surprising at first sight, does not invalidate the familiarization-internalization hypothesis. as reported by the teachers, the intensity of the training program, especially in its “simultaneous” version, may not have given students the opportunity to assimilate and internalize entirely the heuristic strategies taught. in this respect, several scholars have pointed out that the duration, the intensity and the frequency of the intervention’s activities impact the findings (durlak, weissberg, dymnicki, taylor, & schellinger, 2011; greenberg, domitrovich, graczyk, & zins, 2005). in addition, the teaching approach itself may, in a complementary way, explain this observation. in the “simultaneous” approach, students’ awareness was raised to deal with the various heuristic strategies in a coordinated way. more precisely, in this approach the representation strategy is viewed as a step determining all the others, that is, the step on which the subsequent heuristic strategies are built. consequently, writing down one’s representation may facilitate the continuation of the problem-solving process. in addition, a short-term decrease in the use of the help-seeking strategy as well as a maintenance over time of this lower level were observed among the “simultaneous” group. contrary to the “gradual-all together” group, for which the decrease appeared only at time 3, in the “simultaneous” group, the same decrease was already observed at time 2. 
so, contrary to our supposition, approaching the problem-solving process in its full complexity is more beneficial than a step-by-step approach. in other words, it is not enough to equip the learner adequately; the problem-solving approach must make sense for him/her. in this respect, not only did the “simultaneous” approach lead to faster changes in the development of self-regulated behaviors as compared to the “gradual-all together” approach, it was also associated with a short-term increase in persistence. this finding suggests that students are able to handle and to overcome the difficulties encountered, which in turn supports the hypothesis that the “simultaneous” approach contributes to the development of self-regulated behaviors. it follows from this first investigation that the “simultaneous” approach stands out as the most promising one in that it impacts the cognitive, metacognitive, emotional and motivational dimensions of problem-solving learning. however, is the “simultaneous” approach to problem solving appropriate for all profiles of students?

5.2 question 2: is it appropriate to adopt a different teaching approach according to the learner’s level of expertise in problem solving?

findings regarding novice problem-solvers are presented first, followed by those for expert learners. let us mention in advance that there were no baseline differences between the three groups of novice problem-solvers (see appendix c).

5.2.1 overall effect of the problem-solving intervention on novices

significant group x time interactions were found for three heuristic strategies, namely, building a representation of the problem (f(2, 126) = 12.830, p < .001, partial η² = .18), using one’s knowledge (f(2, 126) = 4.651, p = .001, partial η² = .07), and planning (f(2, 126) = 5.054, p = .001, partial η² = .08).
additionally, significant interactions were also found regarding two emotion regulation strategies, namely, help-seeking (f(2, 126) = 4.949, p = .001, partial η² = .09) and negative self-talk (f(2, 126) = 3.484, p = .009, partial η² = .06). finally, the persistence dimension also showed a significant interaction (f(2, 126) = 3.294, p = .013, partial η² = .09).

5.2.2 short- and long-term effects and change dynamics regarding novices

regarding short-term changes, the three groups of novice problem-solvers differed in terms of three heuristic strategies (building a representation of the problem, using one’s knowledge, and planning). in addition, the three groups were also distinguishable on three emotion regulation strategies (that is, task utility self-persuasion, help-seeking, negative self-talk) as well as on persistence (see appendix f). in-depth analyses highlighted a substantial improvement for the three heuristic strategies presenting a significant increase, for both the “gradual-all together” group (building a representation of the problem: t(42) = -6.525, p < .001, d = 1.36; using one’s knowledge: t(42) = -3.925, p < .001, d = .98; planning: t(42) = -4.702, p < .001, d = 1.17) and the “simultaneous” group (t(42) = -10.898, p < .001, d = 2.31; t(42) = -4.611, p < .001, d = .99; t(42) = -4.370, p < .001, d = .94; respectively). as regards emotion regulation, a significant decrease in the use of the help-seeking strategy was recorded among the “simultaneous” group (t(42) = 3.204, p = .009, d = .59). unexpectedly, the control group displayed a significant decrease in the use of the negative self-talk strategy (t(42) = 2.727, p = .030, d = .44). however, the magnitudes of the effects regarding emotion regulation strategies are rather weak. in addition, note that although repeated measures anovas indicated a significant interaction regarding the task utility self-persuasion strategy, a finer-grained examination by means of t-tests showed that none of the three groups presented a significant development on this dimension.
finally, and contrary to our expectations, only the “simultaneous” group displayed a significant rise in persistence (t(42) = -4.026, p = .003, d = .98).

with regard to long-term changes, four out of the six significant differences found at time 2 compared with time 1 were also significant at time 3 (see appendix f). first, the two experimental groups displayed a significant improvement regarding the use of heuristic strategies. more precisely, both the “gradual-all together” group (t(42) = -6.561, p < .001, d = 1.14; t(42) = -3.415, p = .003, d = .74; t(42) = -4.760, p < .001, d = 1.10; respectively) and the “simultaneous” group (t(42) = -8.545, p < .001, d = 1.70; t(42) = -4.128, p < .001, d = .86; t(42) = -3.553, p = .003, d = .71; respectively) displayed a significant improvement in the use of three heuristic strategies, namely, building a representation of the problem, using one’s knowledge, and planning. in addition, regarding emotion regulation strategies, the “simultaneous” group used the help-seeking strategy much less (t(42) = 5.509, p < .001, d = .88).

with respect to what happened after the intervention, although repeated measures anovas indicated significant group x time interactions regarding persistence and the negative self-talk strategy, a finer-grained examination via t-tests revealed that there were no significant differences between time 2 and time 3, for any of the three groups.

regarding expert problem-solvers, there were no baseline differences between the three groups except for the emotion expression strategy (see appendix d).

5.2.3 overall effect of the problem-solving intervention regarding experts

significant group x time interactions were found for two heuristic strategies, namely, building a representation of the problem (f(33, 2) = 5.072, p = .002, partial η² = .28), and planning (f(33, 2) = 3.026, p = .026, partial η² = .19).
note that interactions at trend level were observed for both the strategy of using one’s knowledge (f(33,2)=2.533, p=.051^t, partial η^2=.16) and persistence (f(33,2)=2.841, p=.07, partial η^2=.49).

5.2.4 short- and long-term effects and change dynamics regarding experts

as regards short-term changes, the three groups of expert problem-solvers differed on the same cognitive and motivational variables as the novice problem-solvers, that is, persistence, building a representation of the problem, using one’s knowledge, and planning (see appendix g). however, unlike novices, the three groups of expert learners were not distinguishable on the emotional dimension. both the “gradual-all together” group (t(11)=-10.757, p<.001, d=3.96) and the “simultaneous” group (t(11)=-6.280, p<.001, d=2.47) displayed a significant improvement in the heuristic strategy of building a representation of the problem. the same pattern was found for the strategy of using one’s knowledge (“gradual-all together” group: t(11)=-3.985, p=.009, d=1.66; “simultaneous” group: t(11)=-5.063, p<.001, d=1.92). with respect to the planning strategy, a substantial improvement was observed within the “simultaneous” group (t(11)=-4.710, p=.001, d=2.03). in addition, a significant improvement in persistence was recorded in the “simultaneous” group (t(11)=-4.977, p=.048, d=1.84). with respect to long-term changes, the four significant differences found at time 2 compared with time 1 were still significant at time 3 (see appendix g).
as regards the heuristic strategies, both the “gradual-all together” group and the “simultaneous” group used the same three strategies significantly more: building a representation of the problem (“gradual-all together” group: t(11)=-4.183, p=.006, d=1.83; “simultaneous” group: t(11)=-8.848, p<.001, d=2.99), using one’s knowledge (“gradual-all together” group: t(11)=-3.40, p=.021, d=1.35; “simultaneous” group: t(11)=-3.644, p=.012, d=1.38) and planning (“gradual-all together” group: t(11)=-3.191, p=.027, d=1.12; “simultaneous” group: t(11)=-2.809, p=.048, d=1.38). finally, a significant increase in persistence was observed in the “simultaneous” group (t(11)=-4.314, p=.006, d=.74).

5.2.5 discussion

our second research question examined whether it is productive for the teacher to adopt a different teaching approach (“gradual-all together” vs “simultaneous”) according to the learner’s level of expertise in non-routine problem solving. contrary to what we expected, our findings revealed that while the “gradual-all together” approach and the “simultaneous” approach are equally beneficial regarding the cognitive and metacognitive dimensions, the latter approach is more effective with regard to the emotional and motivational aspects of problem-solving learning, regardless of the student’s profile. first, with respect to the cognitive dimension, novice and expert problem-solvers in both approaches displayed a significant increase in three heuristic strategies (building a representation of the problem, using one’s knowledge, and planning) and maintained the level reached over time. however, only these three heuristic strategies showed this positive development. several assumptions may be put forward to explain why the heuristic strategies of “checking the outcome(s) and the procedure” and “interpreting the outcome” were not more strongly affected by the intervention.
with respect to the former, checking is both time-consuming and cognitive resource-consuming in that the student must repeat calculations (outcome checking) and read over his work (procedure checking). the “interpreting the outcome” heuristic strategy, according to the descriptive statistics, presented a steady but not significant increase between the three times of measurement. this slower growth may be partially explained by teachers’ beliefs. nowadays it is widely acknowledged that teachers’ beliefs influence their teaching (beswick, 2006; van der sandt, 2007; wilkins, 2008). in this respect, several scholars have shown that a substantial proportion of teachers, when confronted with non-routine problems for which a realistic answer is expected, display a strong tendency to exclude realistic considerations; in other words, they believe that realistic considerations have no place in math classrooms (depaepe et al., 2010). on this basis, we hypothesize that if, before presentation of the problem-solving process to the teachers, we had carried out a critical analysis of their misconceptions, the heuristic strategy of “interpreting the outcome” would have been used significantly more by their students. however, the positive growth in the use of this heuristic strategy within the three groups suggests that the nature of the problems proposed (requiring students to make sense of the outcome), had already brought about a change, although a slower one than if the problems had been preceded by an analysis. in a complementary way, these two strategies of checking and interpreting are located at the end of the problem-solving process. in this respect, teachers of both the “simultaneous” and the “gradual-all together” groups reported that the training program schedule was very intense, leaving little time for students to master each of the heuristic strategies. consequently, it is possible that the last heuristic strategies were less developed. 
second, as for emotion regulation strategies, only novice learners experienced significant changes. on this point, it appeared that novices in the “simultaneous” group made less use of the help-seeking strategy. this finding suggests that, as previously mentioned, the “simultaneous” approach supports the development of more autonomous management of the problem-solving process. moreover, a short-term significant decrease in the use of the “negative self-talk” strategy was observed among the novices in the control group. however, this decrease did not last over time. although this observation may seem surprising at first sight, as previously mentioned, scholars have shown that the repeated practice of an activity or of a set of knowledge improves the learner’s skills and performance (anderson, 1981; fayol, 2006) and, as a result, diminishes anxiety and emotional internalization behaviors (hampel, meier, & kümmel, 2008; kanfer, ackerman, & heggestad, 1996). thus, the simple weekly processing of two mathematical problems seems enough to significantly reduce the internalization of negative emotions. in our view, the intensity of both the “simultaneous” and the “gradual-all together” approaches may explain why such a decrease was not significant for these two approaches. further, we hypothesize that the short duration of the intervention accounts for the non-maintenance over time of the decrease in “negative self-talk” observed within the control group. third, concerning the motivational dimension, a short-term significant increase in persistence was observed among both novice and expert “simultaneous” learners. although this improvement faded after the end of the training program for the former, it persisted over time for the latter. this finding may be explained by both the intensity and the short duration of the training program.
more vulnerable students, such as novice problem-solvers, may not have had sufficient time to deeply assimilate the heuristic strategies taught. consequently, at the end of the training program, they may not have felt better equipped to solve mathematical problems and their persistence fell back to its baseline level. additionally, the lack of challenge or, at least, the low level of challenge involved in the “gradual-all together” approach may account for the stability of the persistence level observed among both novice and expert problem-solvers experiencing this approach. in this connection, a study conducted among seventh grade students underscored a positive and quite strong relationship between persistence in problem-solving tasks and a taste for challenging tasks (malmivuori, 2006). similarly, wolters and rosenthal (2000) showed that enhancing eighth grade students’ interest in working on a task by making it more challenging or meaningful increased their persistence for the task. in short, consistent with the work done on mathematics instruction, it is clear that a training program that approaches problem solving in its full complexity, and thereby puts the focus on an understanding of the process, is cognitively, metacognitively, motivationally and emotionally more fruitful for both the novice and the expert problem-solver than the “gradual-all together” approach. 6. general discussion and conclusion for decades, problem solving has constituted a real stumbling block both for students and for teachers who report having difficulty helping their students with this type of task (fagnant, dupont, & demonty, 2016). 
while many scholars have developed training programs that aim to develop an expert and reflexive approach to problem solving among elementary and secondary students, they have recommended different teaching approaches without identifying the most effective one (blum, 2011; de corte et al., 2004; fagnant & demonty, 2005; tzohar-rozen & kramarski, 2017). yet, in order to improve the teaching of non-routine problem solving and so improve students’ problem-solving learning and performance, it is important to compare the effectiveness of these various approaches. the present paper contributes to advancing the existing knowledge about problem-solving instruction by examining whether it is more fruitful to teach the problem-solving process in all its complexity (the “simultaneous” approach) or one heuristic strategy at a time (the “gradual-all together” approach). first, our findings highlight that, while both learning approaches support the acquisition of cognitive strategies over the short and the long term, if student motivation and a positive emotional rapport with problem-solving tasks are taken into account, the “simultaneous” approach, and thus the maintenance of complexity, is more beneficial for both novice and expert students. in this sense, our results support the work done in mathematics instruction that stresses the importance of proposing realistic, meaningful, challenging and complex problem-solving tasks. as these findings contradict current classroom practices, they are of particular importance. on this point, research has shown that in order to facilitate task completion for their students, teachers reduce complex tasks to micro-tasks, especially for students with a “novice” profile (demonty & fagnant, 2014; depaepe et al., 2010). these micro-tasks require students to apply procedures that are meaningless for the task requested.
for example, the task of making a representation of a problem does not make sense per se, unless the student then goes on to solve the problem afterward. so, this study draws attention to the fact that the difficulties experienced by students in problem solving have more to do with the inadequacy of the tasks proposed to prepare them for managing complexity than with the management of the complexity itself. concretely, it emphasized that, for a novice to become proficient at solving complex tasks, such as mathematical problems, requires repeated confrontation with such tasks. in this respect, ericsson’s theory of deliberate practice (ericsson, 2008; ericsson, krampe, & tesch-römer, 1993) has shown, through several empirical studies, that considerable practice, in the sense of the number of hours devoted to the practice of the competence that one wishes to acquire, is a prerequisite for the automation of such competence. a practical spin-off would be to increase the practice of solving complex tasks in classes. a second contribution of our study to advancing existing knowledge regards heuristic strategies. it follows from our results that the strategies taught are not acquired at the same pace by the learner. some strategies, less familiar, based on inadequate belief or requiring more cognitive resources, take longer to be integrated. this information is critical for designing more effective training programs, as each strategy occupies a specific and central place in the problem-solving process. a study conducted with students of the same age emphasized that to increase more rapidly students’ use of the “checking” and the “interpreting” strategies, it is necessary to add emotional and motivational support to the cognitive and metacognitive intervention (hanin, & van nieuwenhoven, 2018b). 
this observation is in line with self-regulated learning theories, which postulate that motivational beliefs and emotions play a key role not only in initiating the learning process but also in sustaining the learner’s efforts throughout the process (boekaerts, 2011; zimmerman, 2011). a third contribution pertains to the emotional aspects of problem-solving tasks. our findings reveal that differences regarding emotional variables between the three groups are quite limited. this study highlights that nurturing one aspect of self-regulation (in the present case, the cognitive one) has little effect on the other ones (here the emotional aspect). so, it would seem that to induce emotional regulation among learners, it is necessary to explicitly teach emotional knowledge and skills. this echoes and supports a recent exploratory study conducted among fifth graders by tzohar-rozen and kramarski (2017). in this way, the present study adds nuance to the empirical studies conducted so far, which have shown that learners who receive metacognitive support for problem-solving tasks display greater general motivation as well (hoffman, 2010; kramarski & gutman, 2006). this means, from a conceptual point of view, that if the processes involved in the regulation of cognition share common points with those involved in the regulation of motivation, they are distinct from those entangled in the regulation of emotions. this observation makes it possible to refine and add nuance to current literature on the subject. fourth and last, we used an original method that consists in establishing a dialogue between two research fields that deal with similar issues but that are not usually involved in an interdisciplinary context with the purpose of advancing existing knowledge about problem solving and allowing new empirical insights. 
priolet (2014) talked about an “integrative theoretical framework” to designate “the mobilization of works related to mathematics instruction and to both the psychology of learning and of development” (p. 60, our translation). more precisely, in addition to being based on an instructional analysis, our findings underline the necessity of presenting to students teaching-learning situations that take into account their cognitive, emotional and motivational functioning and the contextual features in which the learning takes place in order to be truly functional (maury, 2001; priolet, 2014). the present study confirms that we cannot offer students a training program that has “just” been thought of “mathematically”. so, not only does adopting such an integrative perspective allow for better understanding of the processes at work during mathematical problem-solving tasks and thereby for drawing firmer conclusions for practical guidance, but it is also essential for understanding problem-solving competence in all its complexity. in addition, this study draws attention not only to the importance of providing learning activities that make sense to the learner but also to the difference between “the learner’s academic fulfillment” and “the learner’s academic performance”. in other words, our findings suggest that developing students’ heuristic strategies, emotion regulation strategies and motivation, on the one hand, and increasing his/her performance on the other, are not always compatible processes. the weight placed by our politicians on both national and international tests is a good reflection of current concerns. however, the perennial low success rates, mentioned in the introduction, suggest that this may not be a very productive avenue. maybe it is time to ask whether schools should be more concerned about educating “achievers” or long-term, engaged learners. 
while the present study yields promising results from both the educational and the research perspectives, several limitations that call for further investigation must be noted. first, several changes did not persist, took place quite slowly or were fairly weak. this suggests the need for a training program that is less intense (where the lessons are more spaced out in time), of longer duration (where the learner has more opportunities to implement what he/she has learned in new problem situations) and that directly addresses the emotional dimension (by including lessons on emotions and emotion regulation strategies). such a training program might lead to better uptake of all the heuristic strategies taught, to better emotional regulation, and consequently to better performance. second, as this is, to our knowledge, the first study to investigate these questions at the level of compulsory education, it would be interesting to replicate it with other samples in order to strengthen the stability of the present findings, especially as it partially questions the assumptions of cognitive load theory. third, as previously mentioned, the study would benefit from a more fine-grained measure of the heuristic strategies. on this point, in order to better understand the link between the heuristic strategies and the performance score, adding a measure of the accuracy with which each heuristic strategy is implemented would be enlightening. fourth, in the present study expert and novice solvers were defined in terms of high and low performance, as is the case in many studies of novice and expert learners (bassok, 2003; brand et al., 2003; muir et al., 2008). however, defining expertise in problem solving exclusively in terms of performance is restrictive. consequently, it would be interesting in future studies to conceptualize these two learner profiles more thoroughly.
fifth and last, the results regarding the expert problem-solvers should be read in light of the sample size, which was rather small; the present study therefore needs to be replicated with a bigger sample of expert problem-solvers.

disclosure of interest

the authors declare that they have no conflicts of interest concerning this article and approve the final article.

keypoints

examines two contrasting approaches to teaching the problem-solving process; shows that both approaches support the acquisition of cognitive strategies; identifies meaningfulness of approach as key for emotional and motivational benefits to occur; highlights the fact that the emotional dimension concerns only novice problem-solvers; makes it possible to design more targeted and more efficient pedagogical interventions.

references

ahmed, w., van der werf, g., kuyper, h., & minnaert, a. (2013). emotions, self-regulated learning, and achievement in mathematics: a growth curve analysis. journal of educational psychology, 105(1), 150-161. doi: 10.1037/a0030160.
allal, l. (2007). régulations des apprentissages: orientations conceptuelles pour la recherche et la pratique en éducation [regulation of learning: conceptual guidelines for research and practice in education]. in l. allal & l. mottier lopez (eds.), régulation des apprentissages en situation scolaire et en formation [regulation of learning in educational and instructional situations] (pp. 7-24). brussels: de boeck.
anderson, j. r. (1981). cognitive skills and their acquisition. hillsdale, nj: lawrence erlbaum associates.
bassok, m. (2003). analogical transfer in problem solving. in j. e. davidson & r. j. sternberg (eds.), the psychology of problem solving (pp. 343-369). new york, ny: cambridge university press.
beswick, k. (2006). the importance of mathematics teachers' beliefs. the australian mathematics teacher, 62(4), 17-22.
blayney, p., kalyuga, s., & sweller, j. (2010).
interactions between the isolated-interactive elements effect and levels of learner expertise: experimental evidence from an accountancy class. instructional science, 38(3), 277-287. doi: 10.1007/s11251-009-9105-x.
blum, w. (2011). can modelling be taught and learnt? some answers from empirical research. in g. kaiser, w. blum, r. borromeo ferri, & g. stillman (eds.), trends in teaching and learning mathematical modelling (pp. 15-30). new york, ny: springer.
blum, w., & niss, m. (1991). applied mathematical problem solving, modelling, applications, and links to other subjects. state, trends and issues in mathematics instruction. educational studies in mathematics, 22(1), 37-68. doi: 10.1007/bf00302716.
boekaerts, m. (2011). emotions, emotion regulation, and self-regulation of learning. in b. j. zimmerman & d. h. schunk (eds.), handbook of self-regulation of learning and performance (pp. 408-425). new york, ny: routledge.
brand, s., reimer, t., & opwis, k. (2003). effects of metacognitive thinking and knowledge acquisition in dyads on individual problem solving and transfer performance. swiss journal of psychology, 62(4), 251-261. doi: 10.1024/1421-0185.62.4.251.
brault-labbé, a., & dubé, l. (2008). engagement, surengagement et sous-engagement académiques au collégial: pour mieux comprendre le bien-être des étudiants [academic commitment, over-commitment and under-commitment at secondary school: understanding students’ wellbeing]. revue des sciences de l’éducation, 34(3), 729-751. doi: 10.7202/029516ar.
cumming, g. (2012). understanding the new statistics. effect sizes, confidence intervals, and meta-analysis. new york, ny: routledge.
de corte, e. (2012). constructive, self-regulated, situated and collaborative (cssc) learning: an approach for the acquisition of adaptive competence. journal of education, 192(2/3), 33-47.
de corte, e., depaepe, f., op’t eynde, p., & verschaffel, l. (2011).
students’ self-regulation of emotions in mathematics: an analysis of meta-emotional knowledge and skills. zdm, 43(4), 483-495.
de corte, e., verschaffel, l., & masui, c. (2004). the clia-model: a framework for designing powerful learning environments for thinking and problem solving. european journal of psychology of education, 19(4), 365-384.
demonty, i., blondin, c., matoul, a., baye, a., & lafontaine, d. (2013). la culture mathématique à 15 ans. premiers résultats de pisa 2012 en fédération wallonie-bruxelles [mathematical knowledge at 15 years old. first results of pisa 2012 in the wallonia-brussels federation]. les cahiers des sciences de l’education, 34, 1-26.
demonty, i., & fagnant, a. (2014). tâches complexes en mathématiques : difficultés des élèves et exploitations collectives en classe [complex mathematical tasks: students’ difficulties and whole-class practice]. education et francophonie, 42(2), 173-189. doi: 10.7202/1027912ar.
depaepe, f., de corte, e., & verschaffel, l. (2010). teachers' approaches towards word problem solving: elaborating or restricting the problem context. teaching and teacher education, 26(2), 152-160. doi: 10.1016/j.tate.2009.03.016.
durlak, j. a., & dupre, e. p. (2008). implementation matters: a review of research on the influence of implementation on program outcomes and the factors affecting implementation. american journal of community psychology, 41(3-4), 327-350. doi: 10.1007/s10464-008-9165-0.
durlak, j. a., weissberg, r. p., dymnicki, a. b., taylor, r. d., & schellinger, k. b. (2011). the impact of enhancing students' social and emotional learning: a meta-analysis of school-based universal interventions. child development, 82(1), 405-432. doi: 10.1111/j.1467-8624.2010.01564.x.
elia, i., van den heuvel-panhuizen, m., & kolovou, a. (2009). exploring strategy use and strategy flexibility in non-routine problem solving by primary school high achievers in mathematics. zdm mathematics education, 41(5), 605-618.
ericsson, k.
a. (2008). deliberate practice and acquisition of expert performance: a general overview. academic emergency medicine, 15, 988-994.
ericsson, k. a., krampe, r. t., & tesch-römer, c. (1993). the role of deliberate practice in the acquisition of expert performance. psychological review, 100, 363-406.
fagnant, a., & demonty, i. (2005). résoudre des problèmes : pas de problème! guide méthodologique et documents reproductibles. 10/12 ans [solving problems: no problem! methodological guide and reproducible documents. 10/12 years]. brussels: de boeck.
fagnant, a., demonty, i., & lejonc, m. (2003). la résolution de problèmes: un processus complexe de « modélisation mathématique » [problem-solving: a complex process of “mathematical modeling”]. bulletin d’informations pédagogiques, 54, 29-39.
fagnant, a., dupont, v., & demonty, i. (2016). régulation interactive et résolution de tâches complexes en mathématiques [interactive regulation and complex tasks in mathematics]. in l. mottier lopez & w. tessaro (eds.), le jugement professionnel au cœur de l’évaluation et de la régulation des apprentissages [professional judgment at the core of evaluation and regulation of learning] (pp. 229-251). berne: peter lang.
fagnant, a., & jaegers, d. (2018). soutenir l’autorégulation cognitive et développer les compétences en résolution de problèmes: une étude exploratoire en fin d’enseignement primaire [supporting cognitive self-regulation and developing problem-solving skills: an exploratory study at the end of primary education]. in s. cartier & l. mottier lopez (eds.), soutien à l'apprentissage autorégulé en contexte scolaire: perspectives francophones [support for self-regulated learning in the school context: francophone perspectives] (pp. 161-181). quebec: presses de l’université du québec.
fayol, m. (2006). un esprit pour apprendre [a mind to learn]. in e. bourgeois & g. chapelle (eds.), apprendre et faire apprendre [to learn and to teach] (pp. 53-67). paris: puf.
field, a. (2013).
discovering statistics using ibm spss statistics (4th ed.). london: sage publications.
fredricks, j. a., & mccolskey, w. (2012). the measurement of student engagement: a comparative analysis of various methods and student self-report instruments. in s. l. christenson, a. l. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 763-782). new york, ny: springer us.
fuchs, l. s., fuchs, d., prentice, k., burch, m., hamlett, c. l., et al. (2003). explicitly teaching for transfer: effects on third-grade students’ mathematical problem solving. journal of educational psychology, 95(2), 293-305.
greenberg, m. t., domitrovich, c. e., graczyk, p. a., & zins, j. e. (2005). the study of implementation in school-based preventive interventions: theory, research, and practice. washington, dc: center for mental health services, substance abuse and mental health administration, u.s. department of health and human services.
hampel, p., meier, m., & kümmel, u. (2008). school-based stress management training for adolescents: longitudinal results from an experimental study. journal of youth and adolescence, 37(8), 1009-1024. doi: 10.1007/s10964-007-9204-4.
hanin, v., & van nieuwenhoven, c. (2016). evaluation d’un dispositif pédagogique visant le développement de strategies cognitives et métacognitives en résolution de problème en première secondaire [evaluation of a training-program aiming at the development of cognitive and metacognitive strategies in problem-solving among grade one students]. e-jiref, 2(1), 53-88.
hanin, v., grégoire, j., mikolajczak, m., fantini-hauwel & van nieuwenhoven, c. (2017). children’s emotion regulation scale in mathematics (cers-m): development and validation of a self-reported instrument. psychology, 8(13), 2240-2275. doi: 10.4236/psych.2017.813143.
hanin, v., & van nieuwenhoven, c. (2018a).
evaluation d’un dispositif d’enseignement apprentissage en resolution de problèmes mathématiques: evolution des comportements cognitifs, métacognitifs, motivationnels et émotionnels d’un résolveur novice et expert [evaluation of a training-program in mathematical problem-solving : evolution of cognitive, metacognitive, motivational and emotional behaviors of an expert and a novice solver]. e-jiref, 4(1), 37-66. hanin, v., & van nieuwenhoven, c. (2018b). developing an expert and reflexive approach to problem-solving: the place of emotional knowledge and skills. psychology, 9(2), 280-309. doi: 10.4236/psych.2018.92018. hoffman, b. (2010). i think i can, but i’m afraid to try: the influence of self-efficacy and anxiety on problem-solving efficiency. learning & individual differences, 20, 276-283. kanfer, r., ackerman, p. l., & heggestad, e. d. (1996). motivational skills & self-regulation for learning: a trait perspective. learning and individual differences, 8(3), 185-209. doi: 10.1016/s1041-6080(96)90014-x . kramarski, b., & gutman, m. (2006). how can self-regulated learning be supported in mathematical e-learning environments? journal of computer assisted learning, 22(1), 24-33. lucangeli, d., tressoldi, p. e., & cendron, m. (1998). cognitive and metacognitive abilities involved in the solution of mathematical word problems: validation of a comprehensive model. contemporary educational psychology, 23(3), 257-275. malmivuori, m. l. (2006). affect and self-regulation. educational studies in mathematics, 63(2), 149-164. doi: 10.1007/s10649-006-9022-8. maury, s. (2001). didactique des mathématiques et psychologie cognitive : un regard comparatif sur trois approches psychologiques [didactics of mathematics and cognitive psychology: a comparative look at three psychological approaches]. revue française de pédagogie, 137, 84-89. montague, m., & applegate. (2000). 
middle school students’ perceptions, persistence, and performance in mathematical problem solving. learning disability quarterly, 23(3), 215-227. doi: 10.2307/1511165.
muir, t., beswick, k., & williamson, j. (2008). “i’m not very good at solving problems”: an exploration of students’ problem solving behaviours. the journal of mathematical behavior, 27(3), 228-241. doi: 10.1016/j.jmathb.2008.04.003.
national council of teachers of mathematics (nctm). (2010). why is teaching with problem solving important to student learning? reston: national council of teachers of mathematics.
novick, l. r. (1988). analogical transfer, problem similarity, and expertise. journal of experimental psychology: learning, memory, and cognition, 14(3), 510-520. doi: 10.1037/0278-7393.14.3.510.
oecd. (2016). pisa 2015 results. excellence and equity in education (volume i). paris: oecd publishing.
op’t eynde, p., de corte, e., & mercken, i. (2004). pupils’ (meta)emotional knowledge and skills in the mathematics classroom. paper presented at the annual meeting of the american educational research association (aera), san diego.
pekrun, r. (2014). emotions and learning. educational practices series. belley, france: international academy of education.
pekrun, r., goetz, t., daniels, l. m., stupnisky, r. h., & perry, r. p. (2010). boredom in achievement settings: control-value antecedents and performance outcomes of a neglected emotion. journal of educational psychology, 102(3), 531-549. doi: 10.1037/a0019243.
pekrun, r., goetz, t., & frenzel, a. c. (2005). achievement emotions questionnaire-mathematics (aeq-m). user’s manual. unpublished document. university of munich, munich.
piaget, j. (1978). success and understanding. cambridge, ma: harvard university press.
pollock, e., chandler, p., & sweller, j. (2002). assimilating complex information. learning and instruction, 12(1), 61-86. doi: 10.1016/s0959-4752(01)00016-0.
priolet, m. (2014).
enseignement-apprentissage de la résolution de problèmes numériques à l’école élémentaire: un cadre didactique basé sur une approche systémique [teaching-learning of digital problem-solving in elementary schools: a didactic framework based on a systemic approach]. education & didactique, 8(2), 59-86.
tzohar-rozen, m., & kramarski, b. (2014). metacognition, motivation and emotions: contribution of self-regulated learning to solving mathematical problems. global education review, 1(4), 76-95.
tzohar-rozen, m., & kramarski, b. (2017). meta-cognition and meta-affect in young students: does it make a difference on mathematical problem solving? teachers college record, 119(13).
van der sandt, s. (2007). research framework on mathematics teacher behaviour: koehler and grouws' framework revisited. eurasia journal of mathematics, science & technology, 3(4), 343-350.
van dooren, w., verschaffel, l., greer, b., de bock, d., & crahay, m. (2010). la modélisation des problèmes mathématiques [modeling mathematical problems]. in m. crahay & m. dutrévis (eds.), psychologie des apprentissages scolaires [psychology of school learning] (2nd ed., pp. 199-220). brussels: de boeck.
van merriënboer, j. j., & sweller, j. (2005). cognitive load theory and complex learning: recent developments and future directions. educational psychology review, 17(2), 147-177. doi: 10.1007/s10648-005-3951-0.
veenman, m. v. j., van hout-wolters, b., & afflerbach, p. (2006). metacognition and learning: conceptual and methodological considerations. metacognition and learning, 1(1), 3-14. doi: 10.1007/s11409-006-6893-0.
wilkins, j. l. (2008). the relationship among elementary teachers’ content knowledge, attitudes, beliefs, and practices. journal of mathematics teacher education, 11(2), 139-164. doi: 10.1007/s10857-007-9068-2.
wolters, c. a., & rosenthal, h. (2000). the relation between students’ motivational beliefs and their use of motivational regulation strategies.
international journal of educational research, 33(7-8), 801-820. doi: 10.1016/s0883-0355(00)00051-3.
zimmerman, b. j. (2011). motivational sources and outcomes of self-regulated learning and performance. in b. zimmerman & d. schunk (eds.), handbook of self-regulation of learning and performance (pp. 49-64). new york, ny: routledge.
zimmerman, b. j., & campillo, m. (2003). motivating self-regulated problem solvers. in j. e. davidson & r. j. sternberg (eds.), the psychology of problem solving (pp. 233-262). new york, ny: cambridge university press.

appendix a. training program’s features.

the training program implemented in the present study aimed to develop an expert and reflexive approach to problem solving among students by means of the following components:
(1) each regular teacher familiarized his/her students with the eight heuristic strategies/stages depicted in figure 1. this familiarization was performed according to the www & h rule (veenman et al., 2006), which consists in teaching each heuristic strategy by specifying the what (what it consists of), the why (its usefulness), the when (the most relevant point in the problem-solving process at which to implement it), and the how (the way to implement it correctly);
(2) the teachers used an open-ended methodology to foster diversity of problem representations, modelings, strategies and procedures (fagnant & demonty, 2005);
(3) students were trained and scaffolded to regulate their own problem-solving process in an increasingly autonomous way;
(4) the non-routine problems chosen were realistic (i.e., problems were anchored in fifth-grade students’ experiential worlds), complex (i.e., problems made it necessary to implement a mathematical modeling process), and open-ended (i.e., problems could be correctly represented, modeled, and solved by taking different paths), as suggested by de corte et al. (2004).
furthermore, as the objective of this training program was the development of a process to solve non-routine problems, only application problems were selected. moreover, except for the first problem, which was solved in groups of five students, problems were solved individually and followed by a whole-class discussion.

frontline learning research 3 (2014) 83-101
issn 2295-3159
corresponding author: jessie de naeghel, henri dunantlaan 2, 9000 ghent, belgium, jessie.denaeghel@ugent.be
http://dx.doi.org/10.14786/flr.v2i1.84

strategies for promoting autonomous reading motivation: a multiple case study research in primary education

jessie de naeghel a, hilde van keer a, ruben vanderlinde a
a department of educational studies, ghent university, belgium

article received 5 february 2014 / revised 16 february 2014 / accepted 26 april 2014 / available online 11 june 2014

abstract

it is important to reveal strategies which foster students’ reading motivation in order to break through the declining trend in reading motivation throughout children’s educational careers. consequently, the present study advances an underexposed field in reading motivation research by studying and identifying the strategies of teachers excellent in promoting fifth-grade students’ volitional or autonomous reading motivation through multiple case study analysis. data on these excellent teachers were gathered from multiple sources (interviews with teachers, sen coordinators, and school leaders; classroom observations; teacher and student questionnaires) and analysed. the results point to the teaching dimensions of autonomy support, structure, and involvement – as indicated by self-determination theory – as well as to reading aloud as critical strategies to promote students’ autonomous reading motivation in the classroom. a school culture supporting students’ and teachers’ interest in reading is also an essential part of reading promotion.
the theoretical and practical significance of the study is discussed.

keywords: reading motivation; reading promotion; primary education; case studies

1. introduction

competence in reading is essential for functioning adequately in today’s society. in this respect, it is crucial to encourage students’ high-quality forms of reading motivation and, therefore, to stimulate them to read more frequently (de naeghel, van keer, vansteenkiste, & rosseel, 2012; wigfield & guthrie, 1997) and master important reading skills (de naeghel et al., 2012; becker, mcelvany, & kortenbruck, 2010; wang & guthrie, 2004). unfortunately, research indicates that intrinsic reading motivation declines as children go through school (guthrie & wigfield, 2000). hence, it is important to uncover strategies which foster students’ “love of reading” in order to break through the declining trend in reading motivation throughout children’s educational careers. reading motivation research indicates that teachers can play a crucial role in sustainably stimulating their students to read for pleasure and information (gambrell, 1996; guthrie & cox, 2001; guthrie, mcrae, & klauda, 2007; guthrie et al., 2006; santa et al., 2000). moreover, encouraging students’ willingness to read can be considered as a critical part of a high-quality education (de naeghel et al., 2012; guthrie & cox, 2001; guthrie et al., 2007), which can equip children from different socioeconomic backgrounds with the necessary reading competencies to be successful in today’s society (oecd, 2004). furthermore, teachers’ activities to promote their students’ volitional or autonomous reading motivation are of importance for achieving equal opportunities for all children, as teachers reach the majority of children independent of their socioeconomic background.
in this respect, studying teachers excellent in promoting autonomous reading motivation can reveal critical strategies to promote reading motivation in education. mohan, lundeberg, and reffitt (2008) even explicitly encourage further research on excellent reading teachers. as teachers’ self-reports on their reading instruction do not always correspond with their actual behaviour (pressley, rankin, & yokoi, 1996) and, hence, observations of classroom teaching are explicitly encouraged (mohan et al., 2008), it is essential to study what exactly occurs in classrooms from different methodological perspectives to enhance data triangulation. therefore, a multiple case study research approach has been applied in the current study with an embedded mixed-method design (i.e., mix of quantitative and qualitative research approaches in which the emphasis is placed on the qualitative data; creswell & plano, 2007) to portray the strategies applied by teachers excellent in the promotion of high-quality forms of reading motivation. in this respect, the study advances an underexposed field in reading motivation research through the study of what exactly occurs in the classroom practice of teachers excellent in promoting autonomous reading motivation, aiming to identify critical strategies to stimulate students’ willingness to read. moreover, it contributes to classroom practice by formulating practical guidelines for teachers and schools.

1.1 autonomous and controlled reading motivation

several studies underline the multidimensional nature of reading motivation (e.g., baker & wigfield, 1999; de naeghel et al., 2012; watkins & coffey, 2004), indicating that children can be motivated for a variety of reasons. in line with the self-determination theory (sdt; ryan & deci, 2000), which is a contemporary and promising motivation theory with a rich and continuously emerging empirical basis, de naeghel et al. (2012) differentiate between qualitatively different types of reading motivation.
particularly, autonomous and controlled types of reading motivation are distinguished. autonomous reading motivation, on the one hand, refers to engaging in reading activities for their own enjoyment (e.g., pleasure, interest) or because of their perceived personal significance and meaning (e.g., personal value, importance). on the other hand, controlled reading motivation is defined as reading to meet internal feelings of pressure (e.g., guilt, fear, pride) or to comply with external demands (e.g., expectations, reward, punishment). the present study will especially focus on autonomous reasons for reading, as autonomous reading motivation is associated with more positive outcomes, including higher leisure-time reading frequency, more reading engagement, and better reading comprehension. conversely, controlled reading motivation is related to less frequent reading in leisure time and lower reading comprehension scores (becker et al., 2010; de naeghel et al., 2012).

1.2 promoting reading motivation in the classroom

the sdt formulates general guidelines to facilitate autonomous motivation (ryan & deci, 2000). particularly, conditions or teaching dimensions supporting students’ basic psychological needs for autonomy (i.e., the experience of a sense of volition or psychological freedom), competence (i.e., the experience of being confident and effective in action), and relatedness (i.e., the experience of feeling connected to and accepted by others) are argued to encourage students’ autonomous motivation to engage in activities (skinner & belmont, 1993; ryan & deci, 2000; see figure 1). in this respect, it should be noted that the need for autonomy refers to the experience of being the initiator of one’s own behaviour or being self-determined and hence differs from acting independently without making an appeal to others (deci & ryan, 1987).
the teaching dimensions distinguished in sdt are frequently studied in education in general (e.g., skinner & belmont, 1993; sierens, vansteenkiste, goossens, soenens, & dochy, 2009) as well as in physical education in particular (e.g., chatzisarantis & hagger, 2009; tessier, sarrazin, & ntoumanis, 2008), but less explicitly in primary education and in research on reading motivation. moreover, previous sdt-based research especially adopted a quantitative approach (e.g., chatzisarantis & hagger, 2009; sierens et al., 2003; skinner & belmont, 1993). hence, the focus on qualitative methods in the present study adds value to the sdt literature.

the first teaching dimension, autonomy support, refers to giving students age-appropriate choices, recognising and connecting with children’s interests, offering rationales, taking the students’ perspective, and providing students with opportunities to take the initiative during learning activities (reeve, 2002; sierens, 2010; skinner & belmont, 1993). several studies confirm that autonomy-supportive teacher behaviour facilitates autonomous motivation (e.g., soenens & vansteenkiste, 2005) and positive learning outcomes, such as deep-level learning (e.g., vansteenkiste et al., 2005) and performance (e.g., black & deci, 2000).

[figure 1 diagram: the teaching dimensions autonomy support, structure, and involvement lead to psychological need satisfaction (autonomy = the experience of being self-determined; competence = the experience of being confident and effective in action; relatedness = the experience of feeling connected to and accepted by others), which in turn leads to autonomous motivation]
figure 1. teaching dimensions supporting students’ basic psychological needs and hence encouraging autonomous motivation (sdt; based on reeve, 2009).

the second teaching dimension, structure, primarily fosters children’s need for competence. structure concerns clearly communicating expectations, responding consistently, providing optimal challenges, offering help and support, and providing positive feedback (reeve, 2002; sierens, 2010; skinner & belmont, 1993). research indicates that structuring by providing optimal challenges and providing positive feedback is positively associated with volitional or autonomous motivation (mouratidis, vansteenkiste, lens, & sideridis, 2008; vallerand & reid, 1984).

third, the teaching dimension associated with children’s need for relatedness is involvement or “the quality of the interpersonal relationship with teachers and peers” (skinner & belmont, 1993, p. 573). teachers are involved with their students when they invest personal resources, express affection, and enjoy time with their students (reeve, jang, carrell, jeon, & barch, 2004). involvement is positively related to students’ behavioural and emotional engagement in the classroom (skinner & belmont, 1993).

literature explicitly focusing on the encouragement of reading motivation (e.g., edmunds & bauserman, 2006; gambrell, 2011; gaskins, 2008) formulates strategies relating to the significance of providing choices and recognising interests (i.e., autonomy support), scaffolding and positive feedback (i.e., structure), and helping one another and interaction about books (i.e., involvement) as well. consequently, the value of the general teaching dimensions of autonomy support, structure, and involvement is acknowledged in reading motivation studies and is therefore useful as a frame of reference to explore how teachers specifically encourage autonomous reading motivation in their classrooms.

although research on instructional programs focusing on promoting reading motivation in late primary classrooms is relatively rare (guthrie et al., 2007), one instructional program did receive a lot of attention in the research literature, namely concept-oriented reading instruction (cori; e.g., guthrie & cox, 2001; guthrie et al., 2007; guthrie, wigfield, & vonsecker, 2000; wigfield et al., 2008).
cori combines reading strategy instruction, conceptual knowledge in science, and support for students’ reading motivation. the theoretical justification for practices which influence children’s motivation in cori (e.g., providing students with age-appropriate choices linked to personal interests, providing collaborative support to stimulate interpersonal interaction) comes in part from the abovementioned sdt teaching dimensions (guthrie, 2004; guthrie et al., 2000). however, it should be noted that the adoption of sdt in reading motivation research to study the enhancement of students’ autonomous reading motivation remains rather limited and fragmented. above and beyond the significance of the sdt teaching dimensions of autonomy support, structure, and involvement the literature stresses the importance of teachers acting as reading models, valuing reading and sharing the “love of reading” to enhance their students’ reading motivation (gambrell, 1996; pecjak & kosir, 2008). teachers’ reading aloud is in this respect considered an effective strategy to stimulate students’ reading for enjoyment (fisher, flood, lapp, & frey, 2011; gambrell, palmer, codling, & mazzoni, 1996; pecjak & kosir, 2008). middle school students, for example, explicitly corroborate the value of their teachers’ reading out loud (ivey & broaddus, 2001). the literature, however, reveals contrasting results with respect to the effectiveness of reading aloud in early childhood education (e.g., morrow & gambrell, 2002; meyer, wardrop, linn, & hastings, 1993). in this respect, lane and wright (2011) emphasise that especially a systematic approach to reading aloud (e.g., dialogic reading; whitehurst et al., 1999) yields important academic benefits for children (e.g., increasing vocabulary, listening comprehension, word-recognition skills). 
since teachers are part of a broader school environment or community, it can be argued that the school culture can support and foster teachers’ and students’ willingness to invest in reading. in this respect, taylor, pearson, clark, and walpole (2010) indicate that effective schools indeed prioritise reading at both the class and school level. nevertheless, the role of the school and the specific school culture is still underexposed in reading motivation research. daniels and steres (2011) argue that schools’ prioritising of reading as a school-wide goal and hence fostering a climate in which teachers and students are expected and stimulated to read will positively influence students’ engagement. particularly, they encourage the allocation of a specific time for students to read self-selected books during the school day, support for teachers and administrators to read and discuss their reading with students, teachers’ professional development on literature, and investment in classroom libraries. moreover, literature underlines the role which literacy coaches can play in professionally supporting teachers to reflectively consider and improve the quality of classroom reading instruction and student learning. often, literacy coaches coordinate and support the literacy program of a school as well (steckel, 2009; vanderburg & stephens, 2010; walpole & blamey, 2008).

1.3 aim of the present study

the present study is innovative in a number of ways. this study extends previous sdt research by applying sdt in research on primary school students and reading motivation. moreover, whereas numerous sdt-based studies relied solely on quantitative research, the present study adopts an embedded mixed-method approach.
this study also builds on the literature on reading motivation by studying reading aloud (fisher et al., 2011; gambrell et al., 1996; pecjak & kosir, 2008) and by exploring the critical role of the school’s reading culture for teachers’ classroom practices (daniels & steres, 2011; taylor et al., 2010).

the present study aims at contributing to theory on strategies to promote autonomous reading motivation and at offering guidelines for teachers’ classroom practice. in this respect, this study explores whether sdt’s teaching dimensions (i.e., autonomy support, structure, and involvement; reeve, 2002; skinner & belmont, 1993), reading aloud, and the reading culture at school can be identified as valuable strategies and stimulating contexts for the promotion of autonomous reading motivation in late primary classrooms. to pursue this goal, teachers excellent in promoting autonomous reading motivation were selected for a multiple case study research, as reading research explicitly expresses a need for further research on excellent reading teachers (mohan et al., 2008).

2. methodology

2.1 design

a multiple case study research design (yin, 1989) was chosen, since on the one hand it affords an excellent way to identify and describe how teachers promote autonomous reading motivation and on the other hand it contributes to the establishment of theory on the promotion of autonomous reading motivation. also, the present study is regarded as an embedded mixed-method design (creswell & plano, 2007).

2.2 teacher selection

the present study is part of a broader research project on reading motivation and the promotion of reading motivation in flemish (belgium) late primary education. this study questioned 1270 fifth-grade students and their 67 teachers. on the basis of this large-scale enquiry, three teachers were selected for the present case study research, mrs. k, mrs. s, and mr. t (see table 1), according to two criteria.
first, in an open-ended teacher questionnaire the three selected teachers self-reported applying several reading promotion strategies in their classroom (e.g., book promotion, reading aloud, small-group reading activities) and engaging in reading projects at the school level (e.g., school library, book club). second, their students reported high levels of recreational autonomous reading motivation on the self regulation questionnaire (srq)-reading motivation (see data collection section for a description of the instrument and table 1 for more detailed background information on the selected teachers; mrs. k’s class: m = 4.10, sd = 0.71, mrs. s’s class: m = 3.98, sd = 1.01, and mr. t’s class: m = 4.14, sd = 0.55; sample mean of all classes [n = 67] = 3.63, sd = 0.99; de naeghel et al., 2012). these two criteria reflect the selected teachers' excellence in terms of encouraging autonomous reading motivation. the three selected teachers agreed to participate in the present study.

2.3 data collection

for the three selected teachers, qualitative and quantitative data regarding the class and school context were collected from multiple sources to enhance data triangulation. first, semi-structured teacher interviews were conducted which questioned their own reading motivation, their perception of their students’ reading motivation, and the practice of activities at class and school level to promote reading motivation. additional semi-structured interviews were conducted with special educational needs (sen) coordinators and school leaders to explore the role of the school in promoting students’ willingness to read. sen coordinators are members of the school team with both a supportive function towards students and teachers and a coordinating function aimed at optimising the school’s sen policy. second, field notes were taken by the researcher during at least two classroom observations of different reading activities in each class.
third, two questionnaires were administered to teachers and their students to assess their reading motivation (srq-reading motivation, de naeghel et al., 2012) and execution/perception of teaching dimensions (i.e., autonomy support, structure, and involvement; teacher as a social context (tasc) questionnaire, belmont, skinner, wellborn, & connell, 1988). fourth, school documents (e.g., the school website and inspectorate reports) were analysed.

2.3.1 measurement scales

students’ autonomous reading motivation was measured with the srq-reading motivation (de naeghel et al., 2012). each of the eight items of the autonomous reading motivation subscale was administered twice, with regard to motivation for recreational reading on the one hand (e.g., “i read in my free time, because it is important for me to read”) and motivation for academic reading on the other hand (e.g., “i read for school, because it is important for me to read”). in this respect, recreational reading referred to reading in students' leisure time and academic reading was defined as reading at school and for homework. items were scored on a five-point likert scale, ranging from one (disagree a lot) to five (agree a lot). the eight-item subscales had a good internal consistency with cronbach’s α = .90 and cronbach’s α = .92 respectively. the three teachers completed a slightly adapted version of the srq-reading motivation which measured autonomous reading motivation in general (i.e., without distinguishing between the recreational and academic context) and leaving out some less age-related items (e.g., “i have to prove myself that i can get good reading grades”).
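
as an illustration of how an internal consistency coefficient of this kind is obtained, the following is a minimal sketch of cronbach’s α for a subscale of k items, computed as k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. the function name `cronbach_alpha` and the example responses are hypothetical and are not data from this study; the sketch assumes complete responses and sample variances, and it is not the procedure used by the authors.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes no missing values and uses sample variances (n - 1 denominator).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # variance of each item across respondents
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # variance of the respondents' total (summed) scores
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# hypothetical five-point likert responses (5 respondents x 4 items), made up
example = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(example), 2))
```

values closer to one indicate that the items of a subscale vary together; dropping items from a subscale changes both k and the variances that enter this computation.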
students’ perception of the teaching dimensions of autonomy support (e.g., “my teacher gives me a lot of choices about how i do my schoolwork”), structure (e.g., “my teacher doesn’t make clear what he/she expects of me in class”), and involvement (e.g., “my teacher likes me”) were assessed with the short version of the tasc questionnaire (belmont et al., 1988; sierens, vansteenkiste, goossens, soenens, & dochy, 2009). the eight-item subscales structure and involvement had an acceptable internal consistency, with cronbach’s α = .67 and cronbach’s α = .75 respectively. regarding autonomy support, four items were deleted, since they raised questions during administration and were found to be too difficult for fifth-graders. this resulted in a four-item subscale with an acceptable internal consistency, cronbach’s α = .62. items were scored on a five-point likert scale, ranging from one (disagree a lot) to five (agree a lot). teachers completed an adapted version of the tasc teacher questionnaire (belmont et al., 1988), which measured their execution of autonomy support, structure, and involvement in interaction with their students.

2.4 data analysis

data analysis consisted of two phases, a vertical and a horizontal analysis. in the vertical analysis qualitative and quantitative data on each teacher were collected and a within-case analysis was performed (miles & huberman, 1994). the interview transcripts, school documents, and field notes were labelled with descriptive codes (summarising the content of text fragments) and subsequent interpretative codes (reflecting concepts from the theoretical framework). we designed the coding scheme starting with the three teaching dimensions as described in sdt (i.e., autonomy support, structure, and involvement; ryan & deci, 2000) and further developed it in the light of the interpretative data. text fragments with the same codes were clustered and interpreted with the use of the conceptual framework of this study.
moreover, teacher and student questionnaires (srq-reading motivation, de naeghel et al., 2012; tasc, belmont et al., 1988) were analysed with spss 18. the analysis of the qualitative and quantitative data resulted in a case-specific report for each teacher which presented the data in the same format. in the second phase, the case-specific reports were subject to cross-site or horizontal analysis (miles & huberman, 1994) in which the cases were systematically compared for similarities and differences. to safeguard the quality of the data analysis, the intermediary results, interpretations, and conclusions were critically discussed by the researchers.

3. results

3.1 vertical analysis

data presented in the three case-specific reports are structured around the same themes: (1) context and teacher profile, (2) classroom design (i.e., the availability of reading material, reading promotion material, etc.) aimed at reading promotion in the class, (3) classroom strategies (i.e., teaching dimensions: autonomy support, structure, and involvement; and reading aloud), and (4) school-level strategies on reading motivation. the selection of these themes was based both on theory and empirical evidence (de naeghel & van keer, 2013; daniels & steres, 2011; fisher et al., 2011; gambrell et al., 1996; marinak & gambrell, 2007; mullis, martin, kennedy, & foy, 2007; reeve, 2002; sierens, 2010; skinner & belmont, 1993; steckel, 2009). in the case-specific reports the source of results is mentioned in parentheses. table 1 presents background information on the three selected teachers, their classes, and schools.

table 1
background information on the three selected teachers, their classes, and schools

| | mrs. k | mrs. s | mr. t |
|---|---|---|---|
| teacher: gender | female | female | male |
| teacher: age | 49 | 35 | 34 |
| teacher: teaching experience | 28 years | 10 years | 14 years |
| class: number of students | 16 | 19 | 17 |
| class: mean student age | 10.75 (0.31) | 10.91 (0.40) | 10.82 (0.45) |
| school: educational network | subsidised private (roman catholic) | subsidised private (roman catholic) | community |
| school: district type | city | city | rural |

note: standard deviation in parentheses.

3.1.1 promotion of autonomous reading motivation in mrs. k’s classroom

context and teacher profile. mrs. k is a 49-year-old teacher with 28 years of teaching experience. she teaches fifth grade in a small school located just outside the city. there are 16 students in her class, who are on average 11 years old. mrs. k spends about 100 minutes a week on reading instruction. she uses “taalsignaal” as a teaching manual for the dutch language lessons. her preferred teaching methods are whole-class instruction, small-group instruction, and independent work. mrs. k hesitates to call herself a motivated reader, since she does not spend a lot of time reading novels. on the other hand, she is interested in journals, newspapers, informative books, etc. for gathering information [teacher interview] and reports that she is an autonomously motivated reader [table 2, element a].

classroom design. approximately 40 journals and 60 informative books are on the shelves. the children’s book week (i.e., a national reading project) theme “secrets” is illustrated on the bulletin board and books by anthony horowitz are displayed on a small table [observation 1].

classroom strategies. autonomy support effected by affording choices, offering rationale, and taking the students’ perspective is not so prominent in mrs. k’s teaching style [appendix 1, elements a, c, and d]. she discusses various text genres and text fragments provided in the manual in a systematic way, posing rather standard questions: who?, what?, what about?, etc.
in her opinion, the manual offers fascinating texts and nice illustrations with the potential to promote reading pleasure [appendix 1, element b]. although both mrs. k and the school leader consider writing book reviews a questionable motivational strategy, students are required to write 10 reviews of self-selected reading material (i.e., six novels, one informative book, one comic book, and two poems) following an imposed format [appendix 1, element a]. she is enthusiastic about “panel reading” as instructional practice which implies discussing and presenting informative texts in small groups. it gives students opportunities to be more self-determined [appendix 1, element e]. after finishing their appointed tasks, students have the opportunity to read self-selected books or journals individually [appendix 1, elements a and e]. she provides structure by communicating her expectations [appendix 1, element f], offering students support when needed [appendix 1, element h], and providing positive feedback [appendix 1, element i]. mrs. k is greatly involved in interpersonal relationships with her students. she takes time for and expresses enjoyment in the interactions with her students [appendix 1, element j]. the greater attention to structure and involvement compared with autonomy support is reflected in higher scores on the related subscales in the teacher survey [table 2, element b]. her students say they perceive more structure and involvement than autonomy support, but these remain moderate [table 2, element d]. moreover, her students report moderate levels of autonomous reading motivation [table 2, element c]. next to these sdt teaching dimensions, mrs. k acknowledges the value of reading aloud to promote children’s reading motivation. she does not invest a lot of time in it, however. further, mrs. k engages in national reading projects [teacher interview and observation 1].
in the teacher interview she said: “in the children’s book week, i read aloud every day. but otherwise … i don’t have time for it, to my regret.”

school-level strategies. mrs. k’s school has a large library, founded and run by the school leader. the library is open during lunch break and puts narrative as well as informative books at students’ disposal. the collection is frequently updated to stimulate students’ curiosity. in this respect, the school leader tries to pass on his “love of reading” to children and their parents by creating a reading culture at school. moreover, he promotes national reading projects [school leader interview].

table 2
teachers’ and students' autonomous reading motivation and execution/perception of teaching dimensions

| | mrs. k | mrs. s | mr. t |
|---|---|---|---|
| teacher | | | |
| a. autonomous reading motivation [a] | 4.00 | 5.00 | 3.88 |
| b. execution of teaching dimensions | | | |
| autonomy support [a] | 3.88 | 3.63 | 3.88 |
| structure [a] | 4.14 | 4.00 | 4.00 |
| involvement [a] | 4.63 | 3.75 | 4.63 |
| students | | | |
| c. autonomous reading motivation [a] | | | |
| mean recreational reading motivation [a] | 3.15 (0.72) | 3.63 (0.73) | 3.98 (0.46) |
| mean academic reading motivation [a] | 3.11 (0.88) | 3.43 (0.80) | 3.91 (0.57) |
| d. perception of teaching dimensions | | | |
| mean autonomy support [a] | 3.56 (0.67) | 2.56 (0.68) | 3.76 (0.27) |
| mean structure [a] | 3.52 (0.52) | 3.69 (0.43) | 3.77 (0.22) |
| mean involvement [a] | 3.45 (0.57) | 3.68 (0.65) | 3.92 (0.43) |

note: [a] subscale scores range from one to five, with five indicating a higher score. standard deviation in parentheses.

3.1.2 promotion of autonomous reading motivation in mrs. s’s classroom

context and teacher profile. mrs. s is a 35-year-old teacher with 10 years of teaching experience. she teaches languages, social studies, and sciences half-time in a small school located in the city. there are 19 students in her class, who are on average 11 years old. mrs. s spends about 130 minutes a week on reading instruction. she uses “taalsignaal” as a teaching manual for the dutch language lessons.
her preferred teaching methods are whole-class instruction, small-group instruction, and independent work. mrs. s is an autonomously motivated reader, devouring novels, informative books, comics, etc. in her free time as well as for her professional development [teacher interview and table 2, element a]. classroom design. narrative and informative books are on the shelf at the back of the classroom. the collection is often renewed with books from the public library, depending on the themes discussed in the social studies and sciences lessons. approximately 500 narrative and informative books are located in a small library room near the classroom [teacher interview and observation 1]. classroom strategies. mrs. s provides autonomy support especially by fitting in with students’ interests [appendix 1, element b], offering rationales [appendix 1, element c], and providing students with opportunities to be initiators of their own behaviour [appendix 1, element e]. for instance, her students are tutors for their third-grade peers in a reading project combining direct instruction in reading comprehension strategies and cross-age peer tutoring to practice their reading skills with self-selected books. in this respect, she explicitly discusses with her students why being a good tutor and using reading strategies is important. mrs. s, the sen coordinator, and the students experience these opportunities to read together as motivating [observation 1, teacher interview, and sen coordinator interview]. after finishing their appointed tasks, students have the opportunity to work independently on additional material or to read self-selected books [appendix 1, elements a and c]. moreover, fifth-grade students write one book review on a self-chosen book (i.e., design a new cover, write a summary, and make a drawing [appendix 1, element a]). mrs.
s and the sen coordinator underline the importance of providing students with fascinating texts to promote reading pleasure. in mrs. s’s opinion, the manual does not offer enough interesting texts to practice reading comprehension. therefore, mrs. s often brings new reading material from the public library into the classroom to stimulate students’ willingness to read [appendix 1, element b]. it should be noted, however, that giving students choices occurs primarily during peer tutoring sessions [appendix 1, element a]. mrs. s provides structure by having a clear plan of the day, communicating her expectations [appendix 1, element f], and providing student support [appendix 1, element h]. as part of the reading peer tutoring project she gives direct instruction in reading comprehension strategies, supporting students’ reading competence [appendix 1, element h]. mrs. s further invests a lot in interpersonal relationships with her students. she cares about how students do in class and takes their needs into account as much as possible. in other words, she is involved [appendix 1, element j]. data from the teacher survey illustrate that she especially provides structure and to a somewhat lesser extent is involved with her students and supports their autonomy [table 2, element b]. students’ reports indicate that her students perceive more structure and involvement than autonomy support [table 2, element d]. furthermore, her students report moderate levels of autonomous reading motivation [table 2, element c]. besides implementing the sdt teaching dimensions mrs. s reads aloud frequently. in the teacher interview she said: “i bring books to read aloud to stimulate them. … reading aloud is just for fun. no questions afterwards.” mrs. s further engages in national reading projects [teacher interview]. school-level strategies. 
as mentioned above, the reading project combining direct instruction in reading comprehension strategies and cross-age peer tutoring is organised across different grades. not only fifth and third grade, but also sixth and second, and fourth and first grade read together in this school reading project. currently, the teachers themselves are responsible for running the project, coordinated by mrs. s. in the early stages of the project, the school leader and sen coordinator were more involved [teacher, school leader, and sen coordinator interview]. furthermore, the sen coordinator reads picture books in all grade levels to introduce new school projects [sen coordinator interview]. finally, there is a study group, in which mrs. s takes part, which works out new ideas regarding reading and reading promotion in staff meetings [school leader interview].

3.1.3 promotion of autonomous reading motivation in mr. t’s classroom

context and teacher profile. mr. t is 34 years old and has 14 years of teaching experience. he teaches fifth grade in a small private school in the countryside. there are 17 students in his class, who are on average 11 years old. mr. t spends about 120 minutes a week on reading instruction. he uses “taalmakker” as a teaching manual for the dutch language lessons. his preferred teaching methods are whole-class instruction and independent work. mr. t especially reads to gain knowledge. he prefers short passages in newspapers and journals, comics, and children’s books [teacher interview] and reports that he is an autonomously motivated reader [table 2, element a]. classroom design. two bookshelves filled with approximately 50 narrative and informative books and two boxes with comics are put at students’ disposal. a bean-bag seat and the step in front of the classroom provide a reading spot [observation 1]. to expand the number of books in the class library, mr.
t asks parents to donate comic books that are no longer read at home and to give a book to the class as a birthday gift instead of sweets. each week one student gets the role of librarian by lottery [teacher interview]. classroom strategies. mr. t provides autonomy support by affording choices [appendix 1, element a], fitting in with students’ interests [element b], offering rationales [element c], taking students’ perspective [element d], and providing students with opportunities to be initiators of their own behaviour [element e]. more specifically, he tries to teach reading in a meaningful context (e.g., making a class garden, solving puzzles, keeping abreast of current events [appendix 1, element c]). in his opinion, the manual offers fascinating texts for teaching reading comprehension. in addition, he brings newspaper and journal articles to the class to study, a tradition which is copied by his students [appendix 1, element b]. mr. t likes to challenge his students with group assignments (e.g., making a picture book, searching for key words in various text passages [appendix 1, element e]). he asks his students to make a drawing of a self-selected book during holidays, which is then presented in the classroom [appendix 1, element a]. after finishing their appointed tasks, students have the opportunity to read or draw [appendix 1, element e]. he provides structure by passing on his expectations [element f], providing optimal challenges [element g], offering support to his students [element h], and giving them constructive feedback [appendix 1, element i]. moreover, he is greatly involved in interpersonal relationships with his students. mr. t attaches great importance to creating a respectful classroom atmosphere and listening to students’ personal stories [appendix 1, element j]. in the teacher survey mr. t reports that he is highly involved with his students [table 2, element b].
students’ reports confirm they experience autonomy support, structure, and involvement [table 2, element d]. moreover, the students report that they are autonomously motivated to read [table 2, element c]. next to implementing the sdt teaching dimensions, mr. t reads aloud each friday afternoon to create a stimulating reading atmosphere. in the teacher interview he stated: “… children really enjoy it. i create a nice reading atmosphere, reading expressively and immersing myself in the book … and by doing so the interest of students in books certainly grows.” his students are involved in the selection of the book and each finished book results in a creative project (e.g. a play, a scale model of the village described in the book). furthermore, mr. t engages in national reading projects [teacher interview]. school-level strategies. mr. t’s school pays a lot of attention to reading. his school organises an overall reading project from kindergarten to sixth grade. the project was set up by mrs. l, the school’s sen coordinator and literacy coach (steckel, 2009; walpole & blamey, 2008), and the school leader. the project’s theme is a story about a boy, “jonah sprout,” who meets all kinds of letters during a boat trip. his boat (an old yard wagon) comes ashore in the school’s playground. in the school “jonah sprout” is represented by a puppet [sen coordinator and school leader interview]. mrs. l, the literacy coach, acts as a pioneer for all reading activities at school. she promotes the children’s book week, the reading aloud week, and poetry day (national reading projects). during staff meetings she illustrates possible activities and provides teachers with the necessary reading material. the introduction and closure of all reading activities is a collective school event. each activity is introduced by a play with “jonah sprout” in the leading role and closed with a presentation of reading activities of each grade. moreover, mrs. 
l organises a book club for students of fifth and sixth grade in “jonah sprout”'s boat. during book club time, books are discussed and approached in a creative way (e.g., reading and cooking a recipe, improvising the end of a story [sen coordinator interview]).

3.2 horizontal analysis

3.2.1 classroom strategies for promoting autonomous reading motivation

sdt’s teaching dimensions. in line with more general sdt research, the teaching dimensions of autonomy support, structure, and involvement (reeve, 2002; skinner & belmont, 1993) could be identified in each of the three cases as critical strategies for promoting autonomous reading motivation in particular. the selected teachers, however, especially differ in the extent and manner of the autonomy support they provide. as mentioned above, autonomy support primarily nurtures students’ need for autonomy or self-determined behaviour (ryan & deci, 2000). students’ autonomy is first supported by giving students age-appropriate choices (appendix 1, element a; reeve, 2002; skinner & belmont, 1993). in the three cases this is mainly reflected in opportunities to select books for independent reading and book reviews. whereas mrs. k provides an imposed format for the book reviews, mrs. s and mr. t allow more creativity and personal input. in addition, mr. t occasionally provides choices between different assignments. in all three cases, however, there are still opportunities to enlarge the number of choices regarding what students read and how they engage in and complete reading tasks (gambrell, 2011). second, the three teachers recognise the importance of fitting in with students’ interests to promote autonomous reading motivation (appendix 1, element b; reeve, 2002; skinner & belmont, 1993). in this respect, mrs. s and mr. t in particular bring supplementary reading material into the classroom related to topics studied in social studies and sciences, students’ social environment, or the news.
furthermore, each of the three teachers has a classroom library, containing narrative and informative books, and sometimes comics or journals, which are at students’ disposal during independent reading. third, students’ autonomy is encouraged by offering rationales (appendix 1, element c; skinner & belmont, 1993). mrs. s clearly explains to her students why she teaches certain topics. mr. t, on the other hand, tries to offer a rationale by teaching reading in a meaningful context. in contrast, mrs. k does not seem to invest a lot of effort in this strategy. fourth, taking the students’ perspective was only explicitly observed in mr. t’s classroom (appendix 1, element d; reeve, 2002; skinner & belmont, 1993). finally, the three selected teachers apply various instructional strategies, such as panel reading, group work, cross-age peer tutoring, and independent reading, that allow students to be more self-determined or volitional and therefore fulfil the need for autonomy and encourage autonomous motivation for reading (appendix 1, element e; reeve, 2002; sierens, 2010; skinner & belmont, 1993). in general, mr. t provides the highest level of autonomy support by providing choices, fitting in with students’ interests, teaching reading in a meaningful context, taking the students’ perspective, and providing opportunities to his students to be initiators of their own behaviour [appendix 1, elements a to e]. the student questionnaires corroborate this: his students perceive the highest level of autonomy support [table 2, element d] and, moreover, report the highest level of recreational and academic autonomous reading motivation [table 2, element c]. the fact that mr. t’s students indicate not only the highest perceived autonomy support but also the highest level of autonomous reading motivation is certainly an argument in favour of his autonomy-supportive teacher behaviour (reeve, 2002; skinner & belmont, 1993).
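the ordering of the three classes described here can be checked directly against the student means in table 2. the following python sketch is illustrative only: the dictionary layout and the helper name `rank_teachers` are assumptions introduced for this example and are not part of the study's analysis; the values are the student-reported means copied from table 2 (standard deviations omitted).

```python
# student-reported means from table 2 (subscale range 1-5).
# the data structure and helper below are hypothetical, for illustration only.
student_means = {
    "mrs. k": {"recreational": 3.15, "academic": 3.11, "autonomy support": 3.56},
    "mrs. s": {"recreational": 3.63, "academic": 3.43, "autonomy support": 2.56},
    "mr. t": {"recreational": 3.98, "academic": 3.91, "autonomy support": 3.76},
}


def rank_teachers(means, measure):
    """return teacher labels ordered from highest to lowest mean on `measure`."""
    return sorted(means, key=lambda teacher: means[teacher][measure], reverse=True)


for measure in ("autonomy support", "recreational", "academic"):
    print(measure, "->", rank_teachers(student_means, measure))
```

the sketch only orders the reported means; it says nothing about whether the differences between classes are statistically significant.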
according to the interpretative data, mrs. k, in contrast, appears to be the least autonomy-supportive of the three participating teachers. a closer look at the results of the teacher and student questionnaire, however, suggests that mrs. k and mr. t report equal practice of autonomy-supportive behaviour [table 2, element b]. furthermore, mrs. k's students perceive a higher level of autonomy support than mrs. s’s students [table 2, element d], although her students do report lower levels of recreational and academic autonomous reading motivation [table 2, element c]. this finding illustrates how differences in research methods (i.e., interpretative or quantitative) can lead to different perspectives and conclusions, as detailed observation and questioning of stakeholders (i.e., interpretative methods) and information gathering by surveys (i.e., quantitative methods) probably do not address the research questions in exactly the same manner. nevertheless, these methods can jointly help to create a fuller and more nuanced picture of what exactly happens in the classroom. the teaching dimension structure, which promotes students’ need for competence (reeve, 2002; sierens, 2010; skinner & belmont, 1993), is more or less equally addressed by the three case study teachers. all three communicate their expectations to the students, offer help and support, and provide positive feedback [appendix 1, elements f to i]. in addition, mrs. s invests the most time in explicitly teaching reading comprehension strategies to foster students’ competence in reading [appendix 1, element h]. mr. t invests the most in providing optimal challenges by giving stimulating group tasks [appendix 1, element g]. results of the teacher questionnaire corroborate the roughly equal levels of structure in their classrooms [table 2, element b], although the students of mrs. s and mr.
t experience structure related to teaching practices slightly more in their classrooms [table 2, element d]. the teaching dimension of involvement, associated with the need for relatedness (reeve et al., 2004; skinner & belmont, 1993), is most prominent in the teaching style of the three selected teachers. all three invest a lot in interpersonal relationships with their students by explicitly making time to listen to students’ personal stories and interests, expressing enjoyment in the interaction with their students, and taking students’ needs into account [appendix 1, element j]. furthermore, mrs. s and her school’s sen coordinator explicitly point to the importance of reading together as a motivating strategy, confirming the relevance of involvement between students (reeve et al., 2004; skinner & belmont, 1993) and opportunities to collaborate as in cori (guthrie & cox, 2001). according to the teachers’ responses in the teacher questionnaire, mrs. k and mr. t seem to be most highly involved with their students. moreover, mr. t receives the highest score on involvement from his students [table 2, elements b and d], corroborating the qualitative interview and observational data [appendix 1, element j].

reading aloud. next to the teaching dimensions of autonomy support, structure, and involvement, reading aloud is recognised as an important strategy to promote autonomous reading motivation in the three cases (pecjak & kosir, 2008). in particular, mrs. s and mr. t often read aloud to stimulate their students’ reading behaviour, whereas mrs. k reports that she generally does not have enough time for it. further, mr. t explicitly indicates that he creates a stimulating reading atmosphere and involves his students in the selection of the reading material.

3.2.2 school-level strategies for reading promotion

the three participating teachers belong to schools which recognise the importance of reading. in mrs.
k’s school the presence of a large library and the dedication of the school leader to managing the library communicate to teachers, students, and parents how strongly reading is appreciated by the school, and, hence, that encouraging reading is significant. in mrs. s’s school, mrs. s plays a prominent role herself in coordinating a reading project which combines direct instruction in reading comprehension strategies with cross-age peer tutoring across different grades and in participating in a study group on reading and reading promotion. moreover, the sen coordinator of her school reads picture books in all grade levels. the school leader and literacy coach of mr. t’s school organise an overall reading project from kindergarten to sixth grade. additionally, the literacy coach supports the teachers in promoting reading in their classroom and organises a book club for fifth and sixth graders. in sum, each of the three teachers belongs to a school that palpably acknowledges the importance of reading and therefore confirms that a school culture focusing on school-wide reading has potential to encourage teachers’ and students’ engagement (daniels & steres, 2011) and motivation for reading.

4. discussion and conclusion

in order to break through the declining trend in reading motivation throughout children’s educational careers, it is important to identify strategies which enable teachers to encourage students’ autonomous reading motivation. in this respect, the present study furthers an underexposed field in reading motivation research by studying and identifying the strategies of teachers excellent in the promotion of volitional or autonomous reading motivation. sdt formulates general guidelines or teaching dimensions to facilitate autonomous motivation (ryan & deci, 2000).
these general teaching dimensions of autonomy support, structure, and involvement could be identified as critical strategies to promote autonomous motivation for reading in the classroom practice of the selected teachers. in this respect, the present study points to the theoretical significance of adopting these teaching dimensions in reading motivation research, as the sdt teaching dimensions have rarely been explicitly studied in the specific context of reading motivation before and on the basis of our results appear to be transferable and relevant to this research area. it should be noted that the participating teachers more or less equally addressed the teaching dimensions of structure and involvement, whereas they differed particularly in the extent and manner of the autonomy support they provided. this indicates that even some of the selected teachers apparently invest less in or have more difficulties with supporting their students’ autonomy and suggests autonomy support is a powerful strategy with opportunities for growth. next to the significance of the sdt teaching dimensions, the results confirm the relevance of reading aloud as an effective classroom strategy to stimulate students’ willingness to read (fisher et al., 2004; gambrell et al., 1996; pecjak & kosir, 2008). further research is, however, needed to collect more detailed information on teachers’ specific approach to reading aloud (lane & wright, 2011). what is of interest as well is that the teachers considered as excellent in promoting autonomous reading motivation belong to schools that invest in reading at school level, underlining the importance of a school-wide interest in and attention to reading (daniels & steres, 2011; taylor et al., 2010).
as the role of the school and school culture is still underexposed in reading motivation research, follow-up studies could enlarge their focus to how schools (e.g., school members [teachers, school leaders, literacy coaches, etc.], policy, projects, and curriculum) contribute to a supportive reading environment in order to formulate additional guidelines for school practice. the identified strategies for promoting autonomous reading motivation are of particular importance for teaching practice and accordingly for teachers’ professional development in both pre-service and in-service training. considering the significant influence of the home environment on students’ reading motivation (swalander & taube, 2007), teachers can play a crucial role in positively motivating all of their students to read (gambrell, 1996; santa et al., 2000). in this way, they invest in equipping their students with the necessary reading competencies to be successful in today’s society, striving for equal opportunities for all. further, the identified strategies are valuable as tools for reflection on and improvement of teachers’ and schools’ reading promotion approach and practice. first, it appears that the sdt teaching dimensions (reeve, 2002; skinner & belmont, 1993) can be implemented and integrated relatively easily in classroom practice, as they merely involve a change of attitude and awareness of sdt’s frame of reference. in this respect, teachers can make their own reading activities more supportive of autonomous reading by applying the sdt teaching dimensions (e.g., providing choices between different reading materials, offering rationales for learning activities, providing positive feedback to their students) without having to make time-consuming changes to their reading curriculum.
as mentioned above, teachers can invest particularly in making their reading activities more autonomy-supportive (e.g., providing choices between different activities, matching students' interests, taking the students’ perspective), as even teachers indicated as excellent in promoting autonomous reading motivation still have opportunities for growth. additionally, as reading aloud remains an important and valuable activity in late primary education, teachers can invest more time in reading aloud in class to stimulate children’s interest in reading. they can underline its significance for instance by scheduling reading aloud in the plan for the week. moreover, teachers and schools can be inspired by the described school-level reading strategies to further their own school-wide reading policy. this study focused on the strategies of teachers considered to be excellent in promoting autonomous reading motivation. it can be expected that the identified strategies will be less explicitly present in the daily classroom practice and schools of teachers who are less excellent or even rather poor at promoting autonomous reading motivation. hence, these strategies can function as guidelines to improve their reading activities. nevertheless, further research should offer insight into the classroom practices of teachers who are less than excellent in promoting autonomous reading motivation and explore possibilities to improve their skills through teacher training. in sum, the present study points to the theoretical and practical significance of adopting sdt’s teaching dimensions (i.e., autonomy support, structure, and involvement) as well as to reading aloud as critical strategies to encourage students’ autonomous reading motivation in the classroom. moreover, a school culture supporting students' and teachers' interest in reading is essential.
keypoints

this study extends sdt research by applying sdt in research on primary school students and reading motivation and by adopting an embedded mixed-method design.
this study contributes to reading motivation research by identifying the strategies of teachers excellent in the promotion of reading motivation.
this study indicates autonomy support, structure, and involvement as critical strategies to promote autonomous reading motivation in the classroom.
this study confirms the relevance of reading aloud as an effective classroom strategy to stimulate students’ willingness to read.
this study builds on the literature on reading motivation by highlighting the critical role of the school’s reading culture for teachers’ practices.

acknowledgements

this research was supported by a grant from the special research fund of ghent university (bijzonder onderzoeksfonds universiteit gent).

references

baker, l., & wigfield, a. (1999). dimensions of children's motivation for reading and their relations to reading activity and reading achievement. reading research quarterly, 34, 452-477. doi: 10.1598/rrq.34.4.4
becker, m., mcelvany, n., & kortenbruck, m. (2010). intrinsic and extrinsic reading motivation as predictors of reading literacy: a longitudinal study. journal of educational psychology, 102, 773-785. doi: 10.1037/a0020084
belmont, m., skinner, e., wellborn, j., & connell, j. (1988). teacher as social context: a measure of student perceptions of teacher provision of involvement, structure, and autonomy support [tech. rep. no. 102]. rochester, ny: university of rochester.
black, a. e., & deci, e. l. (2000). the effects of instructors' autonomy support and students' autonomous motivation on learning organic chemistry: a self-determination theory perspective. science education, 84, 740-756. doi: 10.1002/1098-237x(200011)84:6<740::aid-sce4>3.0.co;2-3
chatzisarantis, n. l. d., & hagger, m. s. (2009). effects of an intervention based on self-determination theory on self-reported leisure-time physical activity participation. psychology & health, 24, 29-48. doi: 10.1080/08870440701809533
creswell, j. w., & plano clark, v. l. (2007). designing and conducting mixed methods research. thousand oaks, ca: sage publications.
daniels, e., & steres, m. (2011). examining the effects of a school-wide reading culture on the engagement of middle school students. research in middle level education online, 35, 1-13.
deci, e. l., & ryan, r. m. (1987). the support of autonomy and the control of behavior. journal of personality and social psychology, 53, 1024-1037. doi: 10.1037//0022-3514.53.6.1024
de naeghel, j., van keer, h., vansteenkiste, m., & rosseel, y. (2012). the relation between elementary students’ recreational and academic reading motivation, reading frequency, engagement, and comprehension: a self-determination theory perspective. journal of educational psychology, 104, 1006-1021. doi: 10.1037/a0027800
de naeghel, j., & van keer, h. (2013). the relation of student and class-level characteristics to primary school students’ autonomous reading motivation: a multilevel approach. journal of research in reading, 36, 351-370. doi: 10.1111/j.1467-9817.2013.12000.x
edmunds, k. m., & bauserman, k. l. (2006). what teachers can learn about reading motivation through conversations with children. the reading teacher, 59, 414-424. doi: 10.1598/rt.59.5.1
fisher, d., flood, j., lapp, d., & frey, n. (2004). interactive read-alouds: is there a common set of implementation practices? the reading teacher, 58, 8-17. doi: 10.1598/rt.58.1.1
gambrell, l. b. (1996). creating classroom cultures that foster reading motivation. the reading teacher, 50, 14-25.
gambrell, l. b. (2011). seven rules of engagement: what's most important to know about motivation to read. the reading teacher, 65, 172-178. doi: 10.1002/trtr.01024
gambrell, l. b., palmer, b. m., codling, r. m., & mazzoni, s. a. (1996). assessing motivation to read. the reading teacher, 49, 518-533. doi: 10.1598/rt.49.7.2
gaskins, i. w. (2008). ten tenets of motivation for teaching struggling readers – and the rest of the class. in r. fink & s. j. samuels (eds.), inspiring reading success. interest and motivation in an age of high-stakes testing (pp. 98-116). newark, de: international reading association.
guthrie, j. t. (2004). classroom contexts for engaged reading: an overview. in j. t. guthrie, a. wigfield, & k. c. perencevich (eds.), motivating reading comprehension. concept-oriented reading instruction (pp. 1-24). mahwah, nj: lawrence erlbaum associates.
guthrie, j. t., & cox, k. e. (2001). classroom conditions for motivation and engagement in reading. educational psychology review, 13, 283-302. doi: 10.1006/ceps.1999.1044
guthrie, j. t., mcrae, a., & klauda, s. l. (2007). contributions of concept-oriented reading instruction to knowledge about interventions for motivations in reading. educational psychologist, 42, 237-250. doi: 10.1080/00461520701621087
guthrie, j. t., & wigfield, a. (2000). engagement and motivation in reading. in m. l. kamil, p. b. mosenthal, p. d. pearson, & r. barr (eds.), handbook of reading research: volume iii (pp. 403-422). mahwah, nj: lawrence erlbaum associates.
guthrie, j. t., wigfield, a., humenick, n. m., perencevich, k. c., taboada, a., & barbosa, p. (2006). influences of stimulating tasks on reading motivation and comprehension. journal of educational research, 99, 232-245. doi: 10.3200/joer.99.4.232-246
guthrie, j. t., wigfield, a., & vonsecker, c. (2000). effects of integrated instruction on motivation and strategy use in reading. journal of educational psychology, 92, 331-341. doi: 10.1037/0022-0663.92.2.331
ivey, g., & broaddus, k. (2001). “just plain reading”: a survey of what makes students want to read in middle school classrooms. reading research quarterly, 36, 350-377. doi: 10.1598/rrq.36.4.2
lane, h. b., & wright, t. l. (2011). maximizing the effectiveness of reading aloud. the reading teacher, 60, 668-675. doi: 10.1598/rt.60.7.7
marinak, b. a., & gambrell, l. b. (2007). choosing and using informational text for instruction in the primary grades. in b. j. guzzetti (ed.), literacy for a new millennium: early literacy (pp. 141-154). westport, ct: praeger.
meyer, l. a., wardrop, j. l., linn, r. l., & hastings, c. n. (1993). effects of ability and settings on kindergartners’ reading performance. journal of educational research, 86, 142-160. doi: 10.1080/00220671.1993.9941153
miles, m., & huberman, m. (1994). qualitative data analysis: an expanded sourcebook. thousand oaks, ca: sage.
mohan, l., lundeberg, m. a., & reffitt, k. (2008). studying teachers and schools: michael pressley's legacy and directions for future research. educational psychologist, 43, 107-118. doi: 10.1080/00461520801942292
morrow, l. m., & gambrell, l. b. (2002). literature-based instruction in the early years. in s. b. neuman & d. k. dickinson (eds.), handbook of early literacy research (pp. 348-360). new york: guilford.
mouratidis, m., vansteenkiste, m., lens, w., & sideridis, g. (2008). the motivating role of positive feedback in sport and physical education: evidence for a motivational model. journal of sport & exercise psychology, 30, 240-268.
mullis, i. v. s., martin, m. o., kennedy, a. m., & foy, p. (2007). iea’s progress in international reading literacy study in primary school in 40 countries. chestnut hill, ma: timss & pirls international study center, boston college.
organisation for economic co-operation and development [oecd]. (2004). learning for tomorrow’s world. first results from pisa 2003. paris: oecd.
pecjak, s., & kosir, k. (2008). reading motivation and reading efficiency in third and seventh grade pupils in relation to teachers' activities in the classroom. studia psychologica, 50, 147-168.
pressley, m., rankin, j., & yokoi, l. (1996). a survey of instructional practices of primary teachers nominated as effective in promoting literacy. elementary school journal, 96, 363-384. doi: 10.1086/461834
reeve, j. (2002). self-determination theory applied to educational settings. in e. l. deci & r. m. ryan (eds.), handbook of self-determination research (pp. 183-203). rochester, ny: university of rochester press.
reeve, j. (2009). understanding motivation and emotion. hoboken, nj: john wiley & sons, inc.
reeve, j., jang, h., carrell, d., jeon, s., & barch, j. (2004). enhancing students' engagement by increasing teachers' autonomy support. motivation and emotion, 28, 147-169. doi: 10.1023/b:moem.0000032312.95499.6f
ryan, r. m., & deci, e. l. (2000). self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. american psychologist, 55, 68-78. doi: 10.1037/0003-066x.55.1.68
santa, c. m., williams, c. k., ogle, d., farstrup, a. e., au, k. h., baker, b. m., et al. (2000). excellent reading teachers: a position statement of the international reading association. journal of adolescent & adult literacy, 44, 193-199.
soenens, b., & vansteenkiste, m. (2005). antecedents and outcomes of self-determination in three life domains: the role of parents' and teachers' autonomy support. journal of youth and adolescence, 34, 589-604. doi: 10.1007/s10964-005-8948-y
sierens, e. (2010). autonomy-supportive, structuring, and psychologically controlling teaching: antecedents, mediators, and outcomes in late adolescents [unpublished dissertation]. leuven: kuleuven.
sierens, e., vansteenkiste, m., goossens, l., soenens, b., & dochy, f. (2009). the interactive effect of perceived teacher autonomy support and structure in the prediction of self-regulated learning. british journal of educational psychology, 79, 57-68. doi: 10.1348/000709908x304398
skinner, e. a., & belmont, m. j. (1993). motivation in the classroom: reciprocal effects of teacher behavior and student engagement across the school year. journal of educational psychology, 85, 571-581. doi: 10.1037/0022-0663.85.4.571
steckel, b. (2009). fulfilling the promise of literacy coaches in urban schools: what does it take to make an impact? the reading teacher, 63, 14-23. doi: 10.1598/rt.63.1.2
swalander, l., & taube, k. (2007). influences of family based prerequisites, reading attitude, and self-regulation on reading ability. contemporary educational psychology, 32, 206-230. doi: 10.1016/j.cedpsych.2006.01.002
taylor, b. m., pearson, p. d., clark, k., & walpole, s. (2010). effective schools and accomplished teachers: lessons about primary-grade reading instruction in low-income schools. the elementary school journal, 101, 121-165. doi: 10.1086/499662
tessier, d., sarrazin, p., & ntoumanis, n. (2008). the effects of an experimental programme to support students' autonomy on the overt behaviours of physical education teachers. european journal of psychology of education, 23(3), 239-253. doi: 10.1007/bf03172998
vanderburg, m., & stephens, d. (2010). the impact of literacy coaches: what teachers value and how teachers change. the elementary school journal, 111, 141-163. doi: 10.1086/653473
vansteenkiste, m., simons, j., lens, w., soenens, b., & matos, l. (2005). examining the motivational impact of intrinsic versus extrinsic goal framing and autonomy-supportive versus internally controlling communication style on early adolescents' academic achievement. child development, 76, 483-501. doi: 10.1111/j.1467-8624.2005.00858.x
walpole, s., & blamey, k. l. (2008). elementary literacy coaches: the reality of dual roles. the reading teacher, 62, 222-231. doi: 10.1598/rt.62.3.4
watkins, m. w., & coffey, d. y. (2004).
reading motivation: multidimensional and indeterminate. journal of educational psychology, 96, 110-118. doi: 10.1037/0022-0663.96.1.110 whitehurst, g.j., zevenbergen, a.a., crone, d.a., schultz, m.d., velting, o.n., & fischel, j.e. (1999). outcomes of an emergent literacy intervention from head start through second grade. journal of educational psychology, 91, 261–272. doi: 10.1037/0022-0663.91.2.261 wigfield, a., guthrie, j. t., perencevich, k. c., taboada, a., klauda, s. l., mcrae, a., & barbosa, p. (2008). role of reading engagement in mediating effects of reading comprehension instruction on reading outcomes. psychology in the schools, 45, 432-445. doi:10.1002/pits.20307 yin, r. k. (1989). case study research: design and methods. thousand oaks, ca: sage. appendix 1 examples of the execution of sdt’s teaching dimensions in mrs. k’s, mrs. s’s, and mr. t’s classroom reading activities examples and illustrations mrs. k mrs. s mr. t strategies to promote autonomous reading motivation autonomy support a. providing choices students write ten reviews of self-selected reading material, i.e., six novels, one informational book, one comic book, and two poems, following an imposed format. “i oblige them a little, since there are children who wouldn’t read anything otherwise. … they have still read something and maybe it motivates them to choose a book on their own … but possibly it lets them take a dislike to reading.” [teacher interview] during a group assignment students have the opportunity to choose their group members. [observation 1] students choose books or journals for independent reading. [observation 1] students write one book review on a self-chosen book (e.g., design a new cover, write a summary, and make a drawing [teacher interview]). students choose reading books during cross-age peer tutoring sessions and independent reading. [observation 1] every holiday students make a drawing of a self-selected book, which is presented in class afterwards. 
[teacher interview]
students are involved in the selection of the book that is read aloud. [teacher interview]
during a group assignment students have the opportunity to choose between different text passages, e.g. newspaper, children’s newspaper, difficult sentences, … [observation 1]
during a group assignment students have the opportunity to choose their partner. [observation 2]
students choose books or comics for independent reading. [observation 1]

b. fitting interests
according to mrs. k the manual offers fascinating texts and nice illustrations. [teacher interview]
students are very enthusiastic about the mystery theme of the group assignment. [observation 1]
mrs. s often brings new reading material from the public library into the classroom, since the manual does not offer many interesting texts in her opinion. [teacher interview and observations]
“a story should be exciting, certainly for that age! … and some children, not all of them of course, are interested in all kinds of details about famous historical figures.” [teacher interview]
“we try to relate lessons to real-world experiences, if possible … to involve the children and stimulate their interests. … however, i still have to impose the material that i have to teach.” [teacher interview]
“i think the most important thing in motivating children is matching their interests.” [sen coordinator interview]
in mr. t’s opinion the manual offers fascinating texts. [teacher interview]
mr. t brings newspaper and journal articles to the class to study with his students. [teacher interview and observation 1]
“i search for age-appropriate things (to read) such as first love. the giggling, the familiarity, that’s great … i especially start from reality. … in short, … situations from their environment.” [teacher interview]
“don’t force them. show them that there is something about their interests, perhaps a journal, an informative book, a novel, ...
there is something for everyone.” [teacher interview]

c. offering rationale
“clearly indicating lesson goals … i don’t do that.” [teacher interview]
mrs. s discusses with her students why being a good tutor and using reading strategies is important. [observation 1]
“i try to communicate why we do certain things. for instance, when i started class this morning i clearly indicated what we would do and why. it motivates them to engage in the activity.” [teacher interview]
mr. t teaches reading in a meaningful context, e.g., making a class garden, solving puzzles, reading about current events, … [teacher interview]
“… offering a whole range of possibilities, preferably as integrated as possible … makes them realize the relevance …” [school leader interview]

d. taking students’ perspective
reciting some difficult sentences, a boy stumbles over his words. students laugh. the boy feels mocked, sits down on the ground, and starts crying. mr. t lets him know that it is okay, accepts his emotional outburst, gives him some time, and talks to him during playtime. [observation 1]

e. initiator of own behaviour
during “panel reading” students discuss and present informative texts in small groups. “the students really like panel reading.” [teacher interview]
after finishing the appointed tasks, students have the opportunity to read. [teacher interview]
the timing during the first group assignment is very restrictive. students have seven minutes to find the answer to some questions on the blackboard concerning the author of a book. “still three minutes. … still 30 seconds! … stop!” [observation 1]
fifth-graders are tutors for their third-grade peers in a reading project combining direct instruction in reading comprehension strategies and cross-age peer tutoring.
“… in reading comprehension children often work together … most children enjoy working together.” [teacher interview and observation 1]
“certainly reading together … stimulating by reading together.” [sen coordinator interview]
after finishing the appointed tasks, students can work on additional material or read a book. “… additional material to do on their own has to be fun and motivating and can be completed at their own speed.” [teacher interview]
mr. t provides challenging group tasks such as making a picture book, creating a play, making a class garden, … “in the spring we make a class garden. students look for a step-by-step plan to make the garden.” [teacher interview and observations]
after finishing the appointed tasks, students have the opportunity to read or draw. [teacher interview]

structure

f. communicating expectations
mrs. k communicates step by step what the children are expected to do in the group assignment. first, make three groups of three and two groups of four. go with your group to a computer and open the dutch webpage of wikipedia. search for an answer to the following questions. … [observation 1]
“formulate this in a sentence, please.” [observation 1]
“planning is very important for children, … knowing first we will do this, and afterwards that, …” [teacher interview]
mr. t clearly communicates how to fulfil the group assignment. first, go to your group. second, choose a group leader. third, turn over the sheet with the assignment. … [observation 1]

g. providing optimal challenges
mr. t provides challenging group tasks such as making a picture book, creating a play, making a class garden, … [teacher interview and observations]

h. offering help and support
the teacher drops hints to help the children find the right answer to the riddles and questions in the group assignment. [observation 1]
mrs. s provides direct instruction on reading comprehension strategies.
[teacher interview]
after reading the text, she helps the children to answer the more difficult questions. for instance, she rereads a certain passage aloud. [observation 2]
the teacher drops hints on how to decode the mysterious title of one of the assignments on the blackboard. [observation 1]
mr. t suggests strategies for the social studies and sciences’ test. [observation 1]

i. providing positive feedback
“well read.” [observation 1]
“very good, brief and to the point.” [observation 1]
“and giving positive feedback. great! stimulate. okay, doesn’t matter, chin up.” [teacher interview]

involvement (j.)
the children may whisper the answer to the riddles or questions in mrs. k’s ear. [observation 1]
when mrs. k reads a book, she asks the children to come and sit around her with cushions. [observation 1]
“i believe the school library encourages a strong exchange between students. … some children say to each other: i read a nice book. you should certainly read it too!” [teacher interview]
“who is already thinking about his future?” children enthusiastically tell the teacher their dreams about future professions. mrs. s takes time to listen to her students’ stories. [observation 2]
“i try to take the children into account as much as possible.” [teacher interview]
“i listen to their story, their interests, their favourite books … i encourage their participation.” [teacher interview]
“we become equal, respecting each other. i respect them, they respect me.” [teacher interview]
“i listen if there is something they want to tell me … i go to a soccer game, a dance show which my students are taking part in. i show my interest in more than the regular lessons, e.g. how was soccer or rope skipping? i have a little chat with them in the playground. i just make sure that they like me and vice versa.” [teacher interview]
one of the students talks about difficulties at home. mr. t listens carefully and gives moral support.
[observation 1]

frontline learning research vol. 8 no. 2 (2020) 109-130
issn 2295-3159

expanding the notion of global learning: turkish-dutch teens’ networked configurations for learning

aslı ünlüsoy, mariëtte de haan
utrecht university, the netherlands

article received 25 october 2018 / revised 28 february 2020 / accepted 6 march / available online 7 may

abstract

digital technology facilitates interactions between learners and resources at a global level. new learner prototypes are therefore proposed, such as the notion of the global learner. in this paper, we argue that these prototypes of global learning often do not account for the variety of ways in which youth use technology and see themselves as learners. we take the example of turkish-dutch youth to show empirically how they represent an alternative to what is often seen as the prototype of a global learner. we combine ego-network methodology with in-depth interviews to provide a detailed account of how 25 turkish-dutch teens see themselves as learners, how they make use of technology to pursue their interests, how they reach out to others and media resources, and how they form selves in relation to the values and norms of their (transnational) community. using the notion of ‘learner identity’, the study shows how these teens develop learner identities that are built on specific and culturally informed notions of ‘what a learning subject is’ that challenge the universality of the autonomous subjectivity implied in prototypical notions of the global learner. in addition, the study shows how, through digital affordances, unique networked (trans)national connectivities are formed, which are informed by these teens’ specific socio-cultural position. we argue that by acknowledging these alternative understandings of what a learning subject is, and of how connections are formed, we can proactively incorporate them as useful models of global learning.
keywords: connected learning; global learner; learner identity; turkish-dutch teens; ego-network analysis

corresponding author email: a.unlusoy@uu.nl
doi: https://doi.org/10.14786/flr.v8i2.423

1. introduction: aim and scope

in the learning sciences, new prototypical notions of learning have been put forward that correspond to the possibilities and challenges of the digital era. these notions foreground the informal domain as a space where learning takes place and oppose or challenge traditional models for schooling. for instance, inspired by the possibilities of gathering an endless amount of resources on the internet and connecting with likeminded others to explore these resources, notions such as ‘affinity spaces’ (gee, 2005), ‘connected learning’ (ito, gutiérrez, livingstone, penuel, rhodes, salen, et al., 2013), or ‘personalized e-learning’ (o'donnell, lawless, sharp, & wade, 2015) have arisen. a similar example of a technology-driven prototypical model of learning is the notion of ‘global learning’. inspired by the possibilities of utilizing technology to facilitate interactions between learners of different cultures, which, in principle, provides learners with the opportunity to develop global perspectives, the notion of global learning has been put forward to inspire educational reform (gibson, rimmington, & landwehr-brown, 2008). these concepts have in common that they put the learners’ personal engagement at the centre, as well as the learners’ ability to gather (digital) resources based on this personal engagement. as such, they challenge traditional, authority-driven models of learning, in which knowledge distribution by institutions is the norm. against the background of these discussions about new metaphors and models for learning in the digital age, our ambition with this paper is to expand our ideas of what a global learner might be.
we do so by showing empirically how turkish-dutch youth develop particular socio-culturally informed ‘learner identities’ as well as unique networked (trans)national connectivities that challenge dominant metaphors of learning in the digital age. as we hope to show, they challenge the image of the autonomous, individualistic self implied in these ideals of global learning, as well as the idea that connectivity revolves around the agentic efforts of the individual learner. adopting a perspective on learning as socially and culturally situated, this paper argues that such situated perspectives seem to be forgotten with the launching of 21st century notions of learning. therefore, the paper seeks to extend such a perspective to learning in the 21st century and to new models for learning. in this paper, we build upon earlier work (de haan, leander, ünlüsoy, & prinsen, 2014, p. 508) in which we argued for a critical reconsideration of “idealized digital connectivities for learning”. in work on these idealized connectivities, the suggestion is made that “people are optimally networked so that resources are equally available, shared and voiced, and participation possibilities are maximized” (p. 508). we have argued that everyday social practices of connectivity reflect a much more nuanced and differentiated reality, based on the idea that ‘connectivities for learning’ are social practices that are situated over time, socially constructed, and informed by specific cultural norms and values. we have proposed that personal networks as a unit of analysis are a good starting place to explore these nuances and we have coined the term ‘networked configurations for learning’ (ncl) to refer to the idea that connectivities for learning are diverse and socially situated. in this paper, we expand our earlier argument on the specificity of connectivities.
first, in this paper we provide a more detailed account of one group of learners, turkish-dutch youth, on whom we have gathered more ethnographic data than in the earlier paper. second, we make use of this sample to elaborate more extensively on how the notion of ‘what a learner is’ can be socio-culturally specific. we draw on the idea of ‘learner identity’ put forward by sinha (1999), who has argued that being or knowing how to be a particular kind of learner is not something that is ‘given’ or universal but rather something that is formed in socialization practices associated with particular communities. third, in this paper we elaborate more explicitly on how digital connectivities are part of global-local dynamics shaped by both migration and digital technology. in particular, we focus on the transformative potential of these mobilities for learning, by showing how ‘to be here and there at the same time’ and how being a member of several normative communities simultaneously provides unique opportunities for learners. the study thus provides an empirical record of what we think of as an ‘atypical case’ of a 21st century learner. the study documents turkish-dutch immigrants’ use of technological affordances to expand their learning and then asks how their efforts relate to the personalized, individually engaged learner pictured in new prototypical notions of learning. before we present our theoretical take on learning as a cultural and situated phenomenon, and how this relates to notions of connectivity and new technologies, we give a brief overview of how globalization and new technologies have spurred new notions of learning (1.1) as well as how teens from minority backgrounds constitute a good example of how technology is adopted in particular ways, related to the dynamics of migration (1.2).

1.1 global societies and new notions for learning

we live in an era that is marked by abundant information and almost constant exposure to it.
news headlines, blog, vlog and status updates, tweets, social media feeds, emails and text messages ask for our attention, not only as the recipients of the information but also as its distributors, co-creators, and recyclers. new information and communication technologies (ict) are widely acknowledged for their role in lowering the threshold of information access for everyone and in enabling new ways to interact. however, these changes are not only dependent on the influence of technologies. how people use these technologies is strongly related to who they are and to their social, cultural and material environment. the dynamic interplay between technology and identity eventually also shapes the ways in which people interact, socialize and learn, and can create specific socio-technical practices and divides in this respect (hildreth & kimble, 2004). knowledge production and consumption in so-called ‘global’ societies happens at geographically dispersed scales. in globalized information and knowledge societies, individuals are not only part of relatively homogeneous locally based communities, but, at the same time, they are members of many different, locally and globally dispersed networks, which provide them with unique and tailored possibilities to find knowledge and learn in these networks (farrell, 2006). this idea resonates with the more general concept of networked individualism, which addresses how we relate to people in the digital age (rainie & wellman, 2012). rainie and wellman (2012) observe that in the past, personal networks used to be mainly defined by small, densely knit local groups, and communication was primarily face-to-face and location-dependent. now, individuals are much less constrained by geographical boundaries, and even though traditional social spaces defined by, for instance, kinship relationships, neighbourhood and work remain important, they are no longer the only sites for socialization.
according to castells (2007), these changes also mean a shift from a more hierarchically structured social system to a more networked and participatory one, which is profoundly transformative for individuals as well as for the foundations of society as we know it. some have argued that this development fundamentally changes the way we learn, while simultaneously causing a greater diversification of the possibilities to learn. an example of such work is developed in alignment with the notion and educational ideal of ‘connected learning’ (ito et al., 2013). connected learning, which is enabled through new digital infrastructures in globalized societies, is defined as learning that is socially embedded, interest-driven, and oriented towards educational, economic, or political opportunity. basically, the premise is that new digital infrastructures and networks allow young people to pursue personal interests or passions, which they, with the support of others, turn into learning opportunities, which in turn might lead to academic achievement or civic engagement. the premise is that new technologies enable people to explore and share interests freely and openly. there is a much greater freedom (in comparison to standardized education) in how people invest their time and energy to satisfy their (varied) interests as well as in the actual potential to turn these interests into careers. connected learning has been presented as an ideal of learning in the global society for all, and in opposition to and as an alternative for outdated notions and practices of learning and education (ito et al., 2013; kumpulainen & sefton-green, 2014). although its idealized form is only available for progressive digital media users typically associated with privileged minorities (ito et al., 2013), this idea in fact highlights the variation in the lives and learning possibilities of young people.
1.2 new migration and technology: changing opportunities for learning for immigrant youth

in particular, teens from minority backgrounds constitute a good example of how technology is adopted in particular ways. for a long time, an important defining aspect of being an immigrant has been the geographical, social and cultural gap between the two ‘homelands’: the one that is left behind and the one of settlement. however, under the influence of new technologies, the image of the “uprooted migrant” is now replaced with the “connected migrant” (diminescu, 2008). new technologies enable a space to be ‘together’ regardless of actual physical locations and enable being here and there simultaneously. the effort to establish new belongings and associations while maintaining the connections with loved ones and acquaintances wherever they may be is now a key part of the migration experience (diminescu, 2008). these network connections can also be considered as paths of information, belonging, support etc., and form important “linguistic and social capital” (lam, 2014, p. 503). more importantly, these new technologies provide immigrant teens with forms of networked capital, which reflects their social, cultural, ethnic, and historical background as well as their material reality. often these networks provide them access to different social spheres that are heterogeneous. these new connectivities and the life worlds they give access to have implications for what it means to learn and socialize. the focus shifts towards what it means to learn to participate in and move through multiple different social spheres, as well as towards the process of transformation that is necessary to participate in these heterogeneous social and cultural spheres and networks (de haan, 2011). although new technologies also provide mainstream youth with these possibilities and challenges, they seem to define immigrant youth in particular.
there is a small body of literature that indeed shows that immigrant youth access a variety of different spaces and social networks, which enable as well as challenge their learning in particular ways in comparison with mainstream youth. for instance, lam (2009) observes that as chinese-american teens explore their interests online they use both chinese and english. this enables them to access a distinct range of information and media content, which provides alternative, more empowering spaces for their learning compared to learning at school. likewise, messina dahlberg and bagga-gupta (2014) show how in online communities with multiple ethnic backgrounds, culturally and linguistically hybrid ways for co-constructing and mediating learning are supported, which are different from (monocultural or monolinguistic) institutional learning spaces. below, we will elaborate our argument on how new technologies create particular and situated opportunities for learning, departing from the notion of learning as a situated phenomenon (1.2.1). we argue that both the notion of ‘what a learner is’ (1.2.2) and the connectivities that are constructed for learning (1.2.3) are culturally and socially situated.

1.2.1 the ‘particular’ of learning and the acknowledgement of non-mainstream notions

we draw upon sociocultural learning theories, and more specifically on the notion that learning is situated in socio-cultural practice in two different ways. first, learning is situated in the sense that learning is a product of the activity, context, and culture in which it is developed and used (brown, collins & duguid, 1989). it is situated in sociocultural practices precisely because ‘human beings have the need and ability to mediate their interactions with each other and the nonhuman world through culture’ (cole, 1998, p. 291). it cannot be captured by just looking at individuals. learning is distributed among co-participants of communities of learners (lave & wenger, 1998).
second, learning is situated in the sense that it involves the appropriation of particular heritages and particular learner identities, and there is variation in how communities guide learners according to culturally informed notions of what learning is (rogoff, 2003). this second position represents a more politically oriented strand of studies, as the issue is often raised that the heritages, identities and culturally informed learning practices of minorities are not always acknowledged in mainstream education (gonzalez & moll, 2002) or in educational theories (rogoff, 2003). this study wants to highlight in particular the second sense of situatedness, while acknowledging the first.

1.2.2 becoming a particular kind of learner: adopting a ‘learner identity’

to foreground the subjectivity of the learner, studies in the sociocultural tradition have argued that becoming a learner also involves developing a version of ‘the self’ that fits the cultural expectations of a novice. as sinha (1999) argues, becoming a learner is a situated phenomenon, which requires earlier experience in a particular socio-cultural practice. for instance, learning to recognize the appropriateness of a particular socially organized set-up for a teaching-learning situation and positioning oneself as a learner in accordance with socially appropriate roles (e.g., teacher and learner positions) is something that requires knowledge and prior experience of how learning is culturally and socially framed. developing human beings are being constructed and positioned in ‘particular and specific kinds of non-discursive practices, in such a way that he or she becomes a learning subject, or self, of the kind required by the culture within which teaching learning situations and opportunities are situated’ (sinha, 1999, p. 33).
to elaborate his point, sinha contrasts the often taken-for-granted image of the creative learner with other taken-for-granted images of learners, such as the idea that learners are information-processing subjects. he claims that we often forget that these notions of what a learner is or should be are themselves shaped by normative traditions on learning. when we, for instance, assume learners to be creative, this implies a socio-culturally constructed self that understands him/herself as a creative developing being. the same applies to the idea that learners represent an autonomous self that operates relatively independently of her/his social environment in terms of motivation, cognition, awareness, judgement and action. in other words, the learning self is not a culturally neutral concept but depends on particular interpretations of how a subject is supposed to grow, relate, identify, know, etc. although the relationship between learning and identity has been addressed in different ways (for an overview, see moje & luke, 2009), this particular point is often forgotten. it is partly reflected in the distinction that arnseth & silseth (2013) make when they describe the learning self as both ‘a’ novice, that is, as becoming a central participant of a community that is endowed with a particular (community related) identity, and ‘a particular kind’ of novice, involving all it takes to become a central participant of that community. it is this second issue that we address here. however, evidently, both notions of a learner identity can never be entirely independent, as both are embedded in culturally based notions of what membership in a community means.

1.2.3 notions of connectivity and learning

not only are learner identities particular and situated; the (online) connectivities that are constructed for learning are likewise defined by socially and culturally informed experiences.
following what we described above regarding the unique and tailored possibilities that technology affords to find knowledge and use connections for learning, we argue that these diversified connectivities are situated in socio-cultural practices. as noted above in section 1.1, we have termed this networked configurations for learning (ncl). like ‘networked individualism’ and ‘connected learning’, ncl focuses on the role of the new technologies and the importance they lend to our increased networking capacity via ict. however, the concept of ncl also develops an argument on how this networking capacity matches the socio-cultural, economic and personal conditions and drives of individuals or groups. moreover, it is used to study how these networks function for learning and allows description of the particular online and offline networked connectivities of diverse socio-cultural groups and the culturally and socially informed experiences for learning these connectivities enable (de haan, leander, ünlüsoy, & prinsen, 2014, p. 532). ncl builds upon the idea that personal networks and a person’s learning and socialization experiences are directly related to and interdependent with one another. personal networks are the dynamic mechanisms where important everyday learning experiences are situated. configurations of these networks are only partly shaped by new technologies and, as argued earlier, it is essentially people’s social, cultural, ethnic, and historical background and material reality that shape these networks. in this study, we describe how the formations of the networks of turkish-dutch youth inform and shape their learning, while also paying attention to the wider socio-cultural and historical context of these immigrant youth.
before we introduce our study, we provide an overview of the literature on turkish-dutch teens in the netherlands, in particular as related to their media use, and how this has been discussed in relation to what it means to grow up as a minority youth.

1.3 turkish-dutch teens

the turkish-dutch youth in our study are second- or third-generation immigrants: children of families whose (grand-)fathers were recruited mostly from the rural regions of turkey. they migrated to the netherlands for labour and reunited with their families over the course of the eighties and nineties (schneider, crul, & van praag, 2014). although current policies expect minorities to integrate, integration was not facilitated earlier, as labour migrants were expected to return to their country, and language and culture maintenance as well as concentrated settlement were supported by the dutch government. this policy is now seen as one of the explanations for the relative segregation of the turkish immigrant community (vedder & virta, 2005; verkuyten, 2001). studies on turkish-dutch adolescents have shown that they are raised in families that are very concerned with transmitting the turkish tradition, history and language, and relationships between adolescents and their parents are highly impacted by what is considered appropriate according to the norms and values in the turkish community. turkish youth also show a strong attachment and self-esteem (related) to the turkish community (verkuyten, 2001). earlier media researchers have reported how media, especially television, is used by turkish families, including youth, to orient themselves towards turkey and that they are also oriented towards homeland media (d’haenens, 2003). moreover, turkish immigrants are documented as less active on the web, e.g., on discussion fora, in comparison to their moroccan peers, the other large immigrant population in the netherlands (ünlüsoy, de haan, leander, & völker, 2013).
content analyses showed that the online discussion fora they use generally deal with turkey and turkish culture or identity (d’haenens, 2003). from another perspective, milikowski has pointed out how television watching can also have de-ethnicizing effects on these youth through the comparative lens it offers (2000). there is not much known from the literature on how turkish youth orient themselves on the internet from the perspective of their learning. mostly, the literature that touches upon issues of education and learning deals with the participation of turkish youth in formal schooling and their school success. other literature centres around key factors relevant for public participation such as employment (e.g., crul & schneider, 2010) or issues of identity and well-being (e.g., phalet & hagendoorn, 1996; verkuyten, 2001; vedder & virta, 2005). studies on the informal educational climate in turkish families describe turkish families as being defined by traditional gender division roles, fear of ‘dutchification’ of their children (lindo, 2000), as well as the significant gap between turkish children’s and their parents’ experiences in education (coenen, 2001). additionally, studies have shown that immigrant parents of turkish origin in the netherlands orient themselves towards collective and in-group-serving values in the education of their children (phalet & schönpflug, 2001). for youth in turkey, studies show that they have changed towards more independence, self-respect and autonomy in comparison with their parents under the influence of rapid economic and social change. however, these orientations continue to exist next to a strong orientation towards respect for tradition, obedience, politeness, honour for parents and elders and adherence to social expectations (morsunbul, crocetti, cok, & meeus, 2016). 
the abovementioned literature, although the studies that report on turkish-dutch immigrant youth are relatively outdated as a background for this study, is informative with respect to the challenges youngsters in this community might be facing in their education and learning. nevertheless, it lacks a perspective that considers how the global changes induced by ict and social media have changed the learning opportunities for turkish-dutch youth in the netherlands. we lack knowledge of the impact of new technologies on the learning opportunities of young immigrants such as the turkish-dutch youth in the netherlands. how do the affordances of technology, and the connections and resources it provides, define these teens in who they want to become, and how they learn to become? how do the affordances of these technologies also define the global-local dynamism that characterizes the lives of these immigrant youth? and, in line with the aims and scope of this paper as described above, how can we describe these teens as a particular case of a global and connected learner to meet our ambition of expanding our ideas of what a global learner might be?

2. current study & research questions

in line with the aim as outlined above, in this study we ask how turkish-dutch teens perceive themselves (as learners), what the characteristics are of their personal online and offline networks, as well as how these networks enable and inspire them to achieve their learning goals. the specific research questions that guide our analyses are:
1. how can the personal networks of turkish-dutch youth be described in terms of structural characteristics (size, density, clusters) and composition (e.g., homogeneity, geographical spread)?
2. what characterizes turkish-dutch teens as learners?
we approach this question by asking how turkish-dutch teens characterize themselves, what their interests and ambitions are, who or what they want to become, and what their view is on how they learn (to become someone).
3. how do turkish-dutch teens’ networks function for their learning? in line with our goal to understand how new technologies create particular and situated opportunities for learning, we ask the following sub-questions. can we distinguish particular interest-driven learning network (sub)clusters? are such networked sub-clusters mediated by specific technologies or media resources? how do such sub-clusters mediated by technologies enable or put boundaries on the learning of turkish-dutch teens?

3. methodology

3.1 sample and procedure

a total of 25 turkish-dutch teens aged 13 to 16 (m = 14.68, sd = 1.03; 14 female participants) were interviewed for this study. the participants were from two inner-city schools in secondary education. the school in rotterdam (n = 12; 6 female) was a preparatory school for vocational university (called havo: hoger algemeen voortgezet onderwijs) and the school in den bosch (n = 13; 6 female) was a lower preparatory school for secondary vocational training (called vmbo: voorbereidend middelbaar beroeps onderwijs). participants who went to the same school knew each other as schoolmates. all participants were born in the netherlands; their families (either parents or grandparents) had migrated to the netherlands for labour. participants were drawn from a large-scale survey study on learning, identity and the use of new media; the survey sample was representative of migrant youth aged 12-18 in secondary education in the netherlands. given our interest in personal networks and (online) connectivities, we selected the students who had reported online media use (i.e., checking their online social media accounts, watching videos and using an instant messaging application) on a regular basis in the earlier survey.
through the schools we informed youth and their parents about our continued research and that participation was voluntary. the participants were informed that they could withdraw from the interview at any point. none made use of this possibility. the interviews took place in a quiet room at the schools, during school hours. they lasted on average 1.5 hours and the students received a voucher for their participation. the interviews were audio-recorded and transcribed verbatim. during the transcription process, we found out that 3 interview recordings were corrupted, and one interview was only partially recorded. these 4 cases (3 girls, 1 boy) were excluded from qualitative analyses. the participants were interviewed using a social network interview (sni) technique, which revealed information regarding the structure and composition of their online and offline personal networks; see section 3.2 for details. on average, networks consisted of 21.5 contacts (sd = 6.57) and varied between 12 and 35 contacts across the sample; in total, we collected information on 537 network contacts.

3.2 instrument and measurements

sni is a semi-structured, in-depth interview instrument that is used to gather information on and analyse personal networks, also called ego-networks. it consists of two parts. the first part, called the name generator, identified the ‘important people’ in the lives of our participants. we asked the participants to think of important people in their lives, e.g., who they identified with, who were reliable or who they hung out with. we also prompted the participants to think of different spaces (school, neighbourhood, social media, vacations) to help them remember people who might be considered important for their personal network. we used the network analysis programs vennmaker 1.0 and nodexl to collect information and visualize the networks.
ego-network data contain demographic information regarding all contacts (called alters) in the participants’ (called ego) networks and information to interpret the relationships between the ego and his or her alters (e.g., how frequently they communicate) (crossley, belotti, edwards, everett, koskinen, & tranmer, 2015). in this study, we collected the following information about each alter: age, gender, location (same household, neighbourhood, city, elsewhere in the netherlands, outside the netherlands, unknown) and level of education. we also collected the alters’ relationship to the ego (immediate family, extended family, friends from school, friends elsewhere, acquaintance), how the ego communicates with that alter (mainly online, mainly in-person [offline], both on- and offline), and whether alters knew each other (i.e., whether they would recognize and talk to each other if they saw each other on the street). the (clustered) position of alters, as related to each other and the respondent, was determined using the harel–koren fast multiscale algorithm, which is one of nodexl’s force-directed algorithms (alters/nodes naturally push away from each other, while edges [relations/connecting lines] bring them closer together). this results in highly connected nodes migrating to the centre, while less connected nodes are pushed to the outside. the ‘groups’ function of nodexl was then used to calculate clusters, which works by aggregating closely interconnected groups of nodes. only once the network visualizations had been generated by this software did we proceed to the second part of the interview. we asked the participants if the visualization resembled what they thought their network would look like (e.g., ‘does this network picture and the groups generated represent your network?’). overall, the representations were reported to be accurate, and small differences were discussed in the interviews.
the second part of sni covered 1) how teens defined and identified with the different parts of their networks (we asked questions such as “are there people in this network picture that you look up to?”, “who are the people in this network that you spend most of your time with?”, “what do you do together?”); 2) what kind of (online or offline) learning activities they recognized in their network relationships (we asked questions such as “are there people or groups of people in this network with whom you undertake activities in which you want to become better?”); 3) how new technologies played a role in maintaining the network and what role these play for their learning (we asked questions such as “what are some of the things that you became better at (online or offline) over time?”, “how (if at all) did using new technologies make the experience different?”). the interviews were conducted with continuous attention to the personal networks of these teens, and their statements were consistently connected with the visualized personal network maps throughout the interview. prior to starting the interviews, we also checked briefly what the participants’ associations with learning were. participants who strictly thought of school learning were encouraged to think of the concept more broadly (such as how they learned to bike, how they found out about a new app, how they explored different sports or developed a hobby) so that we could come to a shared understanding of the idea of learning. we informed the participants that school-learning examples were okay to mention, but that our study had a broader perspective on learning.

3.3 analyses

the first research question, ‘how can the personal networks of turkish-dutch youth be described in terms of structural characteristics (size, density, clusters) and composition (e.g., homogeneity, geographical spread)?’, was answered by analysing the quantifiable characteristics of ego-networks.
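two of the ego-network measures used for this first question, homogeneity and density, can be illustrated with a minimal sketch. this is not the authors' procedure: it assumes the conventional definitions (homogeneity as the share of alters who share an attribute value with the ego; density as the share of possible alter-alter pairs who know each other, with the ego left out), and all names and data in the example are hypothetical.

```python
from itertools import combinations


def homogeneity(ego_value, alter_values):
    """Share of alters whose attribute value equals the ego's."""
    if not alter_values:
        return 0.0
    same = sum(1 for value in alter_values if value == ego_value)
    return same / len(alter_values)


def density(alters, ties):
    """Share of possible alter-alter pairs that are tied (ego excluded)."""
    possible = list(combinations(sorted(set(alters)), 2))
    if not possible:
        return 0.0
    # Ties are undirected: ("a", "b") and ("b", "a") count as the same pair.
    known = {frozenset(pair) for pair in ties}
    present = sum(1 for pair in possible if frozenset(pair) in known)
    return present / len(possible)


# Hypothetical ego-network; names and values are invented for illustration.
alters = ["a", "b", "c", "d"]
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
alter_backgrounds = ["turkish", "turkish", "turkish", "dutch"]

print(homogeneity("turkish", alter_backgrounds))  # 0.75
print(round(density(alters, ties), 2))  # 0.67
```

in this toy network, three of the four alters share the ego's background (0.75) and four of the six possible alter pairs know each other (0.67). averaging such per-network values across participants gives sample-level figures of the kind reported in the findings.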
based on frequencies and averages, we described the general structural and compositional features of networks. variables of ethnic, gender and age homogeneity were created per ego-network by computing the proportion of alters who share the same ethnic background, gender or age as the ego. this measurement reveals the proportion of people who are similar to and/or different from the ego, in other words, the relative diversity (or uniformity) in each network. density in each network, that is, the proportion of individuals in a network who know each other, was computed to assess how tightly connected each network was. the network characteristics of girls and boys were also compared to each other. section 4.1 describes the results of these analyses. to answer the second research question, ‘what characterizes turkish-dutch teens as learners?’, the transcriptions were first read with this research question and the respective sub-questions in mind. nvivo software was used to label and analyse the narratives. we paid attention to perception of the self, identity markers and self-descriptions, and in cases where these were present, we paid attention to how these were related to issues of development, becoming and learning. next, we focused on how they defined themselves as a learner, or how they defined striving to be someone (becoming) more generally. sections 4.2 and 4.3 describe the results of these analyses. to answer the third research question: ‘what characterizes turkish-dutch teens’ networks as learning networks?
and how do their networks function for their learning?’ as well as to answer the respective sub questions, we focused on particular interests, hobbies, and activities that they mentioned, asked if these were represented by particular sub clusters of their networks, if and how these were mediated by particular technologies, in particular when the relations were contacted offline, while also paying attention to the specific location of these network clusters or individual relations. finally, we focused on if and how these sub-clusters enabled or hindered their learning. we start off presenting general network characteristics in 4.1 (e.g., divides in their networks, and what characterizes the people in their networks), and continue with the narratives on their identity as a learner (represented in 4.2. and 4.3), while also connecting these narratives to the network data from 4.1. in sections 4.4 to 4.6, we again combine network data with their narratives on learning when we focus on how their learning happens in particular networked sub-configurations, paying attention to how technology mediates these configurations, and how these function for their learning. for instance, we argue how technology plays a role in creating specific network divides and how this works for their learning, or how technology provides access to particular networks, which then provides entrance to distinctive opportunities to gain information, form opinions, discuss positions and gain new insights. the analyses as a whole must also be read as a commentary on assumptions of models of global learning, especially when the analyses address how global learners can be identified and what kind of connectivities technologies create. 4. 
findings: networked configurations for learning of turkish-dutch teens 4.1 turkish-dutch teens’ network characteristics: quantitative data the following structural and compositional characteristics of the networks are derived from 25 ego-networks with 537 alters in total. in table 1, we present a detailed overview of the personal networks and how boys’ and girls’ networks compare to each other. there were no significant differences between boys and girls regarding the proportions of different network characteristics. the noteworthy similarities across the networks are highlighted below. as explained in the methods section, networks were generated based on the important relationships of participants. the turkish-dutch participants generated largely family-based, ethnically homogenous personal networks. the networks were densely connected, meaning that most people knew each other. the algorithm we used generally created two clusters, given the interconnectedness of the networks. the clusters created by the algorithm were typically characterized by family versus friends’ relations, or older generations versus peer relations. the participants confirmed the cluster structure; most of the participants divided their networks based on a friends and family sub-cluster. in a few cases, the algorithm created three clusters, which youth identified as family, and two different groups of friends (e.g., from a sports club and school or from the mosque and from school), and in one case virtually all network contacts were connected, resulting in a single cluster. table 1 overview of turkish-dutch youth’s network composition (in %) family members were nearly always the majority in their networks. the personal network with the least amount of family still had 45% (9 out of 20 alters) of family members, and the percentage went up to 80%, with an average of 60.6% family presence in networks. 
on average, 40.3% of alters were older than the participants and 5.8% were younger; peers were on average 53.9% of all network contacts. network contacts who were mainly contacted online were 14.7% of all network contacts (79 out of 537). these 79 people were almost exclusively family members who lived in turkey or elsewhere outside the netherlands. there were no statistically significant differences in network configurations between boys and girls. for the whole sample, density scores varied between .50 and .98, indicating that in the least dense network 50% of contacts knew each other. on average, 88.3% of all contacts were of turkish descent (varying between 55% and 97%). there was a clear preference for hanging out with same-gender peers (70.6% were same-gender); often the only men in turkish girls’ networks and women in boys’ networks were their relatives. nearly a quarter (22.3%) of all network contacts lived outside the netherlands (often in turkey but also in germany, france and belgium), indicating that geographical distances were not preventing them from keeping in touch with their family and friends. the contacts that lived abroad were predominantly family members (89%), followed by friends (10%) and one acquaintance (1%).

4.2. perceptions of the (learning) self: wanting to be like them “you become who your parents raised you to be”

in both 4.2 and 4.3, we analyse how turkish-dutch youth perceive their ‘learning self’. in line with how we defined the notion of learner identity above, we first concentrate on their notion of ‘self’ in 4.2, while in 4.3 we extend this analysis with a focus on their vision on development and becoming. in both cases, we do so under the assumption that these two notions are highly related. we asked the participants to think about the characteristics, experiences, people, things and interests that made them ‘who they are’.
although there were individual differences in the way the responses were formulated, the prominent trend among all participants was their emphasis on and identification with their family and community. this was also clear from their social networks, as we just reported in 4.1, which for a large part consisted of members of the turkish community, mostly family. another sign of this family orientation, as the network pictures illustrate (see figures 2 & 3), was that parents knew (almost) every one of the network contacts of their child. according to the participants, their family relationships, and in some cases relationships with good friends, shaped who they were. in response to who or what made them who they are, the participants often simply stated ‘my parents’, ‘my family’ or tahir (15, m), “without my parents and siblings i am nothing”. emel (13, f) “you become who your parents raised you to be”. simge (16, f) “what i learn at home from my mother and father shapes how i think and how i behave. my friends, they learn from their parents and behave that way…when we are together [with her group of friends] we influence each other too and do the same things together”. these examples illustrate a common understanding among turkish-dutch youth that the development of the self does not so much relate to becoming an independent self but a self that is highly relational, involving their closest relationships (with their parents and friends). these examples show not only that the notion of (being like your) family is a central and essential aspect of turkish-dutch teens’ identity but also that in their discourse on the self, a reference to the collective was always prominent. this was also evident from the fact that youth, when asked to describe themselves, more often referred to community values, such as being “respectful, especially towards older people” and “trustworthy, or honest”, than unique qualities. 
thus, rather than characteristics that typify an individual, these qualities reflect a community ideal of how one should be and behave. in addition to mentioning family, being turkish was a prominent identity marker in their discourse on the self. this was inferred from a variety of responses to questions such as “with whom do you feel you can be yourself” and “where/when do you feel at home”. “turkish-ness” seemed to represent a ‘comfort-zone’; a ‘place to withdraw’ or a state of feeling particularly at ease and seemed to be related to having a common history, values, and language. other implicit references to their being turkish included speaking turkish at home, especially with parents, but also among friends, going to turkey for vacation and following turkish media. this identification with the turkish community was also reflected in their network structure; 88.3% of all network contacts had a turkish background (see figure 1 below). figure 1. ethnic groups in turkish-dutch teens’ networks. the orientation towards turkey was also encouraged in the family. for example, yildiz’s father explicitly encouraged her to speak turkish more fluently: “my father says ‘you must learn turkish’, he corrects my turkish… his turkish is very good. with my mother, i speak only turkish because her dutch isn’t good” (yildiz, 14, f). the media-diet of the participants was primarily in turkish, and this, too, was sometimes encouraged by their parents. adnan (16, m): “my father comes home from work and he talks about all the news. he must listen to the [turkish] news, read the newspapers and teletext and i’m at home beside him so i listen with him. i talk about turkish politics a lot…”. satellite television and online streaming were accessible for all participants. 
these technologies gave participants continuous access to turkish media products (i.e., news, series, reality shows) and provided them with a wealth of information and material to understand and define ‘being turkish’ for themselves.

4.2.1 different others as contrasting examples in the diaspora

however, as already indicated above, through their social networks, youth were able to contact extended family and friends of family members who live in turkey and in other migration countries (compare table 1, which indicates that 22% of their network contacts are transnational contacts). these transnational contacts, especially the ones from other migration countries, ruptured the relative homogeneity of their models for identification, as these family members were socialized in communities that partly hold different values and norms. for example, ahmet (16, m), whose sister’s family lives in germany, says “my nephew is very different from me. he is a good person, that’s true, but he’s different…he doesn’t do sport, he sits too much behind the computer and he smokes…when we are there i get along with him and his friends, but up to a certain point…if they say come we’ll go smoke, i won’t…i learn german at school here, but when i am in germany with my nephew i learn more. i understand everything, but i cannot talk very well”. this, and many other examples, show that the turkish diaspora, and the possibility to connect with it through digital technology, brought these teens in contact with other cultural traditions, alternative possible selves and ‘versions’ of being turkish, that serve as extended opportunities for learning and identification. for instance, they provided important language learning opportunities, or a comparative perspective on life between the netherlands and other countries of the turkish diaspora in terms of economic chances, school experiences, teenage life, youth cultures, and gender roles.
in this sense, their perception of the (learning) self, as grounded in a particular version of communal belonging, seems to be changing through these digitally afforded networks, which allows a more diverse and fragmented identification with their community. figure 3. network of ahmet. 4.3. learner identity: loyalty to the community, hierarchy and learning from role models as a next step in our analyses, we focused on how teens expressed a process of becoming someone, or in other words, how they saw themselves as a learner. to understand turkish-dutch youth’s associations with learning, and what kind of learners they perceived themselves to be, we asked them questions such as ‘what do you associate with the word learning?’, ‘when and with whom do you feel that you learn something?’, or ‘is there something you strive to get better at?’. our findings show that informal learning experiences were often expressed in narratives of ‘becoming a particular kind of person’, while taking someone from their community or family as a model that represented particular values and status. the participants often told us what kind of person they wanted to become, taking an individual as an example. for instance, emel (13, f) mentioned her uncle (who is part of her online network, see her network picture in figure 2) as her role model; “i would like to be exactly like him…when he was young he said to the family that he was going to study and graduate (at) university. he kept following his dream until he achieved it and i want that for myself. he is from elazig and he studied in cambridge”. another example of this was tahir (15, m), who said that his cousin was an inspiration for him because “he has a good life although he did not have much money. 
he has, how should i say that, he has worked a lot, worked a lot, gave it [money] to his parents to pay for the house […] therefore, later when i have a job, i will also give a part [of my income] to my mother, i also want to take care of my parents.” these role models often share certain characteristics such as being loyal to their family, working hard, starting with very little and achieving their goals despite difficulties. the narratives often highlight these teens’ appreciation for such role models and their desire to become a similar example once it is ‘their turn’ to do so. learning then represented modelling the important others from the community, as well as returning or giving back to the community, rather than seeking out a unique, individual path that distinguishes the individual from other members from that community. figure 2. personal network of emel. age-related hierarchy and status play a significant role in how youth perceive the workings of learning as relational. in the following example, emel (13, f) illuminates this hierarchy by describing herself as a role model for her younger brother: “my younger brother learns a lot from me, that he needs to respect older people, that he needs to follow his dreams…if there is something he doesn’t understand in his schoolwork he also comes to me”. furthermore, age and experience were essential elements and aspects in their vision of how one learns and gains wisdom. when comparing her peers (friends) to the older people in her network, simge (16, f) said: “as you grow older you become more thoughtful and more understanding. that is the obvious difference. a person who is 15, 16 years old is more ‘uzmanlaşmış’ [which means specialized in turkish, referring to the idea of being skilled, but here she means more prone] in making mistakes than, say, a 45-year-old. a 45-year-old [referring to her teacher at the mosque] can know more and is more thoughtful”. 
these examples make clear that for these youth, learning does not represent a process of making themselves independent from the community, pursuing a unique identity, or following a unique personal trajectory, but on the contrary, learning to become as one ‘ought to be’, to be like important others from the community and to give value back to the community. although these teens see learning as related to the explicit guidance of older generations and accept this guidance as learning, they did not exclude other ways of learning such as peer-learning or experimenting. however, these forms of learning were not foregrounded in their discourse on learning, and they did not always count as, or were not always recognized as, learning.

4.4. interest-based activities and networks?

the point that turkish-dutch teens hold up collective identities to describe themselves, and do not use identifications that point to an autonomous, unique and individualized self as much, was also clear from their narratives about specific interests or activities that would typify them. hobbies, individual habits or an exclusive personal expertise were rarely mentioned. in most cases, these hobbies would represent more generally appreciated activities for boys or for girls, such as fighting sports for boys and fashion for girls. when asked ‘which activities do you strive to get better at’, boys often responded with sports, games and sometimes also interests such as cars, computers, planes/flying. turkish-dutch boys were mostly keen participants in sports, specifically football and martial arts (e.g., karate, kendo, boxing). they often practised these sports in sport-schools or sport-clubs on an amateur or semi-professional level. sport practices represented relatively unique and individualized learning spaces for them, which was also evident from their social network pictures.
for instance, ahmet (16, m), a goal-keeper, stated: “i learn a lot from football … i learn a lot about how i should move, there is a lot of interaction (between coach and other keepers), and we learn to make decisions and logical thinking, especially logical thinking”. ahmet’s network picture (see figure 3) illustrates that sports and gaming are in fact personal spaces that are relatively independent from the rest of his mainly family-based, densely connected network. ahmet’s interest in sports is fostered through two contacts represented in this part of his network: a friend with whom he plays the online game “online soccer manager” and only talks about football-related issues, and his football coach. girls, on the other hand, found informal learning interests rather difficult to pinpoint, but most of them expressed their interest in fashion and spending time together with friends. in contrast to the boys’ enthusiasm for sports, there was very little attention to sports from girls. none of the female participants were actively doing any sports at the time of the interview. additionally, there were no other overlapping interests between boys and girls. for girls, it seemed the social aspect of any given interest was more central than gaining expertise in their field of interest, such as improving their ‘eye for fashion’. in other words, they were ‘just’ interested in fashion because they enjoyed the social side of consulting each other about clothes. thus, we found that for these youth, ‘interests’ were more generally appreciated activities and were not seen as personal. boys sometimes developed relatively unique interest-based networks, mostly related to sports or online gaming, while girls were reluctant to recognize interest-based learning in their favourite activities. 4.5. 
access to digital media as transformative potential as already illustrated in the example of ahmet in 4.4, turkish-dutch youth used the internet to support their offline interests or activities, such as sport, school or music preferences. for instance, as turkish-dutch boys were often engaged in fight-sports such as karate, taekwondo and boxing, most of these boys also visited youtube to watch fragments of fight choreography (e.g., bruce lee movies), fighting tournaments or street-fight videos. they often searched for information that was hosted in turkey or had content related to turkey. for instance, boys who were interested in football would search turkish websites about football, such as fanatik (a turkish sports newspaper, also published online), or they would visit websites that stream turkish television series. however, this media content based in turkey would be shared and discussed in social networks that are transnational and consist of social contacts based both in the netherlands and abroad, mostly in turkey, but also in the turkish diaspora. this is clear from the example of ceylin (16, f) (see figure 4, which shows her social network). in the interview, ceylin mentions how a combination of media and network resources has helped her to think more consciously and critically about the social position, rights and demands of ethnic minorities. she explains how she learned that a television series in turkey (behzat c., a crime-detective series) was cancelled for, among other reasons, bringing up the issue of education in kurdish for kurdish people. the news of the cancellation, combined with what she knew about kurdish people in turkey through her personal transnational social network, triggered the conversation.
she explains that her cousin, one of the transnational contacts in her network, told her that in turkey he had observed kurdish people living comfortably, similar to how they themselves live, without a lower status or lesser means to maintain their lives: “[when her cousin was in izmir] he said that he saw kurdish people, and they were all very rich…those who live in the cities especially are powerful people and have all the means….”. through her transnational social network, she is made aware that kurdish minority status in turkey is not necessarily a problematic one regarding economic means and that they have consumer patterns she can also identify with. figure 4 shows three of her cousins who inform ceylin about ‘new’ places in turkey she does not know yet and give her access to knowledge regarding kurdish minorities in turkey, among other issues. the perspective she gained about kurdish people through her cousin enabled her to identify with them and to see them as minorities as well. through the information provided via her online social network, she started to see the kurdish as minorities similar to herself: “well kurdish people should have their rights [particularly referring to the right of education in native language, which was an issue in the crime-detective television series], [...] i’m here [in the netherlands] a turkish person”. she continues her comparison of her own situation as a turkish minority in the netherlands with the situation of kurdish people in turkey. the combination of watching this turkish television series, hearing the news of its cancellation, and having online contact with family in turkey, who have contact with kurdish people in turkey, enabled her to compare the situation of minorities in turkey and in the netherlands.
through these different resources and sometimes conflicting stories from these resources, ceylin has learned to see the complexities of minority status and of political rights, including her own situation and that of the kurdish people. figure 4. network of ceylin. 4.6. access to digital media as network boundaries in addition to tapping into the content issued in turkey, turkish-dutch immigrant youth use media specifically tailored for turkish-dutch immigrants. the following example shows how their media use also develops along ethnic lines and social networks and marks divides between turkish-dutch immigrants and their dutch classmates. the first author asks ceylin (16, f) about a radio app for turkish-dutch immigrants. “i: what is taksim fm? turkish music? c: yes. taksim.fm is a radio channel in the netherlands made by turkish people. but it’s turkish, look, [she turns on the radio (on her phone)] but it’s not only music, it’s talk-shows, and they have a website. i don’t remember if there’s a mehmet akif (dj). (…) they talk about the dutch and the news here [meaning in the netherlands] but also about turkey and other stuff. it’s [focused] specifically [on] the things that are interesting for the turkish-dutch young people.” ceylin continues to explain how different media and apps are utilized differently for different ethnic groups. she says: “the dutch people of my age wouldn’t know taksim fm […] one main difference between dutch people and me is that i speak both turkish and dutch and a little english sometimes like “i love you” (giggles). turkish people also write out accents, you know, like the laz messages [on whatsapp], so that’s different with us [referring to her turkish-dutch friends]. but (with) my dutch classmates, well, we use it [whatsapp] for school stuff, because, well, we’re friends but not the best friends, and school is our only common subject. they are not part of my other daily life”. 
this example shows how specific media applications, media content, language used, and even typography are network-dependent. while texting with her turkish-dutch friends, ceylin uses the turkish language or specific typographical codes associated with the laz language to joke or tune in to themes specifically interesting for turkish-dutch immigrant youth. another example of such a divide is experienced by fatos (15, f). she is a fan of a turkish actor in her favourite drama-series little secrets (turkish: küçük sırlar). she explains: “i love the internet. we have satellite tv to watch turkish channels at home, and i watch television there, but if i’m not at home or if i don’t have time at the time of the show, then i stream it from the internet….” she mentions a list of series and talk-shows she follows; when asked which one she likes most, she says: “cetin, from küçük sırlar. i’m his fan. so are my friends […]”. fatos tells us that after school she spends much time chatting with her friends, and one of their favourite subjects is what happens in the series [küçük sırlar], for instance, how the characters dress and their expensive lifestyle in istanbul. she also seeks information and other related content (e.g., photos, news) regarding the series and the actor she likes and shares this on her social media account (hyves, a dutch social networking platform active between 2004 and 2013), where she reports having approximately 400 contacts. in addition to its entertainment value, this series offers these girls (fatos and her other turkish-dutch friends) a window into life in turkey. however, she is sharing this interest exclusively with her turkish-dutch friends. fatos tells us how she cannot share this topic with one of her best friends, m., and how this creates a boundary between them.
the access to the show through satellite tv and the internet creates an information divide between m., who is dutch and who she considers one of her best friends, and her friends who have access to the show. what both of these examples show is that digital (mobile)communication also mediates and re-informs specific network divides. therefore, next to media resources and the networks associated with them, which provide unique learning opportunities in the form of distinctive opportunities to gain information, form opinions, discuss positions and gain new insights, these youth also create clear boundaries in their social networks, which cut them off from other opportunities to learn and socialize. 5. discussion the results show how turkish youth create their own version of a global learner, based on notions of the self and of becoming that are primarily relational and oriented towards the collective. furthermore, afforded by technology, these youth create unique networked relationships for their learning, which are relatively closed for outsiders and are organized around the collectivities of the family and ethnically informed networks. at the same time, in the diaspora, their (transnational) networks are slowly becoming more diverse and fragmented. through contact with different migrant communities, settled in different countries, they are confronted with multiple versions of the ideal self as well as with diversification of socialization ideals and practices. although the network configurations of the participants sometimes reflect existing traditional (e.g., gender-based) boundaries, their networks also provide novel learning opportunities. unique trans-local experiences and corresponding means of reflection are mediated by a combination of the specific configuration of their (transnational) social networks, access to technology and media content. 
in this discussion, the main point we want to address is how these teens form a specific kind of global learner, which is not covered in all respects by recent prototypical models for learning in the digital era, such as the concept of the connected learner. before doing so, we first discuss the other main point we want to bring to attention, namely how our network analysis approach has enabled us to reach the goal of this study: to provide a critical reconsideration of “idealized digital connectivities for learning”. 5.1. how the network analysis approach has enabled us to reach the goal of this study ego-network analysis as we have applied it in this study, combining the gathering of social network data with in-depth interviewing, is especially well suited to exploring the interaction between social structures and certain qualities or processes assigned to individuals, and how these influence and shape each other (crossley et al., 2015). in our case, we were able to map the specific social relationships that youth employ for their learning and understand their experience and perception of the “learning self” in relation to the structural and compositional aspects of their personal communities. in other words, this methodology allowed us to study learning as a networked phenomenon. as such, the approach was particularly useful for commenting on models of learning that put connectivity up front. because this methodology lets us map the particular connectivities of these youth empirically, it is suited to relating this conceptual work to the empirical record. the combination of quantitative analyses and interpretative work also enables the analysis of underlying paradigms associated with models of connectivity, such as the idea of an autonomous, independent self that is at the centre of the connectivity.
moreover, the methodology allows us to study, more precisely than interview studies alone would, how learning opportunities and identities are created by, and in turn create, social capital. so far, only a limited number of studies have used ego-network analysis to study learning. we hope this study contributes to showing the potential of this approach for the study of learning. 5.2. how these teens form a specific kind of global learner as we have argued before (de haan et al., 2014), the prototypical image of a so-called ‘connected learner’ as implied in the connected learning project resonates with a learner that is “highly agentic, driven by individual needs and interests, and pursues his or her learning in individualized and tailored-to-the-need networks” (p. 510). it is important to be specific about which aspect of the connected learning project our critique addresses. we argue that the initial ideal of highly engaged learners who seek out (online) connections to fulfil their individual interests is itself a culturally informed, particular image of a learner. we do not direct our critique at the educational ideal of the connected learning project, which has been presented as an ideal of learning in the global society for all, in opposition to and as an alternative for outdated notions and practices of learning and education (ito et al., 2013). although connected learning is presented as an inclusive project in which individual learners are stimulated to connect to peers and other collectivities, we think there is not enough attention to how it was initially inspired by an individualistic learner ideal, based on the idea of unique preferences, networking efforts and independence in formulating one’s knowledge interests. the turkish-dutch teens in this study are well-connected learners, but they diverge from the ideal implied in this prototypical learner in several critical ways.
first, there is very little emphasis on individuality among this group. these teens underline the interdependency and connectedness within their family and community much more than they bring up individual characteristics or interests. the driving force for these teens seems to be establishing interdependence with the family and the turkish(-immigrant) community. second, the learning experiences of turkish-dutch teens can be characterized as conformist or traditional in the sense that they appreciate the guidance from their parents to lead them to what are considered the key values and virtues of their community. to be a good person, it is important to act according to the norms of this community. the role models for a ‘good person’ are often those people who are respected within the family. in this regard, these teens also diverge from the image of a teenager in western middle-class families more generally, who puts less stress on relatedness with their family and much more on their individual agency (kağıtçıbaşı, 2005). furthermore, our data showed that the socialization and learning experiences of these teens are defined by relatively (ethnically) homogeneous, closed and dense social networks and that this is also partly the case for their online networks. in these communities, which are now also extended to the online world, the passing on of traditional values, family bonds, and hierarchical relationships, a focus on the collective and strong gender divisions remain important. this part of our data is in line with the image that was provided in the literature on turkish-dutch immigrant populations and that depicts this group as a relatively gender-segregated community (vedder & virta, 2005), with a strong attachment to turkey (verkuyten, 2001) and a “fear of ‘dutchification’ of their children” (lindo, 2000, p. 221).
this part of our results would imply that, even given their access to online media, these youth’s learning ecologies seem rather stable and closed towards new influences, which is rather atypical for learning in migration (de haan, 2011). however, our study also revealed that their ncl undergo important changes, related to new possibilities provided by digital media. our results partly confirm earlier studies that the turkish community’s media use is geared towards content from turkey and that turkish-dutch immigrant youth’s online activities are also geared towards turkey in terms of the language used, reference to turkish culture or identity (d’haenens, 2003). this was evident from how turkish-dutch youth ‘plugged in’ media content in their networks that came from media channels based in turkey directed at the turkish community. nevertheless, our study also shows how tendencies described by the phenomenon “networked individualism” (rainie & wellman, 2012) impact these youth. looking at where the online contacts of turkish-dutch youth are located geographically, it was clear that they connect online with people who live relatively close by (in their neighbourhoods and in their cities). however, technology was also used to build networks across spaces, and their networks were defined by particular local-global dynamics. digital media allows the learning of these turkish-dutch youth not only to reach towards turkey but also to other turkish diaspora countries in europe (e.g., germany, belgium, france). these networks provide them with important trans-local learning experiences in terms of access to different languages and life worlds, even if these happen within their extended families. moreover, our data has shown that mobility patterns between turkey and the netherlands allow media content to be reinterpreted in similar ways as milikowski (2000) has argued. 
as shown by the example of ceylin, through a constant comparison between contexts, particular media-based content is re-weighted and re-interpreted, which provides important new possibilities for learning. certainly, these youth are not only “connected migrants” (diminescu, 2008) who establish new belongings and associations while maintaining the connections with their root community; they are also ‘connected learners’. they use new technologies to create spaces for their learning, regardless of actual physical locations, and give form to new ways of being that allow them to be ‘here and there simultaneously’. this greatly expands their socialization and learning possibilities. the ‘being here and there simultaneously’ has been associated by authors such as braidotti (1994) with the notion of deterritorialization and the possibility it offers to develop a critical position. rather than the detachment from particular places in a literal sense, it is the distancing from conventions and the multi-perspectivity that is seen as enabling the development of a critical position in relation to the “canonical”. in a similar vein, the living within or moving between heterogeneous spaces as well as the need to take distance from the existing cultural paradigms while reconsidering and recreating them has been referred to as the migrant condition (papastergiadis, 2000). the data show that the trans-local social network configurations of these young migrants, also in combination with their mobility patterns, generated particular opportunities for deliberation and reflection that are related to the particular kinds of both deterritorialization and connectivity these youth experience. 5.3. implications for practice we believe that picturing this kind of atypical global and connected learner helps us to expand our ideas of what a global learner might be.
seeking to uncover the one-sidedness of notions of ‘new’, 21st century learning helps to understand how some might be privileged while others are marginalized. on a more positive note, acknowledging diverse types of connected learners can help to proactively incorporate them as useful models of global learning (doerr, 2017). further, the particular form of the ncl utilized by these turkish-dutch youth might also involve the risk of growing up relatively isolated. therefore, we would plead for more attention to be paid to these particular informal learning experiences within the (semi-)formal contexts of learning, such as schools, libraries or community centres. it is important for teens and educators alike to realize how these network configurations are playing a role in shaping who these teens are and how they shape their future opportunities. with this paper, we hope to have contributed to a critical reflection on the particularity of networked connectivities and their impact on the potential diversification of learning and socialization in our societies. with this, we align with the ideal implied in the educational connected learning project that seeks ways to extend patterns that have been found for what might be privileged learners to all learners. however, our contribution reverses the way of working towards this ideal. instead of starting with elite or privileged learner ideals and expanding these to larger populations, our mission is to first expand our knowledge and ideals of the ways in which youths can make use of the possibilities our digital societies offer. it is key that educators and practitioners are partners in this process and are aware of this variation in their work with both majority and minority students. key points new prototypical models for learning in the 21st century are grounded in particular culturally informed ideals.
learner identities of turkish-dutch teens contradict the autonomous, individualistic learning self, implied in the ideal of the ‘connected learner’. unique networked (trans)national connectivities are formed in response to the interaction of digital affordances and the learners’ specific socio-cultural position. acknowledging diverse types of connected learners can help to proactively incorporate them as useful models of global learning. references arnseth, h. c., & silseth, k. (2013). tracing learning and identity across sites: tensions, connections and transformations in and between everyday and institutional practices. in o. erstad, & j. sefton-green (eds.), identity, community, and learning lives in the digital age (pp. 23-38). cambridge: cambridge university press. braidotti, r. (1994). nomadic subjects. new york: columbia university press. brown, j. s., collins, a., & duguid, p. (1989). situated cognition and the culture of learning. educational researcher, 18, 32-42. castells, m. (2007). communication, power and counter-power in the network society. international journal of communication, 1, 238-266. retrieved from https://ijoc.org/index.php/ijoc/article/view/104/47 coenen, e. (2001). 'word niet zoals wij': de veranderende betekenis van onderwijs bij turkse gezinnen in nederland. amsterdam: het spinhuis. cole, m. (1998). can cultural psychology help us think about diversity? mind, culture, and activity, 5, 291-304. doi:10.1207/s15327884mca0504_4 crossley, n., bellotti, e., edwards, g., everett, m. g., koskinen, j., & tranmer, m. (2015). social network analysis for ego-nets: social network analysis for actor-centred networks. london: sage. crul, m., & schneider, j. (2010). comparative integration context theory: participation and belonging in new diverse european cities. ethnic and racial studies, 33, 1249-1268. doi:10.1080/01419871003624068 de haan, m. (2011). immigrant learning. in k. symms gallagher, r.k. goodyear, d.j. brewer & r.
rueda (eds.), urban education: a model for leadership and policy (pp. 328-341). new york: routledge. de haan, m., leander, k., ünlüsoy, a., & prinsen, f. (2014). challenging ideals of connected learning: the networked configurations for learning of migrant youth in the netherlands. learning, media and technology, 39, 507-535. doi:10.1080/17439884.2014.964256 d'haenens, l. (2003). ict in multicultural society: the netherlands: a context for sound multiform media policy? international communication gazette, 65, 401-421. doi:10.1177/0016549203654006 diminescu, d. (2008). the connected migrant: an epistemological manifesto. social science information, 47, 565-579. doi:10.1177/0539018408096447 doerr, n. m. (2017). phantasmagoria of the global learner: unlikely global learners and the hierarchy of learning. learning and teaching, 10, 58-82. doi:10.3167/latiss.2017.100206 farrell, l. (2006). making knowledge common: literacy & knowledge at work. new york: peter lang. gee, j. p. (2005). semiotic social spaces and affinity spaces: from the age of mythology to today's schools. in d. barton, & k. tusting (eds.), beyond communities of practice: language, power and social context (pp. 214-232). cambridge: cambridge university press. gibson, k., rimmington, g., & landwehr-brown, m. (2008). developing global awareness and responsible world citizenship with global learning. roeper review, 30, 11-23. doi:10.1080/02783190701836270 gonzalez, n., & moll, l. c. (2002). cruzando el puente: building bridges to funds of knowledge. educational policy, 16, 623-641. hildreth, p. m., & kimble, c. (eds.). (2004). knowledge networks: innovation through communities of practice. igi global. hirzalla, f., de haan, m., & ünlüsoy, a. (2011). new media use among youth in migration: a survey-based account. wired up technical research report. utrecht university.
retrieved from http://www.uu.nl/wiredup/publications.htm ito, m., gutiérrez, k., livingstone, s., penuel, b., rhodes, j., salen, k., ... watkins, s. c. (2013). connected learning: an agenda for research and design. digital media and learning research hub: irvine, usa. kağıtçıbaşı, ç. (2005). autonomy and relatedness in cultural context: implications for self and family. journal of cross-cultural psychology, 36, 403-422. kumpulainen, k., & sefton-green, j. (2014). what is connected learning and how to research it? international journal of learning and media, 4, 7-18. doi:10.1162/ijlm_a_00091 lam, w. (2009). literacy and learning across transnational online spaces. e-learning, 6, 303-324. doi:10.2304/elea.2009.6.4.303 lam, w. (2014). literacy and capital in immigrant youths' online networks across countries. learning, media and technology, 39, 488-506. doi:10.1080/17439884.2014.942665 lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge university press: u.k. lindo, f. (2000). does culture explain? understanding differences in school attainment between iberian and turkish youth in the netherlands. in h. vermeulen & j. perlmann, immigrants, schooling and social mobility (1st ed., pp. 206-224). houndmills: macmillan press ltd. messina dahlberg, g., & bagga-gupta, s. (2014). understanding glocal learning spaces: an empirical study of languaging and transmigrant positions in the virtual classroom. learning, media & technology, 39, 468-487. doi:10.1080/17439884.2014.931868 milikowski, m. (2000). exploring a model of de-ethnicization. european journal of communication, 15, 443-468. doi:10.1177/0267323100015004001 moje, e. b., & luke, a. (2009). literacy and identity: examining the metaphors in history and contemporary research. reading research quarterly, 44, 415-437. morsunbul, u., crocetti, e., cok, f., & meeus, w. (2016).
identity statuses and psychosocial functioning in turkish youth: a person-centered approach. journal of adolescence, 47, 145-155. doi:10.1016/j.adolescence.2015.09.001 o'donnell, e., lawless, s., sharp, m., & wade, v. (2015). a review of personalised e-learning: towards supporting learner diversity. international journal of distance education technologies, 13, 22-47. doi:10.4018/ijdet.2015010102 papastergiadis, n. (2000). the turbulence of migration: globalisation, de-territorialisation, hybridity. blackwell: oxford. phalet, k. & hagendoorn, l. (1996). personal adjustment to acculturative transitions: the turkish experience. international journal of psychology, 31, 131-144. doi:10.1080/002075996401142 phalet, k., & schönpflug, u. (2001). intergenerational transmission of collectivism and achievement values in two acculturation contexts: the case of turkish families in germany and turkish and moroccan families in the netherlands. journal of cross-cultural psychology, 32, 186-201. doi:10.1177/0022022101032002006 rainie, h., & wellman, b. (2012). networked. cambridge, mass.: mit press. rogoff, b. (2003). the cultural nature of human development. oxford: oxford university press. schneider, j., crul, m., & van praag, l. (2014). upward mobility and questions of belonging in migrant families. new diversities, 16, 1-7. sinha, c. (1999). situated selves: learning to be a learner. in j. bliss, r. säljö & p. light (eds.), technological resources for learning (pp. 32-46). oxford: pergamon. ünlüsoy, a., de haan, m.j., leander, k. & völker, b.g.m. (2013). learning potential in youth’s online networks: a multilevel approach. computers and education, 69, 522-533. doi:10.1016/j.compedu.2013.06.007 vedder, p. & virta, e. (2005). language, ethnic identity, and the adaptation of turkish immigrant youth in the netherlands and sweden. international journal of intercultural relations, 29, 317-337. doi:10.1016/j.ijintrel.2005.05.006 verkuyten, m. (2001).
global self-esteem, ethnic self-esteem, and family integrity: turkish and dutch early adolescents in the netherlands. international journal of behavioral development, 25, 357-366. doi:10.1080/01650250042000339

frontline learning research vol. 3 no. 2 (2015) 1-26 issn 2295-3159 corresponding author: anna südkamp, emil-figge-str. 50, 44227 dortmund, germany, phone: +49 231 755 6570, fax: +49 231 755 6572, e-mail: anna.suedkamp@tu-dortmund.de doi: http://dx.doi.org/10.14786/flr.v3i2.130 competence assessment of students with special educational needs—identification of appropriate testing accommodations anna südkamp a, steffi pohl b, & sabine weinert c a tu dortmund university, germany b freie universität berlin, germany c university of bamberg, germany article received 23 october 2014 / revised 16 march 2015 / accepted 18 may 2015 / available online 1 june 2015 abstract including students with special educational needs in learning (sen-l) is a challenge for large-scale assessments. in order to draw inferences with respect to students with sen-l and to compare their scores to students in general education, one needs to ensure that the measurement model is reliable and that the same construct is measured for different samples and test forms. in this article, we focus on testing the appropriateness of competence assessments for students with sen-l. we specifically asked how the reading competence of students with sen-l may be assessed reliably and comparably. we thoroughly evaluated different testing accommodations for students with sen-l. the reading competence of n = 433 students with sen-l was assessed using a standard reading test, a reduced test version, and an easy test version. also, n = 5,208 general education students and a group of n = 490 low-performing students were tested.
results show that all three reading test versions are suitable for a reliable and comparable measurement of reading competence in students without sen-l. for students with sen-l, the accommodated test versions considerably reduced the amount of missing values and resulted in better psychometric properties than the standard test. they did not, however, show satisfactory item fit and measurement invariance. implications for future research are discussed. keywords: students with special educational needs; testing accommodations; reading competence; large-scale assessment 1. introduction large-scale assessments generally aim at drawing inferences about individuals’ knowledge, competencies, and skills (popham, 2000). today, educational assessments play an important role as they inform students, parents, educators, policymakers, and the public about the effectiveness of educational services (pellegrino, chudowsky, & glaser, 2001). using results from large-scale assessments, researchers can study factors influencing the acquisition and development of competencies and derive strategies for the improvement of educational systems. often, assessments are meant to serve even more ambitious purposes such as supporting student learning (chudowsky & pellegrino, 2003). assessing students’ domain-specific competencies (e.g., reading competence, mathematical competence) is a key aspect of most large-scale assessments today (weinert, 2001). in this study, we focus on the assessment of competencies of students with special educational needs (sen) in large-scale assessments. while national large-scale assessments like the national assessment of educational progress (naep) in the united states and international assessments like the programme for international student assessment (pisa) have established sophisticated methods for the assessment of students without sen, testing students with sen has proven to be challenging.
in order to inform strategies for the assessment of students with sen, we evaluate whether and, if so, how students with sen may be tested reliably and comparably to general education students. for this purpose, students with and without sen were tested with accommodated and non-accommodated test versions. on the level of the single items, we carefully checked the reliability and comparability of the test scores obtained with the different test versions as reliability and comparability are necessary prerequisites for drawing meaningful inferences from large-scale assessments.

1.1 assessing reading competence of students with sen

large-scale assessments usually aim at describing the abilities of students within a country across the whole spectrum of the educational system or even across countries. this also includes students with sen. in our understanding, students with sen include all students who are provided with special educational services due to a physical or mental impairment. in germany, special schools are established for students with sen. the special school system—in turn—is highly differentiated itself. there are special schools for students with special educational needs in learning, visual impairments, hearing disability/impairment, specific language/speech impairments, physical handicaps/disabilities, severe intellectual impairment/disability, emotional and behavioral difficulties, comprehensive sen, and students with health impairment. so far, comparatively little is known about the educational careers of students with sen and their development of competencies across the life span (heydrich, weinert, nusser, artelt, & carstensen, 2013; ysseldyke et al., 1998). however, there is evidence that for students with sen, reading problems pose one of the greatest barriers to success in school (kavale & reece, 1992; swanson, 1999). learning to read is a tedious process requiring psycholinguistic, perceptual, cognitive, and social skills (gee, 2004).
beyond the basic acquisition of the alphabet system (i.e., letter-sound correspondence and spelling patterns), reading expertise implies phonological processing and decoding skills, linguistic knowledge (vocabulary, grammar), and text comprehension skills (durkin, 1993; verhoeven & van leeuwe, 2008). according to kintsch (2007), text comprehension can be seen as a combination of text-based processes that integrate previous knowledge into a mental representation of the text. it is thus a form of cognitive construction in which the individual takes an active role. text comprehension entails deep-level problem-solving processes that enable readers to construct meaning from text and derives from the intentional interaction between reader and text (duke & pearson, 2002; durkin, 1993). on average, students with sen show lower reading performance in large-scale assessments than students without sen (thurlow, 2010; thurlow, bremer, & albus, 2008; ysseldyke et al., 1998). for example, for the naep 1998 reading assessment in grades 4 and 8, lutkus, mazzeo, zhang, and jerry (2004) report lower average scale scores for students with sen compared to students without sen. within the german kess study (bos et al., 2009) reading competence of seventh graders in special schools was compared to the reading competence of fourth graders in general education settings. results demonstrated that fourth grade primary school students outperformed students with sen in seventh grade in reading competence, the difference being about one third of a standard deviation. drawing on data from a three-year longitudinal study, wu et al. (2012) found that, compared to their general education peers, students receiving special educational services were more likely to score below the 10th percentile for several years in a row. in light of these findings, different reasons for the low performance of students with sen have been discussed (abedi et al., 2011).
first, some students with sen have difficulties related to the comprehension of text (e.g., lack of knowledge of common text structures, restricted language competencies, inappropriate use of background knowledge while reading; gersten, fuchs, williams, & baker, 2001). reading problems of students with sen in upper elementary and middle school are likely to be complex and heterogeneous, resulting, for example, from a lack of phonological processing and decoding skills, a lack of linguistic knowledge (vocabulary, grammar), and a lack of text comprehension skills, or from a combination of problems in these areas. second, lower performance could be attributed to a lack of opportunities to learn and to low teacher expectations (woodcock & vialle, 2011). third, there could be barriers for students with disabilities in large-scale assessments that lead to unfair testing conditions (pitoniak & royer, 2001). according to thurlow (2010), a combination of all these factors is likely. taking the norm of test fairness seriously, large-scale studies try to ensure that students with disabilities will not be confronted with unfair testing conditions. that is why testing accommodations are often employed for students with sen.

1.2 providing students with sen with testing accommodations

the provision of testing accommodations for individuals with disabilities is a highly controversial issue in the assessment literature (pitoniak & royer, 2001; sireci, scarpati, & li, 2005). generally, testing accommodations are defined as changes in test administration that are meant to reduce construct-irrelevant difficulty associated with students’ disability-related impediments to performance. according to the standards for educational and psychological testing, accommodations comprise “any action taken in response to a determination that an individual’s disability requires a departure from established testing protocol.
depending on circumstances, such accommodation may include modification of test administration processes or modification of test content” (american educational research association, 1999, p. 110). note that some authors differentiate between accommodations and modifications: while accommodations are not meant to change the nature of the construct being measured, modifications result in a change in the test and equally affect all students taking it (hollenbeck, tindal, & almond, 1998; tindal, heath, hollenbeck, almond, & harniss, 1998). in this article, however, we use the definition of the standards for educational and psychological testing. due to the many types of disabilities, various accommodations have been provided when testing students with sen. accommodations include, for example, modification of presentation format—including the use of braille or large-print booklets for visually-impaired examinees and the use of written or signed test directions for hearing-impaired examinees—and modification of timing, including extended testing time or frequent breaks (koretz & barton, 2003). in the 1998 naep reading assessment, a sample of students with varying disabilities and students with limited english proficiency were assigned to the following accommodations based on their individual needs: one-on-one testing, small-group testing, extended time, oral reading of directions, signing of directions, use of magnifying equipment, and use of an aide for transcribing responses. changes in the test bear the possibility that they alter the construct measured. if accommodated tests for students with sen measure a different construct than the standard test for general education students, the competence scores between the two student groups are not comparable. thus, it is crucial to test whether test accommodations result in reliable and comparable competence measures (borsboom, 2006; millsap, 2011). lutkus et al.
(2004) address the issue of whether the naep reading construct remains comparable for accommodated versus non-accommodated students by analyzing differential item functioning (dif). dif exists when subjects with the same trait level have a different probability of endorsing an item. only very few items were found to have statistically significant dif for the focal group (accommodated students) versus the reference group (non-accommodated students), which indicated measurement invariance across subgroups being assessed with different tests. in contrast, koretz (1997) did find indications of dif as 13 of 22 common items showed strong dif when comparing item difficulty for students with sen tested with accommodations and students without sen tested under standard conditions using data from the kentucky instructional results information system assessment. in pisa 2012, samples of students with sen were also tested with accommodated test versions (a shortened test version of the standard test and a test version including easier items). here, the results on the psychometric properties of the accommodated test versions are still to be published (müller, sälzer, mang, & prenzel, 2014). in sum, the results concerning the use of testing accommodations are inconsistent. a major concern remains that in some cases accommodations may alter the test to the extent that accommodated and non-accommodated tests are no longer comparable (abedi et al., 2011; bielinski, thurlow, ysseldyke, freidebach, & freidebach, 2001; cormier, altman, shyyan, & thurlow, 2010). however, one drawback of the studies by lutkus et al. (2004) and koretz (1997) is that in the analyses, different accommodations are not distinguished although different accommodations may have different effects. another disadvantage is that students with sen are often compared to students without sen at the same grade level. 
here, students with sen and students without sen differ not only in terms of their sen status but also in terms of their expected achievement level. the study by yovanoff and tindal (2007) is one of the rare studies using an alternative comparison group where students with sen in grade 3 are compared to students without sen in grade 2. another issue is that comparisons usually involve students without sen receiving the standard test and students with sen receiving the accommodated test versions. by doing this, possible dif may be due to both testing accommodations and problems of testing students with sen. in order to disentangle the appropriateness of test accommodations from the testability problems of students with sen, the effects of test accommodations should separately be tested in a group of students without sen. in the same vein, pitoniak and royer (2001) identify three major challenges for research on testing accommodations: variability in examinees, variability in accommodations, and small sample sizes (also see geisinger, 1994). in the present study, we approach these challenges by focusing on students with special educational needs in learning (sen-l), by focusing on specific accommodations appropriate for students with sen-l, and by using a study design that incorporates a group of low-achieving students without sen for evaluating the appropriateness of testing accommodations.

1.3 testing students with special educational needs in learning (sen-l)

while providing students with physical, hearing, and visual impairments with testing accommodations is generally accepted, pitoniak and royer (2001) stress the importance of studying the effects of testing accommodations on test validity (or comparability), especially when testing students with learning disabilities. in this study, we focus on students with sen-l in germany, who comprise all students who are provided with special educational services due to a general learning disability (footnote 1).
in germany, students are assigned to the sen-l group when their learning, academic achievement, and/or learning behavior are impaired (kmk, 2012) and when students’ cognitive abilities are below normal range (grünke, 2004). in contrast to students with sen-l, students with (specific) learning disabilities (e.g., a reading disorder) are not necessarily impaired in their general cognitive abilities. in germany, the decision of whether a student has special educational needs in learning is based on a diagnostic procedure and made collaboratively by parents, teachers, consultants, and school administrations. about 78% of the sen-l students in germany (kmk, 2012) do not attend regular schools but attend special schools with specific programs and trainings tailored to those who are unable to follow school lessons and subject matter in regular classes. in fact, students with sen-l constitute the largest group of students with special educational needs in germany (kmk, 2012). comparably, students with learning disabilities constitute the largest group of students with disabilities in the united states (cortiella & horowitz, 2014; us department of education, 2013).

footnote 1: as for the term “learning disabilities”, the term sen-l is not clearly defined. note that we refer to a heterogeneous group of students with multifaceted etiology.

our assumption is that the acceptance of testing accommodations for students with sen-l is low, because the disabilities of students with sen-l (e.g., information processing restrictions) are very likely to interfere with the construct that is to be measured (e.g., reading literacy). in turn, respective testing accommodations are likely to be construct-relevant. there are two test accommodations typically implemented for students with sen-l: extended test time and “out-of-level” testing. extended test time is usually implemented in order to compensate for information-processing restrictions in students with sen-l.
in his review on the appropriateness of extended time accommodations for students with sen—including students with sen-l among others—lovett (2010) identified two studies with a substantial number of differentially functioning items, while dif was negligible in one other study. a prominent hypothesis regarding extended test time is the differential boost hypothesis (fuchs, fuchs, eaton, hamlett, & karns, 2000), which states that students with sen benefit more from extended time than students without sen. in their review on test accommodations for students with sen including 14 studies on extended time, sireci et al. (2005) conclude that students with sen as well as students without sen benefit from extended test time. in only one of the reviewed studies, students with sen benefited more from extended test time than students without sen. another common method is to provide students with sen-l with an out-of-level test, which was originally meant for testing younger children (thurlow, elliott, & ysseldyke, 1999). similarly, alternate assessments that test lower-level reading and mathematical skills or skills that are precursory to reading and numerical literacy can be applied (zebehazy, zigmond, & zimmerman, 2012). both methods aim at avoiding undue frustrations for students with sen-l and at improving the accuracy of measurement. critics of out-of-level and alternate assessments argue that students with sen-l are faced with low expectations due to the assessment and are prevented from taking the standard tests, and thus consider the assessments to be inappropriate for accountability assessment. nevertheless, thurlow et al. (1999) consider out-of-level testing a good opportunity to test students with sen, if one can make sure that a common scale across disparate grade levels is available. such a common scale may be achieved by using methods of item response theory (irt), given that the items measure the same construct.
when scaling oregon’s early reading alternate assessment onto the first general statewide benchmark reading assessment in grade 3, yovanoff and tindal (2007) identified good psychometric properties of the alternate assessment and no severe dif between students with sen (grade 3) and students without sen (grade 2). however, data that support either the use or nonuse of out-of-level testing or alternate assessments are still rare (minnema, thurlow, bielinski, & scott, 2000; see the study by zebehazy et al., 2012, which focuses on visually impaired students, for an exception).

2. research questions

as prior research has shown, testing competencies of students with sen-l represents a challenge for large-scale assessments (thurlow, 2010). assessing competencies of students with sen-l with tests that have been developed for students without sen-l may fail to result in satisfying item fit measures and may be associated with differential item functioning, which impedes the opportunity to compare the competence scores of students with and without sen. the present study aims to evaluate different strategies of testing students with sen-l. generally, we address the question of whether and how satisfying item fit measures and measurement invariant test scores can be obtained for students with sen-l in large-scale assessments. we evaluate whether standard tests developed for students without sen-l and testing accommodations for students with sen-l result in reliable and comparable measures of reading competence. if a reliable and comparable measurement of reading competence can be achieved, substantive questions on the competence level, predictors of reading competence and competence development, as well as group differences may be investigated.
in this study, two major research questions are addressed: first, we investigate whether a reduction in test difficulty and a reduction of the number of items lead to test results comparable to those of students tested without accommodations. we approach this question by testing students in general education, for whom reliable and valid competence scores can be obtained using a standard reading test. as the accommodated test versions are targeted towards a lower competence level, we did not use the whole group of students in general education, but focused on the subgroup of low-achieving students. secondly, we explore whether these accommodations are suitable for testing students with sen-l.

3. method

3.1 sample and design

we collected data within the german national educational panel study (neps). the neps is a national, large-scale longitudinal multicohort study that investigates the development of competencies across the lifespan (blossfeld & von maurice, 2011; blossfeld, von maurice, & schneider, 2011). the study aims at providing high-quality, user-friendly data on competence development and educationally relevant processes for an international scientific community (barkow et al., 2011). between 2009 and 2012, six representative start cohorts (aßmann et al., 2011) were sampled, including about 60,000 individuals from early childhood to adulthood. specific target groups include migrants (kristen et al., 2011) and students with sen-l (heydrich et al., 2013). all participants are accompanied on their individual educational pathways through a collection of data on competencies (weinert et al., 2011), learning environments (bäumer, preis, roßbach, stecher, & klieme, 2011), educational decisions (stocké, blossfeld, hoenig, & sixt, 2011), and educational returns (gross, jobst, jungbauer-gans, & schwarze, 2011).
following the principles of universal design (dolan & hall, 2001; thompson, johnstone, anderson, & miller, 2005), the neps aims at providing a basis for fair and equitable measures of competencies for all individuals. in the present study, we used data from three different studies of students in fifth grade. these studies comprise a) a representative sample of general education students (main sample), b) a sample of students with sen-l, and c) a group of students in the lowest academic track (lat). the response rate in these studies was 55%, 45%, and 63%, respectively. in the main sample there were n = 5,208 general education students, including n = 700 students in the lowest academic track (see aßmann, steinhauer, & zinn, 2012, for more information on the neps main sample). on average, these students were 10.95 years old (sd = .53) and 48.3% were female (0.7% had a missing response on age, 0.2% had a missing response on gender). about 24.1% of the students reported that they spoke a language other than german at home. the sample of students with sen-l draws on a feasibility study with n = 433 students who were recruited at special schools for children with sen-l in germany. students in this sample were on average 11.41 years old (sd = .63) and 43.3% were female (0.7% had a missing response on gender). in this sample, about 30.1% of the students reported that they spoke a language other than german at home.

in this feasibility study, we applied two accommodated test versions that aimed at a) reducing the difficulty of the test and b) reducing the test length (and thereby increasing the testing time per item). in order to discern whether test items do not function properly because the accommodations change the test construct or whether students with sen-l still have problems with the test, we included a group of low-achieving students without sen.
this group consisted of a separate sample of n = 490 students enrolled in the lowest academic track, or hauptschule. students in this sample were on average 11.28 years old (sd = .63) and 48.4% were female. about 29.8% of the students in the lat spoke a language other than german at home. focusing on this sample, we evaluated whether the accommodated test versions yield reliable test scores and whether they assess the same construct as the standard reading test. students without sen were tested because for this group it has already been shown that reliable and valid competence assessment can be obtained using the standard reading test. thus, we could investigate the impact of the testing accommodations and disentangle testing problems resulting from badly-constructed accommodated test versions from testing problems resulting from the assessment of students with sen-l. we restricted our sample to students in the lowest academic track, because the accommodated test versions were targeted towards a lower competence level. for students in general education in higher academic tracks, the test accommodations would be too easy and, as a consequence of such poor test targeting, could result in aberrant response patterns (due to motivation problems) as well as in low item discriminations (due to the low variability in item responses). implementing this group of low-achieving students allowed us to investigate whether the accommodated test versions generally result in reliable and comparable measures of competence. all students were tested in the middle of fifth grade in november and december 2010. data were collected by the international association for the evaluation of educational achievement (iea) data processing and research center (dpc). students participated in the study voluntarily, so student and parental consent was necessary. each student who participated in the study received 5 euros.
3.2 measures and procedures

within all three samples, reading literacy as well as mathematical competence was assessed. the orientation towards the functionality and everyday relevance of the competencies studied is one central aspect of the neps framework for the assessment of competencies. it draws on the concept of literacy in international comparative studies with a focus on enabling participation in society (see oecd, 1999). in this study, we focus on the assessment of reading literacy. within the neps, the reading competence assessment focuses on text comprehension. all reading tests are developed based on a framework for the assessment of reading competence (gehrer, zimmermann, artelt, & weinert, 2013). this framework has been developed based on theoretical and pragmatic considerations that take earlier concepts and studies of reading competence within large-scale assessments into account. the most important dimensions within the framework are text types, cognitive requirements, and task formats. concerning text types, texts with commenting, information, literacy-aesthetic, instruction, and advertising functions are included. in turn, cognitive requirements range from finding information in the text through drawing text-related conclusions to reflecting and assessing. across all age groups, the items in the test are either simple multiple choice (mc) items, complex mc items, or matching items. complex multiple-choice (cmc) items present a common stimulus followed by a number of mc questions with two response options each. matching (ma) items consist of a common stimulus followed by a number of statements, which require assigning a list of response options to these statements (see gehrer, zimmermann, artelt, & weinert (2012) for a full description of the framework including information on text types, cognitive requirements, item formats, and example items).
3.2.1 standard reading test

the standard reading test was designed for students enrolled in the regular school system. it was developed based on the conceptual framework sketched above. students were asked to read five different texts and answer questions focusing on the content of these texts (gehrer, zimmermann, artelt, & weinert, 2013). the test for students in fifth grade included a text about a continent (information function), a recipe (instruction function), an invitation (advertising function), a critical statement on a societal topic (commenting function), and a fictive story about a famous character (literacy-aesthetic function). in the analysis of the standard reading test, 56 items were included; however, subtasks of complex mc and matching items were treated as single items. when these subtasks were combined, there were 33 questions in the standard reading test, which students had to complete within 30 minutes. for testing general education students, the test has shown good psychometric properties (pohl, haberkorn, hardt, & wiegand, 2012).

3.2.2 reading test with accommodations

based on the standard reading test, two accommodated test versions were administered in this study. as mentioned above, typical testing accommodations for students with sen-l include extended testing time and “out of level” testing. within the neps, time for testing a domain-specific or domain-general competence is limited to 30 minutes. under this restriction, we decided to develop one accommodated test version by reducing test length (reduced test), resulting in an increased test time per item. one text and its respective nine items, plus an additional 10 hard items, were removed. the text on the societal topic and the items were removed because the items proved to be comparatively difficult in prior item analyses in samples of general education students.
in order to facilitate scaling of the different test versions on the same scale, an anchor item design (e.g., kolen & brennan, 2004) was used for linking the different test versions. for this design a sufficient number of items needs to be the same in all test forms. therefore, in the reduced test four texts and 37 items remained the same as in the standard reading test and functioned as anchor items in this design. we use the term “anchor item” when an item is the same in the standard test and in the accommodated test versions. as a result of reducing the length of the test, one text function was left out in the reduced test version (the commenting function). still, the anchor items represented all three cognitive requirements. note that while this accommodation mainly served to reduce test length, it also reduced item difficulty. we decided to develop a second accommodated test version (easy test) that mainly aimed at reducing the difficulty of the standard test. therefore, three texts and their respective 37 items from the standard reading test were removed (the text about the continent, the critical statement on a societal topic, and the fictive story about a famous character). these texts and their respective items were replaced with three texts and 23 items that had been developed for younger children in grade 3—including a text on the human body (information function), a short story about a family (literacy-aesthetic function), and an invitation (advertising function). this procedure can be considered as some sort of “out-of-level” testing. however, two texts remained the same as in the standard reading test as we used an anchor item design. based on prior item analysis in samples of general education students in grade 5, five especially difficult items were eliminated from these texts. this procedure resulted in 12 overlapping items in the standard reading test and the easy test version. these items were used as anchor items in this design.
in sum, the easy test version included 35 items. overall, 5,208 general education students including 700 students from the lowest academic track were tested with the standard reading test. students with sen-l took the standard reading test (n = 176), the reduced test (n = 173), or the easy test (n = 84) by random assignment. the additional sample of n = 490 students from the lowest academic track was randomly assigned to the reduced test (n = 332) and the easy test (n = 158). note that the standard reading test was not administered to this sample of students in the lowest academic track. for investigating the appropriateness of the standard reading test for students in the lat, the subsample of the main sample of general education students attending schools of the lowest academic track was used (n = 700). in order to control for fatigue and acquaintance effects, the order of the different tests was rotated within the booklet in almost all test versions. the standard test and the easy test were administered either before or after a mathematics test. for the analyses, due to sample size issues, the test order was ignored and the different conditions were analyzed together. due to sample size limitations, there was no rotation of the position of the reduced test; the reduced test was only administered before the mathematics test. for the comparison of estimated item difficulties with general education students, data from these students refer to the same position within the booklet as data of students with sen-l or the students in the lat. so no bias is to be expected from test position.

4. analyses

4.1 the model

we scaled the data within the framework of item response theory (irt). in accordance with the scaling procedure for competence data in the neps (pohl & carstensen, 2012; 2013), we used a rasch model (rasch, 1960) estimated in conquest (wu, adams, wilson, & haldane, 2007).
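for orientation, the dichotomous rasch model can be written as below. this is the standard textbook form, not a formula quoted from the article; the symbols (person ability θ_v, item difficulty β_i, response X_vi of person v to item i) are notation assumed here for illustration.

```latex
\[
P(X_{vi} = 1 \mid \theta_v, \beta_i)
  = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}
\]
```

items differ only in their difficulty β_i; there is no item-specific slope parameter, which is what the assumption of equal loadings across items amounts to.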
in this model a unidimensional measurement model with equal loadings across items is proposed. various fit indices are available that describe the psychometric properties of the tests. as described above, the reading test included complex mc and matching items. these items consisted of a set of subtasks that were aggregated to a polytomous variable in the final scaling model in the neps. when aggregating the responses on the subtasks to a single polytomous super-item, we lose information on the single subtasks. since in this study we were interested in the fit of the items, we treated the subtasks of complex mc and matching items as single dichotomous items in the analyses. as such, we could not account for possible local item dependence within each set of subtasks. we applied the rasch model to every test version (standard test, reduced test, easy test) and sample (students with sen-l, students in the lat).

4.2 item fit

in order to investigate whether the standard test and the accommodated reading tests reliably measured reading competence, we evaluated different fit measures. these included the weighted mean square (wmnsq; wright & masters, 1982), item discrimination, point-biserial correlation of the distractors with the total score and the empirically approximated item characteristic curve (icc). all of these measures provide information on how well the items fit a unidimensional rasch model. as wu (1997) showed, fit statistics depend on the sample size. the larger the sample size, the smaller the wmnsq and the greater the t-value. thus, since the group of students with sen-l differs in sample size from the group of students in the lat, we considered different evaluation criteria for the interpretation of the wmnsq. in this study, we report item discrimination, which describes the point-biserial correlation of the item with the total score (i.e., relative number of correct responses on the total number of valid responses).
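the discrimination index described here can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the neps scaling code: the function names are hypothetical, missing responses are assumed to be coded as nan, the total score is not corrected for the item itself, and the cut-offs (.1 and .2) are those used for the classification in this section, with values of exactly .1 or .2 assigned to "slight misfit" as an arbitrary choice.

```python
import numpy as np


def item_discrimination(responses):
    """Point-biserial correlation of each item with the proportion-correct score.

    responses: persons x items array of 0/1 scores, np.nan for missing values.
    """
    responses = np.asarray(responses, dtype=float)
    # Total score: share of correct responses among a person's valid responses.
    total = np.nanmean(responses, axis=1)
    disc = np.full(responses.shape[1], np.nan)
    for j in range(responses.shape[1]):
        valid = ~np.isnan(responses[:, j])
        x, y = responses[valid, j], total[valid]
        # The correlation is undefined if the item or the score has no variance.
        if x.std() > 0 and y.std() > 0:
            disc[j] = np.corrcoef(x, y)[0, 1]
    return disc


def classify_discrimination(d):
    """Map a discrimination value onto the three fit categories."""
    if d > 0.2:
        return "acceptable item fit"
    if d >= 0.1:
        return "slight misfit"
    return "strong misfit"


if __name__ == "__main__":
    # Toy data (invented): four persons, three items of increasing difficulty.
    toy = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    for j, d in enumerate(item_discrimination(toy)):
        print(f"item {j}: discrimination = {d:.2f} ({classify_discrimination(d)})")
```

because the total score here includes the item being evaluated, short tests will show somewhat inflated values; a corrected item-total correlation would be an alternative design choice.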
a well-fitting item should have a high positive correlation, that is, subjects with a high ability should score higher on the item than subjects with a low ability. for an easier interpretation, we report the discrimination not only in absolute values, but classify the item fit regarding the discrimination into acceptable item fit (discrimination > .2), slight misfit (discrimination between .1 and .2), and strong misfit (discrimination < .1). furthermore, point-biserial correlations of incorrect response options and the total score are evaluated. the correlations of the incorrect responses with the total score allow for a thorough investigation of the performance of the distractors. a good item fit would imply a negative or zero correlation of the distractor with the total score. distractors with a high positive correlation may indicate an ambiguity in relation to the correct response. finally, empirically approximated item characteristic curves (icc) were considered. these describe whether the number of correct responses corresponds to the theoretically implied response probability at each competence level.

4.3 measurement invariance

reading scores of students with sen-l versus students in general education can only be compared when the tests are measurement invariant, that is, when there is no differential item functioning (dif). measurement invariance is furthermore a necessary assumption for linking the different test forms. when measurement invariance holds, and thus there is no dif, the probability of endorsing an item is the same for students with sen-l and those without sen-l who have the same ability. the presence of dif is an indication that the respective reading test measures a different reading construct in the two target groups, and thus that the reading scores of the target groups may not be compared.
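this condition can be sketched formally as follows. the notation is assumed for illustration and is not taken from the paper: \theta is the ability, g the group, and \beta_i^{(g)} the difficulty of item i estimated in group g. the sign of the difference is chosen so that negative values mean that the item is harder for the target group.

```latex
% measurement invariance for item i: equal response probability at equal ability
P(X_i = 1 \mid \theta,\ g = \text{target}) = P(X_i = 1 \mid \theta,\ g = \text{general})
\quad \text{for all } \theta

% in a rasch model this holds when the group-specific difficulties coincide,
% so dif can be expressed as their difference
\mathrm{DIF}_i = \beta_i^{(\text{general})} - \beta_i^{(\text{target})}
```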
we tested dif for each test version (standard, reduced, easy) and each target group (students with sen-l, students in the lat) by comparing the estimated item difficulties in the respective test version and target group to the estimated item difficulty of the same items for students in the main sample of the neps. students with sen-l as well as students in the lat were, thus, compared to general education students in the main sample. there is one exception: the group of students in the lat was not tested with the standard reading test. in order to estimate dif for that group on the standard test, we used the data of the students in the lowest academic track of the main sample of general education students. for this, we separated the main sample into students in the lowest academic track and students in other tracks and compared the estimated item difficulty between both groups. we estimated dif in a multi-facet irt model, estimating separate item difficulties for general education students and for the respective target group. in line with the benchmarks chosen in the neps (pohl & carstensen, 2012), we considered absolute differences in item difficulties greater than 0.6 to be noticeable and absolute differences greater than 1 to be strong dif. note that these benchmarks serve here as an orientation for interpretation. to get a thorough picture, we also report the absolute dif value. also note that while in the standard test dif may be investigated for all items, dif in the reduced test and the easy test may only be investigated for the anchor items. in the reduced test and the easy test there are anchor items that allow linking of the different test versions. as described above, there are 37 anchor items in the reduced test and 12 anchor items in the easy test.

5. results

in the following we will first present the occurrence of missing values in each test form.
then we will present item fit for the different test forms and samples, followed by a further investigation of reasons for item misfit. in a next step, results on the comparability of test scores are presented. the results on item fit and measurement invariance are then considered together for evaluating the appropriateness of the different test forms for assessing competencies of students with sen-l.

5.1 missing responses

table 1 depicts the mean of the relative amount of different kinds of missing responses for each of the target groups and test versions. similar to the main study (pohl et al., 2012), there is a large number of missing responses: on average, up to 19% of the items are missing. the amount of missing responses is larger in the students with sen-l group than in the group of students in the lat for all test versions and all types of missing responses. comparing the different test versions, the lowest number of omitted items is found in the easy test version. this is probably due to the fact that the easy test version contains many easy items and that omission of items is related to the difficulty of the item (see, e.g., pohl, gräfe, & rose, 2014). the lowest number of not reached items is found in the reduced test version. thus, the reduction of texts and items to work on within the given assessment time does increase the number of items reached. the lowest number of invalid missing responses occurs in the reduced test version. this is likely because the reduced test version contains fewer matching items; this is the item format with the largest number of invalid responses (pohl et al., 2012).
table 1
averages of the relative frequency of missing responses (in percent)

type of missing response            test        sen-l m     lat m
omitted                             standard       6.72      4.78
                                    reduced        5.20      2.70
                                    easy           2.01      0.92
not reached                         standard      10.46      9.45
                                    reduced        3.90      1.04
                                    easy           5.63      3.44
invalid                             standard       1.13      0.44
                                    reduced        0.48      0.18
                                    easy           1.59      0.18
total number of missing responses   standard      18.31     14.67
                                    reduced        9.58      3.92
                                    easy           9.22      4.53

note. sen-l = special educational needs in learning; lat = lowest academic track; m = mean.

5.2 item fit

5.2.1 standard test

first we analyzed item fit for the standard reading test for students with sen-l and students in the lowest academic track. overall, item discrimination is relatively small for students with sen-l. the mean item discrimination is .25 (it is .34 in the lowest academic track). four items show a slight misfit (discrimination between .1 and .2) and 10 items a strong misfit (discrimination less than .1). in the lowest academic track, there is only one item with a strong misfit and nine items with a slight misfit. evaluation of further fit measures for students with sen-l confirms these results. table 2 depicts the number of misfitting items for the wmnsq, icc, and point-biserial correlations. summarizing these results, there is a large number of items in the standard test that do not fit. eap-reliability of competence scores for students in the lowest academic track is sufficiently high (rel = 0.823), while it is considerably lower for students with sen-l (rel = 0.652). we can conclude that students with sen-l may not be tested appropriately with the standard reading test. in contrast, fit indices in the lowest academic track indicate a relatively good item fit that is comparable to the fit found in the main sample of general education students (see pohl et al., 2012 for the results in the main study).
the results indicate that the test is appropriate not only for the main sample including students attending higher academic tracks but also for low-performing students.

table 2
number of items with misfit indicated by weighted mean square (wmnsq), item characteristic curve (icc), and point-biserial correlations

fit measure                   test        sen-l    lat
wmnsq                         standard        7      7
                              reduced         2      1
                              easy            1      3
icc                           standard       15      9
                              reduced        17      2
                              easy           12      1
point-biserial correlations   standard       21      3
                              reduced        14      1
                              easy            5      0

note. sen-l = special educational needs in learning; lat = lowest academic track.

5.2.2 reduced test

the item discriminations of the items in the reduced test version indicate a better item fit for students in the lat than for students with sen-l. for both target groups, the reduced test shows better item fit indices than the standard test version. for students with sen-l, there are six items with a slight misfit (discrimination between .1 and .2) and five items with a strong misfit (discrimination below .1). note that the items showing misfit in the standard reading test do not necessarily also show low discriminations in the reduced test. this may indicate that problems with testing of students with sen-l do not necessarily lie in the specificity of the items, but may reflect other aspects of testing. the mean item discrimination is .28. in contrast, for students in the lat the mean item discrimination is .47 and there is only one item with a slight misfit and one item with a strong misfit. note that the item with the strong misfit was also problematic in the main sample. evaluation of the wmnsq, the iccs, as well as of the point-biserial correlations of the responses (see table 2) corroborates these findings. the results show that the items in the reduced test version have a good item fit for students in the lat. they have, however, an insufficient fit in the students with sen-l group.
nevertheless, the item fit in the students with sen-l group is better for the reduced test than for the standard test. as in the standard test, eap-reliability was sufficiently high for students in the lowest academic track (rel = 0.850) but it was not sufficient for students with sen-l (rel = 0.525).

5.2.3 easy test

the items in the easy test fit the data for both target groups better than the standard test. for students with sen-l there are only four items with a slight misfit and three items with a strong misfit. the mean item discrimination for students with sen-l is .30, while it is .46 for the students in the lat. in the group of students in the lat, there is no item with an unsatisfactory discrimination. the other fit measures evaluated (see table 2) also show that the items in the easy test version fit the model in the group of students in the lat but show some misfit in the students with sen-l group. the eap-reliability for students in the lowest academic track was high (rel = 0.877), while it was not satisfactory for students with sen-l (rel = 0.600). compared to the other two test versions, the easy test version shows the best model fit for students with sen-l.

5.3 investigation of item misfit

we further investigated the occurrence of item misfit based on test characteristics. we did not find any systematic relationship between item misfit and the different dimensions of the conceptual framework of the reading test (text function, cognitive requirements, and item format). however, we did find a relationship between item misfit and item difficulty.

5.3.1 standard test

the correlation of the item difficulty estimated in the main sample (thus being independent of the measurement model in the sen-l group) and item discrimination within the students with sen-l group is -.492. the more difficult an item, the lower is the discrimination.
this may be an indication of disadvantageous test targeting, that is, inappropriate item difficulties for this target group. the items in the standard test are too difficult for students with sen-l (mean item difficulty with the mean of the reading ability set to zero = 0.58 logits), while item difficulties match the abilities of the students of the lowest academic track well and are in fact rather easy (mean item difficulty = -0.41 logits). here, the correlation between item difficulty estimated in the main sample and item discrimination for students in the lowest academic track is -.324. note that since the measurement model of the standard test in the lowest academic track was estimated based on a subsample of the main sample, estimated item difficulty is not independent of the estimated item discrimination in the sample of students of the lowest academic track in the main sample.

5.3.2 reduced test

in the group of students in the lat, item fit of the reduced test is not substantively correlated with item difficulty (cor = -0.06), while it is considerably negatively correlated in the students with sen-l group (cor = -.43). within students in the lat, there is no relationship between item difficulty and item misfit, while in the students with sen-l group, items with high difficulty show larger item misfit. this may also be a result of the small variance in item discrimination in the group of students in the lat for this test version. test targeting shows that the reduced test is still too difficult for students with sen-l (mean item difficulty = 0.43 logits) but too easy for students in the lowest academic track of general education (mean item difficulty = -1.03 logits).

5.3.3 easy test

since most of the items in the easy test are not part of the standard test, we did not compute correlations between item difficulty and item fit. however, we did investigate test targeting.
in test targeting, the easy test version is also too easy for students in the lat (mean item difficulty = -0.99 logits) and too hard for students with sen-l (mean item difficulty = 0.61 logits). note that the easy test version is even more difficult than the reduced test version.

5.4 measurement invariance

5.4.1 standard test

table 3 shows the absolute differences in estimated item difficulties first, between general education students and students with sen-l and second, between students in the lowest academic track and students in other tracks of the main sample taking the standard test version. for students with sen-l, negative values in the table indicate a higher item difficulty compared to general education students while positive values indicate lower item difficulty. for students in the lowest academic track, negative values indicate a higher item difficulty for these students compared to students in other tracks in the main sample and positive values indicate a lower item difficulty.
table 3
differential item functioning (dif) in the different test versions and student groups

item      difficulty     sen-l     sen-l     sen-l       lat       lat       lat
                      standard   reduced      easy  standard   reduced      easy
reg50110      -1.909    -1.010    -0.942       n/a    -0.304    -0.256       n/a
reg50121      -2.814    -1.678    -1.200       n/a    -0.320     0.246       n/a
reg50122      -2.063    -0.926    -0.800       n/a    -0.276    -0.360       n/a
reg50123      -2.078    -0.444    -0.848       n/a    -0.222    -0.072       n/a
reg50124      -2.236    -0.510    -0.930       n/a    -0.140    -0.246       n/a
reg50125      -2.202    -1.018    -0.752       n/a    -0.260    -0.442       n/a
reg50126      -1.793    -0.652    -0.512       n/a     0.018     0.858       n/a
reg50127      -2.173    -0.714    -1.234       n/a    -0.414    -0.362       n/a
reg50130      -0.805    -0.850    -0.090       n/a    -0.068    -0.106       n/a
reg50140      -0.148    -0.382    -0.288       n/a    -0.132    -0.024       n/a
reg50150       0.874    -0.400       n/a       n/a     0.200       n/a       n/a
reg50161       0.542    -1.688    -1.388       n/a    -0.536    -0.302       n/a
reg50162       0.149    -1.020    -0.130       n/a    -0.310    -0.686       n/a
reg50163       0.035    -0.348     0.422       n/a    -0.342    -0.330       n/a
reg50164      -0.076    -1.320    -0.774       n/a    -0.466    -0.116       n/a
reg50165       0.048    -0.302    -0.042       n/a    -0.428    -0.368       n/a
reg50170       2.351     0.570       n/a       n/a    -0.294       n/a       n/a
reg50210      -1.411    -1.054    -0.566    -0.564    -0.352    -0.574    -0.304
reg50220       1.436     1.200     1.602     1.360     0.576     0.414     0.490
reg50230      -1.187    -0.926    -0.814    -0.850    -0.148     0.006    -0.148
reg50240       0.050    -0.082     0.146     0.232    -0.094    -0.044    -0.226
reg50250       0.667     0.164     0.344    -0.096     0.134     0.170    -0.018
reg50261      -1.352    -0.318       n/a       n/a    -0.204       n/a       n/a
reg50262       1.924     0.580       n/a       n/a    -0.038       n/a       n/a
reg50263       2.159     0.172       n/a       n/a     0.088       n/a       n/a
reg50264       2.167    -0.290       n/a       n/a     0.188       n/a       n/a
reg50265       2.195     0.724       n/a       n/a     0.180       n/a       n/a
reg50266       2.221     1.016       n/a       n/a     0.116       n/a       n/a
reg50310      -0.867    -0.824    -1.254    -0.756    -0.318    -0.142    -0.444
reg50320      -1.425    -0.982    -0.870    -0.798    -0.464    -0.196    -0.066
reg50330      -1.185    -1.654    -1.632    -1.020    -0.440    -0.106    -0.154
reg50340      -0.158    -0.570     0.378     0.026    -0.186     0.078     0.030
reg50350       0.838     0.082     0.310     0.420     0.102     0.028     0.222
reg50360      -0.887    -0.844    -0.324    -0.324    -0.274    -0.062    -0.130
reg50370       0.140    -0.256     0.318    -0.058     0.020     0.288    -0.100
reg50410       0.885     0.370       n/a       n/a     0.206       n/a       n/a
reg50421      -0.481     0.042       n/a       n/a     0.342       n/a       n/a
reg50422      -0.225     0.380       n/a       n/a     0.666       n/a       n/a
reg50423       0.243     1.268       n/a       n/a     0.536       n/a       n/a
reg50430       2.371     0.772       n/a       n/a     0.080       n/a       n/a
reg50452       0.531     1.586       n/a       n/a     0.590       n/a       n/a
reg50440       1.922     1.264       n/a       n/a     0.374       n/a       n/a
reg50451       0.183     1.526       n/a       n/a     0.716       n/a       n/a
reg50460       1.356     0.436       n/a       n/a     0.100       n/a       n/a
reg50510      -0.898    -0.532    -0.618       n/a    -0.594     0.044       n/a
reg50521      -0.313    -0.052     0.878       n/a    -0.366    -0.058       n/a
reg50522      -0.635    -0.156     0.966       n/a    -0.080     0.416       n/a
reg50523      -0.004     0.366     1.066       n/a     0.214     0.250       n/a
reg50524      -0.634     0.256     0.872       n/a    -0.318     0.704       n/a
reg50530       1.487     0.770       n/a       n/a     0.206       n/a       n/a
reg50540       0.064    -0.262     0.748       n/a    -0.428    -0.096       n/a
reg50551      -0.035     0.064    -0.756       n/a     0.030    -0.090       n/a
reg50552       1.135     0.188     1.214       n/a    -0.184    -0.108       n/a
reg50553       0.385     0.210     0.540       n/a    -0.452     0.214       n/a
reg50560       1.125     0.716     0.938       n/a     0.624     0.666       n/a
reg50570       0.515    -0.334     0.392       n/a    -0.330     0.206       n/a

note. sen-l = special educational needs in learning; lat = lowest academic track; n/a = no dif value reported for this item in the respective test version.

the results clearly show measurement invariance for students in the lowest academic track and large differences in estimated item difficulties for students with sen-l. for students in the lowest academic track, of the 56 items there is no item with strong dif (absolute difference in item difficulties greater than 1) and only three items with slight dif (absolute difference in item difficulties between 0.6 and 1). for students with sen-l there are 12 items with slight dif and 14 items with strong dif. the results indicate that measurement invariance holds for students in the lowest academic track but that the test measures a different construct for the group of students with sen-l compared to general education students. thus, reading test scores for students with sen-l are not comparable to test scores for general education students.

5.4.2 reduced test

table 3 also shows dif for the accommodated test versions. in the reduced test, for students with sen-l, 15 out of 38 items have slight dif and eight items have strong dif. only 15 items show no considerable dif. thus, the measurement of reading competence with the reduced test is different from that of general education students with the standard test. this does, however, not seem to be a result of the test accommodation.
within the group of students in the lat measurement invariance holds as only four items show slight dif. the results indicate that for students with sen-l the measurement model, and thus, the measured construct, is different from that of students in general education.

5.4.3 easy test

in the easy test, for students with sen-l three out of twelve anchor items show noticeable dif and two items show strong dif. there are only seven items with no noticeable dif. in contrast, in the lat group there is no noteworthy dif in the easy test and only four items show slight dif in the reduced test. while measurement invariance may be assumed for the students in the lat, it does not hold for students with sen-l. again, differences in the measurement model do not seem to be induced by the test accommodation, but rather reflect a specific testing problem of students with sen-l.

5.5 item fit and measurement invariance

considering both criteria (item fit and measurement invariance), how many items with good psychometric properties are left within the different groups and test versions? is it possible to construct a test out of well-fitting items? figure 1 shows the discrimination and dif of the items in the standard test version for students with sen-l (a) and for students of the lowest academic track (b). the grey lines give the rules of thumb for the evaluation of the items. items with discrimination > .2 and absolute dif < 0.6 have no noticeable misfit or dif. items with .2 > discrimination > .1 or 0.6 < absolute dif < 1 have noticeable but not considerable misfit and/or dif. items with discrimination < .1 or absolute dif > 1 have considerable misfit and/or dif. these items should not be used for testing. figure 1a) shows that a considerable number of items do not meet the fit and dif criteria in the sen-l group. only 22 out of 56 items show good fit and dif indices.
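the rules of thumb above can be sketched as a small classification function. this is an illustrative sketch, not the authors' code: the function name `classify_item`, the labels, and the treatment of values that fall exactly on a cut-off are assumptions. an item is flagged as soon as one of the two criteria is violated.

```python
def classify_item(discrimination, dif):
    """Classify one item by discrimination and (signed) DIF.

    Cut-offs follow the rules of thumb in the text:
      strong: discrimination < .1  or |DIF| > 1
      slight: discrimination < .2  or |DIF| > 0.6   (and not strong)
      none:   otherwise
    Values exactly on a cut-off go to the milder category (an assumption).
    """
    abs_dif = abs(dif)
    if discrimination < 0.1 or abs_dif > 1.0:
        return "strong"
    if discrimination < 0.2 or abs_dif > 0.6:
        return "slight"
    return "none"


def count_categories(items):
    """items: iterable of (discrimination, dif) pairs; returns counts per label."""
    counts = {"none": 0, "slight": 0, "strong": 0}
    for discrimination, dif in items:
        counts[classify_item(discrimination, dif)] += 1
    return counts


# hypothetical values for three items (not taken from the paper)
example = [(0.35, -0.30), (0.15, 0.20), (0.30, 1.20)]
example_counts = count_categories(example)
```

using the absolute value of dif means that items that are unexpectedly easy and items that are unexpectedly hard for the target group are treated alike.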
thirteen items show a slight misfit in at least one of the two criteria and 21 items exceed at least one of the criteria for a strong misfit or large dif. there are obviously not many items left that meet the criteria of a good test. for students of the lowest academic track (figure 1b), there is only one item with a slight misfit in either of the two criteria and seven items with a strong deviation from at least one of the two criteria. thus, there are 48 items that meet the criteria of a good test in the lowest academic track group of the main sample.

figure 1. discrimination and differential item functioning of the items in the regular test: (a) students with sen-l; (b) students in the lowest academic track of the main sample. sen-l = special educational needs in learning.

in the reduced test (see figure 2a), for students with sen-l, 13 out of 38 items show a strong misfit and/or dif, 16 items show a slight deviation from at least one of the two criteria and only nine items are suitable for testing considering both criteria. there is a high number of items that may not be used in a test. again, in the lat group the items fit both criteria very well (figure 2b). only one out of 38 items needs to be excluded due to strong misfit or dif, and only three items show a slight misfit and/or dif. thirty-four items meet the criteria of fit and measurement invariance. the low dif values in the lat group provide evidence in support of the argument that reducing the test length (i.e., increasing the testing time per text and item) does not threaten the comparability of the results. thus, reducing test length may be an appropriate accommodation. however, this accommodation is not sufficient to reliably and comparably measure reading competence for students with sen-l.

figure 2. discrimination and differential item functioning of the items in the reduced test: (a) students with sen-l; (b) students in the lowest academic track.
sen-l = special educational needs in learning.

since there are only 12 items in the easy test that may be tested for dif, we refrained from plotting the different evaluation criteria for this test version. it may, however, be concluded that of the 12 items, there are four with a slight misfit or dif and two with a strong one. only six of the 12 anchor items meet the criteria of fit and dif. since linking may only be done using 12 items, losing six items due to fit and dif problems raises questions as to the appropriateness of this accommodated test version for the group of students with sen-l. as a comparison, in the lat group there is only one of these 12 items with a slight misfit and one with a strong misfit. the results in the lat group are an indication that reducing the difficulty of the test does result in reliable and comparable reading competence measures. however, this test accommodation is not sufficient for assessing students with sen-l.

6. discussion

the present research dealt with the question of how competencies of students with sen-l may be assessed reliably and comparably to general education students. we assessed the reading competence of students with sen-l using a standard reading test, a reduced reading test, and an easy reading test. we used a group of low-achieving students without sen to test whether the test accommodations alter the measured construct. the results showed that all three reading test versions are suitable for a reliable and comparable measurement of reading competence in students without sen. reducing both test length and item difficulty resulted in reliable measures that are comparable to those of a standard test for general education students. for students with sen-l, the accommodated test versions considerably reduced the amount of missing values. they did not, however, show a satisfactory item fit and measurement invariance.
although the testing accommodations increase item fit and measurement invariance for students with sen-l as compared to using a standard reading test, there are still many items unsuitable for a reliable and comparable assessment of reading competence in students with sen-l. thus, the competence scores assessed by the tests in this study are neither suitable for a substantive interpretation of the competence level of students with sen-l, nor may they be used for a valid comparison of competence levels between students with sen-l and students in general education.

concerning the testing accommodations implemented in this study, the reduced test primarily aimed at compensating for information-processing restrictions in students with sen-l (e.g., for slow processing speed) while the easy test primarily aimed at adapting the test to a reduced competence level in reading (by reducing test difficulty in general), thereby improving the accuracy of measurement and avoiding undue frustrations for students with sen-l. since we showed, within the group of students in the lat, that the items in the accommodated test versions have a good fit, we may conclude that the misfit in the group of sen-l students is not due to badly constructed items or to the fact that the test versions changed the measured construct. misfit of items in the sen-l sample must be due to problems in testing this specific target group. our analyses on test targeting showed that even the accommodated test versions are too difficult for students with sen-l. since item fit became better for accommodated versions, which were composed of easier items than the standard test, we hypothesize that a further reduction in item difficulty may help to improve testing of students with sen-l. this hypothesis is corroborated by the negative correlation of item difficulty and discrimination.
still, both testing accommodations focus on general problems faced by students with sen-l when reading (slow processing speed, reduced competence level in reading). in future research, it would be desirable to identify more specific reading problems of students with sen-l that can be addressed in testing accommodations. another explanation for item misfit in the sample of students with sen-l may lie in the test-taking behavior (such as guessing or item omission, see pohl, südkamp, hardt, carstensen, & weinert, 2015). it is also possible that differences in item fit between the students in the lat and the students with sen-l are due to differences in school curricula.

comparing the three test versions (the standard test, the reduced test, and the easy test) in the lat group, the accommodated test versions resulted in better competence measures than the standard test. for students with sen-l, the easy test showed the best results regarding item fit, test targeting, and dif. since in the reading test, items are grouped to sets belonging to different texts, constructing a reading test from well-fitting and measurement invariant items is a difficult endeavor. this is different in other competence domains of the neps that do not have such a strong testlet structure (see weinert et al., 2011, for a description of the tests).

6.1 strengths and limitations

studying the effects of testing accommodations not only in groups of students with sen-l but also in groups of students in general education (here: low-performing students) is a promising approach to the identification of appropriate testing accommodations. in many previous studies, accommodated test versions were only applied to students with sen. thus, one could not disentangle whether low psychometric properties of accommodated tests and change of the measured construct were due to testing accommodations or testability problems of students with sen.
using the lat group allowed us to investigate whether the applied testing accommodations generally provide reliable and measurement invariant measures of reading competence. with the results in the group of lat students, we ruled out the premise that misfit and the lack of measurement invariance for students with sen-l are due to changes in the measured construct resulting from a reduction in test length or reduction in item difficulty. considering the wide range of competence levels of students in general education, students in the lat are the group of students without sen being closest in competence level to students with sen. thus, the accommodated test versions, which are targeted towards students with sen-l, will still be better targeted to students in the lat than to all students in general education. the study’s strength also lies in the use of a sophisticated methodological approach and the evaluation of various measures of item fit in addition to differential item functioning. when using methods of irt, other studies on the assessment of students with sen mainly report dif but leave out information on item fit in the sample of students with sen (abedi, leon, & kao, 2008; bolt & ysseldyke, 2008).

considering the group of students with sen-l, using data from a relatively large representative sample allows us to draw credible conclusions. however, our samples of students with sen-l and students in the lat group considerably differed in their size. there were about twice as many students in the lat group compared to the students with sen-l group. for some testing conditions the sample was comparatively small. for example, only 84 students with sen-l were assessed with the easy test version. due to the large number of missing responses, there were items with just 52 valid responses. fit and dif measures may, as a consequence, be unreliable. we tried to account for this in the evaluation of the fit and dif criteria.
one might also argue that the group of students with sen-l is still a highly heterogeneous one, including, for example, students with different performance and ability profiles in the cognitive domain. compared to prior research, however, the target population is rather homogeneous as students with sen in areas other than learning (e.g., those with physical impairments) are excluded. other studies investigated appropriateness of competence assessments on even more heterogeneous groups of students (e.g., lutkus et al., 2004, including students with disabilities in general). possible testing problems may, however, only occur for students with specific disabilities (e.g., for students with sen-l, but not for students with visual impairments) or for specific testing accommodations. analyzing the whole group of students with disabilities and running analyses across all types of testing accommodations may mask possible testing effects. in our study we focused on a specific group of students with sen and analyzed different testing accommodations separately. item misfit and dif need not be caused by all students with sen-l; they may be caused by only a certain group of students. however, we did not account for interindividual differences within our samples in this study. in ongoing research, we (pohl et al., 2015) use a person-based approach and try to empirically identify groups of students with sen-l whose assessment is especially challenging. here, we assume that individual student characteristics (e.g., individual test taking strategies, cognitive performance profiles) are related to testability.²

6.2 implications and future research

incorporating easy instead of hard items in the test version (e.g., as done in the easy test version) is, from a methodological point of view, a form of adaptive testing.
adaptive testing is currently discussed in large-scale studies such as the naep (xu, sikali, oranje, & kulick, 2011), the programme for international student assessment (pisa; pearson, 2011), and the neps (pohl, 2014). if better test targeting is one of the key issues for testing students with sen-l, adaptive testing procedures for general education students may well be extended to include students with sen-l. one way to systematically reduce difficulty in reading tests for students with sen might be a reduction in grammatical and lexical complexity of texts and items (abedi et al., 2011). in upcoming feasibility studies within the neps, seventh graders with sen-l will be tested with a standard reading test that is reduced in grammatical and lexical complexity. in another feasibility study in grade 3 we will examine the effects of newly developed test instructions on students’ test performance, missing values, and invalid answers, as well as on their motivation and test anxiety.

there are numerous and manifold arguments for the inclusion of students with sen in large-scale assessments. however, the issue of whether students with sen-l may be assessed reliably and comparably in large-scale assessments, and if so how, remains an important and complex question. in our study, we aim to present a sophisticated design and a comprehensive methodological approach to these questions and to shed light on them.

2 in the present study, differences in test taking in students with and without sen-l might also be caused by differences in school curricula. this alternative hypothesis could be tested by comparing students with sen-l attending general education and special schools. however, in germany only few students with sen-l attended general education schools at the time of data collection and these students often differ in individual as well as in social background characteristics from students attending special schools.
we think that the systematic identification of specific testing accommodations for groups of students with sen is a promising approach. keypoints so far, data on the acquisition and development of competencies of students with special educational needs in learning (sen-l) are rare. assessing competencies of students with special educational needs within large scale assessments is challenging. this study addresses the question of whether and how satisfying item fit measures and measurement invariant test scores can be obtained for students with sen-l in large-scaleassessments. testing accommodations may result in reliable and to the standard test comparable competence measures. the investigated testing accommodations helped to some extent to increase the testability of students with sen-l. the systematic identification of further appropriate testing accommodations is a promising approach to the assessment of students with sen-l. acknowledgments this paper uses data from the national educational panel study (neps). from 2008 to 2013, neps data were collected as part of the framework program for the promotion of empirical educational research funded by the german federal ministry of education and research (bmbf). as of 2014, the neps survey is carried out by the leibniz institute for educational trajectories (lifbi) at the university of bamberg in cooperation with a nationwide network. we especially thank cordula artelt, claus h. carstensen, jana heydrich, lena nusser, and markus messingschlager for their contribution to this study. our thanks also go to the staff of the neps administration of surveys and to the methods group. we would also like to thank the anonymous reviewers for their comments on earlier versions of the manuscript and erika fisher for copy editing services. references abedi, j., leon, s., & kao, j. (2008). examining differential item functioning in reading assessments for students with disabilities. (cresst report 744). 
los angeles, ca: university of california, los angeles, national center for research on evaluation, standards, and student testing. abedi, j., leon, s., kao, j., bayley, r., ewers, n., herman, j., & mundhenk, k. (2011). accessible reading assessments for students with disabilities: the role of cognitive, grammatical, lexical, and textual/visual features (cresst report 785). los angeles, ca: university of california, los angeles, national center for research on evaluation, standards, and student testing. american educational research association, american psychological association, & national council on measurement in education (1999). standards for educational and psychological testing. washington, dc: american educational research association. aßmann, c., steinhauer, h. w., kiesl, h., koch, s., schönberger, b., müller-kuller, a., … blossfeld, h.-p. (2011). sampling designs of the national educational panel study: challenges and solutions. zeitschrift für erziehungswissenschaft, 14, 51-65. doi:10.1007/s11618-011-0181-8 aßmann, c., steinhauer, h. w., & zinn, s. (2012). weighting the fifth and ninth grader cohort samples of the national educational panel study, panel cohorts (technical report). bamberg, germany: university of bamberg, national educational panel study. retrieved from https://www.nepsdata.de/portals/0/neps/datenzentrum/forschungsdaten/sc3/1-0-0/sc3_sc4_1-00_weighting_en.pdf bäumer, t., preis, n., roßbach, h.-g., stecher, l., & klieme, e. (2011). education processes in life-course-specific learning environments. zeitschrift für erziehungswissenschaft, 14, 87-101. doi:10.1007/s11618-011-0183-6 barkow, i., leopold, t., raab, m., schiller, d., wenzig, k., blossfeld, h.-p., & rittberger, m. (2011). remoteneps: data dissemination in a collaborative workspace. zeitschrift für erziehungswissenschaft, 14, 315-325. doi:10.1007/s11618-011-0192-5 bielinski, j., thurlow, m. l., ysseldyke, j. e., freidebach, j., & freidebach, m. (2001).
read-aloud accommodations: effects on multiple-choice reading and math items (nceo technical report 31). minneapolis, mn: university of minnesota, national center on educational outcomes. blossfeld, h.-p., & von maurice, j. (2011). education as a lifelong process. zeitschrift für erziehungswissenschaft, 14, 19-34. doi:10.1007/s11618-011-0179-2 blossfeld, h.-p., von maurice, j., & schneider, t. (2011). the national educational panel study: need, main features, and research potential. zeitschrift für erziehungswissenschaft, 14, 5-17. doi:10.1007/s11618-011-0178-3 bolt, s. e., & ysseldyke, j. (2008). accommodating students with disabilities in large-scale testing: a comparison of differential item functioning (dif) identified across disability types. journal of psychoeducational assessment, 26, 121-138. doi:10.1177/0734282907307703 borsboom, d. (2006). the attack of the psychometricians. psychometrika, 71, 425-440. doi: 10.1007/s11336-006-1447-6 bos, w., bonsen, m., gröhlich, c., guill, k., may, p., rau, a., et al. (2009). kess 7: kompetenzen und einstellungen von schülerinnen und schülern—jahrgangsstufe 7 [kess 7: competencies and attitudes of students in grade 7]. hamburg, germany: behörde für bildung und sport. chudowsky, n., & pellegrino, j. (2003). large-scale assessment that support student learning: what will it take? theory into practice, 42, 75-83. doi:10.1207/s15430421tip4201_10 cormier, d. c., altman, j., shyyan, v., & thurlow, m. l. (2010). a summary of the research on the effects of test accommodations: 2007-2008 (technical report 56). minneapolis, mn: university of minnesota, national center on educational outcomes. cortiella, c., & horowitz, s. h. (2014). the state of learning disabilities: facts, trends and emerging issues. new york: national center for learning disabilities. dolan, r. p., & hall, t. e. (2001). universal design for learning: implications for large-scale assessment. ida perspectives, 27, 22-25. duke, n. k., & pearson, p. d. (2002). 
effective practices for developing reading comprehension. in a. e. farstrup & s. j. samuels (eds.), what research has to say about reading instruction (pp. 205-242). newark, de: international reading association. durkin, d. (1993). teaching them to read. boston, ma: allyn and bacon. fuchs, l. s., fuchs, d., eaton, s. b., hamlett, c. l., & karns, k. m. (2000). supplementing teacher judgments of mathematics test accommodations with objective data. school psychology review, 29, 65-85. gee, j. p. (2004). reading as situated language: a sociocognitive perspective. in r. b. ruddell & n. j. unrau (eds.), theoretical models and processes of reading (pp. 116-132). newark: international reading association. gehrer, k., zimmermann, s., artelt, c., & weinert, s. (2013). neps framework for assessing reading competence and results from an adult pilot study. journal for educational research online, 5, 50-79. gehrer, k., zimmermann, s., artelt, c., & weinert, s. (2012). the assessment of reading competence (including sample items for grade 5 and 9) [scientific use file 2012, version 1.0.0]. bamberg: university of bamberg, national educational panel study. geisinger, k. f. (1994). psychometric issues in testing students with disabilities. applied measurement in education, 7, 121-140. doi:10.1207/s15324818ame0702_2 gersten, r., fuchs, l. s., williams, j. p., & baker, s. (2001). teaching reading comprehension strategies to students with learning disabilities: a review of research. review of educational research, 71, 279-320. doi:10.3102/00346543071002279 gross, c., jobst, a., jungbauer-gans, m., & schwarze, j. (2011). educational returns over the life course. zeitschrift für erziehungswissenschaft, 14, 139-153. doi:10.1007/s11618-011-0195-2 grünke, m. (2004). lernbehinderung [learning disabilities]. in g. lauth, m. grünke, & j. brunstein (eds.), interventionen bei lernstörungen [interventions to learning deficits] (pp. 65-77). göttingen: hogrefe.
heydrich, j., weinert, s., nusser, l., artelt, c., & carstensen, c. h. (2013). including students with special educational needs into large-scale assessments of competencies: challenges and approaches with the german national educational panel study (neps). journal for educational research online, 5, 217-240. hollenbeck, k., tindal, g., & almond, p. (1998). teachers’ knowledge of accommodations as a validity issue in high-stakes testing. the journal of special education, 32, 175-183. kavale, k. a., & reece, j. h. (1992). the character of learning disabilities. learning disability quarterly, 15, 74-94. doi:10.2307/1511010 kintsch, w. (2007). comprehension: a paradigm for cognition. cambridge, uk: cambridge university press. kmk – sekretariat der ständigen konferenz der kultusminister der länder in der bundesrepublik deutschland [standing conference of the ministers of education and cultural affairs of germany] (2012). sonderpädagogische förderung in schulen 2001–2010 [special education in schools 2001–2010]. retrieved from http://www.kmk.org/fileadmin/pdf/statistik/komstat/dokumentation_sopaefoe_2010.pdf kolen, m. j., & brennan, r. l. (2004). test equating, scaling, and linking. new york, ny: springer-verlag. koretz, d. m. (1997). the assessment of students with disabilities in kentucky (cse technical report 431). los angeles, ca: cresst/rand institute on education and training. koretz, d. m., & barton, k. e. (2003). assessing students with disabilities: issues and evidence (cse technical report 587). los angeles, ca: university of california, center for the study of evaluation. kristen, c., edele, a., kalter, f., kogan, i., schulz, b., stanat, p., & will, g. (2011). the education of migrants and their children across the life course. zeitschrift für erziehungswissenschaft, 14, 121-137. doi:10.1007/s11618-011-0194-3 lovett, b. j. (2010). extended time testing accommodations for students with disabilities: answers to five fundamental questions.
review of educational research, 80, 611-638. doi:10.3102/0034654310364063 lutkus, a. d., mazzeo, j., zhang, j., & jerry, l. (2004). including special-needs students in the naep 1998 reading assessment part ii: results for students with disabilities and limited-english proficient students (research report ets-naep 04-r01). princeton, nj: ets. millsap, r. e. (2011). statistical approaches to measurement invariance. new york, ny: routledge. minnema, j., thurlow, m., bielinski, j., & scott, j. (2000). past and present understandings of out-of-level testing: a research synthesis (out-of-level testing project report 1). minneapolis, mn: university of minnesota, national center on educational outcomes. retrieved from http://education.umn.edu/nceo/onlinepubs/oolt1.html müller, k., sälzer, c., mang, j., & prenzel, m. (2014, march). kompetenzen von schülerinnen und schüler mit besonderem förderbedarf. ergebnisse aus dem pisa 2012 förderschul-oversample [competencies of students with special educational needs. results from the pisa 2012 oversample of special schools]. paper presented at the conference of the german association for empirical educational research, frankfurt, germany. oecd – organisation for economic co-operation and development. (1999). measuring student knowledge and skills: a new framework for assessment. paris, france: oecd. pearson (2011, october 7th). pearson to develop framework for oecd’s pisa students assessment for 2015 [pearson announcement]. retrieved from http://www.pearson.com/news/2011/october/pearson-todevelop-frameworks-for-oecds-pisa-student-assessment-f.html?article=true pellegrino, j., chudowsky, n., & glaser, r. (2001). knowing what students know: the science and design of educational assessment. washington, dc: national academy press. pitoniak, m. j., & royer, j. m. (2001). testing accommodations for examinees with disabilities: a review of psychometric, legal, and social policy issues.
review of educational research, 71, 53-104. doi:10.3102/00346543071001053 pohl, s. (2014). longitudinal multi-stage testing. journal of educational measurement, 50, 447-468. doi:10.1111/jedm.12028 pohl, s., & carstensen, c. h. (2012). neps technical report: scaling the data of the competence test (neps working paper no. 14). bamberg, germany: university of bamberg, national educational panel study. pohl, s., & carstensen, c. h. (2013). scaling the competence tests in the national educational panel study—many questions, some answers, and further challenges. journal for educational research online, 5, 189-216. pohl, s., gräfe, l., & rose, n. (2014). dealing with omitted and not reached items in competence tests: evaluating approaches accounting for missing responses in irt models. educational and psychological measurement, 74, 423-452. doi:10.1177/0013164413504926 pohl, s., haberkorn, k., hardt, k., & wiegand, e. (2012). neps technical report for reading—scaling results of starting cohort 3 in fifth grade (neps working paper no. 15). bamberg, germany: university of bamberg, national educational panel study. pohl, s., südkamp, a., hardt, k., carstensen, c. h., & weinert, s. (2015). testability and test-taking behavior of students with special educational needs in large-scale assessments. manuscript submitted for publication. popham, w. j. (2000). educational measurement. boston, ma: allyn and bacon. rasch, g. (1960). probabilistic models for some intelligence and attainment tests. copenhagen: nielsen & lydiche (expanded edition, chicago, university of chicago press, 1980). ritchey, k. d., silverman, r. d., schatschneider, c., & speece, d. l. (2015). prediction and stability of reading problems in middle childhood. journal of learning disabilities, 48, 298-309. doi:10.1177/0022219413498116 sireci, s. g., scarpati, s. e., & li, s. (2005). test accommodations for students with disabilities: an analysis of the interaction hypothesis.
review of educational research, 75, 457-490. doi:10.3102/00346543075004457 stocké, v., blossfeld, h.-p., hoenig, k., & sixt, m. (2011). social inequality and educational decisions in the life course. zeitschrift für erziehungswissenschaft, 14, 103-199. doi:10.1007/s11618-011-0193-4 swanson, h. l. (1999). reading research for students with ld: a meta-analysis of intervention outcomes. journal of learning disabilities, 32, 504-532. doi:10.1177/002221949903200605 thompson, s. j., johnstone, c. j., anderson, m. e., & miller, n. a. (2005). considerations for the development and review of universally designed assessments (nceo technical report 42). minneapolis, mn: university of minnesota, national center on educational outcomes. thurlow, m. l. (2010). steps toward creating fully accessible reading assessments. applied measurement in education, 23, 121-131. doi:10.1080/08957341003673765 thurlow, m. l., bremer, c., & albus, d. (2008). good news and bad news in disaggregated subgroup reporting to the public on 2005-2006 assessment results (technical report 52). minneapolis, mn: university of minnesota, national center on educational outcomes. thurlow, m., elliott, j., & ysseldyke, j. (1999). out-of-level testing: pros and cons (policy directions no. 9). minneapolis, mn: university of minnesota, national center on educational outcomes. retrieved from http://education.umn.edu/nceo/onlinepubs/policy9.htm tindal, g., heath, b., hollenbeck, k., almond, p., & harniss, m. (1998). accommodating students with disabilities on large-scale tests: an experimental study. exceptional children, 64, 439-450. u.s. department of education, national center for education statistics. (2013). digest of education statistics, 2012 (nces 2014-015). verhoeven, l., & van leeuwe, j. (2008). prediction of the development of reading comprehension: a longitudinal study. applied cognitive psychology, 22, 407-423. doi:10.1002/acp.1414 weinert, f. e. (2001).
concept of competence: a conceptual clarification. in d. s. rychen & l. h. salganik (eds.), defining and selecting key competencies (pp. 45-66). seattle: hogrefe & huber. weinert, s., artelt, c., prenzel, m., senkbeil, m., ehmke, t., & carstensen, c. h. (2011). development of competencies across the life span. zeitschrift für erziehungswissenschaft, 14, 67-86. doi:10.1007/s11618-011-0182-7 woodcock, s., & vialle, w. (2011). are we exacerbating students’ learning disabilities? an investigation of pre-service teachers’ attributions of the educational outcomes of students with learning disabilities. annals of dyslexia, 61, 223-241. doi:10.1007/s11881-011-0058-9 wright, b. d., & masters, g. n. (1982). rating scale analysis: rasch measurement. chicago, il: mesa press. wu, m. (1997). the development and application of a fit test for use with marginal maximum likelihood estimation and generalized item response models (unpublished doctoral dissertation). melbourne, australia: university of melbourne. wu, m., adams, r. j., wilson, m., & haldane, s. (2007). conquest 2.0 [computer software]. camberwell, australia: acer press. wu, y.-c., liu, k. k., thurlow, m. l., lazarus, s. s., altman, j., & christian, e. (2012). characteristics of low performing special education and non-special education students on large-scale assessments (technical report 60). minneapolis, mn: university of minnesota, national center on educational outcomes. xu, x., sikali, e., oranje, a., & kulick, e. (2011, april). multi-stage testing in educational survey assessments. paper presented at the annual meeting of the national council on measurement in education (ncme), new orleans, la. yovanoff, p., & tindal, g. (2007). scaling early reading alternate assessments with statewide measures. exceptional children, 73, 184-201. ysseldyke, j. e., thurlow, m. l., langenfeld, k. l., nelson, r. j., teelucksingh, e., & seyfarth, a. (1998).
educational results for students with disabilities: what do the data tell us? (technical report 23). minneapolis, mn: university of minnesota, national center on educational outcomes. zebehazy, k. t., zigmond, n., & zimmerman, g. j. (2012). ability or access-ability: differential item functioning of items on alternate performance-based assessment tests for students with visual impairments. journal of visual impairment & blindness, 106, 325-338.

table of footnotes

2 as for the term sen-l, the term “learning disabilities” is not clearly defined. note that we refer to a heterogeneous group of students with multifaceted etiology.
3 in the present study, differences in test taking in students with and without sen-l might also be caused by differences in school curricula. this alternative hypothesis could be tested by comparing students with sen-l attending general education and special schools. however, in germany only few students with sen-l attended general education schools at the time of data collection and these students often differ in individual as well as in social background characteristics from students attending special schools.

frontline learning research 6 (2014) 15-24 issn 2295-3159 corresponding author: i.molenaar@pwo.ru.nl doi: http://dx.doi.org/10.14786/flr.v2i4.118

advances in temporal analysis in learning and instruction

inge molenaar a
a radboud university nijmegen

article received 8 june 2014 / revised 23 september 2014 / accepted 23 september 2014 / available online 23 december 2014

abstract

this paper focuses on a trend to analyse temporal characteristics of constructs important to learning and instruction. different researchers have indicated that we should pay more attention to time in our research to enhance explanatory power and increase validity. constructs formerly viewed as personal traits, such as self-regulated learning and motivation, are now conceptualized as a series of events that unfold over time.
this raises new questions with regard to the temporal characteristics of these constructs and their dynamic interplay with learner and context characteristics. even though the value of analyzing temporal characteristics is becoming evident, a number of challenges need to be tackled in order to make progress in the field of learning and instruction. first, we need to be aware of the paradigm shift that temporal analysis entails. second, a common understanding of different dimensions of time and the position of temporal characteristics therein can facilitate our time-related research dialogue. third, a better understanding of how to answer time-related questions with appropriate methodological approaches needs to emerge. fourth, researching temporal characteristics requires procedures and guidelines for segmenting time units. fifth, temporal data are mostly collected at the micro level, whereas most theory is defined at a macro level; consequently we need to bridge these differences in the granularity used between collecting, coding and theorizing to enhance meaning making. finally, so far, most examples of time-related research are exploratory or comparative studies; the next step is to move toward confirmative studies, which constitute the “holy grail” of temporal analysis.

keywords: temporal analysis; learning and instruction; time; methodologies

1. introduction

learning is defined as the acquisition of skills and knowledge and can be recognized through changes in the learners’ behaviour (mayer, 2008; zimmerman, 2002). the concept of time is innate to learning, as it takes time to acquire skills and knowledge and to signal changes in behaviour. in learning and instruction research, we mostly capture time in pre- and post-test designs. as such we often focus on a narrow concept of time, reducing the temporal characteristics of learning to the changes between pre- and post-tests, which reduces the validity and explanatory power of our research.
currently, technological advancements increase our ability to gain traces of learners while they are learning, which is an important facilitator to overcome this limited focus on time in our field (greene & azevedo, 2010; reimann, 2009; winne, 2010). a steadily growing group of researchers is raising questions that address how different constructs act and develop over time (bannert et al., 2014; greene & azevedo, 2010; molenaar & chiu, 2014; reimann, 2009; schmitz, 2006; wise & chiu, 2011). with this growing interest in temporal characteristics of constructs at the heart of learning and instruction research, the need for temporal analysis is becoming more prevalent. this paper focuses on the developing trend in learning and instruction research to analyze temporal characteristics of different constructs. the rationale for temporal analysis in our field is discussed, as well as the fact that temporal analysis entails a deviation from our main research approach, changing our analysis from characteristics of students to attributes of learning activities (reimann, 2009; schmitz, 2006). researchers have conceptualized temporal characteristics of learning and instruction constructs in many ways in their research, leading to a diverse set of dimensions of time driving research questions. it is argued that a conceptual framework of temporal characteristics can support transparency and enhance comparability in the field. lastly, a number of challenges are discussed that we, as a field, need to overcome to successfully engage in temporal analysis.

2. the rationale for temporal analysis

a number of researchers in the field of computer-supported learning (kapur, 2011; reimann, 2009) and self-regulated learning (greene & azevedo, 2010; schmitz, 2006; schoor & bannert, 2012) indicate that we should pay more attention to time in the learning process.
existing research methods do not “fully” utilize the temporal information embedded in the data collected (kapur, voiklis & kinzer, 2008; wise, perera, hsiao, speer & marbouti, 2012). this reduces the explanatory power of the analysis performed and limits the validity of the conclusions drawn (akhras & self, 2000; reimann, 2009). for example, kuvalja and colleagues (2014) show that the self-directed speech and self-regulatory behaviour of children with a specific language impairment did not differ in frequency from that of the comparison group: neither the number of self-directed speech and self-regulatory events during learning, nor the order of these events as detected by sequential lag analysis, differed. there was, however, a difference between the two groups in the co-occurrence of self-directed speech and self-regulatory behaviour as detected by temporal patterns analysis (magnusson, 2000). this indicates that without proper temporal analysis, existing differences between groups of learners cannot be detected. moreover, a number of constructs, such as self-regulated learning and motivation, that were traditionally viewed as a trait of the learner are now conceptualized as a series of events (bannert et al., 2014; greene & azevedo, 2009; schmitz, 2006). driving this conceptual change are indications that self-report data have little relation with the actual student behaviour during learning (veenman, 2011). these findings point towards the need for new conceptualisations of these constructs. a temporal conceptualisation viewing self-regulated learning as a series of events that act differently over time and changing contexts might overcome these issues (azevedo et al., 2010; hadwin & järvelä, 2011). for example, malmberg and colleagues (2014) show that students use different strategies and learning patterns when working on an ill-structured task compared to a well-structured task.
a series of events can be perceived as a process that unfolds over time in a certain order (reimann, 2009). for example, self-regulated learning processes of successful students show a cyclical order among different strategies that repeat over time (bannert et al., 2014). moreover, molenaar and chiu (2014) found strong positive predictive relations between different learning activities during collaborative learning over time. the changed conceptualization of constructs raises new questions with regard to the characteristics of these constructs and their dynamic interplay with the learning context. finally, an emerging interest is in connecting different levels of analysis (hollenstein, 2013; suthers, teplovs, de laat, oshima & zeini, 2011). this line of work investigates how macro-level phenomena can emerge from and/or be constrained by different micro-level dynamics. for example, chiu (2008) found that microcreativity in a group’s mathematical solutions can be sparked by a discourse pattern, namely a wrong idea followed by disagreement among the group members. temporal analysis can help develop an understanding of how patterns unfold, providing insights into how learning is taking place (chiu, 2008; wise & chiu, 2013). taken together, the argument for temporal analysis is driven by the realisation that without careful attention to the temporal characteristics of constructs in learning and instruction research, we are reducing the significance of our research and are unable to explain important aspects of learning and instruction.

3. a paradigm shift

as touched upon in the introduction, it is important to understand that advanced temporal analysis entails a deviation from the traditional research paradigm used in learning and instruction (reimann, 2009; schmitz, 2006). often the variable-based approach is applied, which focuses on the analysis of variance between independent and dependent variable(s).
in contrast, the event-based approach looks at events, analysing the (dynamic) relations between them (reimann, 2009). this approach focuses on researching the nature of these relations and their development over time. this reveals the temporal characteristics of a construct and/or how different constructs interplay over time. for example, it can indicate how a discussion among learners unfolds over time. consistency and change in the behaviour of constructs can be investigated by specifying these temporal characteristics (schmitz, 2006). yet often reviewers in learning and instruction immediately ask the next question: can we explain learning performance from temporal characteristics of the constructs? this question embodies the “holy grail” of temporal analysis and often constitutes a connection between our traditional variable-based approach and the event-based approach. however, few researchers (perhaps none) have so far reached the “holy grail”. moreover, many of those initially aiming for this connection have come to realize that there are valuable questions to be answered by analysing temporal characteristics themselves. an example of such a research question is: which sequences of learner actions (discuss, elaborate, summarize) occur during collaborative learning? an example of a research question combining temporal characteristics with learning performance is: which sequences of learners’ actions during collaborative learning influence learning performance positively? overall, temporal analysis in learning and instruction is innate to our intuitive understanding of learning, but the operationalization of this understanding entails a deviation from our traditional research paradigm. consequently, the nature of the questions addressed with temporal analysis differs from our characteristic research questions in learning and instruction.
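to make the first type of question concrete, the following is a minimal sketch in python of how sequences of learner actions could be counted in a coded session. the action labels are taken from the example question above; the session data and the function name count_sequences are hypothetical and serve only as an illustration, not as a description of any method used in the studies cited.

```python
from collections import Counter


def count_sequences(actions, n=2):
    """Count every run of n consecutive actions in one coded session."""
    # Build n staggered views of the list and zip them into sliding windows.
    windows = zip(*(actions[i:] for i in range(n)))
    return Counter(windows)


# Hypothetical coded actions from one collaborative session (illustration only).
session = ["discuss", "elaborate", "summarize", "discuss", "elaborate", "discuss"]

pairs = count_sequences(session, n=2)
triples = count_sequences(session, n=3)

for sequence, count in pairs.most_common():
    print(" -> ".join(sequence), count)
```

such counts only describe which sequences occur; relating them to learning performance, as in the second type of question, would require an additional statistical model.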
after all, time is a highly complex construct that has been debated in fields from physics to philosophy. also within educational research, conceptualizations of different time scales (lemke, 2000) and the use of time in classrooms (bloome et al., 2009; mercer, 2008) have been discussed. still, there is no framework that conceptualizes dimensions of time and positions different temporal characteristics within these dimensions. research questions, therefore, focus on different dimensions of time and address conceptually different temporal characteristics of constructs. in the next section, different dimensions of time important for learning and instruction research are highlighted.

4. different dimensions of time

so far, in our field when addressing temporal characteristics, we have encountered mainly frequency analysis indicating the number of occurrences of a variable during a particular time window. this provides insights into the prevalence of a construct during learning. for example, students receiving scaffolds during learning apply more metacognitive activities compared to students that do not receive scaffolds (molenaar et al., 2011). although informative, frequency analyses provide limited insights into the individual time-related characteristics of the constructs researched. even though this analysis showed that students perform more metacognitive activities, we do not know the importance of their position in the learning process, their duration or the rate at which these metacognitive activities occur during learning. thus frequency analyses treat the learning process as one holistic unit, ignoring the individual time-related characteristics of constructs. using the individual time-related characteristics allows for the analysis to illustrate how events occur within the flow of continuous events in a particular time window.
examples are analyzing the significance of the position of events, the duration of particular events and the rate of particular events within the learning process (molenaar & wise, in prep). for example, planning at the start of a learning task was found to be more productive for learning compared to planning later on (moos et al., 2008). students also monitor at a higher intensity and for longer in more difficult tasks than in easier tasks (iiskala et al., 2010). the dimension of time described above conceptualizes how constructs behave in a continuous flow of events by examining the individual time-related characteristics of these events within the flow. another dimension of time, in contrast to analysing events in a continuous flow, is analysing relative arrangements of multiple events in time. here the focus does not lie on the individual time-related characteristics of events in a time window, but on how events are organized among each other. examples are both reoccurring processes and non-reoccurring processes (molenaar & wise, in prep). an example of a reoccurring process is the cyclical notion of self-regulated learning, which suggests that orientation, planning, monitoring and evaluation follow each other (hadwin & jarvela, 2013; zimmerman, 2002). non-reoccurring transitions occur only once; for example, students who learn how to read progress from spelling letters to the automatic detection of words (verhoeven, 2004). apart from reoccurring and non-reoccurring patterns, which both indicate a form of regular change, irregular change is another form of an arrangement of multiple events that can be investigated. the notion of productive failure, where collaborating students seem to engage in chaotic interaction in the beginning of their collaboration, is an example of irregular change (kapur, 2009). this seemingly unstructured process is of essential importance for their later learning.
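the difference between a frequency count and the individual time-related characteristics of events can be sketched as follows. this is a minimal python illustration under stated assumptions: the Event record, the describe function, the event codes and all timings are hypothetical, and "mean position" is simply defined here as the average start time divided by the session length.

```python
from dataclasses import dataclass


@dataclass
class Event:
    code: str     # hypothetical activity code, e.g. "planning"
    start: float  # seconds from the start of the session
    end: float    # seconds from the start of the session


def describe(events, code, session_length):
    """Frequency, mean relative position, total duration and rate for one code."""
    hits = [e for e in events if e.code == code]
    if not hits:
        return {"frequency": 0, "mean_position": None,
                "total_duration": 0.0, "rate_per_minute": 0.0}
    return {
        "frequency": len(hits),
        # 0.0 = start of the session, 1.0 = end of the session
        "mean_position": sum(e.start for e in hits) / len(hits) / session_length,
        "total_duration": sum(e.end - e.start for e in hits),
        "rate_per_minute": len(hits) / (session_length / 60),
    }


# Hypothetical 10-minute session (illustration only).
events = [
    Event("planning", 0, 30),
    Event("monitoring", 30, 40),
    Event("planning", 100, 120),
    Event("monitoring", 500, 560),
]

planning = describe(events, "planning", session_length=600)
monitoring = describe(events, "monitoring", session_length=600)
print(planning)
print(monitoring)
```

in this made-up session both codes occur twice, so a frequency analysis cannot tell them apart, whereas position and duration show that planning is concentrated early and monitoring takes up more time later on.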
the dimension of time described above conceptualizes how constructs behave in relative arrangements of multiple events by examining the organisation among these events.

without claiming that the above is a complete overview of temporal characteristics useful for the field of learning and instruction, a clear distinction can be made between two dimensions of time, i.e. focusing on individual events within the continuous flow of events or on relative arrangements of multiple events (molenaar & wise, in prep). in order to push the conceptual understanding of time in our field, a conceptual framework for looking at time and positioning temporal characteristics within it is important, so that learning and instruction research can articulate and classify time-related research questions. such a framework can support conceptual clarity among researchers engaging in temporal analysis and can organize and deepen debates. furthermore, it can be used as a roadmap to articulate temporal research questions, unravelling temporal characteristics of different constructs.

5. an illustrative example of temporal analysis

in order to illustrate the above, i provide an example of a temporal analysis used to research socially regulated learning. during collaborative learning, students support one another’s learning as they discuss, elaborate, argue, confirm and regulate one another’s activities. we know that regulative activities such as metacognitive (i.e., planning, monitoring) and relational activities (i.e., confirming, engaging) contribute significantly to students’ learning (molenaar et al., 2011). yet, we know very little about how the group’s socially regulative activities influence students’ cognition at a micro level during collaborative learning.
therefore, we explored how sequences of students’ cognitive, metacognitive and relational activities affect the likelihood of subsequent cognitive activities during collaborative learning and whether these relationships differ across time (molenaar & chiu, 2014). the data are from 18 triads (54 students) engaged in 51,338 conversation turns over 6 hours of learning activities. the triads collaborated face-to-face while working in a computer-based learning environment. the primary school students were in grades 4, 5, and 6, and aged between 10 and 12. content analysis and statistical discourse analysis were used to analyse the learning activities. during content analysis, each turn in the conversation was coded as a cognitive (higher or lower cognition), metacognitive (orientation, planning, monitoring and evaluation), relational (confirm, deny, engage), procedural or off-task activity. then, statistical discourse analysis (sda) was used to examine the sequential relations predicting lower and higher cognition (chiu & khoo, 2005).

figure 1. path diagram of standardized final multivariate outcome, multilevel cross-classification of low cognition component. solid lines indicate positive effects. dashed lines indicate negative effects. thicker lines indicate larger effect sizes. *p < .05, **p < .01, ***p < .001. (molenaar & chiu, 2014; reproduced with permission)

we found that high cognitive, low cognitive, metacognitive and relational activities in recent conversation turns were linked to the likelihood of low cognition in a conversation turn (see figure 1). metacognitive activities in the form of planning (in the previous conversation turn or -1), monitoring (-1), evaluating (2 conversation turns ago or -2), monitoring (-2), summarizing (-3) and monitoring (-3) all increased the likelihood of low cognition, while orientation (-2) reduced it.
higher cognitive activities in either of the last two conversation turns or low cognition in any of the last six conversation turns also increased the likelihood of low cognition. lastly, relational activities in the form of confirming and engaging in either of the last two conversation turns increased the likelihood of low cognition. this example analyzes temporal characteristics of the arrangement of multiple events to understand how these events act within the learning process. this type of analysis illustrates how different learning activities alternate and fluctuate among collaborating students and emerge into socially regulated learning. the findings show recurrent sequential relationships between cognitive, metacognitive and social relational activities. moreover, this analysis indicates that these patterns are rather stable over time.

even though these analyses reveal important information about micro-level temporal interaction among learning activities, a frequently asked question is: “what do these relations among learning activities mean for learning, i.e. which sequences should we encourage with instructional designs?” this question embodies the “holy grail” and has not been addressed yet. although it is an important question, this inquiry clearly indicates the need for a paradigm shift within our field. we need to learn to value the results of temporal analysis in their own right, as providing important information about constructs in learning and instruction and as steps towards defining micro-level temporal theories of how constructs behave over time.

6. challenges

apart from creating awareness of the need for temporal research questions, there are a number of other challenges that need to be addressed to advance temporal analysis in the field of learning and instruction. first, as discussed in section 4, time can be conceptualised differently in our research (bloome et al., 2009; lemke, 2000; mercer, 2008).
a conceptual framework that articulates different dimensions of time to frame temporal characteristics and related research questions could enhance conceptual clarity and provide ground for in-depth debate about time-related characteristics of individual events in the continuous flow of events or about the arrangements of multiple events over time. second, although there are many emerging methods such as visualizations (reimann, 2009), sequential lag analysis (bakeman and gottman, 1997), statistical discourse analysis (chiu & khoo, 2005), temporal pattern analysis (magnusson, 2000), markov modeling (biswas, kinnebrew & segedy, 2012), data mining (romero et al., 2010), and dynamic systems (hollenstein, 2013) used to explore time and order in learning processes, we are only starting to explore the commonalities and differences among these methods. an understanding of these techniques, and of which learning and instruction questions they can answer, is required. comparing different methods can enhance our understanding of temporal characteristics of constructs in learning and instruction (e.g., via triangulation) and of methodological issues (e.g., which method is most appropriate for specific research questions?). collaboration among researchers is needed to create guidelines and to work towards a methodological framework for temporal analysis. third, when performing time-related research, we always “cut in time”, i.e., we make an artificial division into time units. this segmentation of time can be approached differently, that is, at the level of instructional units, time units or units of time in which a construct is acting homogeneously. for example, the time window can be determined based on the frequency of occurrence of low cognition in the group discourse (molenaar & chiu, 2014). choices made about segmentation have important implications for the results, and therefore, clear guidelines for determining time windows should be formulated.
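the effect of “cutting in time” can be illustrated with a small python sketch. the timestamps, the session length and the function name `counts_per_window` are hypothetical; the point is only that the same event stream looks different under different window sizes.

```python
def counts_per_window(timestamps, total_seconds, window_seconds):
    """Count events in consecutive fixed-length windows.

    Assumes every timestamp satisfies 0 <= t < total_seconds.
    """
    n_windows = -(-total_seconds // window_seconds)  # ceiling division
    counts = [0] * n_windows
    for t in timestamps:
        counts[int(t // window_seconds)] += 1
    return counts


# Hypothetical onset times (in seconds) of one coded activity in a 240 s session.
onsets = [5, 20, 55, 70, 110, 115, 118, 230]

coarse = counts_per_window(onsets, 240, 120)  # two windows: [7, 1]
fine = counts_per_window(onsets, 240, 60)     # four windows: [3, 4, 0, 1]
```

with two-minute windows the activity seems concentrated in the first half of the session; with one-minute windows a gap in the third minute becomes visible. neither view is wrong, which is why explicit guidelines for choosing the window are needed.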
fourth, the granularity of our time-related research is an issue. the level at which we collect and code data is often a micro level capturing very small units, such as events from electronic learning environments or utterances in a dialogue. our theories are usually defined at a macro level, explaining how different constructs act. these different levels of granularity between coding and theory are a challenge for meaning making. aggregation of micro-level variables into more macro-level constructs can be a solution to this issue. yet, as with segmentation, decisions about the granularity used in analysis also affect results profoundly and should therefore follow clear procedures to ensure quality standards. moreover, combinations of different research traditions, such as ethnographical approaches and data-mining methods, can help make connections between macro-level theory and micro-level coding. a number of researchers have already indicated the need for micro-level temporal theories of constructs to support temporal analysis (azevedo, 2014; bannert et al., 2014; molenaar & chiu, 2014; molenaar & järvelä, 2014; molenaar et al., 2011; kuvalja et al., 2014; winne, 2014). finally, until now, mainly exploratory studies have been done, and there is a request from our community to move toward the holy grail, that is, to establish that particular temporal characteristics contribute to learning performance in particular ways. on the one hand, reaching the holy grail would help confirm the value of temporal analysis for the field of learning and instruction. on the other hand, as indicated above, linking these analyses to learning performance is challenging. collaboration among researchers is needed to overcome these issues and create guidelines to work towards a uniform approach for event-based methods to enhance our understanding of the temporal characteristics of learning and instruction.

7. conclusion

in the field of learning and instruction, there is an intuitive belief that temporality is important for comprehending learning. in order for us, as a field, to make progress in understanding the temporal aspects of learning, a number of challenges need to be overcome. the field needs to be aware that temporal analysis often departs from the traditional research approach. in order to enable this advancement, the field must embrace a different kind of research question specifically related to temporal aspects of learning and instruction.

keypoints

time deserves more attention in learning and instruction research
temporal analysis entails a paradigm shift addressing a different type of research question
a conceptual framework of time can support framing temporal characteristics and research questions
we need to advance our understanding of methodologies, time segmentation and meaning making of temporal analysis

acknowledgments

the thinking in this paper reflects ideas developed and discussed during the various workshops “it’s about time”. all participants in these workshops have contributed to the construction of this understanding, as have, in particular, my conversations with alyssa wise and ming ming chiu.

references

akhras, f. n., & self, j. a. (2000). modeling the process, not the product, of learning. in s. p. lajoie (ed.), computers as cognitive tools, volume two: no more walls (pp. 3-28). mahwah, nj: lawrence erlbaum associates.
azevedo, r. (2014). issues in dealing with sequential and temporal characteristics of self- and socially-regulated learning. metacognition and learning, 9(2), 217-228. http://dx.doi.org/10.1007/s11409-014-9123-1
bakeman, r., & gottman, j. m. (1997). observing interaction: an introduction to sequential analysis. cambridge: cambridge university press. http://dx.doi.org/10.1017/cbo9780511527685
bannert, m. (2006). effects of reflection prompts when learning with hypermedia. journal of educational computing research, 4, 359-375.
http://dx.doi.org/10.2190/94v6-r58h-3367-g388
bannert, m., reimann, p. & sonnenberg, c. (2014). process mining techniques for analysing patterns and strategies in students' self-regulated learning. metacognition and learning, vol 9, 161-185. http://dx.doi.org/10.1007/s11409-013-9107-6
segedy, j. r., kinnebrew, j. s., & biswas, g. (2012). supporting student learning using conversational agents in a teachable agent environment. in the future of learning: proceedings of the 10th international conference of the learning sciences (icls 2012) (vol. 2, pp. 251-255).
bloome, d., beierle, m., grigorenko, m. & goldman, s. (2009). learning over time: uses of intercontextuality, collective memories, and classroom chronotopes in the construction of learning opportunities in a ninth-grade language arts classroom. language and education, 23(4), 313-334. http://dx.doi.org/10.1080/09500780902954257
chiu, m. m., & khoo, l. (2005). a new method for analyzing sequential processes: dynamic multi-level analysis. small group research, 36, 600-631. http://dx.doi.org/10.1177/1046496405279309
chiu, m. m. (2008). flowing toward correct contributions during groups' mathematics problem solving: a statistical discourse analysis. journal of the learning sciences, 17(3), 415-463. http://dx.doi.org/10.1080/10508400802224830
goldstein, h. (1995). multilevel statistical models. sydney: edward arnold.
günther, c., & van der aalst, w. (2007). fuzzy mining: adaptive process simplification based on multi-perspective metrics. in g. alonso, p. dadam & m. rosemann (eds.), international conference on business process management (bpm 2007) (pp. 328-343). berlin: springer.
greene, j. a. & azevedo, r. (2010). the measurement of learners’ self-regulated cognitive and metacognitive processes while using computer-based learning environments. educational psychologist, 45, 203-209. http://dx.doi.org/10.1080/00461520.2010.515935
hadwin, a.f., & järvelä, s. (2011).
introduction to a special issue on social aspects of self-regulated learning: where social and self meet in the strategic regulation of learning. teachers college record, 113(2), 235-239.
hollenstein, t. (2013). state space grids: depicting dynamics across development. new york: springer. http://dx.doi.org/10.1007/978-1-4614-5007-8
iiskala, t., vauras, m., lehtinen, e., & salonen, p. (2011). socially shared metacognition within primary school pupil dyads’ collaborative processes. learning and instruction, 21, 379-393. http://dx.doi.org/10.1016/j.learninstruc.2010.05.002
järvelä, s. & hadwin, a. (2013). new frontiers: regulating learning in cscl. educational psychologist, 48(1), 25-39. http://dx.doi.org/10.1080/00461520.2012.748006
lemke, j.l. (2000). across the scales of time: artifacts, activities, and meanings in ecosocial systems. mind, culture and activity, 7(4), 273-290. http://dx.doi.org/10.1207/s15327884mca0704_03
kapur, m., voiklis, j., & kinzer, c. (2008). sensitivities to early exchange in synchronous computer-supported collaborative learning (cscl) groups. computers and education, 51, 54-66. http://dx.doi.org/10.1016/j.compedu.2007.04.007
kapur, m. (2011). temporality matters: advancing a method for analyzing problem-solving processes in a computer-supported collaborative environment. international journal of computer-supported collaborative learning (ijcscl), 6(1), 39-56. http://dx.doi.org/10.1007/s11412-011-9109-9
kennedy, p. (2008). a guide to econometrics. cambridge: blackwell.
kinnebrew, j. s., segedy, j.r. & biswas, g. (2014). analyzing the temporal evolution of students' behaviors in open-ended learning environments. metacognition and learning, vol 9, 217-228. http://dx.doi.org/10.1007/s11409-014-9112-4
kuvalja, m., verma, m. & whitebread, d. (2014). patterns of co-occurring non-verbal behavior and self-directed speech; a comparison of three methodological approaches. metacognition and learning, vol 9, 87-111.
http://dx.doi.org/10.1007/s11409-013-9106-7
magnusson, m. s. (2000). discovering hidden time patterns in behavior: t-patterns and their detection. behavior research methods, instruments, & computers: a journal of the psychonomic society, inc, 32(1), 93-110.
malmberg, j., järvelä, s. & kirchner, p. (2014). elementary school students’ strategic learning: does task-type matter? metacognition and learning, vol 9, p. 113-136. http://dx.doi.org/10.1007/s11409-013-9108-5
mayer, r.e. (2008). learning and instruction. pearson; new jersey.
mercer, n. (2008). the seeds of time: why classroom dialogue needs a temporal analysis. journal of the learning sciences, 17(1), 33-59. http://dx.doi.org/10.1080/10508400701793182
molenaar, i., van boxtel, c.a.m., sleegers, p.j.c. & roda, c. (2011). attention management for self-regulated learning: atgentschool. in c. roda (ed), human attention in digital environments, cambridge university press: cambridge, 259-280. http://dx.doi.org/10.1017/cbo9780511974519.011
molenaar, i., chiu, m. m., van boxtel, c. & sleegers, p. j.c. (2011). scaffolding of small groups’ metacognitive activities with an avatar. international journal of computer-supported collaborative learning, 6, 601-624. http://dx.doi.org/10.1007/s11412-011-9130-z
molenaar, i., van boxtel, c.a.m & sleegers, p.j.c. (2011). metacognitive scaffolding in an innovative learning arrangement. instructional science, vol 39(6), 785-803. http://dx.doi.org/10.1007/s11251-010-9154-1
molenaar, i & chiu m.m. (2014). dissecting sequences of regulation and cognition: statistical discourse analysis of primary school children’s collaborative learning. metacognition and learning, vol 9, 137-160. http://dx.doi.org/10.1007/s11409-013-9105-8
molenaar, i. & järvelä, s. (2014). sequential and temporal characteristics of self and social regulated learning. metacognition and learning, vol 9, p. 75-85. http://dx.doi.org/10.1007/s11409-014-9114-2
molenaar, i & wise, a.f. (in prep).
concepts of time: a framework for thinking about temporal aspects of learning.
moos, d. c., & azevedo, r. (2008). self-regulated learning with hypermedia: the role of prior domain knowledge. contemporary educational psychology, 33(2), 270-298. http://dx.doi.org/10.1016/j.cedpsych.2007.03.001
reimann, p. (2009). time is precious: variable- and event-centred approaches to process analysis in cscl research. international journal of computer-supported collaborative learning, 3, 239-257. http://dx.doi.org/10.1007/s11412-009-9070-z
schegloff, e. a. (2007). sequence organization in interaction: a primer in conversation analysis. cambridge: cambridge university press. http://dx.doi.org/10.1017/cbo9780511791208
schmitz, b. (2006). advantages of studying processes in educational research. learning and instruction, 16, 433-449. http://dx.doi.org/10.1016/j.learninstruc.2006.09.004
schoor, c. & bannert, m. (2012). exploring regulatory processes during a computer-supported collaborative learning task using process mining. computers in human behavior, 28(4), 1321-1331. http://dx.doi.org/10.1016/j.chb.2012.02.016
suthers, d., teplovs, c., de laat, m., oshima, j., & zeini, s. (2011). connecting levels of learning in networked communities. workshop conducted at the 9th international conference on computer supported collaborative learning, july 9, 2011, hong kong.
romero, c., ventura, s., pechenizkiy, m., & baker, r. (eds.). (2010). handbook of educational data mining. boca raton: chapman & hall/crc.
veenman, m.v.j. (2011). learning to self-monitor and self-regulate. in r. mayer & p. alexander (eds.), handbook of research on learning and instruction. new york: routledge.
weinberger, a., & fischer, f. (2006). a framework to analyze argumentative knowledge construction in computer-supported collaborative learning. computers & education, 46, 71-95. http://dx.doi.org/10.1016/j.compedu.2005.04.003
winne, p. h. (2014). issues in researching self-regulated learning as patterns of events.
metacognition and learning, 229-237. http://dx.doi.org/10.1007/s11409-014-9113-3
wise, a. f., & chiu, m. m. (2011). analyzing temporal patterns of knowledge construction in a role-based online discussion. international journal of computer-supported collaborative learning, 6(3), 445-470. http://dx.doi.org/10.1007/s11412-011-9120-1
wise, a. f., perera, n., hsiao, y., speer, j., & marbouti, f. (2012). microanalytic case studies of individual participation patterns in an asynchronous online discussion in an undergraduate blended course. the internet and higher education, 15(2), 108-117. http://dx.doi.org/10.1016/j.iheduc.2011.11.007
zimmerman, b. j. (2002). becoming a self-regulated learner: an overview. theory into practice, 41(2), 64-70. http://dx.doi.org/10.1207/s15430421tip4102_2

frontline learning research 5 (2014) 140-166 issn 2295-3159
corresponding author: ismo t. koponen, department of physics, p.o. box 64, fi-00014 university of helsinki, finland. ismo.koponen@helsinki.fi doi: http://dx.doi.org/10.14786/flr.v2i3.120

a systemic view of the learning and differentiation of scientific concepts: the case of electric current and voltage revisited

ismo t. koponen, tommi kokkonen
department of physics, university of helsinki, finland

article received 12 february 2014 / revised 14 april 2014 / accepted 29 june 2014 / available online 3 july 2014

abstract

in learning conceptual knowledge in physics, a common problem is the incompleteness of a learning process, where students’ personal, often undifferentiated concepts take on more scientific and differentiated form. with regard to such concept learning and differentiation, this study proposes a systemic view in which concepts are considered as complex, dynamically evolving structures. the dynamics of the concept learning and differentiation is driven by the competition of model utility in explaining the evidence.
based on the systemic view, we introduce a computational model, which represents the essential features of the conceptual system in the form of a directed graph model (dgm), where concepts are nodes connected to other conceptual elements (nodes) in the graph. the results of the dgm are then compared with empirical findings to identify differentiation between the concepts of electric current and voltage, based on a re-analysis of previously published empirical findings on upper secondary school students’ learning paths in the context of dc circuits. the comparison shows that the model predicts and explains many relevant, empirically observed features of the learning paths of concept learning and differentiation, such as: 1) context-dependent dynamics, 2) the persistence of ontological shift and concept differentiation, and 3) the effects of communication on individual learning paths. the systemic view and the dgm based on it make these generic features of interest in concept learning and differentiation understandable and show that these features are associated with the guidance of theoretical knowledge. finally, we briefly discuss the implications of the results for teaching and instruction.

keywords: concept learning; concept differentiation; ontological shift; complex system; directed graph model

1. introduction

learning scientific concepts is a demanding and lengthy process, in which the learner’s initial and personal concepts and conceptions gradually change towards more scientific concepts, that is, concepts that are part of an extensive and coherent knowledge system (theory), which regulates and constrains their use. previous research (lee & law, 2001; reiner, slotta, chi & resnick, 2000; smith, carey & wiser, 1985) has raised the notion that learners seldom use concepts in the same sense as they are used in scientific knowledge. one particular but central question is proper concept differentiation.
when two closely related concepts are linked to the same phenomenon, novice learners do not always properly understand them as different concepts. rather, the concepts are confused and used in undifferentiated ways (lee & law, 2001; reiner et al., 2000; smith et al., 1985). the aim of the learning process then is to produce a clearer and more scientific understanding not only of how such concepts differ, but also of how they are related, a process referred to here as concept differentiation. concept differentiation has often been discussed from the viewpoint of “ontological shift”, which holds that ontological attributions are at the centre of concept learning, and that changes in those attributions are the main mechanisms behind differentiation (chi & slotta, 1993; chi, 2005, 2008). this position finds support in the notion that ontological commitments in concept development are deeply rooted in the psychological aspects of concepts (murphy, 2004; keil, 1989). however, the ontological shift view has been criticized for overemphasising the role of static ontologies (gupta, hammer & redish, 2010) and failing to pay proper attention to the role of theory in learning (ohlsson, 2011). in addition, when studying concept differentiation, one should understand that concepts must be shared and be communicable to other learners. communicating and sharing concepts is closely related to the problem of knowledge convergence, which has been discussed mostly in terms of the convergence of explanations and the search for consensus and a common way to understand concepts and terms (jeong & chi, 2007; weinberger, 2007). however, how communication affects the learning of scientific concepts and the differentiation process or, in general, which stages or steps of the learning paths communication could possibly affect, remains unclear. consequently, our understanding of the learning path in concept differentiation remains partially incomplete.
one promising way to remedy this lack of understanding views learners’ concepts as complex structures and the learning process itself as a systemic process in which different conceptual elements interact (brown & hammer, 2008; koponen & huttunen, 2013). the present study proposes a new way of synthesising different views by focusing explicit attention on concept learning and concept differentiation, so that the synthesis takes into account aspects of interest for personal concepts, such as the role of ontological attributions, and aspects relevant to scientific concepts, such as the communicability of concepts and their constrained, law-like use. such a synthesis, referred to here as the systemic view, sees concepts as complex structures. different stages of concept learning, with partially differentiated concepts, are then seen as partial projections of the structure in different real situations; the projections are partial and incomplete mappings of more complete systems. on the level of personal concepts, the systemic view stems from recent views of the heterogeneity of concepts, which emphasise the diversity of roles of concepts in different cognitive processes (machery, 2009). on the level of scientific concepts, the systemic view borrows much from the “dynamic frames” view of scientific concepts, where both ontological attributions and theoretical, law-like (nomic) knowledge are considered central to concept development (andersen & nersessian, 2000; andersen, barker & chen, 2006). in the systemic view, the learning process also requires a driving force or mechanism; this study suggests that the utility of models, through which the concepts are used, and the competition of models based on utility provide that mechanism (ohlsson, 2009, 2011). the systemic model is applied here to discuss and simulate concept differentiation and its generic features in one empirically well-studied case: the concepts of electric current and voltage.
the generic features of interest are the robustness of certain simple forms of the concepts (often called misconceptions or intuitive conceptions), the strong context dependence of these conceptions, the occurrence of ontological shift and its persistence once achieved, and the role of theoretical knowledge in concept differentiation and ontological shift. this study focuses on developing the theoretical background of the systemic model. to that end, we further develop the directed graph model that we recently introduced (koponen, 2013). we embody the theoretical model by using a re-analysis of empirical data on nine students’ learning processes, in groups of three (koponen & huttunen, 2013). we introduce a simulation model, based on directed graphs, to model the learning path and to reproduce the most important generic features of the empirical findings on concept differentiation. finally, we discuss some interesting implications for teaching that the model raises. the model presented here supports the view that ontological shift is not the primary agent in learning scientific concepts; rather, it stems from theoretical learning, driven by model utility. this means that instead of focusing on ontological training and on developing instructional methods based on it, attention should focus on how theoretical knowledge is introduced and applied in the learning process. another important notion is the role of context in learning and how students are gradually introduced to more demanding tasks. the model results show that overly complex tasks cannot promote learning if students lack sufficiently advanced concepts; yet overly simple tasks lead to stagnation, where a learner gets stuck on simple models and unsophisticated concepts. what is needed is a learning path that is progressive and that demands the use of complex models and concepts.
according to the view presented here, the learner needs to receive theoretical knowledge through instruction and to see its utility in sufficiently complex situations, thus avoiding “overlearning” of simple cases. this emphasises not only the teacher’s role, but also the importance of variation in the contexts in which the knowledge is applied. these notions, based on the systemic view and on its computational embedding, therefore have direct practical consequences for how one should design learning paths and the role of the teacher in them.

2. concept learning and differentiation: theoretical underpinnings

pre-scientific concepts are often idiosyncratic, context dependent and difficult to communicate. of course, scientific concepts, as used by advanced learners and experts, not only share some aspects with “personal” pre-scientific concepts, but also differ from them in important ways. one of the most important differences is that scientific concepts often refer to categories (entities or objects) that, like models, are themselves purely conceptual rather than categories within the reach of experience or observation, and their use is law-like (nomological) and constrained (andersen & nersessian, 2000; andersen, barker & chen, 2006; hoyningen-huene, 1993). nevertheless, personal and scientific concepts share features, in particular on the level of how theory or theory-like knowledge structures the sets of attributes that characterise the concepts. concept differentiation is a process where the sets of attributes that characterise and typify concepts become structured so that no other concept shares the same set and values of its attributes. in the case of scientific concepts, this requires that a given concept have law-like (i.e. nomological) relationships to other concepts. it is through these features that the concepts acquire the sharp descriptive power they hold in scientific theories (andersen & nersessian, 2000; andersen et al., 2006; hoyningen-huene, 1993).
at the core of learning scientific concepts is a transformation process where personal, individual concepts, which are meaningful to a learner himself or herself but not easily communicable or meaningful to other learners, are transformed into concepts that are more communicable and for which consensus exists on how they can be used in relation to other concepts. this latter way of using concepts is already scientific in that normative and law-like rules govern the use of the concepts (the nomological use of concepts), and the attributes that characterise them are sharply identified. these brief notions suggest that a suitable theoretical underpinning must link the individual learning and use of concepts to the shared, public use of concepts; intra- and interpersonal levels of concept use and learning must be coupled.

2.1. concepts: personal and shared

discussions of the learning process, where a student learns and acquires scientific concepts and becomes a fluent user of such concepts, must focus on differences in the ways in which concepts are understood when seen from the viewpoint of an individual’s personal cognition and learning, and when concepts are discussed as they are shared and used in scientific communities. in the former case, concepts are personal, often un-explicated, and seen as strongly context dependent (carey, 2010; gopnik & meltzoff, 1997), whereas in the latter case, concepts are explicated elements of scientific theories, and their proper use is constrained by the knowledge system as a whole (andersen, 2006; andersen & nersessian, 2000; hoyningen-huene, 1993). to highlight this difference between personal and shared scientific concepts, we use the terms “intrapersonal concepts” and “interpersonal concepts”.
of these, the interpersonal concepts can be shared on the level of small and local groups, such as study groups in learning, or on the level of extended and global groups, such as scientific communities, in which case interpersonal concepts are simply called scientific concepts. the learning process, where an individual's intrapersonal concepts acquire scientific character and are transformed into interpersonal ones, involves epistemic dimensions (the use of concepts in the context of explanation) and communicative dimensions, where concepts are used in communication and in finding consensus on what is explained and how. a description of the learning process from intrapersonal to interpersonal concepts requires a model of concepts which, at one end of the continuum, takes the form of an intrapersonal concept and, at the other end, the form of an interpersonal, scientific concept.

2.1.1 intrapersonal concepts

in psychology and cognitive science, two important viewpoints of interest here regarding intrapersonal concepts are concepts as prototypes and concepts as theories (machery, 2009; murphy, 2004; smith & medin, 1981). in the concepts-as-prototypes view, the prototype represents a certain class of entities or objects to which the concept refers, and the prototype is understood as a body of knowledge about the properties of the members in that class. however, such properties are assumed to be only statistical or probabilistic, and are not strictly necessary or sufficient by themselves to determine membership (machery, 2009; murphy, 2004). the statistical or probabilistic knowledge contained in the prototypes can be about either 1) the typicality of the category or 2) its cue-like properties. in both cases, a set of properties or attributes and their values indicate how likely or significant a given property is in regard to the identification of the concept (smith & medin, 1981).
in concept learning, where concepts develop and are transformed, as in, for example, a concept combination process, new concepts emerging from the combination process inherit some, but not necessarily all, of the properties of the ancestor prototypes (murphy, 2004). a reverse process of concept differentiation can be understood as a process where new concepts inherit partial or split sets of the properties of the original concepts. from the viewpoint of concepts as prototypes, an important part of concept learning is to learn the concept's ontological attributions, which determine to which ontological categories the concept refers (keil, 1989; murphy, 2004). the ontological shift theory of conceptual change (chi & slotta, 1993; reiner et al., 2000; slotta & chi, 2006) addresses the way in which learners associate substance- and process-like attributes with the concepts they use. according to the ontological shift theory, many students' learning difficulties originate from a misconceived ontological class (chi & slotta, 1993; slotta & chi, 2006).

the view of concepts-as-theories is based on psychological research, which views conceptual knowledge as theory-like (carey, 2010; gopnik & meltzoff, 1997). the concepts-as-theories view focuses on the role of causal knowledge in the categorisation process and in concept learning. concepts, in this view, are first and foremost carriers of causal knowledge about the properties of the members of classes to which the concepts refer. therefore, causal knowledge is considered crucial for concept recognition and differentiation. quite often, the role of causal knowledge is discriminative with regard to the attributes or properties attached to a concept (machery, 2009; murphy, 2004; rehder, 2003). these two different views of intrapersonal concepts can be thought of as two different ways to use concepts (machery, 2009), thus reflecting the multifaceted aspects of concepts.
such multifacetedness can be considered as a sign of a real cognitive difference between the various ways of using concepts (machery, 2009) or as different projections (or mappings) of a more integrated, complex and generic system that projects differently in different real situations (danks, 2010). here, we adopt and further develop the latter viewpoint of concepts as systems projecting differently in different situations. the aspects of greatest interest in developing such a systemic view are: 1) attributes and sets of attribute values (as in the prototype view) and 2) causal and theoretical knowledge and its role in distinguishing the attributes in concept combination (as in the concepts-as-theories view).

2.1.2 interpersonal concepts

in learning, concepts must be shared with other learners, instructors and teachers; concepts must be interpersonal. when concepts are shared, there must be common agreement on the referents of the concepts, the ways in which they refer and the ways to use them; there must be certain norms of usage. in particular, when concepts are scientific concepts, they are "public" in that members (scientists) of institutional groups (scientific communities) share these interpersonal concepts. crucial to this is not only agreeing on the norms, but also linking the norms to accepted verification methods, such as observations, experiments and models (andersen et al., 2006; andersen & nersessian, 2000; hoyningen-huene, 1993). there are relatively few attempts to discuss scientific concepts so that a connection is made to a psychological understanding of concepts. one notable exception, however, is a view which sees scientific concepts as dynamic frames embracing conceptual knowledge (andersen et al., 2006). the dynamic frame view assumes that advanced scientific concepts are acquired by the same process of categorisation as everyday concepts.
the categorisation of interest here is how different exemplar-type problems fall into the same classes based on how different types of models serve in solving those problems. then, the characteristic (but not defining) features of concepts emerge from the reference to classes of models, or clusters of models. scientific concepts, where the models form the classes relevant for learning concepts, are also regulated and constrained by certain rules for applying concepts in the construction of models; the norms guide how to use the concepts. the dynamic frames incorporate the attributes of concepts, in much the same way as in the prototype theory, and the theoretical knowledge as in the theory-theory approach, but now theoretical knowledge has the role of organising the attributes and imposing constraints on their co-variation (andersen & nersessian, 2000; andersen et al., 2006).

2.3. concept learning as convergence process

the focal point of this study is the individual learner's process of learning scientific concepts, where personal concepts are transformed into scientific concepts. however, because learning takes place in a community of students and teachers, we must also understand how communication affects the learning process. collaborative learning and sharing ideas in small groups has been shown to enhance student learning. such learning is beneficial when the members' knowledge supplements others', but differs only slightly from it; members show some knowledge equivalence and knowledge sharing within the group (jeong & chi, 2007; weinberger, 2007). here, however, the re-analysed and revisited empirical data contain little information about the communication, and although the effect is evident, only very simple models serve here to estimate the effects of communication on concept differentiation.

2.4. systemic view

the systemic view sees concepts as part of a knowledge system, where the "concept" as a part of the operation of the system may have a plurality of appearances and project differently in different contexts, yet the parts of the system remain unchanged. recently, some have suggested a different but related type of systemic view. it employs the ideas of dynamic complex systems, where robust and persistent conceptual patterns can arise in emergent fashion from interactions of elemental pieces of the dynamic system (brown & hammer, 2008). these interactions can thus give rise to a full spectrum of different projections of concepts, some of which are simple and some complex. the systemic view is also adopted here, so concepts are considered functional parts of the system, affected by the system and its evolution. the systemic view requires specific representations of concepts which can capture their complex, multifaceted and dynamic nature. a suitable model of such concepts should address at least the following features:

1) attributes and sets of attribute values as in prototype and dynamic frame views.
2) theoretical knowledge in the role of constraining and guiding the use of concepts.
3) models as they connect to the development of scientific concepts.
4) model competition and utility as mechanisms affecting the evolution of concepts.

requirements 1-2 are essential to retaining a connection to a psychological view of concepts and concept learning, which addresses intrapersonal concepts and the transition from intra- to interpersonal concepts. requirements 2-4 are essential to describing how intrapersonal concepts develop or change into scientific concepts. in what follows, we introduce just such a systemic model in section 3 and then, in section 4, discuss how empirical results concerning the differentiation of the concepts electric current and voltage can be embedded within it.
finally, in section 5, we present a computational embedding of the systemic view and use the computational model to simulate the process of concept differentiation.

3. systemic view on concept differentiation

the systemic view of concept learning and differentiation sees concepts as constructs which, on the one hand, take the form of intrapersonal concepts and, on the other hand, the form of interpersonal concepts. such constructs are embedded in a conceptual system which evolves and affects the constructs as part of the system's own evolution. the evolution of the concept system consists of changes in the connectedness of the concepts and in the strength of those connections. in that change, models play a central role, because through models, the concepts become projected onto actual, real situations.

3.1. structure: constructs

concepts as complex structures are called here c-constructs. c-constructs are first and foremost connected to sets of attributes, where connecting links carry information about attributes and the strength of associations with those attributes. other elements of the system carry knowledge of regularities and relate concepts to each other in different ways: causally, through constrained determination (constrained covariation in a law-like manner), or by constraining the use of concepts (e.g. conservation laws). these schemes are called determination constructs or, in shorthand, d-constructs. c- and d-constructs are the most elemental conceptual constructs of the systemic model, and as such, they offer no explanations or predictions by themselves. the task of explaining or predicting falls on models, which utilise the c- and d-constructs as their constituents. models project concepts (c-constructs) onto phenomena to be explained, and through the success or failure of this projection, c-constructs are altered. the models, called here m-constructs, are also conceptual constructs, but unlike c- and d-constructs, are context dependent.
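the three kinds of constructs described above can be sketched as plain data structures. this is only a minimal illustration under stated assumptions: the class and field names (`CConstruct`, `DConstruct`, `MConstruct`, `attributes`, `relates`, `explains`) and the helper `shared_attributes` are hypothetical, and the strengths and labels are made up rather than taken from the empirical data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CConstruct:
    """a concept; maps attribute labels to association strengths (assumed 0..1)."""
    name: str
    attributes: Dict[str, float] = field(default_factory=dict)


@dataclass
class DConstruct:
    """a determination scheme that relates or constrains c-constructs."""
    name: str
    relates: List[str] = field(default_factory=list)


@dataclass
class MConstruct:
    """a context-dependent model built from c- and d-constructs."""
    name: str
    uses_c: List[str] = field(default_factory=list)
    uses_d: List[str] = field(default_factory=list)
    explains: Set[str] = field(default_factory=set)  # evidence labels


def shared_attributes(a: CConstruct, b: CConstruct) -> Set[str]:
    """attributes that two c-constructs still have in common."""
    return set(a.attributes) & set(b.attributes)


# illustrative values only; none are taken from the empirical data
c1 = CConstruct("c1", {"a1": 0.9, "a2": 0.4})
c2 = CConstruct("c2", {"a2": 0.7, "a3": 0.6})
d1 = DConstruct("d1", relates=["c1", "c2"])
m1 = MConstruct("m1", uses_c=["c1", "c2"], uses_d=["d1"], explains={"e1"})

print(shared_attributes(c1, c2))
```

in this sketch the d- and m-constructs refer to c-constructs by name, which keeps the structures flat; a fuller implementation might hold object references or place everything in a single graph.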
the relationship of c-constructs to characteristic attributes is familiar from the prototype theory of concepts and is not only essential in describing personal concepts, but also important in describing scientific concepts. the basic level of attributes consists of simple and unstructured sets of attributes {a1, a2, ..., ak}, but more structured combinations can fall under more general schemes. these more general sets are subordinated under a more general (e.g. constraining) scheme, which here is typically a d-construct. in the course of concept development, these attributes are inherited, although some of the inherited attributes can be discarded as the concept evolves. these features resemble the dynamic frame approach to concepts (andersen & nersessian, 2000; andersen et al., 2006). the attributes are not strictly mutually exclusive, especially when c-constructs are not used together. however, the more important it is to use two c-constructs together, the more difficult it is to maintain dissonant attributes; c-constructs are differentiated with regard to their attributes, as they should be if they represent scientific concepts.

d-constructs are general schemes which relate c-constructs to each other, typically in the form of causal connections or in the form of constrained determination (i.e. constrained co-variance with no causal dependence). in some cases, a d-construct can simply constrain how a single c-construct can be applied. therefore, these constructs are essentially the carriers of theoretical knowledge (cf. machery, 2009; rehder, 2003). the d-construct is the general template of the form of determination and is largely independent of the context, yet it prescribes how one can legitimately apply c-constructs to a given context through models. d-constructs play a crucial role in discriminating between attributes, because d-constructs connect c-constructs.
through d-constructs the dissonances between attribute associations are revealed.

m-constructs are designed so that they serve as models which explain phenomena or their selected properties. they use c-constructs, because concepts are needed to build models (nersessian, 2008). in some cases, m-constructs are also related to d-constructs, which then specify the relationship between c-constructs when more than one c-construct is involved. m-constructs are the basic vehicles for explaining phenomena, for matching predictions with their observable features or, if one so wishes, for selecting those features of phenomena which fall under the explanatory power of a given m-construct. on the most advanced level, m-constructs are full-fledged scientific models. on the most basic level, m-constructs are simple and even self-explanatory. however, in either case, m-constructs are evaluated only against observational evidence {e1, e2, ..., ek}, which may either lend them support or lead to their rejection/inhibition. m-constructs compete against other available m-constructs in providing the most likely explanation of the evidence.

figure 1. a schematic diagram (right) of c- and d-constructs and their connections to attributes {a1, a2, ..., ak}, and (left) c-, d- and m-constructs connected to each other and to sets of evidence {e1, e2, ..., ek}. links can be congruent (solid line) or dissonant (dashed line).

the systemic view sees the knowledge system as a connected network of c-, d- and m-constructs, where connections between these constructs can continuously change when the evidence changes. of course, the system can also reach stable states so that there are no changes in connections when additional evidence is available. changes in connections are based on locally effective rules, but the total effect depends on the global state of the system as a whole (due to the connectedness of the system).
different concepts can then be expressed as different relational structures of the pieces or as different constellations of elements. within the systemic view, connections can be a type of positive constraint so that a connection strengthens the role of a given element. the connections can also be a type of negative constraint so that the connections weaken the role of the element. identifying connections and determining whether they are negative or positive must be based on empirical evidence.

3.2 dynamics: model competition and utility

the evolution of the concept system is driven by the utility of m-constructs in explaining evidence (cf. henderson, goodman, tenenbaum & woodward, 2010; ohlsson, 2009). the explanatory power of an m-construct changes with the changing amount of evidence to be explained. because different m-constructs can explain the same evidence, m-constructs compete against each other. if the context is simple and only little evidence needs to be explained, one can achieve this by using simple and only partially correct models that correspond to simple m-constructs. then, the utility of the simple m-construct is better than the utility of more complex ones (e.g. scientific ones), and the simpler ones are therefore more likely to be adopted. however, with the increasing complexity of the context and greater amounts of evidence to be explained, the complex models (m-constructs), which explain more, gain utility and become adopted. thus, it is important to note that in learning, the adoption of a model is a question not only of its correctness, but also of its utility (ohlsson, 2009, 2011). it is assumed here that different models and evidence to be explained are known in advance. many of the models may be inactive and much of the evidence unknown to the learner in the initial stages of learning.
from the point of view of modelling the learning, one can assume a finite collection of possible models, some of them active and some inactive (cf. henderson et al., 2010).

3.3 concept convergence

in a learning situation where concept learning and differentiation take place, learners share concepts in discussions within small groups. in learning, knowledge convergence is often considered crucial to forming shared, public concepts (weinberger et al., 2007; jeong & chi, 2007). the empirical data of concept differentiation indicate that knowledge also converges during this process; the ways in which one uses and understands the concepts become similar, at least to a certain degree (koponen & huttunen, 2013). in the systemic view, this kind of knowledge convergence means that the ways in which the c-constructs link to other elements of knowledge and their attributes become more similar among the members of the learning group during the learning process. here, knowledge convergence is discussed only to the extent that it concerns concept differentiation via communication in small groups of three. therefore, in what follows, we concentrate on simple triadic communication patterns (discussed in more detail in section 4) and assume that the effect of convergence takes place mainly through the utility of models. the communication is described simply as consensus-based knowledge sharing where, through communication, all group members always adopt the model with the strongest utility. there is no threshold effect on the adoption of the model. such a convergence model exaggerates the effect of communication on learning, but it is an adequate model for estimating the maximal expected effect of communication on concept differentiation.

4. empirical findings revisited: electric current and voltage

research on learning scientific concepts and concept differentiation has been conducted in several ways and on different topics, but perhaps most extensively on the concepts electric current and voltage (cohen, eylon & ganiel, 1983; shipstone, 1984; engelhardt & beichner, 2004; koumaras, kariotoglou & psillos, 1997; lee & law, 2001; mcdermott & shaffer, 1992; reiner et al., 2000). some of the studies have focused on students' explanatory models (cohen et al., 1983; engelhardt & beichner, 2004; mcdermott & shaffer, 1992; koumaras et al., 1997; shipstone, 1984), while some other studies have focused on ontological attributions (lee & law, 2001; reiner et al., 2000). the general outcome of these studies is that the concepts electric current and voltage are often mixed with personal, intuitive concepts or conceptions (or intrapersonal concepts) and, furthermore, are poorly differentiated. the impact of numerous empirical studies on a deeper theoretical understanding of concept learning and differentiation, however, has been relatively modest for at least two reasons. first, although these studies have identified a variety of different types of models, intuitive conceptions and ontological attributions, they have failed to abstract from the empirical details general and generic features which could provide a broad enough theoretical perspective to understand the relationship between different views. therefore, for lack of a sufficiently broad theoretical perspective, discussions have often focused on how a preferred theoretical perspective differs from some other perspective, rather than trying to provide a more integrated, progressive and broader theoretical framework that makes the partial accounts understandable (see e.g. chi & brem, 2009; gupta et al., 2010; ohlsson, 2009).
we believe that many empirical findings can be captured within the systemic model when suitably idealised to reveal the essential generic features behind the multitude of details. furthermore, the systemic view can help us to understand how different aspects of concept learning are related. in what follows, we focus only on features of interest to advanced learners, typically those on an upper-secondary school level or first-year university level.

4.1. empirical results revisited and re-interpreted

the purpose of the present work is to provide a new theoretical framework to discuss concept differentiation when learners' concepts take on a scientific character. rather than report new empirical results, this study re-uses and re-analyses already published empirical data (koponen & huttunen, 2013). the empirical data consist of interviews with nine upper secondary school students about their conceptions of electric current and voltage in dc circuits. the students built dc circuits, observed their behaviour, and then proposed explanations for the observed brightness of light bulbs. the interviews were transcribed and analysed to identify the models students use to explain the behaviour. the nine students discussed their explanations in groups of three. the study consisted of three different contexts i-iii:

i: light bulbs in series. the participants compared two variants (a single light bulb and two light bulbs) in terms of the brightness of the bulbs. this comparison produces evidence e1 and e2.
ii: light bulbs in parallel. the first variant again involves a single light bulb. the second variant involves two light bulbs in parallel. comparing the two variants yields evidence e1' and e2'.
iii: comparison of the brightness of light bulbs in series (i) and in parallel (ii). in the first variant, participants compare the brightness of light bulbs in series and parallel circuits to the one-bulb case only.
in the second variant, participants compare series and parallel cases to each other. this produces evidence e1'' and e2''.

these six types of evidence, together with the reference observations e0, e0' and e0'', form the evidence set e = {e0, e1, e2, e0', e1', e2', e0'', e1'', e2''}, where e0, e0' and e0'' represent observations of the brightness of a single light bulb in each context (the brightest light bulb). further details about the empirical setup, design and excerpts from the student interviews are reported by koponen and huttunen (2013). these empirical studies reveal some common features, which answer the following questions:

1. what are the models students use to make predictions and explanations?
2. what are the determination (constraining or causal) schemes students employ as part of their models?
3. what attributes do students associate with the concepts, models or determination schemes they use?
4. how does communication in a small group (3 students) affect the relationships between concepts, models and determination schemes?

the results of the re-analysis serve here to construct idealised sets of the m- and d-constructs in contexts i-iii, with a summary of the results in table 1. a summary of the attributes revealed by the analysis is in table 2. m-constructs m1 and m2 are well-known electric current-based intuitive models found in many empirical studies (see koponen & huttunen, 2013, and references therein), while constructs m1' and m2' represent corresponding models, but are based on voltage (undifferentiated from current). these appear in fewer cases, but are taken into account here. constructs m3 and m3' are partially correct explanations, which take into account the role of components in determining the current. construct m3', however, appears only once in the empirical data. construct m4 is the correct scientific model based on ohm's law (d3) and kirchhoff's laws i (d1) and ii (d2), which correctly differentiates between electric current and voltage.

table 1. the m- and d-constructs inferred from the empirical study (koponen & huttunen, 2013).

construct   description
m1          the battery as a source of current.
m1'         the battery as a source of voltage.
m2          m1 + components consume current.
m2'         m1' + components consume voltage.
m3          m1 + voltage over components creates current.
m3'         m1' + current over components creates voltage.
m4          model based on ohm's law + kirchhoff's laws ki and kii.
d0          constraining laws: conservation (of "electricity" or current).
d1          constraint: current is conserved in junctions/branches (kirchhoff i).
d2          constraint: voltages in a closed loop sum to zero (kirchhoff ii).
d3          ohm's law: u = ri or u/i = r.

table 2. attributes a1-a9 inferred from the empirical study, with key word(s) used to characterise and identify each attribute.

attribute   key word
a1          stored
a2          contained
a3          consumed
a4          conserved
a5          degraded or diminished
a6          divided and diminished
a7          maintained
a8          partitioned and conserved
a9          generated, supported

4.2. representations as directed graphs

most of the relevant elements found in the interviews can now be represented according to the systemic view and by using a directed graph model (dgm) to relate different c-, d- and m-constructs to sets of attributes and evidence. the dgm is a representation where different elements are connected through directed links which are either congruent or dissonant. the links and their direction provide information on how the elements interact. this has the advantage that the dgm can serve as a computational template (koponen, 2013). an example of how the dgm relates different constructs, with the most important links connecting them, appears in figure 2. the links shown as solid lines are mutually supporting, congruent links; dissonant links, shown as dotted lines, point out contradictions. congruent links were recognised on the basis of how students combined these elements in different situations.
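a directed graph with congruent and dissonant links can be sketched as a list of signed edges. the sketch below is an assumption-laden illustration: the encoding (+1 for congruent, -1 for dissonant), the function names `neighbours` and `dissonances`, and the particular links are hypothetical and are not read off figure 2.

```python
from collections import defaultdict

CONGRUENT, DISSONANT = +1, -1  # assumed encoding of the two link types

# (source, target, sign); the particular links are hypothetical examples
links = [
    ("m2", "c1", CONGRUENT),  # a model uses the concept c1 (current)
    ("c1", "a3", CONGRUENT),  # c1 is associated with attribute a3 (consumed)
    ("d1", "c1", CONGRUENT),  # a constraint applies to c1
    ("d1", "a3", DISSONANT),  # the constraint contradicts attribute a3
]


def neighbours(edges):
    """group outgoing links by source node, keeping direction and sign."""
    out = defaultdict(list)
    for source, target, sign in edges:
        out[source].append((target, sign))
    return dict(out)


def dissonances(edges):
    """list the (source, target) pairs joined by a dissonant link."""
    return [(s, t) for s, t, sign in edges if sign == DISSONANT]


print(neighbours(links)["d1"])
print(dissonances(links))
```

keeping the sign on the edge rather than on the node means the same attribute can be supported by one construct and contradicted by another, which is how a dissonance becomes visible in such a representation.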
the recognition of dissonant links was more problematic. most of the dissonant links are the interviewers' interpretation of unavoidable logical contradictions rather than notions expressed by the students themselves (koponen & huttunen, 2013). different students' conceptions can now be visualised as graphs with different node strengths. some typical students' conceptions a-d found in the interviews and represented in this way appear in figure 3. cases a and b are the most common in contexts i and ii, while c and d usually occur only in context iii. d occurred in only two of the nine cases studied, while c (or constellations close to it) occurred in four cases (koponen & huttunen, 2013). an important aspect of the dgm representations is that they represent the students' understanding as a constellation of c-, d- and m-constructs and associated attributes with various strengths. of course, these strengths are idealisations of the systemic model, which only phenomenologically represents the apparent importance of a given construct as it can be identified in interviews, and only partial information about such strengths is available from the empirical data. nevertheless, such fine-grained representations of students' conceptions contain more information than do traditional ways based on written descriptions only.

figure 2. directed graph model of all essential c-, d- and m-constructs based on the empirical results, as reported in table 2. c- and d-constructs are linked to attributes {a1, a2, ..., ak}, m-constructs are linked to sets of evidence {e1, e2, ..., ek}. links can be congruent (solid line) or dissonant (dashed line). construct c1 is current, and c2 is voltage.

figure 3. some examples a-d of typical graphs representing students' conceptions as projected on the dgm shown in figure 2 (these graphs are sub-graphs of the dgm).
nodes are classified into three classes: strong, s > 0.7 (large circle); average, 0.3 < s < 0.7 (medium circle); and weak, s < 0.3 (bullet). construct c1 is current, and c2 is voltage.

4.3. effects of communication

the information about communication acts between the students (as it is available from the interviews) can serve to construct idealised communication patterns between students and to temporally locate the effects of communication on the students' choices of models and attributions. analyses of data on the individual students' conceptions have been published previously (koponen & huttunen, 2013), but the data on communication are unpublished. a summary of the changes in models and how communication takes place is given in table 3. here, the re-analysed data, ordered in temporal sequences to reveal the communication acts, allow the identification of changes in c- and m-constructs. unfortunately, the original data offer no detailed information on changes in sets of attributes. the results in table 3 show that the communication patterns in groups g1 and g2 are reciprocal in that all students exchange information in all directions. nevertheless, some one-person dominated patterns are evident, where a single student (s3 in g1 and s6 in g2) is more active than others. formally, such communication patterns between students p, p' and p'' can be modelled as a triad (see figure 4). with group g3, communication takes place reciprocally between all students and the communication pattern is dense. unfortunately, the empirical data here do not permit a more detailed analysis of the communication patterns. in what follows, the effect of communication is modelled as a triad; in one case as relatively sparse and one-person dominated, and in the other case as dense and reciprocal.

table 3. evolution of nine students' conceptions 1-9 in groups g1-g3 of three students (s1-s3, s4-s6 and s7-s9) in contexts i, ii and iii, as idealised in terms of the dgm. communication events are shown as directed dyads i→j from student i to student j, or as reciprocal dyads i↔j.

          group g1, students 1-3    group g2, students 4-6    group g3, students 7-9
context   s1      s2      s3        s4      s5      s6        s7      s8      s9
i         a       a       (d)       a       a       (d)       a       c       a,c
          1←3     2←3     3→1,2     4←5,6   5←6     6↔4       7←8,9   8↔9     9↔7,8
          a       c       d         c,b     b       d         a,c     c,a     a,c
ii        1↔3,2   2↔3,1   3↔1,2     4←6     5↔6     6→4,5     7↔8,9   8↔7,9   9↔7,8
          c       (d)     d         c       b       d
          1←3             3→1       4←6     5←4     6↔4
          c       d       d         c       c       d         c       c,a     a,c
iii       1↔3,2   2↔2,3   3↔1,2     4→5,6   5←4     6←4       7↔8,9   8↔7,9   9↔7,8
          d       d       d         d       d       d         c       c       c

figure 4. patterns of students' communication. the thickness of the arrows denotes the amount of communication. the communication pattern on the left is dominated by student p, while for students p' and p'' communication is sparse. the communication pattern on the right is reciprocal and dense between all students p, p' and p''.

5. computational embedding of systemic view in terms of dgm

the directed graph model (dgm) can serve as a computational template, that is, as a computational embedding of the systemic view that produces features generically similar to those found in empirical situations. in what follows, we briefly describe the computational features of such an embedding, which uses updating rules (see appendix) similar to those previously introduced and motivated elsewhere (koponen, 2013). computational embedding transforms the qualitative notions contained in the systemic view into computational rules and quantifies the roles of model utility and theoretical guidance. concept learning and the degree of concept differentiation are monitored through two quantities: theoricity t and separability s. theoricity t describes the theoretical complexity of the concept, while separability s is connected to differentiation and, thus, to ontological shift.
the pair of values (s,t) then specifies the learning path.

in the dgm, c-, d- and m-constructs are nodes connected by directed links. each node has a dynamically evolving strength, which determines its effect on the other nodes to which it is connected and, thus, the dynamics of the system. m-constructs are also connected to sets of evidence (see figure 2). node strengths are updated after comparing m-constructs with the evidence, which means obtaining new evidence or reconsidering existing evidence. the dgm also has memory effects in that the new strengths of the links and nodes depend recursively on the previous values. furthermore, the simulations also take into account the effects of communication between learners: in the computational embedding, information contained in one graph affects the strengths of the nodes, and thus the dynamics, of another graph.

to describe the state of the system and to characterise the evolution of the concepts, we must define several quantities in terms of node strengths and links. below is a short overview of these quantities. a complete description of the update rules and their definitions in terms of link and node strengths is given separately in the appendix. the details given in the appendix are not essential for understanding, at a general level, how the model works, but the mathematical details given there are needed to fully appreciate how the memory effects arise through connectivity from the global state of the network.

the most important of the quantities are the theoricity t and separability s of c-constructs, which serve to specify the learning paths. theoricity t is a measure of the theoretical complexity of a c-construct that roughly describes the number of paths from c-constructs to m-constructs while taking into account the strengths of the links and nodes (see the appendix for details).
separability s describes the degree of dissimilarity between c-constructs with regard to the different attributes associated with them (see table 2). if two c-constructs are connected to completely different sets of attributes, s is at its maximum value. both quantities are defined in a range from 0 to 1 so that t = 1 means full theoretical complexity (corresponding to the scientific use of a given concept) and s = 1 means complete differentiation.

the dynamics of the dgm depend crucially on the utility u of m-constructs, and the utility is the basis for model selection (the strengths of m-constructs depend on their utility, see appendix). first and foremost, the utility is proportional to the ratio of explained evidence to the theoretical complexity t of the c-construct while taking into account the relevant strengths of nodes and links. if the model explains most of the available evidence and its theoricity is low, the utility will be high and the model will be favoured in explanations. with more evidence to explain, the less complex models will generally explain less of it, so their utility is reduced. the extent to which the model takes into account conflicting evidence can be controlled with the parameter k, which also controls the effect of d-constructs on utility. if k is set to a high value, the state of the system is heavily guided by evidence and theoretical knowledge (i.e. d-constructs). with low values for k, conflicting evidence and theoretical information will be more or less ignored; since d-constructs describing causal conservation laws will then be less important, simpler models are favoured. the importance of evidence (whether conflicting or not) can also be adjusted by weakening the links between m-constructs and evidence. one can also alter the order in which one encounters the evidence. in practice, parameter k is related to the potential of an individual student to make use of theoretical knowledge to construct explanatory models.
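the trade-off between explained evidence and theoretical complexity described above can be illustrated with a toy calculation. this is only a sketch under stated assumptions, not the appendix definition: the function toy_utility and all numbers are hypothetical, and the sketch ignores node and link strengths as well as the guidance parameter k.

```python
def toy_utility(explained: float, available: float, theoricity: float) -> float:
    """Toy utility: share of available evidence explained, divided by theoricity.

    Hypothetical stand-in for the appendix definition; it ignores node and
    link strengths and the guidance parameter k.
    """
    if available <= 0 or theoricity <= 0:
        raise ValueError("available evidence and theoricity must be positive")
    return (explained / available) / theoricity


# Hypothetical theoricities, chosen only to illustrate the trade-off.
SIMPLE_T = 0.4   # an unsophisticated model
COMPLEX_T = 1.0  # a fully scientific model

# Narrow context: 3 pieces of evidence, both models explain all of them.
narrow_simple = toy_utility(3, 3, SIMPLE_T)
narrow_complex = toy_utility(3, 3, COMPLEX_T)

# Rich context: 9 pieces of evidence, the simple model still explains only 3.
rich_simple = toy_utility(3, 9, SIMPLE_T)
rich_complex = toy_utility(9, 9, COMPLEX_T)

print(narrow_simple > narrow_complex)  # is the simple model favoured here?
print(rich_simple < rich_complex)      # is the complex model favoured here?
```

under these assumed numbers the simple model has the higher toy utility in the narrow context and the lower one in the rich context, which is the qualitative behaviour the text attributes to model competition.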
learning also depends on the state of an individual student’s initial knowledge. the state of initial knowledge is taken into account through the initial strengths of the different models m1-m4, usually so that simple models (such as m1 and m2) have high a priori strengths (they are then the preferred models), while complex models have low initial strengths. also, the initial strength of d-constructs affects how strongly theoretical knowledge will guide the learning process, an effect taken into account through parameter d’. in addition, students differ in their attentiveness to evidence, so that some parts of the evidence receive more weight than others. this is taken into account by also giving weights to the evidence. finally, the sequence of evidence and the order in which one encounters it (i.e. the training sequence) affect the dynamics of the learning paths. the values of these parameters and their initial values serve to model the individual learners’ initial knowledge and their potential to make use of theoretical knowledge.

in addition to the individual characteristics described above, communication between individuals affects the dynamics of learning paths. in the dgm, communication can also affect the strengths of the m-constructs. assuming pairwise communication between students, the stronger m-constructs affect the weaker ones so that the lower value is increased by the communication impact factor c. a value c = 1 means complete adoption of the highest utility models in communication, and c = 0 means completely ignoring the information provided through communication. the appendix explains the details of the communication model as part of the dgm.

the computational model is an idealisation and takes into account only the roughest features of concepts, models and communication.
however, the model is constructed to include the most essential generic features and, as such, is capable of providing important insight into how different parts of a conceptual system and its internal connectivity affect concept differentiation.

6. results: dgm simulations of concept differentiation

the directed graph model (dgm) and simulations based on it must make understandable the following generic features of concept learning and differentiation:

1) context-dependent dynamics of concept learning and differentiation (learning paths). the students’ conceptual states (as shown in figure 4) are context dependent in that they appear mostly in given contexts i-iii, with a given set of evidence (or observations) to be explained, and the state changes with changes in the set of evidence.
2) the dynamics and persistence of ontological change in attributions. changes in ontological attributions are indicative of concept differentiation. when differentiation takes place, it leads to a robust and persistent learning outcome.
3) the effect of communication on concept differentiation. in two of three cases, a given group has a student with a more sophisticated conception and more differentiated concepts than the two other students, who eventually partially adopt that sophisticated conception.

of course, the learning process entails many other details, but these generic features 1-3 are the most important and interesting ones that any model of concept learning and differentiation should explain. in what follows, we concentrate on simulating just such a process of concept learning and differentiation by using the dgm and monitoring the learning process through the theoricity t and separability s of concepts.

6.1. model parameters and initial conditions

the dgm allows parameterisation of many different initial stages. the initial stages are described through the initial strength of m-constructs and d-constructs, and through the strength of the evidence.
for initial model strengths, we studied here cases where m1 and m2 are strong models (strength 1.0–0.75), m3 is of moderate strength (strength 0.5–0.25), and other m-constructs are weak (strength 0.25–0.05). these cases are interesting, because they provide information on how initial, rather unsophisticated models such as m1 and m2 evolve during the learning process towards sophisticated models such as m4, and how concept differentiation relates to this change. this is also the learning path of most practical interest, a path from intuitive to scientific concepts. for initial d-construct strengths, we studied cases where d1, d2 and d3 are of equal strength d’, varying from 1.0 to 0.4. in addition to initial values for d’, theoretical knowledge operates through congruent and dissonant connections, which can be tuned by parameter k (see the appendix and table 4) so that k = 1 denotes the strongest guidance and k = 0 no guidance at all.

in addition to these parameterisations, the dynamics of the dgm and the learning paths depend on what we call here the training sequence, meaning the evidence and the order in which one encounters it. the training sequences are constructed to correspond to empirical contexts i-iii (see section 4.1) so that each sequence i-ii-iii consists of evidence {e0, e1, e2, e0’, e1’, e2’, e0’’, e1’’, e2’’}, where each element e0, e1, … is associated with a strength e specifying how much attention one pays to the evidence. if e = 1, the evidence is taken fully into consideration, but for 0.0 < e < 1.0, only partially. in the simulations, each event is repeated n times (n = 3 or n = 4), and the sequence is then reversed in order to verify the permanence of learning (i.e. no reduction of values t and s for the reversed sequence, and no “hysteresis” effect).
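as a minimal sketch of the training sequence just described, the following code builds the forward pass i-ii-iii followed by the reversed pass iii-ii-i. the function name, the event representation, the use of one attention strength per context, and the choice to keep the n repetitions of an event consecutive are assumptions made for illustration; the original simulation code is not reproduced here.

```python
from typing import List, Tuple

# Evidence labels per context, following the set quoted in section 6.1.
CONTEXTS = {
    "i": ["e0", "e1", "e2"],
    "ii": ["e0'", "e1'", "e2'"],
    "iii": ["e0''", "e1''", "e2''"],
}

Event = Tuple[str, float]  # (evidence label, attention strength e)


def training_sequence(n: int, strengths: Tuple[float, float, float]) -> List[Event]:
    """Forward pass i -> ii -> iii followed by the reversed pass iii -> ii -> i."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: one attention strength per context, in the order i, ii, iii.
    forward = list(zip(CONTEXTS, strengths))
    events: List[Event] = []
    for context, strength in forward + forward[::-1]:
        for label in CONTEXTS[context]:
            # Assumption: the n repetitions of one event are consecutive.
            events.extend([(label, strength)] * n)
    return events


if __name__ == "__main__":
    for n in (3, 4):
        print(n, len(training_sequence(n, (1.0, 0.8, 0.8))))
```

each pass covers three contexts with three events each, so the sequence has 18n events in total regardless of how the repetitions are interleaved.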
thus, the computation consists of training sequences of the form {n×(e’,e’,e’); n×(e’’,e’’,e’’); n×(e’’’,e’’’,e’’’); n×(e’’’,e’’’,e’’’); n×(e’’,e’’,e’’); n×(e’,e’,e’)}, consisting of 54 events for n = 3 and of 72 events for n = 4. training sequences of that form are completely specified by n and the set of values o = (e’,e’’,e’’’). in some cases, to test the hysteresis, an additional 38 events are added in random sequence, denoted by r. in summary, the parameters that specify the initial conditions are n and o, and the parameters affecting the dynamics are k and d’.

6.2. simulations of personal learning paths

the personal (individual, without communication) learning paths are studied first in a case where initial conditions favour models m1 and m2 with initial strengths of 0.75, but where model m2’ also has a substantial strength of 0.5. the learning paths begin from unsophisticated models, which closely correspond to patterns such as a and b (see figure 3), and then progress towards more sophisticated patterns of type d. the learning paths of concept differentiation are monitored through the evolution of theoricity t and separability s. for comparison, estimates of the values of t and s corresponding to the empirical cases idealised as graphs a-d (see figure 3) are: t = 0.35–0.45, s = 0.10–0.20 for a; t = 0.35–0.45, s = 0.55–0.65 for b; t = 0.55–0.70, s = 0.60–0.70 for c; and t = 0.90–1.00, s = 0.95–1.00 for d.

the learning paths are shown in figure 5 for training sequence parameterisations o1 = (1.0,1.0,1.0), o2 = (1.0,0.8,0.8) and o3 = (1.0,0.5,0.1), with n = 3. the positions where the sequences corresponding to contexts i, ii and iii end are marked. theoretical guidance is studied for strong guidance, k = 1.0 and 0.8, and for weaker guidance, k = 0.50, while parameter d’ (for d-constructs) ranges from 1.0 to 0.4. the evolution of m-construct strengths, which reflects the competition between models that must explain more evidence, is shown in figure 6.
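the comparison between a simulated pair (t, s) and the empirical estimates quoted above for patterns a-d can be sketched as a simple range lookup. the function name and the optional tolerance argument are hypothetical conveniences; only the numerical ranges are taken from the text.

```python
from typing import List

# (t_min, t_max, s_min, s_max) for patterns a-d, as quoted in section 6.2.
EMPIRICAL_RANGES = {
    "a": (0.35, 0.45, 0.10, 0.20),
    "b": (0.35, 0.45, 0.55, 0.65),
    "c": (0.55, 0.70, 0.60, 0.70),
    "d": (0.90, 1.00, 0.95, 1.00),
}


def matching_patterns(t: float, s: float, tol: float = 0.0) -> List[str]:
    """Return the patterns whose estimated (t, s) box contains the point."""
    hits = []
    for name, (t_lo, t_hi, s_lo, s_hi) in EMPIRICAL_RANGES.items():
        if t_lo - tol <= t <= t_hi + tol and s_lo - tol <= s <= s_hi + tol:
            hits.append(name)
    return hits


if __name__ == "__main__":
    print(matching_patterns(1.0, 1.0))  # complete learning
    print(matching_patterns(0.8, 0.8))  # a point between the empirical boxes
```

a point on a learning path that falls between the boxes returns an empty list, which is expected: the empirical patterns are idealised states, while the simulated paths pass continuously through the (t, s) plane.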
the situation shown in figures 5 and 6 is asymmetric with respect to c1 (current) and c2 (voltage), with c1 always having a higher theoricity t than c2. this asymmetry stems from the asymmetry in the initial strengths of m1 and m2, as shown in figure 6. it corresponds to the most frequent empirical situation, in which students initially favour current-based models over voltage-based ones. one can also interpret the results in the reverse way, with c2 having a higher theoricity and the roles of m1 and m2 reversed. however, this situation, where a voltage-based model is initially preferred over a current-based model, seldom occurs in empirical cases.

for strong theoretical guidance (k = 1.0 or 0.8), together with close attention to observations (training sequences o1 and o2), learning and concept differentiation are successful. in such cases (figure 5, in the upper row, two cases on the left) learning is complete for concept c1 (current), which is fully scientific (t = 1) and completely differentiated (s = 1) from concept c2 (voltage). concept c2 is nearly scientific (t = 0.6), and with some extra training (shown in grey in figure 5), it rapidly becomes a fully scientific (t = 1) concept. the learning paths are step-wise, with clearly distinguishable stable stages in theoricity t with increasing separability s. a sequence corresponding to context i is already enough for relatively advanced differentiation (i.e. ontological shift), although theoricity t may remain low. there appears to be a threshold of s = 0.7-0.8, which one can reach even with moderate development in theoricity. this threshold shows that one can achieve nearly complete separability and good differentiation (i.e. nearly complete ontological shift) even though the learning is otherwise still incomplete.
when theoretical guidance decreases, k = 0.5 and d’ = 0.60 (figure 5, lower row, middle) or k = 0.8 and d’ = 0.4 (figure 5, upper row, right), the theoricity of concepts c1 and c2 remains low for the training sequence (black dots), but again, with extra training (grey dots), improvement is possible. this trend shows that, eventually, even moderate theoretical guidance is effective, but then more training is needed. however, the order in which one encounters the evidence in training is not crucial, provided that context iii is involved. when theoretical guidance is low (k = 0.5, d’ = 0.4) and little attention focuses on evidence in the case of context iii, very little learning takes place, irrespective of the amount of training. this situation is shown in figure 5 in the lower right corner.

the evolution of m-constructs in figure 6 corresponds to the learning paths in figure 5. the initially dominant m-constructs m1 and m2 remain dominant until the end of the sequence corresponding to context i. during the sequence corresponding to context ii, models m1’ and m2’ also become active and grow stronger. however, when sequence iii begins, with strong theoretical guidance, the initial models cannot compete with m4 (fully scientific model), which eventually dominates when the sequence corresponding to context iii ends. this does not occur with low theoretical guidance (figure 6, right column). however, with extra training, m4 is eventually enforced in cases of moderate theoretical guidance also (figure 6, lowest row), but not for the least guidance (figure 6, lower right corner). it is noteworthy that m3’ is never activated, which is in line with the empirical finding that such a voltage-based model is only seldom encountered.

figure 5. the theoricity t and separability s of concepts c1 (bullets) and c2 (boxes) in the case of six different learning paths with given parameters k and d’ that control the strength of theoretical guidance.
the upper row shows cases where k ≥ 0.8 is always relatively high but d’ varies from 1.0 to 0.4. in the lower row, k also varies from a high value of 0.8 to a lower value of 0.5. the initial values of the model strengths and strengths of the observations are different in the cases shown in the left, middle and right columns (corresponding model strengths are shown in figure 6). left column: initial values of model strengths favour models m1 and m2’ with strengths of 0.5, while other models have a weaker but equal strength of 0.25. the observations of events i-iii are strong (link strengths have a value of 1). middle column: model strengths as in the left column, but m3, m3’ and m4 are reduced to 0.15; observations i-ii are strong (1), but iii is only moderately strong (0.75). right column: otherwise similar to the middle column, but the observations in case iii are weak (0.10). the training sequence from i to iii (end points of each sequence are marked in the figure), with three repetitions for each event, appears as black dots. the training sequence testing the permanence of learning, from i to iii, then back from iii to i, and one random sequence, appears as grey dots. the values corresponding to the empirical results of configurations a-d (see figure 3) are indicated (two symbols for each are located at the pairs of the lowest estimated and highest estimated values for t and s).

figure 6. model evolution of the learning paths shown in figure 5 with parameterisations for k and d and different training sequences o1, o2 and o3 as indicated. the color represents the strength of a given model. the number of steps in the simulation appears on the vertical axis, thus indicating the ordering of the sequence.

learning paths with slightly different initial conditions from the cases in figure 5 are shown in figure 7. in the cases shown in the upper row, the m-constructs m3 and m4 are slightly weaker than in the case shown in figure 5.
in the cases shown in the lower row, m-constructs m1 and m2 are of nearly equal strengths, which makes the initial stage more symmetric with respect to c1 and c2. in both cases, the attention to events corresponding to contexts ii and iii becomes weaker from left to right, represented as parameterisations o3 (as in figure 5) and o4 = (1.0,0.8,0.5). in addition, the training sequence is now such that each event is repeated three (n = 3, black dots) or four times (n = 4, grey dots). the corresponding evolution of m-constructs is shown in figure 8.

compared to the cases shown in figure 5, one can observe some interesting differences. in the case shown in figure 7, in the upper left corner, learning consists mostly of ontological shift through the end of the sequence corresponding to context ii. after that, when the sequence corresponding to context iii begins, theoricity t rapidly increases because m4 rapidly gains strength (see figure 8) due to strong theoretical guidance. eventually, when the training sequence ends, learning is again complete. the training sequence with n = 4 leads to higher theoricity t of concepts in i and ii, but interestingly, to a slower increase in theoricity in iii than in cases with n = 3, because with n = 4, m3’ grows strong during i and ii, which slows the adoption of m4. this “overlearning” effect is most pronounced in the case shown in figure 7, in the upper right corner, where theoretical guidance is low and attention paid to events is also low. in this case, more frequent repetition of events with n = 4 leads to deterioration of the learning results, and learning stagnates at a low theoricity t of c1 and c2. ontological shift, however, advances and eventually, separability s = 0.8 is reached. such a situation corresponds to what occurs in real learning: too much focus on overly simple tasks, which reinforces unsophisticated models, may lead to the persistent and robust use of under-developed models and conceptions.

figure 7. three deterministic cases of learning paths with a given k. the figures show theoricity t and separability s of c-constructs c1 (bullets) and c2 (boxes) from figure 2 and indicate the values corresponding to empirical results a-d (see figure 3). construct c1 is current, and c2 is voltage. the initial conditions favour models m1 and m2’ (voltage based).

figure 8. model evolution of the learning paths in figure 7 with parameterisations for k and d and different training sequences o3 and o4 as indicated. the darkness represents the strength of a given model. the number of steps in the simulation appears on the vertical axis, thus indicating the ordering of the sequence.

the lower row in figure 7 shows some interesting situations, where repetition temporarily leads to the deterioration of learning results when simple situations recur after the sequence i-ii-iii. we briefly refer to this as “hysteresis” in learning. eventually, however (figure 7, lower row, two cases on the left), complete learning with t = 1 and s = 1 occurs and learning becomes permanent, no longer affected by further repetitions. in the case of weak theoretical guidance and weak learning from events (figure 7, lower right corner), incomplete learning occurs, with stable learning resulting at t = 0.4 and s = 0.6. this is again due to “overlearning” of incomplete models, which prevents the adoption of the more sophisticated model m4. similar results can also be observed for other cases with moderate k and moderate attention to observations; repeating simple situations i and ii many times before encountering the more complex situation iii may reinforce the incomplete models m1-m3 or m1’-m3’ so much that further development can no longer take place. this shows that repetitions of training sequences can have detrimental consequences on learning if initial theoretical guidance is too low.
the examples discussed above are asymmetrical situations, where concept c1 (current) is the favoured concept, while c2 (voltage) is initially less developed and remains largely as it is during further evolution. this is the most common situation in learning, although the roles can sometimes be reversed. however, because the dgm is symmetrical with respect to c1 and c2 (see figure 2), a reversed situation where c2 has stronger theoricity than c1 is quite similar: c1 and c2 simply switch roles. symmetrical situations can also occur; they closely follow the results in figures 5 and 7, with the learning paths for c1 and c2 then simply overlapping.

6.3. simulations of communication effects

the effects of communication on learning paths are simulated by using the sparse and dense communication patterns between members p, p’ and p’’ in a group of three (see figure 4), and two impact factors, c = 0.2 and c = 0.75, for communication. the effect of communication is tested on cases where theoretical guidance is strong or moderate (figures 5 and 7, upper row, in the right column). the results of learning paths are shown in figure 9, and the evolution of m-constructs in figure 10. these figures show that even dense communication with a high impact c = 0.75 only moderately affects the learning paths. the most obvious effect is that if one member p in the group has a learning path which is strongly theoretically guided, thus reaching high values for t and s, the other members tend to learn from that member and improve their learning and, consequently, reach higher values for t and s than without communication. eventually, members p’ and p’’ who are less successful (figures 5 and 7, in the lower right corner) than member p also achieve complete learning owing to the communication. this happens equally well for sparse and low-impact communication as for dense and high-impact communication.
of course, this occurs only in cases with one successful learner in the group. these features appear to be in concordance with the empirical findings, although the empirical findings presently allow no more detailed comparisons. in the learning model, which is simply biased toward adopting the strongest model, a good learning result may temporarily worsen (see figure 9, upper right corner). however, this is a transient effect, and the learning path eventually evolves toward complete learning.

figure 9. learning paths with the effect of communication taken into account. different figures represent paths with different parameterisations for k and d and different training sequences o1, o2, o3 and o4.

figure 10. model evolution of the learning paths in figure 9 with different parameterisations for k and d and different training sequences o1, o2 and o3. the darkness represents the strength of a given model. the number of steps in the simulation appears on the vertical axis, thus indicating the ordering of the sequence.

6.4. summary of simulation results

the results based on the dgm agree with the following central empirical findings of concept learning and differentiation:

1. context-dependent dynamics. this is apparent in the strong dependence of paths on the learning sequences. complete learning takes place only in sufficiently rich contexts (e.g. case iii), whereas in narrow contexts (e.g. cases i and ii), learning is moderate or incomplete. this is a consequence of model competition and the greater utility of complex models in complex contexts.
2. the persistence of ontological shift and concept differentiation (s ≈ 1). in the dgm, this is a direct consequence of the guidance of d-constructs and their “memory effect”, retaining the memory of successful applications of d-constructs. the persistence of the ontological shift agrees with the empirical findings.
however, the ontological shift in attributions is not a driving force of concept learning, but an outcome of a learning process driven by theoretical knowledge.
3. communication affects individual learning paths and enables less advanced members of the group to adopt more advanced m-constructs from the most advanced member of the group. thus communication improves learning, although the effect in the cases studied here is not particularly strong.

in summary, the dgm reproduces the generic features of interest in concept learning and differentiation, and demonstrates that these features are associated with the guidance of theoretical knowledge, model utility and the memory effects of success in using models.

7. discussion and conclusions

the model presented here is based on the systemic view, where concepts are viewed as complex, dynamically evolving structures. the model is constructed to capture generic aspects of concept learning and differentiation as exemplified in the case of learning two closely related scientific concepts – here, electric current and voltage. the generic features of most interest in need of explanation are: 1) the robustness of certain simple ways to use concepts to provide explanations in simple situations, a phenomenon usually assigned to the robustness of misconceived ontological classes, 2) the context-dependent dynamics of change and the requirement to encounter complex enough situations to effect the change, and 3) the robustness of ontological shift once it has occurred. we suggest here that in order to understand these features and the dynamics of the change, we must develop a rich and complex enough model of concepts. on the one hand, the model of concepts takes the form of simple and nearly self-explanatory concepts, but on the other hand, the form of complex structures, dependent on other concepts.
the systemic view is embodied by the use of the well-known case of the differentiation of electric current and voltage as concepts describing the behaviour of simple dc circuits. for this purpose, we use re-analysed empirical data. the re-analysed (and partly re-interpreted) empirical results are then represented by using different conceptual elements, or constructs: c-constructs, which stand for concepts, d-constructs for causal schemes and law-like theoretical schemes, and m-constructs, which are model-like structures that use c- and d-constructs as integrated parts. as a formal representational model for these constructs and their mutual relationships, we introduce a directed graph model (dgm). in the dgm, concepts are nodes in the graph connected by directed links to other conceptual elements. the dgm serves as a computational template to simulate concept learning and differentiation and their dynamics.

the stability of certain properties of concepts, traditionally considered robust “misconceptions”, and their dependence on contexts are now seen as related to the complex interplay of different conceptual elements. change is driven by competition between m-constructs (models) and by how available evidence governs it. however, how this is reflected in the theory- and attribute-relatedness of concepts depends on how those m-constructs employ concepts (i.e. how the concept projects onto the actual evidence). thus, d-constructs (theoretical knowledge) are central. all these aspects are recognised in current cognitively oriented views of concept learning, but are usually discussed separately or as unrelated views. the present study strongly suggests unifying these views and treating concepts as complex, multifaceted and dynamic structures.
finally, the present work suggests that the theoretical background developed for research on concept development has important implications for the ways in which researchers and instructors view the learning process and how, on this basis, they design teaching solutions. the results point to the crucial role of theoretical knowledge in guiding concept learning and, furthermore, show that ontological shift, while an important part of learning, is not the primary driving force of learning, but is rather a consequence of more fundamental changes in the conceptual system. this suggests that theoretical structures and model utility should receive more attention in designing instructional solutions. on the other hand, it is clear that initial conceptions need not be actively “unlearned”; they can instead serve as a natural and useful starting point for the transformation.

for teaching and instruction, one important message lies in the role of the training sequence in determining learning paths. the details of the training sequence, and their repetitions, do matter in the initial stages of learning if the guidance of theoretical knowledge is low. then, too much repetition of overly simple situations to explain may lead to “overlearning” of unsophisticated concepts and models, and effectively prevent the acquisition of more advanced models. the results of the simulations, interpreted within the theoretical framework of the systemic model put forward here, suggest that designing specific training sequences which help one to “unlearn” unsophisticated models is unnecessary; rather, what is needed is a training sequence which gradually and at suitable stages of learning introduces more challenging learning situations, where the utility of more advanced and scientific concepts and models becomes apparent.
the theoretical positions discussed and suggested here directly impact learning and instruction by clarifying the degree to which ontological shift drives the learning process and the degree to which it should be considered a consequence of a more fundamental, theory-driven learning process. also, the question of to what extent differentiation and concept learning take place through the evolution of existing structures, and to what extent the learner must receive these structures instead of constructing them, receives clarification. briefly, if the systemic view is correct, it suggests that ontological shift takes place, but is a consequence of theory-driven learning. the learner must receive complex theoretical structures through instruction and see their utility in complex enough situations to warrant adopting them. such results are practical in that they guide teaching and the development of teaching solutions; they provide support to some of the well-known teacher-centred solutions (the role of the teacher in providing models to organise new knowledge and in familiarising the students with complex theoretical models), while showing the indispensability of a rich context and context variation in the construction of explanatory models and the role of predictions and observations in learning.

these notions, even without detailed suggestions for training sequences and instructional solutions, demonstrate that the choices to employ a certain theoretical framework to understand concept learning and differentiation are not neutral. rather, they have fundamental consequences for how learning and instruction are conceived, how their purposes and goals are viewed, and how our attention is guided towards crucial generic features of learning and its dynamics.

keypoints

concepts are considered complex structures, which are projected differently in different contexts.
concept differentiation can be modelled when embedded within a systemic view on concepts.
theoretical guidance and theoretical schemes are crucial for concept differentiation.
ontological shift is a consequence of a theory-guided learning process.
robust misconceptions are stable dynamic states of the concept system.
attention must be paid to training sequences in learning.
too frequent use of overly simple situations in training will cause concept learning to stagnate in robust states corresponding to misconceptions.
references
andersen, h., barker, b., & chen, x. (2006). the cognitive structure of scientific revolutions. cambridge, ma: cambridge university press.
andersen, h., & nersessian, n. j. (2000). nomic concepts, frames, and conceptual change. philosophy of science, 67, s224–s241.
brown, d. e., & hammer, d. (2008). conceptual change in physics. in s. vosniadou (ed.), international handbook of research on conceptual change (pp. 127–154). new york: routledge.
carey, s. (2010). the origin of concepts. new york, ny: oxford university press.
chi, m. t. h., & slotta, j. d. (1993). the ontological coherence of intuitive physics. cognition and instruction, 10, 249–260.
chi, m. t. h. (2005). commonsense conceptions of emergent processes: why some misconceptions are robust. the journal of the learning sciences, 14, 161–199. doi: 10.1207/s15327809jls1402_1.
chi, m. t. h. (2008). three types of conceptual change: belief revision, mental model transformation, and categorical shift. in s. vosniadou (ed.), international handbook of research on conceptual change (pp. 35–60). new york, ny: routledge.
chi, m. t. h., & brem, s. k. (2009). contrasting ohlsson's resubsumption theory with chi's categorical shift theory. educational psychologist, 44, 58–63. doi: 10.1080/00461520802616283.
cohen, r., eylon, b., & ganiel, u. (1983). potential difference and current in simple electric circuits: a study of students’ concepts. american journal of physics, 51, 407–412.
danks, d. (2010). not different kinds, just special cases. behavioral and brain sciences, 33, 208–209. doi: 10.1017/s0140525x1000052x.
engelhardt, p. v., & beichner, r. j. (2004). students’ understanding of direct current resistive electrical circuits. american journal of physics, 72, 98–115. doi: 10.1119/1.1614813.
gopnik, a., & meltzoff, a. n. (1997). words, thoughts, and theories. cambridge, ma: mit press.
gupta, a., hammer, d., & redish, e. f. (2010). the case for dynamic models of learners’ ontologies in physics. the journal of the learning sciences, 19, 285–321. doi: 10.1080/10508406.2011.537977.
henderson, l., goodman, n. d., tenenbaum, j. b., & woodward, j. f. (2010). the structure and dynamics of scientific theories: a hierarchical bayesian perspective. philosophy of science, 77, 172–200.
hoyningen-huene, p. (1993). reconstructing scientific revolutions: thomas s. kuhn’s philosophy of science. chicago, il: the university of chicago press.
jeong, h., & chi, m. t. h. (2007). knowledge convergence and collaborative learning. instructional science, 35, 287–315. doi: 10.1007/s11251-006-9008-z.
keil, f. c. (1989). concepts, kinds and conceptual development. cambridge, ma: mit press.
koponen, i. t. (2013). systemic view of learning scientific concepts: a description in terms of directed graph model. complexity, 19, 27–37. doi: 10.1002/cplx.21474.
koponen, i. t., & huttunen, l. (2013). concept development in learning physics: the case of electric current and voltage. science & education, 22, 2227–2254. doi: 10.1007/s11191-012-9508-y.
koumaras, p., kariotoglou, p., & psillos, d. (1997). causal structures and counter-intuitive experiments in electricity. international journal of science education, 19, 617–630.
lee, y., & law, n. (2001). explorations in promoting conceptual change in electrical concepts via ontological category shift. international journal of science education, 23, 111–149.
machery, e. (2009). doing without concepts. oxford: oxford university press.
murphy, g. l. (2004). the big book of concepts. cambridge, ma: mit press.
nersessian, n. (2008). creating scientific concepts. cambridge, ma: mit press.
ohlsson, s. (2009). resubsumption: a possible mechanism for conceptual change and belief revision. educational psychologist, 44, 20–40. doi: 10.1080/00461520802616267.
ohlsson, s. (2011). deep learning: how the mind overrides experience. cambridge, ma: cambridge university press.
rehder, b. (2003). categorization as causal reasoning. cognitive science, 27, 709–748.
reiner, m., slotta, j. d., chi, m. t. h., & resnick, l. b. (2000). naive physics reasoning: a commitment to substance based reasoning. cognition and instruction, 18, 1–34.
shipstone, d. m. (1984). a study of children’s understanding of electricity in simple dc circuits. european journal of science education, 6, 185–198.
slotta, j. d., & chi, m. t. h. (2006). helping students understand challenging topics in science through ontology training. cognition and instruction, 24, 261–289.
smith, c., carey, s., & wiser, m. (1985). on differentiation: a case study of the development of the concept of size, weight and density. cognition, 21, 177–237.
smith, e. e., & medin, d. l. (1981). categories and concepts. cambridge, ma: harvard university press.
weinberger, a., stegmann, k., & fischer, f. (2007). knowledge convergence in collaborative learning: concepts and assessments. learning and instruction, 17, 416–426.
appendix: dgm update rules
the dynamics of the dgm are determined by the update rules for the node strengths and for the weights of the connecting links between the nodes. in the dgm, c-, d- and m-constructs are nodes, which are connected by directed links. each node $i$ has a dynamically evolving strength $s_i$, which determines its effect on the other nodes to which it is connected and, thus, the dynamics of the system.
node strengths $s$ (the subscript is omitted if not essential) are updated after each “event” $e$ in the set of all events $E$, which means obtaining new evidence or reconsidering the evidence (i.e. any kind of comparison with the evidence). the variable $e$ is treated as a running index that keeps track of the encounters with evidence. the strength of the previous step, $s(e-1)$, is then updated to a new value, $s(e-1) \to s(e)$, that corresponds to evidence $e$. a congruent link between nodes $i$ and $j$ is described by the value $a_{ij} = 1$, while a dissonant link has $a_{ij} = -1$. the following quantities are then defined entirely in terms of node and link weights.
1. theoricity $T$ is the theoretical complexity of the c-construct. the more d-constructs and m-constructs are connected to a c-construct, the greater is the theoretical complexity of the c-construct (i.e. the greater is its theoricity). quantitatively, within the dgm, theoricity $T$ can be quantified in terms of the number of directed paths from the c-construct to the m-constructs and their respective strengths. in some of the paths, the d-constructs are also involved, which increases their theoricities. theoricity $T_c$ of the c-construct at node $c \in C$ (index $c$ refers to c-constructs) is given by a sum of two terms. the first term represents one-step paths from the models ($m \in M$) to the c-construct. the second term represents two-step paths through the d-construct ($d \in D$). note here that $s_c = 1$. the theoricity of a model, needed in what follows as part of the dynamic update rules for the dgm, is defined similarly, but now $s_c \to s_m$ and inverted directed paths from the model to the c- and d-constructs are counted (see table 4).
2. separability $S$ measures the degree of dissimilarity between the sets of attributes associated with two c-constructs represented by nodes $c$ and $c'$. it is defined in regard to attributions only, as in the
prototype theories. separability is operationalised as a suitably normalised number of unshared elements so that for fully differentiated concepts, $S = 1$, while for similar concepts, $S = 0$. in its definition, the element $A_{ac}$ represents attributions (i.e. the values of attributes $a$, to be defined later) linked to c-construct $c$, and $N$ is a normalisation factor. in calculating separability, $N$ takes into account the total strength of the attributions so that $S \approx 1$ represents strong and totally dissimilar attributions, while $S \ll 1$ can represent either totally similar or weak attributions.
3. utility $U$ of the models in providing explanations depends on the ratio of explained facts to the model’s theoricity $T$. the utility of model $m \in M$ is defined as a sum of four terms, where $e$ is an event in set $E$, and $e'$ is a node which groups events together (see figure 2). dissonant (negative) links are denoted by $a'_{ij}$. the first term in the sum represents direct congruent paths to the models (i.e. explanations), the second term direct dissonant paths, the third term two-step paths through $e'$, and the last term two-step paths through d-constructs. parameter $k \in [0,1]$ controls the effect of dissonant links and the d-constructs on utility. if the model explains most of the available evidence and its theoricity $T$ is low, its utility is high. however, with a growing set of evidence to explain, models with low theoricity generally fail, thereby reducing their utility. utility is the basis for model comparison and selection.
the updating rules for the node strengths determine the dynamics of the graph and, consequently, the evolution of quantities $U$, $T$, and $S$, which depend dynamically on the node strengths. the updating rules also contain a memory effect, and the new values $s(e)$ with evidence $e$ depend recursively on the past values $s(e-1)$, or $s$ in shorthand. the updating rules are defined as follows:
4.
the update rule for strength $s_m$ of m-construct $m$ is based on bayesian-type selection criteria (koponen, 2013; henderson et al., 2010) and depends on the utilities of the models. each m-construct has a certain expected plausibility or probability $s_m$, which is updated to $s_m(e)$ when more evidence $e$ in the form of observations becomes available. the new plausibility with evidence $e$ is evaluated according to the bayesian rule so that if $U_m(e)$ is the utility of model $m$ when $e$ is known, the model strength is then updated to a new value $s_m(e)$, where the sum in the denominator of the update rule ensures its normalisation. generally, the more complex c-constructs provide more alternatives for m-constructs, so the bayesian rule favours “simple” m-constructs. however, this may change when observations accumulate and more complex m-constructs explain more. the initial conditions of the system are its prior strengths and utilities, which must be deduced from available empirical data (based e.g. on interviews).
5. strength $s_d$ with evidence $e$ depends on the connections of the d-construct to other d-constructs and m-constructs. in the update rule for it, the first and the last sums take into account the fact that dissonant connections reduce strength, while the second sum takes into account the fact that connections to successful m-constructs increase strength. the factor $(1-s_d)$ takes into account the “memory effect”; the use of a given d-construct in successful m-constructs increases the value of $s_d$, which does not decrease again. this models the known effect that the successful application of theoretical knowledge increases confidence in that knowledge.
6. attribute strengths $s_a$ are updated by taking into account one-step congruent (second sum) and dissonant (first sum) paths from the models, where parameter $k$ controls the weight of dissonant paths (compare to utility).
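the explicit formula for the m-construct update in rule 4 is not reproduced in the text above. a minimal sketch, assuming the usual bayesian form in which each prior strength is multiplied by the utility of the model and the products are divided by their sum, might look as follows. the function name `update_model_strengths`, the model labels and the numerical values are hypothetical and serve only as an illustration; they are not taken from the original model.

```python
def update_model_strengths(prior, utility):
    """Bayesian-type update of m-construct strengths (assumed form).

    prior:   dict mapping model label -> strength s_m before evidence e
    utility: dict mapping model label -> utility U_m(e) given evidence e
    Returns a dict of new strengths s_m(e) that sum to 1.
    """
    # Weight each prior strength by the utility of the model (assumption).
    weighted = {m: prior[m] * utility[m] for m in prior}
    # The sum in the denominator ensures normalisation.
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("at least one model needs positive strength and utility")
    return {m: w / total for m, w in weighted.items()}


# Hypothetical example: a simple model starts out stronger, but the
# advanced model has the higher utility for the new evidence.
prior = {"simple": 0.7, "advanced": 0.3}
utility = {"simple": 0.2, "advanced": 0.8}
posterior = update_model_strengths(prior, utility)
print(posterior)
```

in this sketch a model with a strong prior can still lose ground when another model has a clearly higher utility for the new evidence, which is the qualitative behaviour described in rule 4.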
attributions $A_{ac}$ are then defined on the basis of attribute strengths, where the first term represents two-step paths, and the last term, three-step paths from the attributes to the c-constructs through the d-constructs (see figure 2). attributions serve as a basis for calculating the separability $S$ of a given pair of concepts (see definition 2 above).
7. the effect of communication on the dynamics of the dgm operates through the strengths of the m-constructs. at each step, where pairwise communication between students $p$ and $p'$ is assumed, the stronger m-constructs affect the weaker ones. this is done by setting new values to the $s_m$ and $s'_m$ of $p$ and $p'$, respectively, so that the larger of the two values remains unchanged, but the lower value increases by the factor $c \max\{0, s_m - s'_m\}$, where $c$ is the communication impact factor. the update takes place before applying the bayesian rule in step 4. this means that the update rule can be interpreted as learning by adopting better explanatory models before comparing utilities.
the update rules for the strengths of the m- and d-constructs drive the dynamic evolution of the graph. table 4 summarises these strengths and other quantities defined in 1–6.
table 4. definitions of quantities $T$, $S$ and $U$, as well as update rules for node strengths $s$, are given for d- and m-constructs and their attributes. the attributions of the c-construct are given by $A_{ac}$, and the separability of $c$ and $c'$ by $S_{cc'}$. subscripts c, m, d and a denote c-, m- and d-constructs and attributes, respectively. congruent links between $i$ and $j$ are denoted by $a_{ij} = 1$, while for dissonant links, $a'_{ij} = -1$.
name, definition
theoricity, $T_k$ (if $k = m$ then $i = c$; if $k = c$ then $i = m$)
strength of d, $s_d$
utility, $U_m$
strength of m, $s_m$
strength of a, $s_a$
attribution of c, $A_{ac}$
separability, $S_{cc'}$
endedijk et hoogeboom et al
frontline learning research vol.6 no.
3 (2018) 123–147 issn 2295-3159
using sensor technology to capture the structure and content of team interactions in medical emergency teams during stressful moments
maaike endedijk* a, marcella hoogeboom* a, marleen groenier a, stijn de laat a, jolien van sas a
a university of twente, the netherlands
*both authors contributed equally to this work
article received 13 april 2018 / revised 11 november / accepted 23 november / available online 7 december
abstract
in healthcare, action teams carry out complex medical procedures in intense and unpredictable situations to save lives. previous research has shown that efficient communication, high-quality coordination, and coping with stress are particularly essential for high performance. however, precisely and objectively capturing these team interactions during stressful moments remains a challenge. in this study, we used a multimodal design to capture the structure and content of team interactions of medical teams at moments of high arousal during a simulated crisis situation. sociometric badges were used to measure the structure of team interactions, including speaking time, overlapping speech and conversational imbalance. video coding was used to reveal the content of the team interactions. furthermore, the empatica e4 was used to unobtrusively measure the team leader’s skin conductance to identify moments of high arousal. in total, 21 four-person teams of technical medicine students in the netherlands were monitored in a simulation environment while they diagnosed and managed a patient with cardiac arrest. outcomes of this exploratory study revealed that more effective teams showed greater conversational imbalance than less effective teams, but during moments of high arousal the opposite was found. also, a number of differences were found for the content of team interaction.
combining sensor technology with traditional measures can enhance our understanding of the complex interaction processes underlying effective team performance, but technological advances together with more knowledge about the simultaneous application of these methods are needed to tap into the full potential of wearable sensor technology in team research.
keywords: team interaction; video observation; skin conductance; sociometric badges; medical simulation; action teams.
corresponding authors: a.m.g.m.hoogeboom@utwente.nl and m.d.endedijk@utwente.nl
doi: https://doi.org/10.14786/flr.v6i3.353
1. introduction
teams are ubiquitous in organizations. since the end of the 20th century the focus of work in organizations has shifted from the individual employee to employees as part of a team (kozlowski & ilgen, 2006). alongside this shift we have seen an increase in research focusing on the collaborative processes and outcomes of different types of teams (vangrieken, boon, dochy, & kyndt, 2017). one specific form of team is the action team, “…where members with specialized skills must improvise and coordinate their actions in intense, unpredictable situations” (edmonson, 2003, p. 1421). in other words, it is the task of action teams to quickly establish effective coordination and communication in unexpected situations. as negative outcomes of these team processes can be detrimental to human safety (e.g., in medical teams or aviation teams), it is of utmost importance that these teams are trained well in performing these complex team interaction skills in a realistic environment. simulation rooms provide excellent, risk-free opportunities to practice both technical and team interaction skills in realistic scenarios that allow team members to experience how they will perform during stressful moments (kneebone, nestel, vincent, & darzi, 2007).
not only research but also practice can benefit from exploring team interaction processes at such stressful moments during scenario-based training (entin & serfaty, 1999; lei, waller, hagen, & kaplan, 2016). traditionally, debriefing sessions with expert debriefers are used to provide feedback in simulation-based learning (fanning & gaba, 2007). during these sessions, expert debriefers select specific, observable events in the scenario and stimulate trainees to reflect on their behavior and decisions. the use of expert debriefers to provide feedback is not only very costly and time- and labor-intensive, but research has also shown that how debriefers facilitate the debriefing sessions is highly variable (tannenbaum & cerasoli, 2013). in addition, the events they select are not necessarily the moments team members experience as stressful. we argue that a next step is needed to move from traditional human-based observation methods to methods that allow for more objective and timely identification of effective team interaction processes during stressful moments. wearable sensors have opened up a new world of research possibilities to detect body signals and analyze speech from team interactions, providing insights into how people respond and interact without interfering with their natural work processes (fischer & järvelä, 2014). for example, sociometric badges (e.g., olguin et al., 2009; pentland, 2012) are sensors that are worn around the neck, similar to typical id-badges, and are able to detect various features of social interaction. the empatica e4-wristband [empatica inc., cambridge, usa] (garbarino, lai, tognetti, picard, & bender, 2014) is able to measure skin conductance (also known as electrodermal activity: boucsein, 2012) in an unobtrusive way.
this can be used as an indicator for identifying moments of high arousal (boucheix, 2017; christopoulos, uy, & yap, 2016), which typically reflect high levels of distress in the context of medical action teams during a crisis situation (hunziker, johansson, et al., 2011). when combined, these sensors enable detailed exploration of social interaction in teams during moments of high arousal in action teams. however, a lot is still unknown about how to use and combine these sensors with more traditional measures to detect effective team interaction processes during moments of high arousal, especially for action teams. therefore, in line with the purpose of this special issue, the goal of this study is to clarify the methodological approach, added value and pitfalls of using and combining different sensor technologies in combination with video observation. our study was performed in a medical simulation room for advanced life support (als) training. als is a complex emergency situation following cardiac arrest of a patient and is characterized by “extreme time pressures, diagnostic uncertainty, and rapidly evolving situations” (doumouras, keshet, nathens, ahmed, & hicks, 2012, p. 274; hunziker, laschinger, et al., 2011). research has shown that human factors such as efficient team communication, coordination, and stress especially affect als efficiency and performance (fernandez castelao, russo, riethmüller, & boos, 2013; hunziker, laschinger, et al., 2011). some have estimated that poor non-technical skills can contribute to 64 to 83% of critical incidents in a medical context or crisis situation, for example, in anesthesia (arnstein, 1997). in this study, we combine traditional video observation methodology to systematically analyze the content of team interaction behaviors with innovative sensor technology to explore the structure of team interaction processes in more detail (sociometric badges) and identify moments of high arousal (empatica e4). 
the outcomes of this study demonstrate how using a combination of different sensors and traditional measures provides a means to get a rich picture of complex team interactions during moments of high arousal. these insights advance our knowledge of how to use and combine sensor technology and how this information can be used for optimization of simulation environments to help prospective and current medical professionals to improve not only their medical skills, but also their team interaction skills.
2. theoretical framework
2.1 simulation-based medical education
simulation has been used in medical education as an educational technique to improve health outcomes since the 17th century (cooke, irby, & o’brien, 2010; mcgaghie, issenberg, petrusa, & scalese, 2009). the number of medical simulation settings has expanded with the development of complex technologies. such advanced technologies enable simulations that closely resemble reality, especially when they are combined with high-fidelity scenarios built around events that potentially could have serious consequences for the patient (dias & neto, 2016; grenvik, schafer, devita, & rogers, 2004). skills acquired in a well-designed medical simulation environment show better transfer to improved real-life patient care compared to traditional, on-the-job medical training (mcgaghie, issenberg, cohen, barsuk, & wayne, 2011). simulation is especially useful in training under conditions of uncertainty, ambiguity and rapid situation changes with potentially severe consequences for patient safety (satish & streufert, 2002). high-fidelity simulation-based learning is nowadays the standard for training als teams who must diagnose and manage a patient in cardiac arrest (sahu & lata, 2010). diagnosing and managing a patient in cardiac arrest requires immediate medical intervention and efficient teamwork; otherwise, a patient might not survive (hunziker, johansson, et al., 2011).
successful resuscitation depends on the integrated application of technical skills, such as intubation, chest compressions, and clinical reasoning, and non-technical skills related to working in a team, such as communication, decision-making, and leadership (hunziker, tschan, semmer, howell, & marsch, 2010). effective and efficient teamwork in a resuscitation scenario requires a sequence of actions that is performed in the correct way and at the right time (hunziker, johansson, et al., 2011).
2.2 team interaction
a fundamental aspect that can lead to high team performance in a crisis situation, such as a cardiac arrest, is the set of emergent, interactive processes between medical personnel (e.g., hunziker, johansson, et al., 2011). team interaction is defined as a series of ongoing behavioral processes and actions that occur over time (lei et al., 2016; stachowski, kaplan, & waller, 2009). for decades, team researchers have advocated capturing the dynamic nature of such team interactions, as opposed to using static measures (marks, zaccaro, & mathieu, 2000). there are several ways to quantify verbal team interactions. some researchers focus on the content of the interaction, for example, by using observation schemes that identify what team members say, such as the number of agreements, suggestions, or opinions in team conversations (e.g., atwal & caldwell, 2005; hoogeboom & wilderom, 2015). others focus on the structure of the conversation, such as which team member is speaking, the number of interruptions or the degree of turn-taking, regardless of the content (kim, mcfee, olguin olguin, waber, & pentland, 2012; koudenburg, postmes, & gordijn, 2017; pugliese, nicholson, & bezemer, 2015). for example, providing teams with real-time feedback on conversational balance (i.e., over-participators were stimulated to decrease their participation) influenced their decision making for either better or worse (dimicco, hollenbach, pandolfo, & bender, 2007).
in the current study, both the content and structure of team interactions will be taken into account. previous studies have already identified some content and structure related characteristics of effective team interaction in action teams which are dealing with crisis situations. for example, using video observation and coding, previous studies on the structure of interactions of airline teams have shown that effective teams displayed less complex, more homogenous interaction processes (kanki, folk, & irwin, 1991; zijlstra, waller, & phillips, 2012). other related research on the structure of interactions in nuclear power plant control room crews showed that effective interactions during crises consisted of fewer actors and less back-and-forth communication (stachowski et al., 2009). hence, shorter, less complex, and less reciprocal team interaction, indicating somewhat scripted or standardized forms of team interaction, seem to be more effective in action teams. regarding the content of interactions, kolbe et al. (2014) showed that effective medical team members more frequently spoke up and aided assistance after implicit action coordination (i.e. team monitoring). such task-related helping interactions after team monitoring behavior seemed vital for high performance. for the team leader, communicating clear goals and a clear task distribution has proven to reduce the emotional reactions by team members, leading to an increase in performance in stressful situations (zaccaro et al., 2001; andersen, jensen, lippert, & østergaard, 2010; marsch et al., 2004). moreover, research in emergency command-and-control teams has shown the importance of leader structuring behavior, such as clarifying and summarizing (van der haar et al., 2017). during resuscitation, the use of closed-loop-communication is advocated to avoid errors (fernandez castelao et al., 2013). 
this clear, structured, and standardized form of communication (brindley & reynolds, 2011; härgestam, lindkvist, brulin, jacobsson, & hultin, 2013) consists of an initial message (call-out) by the team leader, which should be confirmed or acknowledged by the receiver (check back) and confirmed once again by the team leader (closing the loop) (davis et al., 2017; härgestam et al., 2013; jacobsson, hargestam, hultin, & brulin, 2012; schmutz, hoffmann, heimberg, & manser, 2015). hence, in general, various studies from different fields suggest that effective and less effective action teams differ in both the content and structure of their team interaction. to enhance the learning opportunities in simulation environments, it is important to delineate the effective team interaction processes that are required for high team performance (goldman, 2014). however, only a small but growing number of studies have investigated in detail how teams interact in the daily context of their work and how this contributes to achieving their goals (humphrey & aime, 2015). nowadays, technical and methodological advances allow us to capture team interactions more precisely (molenaar, 2014). one of the available, but to date less frequently used, devices is the sociometric badge (kim et al., 2012): a wearable that includes several types of technology, namely bluetooth, an infrared sensor, an accelerometer and a microphone. whether used in isolation or combined with other sensors, the badges allow for fine-grained analysis of the structure of verbal team interactions. for example, when the badges are combined with sensors that capture electrodermal activity, the structure of team interactions in high-arousal moments can be compared to that in moments of low arousal.
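the three-step closed-loop communication pattern described above (call-out, check back, closing the loop) can be illustrated with a small sketch that counts complete loops in a sequence of coded utterances. this is only an illustration under stated assumptions: the class `Utterance`, the function `count_closed_loops` and the code labels `call_out`, `check_back` and `close_loop` are hypothetical and do not correspond to the coding scheme used in this study. the sketch also assumes that other utterances may occur between the three steps and that a new call-out by the leader restarts an unfinished loop.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # who is talking (hypothetical label)
    kind: str     # hypothetical code: "call_out", "check_back", "close_loop", ...


def count_closed_loops(utterances, leader):
    """Count complete call-out -> check back -> closing-the-loop sequences."""
    state = "idle"
    complete = 0
    for u in utterances:
        if u.kind == "call_out" and u.speaker == leader:
            # A call-out by the leader opens (or restarts) a loop.
            state = "called"
        elif state == "called" and u.kind == "check_back" and u.speaker != leader:
            # The receiver confirms or acknowledges the message.
            state = "checked"
        elif state == "checked" and u.kind == "close_loop" and u.speaker == leader:
            # The leader confirms once again: the loop is closed.
            complete += 1
            state = "idle"
    return complete


# Hypothetical coded fragment: one complete loop and one unfinished loop.
fragment = [
    Utterance("leader", "call_out"),
    Utterance("member_1", "check_back"),
    Utterance("leader", "close_loop"),
    Utterance("leader", "call_out"),
    Utterance("member_2", "other"),
    Utterance("leader", "call_out"),
    Utterance("member_2", "check_back"),
]
n_loops = count_closed_loops(fragment, leader="leader")
print(n_loops)
```

counting only complete sequences makes unfinished loops visible as the difference between the number of call-outs and the number of closed loops.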
2.3 moments of high arousal: definitions and effects
identifying moments of high arousal during a medical simulation session can inform us about what happens during moments when a person is not able to process the mental effort or cognitive load required or when the perceived demands of the environment exceed a person’s ability to cope with these demands (berntson & cacioppo, 2000; stemmler, 2004; boucheix, 2017; lazarus & folkman, 1984). for example, in the context of medical simulations, higher levels of arousal may ensue when time pressure forces an als team member to act quickly. it should be noted that moments of high arousal can also automatically occur due to positive events, such as positive workplace interaction or excitement (heaphy & dutton, 2008). hence, high arousal is not only attributable to distress, but also to excitement (russell, 1980). in other words, when physiological measures of arousal are used, information is obtained about the intensity of physiological arousal, but not about the psychological state or valence (e.g. distress or excitement) associated with it (e.g., akinola, 2010). however, even though no general inferences about valence can be drawn from the intensity of arousal (e.g., boucsein, 2012; larsen, diener, & lucas, 2002), a previous study on self-reported emotions during resuscitation performance has shown that negative emotions (stress or overload) are significantly higher during resuscitation, while positive emotions were highest before resuscitation, decreased during resuscitation, and increased again when the simulated patient was awake again (hunziker, laschinger, et al., 2011). therefore, in our study we interpret high levels of arousal as an indicator of feelings of distress. heart rate and skin conductance, physiological responses of the body, are examples of markers of autonomic activity of the nervous system, and concomitants of arousal (akinola, 2010; benedek & kaernbach, 2010).
particularly during social interaction, skin conductance has been found to be the most sensitive indicator of emotional responsiveness or arousal, as opposed to the other physiological markers (marci, ham, moran, & orr, 2007). this physiological measure captures the intensity of emotions during interactions with others (akinola, 2010; figner & murphy, 2011). skin conductance is defined as variations in the electrical properties of the skin in response to sweat secretion from the eccrine sweat glands, i.e., sweat glands which are present in all bodily parts, with the highest density in the palms and soles (boucsein, 2012; see also benedek & kaernbach, 2010). the popularity of skin conductance measures is due to their direct relation with a stimulus or emotional response, which enables us to capture moments of high arousal, as long as the temperature in the environment is kept constant (boucsein, 2012; lang, bradley, & cuthbert, 1998). previous studies have examined the effect of stress on als performance, but with conflicting results. hunziker, laschinger et al. (2011) and hunziker, semmer et al. (2012) found that perceived stress during early resuscitation negatively influenced als performance. in the latter study, physiological measures were also used as indicators of stress, but no association with team performance was found, possibly because the team members were engaged in physical activity, which distorted the physiological measurements. in a similar vein, the study of sandroni, fenici, et al. (2005) did not find a direct relation between physiological stress measures and individual performance during the als scenario as measured by a written multiple-choice test. other studies which examined the relation between physiological measures of stress and performance in other medical simulation settings found an inverted u-shaped association (also known as the yerkes-dodson law, cf.
cohen, 2011): positive relationships between arousal and performance have been reported, whereas extreme levels of arousal were detrimental for performance (e.g., keitel et al., 2011; wetzel et al., 2010). to understand these conflicting findings, we suggest that exploring in more depth how members interact during moments of high arousal is an interesting endeavor, as this might better explain performance than solely the level of arousal.

3 the present study

in this paper, we focus on how we can use the combination of sociometric data, physiological data, and video data to explore whether more effective teams, compared to less effective teams, alter the content and structure of their team interactions during moments of high arousal. we expect that sensor technology will have added value in addition to self-report measures to objectively identify moments of high arousal and analyze the structure of the team interactions. as these measures are relatively new to the field of social science and specifically educational science, this exploratory study sets out to uncover what the added value, pitfalls and hurdles are when using and combining different sensor technologies with more traditional measures in order to better understand team interactions. to better understand the added value of sensor technology, we have translated the aim of this study into the following research question: how do more versus less effective medical action teams differ in content and structure of team interactions during scenario-based training, and how do their interactions differ during moments of high arousal versus outside these moments of high arousal?
we will answer the research question by testing, step by step, the differences in content and structure of team interactions between 1) moments of high arousal and non-high arousal moments (without taking effectiveness into account); 2) more and less effective teams (without taking arousal into account); and 3) the combination of both: differences between high arousal versus non-high arousal moments separately for more and less effective teams.

4 methodology

4.1 participants and design

all 95 first-year master’s students (comprising 24 teams) who enrolled in the course ‘advanced life support (als)’ in the master study program ‘technical medicine’ at university of twente were invited to participate in the study. ninety-two students gave written informed consent to participate in the study. the three students who did not give consent and their team members were excluded from the study, resulting in a data set of 21 four-person teams and one three-person team. to avoid an unequal situation for this three-person team, one person from another team was added to this team during the assessment. to ensure comparability across teams, this three-person team was left out of the analysis (n = 84). on average, the students were 22.4 years old (sd = 1.1) and 44% were male. during the als course, students learned to diagnose and manage a patient with cardiac arrest and perform cardiopulmonary resuscitation (cpr) on an advanced human patient simulator in scenarios of varying complexity. a multimethod design was adopted which included four different sources of data: (1) video coding of the content of team interactions, (2) sociometric measurement to capture the structure of team interactions, (3) the empatica e4-wristband to capture skin conductance/arousal, and (4) teacher surveys to assess team effectiveness.

4.2 procedure

prior to data collection, the study was approved by the ethical committee of the university as well as by the teachers involved in the als course.
during the first introductory lecture of the course, the students were informed about the study. the data were collected during the final assessment of the course. all teams were assessed on the same day. two rooms were used for the als simulation scenarios, both had a regulated temperature with 0.1-degree celsius temperature tolerance. the temperature for both rooms was kept constant at 20.5 degrees celsius. before the start of the scenario, all four students of each team were randomly assigned to one of the four fixed team roles: team leader, responsible for task distribution, monitoring team performance, creating an overview of the situation, and patient handover at the end of the scenario; medication nurse, responsible for drug administrations and connecting devices to the patient; and two cpr administrators, responsible for chest compressions and airway management. all students had practiced each role at least one time during the course, so they knew what was expected from them at the assessment. in the als context the physical activities that are required by team members in the role of medication nurse and cpr administrator (such as intubation or chest compressions) influence their physiological arousal (i.e., producing distorted or biased physiological measurement: berntson & cacioppo, 2000; stemmler, 2004). the team leader in the als situation is less physically active, so we decided to measure physiological arousal of the team leader to identify moments of high arousal. the researchers distributed a sociometric badge to all team members and secured the empatica e4-wristband on the non-dominant wrist of the student in the role of team leader. as hunziker, laschinger, et al. (2011) described in their paper on stress and team performance during a simulated resuscitation, simulated advanced life support (als) scenarios usually follow a specific pattern. 
similar to the hunziker study, the als scenarios started with a short briefing about the patient’s history by one of the teachers (maximum 90 seconds), immediately followed by a resuscitation period during which cpr is performed. the scenarios ended with a handover of the patient to another team or specialist. however, contrary to the hunziker study, in the majority of the scenarios in our study the patient was still in a critical condition when the scenario was completed; the patient could breathe on its own, but did not have a steady pulse or sinus rhythm. given that the patient still was in a critical condition, the patient handover at the end is a crucial phase in the scenario. the duration of the scenarios was on average 22.00 minutes (sd = 4.83). previous studies have shown that team behaviors and interactions change as the scenario progresses, with a stronger focus on leadership and coordination skills at the beginning of a scenario (tschan et al., 2006, 2014), and agreement on a shared diagnosis and treatment plan, as well as accurate handover at the end of the scenario. especially in assessment settings the end is perceived as a stressful part of the scenario (sandroni et al., 2005). therefore, although we recorded the complete scenarios, for the analysis we focused on the beginning and end of the scenario, which we defined as the first 16.7% of the scenario duration (1st time window) and the last 16.7% of the scenario duration (2nd time window). the duration of these 16.7%-windows was on average 3 minutes and 40 seconds (sd = 49.6 s) and varied from 2 minutes and 24 seconds for the shortest video to 4 minutes and 52 seconds for the longest video.

4.3 instruments

4.3.1 sociometric badges

sociometric badges were used to assess the structure of team interactions (kim et al., 2012).
sociometric badges are wearables distributed by humanyze (a spin-out of the mit media lab) that measure proximity, body movement and speech features using bluetooth and infrared, an accelerometer, and microphones, respectively. data were uploaded from the badges using sociometric datalab research edition (version 3.1.3029) and subsequently exported to excel files using the ‘structured meeting’ setting, which disregards the bluetooth and infrared data in order to provide a dataset in which each team member is assumed to be in each other’s proximity during the entire session, which was the case in our setting. also, the setting ‘noisy environment’ was used, which filters out additional environment noise, such as the beeping of the heart monitor or the speech from teachers not wearing a badge. the exported microphone data show per second whether a participant was talking or silent. from these data, four metrics were calculated, the first three at the team level: 1) the proportion of the time that one or more of the team members was speaking (i.e., proportion of speaking time), 2) the proportion of the speaking time when at least two team members were speaking at the same time (i.e., proportion of overlapping speech), and 3) the distribution of how each team member contributed to the overall speaking time (i.e., conversational imbalance in speech). the conversational imbalance was calculated as the standard deviation of the amount of time that each team member spoke (regardless of whether another member was speaking at the same time), corrected for the duration of the relevant time window. the higher this standard deviation, the greater the variation in speech among members, that is, the greater the conversational imbalance. in addition, 4) the proportion of speaking time of the team leader was calculated, measured as the proportion of the time the team leader was speaking, regardless of whether another member was speaking at the same time.
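The four speech metrics described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `speech_metrics`, the input layout (one 0/1 value per second per team member) and the use of the sample standard deviation are assumptions made for illustration only.

```python
from statistics import stdev


def speech_metrics(talking, leader):
    """Compute the four speech metrics from per-second talk indicators.

    talking: dict mapping a member label to a list of 0/1 values,
             one value per second of the time window (hypothetical layout).
    leader:  label of the team leader in `talking`.
    """
    members = list(talking)
    n_seconds = len(talking[members[0]])
    if any(len(talking[m]) != n_seconds for m in members):
        raise ValueError("all members need the same number of seconds")

    # Number of members speaking in each second.
    speakers = [sum(talking[m][t] for m in members) for t in range(n_seconds)]
    speaking_seconds = sum(1 for k in speakers if k >= 1)
    overlap_seconds = sum(1 for k in speakers if k >= 2)

    # Each member's speaking time, corrected for the window duration.
    shares = [sum(talking[m]) / n_seconds for m in members]

    return {
        # 1) share of the window in which at least one member speaks
        "proportion_speaking": speaking_seconds / n_seconds,
        # 2) share of the speaking time in which two or more members speak
        "proportion_overlap": (overlap_seconds / speaking_seconds
                               if speaking_seconds else 0.0),
        # 3) conversational imbalance; sample SD is an assumption
        "imbalance": stdev(shares),
        # 4) share of the window in which the team leader speaks
        "leader_share": sum(talking[leader]) / n_seconds,
    }


# Toy ten-second window for a four-person team (invented data).
example = {
    "leader": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "nurse":  [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    "cpr_1":  [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "cpr_2":  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
metrics = speech_metrics(example, "leader")
```

In this toy window someone speaks in six of the ten seconds and two of those six seconds contain overlapping speech, so a larger `imbalance` value would indicate that speaking time is concentrated in fewer members.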
all measures were calculated using r (r core team, 2014).

4.3.2 empatica e4-wristband

the empatica e4 (hereafter referred to as e4), a relatively unobtrusive wristband, including 8 mm silver-plated electrodes, was used for the skin conductance recording. continuous measures of physiological arousal at a sample rate of 4 hz were collected using this device. the wristband was placed on the wrist of the team leader’s non-dominant hand. continuous decomposition analysis was used to get the relevant phasic electrodermal activity parameters: skin conductance responses (i.e., the number of peaks for certain periods of time which represents the fast-varying phasic component of skin conductance; the amplitude threshold for the extraction of skin conductance responses was .01 micro siemens (µs)) and amplitude of the skin conductance responses (i.e., the height of a single skin conductance response). using this analysis, the classical trough-to-peak parameters such as number of skin conductance responses and amplitude of the skin conductance responses can be obtained (benedek & kaernbach, 2010). to detect the moments of high arousal, skin conductance amplitudes for each individual team leader were computed (hamaker, 2012; wiesenfeld, whitman, & malatesta, 1984). information about peak amplitude of individual skin conductance responses can be used to detect moments of high arousal (bach, flandin, friston, & dolan, 2009). the highest skin conductance amplitude in the beginning and end of the scenarios was determined for each team leader. on the basis of the highest skin conductance amplitude a single segment of 30 seconds was selected both in the beginning and the ending of the scenario. this resulted in two 30-second segments for each team: one in the beginning (first 16.7%) and one in the end (last 16.7%) of the scenario. the 30-second segment was based on the thin-slices theory of social interaction (curhan & pentland, 2007).
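The selection of one 30-second segment per time window can be sketched as follows. This is a minimal illustration, not the authors' procedure in code: the function names and the input format (a list of peak time and amplitude pairs) are hypothetical, the 5-second lead before the peak follows the segment definition used in this study, and the sketch assumes the segment is not clipped at the window boundaries, which the text does not specify.

```python
def time_windows(duration_s, fraction=0.167):
    """Return the first and last time window of a scenario in seconds.

    fraction=0.167 mirrors the 16.7% windows used in the study.
    """
    width = duration_s * fraction
    return (0.0, width), (duration_s - width, duration_s)


def high_arousal_segment(peaks, window, lead_s=5.0, length_s=30.0):
    """Pick the 30-second segment around the largest response in a window.

    peaks:  list of (time_s, amplitude_us) skin conductance responses
            (hypothetical input format).
    window: (start_s, end_s) of the time window to search.
    Returns (segment_start_s, segment_end_s), or None if the window
    contains no response.
    """
    start, end = window
    candidates = [(t, a) for t, a in peaks if start <= t <= end]
    if not candidates:
        return None
    # The peak with the highest amplitude marks the moment of high arousal.
    peak_time, _ = max(candidates, key=lambda p: p[1])
    segment_start = peak_time - lead_s
    return (segment_start, segment_start + length_s)


# Invented responses for a 22-minute (1320 s) scenario.
first_window, second_window = time_windows(1320.0)
peaks = [(50.0, 0.8), (120.0, 2.3), (400.0, 5.0), (1200.0, 3.1)]
first_segment = high_arousal_segment(peaks, first_window)
second_segment = high_arousal_segment(peaks, second_window)
```

Note that the response at 400 s has the largest amplitude overall but lies outside both windows, so it is ignored; each window is searched separately.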
because several studies have reported an average delay of one to four seconds between a stimulus and a skin conductance response (e.g., dawson et al., 2007; weis & herbert, 2017) we used a segment starting 5 seconds before and 25 seconds after the detected highest amplitude. the highest amplitude peak (in micro siemens, µs) of each team leader in both the beginning (m = 2.34, sd = 2.67) and end (m = 3.63, sd = 3.26) was compared with the mean amplitude of the beginning (m = 1.65, sd = 2.01) and end (m = 2.76, sd = 2.65). this indicated that both in the beginning (t (15) = 3.19, p < .01) and in the end (t (15) = 4.67, p < .01) the highest amplitude peak was significantly higher than the mean amplitude in that segment. 4.3.3 video the content of the team interactions was measured via video recordings. the video cameras were ceiling-mounted, fixed cameras that minimized obtrusiveness and reactivity of the team members. the content of the team interactions was coded for the first and last 16.7% minutes of each scenario. the codebook was specifically designed to capture behaviors that occur frequently in action teams and is rooted in earlier theorizing on teams and action teams (lei et al., 2016; stachowski et al., 2009; zijlstra et al., 2012). additionally, on basis of the theory described in the theoretical framework, two items were added in order to code closed-loop-communication, more specifically: (1) check-back (by a team member), and (2) closing the loop (by the team leader); see table 1 for an overview of definitions and examples of the coded behaviors. a distinction was made between behaviors of the team leader, and of the other team members (followers). exhaustive coding was applied, meaning that all behaviors were coded and a non-observable category was used for behaviors that were not understandable or not relevant. the “observer xt” software program (noldus, trienes, hendriksen, jansen, & jansen, 2000; spiers, 2004) was used to code the videos. 
one of the coders parsed all sessions, that is: segmented the videos in speaker utterances (klonek, burba, kauffeld, & quera, 2016), using as a unit of analysis “a sentence or part of a compound sentence that can be regarded as meaningful in itself, regardless of the meaning of the coding categories” (strijbos, martens, prins, & jochems, 2006, p. 37). subsequently, all segmented scenarios were systematically coded by two independent, trained coders, who were not informed about the moments of high arousal. overall, an inter-rater agreement of 83.1% (cohen’s kappa = .80; cohen, 1960) was established. after coding the videos, the behavioral codes “not observable”, “external communication”, and the infrequent behaviors “laugh” and “apologies”, were merged into one category “external / other behavior”. the frequencies of the behaviors of a team are highly influenced by the total duration of that video. therefore, all coded behaviors were standardized according to the shortest video using the following formula: standardized frequency of a certain behavior of team x = coded frequency of the behavior of team x * (duration of the shortest video / duration of video team x). this resulted in a time-standardized behavior which enabled direct comparisons of the frequencies of the team members’ behaviors across the different teams. in addition, to enable comparison of the frequencies between the 30-second segment and the rest of the time window, the percentages of every leader and follower behavior were calculated relatively to all of the leader or follower behavior in the relevant time interval. table 1 examples of coded video behaviors. note. tl = team leader; f = follower. * after coding, these behaviors were merged into the category external / other behavior. 
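The time standardization and the within-interval percentages described in section 4.3.3 amount to two small calculations, sketched here in Python. The function names and the example counts are hypothetical; only the formula itself comes from the text.

```python
def standardize_frequency(coded_frequency, video_duration_s, shortest_duration_s):
    """Rescale a coded frequency to the duration of the shortest video.

    Implements: coded frequency * (shortest duration / duration of this video).
    """
    return coded_frequency * (shortest_duration_s / video_duration_s)


def behavior_percentages(counts):
    """Express each behavior as a percentage of all coded behaviors.

    counts: dict mapping a behavior label to its frequency within one
            interval (e.g., all team leader behaviors in a 30-second segment).
    """
    total = sum(counts.values())
    if total == 0:
        return {behavior: 0.0 for behavior in counts}
    return {behavior: 100 * n / total for behavior, n in counts.items()}


# A behavior coded 10 times in a 288-second window, with a shortest
# window of 144 seconds (invented numbers).
standardized = standardize_frequency(10, 288, 144)

# Invented team leader counts within one interval.
percentages = behavior_percentages({"command": 6, "questioning": 1, "opinion": 1})
```

Because the percentages are relative to the total number of leader (or follower) behaviors in the same interval, a 30-second segment and the much longer remainder of the time window can be compared directly.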
4.3.4 teacher ratings of performance to assess team effectiveness, four team effectiveness items by gibson, cooper, and conger (2009) were used: “this team is a consistently well performing team”, “this team is effective”, “this team makes few mistakes”, and “this team delivers high quality work”. the items were directly scored after the scenario on a likert scale from 1 (strongly disagree) to 7 (strongly agree) by two teachers. each teacher scored twelve teams. the internal consistency of the scale was high: cronbach’s alpha was .97. the teams were categorized as ‘more’ or ‘less’ effective on the basis of a median split, which was 5.75. the 10 less effective teams had a mean score of 4.28 (sd= .98), as rated by the teachers (on a scale of 1 to 7); the 12 more effective teams, including those with the median score, scored on average 6.23 (sd = .52) on team effectiveness. 4.4 analysis 4.4.1 selection of the data and dealing with missing data all recorded data were checked for missing values. video recordings and skin conductance data of all 21 teams were successfully obtained. mechanical issues with five sociometric badges resulted in the loss of the data of 20 participants among 13 different teams. this resulted in a total of 18 teams from which sociometric data of the team leader was intact, and nine teams from which complete data of the sociometric badges was available of all team members. the demographic data from these subsamples was comparable with the reported data from the 21 teams; no significant differences were found. 4.4.2 synchronization of the data to combine the data from the e4, sociometric badges and video observations, the data had to be synchronized. coded video behaviors, sociometric data and skin conductance amplitude data were synchronized on the basis of a mutual timeline. 
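One way to place several recordings on a mutual timeline is to express every device clock as unix time and convert between offsets and sample indices. The sketch below is an assumption-laden illustration and not the custom code used in this study: all function names, the timestamp format and the example start times are hypothetical; only the 4 hz sample rate of the e4 is taken from the text.

```python
from datetime import datetime


def to_unix(iso_timestamp):
    """Convert an ISO 8601 timestamp with a UTC offset to unix seconds."""
    return datetime.fromisoformat(iso_timestamp).timestamp()


def video_offset_to_unix(video_start_unix, offset_s):
    """Map an offset within a video recording to unix time."""
    return video_start_unix + offset_s


def unix_to_sample_index(unix_time, stream_start_unix, sample_rate_hz):
    """Map a unix time to the index of the sample recorded at that moment."""
    offset_s = unix_time - stream_start_unix
    if offset_s < 0:
        raise ValueError("time lies before the start of the stream")
    return int(offset_s * sample_rate_hz)


def streams_overlap(start_a, end_a, start_b, end_b):
    """Check whether two recordings share any stretch of time.

    If they do not, at least one device clock was probably not set
    correctly and the streams cannot be matched.
    """
    return max(start_a, start_b) < min(end_a, end_b)


# Invented start times: the wristband starts 10 s before the video.
video_start = 1000.0
e4_start = 990.0

# A behavior coded 12.5 s into the video, located in the 4 Hz signal.
event_unix = video_offset_to_unix(video_start, 12.5)
e4_index = unix_to_sample_index(event_unix, e4_start, sample_rate_hz=4)
```

A check such as `streams_overlap` makes mismatched clocks visible early: if the recorded intervals of two devices do not overlap at all, their data cannot be aligned without additional information.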
the internal clocks in the e4, sociometric badges, and video recording devices were used for synchronization purposes employing custom python code (represented by unix time: number of seconds from 1-1-1970 in coordinated universal time: utc). the data sources could only be synchronized if the clock time of the e4 biosensor and the timestamp of the video recording device were exactly aligned. on the basis of the python code, the clock times of the e4, video, and sociometric badges could not be matched for five teams. possibly, the clock time of the e4 was not synchronized (e.g., with a computer or laptop that had an accurate clock time); therefore, differences in the clock times of the video and sociometric badge on the one hand and the e4 on the other hand might be present. however, it should be noted that it is difficult to pinpoint the exact cause of the synchronization issue. due to these synchronization issues and the earlier mentioned malfunctioning of five sociometric badges, a total of 16 teams were available for further analysis of the video data combined with the skin conductance amplitude data; 13 teams from which the sociometric data of the team leader could be matched to the skin conductance measures and video data, and seven teams from which the sociometric data was available of all team members and could be linked to the skin conductance measures and video data. 4.4.3 analysis of relations between variables to answer the first research question of whether content and structure of team interactions were different for moments of high arousal versus non-high arousal, a series of repeated measures manovas were conducted. for each time window (beginning and end), a separate repeated measures manova was conducted for the dependent variables describing the content of team interaction (team leader and follower behavior) and the variables describing the structure (proportion speaking time, proportion overlapping speech, conversational imbalance). 
in addition, a dependent sample t-test was conducted to test for differences in proportion speaking time of the team leader. this last variable could not be grouped into one of the manovas due to a different sample size. in all of these analyses the level of arousal (high versus non-high) was used as the within-subject variable. the second research question about the differences between more and less effective teams was also answered with a series of manovas and independent sample t-tests with the same dependent variables, but this time the level of effectiveness (more versus less) was used as the between-subject variable. to answer the third research question, the data were split into two groups (more versus less effective teams) to explore differences between those two groups in team interaction during moments of high arousal and non-high arousal. due to the low sample size, we had to refrain from conducting manovas as this resulted in not enough residual degrees of freedom. therefore, separate independent t-tests were conducted to test for differences in structure and content of team interaction.

5. results

5.1 comparison of structure and content of team interactions between high arousal moments and rest of the time window

the results of the comparison of structure and content of team interactions between high arousal moments and rest of the time window are displayed in table 2. as can be seen in table 2, team members were speaking for approximately half of the time, and during the end of the scenario almost two thirds of the time. the team leader was speaking for roughly a quarter of the time. the repeated measures manovas and dependent t-test showed no differences for the structure of team interactions on team level, both for the first (f(12, 1) = .54, p = .667) and second time window (f(12, 1) = .22, p = .878).
also, no differences were found in proportion of speaking time of the leader (t (12) = .41, p = .686 first time window, t (12) = 1.08, p = .303 second time window). regarding the content of the team interaction, in the beginning of the scenario, team leader communication is characterized by many commands, while in the end of the scenario, when teams have to reach a diagnosis and have to handover the patient, there is more external communication. the repeated measures manova showed a significant effect of arousal on the content of team interaction, f(10, 6) = 17.72, p = .001 (first time window) and f(10, 6) = 4.30, p = .044 (second time window). further inspection of the univariate anovas showed significant differences between high arousal moments and the rest of the time window in the first time window for the team leader behaviors questioning (f(1, 15) = 45.52, p < .001) and opinion (f(1, 15) = 8.32, p = .011). this demonstrated that, on average, team leaders showed relatively less questioning (m = 0.0%) and opinion (m = 0.0%) behavior during moments of high arousal than during the rest of the first time window (m = 3.3% and m = 0.8%). for the second time window, the univariate anovas showed significant differences for suggestion and opinion, with f(1, 15) = 10.12, p = .006 and f(1, 15) = 6.96, p = .019 respectively. this indicated that, on average, team leaders showed relatively less suggesting (m = 2.8%) and opinion (m = 0.0%) behavior during moments of high arousal compared to the rest of the second time window (m = 8.1% and m = 1.9%). table 2 comparison in structure and content of team interactions between moments of high arousal and outside these moments. a standardized frequencies (see method section). * p < .05. **p < .01. 
5.2 comparison in structure and content of team interactions between more and less effective teams

to determine whether the structure and content of interactions differed between more and less effective teams, manovas (see table 3) were conducted to compare differences for the beginning (first time window) and the end (second time window). for both time windows, the manovas did not show significant effects of team effectiveness on the structure of team interactions (f(5, 1) = 2.67, p = .220 and f(5, 1) = 8.94, p = .052 respectively). however, as the analysis of the second time window was approaching significance, we further inspected the outcomes of the univariate anovas, which showed significant differences in the conversational imbalance between the more and less effective teams. both in the beginning (f(1, 5) = 11.94, p = .018) and in the end (f(1, 5) = 11.17, p = .020), the more effective teams showed greater imbalance (respectively m = 0.12 and m = 0.13) than the less effective teams (m = 0.06 for both time windows). in other words, in more effective teams one person was more dominant in terms of speaking time, while in less effective teams the team members contributed more equally; see table 3. for content of team interactions two manovas were conducted that showed no significant effect of team effectiveness on content of team interactions for both time windows (f(10, 5) = .58, p = .785 and f(10, 5) = .52, p = .822 for the first and second time window respectively). separate univariate anovas confirmed no significant effects.

table 3. comparison in structure and content of team interactions between more and less effective teams. a standardized frequencies (see method section). b n = 10 for more effective teams and n = 6 for less effective teams. c n = 3 for more effective teams and n = 4 for less effective teams. d n = 9 for more effective teams and n = 4 for less effective teams. * p < .05. **p < .01.
5.3 comparison between moments of high arousal and rest of the time window separately for more and less effective teams in addition to the overall differences between more and less effective teams, we were also interested to see whether more and less effective teams were different in how they changed their content and structure of the team interaction between moments of high arousal and outside these moments. therefore, separately for the less and more effective teams, we tested what the differences were between the 30-second segment and the rest of the corresponding time window using dependent t-tests. on average, more effective teams (see table 4a) were less imbalanced during moments of high arousal (m = .09) than during the rest of the second time window (m = .14) (t (2) = 5.6, p = .030). regarding the content of team interaction, in the more effective teams, team leader behavior was characterized by more commands, both during moments of high arousal (54%), and in the rest of the time window (m = 30.7%). during the moment of high arousal, team leaders showed relatively less confirmation (m = 1.7%), less questioning (m = 0.0%), less opinion (m = 0.0%) and less closing the loop (m = 3.1%) than during the rest of the first time window (respectively m = 9.5%, m = 3.0% and m = 0.8% and m = 7.2%; t (9) = -4.77, p = .001, t (9) = -5.17, p = .001 and t (9) = -2.33, p = .045 and t(9) = -4.08, p = .020). in the end of the scenario, the moment of high arousal was characterized by relatively less summary behavior (m = 1.4%) when compared to the rest of the second time window (m = 5.3%), t (9) = -3.36, p = .008. also, for less effective teams (see table 4b) some differences were found regarding the structure and content of team interaction during the high arousal moments, compared to rest of the corresponding time window. 
contrary to the more effective teams, we found that less effective teams, on average, were more imbalanced during moments of high arousal (m = .11) than during the rest of the window (m = .06), t (3) = 4.1, p = .026; yet, this difference was only significant for the first time window. moreover, in the less effective teams, team leader behavior was characterized by a high percentage of commands (m = 32.0% and m = 30.2%). in addition, in the beginning of the scenario, team leaders of less effective teams showed relatively less questioning (m = 0.0%) and inquiry (m = 0.0%) during the moments of high arousal compared to the rest of the first time window (respectively m = 3.7% and m = 3.3%; t (5) = -4.19, p = .009 and t (5) = -2.92, p = .033). in the end of the scenario, during the moment of high arousal also relatively less inquiry behavior (m = 0.0%) and less suggesting behavior (m = 3.3%) was exhibited compared to the rest of the second time window (respectively m = 5.9% and m = 10.4%; t (5) = -3.40, p = .019, and t (5) = -3.21, p = .024). in addition, a difference in follower behavior was found: in less effective teams, the followers showed relatively less check back behavior during moments of high arousal than during the rest of the time window (m = 15.6% vs. m = 25.4%, t (5) = 2.59, p = .049).

table 4a. comparison in structure and content of team interactions between moments of high arousal and outside these moments within more effective teams. a standardised frequencies (see method section). *p < .05. **p < .01.

table 4b. comparison in structure and content of team interactions between moments of high arousal and outside these moments within less effective teams. a standardised frequencies (see method section). *p < .05. **p < .01

6.
discussion in our study, we combined video data and ratings of team effectiveness with skin conductance measures (empatica e4-wristband), and measures of speech features (sociometric badges) to analyze the structure and content of team interactions in medical action teams. the context of our study, a medical simulation room, was highly relevant for the aim of our study as this enabled us to closely observe medical action teams during a simulated crisis situation (als for a patient with cardiac arrest). by studying the differences between more and less effective teams in the content and structure of their interactions, both during moments of high arousal and outside these moments, we were not only able to add to the understanding of effective team interactions during crisis situations, but also of the added value of combining these various sensors with traditional data. in the following paragraphs, we discuss (1) the insights from the study on structure and content of team interactions of action teams during stressful moments, (2) the added value and challenges of combining information from various sensors to contribute to a more fine-grained understanding of complex team interactions, (3) the limitations of our study and (4) future directions for research using sensor technology to study team interactions. 6.1 discussion of findings understanding team interactions in als teams is crucial because als teams “need to be organized in such a way that the individual skills of the team members can be used efficiently and effectively” (cooper & wakelam, 1999; p. 27). each team member needs a clear understanding of “how decisions are made within the group; what resources are needed and how they are to be utilized; how leadership is exercised; and how staff new to the situation are integrated into the group.” (cooper & wakelam, 1999; p. 27). 
we analyzed the structure and content of team interactions of medical action teams during a simulated crisis situation and compared team interaction at moments of high arousal with team interaction outside these high arousal moments, as well as differences in team interaction between more and less effective teams. first, when we compared team interaction during moments of high arousal with team interaction outside these moments without taking effectiveness of teams into account, we found differences in the content of the team interaction only. in this specific context, it turned out that the team leader gave no opinions during moments of high arousal, both at the beginning and end of the scenario. moreover, the team leader asked no questions (in the beginning of the scenario) and made fewer suggestions (in the end). this is in line with previous research stating that during moments of crisis fast coordination and clear decision making are needed (tschan et al., 2006). in other words, there is no room for behavior that needs further interpretation or clarification from team members. however, no significant differences were found that indicated which behavior was more frequently present during the moments of high arousal. second, we inspected the differences between more and less effective teams without distinguishing between the moments of high arousal and the rest of the corresponding time windows. overall, no main effects of effectiveness were found on the content and structure of team interactions. of course, we have to consider that we had a very small sample size and thus too low power to detect small differences. the only difference that we found was in the conversational imbalance; more effective teams showed greater imbalance than the less effective teams, both at the beginning and the end of the scenario.
in other words, in more effective teams, one person was more dominant in terms of speaking time, while in less effective teams the speaking time was more equally distributed. although equal member contribution is generally seen as positive for high team effectiveness as all opinions can be taken into account (dimicco et al., 2007), this is different for medical action teams during cardiac arrest, where swift decisions and clear and effective coordination of the team leader are necessary (andersen et al., 2010; marsch et al., 2004). our study points in the same direction, namely that a greater contribution of one person (greater imbalance) contributes to team performance. however, in extreme contexts effective leaders are receptive to the input of team members (hannah, uhl-bien, avolio, & cavarretta, 2009) and team members of effective medical teams more frequently speak up and provide help (kolbe et al., 2014). therefore, dominance of the team leader should not be interpreted as leaving no room for other team members to contribute. finally, we analyzed separately for more and less effective teams if there were differences in structure and content of team interaction when comparing moments of high arousal with the remaining time at the beginning and end of the scenario. this revealed an interesting pattern regarding the structure of the team interaction: when we did not split up between moments of high arousal and the rest of the corresponding time window, we saw more conversational imbalance for more effective teams. however, after splitting up between moments of high arousal and the remaining time of the window, it became clear that both more and less effective teams show differences in conversational imbalance during moments of high arousal, but in opposite directions. the four less effective teams showed more imbalance, i.e. greater dominance of one person during moments of high arousal, and were more balanced outside these moments (in the beginning of the scenario).
in contrast, in the three more effective teams, team members contributed more equally during the high arousal moment (at the end of the scenario), and they were more imbalanced in their contributions during the rest of the time window. an interesting hypothesis to test in future research is whether a certain degree of conversational imbalance indeed contributes to higher team performance in action teams, while more equal contributions are needed when the team leader gets into a state of high arousal. to better understand these outcomes, it is worth looking at the differences in the content of interaction. first, table 4a showed that during moments of high arousal more than half of the team leader communication of the more effective teams consisted of commands. in addition, significant differences were found in other behaviors that occurred less frequently during the moments of high arousal: confirmation, questioning, opinion, and closing the loop. for less effective teams (table 4b), it seems that team leader behavior during moments of high arousal was more similar to their behavior outside these moments. differences in team leader behavior in less effective teams were only found in the behaviors inquiry and questioning. in addition, the less effective teams showed differences in follower behavior, namely less check back behavior, which is a crucial step in closed-loop communication to confirm an initial message from the team leader and to avoid mistakes and misunderstandings (davis et al., 2017; härgestam et al., 2013; jacobsson et al., 2012; schmutz et al., 2015). it could thus be the case that the greater dominance of the team leader during stressful moments in less effective teams is related to less check back behavior from their followers: the team leader often has to repeat or rephrase commands because the followers do not confirm (check back) them, which leads to a relatively greater contribution of the team leader.
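the notion of conversational imbalance discussed above can be made concrete with a small sketch. the index below is an assumption chosen purely for illustration (a gini coefficient over each member's speaking time); it is not the measure computed by the sociometric badges, and the function names and example values are hypothetical.

```python
from typing import Sequence


def speaking_shares(seconds: Sequence[float]) -> list[float]:
    """Convert per-member speaking time (in seconds) into shares that sum to 1."""
    total = sum(seconds)
    if total <= 0:
        raise ValueError("total speaking time must be positive")
    return [s / total for s in seconds]


def imbalance_index(seconds: Sequence[float]) -> float:
    """Illustrative imbalance index (Gini coefficient of speaking time).

    0 means all members speak equally long; the maximum, (n - 1) / n,
    is reached when a single member does all the talking.
    NOTE: this is an assumed stand-in, not the badge manufacturer's measure.
    """
    shares = sorted(speaking_shares(seconds))
    n = len(shares)
    # Gini for sorted shares x_1 <= ... <= x_n that sum to 1:
    # G = 2 * sum(i * x_i) / n - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(shares))
    return 2 * weighted / n - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical speaking times (seconds) for a four-person team.
    print(imbalance_index([10, 10, 10, 10]))  # balanced team
    print(imbalance_index([5, 5, 5, 45]))     # one dominant speaker
```

computed separately for a high arousal segment and for the remaining time of a window, such an index would allow the kind of within-team comparison described above, provided that speaking time per member is available for both periods.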
again, we have to take the small sample size into account when interpreting the findings, but what our results show in general is that a more fine-grained analysis of the high arousal moments of a scenario enhances our understanding of what constitutes effective team interaction during stressful moments. in addition, simultaneously exploring the structure and the content of the interactions can provide a more in-depth understanding of what constitutes effective team behavior.

6.2 added value and challenges of using sensor technology to gain insights in team interactions

6.2.1 added value and challenges of using the empatica e4-wristband

in our study, we used the e4 to detect moments of high physiological arousal that would not be observable with other methods or measures (boucheix, 2017). identifying the moments of highest physiological activation led to additional insights, as the structure and content of team interaction displayed during these moments differed somewhat from what teams displayed outside these high arousal segments. verbal reports or perceptual recall from participants about when they experienced higher stress or arousal are often distorted and do not accurately reflect the actual stressful moment (cf. hunziker, laschinger, et al., 2011). it should be noted that our results have to be interpreted with caution. first, when collecting data in the field (and not in a controlled laboratory environment), increases or fluctuations in skin conductance might be caused not only by higher mental effort, but also by general arousal or body movements (berntson & cacioppo, 2000; cacioppo & tassinary, 1990; stemmler, 2004). this means that the assumed link between physiological arousal and psychological, behavioral or interaction processes requires careful interpretation (akinola, 2010). as mentioned in the theoretical framework, the e4 captures the level of physiological arousal, but does not distinguish between excitement and distress.
although other studies using self-reported measures of stress (hunziker, laschinger, et al., 2011) indicate that during als negative emotions in particular are high and positive emotions are low (which is why we interpreted high arousal as distress in our study), additional validation measures would be preferable. also, in an als context only the team leader’s skin conductance can be validly measured, as the measurements of other members would be distorted by their physical activity during compressions and the administration of drugs. in order to understand how arousal affects the team as a whole, more research is needed on physiological concordance: to what extent are the physiological processes of team members aligned, and how does that influence the dynamics within the team (marci et al., 2007)? finally, even when we interpret arousal as stress, the trigger for higher levels of arousal is still unknown. at the beginning of the scenarios high arousal might be caused by the need to respond quickly to an overload of information, the diagnostic uncertainty and rapidly evolving situations (doumouras et al., 2012; hunziker, johansson, et al., 2011), while at the end of the scenario the team leaders might experience higher arousal because they have to hand over the patient and decide on a final diagnosis. both triggers could give rise to the same amount of arousal, but might result in different kinds of team leader behavior.

6.2.2 added value and challenges of using sociometric badges

the sociometric badges provided further insight into team interaction processes during these high-arousal moments, adding to the behavioral aspects shown by the video coding. our results showcase how the differences found between the behavior of the more and less effective teams are enriched by the information from the sociometric badges: a difference in their conversational balance became visible.
this highlights the importance of exploring the structure of the verbal interactions in addition to their content. the sociometric badges have been shown to be reliable and accurate enough to study high-level team interactions, such as participation in conversations and total speaking time (chen & miller, 2017). however, in our study, we experienced that the hardware in the sociometric badge or e4 might not always function well, resulting in missing data. using equipment from a specific manufacturer also means that you are dependent on a commercial party whenever the equipment is malfunctioning. despite extensive testing, it appeared that the hardware was not functioning properly, and technical support was lacking because the researcher support platform for the sociometric badges had been discontinued. especially when collecting data from teams with the sociometric badges, the malfunctioning of one badge obstructs the computation of all team interaction dynamics (e.g., one participant could have a great influence on conversational balance). in this study, the failure of two badges resulted in the loss of data for almost half of the teams: all badges were used continuously during the day, meaning that each badge was used approximately five times that day. as downloading the data takes a substantial amount of time, it was not possible to do this during the experiment. as a consequence, it was discovered only afterwards that two badges had malfunctioned, resulting in the loss of data for ten participants. in addition, in this specific study the interpretation of the sociometric measures is still on a superficial level: we do have information about the proportion of overlapping speaking time, but we do not know the specifics.
for example, we might know that participants spoke more at the same time (overlap), but not whether this consisted of a lot of brief interruptions (e.g., a confirmatory 'yes') or fewer long interruptions (which might be experienced as more disrupting than brief confirmations). more advanced algorithms could result in additional measures that provide more insight into the effects we found, for example, in relation to the length and number of interruptions.

6.2.3 added value and challenges of combining sensor technology with traditional measures

although this triangulation of data sources to capture team dynamics in simulation environments results in rich data, there are several challenges that should be noted when adopting such a research design. first, a great benefit of combining video, sociometric and physiological data is that it offers continuous measures of behavior, interaction and physiological intensity on a temporal scale. using this sensory triangulation enables the use of physiological arousal as a process-tracing method (figner & murphy, 2011). this means that it can provide information about behavioral processes, such as decision making, because such physiological data can be measured and collected continuously. at the same time, we experienced the difficulty of synchronizing the data. when multiple sensors are used in combination with traditional measures, the risk that one device malfunctions or does not align with one of the other devices is substantial, and the resulting missing data can leave a much smaller number of teams for analysis. for future research, we recommend always having a back-up procedure for this. for both the e4 and the sociometric badges it is possible to use behavioral markers: you push a button on the device and a time-stamp is saved to the data. when you do this in front of the camera you can always synchronize the video with the sensor device.
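the marker-based back-up procedure described above can be sketched as follows. this is a minimal illustration under stated assumptions: the button press is assumed to be visible in the video at a known video time, both clocks are assumed to run at the same rate (no drift), and all names and values are hypothetical rather than taken from the e4 or badge software.

```python
from typing import Iterable


def clock_offset(marker_sensor_s: float, marker_video_s: float) -> float:
    """Offset (seconds) to add to a sensor timestamp to obtain video time.

    marker_sensor_s: time-stamp stored by the device when the button was pushed.
    marker_video_s:  moment at which the same button push is seen in the video.
    """
    return marker_video_s - marker_sensor_s


def to_video_time(sensor_times_s: Iterable[float], offset_s: float) -> list[float]:
    """Map sensor timestamps onto the video timeline (assumes no clock drift)."""
    return [t + offset_s for t in sensor_times_s]


if __name__ == "__main__":
    # Hypothetical example: the button push is stored at 1000.0 s on the device
    # clock and is visible 12.5 s into the video recording.
    offset = clock_offset(marker_sensor_s=1000.0, marker_video_s=12.5)
    print(to_video_time([1000.0, 1030.0], offset))
```

a second marker at the end of the recording would additionally make it possible to check whether the two clocks drifted apart during the session.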
for our study, it was not possible to include this additional step in the procedure, as we were collecting data in an assessment situation in which we already asked a lot from the participants. although our study shows the potential of combining sensor technologies and the added value for team learning research, further research is necessary to validate and ground these methods. each of the team interaction measures that are used in this study, whether observational, self-reported or technology-based, has its limitations. conflicting information about team interaction from these measures needs to be explained and sources of discrepancies need to be understood. this would require not only validation studies, but also transfer studies from the simulation environment to the field.

6.3 limitations and future research

in addition to the problems and limitations related to the technology that was adopted, our study design had some limitations that we have to acknowledge. to begin with, the small sample size (n = 22 teams) and the even smaller sample size for the sociometric data limited the statistical power of the analyses and the generalizability of our findings. moreover, due to insufficient residual degrees of freedom, we were not able to perform manovas for the third research question, and instead ran a series of independent t-tests, which increases the risk of type i errors. consequently, this research has a more exploratory character, which is why we interpreted the results with caution. for the future, we recommend conducting similar studies with a larger sample. in addition to the small sample size, the observed frequency of the video coded behaviors was sometimes very low. this was due to the fact that we decided to zoom in on the beginning and end of the scenario instead of engaging in the time-consuming process of coding the whole scenario.
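the increased risk of type i errors that comes with a series of t-tests, mentioned above, can be illustrated with a short worked equation. the number of tests $m$ and the significance level $\alpha$ used in the example are assumptions for illustration only, and the formula assumes that the tests are independent:

```latex
\[
  \text{FWER} = 1 - (1 - \alpha)^{m},
  \qquad \text{e.g. } \alpha = .05,\ m = 10:\quad
  1 - 0.95^{10} \approx .40 .
\]
% A Bonferroni correction tests each comparison at \alpha / m instead
% (here .05 / 10 = .005), which keeps the familywise error rate at or below \alpha.
```

with ten such tests, the chance of at least one false positive would thus be roughly 40% rather than 5%, which is one reason to treat the reported differences as exploratory.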
within these time windows of three to four minutes, some of the behaviors were hardly present, and when we further zoomed in on the 30-second segments of high arousal, behaviors became even more infrequent. one could also question how many different behaviors one can display in only a 30-second segment. we therefore recommend that future studies examine multiple 30-second segments and longer time windows to allow for fairer comparisons. furthermore, it is known that stress is a complex phenomenon which is difficult to measure (boucsein, 2012). in the present study we chose to include a physiological measure. as described above, these outcomes should be carefully interpreted, as eustress and distress produce similar physiological results. another limitation of our study is that, in the context in which students were assessed, it was not possible to obtain a baseline measure. therefore, we could not compare teams on the team leader’s level of arousal; only within-person comparisons could be made, in which peaks in individual skin conductance data were identified in order to pinpoint moments of relatively high arousal of a team leader. future research is therefore advised to measure skin conductance for longer periods and to obtain a baseline measurement to improve the quality of results. in addition, depending on the context of the study, it might be worthwhile for future research to explore the option of measuring the physiological data on the glabrous palmar or plantar surfaces, as this is more reliable and valid (but also more invasive) (boucsein, 2012). studies are available that provide insight into the differences between wristband and palmar measures of skin conductance (e.g., van lier et al., 2017). despite all the hurdles and limitations we experienced, our study strengthened our idea that wearable sensor technology has the potential to advance insights in team research.
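the within-person selection of a 30-second high arousal segment mentioned above can be sketched as follows. this is an assumed, simplified procedure (the 30-second window with the highest mean skin conductance within one person's recording), not a description of the peak identification actually used in the study; the function name, sampling rate and signal values are hypothetical.

```python
from typing import Sequence, Tuple


def highest_arousal_window(
    eda: Sequence[float], sample_rate_hz: float, window_s: float = 30.0
) -> Tuple[float, float]:
    """Return (start time in seconds, mean level) of the window with the highest mean.

    The comparison is purely within-person: it only ranks windows within one
    recording and says nothing about differences between persons or teams.
    """
    width = int(window_s * sample_rate_hz)
    if width <= 0 or len(eda) < width:
        raise ValueError("recording is shorter than one window")
    # Sliding-window sum: add the sample entering the window, drop the one leaving.
    current = sum(eda[:width])
    best_sum, best_start = current, 0
    for start in range(1, len(eda) - width + 1):
        current += eda[start + width - 1] - eda[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start / sample_rate_hz, best_sum / width


if __name__ == "__main__":
    # Hypothetical signal sampled at 1 Hz: low level, a 30 s elevation, low level.
    signal = [1] * 40 + [5] * 30 + [1] * 40
    print(highest_arousal_window(signal, sample_rate_hz=1.0))
```

with a baseline measurement, as recommended above, the same signal could first be expressed relative to baseline, which would be a precondition for any comparison between team leaders.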
sensor technology has the potential to provide objective and unobtrusive measures of complex behavioral and physiological processes, as teams do not have to be interrupted while performing their task, which would disrupt their processes. in our study, the participants indicated that wearing the sociometric badges and e4 did not distract them from performing their tasks. wearable sensor measures of interactions are especially promising when linked to psychological or team level constructs such as leadership emergence (chaffin et al., 2017) and in studying continuous streams of longitudinal data (mathieu, hollenbeck, van knippenberg, & ilgen, 2017). physiological measures of arousal can help to select stressful moments more objectively, which is highly relevant when studying action teams during crisis situations. however, applying these new methods is less straightforward and brings more challenges than is often suggested by manufacturers. first, in order to use these methods optimally, it is important to familiarize oneself with this type of big data, which also adds computational complexity that requires specific expertise in data cleaning and dealing with noise (van keulen, kaminski, matheja, & katoen, 2018). second, in order to apply sensors in a specific situation, many pre-studies are needed to test the reliability of the measures (cf. chaffin et al., 2017; de laat, endedijk, ufkes, van keulen, & de vries, 2017). ultimately, if one manages to extract meaningful data from the sensors, a final question is how to integrate these data with more traditional measures. as fielding (2012) concludes in his analysis of how methods, including technological data, can be mixed, integrating data sources is an innovation in itself and should be treated as such.
in other words, although time saving is often advocated as one of the benefits of using sensor technology, we still have a long way to go before this becomes reality.

7 conclusion

effective team interaction is vital in medical situations and in medical learning and education. a combination of video-observational, sociometric and physiological data can enhance our understanding of the complex behavioral and interaction processes underlying effective team performance and provides alternative learning methods that can be used in the design of training and education of medical professionals. technological advances together with the availability of more knowledge about the simultaneous application of such methods are needed to use the full potential of wearable sensor technology in team research and overcome current teething troubles. outcomes of this and future studies might enable future medical professionals to better understand what is required at stressful moments. using these results in training and during debriefing sessions can potentially optimize team interactions of future medical professionals and enhance the quality of medical care.

keypoints

more effective teams show greater conversational imbalance than less effective teams, but not during moments of high arousal.
both team leaders and followers show some changes in the content of the team interaction during moments of high arousal.
more knowledge about the simultaneous application of wearable sensor technologies is needed to use the full potential of these methods in team research.
information from sensor technology can in the future be used during debriefing sessions to improve medical training and simulations.

references

akinola, m. (2010). measuring the pulse of an organization: integrating physiological measures into the organizational scholar's toolbox. research in organizational behavior, 30, 203-223. doi:10.1016/j.riob.2010.09.003 andersen, p. o., jensen, m.
k., lippert, a., & østergaard, d. (2010). identifying non-technical skills and barriers for improvement of teamwork in cardiac arrest teams. resuscitation, 81(6), 695-702. doi:10.1016/j.resuscitation.2010.01.024 arnstein, f. (1997). catalogue of human error. british journal of anaesthesia, 79(5), 645-656. doi:10.1093/bja/79.5.645 atwal, a., & caldwell, k. (2005). do all health and social care professionals interact equally: a study of interactions in multidisciplinary teams in the united kingdom. scandinavian journal of caring sciences, 19(3), 268-273. doi:10.1111/j.1471-6712.2005.00338.x bach, d. r., flandin, g., friston, k. j., & dolan, r. j. (2009). time-series analysis for rapid event-related skin conductance responses. journal of neuroscience methods, 184(2), 224-234. doi:10.1016/j.jneumeth.2009.08.005 benedek, m., & kaernbach, c. (2010). a continuous measure of phasic electrodermal activity. journal of neuroscience methods, 190(1), 80-91. doi:10.1016/j.jneumeth.2010.04.028 berntson, g. g., & cacioppo, j. t. (2000). from homeostasis to allodynamic regulation. in j. t. cacioppo, l. g. tassinary, & g. berntson (eds.), handbook of psychophysiology (2nd ed., pp. 459–481). new york: cambridge university press. boucheix, j.-m. (2017). the interplay between methodologies, tasks and visualisation formats in the study of visual expertise. frontline learning research, 5(3), 155-166. doi:10.14786/flr.v5i3.311 boucsein, w. (2012). electrodermal activity (2nd ed.). new york, ny: springer science & business media. brindley, p. g., & reynolds, s. f. (2011). improving verbal communication in critical care medicine. journal of critical care, 26, 155-159. doi:10.1016/j.jcrc.2011.03.004 cacioppo, j. t., & tassinary, l. g. (1990).
inferring psychological significance from physiological signals. american psychologist, 45 (1), 16-28. doi:10.1037/0003-066x.45.1.16 chaffin, d., heidl, r., hollenbeck, j. r., howe, m., yu, a., voorhees, c., & calantone, r. (2017). the promise and perils of wearable sensors in organizational research. organizational research methods, 20(1), 3-31. chen, h. e., & miller, s. r. (2017). can wearable sensors be used to capture engineering design team interactions? an investigation into the reliability of sociometric badges. asme 2017 international design engineering technical conferences and computers and information in engineering conference. 7 . doi:10.1115/detc2017-68183. christopoulos, g. i., uy, m. a., & yap, w. j. (2016). the body and the brain: measuring skin conductance responses to understand the emotional experience. organizational research methods, 1094428116681073. doi:10.1177/1094428116681073 cohen, j. (1960). a coefficient of agreement for nominal scales. educational and psychological measurement, 20(1), 37-46. doi:10.1177/001316446002000104 cohen, r. a. (2011). yerkes–dodson law. in encyclopedia of clinical neuropsychology (pp. 2737-2738). springer, new york, ny. cooke, m., irby, d. m., & o'brien, b. c. (2010). educating physicians: a call for reform of medical school and residency . san francisco, ca: jossey-bass. cooper, s., & wakelam, a. (1999). leadership of resuscitation teams: ‘lighthouse leadership’. resuscitation, 42(1), 27-45. doi:10.1016/s0300-9572(99)00080-5 curhan, j. r., & pentland, a. (2007). thin slices of negotiation: predicting outcomes from conversational dynamics within the first 5 minutes. journal of applied psychology, 92(3), 802-811. doi:10.1037/0021-9010.92.3.802 davis, w. a., jones, s., crowell-kuhnberg, a. m., o’keeffe, d., boyle, k. m., klainer, s. b., … yule, s. (2017). operative team communication during simulated emergencies: too busy to respond? surgery, 161(5), 1348-1356. 
doi:http://doi.org/10.1016/j.surg.2016.09.027 dawson, m. e., schell, a. m., & filion, d. l. (2007). the electrodermal system. in j. t. cacioppo, l. g. tassinary, & g. g. berntson (eds.), handbook of psychophysiology (3rd ed., pp. 159-181). new york: cambridge university press. de laat, s., endedijk, m. d., ufkes, e. g., van keulen, m., & de vries, r. (2017, 24 november). real-time measures of social interaction as predictors for team effectiveness . paper presented at the waop conference 2017, nijmegen, the netherlands. dias, r. d., & neto, a. s. (2016). stress levels during emergency care: a comparison between reality and simulated scenarios. journal of critical care, 33, 8-13. doi:10.1016/j.jcrc.2016.02.010 dimicco, j. m., hollenbach, k. j., pandolfo, a., & bender, w. (2007). the impact of increased awareness while face-to-face. human-computer interaction, 22(1), 47-96 doumouras, a. g., keshet, i., nathens, a. b., ahmed, n., & hicks, c. m. (2012). a crisis of faith? a review of simulation in teaching team-based, crisis management skills to surgical trainees. journal of surgical education, 69(3), 274-281. doi:10.1016/j.jsurg.2011.11.004 edmondson, a. c. (2003). speaking up in the operating room: how team leaders promote learning in interdisciplinary action teams. journal of management studies, 40(6), 1419-1452. doi: 10.1111/1467-6486.00386 entin, e. e., & serfaty, d. (1999). adaptive team coordination. human factors, 41(2), 312-325. doi:10.1518/001872099779591196 fanning, r. m., & gaba, d. m. (2007). the role of debriefing in simulation-based learning. simulation in healthcare, 2(2), 115-125. doi:10.1097/sih.0b013e3180315539 fernandez castelao, e., russo, s. g., riethmüller, m., & boos, m. (2013). effects of team coordination during cardiopulmonary resuscitation: a systematic review of the literature. journal of critical care, 28(4), 504-521. doi:10.1016/j.jcrc.2013.01.005 fielding, n. g. (2012). triangulation and mixed methods designs. 
journal of mixed methods research, 6(2), 124-136. doi:10.1177/1558689812437101 figner, b., & murphy, r. o. (2011). using skin conductance in judgment and decision making research. in m. schulte-mecklenbeck, a. kuehberger, & r. ranyard (eds.), a handbook of process tracing methods for decision research: a critical review and user's guide (pp. 163-184). new york: psychology press. fischer, f., & järvelä, s. (2014). methodological advances in research on learning and instruction. frontline learning research, 2(4), 1-6. garbarino, m., lai, m., tognetti, s., picard, r. w., & bender, d. (2014). empatica e3 a wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition. in wireless mobile communication and healthcare (mobihealth), 2014 eai 4th international conference on (pp. 39–42). athens, greece: ieee. gibson, c. b., cooper, c. d., & conger, j. a. (2009). do you see what we see? the complex effects of perceptual distance between leaders and teams. journal of applied psychology, 94(1), 62-76. doi:10.1108/dlo.2009.08123ead.009 goldman, s. r. (2014). perspectives on learning: methodologies for exploring learning processes and outcomes. frontline learning research, 2(4), 46-55. grenvik, a., schaefer, j. j., devita, m. a., & rogers, p. (2004). new aspects on critical care medicine training. current opinion in critical care, 10(4), 233-237. doi:10.1097/01.ccx.0000132654.52131.32 hamaker, e. l. (2012). why researchers should think “within-person”: a paradigmatic rationale. in m. r. mehl & t. s. conner (eds.),handbook of research methods for studying daily life (pp. 43-61). new york, ny: guilford publications. hannah, s. t., uhl-bien, m., avolio, b. j., & cavarretta, f. l. (2009). a framework for examining leadership in extreme contexts. the leadership quarterly, 20(6), 897-919. härgestam, m., lindkvist, m., brulin, c., jacobsson, m., & hultin, m. (2013). 
communication in interdisciplinary teams: exploring closed-loop communication during in situ trauma team training. bmj open, 3(10). heaphy, e. d., & dutton, j. e. (2008). positive social interactions and the human body at work: linking organizations and physiology. the academy of management review, 33(1), 137-162. hoogeboom, a. m. g. m., & wilderom, c. p. m. (2015). effective leader behaviors in regularly held staff meetings: surveyed vs. videotaped and video-coded observations. in j. a. allen, n. lehmann-willenbrock, & s. g. rogelberg (eds.), the cambridge handbook of meeting science (pp. 381-412). cambridge handbooks in psychology. cambridge university press. http://dx.doi.org/10.1017/cbo9781107589735.017 humphrey, s. e., & aime, f. (2014). team microdynamics: toward an organizing approach to teamwork. the academy of management annals, 8(1), 443–503. doi:10.1080/19416520.2014.904140 hunziker, s., johansson, a. c., tschan, f., semmer, n. k., rock, l., howell, m. d., & marsch, s. (2011). teamwork and leadership in cardiopulmonary resuscitation. journal of the american college of cardiology, 57(24), 2381-2388. doi:10.1016/j.jacc.2011.03.017 hunziker, s., laschinger, l., portmann-schwarz, s., semmer, n. k., tschan, f., & marsch, s. (2011). perceived stress and team performance during a simulated resuscitation. intensive care medicine, 37(9), 1473-1479. doi:10.1007/s00134-011-2277-2 hunziker, s., semmer, n. k., tschan, f., schuetz, p., mueller, b., & marsch, s. (2012). dynamics and association of different acute stress markers with performance during a simulated resuscitation. resuscitation, 83(5), 572-578. doi:10.1016/j.resuscitation.2011.11.013 hunziker, s., tschan, f., semmer, n., howell, m., & marsch, s. (2010). human factors in resuscitation: lessons learned from simulator studies. journal of emergencies, trauma and shock, 3(4), 389-394. doi:10.4103/0974-2700.70764 jacobsson, m., hargestam, m., hultin, m., & brulin, c. (2012).
flexible knowledge repertoires: communication by leaders in trauma teams. scandinavian journal of trauma, resuscitation and emergency medicine, 20 (1), 44. doi:10.1186/1757-7241-20-44 kanki, b. g., folk, v. g., & irwin, c. m. (1991). communication variations and aircrew performance. the international journal of aviation psychology, 1(2), 149-162. doi:10.1207/s15327108ijap0102_5 keitel, a., ringleb, m., schwartges, i., weik, u., picker, o., stockhorst, u., et al. (2011). endocrine and psychological stress responses in a simulated emergency situation. psychoneuroendocrino, 36(1), 98–108. kim, t., mcfee, e., olguin, d. o., waber, b., & pentland, a. (2012). sociometric badges: using sensor technology to capture new forms of collaboration. journal of organizational behavior, 33(3), 412-427. doi:10.1002/job.1776 klonek, f. e., burba, m., kauffeld, s., & quera, v. (2016). group interactions and time: using sequential analysis to study group dynamics in project meetings. group dynamics, 20(3). 209-222. kneebone, r. l., nestel, d., vincent, c., & darzi, a. (2007). complexity, risk and simulation in learning procedural skills. medical education, 41, 808-814. kolbe, m., grote, g., waller, m. j., wacker, j., grande, b., burtscher, m. j., & spahn, d. r. (2014). monitoring and talking to the room: autochthonous coordination patterns in team interaction and performance. journal of applied psychology, 99(6), 1254-1267. doi:10.1037/a0037877 koudenburg, n., postmes, t., & gordijn, e. h. (2017). beyond content of conversation: the role of conversational form in the emergence and regulation of social structure. personality and social psychology review, 21(1), 50-71. doi:10.1177/1088868315626022 kozlowski, s. w., & ilgen, d. r. (2006). enhancing the effectiveness of work groups and teams. psychological science in the public interest, 7(3), 77-124. doi: 10.1111/j.1529-1006.2006.00030.x lang, p. j., bradley, m. m., & cuthbert, b. n. (1998). 
frontline learning research 6 (2014) 1-25 issn 2295-3159

corresponding author: heidi hyytinen, institute of behavioural sciences, the university of helsinki, 00014 university of helsinki, finland, email: heidi.m.hyytinen@helsinki.fi doi: http://dx.doi.org/10.14786/flr.v2i4.124

the complex relationship between students’ critical thinking and epistemological beliefs in the context of problem solving

heidi hyytinen a, katariina holma b, auli toom a, richard j. shavelson c, sari lindblom-ylänne a

a university of helsinki, finland b university of eastern finland, finland c sk partners, llc & graduate school of education, stanford university, usa

article received 2 may 2014 / revised 14 june 2014 / accepted 27 july 2014 / available online 24 september 2014

abstract

the study utilized a multi-method approach to explore the connection between critical thinking and epistemological beliefs in a specific problem-solving situation. data drawn from a sample of ten third-year bioscience students were collected using a combination of a cognitive lab and a performance task from the collegiate learning assessment (cla). the cognitive-lab data were analysed using thematic analysis. the findings showed that students’ epistemological beliefs were interwoven into their critical thinking: students used critical thinking as a tool (1) for enhancing understanding and (2) for determining truth or falsehood. based on this classification, students could be placed in one of two qualitative profiles, either (1) thorough processing or (2) superficial processing. the results indicated that students who showed superficial processing palmed off justification for knowing on authoritative figures.
in contrast to previous studies, these students did not consider knowledge to be absolutely certain or unquestionable. the findings also show that students with thorough processing believed knowledge to be tentative and fallible, but did not share the relativist view of knowledge where any claim counts because all knowledge is relative. all ten students shared a fallibilist view of knowledge.

keywords: critical thinking; epistemological beliefs; cognitive lab; relativism; fallibilism

1. introduction

critical thinking has been singled out as one of the most important skills for citizens of the twenty-first century (halpern, 2014). mastering critical thinking is thus a goal that can be found in almost every higher education curriculum today. however, recent studies have raised concerns that even though most students make significant progress in learning concepts and procedures during their university studies, some students show little if any growth in critical thinking (arum & roksa, 2011a, 2011b; bok, 2006; pascarella, blaich, martin & hanson, 2011). in the field of higher education, research on critical thinking has generally focused on the development of critical thinking skills (e.g. arum & roksa, 2011a; heijltjes, van gog, leppink & paas, 2014). researchers have also highlighted the importance of understanding critical thinking as a social activity (e.g. arum & roksa, 2011b; kuhn, 2005; moore, 2004; 2013). in this exploratory study we provide a multidimensional framework for analysing critical thinking by combining theoretical aspects from philosophical, educational and psychological approaches. in our view the concept of critical thinking is closely connected to the concepts of ‘knowledge’ and ‘knowing’. furthermore, we assume that critical thinking cannot be formulated by referring to skills alone, but also always involves a disposition to use these skills adequately (see bailin & siegel, 2003; holma, 2014; siegel, 1988).
previous research on critical thinking and personal epistemology has frequently applied quantitative multiple-choice tests, questionnaires or qualitative interviews (see e.g. australian council of education research, 2001; heijltjes, van gog, leppink & paas, 2014; greene & yu, 2014; lahtinen & pehkonen, 2013; tremblay, lalancette & roseveare, 2012). recently, many researchers have questioned the reliability and adequacy of self-report questionnaires (greene & yu, 2014; elby & hammer, 2001). as a result, researchers have stated that there is a need for studies that assess the performance of students directly (e.g. elby & hammer, 2001; hofer, 2004; stes, min-leliveld, gijbels and van petegem, 2009). at the same time researchers have also assumed that one assessment method is not enough to evaluate complex cognitive processes such as reasoning (e.g. baartman, bastiaens, kirschner & vleuten, 2007; dierick & dochy, 2001; maclellan, 2004). this study responds to current concerns by exploring students’ critical thinking as well as their epistemological beliefs, as elaborated upon below, in a problem-solving situation to which we applied a multi-method qualitative approach. a think-aloud method was used as the students worked through an open-ended performance task. our aim is to identify and understand qualitative differences in the critical thinking of students and in their beliefs about knowledge, as well as in the relationship between the two.

2. critical thinking in university-level studies

critical thinking is often ‘regarded as fundamental aim of education’ (bailin & siegel, 2003, p. 188; cf. dewey, 1910). in a university context critical thinking has an essential role and is an important component of the learning outcomes (bok, 2006). critical thinking is defined as a process that enables an individual to make an informed decision about conflicting claims (ennis, 1991; fisher, 2011; bailin & siegel, 2003).
it is purposeful, reasoned and reflective thinking (ennis, 1991; american philosophical association, 1990). a critical thinker knows how to assess the strength of evidence and the reasons that are relevant to the particular context or type of task, and also shows the disposition to draw on these skills (bailin & siegel, 2003; scheffler, 1965; halpern, 2014). critical thinking is seen as a skilful activity in which a person may be more or less proficient (fisher, 2011; scheffler, 1965). definitions of critical thinking typically include a list of the thinking skills that characterise an ideal critical thinker. for example, fisher (2011) lists the following: the ability to identify the elements in a reasoned case, especially reasons and conclusions; the abilities to identify and evaluate assumptions; the abilities to clarify and interpret expressions and ideas; to be able to judge the acceptability, especially the credibility, of claims; to evaluate arguments, analyse, evaluate and produce explanations; to be able to analyse, evaluate, and make decisions; to draw inferences and produce arguments (see also halpern, 2014). university studies require all of these abilities.

however, many philosophers have argued that critical thinking cannot be conceptualised merely by referring to a prescribed set of skills (bailin & siegel, 2003; holma, 2014; fisher, 2011; siegel, 1988; scheffler, 1965; see also halpern, 2014). it may be that a person has acquired the skills, but does not use them (fisher, 2011). as holma (2014) has pointed out, it is not enough for students to have critical thinking skills; they also need to use these skills effectively. thus, critical thinking always involves both the essential skills or abilities and the disposition to use them (bailin & siegel, 2003; holma, 2014; siegel, 1988).
previous studies have called attention to the fact that students’ critical thinking skills do not always develop during university studies (arum & roksa, 2011a; bok, 2006; pascarella, blaich, martin & hanson, 2011). arum and roksa (2011b) demonstrated in their longitudinal study that a large number of university students showed no significant improvement in a range of critical thinking skills, such as reasoning and problem solving. however, a recent study by heijltjes and colleagues (2014) has shown that combining explicit instruction with practice can improve students’ performance in reasoning skills.

3. knowledge and knowing in critical thinking

critical thinking demands a comprehensive use of different types of knowledge (bok, 2006; ennis, 1991). there is a reciprocal relationship between ‘critical thinking’, ‘knowledge’ and ‘knowing’: on the one hand, students need knowledge about a phenomenon before they can think about it critically (halpern, 2014); on the other hand, students must have the necessary skills to evaluate that knowledge. the concepts of ‘knowledge’ and ‘knowing’ are thus substantial aspects of conceptualising critical thinking. there are several different definitions and classifications of the concept of knowledge. for example, philosophical epistemologists usually differentiate amongst three types of knowledge: propositional knowledge, procedural knowledge and knowledge by acquaintance (everitt & fisher, 1995; ichikawa & steup, 2012), although there is no consensus on the interpretation of knowledge or on the number of types of knowledge (fenstermacher, 1994). for our purposes the distinction between propositional and procedural knowledge has theoretical importance. propositional knowledge is defined as knowing that ‘such-and-such is the case’. this is sometimes referred to as factual or declarative knowledge. propositional knowledge (i.e. ‘knowing that’) is usually distinguished from procedural knowledge (i.e.
‘knowing how’) (ryle, 1949). in philosophical discussions propositional knowledge is related to such epistemological concepts as truth, justification, reason and evidence (ryle, 1949; scheffler, 1965; see also niiniluoto, 1999; shope, 2004). scheffler (1965) argued that the ‘knowing that’ attributes of a person may reveal his epistemological orientations, such as the criteria for justifying knowing. empirical research on personal epistemology focuses particularly on these personal orientations. procedural knowledge, meaning ‘knowing how’ to do something (knowing how to analyse, knowing how to swim, etc.; see everitt & fisher, 1995; shope, 2004), is related to possessing a skill (scheffler, 1965). in this sense critical thinking represents procedural knowledge, which is consistent with the other aspect of critical thinking mentioned above. however, several researchers have assumed that procedural knowledge always involves some propositional knowledge (e.g. everitt & fisher, 1995; smith, 2002; markowitsch & messerer, 2007). for example, if a person knows how to play chess, he will probably know certain facts (e.g. rules) about playing chess. smith (2002) has emphasized that an individual has a certain skill only when his performance reflects both procedural and propositional knowledge. in sum, critical thinking involves a disposition to think critically, having the necessary propositional knowledge about a phenomenon and having the thinking skills (i.e. procedural knowledge) to evaluate that knowledge (cf. halpern, 2014).

4. students’ epistemological beliefs as premises of critical thinking

the term ‘personal epistemology’ or, alternatively, ‘epistemological belief’ is defined as an individual’s views of the nature of knowledge and knowing. the term also includes a view of one’s personal beliefs as a knower (pintrich, 2002; hofer, 2004).
personal epistemology ‘can be described along a continuum from less sophisticated to more sophisticated’ ways of knowing (kaartinen-koutaniemi & lindblom-ylänne, 2012, p. 2) or as progress ‘from a state of simple, absolute certainty into a multifaceted, evaluative system’ (west, 2004, p. 61). during this process the individual changes from a passive recipient of knowledge to an active participant in constructing and evaluating knowledge (hofer & pintrich, 2002; kuhn, 2005; king & kitchener, 2004). over time epistemological beliefs develop more and more toward relativistic beliefs (hofer & pintrich, 1997, 2002). previous research on personal epistemology has found that the ability to think critically is embedded in a progression of epistemological beliefs (e.g. king & kitchener, 2004; kuhn & weinstock, 2002; kuhn, 1999; 2005). several researchers have hypothesised that students with weak critical thinking skills have an absolute view of knowledge. when students move on to the most developed epistemological level, their critical thinking tends to improve as well (bok, 2006; kuhn, 1999; kuhn & weinstock, 2002). it has also been demonstrated that students’ epistemological beliefs play an important role in their ability to evaluate the credibility of competing claims (barzilai & zohar, 2012). whether instruction has any influence on the development of epistemological beliefs is currently under discussion (e.g. valanides & angeli, 2005; lahtinen & pehkonen, 2013). however, there is evidence that not all university students reach the most highly developed level of personal epistemology (kuhn & weinstock, 2002; kaartinen-koutaniemi & lindblom-ylänne, 2012; king & kitchener, 2004; perry, 1970). king and kitchener (2004) have found that only advanced doctoral students consistently show the highest level of epistemological beliefs.
furthermore, kaartinen-koutaniemi and lindblom-ylänne (2008, 2012) have shown that there is a considerable variation in personal epistemology among final-year master’s students. their results also showed variations between students in different age groups, study phases and disciplines (see also hofer, 2006; muis, bendixen & haerle, 2006). in addition, researchers have assumed that students’ epistemological beliefs may vary within the same discipline or domain (hammer & elby, 2003; greene & yu, 2014).

5. critical thinking and different conceptions of knowledge

as the brief review above indicates, the literature of personal epistemology makes a distinction between a lower level of epistemological beliefs, in which knowledge is perceived as consisting of unchanging facts and is acquired directly from external authorities, and higher-level epistemological beliefs, in which knowledge is seen as uncertain and constructed by the individual himself (kuhn & weinstock, 2002; hofer, 2005; valanides & angeli, 2005). several researchers have stated that students with higher-level epistemological beliefs have better critical thinking skills than students with lower-level epistemological beliefs (king & kitchener, 2004; kuhn & weinstock, 2002; kuhn, 1999; 2005). recently, holma and hyytinen (2014) have argued that there are several conceptual problems in this kind of hierarchical theory of knowledge (see also elby & hammer, 2001). in this section we focus on three conceptions of knowledge identified in the review of the literature on epistemology. these conceptions, specifically relativism, metaphysical realism and fallibilism, have theoretical importance for conceptualising critical thinking. a relativist position implies that all knowledge is relative to the person who believes or that all interpretations, theories and beliefs are equally right.
because all beliefs are equally right, there is no reason to compare and evaluate different beliefs: all beliefs are equally justified (holma, 2012; holma & hyytinen, 2014). the problem of relativism becomes clear when it is related to the concept of critical thinking (holma & hyytinen, 2014). given that relativism allows people to construct their own ‘personal truths’, critical thinking turns out to be unnecessary (bleazby, 2011). for example, there is no need to evaluate ideas or search for alternatives, because all ideas are equally trustworthy and justifiable (bleazby, 2011; holma & hyytinen, 2014). therefore, the idea that critical thinking presupposes the relativist view of knowledge is untenable. metaphysical realism is an epistemological position that assumes that ‘our knowledge and symbol systems [i.e. theories] directly reflect the structure of reality’ (holma, 2004, p. 421; putnam, 1981). the literature of personal epistemology seems to understand realism as metaphysical realism (see e.g. kuhn, 2005; kuhn & weinstock, 2002; see also holma & hyytinen, 2014), and furthermore, it appears to connect with metaphysical realism the assumption of the possibility of the certainty of human knowledge. as king and kitchener (2004) put it, knowledge is ‘obtained with certainty by direct observation’ (p. 7). [1] in the context of metaphysical realism, critical thinking turns out to be pointless. fallibilism is an epistemological position that implies that all our beliefs are liable to error (reed, 2002; niiniluoto, 1999; holma, 2012). contrary to relativism, fallibilism does not assume that all beliefs or theories are equally right. it presumes the possibility of improving our current conceptions, theories or beliefs. as holma (2012, p.
399) aptly states of fallibilism, ‘this position, like the belief that all human knowledge is uncertain, coheres with the evolutionary understanding of knowledge: the bodies of knowledge we now have may be mistaken and thus [are] possible subjects for revision, but they have, nevertheless, survived the process of evolution to this point; as such, they provide the best available starting point for choices and action of the present moment concerning further inquiry’ (see also peirce, 1934). from this point of view, epistemological fallibilism fits the presumption of critical thinking. previous research on personal epistemology lacks the notion of epistemological fallibilism.

[1] king and kitchener (2004) do not call the lowest level of reflective thinking realism. however, in their model they maintain that, at the most limited level of thinking, knowledge is certain and is obtained from direct observation (p. 7). this position fits metaphysical realism.

table 1. summary of the key concepts of this study

concept: description

critical thinking: process that enables an individual to make an informed decision between conflicting claims. it involves skills and dispositions (e.g. attitude and motivation) to evaluate the reliability and relevance of evidence, to identify arguments, to analyse, interpret and synthesise data from a variety of sources, to draw valid conclusions and address opposing viewpoints. [note 1] critical thinking also involves ‘knowing how to do something’ (procedural knowledge) and ‘knowing that’ (propositional knowledge). [note 2]

epistemological beliefs: students’ thoughts/beliefs about the nature of knowledge and the nature of knowing, including personal beliefs about themselves as knowers. [note 3]

metaphysical realism: the idea that human beliefs are direct copies of reality. the belief that all human knowledge is certain is connected to this epistemological position. [note 4]

relativism: the view that all knowledge is relative to the person who believes or that all interpretations/beliefs are equally correct. because all beliefs are equally correct, there are no means for comparing different beliefs. [note 5]

epistemological fallibilism: the view that human knowledge is uncertain. in contrast to relativism, it presumes the possibility of improving our current conceptions, theories or beliefs, seeking criteria for evaluating, comparing and justifying these beliefs or theories. [note 5]

note 1: based on bailin & siegel (2003); ennis (1991); fisher (2011); fisher & scriven (1997); siegel (1988).
note 2: based on scheffler (1965); cf. also ryle (1949).
note 3: based on pintrich (2002).
note 4: based on holma (2004); putnam (1981).
note 5: based on holma (2012); holma & hyytinen (2014); peirce (1934).

table 1 provides a summary of the definitions of the key concepts in this study. with this broader framework we are able to pin down different areas in critical thinking and epistemological beliefs, which have been shown to be vital for conceptualising these phenomena in prior studies or theorizations. although the conventions of critical thinking and epistemological beliefs are commonly embodied in social practices (e.g. arum & roksa, 2011b; elby & hammer, 2001; kuhn, 2005), the underlying dimensions (i.e. evaluating the reliability and relevance of evidence, identifying arguments, analysing information, addressing opposing viewpoints, reasoning) are relevant in each scientific discipline. moreover, in line with previous studies we expected that students’ epistemological beliefs and critical thinking might vary within the same discipline (see greene & yu, 2014; see also bailin & siegel, 2003). in our study we focused on the qualitative differences in critical thinking and personal epistemological beliefs by examining ten third-year university students’ thinking and performance in a cognitively-demanding authentic problem-solving situation.
the aims of this study are twofold: to identify and describe qualitative differences in third-year university students’ critical thinking skills and epistemological beliefs in a problem-solving situation, and to analyse the interconnections between students’ personal epistemologies and critical thinking skills. to achieve these aims, we formulated the following research questions: (1) how are critical thinking and epistemological beliefs presented in a problem-solving situation in a specific group of third-year university students? (2) how do critical thinking and epistemological beliefs vary from one individual to the next?

6. research methods and materials

6.1 participants

this study was conducted with ten third-year bioscience students drawn from the fields of biological and environmental sciences in a research-intensive university in finland. the target population consisted of all third-year bioscience students in this particular university. first, we selected 40 students at random (approximately one-half of the target population). then we invited all students selected to participate in our study. ten out of 40 students volunteered. seven of the participants were female and three male. the students’ ages varied from 22 to 29, the mean age being 24. all came from a homogeneous cultural background, and all shared the same first language (finnish). in addition, the students had the same national high school certificate and had enrolled in the same bachelor’s study programme. the participants were at the same phase of their studies, that is, near the end of their bachelor’s studies, with the exception of one student whose study pace had been slower. during their university careers, the students had participated in lectures, practical laboratories, seminars, field courses and web-based teaching. we are aware that the sample size is too small for generalization.
however, the purpose of this study is to deepen understanding of critical thinking and epistemological beliefs, for example by describing how these phenomena vary across individuals in this specific group of students.

6.2 procedures

for this study we collected a large body of data for each participant using a multi-method approach (johnson, onwuegbuzie & turner, 2007), including a think-aloud protocol, interviews and a collegiate learning assessment (cla) performance task. the data collection was carried out in the spring of 2010 and consisted of ten cognitive labs. the students came to a classroom and were given the details of the study. the students spent two to three hours reading and responding to the performance task. in responding to the task, the students were asked to verbalise their thoughts (to ‘think aloud’). in the course of carrying out the task while thinking aloud, the students were also asked to write a memorandum addressing critical issues in the task and recommending, and justifying, a course of action. following the task, the students were interviewed about their processes in carrying out the task. students were also asked questions about critical thinking, knowledge and knowing. details of the procedures are provided below in appropriate sections.

6.2.1 collegiate learning assessment (cla)

the collegiate learning assessment (cla) instrument for assessing college-level critical thinking skills used in this study was developed by the council for aid to education (cae). the cla is a standardised, open-ended test that measures analytical reasoning, problem solving and written communication. unlike most standardised tests used in measuring critical thinking, the version of the cla used here did not include any multiple-choice questions (klein, benjamin, shavelson & bolus, 2007). the cla consists of two elements: a set of performance tasks and a set of analytical writing tasks (shavelson, 2010). only the performance task was used in this study.
recent studies have found that open-ended problems with no obvious solution provide an opportunity for students to reflect on their beliefs about knowledge (barzilai & zohar, 2012; ferguson & bråten, 2012). for example, in a problem-solving situation students would need to determine the trustworthiness, and relevance, of different types of information presented to them, co-ordinate various pieces of information related to the problem and consider the underlying assumptions and claims (shavelson, 2010). the cla performance task presents a realistic situation or problem and includes directions, open-ended questions and a document library containing reading material. in order to respond to the task, the students need to read, organise, synthesise and analyse information (which might be reliable/unreliable; relevant/irrelevant to the completion of the task; see shavelson, 2010) from multiple documents (for example letters, memos, summaries of research reports, articles, diagrams, graphs, maps, interview notes). in doing these activities the students need to assess their confidence in information taken from various sources, including the relevance of the source, and thereby deal with conflicting information. they then need to decide on a course of action and provide a reasoned explanation and justification for their course, drawing on supporting information from the document library (klein et al., 2007; shavelson, 2010). they also have to argue for and against alternative explanations. the specific performance task used in this study is proprietary and consequently cannot be described here. an example of a representative cla performance task is presented in figure 1.

figure 1. an example of a cla performance task. adapted from r. shavelson, 2010, measuring college learning responsibly: accountability in a new era. stanford, ca: stanford university press, p. 38.
6.2.2 cognitive labs

the purpose of cognitive labs is to study the cognitive processes that students use when they complete different tasks. students are asked to report their thoughts verbally as they carry out a task (see johnstone, bottsford-miller & thompson, 2006). in this study cognitive labs were divided into three parts: (1) instruction and training, where the researcher explained what the cognitive lab was about and trained the students to think aloud with a short warm-up task; (2) ‘think-aloud’, where the students talked aloud while completing the cla performance task; and (3) a follow-up interview. the cognitive lab for each student was video-recorded and lasted two to three hours. to ensure the consistency of cognitive labs, the same script of directions, training task and interview questions were used for each student. the videos were recorded with two cameras and a table microphone. the cognitive labs produced the following materials: video data, content logs (see below), written test answers and transcribed interview data. the neutral type of think-aloud protocol described by ericsson and simon (1993), in which students were not interrupted while they were performing a task, was used in this study. the think-aloud method makes it possible to collect data about a student’s ongoing thinking processes whilst he or she is working on a task (ericsson & simon, 1993; cotton & gresty, 2006; van someren, barnard & sandberg, 1994). we assume that students’ ‘knowing-that’ attributions (e.g. ‘scientific knowledge is true’) may reflect their epistemological orientations and reveal their criteria for justifying beliefs (see scheffler, 1965). moreover, in some cases the think-aloud method makes it possible to explore critical thinking in action, especially in situations that simulate real-world circumstances. immediately after the task was performed, a follow-up interview was conducted.
the aim of the interview was to gain more detailed information about the processes and knowledge that the students used to complete the task and to probe students’ beliefs about knowledge and knowing. for example, the students were asked how they dealt with conflicting information, how they decided which information to use, which sources of information in the document library they trusted and why, and how they usually evaluate knowledge.

7. data analysis

the data were analysed using a qualitative thematic analysis with an abductive approach (timmermans & tavory, 2012; haig, 2005). an abductive strategy means that the themes identified from the data were linked to the theoretical understanding based on previous studies. abduction is a process that combines things which one had not previously associated by creating a new interpretation, that is, the relationship of a new combination of study features (timmermans & tavory, 2012). hence, the analysis process was nonlinear, moving back and forth amongst all the data, data items, analysed qualities and understanding of the phenomenon based on prior studies. the first and fifth authors were responsible for the analysis, but the final results were obtained through a thorough discussion with all authors. the data were processed in such a way that the participants could not be identified.

the analysis included four phases (figure 2) that represented a unique combination of data-grounded and theory-driven phases, as well as phenomenon-level and individual-level analyses. during the first phase, video recordings were initially indexed with the elan program, which allows the addition of as many tiers and annotations on the video stream as needed (see lausberg & sloetjes, 2009; max planck institute for psycholinguistics, 2012). the purpose of indexing was to make the large video data set easier to handle.
in this study the indexing tiers corresponded to the parts of the cognitive labs: training, think-aloud and interview. in addition, the students’ interviews were transcribed from the videos.

figure 2. a visualisation of the analysis process. based on braun & clarke (2006, p. 87).

after the indexing, content logs were created for each video in which accurate descriptions and summaries of events were systematically recorded. transcriptions of relevant sections of verbalisations of students’ critical thinking and epistemological beliefs (e.g. whenever a student evaluated the quality and reliability of the information in a document or where a student reached a conclusion based on her or his analysis) and nonverbal acts (e.g. a student did not read in detail or skipped over the document) were also included in the log (cf. table 1).

the second phase of the analysis was the data coding (see table 2 for definitions). this phase was theory-driven, meaning that the features guiding the coding were based on prior studies (see table 1). the coding focused on the following qualities: the process by which the student approached the task and solved the problem, the knowledge that the student used to carry out the task, the critical thinking exhibited, and epistemological beliefs. these qualities were coded systematically across the entire data set and within the data items, such as the transcribed interviews and the think-aloud videos of each person. in this way, all the data items from one student, including the video data, content log, written test answers and transcribed interviews, were first coded and analysed separately, after which data from all students were combined and compared (see table 3 for an example of the codes). all extracts were labelled with a student code (s1-s10) and a method code (i = interview, t = think aloud, w = written test answer). the data examples were translated into english.
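the extract labels described above (for example s9t or s8i) follow a simple pattern: a student code followed by a one-letter method code. the following is a minimal illustrative sketch, not part of the study’s own tooling, of how such labels could be decoded when working with the extracts. the names `parse_label`, `ExtractLabel` and `METHODS` are hypothetical and introduced here only for illustration.

```python
import re
from typing import NamedTuple

# Method codes as defined in the text:
# i = interview, t = think aloud, w = written test answer.
METHODS = {"i": "interview", "t": "think aloud", "w": "written test answer"}


class ExtractLabel(NamedTuple):
    """Decoded extract label (hypothetical helper type)."""
    student: int  # student number, 1 to 10
    method: str   # data collection method the extract comes from


# "s" + student number 1..10 + one method letter, e.g. "s9t" or "s10i".
_LABEL_PATTERN = re.compile(r"^s(10|[1-9])([itw])$")


def parse_label(label: str) -> ExtractLabel:
    """Split a label such as 's9t' into student number and method name."""
    match = _LABEL_PATTERN.match(label.strip().lower())
    if match is None:
        raise ValueError(f"not a valid extract label: {label!r}")
    return ExtractLabel(int(match.group(1)), METHODS[match.group(2)])


if __name__ == "__main__":
    for label in ("s9t", "s8i", "s6t"):
        print(label, parse_label(label))
```

under this reading, s9t denotes a think-aloud extract from student 9 and s8i an interview extract from student 8.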
table 2. data sources and focal points of coding

data sources | coding features
video data, content logs, transcribed interviews | 1. the process: how does the student approach the task and solve the problem?
video data, students’ written answers, content logs, transcribed interviews | 2. what knowledge/information does the student use to solve the task? 2.1 what kind of knowledge/information did the student use? 2.2 why? 2.3 how does the student use that knowledge/information?
video data, students’ written answers, content logs, transcribed interviews | 3. critical thinking. 3.1 how does the student identify, analyse and evaluate information, ideas and arguments? 3.2 how does the student judge the acceptability (especially the credibility) of documents? 3.3 how does the student interpret data/graphs/maps? 3.4 how does the student recognise the relationship between assumptions? 3.5 how does the student evaluate background information? 3.6 how does the student make a decision? 3.7 how does the student identify reasons and come to a conclusion? 3.8 how does the student produce explanations and arguments?
video data, students’ written answers, content logs, transcribed interviews | 4. epistemological beliefs. 4.1 what does the student think about knowledge, knowing and the credibility of knowledge? 4.2 how does the student determine the trustworthiness, acceptability and justification of different types of information? 4.3 how does the student describe herself or himself as a knower?

table 3. an example of codes

data extract | coded for
you could consider this a good argument; the expert has gone [to the place where events took place] to see for himself (s9t) | 4.1 what does the student think about knowledge, knowing and the credibility of knowledge? 4.2 how does the student determine the trustworthiness, acceptability and justification of different types of information?
yeah, i don’t believe [the chair of the stakeholder group] is completely off the mark either. [reliability] is just always case-specific. (s8i) | 4.1 what does the student think about knowledge, knowing and the credibility of knowledge?
this just seems scientific somehow. (s6t) | 4.1 what does the student think about knowledge, knowing and the credibility of knowledge?

in the third phase the codes and coded extracts were grouped under potential themes, and all the relevant data were gathered under each theme (braun & clarke, 2006). we identified a variety of preliminary themes on the basis of the codes. during the analysis, the preliminary themes were defined and combined several times. in the end two main themes and two subthemes remained (see figure 2). the final themes were refined, labelled and cross-checked to see if they worked in relation to the coded extracts and the entire data set. the focus of the thematic analysis was the variation of study features at the phenomenon level.

in the fourth phase, after completing the thematic analysis, we found that the students could be placed in different profiles based on our themes as well as on the patterns of behaviour and cognition observed. this phase focused on the variation of study features at the individual level. thereafter, we conducted final descriptions, interpretations and revisions of the results. the results of the thematic analysis show how critical thinking and epistemological beliefs manifested themselves in this particular group of students, whereas the student profiles describe how these phenomena vary across individuals.

8. results

in the thematic analysis two main themes were identified: (1) flexibility in critical thinking and (2) variation in critical thinking and epistemological beliefs. the two themes emerged from exploring the students’ critical thinking from different perspectives. the ways in which the themes were related differed amongst the participants, which further allowed us to identify student profiles.
we identified two main profiles, and on the basis of their characteristic features we labelled them (1) thorough processing and (2) superficial processing. the results are described using a combination of the identified themes and student profiles.

8.1 flexibility in critical thinking

students varied in their ability to adapt their thinking and performance flexibly to the demands of the task. there was clear variation in the students’ ability to change their actions or ways of critical thinking, in which we identified both rigidity and flexibility. flexibility meant that the students could modify their actions and processes and change their behaviours as needed, whereas rigidity referred to situations in which students could not change their processes, look at things from a new perspective or adjust to new evidence in a problem-solving situation. students who were able to make changes in their actions showed open-mindedness and an inquiring attitude. in the following extract, one student describes how he adjusted his performance and ended up analysing and interpreting the documents correctly:

i approached this assignment maybe a little too much as if i had simply copied what they say here in these papers and put them down in my answer. but then when i started thinking, like about my own views on the topics, then right off in [question] number one, it took me a really long time to answer this question. (s8i)

on the other hand, there were students who could not adjust their thinking or performance. some of these students said that they always act in the same way:

well, i’m always like this time-management catastrophe. like in exams and everything, especially exams, it always feels like i run out of time. and in general i notice that in all comprehension and analysis assignments and things like that, they always take me a really long time. (s5i)

8.2 variation in critical thinking and epistemological beliefs

students had different aims in the problem-solving situation. some students tried to understand the complex situation, whereas others tried to find the right answer to the problem. students also varied in their critical thinking, including (a) their disposition and ability to identify, analyse, evaluate and interpret information; (b) their ideas and arguments in judging the acceptability of documents; (c) their abilities to recognise relationships between assumptions; (d) their abilities to make a reasoned decision; and (e) their abilities to produce explanations and arguments. in addition, students’ epistemological beliefs varied. some students claimed that only through scientific knowledge can we arrive at truth. other students, however, expressed the idea that both objective and subjective knowledge can hold the highest epistemic status. we found that critical thinking emerged as a tool for understanding knowledge and determining the goodness and reliability of knowledge; thus, students’ epistemological beliefs were interwoven into their critical thinking.

within this theme we found that students used critical thinking either (a) as a tool for enhancing understanding or (b) as a tool for determining truth or falsehood. based on this difference, students could be classified in one of two qualitative student profiles, either (1) thorough processing or (2) superficial processing. the profiles captured the diversity of the students’ abilities and dispositions to think critically. in addition, these two profiles characterised the variation in how students viewed the nature and limitations of knowledge and knowing, and especially in how they determined what is needed to evaluate knowledge as true or justified and how they acquired and used knowledge in the problem-solving situation (see table 4).
the phrase ‘acquiring knowledge’ here emphasises the dominant way that students used to obtain knowledge in a problem-solving situation. the students classified in the profile called ‘thorough processing’ demonstrated an ability to carry out a deep processing of the content of the documents. these students saw knowledge as fallible and contextual. the students in the profile called ‘superficial processing’ likewise expressed the idea that knowledge is fallible, yet they did not consider the contextual nature of knowledge at all. in the problem-solving situation they did not make a serious effort to analyse, interpret or synthesise the information in the materials. the ‘thorough processing’ profile is further divided into two sub-profiles: (1a) reasoning in order to reach conclusions and (1b) intuition. likewise, the second profile, ‘superficial processing’, consisted of two sub-profiles: (2a) referring to an argument made by authoritative specialists or experts and (2b) trust in scientific method and proof. we describe the characteristics of the profiles and sub-profiles below and provide details pertaining to variation in academic thinking.

table 4. the nature of knowledge and acquiring knowledge in two qualitatively different student profiles of critical thinking

sub-theme | student profile | epistemological beliefs | acquiring knowledge (sub-profile)
critical thinking as a tool for enhancing understanding | thorough processing | both objective and subjective knowledge can hold the highest epistemic status. knowledge is fallible, relative and contextual. | reasoning in order to reach a conclusion; intuition
critical thinking as a tool for determining truth or falsehood | superficial processing | knowledge may reach truth only if it is produced by a reliable process, that is, using empirical methods. objective knowledge holds the highest possible epistemic status, but is fallible. some theories may be false. | referring to an argument made by authoritative specialists/experts; trust in scientific method and proof

8.3 profile 1: thorough processing students

the students (n=5) who deeply analysed the content of the documents created their own understanding of the problem-solving situation. for them, critical-thinking skills were tools to deepen and enhance understanding. these students believed that theories and beliefs could be understood in relation to some context, as the following extract shows:

yeah, i don’t believe [the chair of the stakeholder group] is completely off the mark either. [the reliability of] knowledge is just always context-specific. (s8i)

these students considered it possible to improve current theories and beliefs, and they were thus open to new evidence that could disprove a previously held position or belief. for them, scientific knowledge is probably reliable. they believed that both objective and subjective knowledge could attain the highest epistemic status, meaning that subjective perceptions (e.g. their own perceptions or information obtained from someone else) could also be reliable. these students thought that the credibility of knowledge could be affected by vested interests or bias, for example. although these students emphasised their own role in constructing knowledge, they did not believe that all knowledge is constructed or generated by human minds. from the epistemological perspective, these students took the fallibilist position.

the students who belonged to the ‘thorough processing’ profile were further divided into two sub-profiles on the basis of how they acquired knowledge and reached conclusions in the problem-solving situation and how flexible they were in changing their actions or ways of thinking. the first sub-profile was called (1a) reasoning in order to reach conclusions and the second was called (1b) intuition.
8.3.1 reasoning in order to reach conclusions

two students endeavoured to reach conclusions by reasoning. these students analysed connections across the information presented in the different documents. they also clarified and interpreted different claims and ideas that were presented in the documents. on the basis of their own analyses, they synthesised information, reached a clear decision or conclusion, provided arguments for their decision and explained why this decision was the best in light of all the issues brought up in the documents. in the following example one student describes her analytical process: ‘somehow i knew how to read beyond the documents’ (s4i). these students were also able to adjust their thinking in line with new evidence and make changes in their actions. these students justified conclusions with good reasons (e.g. reliable and valid evidence) and considered themselves active and responsible knowers, as the following extracts show:

but maybe i wouldn’t, like, start criticising right away; somehow, i’d have to start looking into, you know, on what basis they arrived at these figures. (s10i)

for instance, using this graph is fine, but i think it’s been, you know, clearly misinterpreted here in the text. (s4i)

these students created their own understandings of the situation on the basis of their analyses. they used the materials for the analysis and evaluation process in a way that went beyond the obvious. for example, they identified, analysed, evaluated and interpreted all the major facts and ideas presented in the documents. they consciously excluded some information in the documents because of contradictory evidence. in addition, they were able to distinguish relevant claims from irrelevant ones. these students also judged the reliability of the documents, evaluated presuppositions and analysed connections between claims. furthermore, they produced different explanations, identified reasons, produced arguments and drew inferences.
these students further identified and used several criteria in evaluating reliability: corroborating claims from different sources, evaluating the context in which the claim was made, exploring who interpreted the data and evaluating the presuppositions. moreover, these students considered the ethical aspects of knowledge: knowledge and information shape human beings’ worldview.

i’ve just gotten the impression about newspapers, about the media too, that it somehow has the effect that the opinions [presented in them] are so strong that maybe you don’t analyse it so clearly. so like even helsingin sanomat [finland’s largest daily] really has, somehow it seems that they have a pretty strong, you know, bias... you know that even if it’s neutral in a way, then the fact the issues they raise in it, in a way that already affects what information is raised, and what... that it like really powerfully shapes people’s worldview. (s4i)

8.3.2 intuition

three students justified conclusions by intuition. these students created their own understanding of a situation. however, they did not select materials or question any information: they used all the information in the documents, such as empirical knowledge, expert opinions, reports, maps, experiences of an inhabitant, recommendations, letters and second-hand knowledge. these students acquired knowledge in a rather uncritical way. they rarely evaluated the reliability of documents. indeed, these students did not have clear criteria for evaluating the reliability or relevance of information. they just trusted their intuition:

this just seems scientific somehow. (s6t)

i don’t know how i should formulate this, but i’ll start by saying that when i read, for instance... or when i’m taking classes, i don’t spend a whole lot of time wondering if some piece of information is reliable or not. (s6i)

these students started to analyse and interpret thoroughly all information presented in the documents.
they identified all major facts and ideas. they also considered different decisions or explanations, but could not explain which decision was the best or why: there were too many options available. because the students did not reach clear conclusions, they did not present any arguments for accepting a conclusion either. these students showed an inability to adjust their thinking to new evidence or make changes in their actions.

8.4 profile 2: superficial processing students

common to all students in the second main profile was that they processed the materials in the problem-solving task superficially: they did not make a serious effort to analyse, interpret or synthesise the information in the materials. this profile consisted of five students who used critical thinking as an instrument for determining truth or falsehood. their goal was to find the right answer to the problem. in contrast to the ‘thorough processing’ profile, students in this profile believed that knowledge is trustworthy only if it was produced through a reliable process, for example, by using empirical methods or consulting suitable experts. for these students, scientific and verified knowledge is the most reliable, because that kind of knowledge is based on evidence, and it is unbiased and objective. the students believed that subjective knowledge is predominantly untrustworthy. however, these students considered empirical knowledge (which holds the highest epistemic status) to be fallible too, not absolutely certain. they believed that some theories might be false and that it is possible to improve current conceptions and theories. from the epistemological perspective, these students also took the fallibilist position. the analysis indicated varying problems in critical thinking, such as problems in evaluating information, reasoning and reaching conclusions. some of these students also had little motivation to think critically.
characteristic of the students in this profile was that they focused on isolated details. they took knowledge for granted. in other words, they accepted knowledge (particularly scientific knowledge) as true without question. these students were further divided into two sub-profiles according to how they acquired knowledge in a problem-solving situation, trusting either (2a) an argument by authoritative specialists or experts or (2b) verified empirical evidence or testimony.

8.4.1 referring to an argument by authoritative specialists or experts

two students were categorised in this sub-profile. these students trusted authorities in acquiring knowledge. they saw themselves as uncertain knowers. these students believed that if a person who is said to be an authority on something makes an argument about it, then the argument should be trustworthy and therefore usable. in their view, the right answers can be reached by consulting the right expert. these students repeated arguments and conclusions as these were presented in the documents. they drew on empirical knowledge and expert opinions, that is, arguments from authoritative sources.

these students had difficulties in evaluating information. they focused on details and took in all the information they were presented without question. they picked up isolated and obvious details from the materials for each question. the students did not properly analyse, evaluate or interpret the information presented in the documents; they just jumped to conclusions. they disregarded and seriously misinterpreted important information. they also had problems in reasoning and reaching a conclusion. in order to make decisions or arguments, these students reproduced lists of isolated details from documents. they did not provide any reasons or explanations for their decisions. moreover, they did not identify alternative solutions. these students presented some unreliable claims as being credible.
in the interview one student representing this sub-profile said that she has had similar problems in learning:

creative comprehension and, like, reaching a synthesis of overall concepts is really challenging for me. like, for instance, it’s really hard to study for exams, because i’d be more than happy to read the book, but then i don’t really grasp the key message and structure that it’s trying to communicate. acquiring data independently and, like, learning information that way is challenging. so, for instance, i haven’t done all my exams. i haven’t done them because, i’ve tried to start [studying for] them lots of times, but then some, how would you put it, if listening is auditory, then learning from text is pretty hard for me. this third year, which is currently underway, has been, like, really hard. i really haven’t gotten many credits. i don’t feel i’ve accumulated the amount of information i should have or could have in three years. that the pieces of information are discrete and still pretty scattered in my head at the moment. (s5i)

these students expressed the view that knowledge is always uncertain, but they did not consider themselves capable of evaluating knowledge. these students named a few external criteria for evaluating knowledge (such as an authority, expert opinion, publication, openness, journal citations). however, in practice they did not know how to use these criteria independently. both of these students gave authoritative experts the responsibility for evaluating knowledge, as the following example demonstrates:

i don’t know what the right approach is in order to grasp those overall concepts from that huge mass of teensy-weensy details. because a candidate has to read a huge number of articles to find the ones that are, like, related to one’s own topic and all. so it’s really hard when you’re, like, reading an article to judge why this one might be better than that one. so. but i got a tip from my supervisor that i should pay attention to the reliability of the journal. to be honest, it’s the research articles, the ones we have at the university, that are actually the only ones we’re told we can cite. and then it’s like... they’re easy to evaluate based on which publications are more credible. and on the web on [sic] science, they have this one like... what is it, like an indicator that they have, just based on the number of citations and other factors, of the accuracy of the research data… it’s hard! in a way, to make that distinction between what’s true and what isn’t. at least i don’t have the know-how to say what’s true and what isn’t. (s5i)

both students expressed the view that in a real-life situation they would seek help from other people, such as authoritative specialists (e.g. a university teacher) or other students. for example, one student representing this sub-profile said several times that she needed co-operation with other students to solve the task:

i haven’t really had to do anything like this before. that it’s pretty hard in a way. there are so many points and, you know, perspectives here. i haven’t even had to think about stuff like this at the university, then it’s really like new for me, or you know. the assignment was pretty difficult. this might have been more interesting as a group assignment. like there would have been, you know, interaction, and then maybe it would have generated more thoughts somehow. (s7i)

8.4.2 trust in scientific method and proof

the three students comprising this sub-profile were very critical. they all selected documents roughly on the basis of empirical evidence, excluding more than half of the documents provided. these students were aware of their own behaviour:

i eliminate some of the documents right away, for instance, email exchanges and letters, because the people haven’t investigated the matter; the text was written based on a gut feeling. (s3t)

these students expressed the view that scientifically and empirically verified knowledge is the most reliable. they knew that corroboration from other reliable and related sources improves credibility. in the problem-solving situation they only trusted and used arguments by scientific authorities. these students described themselves as ‘error seekers’ in the interviews. the following examples illustrate the view of the students in this sub-profile:

i trust exam books and articles a lot, yes. the difference between the two is that books can often, you know, be unreliable. plus the fact that, at least when they’re academic, it has a lot to do with when they were written, because things move so fast. that i have this one book for my thesis that i was just looking at, it’s got tons of mistakes. so, like, you just have to find them yourself. but with articles, probably those, and then of course depending on the journal. that maybe some article in science: i consider them pretty reliable. nowadays, i’m a little too sceptical about all kinds of things. i question a lot more these days than i used to. (s1i)

you can get the first impression of reliability, of course, from the kind of source it was published in. in other words, i wouldn’t swallow some iltalehti [a finnish tabloid] headline on some scientific subject without thinking it over properly first. but having a reference to those academic publications, and as far as how i’ve drawn those conclusions myself after having read the article, not based on some newspaper headline, then that would be at least important in terms of first impressions. and… well, even if you read a scientific article, if it doesn’t agree at all with what you’ve learned about the topic earlier, then of course you’d have good cause to suspect those research results quite a bit. but the source is what i’d probably consider as the main thing. (s3i)

although these students described themselves as critical, they did not evaluate information from reliable sources in the problem-solving situation. for example, they did not recognise that two sources, which included empirical or verified knowledge, were biased. they analysed and interpreted information superficially and focused on isolated details. they did not interpret the documents they selected, nor did they consider presuppositions. in order to draw conclusions these students mainly reproduced details from the documents. they did not identify alternative solutions, conclusions or approaches to the problem, nor did they provide any reasons or explanations for their own conclusions. they identified only a few claims that were presented in the documents and disregarded many relevant aspects of those claims. as a result they had problems reaching a conclusion. one student described the situation as follows: ‘at least i wouldn’t draw any conclusions based on those [documents]’ (s1t). all these students thought that there was one definite answer to the problem. one student in this sub-profile emphasised that she does not have any disposition or motivation to express reasons for or against some idea in a test situation or in everyday life:

in everyday life it’s rare that, if you’re discussing something, it’s rare that anything like this happens. or i never, really rarely discuss anything argumentatively in any way. in real life i simply don’t like it, discussing issues. (s9i)

8.5 summary of the results

figure 3 combines the two main themes in order to form a comprehensive picture of participants’ critical thinking. students who had several problems in critical thinking but were flexible nevertheless coped with the demands of the task. for example, two students had problems evaluating documents and did not form a general picture of the situation presented in the documents.
because the students were struggling with the demands of the task, they selected documents and reproduced arguments and conclusions just as these were presented in the materials. eventually, the students reached a limited conclusion. on the other hand, there were students who were skilled in specific critical-thinking skills, such as analysing and interpreting information, but lacked other abilities, such as evaluating conflicting claims or producing explanations. these students could neither reach a conclusion nor determine the weaknesses of alternative solutions. in addition, these students were unable to change their actions or thinking; for example, they were not flexible in time management. these students somehow ‘over-analysed’ the problem, and, in the end, they failed in the problem-solving process.

in sum, the aspects that distinguished the participants were differences in (1) aims, (2) the skills and disposition to think critically, (3) epistemological beliefs, (4) acquiring knowledge and (5) the skill of flexibility in adapting thinking and performance to the demands of the task.

figure 3. summary of results. the figure crosses flexibility versus rigidity in critical thinking with the two sub-themes; items marked (+) are strengths.

critical thinking as a tool for enhancing understanding: generating personal understanding through ‘thorough processing’. epistemological beliefs: both objective and subjective knowledge can hold the highest epistemic status. knowledge is fallible and contextual.
flexibility: reaching a well-reasoned solution. (+) figuring out how to complete multidimensional tasks and planning action. (+) defining the problem, evaluating, analysing, interpreting information, identifying alternative reasons, considering relationships between assumptions and ultimately reaching a reasoned conclusion.
rigidity: endless weighing of the different options. (+) defining the problem, identifying ideas, analysing and interpreting information. problems in time management, decision-making, reaching reasoned conclusions, evaluating knowledge and judging the acceptability of information. problems in producing explanations.

critical thinking as a tool for determining truth or falsehood: seeking the right answer through ‘superficial processing’. epistemological beliefs: objective knowledge can reach the highest possible epistemic status. knowledge is fallible.
flexibility: reaching a limited solution. identifying only a few ideas. problems in evaluating, analysing and interpreting information. problems in decision-making and reaching conclusions. (+) searching for alternative ways to complete a task, changing one’s own routines or seeking help from authoritative specialists.
rigidity: problems in reaching a conclusion. identifying only a few ideas. problems in evaluating, analysing and interpreting information. problems in decision-making, reaching conclusions and producing explanations. expectation that a problem has a definite, right answer. disposition to think critically may be low.

9. conclusions

even though the number of participants in this study was small, the variety in the students’ critical thinking was evident. our results showed that after three years of university study, students’ critical-thinking skills and epistemological beliefs differed greatly, and eight out of ten volunteer students had some problems in critical thinking (cf. arum & roksa, 2011; pascarella, blaich, martin & hanson, 2011). the multi-method approach effectively revealed the variety of problems that university students may encounter.
while many problems were related to the lack of disposition or skill, such as an inability to evaluate the credibility of documents, examine presuppositions, make interpretations, develop a personal perspective or generate arguments or conclusions, some of the problems were related to an inability to modify the whole critical-thinking process in a flexible manner. these findings corroborate the ideas of fisher (2011) and scheffler (1965), who suggested that individuals may be more or less skilled at different critical-thinking abilities. in other words, a student may have the ability to identify and evaluate information, for example, yet at the same time struggle with other abilities, such as arriving at a conclusion, adjudicating conflicting claims or producing arguments. therefore, it is clear that unilateral instructions concerning critical thinking are difficult to provide.

the findings of this study support the idea that students’ epistemological beliefs were interwoven into their critical thinking (cf. kuhn, 2005; kuhn & weinstock, 2002). critical thinking emerged as a tool for understanding and determining the relevance and reliability of knowledge. students who showed superficial processing believed that objective knowledge (i.e. scientific and verified knowledge) has the highest possible epistemic status. although it is sensible to trust in scientific and empirical knowledge more than in personal opinions, the problem was that these students accepted scientific knowledge without question: they did not analyse, evaluate or interpret the information contained in the documents they were given. they acquired knowledge by appealing in equal measure to authoritative opinion, trusting in verified empirical evidence and listening to testimonies. in effect, these students deferred the justification of knowledge to authoritative experts. contrary to the results of many previous studies (e.g.
kaartinen-koutaniemi & lindblom-ylänne, 2012; king & kitchener, 2004; kuhn, 1999, 2005; kuhn & weinstock, 2002), our main finding was that the students who appealed to authorities, testimonies or empirical evidence did not believe that knowledge is absolutely certain or unquestionable. nor did these students share the view that beliefs accurately represent or correspond to reality. in effect, the students did not share a sense of metaphysical realism. instead, these students claimed that scientific theories are uncertain, but probably true. the findings also show that the students who believed that knowledge is contextual and relative did not share a relativist view of knowledge. this finding is also contrary to the findings of earlier studies (e.g. lahtinen & pehkonen, 2013). instead, all of the students saw knowledge as fallible. the students believed that it is possible to seek criteria for evaluating, comparing and justifying beliefs or theories. although some students struggled with evaluating knowledge, all of them saw current conceptions and theories as a starting point for further inquiry. they were thus fallibilist in the epistemological sense.

this study further shows that students’ belief in themselves as critical thinkers and knowers is not necessarily equivalent to how they perform. thus, we assume, along with previous studies (elby & hammer, 2001; greene & yu, 2014), that self-report methods alone are not enough to gauge these kinds of complex processes. the present small-scale qualitative study has provided a unique picture of the critical thinking and personal epistemological beliefs of ten third-year bioscience students. furthermore, this study has educational significance by revealing problems in these students’ critical-thinking skills and by describing the role of students’ conception of knowledge in the process of thinking critically.
through a multifaceted approach, it was also possible to deepen understanding of the emphases and gaps in the prevailing empirical research on critical thinking and personal epistemology. however, the findings of this study should not be interpreted as an accurate prediction of the target population. rather, the findings illustrate the nature of the phenomenon being studied and how the different aspects of critical thinking and epistemological beliefs are intertwined and jointly contribute to it. this study involved a small, homogeneous sample of students in one discipline only. owing to these limitations, more communication between the theoretical, empirical and methodological perspectives is required to increase understanding of this complex phenomenon in the different spheres.

keypoints
in this exploratory study we provide a multidimensional framework for analysing critical thinking by combining theoretical aspects from philosophical, educational and psychological approaches.
a large body of data for each participant (n=10) was collected using multiple methods, including think-aloud protocol, interviews and a collegiate learning assessment (cla) performance task.
the results show that students’ epistemological beliefs were interwoven into their critical thinking: students used critical thinking as a tool for enhancing understanding and for seeking a right answer.
none of the students shared an absolutist view of knowledge.
none of the students shared a relativist view of knowledge.
all students shared a fallibilist view of knowledge.

acknowledgements
the first author was financially supported by a scholarship from the alfred kordelin foundation.
the authors are grateful to their colleagues for helpful comments on an earlier version of this article in manuscript, as well as to the students for their participation. we are grateful to roger benjamin at the council for aid to education for permitting us to use a performance task from the collegiate learning assessment and to viivi virtanen (phd) and mikael kivelä (ma) for helping us in data collection.

references
the american philosophical association. (1990). critical thinking: a statement of expert consensus for purposes of educational assessment and instruction "the delphi report". committee on pre-college philosophy. millbrae, ca: the california academic press.
angeli, c., & valanides, n. (2009). instructional effects on critical thinking: performance on ill-defined issues. learning and instruction, 19, 322–334.
arum, r., & roksa, j. (2011a). limited learning on college campuses. society, 48, 203–207. doi: 10.1007/s12115-011-9417-8
arum, r., & roksa, j. (2011b). academically adrift. limited learning on college campuses. chicago: the university of chicago press.
the australian council for educational research. (2001). graduate skills assessment. summary report. 01/e occasional paper series. higher education division. retrieved from http://www.acer.edu.au/documents/gsa_summaryreport.pdf
baartman, l. k. j., bastiaens, t. j., kirschner, p. a., & van der vleuten, c. p. m. (2007). evaluating assessment quality in a competence-based education: a qualitative comparison of two frameworks. educational research review, 2, 114–129.
bailin, s., & siegel, h. (2003). critical thinking. in n. blake, p. smeyers, r. smith, & p. standish (eds.), the blackwell guide to the philosophy of education (pp. 181–193). blackwell publishing.
barzilai, s., & zohar, a. (2012). epistemic thinking in action: evaluating and integrating online sources. cognition and instruction, 30, 39–85.
bleazby, j. (2011).
overcoming relativism and absolutism: dewey’s ideals of truth and meaning in philosophy for children. educational philosophy and theory, 43, 453–466. doi: 10.1111/j.1469-5812.2009.00567.x
bok, d. (2006). our underachieving colleges. a candid look at how much students learn and why they should be learning more. princeton, nj: princeton university press.
braun, v., & clarke, v. (2006). using thematic analysis in psychology. qualitative research in psychology, 3, 77–101.
cotton, d., & gresty, k. (2006). reflecting on the think-aloud method for evaluating e-learning. british journal of educational technology, 37, 45–54.
dewey, j. (1910). how we think. boston: d.c. heath & co.
dierick, s., & dochy, f. (2001). new lines in edumetrics: new forms of assessment lead to new assessment criteria. studies in educational evaluation, 27, 307–329.
elby, a., & hammer, d. (2001). on the substance of a sophisticated epistemology. science education, 554–567.
ennis, r. (1991). critical thinking: a streamlined conception. teaching philosophy, 14, 5–24.
ericsson, k. a., & simon, h. a. (1993). protocol analysis: verbal reports as data. cambridge, ma: mit press.
everitt, n., & fisher, a. (1995). modern epistemology: a new introduction. new york: mcgraw-hill.
fenstermacher, g. d. (1994). the knower and the known: the nature of knowledge in research on teaching. review of research in education, 20, 3–56.
ferguson, l. e., & bråten, i. (2012). students’ profiles of knowledge and epistemic beliefs: changes and relations to multiple-text comprehension. learning and instruction, 25, 49–61.
fisher, a., & scriven, m. (1997). critical thinking: its definition and assessment. edgepress and centre for research in critical thinking. university of east anglia.
fisher, a. (2011). critical thinking: an introduction. second edition. cambridge: cambridge university press.
greene, j. a., & yu, s. b. (2014). modelling and measuring epistemic cognition: a qualitative reinvestigation.
contemporary educational psychology, 39, 12–28.
haig, b. d. (2005). abductive theory of scientific method. psychological methods, 10, 371–388.
halpern, d. f. (2014). thought and knowledge. fifth edition. ny: psychology press.
hammer, d., & elby, a. (2003). tapping epistemological resources for learning physics. the journal of the learning sciences, 12(1), 53–90.
heijltjes, a., van gog, t., leppink, j., & paas, f. (2014). improving critical thinking: effects of dispositions and instructions on economics students’ reasoning skills. learning and instruction, 29, 31–42.
hofer, b. k., & pintrich, p. r. (2002). personal epistemology: the psychology of beliefs about knowledge and knowing. new jersey: lawrence erlbaum associates.
hofer, b. k. (2004). epistemological understanding as a metacognitive process: thinking aloud during online searching. educational psychologist, 39, 43–55.
hofer, b. k. (2005). the legacy and the challenges: paul pintrich’s contributions to personal epistemology research. educational psychologist, 40, 95–105.
hofer, b. k. (2006). domain specificity of personal epistemology: resolved questions, persistent issues, new models. international journal of educational research, 45, 85–95.
holma, k. (2004). plurealism and education: israel scheffler’s synthesis and its presumable educational implications. educational theory, 54, 419–430.
holma, k. (2012). fallibilist pluralism and education for shared citizenship. educational theory, 62, 397–409.
holma, k. (2014). the critical spirit: emotional and moral dimensions critical thinking. under review.
holma, k., & hyytinen, h. (equal contribution) (2014). the philosophy of personal epistemology. under review.
ichikawa, j. j., & steup, m. (2012). the analysis of knowledge. in edward n. zalta (ed.), the stanford encyclopedia of philosophy (winter 2012 edition). retrieved from http://plato.stanford.edu/archives/win2012/entries/knowledge-analysis/
johnson, r. b., onwuegbuzie, a. j., & turner, l. a.
(2007). toward a definition of mixed methods research. journal of mixed methods research, 1, 112–133.
johnstone, c. j., bottsford-miller, n. a., & thompson, s. j. (2006). using the think aloud method (cognitive labs) to evaluate test design for students with disabilities and english language learners. technical report 44. minneapolis, mn: university of minnesota, national centre on educational outcomes. retrieved from http://www.cehd.umn.edu/nceo/onlinepubs/tech44/
kaartinen-koutaniemi, m., & lindblom-ylänne, s. (2008). personal epistemology of psychology, theology and pharmacy students: a comparative study. studies in higher education, 33, 179–191.
kaartinen-koutaniemi, m., & lindblom-ylänne, s. (2012). personal epistemology of university students: individual profiles. education research international, 2012, 1–8. doi:10.1155/2012/807645
king, p. m., & kitchener, k. s. (1994). developing reflective judgment: understanding and promoting intellectual growth and critical thinking in adolescents and adults. san francisco: jossey-bass.
king, p. m., & kitchener, k. s. (2004). reflective judgment: theory and research on the development of epistemic assumptions through adulthood. educational psychologist, 39, 5–18.
klein, s., benjamin, r., shavelson, r., & bolus, r. (2007). the collegiate learning assessment, facts and fantasies. evaluation review, 31, 415–439. doi: 10.1177/0193841x07303318
klein, s., freedman, d., shavelson, r., & bolus, r. (2008). assessing school effectiveness. evaluation review, 32, 511–525.
kuhn, d. (1999). a developmental model of critical thinking. educational researcher, 28, 16–25.
kuhn, d. (2005). education for thinking. harvard university press.
kuhn, d., & weinstock, m. (2002). what is epistemological thinking and why does it matter? in b. k. hofer & p. r. pintrich (eds.), personal epistemology: the psychology of beliefs about knowledge and knowing (pp. 121–144). new jersey: lawrence erlbaum associates.
lahtinen, a-m, & pehkonen, l. (2013).
‘seeing things in a new light’: conditions for changes in the epistemological beliefs of university students. journal of further and higher education, 37, 397–415.
lausberg, h., & sloetjes, h. (2009). coding gestural behavior with the neuroges-elan system. behavior research methods, instruments, & computers, 41(3), 841–849. retrieved from http://www.springerlink.com/content/d53722q3k3314374/
maclellan, e. (2004). how convincing is alternative assessment for use in higher education. assessment & evaluation in higher education, 29, 311–321.
markowitsch, j., & messerer, k. (2007). practice-oriented methods in teaching and learning in higher education. in p. tynjälä, j. välimaa, & g. boulton-lewis (eds.), higher education and working life: collaborations, confrontations and challenges (pp. 177–194). oxford: elsevier.
max planck institute for psycholinguistics. (2012). elan. [software]. available from http://tla.mpi.nl/tools/tla-tools/elan/
moore, t. (2004). the critical thinking debate: how general are general thinking skills. higher education research & development, 23, 3–18.
moore, t. (2013). critical thinking: seven definitions in search of a concept. studies in higher education, 38, 506–522.
muis, k. r., bendixen, l. d., & haerle, f. c. (2006). domain-generality and domain-specificity in personal epistemology research: philosophical and empirical reflections in the development of a theoretical framework. educational psychology review, 18, 3–54.
niiniluoto, i. (1999). critical scientific realism. oxford: oxford university press.
pascarella, e. t., blaich, c., martin, g. l., & hanson, j. m. (2011). how robust are the findings of academically adrift? change: the magazine of higher learning, 43, 20–24.
peirce, c. s. (1934). pragmatism and pragmaticism, vol. 5 of collected papers of charles sanders peirce. cambridge, massachusetts: harvard university press.
perry, w. g. (1970). forms of intellectual and ethical development in the college years.
harvard university.
pintrich, p. r. (2002). future challenges and directions for theory and research on personal epistemology. in b. k. hofer & p. pintrich (eds.), personal epistemology: the psychology of beliefs about knowledge and knowing (pp. 121–144). new jersey: lawrence erlbaum associates.
putnam, h. (1981). reason, truth and history. cambridge: cambridge university press.
reed, b. (2002). how to think about fallibilism. philosophical studies, 107, 143–157.
ryle, g. (1949). the concept of mind. london: hutchinson’s university library.
scheffler, i. (1965). conditions of knowledge. an introduction to epistemology and education. glenview: scott, foresman and company.
shavelson, r. j. (2010). measuring college learning responsibly: accountability in a new era. stanford, ca: stanford university press.
shope, r. k. (2004). the analysis of knowing. in i. niiniluoto, m. sintonen, & j. woleński (eds.), handbook of epistemology (pp. 356). dordrecht: kluwer academic.
siegel, h. (1988). educating reason: rationality, critical thinking, and education. ny: routledge.
smith, g. (2002). are there domain-specific thinking skills? journal of philosophy of education, 36, 207–227.
van someren, m. w., barnard, y. f., & sandberg, j. a. c. (1994). the think aloud method. a practical guide to modelling cognitive processes. department of social science informatics. university of amsterdam. london: academic press.
stes, a., min-leliveld, m., gijbels, d., & petegem, p. van (2010). the impact of instructional development in higher education: the state-of-the-art of the research. educational research review, 5, 25–49.
timmermans, s., & tavory, i. (2012). theory construction in qualitative research: from grounded theory to abductive analysis. sociological theory, 30, 167–186.
tremblay, k., lalancette, d., & roseveare, d. (2012). ahelo feasibility study report. volume 1 design and implementation. organization for economic co-operation and development (oecd).
retrieved from http://www.oecd.org/edu/skills-beyond-school/ahelofsreportvolume1.pdf
valanides, n., & angeli, c. (2005). effects of instruction on changes in epistemological beliefs. contemporary educational psychology, 30, 314–330.
west, e. j. (2004). perry’s legacy: models of epistemological development. journal of adult development, 11, 61–70.

footnotes
1 king and kitchener (2004) do not call the lowest level of reflective thinking realism. however, in their model they maintain that, at the most limited level of thinking, knowledge is certain and is obtained from direct observation (p.7). this position fits metaphysical realism.

frontline learning research 2 (2013) 111 issn 2295-3159
corresponding author: sandra van aalderen-smeets, po box 217, 7500 ae, enschede, the netherlands, sandra.vanaalderen@utwente.nl
http://dx.doi.org/10.14786/flr.v1i2.27
3 | f l r

investigating and stimulating primary teachers’ attitudes towards science: summary of a large-scale research project
juliette walma van der molen a, sandra van aalderen-smeets a*
a university of twente, the netherlands
* both authors contributed equally
article received 13 may 2013 / revised 19 september 2013 / accepted 19 september 2013 / available online 20 december 2013

abstract
attention to the attitudes of primary teachers towards science is of fundamental importance to research on primary science education. the current article describes a large-scale research project that aimed to overcome three main shortcomings in attitude research, i.e. lack of a strong theoretical concept of attitude, methodological flaws in attitude research, and ineffective interventions.
the research project included (a) the development of a new theoretical framework for teachers’ attitudes towards (teaching) science, (b) a new validated survey instrument (the das) to measure the different underlying components of primary teachers’ attitudes toward teaching science, and (c) an in-service professional development training course based on the previously developed theoretical framework. the framework of attitude consists of three dimensions: cognitive beliefs, affect, and perceived control, each consisting of several subcomponents. by means of the survey instrument we investigated the effects of the attitude focussed training course. the course aimed to improve attitude by creating awareness about teachers’ own attitudes, stimulating their scientific attitudes and curiosity, and training inquiry and thinking skills. the course refrained from providing recipe-like example lessons, materials, or methods. using a pre-post, experimental-control design we showed that the course significantly improved the affective and perceived control dimension of attitude. teachers enjoyed teaching science more, showed increased self-efficacy, and felt less dependent on external factors. this project shows that genuine attitude improvements of primary teachers can be accomplished by attitude focussed professional development. keywords: science education; attitude towards science; professional development; inquiry based learning. j. walma van der molen & s. van aalderen-smeets 4 | f l r 1. introduction the study on attitude towards science has received considerable attention over the last decades (osborne, simon, & collins, 2003; osborne & dillon, 2008). in our society, we are increasingly dependent on science and technology in all kinds of ways. despite this, a large section of the population has little scientific or technical knowledge and attitudes towards science and technology are not very positive. 
although this lack of interest often only really manifests itself when young people make their choice of subjects at secondary school, most pupils have already formed stereotypical images of and negative attitudes about science and technology before the age of 14 (osborne & dillon, 2008; tai, qi liu, maltese, & fan, 2006; turner & ireson, 2010). international research (e.g., jarvis & pell, 2004) shows that this negative image of science subjects is also found among primary school teachers. many primary teachers feel insufficiently capable of providing education in the field of science. they find it difficult to deal with pupils’ questions in this area and prefer to fall back on standard textbooks or highly structured materials or exercises. when this type of practice is the norm, it is no wonder that pupils’ attitudes with regard to science and technology are difficult to change for the better. primary schools and their teachers therefore play a crucial role in determining the attitudes and images of students towards science and improving primary teachers’ attitudes towards science is one of the major challenges in today’s science education (haney, czerniak, & lumpe, 1996; osborne, simon & collins, 2003; osborne & dillon, 2008). research has shown that (pre-service) primary teachers’ scientific literacy is low, that their attitudes towards science are mostly negative, and that primary teachers share a number of characteristics that impede the stimulation of science learning and of positive attitudes towards science among their pupils (harlen & holroyd, 1997; jarvis & pell, 2004; tosun, 2000; yates & goodrum, 1990). professional development should therefore pay explicit attention to improving the attitude of (preservice) primary teachers towards science (haney, czerniak, & lumpe, 1996). however, although primary teachers’ attitudes toward science have been investigated widely, scientific progress in this field has been slow. 
in our view, there are three important and in part related reasons for this slow progress. first, until recently the theoretical conceptualization of the construct of primary teachers’ attitudes towards science was poorly articulated, both in research and in educational change projects (barmby, kind, & jones, 2008; bennett et al., 2001; coulson, 1992; osborne et al., 2003; pajares, 1992). many studies provided incomplete definitions (or no definition at all) for the construct of attitude, failed to explicate the components of attitude that they measured, or did not distinguish between attitudes towards science and other related concepts (e.g., opinions or motivation). a second reason for the slow scientific progress in research on primary teachers’ attitudes towards science is the lack of reliable and valid attitude measuring instruments that meet the necessary theoretical and statistical standards (gardner, 1995; reid, 2006). a recent review of the literature points to important flaws in the methodology of a majority of studies, such as weak psychometric properties and failure to pilot-test, validate, and evaluate the instrument according to current psychometric standards (van aalderen-smeets & walma van der molen, 2012). a third reason for slow progress in primary teachers’ attitude development may be found in the interventions that are directed at teachers’ professional development. most professional development projects that aim to improve science education in primary school focus on improving classroom didactics and provide a collection of standardized, recipe-like science lessons. although this might improve the knowledge of teachers regarding science content or pedagogical content knowledge, it does not automatically lead to improvements in their attitudes toward science.

2.
recent research
to remedy these shortcomings in research and in the professional development of primary teachers’ attitudes towards science, we established a large-scale project over the past three years that included (a) the development of a new theoretical framework for teachers’ attitudes towards (teaching) science, (b) a new validated survey instrument to measure the different underlying components of primary teachers’ attitudes towards teaching science, and (c) an in-service professional development training course that was based on the previously developed theoretical framework. the project was based on the contention that only when teachers possess a positive attitude towards teaching science and towards inquiry-based learning will they be motivated and able to seek and use science content in their lessons, to use inquiry-based learning in class, and to affect pupils’ scientific attitudes and their attitudes towards science in a positive manner. the present article presents an overview of the results of this integrated project.

2.1 theoretical framework
the development of a new attitude framework implied disentangling the construct of primary teachers’ attitudes towards science. as described in detail in the theoretical article that resulted from this project (van aalderen-smeets & walma van der molen, 2012), we aimed to explicate and structure the range of underlying components or dimensions of primary teachers’ attitudes towards science. the framework was based on an extensive review of previously used concept definitions of the construct of primary teachers’ attitudes towards science, and we related these components to general psychological attitude theories, such as the tripartite model of attitudes (e.g., eagly & chaiken, 1993) and the theory of planned behaviour (e.g., ajzen & fishbein, 1980). this resulted in a framework of attitude consisting of the following three main dimensions: cognitive beliefs, affect, and perceived control.
cognitive beliefs refer to teachers’ beliefs and opinions about (a) the relevance of science and science education, (b) beliefs about the relative difficulty of teaching science, and (c) gender stereotypical beliefs regarding science and science teaching. the second dimension of affect contains the independent subcomponents of (a) enjoying (teaching) science and (b) anxiety related to (teaching) science. the third dimension, perceived control, refers to the amount of control teachers perceive to have over (teaching) science and it consists of (a) self-efficacy (an internal sense of control, such as the perceived capacity to teach science) and (b) perceived dependency on context factors (beliefs about the extent to which a teacher is dependent on external factors to teach science, such as the availability of teaching-methods or materials, enough time, or other resources). the outcomes of our review of concepts suggested that, in addition to internal beliefs and feelings associated with self-efficacy, the beliefs and feelings that teachers have about external (i.e., contextual) factors are closely related to teachers’ sense of being in control. in our view, the perception of teachers regarding their dependency on context factors (e.g., their belief that they can teach science only if their school ensures the availability of the proper materials and sufficient preparation time) is an indispensable component of a complete theoretical framework of primary teachers’ attitudes toward science. the development of the theoretical framework provided a new theoretical basis for measuring primary teachers’ attitudes towards science and for interventions aiming to improve their attitude, two issues that were pursued in the studies described below. 2.2 validated survey instrument based on the theoretical exercise described above, we developed a new measurement instrument: the dimensions of attitudes towards science questionnaire (das). 
after construction of the first version of the das, we investigated its validity and reliability by means of a qualitative in-depth focus group study and a quantitative survey study (asma, walma van der molen, & van aalderen-smeets, 2011; van aalderen-smeets & walma van der molen, 2013a). using the theoretical framework as a basis for the development of a new attitude instrument ensured that the complete range of relevant attitude dimensions and subcomponents was incorporated in the instrument. in addition to the subscales that correspond to the components of the theoretical framework of attitude, scales measuring teachers’ views on science and their intended science teaching behaviour were included in the questionnaire.

the pilot-tested das questionnaire was distributed digitally to in-service and pre-service teachers. a total of 556 respondents returned a complete questionnaire (80% female, mean age 31 years). the das instrument was evaluated at multiple levels: the validity of the overall structure of the instrument was investigated by confirmatory factor analysis, the internal consistency of the subscales was determined by cronbach’s alpha coefficient, and the discriminating ability of each item was assessed by examining its standard deviation and response range. our results supported the validity, reliability, and discriminating ability of the das instrument. the resulting factor structure corresponded to the underlying theoretical model. the obtained seven-factor solution confirmed the hypothesis that the das questionnaire measures the seven underlying subcomponents of the attitude framework. furthermore, the results of the internal consistency analyses showed high internal consistency in all seven subscales.
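the internal-consistency and item-spread checks described above can be illustrated with a minimal python sketch. the function names and the response data are hypothetical and are not taken from the das study, and the source does not state which software the authors used. the sketch applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items in a subscale.

```python
from statistics import stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one subscale.

    responses: list of respondents, each a list of k item scores
    (hypothetical layout: one row per respondent, one column per item).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def item_spread(responses, item):
    """Standard deviation and (min, max) response range of one item."""
    scores = [row[item] for row in responses]
    return stdev(scores), (min(scores), max(scores))


# Made-up answers of five respondents to a three-item subscale (1-5 scale).
example = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(example)
spread = item_spread(example, 0)
print(round(alpha, 3), spread)
```

values of alpha close to 1 indicate that the items of a subscale vary together, while a small standard deviation or a narrow response range flags an item that discriminates poorly between respondents.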
also, regression analyses showed that scores on the subscales of affect and perceived control were predictive of the scores on intended behaviour, indicating predictive validity. finally, all items in the revised das instrument showed large response variation, indicating a strong ability to discriminate between respondents displaying different beliefs, feelings, and thoughts toward teaching science. these results show that the das instrument is a valid, reliable, and comprehensive survey tool that is able to measure a complex and difficult motivational concept (for a complete description of the instrument, see van aalderen-smeets & walma van der molen, 2013a). the das instrument thus proves to be a promising instrument within the field of science education and teacher training at primary school level. it can be used as a research instrument for effect studies of training courses and other interventions aiming to professionalize primary teachers. furthermore, it can serve as a diagnostic tool for adapting training courses and interventions to the individual needs of pre- and in-service teachers. finally, it can be used as a coaching tool for making primary teachers aware of their own view of science and their (changed) attitudes toward teaching science. through these different uses, the das instrument could become a highly valuable instrument for making progress within the field of science education in primary schools.

2.3 professional development

our in-service training course was also based on the different underlying components in our theoretical framework and consisted of six 3-hour meetings spread over six months (walma van der molen, van aalderen-smeets & groot koerkamp, 2011). the course focused on creating awareness about teachers’ attitudes towards teaching science, awareness about their views on science, and awareness about their attitude towards inquiry-based methods of learning.
these attitudes were reflected upon and challenged by means of assignments, exercises, questioning, information transfer, and research activities such as experiments. in addition, each of these course elements was accompanied by activities that stimulated teachers’ own curiosity, inquisitiveness, critical thinking, reflection and metacognition, and higher-order thinking. most importantly, the training course refrained from providing recipe-like example lessons, pre-structured materials, or predefined methods. during the meetings, teachers engaged in coursework that prepared them for take-home assignments. during the final meeting, participants presented a science and inquiry-based project that they had developed and carried out with their pupils (see appendix for a general overview of the core elements of the course). to test the effectiveness of our course, we used a pre-post test, experimental-control group design to assess changes in teachers’ attitudes toward teaching science over time (experimental group n = 49, control group n = 45). the experimental group participated in the training course, while the control group did not receive any formal training. however, the control group did consist of teachers who reported being interested in science education. we used our ‘dimensions of attitude toward science’ (das) instrument to measure quantitative changes in teachers’ attitudes towards teaching science. in addition, we used qualitative, open-ended, self-report measures to investigate changes in scientific attitude, perceptions of science, and attitude towards inquiry-based methods of learning. two out of three attitude components showed significant improvements (see table 1), indicating that the training course had a positive effect on attitude towards teaching science (van aalderen-smeets & walma van der molen, 2013b).
participants in the course gained a more positive attitude in the affective and perceived control dimensions of attitude, compared to the control group. this means they enjoyed science teaching more, showed increased self-efficacy, and felt less dependent on recipe-like, standardized methods, top-down instruction, and the availability of pre-organized projects and materials. participating teachers in the training course did show a significant improvement on the remaining attitude components (less anxiety when teaching science, believing science teaching is more relevant, and less stereotypical beliefs), but this improvement was not significantly different from changes in the control group, even though the changes in the control group itself were not significant (see table 1). this could be due to the relatively high interest in and engagement with science of the control group; i.e., because they engaged in science-related teaching and activities between the pre- and post-test, they improved their attitudes slightly. on the open-ended questions, teachers indicated that, after participation in the course, they found science to be less complex and used inquiry-based methods of teaching more often in both science-related lessons and in other school subjects. in addition, teachers’ responses showed enhancement in their scientific attitude, i.e., they became more curious, more critical, and more explorative. furthermore, teachers’ perceptions and expectations of their pupils changed; teachers reported being surprised by the excellent achievement of some pupils during science lessons (for a more detailed report about this effect study, see van aalderen-smeets & walma van der molen, 2013b).

table 1. attitude toward teaching science; comparison of attitude scores between trained and control group on pre- and post-test. mean difference scores of the trained and control group are presented in the left columns.
the right columns present the results of an anova analysis for each component of professional attitude (significant effects are printed in bold).

                         trained group      control group
                         mdiff     sd       mdiff     sd       f value (1, 104)    p        eta
cognition
  relevance               .24 a    .59       .08      .64       1.9                .17      .02
  gender                 -.27 a    .82      -.06      .87       1.6                .21      .01
affect
  enjoyment               .53 a    .80       .14      .90       5.6                .02 b    .05
  anxiety                -.39 a    .77      -.15      .95       2.1                .16      .02
perceived control
  self-efficacy           .40 a    .55       .11      .51       7.6                .01 b    .07
  context dependency     -.98 a   1.10       .09      .98      27.8 (1,102)        .01 b    .21

mdiff = mean difference score (t2-t1); sd = standard deviation.
a significant improvement within group (paired t-test between pre-test and post-test within group).
b significant difference between improvements of trained and control group (anova interaction effect).

this study indicates that focusing explicitly on primary teachers’ attitudes in a training course does improve the beliefs, feelings, and perceived control primary teachers have regarding science education and inquiry-based methods of learning.

3. future directions

this large-scale research project provides valid tools and new approaches for improving and assessing primary teachers’ attitude towards (teaching) science and opens the door for several future directions in attitude research. the attitude effects reported here are short-term effects based on self-reports. further research is needed to investigate the long-term effects of teachers’ attitude change and changes in their actual teaching. furthermore, more research is needed on the effects of improved teacher attitudes on their pupils’ or students’ attitude toward science and their career choices in their future school career. the results presented here are not only relevant for primary education.
the gained knowledge about improving attitudes towards science can be applied in interventions, professional development, and research aiming to improve secondary school students’ attitudes towards science as well. in addition, the approach taken in this research project, i.e., building a theoretical framework, then constructing a valid instrument, and testing the effects of a new intervention, may be followed for other lines of research, such as research on attitudes towards inquiry-based learning or on attitudes towards the use of digital media in education. the framework presented in this article is an essential new step toward a convergence of the research in this field. only when researchers are aware of the complexity of the construct of teachers’ attitudes toward science, when explicit and substantiated decisions have been made regarding which components and objects should be measured, and when methodologically sound instruments and interventions are used, can scientific progress be achieved in research on teachers’ attitudes. future research is needed to investigate the various aspects of the proposed framework, including the relationships between the components and the weights of the various components and sub-attributes in predicting behavioural intention. the investigation of these aspects is a prerequisite for gaining further insight into the dynamics of primary teachers’ attitudes toward science and for the development of interventions that are better suited to improve specific aspects of teachers’ attitudes.

keypoints

professional development of primary teachers in science education should pay explicit attention to attitude improvements.
attitude research should be based on a theoretical model, such as the framework of attitude towards science.
primary teachers feel more in control over science teaching when they have gone through the inquiry process themselves.
primary teachers’ attitudes towards teaching science can be improved by stimulating their own curiosity, scientific attitudes and thinking skills.
sound theoretical and methodological attitude research should be on the frontline of research in science learning.

acknowledgements

contract grant sponsor: platform beta technology in the netherlands.

references

ajzen, i., & fishbein, m. (1980). understanding attitudes and predicting social behavior. englewood cliffs, nj: prentice-hall.
asma, l.j.f., walma van der molen, j.h., & van aalderen-smeets, s.i. (2011). primary teachers’ attitudes towards science: results of a focus group study. in m.j. de vries, h. van keulen, s. peters, & j.h. walma van der molen (eds.), professional development for primary teachers in science. the dutch vtb-pro project in an international perspective (pp. 89–105). rotterdam: sense.
barmby, p., kind, p.m., & jones, k. (2008). examining changing attitudes in secondary school science. international journal of science education, 30, 1075–1093. doi: 10.1080/09500690701344966.
bennett, j., rollnick, m., green, g., & white, m. (2001). the development and use of an instrument to assess students’ attitude to the study of chemistry. international journal of science education, 23, 833–845. doi: 10.1080/09500690010006554.
coulson, r. (1992). development of an instrument for measuring attitudes of early childhood educators towards science. research in science education, 22, 101–105. doi: 10.1007/bf02356884.
eagly, a., & chaiken, s. (1993). the psychology of attitudes. belmont, ca: wadsworth group/thomson learning.
gardner, p.l. (1995). measuring attitudes to science: unidimensionality and internal consistency revisited. research in science education, 25, 283–289. doi: 10.1007/bf02357402.
haney, j.j., czerniak, c.m., & lumpe, a.t. (1996). teacher beliefs and intentions regarding the implementation of science education reform strands. journal of research in science teaching, 33, 971–993. doi: 10.1002/(sici)1098-2736(199611)33:9<971::aid-tea2>3.0.co;2-s.
harlen, w., & holroyd, c. (1997). primary teachers’ understanding of concepts of science: impact on confidence and teaching. international journal of science education, 19, 93–105. doi: 10.1080/0950069970190107.
jarvis, t., & pell, a. (2004). primary teachers’ changing attitudes and cognition during a two-year science inservice program and their effect on pupils. international journal of science education, 26, 1787–1811. doi: 10.1080/0950069042000243763.
osborne, j., simon, s., & collins, s. (2003). attitudes towards science: a review of the literature and its implications. international journal of science education, 25, 1049–1079. doi: 10.1080/0950069032000032199.
osborne, j., & dillon, j. (2008). science education in europe: critical reflections (a report to the nuffield foundation). london: the nuffield foundation. retrieved from http://www.pollen-europa.net/pollen dev/images editor/nuffield report.pdf.
pajares, m.f. (1992). teachers’ beliefs and educational research: cleaning up a messy construct. review of educational research, 62, 307–332. doi: 10.3102/00346543062003307.
reid, n. (2006). thoughts on attitude measurement. research in science & technological education, 24, 3–27. doi: 10.1080/02635140500485332.
tai, r.h., liu, c.q., maltese, a.v., & fan, x. (2006). planning early for careers in science. science, 312, 1143–1145. doi: 10.1126/science.1128690.
tosun, t. (2000). the beliefs of preservice elementary teachers towards science and science teaching. school science and mathematics, 100, 374–379. doi: 10.1111/j.1949-8594.2000.tb18179.x.
turner, s., & ireson, g. (2010). fifteen pupils’ positive approach to primary school science: when does it decline? educational studies, 36, 119–141. doi: 10.1080/03055690903148662.
van aalderen-smeets, s.i., walma van der molen, j.h., & asma, l.j.f. (2012). primary teachers’ attitude toward science: a new theoretical framework. science education, 96, 158–182. doi: 10.1002/sce.20467.
van aalderen-smeets, s.i., & walma van der molen, j.h. (2013a). measuring primary teachers’ attitudes toward teaching science: development of the dimensions of attitude towards science (das) instrument. international journal of science education, 35(4), 577–600. doi: 10.1080/09500693.2012.755576.
van aalderen-smeets, s.i., & walma van der molen, j.h. (2013b). improving primary teachers’ attitudes toward science by attitude focussed professional development. (submitted).
walma van der molen, j.h., van aalderen-smeets, s.i., & groot koerkamp, e. (2011). cursusboek talentontwikkeling, wetenschap en techniek: professionalisering voor basisschoolleerkrachten [coursebook talent development, science, and technology: professional development for primary teachers]. knowledge center for science and technology (kwto).
yates, s., & goodrum, d. (1990). how confident are primary school teachers in teaching science? research in science education, 20, 300–305. doi: 10.1007/bf02620506.

appendix

schematic overview of the science education-training course for primary teachers (walma van der molen, van aalderen-smeets & groot koerkamp, 2011).
columns, in order: content elements (attitude towards (teaching) science | scientific attitudes | scientific skills) and take-home assignments (personal development | in class). the entries of each meeting are listed below in the order in which they appear in the overview.

meeting 1: creating awareness about view of science; introduction on attitude toward science | stimulating teachers’ curiosity and amazement about everyday items | keeping a diary of amazement | identifying and challenging pupils’ views about and perceptions of science
linking meeting 1 to 2: from amazement and curiosity to formulating research questions
meeting 2: challenging cognitive beliefs about the relevance of science and stereotypical gender beliefs regarding science | stimulating curiosity, inquisitiveness and question asking, dealing with scientific uncertainty and ambiguity | formulating research questions and hypotheses | evaluating a science education method or related medium (website, tv) with attitudinal criteria | how many difficult questions can you think of?
linking meeting 2 to 3: from research questions to research design and enjoying science
meeting 3: challenging the affective component of attitude, i.e. enjoyment and anxiety | stimulating a critical attitude | choosing research method and design, and conducting research in the classroom | self-observation: searching for opportunities to integrate science in your existing lessons | conducting research in the classroom; from research question to experiment
linking meeting 3 to 4: not only hands-on but minds-on; stimulating academic thinking skills
meeting 4: being persistent, creative and original | creative and higher order thinking skills; developing a thinking lesson | improving an existing science method | stimulating creative thinking in children
linking meeting 4 to 5: independence of context factors and being in control of science teaching
meeting 5: stimulating self-efficacy and perceived control | using different perspectives to solve problems | stimulating reflective and metacognitive thinking skills | developing and teaching a science lesson
linking meeting 5 to 6: from feeling in control to actually teaching science
meeting 6: summary of training course | participants’ presentations of visual reports on science lessons

note that this figure provides a schematic overview. several additional attitudinal elements are interwoven in the course. also, spontaneous questions and comments from the participants that came up during the course were explained in terms of attitude or related to attitude toward science and scientific attitude. the first four columns are aimed at the personal development of the teacher. the in-class take-home assignments are explicitly aimed at the interaction with pupils.

frontline learning research 5 (2014) 1–28
issn 2295-3159
corresponding author: dorit alt, doritalt@014.net.il
http://dx.doi.org/10.14786/flr.v2i3.68

the construction and validation of a new scale for measuring features of constructivist learning environments in higher education

dorit alt a
a kinneret college on the sea of galilee, israel

article received 4 december 2013 / revised 11 february 2014 / accepted 5 june 2014 / available online 13 june 2014

abstract

this study was aimed at mapping features of constructivist activities in higher education settings, and at constructing and validating a new scale for measuring their presence in face-to-face lecture-based environments (lbe), seminars (sm), and distance learning environments (dle). a mixed-method approach was implemented in three phases. the first phase was aimed at qualitatively analysing classroom observational activities as experienced by students, in order to learn about actual instantiations of the theoretical constructivist features.
the results foregrounded eight categories: 'knowledge construction', 'authenticity', 'multiple perspectives', 'prior knowledge', 'in-depth learning', 'teacher-student interaction', 'social interaction' and 'cooperative dialogue'. the second phase was aimed at developing a questionnaire, based on the descriptions gathered in phase 1. the third, quantitative phase was used to validate the developed questionnaire (constructivist learning in higher education settings scale [clhes]) by using structural equation modelling. in addition, students' academic self-efficacy was chosen as a criterion variable in order to further assess the construct validity of the clhes. lastly, a multivariate analysis of covariance was applied to characterise differences between the learning settings with regard to the eight clhes factors and academic self-efficacy. the scales were administered to 597 undergraduate third-year college students. according to the main results, the construct validity of the new scale was confirmed; teacher-student and student-student interactions were positively connected to self-efficacy for learning; and sm were perceived as generally more constructivist than the other learning environments. implications of these findings and directions for future research are discussed.

keywords: constructivism; academic self-efficacy; higher education

1. introduction

educational practice is continually subjected to renewal needs, due mainly to the growing presence of information and communication technology, social changes, globalisation of education, and the pursuit of quality. the accelerating rate of social change puts a premium on adaptability to the emerging requirements of present society, such as communication and cooperation skills and the ability to critically select, acquire, and use knowledge (quisumbing, 2005; wegerif & de laat, 2011).
these types of renewal needs require developing updated instructional practices that could integrate knowledge with personal transferable skills (pellegrino & hilton, 2012). in order to meet the demands of 21st-century learning needs, the creation of learning environments based on constructivist pedagogy is suggested, so as to engage learners in knowledge construction carried out through socially negotiated tasks in real-world contexts while enhancing students' ability to regulate their learning (de kock, sleegers, & voeten, 2004). the constructivist approach has taken a leading theoretical position and has become a powerful driving force in the dynamic relationship between teaching methods and learning processes. however, despite the growing attention paid to constructivist pedagogic challenges in the context of learning environments, the instructional principles of this theory, which are aimed at directing the nature of educational processes, still need to be clarified (gijbels, van de watering, dochy, & van den bossche, 2006). nonetheless, during the past two decades, attempts to map instructional constructivist principles of educational materials and learning environments have yielded a few results in the field of university teaching (fraser, treagust, williamson, & tobin, 1987; tenenbaum, naidu, jegede, & austin, 2001). for example, tenenbaum et al. (2001) defined and empirically examined seven key features of constructivist learning environments: (1) arguments, discussions, debates, (2) conceptual conflicts and dilemmas, (3) sharing ideas with others, (4) materials and resources targeted toward solutions, (5) motivation toward reflections and concept investigation, (6) meeting students' needs, and (7) making meaning, real-life examples. however, alt (in press) maintains that this scale could be further elaborated to include additional perceptions on a wider range of theoretical dimensions that are important to the current situation in higher education settings.
examples include understanding students' prior knowledge (meyer, 2004); constructing environments for teaching and learning that are decompartmentalised (minick, stone, & forman, 1993); and engaging students in self-regulated learning, in which they can set their own goals, mediate new meanings from existing knowledge, and form an awareness of current knowledge structures (hakkarainen, lipponen & järvelä, 2002). therefore, constructing a new scale for measuring a wider range of constructivist features in university learning environments is central to this study. other scales, such as the approaches to studying inventory (asi) or the approaches to learning and studying inventory (alsi) (entwistle & ramsden, 1983), and the study process questionnaire (r-spq-2f) (biggs, kember, & leung, 2001), were used to measure constructivist learning by means of students' approaches to learning. these studies were based on the assumption that constructivist learning environments are aimed at fostering a deep (rather than surface) approach to learning (lea, stephenson, & troy, 2003; tiwari et al., 2006). approaches to learning refer to how students perceive themselves going about learning in a specific learning situation, and focus on how intention and process are combined in students' deep or surface learning (biggs et al., 2001). it has been recognised that these approaches to learning are not characteristics of learners but are determined by a relation between a learner and a context, and that students adjust their approaches to learning depending on the requirements of the task (evans, 2014). however, the nature of learning tasks and contexts has changed dramatically in the last decade in terms of depth and range of curricula and the diversity of settings (e.g., distance learning); thus, the depth of learning in constructivist environments could currently refer to the diversified requirements of those environments, pertaining to the process of 'learning to learn', learning to gain internal control over learning, and learning how to cooperate within communities of enquiry (de kock et al., 2004). therefore, assessing the implementation of constructivist features in current higher education learning contexts is of importance and lies at the core of the present study. moreover, both teacher and student are assumed to be jointly responsible for the outcome, the teacher for structuring the enabling conditions and the learner for engaging with them; thus, an approach to learning is described as the nature of the relationship between student, context, and task (biggs et al., 2001). however, the learning approaches scales seem to put emphasis on the learners, disregarding some significant theoretical components of learning patterns, such as students' perceptions of the learning context, that could affect their learning engagements (cano & garcía-berbén, 2014). in order to bridge the gap between theory and empirical study, this study will assess the relations between three learning dimensions: students' constructive learning activity perceptions, teacher-student engagements and students' social activity. finally, current studies have suggested that constructivist learning environments do not always promote students' deep learning, and point to several factors that limit the effectiveness of those learning settings (baeten, kyndt, struyven, & dochy, 2010; gijbels, segers, & struyf, 2008; kyndt, dochy, & cascallar, 2014). for example, kyndt et al. (2014) maintain that these learning environments demand too much from the students in terms of workload and task complexity; in these cases, inducing a deep approach to learning could be difficult.
based upon those studies, it seems important to detect possible relations between the learners and their social learning environment that could encourage them to become self-regulatory and support their confidence and ability to excel in the complex tasks required for constructivist learning. hence, this mixed-method study represents an effort to map features of constructivist learning environments, construct and validate a new scale for measuring facets of constructivist learning, and assess their perceived implementation in several higher education learning contexts. moreover, since previous studies have consistently linked students' academic self-efficacy (bandura, 1997) to learning settings based on the constructivist theory (dorman & adams, 2004; dorman, fisher, & waldrip, 2006), this psychological outcome has been chosen as a criterion variable in order to further assess the construct validity of the new scale. this study could detect effective constructivist practices in university learning settings and measure their connection to self-efficacy for learning. revealing interrelations among several constructivist practices could provide practical implementations, informed by the constructivist theory, for higher education teaching practices. finally, the potential differences between various forms of contemporary learning settings (lecture-based environments, seminars and distance learning environments) and the use of constructivist activities in these settings will be addressed in this study. such comparative examination could demonstrate how different constructivist activities could be applied in various settings as well as challenge the positive effect attributed to constructivist-based environments on academic self-efficacy.

2. theoretical framework

2.1 the constructivist pedagogy

constructivism is a view of learning that perceives the individual as an active and responsible agent in his/her knowledge acquisition process (brooks & brooks, 1999).
this view is shared by cognitive constructivism and social constructivism. however, while cognitive constructivism is concerned with the individual's construction of knowledge, social constructivism stresses the collaborative processes in knowledge building (windschitl, 2002). these epistemological emphases are exemplified by bakhtin (1984, 1986). for bakhtin (1984), meaning is a product of dialogues: "truth is not born nor is it to be found inside the head of an individual person; it is born between people collectively searching for truth, in the process of their dialogic interaction" (p. 110). several essential factors of the social constructivist pedagogy are indicated by theorists and practitioners (packer & goicoechea, 2001; popkewitz, 1998; steffe & gale, 1995). these features may be grouped around three key tenets of the constructivist learning environment in line with de kock et al.'s (2004) classification: constructive activity, teacher-student interaction and social activity, as further described below. the first tenet (constructive activity) pertains to the process of 'learning to learn'. this principle is based on several educational practices. first is the idea that learning occurs during sustained participation in inquiry practices focused on the advancement of knowledge. this process consists of a so-called predict-observe-explain procedure (white & gunstone, 1992), in which learners hypothesise, test their hypothesis, explain observations as a way of verifying the hypothesis, and later discuss discrepancies between the hypothesis and the outcome. in this format, learners participate throughout the lesson by predicting, observing and explaining the learning process. in this process, learners are required to actively make meaning from information; they cannot be passive consumers of the conceptualisations, analyses and conclusions of others.
however, although university teaching is claimed to have a special task to support students in adopting ways of thinking and producing new knowledge anchored in scientific inquiry practices (gellin, 2003; resnick, 1987), stahl (2011) argues that students' habits of learning are still overwhelmingly skewed toward passive acquisition of knowledge from authority sources rather than from collaborative inquiry activities. authenticity is another dimension of the constructive activity tenet. authentic experiences allow the individual to construct mental structures that are viable in meaningful situations. since learning is contextual, knowledge construction should occur in situations that are real rather than contrived (dolittle & camp, 1999). situating learning in a real-world task ensures that learning is personally interesting, and provides the students with opportunities to think at the level of sophistication they are likely to encounter in the real world (erstad, 2011). lahn (2011) maintains that more attention should be paid to contextual variables that provide learners with a wide range of authentic experiences, and to scaffolds that support an effective reorganisation of knowledge, while conceiving learners as active designers of their learning environment. providing multiple perspectives and representations of content is another dimension of the constructive activity tenet. constructivist learning encourages the student to examine a phenomenon from several points of view (perspectives). when students are able to examine an experience from multiple perspectives, their understanding and adaptability are increased. in this process they are forced to go beyond everyday ethical contemplation by developing dialogue and multiple perspectives as well as drawing on available resources (lund & hauge, 2011). this practice provides students with multiple opportunities to develop a more viable model of their learning and social experiences (dolittle & camp, 1999).
another dimension of the first tenet (constructive activity) refers to the idea that content and skills should be understood within the framework of the learner's prior knowledge (dochy & alexander, 1995). teachers should be able to ascertain their students' prior knowledge and teach accordingly. by understanding the student's mental structures, teachers can clarify incomplete or erroneous prior knowledge, determine the method of instruction necessary in a particular topic area, create effective experiences and plan independent activities, and assess materials adapted to the student (meyer, 2004). teachers should also create environments for teaching and learning that are decompartmentalised, by integrating individual, social and institutional processes, as stressed by minick et al. (1993): "...one cannot develop a viable socio-cultural conception of human development without looking carefully at the way these institutions develop, the way they are linked with one another, and the way human social life is organised within them" (p. 6). hence, contrary to the traditional ideology of teaching and learning, which relies mainly upon learning opportunities that are the mere “spelled out” transmission of dominant knowledge, according to the new interdisciplinary approach, experiences retrieved from the past could offer mediations to decipher present experience, and lessons learned from prior inquiry could be turned towards a creative future (perret-clermont & perret, 2011). this approach is considered an efficient way to help teachers and learners deal with acquiring knowledge that grows at an exponential rate within change processes (jacobs, 1989). the second tenet (teacher-student interaction) is one of the main conceptual pillars of the constructivist pedagogy.
this principle stresses the self-regulated learner, and the shift from external control over the learning process, as used in conventional and well-structured learning settings, to the student's internal control of learning. in these processes, students should be encouraged to become self-regulatory, self-mediated, and self-aware (de kock et al., 2004). students are given opportunities to actively engage in self-regulated learning processes, including setting their own goals, mediating new meanings from existing knowledge, and forming an awareness of current knowledge structures (hakkarainen et al., 2002). the teacher's role is to engage students in self-regulated learning, often referred to as meta-cognition (brown, 1987), and to encourage students to set their own goals while emphasising collaboration and negotiation. the teacher should also provide scaffolding during the learning process, while encouraging and guiding students to reflect on their own learning processes, rather than acting as a knowledge conduit (järvelä, hurme, & järvenoja, 2011). king (2002) describes this learning as a deliberate process during which learners focus on their performance and think carefully about the thinking that led to particular actions, what happened and what they are currently learning from the experience, in order to perform better in the future.

according to the final tenet (social activity), learning is a social activity in which individual learning processes are affected by personal characteristics as well as by external social factors, and meaning is constructed from the interaction between existing knowledge and social situations (vygotsky, 1978). this principle highlights the cooperative nature of the learning process, which is aimed at fostering dialogic thinking (schwarz, 2009; schwarz & de groot, 2011; wegerif, 2007). the dialogic interpretative framework implies that pedagogic practices should be able to sustain more than one perspective simultaneously.
this pedagogy has been described by wegerif and de laat (2011) in terms of moving learners into the space of dialogue. this process includes the promotion of communities of enquiry and dialogue skills through the use of forums of alternative voices, and the induction of students into real dialogues across cultural differences. järvelä et al. (2011) maintain that successful engagement in such collaborative and dialogic learning involves core processes of self-regulated learning, effective use of learning strategies to participate in collaborative interactions, meta-cognitive control, and regulation of motivation and emotions.

2.2 features of constructivist learning in higher education environments

although the conventional lecture form has been consistently associated with traditional one-way instruction, based on objectivist philosophical assumptions, nave (1991) suggests that several constructivist activities could be implemented in university lecture based settings. she distinguishes a conventional lecture from an 'open-text' lecture. in a conventional lecture, learners simply absorb new materials, without being allowed to raise questions. in contrast, an 'open-text' lecture allows the teacher to manoeuvre his/her way from time to time, present the material from multiple points of view, and use varied examples which are relevant to the students' world. during these activities, teachers can promote dialogic processes in the classroom. nave (1991) maintains that this complex and challenging approach necessitates qualified teachers who have the special skills required for this 'open-text' instructional design.

another higher education environment is distance learning, defined as a planned activity that occurs in a different place from the teacher, far from the designated learning place, using special techniques for designing online courses (barak & dori, 2009).
the philosophy of constructivism seems to have crucial implications for learning and instructional design in distance learning settings. in the neo-vygotskian sociocultural theory, technology is seen as a facilitator of dialogic spaces where students can use networks for creative learning (wegerif & de laat, 2011). with the rapid growth of distance learning courses, it seems worthwhile to examine how distance learning settings support the use of constructivist activities.

an additional learning environment, based on the constructivist pedagogical approach, is the research-based seminar. seminars include intense study relating to the student's major, typically have significantly fewer students per professor than normal courses, and are generally more specific in topic of study. these settings are conceived as excellent ways by which a community of learners could be built, interdisciplinary research-based (i.e. inquiry-based) settings could be promoted, and student-centred activities, where students themselves could take a key role in creating the research/learning link, could be fostered (lueddeke, 2003).

despite the theoretical appeal of comparing traditional learning environments with constructivist based environments, empirically based studies are few. for example, tynjälä (1999) showed how students in a constructivist learning environment acquire more diversified knowledge when compared with students in a traditional teaching setting. however, the potential differences between various forms of contemporary learning settings and the assessment of the use of constructivist activities in these settings are yet to be explored. such comparative examination could demonstrate how different constructivist activities could be applied in various settings.

2.3 academic self-efficacy

an important psychological outcome addressed in previous research concerning constructivist teaching and learning is academic self-efficacy (bandura, 1977, 1986).
studies have stressed that academic self-efficacy is a positive predictor of academic achievement (carroll et al., 2009), and of self-motivation for academic attainment (bandura, 1997); measuring the potential contribution of different learning environments to this psychological outcome is therefore of importance. academic self-efficacy refers to personal judgements of one's ability to succeed at an academic task on a designated level or to attain a specific academic goal (bandura, 1997; linnenbrink & pintrich, 2002). accordingly, self-efficacy competence includes behavioural actions as well as the cognitive skills necessary for performance in a specific domain, and has been defined as "an individual's confidence in their ability to organise and execute a given course of action to solve a problem or accomplish a task" (eccles & wigfield, 2002, p. 110). according to bandura (1997), learners with the same level of cognitive skill development could differ in their intellectual performances due to the strength of their perceived self-efficacy. previous studies (dorman & adams, 2004; dorman et al., 2006; loyens, rikers, & schmidt, 2008; van dinther, dochy, & segers, 2011) link self-efficacy competence to the psychosocial learning environment that students experience in their schools and classrooms, and report a consistent contribution of the constructivist learning environment to students' academic self-efficacy. donche, coertjens, van daal, de maeyer and van petegem (2014) showed how academic self-efficacy has a positive direct effect on first year university students' deep learning engagement. dorman and adams (2004) suggest that the potential of the constructivist learning environment in explaining academic self-efficacy should be recognised.
2.4 the present study

this study aims, first, to map features of actual constructivist learning instantiations in higher education settings; second, to construct and validate a new scale for measuring those features; third, to assess the implementation of constructivist features in different higher education settings; and fourth, to measure their effect on self-efficacy for learning. this study's main research questions were formulated as:

q1. to what extent do students' perceptions of the presence of constructivist learning practices in their classes contribute to their academic self-efficacy? which perceived constructivist practices are connected to students' academic self-efficacy?

q2. which learning environment sufficiently reflects an assemblage of constructivist tenets, and promotes academic self-efficacy?

figure 1 demonstrates the theoretical structure of the proposed theoretical framework.

figure 1. model 1. the theoretical structure of the proposed framework.

3. method

a mixed qualitative and quantitative research method, applied in three phases, was used to address the research aims and questions. creswell (2007) emphasised the superiority of a mixed-method research design in exploratory research. this method builds upon the synergy that exists along the qualitative-quantitative research continuum, thus reinforcing research construct validity and expanding the understanding of an explored phenomenon.

3.1 phase 1

the first phase was aimed at gathering and analysing classroom observational activities as experienced by students, in order to learn about actual instantiations of the theoretical constructivist features. this phase used a qualitative methodology to analyse the gathered materials according to the categorical scheme suggested by theory, while allowing for the identification of additional meaningful categories.
3.1.1 participants and material gathering procedure

phase 1 included 62 undergraduate third-year students from one major college in israel (12.5% male students, 84.6% female students). their distribution with respect to faculties was as follows: education – 15 students, criminology – ten students, sociology – 12 students, management – four students, economy – five students, behavioural sciences – eight students, political sciences – four students, and communication – four students. participants were asked to keep observation diaries of their learning activities in one of the following courses: a seminar (sm), a lecture based environment course (lbe) or a distance learning environment course (dle). since the following analysis procedure involved both deductive and inductive category applications, a prescribed general format of the diary was given, and three theoretical foci were suggested to assist observations: learning activity, teacher-student interaction and social activity. there was also a self-reflection section in the diary.

3.1.2 analysis of the study materials

in line with the deductive approach, a categorical scheme suggested by the theoretical perspective was defined (see the independent variable shown in fig. 1). the inductive approach allowed identifying additional meaningful categories. according to strauss (1987), both these aspects of inquiry are absolutely essential throughout the analysis. thus, both logically derived categories and those that have "serendipitously" arisen from the data may find their way into the research (merton, 1968). students' observations were analysed by four raters, all of them experts in the research area of constructive learning. inter-rater cohen's kappa (k) reliability (cohen, 1960), which is commonly assessed in psychological research, was used. the raters were asked to categorise the students' observation reports according to the theoretical scheme.
the k values were interpreted as follows: k < 0.20 poor agreement; 0.21 < k < 0.40 fair agreement; 0.41 < k < 0.60 moderate agreement; 0.61 < k < 0.80 good agreement; 0.81 < k < 1.00 very good agreement. results of 0.61 < k < 1 were considered acceptable for the purposes of the current study. the raters were also asked to report on newly identified categories.

3.2 phase 2: questionnaire development

this phase was aimed at developing a questionnaire that could assess constructivist activities in various educational settings. the students' descriptions gathered in the qualitative research (phase 1) were formulated as short items by three instructional design experts in the research area of constructive learning. for example, the following description of dle: "assignments were given during this course on moodle (modular object-oriented dynamic learning environment). this allowed me preparing the required work when i chose to; i could progress at my own pace" was phrased as: 'in this course, the teacher considered my learning pace' (c12). each item was given a likert-type score ranging from 1 = not at all true to 5 = completely true. consequently, a 41-item scale was submitted to 78 undergraduate third-year students in order to assess the clarity of the items. accordingly, five items were excluded due to unclear phrasing. the new scale (hereinafter: constructivist learning in higher education settings scale [clhes]) included 36 items.

3.3 phase 3

this quantitative phase was used to validate the developed questionnaire by using structural equation modelling (sem) (bentler, 2006; mcdonald & ho, 2002). in addition, since previous studies have consistently linked students' academic self-efficacy to constructivist learning settings, this psychological outcome was chosen as a criterion variable to further assess construct validity of the new scale. an additional aim of this phase was to test the research questions.
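the inter-rater agreement procedure of section 3.1.2 can be illustrated with a minimal sketch. it assumes that agreement is computed for one pair of raters at a time (the text names cohen's kappa but does not say how the four raters were paired), and that boundary values fall into the lower band, since the published bands leave small gaps between them. the function names and the example codings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per report."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of reports")
    n = len(rater_a)
    # Proportion of reports on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def interpret_kappa(k):
    """Map kappa onto the agreement bands listed in the text.

    Assumption: a value on a boundary (e.g. exactly 0.20) goes to the lower band.
    """
    if k <= 0.20:
        return "poor"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "good"
    return "very good"


# Hypothetical codings of eight diary reports by two raters (illustration only).
rater_a = ["authenticity", "authenticity", "prior knowledge", "social activity",
           "authenticity", "prior knowledge", "social activity", "authenticity"]
rater_b = ["authenticity", "authenticity", "prior knowledge", "social activity",
           "authenticity", "prior knowledge", "authenticity", "authenticity"]

kappa = cohen_kappa(rater_a, rater_b)
label = interpret_kappa(kappa)
acceptable = kappa >= 0.61  # acceptance threshold stated in the text
print(f"kappa = {kappa:.2f} ({label} agreement), acceptable: {acceptable}")
```

with more than two raters, one option is to report the kappa of every rater pair; whether the study did so is not stated.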
3.3.1 the criterion variable: academic self-efficacy

an eight-item (g1 – g8) scale derived from the motivated strategies for learning questionnaire (mslq) (pintrich, smith, garcia, & mckeachie, 1993) was used to assess perceived academic competence in the students' learning environments. the mslq was originally designed to measure college undergraduates' motivation and self-regulated learning perception and learning strategies. the mslq is modular and thus allows the sub-scales to be used separately, as was the case in the present study, which used only the academic self-efficacy sub-scale. all items were scored on a 5-point likert scale with anchors of 1 = strongly disagree to 5 = strongly agree. a sample item is 'i'm certain i can master the skills being taught in this course.' (cronbach's alpha = 0.89).

3.3.2 participants

the clhes and mslq were submitted to 597 undergraduate third-year students (15.4% males and 84.6% females) from one major college in israel, of whom 37.5% were jewish students and 62.5% muslim students, with a mean age of 24.5 (sd = 4.7) years. based on the report of the central bureau of statistics (2011) and the council for higher education (2009) in israel, the gender and ethnicity breakdown of northern galilee college students, majoring mainly in social sciences studies, is 20% males and 80% females, of whom 40% are jewish, 55% muslim, and 5% belong to other religions. thus, the current study's sample represents, to some extent, the gender and ethnicity breakdown of regional colleges located in the northern galilee. the distribution of the participants with respect to course settings (course groups) was as follows: 29.1% lbe students (enrolled in three randomly selected courses), 40.2% seminar course students (sm) (enrolled in eight randomly selected courses), and 30.7% dle students (enrolled in three randomly selected courses).
the sample reflected the faculty enrollment breakdown of the campus, composed as follows: education – 63%, criminology – 12.8%, sociology – 7.9%, management – 7.5%, economy – 4.3%, behavioural sciences – 2%, political sciences – 1.9%, and communication – 0.6%.

3.3.3 procedure

the clhes was administered to the participants near the end of their courses in the second semester of the third year of studies. the students were told that the purpose of the study was to examine their perceptions of the course. prior to obtaining participants' consent it was specified that the questionnaires were anonymous and that no pressure would be applied should they choose to return the questionnaire unfilled or incomplete (the overall response rate was 87%; 34 questionnaires were excluded due to incomplete response). finally, participants were assured that no specific identifying information about the courses would be processed. the scale items were originally generated in hebrew, and were translated into english and back translated by professional editors for the purpose of this paper.

4. findings

4.1 phase 1. qualitative study results

table 1 presents the categories and several examples from the students' reports. in line with the theoretical framework, five categories have been recognised from the reports: knowledge construction, authenticity, multiple perspectives, prior knowledge and teacher-student interaction. an additional category of in-depth learning has emerged from the analysis. moreover, the theoretical category of social activity has been divided into two distinctive sub-categories: social interaction and cooperative dialogue, as further described below:

1) knowledge construction is described as multiple opportunities given to students to investigate real problems, raise questions and search for possible explanations while using various methodological approaches.
2) in-depth learning pertains to the extent to which students are given opportunities to deeply explore a certain subject matter, rather than being engaged in surface learning.
3) authenticity deals with giving relevant meaning to the learned concepts and addressing real life and interesting events which are related to the studied topic.
4) the multiple perspectives category refers to presenting complex ideas from several points of view.
5) prior knowledge primarily deals with connecting the subject materials to other courses' topics.
6) teacher-student interaction refers to the teacher's role, which includes guidance toward reflection on learning processes.
7) social interaction includes a variety of learning activities with other students, not necessarily during a lesson.
8) cooperative dialogue refers to dialogical activities during the lesson in which students can express opinions and original ideas.

it can be learned from table 1 that the pedagogical principles introduced in the theoretical framework and in the analysis were associated with various course formats. for example, the following report shows how authentic real life examples are integrated in a lecture based course: "this course, entitled 'social roles', deals with the family life span, especially with men's and women's roles in different societies, for example, conflict situations within the family. the examples given in class reflect real situations from our daily life." a reversed description (rv) is a report in which students describe a lack of a constructive related activity in the learning environment. for example, the following report exemplifies how the teacher does not implement dialogical activities during a lecture based lesson: "when students want to comment on a specific issue that has been taught in class, the teacher explains that they have no right to do so, since "much better scholars than them have investigated the issue". eventually, everyone silently obeys the teacher."
table 1. categories and examples from students' reports
note: seminars (sm), lecture based environments (lbe), distance learning environments (dle), reversed description (rv)

category: knowledge construction

in this course we have investigated an interesting issue related to parents' empowerment in educational processes, with relation to different cultural needs. this inquiry required interviewing parents; some of them were parents of children with special needs. we also interviewed educational teams in order to find ways to enrich parental involvement in schools and communities. (lbe)

i want to explore how teenagers from different cultures experience their adolescence period. in order to find an answer to my question, i have to interview parents from different ethnic groups. (sm)

this course involved a field work. we went to kindergartens in our city and explored how different theoretical approaches can be applied in real situations. the conclusions of our experiences were later discussed in the class. (lbe)

students have presented their research work in class. they have described the whole process from the start: stated their research question, described the preferred methodology, presented the data analysis, research findings and conclusions. (sm)

category: in-depth learning

this course required preparing a project regarding the skills of the school counsellor. this was really an intensive work that included a deep study of this topic. (sm)

the teacher shows us power point presentations loaded with complex figures i cannot understand. he moves from one topic to another, sometimes i really get confused. (rv)(lbe)

the main goal [of this course] is the final exam. we study in order to pass the exam. there was no enriching beyond the concepts required for the exam. we could not ask questions during classes in order to deepen our understanding, since "there is no time for questions". (rv)(lbe)

sometimes i get very interested in a subject raised by the teacher, at this point, disappointedly, she moves on to another subject. i feel that the quantity is much more important for her than the quality. (rv)(lbe)

category: authenticity

the teacher uploads assignments to the course website. these assignments concern current educational issues. we are also required to search for news and to find items regarding the studied material. (dle)

this course, entitled 'social roles', deals with the family life span, especially with men's and women's roles in different societies, for example, conflict situations within the family. the examples given in class reflect real situations from our daily life. (lbe)

one of the requirements of this course was conducting a research assignment related to problems which arab women are confronted with when leaving their close environment sphere towards academic studies, and the obstacles they encounter when they get back to their villages to work. this is an interesting issue; i was highly motivated to take part in this investigation. (sm)

one of the topics was the history of the maccabiah [an international jewish athletic event]. we have studied the subject through protocols of interviews with past athletes, newspapers articles and stories related to the history of this event. (lbe)

category: multiple perspectives

the subject of this lesson was 'sexual assault'. each student could present his or her attitude. different perspectives were brought up by the students. one of them argued that women "bring it upon themselves" and should dress in a more modest manner. others disagreed and argued that religious girls in their villages, although dressed by the religious code, were sexually abused. (lbe)

in this course we talk about different codes of norms of several religions: jewish, muslim, christian and druze. at first, every student introduced his/her tradition regarding the dressing code, then, we asked each other questions regarding for example, the origin of these codes, and the obstacles arise within a multicultural society with relation to these codes. (lbe)

in this lesson we have discussed the subject of 'egalitarian division of labour within the family'. some female students were against the idea of equal sharing, one of them argued that her husband is working hard and this is enough labour for him, and that from her point of view women should take care for domestic issues only. other students strongly opposed this position. maybe their different cultures affect their point of view. (lbe)

category: prior knowledge

the main topic dealt with the transition to parenthood. this subject was related first, to my previous experience as a mother, and second, to many subjects such as psychology, childhood era, conflicts in the family, which i have learned during the past year. (lbe)

in this lesson we learned about ethics in research. the teacher showed us videos of the milgram's experiment on obedience to authority figures. i have learned about this experiment in a psychology related course earlier this year, however, this moral perspective has broadened my knowledge. (lbe)

one of the discussed topics was on unmarried couples who choose to have a parenting agreement. this issue raised many important aspects that were related to several course materials i had previously studied, such as: parents and parenting, the child's security and needs. (lbe)

category: teacher-student interaction

one of my assignments was to present a theme with relation to the studied material. the teacher encouraged me to search for papers, she has given me a general guidance on how and where to find scientific materials related to my subject. (lbe)

assignments were given during this course on moodle (modular object-oriented dynamic learning environment). this allowed me preparing the required work when i chose to; i could progress at my own pace. (dle)

the teacher knows every single student by his/her name. she always encourages me. after my class presentation, she sent me an email in which she had appreciated my progress and added some comments on how to improve my learning process. (sm)

in this course the assignments are given in a way which allows me to organise my schedule in a flexible manner. (dle)

category: social interaction

during this course arab and jewish students have cooperated on multiple occasions. for example, the hebrew language is very difficult for non-native speakers, so in many occasions during a cooperative in-class or out-class work, jewish students helped arab students correcting spelling mistakes and improving oral presentations. (lbe)

i have kept downloading materials from the website, nothing else was needed. i was not required to work with others, frankly, i did not know the students participating in the course. (rv)(dle)

the teacher encourages us to use the forum. she raises questions and asks us to comment and hold a debate. however, in practice, it seems that many students invest their time in answering her questions, and do not pay any attention to students' comments. (rv)(dle)

category: cooperative dialogue

the discussed subject was conflict in the family with relation to the "coming out of the closet" issue. a female student shared her private experience in this context with us. people got excited, students in this class come from different cultures, some of them religious, and therefore very different voices were heard. (lbe)

although defined as a lecture based course, discussions were held in every lesson. for example, the jewish ancient law of halitza was discussed. according to this law, a jewish widow would need to marry her brother-in-law unless he freed her in a ceremony known as halitza. many students wished to say something about it. some argued that this ceremony is no longer valid even in orthodox communities. others suggested that this is another example of an anti-feminist reality imposed by religion. through these dialogues i have become more interested in the studied material. (lbe)

when students want to comment on a specific issue that has been taught in class, the teacher explains that they have no right to do so, since "much better scholars than them have investigated the issue". eventually, everyone silently obeys the teacher. (rv)(lbe)

4.2 phase 2. descriptive statistics, internal consistency and construct validity of the clhes

table 2 presents the clhes factors, sub-factors, item descriptions (as derived from phase 2) and internal consistencies (cronbach's alpha). items 10, 25, 30 were excluded from the analysis due to low item loading results (< .30) found in the structural equation modelling (fig. 2). each of the eight factors showed a very high internal consistency. table 3 provides descriptive statistics for the clhes factors (n = 597). table 4 displays the bivariate correlation analysis results among the clhes factors and between these factors and the academic self-efficacy criterion variable. convergent validity has been shown by positive statistically significant correlations between all factor pairings. that is, the measures of the constructivist factors that are theoretically related to each other are in fact observed to be related to each other. the generally moderate correlations among the dimensions suggest that the factors are, to some extent, independent of each other. finally, as can be learned from table 4, the correlation coefficients between the clhes factors and the academic self-efficacy variable are lower than the coefficients among the constructivist factors. therefore, discriminant validity of the clhes scale may be confirmed. these conditions were posited by campbell & fiske (1959) as evidence supporting construct validity.
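the comparison behind the discriminant validity argument can be made concrete with a small sketch. it assumes that the comparison is made on average coefficients (the text does not say whether the comparison is pairwise or on average), and it uses only the coefficients reported in table 4; the variable names are illustrative.

```python
from statistics import mean

# Correlations among the eight clhes factors (upper triangle of table 4).
among_factors = [
    0.775, 0.557, 0.589, 0.409, 0.589, 0.458, 0.495,  # a1 with a2, a3, a4, a5, f2, h1, h2
    0.604, 0.623, 0.497, 0.663, 0.501, 0.465,          # a2 with a3, a4, a5, f2, h1, h2
    0.686, 0.535, 0.623, 0.435, 0.455,                 # a3 with a4, a5, f2, h1, h2
    0.533, 0.628, 0.488, 0.520,                        # a4 with a5, f2, h1, h2
    0.546, 0.423, 0.380,                               # a5 with f2, h1, h2
    0.457, 0.436,                                      # f2 with h1, h2
    0.595,                                             # h1 with h2
]

# Correlations of each factor with academic self-efficacy (last column of table 4).
with_self_efficacy = [0.336, 0.364, 0.309, 0.302, 0.328, 0.385, 0.286, 0.291]

mean_among = mean(among_factors)
mean_criterion = mean(with_self_efficacy)

print(f"mean correlation among factors:          {mean_among:.3f}")
print(f"mean correlation with self-efficacy:     {mean_criterion:.3f}")
print(f"range among factors:      {min(among_factors):.3f} to {max(among_factors):.3f}")
print(f"range with self-efficacy: {min(with_self_efficacy):.3f} to {max(with_self_efficacy):.3f}")
```

printing the ranges alongside the means lets a reader see how far the two sets of coefficients are separated.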
table 2. the clhes questionnaire: factors, sub-factors, item descriptions and internal consistencies (cronbach's alpha)

constructive activity (f1), knowledge construction (a1); five items; cronbach's alpha = .93
c1. in this course, i was given opportunities to investigate real problems
c2. during this course, i was given opportunities to raise questions about complex problems
c3. during this course, i was given opportunities to search for possible explanations for real problems
c4. i was asked to analyse data regarding a significant problem i have raised during this course
c5. during this course, i was asked to draw conclusions from a research work, in which i have participated

constructive activity (f1), in-depth learning (a2); four items, item c10 was omitted due to a low loading result; cronbach's alpha = .87
c6. in this course, i have learned skills with which i can deeply explore a subject of interest to me
c7. i could examine in depth a major issue in this course
c8. in this course, i have focused on a central subject which i was required to deeply understand
c9. in this course, i have learned how to deeply investigate a certain subject
c10. in this course, we "jump" from one subject to another without examining any subject in depth*

constructive activity (f1), authenticity (a3); five items; cronbach's alpha = .87
c16. this course addressed interesting situations in reality
c17. the course focused on giving relevant meaning to the learned concepts
c18. the course addressed real life and interesting events
c19. the course was rich with real-life examples that interested me
c20. the course did not address real life examples*

constructive activity (f1), multiple perspectives (a4); four items, item c25 was omitted due to a low loading result; cronbach's alpha = .81
c21. in this course, ideas were presented from several points of view
c22. i have learned about complex real issues in this course
c23. i have realised that the reality is complex and multi-dimensional, in this course
c24. in this course, i had to question and criticise accepted ideas
c25. in this course, ideas were presented from only one perspective, and were not allowed to be criticised*

constructive activity (f1), prior knowledge (a5); four items, item c30 was omitted due to a low loading result; cronbach's alpha = .85
c26. this course dealt with subjects i have learned in other courses
c27. the subjects learned in this course were related to prior knowledge i have gained
c28. things i have learned in this course have helped me understand issues i have learned in other courses
c29. the subjects in this course were related to diverse contents of knowledge
c30. the subjects in this course were not related to other things i have learned in other courses*

teacher-student interaction (f2); five items; cronbach's alpha = .91
c11. in this course, the teacher allowed me to think about my learning and how to improve it
c12. in this course, the teacher considered my learning pace
c13. in this course, i could set myself some learning goals
c14. in this course, the teacher encouraged me to think about my learning and ways to improve it
c15. in this course, the teacher made me think about the advantages and disadvantages of my learning

social activity (f3), social interaction (h1); three items; cronbach's alpha = .88
c31. this course included a variety of learning activities with other students
c32. i was given opportunities to learn with other students in this course
c33. i could collaborate with other students in this course

social activity (f3), cooperative dialogue (h2); three items; cronbach's alpha = .89
c34. arguments and discussions were held during this course
c35. it was possible to express original ideas in this course
c36. in this course, i could express my opinion, even when it was different from other students

* reversed items
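the internal consistencies in table 2 are cronbach's alpha values. the sketch below shows one common way to compute alpha for a sub-factor, after recoding a reversed item on the 1 to 5 likert scale. it assumes the standard formula with sample variances; the source does not state which software or variant was used, and the responses below are invented for illustration only.

```python
def reverse_score(score, low=1, high=5):
    """Recode a reversed item so that a high score again means 'more constructivist'."""
    return low + high - score


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [sample_variance([row[i] for row in rows]) for i in range(k)]
    total_variance = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of six students to three items; the last item is
# worded negatively (like the items marked * in table 2).
raw = [
    [5, 4, 1],
    [4, 4, 2],
    [3, 3, 3],
    [2, 3, 4],
    [4, 5, 1],
    [1, 2, 5],
]
recoded = [[a, b, reverse_score(c)] for a, b, c in raw]
alpha = cronbach_alpha(recoded)
print(f"cronbach's alpha = {alpha:.2f}")
```

recoding must come first: leaving a negatively worded item unreversed lowers alpha, because that item then correlates negatively with the rest of the sub-factor.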
table 3
descriptive statistics for the clhes measured factors

factor | mean | sd | skewness | kurtosis
knowledge construction (a1) | 3.11 | 1.11 | -0.31 | -0.815
in-depth learning (a2) | 3.41 | 0.99 | -0.54 | -0.26
authenticity (a3) | 3.59 | 0.86 | -0.76 | 0.43
multiple perspectives (a4) | 3.41 | 0.79 | -0.54 | 0.40
prior knowledge (a5) | 3.42 | 0.87 | -0.62 | 0.36
teacher-student interaction (f2) | 3.33 | 0.95 | -0.55 | -0.24
social interaction (h1) | 3.13 | 1.09 | -0.35 | -0.62
cooperative dialogue (h2) | 3.48 | 0.99 | -0.63 | 0.02

table 4
bivariate correlation matrix for the eight factors of the clhes scale and academic self-efficacy

factors | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | academic self-efficacy
1 knowledge construction (a1) | - | .775** | .557** | .589** | .409** | .589** | .458** | .495** | .336**
2 in-depth learning (a2) | | - | .604** | .623** | .497** | .663** | .501** | .465** | .364**
3 authenticity (a3) | | | - | .686** | .535** | .623** | .435** | .455** | .309**
4 multiple perspectives (a4) | | | | - | .533** | .628** | .488** | .520** | .302**
5 prior knowledge (a5) | | | | | - | .546** | .423** | .380** | .328**
6 teacher-student interaction (f2) | | | | | | - | .457** | .436** | .385**
7 social interaction (h1) | | | | | | | - | .595** | .286**
8 cooperative dialogue (h2) | | | | | | | | - | .291**
** p < .01

4.3 phase 3
4.3.1 testing the first research question
structural equation modelling (sem) was employed to test the first research question (q1), and to further assess the construct validity of the clhes, using a confirmatory factor analysis. data used for the sem were analysed with the maximum likelihood method. three fit indices were computed in order to evaluate model fit: χ²(df) (p > .05), cfi (> 0.9), and rmsea (< 0.08). the structural model (fig. 2) refers to the combined measurement and path models.
the measurement model includes the following factors. first, the constructive activity (f1) latent variable comprises five latent variables: knowledge construction (a1) with five observed items (c1–c5); in-depth learning (a2) with four observed items (c6–c9); authenticity (a3) with five observed items (c16–c20); multiple perspectives (a4) with four observed items (c21–c24); and prior knowledge (a5) with four observed items (c26–c29). second, the teacher-student interaction (f2) latent variable is measured by five observed variables (c11–c15). third, the social activity (f3) latent variable comprises two latent variables: social interaction (h1) with three observed items (c31–c33) and cooperative dialogue (h2) with three observed items (c34–c36). the path model was constructed as follows: three paths were specified between the latent factors f1–f3 and the criterion latent variable of academic self-efficacy (se), which was measured by eight observed items (g1–g8). the model showed sufficient fit to the data (χ² = 2079.36, df = 766, p < .001; cfi = .926; rmsea = .054). the results showed low but significant positive coefficients between the teacher-student interaction (f2) factor and the criterion variable of academic self-efficacy (β = .23, p < .01) and between the social activity (f3) factor and the criterion variable (β = .22, p < .05). the coefficient between the constructive activity (f1) factor and the dependent variable was not significant. as shown in fig. 2, the clhes factors together explained 36% of the variance in academic self-efficacy.
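as a rough consistency check on the fit statistics reported above, the rmsea can be recomputed from χ², df and the sample size. the sketch below assumes the common point-estimate formula sqrt(max(χ² − df, 0) / (df · (n − 1))) and takes n = 597 from the caption of figure 2. the function names are illustrative, and sem software may follow slightly different conventions (for example n instead of n − 1), so this is not necessarily how the reported value was obtained.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its df and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(cfi, rmsea_value, cfi_min=0.90, rmsea_max=0.08):
    """Apply the CFI and RMSEA cut-offs stated in section 4.3.1."""
    return cfi > cfi_min and rmsea_value < rmsea_max


# Values reported in the text; n is taken from the figure 2 caption.
chi2, df, n, cfi = 2079.36, 766, 597, 0.926

estimate = rmsea(chi2, df, n)
print(f"chi2/df = {chi2 / df:.2f}")
print(f"rmsea   = {estimate:.3f}")
print("meets cfi and rmsea cut-offs:", meets_cutoffs(cfi, estimate))
```

under these assumptions the recomputed rmsea rounds to the reported .054, and both cut-offs are met.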
4.3.2 testing the second research question
in order to test the second research question (q2), a multivariate analysis of covariance (mancova) with bonferroni pair-wise comparisons and wilks' lambda criterion was applied to characterise differences between the course groups (lbe, sm and dle) on a linear combination of the eight dependent factors of the clhes. in addition, an analysis of covariance (ancova) with bonferroni pair-wise comparisons was used to assess between-course group differences on the academic self-efficacy variable. gender (1 = male, 2 = female) and cultural group (1 = jewish, 2 = muslim) were entered as covariates to control for any significant confounding effect in the analyses of variance. table 5 shows the mean scores, standard deviations, f values, wilks' lambda and partial eta-squared statistics of the analyses. results indicated significant differences between the course groups on the combination of the clhes factors and on each factor separately. all the between-group differences were accompanied by moderate to large effect sizes, where small, moderate, and large effects correspond to ηp² values of .0099, .0588, and .1379, respectively (cohen, 1969, pp. 278–280; richardson, 2011, p. 142).

figure 2. the structural model, with standardised parameter estimates (n = 597). note: *p < .05 **p < .01 ***p < .001.

table 5
mean scores, sd, f values, wilks' lambda, partial eta-squared statistics (ηp²) and bonferroni pair-wise comparisons of the three course groups (lbe, sm and dle) on the eight clhes factors and the academic self-efficacy variable. the numbers of the pair-wise comparisons indicate: 1 = the lowest mean result, 2 = in between, 3 = the highest mean result; identical numbers indicate non-significant between-group differences.
dependent variables | sm m | sm sd | dle m | dle sd | lbe m | lbe sd | f | ηp²
factors of the clhes scale: wilks' lambda statistic (main effect) | | | | | | | 27.90*** | .277
anova | | | | | | | |
knowledge construction (a1) | 3.87 | 0.70 | 2.99 | 0.99 | 2.20 | 0.90 | 183.75*** | .384
pair-wise comparisons | 3 | | 2 | | 1 | | |
in-depth learning (a2) | 3.98 | 0.65 | 3.35 | 0.90 | 2.68 | 0.97 | 115.10*** | .281
pair-wise comparisons | 3 | | 2 | | 1 | | |
authenticity (a3) | 3.95 | 0.66 | 3.40 | 0.74 | 3.27 | 1.00 | 40.56*** | .121
pair-wise comparisons | 3 | | 1 | | 1 | | |
multiple perspectives (a4) | 3.71 | 0.67 | 3.35 | 0.71 | 3.06 | 0.87 | 34.06*** | .104
pair-wise comparisons | 3 | | 2 | | 1 | | |
prior knowledge (a5) | 3.68 | 0.73 | 3.43 | 0.83 | 3.04 | 0.95 | 24.92*** | .078
pair-wise comparisons | 3 | | 2 | | 1 | | |
teacher-student interaction (f2) | 3.72 | 0.79 | 3.32 | 0.83 | 2.83 | 1.01 | 45.06*** | .133
pair-wise comparisons | 3 | | 2 | | 1 | | |
social interaction (h1) | 3.38 | 1.04 | 3.41 | 0.94 | 2.49 | 1.04 | 39.40*** | .118
pair-wise comparisons | 3 | | 3 | | 1 | | |
cooperative dialogue (h2) | 3.82 | 0.81 | 3.32 | 0.99 | 3.18 | 1.08 | 24.91*** | .078
pair-wise comparisons | 3 | | 1 | | 1 | | |
covariate effect: gender | | | | | | | | .020
covariate effect: cultural group | | | | | | | | .067
academic self-efficacy | 4.07 | 0.04 | 3.90 | 0.05 | 3.76 | 0.05 | 10.69*** | .035
pair-wise comparisons | 3 | | 3 | | 1 | | |
covariate effect: gender | | | | | | | | .000
covariate effect: cultural group | | | | | | | | .020
note: * p < .05 ** p < .01 *** p < .001

as presented in table 5, salient between-group differences were indicated for the factors knowledge construction (a1) (ηp² = .384) and in-depth learning (a2) (ηp² = .281). on each factor, the lowest mean was found for the lbe group and the highest for the sm group. somewhat lower effect sizes were found for three factors: teacher-student interaction (f2) (ηp² = .133), with the lowest mean for the lbe group and the highest for the sm group; authenticity (a3) (ηp² = .121), with a significantly higher score for the sm group compared with the other groups; and social interaction (h1) (ηp² = .118), with the lowest mean for the lbe group and the highest means for the sm and dle groups.
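the effect-size benchmarks cited in section 4.3.2 (ηp² of .0099, .0588 and .1379 for small, moderate and large effects) can be applied mechanically to the ηp² column of table 5. the python sketch below does this under two assumptions that are not stated in the text: each benchmark is treated as the lower bound of its band, and values below the small benchmark are labelled "negligible". the function name and the dictionary layout are illustrative.

```python
# Benchmarks quoted in the text (Cohen, 1969; Richardson, 2011).
SMALL, MODERATE, LARGE = 0.0099, 0.0588, 0.1379


def effect_size_label(eta_p2):
    """Label a partial eta-squared value, treating each benchmark as a lower bound."""
    if eta_p2 >= LARGE:
        return "large"
    if eta_p2 >= MODERATE:
        return "moderate"
    if eta_p2 >= SMALL:
        return "small"
    return "negligible"  # label assumed here; the text gives no name for this band


# Partial eta-squared values copied from table 5.
table5_eta_p2 = {
    "knowledge construction (a1)": 0.384,
    "in-depth learning (a2)": 0.281,
    "authenticity (a3)": 0.121,
    "multiple perspectives (a4)": 0.104,
    "prior knowledge (a5)": 0.078,
    "teacher-student interaction (f2)": 0.133,
    "social interaction (h1)": 0.118,
    "cooperative dialogue (h2)": 0.078,
    "academic self-efficacy": 0.035,
}

for factor, value in table5_eta_p2.items():
    print(f"{factor}: {value:.3f} -> {effect_size_label(value)}")
```

with these cut-offs the eight clhes factors fall in the moderate or large bands and academic self-efficacy in the small band, in line with the description in the text.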
the smallest effect sizes were found for three factors: multiple perspectives (a4) (ηp² = .104) and prior knowledge (a5) (ηp² = .078), on each of which the lowest mean was found for the lbe group and the highest for the sm group; and cooperative dialogue (h2) (ηp² = .078), with a significantly higher score for the sm group compared with the other groups. regarding the academic self-efficacy variable, differences were found between the three groups, accompanied by a low effect size (ηp² = .035): the highest results were found for the sm and dle groups and the lowest for the lbe group.

5. discussion
the overarching goals of this study were to map features of constructivist learning environments and to construct and validate a new scale for measuring the presence of those features in different higher education settings, using a mixed-methods approach.

5.1 the qualitative analysis
consistent with previous theoretical research (de kock et al., 2004), this research revealed three key tenets of the constructivist learning environment: constructive activity, teacher-student interaction and social activity. regarding the constructive activity tenet, the results foregrounded five categories: knowledge construction, authenticity, multiple perspectives, prior knowledge, and in-depth learning. this research extends the literature by adding the sub-category of in-depth learning, which emerged from the content analysis. this facet pertains to the extent to which students are given opportunities to explore a subject matter deeply in order to gain a clearer understanding of the learning materials, in contrast to surface learning, which is confined to rote learning and memorising facts. although in-depth learning is not a new concept, this research has empirically demonstrated its relation to constructive activities in higher education settings.
moreover, the theoretical category of social activity has been divided into two distinct facets: cooperative dialogue and social interaction. social interaction includes a variety of learning activities with other students, not necessarily during a lesson, whereas cooperative dialogue refers to dialogical activities during the lesson in which students can express opinions and original ideas.

another finding of the qualitative research was that some constructivist pedagogical principles are associated with lecture-based courses. for example, according to the students' reports, teachers of lecture-based courses used real-life examples during their lectures. some students reported dialogical activities during lectures in which students could express opinions and original ideas. these findings were partially corroborated by the quantitative results, according to which lbe and dle were perceived by the students to be equally consistent with the authenticity and cooperative dialogue constructivist features. although the quantitative analyses revealed that lbe are generally less consistent with the other examined constructivist features than the sm and dle formats, these findings may imply that some constructivist features can be applied in lecture-based environments, in accordance with nave (1991).

5.2 the quantitative analysis phase – perceptions of the learning environments
the main result of this phase showed that students perceive sm learning environments as more constructivist than students in the other course groups (lbe and dle) perceive theirs. since sm settings are regarded as excellent ways of fostering constructivist activities (lueddeke, 2003), this finding could have been expected, and thus could further validate the new scale. additional findings showed that dle are generally perceived as more constructivist than lbe, and less constructivist than sm environments.
however, no differences were shown between dle and lbe in authenticity and cooperative dialogue activities. although technology is seen as a facilitator of dialogic spaces (wegerif & de laat, 2011), the findings of this research suggest that this practice is inadequately applied by teachers. researchers (e.g., östlund, 2008) argue that collaboration for learning can be difficult to guarantee in dle. in order to achieve this goal, learners should be encouraged to use the forum, and teachers should stimulate interaction by creating assignments in which the learners can be actively engaged in discussion. nonetheless, the factor social interaction, which includes a variety of learning activities with other students, was similarly applied in dle and sm, compared with lbe. this could suggest that students of dle courses tend to be more engaged in off-line cooperative activities than in on-line dialogues.

5.3 the quantitative analysis phase – academic self-efficacy and perceptions of the learning environments
additional important findings concern the criterion variable of academic self-efficacy. this study's empirical model indicates that stimulating meta-cognitive and reflective aspects of learning, through teacher-student interaction, could bolster the students' confidence in their ability to accomplish a task. studies indicate that students who develop strong academic self-efficacy beliefs are better able to manage their learning, and consequently are more likely to successfully complete their education and be better equipped for a variety of occupational options in today's competitive society (bandura, barbaranelli, caprara, & pastorelli, 2001). accordingly, this study suggests that educators should be aware of the importance of pursuing this affective outcome by motivating students to think reflectively about their own learning process.
through this process of evaluating their own performance as learners, students could become active participants in their development (king, 2002) and consequently, as suggested by this study, more confident in their ability to execute assignments.

the social activity factor was found to be the second positive predictor of academic self-efficacy. this factor deals with the need to encourage interaction and collaboration among students. interaction is perceived to be one of the most important components of the learning experience, in which students are given sufficient opportunities to express themselves and to share their own experiences with others (dewey, 1938; tenenbaum et al., 2001; vygotsky, 1978). a recent study shows that effective cooperative learning communities support knowledge acquisition (wyatt et al., 2010). the present research indicates that social interaction could also benefit academic self-efficacy. a plausible explanation could be that interactions with others allow learners to reflect on their own work and to make independent use of their results, and thus to perform more effectively, as suggested by vygotsky (1978) and bandura (1986). moreover, encouraging interaction and collaboration among students could have provided sufficient opportunities for students to observe other group members. such vicarious experience could be gained in collaborative assignments provided by the learning environment, and could affect students' perceptions of their own ability to perform (bandura, 1997). in addition, students who worked together could have been encouraged to share their views and evaluations of other students in their group. having them identify the strengths of others, rather than their weaknesses, might have benefited their self-efficacy beliefs (schunk & miller, 2002).
the present study stresses the importance of facilitating cooperative tutorial study groups not only in order to create a well-functioning environment, but also to nurture self-efficacious learners in higher education. it should be noted that, according to this study's results, both sm and dle courses were more positively associated with increased self-efficacy for learning than lbe courses. theoretically, this result could be explained by the firm contribution of the philosophy of constructivism to learning and instructional design in distance learning settings and research-based seminars (lueddeke, 2003; wegerif & de laat, 2011). empirically, this result could be explained by the emphasis of sm and dle on interpersonal interactions compared with the lbe courses, according to the participants' reports.

lastly, the factor constructive activity was not found to be significantly connected to the self-efficacy dependent variable. it could be inferred that the social interaction dimensions of the learning environments are more prominent in explaining self-efficacy for learning. nonetheless, the strong positive associations found between the three tenets of constructive activity, teacher-student interaction and social activity could suggest an indirect connection between constructive activities and academic self-efficacy through increased interpersonal interactions.

5.4 limitations and directions for future research
the first limitation is that the clhes scale constructed and validated in this study could be further elaborated. for example, this scale did not include characteristics of assessment as components of the constructivist learning environment. assessment is considered part of the fabric of classrooms to which students attach importance. assessment tasks that do not match student learning could lower students' confidence in successfully performing academic tasks (dorman et al., 2006).
thus, further research is needed to examine this mediator measure in relation to higher education. second, future research should also consider expanding the model tested here with additional variables that could be related to learning activities, such as psychological variables of academic motivation. these variables could be related to perceptions of the learning setting and to academic self-efficacy; assessing them in conjunction with the constructs examined in the present study is therefore important and could allow additional effects of constructivist environments on psychological constructs to be measured. the third limitation concerns the cross-sectional nature of the data, which prevents definitive statements about causality. definitive proof of mediation would also require longitudinal data (cole & maxwell, 2003). it should be further acknowledged that alternative models might explain the relationships in these data as well as the one tested in this study. in fact, many relationships in the model are likely reciprocal. for example, although the analysis implies that the self-efficacy construct is mainly informed by the teacher-student interaction factor, it is equally plausible that teachers become more involved with self-efficacious students. despite such possibilities, the path model could represent a reasonable, theoretically grounded structure of the relations between the examined factors. however, researchers should extend this work with longitudinal designs. lastly, this study was conducted in a single country, meaning that the results cannot necessarily be generalised. therefore, larger population studies are needed to validate these findings, and more research on this topic needs to be undertaken before the associations between the perceived learning environment and self-efficacy beliefs are more clearly understood.
despite its limitations, this study underscores the importance of interpersonal relationships to students' psychological outcomes; specifically, it recognises the significant roles of teacher-student and student-student relationships in enhancing academic self-efficacy.

keypoints
a qualitative analysis of classroom observational activities has foregrounded eight factors: 'knowledge construction', 'authenticity', 'multiple perspectives', 'prior knowledge', 'in-depth learning', 'teacher-student interaction', 'social interaction' and 'cooperative dialogue'.
based on the qualitative analysis results, the constructivist learning in higher education settings scale [clhes] was developed.
construct validity of the clhes was confirmed by using structural equation modelling.
teacher-student interactions and student-student social activities were positively connected to self-efficacy for learning.
seminars (sm) were perceived as generally more constructivist when compared with lecture based environments (lbe) and distance learning environments (dle).

acknowledgments
this research was supported by a grant from the maslovaty foundation for the advancement of education on morals and society, founded by dr. nava maslovaty of blessed memory.

references
alt, d. (in press). assessing the contribution of constructivist based academic learning environment to academic self-efficacy. learning environments research.
baeten, m., kyndt, e., struyven, k., & dochy, f. (2010). using student-centred learning environments to stimulate deep approaches to learning: factors encouraging or discouraging their effectiveness. educational research review, 5, 243-260. doi: 10.1016/j.edurev.2010.06.001
bakhtin, m. (1984). problems of dostoevsky's poetics (c. emerson, ed. & trans.). minneapolis: university of minnesota press.
bakhtin, m. (1986). speech genres and other late essays. austin, tx: university of texas press.
bandura, a. (1977). social learning theory.
new york: prentice hall.
bandura, a. (1986). the explanatory and predictive scope of self-efficacy theory. journal of social and clinical psychology, 4, 359-373. doi: 10.1521/jscp.1986.4.3.359
bandura, a. (1997). self-efficacy: the exercise of control. new york: freeman.
bandura, a., barbaranelli, c., caprara, g. v., & pastorelli, c. (2001). self-efficacy beliefs as shapers of children's aspirations and career trajectories. child development, 72(1), 187-206. doi: 10.1111/1467-8624.00273
barak, m., & dori, y. j. (2009). enhancing higher order thinking skills among in-service science teachers via embedded assessment. journal of science teacher education, 20(5), 459-474.
bentler, p. m. (2006). eqs 6 structural equations program manual. encino, ca: multivariate software, inc.
biggs, j. b., kember, d., & leung, d. y. p. (2001). the revised two factor study process questionnaire: r-spq-2f. british journal of educational psychology, 71, 133-149. doi: 10.1348/000709901158433
brooks, j. g., & brooks, m. g. (1999). in search of understanding: the case for constructivist classrooms. alexandria, va: association for supervision and curriculum development.
brown, a. l. (1987). metacognition, executive control, self-regulation and other mysterious mechanisms. in f. weinert & r. kluwe (eds.), metacognition, motivation and understanding (pp. 65-115). hillsdale, nj: lawrence erlbaum.
campbell, d. t., & fiske, d. w. (1959). convergent and discriminant validation by the multitrait-multimethod matrix. psychological bulletin, 56, 81-105. doi: 10.1037/h0046016
cano, f., & garcía-berbén, a-b. (2014). university students' achievement goals and approaches to learning in mathematics: a re-analysis investigating 'learning patterns'. in d. gijbels, v. donche, j. t. e. richardson, & j. d. vermunt (eds.), learning patterns in higher education: dimensions and research perspectives (pp. 163-186). london and new york: routledge and earli.
carroll, a., houghton, s., wood, r., unsworth, k. l., hattie, j., gordon, l., & bower, j. (2009). self-efficacy and academic achievement in australian high school students: the mediating effects of academic aspirations and delinquency. journal of adolescence, 32(4), 797-817. doi: 10.1016/j.adolescence.2008.10.009
central bureau of statistics. (2011). women in higher education. retrieved december 21, 2012 from http://www1.cbs.gov.il/www/publications/desc_exp/women.pdf (hebrew)
cohen, j. (1960). a coefficient of agreement for nominal scales. educational and psychological measurement, 20, 37-46. doi: 10.1177/001316446002000104
cohen, j. (1969). statistical power analysis for the behavioural sciences. new york: academic press.
cole, d. a., & maxwell, s. e. (2003). testing mediational models with longitudinal data: questions and tips in the use of structural equation modeling. journal of abnormal psychology, 112, 558-577.
council for higher education. (2009). planning and budgeting committee 34/35 report. retrieved july 20, 2010 from http://www.che.org.il/download/files/contents_1.pdf (hebrew)
creswell, j. w. (2007). educational research (3rd ed.). thousand oaks, ca: sage.
de kock, a., sleegers, p., & voeten, m. j. m. (2004). new learning and the classification of learning environments in secondary education. review of educational research, 74(2), 141-170. doi: 10.3102/00346543074002141
dewey, j. (1938). experience and education. new york: touchstone.
dochy, f. j. r. c., & alexander, p. a. (1995). mapping prior knowledge: a framework for discussion among researchers. european journal of psychology of education, 10, 225-242. doi: 10.1007/bf03172918
donche, v., coertjens, l., van daal, t., de maeyer, s., & van petegem, p. (2014). understanding differences in student learning and academic achievement in first year higher education: an integral research perspective. in d. gijbels, v. donche, j. t. e. richardson, & j. d. vermunt (eds.), learning patterns in higher education: dimensions and research perspectives (pp. 214-231). london and new york: routledge and earli.
doolittle, p. e., & camp, w. g. (1999). constructivism: the career and technical education perspective. journal of vocational and technical education, 16(1), 23-46. retrieved from http://scholar.lib.vt.edu/ejournals/jvte/v16n1/doolittle.html
dorman, j. p., & adams, j. (2004). associations between students' perceptions of classroom environment and academic efficacy in australian and british secondary schools. westminster studies in education, 27, 69-85. doi: 10.1080/0140672040270106
dorman, j. p., fisher, d., & waldrip, b. (2006). classroom environment, students' perceptions of assessment, academic efficacy and attitude to science: a lisrel analysis. in d. l. fisher & m. s. khine (eds.), contemporary approaches to research on learning environments: world views (pp. 1-28). singapore: world scientific.
eccles, j. s., & wigfield, a. (2002). motivational beliefs, values, and goals. annual review of psychology, 53, 109-132. doi: 10.1146/annurev.psych.53.100901.135153
entwistle, n. j., & ramsden, p. (1983). understanding student learning. london: croom helm.
erstad, o. (2011). weaving the context of digital literacy. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 295-310). london: routledge.
evans, c. (2014). exploring the use of a deep approach to learning with students in the process of learning to teach. in d. gijbels, v. donche, j. t. e. richardson, & j. d. vermunt (eds.), learning patterns in higher education: dimensions and research perspectives (pp. 187-213). london and new york: routledge and earli.
fraser, b. j., treagust, d. f., williamson, j. c., & tobin, k. g. (1987). validation and application of the college and university classroom environment inventory (cucei). in b. j. fraser (ed.), the study of learning environments (pp. 17-30). perth, western australia: curtin university of technology.
gellin, a. (2003). the effect of undergraduate student involvement on critical thinking: a meta-analysis of the literature 1991-2000. journal of college student development, 44, 745-762. doi: 10.1353/csd.2003.0066
gijbels, d., segers, m., & struyf, e. (2008). constructivist learning environments and the (im)possibility to change students' perceptions of assessment demands and approaches to learning. instructional science, 36, 431-443. doi: 10.1007/s11251-008-9064-7
gijbels, d., van de watering, g., dochy, f., & van den bossche, p. (2006). new learning environments and constructivism: the students' perspective.
instructional science, 34(3), 213-226. doi:10.1007/s11251-005-3347-z hakkarainen, k., lipponen, l., & järvelä, s. (2002). epistemology of inquiry and computer supported collaborative learning. in t. koschmann, n. miyake & r. hall (eds.), cscl2: carrying forward the conversation (pp. 129–156). mahwah, nj: erlbaum. jacobs, h. h. (1989). interdisciplinary curriculum: design and implementation. alexandria, va: asdc. järvelä, s., hurme, t.-r., & järvenoja, h. (2011). selfregulation and motivation in computersupported collaborative learning environments. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 330-345). london: routledge. king, t. (2002, july). development of student skills in reflective writing. paper presented at the 4 th world conference of the international consortium for educational development in higher education, perth, australia. doi: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.136.2518 kyndt, e., dochy, f., & cascallar, e. (2014). students' approaches to learning in higher education: the interplay between context and student. in d. gijbels, v. donche, j. t. e. richardson, & j. d. vermunt (eds.), learning patterns in higher education: dimensions and research perspectives (pp. 249 – 272). london and new york: routledge and earli. lahn, l. c. (2011). professional learning as epistemic trajectories. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 53-68). london: routledge. lea, s., stephenson, d., & troy, j. (2003). higher education students' attitudes to student-centred learning: beyond 'educational bulimia'. studies in higher education, 28, 321-333. doi:10.1080/03075070309293 linnenbrink, e. a., & pintrich, p. r. (2002). motivation as an enabler for academic success. school psychology review, 31, 313-327. 
retrieved from http://prof.usb.ve/jjramirez/pregrado/ccp114/ccp114%20motivacion%20linnenbrink%20y% 20pintrich.pdf loyens, s. m. m., rikers, r. m. j. p., & schmidt, h. g. (2008). relationships between students‟ conceptions of constructivist learning and their regulation and processing strategies. instructional science, 36, 445–462. doi:10.1007/s11251-008-9065-6 lueddeke, g. r. (2003). professionalising teaching practice in higher education: a study of disciplinary variation and „teaching-scholarship‟. studies in higher education, 28, 213-228. doi: 10.1080/0307507032000058082 lund, a., & hauge, t. e. (2011). changing objects in knowledge-creation practices. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 206-221). london: routledge. mcdonald, r. p., & ho, m.-h. (2002). principles and practice in reporting structural equation analyses. psychological methods, 7, 64 – 82. doi: 10.1037/1082-989x.7.1.64. 64 merton, r. k. (1968). social theory and social structure. new york: free press. meyer, h. (2004). novice and expert teachers' conceptions of learners' prior knowledge. science education, 88(6), 970 983. doi: 10.1002/sce.20006 minick, n., stone, c. a., & forman, e. a. (1993). introduction: integration of individual, social, and institutional processes in accounts of children's learning and development. in e. a. forman, n. minick, & c. a. stone (eds.), contexts for learning: sociocultural dynamics in children's development (pp. 3 15). new york: oxford university press. nave, h. (1991). in favour of the frontal teaching. the israeli ministry of education: the division for curriculum planning and development. retrieved from http://www.education.gov.il/tochniyot_limudim/sifrut/asi12020.htm (hebrew) östlund, b. (2008). prerequisites for interactive learning in distance education: perspectives from swedish students. australasian journal of educational technology, 34, 42-56. 
retrieved from http://www.westga.edu/~distance/ojdla/fall163/ekstrand164.html packer, m. j., & goicoechea, j. (2001). sociocultural and constructivist theories of learning: ontology, not just epistemology. educational psychologist, 35, 227–241. doi: 10.1207/s15326985ep3504_02 pellegrino, j. w., & hilton, m. l. (eds.). (2012). education for life and work: developing transferable knowledge and skills in the 21st century. washington, d.c: the national academies press. perret-clermont, a.-n., & perret, j.-f. (2011). a new artifact in the trade: notes on the arrival of a computer supported manufacturing system in a technical school. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 87-102). london: routledge. pintrich, p. r., smith, d., garcia, t., & mckeachie, w. (1993). reliability and predictive validity of the motivated strategies for learning questionnaire (mslq). educational and psychological measurement, 53, 801-813. doi: 10.1177/0013164493053003024 popkewitz, t. s. (1998). dewey, vygotsky and the social administration of the individual: constructivist pedagogy as systems of ideas in historical spaces. american educational research journal, 35, 535–570. doi:10.3102/00028312035004535 quisumbing, l. r. (2005). education for the world of work and citizenship: towards sustainable future societies. prospects: quarterly review of comparative education, 35, 289–301. doi: 10.1007/s11125-005-4266-0 resnick, l. (1987). education and learning to think. washington, dc: national academy press. richardson, j. t. e. (2011). 
eta squared and partial eta squared as measures of effect size in educational research. educational research review, 6, 135–147. doi:10.1016/j.edurev.2010.12.001 schunk, d. h., & miller, s. d. (2002). self-efficacy and adolescents’ motivation. in f. pajares & t. urdan (eds.), academic motivation of adolescents (pp. 29-52). greenwich, ct: information age. schwarz, b. (2009). argumentation and learning. in n. muller mirza & a.-n. perret-clermont (eds.), argumentation and education – theoretical foundations and practices (pp. 91–126). new york and london: springer. schwarz, b., & de groot, r. (2011). breakdowns between teachers, educators and designers in elaborating new technologies as precursors of change in education to dialogic thinking. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 261-277). london: routledge. stahl, g. (2011). social practices of group cognition in virtual math teams. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 190-205). london: routledge. steffe, l. p., & gale, j. (1995). constructivism in education. mahwah, nj: lawrence erlbaum associates. strauss, a. l. (1987). qualitative analysis for social scientists. cambridge: cambridge university press. tenenbaum, g., naidu, s., jegede, o., & austin, j. (2001). constructivist pedagogy in conventional on-campus and distance learning practice: an exploratory investigation. learning and instruction, 11, 87–111. doi:10.1016/s0959-4752(00)00017-7 tiwari, a., chan, s., wong, e., wong, d., chui, c., wong, a., & patil, n. (2006). the effect of problem-based learning on students’ approaches to learning in the context of clinical nursing education. nurse education today, 26, 430-438. doi: http://dx.doi.org/10.1016/j.nedt.2005.12.001 tynjälä, p. (1999). towards expert knowledge? 
a comparison between a constructivist and a traditional learning environment in the university. international journal of educational research, 33, 355–442. doi: http://dx.doi.org/10.1016/s0883-0355(99)00012-9 van dinther, m., dochy, f., & segers, m. (2011). factors affecting students’ self-efficacy in higher education. educational research review, 6(2), 95–108. doi: 10.1016/j.edurev.2010.10.003 vygotsky, l. s. (1978). mind and society: the development of higher mental processes. cambridge, ma: harvard university press. wegerif, r. (2007). dialogic, education and technology: expanding the space of learning. new york: springer. wegerif, r., & de laat, m. (2011). using bakhtin to rethink the teaching of higher-order thinking for the network society. in s. ludvigsen, a. lund, i. rasmussen & r. säljö (eds.), learning across sites: new tools, infrastructures and practices (pp. 313-329). london: routledge. white, r. t., & gunstone, r. (1992). probing understanding. london: the falmer press. windschitl, m. (2002). framing constructivism in practice as the negotiation of dilemmas: an analysis of the conceptual, pedagogical, cultural, and political challenges facing teachers. review of educational research, 72(2), 131–175. doi: 10.3102/00346543072002131 wyatt, t. h., krauskopf, p. b., gaylord, n. m., ward, a., huffstutler-hawkins, s., & goodwin, l. (2010). cooperative m-learning with nurse practitioner students. nursing education perspectives, 31(2), 109-113. doi: http://dx.doi.org/10.1043/1536-5026-31.2.109

frontline learning research vol.7 no. 
1 (2019) 43-50 issn 2295-3159

students’ objects of pride in a learner-focused school setting: an exploratory study

judith fraenken (a), marold wosnitza (b)
(a) rwth aachen university, germany
(b) murdoch university, perth, australia

article received 28 june 2018 / revised 5 december / accepted 19 december / available online 31 january

abstract

in the past decades, schools have become more autonomous and open learning environments. it therefore seems increasingly important for educational research to also consider contextual influences by including autonomous learning settings in its investigations. studying the positive activating emotion of pride seems useful to learn more about the effects of this schooling as pride results from exactly those aspects promoted by autonomous learning: self-evaluation, reflection, self-responsibility and attribution. moreover, pride becomes relevant for a deeper understanding of students’ learning and achievement as pride promotes the desire to repeat already performed achievements in the future. regarding the growing support of individual learning in schools, the present study investigates objects of pride of students attending a school that promotes autonomous, non-competitive, individualized and cooperative learning. students of this school plan their timetables and learning process individually and document it in learning logbooks in which they furthermore can state once a week what they are proud of. in total, 1063 pride statements from 134 students were collected from the learning logbooks. a complementary study, collecting students’ pride statements detached from the learning logbooks, identified 254 pride statements. results show that the pride focus of students at the examined school is learning-oriented. the findings indicate that the specific learning setting of the examined school provides specific school-based pride triggers and thus promotes the learning-oriented pride focus of the students. 
this paper shall serve as a basis for further research on students’ pride and objects of pride and its potential effects on motivation, achievement and school life.

keywords: students’ objects of pride; school setting; student-centred teaching

corresponding author: email: judith.fraenken@rwth-aachen.de
doi: 10.14786/flr.v7i1.387

1. introduction

today, education systems all over the world are facing massive changes in schooling with schools becoming more autonomous and open learning environments being introduced in the past decades (eurydice, 2008). it therefore seems increasingly important for educational research to also consider contextual influences (e.g., urdan & schoenfelder, 2006) by including different, non-traditional learning settings in its investigations. the overall aim of autonomous school settings is not only to individualise learning but also to have students take responsibility for their own learning, reflect on and evaluate their own learning processes, feel that learning outcomes are their own success and set individual direction-giving goals for future learning processes (e.g., assor, 2012; madjar & assor, 2013; niemiec & ryan, 2009). in this context, studying students’ pride seems useful to learn more about the effects of this schooling. this positive activating emotion results precisely from those aspects promoted by autonomous school settings: pride can be seen as the consequence of a successful evaluation of a specific event or object for which one feels responsible (lagattuta & thompson, 2007; lewis, 2016). furthermore, for the elicitation of pride, one’s own actions and the outcomes of these actions have to be reflected upon and attributed to internal factors (hart & matsuba, 2007; kornilaki & chlouverakis, 2004; tracy, robins, & lagattuta, 2005; tracy, shariff, & cheng, 2010). 
this process is based on a person’s self-concept which includes complex cognitive processes such as self-perception and self-evaluation, which is why pride is defined as a self-conscious emotion (lewis, 2016; tracy & robins, 2007). at the same time, the standards, rules and goals (srgs) of the person’s society serve as a reference point for the evaluation of their own actions and the determination of success (lewis, 2016). consequently, students’ sense of responsibility and their ability of self-evaluation and reflection are pivotal for the elicitation of pride, which is why a school promoting those aspects should be examined in this research. moreover, the emotion of pride is relevant for students’ learning as pride is described as an emotion that results from personal achievement and further promotes the desire to repeat or even outdo this achievement in the future (fredrickson, 2001; lewis, 2016). pride is furthermore considered an incentive to persevere on a task despite initial costs (williams & desteno, 2008), which is why studies in this field additionally become relevant for a deeper understanding of students’ learning and achievement and the promotion of students’ sense of pride. while previous research has mainly focused on the impact of pride on achievement and motivation as well as the attributions of success or the correlations of pride to various aspects such as goal-regulation, self-control or achievement values (e.g., buechner, pekrun, & lichtenfeld, 2016; carver, sinclair, & johnson, 2010; oades-sese, matthews, & lewis, 2014; pekrun, goetz, frenzel, barchfeld, & perry, 2011; weidman, tracy, & elliot, 2016; williams & desteno, 2008), the object of pride, i.e. what a person is proud of, has been somewhat disregarded. an exploratory qualitative study aimed to categorize different domains and emphases of students’ pride in the school context of a traditional teacher-centred german comprehensive school (fraenken & wosnitza, 2018). 
five main categories were found, namely learning in school (aspects that are directly related to learning in school), social aspects (social aspects that do not have to be directly related to learning in school), activities besides (performance at) school (aspects that are established outside the classroom), me (aspects that are related to one’s own person and not specific actions), and persons and animals (aspects that are related to other people or animals). results indicated that students’ pride regarding learning in school appeared to be more achievement-oriented and less learning-oriented in the traditional teacher-centred school (fraenken & wosnitza, 2018). this could be hypothesised to result from the school’s competitive learning setting with clearly defined, standardised goals which promotes achievement-focused goals (self-brown & mathews, 2003) and consequently achievement-focused pride after those goals are achieved. in the present study, the objects of pride of students from a learner-focused and autonomy-promoting school setting are explored. according to self-brown and mathews (2003), in a non-competitive setting where learning goals are defined and evaluated individually, students’ goals are learning-oriented. concerning the empirically verified positive correlations between achievement goals and pride (pekrun, elliot, & maier, 2006, 2009), it can be assumed that students’ objects of pride are learning-oriented in the examined school where students set their own goals and define their own learning process. the finding that learner-centred and autonomy-supporting structures apparently have the strongest positive relation to students’ mastery goals compared to performance goals (ames, 1992; meece, 2003) strengthens this assumption. 
from a gender perspective, significant differences can be expected regarding students’ pride in learning in school as boys perform worse than girls in most performance areas of german schools (mößle & lohmann, 2014) and therefore seem to have less reason to be proud in that area. confirming this, female students from a traditional teacher-centred school made significantly more pride statements about learning in school than male students (fraenken & wosnitza, 2018). as women are rated as more communal and socially committed than men (brosi, spörrle, welpe, & heilman, 2016), it can be expected that female students make more statements about social aspects than male students. this was also found in a traditional school, where female students made significantly more pride statements about social aspects than male students (fraenken & wosnitza, 2018). furthermore, tracy and beall (2011) found that men’s pride is considered an attractive expression whereas women’s pride is one of the least attractive. this could lead to female students revealing less pride overall than male students. by encouraging and supporting children’s feelings of pride in their academic success, their perception of being responsible for their own success is promoted (thompson, 1991). knowing about the range of objects of pride could help detect different kinds of potential pride triggers in order to encourage students’ pride. this study is a reaction to the emerging change in schools towards student-centred approaches. the overall aim is to investigate the objects and emphases of pride of students attending one progressive school that promotes autonomous, individualized, cooperative and non-competitive learning. it is expected that the students focus their pride on their learning process and progress and less on their achievement outcomes. 
the results are intended to lay a foundation for future research on pride and possible connections of objects of pride with motivation, learning and achievement.

2. methodology

in order to examine the objects of pride of students in a learner-focused school setting, an exploratory qualitative approach was chosen. data were collected at a german comprehensive school with a student-centred approach to teaching. the five major subjects (german, english, maths, natural sciences, social sciences) are provided as topic-related modules with different degrees of difficulty, on which students work individually in their own time and speed. minor subjects are taught in topic- and project-related workshops. the school supplies subject-specific classrooms in which students of all grades work individually or cooperatively on their current modules. teachers act as advisors and tutors while students also support each other. students’ learning process and progress is planned and documented individually by the students themselves in so-called “learning logbooks”, discussed in individual weekly meetings with a tutor, and monitored by exams which the students take as soon as they have completed a module and state that they feel ready. in the learning logbooks, students can voluntarily state their personal goal of the week and additionally what they are proud of on a weekly basis. for this, students complete the phrase “i am proud of…” without having to focus on the school sector. these weekly statements of what students are proud of form the database of this study. the sample consisted of 134 students (school years 5-8, 57% female). pride statements were collected from the learning logbooks of one school year and were separated into single statements (n=1063). based on deductive categorisation according to qualitative content analysis (mayring, 2010, 2015), the existing category system of students’ pride in a traditional school (fraenken & wosnitza, 2018) was used during the coding process. 
all statements were coded into the category system of students’ pride. two different researchers coded 56.44% of all statements (interrater agreement κ=.91).

complementary study: the logbook entries are also visible to the students’ teachers and parents and are consequently not anonymous. in order to rule out a methodological effect, a complementary study was conducted to collect students’ pride statements detached from the learning logbooks. for two weeks, 110 students of the same school (school years 5-8, 46% female) wrote down their pride statements anonymously. the phrasing “i am proud of…” was adopted from the learning logbooks. in total, 254 single statements were identified.

3. results

analyses of the data showed a wide range of students’ objects of pride (an overview of the number of statements is given in table 1). with 72.53% of all learning-logbook pride statements, students focused their pride on learning in school. within the category learning in school, the main emphasis lay, with 73.15%, on statements referring to learning process and progress. within the category learning process and progress, various new subcategories were found that did not appear in the existing category system of the traditional school; together they represented 71.45% of all statements within that category. for example, 43.97% of the pride statements referred to modules and subjects (“that i could almost finish the math module” (#9-20)), 22.87% to taking tests or exams (“that i can write my german test this week” (#18-24)) and 4.61% to achieving the personal goal of the week (“that i have reached my goal of the week” (#11-2)). most of the remaining statements (21.45%) regarding learning process and progress stated getting a lot of work done (“because i’ve accomplished so much” (#26-37)). the remaining statements named active participation in the classroom (5.67%), homework or learning at home (1.42%) and understanding something (0.35%). 
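the share of learning in school statements in the logbooks (72.53% of 1063) can be compared with the corresponding share in the anonymous complementary study (46.85% of 254) by means of a pearson chi-square test on a 2×2 table. the following is a minimal sketch of that calculation, not the authors’ analysis script: the cell counts are reconstructed from the rounded percentages (an assumption), the function name `pearson_chi2_2x2` is hypothetical, and no continuity correction is applied.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reconstructed from rounded percentages (assumption, not the raw data).
logbook_total, anonymous_total = 1063, 254
logbook_learning = round(0.7253 * logbook_total)       # learning in school, logbooks
anonymous_learning = round(0.4685 * anonymous_total)   # learning in school, anonymous

chi2, p_value = pearson_chi2_2x2(
    logbook_learning, logbook_total - logbook_learning,
    anonymous_learning, anonymous_total - anonymous_learning,
)
print(f"chi2(1, n={logbook_total + anonymous_total}) = {chi2:.3f}, p = {p_value:.2e}")
```

under these assumptions the statistic comes out at about 61.71 with one degree of freedom, in line with the χ² value reported for this comparison; the p value is far below .001, which is the conventional way to report a value that statistics software prints as .000.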
furthermore, within the main category learning in school, 26.85% of the statements focused on achievement outcomes and results, e.g., grades (“that i have a b+ in the spanish exam” (#27-14)) or praise (“that i was praised in class” (#16-7)). social aspects included statements about, e.g., social behaviour (“that i deal well with my friends” (#43-14)) or classroom discipline (“because i haven’t broken any rules in three days” (#37-1)), which represented another newly found subcategory. activities besides (performance at) school meant for instance hobbies and spare time (“that i had a good, fun birthday” (#48-19)). the category me included, e.g., statements about one’s own personality and characteristics (“that i am honest with my assessments” (#9-3)) whereas persons and animals referred to others, e.g., teachers (“i am proud of the teachers because they are always nice and helpful” (#41-33)).

table 1. main categories of students’ pride in a student-centred school

results of the complementary study revealed that the focus also lay on learning in school (46.85% of all statements) but, compared to the logbook entries, students made significantly fewer statements about learning in school anonymously [χ²(1, n=1317)=61.707, p<.001]. within that main category, students focused on learning process and progress (73.11% of all statements within the category learning in school). the remaining 26.89% referred to achievement outcomes and results. in total, 22.04% of the statements were related to activities besides (performance at) school, which is a significantly larger share than within the logbooks [χ²(1, n=1317)=86.638, p<.001]. of all the remaining statements, 4.72% related to social aspects, 2.76% to me and 6.3% to persons and animals. overall, 3.94% of the statements expressed not being proud of anything and 13.39% of all statements could not be coded (rest).

4. 
discussion

besides pointing out a wide range of pride triggers, the results indicate a connection between the students’ objects of pride and their autonomous and individual way of learning and operating in the examined school. as expected, the students’ pride focused on their learning (represented by the category “learning process and progress”) and less on their performance (represented by the category “achievement outcomes and results”). one reason for the large number of statements about their learning process and progress could be the fact that the students can set their own realistic goals by determining and conducting their own learning process self-responsibly. as one has to feel responsible for success to feel pride (lewis, 2016), students have to feel personally responsible for their learning process and progress to consequently be proud of it, which the school promotes. this result underlines the importance of students’ perceived responsibilities in the achievement context and the associated pride focus. the responsibility for the students’ own learning process was also reflected by newly found subcategories which showed that the specific learning setting of the examined school provides specific pride triggers. by providing modules with different degrees of difficulty to be chosen by the students who work on them in their own time and speed (subcategory modules and subjects), the school enables the students’ high prospective and, thus, retrospective responsibility for their personal learning process and progress and therefore a school-specific pride trigger. the subcategory taking tests or exams also shows the students’ high responsibility for their learning process and progress as they could only be proud of being able to write a test soon because they individually set the date for their exams based on their own ability and learning progress. 
helker and wosnitza (2016) found that students’ sense of responsibility for their own learning process and achievement correlates with their sense of competence and autonomy. this puts emphasis on the assumption that the students’ high level of autonomy promotes their sense of responsibility and therefore their sense of pride. the learning logbooks serve as a foundation for students’ self-reflection and consequently promote their ability of self-evaluation. by filling the logbooks with learning contents, learning goals and achievements or defaults, students reflect on their learning process and progress every day and thus create the basis for the ability of attributing success to themselves and consequently feeling pride. by doing so, students do not evaluate their learning process independently of others but with regard to the school’s standards, rules and goals (srgs), which is essential for the feeling of pride (lewis, 2016). keeping the logbooks promotes the awareness and adaption of the school’s srgs and therefore the elicitation of students’ pride focus on their learning process and progress. whereas keeping learning logbooks appears to promote the students’ sense of responsibility and self-reflection, it must also be considered that the logbooks, as research object, could have an impact on students’ pride statements as they are accessible to teachers and parents. even so, the results of the anonymous complementary study reveal that the students’ focus still lay on learning in school, though less prominently than in the logbooks. by contrast, students expressed more pride about activities besides (performance at) school than they did within their learning logbooks. apparently, although the students were not required to focus their pride statements on the school sector, they instinctively did so when filling out the logbooks. 
the very low number of statements about activities besides (performance at) school in the learning logbooks could therefore be explained by social desirability. with regard to learning in school, students of the complementary study still focused on learning process and progress which confirms and reinforces the considerations made above. contrary to our expectations, no gender differences could be found regarding main or sub categories. with regard to the examined students’ focus on their learning process and progress instead of their achievement and outcomes, it is, however, plausible that potential gender differences in performance did not have a huge impact on their sense of pride. furthermore, due to the separately coordinated learning schedules and the attention to the individual, gender stereotypes apparently play a smaller role than in a more competitive school with direct comparisons between students’ approaches and achievement. school year differences, however, could be found. the continuously decreasing focus on achievement outcomes and results from school year to school year, together with the simultaneously increasing focus on learning process and progress, gives reason to assume that the students internalize the school’s srgs (lewis, 2016) over time and therefore continuously adapt their pride focus towards learning. as the elicitation of pride requires complex cognitive processes which evolve over time (hart & matsuba, 2007; lewis, 2016; tracy & robins, 2007), the results could reflect this development as the students seem to learn over time that they are responsible for their own learning process and can consequently be proud of it. in summary, the examined school itself offers school-related, learning-oriented pride triggers and additionally promotes the students’ ability to perform complex cognitive processes in order to feel pride. 
in combination with previous research on students’ pride in a traditional school (fraenken & wosnitza, 2018), it can be assumed that different learning settings provide different potential objects of pride and promote different pride focuses. however, the present study is exploratory and only related to one particular school. building on this study, future research should include various school settings and a greater number of schools in order to further investigate the assumed connection between students’ objects of pride and their school environment. additionally, the results of the present study should be used to investigate students’ degree of autonomy, responsibility and ability of self-reflection and its impact on their pride and by implication their motivation and achievement. as the results of the study indicate a connection between the students’ objects of pride and their autonomous and self-responsible learning, this should be further explored and verified. furthermore, the impact of students’ objects of pride on their learning process, achievement and motivation should be explored in order to find out if and how the objects of pride themselves matter in the achievement context.

acknowledgements

our sincere appreciation is extended to kerstin helker for her invaluable advice and her comments on the manuscript.

keypoints

an autonomous school setting may trigger students’ pride in making them feel responsible for their own learning.
a learner-focused school setting promotes learning-oriented rather than outcome-oriented pride.
the longer students attend an autonomous school setting, the more they tend to feel proud of their learning process and progress.

references

ames, c. (1992). classrooms: goals, structures, and student motivation. journal of educational psychology, 84(3), 261–271. doi:10.1037//0022-0663.84.3.261
assor, a. (2012). allowing choice and nurturing an inner compass: educational practices supporting students’ need for autonomy. in s. l. 
christenson, a. l. reschly, & c. wylie (eds.), the handbook of research on student engagement (pp. 421-439). new york: springer science.
brosi, p., spörrle, m., welpe, i. m., & heilman, m. e. (2016). expressing pride: effects on perceived agency, communality, and stereotype-based gender disparities. journal of applied psychology, no pagination specified. doi:10.1037/apl0000122
buechner, v. l., pekrun, r., & lichtenfeld, s. (2016). the achievement pride scales (aps). european journal of psychological assessment, 1-12. doi:10.1027/1015-5759/a000325
carver, c. s., sinclair, s., & johnson, s. l. (2010). authentic and hubristic pride: differential relations to aspects of goal regulation, affect, and self-control. journal of research in personality, 44(6), 698-703. doi:10.1016/j.jrp.2010.09.004
eurydice. (2008). levels of autonomy and responsibilities of teachers in europe.
fraenken, j., & wosnitza, m. (2018). stolz im schulalltag – worauf sind schülerinnen und schüler stolz? [pride in everyday school life – what are students proud of?]. in g. hagenauer & t. hascher (eds.), emotionen und emotionsregulierung in der schule und hochschule (pp. 15-28). münster: waxmann.
fredrickson, b. l. (2001). the role of positive emotions in positive psychology: the broaden-and-build theory of positive emotions. the american psychologist, 56(3), 218-226. doi:10.1037//0003-066x.56.3.218
hart, d., & matsuba, m. k. (2007). the development of pride and moral life. in j. l. tracy, r. w. robins, & j. p. tangney (eds.), the self-conscious emotions: theory and research (pp. 114-133). new york, ny, us: guilford press.
helker, k., & wosnitza, m. (2016). the interplay of students’ and parents’ responsibility judgements in the school context and their associations with student motivation and achievement. international journal of educational research, 76, 34-49. doi:10.1016/j.ijer.2016.01.001
kornilaki, e. n., & chlouverakis, g. (2004). 
the situational antecedents of pride and happiness: developmental and domain differences. british journal of developmental psychology, 22(4), 605-619. doi:10.1348/0261510042378245
lagattuta, k. h., & thompson, r. a. (2007). the development of self-conscious emotions: cognitive processes and social influences. in j. l. tracy, r. w. robins, & j. p. tangney (eds.), the self-conscious emotions: theory and research (pp. 91-113). new york, ny, us: guilford press.
lewis, m. (2016). self-conscious emotions. embarrassment, pride, shame, guilt, and hubris. in l. feldman barrett, m. lewis, & j. m. haviland-jones (eds.), handbook of emotions (4th ed., pp. 792-814). new york: the guilford press.
madjar, n., & assor, a. (2013). two types of perceived control over learning: perceived efficacy and perceived autonomy. in j. hattie & e. m. anderman (eds.), international guide to student achievement (pp. 439-441). new york: routledge.
mayring, p. (2010). qualitative inhaltsanalyse [qualitative content analysis]. in g. mey & k. mruck (eds.), handbuch qualitative forschung in der psychologie (pp. 601-613). wiesbaden: vs verlag für sozialwissenschaften.
mayring, p. (2015). qualitative inhaltsanalyse: grundlagen und techniken [qualitative content analysis: basics and techniques] (12th ed.). weinheim: beltz.
meece, j. l. (2003). applying learner-centered principles to middle school education. theory into practice, 42(2), 109-116. doi:10.1207/s15430421tip4202_4
mößle, t., & lohmann, a. (2014). entwicklung akademischer leistungen im geschlechtervergleich [development of academic performance in gender comparison]. in t. mößle, c. pfeiffer, & d. baier (eds.), die krise der jungen. phänomenbeschreibung und erklärungsansätze (pp. 19-27). baden-baden: nomos.
niemiec, c. p., & ryan, r. m. (2009). autonomy, competence, and relatedness in the classroom: applying self-determination theory to educational practice. school field, 7(2), 133-144. doi:10.1177/1477878509104318
oades-sese, g. v., matthews, t. 
a., & lewis, m. (2014). shame and pride and their effects on student achievement. in r. pekrun & l. linnenbrink-garcia (eds.), international handbook of emotions in education(pp. 246-264). new york: routledge. pekrun, r., elliot, a. j., & maier, m. a. (2006). achievement goals and discrete achievement emotions: a theoretical model and prospective test. the journal of educational psychology, 98(3), 583-597. doi:10.1037/0022-0663.98.3.583 pekrun, r., elliot, a. j., & maier, m. a. (2009). achievement goals and achievement emotions: testing a model of their joint relations with academic performance. journal of educational psychology, 101(1), 115-135. doi:10.1037/a0013383 pekrun, r., goetz, t., frenzel, a. c., barchfeld, p., & perry, r. p. (2011). measuring emotions in students’ learning and performance: the achievement emotions questionnaire (aeq). contemporary educational psychology, 36(1), 36-48. doi:10.1016/j.cedpsych.2010.10.002 self-brown, s. r., & mathews, s. (2003). effects of classroom structure on student achievement goal orientation. the journal of educational research, 97(2), 106-112. doi:10.1080/00220670309597513 thompson, r. (1991). emotional regulation and emotional development. educational psychology review, 3(4), 269-307. doi:10.1007/bf01319934 tracy, j. l., & beall, a. t. (2011). happy guys finish last: the impact of emotion expressions on sexual attraction.emotion, 11(6), 1379-1387. doi:10.1037/a0022902 tracy, j. l., & robins, r. w. (2007). the self in self-conscious emotions. a cognitive appraisal approach. in j. l. tracy, r. w. robins, & j. p. tangney (eds.), the self-conscious emotions: theory and research(pp. 3-20). new york: the guilford press. tracy, j. l., robins, r. w., & lagattuta, k. h. (2005). can children recognize pride? emotion, 5(3), 251-257. doi:10.1037/1528-3542.5.3.251 tracy, j. l., shariff, a. f., & cheng, j. t. (2010). a naturalist’s view of pride. emotion review, 2(2), 163-177. 
doi:10.1177/1754073909354627 urdan, t., & schoenfelder, e. (2006). classroom effects on student motivation: goal structures, social relationships, and competence beliefs. journal of school psychology, 44(5), 331-349. doi:10.1016/j.jsp.2006.04.003 weidman, a. c., tracy, j. l., & elliot, a. j. (2016). the benefits of following your pride: authentic pride promotes achievement. journal of personality, 84(5), 607-622. doi:10.1111/jopy.12184 williams, l. a., & desteno, d. (2008). pride and perseverance: the motivational role of pride. journal of personality and social psychology, 94(6), 1007-1017. doi:10.1037/0022-3514.94.6.1007

frontline learning research vol.3 no. 3 special issue (2015) 1-4 issn 2295-3159 corresponding author: montserrat castelló, facultat de psicologia, ciències de l’educació i l’esport. blanquerna. universitat ramon llull, císter 34. 08022. barcelona. phone: +34932533000, fax: +34932533031, email: montserratcb@blanquerna.url.edu doi: http://dx.doi.org/10.14786/flr.v3i3.197

trends influencing researcher education and careers: what do we know, need to know and do in looking forward

montserrat castelló a, lynn mcalpine b and kirsi pyhältö c
a university of ramon llull, spain
b university of oxford, uk
c university of oulu and university of helsinki, finland

article received 3 august 2015 / revised 19 august 2015 / accepted 20 august 2015 / available online 23 october 2015

abstract earli sig 24, researcher education and careers (sig-reac), was founded because increasing interest has emerged within the earli community into understanding different aspects of doctoral and post-phd researcher educational and career development. this special issue brings together the outcome of our first scholarly discussion at the sig-reac inaugural meeting in september 2014 in barcelona.
the goal of each of the five co-authored papers is to make visible what has been overlooked, and to attend to methodological considerations in order to draw out future lines of research. as a collection, the papers address multiple levels and issues of researcher education: establishing the multifaceted phenomenon that is researcher education and careers and providing key concepts that others might take up, e.g., informal/invisible curriculum; the personal as a sphere of activity that may collide with the sphere of work; drivers of education that can provide cross-national points of comparison. further, by identifying gaps in the literature, these papers together lay out an ambitious research agenda in a number of areas related to researcher education. in the process, they provide an extensive list of references well worth exploring since they represent the knowledge networks of over thirty researchers. in this editorial paper the sig-reac is presented, and the characteristics of the papers, their limitations and some future challenges of researcher education are discussed.

keywords: researcher education; career development; post phd education; phd education; cross-cultural research

earli sig 24, researcher education and careers (from now on sig-reac), was founded because increasing interest has emerged within the earli community into understanding different aspects of doctoral and post-phd researcher educational and career development. this special issue brings together the outcome of our first scholarly discussion at the sig-reac inaugural meeting in september 2014 in barcelona. our goal was to construct a richer, more comprehensive view of researcher education and careers: to begin to address the theoretical and methodological challenges underlying research and theory development in this area in order to create a shared agenda for the future.
the meeting (and the preparation for it) launched collaborative writing that challenged us collectively to make transparent different theoretical perspectives, methods and methodologies. our goal was to negotiate these differences in order to articulate a commonly understood research agenda. while we shared an interest in examining the experiences of early career researchers, we come from a variety of locations: geographic, disciplinary, career stage and intellectual tradition. when researchers from different theoretical, methodological, and national spaces want to do ‘real work’, it takes time to really understand each other and negotiate new understandings. therefore, the preparation for the sig-reac meeting included participants writing individual position papers in which they addressed the following questions:
• what are the emerging trends in the research environment essential to better/more fully understand early career researcher (ecr) experience?
• what do we learn about ecr experience of the emerging trends by looking across the fields of academic communication, sociology of work, pedagogy? what are the gaps? what has been overlooked?
• what different methods and methodologies have been used across the three fields? which of these has been productive? what has been overlooked?
in this way, we had the opportunity before the meeting to read each other’s thoughts and begin to get a sense of the richness and diversity in the group as well as common concerns, conceptions or methodologies. the preparation for the sig-reac meeting also included launching pre-discussions via moodle, based on reading each other’s papers. altogether 31 scholars from fourteen different countries participated in the sig meeting, where we launched co-writing, and worked together in small groups intensively for two days. post-meeting, this face-to-face work shifted to virtual exchanges and the special issue represents the results of our continued discussion over ten months.
the special issue consists of five co-authored papers. the goal of each is to make visible what has been overlooked, and to attend to methodological considerations in order to draw out future lines of research. each of the papers addresses a specific aspect of researcher education and careers in order to develop a future research agenda:
• drivers and interpretations of doctoral education today contributes to the literature on researcher education by examining the ways in which core global trends and drivers of higher education emerge in different guises at national levels. the paper compares recent doctoral education changes in the following countries – canada, colombia, denmark, finland, uk, and the usa – to provide insights on how global trends translate into local policies. by using the same global drivers as criteria across national boundaries, it is possible to see how educational policies are formed in considerably different ways. this raises questions about the universality of the phd. in the discussion, a research agenda for comparative studies is proposed.
• the curriculum question in doctoral education begins by stating that although a global trend in researcher education has been developing more systematic doctoral education to enhance the quality of research and researchers, the value of a curricular perspective has remained largely unexplored both theoretically and empirically. it is argued that adopting an explicit curriculum approach is significant not only because it might help to disclose the tensions, but also because it allows us to face and reinterpret current challenges to doctoral education. first, the concept of the curriculum in doctoral education is discussed, along with tensions between the formal/informal, open/hidden, and standardised/pluralised dimensions of curriculum. then, processes (how the curriculum is experienced) and outcomes (assessment and employability) of doctoral education are addressed.
finally, a research agenda drawing on notions of curriculum to help reconfigure doctoral education is proposed.
• the doctorate as an original contribution to knowledge: considering relationships between originality, creativity, and innovation explores the meaning of originality in doctoral studies and its relationship with creativity and innovation. the paper opens up discussion about the taken-for-granted traditional expectation of ‘originality’ as an outcome of doctoral research. it does so by juxtaposing ‘originality’ with the notions of ‘innovation’ and ‘creativity.’ by exploring the similarities and differences among the concepts, the paper provides insight into both the possible meanings of ‘originality’ in research and the utility of the term in the context of 21st century knowledge societies. some future research steps are suggested to move towards unpacking the relationship between doctoral training conditions and outcomes, in the sense of fulfilling the requirement of originality.
• mentoring: a review of early career researcher studies describes the result of a focused literature review of studies on early career researchers as a basis for further inquiry into mentoring, given the frequent reference to mentoring as a source of support for early career researchers, e.g., eu concordat on researchers. the most striking finding of this analysis was the un- and under-conceptualized nature of empirical studies. there is much research to do, first, to better inform our conceptualization of early career researcher mentoring and, second, to better understand the value of specific aspects of mentoring support.
• researcher identities in transition: signals to identify and manage spheres of activity in a risk-career argues that changes in ‘knowledge societies’ mean researchers are now embarked upon what could be defined as a ‘risk-career.’ this paper uses a framework of researcher identity produced by analysing spheres of activity and individuals’ ability to identify and interpret external signals (expectations, constraints and opportunities) to account for theoretical assumptions about researcher identity. it is argued that applying the framework to empirical examples of tensions in identity construction provides the basis for future research to unravel the complex interplay between signals and spheres of activity when dealing with the tensions and struggles of becoming a researcher.

as a collection, the papers address multiple levels and issues of researcher education: establishing the multifaceted phenomenon that is researcher education and careers and providing key concepts that others might take up, e.g., informal/invisible curriculum; the personal as a sphere of activity that may collide with the sphere of work; drivers of education that can provide cross-national points of comparison. further, by identifying gaps in the literature, these papers together lay out an ambitious research agenda in a number of areas related to researcher education. in the process, they provide an extensive list of references well worth exploring since they represent the knowledge networks of over thirty researchers.

still, this special issue has limitations. while it has explored a number of issues in depth, we are mindful there remains much to explore. for instance, the papers mostly focus on doctoral experience, as does much of the research in this area. so we encourage ourselves and other researchers to pay greater attention to postdoctoral experience, both in and out of academia.
for instance, the vertical transition from doctoral student to post-doctoral researcher still remains largely uncharted, as do horizontal transitions, e.g., from academia to other types of careers. we know that internationally, more than half of phd graduates leave academia whether by choice or lack of opportunity (barnacle & dall’alba, 2011). what appears to be emerging internationally is a range of alternate academic positions: contract teaching, contract post-phd research, and increasingly teaching-only lecturer positions, as well as administrative positions related to research and teaching. in the non-academic context, emerging types of employment include business, government, ngos, banking, industry, and previously unknown positions, e.g., start-ups. unfortunately we know little of the experience of individuals in any of the three fields, e.g., the extent to which they have the skills needed, their satisfaction with their employment, the range of genres they use. this is especially the case as regards a theoretical perspective since most of the available evidence is non-theorized survey data. such studies are needed to gain better understanding of the complexity of researcher careers. as well, postdoctoral supervision is an underexplored issue that deserves more research interest. post-phd researchers consistently report that they do not receive supervisory support to develop as researchers, and further that they are even discouraged from seeking out professional development opportunities themselves. as long as such individuals are not conceived as becoming researchers, the supervisory attitudes they report are unlikely to change.

concluding remarks

this special issue maps some of the uncharted terrain of inquiry into researcher education and careers. national developments in researcher education are affected by global forces which, however, take different forms in national and local contexts.
this became particularly apparent to us at our barcelona meeting where we represented fourteen different national contexts. there is still an insufficient understanding of how and in which forms global trends (which we collectively believe we understand) are translated into the local practices of researcher education, and their effect on doctoral education and academic work (which we collectively may not understand, though believe we do). accordingly, our overall conclusion is the need for well-designed international comparative studies so that as researchers we can gain a concrete understanding of the effects of global developments for researcher education and careers. we hope that the papers in this special issue evoke curiosity, provoke discussions and stimulate both theoretical and especially empirical research on researcher education and careers. the various approaches, empirical evidence and challenges identified in the papers highlight the importance of and the need for further research into this fascinating area. we look forward to lively discussion, commentaries and research papers addressing the new terrains in this area of research and encourage you to join us in earli sig 24, researcher education and careers (sig-reac). references barnacle, r., & dall’alba, g. (2011). research degrees as professional education? studies in higher education, 36(4), 459-470. http://dx.doi.org/10.1080/03075071003698607 cantwell, b. (2011). academic in-sourcing: international postdoctoral employment and new modes of academic production. journal of higher education policy and management, 33(2), 101-114. http://dx.doi.org/10.1080/1360080x.2011.550032 evans, l. (2011). the scholarship of researcher development: mapping the terrain and pushing back boundaries. international journal for researcher development, 2(2), 75-98. http://dx.doi.org/10.1108/17597511111212691 laudel, g., & glaser, j. (2008). from apprentice to colleague: the metamorphosis of early career researchers. 
higher education, 55, 387-406. http://dx.doi.org/10.1007/s10734-007-9063-7

frontline learning research vol. 11 no. 1 (2023) 1-39 issn 2295-3159

examining classroom contexts in support of culturally diverse learners’ engagement: an integration of self-regulated learning and culturally responsive pedagogical practices

aloysius c. anyichie, deborah l. butler, nancy e. perry & samson m. nashon
the university of british columbia, vancouver, bc, canada.

article received 14 june 2022 / article revised 25 january 2023 / accepted 25 january 2023 / available online 14 february 2023

abstract research shows that culturally diverse students are often disengaged in multicultural classrooms. to address this challenge, literatures on self-regulated learning and culturally responsive teaching both document practices that foster engagement, although from different perspectives. this study examined how classroom teachers at schools that enrol students from diverse cultural communities on the west coast of canada built on a culturally responsive self-regulated learning (cr-srl) framework to design complex tasks that integrated srl pedagogical practices (srlpps) and culturally-responsive pedagogical practices (crpps) to support student engagement. two elementary school teachers and their 43 students (i.e., grades 4 and 5) participated in this study. we used a multiple, parallel case study design that embedded mixed methods approaches to examine how the teachers integrated srlpps and crpps into complex tasks; how culturally diverse students engaged in each teacher’s task; and how students’ experiences of engagement were related to their teachers' practices. we generated evidence through video-taped classroom observations, records of classroom practices, students’ work samples, a student self-report, and teacher interviews.
overall findings showed: (1) that teachers were able to build on the cr-srl framework to guide their design of a cr-srl complex task; (2) benefits to students’ engagement when those practices were present; and (3) dynamic learner-context interactions in that students’ engagement was situated in features of the complex task that were present on a given day. we close by highlighting implications of these findings, limitations, and future directions.

keywords: culturally diverse learners; engagement; self-regulated learning; culturally responsive teaching; complex task.

this article reports findings from aloysius c. anyichie’s dissertation study at the university of british columbia, vancouver. this novel study received the best 2019 dissertation award in educational psychology from the canadian association for educational psychology (caep). corresponding author’s current address: aloysius c. anyichie, department of educational psychology and student services, brandon university, 270 18th st, brandon, mb r7a 6a9, canada. anyichiea@brandonu.ca doi:https://doi.org/10.14786/flr.v11i1.1115

1. introduction

across north america, classrooms are increasingly including students from different linguistic and cultural backgrounds with diverse learning experiences and needs. currently, there are too many experiences of systemic racism and inequality between mainstream and racialized minority students. thus, this research is centrally concerned with how educators can create inclusive and equitable classrooms wherein every student is respected, experiences a sense of belonging, feels safe and is empowered to learn.
for the research reported here, we considered that all students bring into the classroom their socio-cultural histories (e.g., ways of being and knowing) and practices including their previous experiences (e.g., of processes of learning in their previous schools and cultural environments), unique individual differences (e.g., interest), and expectations (e.g., about goals of learning, aspirations) that interact with classroom contextual features to shape their learning experiences including engagement (anyichie, 2018; bang, 2015; butler & cartier, 2018, cartier & butler, 2016; gray et al., 2020; gay, 2010; graham, 2018; okoye & anyichie, 2008). in culturally diverse classroom contexts, students from non-dominant cultures are at greatest risk for a lack of engagement because classroom activities are often disconnected from their backgrounds, interests and lived experiences. also, educators often struggle to create supportive learning environments for underrepresented and racialized learners (gay, 2010; 2018). given these challenges, research is needed to better understand how educators can design classroom environments to support culturally diverse learners’ engagement. research on culturally inspired pedagogies and self-regulated learning are very helpful in this inquiry. the diverse literature on asset-based culturally informed pedagogies (e.g., culturally responsive teaching, culturally relevant pedagogy) is helpful in how it foregrounds the influence of cultural background, heritage, and practices on individual learning processes (e.g., gay, 2018; ladson-billings, 1995, 2001; villegas & lucas, 2002). for example, research in culturally responsive teaching (crt) reveals how classroom activities that are personally meaningful such as teaching skills or academic knowledge in ways that connect to students’ cultural knowledge, frames of reference and lived experiences increase their interest and offer support to culturally diverse learners (e.g., gay, 2018). 
that said, most attention in this literature has focused on teacher practice in terms of the design and implementation of pedagogical practices rather than on how learners within those classrooms experience or engage with them. in addition, culturally sustaining pedagogy (paris, 2012) emphasizes the need to sustain students’ languages, literacies, and the cultural ways of being in communities of colour in order to foster cultural pluralism and bring about positive social transformation (paris, 2021; paris & alim, 2017). scholars working from these different perspectives offer suggestions on how to address issues of educational inequality, power, systemic racism, social injustice, achievement gaps; and enrich learning experiences especially among racialized students and communities of colour (andrews, 2021, mccarty & brayboy, 2021, howard, 2021, paris, 2021; young, 2010). another promising line of research, in terms of fostering culturally diverse learners’ engagement, focuses on self-regulated learning (srl). srl refers to individual and social forms of learning that empower learners to take control of their thoughts and actions in order to navigate environmental challenges and achieve valued goals (zimmerman, 2008). srl research has investigated how to empower learners’ participation by embedding srl-promoting practices into learning activities (butler et al., 2017; dignath & veenman, 2020). for example, srl research shows that when students are provided with choice and opportunities to control the amount of challenge they are experiencing in an activity, gains ensue in engagement, agency, and achievement (e.g., perry, 2013). srl researchers are increasingly paying attention to how social and cultural contexts might be shaping the nature and quality of students’ learning experiences (anyichie, 2018, anyichie & butler, in press; anyichie et al., 2016; hadwin & oshige, 2011; järvenoja et al., 2015; mcinerney & king, 2018; perry et al., 2017). 
but more research is needed into how educators can use srl-promoting practices to create culturally responsive, relevant, sustaining, and inclusive environments and foster culturally diverse learners’ engagement and achievement.

given their complementary foci, it might be generative to combine culturally responsive teaching (crt) and self-regulated learning (srl) practices to foster engagement for culturally diverse learners (anyichie, 2018; anyichie & butler, 2017). combining these practices has the potential to support engagement, motivation and active, intentional learning by informing the design of activities that are meaningfully relevant to learners’ socio-cultural histories, foster agency and empower learning (anyichie, 2018; anyichie & butler, 2018; gray et al., 2020; gay, 2013; kumar et al., 2018). activities that combine srl and crt practices could also be constructed in ways that sustain underrepresented students’ values, language and cultural practices while providing access to the dominant culture (paris, 2021). thus, the research conducted here investigated how the engagement of learners from culturally diverse backgrounds might be fostered in a classroom context that integrates self-regulated learning and culturally responsive pedagogical practices. the next section discusses how crt and srl pedagogical practices could be integrated into a classroom context to support culturally diverse students’ engagement.

1.2. creating classroom contexts to support engagement: crt and srl pedagogical practices

research from both the crt and srl fields identifies pedagogical practices that are associated with students’ engagement.
first, although they emerge from different perspectives, frameworks such as culturally relevant pedagogy (e.g., ladson-billings, 1995), culturally sustaining pedagogy (e.g., paris, 2012), and culturally responsive teaching (e.g., gay, 2010; villegas & lucas, 2002) all foreground the impact of sociocultural contexts on individuals’ learning processes. building on the understanding that learners tend to actively engage in classroom learning activities they perceive to be relevant to their cultural backgrounds and values, these frameworks suggest supportive practices. in particular, this study was inspired by crt because of its emphasis on how to design classroom instructional practices to support learning for racialized and underrepresented students of colour (gay, 2018). some examples of culturally responsive pedagogical practices (crpps) include designing cultural diversity into curriculum content (e.g., by adjusting and situating curricula to connect with students’ prior knowledge and lived experiences by using multicultural textbooks), establishing cross-cultural communications (e.g., by creating opportunities for social interactions about personal or cultural issues), developing cultural competence (e.g., by fostering teachers’ and students’ understanding and knowledge of their cultures and that of other students), and creating cultural congruity (e.g., by using students’ cultural background and knowledge, histories, identities, frames of reference, interests, aspirations, and lived experiences as resources for teaching and learning) (gay, 2013, 2018; ladson-billings, 2021). these practices have been found to relate to student engagement, achievement and learning (aceves & orosco, 2014; gay, 2018; howard & rodriguez-minkoff, 2017; ladson-billings, 2021; villegas & lucas, 2002; ginsberg & wlodkowski, 2015).
for example, gray et al. (2020) found that black and latinx students were more engaged in classroom activities that were relevant to their cultural values of communal benefits for learning. as a complement to research on culturally responsive pedagogies, srl research identifies practices that also support learning in context and foster engagement. self-regulating learners are proactively engaged in their own learning process (zimmerman, 2002). for example, they generate and implement relevant cognitive strategies for successful learning (wolters & taylor, 2012). srl-promoting practices (srlpps) include giving students opportunities to make decisions about their learning; make choices and control the amount of challenge they are experiencing; engage in formative assessments (e.g., self- and peer-assessment) and cycles of strategic action (e.g., task interpretation, goal setting, planning, enacting strategies, self-monitoring, strategy adjustment); and self-evaluate their work in relation to criteria. these practices are consistently associated with the quality of student engagement and success (anyichie & butler, 2015, in press; anyichie & onyedike, 2012; mccann & turner, 2004; perry, 2013; schmidt et al., 2018). for example, perry et al. (2020) reported that students with high levels of support for srl (e.g., supports for self-assessment) in a writing task experienced higher levels of engagement in srl, resulting in a higher quality of writing product.

historically, srl models (e.g., boekaerts & corno, 2005; cartier & butler, 2016; efklides, 2011; pintrich, 2000; winne & hadwin, 1998; zimmerman, 2000) have foregrounded individual and social processes in learning.
as one example, butler and cartier’s (2018) situated model of srl emphasizes the role of dynamic interactions between what individuals bring to contexts (e.g., prior culturally-rooted experiences, histories, identities) and features of contexts (e.g., activity design) in shaping learning engagement. we drew on their situated model to offer a practical framework for developing an integrated pedagogy. explicitly integrating a culturally-relevant, sustaining and responsive focus into research on srl has the potential to further contribute by investigating how culturally diverse learners’ engagement can be supported when instructional practices are deliberately designed to facilitate both individual and sociocultural processes associated with learning (anyichie, 2018; anyichie & butler, in press; anyichie et al., 2016). to build on the practices identified in these two areas of research, “a culturally responsive self-regulated learning (cr-srl) framework” (see figure 1) was developed. this framework emerged from a theoretical and empirical analysis of the divergences and synergies between crt and srl principles and practices (see anyichie, 2018; anyichie & butler, 2017 for the details of its development). this framework was designed to integrate these two areas of research as a support for educators who want to create supportive environments where new learning is built on learners’ prior knowledge, histories and lived experiences in ways that will motivate students to engage in co-construction of new knowledge. research-based pedagogical practices that are integrated in this framework have been associated with gains in student engagement, motivation, srl and achievement (aceves & orosco, 2014; anyichie, 2018; anyichie & butler, 2018, in press; brayboy & castagno, 2009; elaine & randall, 2010; onyedike & anyichie, 2012; perry et al., 2020; revathy et al., 2018; wolters & taylor, 2012).

figure 1.
a culturally responsive self-regulated learning framework. source: adapted from anyichie (2018).

this framework includes three broad classes of practices that are mutually interdependent. first, classroom foundational practices refer to all those proactive activities educators put in place in setting up a classroom context to set the stage for a culturally relevant, responsive, sustaining, inclusive, and empowering classroom community. foundational practices described in both crt and srl literatures include (1) fostering knowledge of learners; and (2) creating caring, safe, and supportive environments (banks et al., 2005; butler et al., 2017; rahman et al., 2010). for example, “knowledge of learners,” as a strategy, refers to practices teachers employ to gain a better understanding of their students’ background histories and heritage (e.g., cultural identity, experiences, learning interests and needs, ways of knowing and being) that they can build on when designing instructional practices. educators can start with activities like ice breakers, a know yourself game, or background surveys to gather some basic information about the students and their experiences (anyichie, 2018). to better support culturally diverse students, educators need to develop their own cultural competence, and improve their awareness of students’ experiences of cultural diversity, issues of power, educational systemic racism, historical legacies of colonialism, and inequity. educators can gain this knowledge by questioning their own individual assumptions, critically examining these socio-political issues and sharing ideas with others (andrews, 2021; ginsberg & wlodkowski, 2015; gay, 2018; ladson-billings, 2021; paris, 2012; paris & alim, 2017). these practices can generate a knowledge base that is more likely to help educators in creating a caring, safe and supportive environment that is responsive and relevant to the students’ cultural histories (gay, 2018; ladson-billings, 2021).
for example, knowledge of learners offers educators opportunities to connect present learning to learners’ backgrounds and culturally situated lived experiences (crpp), to empower students’ active engagement in co-construction of new ideas (srlpp), and to sustain their ways of being (crpp) (paris & alim, 2017).
second, designed instructional practices are at the heart of this framework. these practices represent strategic combinations of srlpps and crpps within classroom environments or activities. srlpps (e.g., opportunities for making choices and exercising control over learning challenge) could be woven into a learning activity to foster the meaningfulness and relevance of the task to students’ cultural background (crpp). for example, an animal research project (e.g., on animal adaptation or habitation) with opportunities for students to choose an animal to research and decide how to demonstrate their learning can offer multiple prospects for diverse students’ engagement. such an animal project creates opportunities for diverse students to bring in their prior knowledge and lived experiences (e.g., about their chosen animal) and to express their knowledge in ways that are relevant to their cultural practices. a “complex” task creates a rich context for integrating these practices.
perry (2013) defines “complex” tasks as those learning activities that address multiple instructional goals (e.g., mastering learning content, writing and reading strategies); integrate across subject areas (e.g., science, social studies); focus on large chunks of meaning about the learning content (e.g., having an animal project that invites students to describe the animal habitat and generate relevant facts about their animal); engage students in making meaningful choices (e.g., topic to write about, who to work with in a group activity); involve students in cognitive (e.g., attention, thinking) and metacognitive processes (e.g., engagement in cycles of strategic actions); include individual and social forms of learning (e.g., working alone or in groups); and allow multiple ways of demonstrating knowledge and learning (e.g., drawing, writing, oral presentation) (butler et al., 2017). complex tasks can support students’ srl. for example, an animal project could be deliberately designed as a complex task that connects with students’ cultural background and lived experiences (crpps) and empowers their agency towards sustaining their cultural values and practices (srlpp, crpp), by providing opportunities for making choices and exerting control over learning challenges (srlpps). combining srlpps and crpps allows educators to establish cultural congruity and design a culturally diverse curriculum context in their teaching (gay, 2018). for instance, providing culturally responsive, relevant and sustaining choices within a complex task (srlpp and crpp simultaneously, such as asking students to choose an animal with cultural or religious relevance), and/or a sequence of crpps and srlpps woven into the same classroom learning activity, has promise in fostering engagement in culturally diverse classrooms (anyichie, 2018).
finally, dynamic supportive practices describe the supports provided to students as their learning engagement unfolds in context.
dynamic practices can also weave together srlpps and crpps. for instance, students could be offered multidimensional feedback from teachers, peers and parents (e.g., identifying specific things that could be done to improve an on-going learning task) or formative assessments (e.g., completing criterion-based self- and/or peer assessment forms) (butler & cartier, 2018; nicol & macfarlane-dick, 2006) that are culturally relevant and sustaining (egbo, 2019; montenegro & jankowski, 2017; ladson-billings, 2021).
research has been identifying different ways that educators can design classroom contexts that build on this integrated pedagogy to support all learners’ engagement in multicultural classrooms (anyichie, 2018; anyichie & butler, 2018; 2019; in press; anyichie et al., 2019). for example, anyichie (2018) conducted a pilot study to examine the potential of a cr-srl framework in supporting culturally diverse learners’ engagement. in that study, he collaborated with a classroom teacher (i.e., venus; see footnote 1) in a multicultural classroom in designing practices based on the cr-srl framework. analysis of classroom observations and documents showed that venus was able to build from the framework to enact crpps and srlpps in her class that reflected each of the three dimensions, although how she did that differed across subject areas. triangulation of multiple sources of data (e.g., classroom observations, teacher and student interviews, student surveys, experience sampling forms, student work samples) showed how culturally diverse learners’ engagement could be linked to the ways in which venus enacted cr-srl practices. nevertheless, this first study was limited by a small sample size (i.e., one classroom; just one teacher and 6 of her students), which precluded cross-contextual analysis.
we designed the current study to involve parallel cases of two classrooms, teachers, and students to better examine how classroom contexts designed by educators who build from the cr-srl framework might be instrumental in supporting culturally diverse learners’ engagement.
1.3. understanding engagement
engagement is a critical piece in understanding students’ learning experiences, and it is associated with many positive outcomes including students’ achievement (appleton et al., 2008; christenson et al., 2012; fredricks et al., 2004; fredricks et al., 2019; kahu, 2013; reeve & tseng, 2011; turner et al., 2014; xie et al., 2019). there are variations in the ways engagement has been conceptualized, defined, and studied (appleton et al., 2008; sinatra et al., 2015). many researchers have come to define engagement as a multidimensional construct with three distinct but interrelated dimensions: affective/emotional, behavioural, and cognitive engagement (fredricks et al., 2004, 2016; wang et al., 2011). behavioural engagement describes students’ overt behaviour and involvement in academic tasks and learning activities including attention, time on task, participation, concentration, and asking and answering questions (fredricks et al., 2004; sinatra et al., 2015). emotional engagement refers to students’ feelings, attitudes and reactions about classroom tasks including boredom, anxiety, frustration, sadness, interest, happiness, enjoyment and belonging (pekrun & linnenbrink-garcia, 2012; schunk et al., 2013). cognitive engagement refers to students’ deliberate investment of needed effort in their learning activities, metacognition, and self-regulation of learning such as assessment, use of cognitive strategies, reflection, engagement in cycles of strategic action, active use of prior knowledge, and persistence in challenging tasks (cleary & zimmerman, 2012; fredricks & mccolskey, 2012).
recently, researchers have added attention to a fourth dimension, agentic engagement (e.g., reeve & tseng, 2011). reeve (2013) defines agentic engagement as a “student-initiated pathway to a more motivationally supportive learning environment” such as active contribution to the flow of a learning activity including making suggestions and offering input (p. 581). while these conceptual distinctions are common in the self-report literature (e.g., jang et al., 2016), it can be difficult to observe and tease them apart in practice due to their overlap. for example, student emotional engagement (e.g., enjoyment) has been shown to be related to behavioural engagement (pietarinen et al., 2014). there is also a correlation between cognitive and behavioural engagement (martin, 2007; wang et al., 2011), and they tend to overlap by involving effort. however, they can be distinguished by the nature of the effort expended. for instance, effort that involves doing the task (e.g., spending extra time) reflects behavioural engagement more than effort that is triggered by interest and motivation (e.g., deploying different strategies to master a challenging task or class material), which relates to cognitive engagement (fredricks et al., 2004).
engagement has been studied separately in the fields of srl and crt. srl research documents evidence that srlpps (e.g., choice provision, self and peer assessments) foster student engagement because they typically position students as owners of their learning while increasing their perceived autonomy (montenegro, 2017; patall et al., 2016; perry et al., 2020). research among culturally diverse students shows that students’ engagement is fostered through crpps (e.g., designing learning activities that are relevant to diverse learners’ prior knowledge and lived experiences) (kumar et al., 2018; gray et al., 2020).
footnote 1: all names in this article are pseudonyms.
however, less research has been conducted about the integration of srlpps and crpps in supporting culturally diverse students’ engagement. in addition, researchers have called for using multiple approaches to measuring student engagement instead of depending only on self-report instruments, due to the discrepancy between students’ reports and their actual actions (fredricks et al., 2019; greene, 2015). thus, this study adds to current literature on engagement by gathering multiple sources of data (i.e., quantitative and qualitative) to capture student engagement in classroom contexts that integrated srlpps and crpps. overall, student engagement involves a range of thoughts and actions that advance learning and lead to academic progress (reeve, 2013). due to the overlap among the different dimensions of engagement, the limitations of self-report instruments, and the challenges of observing and making fine distinctions between forms of engagement in practice, for this study we operationalize engagement as an integrated construct that defines the process and the quality of a student’s active participation in a learning activity in relation to achieving task expectations. further, student engagement is malleable and situated in context (fredricks et al., 2004; salmela-aro et al., 2016). for example, research has identified how a dynamic interaction between individuals and classroom contexts shapes learning experiences, including engagement and motivational processes such as interest, enjoyment and importance (anyichie, 2018; anyichie & butler, in press; butler & cartier, 2018; järvenoja et al., 2015; nolen et al., 2015; shernoff et al., 2016).
based on expectancy-value theory (wigfield & eccles, 2000; 2020), utility-value intervention research (e.g., harackiewicz & priniski, 2018; hecht et al., 2021; hulleman et al., 2010; yeager et al., 2013) highlights the impact of students’ perceived usefulness or value of academic tasks on their engagement, interest, persistence, self-regulation, effort, and performance. thus, in this study, we focused attention on how culturally diverse students’ overall engagement could be associated with their perceived contextual features, including values of activities that educators built into their practice in terms of being interesting, enjoyable and important.
2. research questions
the purpose of this study was to examine how two teachers were able to integrate srlpps and crpps into complex tasks so as to support their culturally diverse students’ engagement. our research questions were: (1) how did the teachers integrate crpps and srlpps in complex tasks? (2) how were culturally diverse students engaging in those complex tasks? and (3) how was culturally diverse students’ engagement related to the cr-srl practices enacted?
3. method
3.1. design
this study involved two parallel case studies situated in elementary classrooms (i.e., one grade 4; one grade 5) within which teachers designed complex tasks to support culturally diverse learners’ engagement. a case study design was chosen because of its effectiveness in examining a complex, dynamic and multidimensional phenomenon as it manifests in situ (merriam, 2009; yin, 2014). for instance, case study designs provide a framework for understanding students’ learning processes and the connections between pedagogical practices and associated outcomes (e.g., engagement and motivation) (butler, 2011; butler & cartier, 2018). a case study design also allowed us to collect and coordinate multiple forms of evidence to examine individual and social processes as they unfolded in the context of tasks.
3.2. participants
to recruit teachers as participants from multicultural classrooms, the lead author of this article reached out to an independent school board located in an urban community in a western province in canada. he already had an existing professional relationship with the school board. he discussed his plans to volunteer as a resource person for supporting teaching and learning in upper elementary multicultural classrooms, and his future intention of conducting research with interested teachers. upper elementary classrooms were chosen in order to include students with the maturity to articulate their cultural backgrounds and learning experiences. the school board provided him with a list of four schools they identified as multicultural within the district. with school board permission, the lead author e-mailed principals of six schools, including the four suggested by the school board as well as two other schools where he already had professional connections. three principals accepted his offer for help and extended his invitation to their teachers. he visited the schools of these principals, out of which two teachers indicated interest in participating in this study. ultimately the lead author visited the classes of upper elementary teachers, who volunteered to participate in this study, at two of those schools (joseph and matthias, respectively; see footnote 2). joseph taught in a grade 4 classroom at st. mary’s elementary; matthias taught in a grade 5 classroom at st. victor’s elementary school. table 1 shows demographic information for each teacher as well as teaching experiences.
table 1 teachers’ background
teacher | self identified culture | gender | academic qualifications | years of teaching | classes/grades taught | years of teaching in the current school | years of teaching current grade
joseph | 5th generation (western european) | m | bed | 25 | 4-12 | 19 | 9
matthias | caucasian (western european) | m | bed, ba (arts in english) | 8 | k-12 | 8 | 5½
at the time of this study, the province in which these teachers’ schools were located was transitioning to a new curriculum that focused on personalized learning, project-based learning, and accommodating student diversity including cultural backgrounds. in this context, the teachers shared the goal of empowering culturally diverse learners to engage actively in more personally relevant, open forms of learning. still, in relation to this study, neither teacher had any formal knowledge or experience about designing crpps and srlpps. in addition, although joseph was experienced with designing complex tasks, matthias had never designed a complex task for his class.
footnote 2: the lead author did not visit the third school because the teacher who volunteered taught grade 2.
3.2.1. students in joseph’s classroom
all students in joseph’s grade 4 classroom (n = 31) were invited to participate in this study. eighteen students volunteered to participate by returning their signed assent and parent/guardians’ consent forms. table 2 shows that these eighteen participants were between the ages of 8 and 9. tables 2 and 3 combine to show how joseph’s students came from linguistically and culturally diverse backgrounds.
table 2 student demographics in both classrooms
classroom | total # of students | m | f | ages, years (months) | first language as english # (%) | first language other than english # (%) | home language other than english # (%) | both parents are born in canada # (%) | either or both parents are not born in canada # (%) | special needs designation # (%)
joseph’s class | 18 | 11 | 7 | 8 (10) to 9 (8) | 15 (83.3%) | 3 (16.7%) | 5 (27.8%) | 10 (55.6%) | 8 (44.4%) | 1 (5.6%)
matthias’ class | 25 | 15 | 10 | 9 (10) to 10 (8) | 22 (88%) | 3 (12%) | 11 (44%) | 8 (32%) | 17 (68%) | 0 (0%)
note: m = male and f = female.
table 3 linguistic and cultural diversity in both classrooms
classroom | first language other than english (# of languages) | home language other than english (# of languages) | countries of parent(s) born outside of canada (# of countries) | ethnicity/or countries of origin (# of ethnicity/countries)
joseph’s class | portuguese, greek, and spanish. | italian, portuguese, greek, and columbian. | philippines, italy, greece, germany, portugal, and el salvador. | caucasian/canadian, southeast asian, italian, african, latino, and trinidad.
matthias’ class | polish, korean, and chinese. | croatian, polish, italian, korean, chinese, and tagalog. | england, philippines, africa, korea, taiwan, hong kong, italy, poland, and china. | east indian, indian, italian, philippines, caucasian/canadian, african, croatian, polish, korean, english, irish, dutch, chinese, portuguese, and german.
3.2.2. students in matthias’ classroom
all 31 students in matthias’ grade 5 classroom were also invited to participate. of them, twenty-five provided parental consent and assent to participate in the study. table 2 shows that these participants were between 9 and 10 years of age. tables 2 and 3 combine to show the diversity in students’ identified first and home languages, countries and ethnicities.
3.3. co-designing instructional practices
the lead author of this article worked with the two volunteer teachers across the fall of 2017 to design complex tasks that enacted cr-srl promoting practices. he met with each of the two teachers separately (joseph and matthias) and discussed their research interests and goals for students. next, the lead author served as a collaborator in facilitating individual meetings with each teacher separately. to advance teachers’ professional learning and practice development, he drew on a collaborative inquiry framework to involve them in the cyclical processes of goal identification, planning and enacting practices, reflecting on progress, and refining approaches accordingly (butler & schnellert, 2012; timperley et al., 2014). more specifically, during early meetings, the lead author introduced the cr-srl framework (anyichie, 2018; anyichie & butler, 2017) and discussed with each teacher how it could be implemented to support culturally diverse students’ engagement. through those discussions, he collaborated with each teacher in sharing ideas about designing and enacting relevant practices (i.e., crpps and srlpps), both for their classrooms as a whole and for a particular task. in each case, he focused on building from each teacher’s prior learning and experience. for example, based on the curriculum and on what the teachers were already doing and experimenting with in relation to designing an integrated pedagogy, he worked with each teacher to design a learning task of their choice. he also worked together with the teachers throughout the term in refining the tasks in ways that best accommodated the provisions of the cr-srl framework. as students’ learning unfolded, he supported participating teachers in refining their practices to fit the dynamism of their respective classes.
overall, each teacher had control over how to integrate the crpps and srlpps within their chosen learning activity as they considered appropriate for their students.
3.3.1. co-designed complex tasks
the co-designed complex task in joseph’s class was titled “understanding animal and human adaptations to the land”. this learning task had three sections: animal adaptations, first nations’ adaptations to the land, and students’ adaptations to school. the first section asked students to research adaptations of any animal of their choice. the second required them to research human adaptations with the first nations in canada as a case example. finally, building on what the students were learning about animal and human adaptations, the third section asked them to research their personal adaptation to school. the co-designed complex task for matthias’ students was titled “understanding your personal and cultural identity”. this learning task also had three sections: relationships and cultural contexts, personal values and choices, and personal strengths and abilities. each of these sections comprised three to five open-ended questions that students were expected to answer. part of the first section also asked the students to create a collage that described them culturally (see appendix a for details on each of these tasks).
3.4. procedures
as part of his early conversations with teachers, the lead author identified all proposed research procedures, but then made modifications based on their negotiation of goals and processes. following the completion of all ethical procedures, he worked with each teacher separately to implement the study design. further, prior to data collection, he explained all the data collection measures and processes to the students in joseph’s and matthias’ classes and invited them to participate. he provided the teachers with consent/assent forms for themselves, their students’ parents/guardians, and their students.
he explained to the students that the purpose of the study was to investigate with their teachers how best to support their learning. finally, he observed and collected data about the participating teachers’ implementation, and students’ experiences, of the crpps and srlpps.
3.5. data collection
for each case study, we collected and coordinated multiple sources of data including: (1) classroom observations and associated field notes; (2) teacher documents (e.g., task instructions); (3) student work samples; (4) students’ self-reports about their engagement using an experience sampling and reflection form (esrf); and (5) teacher interviews.
3.5.1. observations
the lead author conducted 12 observations of instructional episodes of the complex task across 9 days in joseph’s classroom (515 minutes), and 6 observations across 5 days in matthias’ classroom (255 minutes). observations focused on the practices joseph and matthias enacted to support culturally diverse students in the context of their tasks, and on how the students were participating in those practices. each observation lasted between 30 and 80 minutes. the total number of observations in each class was dependent on the number of days teachers invited the lead author to observe their students. observing the same students over time offered an opportunity to understand their engagement as related to the specific features of their complex tasks. during each classroom observation, the lead author created a running record of what he observed (see perry, 1998; anyichie, 2018). in those records, he tried to capture all teacher and student talk “verbatim” as much as he could during individual and small group activities. he video-taped observations when it was possible to capture only students who had consented to participate. those video-taped observations provided rich contextual information and helped to better understand and interpret students’ engagement.
while circulating during an observation, the lead author occasionally debriefed with the students about their participation. he also debriefed with teachers after each observation to gain more understanding about their practices in relation to students’ participation in them.
3.5.2. teacher documents
the lead author reviewed the complex task instructions to consider practices each teacher designed to support his students. the review of this documentation assisted him during observations to focus attention on students’ participation in relation to teachers’ enactment of crpps and srlpps.
3.5.3. student work samples
during the observations, as students were working on their tasks, the lead author photographed samples of their work. sometimes, he took pictures of drafts in their work folders. these pictures helped to see how students were participating in the tasks in relation to features of each specific section.
3.5.4. experience sampling and reflection form (esrf)
we used an esrf adapted from larson and csikszentmihalyi (2014) to gather students’ self-reports of their experiences while participating in the complex task. this form asked questions about students’: (1) feelings (i.e., how did you feel about working on this activity today?); (2) concentration (i.e., how well did you concentrate while working on this activity/task today?); (3) perceptions of challenge (i.e., was this activity challenging for you? if so, what made it challenging? what did you do about the challenge?); (4) perceptions of importance (i.e., how important is this activity?); (5) perceptions of enjoyment (i.e., did you enjoy what you worked on today?); and (6) interest (i.e., was this activity interesting?). students rated their subjective experiences on 5-point likert scales from 0 (not at all) to 4 (very much). they also provided explanations for their ratings by responding to a follow-up “why?”
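as an illustration only, one esrf submission could be represented as a small record that checks each rating against the 0 to 4 scale described above. this is a minimal sketch and not the instrument or software used in the study: the class name, the item labels, and the field names are assumptions introduced here for the example.

```python
from dataclasses import dataclass

# Hypothetical short labels for the six ESRF questions; the form itself
# uses full questions, and these names are assumptions for this sketch.
ESRF_ITEMS = ("feelings", "concentration", "challenge",
              "importance", "enjoyment", "interest")
SCALE_MIN, SCALE_MAX = 0, 4  # 0 = not at all, 4 = very much


@dataclass
class EsrfResponse:
    """One student's ESRF submission for one day (illustrative only)."""
    student: str    # pseudonym
    day: int        # day of the complex task
    ratings: dict   # item label -> integer rating on the 0-4 scale
    reasons: dict   # item label -> free-text answer to the follow-up "why?"

    def __post_init__(self):
        # Reject unknown items and ratings outside the 0-4 scale.
        for item, value in self.ratings.items():
            if item not in ESRF_ITEMS:
                raise ValueError(f"unknown ESRF item: {item}")
            if not isinstance(value, int) or not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"rating for {item} must be an integer from 0 to 4")


# Made-up example data, not taken from the study.
example = EsrfResponse(
    student="s01",
    day=1,
    ratings={"concentration": 3, "interest": 4},
    reasons={"interest": "i chose my own animal"},
)
```

keeping the free-text reasons next to the ratings mirrors the form's pairing of each rating with a follow-up explanation.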
students filled out this form each time they worked on their tasks (see footnote 3) to help them reflect on their experiences while sharing that with their teachers. asking them to immediately report their experiences reduced retrospective bias. these repeated reports also helped us and the teachers to understand students’ real-time experiences over time. overall, the students submitted 77 esrfs in joseph’s class and 94 in matthias’ class.
3.5.5. interviews
at the end of the study, the lead author conducted individual in-depth semi-structured interviews with each teacher. the teachers were asked to share their perceptions about the practices they designed. example questions included: what cr-srl classroom practices did you design and implement to support your students, especially culturally diverse learners in your multicultural classroom? what did you try that seemed successful and beneficial? why do you think it was effective? what challenges did you experience? the interviews took place at the teachers’ schools and lasted between 45 and 60 minutes.
3.6. data analysis
we conducted a combination of qualitative (e.g., of classroom observations, teachers’ debriefing and interviews, documents, and student work samples) and quantitative (e.g., of student self-reports/ratings on the esrf) analyses.
3.6.1. coding of teacher practices
we transcribed video-taped classroom observations, debriefings, and semi-structured teacher interviews. the transcribed information was shared with the teachers, who confirmed the content before the coding. we reviewed documents (e.g., complex task instructions) and student work samples. we worked together to develop a priori categories derived from the cr-srl framework (see anyichie, 2018; anyichie & butler, 2017 for detailed review) to inform coding but were also open to unexpected findings. to support this analysis, we engaged in two levels of coding.
first, we developed a chronological list of all practices enacted in each instructional episode with references to the lesson, activities, and section of the complex task. then we looked at each practice from an srl point of view, flagging any practice consistent with srl-promoting practices (see table 4). subsequently, we again reviewed the full list of practices with a crt lens, flagging any practice clearly linked with crt principles. the result was a chronological list of practices flagged as srlpps, crpps, both, or neither. this approach to coding enabled us to interpret whether and how srlpps and crpps were intertwined within each lesson and section of the task (see table 5). second, once all lessons, sections and activities were coded, we categorized the practices in relation to the three main categories of practices identified in the cr-srl framework (i.e., foundational, designed instructional and dynamic supportive practices). this lens enabled us to interpret how the practices teachers enacted did (or did not) reflect the main kinds of practices most frequently identified across the srl and crt literatures. finally, documents and fieldnotes were mined for confirming or disconfirming evidence.
3.6.2. coding of students’ engagement
we coded, analyzed and interpreted students’ engagement in the context of the complex tasks based on three sources of data: (1) observations of students’ engagement, (2) students’ work samples, and (3) students’ reflections (using the esrf).
footnote 3: on a few days the students filled in esrf when the lead author was not present. however, he observed all the different sections of the learning tasks. note that the teachers adopted this form to support students’ self-reflection and gather feedback from students about their experiences in the ongoing class learning task.
to code observational data, we reviewed field notes and transcripts of debriefs to describe student activities and identify instances of their participation in specific contexts. we looked for engagement that reflected any of the four dimensions of engagement, as identified earlier, but did not try to disentangle them. for example, we coded as engagement evidence of students’ participation and direct involvement in learning activities including effort and persistence (e.g., reading, note taking and re-writing); concentration and focusing attention (e.g., eyes fixed on worksheets with evidence of thinking and writing); reflection and assessment (e.g., completing self/peer assessment forms); help-seeking (e.g., asking questions), listening and answering questions, making suggestions and offering input in class, and thinking aloud. finally, we examined student work samples for traces of engagement (e.g., integrating teacher feedback). whenever we flagged a link between teachers’ practices and engagement in our displays, we then accessed other forms of data in search of patterns to understand how particular practices (e.g., crpps, srlpps) may have enabled different students’ engagement in specific contexts. although we did not calculate inter-rater agreement, the first and second authors discussed and reached agreement on the coding processes. we also measured student engagement by analysing their self-reported responses to the esrf. we started by constructing a display of each student’s ratings on concentration (as another indicator of engagement). to consider how their concentration ratings were linked to their perceptions of the complex task (and the section they were working on), we also looked at their perceptions of challenge, enjoyment, importance, and interest. then, we calculated descriptive statistics.
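the descriptive step just described can be sketched as follows. this is a hedged illustration only, assuming ratings are stored as (day, item, rating) rows on the 0 to 4 scale: the function names, the row layout, and the example values are all assumptions made here and are not data or code from the study.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe(ratings):
    """Return n, mean and sample SD for a list of 0-4 ratings."""
    n = len(ratings)
    # A sample SD needs at least two values; report 0.0 for a single rating.
    return {"n": n, "mean": mean(ratings), "sd": stdev(ratings) if n > 1 else 0.0}


def describe_by_day(rows):
    """Group (day, item, rating) rows and summarise each (day, item) pair."""
    grouped = defaultdict(list)
    for day, item, rating in rows:
        grouped[(day, item)].append(rating)
    return {key: describe(values) for key, values in grouped.items()}


# Made-up ratings for illustration; these are not the study's data.
rows = [
    (1, "concentration", 2), (1, "concentration", 3), (1, "concentration", 4),
    (2, "concentration", 4), (2, "concentration", 4),
    (1, "interest", 3), (1, "interest", 3),
]
summary = describe_by_day(rows)
```

a summary keyed by day and item makes it straightforward to lay concentration next to challenge, enjoyment, importance, and interest for the same day.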
we also created displays that allowed us to see how students’ perceptions about their tasks shifted across days and were associated with their self reported concentration. to examine if variations in students’ self-reported perceptions and engagement were similar within and across days, we conducted repeated measures within subject analyses of variance. furthermore, to advance our understanding of the possible associations between students’ perceptions of, and engagement in their tasks, we conducted correlational analyses. finally, to support identifying patterns, quantitative data from the esrf were roughly interpreted to be low if below midpoint (< 2.5) and high if above midpoint (>2.5). table 4 coding scheme for teacher practice: category, codes and descriptions/examples category codes descriptions/examples srl promoting practices srlpp teacher practices that were supportive of srl. for example, evidence of the teacher: (a) providing opportunities for choice and control over challenge (e.g., allowing students’ choice and decision making, scaffolding students’ meaningful choices, and supporting control over learning); (b) fostering self-assessment (e.g., by creating opportunities for students’ self reflection, self-monitoring, and adjusting of learning); (c) offering teacher support (e.g., by providing resources and instrumental supports, and co-regulatory opportunities between the teacher and student(s)); (d) providing opportunities for peer support (e.g., offering opportunities for peer-to peer support group activities, co-regulation of learning, and assessment); and/or (e) providing opportunities for students to engage in cycles of strategic actions. 4 (butler et al., 2017; perry, 2013; perry et al., 2020; schunk & greene, 2018). 4 many of the practices are identified across a range of resources and research. therefore, links to citations are provided at the end of each cell in the table. 
category: culturally responsive pedagogical practices
code: crpp
descriptions/examples: teacher practices that were considered culturally responsive and relevant. for example, evidence of the teacher:
(a) establishing cross-cultural communication (e.g., creating opportunities for social interactions about personal or cultural issues);
(b) designing cultural diversity in curriculum content (e.g., adjusting and situating curriculum content to connect with students’ prior knowledge and lived experiences by using multicultural textbooks);
(c) establishing cultural congruity in classroom teaching and learning (e.g., matching class instruction with students’ prior experiences and cultural background); and/or
(d) developing cultural competence (e.g., fostering teacher’s and students’ understanding and knowledge of their cultures and that of other students).
(aceves & orosco, 2014; gay, 2010, 2018; howard & rodriguez-minkoff, 2017; ladson-billings, 1995, 2021; villegas & lucas, 2002; ginsberg & wlodkowski, 2015)

table 5
sample coding of teacher practices (matthias’ class, day 5)

| context | teacher practices | code |
| --- | --- | --- |
| lesson activity one: matthias and the students were brainstorming ideas, and students independently completed the open-ended questions on their worksheets. | provided opportunity for students’ connection of classroom activity to their cultural backgrounds | crpp |
| | provided opportunity for students’ choice making | srlpp |
| | provided opportunity for students’ self-reflection through esrf | srlpp |
| | provided opportunity for students’ social interaction and group activity around cultural backgrounds | srlpp/crpp |
| lesson activity two: students were working in small groups, independently writing and collectively sharing their individual stories on their group’s flipchart. | offered instrumental support: scaffolded student thinking about their cultural backgrounds and experiences through brainstorming and a worksheet with guiding questions | srlpp/crpp |
| | provided learning resources (e.g., flipcharts) | *other |

note. “other” represents teacher observed practices that are neither strictly srlpp nor crpp.

3.6.3.
links between students’ engagement and the contextual features of the complex task

to trace the links between teacher practices and students’ engagement during their respective complex task, we created data displays highlighting the relationship between teachers’ practices on different days and students’ self-reported engagement using matrix coding queries in nvivo 11 software. the displays represented observations of teacher practices alongside students’ narrative descriptions of their perceptions of teacher practices and their participation on the esrf.

4. findings

4.1. how did teachers integrate cr-srl practices into their complex tasks?

the review of teachers’ instructions for the complex tasks and classroom observations combined to reveal that both teachers integrated crpps and srlpps into their classrooms to support their students’ engagement. however, they integrated these practices in different ways. this section starts by presenting the findings in joseph’s class as a foundation for comparing the similarities and differences with matthias’ class.

4.1.1. joseph’s class

joseph integrated both srlpps and crpps across sections of the complex task he designed. for example, part of the instructions for his task asked students to:

research one of the aboriginal people [5] (e.g., inuit, métis and first nations). compare your findings about the first nations and our daily living by responding to these questions: what is the biggest difference? what is most surprising when i think of my life? if i was a first nation person my age, what would i enjoy the most? (crpp)

the above instructions revealed that joseph supported his students’ learning about aboriginal people in relation to their own lived experience (crpp). he offered them an opportunity to conduct independent research about aboriginal peoples (srlpp) and compare their research findings with their individual life experiences (crpp and srlpp).
part of joseph’s complex task also involved the students in a field trip to a local museum to see exhibitions of the aboriginal peoples, especially the first nations [task instructions]. after the trip, joseph asked his students to reflect on their personal experiences and learning by completing a worksheet with some guiding questions (e.g., “find 3 things that helped the first nation peoples in their daily lives, provide a drawing, a brief description”; “how is this object/thing different from what you use in your life”; “how is my life changed after i have seen these exhibits?”) (crpp). further, the students were asked to share their individual learning in small groups and present their groups’ collective learning through a podcast to the entire class (srlpps) [task instructions; observation day 9]. this example shows how joseph created opportunities for his students to connect this task with their sociocultural context through the field trip (crpp), and their personal lived experiences (crpp and srlpp) through self-reflection (srlpp).

overall, joseph’s task provided varied opportunities for students’ learning and engagement. it contained almost all the features of a complex task as defined by perry (2013) (srlpps). for example, it engaged students in independent and social forms of learning (e.g., individual and small group research), integrated two subject areas (e.g., science and social studies), extended over time (i.e., over two months), allowed students multiple ways of demonstrating learning and knowledge (e.g., through a multimedia book, podcast presentations and role play), focused on large chunks of meaning about the learning content (e.g., conducting research), and involved students in making meaningful choices (e.g., of the aboriginal people to research, what and how to share about their lived experiences in relation to the first nation peoples’ lives) [task instructions; observations].
but joseph also strategically wove crpps into these srl-supportive features (e.g., by linking content to students’ cultural backgrounds and lived experiences). joseph provided scaffolds for students’ engagement in completing the task such as generating ideas (srlpp), activated their prior knowledge in a way that facilitated students’ connection of class activities to their lived experiences (crpp), and offered opportunities for students’ self-reflection, self-monitoring and peer support (srlpp).

4.1.2. matthias’ class

like joseph, matthias integrated both srlpps and crpps into the different sections of his learning task. for example, his instructions in the section on “personal values and choices” asked students to: (1) list 5 things that are important to you/that you value in life; (2) explain why each of them is important to you; (3) consider “what do you hope to be in the future, and why?”; and (4) reflect on “how is this hope affected/influenced by your values or your cultural background? if it isn’t, what affects/influences your hope and why?” through these instructions, matthias provided opportunities for students to deliberately connect this class activity to their cultural background and personal lives (crpp), make personally meaningful choices and exercise control over the information they were sharing (srlpp), and reflect on how their culture might be influencing their choices and values in life (srlpp and crpp).

[5] the curriculum of the province where this study was conducted emphasized enacting decolonizing pedagogies that are connected to the aboriginal peoples (i.e., indigenous) and engage students by connecting to their experience and heritage.

also, the section titled “culture collage” asked students to “make a collage of images and words that describe and represent you culturally (crpp); …use the space below [worksheet] to brainstorm ideas that you can apply to your final collage” (srlpp) [task instruction; observation day 4].
this evidence suggests that matthias offered opportunities for his students to engage in strategic action and self-monitoring (srlpp) and bring their cultural and life experiences into classroom learning activities (crpp). that said, whereas joseph’s learning task advanced his students’ understanding of classroom topics by situating them in their cultural backgrounds and lived experiences (crpp), matthias’ task, as a foundational practice, focused most on developing his students’ awareness and understanding of their identities and cultural backgrounds (e.g., by creating a collage) (crpp).

matthias’ task did include some of the features of srl-promoting “complex” tasks (perry, 2013). for example, the task included multiple sections with specific products, involved students in both independent and small group learning processes, and extended over time (srlpps) [task instructions; observations]. however, whereas the activities in joseph’s task were highly integrated and interdependent, matthias’ task consisted of a series of short, similar tasks in a survey format. the activities involved mostly open-ended questions and built only from the social studies curriculum. also, matthias’ task had limited instructional goals, opportunities for group activities and demonstration of learning in multiple ways. overall, matthias’ task did not include a rich combination of srlpps and crpps compared to that of joseph.

4.2. how were culturally diverse students engaging in cr-srl complex tasks?

to understand students’ in-the-moment perceptions about their participation during the complex tasks, we examined their esrf reports including both (1) students’ self-reported concentration (as one indicator of engagement); and (2) whether they perceived the task on each day as challenging, interesting, important, and/or enjoyable.
in this analysis, we interpreted these latter ratings as indicators of how they were responding to the practices built into the task each day as personally relevant, valuable, and meaningful.

4.2.1. joseph’s class

table 6 shows that students who participated in the cr-srl complex task in joseph’s class reported high levels of engagement across all five days when esrf data were collected (concentration, m = 3.20, sd = 0.74). they also perceived the task to be highly important (m = 3.53, sd = 0.87), interesting (m = 3.36, sd = 1.18), and not very challenging (m = 0.74, sd = 0.90). [6] their perceptions of the task as highly important and interesting suggested that they may have found it personally meaningful, valuable, and relevant (m = 3.50, sd = 0.80). to find out if the differences in student mean ratings were consistent across days, or instead varied in relation to specific sections of the task, we ran a repeated measures analysis of variance on each of concentration, importance and interest. the results of the repeated measures anova with sphericity assumed showed that there were no statistically significant differences at the p < .05 level for student ratings across days on concentration [f(4, 28) = 1.116, p = .369, η² = .138], importance [f(4, 20) = 1.117, p = .376, η² = .183], or interest [f(4, 12) = 1.131, p = .388, η² = .274]. these findings suggest that, overall, the students in joseph’s class perceived the cr-srl complex task to be highly important and interesting and were very engaged in it across days.

[6] ratings of enjoyment were not available. joseph decided to redesign the esrf to make it more appealing to his students and excluded the question on enjoyment. by the time the lead author realized it, it was too late to include it in their reflection form.
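the effect sizes reported for joseph’s class can be cross-checked from the f statistics and their degrees of freedom. the sketch below assumes that η² here denotes partial eta squared, which the text does not state explicitly; under that assumption, partial η² = (f × df_effect) / (f × df_effect + df_error). the function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Assumes the reported effect size is partial eta squared:
    SS_effect / (SS_effect + SS_error) = F * df_effect / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the text for Joseph's class.
reported = {
    "concentration": (1.116, 4, 28),
    "importance": (1.117, 4, 20),
    "interest": (1.131, 4, 12),
}

# Round to three decimals to match the precision used in the text.
effect_sizes = {
    name: round(partial_eta_squared(*stats), 3) for name, stats in reported.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: partial eta squared = {value:.3f}")
```

under this assumption, the recovered values round to .138, .183 and .274, which agrees with the effect sizes reported in the text.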
table 6
esrf: means and standard deviations for students’ self-reports of concentration and perceptions of challenge, importance, and interest across days (joseph’s class)

| day | # of participants | # of esrf | engagement: concentration m (sd) | perceived challenge m (sd) | task value: importance m (sd) | task value: interest m (sd) | task value: overall m (sd) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | 18 | 16 | 3.19 (0.63) | 0.94 (0.75) | 3.77 (0.42) | 3.83 (0.55) | 3.81 (0.31) |
| 8 | 18 | 15 | 2.87 (0.96) | 0.87 (1.09) | 3.33 (0.94) | 3.00 (1.57) | 3.10 (1.07) |
| 9 | 18 | 17 | 3.44 (0.60) | 0.50 (0.76) | 3.63 (0.70) | 3.88 (0.48) | 3.81 (0.34) |
| 10 | 18 | 16 | 3.19 (0.73) | 0.88 (0.93) | 3.19 (1.24) | 2.40 (1.25) | 2.84 (1.03) |
| 11 | 18 | 13 | 3.33 (0.62) | 0.58 (0.86) | 3.83 (0.37) | 4.00 (0.00) | 3.92 (0.19) |
| total | 18 | 77 | 3.20 (0.74) | 0.74 (0.90) | 3.53 (0.87) | 3.36 (1.18) | 3.50 (0.80) |

note. esrf scale: 0 = not at all, 1 = slightly, 2 = somewhat, 3 = much, 4 = very much.

4.2.2. matthias’ class

table 7 shows that students in matthias’ class reported relatively high levels of engagement in their complex task on days when the esrf was collected (concentration, m = 2.97, sd = 0.17). a repeated measures anova showed that there were no statistically reliable differences in self-reported concentration across days [f(3, 54) = 2.568, p = .064, η² = .125]. that said, while students’ ratings were above the midpoint, suggesting relatively high levels of engagement, their self-reported concentration was significantly lower than that reported by students in joseph’s classroom (m = 3.20, sd = 0.74). overall, table 7 also indicates that matthias’ students perceived the task to be very important (m = 3.04, sd = 0.07), enjoyable (m = 2.94, sd = 0.36), and interesting (m = 2.75, sd = 0.30). students’ overall perceptions of the task suggest that they found it to be personally meaningful, valuable and relevant (m = 2.91, sd = 0.22). however, as in joseph’s classroom, there were some variations in their perceptions of the task across days.
for example, although repeated measures anova showed that students reported similar levels of importance across days [f(3, 51) = .504, p = .681, η² = .029], post hoc pairwise comparison analysis with a bonferroni adjustment indicated that the students reported higher levels of interest and enjoyment on day 4 as compared to day 6 (interest, p = .041; enjoyment, p = .012). this finding again points to the way in which classroom conditions including teacher practices influenced students’ perceptions of a complex task over time.

table 7
esrf: means and standard deviations for students’ reports of concentration and perceptions of challenge, importance, enjoyment and interest across days (matthias’ classroom)

| day | # of participants | # of esrf | engagement: concentration m (sd) | perceived challenge m (sd) | task value: importance m (sd) | task value: enjoyment m (sd) | task value: interest m (sd) | overall value m (sd) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4 | 25 | 25 | 3.00 (0.60) | 0.93 (0.84) | 3.10 (0.56) | 3.48 (0.67) | 3.19 (0.72) | 3.26 (0.68) |
| 5 | 25 | 21 | 3.14 (0.64) | 0.67 (0.64) | 3.05 (0.84) | 2.86 (0.83) | 2.86 (0.83) | 2.92 (0.84) |
| 6 | 25 | 23 | 3.04 (0.75) | 0.87 (0.95) | 3.09 (0.85) | 2.48 (1.17) | 2.43 (1.10) | 2.66 (1.09) |
| 7 | 25 | 25 | 2.68 (0.84) | 0.80 (0.89) | 2.92 (0.89) | 2.92 (0.98) | 2.52 (1.10) | 2.79 (1.01) |
| total | 25 | 94 | 2.97 (0.17) | 0.82 (0.10) | 3.04 (0.07) | 2.94 (0.36) | 2.75 (0.30) | 2.91 (0.22) |

note. esrf scale: 0 = not at all, 1 = slightly, 2 = somewhat, 3 = much, 4 = very much.

4.3. how was culturally diverse students’ engagement related to teachers’ cr-srl practices?

to answer our third research question about the links between students’ perceptions of the complex task (i.e., interest, enjoyment, importance, challenge) and engagement, this section presents for each class: the association between students’ perceived values of daily activities and their self-reported concentration (i.e., one indicator of engagement); and a case study analysis of engagement as linked to activities on days where the highest variation was observed.

4.3.1. joseph’s class

4.3.1.1.
associations between students’ self-reported engagement and perceptions about the complex task. to better understand how students’ perceptions of the complex task (i.e., interest and importance) could be associated with their self-reported concentration, we conducted a correlational analysis (see table 8). results indicated that all three variables were positively inter-correlated, suggesting a positive relationship between students’ perceptions of the task and engagement in joseph’s classroom.

table 8
bi-variate and partial correlations among concentration, importance and interest (joseph’s classroom)

| control variables | variable | concentration | interest | importance | m | sd | n+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| none | concentration | 1 | | | 3.18 | 0.78 | 62 |
| | interest | 0.491* | 1 | | 3.34 | 1.2 | 62 |
| | importance | 0.321* | 0.399* | 1 | 3.56 | 0.86 | 62 |

note. + total valid number (listwise) of responses from the participants. * correlation significant at the 0.05 level (2-tailed).

4.3.1.2. associations between students’ engagement and teacher practices in the complex tasks. to gain more insight into the links between joseph’s practices and students’ engagement across days, we started by reviewing all evidence of students’ engagement in their tasks on each day, including students’ ratings and written reflections on the esrf, classroom observations/debriefs, and student work samples. we cross-referenced that data with evidence of joseph’s practices derived from classroom observations/debriefs, task instructions, and his teacher interview. for the most part, students’ engagement in joseph’s task was high. although there were high levels of student engagement across days, in this section, we chose day 8 for more in-depth analysis, since it was the day we observed the greatest variation in students’ perceptions of the task (see table 6).

case study of day 8. prior to day 8, joseph had asked the students to conduct independent research on the first nations’ ways of life and share their findings in small groups.
on day 8, they focused on comparing their research findings about the first nations’ life and their individual lives [task instructions; observation]. joseph had two connected activities in his lesson: brainstorming and completing a worksheet (see table 9).

joseph’s practices on day 8. during the complex task on day 8, joseph enacted both srlpps and crpps (see table 9). for example, he spent the first 10 minutes facilitating a brainstorming activity about how the first nations lived and adapted to their land, and how that might be similar or different from today’s way of life [observation] (crpp). he supported students’ thinking about the first nations’ ways of life through metacognitive questions (srlpp) and their retention of generated ideas by writing all their responses on the white board [observation]. observational data indicated that while scaffolding students’ reflective thinking (srlpp) about how to compare first nations’ way of life and how people live today including themselves (crpp), the teacher instructed them to: “…think about the most dramatic differences you come up with, most important to the least important”.

the second activity asked the students to compare their own life experiences with those of the first nations by generating at least 3 similarities and differences (crpp) [task instruction; observation]. joseph supported students’ completion of this activity through a structured worksheet. for this activity, he gave them choices about how and where to work, saying: “it’s lot more of individual work, but you can work with your partner to get at least 3 similarities and differences,” and allowing them to work at any corner of the class or in the resource room (a room adjacent to their class) (srlpp) [observation]. as the students completed their worksheets, joseph was observed circulating from group to group and answering questions.
occasionally, he scanned through their worksheets and offered encouragement by saying “good, good.” at one point, after visiting a group, he shared an idea from s5: “he says that the first nations people hunted for food, but we hunt for sport. yet, we get food from it [hunting], but have it for sport” (crpp). in this way, he offered instructional support by sharing an idea from a student and by facilitating conversations around it (srlpp).

linking student engagement to teacher practices on day 8. overall, our analysis suggested that student engagement was related to the crpps and srlpps joseph enacted on day 8. we observed that most of the students were actively engaged during the lesson activities. for example, at the beginning of the lesson, the students asked and answered questions and updated their notes. this finding could be linked to the open-ended questions joseph posed to them during the brainstorming exercise, as well as recording their responses on the board (srlpps). during the second group activity, students in one group were observed taking turns in comparing their lives with those of the first nations, as well as negotiating ideas to be written in their main worksheet. we observed this kind of negotiation among other groups as well. this involvement in co-construction of ideas could be associated with the opportunity joseph created for students to connect classroom activities to their lived experiences (crpps) and to collaborate in completing the structured worksheet (srlpps). our interpretation was validated by joseph, who connected the level of his students’ engagement to the practices he enacted: “i found in this project that having students relate what they learned to their self … was very effective and had a high-level of engagement” [interview, 18/12/2017].

although the students were engaged during this lesson, examination of their reflections on the esrf showed mixed and contradictory perceptions about that part of the task (see table 9).
their comments, which can be associated with the variations in their engagement on day 8, could be attributed to individual differences and preferences in relation to the activities (e.g., whether or not they liked the content or lack of access to technology, and how they felt about it). for example, 4 out of 15 students who submitted their esrf on this day reported feeling bored. these variations and individual differences may explain the overall lower self-reported concentration on this day compared to other days.

table 9
joseph’s classroom learning contexts (day 8), teacher practices and samples of students’ comments

| day | learning context | teacher practices | code |
| --- | --- | --- | --- |
| 8 | lesson activity one: teacher and students were brainstorming and sharing students’ research findings about aboriginal groups | scaffolded student thinking through brainstorming and questioning | srlpp |
| | | offered support on making connections between class activities and personal lives | srlpp/crpp |
| | | offered instructional support | srlpp |
| 8 | lesson activity two: students were independently and in groups comparing independent research findings about aboriginal groups and their own personal lives | scaffolded how to compare the first nations’ life with the students’ lives through metacognitive questions | srlpp/crpp |
| | | provided opportunity for choice making, and offered emotional support | srlpp |

sample of students’ comments (esrf):
s1: “i felt bored because we didn’t use the ipads”
s2: “i like the first nations people”
s3: “because we compare our differences, i get to learn about first nations”
s4: “it was fun writing about first nations life”
s5: “i did not feel like working”
s6: “some human beings [peers] are a little mean”
s7: “i like knowing about first nations”
s8: “you get to learn about people that came before us”

note. on day 8 the students reported their experiences of both lesson activities in one esrf.

4.3.2. matthias’ class

4.3.2.1.
associations between students’ self-reported engagement and perceptions of the complex task. to better understand how students’ perceptions of the complex task’s value could be associated with their self-reported concentration in matthias’ class, we again conducted a correlational analysis. the results in table 10 indicate that students’ perceptions of the task were positively and statistically significantly correlated with each other, but not with their reports of concentration. this finding suggests that, notwithstanding the relationships among students’ perceptions of the task on a given day, the complex task in matthias’ classroom may not have consistently led to an increase in students’ concentration. this finding contrasts with that of joseph’s class, where engagement correlated positively with students’ perceptions of the learning task.

table 10
bi-variate correlations among concentration, importance, enjoyment, and interest

| variables | concentration | importance | enjoyment | interest |
| --- | --- | --- | --- | --- |
| concentration | 1 | | | |
| importance | 0.115 | 1 | | |
| enjoyment | 0.120 | 0.387** | 1 | |
| interest | 0.167 | 0.481** | 0.660** | 1 |

note. ** correlation is significant at the 0.01 level (2-tailed).

case study of day 6. we chose day 6 to better understand the interaction between matthias’ practices and students’ engagement, because students’ self-reported perceptions and concentration varied greatly on that day (see table 7). on day 6, the students came in from the lunch break, and submitted an assignment that asked them to write a paragraph on comics (language arts). next, while seated at their lockers [arranged in a table format with four/five students facing each other], the students independently worked on the section of the task on “personal strengths and abilities” [observation]. this section focused on students’ understanding of their strengths and abilities and how they use them in their community.

teacher practices on day 6. classroom observational data showed that matthias enacted crpps and srlpps on day 6 (see table 11).
to begin, he spent the first few minutes introducing the section of the task the students were supposed to be working on and communicating that he expected them to finish that section that same day. as in previous sections of the task, on day 6 students were charged with answering open-ended questions: (1) “what are some of your strengths and abilities?”; (2) “what would you say are some of your challenges and weaknesses?”; and (3) “how are you using your strengths in your: family, school, relationships?” (crpp) [task instructions]. next, he distributed a worksheet containing the above three questions as a learning resource to students (srlpp) [observation].

table 11
learning context, teacher practices and samples of students’ comments on day 6

learning context: lesson activity. matthias and the students were brainstorming ideas while students independently completed their worksheets.

teacher practices (code): provided opportunity for students’ connection of classroom activity to their cultural backgrounds through the worksheets (crpp)

samples of students’ comments:
s1: “i didn’t enjoy writing, but the questions were interesting”; “got to learn more about myself”.
s2: “i got to think about what i'm good at”; “it was fun … i had to think”.
s3: “i like working on these activities”; “some of the questions are hard”; “these questions are all about me”.
s4: “i tried to ignore the people in my group” [in terms of class sitting arrangement]; “i got bored after a while”.
s5: “i was talking sometime while working on it”; “if i don’t do it, i will get into trouble”; “it got my brain thinking about my school, my family, and my friends”.
s6: “it was fun; our group was loud, but i didn’t talk”.
s7: “i like doing this type of work; this assignment lets me be creative and lets me be me”.
s8: “i did not think it was interesting”; “i got really bored”.
s9: “some of the questions are hard”; “people around me are loud”.
s10: “this activity was not easy nor hard, it was in the middle”; “sometimes i always get distracted”; “i'm not a big fan of writing”.
s11: “pretty excited but i liked the collage more because it was more creative”; “i have quiet people around me so i can concentrate”; “i like learning about me…”
s12: “i did not like this assignment”; “got easily distracted because its boring”; “it wasn’t interesting”.

remaining teacher practices (code) in table 11: provided opportunity for students’ choice making (srlpp); provided opportunity for students’ self-reflection through esrf (srlpp); offered teacher support (srlpp): procedural support through brainstorming activity, and emotional support.

our observational data showed that while the students were completing their worksheets, matthias concurrently facilitated a brainstorming exercise, strategically guiding the students through each of the questions in the worksheet. occasionally, matthias allowed limited time in between the questions for students to write their responses on their worksheets. within that short period, he circulated, asked questions, and attended to students’ specific needs. similarly, he provided scaffolds by asking questions (e.g., what do you think are your strengths?) (crpp/srlpp) and facilitated students’ learning and retention by keeping track of generated ideas on the board. matthias supported students’ attention and concentration by celebrating a student’s achievement: “you see she [s3] just focused and got finished. if you pay attention and reduce your discussions, you will be done soon”. halfway through the lesson, matthias facilitated students’ thinking about situating their responses in the context of their home and lived experiences: “how are you using your strengths at home? so, if you are creative, maybe during festive times you are helping out at decoration of things... think of things you do at home…” (crpp). although the questions in the worksheets were designed to orient students’ thinking in a culturally relevant manner, matthias did not explicitly emphasize making that connection all the time.
finally, towards the end of the activity, before students submitted their work, matthias created an opportunity for them to reflect on their perceptions of the activity in terms of being personally meaningful, valuable, and relevant through the esrf (crpp/srlpp).

linking student engagement to teacher practices on day 6. observational data showed that many of the students were somewhat passive during the lesson activity. for example, during the brainstorming exercise, most students sat quietly in rows, listened, looked at the board and wrote in their worksheets. occasionally, some other students asked the teacher clarifying questions (e.g., “what if i want to write being empathetic”, “can we write full sentences?”). also, not many students responded to teacher-directed questions. this observed low-level engagement could be associated with the fact that matthias talked more often than students, including responding to some of his own questions, while supporting students’ task interpretation and completion. in addition to teacher practices, students’ experiences with previous sections of the task (i.e., which also involved many written responses) may have impacted their interest and added to their lower-level engagement (see table 11).

examination of student worksheets showed how they made personally relevant choices about the information they were sharing (see figure 2). this finding could be linked to the metacognitive questions matthias provided in the student worksheet (crpp/srlpp). yet, students provided similar responses to some questions. for example, many of the students stated “athletic, curiosity, creativity, confidence, and empathetic” as their strengths [observation; student worksheets]. this resemblance in student responses could be associated with the teacher’s efforts to keep track of their brainstorming discussions by writing generated ideas on the board.
since matthias always guided his students through each of the questions with little time for them to deeply reflect on the questions, the students may have depended on teacher support and relied on his recording of ideas on the board. inadvertently, matthias’ procedural support (perry, 1998) may have caused dependency and constrained individual students’ thinking beyond their collective class discussion, thereby giving rise to passive participation and experiences of boredom. from an srl perspective, the activity on day 6 could be described as mostly teacher directed, with limited opportunities for bridging from guiding learning to fostering students’ independence. there was also no observed opportunity for social interaction or peer feedback. at the end, however, the esrf (see table 11) seemed to support their thinking, reflection on the activity (linked to srlpps), and awareness of their identities (linked to crpps).

figure 2. student work samples on strengths and abilities

these findings in matthias’ classroom, like those from joseph’s class, again suggested that the dynamic interaction between the learner and context shaped students’ engagement. in matthias’ class, the low engagement level of students on day 6 appeared to be related at least in part to the ways in which he enacted crpps and srlpps. fewer of those practices were evident on day 6, when compared with other days (e.g., days 4 and 5). interestingly, while in joseph’s class most students reacted positively on most days to the learning task, matthias’ task was not as consistently successful in engaging learners. in matthias’ classroom, there was greater variability in students’ perceptions and responses to the same learning context.

5. discussion

5.1.
practices joseph and matthias integrated in their complex tasks

in relation to our first research question, findings indicated that both joseph and matthias built on the cr-srl framework and wove crpps and srlpps into their independent complex tasks. although each teacher had a choice of how they integrated crpps and srlpps, we expected this finding since the teachers volunteered and collaborated with the lead author in building from a cr-srl framework to co-design a complex task to support their students’ engagement. this finding suggests that participating teachers were learning to think about integrating crpps and srlpps simultaneously within a learning activity. it is also in line with other research showing that teachers who are mentored to implement srl and/or culturally responsive frameworks, and are supported to work using a collaborative inquiry framework (timperley et al., 2014), experience shifts in their instructional approaches (butler et al., 2013; perry et al., 2006; teemant et al., 2011; powell et al., 2016; correll, 2016).

that said, we observed differences in how joseph and matthias designed and enacted their tasks that seemed consequential in terms of influencing students’ engagement. for example, evidence suggested that joseph embedded a wider variety of crpps and srlpps with the potential to engage his students than did matthias. the differences in observed practices could have been influenced by differences in their teaching experience, their comfort levels with experimenting with new instructional practices, and the needs of their students. for example, joseph, who had been teaching for over 20 years, was already familiar with designing a complex task [debriefings], and with little support from the first author was able to successfully weave in crpps and srlpps. on the other hand, matthias, who had been teaching for 8 years, described himself as a novice in designing complex tasks [debriefings].
as a result, his learning task eventually included fewer features of an srl-promoting “complex task” than did joseph’s (perry, 2013). this finding is consistent with those of other studies that have linked teachers’ experimentation with new instructional strategies and shifts in instructional practices with teaching experience, participation in in-service professional development/workshops, collaborative inquiry and/or collaborations with researchers (anyichie & butler, 2017; 2018; butler & schnellert, 2012; clark et al., 1996; gray et al., 2020; mor & mogilevsky, 2013; turner & trucano, 2015). this study adds to that work by showing how the cr-srl framework helped teachers to build from their prior experience to start weaving crpps and srlpps together in generative ways into learning tasks and classrooms.

5.2. associations between student engagement and teacher practices

one of the major findings from this study was that student engagement in the cr-srl complex tasks in each classroom was relatively high. furthermore, findings showed a higher level of engagement in joseph’s classroom, where the use of the two kinds of practices was more frequent and interconnected. research on both crpps and srlpps has similarly suggested that these practices are associated with higher levels of engagement and motivation (ginsberg & wlodkowski, 2015; kumar et al., 2018; wolters & taylor, 2012). this research adds by showing how crpps and srlpps could be combined within learning tasks to support high levels of engagement. overall, the findings of this study extend the literature by showing that an increase in student engagement can be associated with integrated crpps and srlpps (e.g., anyichie, 2018; anyichie & butler, 2017, 2018, in press; anyichie et al., 2016; 2019; gay, 2018; ginsberg & wlodkowski, 2015; perry, 2013; revathy et al., 2018).
in terms of the underlying goal of this research, which was to investigate strategies for more consistently supporting engagement of culturally diverse learners, this finding is very encouraging (see also anyichie, 2018; anyichie & butler, 2018).

this research also adds by tracing with some specificity how students’ high level of engagement in their tasks could be associated with how teachers combined crpps and srlpps. for example, students in both joseph’s and matthias’ classrooms, while working on their tasks, were highly engaged in making culturally relevant choices. students’ engagement in choice making could be associated with the opportunities teachers provided. for example, students were asked to make choices about what to learn and how to demonstrate their learning (srlpp, joseph); they were given opportunities to choose how to respond to culturally relevant, open-ended guiding questions (crpp, matthias). by making these choices, students exercised control over their learning tasks. they also bridged the gap between their home and classroom cultures by deliberately connecting classroom activities to their histories and backgrounds. for instance, matthias’ culturally situated guiding questions (crpp/srlpp) activated his students’ prior knowledge, fostered their metacognitive thinking, and supported them in making personally relevant decisions about their values, strengths, and future life goals based on their interests, ability, family needs, and lived experiences. these kinds of questions might create opportunities for the sustainability of students’ cultural practices and ways of being in their communities (paris, 2021). this finding adds to previous literature on the association between choice provision, motivation, and engagement (e.g., schmidt et al., 2018; evans & boucher, 2015; jiang et al., 2021; patall et al., 2016; perry, 2013) by revealing the affordances of a cr-srl complex task for student choice making and control over learning.
overall, findings from this study suggest the power of culturally relevant, sustaining, and meaningful choices in enhancing learners’ engagement.

as culturally diverse students’ participation in the learning task unfolded in both classrooms, our findings showed that they were highly involved in self-reflection and self-assessment. for example, they situated their reflections on what they were learning in relation to their cultural backgrounds and lived experiences. in addition, they self-assessed their learning contexts (e.g., the task) and self-reported their perceptions and participation in them on the esrf. through these reflective processes, students engaged in cognitive and metacognitive processes by analyzing and monitoring their learning performances in ways that seemed to foster their active engagement in the task. this finding connects with previous research showing how self-evaluation and formative assessment improve students’ engagement, srl, cognitive processes and achievement (andrade & valtcheva, 2009; braund et al., 2021; nicol & macfarlane-dick, 2006; perry et al., 2010, 2020; sanchez et al., 2017; schunk & zimmerman, 2008). this study adds by showing how crpps can foster students’ self-reflection. further, consistent with previous research (e.g., anyichie, 2018; anyichie & butler, 2018, 2019, in press), students’ engagement during self-reflection processes could be linked to the opportunities teachers created (e.g., by providing culturally relevant questions) for students to relate what they were learning to their cultural backgrounds and lives (crpp/srlpp). similarly, aceves and orosco (2014) found that student engagement, understanding of text, and reading achievement increased when the teacher in their study created opportunities for students to relate the context of their reading activity to their individual background knowledge and lived experiences.
this finding connects with the current study by suggesting how engagement can be enhanced in a personally relevant and valuable learning activity.

furthermore, findings from observational data showed that joseph engaged his students by providing opportunities for independent and group activities that allowed for social interaction and sharing, peer support and collaboration (srlpp), as well as connecting classroom activities to their cultural backgrounds (crpp). both teachers also offered instrumental support (srlpp) by providing scaffolds for students’ metacognitive thinking about how the tasks could be connected to their cultural backgrounds (e.g., through provision of worksheets with open-ended questions, and brainstorming activities). these findings align with those of other studies that have documented how student engagement is fostered by activities that involve collaboration, teacher support and new learning (cooper, 2014; heemskerk & malmberg, 2020; van braak et al., 2021; klem & connell, 2004; parsons et al., 2018; wu et al., 2013). our study adds by suggesting how culturally mixed small group activities, if implemented with responsivity and respect, could amplify students’ engagement.

5.3. students’ perceptions and engagement: individual and context interactions

our findings also revealed a dynamic interaction between the learner and context that shaped students’ learning engagement. specifically, findings showed that students’ perceptions of their classroom activities as personally relevant and important, most times, shaped their engagement in them. for example, the complex task in joseph’s classroom more reliably fostered students’ positive perceptions and higher levels of self-reported concentration, which were associated with crpps and srlpps and with how he wove those together in a more complex way. however, student reflective explanations of their experiences revealed individual-context variations in within-class engagement levels.
these variations were more pronounced in matthias’ classroom than in joseph’s and could be attributed to individual differences (i.e., preferences) in relation to the quality of activities assigned on particular days (e.g., teachers’ use of crpps and srlpps), and the overall learning context (e.g., being distracted by peers) across days. furthermore, findings from the esrf data and correlational analyses revealed tighter connections between students’ perceptions of daily activities and their engagement in them (i.e., the cr-srl complex task) in joseph’s classroom. in contrast, self-reported engagement on the esrf was not reliably correlated with students’ perceptions of their learning activities in matthias’ classroom, which also manifested in variations in student comments on the esrf. this finding shows how variations in student engagement within and between classrooms were associated with learners’ perceptions of the classroom context.

taken together, these findings extend previous research showing how student perceptions of their learning contexts, including designed instructional practices, task features and teacher support, shape their active engagement (anyichie et al., 2019; butler & cartier, 2018; van braak et al., 2021; jones et al., 2021; parsons et al., 2018; kelly & zhang, 2016). further, this study corroborates findings that students tend to be highly engaged in learning activities they perceive to be important, interesting and enjoyable (ainley, 2012; harackiewicz & priniski, 2018; harackiewicz et al., 2016; jones et al., 2021; patall et al., 2016) and relevant to their cultural values (gray et al., 2020). moreover, it shows how students’ perceptions of crpps and srlpps shaped their increased level of engagement.
finally, the findings of this study establish how student engagement is situated in context, is shaped by a dynamic interaction between the learner and context, and needs to be understood within the context in which it occurs (anyichie, 2018; anyichie & butler, 2018, in press; anyichie et al., 2019; butler & cartier, 2018; fredricks & mccolskey, 2012; heemskerk & malmberg, 2020; okoye & anyichie, 2008; shernoff et al., 2016).

5.4. contributions and implications

5.4.1. theoretical contributions

our study makes theoretical, methodological, and practical contributions. it contributes, through the cr-srl framework, by offering a theoretical background for the integration of pedagogical practices from crt and srl theories. it adds to culturally relevant, sustaining, and responsive pedagogy by advancing understanding of how to empower culturally diverse learners’ agency within a cr-srl complex task. furthermore, this study contributes to srl theory by drawing attention to the impact of social and cultural contexts on students’ exercise of agency. this study also contributes to the literature on student engagement. for example, it offered insights into how students’ perceptions of daily activities and engagement are situated in contexts that foster their srl (e.g., using srlpps) by deliberately encouraging them to draw from their cultural backgrounds and lived experiences to advance their learning experiences (using crpps). these findings on integrated crpps and srlpps extend previous literature on the need to situate research about student learning processes, including engagement, srl and motivation, within students’ sociocultural context and to weave cultural relevance into motivation research (anyichie & butler, in press; anyichie et al., 2016; 2019; gray et al., 2020; king et al., 2018; mcinerney et al., 2011; nolen et al., 2015; kumar et al., 2018; usher, 2018; zusho & clayton, 2011).

5.4.2.
methodological contributions

another major contribution of this study is further identification of methodological approaches for examining dynamic individual-context interactions. for example, building on prior research (e.g., butler, 2011; butler & cartier, 2018), we showed how the use of case study methodology enabled us to both generate a thick description of learners-in-context and conduct parallel cross-case analyses to trace patterns across cases. also, the use of a cr-srl complex task as a unit of analysis enabled us to trace the connection between teacher-enacted practices and student engagement in situ. for example, our in-depth case studies helped us see the dynamic interactions between student perceptions of contextual features of the complex task and their engagement within it (butler, 2011; butler & cartier, 2018), and how that interaction explained the variability in students’ engagement within and between classes, and across days. these kinds of in-depth explorations of teacher and student activities in classrooms, and of how these activities shape teaching and learning, need to be employed extensively in srl research. finally, much of the prior engagement research employs self-report or a single measure (fredricks & mccolskey, 2012; fredricks et al., 2019). using a case study design in this study allowed us to collect and triangulate multiple sources of evidence (e.g., quantitative and qualitative data) to understand student engagement in relation to contextual features (e.g., crpps and srlpps). we also contribute by adding an esrf to the mix of data collection sources, which allows quantitative and qualitative data to be collected in tandem.

5.4.3. practical contributions

this study contributes to classroom teaching and learning practices by providing information about how teachers might integrate crpps and srlpps in a complex task to support engagement for culturally diverse learners.
our findings suggest that a cr-srl framework (anyichie, 2018; anyichie & butler, 2017) might provide a useful guide for educators, especially those who do not share similar cultural backgrounds and lived experiences with their students, in designing meaningful and relevant activities (e.g., cr-srl complex task) in our contemporary society. also, our study showed how students’ active engagement in their tasks increased when they perceived them to be personally relevant and important. these findings invite educators to deepen their cultural competence and their knowledge of student backgrounds, including prior learning experiences, cultural backgrounds and lived experiences, aspirations, and interests (gay, 2018; ladson-billings, 2021), and then build on those as resources for designing relevant instructional practices (chaplin, 2019; gay, 2013) with the potential to sustain students’ heritage and ways of being (paris & alim, 2017). the implementation of this framework might support addressing contemporary problems of classroom systemic racism and inequality and help in closing achievement gaps between mainstream and minority students of colour.

5.4.4. limitations and future directions

the contributions of this study notwithstanding, there are some limitations that need to be mentioned. first, we chose an in-depth parallel case study design (i.e., two elementary teachers and 43 students in two independent schools), which enabled us to explore relationships between enacted practices and students’ engagement in some depth. however, we cannot generalize our findings to other contexts or classrooms because we were not able to recruit as many multicultural schools/classrooms as we wanted.
therefore, extending this study to include larger samples (e.g., state-funded or public schools, more classrooms, teachers and students of colour) might allow a more comprehensive understanding of how teachers’ and students’ cultural backgrounds influence their practices and engagement processes, respectively. second, the grade 4 and 5 participants in this study may not have fully developed and internalized their cultural norms and values in ways that would have facilitated their effective connection of classroom activities to their cultural backgrounds and lived experiences. involving older students (e.g., middle school, high school and college students) as well as their families and communities in future research might help to examine more fully how practices that enable students to build from their cultural backgrounds might influence their engagement. third, coding of observations was done by the first author of this article. however, the data were shared with the teachers, and coding was discussed and cross-checked with the second author until agreement was reached (brink, 1993). although we brought our different positionalities to our interpretation of coding, in future research, consensus coding of data (bradley et al., 2007), especially by individuals with different perspectives, could strengthen interpretations. nevertheless, the use of case study allowed for collection and triangulation of multiple sources of data that substantially helped in overcoming some of the shortcomings of any single data collection method (yin, 2014; houghton et al., 2013). fourth, since this paper considered all students in culturally diverse classrooms as bringing their diverse linguistic, ethnic and cultural backgrounds to the classroom context, we did not analyse the data in relation to different groups of students.
future research can investigate how student engagement and perception of classroom contexts (e.g., cr-srl practices, power relations between teacher and students and among students) are shaped by each learner’s unique and specific cultural background. finally, based on the findings of this study, future research could examine in more detail how the provision of culturally relevant, sustaining, and meaningful choices might enhance students’ engagement, and how crpps might be compatible with srlpps in context.

6. conclusion

in conclusion, this study adds to the body of research exploring classroom contexts in support of student engagement and motivation. this study goes further by examining teacher practices and students’ engagement in complex tasks that integrated crpps and srlpps. a significant finding of this study was that the integration of crpps and srlpps into a complex task seemed to create affordances for culturally diverse students’ active engagement (anyichie, 2018; anyichie & butler, 2018, in press). another important finding was that students’ perceptions of contextual features (e.g., crpps and srlpps) as personally meaningful and relevant seemed to increase their engagement. also, this study documented a complex interaction between the learner and context and how students’ engagement was related to the kinds of crpps and srlpps the teachers wove into the different sections of the task. overall, this study is among the first applied school-based studies to highlight how classroom contexts that purposefully integrate multiple combinations of crpps and srlpps may yield benefits for culturally diverse learners’ engagement. finally, the research processes and findings of this study contribute to theory, research, methodology, and practice in both srl and crt. thus, we hope that this investigation serves as a guide for future applied research on supporting culturally diverse students’ engagement.
keypoints

● a culturally responsive self-regulated learning framework guided educators in designing meaningful and engaging complex tasks.
● students were more engaged in learning contexts with richer integration of culturally responsive teaching and self-regulated learning practices.
● culturally relevant and meaningful choices were powerful in enhancing student engagement.
● case study design was helpful in understanding the dynamic interaction between students’ perception of, and engagement in, classroom contexts.

references

aceves, t. c., & orosco, m. j. (2014). innovation configuration culturally responsive teaching. ceedar center, 2(ic), 1–37. http://ceedar.education.ufl.edu/tools/innovation-configurations
ainley, m. (2012). students’ interest and engagement in classroom activities. in s. l. christenson, a. l. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 283–302). springer. https://doi.org/10.1007/978-1-4614-2018-7_13
andrade, h., & valtcheva, a. (2009). promoting learning and achievement through self-assessment. theory into practice, 48(1), 12–19. https://doi.org/10.1080/00405840802577544
andres, d. j. c. (2021). preparing teachers to be culturally multidimensional: designing and implementing teacher preparation programs for pedagogical relevance, responsiveness, and sustenance. the educational forum, 85(4), 416–428. https://doi.org/10.1080/00131725.2021.1957638
anyichie, a. c. (2018). supporting all learners’ engagement in a multicultural classroom using a culturally responsive self-regulated learning framework [doctoral dissertation, the university of british columbia]. the university of british columbia open collections. https://doi.org/10.14288/1.0375773
anyichie, a. c., & butler, d. l. (in press). examining culturally diverse learner’s motivation and engagement processes as situated in a context of a complex task. frontiers in education. https://doi.org/10.3389/feduc.2023.1041946
anyichie, a. c., & butler, d.
l. (2015, may 30–june 3). implications of supporting the development of self-regulated learning through modelling and scaffolding [poster presentation]. 42nd annual conference of the canadian society for the study of education, ottawa, on, canada. https://www.researchgate.net/publication/336810681
anyichie, a. c., & butler, d. l. (2017, april 27–may 1). a culturally responsive self-regulated learning framework [paper presentation]. american educational research association 98th annual meeting, san antonio, tx, united states. https://www.researchgate.net/publication/324605557
anyichie, a. c., & butler, d. l. (2018, april 13–17). culturally responsive teaching and self-regulated learning: an integrated approach to supporting engagement in inquiry-based learning [paper presentation]. american educational research association 99th annual meeting, new york, ny, united states. https://www.researchgate.net/publication/324605557
anyichie, a. c., & butler, d. l. (2019, june 1–5). understanding culturally diverse learners’ motivation and engagement processes as situated in an inquiry-based learning context [paper presentation]. 47th annual conference of the canadian society for the study of education, vancouver, bc, canada. https://www.researchgate.net/publication/336768995
anyichie, a. c., butler, d. l., & nashon, s. m. (2019, june 1–5). investigating students’ perceptions and engagement in a multicultural classroom context [paper presentation]. 47th annual conference of the canadian society for the study of education, vancouver, bc, canada. https://www.researchgate.net/publication/367524431
anyichie, a. c., & onyedike, c. c. (2012). effects of self-instructional learning strategy on secondary schools students’ academic achievement in solving mathematical word problems in nigeria. african research review an international multidisciplinary journal, ethiopia, 6(27), 302–323. https://doi.org/10.4314/afrrev.v6i4.21
anyichie, a. c., yee, n., perry, n. e., & hutchinson, l. r.
(2016, may 28–june 1). supporting culturally diverse students with self-regulated learning [paper presentation]. 43rd annual conference of the canadian society for the study of education, calgary, ab, canada. https://www.researchgate.net/publication/321807336
appleton, j. j., christenson, s. l., & furlong, m. j. (2008). student engagement with school: critical conceptual and methodological issues of the construct. psychology in the schools, 45(5), 369–386. https://doi.org/10.1002/pits.20303
bang, m. (2015). culture, learning, and development and the natural world: the influences of situative perspectives. educational psychologist, 50(3), 220–233. https://doi.org/10.1080/00461520.2015.1075402
banks, j., cochran-smith, m., moll, l., richert, a., zeichner, k., lepage, p., darling-hammond, l., duffy, h., & mcdonald, m. (2005). teaching diverse learners. in l. darling-hammond & j. bransford (eds.), preparing teachers for a changing world: what teachers should learn and be able to do (pp. 232–274). jossey-bass.
boekaerts, m., & corno, l. (2005). self-regulation in the classroom: a perspective on assessment and intervention. applied psychology, 54(2), 199–231. https://doi.org/10.1111/j.1464-0597.2005.00205.x
bradley, e. h., curry, l. a., & devers, k. j.
(2007). qualitative data analysis for health services research: developing taxonomy, themes, and theory. health services research, 42(4), 1758–1772. https://doi.org/10.1111/j.1475-6773.2006.00684.x
braund, h., deluca, c., panadero, e., & cheng, l. (2021). exploring formative assessment and co-regulation in kindergarten through interviews and direct observation. frontiers in education, 6, article 732373. https://doi.org/10.3389/feduc.2021.732373
brayboy, b. m. j., & castagno, a. e. (2009). self-determination through self-education: culturally responsive schooling for indigenous students in the usa. teaching education, 20(1), 31–53. https://doi.org/10.1080/10476210802681709
brink, h. i. l. (1993). validity and reliability in qualitative research. curationis, 16(2), 35–38. https://doi.org/10.4102/curationis.v16i2.1396
butler, d. l. (2011). investigating self-regulated learning using in-depth case studies. in b. j. zimmerman & d. schunk (eds.), handbook of self-regulation of learning and performance (pp. 346–360). routledge.
butler, d. l., schnellert, l., & perry, n. e. (2017). developing self-regulating learners. pearson.
butler, d. l., & cartier, s. c. (2018). advancing research and practice about self-regulated learning: the promise of in-depth case study methodologies. in d. h. schunk & j. a. greene (eds.), handbook of self-regulation of learning and performance (2nd ed., pp. 352–369). routledge. https://doi.org/10.4324/9781315697048-23
butler, d. l., & schnellert, l. (2012). collaborative inquiry in teacher professional development. teaching and teacher education, 28(8), 1206–1220. https://doi.org/10.1016/j.tate.2012.07.009
butler, d. l., schnellert, l., & cartier, s. c. (2013). layers of self- and co-regulation: teachers working collaboratively to support adolescents’ self-regulated learning through reading. education research international, 2013, article 845694. https://doi.org/10.1155/2013/845694
cartier, s. c., & butler, d. l. (2016).
comprendre et évaluer l’apprentissage autorégulé dans des activités complexes [understanding and assessing self-regulated learning in complex activities]. in b. noël & s. c. cartier (eds.), de la métacognition à l’apprentissage autorégulé [from metacognition to self-regulated learning] (pp. 41–54). deboeck.
chaplin, m. (2019). reclaiming multicultural education: course redesign as a tool for transformation. multicultural perspectives, 21(3), 151–158. https://doi.org/10.1080/15210960.2019.1659041
christenson, s. l., reschly, a. l., & wylie, c. a. (2012). the handbook of research on student engagement. springer.
clark, c., moss, p. a., goering, s., herter, r. j., lamar, b., leonard, d., robins, s., russell, m., templin, m., & wascha, k. (1996). collaboration as dialogue: teachers and researchers engaged in conversation and professional development. american educational research journal, 33(1), 193–231. https://doi.org/10.2307/1163385
cleary, t. j., & zimmerman, b. j. (2012). a cyclical self-regulatory account of student engagement. in s. l. christenson & a. l. reschly (eds.), handbook of research on student engagement (pp. 237–258). springer. https://doi.org/10.1007/978-1-4614-2018-7_11
cooper, k. s. (2014). eliciting engagement in the high school classroom: a mixed-methods examination of teaching practices. american educational research journal, 51(2), 363–402. https://doi.org/10.3102/0002831213507973
correll, p. k. (2016).
teachers' preparation to teach english language learners (ells): an investigation of perceptions, preparation, and current practices [doctoral dissertation, university of kentucky]. university of kentucky uknowledge. https://doi.org/10.13023/etd.2016.531
dignath, c., & veenman, m. v. j. (2020). the role of direct strategy instruction and indirect activation of self-regulated learning—evidence from classroom observation studies. educational psychology review, 33, 489–533. https://doi.org/10.1007/s10648-020-09534-0
eccles, j. s. (2016). engagement: where to next? learning and instruction, 43, 71–75. https://doi.org/10.1016/j.learninstruc.2016.02.003
efklides, a. (2011). interactions of metacognition with motivation and affect in self-regulated learning: the masrl model. educational psychologist, 46(1), 6–25. https://doi.org/10.1080/00461520.2011.538645
egbo, b. (2019). teaching for diversity in canadian schools (2nd ed.). pearson.
elaine, c., & randall, d. p. (2010). a critical review of culturally responsive literacy instruction. journal of praxis in multicultural education, 5(1), 83–99. https://doi.org/10.9741/2161-2978.1034
evans, m., & boucher, a. r. (2015). optimizing the power of choice: supporting student autonomy to foster motivation and engagement in learning. mind, brain, and education, 9(2), 87–91. https://doi.org/10.1111/mbe.12073
fredricks, j., blumenfeld, p., & paris, a. (2004). school engagement: potential of the concept, state of the evidence. review of educational research, 74(1), 59–109. https://doi.org/10.3102/00346543074001059
fredricks, j. a., & mccolskey, w. (2012). the measurement of student engagement: a comparative analysis of various methods and student self-report instruments. in s. l. christenson, a. l. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 763–782). springer. https://doi.org/10.1007/978-1-4614-2018-7_37
fredricks, j. a., filsecker, m., & lawson, m. a. (2016).
student engagement, context, and adjustment: addressing definitional, measurement, and methodological issues. learning and instruction, 43, 1–4. https://doi.org/10.1016/j.learninstruc.2016.02.002 fredricks, j. a., parr, a. k., amemiya, j. l., wang, m.-t., & brauer, s. (2019). what matters for urban adolescents’ engagement and disengagement in school: a mixed-methods study. journal of adolescent research, 34(5), 491–527. https://doi.org/10.1177/0743558419830638 gay, g. (2010). culturally responsive teaching: theory, research and practice. teachers college press. gay, g. (2013). teaching to and through cultural diversity. curriculum inquiry, 43(1), 48–70. https://doi.org/10.1111/curi.12002 gay, g. (2018). culturally responsive teaching: theory, research, and practice (3rd ed.). teachers college press. ginsberg, m. b., & wlodkowski, r. j. (2015). motivation and culture. in j. m. bennett (ed.), the sage encyclopedia of intercultural competence. sage. https://philpapers.org/rec/bentse-2 graham, s. (2018). race/ethnicity and social adjustment of adolescents: how (not if) school diversity matters. educational psychologist, 53(2), 64–77. https://doi.org/10.1080/00461520.2018.1428805 gray, d. l., mcelveen, t. l., green, b. p., & btyant, l. h. (2020). engaging black and latinx students through communal learning opportinites: a relevance intervention for middle schoolers in stem elective classroom. contemporary educational psychology, 60, article 101833. https://doi.org/10.1016/j.cedpsych.2019.101833 greene, b. a. (2015). measuring cognitive engagement with self-report scales: reflections from over 20 years of research. educational psychologist, 50(1), 14–30. https://doi.org/10.1080/00461520.2014.989230 hadwin, a., & oshige, m. (2011). socially shared regulation: exploring perspectives of social in self-regulated learning theory. teachers college record, 113(2), 240–264. https://doi.org/10.1177/016146811111300204 harackiewicz, j. m., & priniski, s. j. (2018). 
improving student outcomes in higher education: the science https://doi.org/10.3102/0002831213507973 https://doi.org/10.13023/etd.2016.531 https://doi.org/10.1007/s10648-020-09534-0 https://doi.org/10.1016/j.learninstruc.2016.02.003 https://doi.org/10.1080/00461520.2011.538645 https://doi.org/10.9741/2161-2978.1034 https://doi.org/10.1111/mbe.12073 https://doi.org/10.3102/00346543074001059 https://psycnet.apa.org/doi/10.1007/978-1-4614-2018-7_37 https://psycnet.apa.org/doi/10.1007/978-1-4614-2018-7_37 https://doi.org/10.1016/j.learninstruc.2016.02.002 https://doi.org/10.1177/0743558419830638 https://doi.org/10.1111/curi.12002 https://philpapers.org/rec/bentse-2 https://doi.org/10.1080/00461520.2018.1428805 https://doi.org/10.1016/j.cedpsych.2019.101833 https://psycnet.apa.org/doi/10.1080/00461520.2014.989230 https://doi.org/10.1177/016146811111300204 34 | flr of targeted intervention. annual review of psychology, 69, 409–435. https://doi.org/10.1146/annurev-psych-122216-011725 harackiewicz, j. m., smith, j. l., & priniski, s. j. (2016). interest matters: the importance of promoting interest in education. policy insights from the behavioral and brain sciences, 3(2), 220–227. https://doi.org/10.1177/2372732216655542 hecht, c. a., grande, m. r., & harackiewicz, j. m. (2021). the role of utility value in promoting interest development. motivation science, 7(1), 1–20. https://doi.org/10.1037/mot0000182 heemskerk, c., & malmberg, l. (2020). students’ observed engagement in lessons, instructional activities, and learning experiences. frontline learning research, 8(6), 38–58. https://doi.org/10.14786/flr.v8i6.613 houghton, c., casey, d., shaw, d., & murphy, k. (2013). rigour in qualitative research. nurse researcher, 20(4), 12–17. https://doi.org/10.7748/nr2013.03.20.4.12.e326 howard, t. c., & rodriguez-minkoff, a. c. (2017). culturally relevant pedagogy 20 years later: progress or pontificating? what have we learned, and where do we go? 
teachers college record, 119(1), 1–32. https://doi.org/10.1177/016146811711900104 hulleman, c. s., godes, o., hendricks, b. l., & harackiewicz, j. m. (2010). enhancing interest and performance with a utility value intervention. journal of educational psychology, 102(4), 880–895. https://doi.org/10.1037/a0019506 järvenoja, h., järvelä, s., & malmberg, j. (2015). understanding regulated learning in situative and contextual frameworks. educational psychologist, 50(3), 204–219. https://doi.org/10.1080/00461520.2015.1075400 jiang, j., kusamoto, m., & tanaka, a., (2021). moderating effects of individual differences in causality orientation on relationships between reward, choice, perceived competence, and intrinsic motivation. frontline learning research, 9(3), 69–95. https://doi.org/10.14786/flr.v9i3.751 jones, b. d., krost, k., & jones, m. w. (2021). relationships between students' course perceptions, effort, and achievement in an online course. computers and education open, 2, article 100051. https://doi.org/10.1016/j.caeo.2021.100051 johnson, d. w., johnson, r. t., & smith, k. a. (1998). cooperative learning returns to college: what evidence is there that it works? change, 30(4), 26–35. https://doi.org/10.2753/jei0021-3624440403 kahu, e. r. (2013). framing student engagement in higher education. studies in higher education, 38(5), 758– 773. https://doi.org/10.1080/03075079.2011.598505 kelly, s., & zhang, y. (2016). teacher support and engagement in math and science: evidence from the high school longitudinal study. university of north caralina press, 99(2), 141–165. https://www.jstor.org/stable/44075288 king, r. b., mcinerney, d. m., & pitliya, r. j. (2018). envisioning a culturally imaginative educational psychology. educational psychology review, 30, 1031–1065. https://doi.org/10.1007/s10648-018-9440-z kumar, r., akane, z., & rhonda, b. (2018). weaving cultural relevance and achievement motivation into inclusive classroom cultures. 
educational psychologist, 53(2), 78–96. https://doi.org/10.1080/00461520.2018.1432361 klem, a. m., & connell, j. p. (2004). relationshiop matter: linking teacher support to student engagement and achievement. journal of school health, 74(7), 262–273. https://doi.org/10.1111/j.1746-1561.2004.tb08283.x ladson-billings, g. (1995). toward a theory of culturally relevant pedagogy. american educational research journal, 32(3), 465–491. https://doi.org/10.3102/00028312032003465 ladson-billings, g. (2001). crossing over to canaan: the journey of new teachers in diverse classrooms. jossey bass. https://doi.org/10.1177/0013124505274557 https://doi.org/10.1177/2372732216655542 https://doi.org/10.14786/flr.v8i6.613 https://doi.org/10.7748/nr2013.03.20.4.12.e326 https://doi.org/10.1177/016146811711900104 https://psycnet.apa.org/doi/10.1037/a0019506 https://doi.org/10.1080/00461520.2015.1075400 https://doi.org/10.14786/flr.v9i3.751 https://doi.org/10.1016/j.caeo.2021.100051 https://doi.org/10.2753/jei0021-3624440403 https://doi.org/10.1080/03075079.2011.598505 https://www.jstor.org/stable/44075288 https://doi.org/10.1007/s10648-018-9440-z https://doi.org/10.1080/00461520.2018.1432361 https://doi.org/10.1111/j.1746-1561.2004.tb08283.x https://doi.org/10.3102/00028312032003465 35 | flr ladson-billings, g. (2021). i’m here for the hard re-set: post pandemic pedagogy to preserve our culture. equity & excellence in education, 54(1), 68–78. https://doi.org/10.1080/10665684.2020.1863883 larson, r., & csikszentmihalyi, m. (2014). the experience sampling method. in m. csikszentmihalyi (ed.), flow and the foundations of positive psychology (pp 21–34). springer. mccann, e. j., & turner, j. e. (2004). increasing student learning through volitional control. teachers college record, 106(9), 1695–1714. https://doi.org/10.1111/j.1467-9620.2004.00401.x mccarty, t. l., & brayboy, b. (2021). 
culturally responsive, sustaining, and revitalizing pedagogies: perspectives from native american education. the educational forum, 85(4), 429–443. https://doi.org/10.1080/00131725.2021.1957642 mcinerney, d. m., walker, r. a., & liem, g. a. d. (2011). sociocultural theories of learning and motivation. information age publishing. mclnerney, d. m., & king, r. b. (2018). culture and self-regulation in educational contexts. in d. h. schunk & j. a. greene (eds.), handbook of self-regulation of learning and performance (pp. 485–502). routledge. https://doi.org/10.4324/9781315697048-31 merriam, s. b. (2009). qualitative research: a guide to design and implementation. jossey-bass. montenegro, a. (2017). understanding the concept of agentic engagement. colombian applied linguistics journal, 19(1), 117–128. https://doi.org/10.14483/calj.v19n1.10472 montenegro, e., & jankowski, n. a. (2017, january). equity and assessment: moving towards culturally responsive assessment (occasional paper no. 29). urbana, il: university of illinois and indiana university, national institute for learning outcomes assessment (niloa). https://eric.ed.gov/?id=ed574461 mor, y., & mogilevsky, o. (2013). the learning design studio: collaborative design inquiry as teachers’ professional development. research in learning technology, 21, article 22054. https://doi.org/10.3402/rlt.v21i0.22054 nicol, d. j., & macfarlane-dick, d., (2006). formative assessment and self-regulated learning: a model and seven principles of good feedback practice. studies in higher education, 31(2),199–218. https://doi.org/10.1080/03075070600572090 nolen, s. b., horn, i. s., & ward, c. j. (2015). situating motivation. educational psychologist, 50(3), 234–247. https://doi.org/10.1080/00461520.2015.1075399 okoye, r., & anyichie, a. c. (2008). location and average class size as factors in achievement in jsc mathematics examinations. sokoto educational review, 10(2),175–185. https://doi.org/10.35386/ser.v10i2.403 paris, d. (2012). 
culturally sustaining pedagogy: a needed change in stance, terminology, and practice. educational researcher, 41(3), 93–97. https://doi.org/10.3102/0013189x12441244 paris, d. (2021). culturally sustaining pedagogies and our futures. the educational forum, 85(4), 364–376. https://doi.org/10.1080/00131725.2021.1957634 paris, d., & alim, h. s. (2014). what are we seeking to sustain through culturally sustaining pedagogy? a loving critique forward. harvard educational review 84(1), 85–100. https://doi.org/10.17763/haer.84.1.982l873k2ht16m77 paris, d., & alim, h. s. (eds). (2017). culturally sustaining pedagogies: teaching and learning for justice in a changing world. teachers college press. parsons, s. a., malloy, j. a., parsons, a. w., peters-burton, e. e., & burrowbridge, s. c. (2018). sixth grade students’ engagement in academic tasks. journal of educational research, 111(2), 232–245. https://doi.org/10.1080/00220671.2016.1246408 patall, e. a., vasquez, a. c., steingut, r. r., trimble, s. s., & pituch, k. a. (2016). daily interest, engagement, and autonomy support in the high school science classroom. contemporary educational psychology, 46, 180–194. 
https://doi.org/10.1016/j.cedpsych.2016.06.002 https://doi.org/10.1080/10665684.2020.1863883 https://link.springer.com/book/10.1007/978-94-017-9088-8#author-0-0 https://link.springer.com/book/10.1007/978-94-017-9088-8#author-0-0 https://doi.org/10.1111/j.1467-9620.2004.00401.x https://doi.org/10.1080/00131725.2021.1957642 https://psycnet.apa.org/doi/10.4324/9781315697048-31 https://doi.org/10.14483/calj.v19n1.10472 https://eric.ed.gov/?id=ed574461 https://doi.org/10.3402/rlt.v21i0.22054 https://doi.org/10.1080/03075070600572090 https://doi.org/10.1080/00461520.2015.1075399 https://doi.org/10.35386/ser.v10i2.403 https://doi.org/10.3102/0013189x12441244 https://doi.org/10.1080/00131725.2021.1957634 https://doi.org/10.1080/00220671.2016.1246408 https://doi.org/10.1016/j.cedpsych.2016.06.002 36 | flr pekrun, r., & linnenbrink-garcia, l. (2012). academic emotions and students engagement. in s. l. christenson & a. l. reschly (eds.), handbook of research on student engagement (pp. 259–282). springer. https://doi.org/10.1007/978-1-4614-2018-7_12 perry, n. e. (2013). classroom processes that support self-regulation in young children. british journal of educational psychology. monograph series ii: psychological aspects of education-current trends (vol.10, pp. 45–68). perry, n. e., yee, n., mazabel, s., lisaingo, s., & määttä, e. (2017). using self-regulated learning as a framework for creating inclusive classrooms for ethnically and linguistically diverse learners in canada. in n. j. cabrera & b. leyendecker (eds.), handbook on positive development of minority children and youth (pp. 361–377). springer. https://doi.org/10.1007/978-3-319-43645-6_22 perry, n. e. (1998). young children’s self-regulated learning and contexts that support it. journal of educational psychology, 90(4), 715–729. https://doi.org/10.1037/0022-0663.90.4.715 perry, n. e., lisaingo, s., yee, n., parent, n., wan, x., & muis, k. 
(2020) collaborating with teachers to design and implement assessments for self-regulated learning in the context of authentic classroom writing tasks, assessment in education: principles, policy & practice, 27(4), 416–443. https://doi.org/10.1080/0969594x.2020.1801576 perry, n. e, phillips, l., & hutchinson, l. (2006). mentoring student teachers to support self-regulated learning. elementary school journal, 106(3), 237–254. https://doi.org/10.1086/501485 pietarinen, j., soini, t., & pyhalto, k. (2014) students’ emotional and cognitive engagement as the determinants of well-being and achievement in school. international journal of educational research 67, 40–51. https://doi.org/10.1016/j.ijer.2014.05.001 pintrich, p. r. (2000). the role of goal orientation in self-regulated learning. in m. boekaerts, p. r. pintrich, & m. zeidner (eds.), handbook of self-regulation (pp. 452–502). academic press. powell, r., cantrell, s. c., malo-juvera, v., & correll, p. (2016). operationalizing culturally responsive instruction: preliminary findings of criop research. teachers college record, 118(1), 1–46. http://www.tcrecord.org/content.asp?contentid=18224 rahman, f. a., scaife, j., yahya, n. a., & jalil, h. a. (2010). knowledge of diverse learners: implications for the practice of teaching. international journal of instruction, 3(2), 83–96. https://dergipark.org.tr/en/pub/eiji/issue/5143/70080 reeve, j. (2013). how students create motivationally supportive learning environments for themselves: the concept of agentic engagement. journal of educational psychology, 105(3), 579–595. https://doi.org/10.1037/a0032690 reeve, j., & tseng, c. m. (2011). agency as a fourth aspect of students’ engagement during learning activities. contemporary educational psychology, 36(4), 257–267. https://doi.org/10.1016/j.cedpsych.2011.05.002 reschly, a. l., & christenson, s. l. (2012). jingle, jangle, and conceptual haziness: evolution and future directions of the engagement construct. in s. l. 
christenson, a. l. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 3–19). springer. http://doi.org/10.1007/978-1-4614-2018-7_1 salmela-aro, k., moeller, j., schneider, b., spicer, j., & lavonen, j. (2016). integrating the light and dark sides of student engagement using person-oriented and situation-specific approaches. learning and instruction, 43, 61–70. https://doi.org/10.1016/j.learninstruc.2016.01.001 sanchez, c. e., atkinson, k. m., koenka, a. c., moshontz, h., & cooper, h. (2017). self-grading and peer grading for formative and summative assessments in 3rd through 12th grade classrooms: a meta-analysis. journal of educational psychology, 109(8), 1049–1066. https://doi.org/10.1037/edu0000190 schmidt, j. a., rosenberg, j. m., & beymer, p. n. (2018). a person-in-context approach to student engagement in science: examining learning activities and choice. journal of research in science teaching, 55(1), 19– 43. https://doi.org/10.1002/tea.21409 https://psycnet.apa.org/doi/10.1007/978-1-4614-2018-7_12 https://doi.org/10.1007/978-3-319-43645-6_22 https://doi.org/10.1037/0022-0663.90.4.715 https://doi.org/10.1080/0969594x.2020.1801576 https://doi.org/10.1086/501485 https://doi.org/10.1016/j.ijer.2014.05.001 http://www.tcrecord.org/content.asp?contentid=18224 https://dergipark.org.tr/en/pub/eiji/issue/5143/70080 https://doi.org/10.1037/a0032690 https://doi.org/10.1016/j.cedpsych.2011.05.002 http://doi.org/10.1007/978-1-4614-2018-7_1 https://doi.org/10.1016/j.learninstruc.2016.01.001 https://doi.org/10.1037/edu0000190 https://doi.org/10.1002/tea.21409 37 | flr schunk, d., & greene, j. (eds.). (2018). handbook of self-regulation of learning and performance. routledge. https://doi.org/10.4324/9781315697048 schunk, d. h., meece, j. r., & pintrich, p. r. (2013). motivation in education: theory, research, and applications (4th ed.). springer. schunk, d. h., & zimmerman, b. j. (2008). 
motivation and self-regulated learning: theory, research, and applications. lawrence erlbaum associates. shernoff, d. j., kelly, s., tonks, s. m., anderson, b., cavanagh, r. f., sinha, s., & abdi, b., (2016). student engagement as a function of environmental complexity in high school classrooms. learning and instruction, 43, 52–60. https://doi.org/10.1016/j.learninstruc.2015.12.003 sinatra, g. m., heddy, b. c., & lombardi, d. (2015). the challenges of defining and measuring student engagement in science. educational psychologist, 50(1), 1–3. https://doi.org/10.1080/00461520.2014.1002924 teemant, a., wink, j., & tyra, s. (2011). effects of coaching on teacher use of sociocultural instructional practices. teaching and teacher education, 27(4), 683–693. https://doi.org/10.1016/j.tate.2010.11.006 timperley, h., kaser, l., & halbert, j. (2014). a framework for transforming learning in schools: innovation and the spiral of inquiry (seminar series, 234). centre for strategic education. turner, j. c., christensen, a., kackar-cam, h. z., trucano, m., & fulmer, s. m. (2014). enhancing students’ engagement: report of a 3-year intervention with middle school teachers. american educational research journal, 51(6), 1195–1226. https://doi.org/10.3102/0002831214532515 usher, e. l. (2018). acknowledging the whiteness of motivation research: seeking cultural relevance. educational psychologist, 53(2), 131–144. https://doi.org/10.1080/00461520.2018.1442220 van braak, m., van de pol, j., poorthuis, astrid m. g. & mainhard, t (2021). a micro-perspective on students’ behavioral engagement in the context of teachers’ instructional support during seatwork: sources of variability and the role of teacher adaptive support, contemporary educational psychology, 64, article 101928. https://doi.org/10.1016/j.cedpsych.2020.101928 villegas, a. m., & lucas, t. (2002). preparing culturally responsive teachers: rethinking the curriculum. journal of teacher education, 53(1), 20–32. 
https://doi.org/10.1177/0022487102053001003 wang, m., willett, j. b., & eccles, j. s. (2011). the assessment of school engagement: examining dimensionality and measurement invariance by gender and race/ethnicity. journal of school psychology, 49(4), 465–480. https://doi.org/10.1016/j.jsp.2011.04.001 wigfield, a., & eccles, j. (2000). expectancy–value theory of achievement motivation. contemporary educational psychology, 25(1), 68–81. https://doi.org/10.1006/ceps.1999.1015 winne, p. h., & hadwin, a. f. (1998). studying as self-regulated engagement in learning. in d. hacker, j. dunlosky, & a. graesser (eds.), metacognition in educational theory and practice (pp. 277–304). lawrence erlbaum. wolters, c. a., & taylor, d. j. (2012). a self-regulated learning persective on students engagement. in s. l. christenson, a. l. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 635–651). springer. https://doi.org/10.1007/978-1-4614-2018-7 wu, x., anderson, r. c., nguyen-jahiel, k., & miller, b. (2013). enhancing motivation and engagement through collaborative discussion. journal of educational psychology, 105(3), 622–632. https://doi.org/10.1037/a0032792 xie, k., hensley, l. c., law, v., & sun, z. (2019). self-regulation as a function of perceived leadership and cohesion in small group online collaborative learning. british journal of educational technology, 50(1), 456–468. https://doi.org/10.1111/bjet.12594 yeager, d. s., henderson, m., paunesku, d., walton, g., spitzer, b., d’mello, s., & duckworth, a. l. (2014). boring but important: a self-transcendent purpose for learning fosters academic self-regulation. 
journal of https://doi.org/10.4324/9781315697048 https://doi.org/10.1016/j.learninstruc.2015.12.003 https://doi.org/10.1080/00461520.2014.1002924 https://doi.org/10.1016/j.tate.2010.11.006 https://doi.org/10.3102/0002831214532515 https://doi.org/10.1080/00461520.2018.1442220 https://doi.org/10.1016/j.cedpsych.2020.101928 https://doi.org/10.1177/0022487102053001003 https://doi.org/10.1016/j.jsp.2011.04.001 https://doi.org/10.1006/ceps.1999.1015 https://doi.org/10.1007/978-1-4614-2018-7 https://doi.org/10.1037/a0032792 https://doi.org/10.1111/bjet.12594 38 | flr personality and social psychology, 107(4), 559–580. https://doi.org/10.1037/a0037637 yin, r. k. (2014). case study research design and methods (5th ed.). sage. zimmerman, b. j. (2000). attaining self-regulation: a social cognitive perspective. in m. boekaerts, p. r. pintrich, & m. zeidner (eds.), handbook of self-regulation (pp. 13–39). academic press. http://doi.org/10.1016/b978-012109890-2/50031-7 zimmerman, b. j. (2002). becoming a self-regulated learner: an overview. theory into practice, 41(2), 64– 70. https://doi.org/10.1207/s15430421tip4102_2 zimmerman, b. j. (2008). investigating self-regulation and motivation: historical background, methodological developments, and future prospects. american educational research journal, 45(1), 166–183. https://doi.org/10.3102/000283120 zusho, a., & clayton, k. (2011). culturalizing achievement goal theory and research. educational psychologist, 46(4), 239–260. https://doi.org/10.1080/00461520.2011.614526 appendix co-designed cr-srl complex tasks. joseph’s classroom title: understanding animal and human adaptations to the land the complex task co-designed for students in joseph’s class was divided into three major interconnected sections: (1) animal adaptations; (2) first nations’ adaptations to the land; and (3) my adaptation to school. 
section 1: animal adaptation
the first section required the students to research the senses and adaptations of any insect of their choice from the “bug wars playlist” posted on the class website designed by the teacher for this task. they were instructed to: (i) produce a best copy of a scientific drawing after viewing “austin’s butterfly” (a video of models, critique and constructive feedback: https://www.youtube.com/watch?v=e_6pske3zfq); (ii) use the “book creator” app to create a multimedia book; and (iii) present and share their task online.
section 2: first nations’ adaptations to the land
building on what the students were learning in the first section, the second section focused on human adaptation, with attention to the first nations peoples. section two asked the students to each research one of the aboriginal peoples in canada (e.g., inuit, métis and first nations). this section also required the students to compare their findings with their own daily lives by responding to guiding questions, including: “what is the biggest difference? what is most surprising when i think of my life? if i was a first nation person my age, what would i enjoy the most?” next, the students were expected to gather in groups to record their thoughts and impressions of a field trip to the museum of anthropology in a podcast.
section 3: my adaptation to school
the third section asked the students to build on what they were learning about animal adaptations and first nations’ challenges and adaptation, to research their personal challenges in school, and to generate relevant strategies to support their own adaptations.
as part of the third section, the task ended by asking the students to gather in their small groups, discuss their common challenges and adaptation strategies, and present their ideas through a role play.
matthias’ classroom
title: understanding your personal and cultural identity
this task asked the students to reflect on and respond to specific questions about: (1) their relationships and cultural context, including how their culture shaped their identities and choices (e.g., by answering questions such as: “how do you choose your friends? do you base friendship on interests, age, cultural background, appearance, gender, religion, or other qualities?”); (2) personal values and choices, including how their values could be influenced by their cultures (e.g., by asking them to “list 5 things that are important to you/that you value in life, explain why each of them is important to you”); and (3) personal strengths and abilities (e.g., by answering questions such as: “what would you say are some of your challenges and weaknesses? how are you using your strengths in your family, school, and relationships?”). part of the task also required the students to create a collage of images and words that described them culturally. the last part of the task asked the students to meet in their small groups, share their similarities and differences, and present their findings to the class.

frontline learning research 5 (2014) 46-63 issn 2295-3159
corresponding author: elisabeth wegner, institute for educational science, university of freiburg, rempartstr. 11, d-79089 freiburg, germany. e-mail: elisabeth.wegner@ezw.uni-freiburg.de
http://dx.doi.org/10.14786/flr.v2i3.83
student teachers’ perception of dilemmatic demands and the relation to epistemological beliefs
elisabeth wegner a, nora anders a, matthias nückles a
a university of freiburg, germany
article received 24 january 2014 / revised 3 march 2014 / accepted 4 june 2014 / available online 14 june 2014
abstract
teaching is characterized by contradictory demands, resulting in teaching dilemmas. for example, to promote the continuous learning of students, teachers need to set up rules and enforce them, which in turn can undermine students’ intrinsic motivation. teachers have to become aware of these contradictions and need to understand that not all aspects of good teaching can be maximized at the same time. an adequate representation of the dilemmatic nature of problems of teaching is therefore crucial for judging different teaching situations. an adequate epistemological understanding is also needed. we assessed student teachers’ (n = 122) perceptions of demands in teaching in general and with regard to specific situations, as well as their epistemological beliefs. perception of demands in general influenced the judgment of specific situations, but there was also a situation-specific component.
epistemological beliefs were related to the perceptions of demands in general, especially in situations in which the dilemmatic content was highly visible. together, the findings suggest that epistemological beliefs shape the perception of demands in teaching in general, and that the perception of demands in general in turn influences perception in specific situations.
keywords: dilemmas in teaching; epistemological beliefs; teacher decision making; reflection
1. introduction
can teachers “force” students to be motivated? can they adapt instruction to students’ individual needs and treat them equally at the same time? a number of researchers have pointed out that several aspects of teaching are in conflict with each other (e.g. berlak & berlak, 1981; helsper, 2004; lampert, 1985). therefore, teachers need to continuously decide between equally desirable goals, even though deciding for one goal reduces the possibility of reaching another, because both options cannot be maximized at the same time. dilemmatic demands, as well as uncertainties and role conflicts, have also been linked to the high rate of teachers who retire early from their jobs (e.g. schwab & iwanicki, 1982). teacher candidates have been shown to have difficulties in dealing with these kinds of dilemmatic demands (e.g. harrington, 1995; levin, 2002; schoen, 2005). teachers and teacher candidates expect that more knowledge about pedagogy could solve dilemmatic problems (lampert, 1985; fenstermacher, 1994). therefore, the perception of demands in teaching should be related to beliefs about the nature of pedagogical knowledge or knowledge in general, that is, epistemological beliefs. there is also evidence that some kinds of dilemmas are more apparent than others (levin, 2002; wegner & nückles, 2011). therefore, the awareness of dilemmatic demands might be situation-specific.
the question of the role of epistemological beliefs in the perception of demands in teaching, and the question of the situation-specificity of that perception, have important consequences for the development of measures for fostering awareness of dilemmatic demands. therefore, we examined in our study (1) how student teachers perceive the demands in teaching in general and how they judge specific dilemmatic teaching situations, (2) how the general perception of demands relates to the judgment of specific situations, and (3) which role epistemological beliefs, in general and with regard to pedagogy, play in the perception of demands in teaching and the judgment of specific teaching situations. we will first outline what we mean by dilemmatic demands and characterize teaching as dealing with ill-structured problems. then we will summarize the (sparse) research on teacher candidates’ dealings with dilemmatic demands, and afterwards we will outline the relation of the perception of demands in teaching to epistemological beliefs. finally, we will present evidence from our study suggesting that the general perception of demands in teaching is related to epistemological beliefs and also influences the judgment of specific teaching situations, but that there is also a situation-specific component in the judgment of teaching situations.
1.1 dilemmas in teaching and their sources
dilemmas in teaching can be traced to multiple sources, such as insufficient resources, too many tasks, administrative hierarchies, and badly organized departments that can force teachers to choose between equally necessary actions (e.g. berlak & berlak, 1981; cuban, 1992; windschitl, 2002). other dilemmas stem from the multiple roles that teachers are assigned within the educational system (e.g. schwab & iwanicki, 1982). for example, educational institutions typically fulfill the function of both educating and assessing students at the same time.
because students' grades in school or university greatly impact their future lives, students will usually try to get the best possible grades. therefore, the double demand of educating and assessing can present teachers with the dilemma that they want students to indicate if they have problems, but because the teacher has the power to fail them, students might decide to conceal their problems from the teacher (helsper, 2004). resource dilemmas and role conflicts are a frequent, but not necessarily inherent aspect of teaching, because they might be overcome by a different organizational structure or a better allocation of resources. however, there are other dilemmatic demands that cannot be resolved since they are part of the very nature of teaching. these genuine teaching dilemmas are located "in the idea of teaching, constituting contradictions or contradicting demands of ideals that are equally relevant and can equally claim validity" (helsper, 2004, p. 61, translated by author). the following teaching dilemmas are relevant in almost all educational settings:

the dilemma of self-regulation: how much should a teacher guide students to foster learning (bräu, 2008; labaree, 2000)? teachers need to guide students in learning and provide structure and feedback in order to facilitate learning. at the same time, these supporting actions reduce opportunities for students to learn in a self-regulated way, to develop their own approaches to learning, and to learn to give feedback to themselves (e.g. windschitl, 2002). also, too much structure leads to pressure, which easily reduces intrinsic motivation (deci & ryan, 1985; labaree, 2000). the dilemmas of self-regulation have been discussed with regard to many different kinds of learning environments, such as computer aided learning (koedinger & aleven, 2007) or collaborative settings (dann, 2002).

the dilemma of didactic structure: how should teachers arrange the learning contents?
should they arrange the contents according to the substantive structure of the subject, that is, the key principles, theories and explanatory frameworks of the discipline (schwab, 1964), or should they arrange the material according to problems and situations (e.g. geddis & wood, 1997)? while the systematic approach facilitates the understanding of the subject, there is the risk of "inert knowledge", which is not available to students when they have to solve complex problems as encountered in real-life settings (renkl, mandl, & gruber, 1996). on the other hand, the problem-based approach facilitates the transfer of knowledge to real-life situations, because the knowledge is acquired in a way that corresponds to situations in which it could potentially be used. nevertheless, arranging learning contents in a problem-based fashion makes it potentially more difficult for students to grasp the substantive structure of the subject (albanese & mitchell, 1993).

assessment dilemmas: which reference standard should assessment follow? linking assessment to individual growth fosters intrinsic motivation and values the individual's progress, but on the other hand, it would be unfair if students did not receive the same grade for the same output, thus creating a dilemma between criterion-based norm and individual-based norm (hager, gonczi, & athanasou, 1994; pearson, destefano, & garcia, 1998). another dilemma in assessment is the interdependence of validity and reliability (brookhart, 1994): reliable measurement of achievement needs clear criteria. this often leads to tests that ask students to reproduce knowledge rather than to demonstrate their ability to apply it (e.g., multiple choice questions). assessments of learning outcomes that allow for higher validity, such as essays or scientific writing, usually have a lower reliability because they are less standardized and assessment is more prone to multiple biases.
heterogeneity dilemma: how should teachers deal with the heterogeneity regarding students' prior knowledge, interests and needs? optimal teaching calls for respecting the individual and his or her needs, but at the same time teachers need to treat all students equally (e.g. ball, 1993; brodie, 2010; lampert, 1985; osborne, 1997).

the dilemma of professional relationship: how closely or distanced should teachers relate to their students? teachers share with other professions the challenge that they have to maintain a professional relationship, that is, they have to build a relationship without emotional involvement. they need to be neutral and need authority, but at the same time they need to create a positive climate and relationship. this creates a tension between proximity and distance (labaree, 2000).

often several teaching dilemmas and structural aspects interact in creating a dilemmatic situation for a teacher. also, sometimes several teachers are involved in a dilemma and have to face the consequences of the decision, for example in assessment dilemmas. some decisions are dilemmatic only for specific situations (for example, didactic structure of one lesson), while others reach further (for example, arrangement of contents for a whole term). teaching dilemmas can be amplified by diverging expectations of students and teachers (barcelos, 2001), especially if neither learners nor teachers are aware of the dilemmatic nature of the demands in teaching.

1.2 teaching as an ill-structured problem

but what do teacher students need to learn in order to deal with dilemmatic demands? teaching can be viewed as an "ill-structured problem" (nespor, 1987, p. 324), that is, "a problem for which there are conflicting assumptions, evidence, and opinion which may lead to different solutions" (kitchener, 1983, p. 223). the first crucial step in dealing with this kind of problem is to come to an adequate representation of the problem space.
this means that before one can start solving the problem, one needs to determine whether the problem is solvable at all, which goals might be pursued, which strategies there are to deal with it, and by which criteria these strategies might be judged. the representation of the problem space is the frame for any further cognition, such as the actual determination of the goals and the actual selection of strategies (kitchener, 1983). the kind of representation a person develops of a problem is also influenced by their epistemological beliefs (i.e., beliefs about the nature of knowledge and knowing), because one's beliefs about the available knowledge for dealing with a problem also influence the perception of whether a problem can be solved at all. for example, a person who expects pedagogical knowledge to be stable and simple will be more likely to expect all problems in teaching to be solvable than a person who believes that pedagogical knowledge is imprecise and permanently changing. therefore, teachers need to develop an adequate representation of the problems of teaching, that is, develop awareness of the dilemmatic demands of teaching, in order to be able to act in dilemmatic teaching situations. also, they need adequate beliefs about the knowledge that is available to solve dilemmatic problems.

1.3 awareness of dilemmatic demands of teaching

even though there has been a substantial number of publications on the problem of teaching dilemmas (e.g. ball, 1993; berry, 2007; cuban, 1992; geddis & wood, 1997), there are few publications that look at teachers' awareness of the dilemmatic demands of teaching. lampert (1985) distinguishes four perspectives on dilemmatic demands. in the perspective of "opposing camps", there is little or no awareness of the dilemmatic aspects of teaching. there is one right answer, and teachers with deviant opinions have to be convinced that they are wrong.
the perspective of teachers besieged by expectations accepts dilemmatic demands, but teachers are described as helpless and troubled by these demands. the origin of the conflicts is mainly seen in the organizational structure of the educational system. therefore, dilemmas can be solved by changes in the system. however, this does not help with genuine teaching dilemmas. the perspective of teachers as technical production managers and cognitive information processors holds the idea that dilemmas are created by too little knowledge. therefore, researchers have to discover the rules of how to teach, and if teachers implement these rules correctly, all problems will be eliminated. a more refined version of this view accepts the complexity of teaching. in this approach, one has to specify under which conditions which teaching behavior is appropriate. if the resulting rules are implemented correctly, problems will disappear. this view seems to be especially attractive to pre-service teachers, political decision makers and the public (fenstermacher, 1994). such "technical rationality" has been criticized repeatedly by educational researchers, teacher educators and practitioners (e.g. calderhead, 1989; hatton & smith, 1995; schön, 1983). therefore, lampert puts forward the view of the teacher as dilemma manager. in this perspective, teachers realize that there are dilemmas that cannot be resolved, but only managed by reflecting on different options and weighing arguments against each other.

up to now, only a few empirical studies have been conducted on how teachers or teacher candidates conceive of such dilemmas in general, and how they judge different teaching situations. the lack of research might also be due to the fact that most studies used qualitative methods such as interviews or writing tasks. such methodologies are, on the one hand, appropriate given the complexity of the research question and the multitude of different kinds of dilemmas.
on the other hand, qualitative methods typically limit the research to small samples. for example, schoen (2005) reports that in a sample of 10 pre-service teachers in field placements, all participants experienced dilemmas regarding students' discipline (e.g. "how can a teacher keep control in the classroom without being oppressive?"). more than half of them struggled with dilemmas between "teacher-directed" and "student-centered" instruction, as well as with dilemmas in dealing with heterogeneity among students. also quite frequent were dilemmas resulting from the need to prepare students for high stakes testing (such as college entrance tests), while at the same time wanting to promote complex understanding. pre-service teachers faced dilemmas in the development of a personal identity (such as developing a professional relationship to their students without too much emotional involvement) and felt torn between the demands of field supervisors and their teacher education institution, as well as their own goals. similarly, levin (2002) asked 12 pre-service elementary school teachers to reflect on dilemmas they encountered in their field placements. most dilemmas revolved around the relationship with their cooperating teachers and students, or classroom management concerns. none of the pre-service teachers connected their dilemmas with structural, moral, social or political issues. levin concludes that pre-service teachers were only "beginning to see the complexity and ambiguity of teachers' work" (p. 215). also, this indicates that some kinds of dilemmas are more visible than others. harrington (1995) examined how student teachers' ability to make reasoned decisions on exemplary dilemmas developed within one semester. participants were given dilemmatic, ill-structured cases and had to identify important issues of the case and the priority of the issues at stake, and discuss different perspectives in interpreting the case.
also, they had to propose solutions, analyze different consequences of the solution and add critique to their own solution and analysis. special emphasis was put on including different perspectives on the case. 65% of the participants had difficulties in identifying the ill-structured nature of the dilemmatic cases. they failed to make connections between the different issues they had identified and addressed the issues only in isolation. figures improved substantially during the course, thus indicating the need to support pre-service teachers' decision making skills.

1.4 perceptions of demands and epistemological beliefs

as pointed out above, perceptions of demands in teaching should be related to epistemological beliefs, because beliefs about the domain of pedagogy in general should influence perception of pedagogical problems. epistemological beliefs have been described in different ways. some researchers describe epistemological beliefs in the form of different dimensions, such as structure, certainty and sources of knowledge, as well as control and speed of knowledge acquisition (schommer, 1994; hofer & pintrich, 1997). trautwein and lüdtke (2007) found two dimensions of epistemological beliefs, relativism ("scientific knowledge can change") and dualism ("there is just one truth"). the two dimensions were not independent of each other, but correlated negatively (-.36). similarly, stahl and bromme (2007) described two negatively correlated dimensions, stability and texture. other researchers (king & kitchener, 1994; kuhn, 1991; kuhn, cheney, & weinstock, 2000), who have investigated the development of epistemological beliefs, have described a stage-like development of epistemological beliefs. individuals start out from absolutistic stages ("there is only one truth"), develop into relativistic stages ("there is no truth but only opinions"), and eventually reach the highest, evaluatistic stage ("knowledge is subjective but can be justified to various degrees").
krettenauer (2005) argues that the distinction between dimensional models and stage models is a result of different methodological approaches. interviews bring out the stage-like qualities of development of epistemological beliefs, whereas questionnaires focus on inter-individual differences in regard to certain dimensions of epistemological beliefs at a given point in time (see also hofer & sinatra, 2010). therefore, when assessing epistemological beliefs, one has to choose the methodological approach in consideration of the goal of the assessment. both the dimensional models and the stage models of epistemological beliefs assume that epistemological beliefs are the same in all domains. however, reviews have come to the conclusion that epistemological beliefs also have a strong domain-specific component (e.g. buehl, alexander & murphy, 2002; muis, bendixen & haerle, 2006). muis, bendixen and haerle (2006) state that epistemological beliefs are influenced by the socio-cultural context. academic knowledge is situated in a different socio-cultural context than everyday knowledge, and the academic contexts also differ from each other. therefore, individuals' epistemological beliefs can differ depending on whether they relate to everyday knowledge or to knowledge in academia, and they can also differ in relation to different domains. according to muis, bendixen and haerle, beliefs from different socio-cultural contexts influence each other reciprocally. within these contexts, beliefs develop stage-like from absolutistic via relativistic into evaluatistic stages.

against this background, how does the perception of demands relate to epistemological beliefs? at present, empirical evidence is scarce. wegner and nückles (2011) studied academics' awareness of dilemmatic demands in teaching in higher education. in an interview study they assessed the argumentative reasoning of 36 academics with regard to four dilemmatic scenarios.
the authors identified five different perspectives on the scenarios, which mirrored both kuhn's stages of epistemological development (kuhn, 1991) and the types of dealing with dilemmas, as lampert (1985) had described them. interviewees adopting an absolutistic perspective did not see any dilemma in the scenarios. interviewees adopting a technological perspective acknowledged the complexity of the problem, but made decisions based on heuristics and clear rules, thus ignoring the dilemma. similarly, academics with a relativistic perspective denied the principally dilemmatic nature of the scenario, because they argued that each individual teacher has his/her own approach to teaching. the academics with the most advanced perspectives recognized that there was a dilemma. under the general evaluatistic perspective, academics acknowledged complexity and were aware that an easy, general answer is not possible. academics adopting a dilemma management perspective additionally stated that dilemmatic demands have to be weighed against each other and that the problem can only be solved by making reflected decisions between equally desirable goals. wegner and nückles also found that interviewees had different perspectives in different scenarios, and that some scenarios were perceived as dilemmatic by more participants than others. this indicates that the perception of demands was specific to the situation and that dilemmas vary in their visibility. in her above-mentioned study of ten pre-service teachers, schoen (2005) further analyzed how they dealt with teaching dilemmas they experienced in their field placement. she also determined different levels in dealing with dilemmas based on king and kitchener's (1994) stages of development in reflective judgment, ranging from "knowledge as limited to concrete observations" to "knowledge as the outcome of reasonable inquiry".
reflective judgment level was linked to the perception of dilemmas and teachers' classroom activities regarding these dilemmas. generally, pre-service teachers showed medium levels of reflective judgment, thus indicating the need for improving the awareness of genuine teaching dilemmas. in contrast to the wegner and nückles study, each teacher was assigned to one level of reflective judgment, i.e. no situation-specific component was determined. both studies, wegner and nückles (2011) as well as schoen (2005), suggest that there are structural similarities in the development of the perceptions regarding the dilemmatic nature of demands in teaching and in the development of epistemological beliefs as described in the stage models by kuhn (1991) as well as by king and kitchener (1994), but that the perception of demands in teaching is a different construct. however, both studies leave important questions open for further research. neither study assessed epistemological beliefs separately. therefore, no conclusions about the kind of relation between epistemological beliefs and the perception of demands in teaching can be drawn from these studies. also, the studies differ in regard to whether they describe situation-specific or general aspects of the perception of demands.

1.5 relations between epistemological beliefs, general perception of demands in teaching, and the judgment of different teaching situations

from the review of literature it can be concluded that teachers need to be aware of the dilemmatic nature of teaching in order to make reflected decisions. the perception of demands in teaching shapes the way teachers deal with a concrete teaching dilemma and thereby their ability to make reflected decisions. also, the perception of demands in teaching is related to epistemological beliefs, especially to beliefs in the domain of pedagogy, but is nevertheless a different construct. additionally, the perception of demands might vary based on the situation.
for example, there might be a difference between dilemmas that are restricted to one teaching situation (e.g. choice of contents for one lesson), and dilemmas that are more visible because they have further consequences (e.g. choice of contents for a whole term). figure 1 summarizes the assumed relations between epistemological beliefs, the general perception of demands in teaching, and the judgment of different teaching situations based on the model of muis, bendixen and haerle (2006). for clarity, general and domain-specific beliefs are combined in the figure.

figure 1. relations between epistemological beliefs, general perception of demands in teaching, and the judgment of different teaching situations, adopted from muis, bendixen and haerle (2006), p. 31. the arrows a, b and c denote different hypotheses (see section 2, scope of the study).

2. scope of the study

in our study, we aimed to examine (1) the perception of demands in teaching in general, (2) the relation of the general perception of demands in teaching to the judgment of specific teaching situations, and (3) the relation of epistemological beliefs to the perception of demands in general as well as to the judgment of specific teaching situations. based on muis, bendixen and haerle (2006), we assume that epistemological beliefs influence perceptions of demands in general. therefore we expect medium correlations between general perception of demands in teaching and general epistemological beliefs, and slightly higher correlations to epistemological beliefs in the domain of pedagogy (hypothesis a; see arrow a in fig. 1). also, perception of demands in general influences the judgment of specific situations, but there is also an influence of the situational context. specifically, we expected differences in judgment of situations with high visibility and with weak visibility of the dilemma.
due to the dependence on the situational context, we expected medium correlations of the judgment of different teaching situations with the perception of demands in teaching in general (hypothesis b). finally, we expected the correlation between general epistemological beliefs and situation-specific measures of perception of demands to be only low, because of the strong dependence on the context. again, we assumed the correlation to epistemological beliefs in the domain of pedagogy to be slightly higher than to general beliefs (hypothesis c).

3. method

3.1 sample

one hundred twenty-two teacher students preparing for teaching in college-track high schools ("gymnasium") took part in the study. all of them filled in the questionnaires in a paper-and-pencil version at the end of a lecture on pedagogy. participants were 22.2 years old (sd = 3.3) on average; 59% were female. half of the participants (50.2%) had already had teaching experience in a field placement lasting at least 3 months.

3.2 material

3.2.1 general perceptions about demands in teaching

for the development of the questionnaire on demands in teaching, in a first step, a broad range of statements capturing different beliefs about the general nature of demands in teaching were collected based on the literature, mirroring the different perspectives on demands as outlined by lampert (1985), wegner and nückles (2011) and schoen (2005; for examples see table 2). items were piloted with a small number of teacher students, until finally 30 items were included in the questionnaire. participants were asked to rate each statement on a 6-point scale ("i don't agree at all – i mostly don't agree – i rather don't agree – i rather agree – i mostly agree – i completely agree").
3.2.2 judgment of different teaching situations

based on krettenauer (2005), we developed a format of assessment in which for each item two positions were described that were related to a dilemmatic decision in teaching (e.g. teacher a says: "i rigidly check homework because students otherwise don't do their assignments." teacher b says: "i usually don't check homework. students need to learn that they are responsible for their own learning."). to make sure that participants actively thought about the statements, they were asked to indicate which statement reflected their opinion most. afterwards they were asked to rate four different judgments on a 6-point scale. these judgments were developed according to kuhn's (1994) stages of epistemological development, lampert's (1985) differentiation between different perspectives on dilemmas and wegner and nückles' (2011) findings (table 1). a complete sample item is given in figure 2. the final version of the questionnaire contained eight different scenarios relating to different dilemmatic decisions:
• two decisions related to the dilemma of self-regulation (opposing statements about regulation within cooperative learning tasks in the classroom, opposing statements about monitoring self-regulated learning tasks such as homework in general)
• one decision related to the heterogeneity dilemma (opposing statements in regard to the choice of tasks for a heterogeneous group)
• two decisions related to the dilemma of didactic structure (problem-centered vs. content-centered approaches in a chemistry class, opposing approaches to the choice of contents in history classes)
• two decisions related to assessment dilemmas (comparison of two students according to individual vs.
criterion-based norm; opposing statements about the adaptation of grading to students' individual situations)
• one decision related to the dilemma of professional relationship (opposing statements about contact with students outside school)

we varied the visibility of the dilemmas by varying whether the decision had consequences only for one specific situation (that is, choice of tasks for a group, the regulation within cooperative learning tasks in the classroom, problem-centered vs. content-centered approaches, contact with students outside school), or whether the decision had further consequences for future situations or for other people as well (e.g. both of the assessment dilemmas, choices of contents for history classes, control over homework).

table 1. selection of judgments for the scenarios
kuhn (1994): epistemological beliefs | lampert (1985): dealing with | wegner & nückles (2011) | statement
absolutistic stage | opposing camps | absolutistic perspective | "it is absolutely clear what is right"
- | teachers as technical production managers | technological perspective | "there should be clear rules for what to do in this situation"
relativistic stage | - | relativistic perspective | "everyone thinks something else. you have to develop your own style"
evaluatistic stage | dilemma manager | evaluatistic perspective | "both teachers have good reasons. one needs to weigh the options carefully"

figure 2. sample item from the questionnaire on judgment of teaching situations

3.2.3 epistemological beliefs

for the assessment of epistemological beliefs both in general and in regard to the domain of pedagogy, we chose questionnaires instead of interviews because we wanted to describe inter-individual differences in beliefs, and not individual belief structures. general epistemological beliefs were assessed by a german questionnaire on epistemological beliefs, containing the two dimensions "dualism" (sample item: "if two scientists have a different opinion on a matter, one of them has to be wrong.") and "relativism" (sample item: "scientific insights that seem true today can turn out to be wrong", trautwein & lüdtke, 2007). domain-specific epistemological beliefs in the area of pedagogical knowledge were assessed by using the questionnaire on connotative aspects of epistemological beliefs (caeb, stahl & bromme, 2007). the caeb aims at measuring connotative aspects of beliefs that are difficult to express. participants have to rate pairs of adjectives that represent a semantic differential (such as "strong – weak") on a 7-point rating scale. the caeb comprises two dimensions that are similar to the scales of trautwein and lüdtke (2007). the dimension of "texture" is related to the factor "dualism" and contains items that describe the accuracy and structure of knowledge in a given domain (e.g. "knowledge in pedagogy is… precise – imprecise", "structured – unstructured"). the dimension of "stability" is related to the factor "relativism" and describes the stability and dynamics of knowledge (e.g. "knowledge in pedagogy is… stable – unstable", "dynamic – static").

4. results

4.1. general perception of demands in teaching

at first, we analyzed the general perception of demands in teaching. for this purpose, we first determined the factorial structure of the construct of general perception of demands in teaching. we performed an exploratory factor analysis (principal component analysis, pca), because we did not expect a certain number of factors due to the complexity of the construct. findings on epistemological beliefs suggest that different factors of the perception of demands as a form of epistemic thinking are correlated with each other (e.g. stahl & bromme, 1997; krettenauer, 2005, see above). therefore we used oblique rotation (promax). neither the scree plot nor the eigenvalue criterion yielded a clear picture of the number of factors.
therefore, we ran factor analyses with 3, 4 and 5 factors. the three-factor solution yielded the best result, explaining altogether 37.3% of the variance. we labeled the factors "simple demands", "subjective demands", and "complex demands" (see table 2). items which had loadings < .3 were excluded. the scale of simple demands had the lowest mean values, with the complex demands scale having the highest mean values. this shows a generally high awareness of the complexity of demands in teaching. internal consistency as measured by cronbach's α was good. also, the three factors were inter-correlated. the complex demands factor was correlated negatively with the factors subjective demands and simple demands. simple and subjective demands were correlated positively (see table 3).

table 2. characteristics of the scales on perception of demands
scale | highest loading item (factor loading) | cronbach's α | m (sd) | number of items
simple demands | "it is clear to teachers how they have to fulfill their task" (.692) | .716 | 2.51 (0.56) | 9
subjective demands | "teachers with a good personality don't have to think about their teaching" (.755) | .710 | 2.45 (1.04) | 8
complex demands | "when planning a lesson, there are a lot of aspects that have to be considered" (.711) | .700 | 4.65 (0.53) | 6

table 3. inter-correlation of the three factors representing perceptions of teaching demands in general
scale | 1 | 2 | 3
1 simple demands | 1 | .40** | -.48**
2 subjective demands | | 1 | -.20*
3 complex demands | | | 1
note: ** p < .01, * p < .05

4.2 medium correlation between perception of demands in teaching and epistemological beliefs (hypothesis a)

to determine the relation between the general perception of demands and epistemological beliefs, we calculated the two factors of the caeb according to stahl and bromme (2007), as well as the two scales on general epistemological beliefs ("relativism" and "dualism") according to trautwein and lüdtke (2007).
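the scale statistics used in this section (cronbach's α as in table 2, and correlations between scale scores as in tables 3 and 4) can be illustrated with a minimal sketch. it uses only the python standard library; the function names and the small data set are hypothetical illustrations, not the study's data or analysis scripts, and scale scores are assumed here to be item means.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items table (list of rows)."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def scale_scores(rows):
    """Scale score per respondent, assumed here to be the item mean."""
    return [mean(row) for row in rows]


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical ratings on a 6-point scale: 5 respondents x 3 items per scale.
simple_items = [[2, 3, 2], [1, 2, 1], [4, 4, 3], [3, 3, 3], [2, 2, 1]]
complex_items = [[5, 5, 4], [6, 5, 6], [3, 4, 3], [4, 4, 4], [5, 5, 5]]

print("alpha (simple):", round(cronbach_alpha(simple_items), 3))
print("r (simple, complex):",
      round(pearson_r(scale_scores(simple_items),
                      scale_scores(complex_items)), 3))
```

with real questionnaire data, each row would hold one participant's ratings on the items of one scale; significance tests for the correlations are not part of this sketch.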
epistemological beliefs in the domain of pedagogy were correlated with perceptions of demands in teaching (table 4). the factor of "texture" correlated negatively with perception of demands as simple, and positively with the perception of demands as complex. this means that persons who perceived knowledge in pedagogy as rather well structured were also likely to perceive demands as simple. the factor of "stability" was positively correlated with perceptions of the demands as simple, and negatively with demands as complex. general epistemological beliefs that knowledge is stable and simple were also correlated with perception of teaching as simple. all correlations were significant, but only at a small to medium degree. taken together, these results support our hypothesis that epistemological beliefs are related to the perception of demands in teaching, but that the perception of demands in teaching is a separate construct. nevertheless, there was no difference between domain-specific and general epistemological beliefs.

4.3 general perception of demands and situation specificity of judgments of teaching demands (hypothesis b)

4.3.1 situation specific aspects

next we analyzed how the perception of demands in general was related to judgment of specific teaching situations. for this purpose, we calculated in a first step a general measure across all kinds of situations. because participants had to rate the same four strategies on a 6-point scale for each scenario, we calculated means for each strategy across the eight scenarios. the strategy of reflective decision making was rated highest, whereas simple decision making received the lowest values (see table 5), indicating that students were in general aware of the dilemmatic content of the decisions. the scores were correlated systematically: simple decisions correlated positively with clear rules and negatively with own style and reflective decision making.
own style also correlated negatively with clear rules and positively with reflective decision making (see table 6). we could not find any differences with regard to demographic measures or field experience. internal consistency over the scenarios was low to medium, ranging from .449 (reflective decision making) to .691 (clear rules). this indicates that there is some consistency across the situations, but also a situation-specific component in the judgment of teaching situations.

table 4. correlation between epistemological beliefs and general perception of demands

| | texture (domain-specific: pedagogy) | stability (domain-specific: pedagogy) | relativism (general) | dualism (general) |
|---|---|---|---|---|
| m (sd) | 4.29 (.75) | 3.80 (.43) | 1.89 (.47) | 1.92 (.48) |
| simple demands | -.24** | .29** | .27** | .30** |
| subjective demands | -.04 | -.00 | .13 | .27* |
| complex demands | .29** | -.32** | -.05 | -.18 |

table 5. characteristics of the scales on judgment of different teaching situations

| scale | prototypic statement | m (sd) | min | max | cronbach's ɑ |
|---|---|---|---|---|---|
| simple decisions | "it is absolutely clear what is right" | 2.67 (.68) | 1.00 | 4.71 | .618 |
| clear rules | "there should be clear rules what to do in this situation." | 3.54 (.81) | 1.29 | 5.71 | .691 |
| own style | "that is just a matter of opinion. you've got to develop your own style." | 3.83 (.65) | 2.00 | 5.86 | .635 |
| reflective decision making | "one needs to weigh the options carefully." | 4.56 (.60) | 2.71 | 6.00 | .449 |

table 6. intercorrelation between the four scales

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 simple decisions | 1 | .33** | -.25** | -.42** |
| 2 clear rules | | 1 | -.26** | -.05 |
| 3 own style | | | 1 | .45** |
| 4 reflective decision making | | | | 1 |

note: ** p < .01, * p < .05

next we compared situations with high and low visibility of the dilemma. judgments differed between scenarios in which the decision had consequences for one instance only (weak visibility of the dilemma), and scenarios in which the decision had further reaching consequences as well as consequences for other teachers (high visibility of the dilemma, see fig. 3).
we analyzed differences between both kinds of scenarios by four one-factorial anovas with repeated measurement, with weak vs. high visibility of the dilemma as within-subject factor and the strategy under consideration as the dependent measure. dilemmas with weak visibility were rated to a higher degree as simple decisions than those with high visibility, f(1, 122) = 7.959, p = .01, partial η² = .062, but there were no differences between the two kinds of dilemmas with regard to the rating of reflective decision making, f(1, 122) = 1.997, p = .160, ns, partial η² = .016. however, in situations in which the dilemmatic content was highly visible, because rather large consequences or consequences for other teachers were to be expected, clear rules were rated higher than in dilemmas with weak visibility. for the development of an own style, the result was reversed: for the dilemmas with low visibility, the development of an own style was rated higher than for the dilemmas with high visibility (difference between the items for clear rules: f(1, 122) = 121.062, p < .001, partial η² = .50; for own style: f(1, 122) = 74.547, p < .001, partial η² = .38). this seems appropriate to the situations, because the highly visible dilemmas contained scenarios with consequences for others, which clear rules might help to minimize.

figure 3. mean ratings of teaching situations with regard to decisions with individual and with school-wide consequences.

4.3.2. relation of the general perception of demands to the judgment of specific situations

we analyzed how the general perception of demands related to the judgment of specific situations. we found systematic relations (see table 7). perceiving general demands in teaching as simple was associated with positive judgment of simple decisions in specific situations. general perception of demands as subjective was related mildly to simple decisions as well as to developing one's own style in teaching.
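as an aside on the effect sizes reported in section 4.3.1: for a single effect, partial η² can be recovered from the f statistic and its degrees of freedom as f · df_effect / (f · df_effect + df_error). the sketch below illustrates this under stated assumptions and is not the authors' analysis script: the function name `partial_eta_squared` is hypothetical, and reading the reported degrees of freedom as df_effect = 1 and df_error = 122 is an assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic (hypothetical helper)."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in section 4.3.1; the df reading (1, 122) is an assumption.
reported = {
    "simple decisions": 7.959,
    "reflective decision making": 1.997,
    "clear rules": 121.062,
    "own style": 74.547,
}
for strategy, f_value in reported.items():
    print(strategy, round(partial_eta_squared(f_value, 1, 122), 3))
```

under this reading, the recovered values agree with the reported effect sizes to within about .001; small third-decimal differences would be expected if the error degrees of freedom differ slightly from the assumed value.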
interestingly, perception of demands as complex was associated most strongly with a positive appreciation for the establishment of rules, and only mildly with the strategy of reflective decision making. this indicates that students might wish for a reduction in the complexity of situations. we checked whether the correlation patterns were different for situations with weak and with high visibility of the dilemma. in both types of situations, simple demands were correlated significantly with simple decisions, and complex demands with the establishment of clear rules. only for situations in which the dilemma was highly visible were the positive correlations of subjective demands with simple decisions and with the development of an own style, and the negative correlation with the establishment of clear rules, significant. from this pattern of results, we can conclude that general perceptions of demands do influence the judgment of teaching situations, but there also is a situation-specific component. the influence of general perceptions of demands seems to be somewhat stronger for situations in which consequences are to be expected for other teachers, that is, for situations in which the content is experienced as particularly dilemmatic.

taken together, (a) the medium internal consistency of the four scales on judgment of specific teaching situations, (b) the differences between dilemmas with high and with weak visibility, and (c) the medium correlation of general perception of demands with the judgment of the teaching situation can be seen as an indicator that there is both a personal and a situation-specific component of perception of demands, thus confirming hypothesis b.

table 7. correlations between perception of demands and judgment of teaching situations (n=122). weak = scenarios with weak visibility of the dilemma, high = scenarios with high visibility of the dilemma, all = all scenarios.
| | simple demands: weak | simple demands: all | simple demands: high | subjective demands: weak | subjective demands: all | subjective demands: high | complex demands: weak | complex demands: all | complex demands: high |
|---|---|---|---|---|---|---|---|---|---|
| simple decisions | .23* | .29** | .26** | .10 | .23** | .28** | -.02 | -.04 | -.01 |
| clear rules | .10 | -.03 | -.14 | -.00 | -.15 | -.26** | .24** | .36** | .37** |
| own style | -.03 | -.03 | -.03 | .17 | .23** | .22* | .14 | .07 | -.00 |
| reflective decision making | -.06 | -.15 | -.18* | -.07 | -.13 | -.14 | .19* | .15 | .07 |

note: ** p < .01, * p < .05

4.4 relation of judgment of teaching situations to epistemological beliefs (hypothesis c)

last, we checked the relation between epistemological beliefs and the judgment of teaching situations. we only found small or no correlations between judgment of specific situations and general epistemological beliefs or epistemological beliefs in the domain of pedagogy (table 8). as with the other scales, neither gender, nor subject of study, nor field experience as teacher had an impact on the epistemological beliefs. again, for the highly visible dilemmas the relationship between epistemological beliefs and judgments was more pronounced than for dilemmas with weak visibility. this indicates that epistemological beliefs have only a minor influence on the judgment of specific teaching situations.

table 8. means and sd for the epistemological beliefs. correlations between epistemological beliefs and perception of demands as well as strategies

| | texture (domain-specific: pedagogy) | stability (domain-specific: pedagogy) | relativism (general) | dualism (general) |
|---|---|---|---|---|
| m (sd) | 4.29 (.75) | 3.80 (.43) | 1.89 (.47) | 1.92 (.48) |
| simple decisions | -.19* | .07 | .15 | .19* |
| clear rules | -.11 | .02 | .10 | .04 |
| own style | .18 | -.16 | -.02 | .02 |
| reflective decision making | .15 | -.01 | .03 | -.02 |

note: ** p < .01, * p < .05

5. discussion

in our study, we examined how teacher students perceive demands in teaching in general and in specific situations, and how these perceptions relate to epistemological beliefs in general and in the domain of pedagogy.
epistemological beliefs correlated with perception of demands in teaching in general, but only mildly with judgment of specific teaching situations. general perception of demands in teaching influenced the judgment of specific situations, especially in situations in which the dilemma was highly visible. also, there was a situation-specific component in the judgment of situations, as indicated by the medium to low internal consistency of the judgments across all teaching situations. taken together, the results suggest that epistemological beliefs shape the general perception of demands, and that the general perception of demands shapes the way different teaching situations are judged. the influence is particularly strong in situations in which the dilemma is highly visible. therefore, it is important to help teacher students to understand the dilemmatic content of specific situations as well as to develop a differentiated perspective on teaching in general. however, these results are based on correlations and cannot be interpreted as causal relations. longitudinal designs are needed to further support our hypothesis.

generally, teacher students showed a high awareness of dilemmatic demands. with regard to specific situations, reflective decision making was rated as the best way to deal with the situation, whereas simple decisions received the lowest rating. links between perceptions of demands in general and the judgment of teaching situations yielded an interesting pattern. for both kinds of scenarios (weakly vs. highly visible dilemmas), general perceptions of the demands as simple were related to judgment of dilemmatic situations as simple, but a complex representation of the demands in teaching was also associated with a positive judgment of the establishment of clear rules. this was interesting, because rules can help to reduce the complexity of teaching (e.g. koedinger, booth & klahr, 2013).
we conclude that especially those teacher students who experience teaching as a very complex task wish to be supported in difficult teaching situations by clear directions for dealing with the situation. however, this can be problematic, because rules can prevent teachers from acting deliberately and reflectively in such situations (lampert, 1985). therefore, teacher students should be discouraged from thinking about rules from a rigid, technological perspective, and instead be supported in thinking of them as guidelines or heuristics.

generally, the ability to deal with contradicting demands is one of the core competences of teachers (e.g. berlak & berlak, 1981; labaree, 2000) that has received little attention from empirical researchers. the present study gives first insights into student teachers' perceptions of demands in teaching. because the results are only based on self-reports in questionnaires, we cannot make any inferences about actual decision making in dilemmatic situations. however, the study is a first step in the exploration of teachers' ability to deal with these kinds of demands. research on teachers' dealing with contradictory demands should therefore be put on the research agenda. future research should be especially directed to the question of how this ability can be fostered and which kinds of interventions are most helpful in making teacher students aware that teaching is not merely a question of heuristics and simple answers, but that the challenge in teaching is to manage dilemmas by reflected decision making (lampert, 1985; nückles & wegner, 2013).

keypoints

teachers have to realize that demands in teaching are contradictory in order to be able to make reflected decisions.

perception of demands in teaching has a situation-specific as well as a general component, and is related to epistemological beliefs, especially in situations with strong dilemmatic content.
perception of demands as complex promotes reflective judgment of teaching situations, but also the wish for implementing clear rules for everyone.

references

albanese, m. a., & mitchell, s. (1993). problem-based learning: a review of literature on its outcomes and implementation issues. academic medicine, 68(1), 52-81. doi: 10.1097/00001888-199301000-00012
ball, d. l. (1993). with an eye on the mathematical horizon: dilemmas of teaching elementary school mathematics. the elementary school journal, 93(4), 373-397.
barcelos, a. m. f. (2001). the interaction between students' beliefs and teachers' beliefs and dilemmas. in b. johnston & s. irujo (eds.), research and practice in language teacher education. selected papers from the first international conference on language teacher education (pp. 69-86). minneapolis: university of minnesota.
berlak, a., & berlak, h. (1981). dilemmas of schooling: teaching and social change. london: routledge.
berry, a. (2007). reconceptualizing teacher educator knowledge as tensions: exploring the tension between valuing and reconstructing experience. studying teacher education, 3(2), 117-134. doi: 10.1080/17425960701656510
bräu, k. (2008). die betreuung selbstständigen lernens—vom umgang mit antinomien und dilemmata. [supporting self-regulated learning – of dealing with antinomies and dilemmas]. in g. breidenstein & f. schütze (eds.), paradoxien in der reform der schule. ergebnisse qualitativer sozialforschung. [paradoxes of the school reform. results of qualitative research.] (pp. 179-199). wiesbaden: vs verlag für sozialwissenschaften.
brodie, k. (2010). pressing dilemmas: meaning-making and justification in mathematics teaching. journal of curriculum studies, 42(1), 27-50. doi: 10.1080/00220270903149873
brookhart, s. m. (1994). teachers' grading: practice and theory. applied measurement in education, 7(4), 279-301. doi: 10.1207/s15324818ame0704_2
buehl, m. m., alexander, p. a., & murphy, p. k. (2002).
beliefs about schooled knowledge: domain specific or domain general? contemporary educational psychology, 27(3), 415-449. doi: 10.1006/ceps.2001.1103
calderhead, j. (1989). reflective teaching and teacher education. teaching and teacher education, 5(1), 43-51. doi: 10.1016/0742-051x(89)90018-8
cuban, l. (1992). managing dilemmas while building professional communities. educational researcher, 21(1), 4-11. doi: 10.3102/0013189x021001004
dann, h.-d. (2008). lehrerkognitionen und handlungsentscheidungen. [teachers' cognitions and decision making]. in m. k. w. schweer (ed.), lehrer-schüler-interaktion. inhaltsfelder, forschungsperspektiven und methodische zugänge [teacher-student interaction. content areas, perspectives for research and methodology] (pp. 177-207). opladen: leske und budrich.
deci, e. l., & ryan, r. m. (1985). intrinsic motivation and self-determination in human behavior. new york: plenum press.
fenstermacher, g. (1994). the knower and the known: the nature of knowledge in research on teaching. review of research in education, 20, 3-56.
geddis, a. n., & wood, e. (1997). transforming subject matter and managing dilemmas: a case study in teacher education. teaching and teacher education, 13(6), 611-626. doi: 10.1016/s0742-051x(97)80004-2
hager, p., gonczi, a., & athanasou, j. (1994). general issues about assessment of competence. assessment & evaluation in higher education, 19(1), 3-16. doi: 10.1080/0260293940190101
harrington, h. l. (1995). fostering reasoned decisions: case-based pedagogy and the professional development of teachers. teaching and teacher education, 11(3), 203-214. doi: 10.1016/0742-051x(94)00027-4
hatton, n., & smith, d. (1995). reflection in teacher education: towards definition and implementation. teaching and teacher education, 11(1), 33-49. doi: 10.1016/0742-051x(94)00012-u
helsper, w. (2004). antinomien, widersprüche, paradoxien: lehrerarbeit – ein unmögliches geschäft?
[antinomies, contradiction, paradoxes: working as a teacher – an impossible job?] in b. koch-priewe, f.-u. kolbe, & j. wildt (eds.), grundlagenforschung und mikrodidaktische reformansätze zur lehrerbildung [fundamental research and microdidactic reform ideas in teacher education] (pp. 49-98). bad heilbrunn/obb.: klinkhardt.
hofer, b. k., & pintrich, p. r. (1997). the development of epistemological theories: beliefs about knowledge and knowing and their relation to learning. review of educational research, 67(1), 88-140. doi: 10.3102/00346543067001088
hofer, b. k., & sinatra, g. m. (2010). epistemology, metacognition, and self-regulation: musings on an emerging field. metacognition and learning, 5(1), 113-120. doi: 10.1007/s11409-009-9051-7
king, p. m., & kitchener, k. s. (1994). developing reflective judgment: understanding and promoting intellectual growth and critical thinking in adolescents and adults. san francisco: jossey-bass.
kitchener, k. s. (1983). cognition, metacognition and epistemic cognition: a three-level model of cognitive processing. human development, 26(4), 222–232. doi: 10.1159/000272885
koedinger, k. r., & aleven, v. (2007). exploring the assistance dilemma in experiments with cognitive tutors. educational psychology review, 19(3), 239-264. doi: 10.1007/s10648-007-9049-0
koedinger, k. r., booth, j. l., & klahr, d. (2013). instructional complexity and the science to constrain it. science, 342(6161), 935-937. doi: 10.1126/science.1238056
krettenauer, t. (2005). die erfassung des entwicklungsniveaus epistemologischer überzeugungen und das problem der übertragbarkeit von interviewverfahren in standardisierte fragebogenmethoden. [measuring the developmental level of epistemological beliefs and the problem of transferring interview procedures to standardized questionnaire methods]. zeitschrift für entwicklungspsychologie und pädagogische psychologie, 37(2), 69–79. doi: 10.1026/0049-8637.37.2.69
kuhn, d. (1991).
the skill of argument. cambridge: cambridge university press.
kuhn, d., cheney, r., & weinstock, m. (2000). the development of epistemological understanding. cognitive development, 15(3), 309–328. doi: 10.1016/s0885-2014(00)00030-7
labaree, d. f. (2000). on the nature of teaching and teacher education: difficult practices that look easy. journal of teacher education, 51(3), 228-233. doi: 10.1177/0022487100051003011
lampert, m. (1985). how do teachers manage to teach? perspectives on problems in practice. harvard educational review, 55(2), 178-195.
levin, b. b. (2002). dilemma-based cases written by preservice elementary teacher candidates: an analysis of process and content. teaching education, 13(2), 203-218. doi: 10.1080/1047621022000007585
muis, k. r., bendixen, l. d., & haerle, f. c. (2006). domain-generality and domain-specificity in personal epistemology research: philosophical and empirical reflections in the development of a theoretical framework. educational psychology review, 18(1), 3-54. doi: 10.1007/s10648-006-9003-6
nespor, j. (1987). the role of beliefs in the practice of teaching. journal of curriculum studies, 19(4), 317–328. doi: 10.1080/0022027870190403
osborne, m. d. (1997). balancing individual and the group: a dilemma for the constructivist teacher. journal of curriculum studies, 29(2), 183-196. doi: 10.1080/002202797184125
pearson, p., destefano, l., & garcia, g. (1998). ten dilemmas of performance assessment. in c. harrison & t. salinger (eds.), assessing reading: theory and practice (pp. 21-49). london: routledge.
renkl, a., mandl, h., & gruber, h. (1996). inert knowledge: analyses and remedies. educational psychologist, 31(2), 115-121. doi: 10.1207/s15326985ep3102_3
schoen, l. (2005). learning to make sense of the dilemmas of teaching practice: an exploration of preservice teachers. ph.d. dissertation. boston: boston college.
schommer, m. (1994).
synthesizing epistemological belief research: tentative understandings and provocative confusions. educational psychology review, 6(4), 293–319. doi: 10.1007/bf02213418
schön, d. (1983). the reflective practitioner: how professionals think in action. basic books.
schwab, j. j. (1964). the structure of disciplines: meanings and significance. in g. w. ford & l. pugno (eds.), the structure of knowledge and the curriculum. chicago: rand mcnally.
schwab, r. l., & iwanicki, e. f. (1982). perceived role conflict, role ambiguity, and teacher burnout. educational administration quarterly, 18(1), 60-74. doi: 10.1177/0013161x82018001005
stahl, e., & bromme, r. (2007). the caeb: an instrument for measuring connotative aspects of epistemological beliefs. learning and instruction, 17(6), 773–785. doi: 10.1016/j.learninstruc.2007.09.016
trautwein, u., & lüdtke, o. (2007). epistemological beliefs, school achievement, and college major: a large-scale longitudinal study on the impact of certainty beliefs. contemporary educational psychology, 32(3), 348-366. doi: 10.1016/j.cedpsych.2005.11.003
wegner, e., & nückles, m. (2011). die wirkung hochschuldidaktischer weiterbildung auf den umgang mit widersprüchlichen handlungsanforderungen. [impact of professional development on dealing with contradictory demands in teaching]. zeitschrift für hochschulentwicklung, 6(3), 172-188.
windschitl, m. (2002). framing constructivism in practice as the negotiation of dilemmas: an analysis of the conceptual, pedagogical, cultural, and political challenges facing teachers. review of educational research, 72(2), 131-175. doi: 10.3102/00346543072002131

frontline learning research vol. 3 no. 3 special issue (2015) 23-38 issn 2295-3159 corresponding author: dr.
janice malcolm, reader in higher education, centre for the study of higher education, uelt building, university of kent, canterbury ct2 7nq, uk, (+44) 01227 824579, cshe@kent.ac.uk doi: http://dx.doi.org/10.14786/flr.v3i3.191

the curriculum question in doctoral education

gabriela gonzález-ocampo a, margaret kiley b, amélia lopes c, janice malcolm d, isabel menezes e, ricardo morais f, viivi virtanen g

a ramon llull university, spain
b the australian national university, and university of newcastle, australia
c university of porto, portugal
d university of kent, united kingdom
e school of economics and management, universidade católica portuguesa (porto), portugal
f university of helsinki, finland

article received 18 july 2015 / revised 18 july 2015 / accepted 23 july 2015 / available online 25 september 2015

abstract

the landscape of doctoral education has changed immensely during the last decades. different transnational policies, different publics, different purposes and different academic careers all contribute to the need for a new understanding of this underresearched field. our focus is on explicit curriculum analysis to undertake intentional and meaningful change, especially in terms of the processes and outcomes of doctoral education. we draw on research on doctoral education, as well as the emerging literature on early career researchers (ecrs) and on professional learning, and consider how the concept of curriculum can help us think differently about doctoral education, particularly in relation to processes and outcomes. finally, we suggest a research agenda for developing the curricula of doctoral education.

keywords: doctoral education; curriculum; processes; outcomes; professional learning

gonzález-ocampo et al. 24 | f l r

1.
introduction in recent years there has been a burgeoning of research interest into the experiences of phd/doctoral students and supervisors, although much of this work is limited to specific models and contexts of doctoral education (e.g., gardner, 2007; golde, 2005; ives & rowley, 2005; mcalpine, paulson, gonsalves, & jazvac-martek, 2012; pyhältö, vekkaila, & keskinen, 2012; scaffidi & berman, 2011; vekkaila, pyhältö, & lonka, 2013). there has been a clear evolution from the individual focus of the “master-apprenticeship” model to more structured programmes, an increasing number of candidates, growing internationalisation of the academy, and the emergence of new types of phds which reconfigure the relationship between research and practice (brew & peseta, 2004; pearson, evans, & macauley, 2008; walker, golde, jones, bueschel, & hutchings, 2008). as the quality of research and supervision are increasingly recognised as decisive in the process of doing a phd (and the products that emerge from it), the question of “what a phd really is” is also under discussion, leading wellington (2013) and others to explore the possible meanings of “doctorateness”. this research field has struggled to keep pace with proliferating phd formats and diverging practices. doctoral contexts such as “practice as research”, professional doctorates, phds based entirely on publications, etc. are less well understood than more traditional formats, and are stimulating increasing research interest (e.g. the carnegie project on the education doctorate http://cpedinitiative.org/; kot & hendel, 2012; nelson, 2006). if we look at researcher education more broadly conceived, there has been very little work on the extended ‘adolescence’ of academic researchers, or on the experiences and trajectories of researchers in other professional fields beyond the academy; mcalpine, amundsen, and turner (2013) offer one of the few contributions in this area. 
this raises questions about how far doctoral education succeeds (or perhaps has ever succeeded) in providing appropriate professional preparation and enhancement to those for whom an academic career is a clear motivation. even more so than in undergraduate education, the doctoral student has commonly been seen as an apprentice member of a disciplinary community. the phd degree, once considered the pinnacle of academic achievement, is increasingly regarded as a kind of entry-level global academic passport offering junior scholars access to an insecure career (pearson et al., 2008). yet as we have seen, the proliferation of types of doctorate has been partly driven by the demands of careers outside academia. the development of professional doctorates, and the realisation that many, or even most, phd graduates will experience careers outside the academic labour market have given impetus and legitimacy to the inclusion of employability skills in doctoral education (baker & lattuca, 2010). these developments have been justified in terms of harmonisation and flexibility, and have borrowed heavily from skills models used in vocational education (see e.g., vitae); as yet we have little evidence of how effective they are at meeting their multiple (and often unclear) purposes. in practice, this skills-oriented approach to doctoral education tends to meet resistance from subscribers to a more purist view of the university, according to which academic freedom is not compatible with external standardisation initiatives (kiley, 2014). however these changes also raise important questions about academic judgments and assessment processes at the doctoral level, which are far from standardised and remain, for the most part, poorly understood. these profound changes are occurring in a context where we have scarcely begun to explore the nature of the formal and informal curriculum of doctoral education (eua, 2007). 
the introduction of the idea of curriculum in phd programmes necessitates urgent discussion among educators from different backgrounds and with different perspectives on doctoral education. we need a clearer understanding of how the curriculum of doctoral education works, how it can be developed to meet changing needs, and how its outcomes can be appropriately assessed. in this paper our goal is to explore whether an explicit curriculum approach can help us make sense of existing research and practices regarding the processes and outcomes of doctoral education. our starting point is that, whether we acknowledge it or not, the curriculum is inevitably there; and adopting an explicit curriculum approach will help us to disclose the tensions between the formal/informal, open/hidden, and standardised/pluralised dimensions of doctoral education brought to our attention by enders (2002). these dimensions are summarised in table 1 and contribute to a research agenda that allows us to develop more nuanced and useful understandings of the doctoral education curriculum. we blend the contributions of an ecological and socio-constructivist perspective (e.g. bronfenbrenner, 1979, 1986; lave, 1988; vygotsky, 1978) with curricular viewpoints grounded in the policy cycle of stephen ball (1994) that frame a vision of the curriculum as a contextual and social-cultural phenomenon entailing a continuous meaning-making process in which diverse interpretations struggle to emerge (lopes & lópez, 2010).

however, before addressing the above in the context of doctoral education we suggest that it might be helpful as background to outline a more standard approach to curriculum, the sort of approach we might see in texts related to coursework degrees at the tertiary level (e.g. kiley, 2014; print, 1987). while the starting point in curriculum is generally contested, one might begin with examining the aims for the course or program.
this is where one might ask a question such as: what is the teacher/faculty/university aiming to achieve with this course? engaging staff in answering this question often uncovers many of the implicit, as well as explicit, views held by participants. at the doctoral level, asking what might appear to be such a simple question is likely to highlight a wide and complex set of responses.

again, while curriculum development is rarely linear, for the sake of argument, the next question that can be asked is: what knowledge, skills and attitudes is it expected that the learner will be able to demonstrate following engagement in this program? this stage is often termed “learning outcomes” and at the doctoral level it is again contested, with comments ranging from employment skills through to higher level cognitive skills and an original contribution to knowledge.

a logical next step, having determined the potential learning outcomes, is the identification of the possible learning content and activities that are to be provided to allow the learner to engage in appropriate learning. in some cases this is referred to as the syllabus. again at the doctoral level, the notion of content for learning is wide and varied, often depending on country, discipline, and type of doctorate being undertaken.

linked to the content is the consideration of pedagogy. it is during this stage in curriculum development that questions are posed regarding the teaching approaches to be used. until recently pedagogy was a term that was rarely used in relation to doctoral education (boud & lee, 2009). rather, there was an assumption that the supervisor/mentor/adviser would work with the candidate in private and mysterious ways until the candidate had achieved a level of doctorateness (trafford & leshem, 2009). the concept of achieving doctorateness brings us to the next stage of curriculum development, that is, assessment.
in much of the curriculum literature there is discussion of the concept of the aligned curriculum, that is, where the assessment strategies closely align with the espoused learning outcomes (biggs, 2003). at the doctoral level there are various practices ranging from the inclusion of the results of coursework in the assessment through to examination of the written thesis only, or the inclusion of assessment of the candidate’s performance in an oral examination.

the final stage in this formalized model of curriculum development is evaluation, where the various stages, activities and outcomes are evaluated in an ongoing fashion.

following this formal and somewhat stylised discussion of curriculum we now discuss the concept of the curriculum in doctoral education in more sophisticated and complex ways, and then address the processes and outcomes of doctoral education. the analysis of research and practice suggests that there is a strong need for a research agenda that will help reconfigure the notion of curriculum in doctoral education.

2. the curriculum in doctoral education

as noted above, it is relatively unusual to speak of “curriculum” in relation to doctoral education. jones, in his review of 40 years of research on doctoral education (jones, 2013), does not use the term at all, though it is implicit in several of the themes he identifies, such as programme design, doctoral writing and research, and socialisation; this obliquity is echoed too in calma and davies’ (2015) review of the history of one key higher education journal. the fact that the curriculum in doctoral education is not explicitly discussed does not make it less significant. however, it may hinder our recognition of how the curriculum can generate and reproduce inequalities, and of the need for change and adaptation to the new challenges of doctoral education.
moreover, a focus on the curriculum must acknowledge the particularities of doctoral education, and in particular the possible incompatibility between current tendencies towards regulation and structure, and the flexibility and plurality inherent in doctoral education (enders, 2004; pearson et al., 2010). this is only possible if we recognise the curriculum as the unacknowledged “elephant in the room”. in theories of formal education, the curriculum is often understood as a structured selection of propositional knowledge and/or skills which learners need to acquire in order to meet the aims and objectives of the learning programme (e.g. eraut, 2000; print, 1987). with aims and outcomes clearly defined and made explicit, it is then possible to “align” these with appropriate learning activities and assessment strategies (biggs, 2003). however, as colley, hodkinson, and malcolm (2003) argue, “formal” learning activities are only ever one strand of any learning situation and cannot be extricated from the social context in which they take place. although there may be broad learning objectives for higher education programmes, it is expected that learners will develop a degree of self-direction, and will inevitably emerge from the process with differing understandings of the academic content and with varied mastery of research skills. thus educational researchers have turned increasingly to a range of alternative approaches to the curriculum to analyse what is learned at university, and how it is learned (e.g. brennan et al., 2009). doctoral education specifically entails a further shift of emphasis away from the standardised formal curriculum (see table 1), and towards a highly complex set of structures, practices and expectations from which doctoral students and their supervisors create new and unpredictable learning. 
this complex and pluralised perspective seeks to address the diversity of training needs and career preferences by adjusting not only to labour concerns but also to students, supervisors and administrators. however, social and labour claims for specific needs may lead to the development of standardised programs. thus, curricula in doctoral education may struggle to find a balance between these two perspectives that contribute to defining their orientation. therefore, a curricular perspective cannot ignore the core changes and challenges that doctoral education entails. pearson et al. (2010) point out that in the current context of doctoral education, “opportunities for researchers, or employees with enhanced research skills, now arise inside universities and in non-university settings where knowledge and professional industries develop their capacity to carry out work that draws on specialist knowledge and research skills (e.g. contract research, university administration, school teaching, nursing and business)” (p. 348). discussing the dilemma between standardisation and pluralism, described in table 1, the same authors advise that “any attempt to resolve [it] must draw on a fully accurate and up to date picture of the contemporary doctoral experience and address the goals, motivation and expectations of the increasingly diverse doctoral population. particularly important is recognition that the connection and integration of work and learning is an issue for research education, as for other forms of higher education” (pearson et al., 2010, p. 349). a curricular perspective on doctoral education may then take into account the new ecology of doctoral education, considering that students’ experience is framed by (and frames) what happens at the different levels of the ecological system.
in bronfenbrenner’s (1979, 1986) ecology of human development, for example, curriculum can be seen as an interaction system constituted by different “nested” subsystems: microsystem (in this context, what happens within typical classes in doctoral programs), mesosystem (what happens within universities, research centres or professional industries), exosystem (educational policies regarding doctoral education), macrosystem (cultural models in a certain period, such as representations of doctoral education and the mandates of universities or the significance of professional phds) and chronosystem (changes that result from specific non-normative events, such as the bologna process in europe). this is surely a good departure point that must be reinforced with two additional features: on the one hand, the representations, contents and meanings that mark the interactions between each ecological system, and on the other, the experience of students in their journey through the doctorate. ball’s policy cycle (1994), particularly as reinterpreted by lopes and macedo (2011), is helpful in addressing the first feature. ball’s studies (1989, 1994; ball, bowe, & gold, 1992) focus on micro-political processes and the need to articulate macro and micro levels in curricular studies. lopes and macedo (2011) insist on the non-hierarchical character of the policy cycle in the field of the curriculum, emphasising the circularity of its three contexts: the context of influence, i.e., of policy-producing; the context of policy text production; and the context of policy practices. this perspective assumes that the curriculum is in itself the struggle for meaning (lopes, 2012) and reveals how the context of policy practices can drive, and is driven by, the other contexts.
broadly social-constructivist and situated perspectives on learning can be helpful in identifying significant features of the curriculum of doctoral education that are relevant to understanding the journey of doctoral students (e.g. brown, collins, & duguid, 1989; greeno, collins, & resnick, 1996; lave, 1988; rogoff, 1998). the influence of vygotsky (1978) is apparent in a number of alternative theorisations of learning processes, and indeed vygotsky’s social-cultural approach to learning offers the basis for rich understandings of the contextual and relational dimensions of the curricula. where learning is seen as situated, knowledge is immersed in and generated by the activities, relationships, tools, contexts and culture that occur in daily activities. this implies recognising the collective, participative and social nature of cognition (rogoff, 1998), and this emphasis on social engagement and communication has significant implications in a context where the phd has increasingly become an interactional rather than a solitary endeavour. the notion of communities of practice is of particular interest here; the development of an identity as a researcher can be clearly understood as legitimate peripheral participation through engagement in research activities within a research group or disciplinary community. this attention to how “informal” practices and messages are produced and conveyed has been taken up in the literature of learning in the workplace, and theories of social learning developed to explain workplace practices have increasingly been applied to educational settings as well (e.g. lave & wenger, 1991; billett, 2009). 
doctoral students, from this perspective, are situated as both learners and emerging practitioners in the discipline, increasingly inhabiting the identity and responsibilities of professional disciplinary researchers in an academic workplace (and the extent of these responsibilities varies in different national systems of doctoral education) or highly qualified and innovative professionals in a hybrid academic and professional context. indeed, an emphasis on inclusion in research communities or networks and the creation of collaborative knowledge-sharing environments appears to be a significant trend in doctoral education (johnson, lee, & green, 2000; malfroy, 2005; pyhältö, stubb, & lonka, 2009). this view of doctoral education emphasises the pluralised approach to curriculum and specifically the impact of the social context in which training takes place (table 1). the “landscape” metaphor proposed by clandinin and connelly (1995) can also be useful in analysing doctoral students’ experiences as they construct their identities as researchers. within this metaphor, learning involves a double transaction (biographical and relational) that results from the relationships between people, places, and things, and this view of the “landscape” of professional development as being inherently relational (in itself made up of relations) provides a gateway for relating the study of identity to the study of curriculum (lopes & pereira, 2012). recent work on socio-material understandings of learning (e.g. fenwick, edwards, & sawchuk, 2012) suggests that the “landscape” metaphor can be extended to include all of the actors and practices present in a learning setting – social, material, technological, pedagogic, symbolic – and a close attention to their multiple, complex connections and interactions. the fact that the profiles of doctoral students and doctoral programs have changed implies that issues of identity development will also change (baker & lattuca, 2010).
the current consensual distinction within curriculum theory between “formal”, “informal” and “hidden” curricula (pacheco, 1996) seems to assume a special relevance here (see table 1). the formal curriculum refers to qualifications frameworks and course syllabi; the informal curriculum relates to what is really done through teaching and learning processes, such as readings and discussion, and interactions with researchers in the context of classes; and the hidden curriculum represents the unintended learning, often in regard to class and gender roles, social expectations, etc., that emerges from structures, relationships and practices in the educational setting, revealing the pedagogy of the learning context, rather than its intended content (apple, 1971). doctoral education clearly involves codified objectives of degree programmes, as well as a complex web of structures, practices and expectations far beyond the more explicit/formal dimension. solem, hopwood, and schlemper (2011) explore what kind of events made students feel an “academic and belonging to a departmental community” (p. 10) and conclude that mostly these are “informal events [that] include conversations [and] social events” (p. 12): some doctoral students mentioned joint coffee meetings or lunches as significant experiences. however, these events might be experienced very differently by different students. margolis and romero (1998) find “patterns of interaction with intended and unintended consequences that make it particularly difficult for students of color, women, and students from working-class background to survive and thrive in graduate school” (p. 2). gender relations also appeared relevant in the study by solem et al. (2011), with women expressing more extreme evaluations of support that interfered in their perception of progress in their own work; international students and non-white minorities also seem to report more troubles and feelings of isolation.
margolis and romero consider apple and king’s (1977) notion of the weak (related to professionalism) and strong (related to socialisation) hidden curriculum, concluding that the formal curriculum (e.g. affirmative action policies) often contradicts these hidden dimensions at the expense of successful experiences for minority students. implicit in all of these alternative approaches to understanding learning as social and situated is the fundamental problematisation of any notion of a stable set of knowledge and skills to be learned and assessed. guerin (2013) argues that “rhizomatic” models of knowledge structures as proposed by deleuze and guattari (1980) may be a more appropriate way to understand knowledge-content and research cultures at doctoral level: “in effect, this alternative model acts as a licence to try out new combinations of ideas. thus, a rhizomatic research culture is characterised by heterogeneity, multiplicity, proliferation, flexibility, non-linearity, connection and non-hierarchical networks” (guerin, 2013, p. 139, emphasis added). alternative conceptions of knowledge as emergent in social practices (e.g. hager, lee, & reich, 2012), socio-material assemblages (fenwick & nerland, 2014) and hybrid or interdisciplinary research fields (clausen, pohjola, sapprasert, & verspagen, 2012) all offer further possible starting points for a more nuanced analysis of the complexities of the processes and outcomes in the doctoral curriculum. however, some departmental cultures seem to emphasise the phd as a solitary endeavor with which students should be able to cope individually (solem et al., 2011). in the next two sections, we turn our attention first to a more detailed discussion of the processes and experience of the doctoral curriculum, and then to the assessment of its outcomes.

3. doctoral education processes – how the curriculum is experienced

the analysis of the lived curriculum of doctoral education should firstly consider doctoral students’ experiences during their candidature. recent studies suggest there is quite a high variation in how ecrs experience the doctoral study process, but there are also strong indications that good progress and satisfaction with doctoral education are more likely where candidates experience factors such as good supervisory relationships, belonging to an academic community, and/or being able to contribute new knowledge in science (ives & rowley, 2005; zhao, golde, & mccormick, 2007; overall, deane, & peterson, 2011). it is clearly difficult to identify what emerges from the formal or informal curriculum, or to distinguish formal from informal learning within student experiences. however, some results suggest that when doctoral students talk about their most meaningful experiences, they tend not to emphasise formal studies or other activities that might be seen as constituting the formal curriculum (virtanen & pyhältö, 2012; vekkaila et al., 2013); anderson & anderson (2012) also indicate that the curriculum does not always work as intended. from the perspective of doctoral students, it seems, the curriculum appears undefined and lacking in focus, but further research is needed to explore specific conceptions about the curriculum and its manifestations in different contexts. a wide range of activities influences students’ experiences during their doctoral journey; these activities shed light on the different manifestations of curriculum (see table 1). the way in which curriculum is experienced goes beyond institutional policies; beliefs and expectations play a major role, which can create tensions between students’ expectations and supervisors’ and administrators’ perspectives about doctoral training.
a recent study on postdoctoral researchers (postdocs) who had already successfully completed their doctoral studies suggests that career planning should ideally have been included in their doctoral education from the beginning of the doctoral study process. these postdocs also stressed that formal study and other academic activities should have been designed with a view to supporting their future careers. these findings are in line with those of scaffidi and berman (2011) who argue that for postdocs to have the best chances of prospering in academia, industry, or elsewhere, they need to plan their future careers strategically. analysing the experiences and conceptions of post-doctoral researchers (pitcher & åkerlind, 2009) is essential to promoting their future career development after the phd; thus rethinking the curriculum of doctoral studies is vital not only from the perspective of doctoral students themselves, but also from that of higher education researchers. åkerlind argues for “varied” and “flexible” provision to enable postdocs to make “informed career decisions” (åkerlind, 2009). others have proposed a reconceptualisation of postdoctoral research pathways to produce a better “fit” between training and professional interests and skills (berman, juniper, pitman, & thomson, 2008). thus a review of the curriculum of both doctoral and postdoctoral preparation is acknowledged as an essential task. a “hybrid curriculum” model to address the connections among university, profession and workplace, is proposed by lee, brennan, and green (2009) as a way of adapting the curriculum for diverse doctoral needs. this idea has also engendered further studies reviewing the purposes of doctoral education, and taking into account the changing needs of the “knowledge economy” in academic, professional, social and labour domains. 
this questioning of assumed and hitherto tacit purposes has also encouraged the development of alternatives to traditional doctoral programmes, such as practitioner or professional doctorates for those who are engaged in leading practice and introducing change in tandem with their academic research (lester, 2004). utilising research on networking learning, and on students’ socialisation in disciplinary communities and in other professional fields (e.g. vaessen, van den beemt, & de laat, 2014; boden, borrego, & newswander, 2011) could also strengthen the development of interdisciplinary curriculum structures, enabling ecrs to construct and assume their professional roles taking broader labour market needs into account. studies of the academic transitions experienced by junior researchers could also deepen our understanding of the academic and professional practices needed to offer more appropriate training and support to ecrs, enabling them to make the transition from doctoral education to other careers (mcalpine & emmioğlu, 2014). where the focus is clearly on preparation for an academic career, the quality of supervision emerges as key to supporting doctoral students’ developmental processes (roulston, preissle, & freeman, 2013). in this context the supervisory relationship is of fundamental importance to how students experience the “doctoral journey” (pyhältö et al., 2012; zhao et al., 2007; mcalpine et al., 2013); students’ learning experiences and satisfaction are closely related to the nature of the relationship developed between students and supervisors, so the role of the supervisor is critical to constructive doctoral preparation (lee, 2008). solem et al. (2011) emphasise how “timely, proactive, and supportive advising and mentoring from faculty, peers, and program committees” (p. 13) are essential elements for preventing difficulties. 
yet the practice of supervision (and often of pedagogy more generally) only becomes a developmental focus after students have completed their thesis, thus presenting a clear obstacle to their development as future academics. as mcalpine et al. (2013) point out, this means that doctoral supervision is a long-term and collective process, and this needs to be acknowledged in the structuring of the curriculum. existing research on doctoral students reveals a high degree of variation in the experience of doctoral study processes (mcalpine & mckinnon, 2013) and further work is needed in order to understand how the curriculum shapes and influences these experiences, particularly with regard to the study of the experiences that are promoted in the formal, informal and hidden curriculum and how these experiences affect students’ training as well as the role of supervisors and administrators. this could include longitudinal studies to examine how doctoral programmes are currently developing and how far this development aligns with changes in industry and the employment market. this could then inform discussions of how far the doctoral curriculum and the training of doctoral students can or should be adapted to meet the changing and multiple purposes of the phd. the academic and professional socialisation and disciplinary networking of doctoral students also merit more extensive study; this remains a relatively under-researched area (anderson & anderson, 2012), despite its key importance to students and to their future careers.

4. the outcomes of doctoral education – assessment and employability

the question of how the outcomes of doctoral education are assessed cannot be avoided in any discussion of the doctoral curriculum, particularly in the light of the ongoing diversification of programmes and career paths. in this section we consider two of the outcomes of doctoral education: assessment and employability.
in spite of commonalities in terms of formality and structure, assessment varies significantly by discipline, country, institution, and supervisor. in addition, the “core competences” of a phd may serve both academic and non-academic careers; these multiple purposes have complex implications which are not yet fully understood, and which may not be susceptible to standardised or comprehensive solutions. the final examination is only one aspect of the complex assessment processes occurring at the doctoral level. for example, we have forms of assessment at entry to a doctorate, and ongoing assessment during the candidature. depending on the country or the disciplinary context, this may take formal shape through the marking and grading of coursework, or structural milestones such as confirmation of candidature seminars, annual reports of progress, mid-term and final seminars. informal assessment occurs throughout candidature as judgments are made by the supervisory team on the quality of writing and thinking candidates display, and peers reach verdicts on the quality of research papers submitted to journals and conferences. these various strategies vary by institution and country. for example, in some systems an advisory committee additional to the supervisor/s will have an overview of the quality of the candidate’s work and progress, and meet to assess key milestones. some institutions have developed rubrics to use for assessing these various milestones. others require candidates to provide reflective essays on learning, or to develop a portfolio, or to produce a number of peer-reviewed publications prior to completion. all of these assessment strategies support the expectation of experienced examiners that the thesis they are about to examine is passable (golding, sharmini, & lazarovitch, 2014; mullins & kiley, 2002).
despite the variety of formal and informal assessment strategies employed during candidature, the most common formal assessment at doctoral level is the final examination, known by a number of different names and exhibiting a wide range of types (hartley, 2000; morley, leonard, & david, 2002). for example, in parts of europe and scandinavia, following examination and approval of the written thesis, the candidate publicly defends her/his thesis before an audience of academics and others. this process is in stark contrast to the uk model where the written thesis is generally examined by one internal and one or two external examiners, and then a private viva voce is held, in some cases in the presence of a neutral chair who oversees the process. while an oral examination is held in canada, this is generally a semi-public affair, often with four to five supporters joining the candidate. the us model is different again: the candidate has a “committee” with whom they interact on occasions throughout their candidature, and when the supervisor thinks the candidate is ready, the committee conducts a private oral examination where the candidate “defends” the thesis. a very different model exists in australia and south africa, where the written thesis is the sole examinable item (although universities offer the option of an oral if the examiner requires one). a high level of confidentiality is maintained; the candidate does not know who the examiners are, and the examiners are generally unaware of each other’s identity, and do not discuss the work among themselves. each university has a process for bringing together the various reports into a single recommendation, as a journal editor might do with reviewers’ reports (kiley, 2009).
given the diversity of approaches to assessment, in the complex settings of various approaches to curriculum in doctoral education (table 1), one particular question arises: what is being examined? when the written work is examined, one could argue that it is the candidate’s demonstrated ability to be a researcher that is being assessed, judged by the quality of the research and its presentation. with the oral component, it is arguable that other qualities are being assessed, such as the candidate’s broader knowledge of the discipline, and their ability to deal with challenges to their work. however, in view of curriculum considerations and the substantial international developments in doctoral education outlined above, we suggest that there may be other assessable outcomes of the doctoral learning experience which are not yet fully developed, and are not currently the focus of formal assessment. international research in this area is in its early stages; we suggest that it is time to reconsider formal and informal types of assessment for future academic researchers, as well as for those in, or aiming for, other kinds of professional employment. future research will need to take into account the specifics of such forms of assessment in terms of the demands of different disciplines, sub-disciplines, academic and professional fields, and will also need to recognise the significance of local settings and histories. at the simplest level we are asking: are we assessing the candidate or their research? and is this assessment formal, informal or a mixture of both? finally, the question of how the curriculum of doctoral education enhances employability is of key interest, and not only from the doctoral students’ point of view.
academic communities – both universities and disciplinary organisations – have increasingly been concerned to support career development for early career researchers and diversify their employment opportunities, recognising that their training is often predicated on the assumption that they will pursue an academic or research-only career (åkerlind, 2005). yet there is still little reliable evidence regarding the employability and the career pathways of ecrs, particularly in relation to careers in industry and other non-academic settings, and in the increasingly international labour market. this situation calls for a clearer understanding of multiple doctoral pathways and a review of curriculum structures within doctoral education that might facilitate diverse transitions. the tacit assumption of many supervisors, also implicit in many doctoral programmes and in the popular press (economist, 2010; cyranoski et al., 2011), is that the phd is a training ground for the next generation of academics. this encourages graduates to aspire to, and apply for, academic research positions (manathunga, pitt, & critchley, 2009), though only a minority will get a position in academia. this situation constricts the scope of academic training and skill development by focusing on a narrow range of labour market possibilities, and promotes a perception that many doctoral graduates have effectively “failed”. this problem accentuates the relevance of exploring the changing relationships between university and social and professional spheres (lee et al., 2009), and ensuring that ecrs are aware of and willing to pursue options other than the academic role. this in turn requires the development of new academic cultural practices (boud & tennant, 2006) based on a much clearer understanding of the ‘fit’ between the doctoral curriculum and the doctoral labour market.

5. developing a research agenda

this paper has explored some emerging themes in doctoral education from a curricular perspective.
this focus on the curriculum is significant not only because it might help to uncover existing tensions, but also because it allows us to face and reinterpret current challenges to doctoral education by undertaking intentional and meaningful change, especially in terms of the processes and outcomes of doctoral education. whilst recognising that knowledge and practices in this field are situated in historical and cultural contexts, we suggest here a number of possible themes for a future research agenda:

1. the diversity of training programmes developed for researchers around the world calls for a review. we need to improve our understanding of the historical context of current curriculum models and their impact on the training and experience of doctoral students and ecrs.
2. despite the extensive research already conducted on the changes in doctoral education, in terms of public policy, internationalisation, formats, etc., there is a need for more research on how these changes are being dealt with at the level of the formal, the informal and the hidden curriculum.
3. in order to avoid the unintended and perverse reproduction of inequalities, we need to explore the central role of departmental cultures and practices (involving both weak and strong elements of the hidden curriculum) in the integration and progression of doctoral students, and the diverse ways in which these are perceived by students from different backgrounds.
4. networking and professional socialisation have become increasingly important strategies in the development of doctoral students as researchers. these elements need to be explored as part of the doctoral curriculum, and supported by research on the roles of communities of practice and networks in supporting the construction of early career researchers’ identity.
5. in the light of the issues addressed in this paper, there is clearly a need for more research on the process of “becoming a supervisor”, and a review of the training and support available to doctoral supervisors and examiners.
6. assessment is a core curricular process in doctoral education, and yet there is very little research evidence on assessment practices (compared to, for example, the extensive literature on assessment in undergraduate education). our understanding of assessment needs to incorporate critical analysis of formal and informal practices and the variety of purposes which they fulfil. the fluidity of the “knowledge economy” presents new challenges to traditional forms of assessment, raising the possibility of replacing or extending traditional examinations with more flexible assessment models more appropriate to the diversity of ecrs’ academic and professional futures.
7. the current evidence on the destinations of ecrs illustrates the need for further research on the new relationships developing between universities and the labour market. from an international perspective there is a lack of evidence on the employability, career aspirations and mobility of ecrs, particularly those who do not follow academic careers.
8. the new demands of the labour market suggest a need to address the competencies of ecrs and a critical appraisal of the career pathways enabled through doctoral and postdoctoral education.

this paper has been shaped very much by the interests and experiences of its diverse group of authors, and we recognise that consequently, any proposed research agenda is likely to be partial and incomplete. we welcome further discussion of the themes raised here and wider contributions to this important debate.
keypoints

the phd has become a “global academic passport”, although doctoral education practices are increasingly diverse; we argue for the need for an explicit discussion of what constitutes the “doctoral curriculum”, including its formal, informal and hidden dimensions.
review of the doctoral curriculum should consider how phd students experience the curriculum, including identity as researchers, supervision, insertion in research networks, and the role of departmental cultures.
review of the doctoral curriculum requires further research on assessment practices and the preparation of supervisors and examiners, and consideration of how these can be improved.
review of the doctoral curriculum needs to take account of the multiple purposes of the phd and the divergent professional pathways of doctoral graduates, both inside and outside the academy.

acknowledgments

this work was funded (in part) by national funds through the fct – fundação para a ciência e a tecnologia (portuguese foundation for science and technology) within the strategic project of ciie, with the ref. “pest-oe/ced/ui0167/2014”.

references

åkerlind, g. s. (2005). postdoctoral researchers: roles, functions and career prospects. higher education research and development, 24(1), 21-40. åkerlind, g. s. (2009). postdoctoral research positions as preparation for an academic career. international journal for researcher development, 1(1), 84-96. anderson, s., & anderson, b. (2012). preparation and socialization of the education professoriate: narratives of doctoral student-instructors. international journal of teaching and learning in higher education, 24(2), 239–251. apple, m. w. (1971). the hidden curriculum and the nature of conflict. interchange, 2(4), 27–40. apple, m. w., & king, n. r. (1977). what do schools teach? in r. h. weller (ed.), humanistic education (pp. 29-63). berkeley, ca: mccutchan. baker, v. l., & lattuca, l. r. (2010).
developmental networks and learning: toward an interdisciplinary perspective on identity development during doctoral study. studies in higher education, 35(7), 807–827.
ball, s. (1989). la micropolítica de la escuela: hacia una teoria de la organización escolar. barcelona: paidós.
ball, s. (1994). education reform: a critical and post-structural approach. buckingham: open university press.
ball, s., bowe, r., & gold, a. (1992). reforming education and changing school: case studies in policy sociology. london and new york: routledge.
berman, j., juniper, s., pitman, t., & thomson, c. (2008). reconceptualising post-phd research pathways: a model to create new postdoctoral positions and improve the quality of postdoctoral training in australia. australian universities' review, 50(2), 71–77.
biggs, j. (2003). teaching for quality learning at university: what the student does (2nd ed.). buckingham: srhe and open university press.
billett, s. (2009). conceptualizing learning experiences: contributions and mediations of the social, personal and brute. mind, culture and activity, 16(1), 32–47.
boden, d., borrego, m., & newswander, l. k. (2011). student socialization in interdisciplinary doctoral education. higher education, 62(6), 741–755.
boud, d., & lee, a. (eds.). (2009). changing practices of doctoral education. abingdon: routledge.
boud, d., & tennant, m. (2006). putting doctoral education to work: challenges to academic practice. higher education research & development, 25(3), 293–306.
brennan, j., edmunds, r., houston, m., jary, d., lebeau, y., osborne, m., & richardson, j. t. e. (2009). improving what is learned at university: an exploration of the social and organisational diversity of university education. london: routledge.
brew, a., & peseta, t. (2004). changing postgraduate supervision practice: a programme to encourage learning through reflection and feedback.
innovations in education and teaching international, 41(1), 5–22.
bronfenbrenner, u. (1979). the ecology of human development: experiments by nature and design. cambridge, ma: harvard university press.
bronfenbrenner, u. (1986). ecology of the family as a context for human development: research perspectives. developmental psychology, 22(6), 723–742.
brown, j. s., collins, a., & duguid, p. (1989). situated cognition and the culture of learning. educational researcher, 18(1), 32–41.
calma, a., & davies, m. (2015). studies in higher education 1976-2013: a retrospective using citation network analysis. studies in higher education, 40(1), 4–21.
clandinin, d. j., & connelly, f. m. (1995). teachers’ professional knowledge landscapes. new york: teachers college press.
clausen, t., pohjola, m., sapprasert, k., & verspagen, b. (2012). innovation strategies as a source of persistent innovation. industrial and corporate change, 21(3), 553–585.
colley, h., hodkinson, p., & malcolm, j. (2003). formality and informality in learning: a report for the learning and skills research centre. london: lsrc.
cyranoski, d., gilbert, n., ledford, h., nayar, a., & yahia, m. (2011). the phd factory. nature, 472, 276–279.
deem, r., & brehony, k. (2000). doctoral students’ access to research cultures: are some more equal than others? studies in higher education, 25(2), 149–165.
deleuze, g., & guattari, f. (1980). mille plateaux: capitalisme et schizophrénie. paris: minuit.
economist (2010). the disposable academic: why doing a phd is often a waste of time. the economist, dec 16.
enders, j. (2002). serving many masters: the phd on the labour market, the everlasting need of inequality, and the premature death of humboldt. higher education, 44, 493–517.
enders, j. (2004). research training and careers in transition: a european perspective on the many faces of the ph.d. studies in continuing education, 26(3), 419–429.
eraut, m. (2000). non-formal learning, implicit learning and tacit knowledge.
in f. coffield (ed.), the necessity of informal learning (pp. 12–31). bristol: policy press.
eua (2007). doctoral programmes in europe’s universities: achievements and challenges. brussels: european university association.
fenwick, t., edwards, r., & sawchuk, p. (2012). emerging approaches to educational research: tracing the socio-material. london: routledge.
fenwick, t., & nerland, m. (eds.). (2014). reconceptualising professional learning: sociomaterial knowledges, practices and responsibilities. london: routledge.
gardner, s. k. (2007). ‘i heard it through the grapevine’: doctoral student socialization in chemistry and history. higher education, 54, 723–740.
golde, c. m. (2005). the role of department and discipline in doctoral student attrition: lessons from four departments. journal of higher education, 76(6), 669–700.
golding, c., sharmini, s., & lazarovitch, a. (2014). what examiners do: what thesis students should know. assessment & evaluation in higher education, 39(5), 563–576.
greeno, j. g., collins, a. m., & resnick, l. b. (1996). cognition and learning. in d. c. berliner & r. c. calfee (eds.), handbook of educational psychology (pp. 15–45). new york: macmillan.
guerin, c. (2013). rhizomatic research cultures, writing groups and academic researcher identities. international journal of doctoral studies, 8, 137–150.
hager, p., lee, a., & reich, a. (eds.). (2012). practice, learning and change: practice-theory perspectives on professional learning. springer.
hartley, j. (2000). nineteen ways to have a viva: appendix 2. psypag quarterly newsletter, 35, 22–28.
ives, g., & rowley, g. (2005). supervisor selection or allocation and continuity of supervision: ph.d. students' progress and outcomes. studies in higher education, 30(5), 535–555.
johnson, l., lee, a., & green, b. (2000). the phd and the autonomous self: gender, rationality and postgraduate pedagogy. studies in higher education, 25(2), 135–147.
jones, m.
(2013). issues in doctoral studies: forty years of journal discussion. where have we been and where are we going? international journal of doctoral studies, 8(6), 83–104.
kiley, m. (2009). rethinking the australian doctoral examination process. australian universities' review, 51(2), 32–41.
kiley, m. (2014). coursework in australian doctoral education: what’s happening, why and future directions? final report. sydney: office for learning and teaching.
kot, f. c., & hendel, d. d. (2012). emergence and growth of professional doctorates in the united states, united kingdom, canada and australia: a comparative analysis. studies in higher education, 37(3), 345–364.
lave, j. (1988). cognition in practice: mind, mathematics, and culture in everyday life. cambridge, uk: cambridge university press.
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge: cambridge university press.
lee, a. (2008). how are doctoral students supervised? concepts of doctoral research supervision. studies in higher education, 33(3), 267–281.
lee, a., brennan, m., & green, b. (2009). re-imagining doctoral education: professional doctorates and beyond. higher education research & development, 28(3), 275–287.
lester, s. (2004). conceptualizing the practitioner doctorate. studies in higher education, 29(6), 757–770.
lopes, a. c. (2012). a qualidade da escola pública: uma questão de currículo? in m. taborda, l. faria filho, f. viana, n. fonseca, & r. lages (orgs.), a qualidade da escola pública no brasil (pp. 13–29). belo horizonte: mazza edições.
lopes, a. c., & lópez, s. b. (2010). a performatividade nas políticas de currículo: o caso do enem. educação em revista, 26(1), 89–110.
lopes, a. c., & macedo, e. (2011). contribuições de stephen ball para o estudo de políticas de currículo. in s. ball & j. mainardes (orgs.), políticas educacionais: questões e dilemas (pp. 249–283). são paulo: cortez.
lopes, a., & pereira, f. (2012).
everyday life and everyday learning: the ways in which pre-service teacher education curriculum can encourage personal dimensions of teacher identity. european journal of teacher education, 35(1), 17–38.
malfroy, j. (2005). doctoral supervision, workplace research and changing pedagogic practices. higher education research & development, 24(2), 165–178.
manathunga, c., pitt, r., & critchley, c. (2009). graduate attribute development and employment outcomes: tracking phd graduates. assessment & evaluation in higher education, 34(1), 91–103.
margolis, e., & romero, m. (1998). the department is very male, very white, very old, and very conservative: the functioning of the hidden curriculum in graduate sociology departments. harvard educational review, 68(1), 1–33.
mcalpine, l., & emmioğlu, e. (2014). navigating careers: perceptions of sciences doctoral students, post-phd researchers and pre-tenure academics. studies in higher education, 1–17.
mcalpine, l., & mckinnon, m. (2013). supervision the most variable of variables: students perspectives. studies in continuing education, 35(3), 265–280.
mcalpine, l., amundsen, c., & turner, g. (2013). identity trajectory: reframing early career academic experience. british educational research journal, 40(6), 952–969.
mcalpine, l., paulson, j., gonsalves, a., & jazvac-martek, m. (2012). untold doctoral stories in the social sciences: can we move beyond cultural narratives of neglect? higher education research and development, 31(4), 511–523.
mills, d., & paulson, j. (2014). making social scientists, or not? glimpses of the unmentionable in doctoral education. learning and teaching, 7(3), 73–97.
morley, l., leonard, d., & david, m. (2002). variations in vivas: quality and equality in british phd assessments. studies in higher education, 27(3), 263–273.
mullins, g., & kiley, m. (2002). it's a phd, not a nobel prize: how experienced examiners assess research theses.
studies in higher education, 27(4), 369–386.
nelson, r. (2006). practice-as-research and the problem of knowledge. performance research, 11(4), 105–116.
overall, n. c., deane, k. l., & peterson, e. r. (2011). promoting doctoral students’ research self-efficacy: combining academic guidance with academic support. higher education research & development, 30(6), 791–805.
pacheco, j. a. (1996). currículo: teoria e praxis. porto: porto editora.
pearson, m., evans, t., & macauley, p. (2008). growth and diversity in doctoral education: assessing the australian experience. higher education, 55(3), 357–372.
pearson, m., kiley, m., evans, t., macauley, p., palmer, n., & pike, m. (2010). pathways to the phd in australia: a symposium. in m. kiley (ed.), quality in postgraduate research: educating researchers for the 21st century (p. 285). adelaide sa: cedam, the anu.
pitcher, r., & åkerlind, g. s. (2009). post-doctoral researchers’ conceptions of research: a metaphor analysis. international journal for researcher development, 1(2), 160–172.
print, m. (1987). curriculum development and design. sydney: allen & unwin.
pyhältö, k., vekkaila, j., & keskinen, j. (2012). exploring the fit between doctoral students’ and supervisors’ perceptions of resources and challenges vis-a-vis the doctoral journey. international journal of doctoral studies, 7, 395–414.
pyhältö, k., stubb, j., & lonka, k. (2009). developing scholarly communities as learning environments for doctoral students. international journal for academic development, 14(3), 221–232.
qaa. (2011). doctoral degree characteristics. the quality assurance agency for higher education. http://www.qaa.ac.uk/en/publications/documents/doctoral_characteristics.pdf
rogoff, b. (1998). cognition as a collaborative process. in w. damon, d. kuhn, & r. s. siegler (eds.), handbook of child psychology (5th ed., vol. 2) (pp. 679–743). new york: wiley.
roulston, k., preissle, j., & freeman, m. (2013).
becoming researchers: doctoral students’ developmental processes. international journal of research & method in education, 36(3), 252–267.
scaffidi, a. k., & berman, j. e. (2011). a positive postdoctoral experience is related to quality supervision and career mentoring, collaboration, networking and a nurturing research environment. higher education, 62(6), 685–698.
solem, m. n., hopwood, n., & schlemper, b. (2011). experiencing graduate school: a comparative analysis of students in geography programs. professional geographer, 63(1), 1–17.
thesis whisperer blog. http://thesiswhisperer.com/ accessed 19 june 2015.
trafford, v., & leshem, s. (2009). doctorateness as a threshold concept. innovations in education and teaching international, 46(3), 305–316.
vaessen, m., van den beemt, a., & de laat, m. (2014). networked professional learning: relating the formal and informal. frontline learning research, 5, 56–71.
vekkaila, j., pyhältö, k., & lonka, k. (2013). focusing on doctoral students’ experiences of engagement in thesis work. frontline learning research, 1(2), 10–32.
virtanen, v., & pyhältö, k. (2012). what engages doctoral students in biosciences in doctoral studies? the psychologist, 3(12a), 1231–1237.
vitae (uk) researcher development framework. retrieved from: https://www.vitae.ac.uk/researchersprofessional-development/about-the-vitae-researcher-development-framework-planner accessed 19 june 2015.
vygotsky, l. s. (1978). mind in society: the development of higher psychological processes. london: harvard university press.
walker, g., golde, c., jones, l., bueschel, a., & hutchings, p. (2008). the formation of scholars: rethinking doctoral education for the twenty-first century. san francisco: jossey-bass.
wellington, j. (2013). searching for 'doctorateness'. studies in higher education, 38(10), 1490–1503.
zhao, c-m., golde, c. m., & mccormick, a. c. (2007).
more than a signature: how advisor choice and advisor behaviour affect doctoral student satisfaction. journal of further and higher education, 31(3), 263–281.

table 1
dimensions of various approaches to curriculum, and the specific themes/questions arising from them

1 formal – 2 informal
dimension 1 (formal): refers to qualifications frameworks, course syllabi aims and learning outcomes defined activities: workshops, supervision, seminars, conferences
- includes regulations for candidature
- e.g. affirmative action policies
dimension 2 (informal): relates to what is really done through teaching and learning processes, such as readings and discussion, interactions with researchers in the context of classes.
- activities: peer interaction, dialogues in academic community
- impact of the social context in which training takes place
arising themes/questions: the role of academic practices in learning outcomes:
- peer learning
- social media (e.g. thesis whisperer blog)
- departmental practices (e.g. golde, 2005)
- disciplinary networking (e.g. deem & brehony 2000)
- allocation of teaching duties/other work
- professional conventions/expectations in particular subject areas

1 open – 2 hidden
dimension 1 (open): refers to such contents in doctoral training that are defined but variable in individual level, e.g., prescribed reading, research methods provision, seminars etc. which doctoral candidates are expected to attend.
- learners’ degree of self-direction and the social context in which training takes place embedded
dimension 2 (hidden): refers to unintended learning, often in regard to class and gender roles, social expectations, etc., that emerges from structures, relationships and practices in the educational setting, revealing the pedagogy of the learning context, rather than its intended content (apple 1971)
arising themes/questions: what is students’ role (active/passive) in developing their doctoral training?
- departmental practices (e.g. mills & paulson 2014)
- dyadic dynamics in the supervisory relationship (including gender etc., plus reputational/prestige issues which are very intangible) (e.g. mcalpine & mckinnon, 2013, johnson et al., 2000)

1 standardised – 2 pluralized
dimension 1 (standardised): refers to systems such as phd programmes (i.e. with prescribed taught elements preceding thesis), and also skills programmes, e.g. vitae researcher development framework.
- an inflexible system
- intended learning outcomes laid down in policy documents (e.g. qaa)
dimension 2 (pluralized): refers to a highly complex set of structures, practices and expectations from which doctoral students and their supervisors create new and unpredictable learning.
- flexible
- impact of the social context in which training takes place
- learners’ degree of embedded self-direction
arising themes/questions: the purpose of doctoral degrees in relation to working career and employment (enders, 2004)
- publication

larmuseau et al

frontline learning research vol. 7 no. 2 (2019) 57–74
issn 2295-3159

combining physiological data and subjective measurements to investigate cognitive load during complex learning

charlotte larmuseau a, jan cornelis b, piet desmet a & fien depaepe a

a itec, imec research group at ku leuven, etienne sabbelaan 51, kortrijk, belgium
b imec, kapeldreef 75, leuven, belgium

article received 27 august 2018 / revised 2 december / accepted 2 may / available online 10 may

abstract

cognitive load theory is one of the most influential theoretical explanations of cognitive processing during learning. despite its success, attempts to assess cognitive load during learning have proven difficult. therefore, in the current study, students’ self-reported cognitive load after the problem-solving process was combined with measures of physiological data, namely, electrodermal activity (eda) and skin temperature (st), during the problem-solving process. data was collected from 15 students during a high and low complex task about learning and teaching geometry.
this study first investigated the differences between subjective and physiological data during the problem-solving process of a high and low complex task. additionally, correlations between subjective and physiological data were examined. finally, learning behaviour retrieved from log data was related to eda. results reveal that the manipulation of task complexity was not reflected by physiological data. nevertheless, when investigating individual differences, eda seems to be related to mental effort.

keywords: cognitive load; physiological data; electrodermal activity; skin temperature; complex learning

corresponding author: charlotte.larmuseau@kuleuven.be doi: 10.14786/flr.v7i2.403

1. introduction

as society and work environments become more complex, it is increasingly relevant that learning environments mirror this complexity of the real world (jonassen, 2000; kirschner, ayres & chandler, 2011; merrill, 2009; van merriënboer, kirschner & kester, 2003). nevertheless, a risk of complex learning environments is that the cognitive load imposed by the complex learning tasks is often excessive (larmuseau, elen & depaepe, 2018; van merriënboer & sluijsmans, 2009). this phenomenon can be explained by cognitive load theory (clt) introduced by sweller (1994). clt uses current knowledge about the human cognitive architecture as a baseline to develop the instructional design for complex learning environments (martin, 2014). clt distinguishes three types of cognitive load: intrinsic, extraneous and germane load (brunken, plass & leutner, 2003; paas, tuovinen, tabbers & van gerven, 2010; sweller, 2010). the level of intrinsic load is assumed to be determined by the complexity of the task or learning material and cannot be directly altered by the instructional designer.
extraneous load is mainly imposed by instructional procedures that are suboptimal, whereas germane load refers to the learners’ working memory resources available to deal with the complexity of the task or learning material (sweller, 2010). both extraneous and germane load can be influenced by the instructional designer. an instructional designer should find a balance between keeping the matter sufficiently challenging but still within the cognitive capacities of the learner. exceeding learners’ cognitive capacities can induce cognitive overload, which could hamper learning. specifically, this means that when the content is very complex due to high element interactivity (i.e., the number of interrelations between knowledge, procedures, formulas etc.), which affects intrinsic load, instructional designers should keep extraneous load to a minimum (e.g., by providing clear instructions and embedded support) and subsequently foster germane load (kirschner, kester & corbalan, 2011; sweller, 2010). in order to align the instructional design with students’ cognitive abilities, we should be able to measure cognitive load during complex learning. former studies investigated cognitive load by using subjective measurements such as self-reported questionnaires (boekaerts, 2017; zheng & cook, 2012). those self-reported questionnaires have some important disadvantages (e.g., subjective measures, assumption of constant workload capacity; see section 2.2; deleeuw & mayer, 2008; raaijmakers, baars, schaap, paas & van gog, 2017; spanjers, van gog & van merriënboer, 2012). as a result, more researchers show interest in using objective, real-time measures. physiological measures provide objective data and can be unobtrusively collected while dealing with a task or learning material. moreover, physiological data might provide an indication of changes in cognitive functioning throughout the process of solving a task (boekaerts, 2017).
former studies already indicated that electrodermal activity (eda) and skin temperature (st) can be linked to different levels of task complexity (haapalainen, kim, forlizzi & dey, 2010; nourbakhsh, wang, chen & calvo, 2012; shi, ruiz, taib, choi & chen, 2007). nevertheless, it is unclear whether these physiological measures are related to self-reported intrinsic load, extraneous load, germane load and the overall mental effort during complex problem solving (leppink, paas, van der vleuten, van gog & van merriënboer, 2013). therefore, in the current study, a high and a low complex task were developed relating to the learning and teaching of geometry. the complexity of the task was manipulated by increasing the element interactivity for the high complex task (sweller, 2010). in both tasks the same amount of support was provided. data was retrieved using self-reported questionnaires to measure students’ experienced intrinsic load, extraneous load, germane load and mental effort. this distinction between the different types of load and mental effort was made because the different types of cognitive load concern the mental load induced by task complexity and instructional design, whereas mental effort invested covers the overall amount of cognitive processing for a particular task (paas et al., 2003). the subjective measures were combined with physiological data from wrist-worn wearables measuring both eda and st. the purpose of this study was threefold. first, we investigated differences in the experienced cognitive load and the physiological data while solving a high and low complex task. second, we examined whether individual differences in subjective measurements are related to individual differences in physiological data for the high and low complex task. finally, we described whether peaks (i.e., eda) and/or drops (i.e., st) in physiological data are related to specific events (e.g., consultation of support) that took place during the problem-solving process.

2. theoretical framework

2.1 cognitive load theory

clt is concerned with the instructional implications of the interaction between the complexity and instructional design of the learning material and human cognitive architecture (sweller, 2010). basically, the human cognitive architecture consists of an effectively unlimited long-term memory, which interacts with a working memory that has limited processing capacity (kirschner et al., 2011; sweller, 1994). long-term memory contains cognitive schemata that are used to store and organize knowledge. learning occurs when information is successfully processed in working memory and when new schemas are created or incorporated into existing schemas in long-term memory. as the processing capacity of working memory is so limited, overcoming individual working memory limitations by instructional manipulations has been the main focus of clt (sweller, van merriënboer & paas, 1998). cognitive load can be defined as a multidimensional construct representing the load that performing a particular task imposes on the learner’s cognitive system (paas et al., 2010). clt claims that the cognitive load that learners experience can be intrinsic, extraneous or germane (sweller, 2010). the level of intrinsic load for a particular task is assumed to be determined by the inherent difficulty of a certain topic and the level of element interactivity of the learning material in relation to a student’s prior knowledge. the more elements that interact, the more intrinsic processing is required for coordinating and integrating the material and the higher the working memory load (deleeuw & mayer, 2008; sweller, 2010). working memory load is not only imposed by the intrinsic complexity of the material that needs to be learned; it can also be imposed by the instructional design. for instance, unclear instructional procedures can impose extraneous load.
extraneous processing means that the learner engages in cognitive processing that does not support the learning objective (deleeuw & mayer, 2008; glogger-frey, gaus & renkl, 2017; van merriënboer & sluijsmans, 2008; sweller, 2010). instructional design techniques that reduce extraneous load (e.g., fading support) should ensure that students devote less attention to irrelevant aspects of the task. subsequently, more cognitive capacity can be allocated to the actual learning objective (cierniak, scheiter & gerjets, 2009; mayer & moreno, 2010; sweller, ayres & kalyuga, 2011). whereas intrinsic and extraneous load depend on the characteristics of the learning tasks or the instructional design, germane load is more concerned with the cognitive characteristics of the learner. more specifically, it refers to the working memory resources that are available to engage in knowledge elaboration processes and argumentation (sweller, 2010). accordingly, in order to optimize learning, learning tasks should be aligned with the learner’s cognitive capabilities (schmeck, opfermann, van gog, paas & leutner, 2015; sweller, 2010). measuring cognitive load during complex learning should provide more insight into how to align instructional design with students’ cognitive capabilities.

2.2 subjective measurements of cognitive load

self-reports for measuring cognitive load are subjective measurements consisting of unidimensional and multidimensional scales. unidimensional subjective rating scales have been used intensively in research and have been identified as reliable and valid estimators of cognitive load (boekaerts, 2017; chang & yang, 2010; leppink et al., 2013; paas, 2003). paas’s nine-point mental effort rating scale has been most frequently used in cognitive load research (chen et al., 2016; paas, 1992). this scale requires learners to rate their mental effort immediately after completing a task (paas, 1992).
mental effort refers to the cognitive capacity that is allocated to accommodate the demands imposed by a task (paas et al., 2003). according to paas, learners can introspect on the amount of mental effort invested during a learning task. consequently, paas claims that the learner’s assessment can be used as an index of overall cognitive load (chen et al., 2016). nevertheless, this unidimensional scale gives little insight into the influence of the complexity of the task and the influence of the instructional design on cognitive load (boekaerts, 2017; de bruin & van merriënboer, 2017; klepsch, schmitz & seufert, 2017; leppink et al., 2013). accordingly, leppink et al. (2013) and klepsch et al. (2017) each developed a subjective cognitive load scale that uses multiple items for each type of cognitive load in order to get more specific information about intrinsic load, extraneous load and germane load. despite the frequent use of self-reported scales to assess cognitive load, some critiques have been raised. firstly, subjective measurements are based on the assumption that students are able to introspect on their cognitive processes and accordingly are able to self-report on their experienced cognitive load (boekaerts, 2017; schmeck et al., 2015). secondly, as subjective scales are often administered after the learning task, they do not capture variations in load over time. taking into account these limitations, it might be more interesting to combine subjective measurements with real-time objective cognitive load information (boekaerts, 2017; chen et al., 2016; zheng & cook, 2012).

2.3 physiological measures of cognitive load

the physiological approach to cognitive load measurement is based on the assumption that any change in human cognitive functioning is reflected in human physiology.
moreover, in contrast to subjective measurements, physiological measures are continuous and measured at a high frequency (e.g., every second) and with a high precision (chen et al., 2016). given the close relationship between cognitive load and neural systems, human neurophysiological signals are seen as promising avenues to measure cognitive load (boekaerts, 2017; chen et al., 2016). former research has investigated the relationship between learners’ cognitive load and their physiological behaviour. the physiological measures that have been used to investigate cognitive load include, among others, heart rate measured by electrocardiography (ecg), brain activity measured by electroencephalography (eeg), eye activity (e.g., blink rate, pupillary dilation), eda, heat flux and st (antonenko, paas, grabner & van gog, 2010; haapalainen et al., 2010; scharinger, soutschek, schubert & gerjets, 2015; smets et al., 2018; zagermann, pfeil & reiterer, 2016). although a lot of physiological data, such as brain and eye activity, has been proven to be highly effective for measuring cognitive load, these types of physiological data often require expensive, sophisticated equipment that is highly obtrusive in measuring cognitive activities, especially in ecologically valid contexts (chen et al., 2016; scharinger et al., 2015). a possible solution for collecting physiological data in an unobtrusive way is the use of wrist-worn wearables. these wearables can easily capture different physiological data such as eda and st and are less expensive compared to more sophisticated measures of physiological data (chen et al., 2016). eda involves measuring the electrical conductance of the skin through sensors attached to the wrist. skin conductivity varies with changes in skin moisture level (i.e., sweating) and can reveal changes in the sympathetic nervous system (sns). the slowly changing part of the eda signal is called the skin conductance level (scl) and is a measure of psychophysiological activation.
scl can vary substantially between and within individuals. a fast change in the eda signal (i.e., a peak) occurs in reaction to a single stimulus and is called a galvanic skin response (gsr; braithwaite, watson, jones & rowe, 2013). research has linked gsr variation to stress and sns arousal. as a person becomes more or less stressed, the gsr increases or decreases respectively (hoogerheide, renkl, logan, paas & van gog, 2019; liapis, katsanos, sotiropoulos, xenos & karousos, 2015; smets et al., 2018). additionally, research has also linked gsr readings to cognitive activity, claiming gsr responses increase when more cognitive load is experienced (ikehara & crosby, 2005; nourbakhsh et al., 2012; setz et al., 2010; shi et al., 2007; yousoof & sapiyan, 2013). the study of nourbakhsh, wang, chen and calvo (2015) captured gsr data of 13 and 16 participants from different reading and arithmetic tasks. the arithmetic tasks contained four difficulty levels, whereas the reading task contained three difficulty levels. anova results indicated that both mean gsr and accumulated gsr yielded significantly different results throughout different task difficulty levels. shi et al. (2007) investigated 11 subjects when dealing with four tasks divided into four distinct levels of cognitive load. results revealed insignificant differences across the interactive models for mean gsr, but significant differences when using accumulated gsr. yousoof and sapiyan (2013) investigated whether cognitive load could be detected by mean eda. in this experiment 7 subjects had to solve three programming tasks that differed in complexity. yousoof and sapiyan found no conclusive results for mean gsr, indicating that the subjects varied widely within a single task. in addition to eda, st can also reflect changes in the sns. research claims that acute stress triggers peripheral vasoconstriction, causing a rapid, short-term drop in skin temperature.
moreover, stress can also cause a more delayed skin warming, providing two opportunities to quantify stress (herborn et al., 2015; karthikeyan, murugappan & yaacob, 2012; shusterman, anderson & barnea, 1997; smets et al., 2018; vinkers et al., 2013). little research has used st to assess cognitive load. nevertheless, the study of haapalainen et al. (2010) investigated the cognitive load of 20 subjects through gsr and heat flux data (i.e., rate of heat transfer). the subjects had to solve six elementary cognitive tasks that differed in difficulty. afterwards, haapalainen et al. (2010) evaluated the performance of each of the features in assessing cognitive load using personalised machine learning techniques (i.e., naïve bayes classifier). the results for gsr were not satisfactory. by contrast, across all participants heat flux was shown to be an indicator of differences in cognitive load. the findings of former studies indicate that eda and st can indicate differences in cognitive load, but none of these studies related physiological data to self-reported cognitive load.
2.4 research aims
to conclude, physiological measures have some important advantages when compared to subjective measurements. these measures are more objective (i.e., not dependent on students’ perceptions), multidimensional (i.e., different physiological measures are sensitive to different cognitive processes), unobtrusive (i.e., no additional requirements), implicit (i.e., collect data while students are working on their tasks) and continuous (i.e., provide information about cognitive processes during learning). nevertheless, it can be difficult to interpret physiological data. therefore, it would be interesting to investigate whether there is a relationship between subjective measurements of cognitive load and physiological data.
the following research questions are formulated:
• rq1: does the manipulation of the level of complexity of a task, based on element interactivity, result in differences in perceived cognitive load and mental effort when controlled for prior knowledge?
• rq2: does the manipulation of the level of complexity of a task, based on element interactivity, result in differences in physiological data, when controlled for prior knowledge?
• rq3: is there a relationship between individual differences in self-reported data and individual differences in physiological data for a high and low complex task?
• rq4: is there a relationship between the physiological data of one learner and his/her interactive behaviour during the problem-solving process?
3. methodology
3.1 participants and study design
participants were 15 future primary school teachers, of whom ten were female and five male (aged between 18 and 24). all participants were first-year bachelor students (i.e., second semester). the study was highly ecologically valid as it was orchestrated by the students’ lecturer of the teaching mathematics course unit. moreover, the intervention was integrated into the students’ study program (i.e., primary school teacher training). the intervention used a within-subject design and was conducted online in the moodle learning management system (lms). the intervention took place in the auditorium of their faculty, where students could solve the tasks individually on their own computer among their fellow students. this session was supervised by their lecturer and a researcher. students first received an online questionnaire; the time needed to complete this first questionnaire (+/- five min.) was used as an adaptation period in order to stabilize the wearable signals (i.e., baseline measurement). next, all students had to solve a high complex and a low complex task on preparing a lesson in geometry as shown in figure 1.
in order to control for order effects, (a) half of the subjects were exposed to the high complex task during the first session and the low complex task during the second session, whereas for (b) the other half, the sequence was reversed. more specifically, eight students started with the high complex task and seven students started with the low complex task.
3.2 high and low complex tasks
the high and low complex tasks were developed in moodle lms. the scope of both tasks was designing a lesson preparation on the circumference of a circle for primary school children. this subject matter was not yet covered in previous lessons. both tasks contained six elements that addressed aspects of both pedagogical content knowledge (pck; i.e., inductive teaching strategy, choosing teaching materials to support the lesson, aligning the topic of the lesson with the flemish curriculum and integrating differentiation into the lesson in the classroom) and content knowledge (ck; i.e., formula of the circumference of the circle). the complexity of the high complex task was manipulated based on element interactivity (sweller, 2010). in the high complex task students had to coordinate and integrate six elements consisting of ck and pck in order to write a course preparation about the circumference of the circle, whereas the low complex task consisted of six questions where each element was addressed separately (see figure 1). during both problems, the same support consisting of procedural and supportive information was provided. an example of procedural information can also be found in figure 1 in the second box. procedural information is concise and provided just-in-time. supportive information is much more comprehensive and is comparable to the background theory. both procedural and supportive information can be consulted by clicking on the words in italics.
figure 1.
high complex task, question of the low complex task and an example of the procedural information
3.3 students’ prior knowledge
information about students’ prior knowledge was gathered in the first semester during their examination. students were tested on their knowledge of pck (mean = 63.5%, sd = 19.7) and ck (mean = 72.2%, sd = 27.8). the content covered (teaching) mathematics in general and geometry in particular. examples of test items can be found in figure 2. all tests were marked by the instructor of the course unit. we have no insight into the prior knowledge of one student who participated in the study, which means that we can include an indicator of prior knowledge for 14 students in the analysis.
figure 2: example questions of the prior knowledge test
3.4 subjective measurements
a validated instrument developed by leppink et al. (2013) was used for the measurement of intrinsic, extraneous and germane load. the questionnaire was adapted to the specific context of the present study as shown in table 1. items were rated on a 7-point likert scale (i.e., ranging from “totally disagree” to “totally agree”). reliability was determined through cronbach’s α in order to investigate the overall consistency of the constructs (schreiber, nora, stage, barlow & king, 2006). confirmatory factor analysis (cfa) was not conducted due to the small sample size, but former research has validated the questionnaire and has shown that it is reliable (leppink et al., 2013). additionally, paas’s nine-point mental effort rating scale was added to the questionnaire (paas, 1992).
table 1 survey items and reliability of the constructs
3.5 physiological data
to measure physiological data including eda and st, 15 students were monitored with wrist-worn wearables as shown in figure 2.
these wearables were able to sense gsr with a high dynamic range (.05-20 µS) at the lower side of the wrist and the output was accurate within a frame of approximately 1 second. st was acquired at the upper side of the wrist at a frequency of 32 Hz and the output was accurate within a frame of approximately 1 second at 0.1 °C. before analysing the physiological data, a number of procedures were carried out. firstly, a confidence indicator (ci), with values ranging from 0 to 1, monitors whether the sensor is correctly attached to the body. samples with ci values lower than .80 were ignored as this indicates low quality of the data due to incorrect sensor attachment (+/- .01% per individual). secondly, visual analysis of the signal was conducted for both eda and st. where an artefact was found, the signal from 20 s before to 20 s after the artefact was removed and the gap was interpolated. thirdly, large differences in skin conductance among individuals can occur (yousoof & sapiyan, 2013). therefore, to counteract the variation between subjects, the eda and st data of each individual participant were standardized, bringing the mean of each signal to 0 and its variance to 1. fourthly, time domain features were analysed and mean eda and st were calculated as shown in figure 3.
figure 3. standardized mean eda and st
3.6 log-data
log-data was retrieved from the moodle learning management system (lms). the lms automatically keeps track of user activity (i.e., every min) and session. log-data was divided into several events, namely: (1) start the task; reading instructions, (2) writing an answer, (3) consultation of support and (4) submission; reviewing the answer.
3.7 analysis
this study first investigated the differences between a high and low complex task for both the subjective measurements and physiological data (i.e., rq1, rq2). therefore, both subjective measurements and physiological data were first tested for normality.
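the confidence filtering, per-participant standardization and mean calculation described in section 3.5 can be sketched in python as follows. this is a minimal illustration under stated assumptions, not the pipeline actually used in the study: the function name `phase_means`, the input layout (three parallel lists with one value per second), the phase labels and the use of the population standard deviation are all hypothetical choices, and the artefact removal and interpolation step is deliberately left out.

```python
from statistics import fmean, pstdev

CI_THRESHOLD = 0.80  # from the text: samples with ci below .80 are ignored


def phase_means(values, ci, phases):
    """Return the mean standardized signal per phase for ONE participant.

    values : raw EDA or ST samples (hypothetical layout: one per second)
    ci     : confidence indicator per sample, between 0 and 1
    phases : phase label per sample (labels are illustrative assumptions)
    """
    # Step 1: drop samples recorded while the sensor was poorly attached.
    kept = [(v, p) for v, c, p in zip(values, ci, phases) if c >= CI_THRESHOLD]
    if not kept:
        raise ValueError("no samples left after confidence filtering")

    # Step 2: z-standardize over the participant's whole remaining signal,
    # so that it has mean 0 and variance 1 (population SD is an assumption).
    raw = [v for v, _ in kept]
    mu, sigma = fmean(raw), pstdev(raw)
    if sigma == 0:
        raise ValueError("a constant signal cannot be standardized")

    # Step 3: average the standardized samples within each phase.
    by_phase = {}
    for v, p in kept:
        by_phase.setdefault(p, []).append((v - mu) / sigma)
    return {p: fmean(zs) for p, zs in by_phase.items()}


# Illustrative, made-up samples; the fifth one fails the confidence check.
example = phase_means(
    values=[1.0, 2.0, 3.0, 4.0, 50.0, 5.0, 6.0],
    ci=[0.9, 0.9, 0.9, 0.9, 0.5, 0.9, 0.9],
    phases=["baseline", "baseline", "high", "high", "high", "low", "low"],
)
print(example)
```

because standardization is done per participant, the resulting phase means express where a phase sits relative to that participant's own session average, which is what makes the values comparable across participants.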
results of the shapiro-wilk tests reveal that both subjective measurements and physiological measurements were normally distributed. as we were interested in the mean differences between the high and low complex task of both the self-reported and physiological data, controlled for prior knowledge (i.e., both pck and ck) and order effect (see section 3.1), we conducted a linear mixed model (lmm) incorporating pck, ck and order as fixed factors and measurement time as a repeated measure (two-level for rq1 and three-level for rq2). when conducting lmm, the restricted maximum likelihood method (reml) was applied (baayen, davidson & bates, 2008). based on findings of rq1 and rq2, this study investigated the individual differences in the self-reported data of cognitive load for a high and low complex task, and how these relate to individual differences in physiological data (rq3). cohen’s d was calculated when differences were significant in order to gain insight into the effect sizes (lecroy & krysik, 2007). a bivariate correlation analysis was conducted in order to find relationships between physiological data and subjective measurements of cognitive load. finally, as the advantage of physiological data is that it is measured continuously, this study investigated whether there are relationships between specific events (i.e., consultation of support) based on log-data and peaks (i.e., spontaneous fluctuations per s) of eda and drops of st (i.e., rq4). given the small sample size, this analysis was more descriptive.
4. results
4.1 research question 1
descriptive statistics of the subjective measurements as shown in table 2 reveal that students reported on average higher intrinsic load, extraneous load and mental effort during the high complex task in comparison with the low complex task. results furthermore indicate that students reported higher germane load during the low complex task, which was expected.
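the effect-size and bivariate correlation steps named in section 3.7 can be illustrated with a short python sketch. the text does not state which variant of cohen’s d was computed, so the pooled-standard-deviation form below is an assumption, as are the function names `cohens_d` and `pearson_r`; the correlation is the standard pearson product-moment coefficient. the numbers in the usage lines are made up for illustration and are not data from the study.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(a, b):
    """Cohen's d with the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / sqrt(pooled_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings and signal means, for illustration only.
effort_high = [7, 8, 6, 9, 7]
effort_low = [5, 6, 5, 7, 6]
mean_eda_high = [0.1, 0.4, -0.2, 0.6, 0.2]

print(cohens_d(effort_high, effort_low))
print(pearson_r(effort_high, mean_eda_high))
```

with only 15 participants, both statistics are sensitive to single observations, which is one reason to read them alongside the descriptive statistics rather than in isolation.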
table 2 descriptive statistics of the subjective measurements of the high and low complex task
in order to investigate differences in the perceived cognitive load and mental effort (i.e., rq1), lmm was conducted incorporating pck, ck and ‘order effect’ as fixed factors and time as a two-level repeated measurement. pairwise comparisons of the different measurements of intrinsic load, extraneous load, germane load and mental effort are shown in table 3. results reveal that intrinsic load differed significantly across phases, f(1,13) = 6.43, p = .03. pairwise comparison reveals that intrinsic load was significantly higher (m = .86, p = .03) during the high complex task with cohen’s d = .88. when investigating the fixed factors, there was no significant effect of either pck, f(1,10) = .05, p = .82, or ck, f(1,10) = .43, p = .53. moreover, no significant order effect was found, f(1,10) = .12, p = .74. as expected, results reveal no significant difference for extraneous load across phases, f(1,13) = .17, p = .69. pairwise comparison reveals no significant mean difference (m = -.05, p = .90) between the high and low complex task for extraneous load. results of the fixed effects reveal no significant effect of pck, f(1,10) = .04, p = .84, ck, f(1,10) = .17, p = .69, or order, f(1,10) = 1.58, p = .24. results for germane load indicate no significant differences across phases, f(1,13) = 1.21, p = .29. pairwise comparison reveals no significant mean difference for germane load (m = -.18, p = .29) between the high and low complex task. results of the fixed effects indicate no significant effects for pck, f(1,10) = .00, p = .96, or ck, f(1,11) = .01, p = .93. moreover, no order effect was found, f(1,10) = 1.39, p = .2. finally, results revealed that mental effort differed across phases. the mean difference in mental effort between the high and low complex task was significant (m = 1.43, p = .00) in the predicted direction with cohen’s d = 1.52.
there were no significant effects of pck, f(1,11) = 2.39, p = .15, or ck, f(1,11) = 2.84, p = .12. additionally, no order effect was found, f(1,10) = .27, p = .62.
table 3 pairwise comparison of subjective measurements controlled for prior knowledge (i.e., pck, ck) and order effect
4.2 research question 2
descriptive statistics of the physiological data can be found in table 4. mean eda is lower during the high complex task compared to the low complex task. mean st is lower during the high complex task.
table 4 descriptive statistics of the standardized physiological data
in order to investigate the differences in physiological data between the baseline measurement, the high complex task and the low complex task (i.e., rq2), lmm was conducted incorporating pck, ck and order effect as fixed factors and time as a three-level repeated measurement. results indicate that mean eda differed across the different phases, f(2,26) = 6.56, p = .01. pairwise comparisons of the different measurements of mean eda are shown in table 5. results of the pairwise comparison reveal that the mean difference between the baseline measurement and the high complex task phase is significant in the predicted direction (m = -.60, p = .05) with cohen’s d = .19. moreover, the mean difference is significant between the baseline measurement and the low complex task (m = -1.05, p = .00) with cohen’s d = .14. results reveal that no significant mean difference was found between the high and low complex task (m = -.45, p = .14). moreover, the mean difference was in the unexpected direction. when investigating the fixed factors, there was a non-significant main effect of both pck, f(1,10) = .18, p = .68, and ck, f(1,10) = .81, p = .36. additionally, there was a significant effect of order, f(1,10) = 7.62, p = .02, which indicates an order effect. no significant differences were found for mean st across the different measurements, f(2,26) = .16, p = .85.
pairwise comparison reveals no significant mean differences between baseline measurement and the high complex task (m = 1.02, p = .61), baseline measurement and the low complex task (m = .87, p = .66), and between the high and low complex task (m = -.15, p = .94). nonetheless, all mean differences were in the expected direction. when investigating the fixed effects, there was a non-significant main effect of both pck, f(1,10) = .00, p = .97, and ck, f(1,10) = .12, p = .74. additionally, there was no significant order effect, f(1,10) = .45, p = .52.
table 5 pairwise comparison of physiological data controlled for prior knowledge and order
4.3 research question 3
results of rq1 reveal significant differences for perceived intrinsic load and mental effort. rq3 investigates the relationship between the individual differences in intrinsic load, mental effort and physiological data. results are displayed in table 6 and reveal that mental effort is significantly positively correlated with mean eda (r = .58, p = .03) for the high complex task. nevertheless, no significant positive correlation was found between mean eda and mental effort for the low complex task. no significant results were found for st.
table 6 correlations between standardized physiological data and subjective measurements for the high complex task and low complex task.
4.4 research question 4
for the final research question (rq4), this study investigates the relationship between physiological data (i.e., eda peaks and st drops) and specific events retrieved from log-data. an example of such relationships is shown in figure 4. table 7 gives an overview of the number of relationships between specific events and eda peaks. in contrast to eda, no conclusive relationships were found between st (i.e., drops) and specific events. st for most participants increased throughout the intervention as illustrated in figure 5.
figure 4.
electrodermal activity related to log-data of participant 15
figure 5: skin temperature related to log-data of participant 15
table 7 the relationship between specific events and eda peaks
5. discussion
5.1 research question 1
this study first investigated the difference in subjective measurements of cognitive load between a high and low complex task (i.e., rq1). results reveal that the students indicate higher perceived intrinsic load for the high complex task when compared with the low complex task. this indicates that the manipulation of complexity based on element interactivity was successful. additionally, students indicated that the perceived mental effort was higher during the high complex task. effect sizes of both intrinsic load and mental effort were large (> .80), indicating that the manipulation of complexity had an impact (lecroy & krysik, 2007). this reveals that students invested more mental effort into solving the high complex task in order to maintain performance at a constant level (paas et al., 2003). this is also in line with clt, since the high complex task was high in element interactivity and possibly required a lot of cognitive processing (van merriënboer & sweller, 2005). no significant differences were found for extraneous load between both tasks. this finding was expected as the instructions for both tasks were of the same level of difficulty. additionally, no significant differences were found for germane load, indicating that both tasks enhanced students’ understanding of the content at a similar level. this was in line with our expectations as the content and available support of both tasks were the same.
5.2 research question 2
secondly, this study aimed at investigating whether we can use physiological data to distinguish between the two complexity levels of the task. when investigating mean eda, results reveal that significant differences were found between both tasks and the baseline measurement.
these findings indicate that both tasks result in a higher mean eda. nevertheless, effect sizes were very small (< .20), indicating that task complexity only had a minimal impact on mean eda (lecroy & krysik, 2007). moreover, no significant differences were found for mean eda between the high and low complex task. these results are in line with the findings of the study of haapalainen et al. (2010), which also revealed no significant differences for eda between six tasks of different levels of difficulty. moreover, against expectations, descriptive statistics reveal that mean eda was higher during the low complex task, when compared with the high complex task. these unexpected findings may be induced by the order effect, which may obscure a clear difference between the eda during the high and low complex task. moreover, visual analysis reveals that for the majority of participants, skin conductance rises throughout the intervention (i.e., drift). since more participants completed the low complex task last, this might indicate that results are biased by drift. this indicates the need for the current study to also examine eda peaks as these peaks are not affected by drift (rq4). when investigating mean st, no significant mean differences were found across the different phases. nevertheless, descriptive statistics reveal that st was higher during the baseline measurement period. moreover, st was higher during the low complex task compared with the high complex task. this could indicate that st is related to task complexity as research indicated that st declines relative to a trigger event (ikehara & crosby, 2005). current findings indicate that mean eda and mean st might be indicators of changes in cognitive load, but cannot be used to detect differences in task complexity. nevertheless, there is no clear link between st and cognitive load.
accordingly, correlations between individual differences in the perceived intrinsic load, mental effort and physiological data for a high and low complex task are investigated (rq3).
5.3 research question 3
a third aim of this study was to investigate whether we can relate subjective measures of the perceived intrinsic load and mental effort (i.e., based on findings of rq1) to physiological data (i.e., mean eda and st) during a high and low complex task. findings reveal that mental effort positively correlates with mean eda for the high complex task. nevertheless, we did not find a significant correlation between mean eda and mental effort during the problem-solving process of the low complex task. results might also be influenced by the fact that skin conductance was rising throughout the intervention. in addition, most students first solved the high complex task. no significant correlations between mean st and self-reported data were found. this finding could be due to the fact that st shows a very slow rise and decline relative to the trigger event. therefore, it might be difficult to relate st to self-reports (ikehara & crosby, 2005). since there seems to be a relationship between eda and mental effort and since st drops can be related to specific events, we investigated the relationship between physiological data and learning behaviour retrieved from log-data.
5.4 research question 4
in order to investigate the relationship between physiological data and learning behaviour, log-data was investigated and divided into four main events, namely, reading instructions, writing an answer, consulting support and reviewing the answer. results reveal that there seems to be a relationship between specific learning behaviour and eda peaks. moreover, results reveal that more peaks were registered during the high complex task, when compared with the low complex task, which indicates a different result compared to rq2.
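relating eda peaks to logged events, as discussed above, can be pictured with the following python sketch. it is an illustration only: the study relied on a descriptive, largely visual analysis, so the peak rule used here (a strict local maximum at or above a threshold in the standardized signal), the threshold value, the one-sample-per-second layout and the function names `find_peaks` and `count_peaks_per_event` are all assumptions introduced for this example.

```python
def find_peaks(signal, min_height=1.0):
    """Indices of strict local maxima at or above min_height.

    Both the local-maximum rule and min_height are hypothetical choices.
    """
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] >= min_height and signal[i - 1] < signal[i] > signal[i + 1]
    ]


def count_peaks_per_event(peaks, events):
    """Count peaks that fall inside each logged event.

    events: list of (label, start_s, end_s) with half-open intervals
            [start_s, end_s), e.g. derived from LMS log timestamps.
    """
    counts = {label: 0 for label, _, _ in events}
    for t in peaks:
        for label, start, end in events:
            if start <= t < end:
                counts[label] += 1
                break  # a peak is attributed to at most one event
    return counts


# Made-up standardized EDA samples (one per second) and event windows.
eda = [0.0, 0.2, 1.5, 0.3, 0.1, 0.4, 0.2, 2.0, 0.5, 0.1, 1.2, 0.0]
log_events = [
    ("reading instructions", 0, 3),
    ("consultation of support", 3, 9),
    ("submission", 9, 12),
]
peak_times = find_peaks(eda)
print(peak_times)
print(count_peaks_per_event(peak_times, log_events))
```

counting peaks per event, rather than averaging the signal, is one way to reduce the influence of slow drift, because a local maximum is defined relative to its neighbouring samples.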
when investigating the intensity of the peaks, findings reveal that the peaks that are related to the event ‘submission’ are more intense. this might explain, besides the occurrence of drift, why mean eda was higher during the low complex task. possibly, results may have been influenced by the fact that the low complex task was presented in a test format, which might have induced more intense peaks when students submitted their task. when investigating relations between peaks and events, it seems that during the high complex task, peaks are more frequently related to cognitive processes (e.g., reading instructions, consulting support and writing) when compared with the low complex task (e.g., submission). for instance, when investigating the event ‘consultation of support’ in more detail, peaks were related to students (n = 4) watching a video that explains the circumference of a circle. this is in line with previous research indicating that gsr responses are associated with effortful cognitive processing during multimedia learning (antonietti, colombo & di nuzzo, 2015). additionally, hardly any peaks were found for the low complex task during writing, which is in line with the study of mudrick, taub, azevedo, price & lester (2017). mudrick et al. (2017) investigated multimedia learning and indicated that the lowest number of gsr responses was recorded when answering multiple choice questions, suggesting that this might require less cognitive processing. this finding is also in line with the study of hoogerheide et al. (2018) indicating that mean eda was significantly lower during the problem-solving process of a practice problem, when compared with teaching a practice problem in an authentic learning situation. these exploratory findings indicate that the intensity of eda signals might be more related to the type of learning activities. in line with previous findings of rq2 and rq3, no conclusive results were found for st.
nevertheless, on the basis of data visualisation of all students we could see that for the largest number of participants (i.e., 8 students), st is lower during the high complex task, which is in line with findings of rq2.
5.5 limitations and further research
despite the merits of the study in terms of indicating that individual differences in experienced mental effort can indicate individual differences in eda, there are some important limitations that should be mentioned. firstly, results must be approached carefully as multiple analyses on the same dependent variable were conducted, which can increase the chance of committing a type 1 error (roth, 1999). secondly, as we were investigating physiological data, we were obliged to implement a within-subject design. this is required when investigating skin conductance, as skin conductance can vary markedly between individuals (braithwaite et al., 2013). nevertheless, the within-subject design had some important disadvantages. since the same learning materials were taught within both the high and low complex task, students might have learned from the previous task and therefore perceived the high complex task as less difficult. this in turn might have influenced skin conductance and skin temperature, and may be a reason why there was no clear difference between the high and low complex task. this problem can be addressed in future studies by addressing different topics. moreover, future studies should offer more tasks of different levels of complexity, and also create more conditions in order to increase the number of measurements. this could provide a better understanding of possible correlations between mental effort and mean eda. a third important limitation when investigating skin conductance is drift, a continuous increase in the intensity of the signal. it is important to distinguish drift from important shifts in real tonic processes (braithwaite et al., 2013).
nevertheless, this distinction between drift and real tonic processes is not always entirely clear. this emphasizes the need for an accurate baseline measurement. the baseline measurement in the current study could be optimized by giving the participants a moment of relaxation. given the small sample size we decided not to remove data of participants. instead, in this study we have additionally investigated the peaks of skin conductance (as these are not subject to drift) and related them to specific events in the learning environment. nevertheless, it can be advisable to remove data of participants on the basis of drift in larger datasets. moreover, a larger sample size would also allow us to investigate patterns between eda peaks and specific events in the learning environment (e.g., reading instructions) while using quantitative methods. finally, as the study did not take place in a lab setting but in the classroom of the students, many confounding factors unrelated to cognitive load may add noise to the data, such as a lecturer entering the classroom and students leaving the classroom when finished. these events are likely to degrade the accuracy of cognitive load measurement by gsr (i.e., eda). nevertheless, the ecologically valid setting also has advantages such as authenticity of the results (schmuckler, 2001). moreover, as the content was part of students’ training program, students were encouraged to thoroughly solve the tasks, which is reflected in the task performance.
6. conclusion
this study first investigated the difference in subjective measures of cognitive load and physiological data (i.e., mean eda and st) between a high and low complex task in an ecologically valid setting. students indicated that they perceived higher intrinsic load during the high complex task and that the high complex task required more mental effort. this indicates that task complexity can be manipulated based on element interactivity.
nevertheless, complexity was not reflected by differences in physiological data (i.e., mean eda and st). accordingly, in a next phase this study investigated correlations between perceived intrinsic load, mental effort and physiological data. results revealed a positive correlation between mean eda and mental effort during the high complex task. nevertheless, no significant correlations were found for the low complex task. preliminary results of a more descriptive analysis showed that peaks of eda during the high complex task were more frequently related to cognitive processes when compared with the low complex task (i.e., submitting the task). the latter finding might explain the significant relationship between mental effort and mean eda. future research should replicate similar studies while using larger sample sizes to verify these findings. additionally, the relationship between eda and the type of learning behaviour (i.e., retrieved from log-data) should not be overlooked.
keypoints
• preliminary results indicate that mean eda is correlated with self-reported mental effort.
• results indicate that perceived intrinsic load can be manipulated based on element interactivity, which is in line with the cognitive load theory.
• it is important for future research to investigate correlations between subjective measurements and physiological data while using large sample sizes.
• when investigating eda, it is important to investigate peaks of skin conductance in combination with specific events retrieved from log-data. this might reveal patterns and provide more insight into the influence of the learning behaviour on skin conductance.
references
antonenko, p., paas, f., grabner, r., & van gog, t. (2010). using electroencephalography to measure cognitive load. educational psychology review, 22. 425-438. doi: 10.1007/s10648-010-9130-y
antonietti, a., colombo, b., & di nuzzo, c. (2015).
metacognition in self-regulated multimedia learning: integrating behavioural, psychophysiological and introspective measures. learning, media and technology, 40. 187-209. doi: 10.1080/17439884.2014.933112
baayen, r. h., davidson, d. j., & bates, d. m. (2008). mixed-effects modeling with crossed random effects for subjects and items. journal of memory and language, 59. 390-412. doi: 10.1016/j.jml.2007.12.005
boekaerts, m. (2017). cognitive load and self-regulation: attempts to build a bridge. learning and instruction, 51, 90–97. doi: 10.1016/j.learninstruc.2017.07.001
braithwaite, j., watson, d., jones, r., & rowe, m. (2013). a guide for analysing electrodermal activity (eda) & skin conductance responses (scrs) for psychological experiments. psychophysiology, 49, 1017–1034. doi:10.1017.s0142716405050034
brunken, r., plass, j. l., & leutner, d. (2003). direct measurement of cognitive load in multimedia learning. educational psychologist, 38. 53-61. doi: 10.1207/s15326985ep3801_7
chang, c. c., & yang, f. y. (2010). exploring the cognitive loads of high-school students as they learn concepts in web-based environments. computers and education, 55. 673-680. doi: 10.1016/j.compedu.2010.03.001
chen, f., zhou, j., wang, y., yu, k., arshad, s. z., khawaji, a., & conway, d. (2016). robust multimodal cognitive load measurement. human-computer interaction series. doi: 10.1007/978-3-319-31700-7
cierniak, g., scheiter, k., & gerjets, p. (2009). explaining the split-attention effect: is the reduction of extraneous cognitive load accompanied by an increase in germane cognitive load? computers in human behavior, 25. 315-324. doi: 10.1016/j.chb.2008.12.020
de bruin, a. b. h., & van merriënboer, j. j. g. (2017). bridging cognitive load and self-regulated learning research: a complementary approach to contemporary issues in educational research. learning and instruction, 51. 1-9. doi: 10.1016/j.learninstruc.2017.06.001
deleeuw, k. e., & mayer, r. e. (2008).
a comparison of three measures of cognitive load: evidence for separable measures of intrinsic, extraneous, and germane load. journal of educational psychology, 100, 223-234. doi: 10.1037/0022-0663.100.1.223 glogger-frey, i., gaus, k., & renkl, a. (2017). learning from direct instruction: best prepared by several self-regulated or guided invention activities? learning and instruction, 51. 25-35. doi:10.1016/j.learninstruc.2016.11.002 hoogerheide, v., renkl, a., fiorella, l., paas, f., & van gog, t. (2018). enhancing example-based learning: teaching on video increases arousal and improves problem-solving performance.journal of educational psychology, 211. 45-56. doi: 10.1037/edu0000272 haapalainen, e., kim, s., forlizzi, j. f., & dey, a. k. (2010). psycho-physiological measures for assessing cognitive load. proceedings of the 12th acm international conference on ubiquitous computing. doi: 10.1145/1864349.1864395 herborn, k. a., graves, j. l., jerem, p., evans, n. p., nager, r., mccafferty, d. j., & mckeegan, d. e. f. (2015). skin temperature reveals the intensity of acute stress. physiology and behavior, 1. 225-230. doi: 10.1016/j.physbeh.2015.09.032 ikehara, c., & crosby, m. (2005). assessing cognitive load with physiological sensors. proceedings of the 38th hawaii international conference on system sciences. doi: 10.1109/hicss.2005.103 jonassen, d. h. (2000). toward a design theory of problem solving.educational technology research and development, 48. 63-85. doi: 10.1007/bf02300500 karthikeyan, p., murugappan, m., & yaacob, s. (2012). descriptive analysis of skin temperature variability of sympathetic nervous system activity in stress. journal of physical therapy science, 24. 1341-1344. doi: 10.1589/jpts.24.1341 kirschner, p. a., ayres, p., & chandler, p. (2011). contemporary cognitive load theory research: the good, the bad and the ugly. computers in human behavior, 27. 99-105. doi: 10.1016/j.chb.2010.06.025 kirschner, f., kester, l., & corbalan, g. (2011). 
cognitive load theory and multimedia learning, task characteristics and learning engagement: the current state of the art. computers in human behavior, 27 , 1-4. doi: 10.1016/j.chb.2010.05.003 klepsch, m., schmitz, f., & seufert, t. (2017). development and validation of two instruments measuring intrinsic, extraneous, and germane cognitive load. frontiers in psychology, 8. doi: 10.3389/fpsyg.2017.01997 larmuseau, c., elen, j., & depaepe, f. (2018). the influence of students’ cognitive and motivational characteristics on students’ use of a 4c/id-based online learning environment and their learning gain. in lak'18:international conference on learning analytics and knowledge, march 7–9, 2018, sydney, nsw, australia. acm, new york, ny, usa , 10 pages. doi: 10.1145/3170358.3170363 lecroy, c. w., & krysik, j. (2007). understanding and interpreting effect size measures. social work research, 31. 243-248. doi: 10.1093/swr/31.4.243 leppink, j., paas, f., van der vleuten, c. p. m., van gog, t., & van merriënboer, j. j. g. (2013). development of an instrument for measuring different types of cognitive load. behavior research methods, 45. 1085-1072. doi: 10.3758/s13428-013-0334-1 liapis, a., katsanos, c., sotiropoulos, d., xenos, m., & karousos, n. (2015). recognizing emotions in human computer interaction: studying stress using skin conductance. in lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). 255-262. doi: 10.1007/978-3-319-22701-6_18 martin, s. (2014). measuring cognitive load and cognition: metrics for technology-enhanced learning. educational research and evaluation, 20. 592-621. doi: 10.1080/13803611.2014.997140 mayer, r. e. (2014). incorporating motivation into multimedia learning. learning and instruction, 29. 171–173. doi: 10.1016/j.learninstruc.2013.04.003 mayer, r. e., & moreno, r. (2003). nine ways to reduce cognitive load in multimedia learning. educational psychologist, 38. 
43-52. doi: 10.1207/s15326985ep3801_6 merrill, d. (2009). first principles of instruction. in instructional-design theories and models, 50. 43-59. doi: 10.4324/9780203872130 mudrick, n. v., taub, m., azevedo, r., price, m. j., & lester, j. (2017). can physiology indicate cognitive, affective, metacognitive, and motivational self-regulated learning processes during multimedia learning? paper presented at the annual meeting of the american educational research association (aera), san antonio, tx. nourbakhsh, n., wang, y., chen, f., & calvo, r. a. (2012). using galvanic skin response for cognitive load measurement in arithmetic and reading tasks. proceedings of the 24th conference on australian computer-human interaction ozchi ’12 . doi: 10.1145/2414536.2414602 paas, f. (1992). training strategies for attaining transfer of problem solving skills in statistics: a cognitive load approach. journal of educational psychology, 84, 429–434. doi: 10.1037/0022-0663.84.4.429 paas, f., van gog, t., & sweller, j. (2010). cognitive load theory: new conceptualizations, specifications, and integrated research perspectives. educational psychology review, 2. 115-121. doi: 10.1007/s10648-010-9133-8 paas, f., tuovinen, j., tabbers, h., & van gerven, p. w. m. (2010). cognitive load measurement as a means to advance cognitive load theory. educational psychologist, 38. 63-71. https://doi.org/10.1207/s15326985ep3801 raaijmakers, s. f., baars, m., schaap, l., paas, f., & van gog, t. (2017). effects of performance feedback valence on perceptions of invested mental effort. learning and instruction, 51. 35-46. doi: 10.1016/j.learninstruc.2016.12.002 roth, a. j. (1999). multiple comparison procedures for discrete test statistics. journal of statistical planning and inference, 82. 101-117. doi: 10.1016/s0378-3758(99)00034-8 scharinger, c., soutschek, a., schubert, t., & gerjets, p. (2015). 
when flanker meets the n-back: what eeg and pupil dilation data reveal about the interplay between the two central-executive working memory functions inhibition and updating. psychophysiology. 1293-1304. doi: 10.1111/psyp.12500 schmeck, a., opfermann, m., van gog, t., paas, f., & leutner, d. (2015). measuring cognitive load with subjective rating scales during problem solving: differences between immediate and delayed ratings. instructional science, 43. 93-114. doi: 10.1007/s11251-014-9328-3 schmuckler, m. a. (2001). what is ecological validity? a dimensional analysis. infancy, 2, 419–436. doi: 10.1207/s15327078in0204_02 schreiber, j. b., nora, a., stage, f. k., barlow, e. a., & king, j. (2006). reporting structural equation modeling and confirmatory factor analysis results: a review. the journal of educational research, 99, 323-338. doi: 10.3200/joer.99.6.323-338 setz, c., arnrich, b., schumm, j., la marca, r., tröster, g., & ehlert, u. (2010). discriminating stress from cognitive load using a wearable eda device. ieee transactions on information technology in biomedicine, 14. 410-417. doi: 10.1109/titb.2009.2036164 shi, y., ruiz, n., taib, r., choi, e., & chen, f. (2007). galvanic skin response (gsr) as an index of cognitive load. in chi ’07 extended abstracts on human factors in computing systems chi ’07 . doi: 10.1145/1240866.1241057 shusterman, v., anderson, k. p., & barnea, o. (1997). spontaneous skin temperature oscillations in normal human subjects. the american journal of physiology, 273. doi: 10.1152/ajpregu.1997.273.3.r1173 smets, e., velazquez, e. r., shiavone, g., chakroun, i., d’hondt, e., de raedt, … van hoof, c. (2018). large-scale wearable data reveal digital phenotypes for daily-life stress detection. digital medicine, 67. doi: 10.1038/s41746-018-0074-9 spanjers, i. a. e., van gog, t., & van merriënboer, j. j. g. (2012). segmentation of worked examples: effects on cognitive load and learning.applied cognitive psychology, 26. 353-358. 
doi: 10.1002/acp.1832 sweller, j. (1994). cognitive load theory, learning difficulty, and instructional design. learning and instruction, 4. 295-312. doi: 10.1016/0959-4752(94)90003-5 sweller, j. (2010). element interactivity and intrinsic, extraneous, and germane cognitive load. educational psychology review, 22. 123-138. doi: 10.1007/s10648-010-9128-5 sweller, j., ayres, p., & kalyuga, s. (2011). cognitive load theory. explorations in the learning sciences, instructional systems and performance technologies. doi: 10.1007/978-1-4419-8126-4 sweller, j., van merrienboer, j., & paas, f. (1998). cognitive architecture and instructional design. educational psychology review, 3. 251-196. doi: 10.1023/a:1022193728205 van merrienboer, j. j. g., kirschner, p. a., & kester, l. (2003). taking the load off a learner’s mind: instructional design for complex learning. educational psychologist, 38, 5–13. doi: 10.1207/s15326985ep3801_2 van merriënboer, j. j. g., & sluijsmans, d. m. a. (2009). toward a synthesis of cognitive load theory, four-component instructional design, and self-directed learning. educational psychology review, 21. 55–66. doi: 10.1007/s10648-008-9092-5 vinkers, c. h., penning, r., hellhammer, j., verster, j. c., klaessens, j. h. g. m., olivier, b., & kalkman, c. j. (2013). the effect of stress on core and peripheral body temperature in humans. stress, 16. 520-520. doi: 10.3109/10253890.2013.807243 yousoof, m., & sapiyan, m. (2013). measuring cognitive load for visualizations in learning computer programming-physiological measures. ubiquitous and communication journal, 8. 1410-1426. retrieved from : https://pdfs.semanticscholar.org/bdbd/8af1870e956e4a727e2449897077266fa8e5.pdf zagermann, j., pfeil, u., & reiterer, h. (2016). measuring cognitive load using eye tracking technology in visual computing. proceedings of the sixth workshop on beyond time and errors on novel evaluation methods for visualization. doi: 10.1145/2993901.2993908 zheng, r., & cook, a. 
(2012). solving complex problems: a convergent approach to cognitive load measurement. british journal of educational technology, 43. 233-246. doi: 10.1111/j.1467-8535.2010.01169.x puurtinen publication frontline learning research vol 6 no. 3 special issues (2018) 148-161 issn 2295-3159 learning on the job: rethinks and realizations about eye tracking in music-reading studies marjaana puurtinen a a university of turku, finland article received 23 may 2018 / revised 31 august / accepted 18 september / available online 19 december abstract the application of new methods and measures in domains with few methodological traditions of that kind often presents researchers with a challenge; they may have to take up the task of developing their understanding of the phenomenon while, at the same time, creating the practices for its study. for us, the method was eye tracking, and the topic, music reading. one key characteristic of music reading is that the music reader’s gaze moves slightly ahead of the current point of performance. this gap allows the performer to prepare for the upcoming motoric responses. in this paper, we present our 10-year-long path, describing the steps we have taken while studying this “looking ahead” in music reading. we will point out how we have, after both advances as well as setbacks, come to change our views on how best to explain the various components affecting this specific act and how it is best measured. finally, we discuss some of the lessons we have learned, hoping in this way to provide practical suggestions for others who plan to take up methods from other domains and use them in novel ones. keywords: expertise; eye tracking; eye-hand span; eye-time span; music reading corresponding author email: marjaana.puurtinen@utu.fi doi: https://doi.org/10.14786/flr.v6i3.38 1. 
introduction

similar to all researchers who plan to apply a method more often used for studying other topics than the one they wish to target, we in our team have had to learn about expertise in music reading in parallel with developing the eye-tracking methodology for its study. our initial educated guesses about how to go about this were based on one team member’s prior experience with eye tracking and on all team members’ much better comprehension of western music notation than of the cognitive side of things (due to our background in music and academic training in other disciplines than psychology). as often occurs in research, some of our hunches were better than others, and our views about both what to study and how to study it have therefore changed considerably throughout the 10 years we have conducted our work.

it may be that the interest in process measures in educational sciences puts more and more project leaders, research coordinators, doctoral candidates and their supervisors in situations similar to where we were some time ago. to be sure, many will be much faster in learning their topics than we have been. still, for anyone intending to take up eye tracking or any other similarly complicated method and apply it to a little explored topic, we suggest that it is good to prepare for rethinks and realizations similar to those we have faced and to think in advance of both practical and emotional ways to cope with them.

we begin our history by giving some background about eye tracking in the music domain (based on our current understanding of the matter), and then we briefly describe some of our more or less successful data collections and attempts to report our findings from 2008 onward with respect to the study of “looking ahead” during music reading. for the sake of presentation, we write about our sub-studies as steps 1, 2 and 3, although the whole process consisted of several overlapping stages.
finally, based on our history, we state some of the lessons we have learned from the perspective of taking up new research methods in little explored fields.

1.1 why use eye tracking in the music domain?

with eye tracking, we can accurately record the durations and locations of fixations, that is, the short moments (typically around 200–400 ms) when we target our gaze to a certain location in our visual array and process information from it. during one fixation, we only see a narrow area accurately (for instance one word of this text), and in order to collect more visual information, we must quickly move our eyes to fixate upon another spot (for instance the next word). the perceptual span is the area from which useful visual information is obtained. for instance, when we fixate on this word, a few words to the left and right of it fit within the perceptual span but are blurred. this blurred visual information helps us, however, to see where the next word begins, and we can then target our next fixation on that word. our eye movements are so rapid that we get the impression that our eyes do not perform these skips and jumps, and we may also fixate upon targets (for instance reread a word) without noticing it ourselves. (for details, see rayner, 2009; holmqvist et al., 2015.) we therefore need eye trackers to record and show us how we are looking at a painting, reading a newspaper, checking the traffic while driving or otherwise examining our surroundings.

eye tracking has a long research tradition in psychological studies about text reading. in these studies, eye-movement data is considered to give insight into the cognitive processes related to the reading and comprehending of words and longer texts (see rayner, 1998; 2009). the method has also been used to study visual expertise in various domains (jarodzka, holmqvist, & gruber, 2017), of which chess and medicine are the ones typically mentioned first.
in expertise research, the goal is to make experts’ automated visual practices visible and, similarly to text-reading research, to search for indicators of the underlying cognitive processes. often this is done by comparing the visual processing of experts to that of novices or intermediates while they perform a domain-specific task.

music reading would seem like a natural part of studies about visual expertise. it is a visual motor task rehearsed and widely used by a large group of amateur musicians and used professionally (and on a very high level of expertise) by some, while many people remain almost illiterate with respect to western music notation. music reading also has one feature which points especially to the usefulness of the eye-tracking method: while reading, the performer produces a continuous motor response, as she is constantly “acting out” the stimulus. this gives the researcher constant verification of the quality of the visual processing as well as the possibility of synchronizing, throughout a recording, the “intake” of visual information with a motor output. this is unlike the process with many other visual tasks. (see puurtinen, 2018.)

at this point, we need to specify what we mean by “music reading”. to us, it is a task where someone reads music notation and executes the symbols in one way or another. the motor part may consist only of tapping rhythms, singing or playing an instrument (see video 1). we consider it important to make a distinction between performance tasks and tasks where music notation is only read and not performed in any way. to highlight the difference, we have come to call the latter silent (music) reading (penttinen, huovinen, & ylitalo, 2013). as an example, think of a piano teacher taking a pile of sheet music in her hands in order to go through the sheets and select which piece to play with a student. the teacher is probably not reading all of the note symbols to the extent that they could be performed (cf.
the performer in video 1) but is scanning the music quickly in order to see whether one piece would seem to be of suitable difficulty. in this type of task, there are no temporal restrictions imposed upon the reading (see section 1.2, below), nor any need to plan for motor responses. thus, compared to a performance task, we believe that the silent-reading task presents the readers with very different goals for going through the material and, accordingly, with very different cognitive demands (see penttinen, et al., 2013; puurtinen, 2018). however, due to the lack of a need to plan for a motor response, silent-reading tasks could be used as a reference when comparing music reading with the processing of other types of domain-specific symbol systems. for those interested in the visual processing of note symbols with no performance requirements, we suggest that they turn to studies which have applied a non-performance design (e.g., waters & underwood, 1998; burman & booth, 2009; drai-zerbib & baccino, 2013; penttinen, et al., 2013).

video 1. the author’s eye movements during the piano performance of a familiar children’s song, “mary had a little lamb”. a metronome (the click heard in the background) is set at 60 beats per minute and provides the temporal framework. recorded with a tobii tx300 eye tracker and a yamaha electric piano.

overall, then, the use of the eye-tracking method seems well suited for music-reading studies. still, for some reason, such studies are quite few, and the field lacks a coherent research tradition (for reviews, see madell & hébert, 2008; puurtinen, 2018). we can, however, say that in general, experts in music reading are faster at encoding note symbols than are novices and intermediate performers, and therefore, they can read the music with shorter fixation durations. this expertise effect has been noticed both in silent-reading tasks (waters & underwood, 1998; penttinen, et al., 2013) and during performance tasks.
considering the latter, experts’ advanced skill has appeared both when more and less experienced musicians’ performances differ in terms of performance tempo (truitt, clifton, pollatsek, & rayner, 1997; gilman & underwood, 2003; penttinen & huovinen, 2011) and when all performances are alike in a temporal sense and in performance quality (penttinen, huovinen, & ylitalo, 2015). the features of the music notation, too, play a role in the reading, though this side of the process has only rarely been studied and, when it has been, often only on a very general level (madell & hébert, 2008; puurtinen, 2018).

1.2 music reading is all about the use of time

the uniqueness of music reading as a visual motor task is in the fact that it is temporally regulated. time can be thought to affect the reading on two levels. first, the overall tempo affects the amount of absolute time a performer has for inspecting and performing the visual elements on the score. for example, if one performer uses 10 seconds to perform the melody in video 1, and another one plays the same melody in 5 seconds, it is obvious that the latter performer has less time to study the written symbols and plan for the motor responses. secondly, the course of reading is also regulated, because the relative durations of individual notes are signalled by the symbolic system itself. in figure 1, in measure 2, the performer has to perform more (two eighth notes) during the first beat and less (only one quarter note) during the second beat. naturally, the performer may choose to ignore the durations of individual symbols and stop to correct errors, but that will lead to a “staggering” performance, where the flow of the music is disrupted. this is exactly what beginning musicians do when they are incapable of interpreting the symbols and performing them in a given tempo (e.g. drake & palmer, 2000).
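the two levels of temporal regulation described above can be made concrete with a small calculation. the following python sketch is only an illustration and not part of the original studies: the function names and the encoding of note durations as fractions of a quarter beat are assumptions introduced here. it converts relative note durations into onset times at a given tempo, using a two-beat measure like measure 2 in figure 1 (two eighth notes followed by one quarter note).

```python
from fractions import Fraction


def beat_seconds(bpm):
    """duration of one quarter beat in seconds at the given tempo."""
    return 60.0 / bpm


def note_onsets(durations_in_beats, bpm):
    """onset time (in seconds) of each note, given note durations in quarter beats."""
    onsets = []
    position = Fraction(0)  # running position within the melody, in quarter beats
    for duration in durations_in_beats:
        onsets.append(float(position) * beat_seconds(bpm))
        position += Fraction(duration)
    return onsets


def total_duration(durations_in_beats, bpm):
    """absolute time (in seconds) available for reading and performing all notes."""
    return float(sum(durations_in_beats)) * beat_seconds(bpm)


# hypothetical encoding of a two-beat measure: two eighth notes, then one quarter note
measure = [Fraction(1, 2), Fraction(1, 2), Fraction(1)]

print(note_onsets(measure, 60))      # [0.0, 0.5, 1.0]
print(note_onsets(measure, 120))     # [0.0, 0.25, 0.5]
print(total_duration(measure, 60))   # 2.0
print(total_duration(measure, 120))  # 1.0
```

doubling the tempo halves the absolute time available for every symbol, while the proportions between the notes stay the same; this is the distinction drawn above between the overall tempo and the relative durations signalled by the notation.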
all of this is unlike the situation in text reading, where readers can return to difficult sections or words and where, in fact, targeted rereading may even result in better text comprehension and be considered a preferable strategy (e.g. hyönä, lorch, & kaakinen, 2002).

another complexity of western music notation is the fact that the same symbols contain information about both the relative duration of each symbol and the pitch height of each (e.g. which key to press on a piano keyboard), and the reader has to process both features in order to perform accurately. figure 1 demonstrates the relative durations of notes in “mary had a little lamb” (the first line) along with the whole melody, with pitch heights included (the second line). (note that this is a highly simplified example of western music notation; the amount of information on musical scores can vary from this type of simple information to detailed information about simultaneously performed notes, performance tempos and expressive interpretation.)

figure 1. above, the rhythms of “mary had a little lamb”. in western music notation, the score is typically divided into “measures” marked by vertical bar lines. in this example, in measure 2, the performer first performs two eighth notes during one quarter beat (the beat marked in red) and then one quarter note which lasts for the whole quarter beat (the beat marked in blue). the correct timing of the notes (which land either on a beat onset or between two beat onsets) is of importance, as it secures the steady flow of the music. below is the full song, the melody represented through the different pitch heights of the note heads.

thus, musicians have to make sense of the complex symbolic system and select what to target their gaze at, as well as when and for how long, in order to perform successfully while operating within the given temporal framework.
we know that musicians manage this by maintaining their gaze slightly ahead of the current point of performance, and with the help of this buffer (typically perhaps around 1–2 seconds [furneaux & land, 1999; penttinen, et al., 2015; huovinen, ylitalo, & puurtinen, 2018; rosemann, altenmüller, & fahle, 2016; see video 1]), the performer prepares for the upcoming motoric responses. the gaze may typically tend to remain very close to the performed notes (truitt et al., 1997; penttinen, et al., 2015), but instead of a steady “looking ahead”, at least for skilled music readers, the reading may also consist (mainly or in part) of rapid back and forth eye movements (goolsby, 1994; cf. penttinen, et al., 2015). overall, this “looking ahead” behaviour, often called the eye-hand span (madell & hébert, 2008; holmqvist et al., 2015), should work as an indicator of music-reading-related cognitive processes, since it seems to vary due to performers’ expertise as well as stimulus features. however, the interplay of all the factors involved is still not properly understood, and the field is methodologically scattered to the extent that only the most elementary features of the “looking ahead” can be regarded as established. (for a recent review of findings thus far, see huovinen et al. [2018].) before 2008, the year of our first data collection, published works on this topic were indeed very few, and we therefore had to start very near the beginning.

2. our eye-tracking investigations about the “looking ahead” in music reading

2.1 step 1: focus only on gaze targets (penttinen & huovinen, 2009; 2011)

2.1.1. study summary

we began our work in 2008–2009 by recording non-professional pianists’ sight-reading of simple melodies prior to, during and after nine months’ training. our aim was to track down potential eye-movement indicators of early skill development, especially relating to the reading of larger melodic intervals.
our author team then consisted of a phd candidate in education (who was also an active musician) along with a musicologist, who were supported by a group of colleagues from a department of teacher education. forty-nine education majors took part in the experiment, and the data of 15 novices and 15 amateur musicians were included in the final analyses. the data collection was organized alongside a compulsory music course (with piano playing as one of the course topics), and this allowed us to follow the novices’ development during a real-life learning task. our four experimental melodies consisted of quarter notes (see figure 2) played only with the right hand, with a metronome signalling the onset of each quarter note at a relatively slow pace (60 beats per minute, where each beat thus lasted for 1 s). the melodies were stepwise apart from two larger melodic intervals (see the “skips” at two of the bar lines in figure 2).

figure 2. one of the four piano performance tasks applied in penttinen and huovinen (2009; 2011). in this melody, each of the right-hand fingers could be put on one key (the thumb on the first note), and the performer did not have to move her hand. for the most part, even for the novices, this eliminated the need to look at the fingers; with difficult pieces, pianists often need to divide their visual attention between the score and the keyboard to ease motor coordination.

we built a set-up where music was presented on the screen of a 50 hz tobii 1750 eye tracker, and an electric piano, attached to a laptop with sequencer software, was positioned in front of the tracker. after careful consideration, no chin rest was used, to allow the performers as natural a playing position as possible.
we assumed that although during our simple tasks the performers did not need to look at their fingers to any great extent (apart from prior to performing when placing the hand on the keyboard), preventing that altogether with a chin rest might have had larger effects on the eye-movement data than the rare looks toward the keyboard. previous music-reading studies have not been very precise about how participants in them were trained for the study protocols (sometimes there is no indication that any training took place), but we took care in preparing a practice trial with exactly the same kind of protocol (down to every instruction slide appearing on the screen) as was then applied in the actual recording. this proved a useful procedure, and we have applied that practice ever since.

we attempted to launch both the eye-tracking and sequencer recording simultaneously from a third computer, but though the system worked in pilots, inconsistent lags started to appear during the actual data collection. thus, we could not synchronize the performance and eye-movement data after all, and our analyses had to be limited to the allocation of fixation time without information about where the performance was ongoing at each fixation.

we reported indicators of novices’ skill development in three respects. first, after training, the novices performed the large intervals with better timing and accuracy. second, together with increasing performance accuracy, the novices began to allocate more fixation time to the last two notes of each measure (similar to the amateurs, who did so from the first measurement session onward). third, we noticed that the novices, with training, perhaps began to identify the latter notes in large intervals more quickly. this was suggested by the gradual shortening of the average first-pass fixation durations for those symbols.
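to illustrate the kind of measures referred to above, the sketch below computes the total fixation time and the first-pass fixation time per note from a chronological list of fixations. it is a simplified illustration under stated assumptions and not the analysis code of the study: the data format, the note labels and the fixation values are hypothetical, and first-pass time is taken here simply as the summed duration of the fixations on a note before the gaze first leaves it, which may differ in detail from the operationalisation used in the published work.

```python
def total_fixation_times(fixations):
    """summed duration of all fixations on each note (area of interest)."""
    totals = {}
    for note, duration_ms in fixations:
        totals[note] = totals.get(note, 0) + duration_ms
    return totals


def first_pass_fixation_times(fixations):
    """summed duration of the fixations on each note before the gaze first leaves it.

    fixations: chronological list of (note_label, duration_ms) pairs.
    simplification (assumption): later returns to a note are ignored, and no check
    is made on whether notes further ahead had already been fixated.
    """
    first_pass = {}
    finished = set()  # notes whose first pass has already ended
    previous = None
    for note, duration_ms in fixations:
        if previous is not None and note != previous:
            finished.add(previous)  # the gaze has left the previous note
        if note not in finished:
            first_pass[note] = first_pass.get(note, 0) + duration_ms
        previous = note
    return first_pass


# made-up example: the reader fixates n1 twice, moves to n2, returns to n1, then n3
fixations = [("n1", 220), ("n1", 180), ("n2", 300), ("n1", 150), ("n3", 260)]

print(first_pass_fixation_times(fixations))  # {'n1': 400, 'n2': 300, 'n3': 260}
print(total_fixation_times(fixations))       # {'n1': 550, 'n2': 300, 'n3': 260}
```

in this made-up example, the return to n1 adds to its total fixation time but not to its first-pass time. without synchronized performance data, as in step 1, such note-specific sums say how long a symbol was inspected but not when, relative to the performance, the inspection took place.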
there were, however, some matters which suggested that some of the results should be treated with caution, and we therefore tried to corroborate them later on. after two revisions, the manuscript was accepted for publication (penttinen & huovinen, 2011).

2.1.2. rethinks and realizations after step 1

this was the first music-reading study we designed, and it certainly had some drawbacks, but, perhaps with beginners’ luck, some benefits as well. to begin with the latter, due to some of our background in text-reading studies, we decided to apply in our music-reading studies the eye-tracking measures suggested by hyönä, lorch and rinck (2003) as suitable common ones for text-reading research. we only later fully realized the need for music-reading research, too, to be much more consistent in the use of the measures; we continued with these measures in our later work. another benefit was the choice of very simple melodies; we realized that the set-up should perhaps be kept simple, but the reasons were not solely due to our less sophisticated understanding of the eye-tracking method and the kind of stimuli with which it was most usefully applied. we also simply considered what kinds of melodies our novices could learn to play during the nine-month-long training. however, we did come to think (to some extent, at least) of the fact that due to the multidimensional nature of western music notation (see section 1.1), we should be able to distinguish whether any eye-movement effects were due to the interpretation or planning of the rhythm or melodic features of a certain melody. by simplifying the rhythm in these tasks, we could narrow down our interpretations and suggest that our novices’ eye-movement effects were indeed indicators of the interpreting and planning of the melodic features in the melodies, rather than being about reading and planning the execution of the rhythmic patterns. this has not been the case in many music-reading studies (see puurtinen, 2018).
however, we were also faced with several challenges, both while collecting the data and especially when analysing it. first, the synchronization of performance and eye-tracking data failed, as described above. for this reason, we could only speculate about what had caused shorter or longer fixation times and, importantly, when particular fixations took place. we had no idea whether our novices and amateurs read in a steady manner approximately one second ahead of the performed notes or whether they tended to do longer advance inspections and then return toward the point of performance, as was proposed as one possible pattern for skilled readers (goolsby, 1994). when discussing the findings, we could only offer alternative hypotheses about the “looking ahead” in this specific task. the fixation durations suggested that not all note symbols were treated equally, but we did not have the full explanation of why. and in the analysis of note-specific fixation times, we also applied the kruskal-wallis analyses with dependent data in one part of the analyses, which, we know now, omitted the within-subject dependence. furthermore, we presented the four melodies in the same order to all participants and did not counterbalance them. thus, although the mean values for first-pass fixation times for selected note symbols decreased for the novices during training, the statistical testing should be interpreted with caution and retested with a better study design. we tried to address these matters in our follow-up studies.

2.2 step 2: adding the hand (penttinen, huovinen, & ylitalo, 2012; 2015)

2.2.1. study summary

in 2008–2009, when we collected the data set described above, we also invited the same participants to perform another task, to play a familiar children’s song, “mary had a little lamb”, a few times.
some versions of the melody contained a one-measure-long “mistake” or notes which were “wrong”, and our idea was to trace the eye-movement indicators of coping with such surprising misprints in the music. here, too, we applied a metronome, which forced the performers to solve the problem of playing something against their expectations at a given tempo. our attempt with the 49 participants failed (for details, see section 2.2.2, below), and we ended up using only five experienced musicians’ data as a pilot study (reported as study 1 in penttinen, et al. [2012]). we developed the setting further and collected a new data set in 2011. at this point, we had also included a student of statistics in our team. due to many novices’ difficulties in performing the melody accurately and in the given tempo in our first try, this time we only asked skilled performers (amateur and professional-level pianists) to perform the same song and two of the variations from the previous attempt. an electric piano was placed in front of a 300-hz tobii tx300 eye tracker, and the performances were recorded into a separate computer. the metronome was provided by the second computer, which recorded the performances in midi format. both computers were operated by the first author, who coordinated the change of slides on the eye tracker monitor so that they occurred at metronome beats. this procedure provided both the eye tracking and midi data streams with simultaneous events, and we created a laborious way to synchronize the data streams based on these markers (for details, see penttinen, et al., 2015). this enabled us to calculate what we called the eye-hand span, that is, how much the gaze was ahead of the onset of a performed note (see figure 3). figure 3. how we calculated the eye-hand span in penttinen and colleagues (2015). in this melody, two quarter beats (beat 1 in red, beat 2 in blue) fit into one measure. the performer plays all of the note symbols. 
when performing the first note symbol of measure three (in red), the performer’s gaze is targeted near the note in the second beat area of the measure (in blue). thus, at this point, the eye-hand span is, roughly put, one quarter beat. at a tempo of 60 beats per minute, this equals one second. we also calculated what we called gaze activity, namely how many quarter beat areas were fixated on when performing the notes of one quarter beat (i.e. notes fitting under a red or a blue line). this was typically only two beat areas. we observed that professional performers applied longer eye-hand spans more often than amateurs did. also, during one-second intervals, professional pianists fixated on more of the music than amateur performers did. we called this finding an increase in their gaze activity. however, these between-group differences disappeared in the face of melodic deviations, suggesting that performing against expectations inhibited the professional performers’ reading, to some extent. we were also able to compare in detail the “looking ahead” during the performance of different rhythm symbols and obtained some information about the reading of quarter notes and the more rapidly performed eighth notes (see figure 1). for the analyses, we used t-tests and chi-square tests and were fairly content with our protocols for data collection and analyses, at least for a while. our first full-paper submission was not successful, but the second one was (penttinen, huovinen, & ylitalo, unpublished manuscript; penttinen, et al., 2015). 2.2.2. rethinks and realizations after step 2 as described above, our first attempt (in 2008–2009) with this particular research design failed. first of all, there was a lack of synchronization of the performance and eye-movement data. secondly, the task was too difficult for the most novice participants, and this reduced the data set considerably.
erroneous performances were so much unlike each other (with different types of errors appearing in different parts of the melodies) that they could not be pooled. thirdly, to present the melody in a big enough font size, we placed it on two rows (first four measures on row 1, last four on row 2). thus, the performers’ gaze moved from the end of the first row to the beginning of the second, and in such a short reading task, all additional and accidental fixations landing here and there during these sweeps from one line to the next made the eye-movement data even more complicated to handle. finally, one of the three variations we used had the “mistake” in the second-to-last measure. however, during music reading, the endings differ from other parts of the reading, since there is nothing more to look in advance, and thus, the “extra” time gained is just spent on the final measures. thus, this variation did not work out similarly to the other two variations. after we addressed these deficiencies in our second data collection in 2011, we began to get a bit closer to what we were after. skilled performers’ adjustment of the eye-hand span and gaze activity, when something unexpected had to be performed, suggested that the “looking ahead” is indeed involved in the planning of motor responses and that it is affected both by the performers’ expertise and by stimulus characteristics. this time, we had also been able to synchronize the performance and eye-movement data with what we thought to be sufficient accuracy, and we had found measures which had brought forth something new about the “looking ahead”. however, even though we were content with this improved data set and thought it could indeed contribute to our understanding about the “looking ahead”, the first manuscript we submitted was rejected and only later accepted by another journal. 
this was, at the latest, the moment when we realized that it really is hard to find publication forums for this type of work (apart from music education, which we thought to be one relevant field). as we framed our work both within educational sciences, where music is by no means a mainstream topic, and within cognitive musicology, where in 2011, eye-tracking was still a relatively unknown method, and as none of us were psychologists and therefore not accustomed to write to that audience, we were caught somewhere in between these fields. (at this point, we need to thank all reviewers who took up the job of reviewing our work; some of them have stated that they were not experts in music or eye-tracking, and we do appreciate the effort they put into their reviews.) in addition, we later came to be more critical of our analytical approaches; they were still only a series of separate (piecemeal) comparisons. we started to think that if several factors (e.g. performer characteristics, even the minutest features of the stimulus, and most likely also the performance tempo) did co-influence the “looking-ahead” in music reading, as now seemed to be the case, we should study them in a way which would separate the overriding factors from those which interacted with other factors. we had also applied only one tempo in these studies. (some of these realizations were reached while already working on our next data sets [see 2.3, below]). although this performance task, with its authentic melody and violation of expectations, differed from the simple performance task in step 1 and was designed to address very different kinds of research questions, in retrospect, the studies seem to have shared more common features in a methodological sense than we had previously considered. 2.3 step 3: replacing the hand with metrical time (huovinen, et al., 2018) 2.3.1. study summary step 3 had its origins in the data collection in 2008–2009. 
we first attempted to publish the 2008–2009 data as a separate paper which focused not on novices’ reading but on the reading of the two different kinds of experimental melodies (ones which had a large interval in the middle of a measure and ones where the “skip” was across a bar line) during correct performances. however, even after submitting a revised version, our work was rejected, and we realized that we did need the synchronized performance data in order to fully explain our findings (huovinen & penttinen, unpublished manuscript; huovinen, penttinen, & ylitalo, unpublished manuscript). thus, in 2011, we asked the same skilled participants who had performed the “mary had a little lamb” task to also perform another task. we modified the 2008–2009 set-up (see 2.1, above) by adding eight more melodies to the set, including four with no large intervals, or “skips”, in them. we had realized that we needed these stepwise melodies so that we would have a baseline with which to compare the melodies with the large intervals. we also asked the performers to play half of the melodies in a slower tempo and half in a faster one. the tobii tx300 eye tracker was used, and all was considered to go smoothly; our skilled performers did the tasks with very good temporal accuracy and few mistakes, and though the work was laborious, we were able to synchronize the performance and eye-movement data with a similar technique as in the task in step 2. our previous measure for the eye-hand span was calculated from the point of performance onward (see figure 3). this was the traditional approach in other prior studies, too. however, we now noticed that in fact, this measure describes more those moments in the performance when the performer can look ahead (huovinen, et al., 2018), that is, moments when performing a symbol is easy enough to allow the reading of symbols further ahead. 
in our case, that was not the issue of interest (especially when the melodies were almost overly simple to perform). since we had simple melodies with “more difficult” notes in them at the large intervals, our question was whether these notes were fixated upon, or “looked ahead”, as early as possible and whether the visual processing was therefore affected by such minute features of the written music. this we did not know beforehand. thus, we came to realize that the eye-hand span measure should be calculated exactly the other way around from how we had done it; if we wanted to examine what the performer needed to look at in advance, we needed to take the first-pass fixation upon a specific target as the starting point and calculate the “span” backward to where the metrical time was at that particular moment (see figure 4). in practice, this would tell us when a performer attempts to fixate a symbol as early as possible. we therefore differentiated between two measures, the traditional eye-hand span_performance and the new eye-hand span_fixation. figure 4. how we calculated the eye-hand span_fixation, or eye-time span, in huovinen and colleagues (2018). (this is only for illustration; please note that those studies actually applied stimuli resembling that in figure 2.) the red arrow marks the steady progress of metrical time. the eye-time span was calculated from the first fixation to a note symbol (in this example, from the last note in measure three). by regarding the metrical time as a continuous axis, we could calculate where the metrical time was at the onset of a fixation. in this example, the eye-time span is 1.5 beats or, at a tempo of 60 bpm, 1500 ms (i.e. the metrical time is ongoing 1500 ms behind the fixated note).
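the arithmetic behind the two measures can be illustrated with a minimal python sketch. it is not the analysis code used in the studies: the function names, the beat positions and the 5500 ms fixation onset are hypothetical values, chosen only so that the results match the worked examples in figures 3 and 4 (one quarter beat, or 1000 ms, and 1.5 beats, or 1500 ms, at 60 bpm). the sketch also assumes that metrical time starts at 0 ms and advances at a constant tempo.

```python
MS_PER_MINUTE = 60_000.0


def beats_to_ms(beats, tempo_bpm):
    """Convert a span expressed in quarter beats into milliseconds."""
    return beats * MS_PER_MINUTE / tempo_bpm


def metrical_beat_at(time_ms, tempo_bpm, start_ms=0.0):
    """Position of metrical time (in beats) at a given timestamp,
    assuming a constant tempo that starts at start_ms."""
    return (time_ms - start_ms) * tempo_bpm / MS_PER_MINUTE


def eye_hand_span_performance(fixated_beat, performed_beat):
    """Traditional measure, taken at a note onset: how many beats
    the gaze is ahead of the note being performed."""
    return fixated_beat - performed_beat


def eye_time_span(target_beat, fixation_onset_ms, tempo_bpm, start_ms=0.0):
    """Measure taken at the onset of the first fixation on a target:
    how many beats metrical time lags behind the fixated note."""
    return target_beat - metrical_beat_at(fixation_onset_ms, tempo_bpm, start_ms)


# Figure 3 style example (hypothetical beat indices, counted from 0):
# the note at beat 4 is performed while the gaze is at beat 5.
span_performance = eye_hand_span_performance(fixated_beat=5.0, performed_beat=4.0)
print(span_performance, beats_to_ms(span_performance, tempo_bpm=60))

# Figure 4 style example (hypothetical values): a note at beat 7 is
# first fixated 5500 ms after metrical time started, at 60 bpm.
span_fixation = eye_time_span(target_beat=7.0, fixation_onset_ms=5500.0, tempo_bpm=60)
print(span_fixation, beats_to_ms(span_fixation, tempo_bpm=60))
```

the same span in beats corresponds to a shorter time at a faster tempo (1.5 beats is 750 ms at 120 bpm), which is one reason to report the tempo together with the span.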
the keypress (or “hand”) information was only used to estimate the correctness of the performance as well as to synchronize the metrical time with the eye-movement recording; after that, all calculations were based on the timestamps for fixation onset with respect to the thus abstracted metrical time. we considered the adding of the eye-hand span_fixation measure a significant advance in our thinking. in terms of data analyses, we also turned to generalized estimating equations and began to statistically model our data instead of listing a series of pairwise comparisons. this shift in analytical thinking later greatly affected how we designed our studies. at this point, we thought we had finally come to understand the “looking ahead” from all its angles and what kinds of questions should be answered with what kind of approach. (later, we noticed that in domains other than music reading, the eye-hand span can also be calculated as the distance between the first fixation on a target and its subsequent performance [holmqvist et al., 2015; huovinen, et al., 2018].) however, our measures and findings were far from easy to explain or present, and our first manuscript, too, was not accepted for publication (huovinen, ylitalo, penttinen, & penttinen, unpublished manuscript). in 2015, we decided to collect one more data set in order to corroborate our previous findings. this data collection was enabled by large project funding, which also gave the three authors a new support team, with representatives from the fields of education and musicology. in what was titled experiment 2 (data from 2011 being experiment 1), we had only professional pianists performing longer melodies which had more difficult larger intervals, again in the same two tempos as in the 2011 study. this time we used the 60-hz tobii t60xl eye tracker. the idea was to make the large intervals more salient and thus make any potential “looking ahead” they may cause more pronounced.
from this perspective, it was only the eye-hand span_fixation measure that could answer this question. thus, we left out the comparison of the two eye-hand measures and moved on to explain the effects caused by the now ever more salient large intervals only with the eye-hand span_fixation measure, and we gave it a new, more appropriate name, the eye-time span (since the hand was not involved in the calculations). again, we thought that we were on to something; we were focusing on one characteristic of the music notation and how it affected the “looking ahead”, having also developed a measure which could be used for its study. the feedback for this was not favourable, however, and the manuscript, in which we reported experiments 1 and 2 together, was rejected (huovinen, ylitalo, & puurtinen, unpublished manuscript). after rethinking again the nature of the phenomenon under study, and the measures, we prepared yet another manuscript, this time including a more detailed description of the eye-hand span and eye-time span, their measurement, and use. in it we reported that besides the general effects of performance tempo and musical expertise on the eye-time span, the local melodic complexity (i.e. the larger intervals) could elicit longer advance inspections; however, these inspections might not land on the “complex” notes themselves but on the ones preceding them (huovinen, et al., 2018). 2.3.2. rethinks and realizations after step 3 all of step 3 indeed took a while, and those years included a number of disappointments and moments when we had to admit that what we had thought to be a good idea was not. nevertheless, despite the emotional struggle, in retrospect, we know that we needed all of that earlier work. during each stage (be it data collection or the writing of a rejected manuscript), we realized and came to think of something which then helped us with the next study design and analyses.
we also improved the formulation of research questions and are now better able to estimate what kinds of evidence eye tracking can provide us with in this context. as a result, we can now argue for slightly more straightforward explanations of our findings and can see more clearly which way we or some other team should turn next. however, the results reported in huovinen and colleagues (2018) are still not the full answer to all of our questions. if, in suitable circumstances, as the data suggest, the pianist’s gaze is drawn toward salient features, but fixations do not necessarily land exactly on them, this suggests that the perceptual span (see 1.1, above) has come more clearly into play. in some cases, then, the salient features observed (albeit in a blurry form) within the perceptual span may attract a visual response so early that the performer cannot target a fixation upon that exact note; if she did, the gaze might move too far away from where the music is ongoing. this would perhaps make it difficult to keep all the notes in between in short-term memory and perform them correctly. whether this undershooting is involuntary due to temporal restrictions, or deliberate because the procedure is enough for fitting the problematic symbols into the area of accurate vision, we cannot be sure. however, perhaps the visual information which is visible from that fixation point is still enough to allow the performer to get ready for a correct motor response. in order to search for the full answer to this question, we would have to conduct studies that apply the gaze-contingent methodology, that is, where a limited part of the notation is shown to the reader at a time (rayner, 1998).
the method has apparently been applied only in two studies about the size of the perceptual and eye-hand spans during music reading (gilman & underwood, 2003; truitt et al., 1997), but not with the most controlled music stimuli—and we do claim, based on all our research, that the stimulus features should be included as part of the analyses and should not be thought of only as part of the study protocol (puurtinen, 2018). we have, to this day, opted for a “natural” layout of the music, showing all of the music during our recordings, but after step 3 it now seems that even more experimental set-ups may be needed in order to further explain the interplay between the eye-time span and the perceptual span during music reading. another matter we have come to recognize is that both the eye-hand span and the eye-time span we have calculated have one drawback: we have only calculated them at certain points during the performances (although in step 2, the gaze activity measure described to some extent what happened between two quarter beats). the eye-time span is actually a continuously changing parameter, similar, for instance, to pupil size. thus, to truly understand the “looking ahead”, the study of this aspect should not be limited in future only to within-subject or between-subject comparisons at certain notes. instead, we should also develop methods which allow the examining of the “looking ahead” as a process, a characteristic of a full performance, varying according to the factors which appear to affect it. 3. lessons learned as the examples above perhaps demonstrate, our study of the “looking ahead” in music reading has been a long and, on occasion, also tiring endeavour. 
we have come to change a number of things in our methodologies, such as the skill level of our study participants (starting with novices and amateurs and finally recording performances of professionals only), our measures (developing one, the eye-time span, that we thought would be the answer to our questions but which actually led to even more open ones), as well as our analytical approach (from piecemeal comparisons to statistical modelling). already these examples should demonstrate the methodological twists and turns of our work. to be sure, each step we have taken, from planning an experiment to the final published report of the findings, has made us less the novices we initially were. still, we are aware that the path toward expertise in this methodology and domain will require the taking of many more similar steps. hopefully, this history of our choices and of our reasons for moving away from our earlier choices is of benefit to others planning to use new methods and measures and perhaps even use them in domains where there is little methodological support available. we will now attempt to summarize some of the more general lessons we have learned for the benefit of others who may find themselves in a similar situation to where we have been (and still are), using our own experiences as an example. first, many of us who are trained in something other than the computer sciences might not be in our comfort zone when we first start to apply research methods which require the use of technology we are not used to. for us, for instance, the synchronizing of the two recording onsets in step 1 was done with remote support from a manufacturer. we were able to build the system ourselves, but when it failed, there was nothing we could do about it.
when learning the use of new equipment, and with limited skills in solving problems related to it, we recommend that researchers create a plan b in their research design; they should ensure that their research questions can be answered, at least in part, even when something goes wrong on the technical side. in our case, the comparison of novices and amateurs helped us to make use of the data, and we also included a task of silent music reading where the synchronization was not needed (penttinen, et al., 2013). however, more insight into the participants’ reading would have been gained had the technical side been more successful. second, when starting to work with a new method and/or new types of stimuli, an exploratory pilot with a complex and perhaps realistic task is a possible starting point for creating research hypotheses (see, puurtinen, 2018), but to get beyond descriptive data, it may be useful for researchers to control their stimuli (or task) as much as possible. this might not answer the most interesting of research questions, but enables one to test the possibilities of the method. for instance, in prior work about eye movements in music reading, the role of the applied music was thought of more as a part of the study protocol, something “read” or “played”, although it now seems that even the minutest details of the music (whether there is one quarter note or two eighth notes to be played, or a slightly larger interval among an otherwise stepwise melody) are already reflected in the eye-movement process. thus, following this line of thought, before knowing how a complex task is actually represented in the measure of the researchers’ choice, they should start with simple set-ups. the reporting of the findings might be challenging (we have heard a number of times how our tasks are not “musical” at all, since they are so simple), but in the end, one can argue with more certainty what one’s findings might mean and what their cause might be. 
third, for others beginning their work with new methods, we suggest the use of measures from other domains (or previous studies, if available). importantly, pay attention to the fact that measures which different research teams consider the same may, in fact, be calculated quite differently. with unique measures and their definitions, one’s results might be very hard to align with others’ findings. in music-reading studies, there is great variability in the measures and in how they are calculated (puurtinen, 2018); the eye-hand span, with its many ways of operationalization, is a very good (or bad) example (huovinen, et al., 2018). in our work, partly because they were familiar to us, we began from the very beginning with some measures applied in text-reading research, and we have been content with that early choice we made. in developing areas for research, we should aim at methodological consistency to the extent that it is possible and should provide data and reports which can later be used in meta-analyses. to be sure, measures also need to be developed further and new ones tested; still, existing measures provide a good starting point. fourth, after conducting some studies with similar protocols, test the protocols. this is something we should also do. for instance, our team members’ own background in music made us think of using a metronome in order to secure the temporal similarity of the performances. in practice, we had an external “click” marking the onset of certain beats. we only later noticed that this was not a standard protocol, up until that point, at least. the external temporal control means that the performer herself does not have to use any of her cognitive capacity to maintain the tempo (something that is challenging, especially for novice and intermediate performers), but still, the possible effects caused by the external click should also be tested. 
it may be that in our data, there are main effects caused either by the strain of following the external click or by the lack of strain due to the relaxation of the need to maintain the tempo without any help. similarly, with other methods, there may be protocols which first seem sensible and which are ones we become accustomed to, but those could also raise new research questions when further thought through. fifth, if possible, create or join a team, and make it something more than just a group of colleagues. in research areas which reach out to several background theories or methodologies in particular, no one can be the expert on all of the relevant topics, and it is difficult for one person to reach all the audiences which might be interested in and whose input would be beneficial for the work. also, potential disappointments and difficulties are much easier to handle with people whom the researchers trust and who support them — and the celebrations are also much more fun like that. thinking of our step 3, one may stop to think whether, alone, they would have done the three data collections, written and submitted the several manuscripts and revisions which were rejected, or kept one subproject going for the ten years it took to get one piece of the work published. we now think it is a good thing that we made it (though we immediately saw the next steps someone should definitely take). of course, in long-term projects and among people who work with something as personal as their intellectual skills, small or large conflicts necessarily arise at one point or another — but with good enough emotional bonding and similar goals, these do not break the team and just become a part of life. to conclude, we have here presented how we have tried to develop our methodologies for the study of “looking ahead” in music reading, openly listing our unpublished work and what we now know were mistakes or simply bad ideas. 
overall, as will probably also be the case for many others, we do not think our path has simply been either a success or a failure. instead, it has been a back-and-forth movement (cf. goolsby, 1994), and we are glad to have the chance to report also the backward steps. hopefully, other researchers, too, can share those parts of their projects which are faded out of polished publications and presentations; those are, after all, the ones we could really learn from.
key points
the application of a method in a research domain where there is little methodological tradition of that kind presents the researcher with a challenge; likely, there is a need to develop the understanding of the phenomenon and the methodology in parallel. by using our own 10-year-long work as an example, we demonstrate the advances and setbacks we have faced in our eye-tracking studies about music reading. we discuss some of the lessons we have learned in order to provide practical suggestions to others planning to take up a new method or apply one in a novel domain.
acknowledgments
this work was supported by the academy of finland (project number 275929). the author would like to thank erkki huovinen and anna-kaisa ylitalo for collaboration in the presented work, the members of the “reading music: eye movements and the development of expertise” consortium for their support, the study participants involved in the data collections described in this paper, and the anonymous reviewers for their feedback and suggestions.
references
burman, d. d., & booth, j. r. (2009). music rehearsal increases the perceptual span for notation. music perception: an interdisciplinary journal, 26(4), 303–320. doi: 10.1525/mp.2009.26.4.303
drai-zerbib, v., & baccino, t. (2013). the effect of expertise in music reading: cross-modal competence. journal of eye movement research, 6(5), 1–30. doi: 10.16910/jemr.6.5.5
drake, c., & palmer, c. (2000).
skill acquisition in music performance: relations between planning and temporal control. cognition, 74(1), 1–32. doi: 10.1016/s0010-0277(99)00061-x
furneaux, s., & land, m. f. (1999). the effects of skill on the eye-hand span during musical sight-reading. proceedings of the royal society of london, series b, 266(1436), 2435–2440. doi: 10.1098/rspb.1999.0943
gilman, e., & underwood, g. (2003). restricting the field of view to investigate the perceptual span of pianists. visual cognition, 10(2), 201–232. doi: 10.1080/713756679
goolsby, t. w. (1994). eye movement in music reading: effects of reading ability, notational complexity, and encounters. music perception, 12(1), 77–96. doi: 10.2307/40285756
holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2015). eye tracking: a comprehensive guide to methods and measures. oxford, uk: oxford university press.
huovinen, e., & penttinen, m. (unpublished manuscript). the allocation of fixation time in simple sight-reading tasks.
huovinen, e., penttinen, m., & ylitalo, a.-k. (unpublished manuscript). the visual processing of melodic group boundaries: an eye-movement study.
huovinen, e., ylitalo, a.-k., penttinen, m., & penttinen, a. (unpublished manuscript). the where and when of sight reading: effects of performance tempo and musical structure on eye movements.
huovinen, e., ylitalo, a.-k., & puurtinen, m. (unpublished manuscript). the eye-time span: structural salience and looking ahead in simple sight-reading tasks.
huovinen, e., ylitalo, a.-k., & puurtinen, m. (2018). early attraction in temporally controlled sight reading of music. journal of eye movement research, 11(2), 1–30. doi: 10.16910/jemr.11.2.3
hyönä, j., lorch, r. f., jr., & kaakinen, j. k. (2002). individual differences in reading to summarize expository text: evidence from eye fixation patterns. journal of educational psychology, 94(1), 44–55. doi: 10.1037//0022-0663.94.1.44
hyönä, j., lorch, r. f., jr., & rinck, m. (2003).
eye movement measures to study global text processing. in j. hyönä, r. radach, & h. deubel (eds.), the mind’s eye: cognitive and applied aspects of eye movement research (pp. 313–334). amsterdam: elsevier science.
jarodzka, h., holmqvist, k., & gruber, h. (2017). eye tracking in educational science: theoretical frameworks and research agendas. journal of eye movement research, 10(1):3, 1–18. doi: 10.16910/jemr.10.1.3
madell, j., & hébert, s. (2008). eye movements and music reading: where do we look next? music perception, 26(2), 157–170. doi: 10.1525/mp.2008.26.2.157
penttinen, m., & huovinen, e. (2009). the effects of melodic grouping and meter on eye movements during simple sight-reading tasks. in j. louhivuori, t. eerola, s. saarikallio, t. himberg, & p. eerola (eds.), proceedings of the 7th triennial conference of european society for the cognitive sciences of music, jyväskylä, finland (pp. 416–424). available at https://jyx.jyu.fi/dspace/handle/123456789/20910
penttinen, m., & huovinen, e. (2011). the early development of sight-reading skills in adulthood: a study of eye movements. journal of research in music education, 59(2), 196–220. doi: 10.1177/0022429411405339
penttinen, m., huovinen, e., & ylitalo, a.-k. (unpublished manuscript). eye-movement effects of melodic deviations in temporally controlled sight reading.
penttinen, m., huovinen, e., & ylitalo, a.-k. (2012). unexpected melodic events during music reading: exploring the eye-movement approach. in e. cambouropoulos, c. tsougras, p. mavromatis, & k. pastiadis (eds.), proceedings of the 12th international conference on music perception and cognition and the 8th conference of the european society for the cognitive sciences of music, thessaloniki, greece (pp. 792–798). available at http://icmpc-escom2012.web.auth.gr/sites/default/files/papers/792_proc.pdf
penttinen, m., huovinen, e., & ylitalo, a.-k. (2013). silent music reading: amateur musicians’ visual processing and descriptive skill.
musicæ scientiæ, 17(2), 198-216. doi: 10.1177/1029864912474288
penttinen, m., huovinen, e., & ylitalo, a.-k. (2015). reading ahead: adult music students’ eye movements in temporally controlled performances of a children’s song. international journal of music education: research, 33(1), 36-50. doi: 10.1177/0255761413515813
puurtinen, m. (2018). eye on music reading: a methodological review of studies from 1994 to 2017. journal of eye movement research, 11(2), 1-16. doi: 10.16910/jemr.11.2.2
rayner, k. (1998). eye movement in reading and information processing: 20 years of research. psychological bulletin, 124(3), 372–422. doi: 10.1037/0033
rayner, k. (2009). eye movements and attention in reading, scene perception, and visual search. the quarterly journal of experimental psychology, 62(8), 1457–1506. doi: 10.1080/17470210902816461
rosemann, s., altenmüller, e., & fahle, m. (2016). the art of sight-reading: influence of practice, playing tempo, complexity and cognitive skills on the eye-hand span in pianists. psychology of music, 44(4), 658–673. doi: 10.1177/0305735615585398
truitt, f. e., clifton, c., pollatsek, a., & rayner, k. (1997). the perceptual span and the eye-hand span in sight-reading music. visual cognition, 4(2), 143–161. doi: 10.1080/713756756
waters, a., & underwood, g. (1998). eye movements in a simple music reading task: a study of experts and novice musicians. psychology of music, 26(1), 46–60. doi: 10.1177/0305735698261005

frontline learning research vol. 4 no. 4 special issue (2016) 48-55 issn 2295-3159
corresponding author: giuseppe ritella, cradle, institute of behavioural sciences, university of helsinki, siltavuorenpenger 5a, p.o. box 9, 00014 university of helsinki.
email: gritella@gmail.com doi: http://dx.doi.org/10.14786/flr.v4i4.210

theorizing space-time relations in education: the concept of chronotope

giuseppe ritella a, maria beatrice ligorio b & kai hakkarainen a
a university of helsinki, finland b university of bari, italy

article received 15 september / revised 21 december / accepted 21 december / available online 19 january

abstract

due to ongoing cultural-historical transformations, the space-time of learning is radically changing, and theoretical conceptualizations are needed to investigate how such evolving space-time frames can function as a ground for learning. in this article, we argue that the concept of chronotope – from greek chronos and topos, meaning time and place/space – lends itself well to this aim. in particular, we outline three features of chronotope: 1) its analytical focus includes the examination of the potential interdependency between space and time; 2) it allows us to examine space and time as social constructions, negotiated in dialogical interaction; 3) it involves the analysis of both the material organization and the discursive negotiation of space and time. we use examples from our own studies and from relevant literature to illustrate how these features of the concept allow us to examine the role that space-time relations play in educational practice. finally, we draw our conclusions and briefly introduce the theoretical and methodological challenges to be addressed for a full development of the concept.

keywords: space-time, ecological perspectives on learning, co-construction of knowledge, agency, materiality, trans-contextuality, chronotope

ritella et al | f l r 49

1. introduction

this article is framed as a theoretical contribution to a special issue of the journal whose purpose is to discuss how existing conceptualizations of educational practices should be redefined to address emerging learning issues.
the aim of the article is twofold: 1) to argue for the relevance of analysing space-time relations as an emerging issue in research on learning, and 2) to propose chronotope as a useful theoretical concept, allowing us to uncover the time-space relations in local investigation sites. the rationale of our argumentation is that we live in a historical moment in which both technological innovations and educational reforms are triggering deep changes in space-time relations in learning. the introduction of continuously evolving virtual spaces and the implementation of pedagogical approaches such as the flipped classroom (flipped learning network, 2014), connected learning (ito et al., 2013) and place-based learning (van eijck & roth, 2010) entail the transformation of the spatial and temporal organization of learning. for example, in a school using the flipped learning approach, lessons can be shared through internet-based software and studied at home, while in the classroom the students can engage in collaborative learning. the time-space organization of learning and teaching changes, based on the pedagogical use of technology. historically, such transformations of space-time have been crucial in changing educational practices:

“the organization of a serial space was one of the great technical mutations of elementary education. it made it possible to supersede the traditional system (a pupil working for a few minutes with the master, while the rest of the heterogeneous group remained idle and unattended). by assigning individual places it made possible the supervision of each individual and the simultaneous work of all. it organized a new economy of the time of apprenticeship. it made the educational space function like a learning machine, but also as a machine for supervising, hierarchizing, rewarding.” (foucault, 1977, p. 155)

accordingly, in order to understand what current transformations imply for learning, we need conceptual and analytical tools that allow us to examine the space-time relations that emerge in the empirical sites of investigation. we claim that the concept of chronotope can be productively used to reach this aim. crafted from the ancient greek words chronos and topos, meaning time and place/space, chronotope was devised by mikhail bakhtin (1981) both to examine the space-time patterns characterising literary genres and to develop a framework for the cultural analysis of space-time (holquist, 1982). recently, educational research has shown increasing interest in this concept. following bakhtin, this literature is based on the assumption that space and time are interdependent social constructions rather than independent given realities (van eijck & roth, 2010). a literature review of the uses of the concept is beyond the scope of this short article. instead, by drawing on our own studies and on some of the relevant literature, we discuss how chronotope can be adopted for enriching our understanding of learning in the twenty-first century.

first, we will briefly introduce the main features of chronotope as a conceptual tool for the analysis of space-time frames. second, we will refer to three significant socio-cultural studies that have addressed issues related to space-time relations. we have selected these three studies because we think that they are valuable contributions to sociocultural research; they are also relevant for developing a theory of space-time in education. in sum, we seek to demonstrate that we can gain further insight into learning processes by carrying out a specific analysis of the (discursive) social negotiation and bodily-material organization of space-time. finally, theoretical and methodological challenges will be discussed.

2. the concept of chronotope

three features of chronotope make its use fruitful for examining space-time relations: a) its analytical focus includes the examination of the potential interdependency between space and time; b) it allows us to examine space and time as social constructions, negotiated in dialogical interaction; c) it involves the analysis of both the material organization and the discursive negotiation of space and time. in the following paragraphs, we will briefly discuss each feature of chronotope, illustrating through examples how it contributes to making situated time/space arrangements salient.

first, space and time are interdependent when considered for the analysis of human action, making it crucial to examine space and time in a coordinated way. a clear example of this interdependence can be found in the different space-time organization of a literature review carried out by means of a search engine such as google books and the same review carried out in a physical library. the absence of a digital search engine implies that one has to take the time to visit the library and consult the physical book records, often organised alphabetically by author names, titles or fields/topics of the books. an alternative is to ask for help from the library staff. all of that must be done before going to find the actual physical books on the shelves and then finding the relevant text within each book. these actions should be carried out in this order, unless one decides to engage in a (most likely ineffective) exploration of the shelves, systematically opening and skimming those books with interesting titles. in contrast, some digital search engines enable a search for books from any location connected to the internet through a full-text search of one or more keywords, giving immediate access to the pages where the searched keyword appears.
from that point, a researcher has the chance to form an idea of the contents of interest, and then decide whether to read the full text and if so, how carefully. analysing these two cases by using chronotope implies examining spatial relations (e.g. location of events, spatial arrangements of workspaces, etc.) and temporal relations (e.g. duration, temporally ordered sequences of actions, rate of recurrence of events, etc.) in a coordinated way. thus, it allows us to recognize that the introduction of a virtual space containing a digital search engine in the workspace involves a transformation of the entire temporal structure and duration of the activity. conversely, temporal limitations can affect the selection of tools and the organization of the spaces of the activity. for example, in one of our investigations (ligorio & ritella, 2010), we used the concept of chronotope to analyse how a group of teachers changed their bodily positions around the computer throughout a session of collaborative problem solving. we identified different spatial arrangements of bodies and technology and different tempos of the activity. the analysis showed that one particular spatial arrangement, where all the teachers were standing around the same computer, was adopted in order to speed up the accomplishment of the task when the end of the session was approaching.

in observing that space and time are interdependent, we do not mean that all transformations of space involve a transformation of time and vice versa. it is possible that some changes in space are neutral with regard to the organization of time and vice versa. for example, changing the spatial organization of the objects on a desk does not necessarily imply changes in the temporal organization of the activity. however, theoretically it is important to acknowledge the existence of such interdependence, as described in the two examples above.
it is a goal for researchers to determine under which conditions such interdependence is relevant for educational practice.

second, the concept of chronotope was introduced by bakhtin as a concept for the cultural analysis of space and time (holquist, 1982). this implies considering all the different voices involved in social processes, in contrast to the “philosophical monopolization” of scientific discourse, which considers space and time as given realities external to human experience (van eijck & roth, 2010). therefore, the task set by using the concept of chronotope is not just to measure physical spaces and time intervals according to the consolidated scientific paradigm. the scientific understanding of space-time is just one of the possible voices to be considered for understanding space-time relations, and the other voices – for instance, the voices of the participants – should not be silenced in the analysis. thus, there is a need to consider dialogue in the analysis of space-time. in other words, the concept of chronotope is devised to examine space and time as they are socially negotiated in dialogue. indeed, they are not simply ‘given’ to the participants involved in an activity: meanings associated with both space and time are socially negotiated, and the organization of activities in space-time can be highly flexible. van eijck and roth (2013), for example, examined an environmental education project in which students from aboriginal canadian communities engaged in nature conservation practices in a marine park. by using the lens of chronotope, the authors managed to understand how the aboriginal students and the collaborators in the project constructed the material space of the marine park in dialogical interaction.
in particular, they illustrated how science education activities carried out during the project involved conflicting notions of space and time derived from the perspective of natural science and from the local culture. analysing these tensions allowed the researchers 1) to identify some contradictions in the project and 2) to trace how the students experienced the science education activities while developing their cultural identities as aboriginals.

third, as stated by many authors (hirst & vadeboncoeur, 2006; van eijck & roth, 2010; matusov, 2009), chronotope concerns both the immaterial, semiotic worlds of discourse and narratives, and “patterns of organization of space and time” (lemke, 2004) that are enacted through the movement of bodies and objects. therefore, chronotopic analysis also takes into account the bodily-material aspect of space-time. on this basis, many scholars have used chronotope to investigate space-time at the boundary between material and discursive processes. for example, brown and renshaw (2006) discuss how students express their agency by actively shaping the space-time contexts of the classroom, drawing on past, present and future temporal relations through discursive interaction. in particular, they analysed: 1) how a student initially built her private space-time within the classroom by using a library shelf and a desk, and later removed those barriers to actively enter into the shared space-time of collaborative activity; 2) how the students discursively shifted between different space-times while explaining and justifying their ideas and developing their identities. another good example is the study by hirst and vadeboncoeur (2006).
here the authors analyse how the re-engagement in schooling by a dropout student was mediated by the construction of a dynamic spatial network, which involved movements between material spaces (a learning centre and the student’s home) and the concomitant reframing of the student-teacher relationship. in both these studies, the use of the concept of chronotope made it possible to uncover how space-time arrangements affected the learning processes.

in the present article, chronotopes are defined as “socially emergent” (sawyer, 2005) units of space-time, where both discursive and material aspects of space-time relations are considered. in line with the literature, human cognition and learning are not conceived as located within the boundaries of the mind, but are distributed in the space-time context of the activity. contexts – including space-time – emerge from a continuous process of social negotiation engaged in by learners (bateson, 1972; cole, 1996; duranti & goodwin, 1992). during learning activities, participants individually or jointly attend to various physical and symbolic spaces: they organise their workspace, co-ordinate their efforts, and perceive space-time constraints and opportunities related to the technological tools used and to the institutional regulation of space and time. following bakhtin, we consider these spatial and temporal processes to be fused, requiring a co-ordinated analysis. examining only space or only time could bias our understanding, given the reciprocal impact they can have on each other.

3. sketching a conceptualization of chronotope for 21st-century learning

as argued above, chronotope is a concept specifically intended for the analysis of space-time relations.
given the features of chronotope that we have presented above, the goal we set for chronotopic analysis is to investigate: 1) how patterns of space-time organization are involved in different learning activities, different school systems, different pedagogical approaches; 2) how participants make sense of space-time patterns in dialogical interaction; 3) how participants’ discursive negotiation of space-time is related to their bodily-material organization of space and time.

this topic is not new to education. space and time are ubiquitous categories of human experience, and this is reflected in the research literature. indeed, most studies in the learning sciences involve references to space and/or time co-ordinates, although it is rare to find studies that specifically address space and time, or better put, space-time, as the primary focus of investigation. below, we will discuss how specific attention to space-time attained through the concept of chronotope can enrich and extend the knowledge derived from three investigations in which the categories of space and time were relevant.

first, silseth and arnseth (2011; 2015) provide an interesting example by examining learning across sites from a dialogical perspective. the authors analyse how ideas and perspectives that emerge in out-of-school situations, and external representations produced in the past, are mobilized in situated interaction. these resources contribute to creating connections between different situations of learning. space and time are highly relevant for this topic of investigation, which concerns learning processes taking place in multiple locations across extended periods of time. however, silseth and arnseth seem to consider space and time as a background against which the learning takes place, failing to consider them as analytical foci. we argue that the role played by the organization of space-time in this process can be examined by using the concept of chronotope.
for example, in a previous case study (ritella & ligorio, 2016) we studied the collaborative sensemaking of a group of professionals working on the design of a web-platform. we discussed how the space-time organization of the activities that took place before face-to-face meetings of the group is connected with the emergence of ideas and viewpoints during subsequent meetings. we found that providing an online space for writing individual notes foregrounded the emergence of the personal perspectives of the participants during the upcoming meeting. in contrast, arranging a physical meeting resulted in the emergence of a collective perspective by a subgroup of participants. thus, we argued that chronotope helped to uncover how the space-time organization of activities affects sensemaking across multiple locations and extended periods of time.

second, engle (2006) shows that the way teachers discursively frame the context of learning, including the definition of temporal boundaries, affects students’ transfer of knowledge across different contexts. in particular, the author analyses how a teacher framed learning episodes as building on previous ones or as relevant for the future. the continuous references to the past and the future helped the students to consider these episodes as interconnected, thus supporting the transfer of knowledge across contexts. the author states that space is also relevant for analysing the framing of context, and further research is needed to understand its role. the concept of chronotope, we argue, could be employed to direct analytical attention to both space and time relations, and analyse them in a co-ordinated way, adding further insights. ryan (2011), for example, used the concept of chronotope to examine how students discursively conceive the space-time of university. this study shows that space and time jointly contribute to defining the students’ orientation towards academic life.
in particular, some students depicted the university as a site of mass education with large lecture theatres, no permanent space for student groups and limited time for individual meetings with teachers because of the busy life of the academic staff. put together, time limits and spatial arrangements of university buildings generated a conception of the university as a potentially distant service provider, which encouraged students to spend most of their time off-campus. thus, using the concept of chronotope made it possible to detect how the discursive construction of space-time can have an effect on the students’ academic practices.

a last example is the research by jornet and roth (2015), who discussed how students make sense of multiple material representations of scientific phenomena across time. in particular, the authors trace the students’ bodily and pragmatic actions in interaction with representations of scientific phenomena, and discuss how these relate to the students’ experiences and interpretations. we recognise that jornet and roth significantly discuss such a relationship by means of their analysis. however, we claim that this topic of investigation also requires a specific analysis of space-time relations, which was missing in their study. the material representations used by the students are distributed in space and they are picked up at different times. conceptualizing space-time relations as chronotopes might allow researchers to uncover how the students spatially arrange their bodies around, and attend to, multiple representations at different times. in a previous study (ritella, ligorio & hakkarainen, 2015), we have used the concept of chronotope to examine how a group of teachers managed the resources available in the context during a session of collaborative problem solving (ps).
we interpreted, diachronically, the alternation of 1) events in which the participants explored the space and actively searched for resources in the environment, and 2) events characterized by a focus on a stable set of resources. one of the findings of this study was that in some phases of ps the first type of event was dominant. these moments were associated with: (a) the introduction of a new task, (b) the use of a software suite not yet mastered by the teachers or (c) a change in the configuration of participation (i.e. the participants physically changed their positions in the room, or changed the set of tools used) realised in conjunction with a difficulty in trying to solve the problem. we expect that patterns of this kind could also be found while analysing the students’ use of multiple representations. for example, it could be found that material representations play a crucial role in some phases of educational activities and/or that the way in which they are spatially organized affects their usage by the students. thus, a better understanding of when the students explore the space around them and when they attend to each representation could give us further insights into how they interact with the material environment during learning practices.

in sum, in this section we have shown that a specific focus on space-time can yield additional insights for the analysis of learning. for instance, we could see how learning may be affected by the physical organization of the space within which students interact and how the perception of time constraints influences how the task is perceived. as we will further discuss in the next section, the concept of chronotope has great potential for examining learning practices as they unfold in space and time, even though some challenges still have to be tackled for the full deployment of the concept.

4. implications and challenges for chronotopic analysis

in this paper, we argue that the concept of chronotope can enrich the (dialogical) understanding of learning practice. we have briefly presented this concept and discussed how it could be used to enhance and extend the findings of investigations that implicitly or explicitly address space and time relations in learning. we define chronotope as the emergent configuration of temporal and spatial relations in educational practices. to provide some examples of how chronotopes can be fruitfully used to analyse learning practices, we have discussed how the discursive/bodily/material organization of space-time is connected to: a) the learning processes taking place at multiple locations across extended periods of time, as theorised by silseth and arnseth (2011); b) the participants’ discursive framing of situations of learning, as outlined by engle (2006); c) the use of multiple representations during a learning activity, as examined by jornet and roth (2015).

we believe that the range of applications of chronotope extends far beyond the processes we have discussed here. considering how space-time is organised by participants may be useful in designing learning tasks, especially when technology is involved. indeed, contemporary digital environments offer multiple types of resources; (re)organising them in time and space during an activity is not a trivial task. a complex orchestration is needed to use all resources effectively. the same applies in designing training situations for teachers. the appropriation of technology on the teachers’ side is not only a matter of understanding the technical features. developing awareness of how time-space is transformed by technology may help teachers in improving their educational practices. the changes that technology introduces go beyond the local classroom situation.
as in a cascade effect, these changes ultimately cause modifications in the larger society. renshaw (2013) pointed out that school systems can be characterised by different chronotopes in different historical periods.

we are aware that the conceptualization sketched in this paper is not yet fully developed and has some limitations. one of them is that current methodological tools are not yet adequate to grasp the organization of space-time in all its complexity. indeed, the organization of space and time applies to different units of analysis. both micro-processes, such as the situational co-ordination of a group of students, and macro-level processes, such as the historical development of school systems, involve patterns of organization of space and time. this suggests that the operationalization of chronotope and the methods used should be adapted to the unit of analysis in each investigation. another challenge is that the negotiation of space and time can often be implicit and difficult to detect. in one of our studies (ritella, ligorio & hakkarainen, accepted), while negotiating the meaning of a task set by teachers, the students broadly discussed the negotiation of time, but the discussion about space was marginal during the observed interaction. one possible interpretation was that space had been taken for granted and did not emerge clearly in the students’ discourse. however, this could also be attributed to a methodological limitation. the students may have engaged in discussions about space during breaks or informal meetings, when the researcher did not observe the interactions. therefore, a more comprehensive research design should be planned, one able to make explicit the conceptions of space and time while preserving the privacy of the participants.
based on the discussion presented here, we argue that continuing to pursue this frontline concept is important for advancing our understanding of contemporary learning practices because space-time relations have undergone profound transformations. acknowledging the ongoing transformations of space and time in education involves theoretical and methodological challenges to research. we believe that the concept of chronotope, thanks to its focus on space and time as interconnected social constructions, which are discursively negotiated and bodily-materially enacted by participants, lends itself well to addressing these challenges.

acknowledgments

the authors are grateful to feldia loperfido and antti rajala for their comments on a previous version and to the editors and reviewers for constructive criticism that has allowed the development of a stronger article.

references

bakhtin, m. (1981). the dialogic imagination. four essays by m. m. bakhtin. austin: university of texas press.
bateson, g. (1972). steps to an ecology of mind: collected essays in anthropology, psychiatry, evolution, and epistemology. chicago: the chicago university press.
brown, r., & renshaw, p. (2006). positioning students as actors and authors: a chronotopic analysis of collaborative learning activities. mind, culture and activity, 13(3), 244–256. doi: 10.1207/s15327884mca1303_6
cole, m. (1996). cultural psychology: a once and future discipline. cambridge: harvard university press.
duranti, a., & goodwin, c. (eds.). (1992). rethinking context: language as an interactive phenomenon. cambridge: cambridge university press.
van eijck, m., & roth, w. m. (2010). towards a chronotopic theory of “place” in place-based education. cultural studies of science education, 5(4), 869-898. doi: 10.1007/s11422-010-9278-2
van eijck, m., & roth, w. m. (2013). place and chronotope. in imagination of science in education (pp. 133-162). springer netherlands.
engle, r. a. (2006).
framing interactions to foster generative learning: a situative explanation of transfer in a community of learners classroom. the journal of the learning sciences, 15(4), 451-498. doi: 10.1207/s15327809jls1504_2
flipped learning network (fln). (2014). the four pillars of f-l-i-p™.
foucault, m. (1977). discipline and punish: the birth of the prison. vintage.
hirst, e., & vadeboncoeur, j. a. (2006). patrolling the borders of otherness: dis/placed identity positions for teachers and students in schooled spaces. mind, culture, and activity, 13(3), 205-227. doi: 10.1207/s15327884mca1303_4
holquist, m. (1982). bakhtin and rabelais: theory as praxis. boundary 2, 5-19.
ito, m., gutiérrez, k., livingstone, s., penuel, b., rhodes, j., salen, k., & watkins, s. c. (2013). connected learning: an agenda for research and design. irvine, ca: digital media and learning research hub.
jornet, a., & roth, w. m. (2015). the joint work of connecting multiple (re) presentations in science classrooms. science education, 99(2), 378-403. doi: 10.1002/sce.21150
lemke, j. (2004). learning across multiple places and their chronotopes. in aera 2004 symposium, april (pp. 12-16).
ligorio, m. b., & ritella, g. (2010). the collaborative construction of chronotopes during computer-supported collaborative professional tasks. international journal of computer-supported collaborative learning, 5(4), 433-452. doi: 10.1007/s11412-010-9094-4
matusov, e. (2009). journey into dialogic pedagogy. nova science publishers.
renshaw, p. d. (2013). classroom chronotopes privileged by contemporary educational policy: teaching and learning in testing times. in s. phillipson, k. y. l. ku, & s. n. phillipson (eds.), constructing educational achievement: a sociocultural perspective (pp. 57-69). oxon, uk: routledge.
ritella, g., ligorio, m. b., & hakkarainen, k. (2015). the role of context in a collaborative problem-solving task during professional development. technology, pedagogy and education, 25(3), 395-412.
doi: 10.1080/1475939x.2015.1062412
ritella, g., & ligorio, m. b. (2016). investigating chronotopes to advance a dialogical theory of collaborative sensemaking. culture & psychology, 22(2), 216-231. doi: 10.1177/1354067x15621475
ritella, g., ligorio, m. b., & hakkarainen, k. (accepted). interconnections between the discursive negotiation of space-time and the interpretation of a collaborative task. learning, culture and social interaction.
ryan, m. (2011). productions of space: civic participation of young people at university. british educational research journal, 37(6), 1015-1031. doi: 10.1080/01411926.2010.517827
sawyer, r. k. (2005). emergence: societies as complex systems. cambridge: cambridge university press.
silseth, k., & arnseth, h. c. (2011). learning and identity construction across sites: a dialogical approach to analysing the construction of learning selves. culture & psychology, 17(1), 65-80. doi: 10.1177/1354067x10388842
silseth, k., & arnseth, h. c. (2015). frames for learning science: analyzing learner positioning in a technology-enhanced science project. learning, media and technology, 1-20.
frontline learning research 5 (2014) 64-91
issn 2295-3159

corresponding author: catherine gabelica, department of educational research and development, school of business and economics, maastricht university, tongersestraat 53, 6211lm maastricht, the netherlands, phone: +31 43 38 82587, email: c.gabelica@maastrichtuniversity.nl
http://dx.doi.org/10.14786/flr.v2i3.79

64 | f l r

dynamics of team reflexivity after feedback

catherine gabelica a, piet van den bossche a,b, mien segers a, wim gijselaers a
a maastricht university, the netherlands
b antwerp university, belgium

article received 15 january 2014 / revised 11 february 2014 / accepted 5 june 2014 / available online 18 june 2014

abstract

a great deal of work has been generated on feedback in teams and has shown that giving performance feedback to teams is not sufficient to improve performance. to achieve the potential of feedback, it is stated that teams need to proactively process this feedback and thus collectively evaluate their performance and strategies, look for alternatives, and make clear decisions about ways to tackle their task. this concept of team reflexivity has been commonly described as a sequence of behaviours, whose relative importance has not been demonstrated. further, empirical research investigating the dynamic aspects of reflexivity has been scarce. this study sought to explore how reflexivity evolves over time and at which moments of the team interaction it is related to team performance. thirty-two student dyads participated in a cognitively complex task (flight simulation) over four performance episodes comprising action phases followed by transition (feedback) phases. high interdependence between participants (pilots and co-pilots) was ensured through the distribution of complementary knowledge in the dyads. the results showed that teams seldom engaged in full cycles of reflective behaviours.
when looking into individual behaviours, teams exhibited more reflective behaviours during action over time, while their reflective behaviours during feedback did not change, indicating suboptimal feedback processing as time goes by. additionally, it was demonstrated that teams were capable of learning from their past and acting upon feedback to improve subsequent team performance, but also that initial performance acts as a trigger for future reflective behaviours.

keywords: teams, team learning, feedback, team reflexivity, team performance

1. introduction

small group work has gradually progressed to being one of the dominant approaches in the domain of learning and instruction and professional development (e.g., kirschner, 2009). collaborative learning is one of the most successful and widespread instructional practices implemented in schools and universities (e.g., dillenbourg, 1999; johnson & johnson, 1992). similarly, work teams have become a central element in the functioning of organisations in many domains (e.g., health care, military, and aviation) (salas, stagl, burke, & goodwin, 2007). both professional teams and learning teams face similar challenges inherent to collaboration and joint understanding (barron, 2000; järvelä, volet, & järvenoja, 2010). specifically, in both environments, interdependent team members need to interact and communicate effectively, share knowledge and experiences, and capitalise on each other’s skills and resources to successfully complete a common task (e.g., johnson, johnson, & stanne, 2000; salas, dickinson, converse, & tannenbaum, 1992). crucially, recent work has shown that teams that engage in team learning processes and learn how to work effectively are more likely to succeed (e.g., dochy, gijbels, raes, & kyndt, 2014; van der haar, segers, & jehn, 2013; veestraeten, kyndt, & dochy, 2014).
team learning has been defined as “an ongoing process of reflection and action” (edmondson, 1999, p. 353) during which teams reflect on their own prior activities and consequently plan adjustments for future practice (see decuyper et al., 2010, for a review). although scholars in these areas have tended to remain isolated within their own disciplines despite obvious overlaps in research interests, they generally agree that team learning processes do not occur naturally (johnson & johnson, 1992; rummel & spada, 2005; sims, salas, & burke, 2005). the awareness that not all teams learn, and as a consequence may reach substandard group performance, raises the need to outline deliberate interventions to build learning in teams. increasingly, research focuses on what can be done to leverage learning in teams and improve their performance (e.g., decuyper, dochy, & van den bossche, 2010; salas, stagl, & burke, 2004). despite these renewed efforts, it seems that potential leverage points (such as training or the provision of feedback) calibrated for teams need to be better specified and validated (kozlowski & ilgen, 2006). giving teams feedback on their team process and performance has been identified as a leverage point that shapes team learning and can improve team performance (gabelica, van den bossche, segers, & gijselaers, 2012; johnson & johnson, 2002; london & sessa, 2006; phielix, prins, & kirschner, 2010). in school and beyond, teams need feedback to monitor and regulate their work (hattie & timperley, 2007). previous theoretical work on feedback provided by external agents at the team level of analysis (e.g., goodman, wood, & hendrickx, 2004; london & sessa, 2006) suggests that to achieve changes in team learning and performance, teams need to process received feedback, be receptive to this feedback, understand its value, and actively engage in collaborative activities during which they use feedback cues to make improvements.
nevertheless, empirical work on the value of active feedback processing relative to the mere reception of feedback in teams has considerably lagged behind theoretical development (hattie & timperley, 2007). this feedback processing has yet to be empirically examined (gabelica et al., 2012). more specifically, the more interesting question about feedback effectiveness is how learning naturally happens during the team feedback process and how effective these learning processes are (e.g., adcroft, 2011). moreover, previous work in both team and collaborative learning research leaves much about the dynamics of feedback processing in teams unspecified, such as 1) how teams respond to repeated (external) feedback in dialogue over the course of ongoing activities, and 2) when (i.e., at which point in time) these behaviours are related to effective learning and performance. there is general agreement across disciplines that we should consider feedback loops in which behavioural changes resulting from each cycle are inputs in cycles that follow (e.g., soller, monés, jermann, & mühlenbrock, 2005), but this is rarely reflected in research designs (e.g., ilgen, hollenbeck, johnson, & jundt, 2005). concerning how teams process feedback, we propose that they do so by performing shared reflective activities, that is, by collectively discussing and reflecting upon their functioning (e.g., schippers, den hartog, & koopman, 2007). these activities are core building blocks of team learning. specifically, it has been shown that reflective teams evaluate their performance and strategies, look for alternatives to consider situations, and make decisions about new ways to tackle their task. the concept of team reflexivity, as proposed in organisational psychology, mirrors these activities (west, garrod, & carletta, 1997). in educational settings, generic forms of intra-group reflection such as collective/social metacognition (mccarthy & garavan, 2008), reflection (edmondson, 1999), collaborative reflection (morris & stew, 2007; yukawa, 2006), peer reflection, or reflective self-explanation (rummel, spada & hauser, 2009) have been increasingly used. this recent research area is an extension of the work on individual reflection or reflective practice (e.g., boud, keogh, & walker, 1985) that adds interactions and communication with peers to the learning process. many authors agree that team reflexivity (in any generic sense) allows teams to reach a more accurate understanding of their task and, as a result, better performance (e.g., mccarthy & garavan, 2008; schippers, homan, & van knippenberg, 2013). although the very recent research strand on team reflexivity acknowledges the importance of the dynamics of team performance when considering team reflexivity, empirical work is only beginning to consider under which circumstances team reflexivity relates to changing performance, and not yet in contexts with systematic performance data on which to reflect (schippers et al., 2013). across disciplines, external and specific feedback is not systematically part of the reflective process, even though it is usually agreed that reflection can only occur if people have accurate knowledge about their current and desired learning state (hattie, 2013). also, the relation between time and timing of reflexivity and team performance remains in question: does reflecting right from the start of a team activity help the team get started and allow later success, or does sustained reflection after events later in a team’s life also matter for sustained performance? thus, the question of when teams process feedback appears as a gap in both feedback and reflexivity research.
previous research on feedback and team performance suggests that feedback effects are not static but dynamic (e.g., mcgrath, 1993); they cannot be understood as a single-cycle linear path from inputs (e.g., feedback) through outcomes (e.g., team performance). in the same vein, how teams learn from feedback should also be considered from a dynamic perspective (ilgen et al., 2005). the purpose of the present study is therefore to address the above-mentioned gaps by shedding light on how reflective behaviours relate to performance over a period of time in a complex, fast-paced, and high-workload situation in which two individuals with distributed information have to keep on learning to achieve success. specifically, the following questions are explored: 1) how do teams naturally and overtly reflect when provided with feedback depicting their performance, 2) how do reflective behaviours evolve over time during and/or after action, and 3) how does the timing at which team reflexivity occurs relate to performance?

1.1 feedback interventions

prior to addressing feedback in team settings, it is important to briefly draw on multiple disciplines to better understand how the much more substantial body of research on feedback given to individuals has shaped the feedback concept in teams. in the learning sciences, feedback is an instructional practice that is expected to enhance motivation and learning (mulder & ellinger, 2013; shute, 2008). learning scientists acknowledged feedback as a key characteristic of quality teaching in non-team settings decades ago (e.g., mory, 2003; shute, 2008; yang & carless, 2013).
much of the extensive work on feedback given to individuals has come to two main conclusions: 1) learners should be given feedback containing learning information (e.g., duijnhouwer, prins, & stokking, 2012; gibbs & simpson, 2004) and 2) researchers should consider feedback from the perspective of the feedback receiver and thus incorporate the uptake and the receptivity of feedback in the feedback process (e.g., boud & molloy, 2013; eva et al., 2012). this also introduces the idea of a feedback process that goes beyond the provision of feedback (mulder, 2013). since feedback is traditionally part of instructional programs, the drawback of multi-component interventions is that it is not always possible to attribute behavioural changes to feedback interventions (e.g., van der pol, van den berg, admiraal, & simons, 2008). further, most studies concern primary school and high school students (e.g., johnson & johnson, 1993), which raises the question of the generalizability of findings to higher education or workplace (adult) learning. by contrast, organisational, social, and behavioural psychology have incorporated feedback delivery in much (semi-)experimental research and extensively investigated its added value to human performance, with or without other components (such as goal setting) (e.g., kluger & denisi, 1996), while the feedback process has largely remained a black box. furthermore, feedback is often mere “knowledge of performance or results” (e.g., performance data of a company or score on a simulation game) instead of elaborated informational feedback (e.g., austin, kessler, riccobono, & bailey, 1996). the learning value of feedback seems to be a consistent omission. moreover, this research tradition has primarily focused on post-secondary levels.
taken together, these disciplines have given rise to a new question transcending the simple question of whether or not feedback is truly effective: how and under which conditions does feedback improve learning and performance? this concern has been echoed in the relatively smaller research strand on feedback to teams.

1.2 feedback to teams

feedback at the team level of analysis is defined as the communication of information provided by (an) external agent(s) concerning actions, events, processes, or behaviours relative to task completion or teamwork (gabelica et al., 2012; london, 2003). it is widely accepted that feedback can provide teams with accurate information on their performance and may steer, motivate, support, and reinforce future team behaviour. feedback is considered a leverage point in the team’s development of a collective view of expectations and awareness about its behaviours, capabilities, and skills (london & sessa, 2006; prins, sluijsmans, & kirschner, 2006). research in collaborative learning environments has highlighted that feedback has the power to draw the team’s attention to specific aspects of its task and hence encourage task-related discussion (johnson & johnson, 2002). in the workplace, feedback can also serve as a motivational trigger. for example, scott-young and samson (2009) showed that providing teams of managers with performance feedback reinforced teams’ confidence in themselves and, in turn, their performance. despite many potential benefits of feedback delivery, a recent review by gabelica et al. (2012) integrated findings from fifty-nine empirical studies investigating the effects of feedback in teams in educational and professional settings and showed mixed results. approximately one third of the studies did not find support for its expected positive effect on performance.
for example, in a field experiment, jung and sosik (2003) found that giving feedback to teams performing decision-making tasks had positive effects on group members’ collective confidence (i.e., collective efficacy and group potency) but not on team performance. based on these inconsistent results, analogous to feedback research in non-team settings, gabelica and colleagues (2012) concluded that the key question of whether team feedback is effective depends on the conditions under which feedback is given, and not only on feedback as such (e.g., its quality). based on educational research, it can be argued that in addition to factors related to the feedback giver and environment, feedback receivers have a critical role to play. research on team feedback suggests that teams given feedback will only change if they perceive a learning need and opportunity and if they attend to, interpret, and act upon feedback (e.g., london & sessa, 2006; phielix et al., 2010). in other words, teams need to proactively process the content of feedback, and thus invest time and effort into actively building content-oriented reactions, if we expect visible changes in the way they perform. yet, how teams process information cues contained in feedback, and thus what specific processing behaviours and activities are dynamically related to performance, remains largely unknown (gabelica et al., 2012). although there are a few studies on peer feedback exploring the role of feedback receivers during the feedback process in teams (e.g., prins et al., 2006), the interconnections between uptakes of feedback receivers and ongoing performance are still unclear. also, since the success of feedback in terms of an effective uptake from the receivers depends at least partially on the feedback quality provided by others, studying the uptake of standardised feedback (i.e., of constant quality and constant source) would allow us to isolate the learning effects of providing feedback.
in sum, there seems to be agreement that reaching an intersubjective understanding of the content of the feedback in teams by discussing what can be learned and worked out from past experiences is a potent factor augmenting feedback effectiveness (e.g., boud et al., 1985). despite a lack of direct evidence establishing the benefits of feedback processing behaviours, the consensus appears to be that the construct holds enough potential to warrant further investigation. the recent research strand on team reflexivity depicting the extent to which teams reflect upon and modify their functioning informs our understanding of these processing behaviours (schippers et al., 2013).

1.3 team reflexivity

in pedagogy, individual reflection, or reflective practice, can be traced back to the early 1900s (dewey, 1910, 1997) but was introduced more extensively into the field of professional learning by schön (1983) as professionals’ critical consideration of what they are working on while they are working on it. on a simple level, one can consider reflection in the past, present, and future tense. schön refers to ‘reflection-in-action’ as analysis in the present tense (i.e., reflection on the spot) and ‘reflection-on-action’ as analysis in the past tense (i.e., review of past actions). killion and todnem (1991) underline a lack of forward thinking implicit in dewey’s work and propose that reflection should also inform future action. thus, they added ‘reflection-for-action’ as reflection oriented towards the future (i.e., identification of guidelines to follow to succeed in the future). reflection as an individual critical thinking process has been recently extended to a view of reflection as a collaborative critical thinking process consisting of cognitive and social interactions between two or more individuals who examine their experiences to construct novel intersubjective understandings (boud et al., 1985; yukawa, 2006).
as such, it is considered a core team learning process. work on team learning has demonstrated that collective learning can be realised through iterative sequences of action, reflection, and implementation (dochy et al., 2014; edmondson, 1999). in the learning sciences, there are multiple labels denoting this concept of reflection at the team level. for example, the following terms have been used: collaborative reflection (morris & stew, 2007; yukawa, 2006), peer reflection, reflective self-explanation (rummel, spada, & hauser, 2009), or collective or social metacognition (mccarthy & garavan, 2008). in small group research, principally one label, “team reflexivity”, was introduced by west (1996) as a set of collaborative reflective behaviours and activities during which the team objectives, strategies, and processes are discussed openly. we use the term “team reflexivity” throughout this paper as a unique label for reflection at the team level. originally, the concept of team reflexivity was not explicitly connected to the feedback process. however, it is generally acknowledged that reflection is enabled by feedback to ensure accuracy in learning (hattie, 2013). as a result, team reflexivity can be conceptualised as ways teams collectively try to extract meaning and cues for future behaviours from received feedback, generate intentions and plans, and ultimately decide to act upon feedback. thus, when performance feedback that is merely evaluative is given to teams, the process that follows this feedback moment might be shared reflection on the task and the team process. the underlying assumption is that team feedback gives goal-oriented information but teams are still responsible for its mindful uptake.
it can be argued that reflective teams consider reasons, rationales, and evidence for this evaluation of past performance, and weigh alternative perspectives to construct a better understanding of their collective experience, which in turn better guides their future action (yukawa, 2006). three behaviours that reflect complementary dimensions of team reflexivity can hence be derived from previous work on team reflexivity across disciplines (e.g., schippers et al., 2007; yukawa, 2006): (a) evaluating present and past performance and strategies, (b) looking for alternatives, and (c) making decisions. evaluating refers to team members reviewing their goals, performance, strategies, and possible reasons behind success or failures. looking for alternatives occurs when teams make an inventory of possible ways to achieve the task. finally, making decisions consists of clearly stating a decision about how to handle the task differently and acting upon it. evaluating and looking for alternatives reflect the capability of the team to be aware of its behaviours and of the necessity to make changes. according to schippers and colleagues (2007), this is necessary but not sufficient to engage in change. teams also need to implement the adapted actions. this is reflected by our conceptualisation of “making decisions”, which depicts both the “intention to act” and “carrying out the decision”. hence, this suggests a time-ordered sequence of reflective behaviours that might constitute reflective cycles, although no empirical work supports the necessity of full three-phase sequences. overall, reflective teams have the ability to uncover why they succeeded or failed, solve misunderstandings, and correct their future approaches as new challenges emerge (tschan, semmer, nägele, & gurtner, 2000; wills & clerkin, 2009).
as a consequence, team reflexivity has been recognised as an important contributor to effective collaboration and performance (e.g., kramarski, 2004; rummel, mullins, & spada, 2012; schippers, den hartog, koopman, & van knippenberg, 2008; tjosvold, tang, & west, 2004; van ginkel, tindale, & knippenberg, 2009). however, in their review of small group research, moreland and mcminn (2010) draw attention to 1) the lack of significant relation between reflexivity and team performance found in some studies (e.g., edmondson, bohmer, & pisano, 2001; savelsbergh, van der heijden, & poell, 2009) and 2) relatively limited evidence of the effect of reflexivity on team performance (e.g., lewis, belliveau, herndon, & keller, 2007; müller, herbig, & petrovic, 2009). they concluded that reflexivity could be beneficial to team performance under certain circumstances. in the learning sciences, a similar trend has been observed: although team reflexivity in collaborative teams is highly important for the learning process, it does not always yield better learning gains (e.g., prinsen, terwel, zijlstra, & volman, 2013). given these mixed results, limitations of the small but growing research strand on team reflexivity need to be synthesised. first, reflexivity does not happen in a vacuum. teams will eventually adapt their operating methods and ways of working based on feedback cues from their environment. we could expect reflexivity to improve team performance only when teams have access to feedback describing their performance objectively and accurately (schippers et al., 2013). yet, reflexivity is seldom conceptualised as a process augmenting the effect of feedback on performance (seibert, 1999). moreover, little is known about how people reflect on feedback at the team level, while there has been empirical evidence of the effect of reflection upon feedback at the individual level (e.g., duijnhouwer et al., 2012).
for example, anseel, lievens, and schollaert (2009) tested the effect of feedback augmented with reflection at the individual level and demonstrated that the combined use of individual-level feedback and reflection improved performance more than individual feedback alone. at the team level, only one series of studies isolated the effect of feedback from its combination with reflection in computer-supported collaborative learning in high-school teams (phielix, prins, & kirschner, 2010; phielix, prins, kirschner, erkens, & jaspers, 2011). the authors expected shared self and peer assessment and shared reflection to have complementary effects. they did not find any significant effect of reflection alone or of the combined use of feedback and reflection on objective performance (i.e., grade), but demonstrated that the joint use of feedback and reflection led to higher group process satisfaction and social and cognitive behaviour. interestingly, they draw attention to the fact that feedback (based on peer and self-perceptions) and reflection did not decrease the unrealistic positive perceptions teams generally have about their own and others’ performance. this could be a reason for a lack of effect on objective performance. we do not know what the effects of external feedback based on objective criteria for task achievement are. second, as in most research in organisational psychology, the vast majority of small group research measuring team reflexivity in relation to team performance has used self-report instruments. self-report measures are limited by team members’ level of awareness of their own behaviours and states and by distorting biases such as social desirability. calls for studying reflective behaviours in teams have generally gone unheeded (west, 1996). in the learning sciences, dillenbourg, baker, blaye, and o’malley (1996) have advised researchers to zoom in on the ‘black box’ of collaborative processes.
subsequent to this call, there has been a recent proliferation of process-oriented research on collaboration (e.g., de wever, schellens, valcke, & van keer, 2005) that advanced our understanding of interaction features contributing to more effective learning. most of this collaborative learning research strand has focused on individual learning without explicitly investigating how collaborative processes influence team performance (e.g., janssen, kirschner, erkens, kirschner, & paas, 2010). nevertheless, these insights underscore the value of observational methods to provide crucial information about the context in which reflective behaviours occur and relate to team performance (e.g., chi, 1997; leicht, hunter, saluja, & messner, 2010). finally, one area in which our understanding is incomplete across disciplines concerns the role of time and timing of reflective behaviours and their relation with team performance (e.g., ballard, tschan, & waller, 2008; janssen et al., 2010; okhuysen & waller, 2002; reimann, 2007; waller, 1999). there is general agreement in the team literature that team performance is the product of ongoing and recurrent processes and actions (mcgrath, 1993). marks and colleagues (2001) conceptualise these cycles as performance episodes. performance episodes consist of repeated cycles of action phases (i.e., when teams perform an activity) and transition phases (or ‘interrupts’) between actions. these interrupts are opportunities for teams to stop and reflect on their progress in order to engage in change (okhuysen & waller, 2002). the most common conceptualisation of team reflexivity does stipulate that shared reflection can occur before, during, and after a task (west, 2000; schippers et al., 2007).
however, scholars have generally measured it as an overall working style (gurtner, tschan, semmer, & nagele, 2007) or as an aggregated measure of collaborative activities and have not differentiated the moments at which it occurs (e.g., lajoie & lu, 2012). specifically, they have not tested whether team reflexivity was more or less beneficial in certain phases, in relation to team performance dynamics, as suggested by certain authors (hoeg & parboteeah, 2006; janssen et al., 2010; schippers et al., 2003). this time-related issue of team reflexivity is elaborated upon in the following section.

1.3.1 time and timing of team reflexivity

we identify three primary issues in understanding the dynamic aspects of team reflexivity. first, although the importance of dynamic conditions experienced by teams over time is widely accepted (e.g., waller, 1999), empirical work on how team reflexivity changes over time is missing (e.g., janssen et al., 2010). on the one hand, it may be that overt communication is no longer needed as teams improve their implicit coordination over time, thus decreasing reflective interactions (e.g., entin & serfaty, 1999). additionally, teams tend to define their goals and strategies at an early stage of their work and not to deliberately review them after some work has been accomplished (argote, 1989; hackman & wageman, 2005). also, in line with arguments from schippers and colleagues (2003), reflexivity might decline over time in diverse teams as viewpoints and perspectives become incompatible. on the other hand, they suggest that this declining effect might be reduced by the provision of feedback. it may be that the availability of accurate performance data highlighting deficiencies, together with sustained task complexity, triggers learning needs for teams, perhaps calling for more reflection over time (rulke & rau, 2000).
as such, feedback provision occurring during transition can act as a formal mechanism, a temporal punctuation likely to encourage reflection without necessarily giving a predetermined framework to follow (okhuysen & waller, 2002). second, the role of timing of reflective behaviours is similarly not well understood. the scarce previous research on the question “does reflection during action and/or transition lead to better performance?” has shown mixed results in contexts without explicit feedback. for example, in a study conducted by moreland and mcminn (2010), none of the (scarce) reflective behaviours occurring during transition was significantly related to changes in team performance. by contrast, research looking into the impact of “interrupts” on group processes concluded that these were triggers of change in groups (okhuysen & eisenhardt, 2002). team members appear to naturally interrupt their work around the midpoint of the allocated time for task completion and to be more likely to put into practice strategies they set during these time-outs (gersick, 1989). concerning reflection during action, moreland and mcminn suggest that it might have more impact on performance. this reflection-in-action would be more directly related to the activities team members perform and thus prevent errors from being committed in real time. conversely, it is likely that reflection while performing has a cost, especially in a task that combines active processing of information and coordinated actions (kirschner, paas, & kirschner, 2009; schippers et al., 2013). that is, reflecting while executing a complex task places an extra burden on teams, which may overload their working memory and result in less optimal performance (kirschner et al., 2009). furthermore, it is generally recognised that what happens in the early part of the team interaction might provide insight into subsequent effectiveness (e.g., eriksen & dyer, 2004; kaplan, laport, & waller, 2013).
team decision-making literature has provided preliminary insights into this issue of timing of behaviours. it was shown that in teams with distributed information (i.e., comprising team members holding unique information), early agreements might harm decision quality because teams are less focused on exchanging and integrating distributed information (van ginkel et al., 2009; van ginkel & van knippenberg, 2009). accordingly, jumping too early into task completion might lead to process losses and performance decline (mathieu & rapp, 2008). by contrast, it was demonstrated that effective teams share and elaborate upon distributed information at the beginning of interaction (van ginkel et al., 2009). rulke and rau (2000) came to a similar conclusion when examining how teams develop a shared understanding of ‘who knows what’ in the team (i.e., transactive memory systems), which proved to be an important factor in achieving success. they observed that teams with high transactive memory systems were those whose members shared understandings and evaluated each other’s expertise early in their team interactions. in another example from computer-supported collaborative learning research, kapur, voiklis, and kinzer (2008) demonstrated that high-quality contributions at the beginning of a problem-solving process had more impact than those occurring later during team interactions. therefore, the temporal pattern within reflective interactions should be taken into account in further understanding team reflexivity upon feedback. also, presently, in the team reflexivity literature the question “do the three behaviours making up reflexivity have differential effects on subsequent performance at an early stage of team interaction?” remains unanswered. no empirical work has shown the necessity of a certain order of these reflective behaviours nor whether, and if so when, certain reflective behaviours are more conducive to better performance.
for example, if we look into team reflective behaviours individually, we do not know whether evaluating and looking for alternatives during teams’ first moment of interaction promote elaboration and understanding of the task, whereas making decisions at an early stage is detrimental to subsequent performance.

third, the direction of the relation between reflexivity and performance can be questioned (e.g., janssen et al., 2010). in line with the core assumption of previous research on team reflexivity, does reflexivity lead to better subsequent performance? alternatively or at the same time, do teams learn from previous performance, and thus reflect more as a consequence of how they performed previously? research still has to prove the theoretical claim that teams can learn from the past through reflection with clear sight of performance criteria and information about their attainment. only recently, schippers and colleagues (2013) have given indirect evidence in this regard. in their study, self-reported reflexivity was measured at two points in time in teams of students working on their bachelor thesis. the study showed that low-performing teams had the capability to translate information from performance feedback into effective task approaches. however, students were only given a grade and not feedback describing attainment of specific performance criteria. as suggested by waller, gupta, and giambatista (2004), the timing of errors and subsequent behaviours has to be recorded to answer the question of causality. further, investigating the timing of reflective activities in teams can help detect the points at which team reflexivity occurs and may need to be supported (lajoie & lu, 2012).

1.4 the present study

in the present study, we seek to understand the dynamics of team reflexivity and the relation (uni- or bidirectional) between team reflexivity and performance. therefore, we explore the two following questions.
1) does the occurrence of team reflexivity increase or decline over time during action and transition phases of teamwork?
2) how is the timing at which reflective behaviour occurs related to performance? specifically, (a) is reflexivity during action and/or during transition related to higher performance? (b) does each behaviour making up team reflexivity have the same impact on performance when occurring during teams’ first moment of interaction?

2. method

2.1 participants

sixty-four students (32 males and 32 females) were recruited from a university in the netherlands and randomly assigned to thirty-two dyads (n = 32). their ages ranged from 18 to 29 years, m = 22.3, sd = 2.4. participants were not eligible if either of the following exclusion criteria was present: experience with flight or related simulations, or familiarity with each other. they were paired either with a same-gender partner (female and male teams, n = 11 and n = 11 respectively) or with a different-gender partner (mixed teams, n = 10). by random assignment, half of the sample was assigned to the role of pilot and the other half to the role of co-pilot. subjects participated voluntarily in exchange for vouchers.

2.2 task

participants in the roles of pilot and co-pilot were required to complete four landing missions of the computer simulation “microsoft flight simulator x”. the task, a complex, fast-paced, and high-workload situation, was chosen to stimulate ongoing learning in a controlled environment. cognitively complex and interactive simulation tasks, such as flight simulations, are commonly used in team research to investigate processes related to team performance (e.g., bowers, salas, prince, & brannick, 1992; villado & arthur, 2013).
we did not use this computer simulation to mimic real-work team environments but rather to examine a set of theoretical relations (i.e., nomological network) among constructs within specific and controlled boundaries: a complex, fast-paced, and high-workload situation in which team members with unequally distributed information have to learn from each other to achieve their team goal and extend their learning to more complex variations of the task (marks, 2000). to prevent well-performing teams from having less need to learn as a consequence of their reflection on performance (schippers et al., 2013), the level of complexity of the missions increased gradually over time. the abundance of information teams received before and during the missions and the high level of interdependence between pilot and co-pilot ensured a high level of complexity across performance episodes. in each mission, teams had to follow a predetermined traffic pattern during which they were required to maintain appropriate levels of speed and altitude and a correct configuration of the airplane. a mission was completed when the team managed to land safely on the runway. the computer was connected to a whiteboard on which the game was screened.

2.3 procedure

the whole session lasted approximately two and a half hours. after an introduction to the procedure and random assignment to the role of pilot or co-pilot, participants were individually trained for forty-five minutes. items of information necessary for achieving a good landing were distributed between the team members. pilots and co-pilots were seated in separate rooms to study the task material containing critical role-specific knowledge of piloting or monitoring the aircraft. the task of the pilot was to fly the plane and operate the joystick. for that purpose, pilots received an additional 10-minute hands-on training to practice.
the task of the co-pilot was to control the gas of the plane and provide the pilot with indications and directions. only the co-pilot had access to the air traffic control (atc) instructions, given through headphones, and knew how to interpret the cockpit instruments. after the training, participants were seated together to complete four landing missions. teams had up to fifteen minutes to complete each mission and were also allowed to restart a mission if they had crashed. before each mission the team received a written description of the flight objectives and the general mission scenario. moreover, before starting missions 2, 3, and 4, teams were given specific performance feedback about their previous performance. performance feedback described the attainment of success criteria such as speed, altitude, rate of descent, pitch, touchdown, and traffic pattern. the participants were allowed to communicate freely with one another. all teams were videotaped.

2.4 measures

2.4.1 team performance

two performance scores were computed: the total number of errors during a mission and the number of times teams crashed. the number of errors was derived from an instrument rating objective performance criteria (e.g., speed, altitude, activation of flaps and landing gear, landing position) of a good landing approach. this instrument was based on two sources: firstly, to identify key factors of a good flight, we performed a task analysis with a flight expert. secondly, we used tests that the game itself provides its players to refine these criteria. examples of deficiencies (i.e., errors) included failure to extend the flaps before landing, to maintain a certain speed interval during descent, to reduce the speed before touchdown, to keep a constant rate of descent, to align with the runway, or to have one touchdown on the runway. the total number of potential errors varied in the four missions.
this variation reflects the increasing level of difficulty of the missions. we chose the number of crashes because it is one of the most salient manifestations of performance for participants. there were four measurement times in total (i.e., t1, t2, t3, and t4).

2.4.2 categories of team reflective behaviours

team communication was coded to identify representative behaviours that could be taken as evidence of team reflexivity (rourke & anderson, 2004). we developed the coding scheme of the present study through a series of steps assuring its validity and reliability (schippers et al., 2007). first, we determined the granularity of the unit of analysis. the unit of meaning was applied (rourke, anderson, garrison, & archer, 2001). specifically, to consider a verbal statement a significant unit, we decided that utterances had to be individual messages (questions or statements) that 1) were expressed by one team member, 2) dealt with one topic, idea, or argument chain, 3) reflected one unique behaviour, and 4) related to the topic at hand or the team. thus, one semantic feature (unit of meaning) and one activity feature (team member speaking) were used for segmentation of the communication content into units (chi, 1997). as such, as soon as the topic or the speaker changed, a new behaviour was coded (visschers-pleijers, dolmans, de leng, wolfhagen, & van der vleuten, 2006). in addition to verbal statements, one unambiguous non-verbal behaviour was treated as evidence of one of the reflective behaviours. second, we discriminated the verbal interaction types that typified reflexivity. to do this, we adapted and expanded the initial framework of team reflexivity (west, 1996, 2000) and an existing questionnaire from schippers and colleagues (2003). reflexivity was originally defined as an iterative process including three broader behaviours, namely reflection, planning, and acting/adapting.
as shown in table 1, the coding scheme covers three reflective behaviours: evaluating or reviewing present or past team performance and strategies, looking for alternatives, and making decisions. information directly forwarded from the atc (repetitions) and the literal reading of the feedback form were excluded from the coding.

table 1
the coding scheme for the content analysis of team reflexivity

category: evaluating or reviewing performance or strategies
description: statements or questions about team performance (e.g., whether the team does/did well, is/was on the right track according to plans or received instructions), the goal of the mission and its requirements, actions and strategies (mis)used, reasons behind success, failure, or problems (e.g., he/she gives examples of behaviours, task or team strategies that may explain why they achieved success or encountered problems during this mission).
examples: “we are going in the wrong direction.” “we crashed because we were always too fast.” “something went wrong, maybe the nose of the plane went too low?”

category: looking for alternatives
description: suggestions or discussions of alternatives in how they approached the task (at the task or team levels) and of the sequence of actions undertaken. in other words, teams discuss how they could do or could have done differently.
examples: “we could have reduced the speed by pitching up or reducing the throttle.” “we could lower the speed by extending the flaps, pitching up, or lower the gas.” “we could either make a u-turn either still try to lower speed and make a sharp descent.”

category: making decisions
description: statements clearly depicting a decision about a new direction to take or observable behaviours following a decision. team members’ utterances depicting very explicit decisions about the way they were going to approach the task or work as a team, explicit statements about the intention to follow decisions made within the team, and explicit reaction to a decision by an action (e.g., by pressing the flaps, pulling the gas controller).
examples: “we are going to make a u-turn” or “this time, you look at the speed indicator and i will pitch down”.

third, we ran a pilot study to test and validate the coding scheme. this led to adaptations, clarifications of the reasoning behind the framework definitions and the boundaries of the units, and the addition of typical examples. fourth, we extensively trained two coders, each blind to the hypotheses of the study, to optimise reliability and consequently reduce errors in observation. they were provided with clear examples (of inclusion and exclusion) of the manifestation of the behaviours and completed rating exercises with multiple rounds and discussions to attain consistency among coders. fifth, videotapes were coded with the newly developed coding scheme. finally, the two coders independently coded one-third of the videotapes to estimate interrater reliability (cohen’s kappa). kappas were calculated for all the categories. these kappas ranged from 0.65 to 0.88, with an average of 0.78, indicating “substantial” to “almost perfect” agreement between the two coders as to the occurrence of the specific behaviours (landis & koch, 1977). coders and two trainers resolved any discrepancies. the research design is displayed in figure 1.

figure 1. overview of the phases of the study and behaviours measured at each wave of data.

3. analyses

for the coding, we used the observer® xt 10.5, a software package for the quantitative analysis of observational data. videotapes were directly coded without transcripts.
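as an illustration of the interrater reliability estimate mentioned in the coding procedure, the following minimal sketch computes cohen's kappa for two coders. it assumes that both coders labelled the same list of units; the labels, the example data, and the function name `cohens_kappa` are hypothetical and are not taken from the study, which does not report how units were aligned between coders.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """cohen's kappa for two coders who labelled the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(coder_a)
    # observed agreement: share of units given the same label by both coders
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # chance agreement: product of the coders' marginal proportions per label
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# hypothetical codes for eight units (not data from the study)
coder_a = ["evaluating", "evaluating", "decision", "alternative",
           "decision", "evaluating", "none", "none"]
coder_b = ["evaluating", "decision", "decision", "alternative",
           "decision", "evaluating", "none", "evaluating"]
print(round(cohens_kappa(coder_a, coder_b), 2))
```

values of kappa are commonly read against the benchmarks of landis and koch (1977), which is the interpretation used for the range reported above.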
the extent to which teams engaged in each behaviour was expressed as the frequency of occurrence of that behaviour among team members for each mission (i.e., action phase) and for each feedback (i.e., transition) phase. additionally, we computed an overall team reflexivity score (i.e., aggregation of the three behaviours) for each measurement time for action and feedback. we coded acts on the basis of utterances of reflective behaviours occurring at the team level of analysis (i.e., aggregation of individual utterances for each team). besides utterances, we examined the phases (the “when”: action or transition, earlier or later interaction) in which performance events (i.e., crashes and errors) were related to team reflective behaviours. behaviours were coded at four points in time during action and feedback phases (time 1, time 2, time 3, and time 4) for each team (n = 32).

4. results

in the following section, we first present an overview of the frequencies of behaviours considered individually and of the frequencies of sequences comprising two or three behaviours. second, we test whether reflective behaviours change over time during action and feedback using repeated-measures analyses of variance. third, correlations between team reflective behaviours and performance are examined, more specifically (a) the relations between prior performance and subsequent team reflexivity and (b) the relations between prior team reflexivity and subsequent performance.

4.1 frequencies of behaviours at the team level of analysis

figures 2, 3, and 4 depict means and standard deviations of the reflective categories across time during action and feedback phases. it can be seen that evaluating is the most frequent reflective behaviour. looking for alternatives during action at time 1 appears very scarce. during feedback, it seems that looking for alternatives is not a frequent practice. the same trend can be noted for making decisions.
making decisions is more frequent during missions than after feedback reception. it has to be noted that standard deviations reflect important differences between teams. in sum, reflective behaviours, when examined individually, tend to follow a similar pattern: they are more frequent during action and infrequent during feedback.

figure 2. means and standard deviations of evaluating for each measurement time and phase (n = 32).

figure 3. means and standard deviations of looking for alternatives for each measurement time and phase (n = 32).

figure 4. means and standard deviations of making decisions for each measurement time and phase (n = 32).

4.2 sequences of evaluative behaviours

the coded reflective acts described above are single communication behaviours. in the present study, team reflexivity was conceptualised as a collection of three behaviours. we explored whether teams actually completed full “reflective cycles” comprising all three behaviours in a sequence. since there has been no empirical work demonstrating the necessity of all three behaviours, we also considered the most basic behavioural patterns consisting of two subsequent reflective behaviours. as can be seen in table 2, most reflective communication across teams can be summarised by two main two-behaviour sequences: sequences starting with evaluating or with looking for alternatives and ending with clear decisions about a different way to handle the task. in these sequences, teams “skipped” the evaluation or search for alternatives phases. while sequences of looking for alternatives followed by decisions seemed to grow over time during action, except for the last mission, the frequencies of the sequences starting with an evaluative comment and ending with a decision stayed relatively stable over time, except for a drop at time 2. during feedback, the same trend as for individual behaviours is observed; sequences are very scarce.
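the sequence counts discussed above can be illustrated with a small sketch. it assumes that each phase of a team is available as an ordered list of codes (b1 = evaluating, b2 = looking for alternatives, b3 = making decisions) and that a sequence consists of directly consecutive codes, with a full b1-b2-b3 cycle counted once rather than as two pairs. these rules and the function name `count_sequences` are assumptions made for illustration; the study does not spell out its exact counting procedure.

```python
from collections import Counter

# two-behaviour sequences of interest and the full reflective cycle
PAIRS = {("b1", "b2"), ("b1", "b3"), ("b2", "b3")}
CYCLE = ("b1", "b2", "b3")


def count_sequences(codes):
    """count non-overlapping sequences in one ordered list of codes."""
    counts = Counter()
    i = 0
    while i < len(codes):
        if tuple(codes[i:i + 3]) == CYCLE:
            # a full cycle takes precedence over its constituent pairs
            counts["b1-b2-b3"] += 1
            i += 3
        elif tuple(codes[i:i + 2]) in PAIRS:
            counts["-".join(codes[i:i + 2])] += 1
            i += 2
        else:
            i += 1
    return counts


# hypothetical coded stream for one team during one action phase
stream = ["b1", "b3", "b1", "b1", "b2", "b3", "b2", "b3"]
result = count_sequences(stream)
for name in ("b1-b2", "b1-b3", "b2-b3", "b1-b2-b3"):
    print(name, result[name])
```

summing such counts over teams for each phase would give one cell of a frequency table like table 2, under the stated assumptions.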
importantly, full cycles were almost never completed, suggesting that teams were not naturally systematic in their reflective process. the absence of reflective cycles does not allow us to further investigate their change over time and relatedness to team performance. still, how individual reflective behaviours evolve over time and are related to team performance is of importance to map 1) how teams naturally respond to feedback and 2) whether some behaviours appear more important than others in the feedback process.

table 2
frequencies of sequences of reflective behaviours at each wave of data (n = 32)

sequences   time 1             time 2             time 3             time 4
            action  feedback   action  feedback   action  feedback   action
b1-b2       2       2          3       0          15      1          9
b1-b3       30      14         15      7          29      0          29
b2-b3       2       2          18      0          30      3          23
b1-b2-b3    0       0          1       0          2       0          0

notes. b1 = evaluating, b2 = looking for alternatives, b3 = making decisions.

4.3 does team reflexivity change over time?

to test for significant changes of team reflexivity behaviours (considered individually) over time, we computed repeated-measures analyses of variance with a greenhouse-geisser correction (as sphericity was violated for all behaviours) and with the four times each behaviour was measured as a within-team factor. pairwise comparisons with bonferroni corrections controlling for inflation of type i error were also computed. evaluating during action changed significantly from time 1 to time 4, f(2.26, 65.67) = 5.04, p = .05, with pairwise comparisons showing that evaluating at time 4 was significantly more frequent than evaluating at time 2 (p = .019). in contrast, evaluating during feedback did not change over time. regarding looking for alternatives during action, there was also an overall significant difference between the means at the different time points, f(2.24, 64.84) = 5.76, p = .004.
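the fractional degrees of freedom in the f tests above are what a greenhouse-geisser correction produces: the uncorrected degrees of freedom are multiplied by an estimate of epsilon, which equals 1 under sphericity and has a lower bound of 1/(k - 1) for k repeated measurements. the following sketch computes the standard estimate from a teams-by-time table of scores. the data and function names are hypothetical, and the sketch is not the procedure of the statistical package used in the study.

```python
def greenhouse_geisser_epsilon(scores):
    """estimate epsilon from rows of k repeated measurements (one row per team)."""
    n = len(scores)
    k = len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    # sample covariance matrix of the k measurement occasions
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in scores) / (n - 1)
            for b in range(k)] for a in range(k)]
    # double-centre the covariance matrix (it is symmetric, so row means suffice)
    row_means = [sum(cov[a]) / k for a in range(k)]
    grand_mean = sum(row_means) / k
    centred = [[cov[a][b] - row_means[a] - row_means[b] + grand_mean
                for b in range(k)] for a in range(k)]
    trace = sum(centred[a][a] for a in range(k))
    sum_of_squares = sum(value * value for row in centred for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def corrected_df(epsilon, k, n):
    """degrees of freedom of the within-team f test after correction."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# hypothetical frequencies of one behaviour for five teams at four times
frequencies = [[2, 5, 6, 9], [1, 3, 7, 8], [4, 4, 5, 12], [0, 2, 9, 7], [3, 6, 6, 10]]
eps = greenhouse_geisser_epsilon(frequencies)
print(eps, corrected_df(eps, k=4, n=len(frequencies)))
```

the smaller epsilon is, the stronger the departure from sphericity and the more the degrees of freedom shrink, which makes the test more conservative.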
pairwise comparisons for looking for alternatives during action confirmed a difference between time 1 and time 2 (p = .034), time 1 and time 3 (p < .001), and time 1 and time 4 (p = .0006), signifying an increase of the behaviour over missions. looking for alternatives during feedback and making decisions during action and during feedback did not change significantly over time with a bonferroni correction. with tukey’s test, making decisions during action at time 4 was significantly higher than at time 3 (p = .043) and time 2 (p = .045). finally, while overall reflexivity across feedback did not change significantly over time, the mean scores for overall reflexivity across missions (i.e., aggregated behaviours) were significantly different, f(2.33, 67.56) = 5.46, p = .004. specifically, reflexivity during mission at time 4 was higher than reflexivity during mission at time 1 (p = .027) and time 2 (p = .035). it is worth noting that if the less conservative tukey post hoc test is used, the score at time 3 is also higher than at time 1 (p = .015) and time 2 (p = .043).

4.4 is team reflexivity related to team performance?

intercorrelations between reflective behaviours and performance measures are presented per time period in table 3.

table 3
correlations between coded categories and performance measures at each point in time (n = 32)

columns: errors mission 1, crash mission 1, errors mission 2, crash mission 2, errors mission 3, crash mission 3, errors mission 4, crash mission 4

evaluating mission t1: .46**
evaluating mission t2: .45* .43*
evaluating mission t3: .50*
evaluating mission t4: -.42*
evaluating feedback t1:
evaluating feedback t2: -.45* -.42*
evaluating feedback t3: -.45*
alternatives mission t1:
alternatives mission t2: .61**
alternatives mission t3: .36* .38*
alternatives mission t4:
alternatives feedback t1:
alternatives feedback t2: .51**
alternatives feedback t3:
decisions mission t1: .46** .48**
decisions mission t2: .51** .47** .51**
decisions mission t3: .48**
decisions mission t4: -.42* -.43*
decisions feedback t1: .45* .39* .38*
decisions feedback t2: .43* .43*
decisions feedback t3:

note. *p < .05. **p < .01. t1 = time 1; t2 = mission 2; t3 = mission 3; t4 = mission 4. only significant correlations are indicated.

4.4.1 the impact of prior team performance on team reflexivity

at first glance, initial errors seem to be beneficial to subsequent reflective behaviours, while the tendency of errors to trigger reflection declines with time. specifically, the number of errors teams made during the first action phase was positively correlated with numerous reflective behaviours and with all reflective categories. the number of crashes teams initially faced followed the same path. teams did not seem to be discouraged by their first experience with crashing. for example, the initial number of crashes was positively related to more decision-making behaviours after it occurred, both during the feedback phase following the “failure” (r = .39, p < .05) and during the next action at time 2 (r = .47, p < .01).
in contrast, errors committed at times 2 and 3 were not related to higher subsequent reflective behaviours, with the exception of errors at time 2 appearing as a trigger for decision making during the transition phase immediately following that mission (r = .43, p < .05). instead, errors at times 2 and 3 seem to hamper decision making during action at time 4 (respectively r = -.42, p < .05 and r = -.43, p < .05), while the number of crashes at time 2 was related to less frequent evaluative behaviours during feedback at time 2 (r = -.45, p < .05) and during action at time 4 (r = -.42, p < .05). in sum, these first results seem to suggest that initial failure acts as an eye opener to evaluate what went wrong, look for alternative ways of approaching the task, and make more decisions, while later, as the task becomes more complex, failure might relate to less subsequent reflection.

4.4.2 the impact of team reflexivity on subsequent team performance

concerning the impact of team reflexivity on subsequent performance, the correlations indicate a differential impact depending on the type of behaviour considered. the extent to which teams evaluated their performance and strategies during feedback at time 3 was related to a lower number of crashes at time 4 (r = -.45, p < .05). similarly, the extent to which teams engaged in evaluative behaviours during feedback at time 2 was significantly related to fewer errors at time 3 (r = -.42, p < .05). what is particularly noticeable is that these significant relationships concern 1) evaluative behaviours and 2) periods of team transition, suggesting a positive effect of processing feedback on subsequent performance. reflection during action does not seem to be significantly related to subsequent performance when bivariate correlations are computed.
in contrast, the extent to which teams made decisions during action at time 1 was related to more crashes at time 3 (r = .48, p < .01), suggesting that making decisions during the initial moments of team development on this novel task (with its specific characteristics) may impede subsequent performance.

5. discussion

to uncover when giving teams feedback about their performance creates an opportunity for learning, it has been posited that it is important to examine how teams actively process feedback and thus collaboratively evaluate information about past activities and derive alternative recommendations for next action. however, research on this feedback processing has been scarce in the learning sciences and organisational psychology (gabelica et al., 2012; london & sessa, 2006; phielix et al., 2010). in particular, the specific activities teams perform to deal with feedback, and when these activities are related to performance, remain to be examined. following from these perceived gaps, we conducted a study attempting to build upon and extend research in this domain by 1) identifying actual behaviours enabling feedback processing (i.e., team reflexivity) and 2) providing a more fine-grained analysis of dynamic aspects of team reflexivity in a context with systematic and explicit feedback. while theoretical work seems to suggest that team reflexivity is an iterative three-step cycle that involves evaluating performance and strategies, looking for alternatives, and making decisions (e.g., schippers et al., 2007; yukawa, 2006), we do not know whether, to perform well, it is necessary to follow this series of steps, nor whether some steps are more dominant and influential than others in relation to team performance. we explored the development of team reflective behaviours and the relation between timing of reflective behaviours and performance during four performance episodes. the following conclusions can be drawn.
firstly, teams almost never completed full cycles of evaluating, looking for alternatives, and making decisions. they did, however, complete two-behaviour sequences starting with evaluating or looking for alternatives and ending with a clear decision. moreover, these sequences were very scarce during transition (i.e., interrupts during which team performance feedback was delivered). while the reflective cycle is usually described sequentially, teams seem to rarely follow a rigid series of steps to deal with feedback. instead, it seems that they often skip steps or even go back through steps several times. when taken individually, team reflective behaviours were overall more frequent and increased over time during action, whereas team reflexivity during transition was relatively less frequent and did not change. looking for alternatives was very scarce but also increased over time, during action only. there might be two reasons for this growth of reflection-in-action. first, natural reflection might arise as a response to an immediate learning need when a team observes cues of ineffective behaviours or experiences misunderstanding or uncertainty (e.g., they get lost) while completing the task. additionally, this learning need could have been triggered by the increasing complexity of the task (i.e., more cues to understand and interpret). second, preceding feedback, which has been advanced as a way to counteract the natural decline of team reflexivity (schippers et al., 2003), could also have had a delayed effect on the next reflection during action. in the learning sciences, feedback has been shown to impact subsequent learning (e.g., mory, 2003). it may be that teams only see the learning content of feedback when they have to deal with a similar situation. this high reflexivity-in-action suggests that teams are more reactive than proactive and adaptive to anticipated circumstances when faced with higher complexity and workload.
this is in line with a common rationale some authors have been using to speculate on the causes of a lack of actual reflection in teams (arvaja, häkkinen, eteläpelto, & rasku-puttonen, 2000; morris & stew, 2007). they state that teams are more driven by their results (i.e., producing and performing) than by the learning they can gain, especially when they are under pressure and despite the obvious benefits of strategy development (gurtner et al., 2007; karau & kelly, 1992). a possible explanation of why teams did not use transition phases to increase reflection in the same way lies in teams’ tendency to set their goals and strategies early and not to actively question them later (argote, 1989; hackman & wageman, 2005; weingart, 1992). further, it may be that the concurrent cognitive demands of collaborating on the complex task, making sense of the received feedback, and trying to prepare for the next task were too high for teams to make the most of their learning experience (kirschner et al., 2009; rummel & spada, 2009). reflecting is a challenging higher-order activity (jay & johnson, 2001). the concept of reflection assumes that individuals have the capacity to engage in self-examination and open-minded analysis of their own knowledge. additionally, even if team members can reflect in solo-learning situations, they may not be able to coordinate and co-reflect in a team, communicate their reflective thoughts, or agree on ways to address the task (chan, 2012).

secondly, we uncovered patterns that transcended the straightforward question of whether team reflexivity can change performance in teams given explicit feedback. it seems that the key question is rather when improvements occur (lajoie & lu, 2012). as signified in recent research on team reflexivity (e.g., moreland & mcminn, 2010; schippers et al., 2013), reflexivity was not uniformly beneficial.
we showed instances in which the timing of reflective behaviours determined their effect (positive, negative, or neutral) on performance. first, early decision making was related to lower subsequent outcomes. these findings are in accordance with previous studies on timing of decision making indicating that early decision making might prevent deep processing and sharing of unevenly distributed information (van ginkel et al., 2009; van ginkel & van knippenberg, 2008). second, after that first experience with their task, teams were able to derive insights from past performance (depicted by feedback) and correct misunderstandings that prevented effective action (tjosvold et al., 2004). however, only evaluative behaviours performed during feedback phases were related to later improved performance. for example, teams were able to reduce their errors in the last high-workload task with more preparation (i.e., evaluating during preceding feedback time). as such, the effect of evaluative behaviours seems to be contingent on the phase during which they are performed (during transition rather than during action). this points to a paradox: though they could experience positive consequences of reflecting during these time outs, teams did not increase reflection during feedback over time. third, reflection during action does not seem to be significantly related to better subsequent performance when bivariate correlations are computed. as such, no empirical support could be found for moreland and mcminn’s (2010) proposition that reflecting during the task could be more beneficial to teams due to the immediate possibility to adjust to the situation. however, this (increasing) reflection during action did not harm team performance either and showed that, in general, teams remained connected despite the increasingly high workload of their task. communication breakdowns could have been expected.
previous team research on non-routine working situations (e.g., waller et al., 2004) has demonstrated that low-performing teams become cognitively overwhelmed in case of high workload and consequently tend to focus more on their individual sub-task instead of collaborating (e.g., salas, rosen, & king, 2007). finally, initial errors appeared to be a driver of subsequent team reflexivity while later errors mostly did not have this motivational role. these preliminary findings open up the possibility that there might be some time-specific effects of previous errors that are decisive in triggering motivation to improve. maybe the first mistake does not really hurt while repeated errors would be more detrimental to performance? importantly, these results raise the question of the causality of the relation between team reflexivity and performance (e.g., janssen et al., 2010). these analyses seem to show a more dynamic and retroactive relation between past and future performance and team reflexivity, raising the need for time-series analyses. this is, however, beyond the scope of the present explorative study. these findings suggest that the question “does prior performance trigger team reflexivity or does prior team reflexivity generate better performance?” should rather be changed into: when do performance and team reflexivity dynamically interact to trigger team learning and better subsequent performance? as we found a reversed effect as well, it could be argued that initial errors do matter and that teams have the capability to learn from them under certain circumstances. taken together, these findings underscore the importance of a careful evaluation of how the team is doing and why during transition phases corresponding to feedback reception.
we could not empirically test the effects of three-step reflective cycles since they did not naturally occur, but we provided evidence that teams using the feedback opportunity to stop and analyse their performance and strategies were able to translate information about performance into corrective behaviours, since their performance improved. these results are in line with studies on the impact of interrupts (e.g., okhuysen & waller, 2002) and theories on feedback in teams that stipulate that feedback receivers’ involvement plays a critical role in explaining feedback effectiveness in teams (e.g., gabelica et al., 2012).

6. limitations and future directions

although we have specified behaviours signifying team reflexivity, we have not explored depth of processing (volet, summers, & thurman, 2009). it is likely that deeper reflection (e.g., reflective statements conveying inferences) generates better insight into the feedback content and its use in subsequent tasks (anseel et al., 2009). also, previous research has acknowledged that developing strategies for action is important and necessary but does not always ensure actual strategy implementation (gurtner et al., 2007; marks et al., 2001; tschan et al., 2000). our third reflective category (i.e., making decisions) encompassed both strategy development (i.e., clear decisions about new strategies and actions) and strategy implementation (i.e., clear gesture or overt behaviour showing the team acting upon a decision). despite the fact that teams were provided with feedback based on specific criteria, improvement strategies defined by teams could still have been too general or abstract to be directly put into action, or coordination problems could have rendered the well-defined plans unused. a further investigation of implementation strategies and their quality seems warranted.

overall, this empirical study was designed to meet methodological requirements of a rigorous design with high internal validity, based on its temporal sequencing and the collection of objective performance data. capturing fundamental processes in a controlled environment is a first step to better understand complex phenomena. to that purpose, we used a flight simulation as a research platform to simulate and control task and team features. simulations constitute an interesting environment that offers standardised performance measures with possibilities of controlling complexity, information overload, and cues available from the environment (mathieu, 2000). from a learning perspective, they also allow learners to apply their knowledge and understanding to a task and observe the effects of their decisions in a reactive environment that offers real-time feedback (bronack, riedl, & tashner, 2006; gredler, 2004). prior work on simulations has shown that they stimulate numerous cognitive processes, such as higher-level reasoning or creative thinking (moreno & mayer, 2005; moreno, mayer, spires, & lester, 2001). as a trade-off, laboratory environments overlook natural factors in real-world contexts that may mediate learning. as such, the extent to which the results from our controlled design can be generalised to real ad hoc teams has to be considered with caution. the artificial and temporary nature of the team and the limited number of team members must be acknowledged. teams of more than two members and/or teams whose members know each other before completing a team task might exhibit more complex interaction patterns. in this regard, we acknowledge that there has been a recent debate about whether findings from research on dyads can be simply generalised to larger teams (moreland, 2010; williams, 2010).
we are aware that certain aspects of team processes and dynamics (e.g., group socialization) can hardly be grasped by the use of dyads and that the addition of team members increases complexity of team communication and coordination (michinov & michinov, 2009; noroozi et al., 2012). however, since research into the dynamic aspects of co-reflection in teams is a relatively new area, the present study looking into the timing of basic team behaviours in the smallest form of teams provides a useful starting point for further research. a replication study with triads and larger groups is needed to corroborate our results and explore the relationship between team size and successful team reflexivity. moreover, although novelty of the task was controlled, we did not account for group-ability composition, even though research has demonstrated its influence on the accuracy and quality of explanations in teams (webb et al., 1998). finally, the use of students is sometimes considered a possible limitation. nevertheless, previous studies have established that little difference exists between student and professional teams when using problem-solving and decision-making scenarios (balijepally et al., 2009; yoo & alavi, 2001). still, further research exploring the effects of reflexivity with explicit feedback with more team members and in different settings will be needed to understand the complexity of how team members with different expertise, knowledge, and possibly high diversity deal with feedback that describes the aggregated group effort towards a shared goal. furthermore, future field studies need to consider critical contextual factors that influence and constrain team behaviours (kozlowski & ilgen, 2006). another limitation of the study is the relatively low frequency of some reflective behaviours (e.g., looking for alternatives).
similarly, sequences of reflective behaviours did not occur very often, limiting the possible analyses relating these to performance improvements. this necessitates caution about drawing premature conclusions and underlines the need for replication studies with larger samples. additionally, this limitation could be overcome in future research by stimulating or training teams to become reflexive (e.g., king, 1991) and comparing occurrences of reflective behaviours and their relation with team performance with a no-training condition. another challenging issue is motivational: we do not know why teams did not frequently reflect after feedback. motivational factors behind a lack of reflection and receptivity to feedback should be investigated in further research. finally, the relation between team performance (signified by accurate and specific feedback) and team reflexivity should be analysed in longitudinal designs. additional measurement points of team reflexivity spaced over time would provide a more fine-tuned understanding of the circumstances under which previous performance has more impact on learning. feedback loops in which previous performance acts as an input for determining subsequent processes and performance have recently been put forward as relevant models to understand team dynamics (ilgen et al., 2005).

7. practical implications

the present study suggests that to be effective, feedback requires high levels of cognitive engagement from learning teams. however, discussion of and reflection on underlying reasons for success or failure, alternatives, and improvement strategies does not seem to happen spontaneously in teams, which in turn brings out the need to provide them with appropriate external support. this has potential implications for education.
first, teachers should uncover whether a lack of engagement in thoughtful analysis of team experience is due to a lack of ability to perform shared reflection (i.e., availability deficiency) or a lack of execution of available skills (i.e., production deficiency). if students know how to reflect as a team, prompts (i.e., scenarios indicating how learners should interact) designed to induce inferences and deep-oriented processing of the content of the feedback appear an appropriate intervention to enhance teams’ motivation to engage in team reflexivity and to elicit learning strategies which teams would not naturally demonstrate (veenman & elshout, 1999). if students have not yet acquired the skills needed to perform shared reflection, teachers may first model and organise repeated practice with reflective cycles and provide more guidance and structure in their prompts (king, 2007). however, it may be that when students have attained a high level of reflective skills, prompts need to be withdrawn or faded out to facilitate internalization (dillenbourg & tchounikine, 2007). second, teachers should plan time and space for feedback, in which errors are considered as learning opportunities, and provide tools enabling students to actually perform reflective activities on feedback (gan & hattie, 2014).

keypoints

there is a lack of evidence in team and collaborative learning research on the role of team reflexivity when team performance feedback is provided.
this empirical (explorative) study provides a better understanding of how teams actively process feedback they receive and thus collaboratively evaluate information about past activities and derive better solutions for next action and better team performance.
theoretically, team reflexivity is often seen as a process consisting of a series of steps, including evaluating performance and strategies, looking for alternatives, and making a clear decision about how to implement changes. however, no empirical work has confirmed this sequential view. our results show that teams seldom engage in full reflective cycles following feedback reception.
when looking into reflective behaviours individually, our study shows that teams that analyse their performance and strategies and underlying reasons for success or failure during feedback (after action) are able to improve their subsequent team performance. by contrast, it seems that hasty decision-making (occurring at the beginning of team interaction) might be detrimental to future performance.
we question the fundamental idea of one-way causality between team reflexivity and team performance. while we find instances of better subsequent performance after mutual reflection upon feedback, we also find that initial mistakes promote team reflexivity.

acknowledgements

the authors are very grateful to the two coders of the present study, claudia baudewijns and lubomira nikolova.

references

anseel, f., lievens, f., & schollaert, e. (2009). reflection as a strategy to enhance task performance after feedback. organizational behavior and human decision processes, 110, 23-35. doi:10.1016/j.obhdp.2009.05.003
adcroft, a. (2011). the mythology of feedback. higher education research and development, 30(4), 405-419. doi:10.1080/07294360.2010.526096
argote, l. (1989). to centralize or not to centralize: the effects of uncertainty and threat on group structure and performance. organizational behavior and human decision processes, 43, 58–74. doi:10.1016/j.bbr.2011.03.031
arvaja, m., häkkinen, p., eteläpelto, a., & rasku-puttonen, h. (2000). collaborative processes during report writing of a science learning project: the nature of discourse as a function of task requirements. european journal of psychology of education, 15(4), 455-466. doi:10.1007/bf03172987
austin, j., kessler, m. l., riccobono, j. e., & bailey, j. s. (1996).
using feedback and reinforcement to improve the performance and safety of a roofing crew. journal of organizational behavior management, 16(2), 49-75. doi:10.1300/j075v16n02_04
ballard, d. i., tschan, f., & waller, m. j. (2008). all in the timing: considering time at multiple stages of group research. small group research, 39, 328–351. doi:10.1177/1046496408317036
balijepally, v., mahapatra, r., nerur, s., & price, k. h. (2009). are two heads better than one for software development? the productivity paradox of pair programming. mis quarterly, 33(1), 91-118.
barron, b. (2000). achieving coordination in collaborative problem-solving groups. journal of the learning sciences, 9, 403–436. doi:10.1207/s15327809jls0904_2
boud, d., keogh, r., & walker, d. (1985). promoting reflection in learning: a model. in d. boud, r. keogh, & d. walker (eds.), reflection: turning experience into learning (pp. 18-40). london: kogan page.
boud, d., & molloy, e. (2013). rethinking models of feedback for learning: the challenge of design. assessment & evaluation in higher education, 38(6), 698-712. doi:10.1080/02602938.2012.691462
bowers, c. a., salas, e., prince, c., & brannick, m. t. (1992). games teams play: a methodology for investigating team coordination and performance. behavior research methods, instruments, and computers, 24, 503-506. doi:10.3758/bf03203594
bronack, s., riedl, r., & tashner, j. (2006). learning in the zone: a social constructivist framework for distance education in a 3d virtual world. interactive learning environments, 14(3), 219-232. doi:10.1080/10494820600909157
chan, c. k. k. (2012). co-regulation of learning in computer-supported collaborative learning environments: a discussion. metacognition and learning, 7(1), 63-73. doi:10.1007/s11409-012-9086-z
chi, m. t. h. (1997). quantifying qualitative analysis of verbal data: a practical guide. the journal of the learning sciences, 6, 271-315.
doi:10.1207/s15327809jls0603_1
decuyper, s., dochy, f., & van den bossche, p. (2010). grasping the dynamic complexity of team learning: an integrative model for effective team learning in organizations. educational research review, 5, 111-133. doi:10.1016/j.edurev.2010.02.002
de wever, b., schellens, t., valcke, m., & van keer, h. (2006). content analysis schemes to analyze transcripts of online asynchronous discussion groups: a review. computers & education, 46(1), 6–28. doi:10.1016/j.bbr.2011.03.031
dewey, j. (1910/1997). how we think. mineola, new york: dover.
dillenbourg, p. (1999). collaborative learning: cognitive and computational approaches. advances in learning and instruction series. new york, ny: elsevier science, inc.
dillenbourg, p., baker, m., blaye, a., & o’malley, c. (1996). the evolution of research on collaborative learning. in e. spada & p. reiman (eds.), learning in humans and machine: towards an interdisciplinary learning science (pp. 189-211). oxford: elsevier.
dillenbourg, p., & tchounikine, p. (2007). flexibility in macro-scripts for computer-supported collaborative learning. journal of computer assisted learning, 23(1), 1–13. doi:10.1111/j.1365-2729.2007.00191.x
dochy, f., gijbels, d., raes, e., & kyndt, e. (2014, forthcoming). team learning: research in education and professional contexts. in s. billett, c. harteis, and h. gruber (eds.), international handbook of research in professional practice-based learning. springer.
duijnhouwer, h., prins, f. j., & stokking, k. m. (2012). feedback providing improvement strategies and reflection on feedback use: effects on students’ writing motivation, process, and performance. learning and instruction, 22(4), 171-184. doi:10.1016/j.learninstruc.2011.10.003
edmondson, a. c. (1999). psychological safety and learning behaviour in work teams. administrative science quarterly, 44(2), 350–383. doi:10.2307/2666999
edmondson, a. c., bohmer, r. m., & pisano, g. p. (2001).
disrupted routines: team learning and new technology implementation in hospitals. administrative science quarterly, 46, 685-716. doi:10.2307/3094828
edmondson, a. c., dillon, j., & roloff, k. s. (2007). three perspectives on team learning: outcome improvement, task mastery, and group process. the academy of management annals, 1, 269-314. doi:10.1080/078559811
entin, e. e., & serfaty, d. (1999). adaptive team coordination. human factors, 41(2), 312-325. doi:10.1518/001872099779591196
eriksen, j., & dyer, l. (2004). right from the start: exploring the effects of early team events on subsequent project team development and performance. administrative science quarterly, 49, 438–471. doi:10.2307/4131442
eva, k. w., armson, h., holmboe, e., lockyer, j., loney, e., mann, k., et al. (2012). factors influencing responsiveness to feedback: on the interplay between fear, confidence, and reasoning processes. advances in health sciences education: theory and practice, 17(1), 15–26. doi:10.1007/s10459-011-9290-7
gabelica, c., van den bossche, p., segers, m., & gijselaers, w. (2012). feedback, a powerful lever in teams: a review. educational research review, 7(2), 123-144. doi:10.1016/j.bbr.2011.03.031
gan, m. j. s., & hattie, j. (2014). prompting secondary students’ use of criteria, feedback specificity and feedback levels during an investigative task. instructional science. doi:10.1007/s11251-014-9319-4
gersick, c. j. g. (1989). marking time: predictable transition in task groups. academy of management journal, 32, 274-309. doi:10.2307/256363
gibbs, g., & simpson, c. (2004). conditions under which assessment supports students’ learning. learning and teaching in higher education, 1, 3-31.
goodman, j. s., wood, r. e., & hendrickx, m. (2004). feedback specificity, exploration, and learning. journal of applied psychology, 89(2), 248-262. doi:10.1037/0021-9010.89.2.248
gredler, m. e. (2004). games and simulations and their relationships to learning. in d. h.
jonassen (ed.), handbook of research for educational communications and technology (2nd ed., pp. 571-82). mahwah, nj: lawrence erlbaum associates.
gurtner, a., tschan, f., semmer, n. k., & nagele, c. (2007). getting groups to develop good strategies: effects of reflexivity interventions on team process, team performance, and shared mental models. organizational behavior and human decision processes, 102, 127–142. doi:10.1016/j.bbr.2011.03.031
hackman, j. r., & wageman, r. (2005). a theory of team coaching. academy of management review, 30, 269-287. doi:10.5465/amr.2005.16387885
hattie, j. (2013). calibration and confidence: where to next? learning and instruction, 24, 62-66. doi:10.1016/j.bbr.2011.03.031
hattie, j., & timperley, h. (2007). the power of feedback. review of educational research, 77(1), 81-112. doi:10.3102/003465430298487
hoegl, m., & parboteeah, k. p. (2006). team goal commitment in innovative projects. international journal of innovation management, 10(3), 299–324. doi:10.1142/s136391960600151x
ilgen, d. r., hollenbeck, j. r., johnson, m., & jundt, d. (2005). teams in organizations: from i-p-o models to imoi models. annual review of psychology, 56, 517-543. doi:10.1146/annurev.psych.56.091103.070250
jay, j. k., & johnson, k. l. (2001). capturing complexity: a typology of reflective practice for teacher education. teaching and teacher education, 18(1), 73-85. doi:10.1016/j.bbr.2011.03.031
janssen, j., kirschner, f., erkens, g., kirschner, p. a., & paas, f. (2010). making the black box of collaborative learning transparent: combining process-oriented and cognitive load approaches. educational psychology review, 22, 139-154. doi:10.1007/s10648-010-9131-x
järvelä, s., volet, s., & järvenoja, h. (2010). research on motivation in collaborative learning: moving beyond the cognitive-situative divide and combining individual and social processes. educational psychologist, 45, 15–27. doi:10.1080/00461520903433539
johnson, d.
w., & johnson, r. t. (1992). key to effective cooperation. in r. hertz-lazarowitz, & n. miller (eds.), interaction in cooperative groups. the theoretical anatomy of group learning (pp. 174–199). new york, ny: cambridge university press.
johnson, d. w., & johnson, r. t. (1993). what we know about cooperative learning at the college level. cooperative learning, 13(3), 17–18.
johnson, d. w., johnson, r. t., & stanne, m. e. (2000). cooperative learning methods: a meta-analysis. university of minnesota, minneapolis: cooperative learning center. retrieved from: http://www.cooperation.org/pages/cl-methods.html.
jung, d. i., & sosik, j. j. (2003). group potency and collective efficacy: examining their predictive validity, level of analysis, and effects of performance feedback on future group performance. group & organization management, 28, 366-391. doi:10.1177/1059601102250821
kaplan, s., laport, k., & waller, m. j. (2013). the role of positive affect in team effectiveness during crises. journal of organizational behavior, 34(4), 427-580. doi:10.1002/job.1817
kapur, m., voiklis, j., & kinzer, c. k. (2008). sensitivities to early exchange in synchronous computer-supported collaborative learning (cscl) groups. computers & education, 51(1), 54-66. doi:10.1016/j.compedu.2007.04.007
karau, s. j., & kelly, j. r. (1992). the effects of time scarcity and time abundance on group performance quality and interaction process. journal of experimental social psychology, 28, 542–571. doi:10.1016/j.bbr.2011.03.031
killion, j., & todnem, g. (1991). a process for personal theory building. educational leadership, 48(6), 14-16.
king, a. (2007). scripting collaborative learning processes: a cognitive perspective. in f. fischer, i. kollar, h. mandl, & j. m. haake (vol. eds.), computer-supported collaborative learning series: vol. 6. scripting computer-supported collaborative learning (pp. 13–37). new york: springer. doi:10.1007/978-0-387-36949-5
king, a. (1991).
effects of training in strategic questioning on children’s problem-solving performance. journal of educational psychology, 83(3), 307–317.
kirschner, p. a. (2009). epistemology or pedagogy, that is the question. in s. tobias & t. m. duffy (eds.), constructivist theory applied to instruction: success or failure? (pp. 144-157). new york: routledge.
kirschner, f., paas, f., & kirschner, p. a. (2008). a cognitive load approach to collaborative learning: united brains for complex tasks. educational psychology review, 21, 31-42. doi:10.1016/j.chb.2008.12.008
kluger, a. n., & denisi, a. (1996). the effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory. psychological bulletin, 119, 254-284. doi:10.1037/0021-9010.73.1.87
kozlowski, s. w. j., & ilgen, d. r. (2006). enhancing the effectiveness of work groups and teams. psychological science in the public interest, 7, 77–124. doi:10.1111/j.1529-1006.2006.00030.x
kramarski, b. (2004). making sense of graphs: does metacognitive instruction make a difference on students’ mathematical conceptions and alternative conceptions. learning and instruction, 14, 593-619. doi:10.1016/j.bbr.2011.03.031
lajoie, s. p., & lu, j. (2012). supporting collaboration with technology: does shared cognition lead to coregulation in medicine. metacognition and learning, 7, 45-46. doi:10.1007/s11409-011-9077-5
landis, j. r., & koch, g. g. (1977). the measurement of observer agreement for categorical data. biometrics, 33, 159-174.
leicht, r. m., hunter, s. t., saluja, c., & messner, j. i. (2010). implementing observational research methods to study team performance in construction management. journal of construction engineering and management, 136(1), 76–86. doi:10.1061/(asce)co.1943-7862.0000080
lewis, k., belliveau, m., herndon, b., & keller, j. (2007).
group cognition, membership change, and performance: investigating the benefits and detriments of collective knowledge. organizational behavior and human decision processes, 103(2), 159-178. doi:10.1016/j.bbr.2011.03.031
london, m. (2003). job feedback: giving, seeking, and using feedback for performance improvement (2nd ed.). mahwah, nj: lawrence erlbaum.
london, m., & sessa, v. i. (2006). group feedback for continuous learning. human resource development review, 5(3), 1-27. doi:10.1177/1534484306290226
marks, m. a., mathieu, j. e., & zaccaro, s. j. (2001). a temporally based framework and taxonomy of team processes. academy of management review, 26, 356–376. doi:10.2307/259182
mathieu, j. e., & rapp, t. l. (2009). laying the foundation for successful team performance trajectories: the role of team charters and performance strategies. journal of applied psychology, 94, 90–103. doi:10.1037/a0013257
mccarthy, a., & garavan, t. (2008). team learning and metacognition: a neglected area of hrd research and practice. advances in developing human resources, 10, 509-524. doi:10.1177/1523422308320496
mcgrath, j. e. (1993). the jemco workshop: description of a longitudinal study. small group research, 24, 285–306. doi:10.1177/1046496493243002
michinov, n., & michinov, e. (2009). investigating the relationship between transactive memory and performance in collaborative learning. learning and instruction, 19(1), 43–54. doi:10.1016/j.learninstruc.2008.01.003
moreland, r. l., & mcminn, j. g. (2010). group reflexivity and performance. in s. r. thye & e. lawler (eds.), advances in group processes (vol. 27, pp. 63-95). bingley, uk: emerald press.
moreno, r., mayer, r. e., spires, h. a., & lester, j. c. (2001). the case for social agency in computer-based teaching: do students learn more deeply when they interact with animated pedagogical agents? cognition and instruction, 19(2), 177-213. doi:10.1207/s1532690xci1902_02
mory, e. h. (2003). feedback research revisited. in d. h.
jonassen (ed.), handbook of research on educational communications and technology (pp. 745-783). mahwah, nj: erlbaum.
morris, j., & stew, g. (2007). collaborative reflection: how far do 2:1 models of learning in the practice setting promote peer reflection? reflective practice, 8(3), 419-432. doi:10.1080/14623940701425220
mulder, r. h., & ellinger, a. d. (2013). perceptions of quality of feedback in organizations: characteristics, determinants, outcomes of feedback, and possibilities for improvement: introduction to a special issue. european journal of training and development, 37(1), 4-23. doi:10.1108/03090591311293266
mulder, r. h. (2013). exploring feedback incidents, their characteristics and the informal learning activities that emanate from them. european journal of training and development, 37(1), 49-71.
müller, a., herbig, b., & petrovic, k. (2009). the explication of implicit team knowledge and its supporting effect on team processes and technical innovations. an action regulation perspective on team reflexivity. small group research, 40, 28-51. doi:10.1177/1046496408326574
mullins, d., rummel, n., & spada, h. (2011). are two heads always better than one? differential effects of collaboration on students' computer-supported learning in mathematics. international journal of computer supported collaborative learning, 6(3), 421-443. doi:10.1007/s11412-011-9122-z
noroozi, o., weinberger, a., biemans, h. j. a., mulder, m., & chizari, m. (2012). argumentation-based computer supported collaborative learning (abcscl). a systematic review and synthesis of fifteen years of research. educational research review, 7(2), 79–106. doi:10.1016/j.learninstruc.2008.03.001
okhuysen, g. a., & eisenhardt, k. m. (2002). integrating knowledge in groups: how simple formal interventions help. organization science, 13, 370-386. doi:10.1287/orsc.13.4.370.2947
okhuysen, g., & waller, m. j. (2002). focusing on midpoint transitions: an analysis of boundary conditions.
academy of management journal, 45, 1056-1065. doi:10.2307/3069330
phielix, c., prins, f. j., & kirschner, p. a. (2010). awareness of group performance in a cscl environment: effects of peer feedback and reflection. computers in human behavior, 26, 151-161. doi:10.1016/j.bbr.2011.03.031
phielix, c., prins, f. j., kirschner, p. a., erkens, g., & jaspers, j. g. m. (2011). group awareness of social and cognitive performance in a cscl environment: effects of a peer feedback and reflection tool. computers in human behavior, 27(3), 1087-1102. doi:10.1016/j.bbr.2011.03.031
prins, f. j., sluijsmans, d. m. a., & kirschner, p. a. (2006). feedback for general practitioners in training: quality, styles, and preferences. advances in health sciences education, 11, 289-303. doi:10.1007/s10459-005-3250-z
prinsen, f. r., terwel, j., zijlstra, b. j. h., & volman, m. m. l. (2013). the effects of guided elaboration in a cscl programme on the learning outcomes of primary school students from dutch and immigrant families. educational research and evaluation, 19(1), 39–57. doi:10.1080/13803611.2012.744694
reimann, p. (2007). time is precious: why process analysis is essential for cscl (and can also help to bridge between experimental and descriptive methods). in c. chinn, g. erkens & s. puntambekar (eds.), mice, minds, and society. proceedings of the computer-supported collaborative learning conference (cscl 2007) (pp. 598–607). new brunswick, nj: international society of the learning sciences.
rourke, l., & anderson, t. (2004). validity in quantitative content analysis. education technology research and development, 52(1), 5-18. doi:10.1007/bf02504769
rourke, l., anderson, t., garrison, d. r., & archer, w. (2001). methodological issues in the content analysis of computer conference transcripts. international journal of artificial intelligence in education, 12, 8-22.
rulke, d. l., & rau, d. (2000).
investigating the encoding process of transactive memory development in group training. group & organization management, 25, 373-396.
rummel, n., mullins, d., & spada, h. (2012). scripted collaborative learning with the cognitive tutor algebra. international journal of computer supported collaborative learning, 7(2), 307-339. doi:10.1007/s11412-012-9146-z
rummel, n., & spada, h. (2005). learning to collaborate: an instructional approach to promoting collaborative problem solving in computer-mediated settings. journal of the learning sciences, 14(2), 201-241. doi:10.1207/s15327809jls1402_2
rummel, n., spada, h., & hauser, s. (2009). learning to collaborate from being scripted or from observing a model. international journal of computer-supported collaborative learning, 4(1), 69-92. doi:10.1007/s11412-008-9054-4
salas, e., dickinson, t. l., converse, s. a., & tannenbaum, s. i. (1992). toward an understanding of team performance and training. in r. w. swezey & e. salas (eds.), teams: their training and performance (pp. 3-29). norwood, nj: ablex.
salas, e., rosen, m. a., & king, h. (2007). managing teams managing crises: principles of teamwork to improve patient safety in the emergency room and beyond. theoretical issues in ergonomics science, 8(5), 381-394. doi:10.1080/14639220701317764
salas, e., stagl, k. c., & burke, c. s. (2004). 25 years of team effectiveness in organizations: research themes and emerging needs. in c. l. cooper & i. t. robertson (eds.), international review of industrial and organizational psychology (pp. 47–91). new york: wiley.
savelsbergh, c., heijden, b. i. j. m. van der, & poell, r. f. (2009). the development and empirical validation of a multi-dimensional measurement instrument for team learning behaviors. small group research, 40(5), 578-607. doi:10.1177/1046496409340055
schippers, m. c., den hartog, d. n., & koopman, p. l. (2007). reflexivity in teams: a measure and correlates. applied psychology: an international review, 56(2), 189-211.
doi:10.1111/j.1464-0597.2006.00250.x
schippers, m. c., den hartog, d. n., koopman, p. l., & wienk, j. a. (2003). diversity and team outcomes: the moderating effects of outcome interdependence and group longevity and the mediating effect of reflexivity. journal of organizational behavior, 24, 779-802. doi:10.1002/job.220
schippers, m. c., den hartog, d. n., koopman, p. l., & van knippenberg, d. (2008). the role of transformational leadership in enhancing team reflexivity. human relations, 61, 1593-1616. doi:10.1177/0018726708096639
schippers, m. c., homan, a. c., & van knippenberg, d. (2013). to reflect or not to reflect: prior team performance as a boundary condition of the effects of reflexivity on learning and final team performance. journal of organizational behavior, 34(1), 6-23. doi:10.1002/job.1784
schön, d. (1983). the reflective practitioner: how professionals think in action. new york: basic.
scott-young, c., & samson, d. (2009). how team efficacy beliefs impact project performance: an empirical investigation of team potency in capital projects in the process industries. world academy of science, engineering and technology, 3, 6-28. doi:10.1.1.308.8306
seibert, k. w. (1999). reflection-in-action: tools for cultivating on-the-job learning conditions. organizational dynamics, 27, 54-65. doi:10.1016/s0090-2616(99)90021-9
soller, a., monés, a. m., jermann, p., & mühlenbrock, m. (2005). from mirroring to guiding: a review of state of the art technology for supporting collaborative learning. international journal on artificial intelligence in education, 15(5), 261–290. doi:10.1.1.26.2186
sims, d. e., salas, e., & burke, c. s. (2005). promoting effective team performance through training. in s. a. wheelan (ed.), the handbook of group research and practice (pp. 407-425). thousand oaks, ca: sage.
shute, v. j. (2008). focus on formative feedback. review of educational research, 78, 153-189. doi:10.3102/0034654307313795
tschan, f., semmer, n.
k., nägele, c., & gurtner, a. (2000). task adaptive behavior and performance in groups. group processes and intergroup relations, 3(4), 367–386. doi:10.1177/1368430200003004003 tjosvold, d., tang, m. m. l., & west, m. (2004). reflexivity for team innovation in china: the contribution of goal interdependence. group & organization management, 29(5), 540–559. doi:10.1177/1059601103254911 van ginkel, w., tindale, r. s., & van knippenberg, d. (2009). team reflexivity, development of shared task representations, and the use of distributed information in group decision making. group dynamics, 13, 265-280. doi:10.1037/a0016045 van ginkel, w. p., & van knippenberg, d. (2009). knowledge about the distribution of information and group decision making: when and why does it work? organizational behavior and human decision processes, 108(2), 218-229. doi:10.1016/j.obhdp.2008.10.003 van der haar, s., segers, m. r. s, & jehn, k. a. (2013). towards a contextualized model of team learning processes and outcomes. educational research review, 10, 1-12. doi:10.1016/j.edurev.2013.04.001 van der pol, j., van den berg, i., admiraal, w. f., & simons, p. r. j. (2008). the nature, reception, and use of online peer feedback in higher education. computers & education, 51, 1804-1817. doi: 10.1016/j.bbr.2011.03.031 veenman, m. v. j., & elshout, j. j. (1999) changes in the relation between cognitive and metacognitive skills during the acquisition of expertise. european journal of psychology of education, 14, 509-523. doi:10.1007/bf03172976 veestraeten, m., kyndt, e., & dochy, f. (2014). investigating team learning in a military context. vocation and learning, 7(1), 75-100. doi:10.1007/s12186-013-9107-3 villado, a. j., & arthur, w. jr. (2013). the comparative effect of subjective and objective after-action reviews on team performance on a complex task. journal of applied psychology, 98, 514-528. doi:10.1037/a0031510 visschers-pleijers, a. j. s. f., dolmans, d. h. j. m., de leng, b. 
a., wolfhagen, h. a. p., & van der vleuten, c. p. m. (2006). analysis of verbal interactions in tutorial groups: a process study. medical education, 40, 129-137. doi:10.1111/j.1365-2929.2005.02368.x volet, s., summers, m., & thurman, j. (2009). high-level co-regulation in collaborative learning: how does it emerge and how is it sustained? learning and instruction, 19, 128–143. doi:10.1016/j.bbr.2011.03.031 waller, m. j. (1999). the timing of adaptive group responses to nonroutine events. academy of management journal, 42, 127–137. doi:10.2307/257088 waller, m. j., gupta, n., & giambatista, r. c. (2004). effects of adaptive behaviors and shared mental models on control crew performance. management science, 50, 1534–1544. doi:10.1287/mnsc.1040.0210 weingart, l. r. (1992). impact of group goals, task component complexity, effort, and planning on group performance. journal of applied psychology, 77(5), 682–693. doi:10.1037/0021-9010.77.5.682 west, m. a. (ed.) (1996). handbook of work group psychology. chichester: john wiley & sons, ltd. west, m. a. (2000). reflexivity, revolution and innovation in work teams. in m. m. beyerlein, d. a. johnson & s. t. beyerlein (eds.), product development teams, vol. 5 (pp. 1-29). stamford, ct: jai press. west, m. a., garrod, s., & carletta, j. (1997). group decision-making and effectiveness: unexplored boundaries. in c. l. cooper & s. e. jackson (eds.), creating tomorrow’s organizations: a handbook for future research in organizational behavior (pp. 293-316). chichester: john wiley & sons ltd. williams, k. d. (2010). dyads can be groups (and often are). small group research, 41, 268-274. doi:10.1177/1046496409358619 wills, k. v., & clerkin, t. a. (2009). incorporating reflective practice into team simulation projects for improved learning outcomes. business communication quarterly, 72(2), 221-227. doi:10.1177/1080569909334559 yang, m., & carless, d. (2013).
the feedback triangle and the enhancement of dialogic feedback processes. teaching in higher education, 18(3), 285-297. doi:10.1080/13562517.2012.719154 yoo, y., & alavi, m. (2001). media and group cohesion: relative influences on social presence, task participation, and group consensus. mis quarterly, 25(3), 371-390. doi:10.2307/3250922 yukawa, j. (2006). co-reflection in online learning: collaborative critical thinking as narrative. journal of computer-supported collaborative learning, 1, 203–228. doi:10.1007/s11412-006-8994-9

frontline learning research vol. 3 no. 4 (2015) 37-55 issn 2295-3159 effects of a short strategy training on metacognitive monitoring across the life-span nicole von der linden1, elisabeth löffler, wolfgang schneider university of würzburg, germany article received 29 july / revised 20 november / accepted 21 november / available online 18 january abstract the present study was conducted to explore the potential positive influence of a short strategy training on metacognitive monitoring competencies covering a life-span approach. participants of four age groups (3rd-grade children, adolescents, younger and older adults) completed a paired-associate learning task. additionally, they gave delayed judgments-of-learning (jols), that is, they rated their certainty that they would later be able to recall specific details correctly, and confidence judgments (cjs), that is, they rated their certainty that the answers they had provided in the recall test were correct. half of the participants underwent a short strategy training in order to enhance their recollection of contextual details, thus providing a diagnostic basis for forming metacognitive judgments. results revealed significant gains in memory performance after completing the strategy training. moreover, a positive effect of the strategy training on the differentiation and accuracy of jols and cjs could be detected.
effects were most pronounced for children and older adults. participants who had completed the strategy training also reported a decrease of familiarity-based metacognitive judgments and were able to identify memories for which no reliable cues existed more easily than participants in the control condition. accordingly, improvements in monitoring performance seemed to be due to a shift in underlying cues. in sum, this study integrates traditional aims from the relatively separately existing lines of metacognitive research in the developmental and cognitive literature and adds to understanding and improving monitoring judgments in a lifetime sample. keywords: metamemory, judgments-of-learning, confidence judgments, monitoring, life-span, strategy instruction 1 corresponding author: nicole von der linden, department of psychology, university of würzburg, röntgenring 10, 97070 würzburg, germany. phone: +49(0)931 / 31 89067, fax: +49(0)931 / 31 2763, email: linden@psychologie.uni-wuerzburg.de doi: http://dx.doi.org/10.14786/flr.v3i4.196 von der linden et al | f l r 38 1. introduction accurate metacognitive monitoring plays an important role in many everyday situations as well as in learning contexts (schneider, 2010; son & metcalfe, 2000). in daily routines, metacognitive monitoring is for instance relevant when one has to decide about whether one has memorized the departure time of one’s train, or whether one has taken appropriate notes of a lecture. moreover, subjective monitoring judgments influence learning behaviors, especially the selection of to-be-studied items and the allocation of study time (see son & metcalfe, 2000, for a review). structured learning situations are not only important to children and young adults but also for life-long learning which has gained importance in recent years. yet, life-span perspective is still rare in the metacognitive monitoring literature. 
metacognitive research has traditionally been conducted within two main but separate lines of research: a developmental perspective (flavell, 1999) which focuses on the changes of metacognitive abilities during life and a cognitive tradition (for an overview cf. dunlosky & metcalfe, 2009; koriat & levy-sardot, 1999) which tries to explore mechanisms underlying metacognitive monitoring processes and its consequences for regulation of learning typically in an adult sample only. in the following, we present a study which is one of few existing attempts to combine both perspectives concerning metacognitive monitoring processes. firstly, the present study aimed at exploring developmental trajectories of monitoring abilities in a life-span sample from early school age to older adulthood. a second purpose was to improve participants’ monitoring competencies by reinforcing highly diagnostic cues in paired-associate learning situations (mccabe & soderstrom, 2011; robinson, hertzog, & dunlosky, 2006) and to investigate the role of familiarity and recollection-based cues for monitoring processes. in learning situations two aspects of monitoring are of special interest: judgments of learning (jol) and confidence judgments (cj). according to nelson and narens’ (1990) seminal model of procedural metamemory, jols provide subjective information about the degree to which encoded information has been mastered and can be potentially recalled during a future memory test (nelson & narens, 1990). findings from studies using different age groups suggest that even young children can effectively monitor their learning progress under certain circumstances. on the one hand, the results indicate that immediate jols are typically inaccurate and also represent overestimations of one’s actual performance. remarkably, this is true not only for children of different ages but also for adults. 
immediately after studying new information, judgments about its future recall seem severely biased by the false belief that information currently in short-term memory can be easily recalled some minutes later. obviously, this bias operates similarly in participants of different ages. on the other hand, however, even young children can make rather accurate assessments of the subsequent recallability of items when this judgment is somewhat delayed, that is, when it takes place a minute or two after studying the item. in other words, even young children seem to have a good feeling for which items will be recallable and which will not when long-term memory information has to be accessed for the jol (schneider, 2015). confidence judgments (cjs) concern retrieval monitoring and are typically made after a response is given to indicate how sure participants are about the correctness of an answer. cjs are thought to reflect a substantive sense of certainty that arises from the strength of the memory that is being retrieved, and this sense of certainty has been interpreted as an indicator of memory accuracy (ghetti, lyons, lazzarin, & cornoldi, 2008; roebers, 2002). metacognitive monitoring judgments are commonly believed to be based on multiple cues. this accessibility view has been proposed for immediate and delayed jols (koriat, 1997; metcalfe & finn, 2008; toth, daniels, & solinger, 2011) as well as for cjs (kelley & jacoby, 1996; kelley & sahakyan, 2003). the accuracy of those judgments depends on whether accessible cues are diagnostic of memory performance or not (dunlosky & metcalfe, 2009). cues are highly diagnostic if they influence metacognitive judgments and recall performance in a similar way. thus far multiple sources involved in the construction of monitoring judgments have been postulated.
for example, research investigating immediate jols emphasizes encoding fluency as a major base, which in turn depends on different factors such as the concreteness and the frequency of the items (begg, duft, lalonde, melnick, & sanvito, 1989) as well as familiarity of items (nelson & narens, 1990). delayed jols have been linked to retrieval fluency (benjamin & bjork, 1996) as well as success and ease of item retrieval (nelson, narens, & dunlosky, 2004). similar to jols, for cjs a number of different cues have been discussed to influence accuracy, among them perceived ease (zakay & tuvia, 1998) and vividness of retrieval (robinson, johnson, & robertson, 2000). different attempts have been made to categorize these various types of cues (koriat, 1997; kelley & jacoby, 1996). in recent years the literature has begun to discuss the distinction between familiarity-based and recollection-based cues for different metacognitive judgments (daniels, toth, & hertzog, 2009; mccabe & soderstrom, 2011; metcalfe & finn, 2008; toth et al., 2011). recollection is typically defined as the consciously controlled intentional use of memory that allows for the retrieval of qualitative details of a past event. this process is frequently associated with the subjective experience of vivid remembering. familiarity, by contrast, usually refers to experiences of prior events that may arise from activated semantic representations. the relative contribution of recollection and familiarity may differ from task to task. memory tasks that require participants to recall or recognize details about the target item rely heavily on recollection. so far, the literature lacks a systematic examination of the role of recollection and familiarity cues across different monitoring indicators and across the life-span.
evidence suggests that both immediate (daniels et al., 2009; toth et al., 2011) and delayed jols (metcalfe & finn, 2008) are influenced by recollection and familiarity processes. yet, delayed jols are mainly based on cues related to recollection such as target retrievability (for an overview see metcalfe & finn, 2008), whereas familiarity processes, such as processing fluency, have been identified as primary cues for immediate jols in younger adults (matvey, dunlosky, & guttentag, 2001; rhodes & castel, 2008). to our knowledge, the role of familiarity and recollection processes underlying jols in children has not been examined to date. concerning older adults, first evidence suggests that they have more problems with monitoring recollection processes than younger adults (daniels et al., 2009). several findings support the idea that recollection processes increase the accuracy of monitoring processes. first, as noted above, delayed jols have been shown to be more accurate than immediate jols for children, younger, and older adults (connor, dunlosky, & hertzog, 1997; koriat & shitzer-reichert, 2002; nelson & dunlosky, 1991; schneider,visé, lockl, & nelson, 2000). this can be explained by the fact that for delayed jols participants actively assess long-term memory (recollection processes), which is more predictive of recall than short-term memory used for immediate jols. additionally, with delayed jols participants seem to rely more on idiosyncratic cues of encoding and remembering than with immediate jols (koriat, 1997). idiosyncratic cues refer to personal, item-specific details, for instance, images or associations. providing a rich basis of idiosyncratic cues (e.g. through a strategy training) should facilitate the identification of recollection-based memories. 
further evidence for the importance of recollection processes for accurate monitoring in younger adults comes from the finding that focusing participants’ attention on cues connected with target retrievability enhanced jol accuracy compared to immediate jols (mccabe & soderstrom, 2011). furthermore, for recollection-based memories higher (daniels et al., 2009; toth et al., 2011) and more accurate immediate jols (toth et al., 2011) could be found than for familiarity-based memories. in sum, more research on the role of familiarity and recollection processes for jols is needed, not only for children but also in terms of comparisons of broader age ranges. yet, existing evidence suggests that the retrieval of contextual information provides a reliable cue for later memory performance in different age groups. similarly, cjs seem to be based on recollection (analytic) or familiarity (non-analytic) components (kelley & jacoby, 1996; kelley & sahakyan, 2003). recollection processes have been identified as playing a major role for cj accuracy (kelley & sahakyan, 2003) and accuracy losses with increasing old age have been linked to recollection impairments (kelley & sahakyan, 2003; wong, cramer, & gallo, 2012). for children the role of recollection and familiarity processes underlying cjs has not been addressed yet. remember-know judgments, which are positively correlated with cjs (holmes & weaver, 2010), also emphasize the role of recollection and familiarity processes for metacognitive judgments. remember-know judgments distinguish whether a memory content is associated with specific contextual information (recollection) or is familiarity-based, with participants unable to retrieve the personal encounter with a memory detail.
although there is consensus that participants base their monitoring judgments on various cues, they do not always take into consideration the factors which are most predictive of memory performance (koriat, 1997; touron, hertzog, & speagle, 2010). as discussed above, recollection-based cues are considered to be important and also highly diagnostic for both jols and cjs. therefore, a training program that aims at strengthening the accessibility of those cues should enhance monitoring accuracy (mccabe & soderstrom, 2011). yet, to our knowledge no systematic training in this area has been carried out. a training should be beneficial for all age groups but especially for older adults, among whom deficits in monitoring processes have been linked to problems with recollection processes for jols (toth et al., 2011), cjs (shing, werklebergner, li, & lindenberger, 2009; wong et al., 2012) and feeling-of-knowing judgments (souchay, bacon, & danion, 2006). children should also particularly profit from such a training procedure as developmental progression in monitoring skills seems to be influenced by improved retrieval processes for both jols (koriat & shitzer-reichert, 2002) and cjs (roderer & roebers, 2011). the study presented here was designed to fill this gap in the literature and to explore the effect of recollection and familiarity based cues on metacognitive monitoring. although the role of recollection and familiarity-based processes for metacognitive processes has received more attention in recent years, available studies have focused on one type of metacognitive judgment only and involve one or at most two age groups (daniels et al., 2009; souchay et al., 2006). especially empirical studies with children are scarce. therefore the design included a life-span perspective to account for the life-long significance of learning. 
moreover, two different types of metacognitive judgments (jols and cjs) were included in the present study in order to allow for direct comparison within one sample. multiple but partly different cues seem to underlie delayed jols and cjs (koriat, 1997, 2012) for they occur in different stages of the learning process. compared to jols, strengthening recollection processes may have a somewhat greater effect on cjs as the longer interval to the learning stage might otherwise foster the reliance on familiarity, especially in older adults (shing et al., 2009). consequently, it is of interest to compare different monitoring indicators yet such studies are very rare in the literature (leonesio & nelson, 1990). specifically, participants of four age groups (early school age to later adulthood) were asked to complete a paired-associate learning task and to give delayed jols (which are more accurate than immediate jols across all of the different age groups; see above) and cjs. a paired-associate task was chosen in order to ensure comparability with related studies and because for this stimulus material ample evidence exists for the effectivity of strategy trainings across all included age groups (see below). half of the participants underwent a strategy training in order to enhance their recollection of contextual details during jol, cj and test collection, thus providing a diagnostic basis for forming metacognitive judgments. to ensure that the training was transferable to rehearsing processes in everyday life and for different age groups a short instruction in mental imagery was chosen. in paired-associate learning, mental imagery has proven to be the most efficient way of processing (richardson, 1998), and it effectively improves recall performance in different age groups from first grade on to older age (richardson, 1998; verhaeghen, marcoen, & goossens, 1992; willoughby, porter, belsito, & yearsley, 1999). 
even more relevant to our study, some recent studies provide first evidence that strategy use successfully improves monitoring processes for both jols and cjs, although this research does not specifically explore the cues underlying monitoring judgments and an explicit strategy training was hardly ever conducted. hertzog, sinclair, and dunlosky (2010) have shown that spontaneous strategy use (e.g. mental imagery), which was not induced but only assessed after jol collection, substantially influenced jols and jol resolution, that is, the accuracy with which a person can monitor the relative recallability of different items, in adults aged 18 to 81. robinson et al. (2006) instructed, but did not train, younger and older adults to use a mental imagery strategy when memorizing pairs of items. they found that the size of the jols and recall performance were positively correlated with strategy use in both age groups, and mental imagery was identified as a diagnostic cue for jols. to our knowledge no comparable studies exist for children. thus, for jols, no attempts have so far been made to directly train subjects of a broad age range to apply an imagery strategy. concerning cjs, nietfeld and schraw (2002) trained college students to use various strategies for probability tests. as a result, subjects benefitted from the instructions both in terms of performance and monitoring accuracy (cjs). in addition, shing et al. (2009) showed that participants from 10 to 75 years of age benefitted from strategy training in terms of their cjs by enlarging the difference in cjs provided after hits compared to false alarms. in accordance with the literature we expected a positive effect of strategy training on both recall processes and metacognitive processes (jols and cjs) in all age groups.
as our study is the first to include a life-span approach to investigate the influence of recollection processes on metacognitive monitoring, developmental effects were of special interest. we proposed that children and older adults would benefit most from the strategy training: production deficits concerning strategy use are most pronounced in children and older adults (naveh-benjamin, brav, & levy, 2007; pressley & levin, 1977), and recall performance increases during childhood and declines in older adulthood (weinert & schneider, 1996). although generally little developmental progression is found for jols, recent evidence suggests that under certain circumstances deficits in recollection processes may play a role in lower jol accuracy in older adults (daniels et al., 2009; toth et al., 2011). as for cjs, their accuracy has been shown to improve over the primary school years (roebers, von der linden, howie, & schneider, 2007), and they seem to be influenced by retrieval processes (roderer & roebers, 2010). deficits in older adults’ cjs have also been linked to deteriorated recollection processes (kelley & sahakyan, 2003). additionally, we aimed to compare the effects of our training on different monitoring indicators as the importance of recollection processes might vary in different stages of the learning process. since we proposed that a strategy training should be effective mainly due to enhanced accessibility of recollection-based cues, we additionally asked participants to classify the basis of their recall as recollection, familiarity or no memory (rfn-judgments). these classifications were successfully introduced for jols by daniels et al. (2009) and toth et al. (2011). we extended the use of rfn-judgments to cjs. 2.
method 2.1 sample a total of 160 (85 male, 75 female) participants of four age groups (40 children in 3rd grade, 40 adolescents in 7th and 8th grade, 40 younger adults between 19 and 26 years of age and 40 older adults between 60 and 75 years) took part in our study. this sample size surpasses the required number of n = 132 participants as determined by an a priori power analysis designed to detect medium-sized effects according to cohen (d = .25). participants were recruited by contacting their schools directly and via newspaper and internet advertisements. children and adolescents received small gifts, whereas the other participants got 10-euro vouchers or were paid in cash. subjects’ mean ages were 8.38 (sd = 0.49) for the children, 12.73 (sd = 0.72) for the adolescents, 22.75 (sd = 2.02) for the younger adults and 68.40 (sd = 4.08) for the older adults. 2.2 materials the learning items consisted of pairs of concrete german nouns from different semantic categories (e.g. zoo animals, furniture, clothing etc.). to vary the difficulty, one half of the pairs represented two words from the same category and the other half of the pairs comprised words from two different categories. the item list for children and older adults consisted of 45 word pairs in the study phase and 60 word pairs in the recognition phase (the latter including 30 pairs identical to the study phase, 15 newly matched pairs and 15 completely new pairs). the item list for adolescents included 54 word pairs in the study phase and 72 word pairs in the recognition phase. the numbers for younger adults were 60 and 80 word pairs respectively. four practice pairs not included in the analysis preceded each single phase of the experiment. the appearance of word pairs as identical or recombined was counterbalanced among the subjects. the order of presentation was randomized as well.
2.3 procedure the consent of the parents and of the school was obtained for the children and adolescents before the beginning of the study. participants were tested individually in quiet rooms in the school or in the laboratory. half of the subjects of each age group were randomly assigned to the strategy-instruction condition (experimental group). they received instructions on visual imagery. the test administrator first explained the advantages of memorizing word pairs as one interconnected image, emphasizing the importance of integrating the two images. the explanation was facilitated by two drawings, one of a frog carrying a banana and one of a candle burning a letter. then the subjects in the experimental condition had to practice this visual imagery strategy by means of ten word pairs which were different from those used in the experiment. they were given feedback on the quality of their imagery and were asked to imagine another combined image if necessary. the instruction was standardized for all participants, with the restriction that the wording of the children’s version was slightly simplified. participants of all age groups reported that they easily understood the strategy. the participants in the experimental group were instructed to use the visual imagery strategy while memorizing the items. in the control group no strategy instruction was given. the word pairs were presented on a computer screen with presentation rates of 8 seconds per item pair for children and older adults, 6 seconds for adolescents, and 2.5 seconds for younger adults. the presentation rates were adapted in order to control for baseline difficulty between the age groups. subjects were instructed to concentrate on the pairs because they later would have to recognize them and to indicate whether the word pair had appeared in the study phase or not. in the jol phase, each left noun of the item pair (stimulus) was presented on the screen in the same order as in the learning phase.
to avoid relearning, only the stimulus was shown. subjects were asked to indicate the likelihood of recognizing the word pair in about 30 minutes. jols were rated on a thermometer scale from 0 (very unsure) to 100 (very sure) successfully used in previous studies (koriat, ackerman, lockl, & schneider, 2009; koriat & shitzer-reichert, 2002). in the recognition phase, word pairs of each type (i.e., either identical, recombined, or new) were presented. participants had to indicate by opting for “yes” or “no” whether they thought that the item had appeared in exactly this combination in the study phase. after that, the item disappeared from the screen in order to avoid distraction from the presented item pair, and participants were asked to indicate how they generated the yes-or-no-decision. they had to decide between three options: a) “i can remember the word pair very well” (recollection), b) “the word pair seems familiar to me” (familiarity), c) “i cannot remember the word pair at all” (no memory). finally, subjects had to indicate on a hot-cold-scale equivalent to that used for jols how sure they were that the given answer was correct (cj). at the end of the session, the test administrator asked whether the subjects in the experimental condition had applied the strategy while memorizing the items. participants in the control group were asked whether they had employed any strategy, and if so, to specify the strategies and to indicate how often they were used. 3. results a preliminary analysis assessing the effect of gender did not reveal any systematic differences between male and female participants. thus data were collapsed across this variable. scheffé tests were used as a post-hoc follow-up on main effects. the level of significance was set to p < 0.05.
in a first step of analysis, we assessed memory performance in terms of the percentage of correctly recognized items, that is, either identical items correctly recognized as “old” or recombined or completely new items correctly classified as “new” items. next, we analyzed jols and cjs as indicators of metacognitive monitoring. finally, we will report changes in the rfn judgments as cues for monitoring processes. results are reported as a function of age group and experimental condition in order to examine the influence of cognitive development and strategy instruction on recognition rates and metacognitive monitoring. 3.1 recognition rates the first column of table 1 shows the mean proportion of correctly recognized items as a function of age group and experimental condition, that is overall recognition rates. an anova with age group and experimental condition as between-subject factors revealed a main effect of age group (f(3,152) = 5.34; p < .01; η2 = .10). a post-hoc analysis indicated that younger adults performed significantly better than children (.80 vs. .71 correct, respectively). furthermore, a main effect of the experimental condition was found (f(1,152) = 21.82; p < .001; η2 = .13), indicating that those participants who had received the strategy instruction recognized significantly more word pairs correctly than participants who had not received such an instruction (.80 vs. .72, respectively). the second to fourth column of table 1 splits the recognition rates into percentages depending on the type of word pair: that is, whether the word pair in the recognition phase was identical to that in the study phase or whether it was recombined or a completely new word pair. inferential statistics were conducted separately for each word pair in order to facilitate the interpretation. 
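the η2 values reported alongside the f statistics can be cross-checked from f and its degrees of freedom. the following is a minimal sketch, assuming (the text does not state this) that the reported η2 values are partial eta squared; the function name is hypothetical and chosen for illustration only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic.

    Assumption (not stated in the source): the reported eta squared
    values are partial eta squared, i.e. SS_effect / (SS_effect + SS_error),
    which equals F * df_effect / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects on overall recognition as reported in section 3.1.
age_group = partial_eta_squared(5.34, 3, 152)
condition = partial_eta_squared(21.82, 1, 152)
print(round(age_group, 2), round(condition, 2))
```

under this assumption the two main effects on overall recognition come out at the reported values once rounded to two decimals.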
for the identical word pairs, an anova with age group and experimental condition as between-subject factors revealed a significant main effect of age group (f(3,152) = 5.14; p < .01; η2 = .09). a post-hoc analysis showed that children (.65) recognized fewer of the identical items correctly than younger (.77) and older adults (.76). furthermore, the main effect of the experimental condition reached significance (f(1,152) = 5.24; p < .05; η2 = .03) with subjects in the experimental group (.74) recognizing more items correctly than subjects in the control group (.69). for the recombined word pairs, only the main effect of strategy instruction reached the significance level (f(1,152) = 5.24; p < .001; η2 = .11), with subjects in the strategy instruction group recognizing more of the recombined word pairs correctly (.75) than subjects who had not received the strategy instruction (.61). as for the new word pairs, a significant main effect of strategy instruction was found as well (f(1,152) = 7.04; p < .01; η2 = .04). those participants who had received the strategy instruction recognized more of the new word pairs correctly (.94) than those who had not been thus instructed (.88).

table 1
recognition rates as a function of age group, experimental condition, and type of word pair

age group / condition     overall      identical     recombined    new
                                       word pairs    word pairs    word pairs
children
  control group           .65 (.06)    .60 (.16)     .54 (.19)     .88 (.13)
  experimental group      .77 (.11)    .71 (.18)     .72 (.20)     .91 (.10)
adolescents
  control group           .70 (.11)    .64 (.17)     .63 (.20)     .89 (.11)
  experimental group      .81 (.09)    .76 (.12)     .76 (.15)     .94 (.09)
younger adults
  control group           .79 (.09)    .75 (.10)     .73 (.19)     .92 (.10)
  experimental group      .81 (.12)    .79 (.15)     .75 (.23)     .93 (.10)
older adults
  control group           .74 (.12)    .79 (.14)     .53 (.28)     .85 (.21)
  experimental group      .80 (.11)    .73 (.18)     .77 (.22)     .95 (.08)

standard deviations are in parentheses.
3.2 metacognitive monitoring

3.2.1 mean jols before correct vs. incorrect responses

figure 1 shows participants’ mean jol ratings as a function of the correctness of the subsequent response, age group, and experimental condition. an anova with correctness of response as within-subject factor and age group and experimental condition as between-subject factors revealed a significant main effect of correctness of response (f(1,151) = 135.38; p < .001; η2 = .47): subjects gave higher jols before correct (57.98) than before incorrect responses (46.78). in addition, a significant interaction between the factors correctness of response and experimental condition was found (f(1,151) = 6.22; p < .05; η2 = .04). furthermore, the three-way interaction between correctness of response, experimental condition, and age group reached significance (f(3,151) = 3.43; p < .05; η2 = .06).

in order to examine the direction of the interactions post hoc, we analyzed the experimental and the control group data separately. for subjects in the experimental condition, an anova with correctness of response as within-subject factor and age group as between-subject factor revealed a main effect of correctness of response (f(1,75) = 89.07; p < .001; η2 = .54), with mean jols being higher before correct (59.58) than before incorrect responses (45.98). for the participants in the control condition, the main effect of correctness of response was also significant (f(1,76) = 47.36; p < .001; η2 = .38). furthermore, for the subjects in the control condition a significant interaction between correctness of response and age group was found (f(3,76) = 5.93; p < .01; η2 = .19).
subsequent analyses revealed that only the adolescents and the younger adults distinguished between correct and incorrect responses, given that it was only for these two age groups that the factor correctness of response turned out to be significant (children: f(1,19) = 0.66; p = .426; η2 = .03; adolescents: f(1,19) = 19.82; p < .001; η2 = .51; younger adults: f(1,19) = 66.06; p < .001; η2 = .78; older adults: f(1,19) = 4.43; p = .05; η2 = .19).

figure 1. mean jols preceding correct vs. incorrect answers as a function of age group and experimental condition.

3.2.2 jol accuracy

in order to assess jol accuracy as a function of age group, question format and experimental condition, goodman-kruskal gamma correlations between jols and recall performance were computed for each participant, and then averaged for each single cell in the experimental design. gamma correlations are considered to be the most appropriate measure of metacognitive accuracy (nelson, 1984) and are commonly used in the contemporary literature (nelson & dunlosky, 1991; schneider et al., 2000). a positive correlation indicates that higher jols were given for items that were recalled correctly than for those recalled incorrectly. table 2 shows mean gamma correlations for jols as a function of age group and experimental condition. one-tailed t-tests revealed that the gamma correlations were different from zero for almost all groups. the only exception concerned the children in the control group, whose mean gamma correlations for jols were not significantly different from zero.
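the gamma computation described above can be illustrated with a minimal python sketch. it uses the standard definition, gamma = (c − d) / (c + d), where c and d are the numbers of concordant and discordant item pairs and tied pairs are ignored. the function names, the example ratings, and the choice to skip participants whose gamma is undefined are assumptions made for illustration; the text does not specify how such cases were handled.

```python
from itertools import combinations
from statistics import mean


def goodman_kruskal_gamma(judgments, outcomes):
    """Gamma between judgments (e.g. JOLs) and outcomes (1 = correct, 0 = incorrect)."""
    concordant = discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(judgments, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means the pair is tied on at least one variable: ignored
    if concordant + discordant == 0:
        return None  # undefined, e.g. when all items were answered correctly
    return (concordant - discordant) / (concordant + discordant)


def mean_gamma(participants):
    """Average gamma across participants, skipping undefined values (assumption)."""
    gammas = [goodman_kruskal_gamma(j, o) for j, o in participants]
    defined = [g for g in gammas if g is not None]
    return mean(defined) if defined else None


# Made-up data for two participants: (JOL ratings, correctness of each item).
participants = [
    ([80, 60, 40, 20, 70], [1, 1, 0, 0, 0]),
    ([30, 90, 50], [0, 1, 1]),
]
print(goodman_kruskal_gamma(*participants[0]))
print(mean_gamma(participants))
```

a gamma of +1 means that every correctly answered item received a higher judgment than every incorrectly answered item, and a value near zero means that the judgments did not discriminate between the two.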
table 2
mean gamma correlations as a function of age group and experimental condition

age group               jols         cjs
children
  control group         .03 (.24)    .18 (.20)
  experimental group    .32 (.28)    .38 (.29)
adolescents
  control group         .20 (.19)    .33 (.21)
  experimental group    .30 (.19)    .42 (.21)
younger adults
  control group         .32 (.15)    .42 (.20)
  experimental group    .27 (.23)    .35 (.23)
older adults
  control group         .14 (.26)    .38 (.26)
  experimental group    .29 (.25)    .49 (.26)

standard deviations are in parentheses.

an anova with age group and experimental condition as between-subject factors revealed a significant main effect of strategy instruction (f(1,151) = 11.36; p < .001; η2 = .07). in addition, the anova showed a significant interaction between age group and experimental condition (f(3,151) = 3.58; p < .05; η2 = .07). in order to examine the direction of this effect, the mean gamma correlations of each age group were tested individually using univariate anovas. in children, the main effect of experimental condition was significant (f(1,38) = 12.25; p < .01; η2 = .24), with children in the strategy-instruction group having higher gamma correlations (.32) than those in the control group (.03). in older adults, the results pointed in the same direction (experimental group: .29; control group: .14): the main effect of experimental condition was just short of being significant (f(1,38) = 3.29; p = .077; η2 = .08).

3.2.3 mean cjs after correct vs. incorrect responses

differentiation in cjs was analyzed in the same way as for jols (cf. figure 2): mean cj ratings after correct and incorrect responses, respectively, were calculated for each subject. as for jols, an anova with correctness of response as within-subject factor, and age group and experimental condition as between-subject factors was conducted. the anova revealed a main effect of age group (f(3,151) = 6.26; p < .001; η2 = .11).
post-hoc tests according to scheffé’s procedure showed that younger adults gave significantly lower mean cjs (65.56) than the other age groups (children: 76.10, adolescents: 74.59, older adults: 75.00), regardless of whether the answer was correct or not. in addition, a main effect of correctness of response was found (f(1,151) = 188.75; p < .001; η2 = .56). subjects of all age groups gave higher ratings after correct (79.02) than after incorrect responses (66.60). finally, the interaction between correctness of response and age group reached significance (f(3,151) = 5.61; p < .001; η2 = .10). to further explore the direction of the effect, separate anovas for cjs after correct and incorrect answers were conducted. for correct answers, the main effect of age group reached significance (f(3,156) = 3.42; p < .05; η2 = .06). subsequent post-hoc analyses showed that older adults (82.57) were more confident after correct responses than younger adults (74.25). for the cjs after incorrect answers, a significant main effect of age group was found as well (f(3,155) = 7.89; p < .001; η2 = .13). here, post-hoc analyses showed that younger adults gave lower cjs (57.33) after incorrect responses than the other three age groups (children: 72.63; adolescents: 69.00; older adults: 67.42).

figure 2. mean cjs after correct vs. incorrect answers as a function of age group and experimental condition.

3.2.4 cj accuracy

cj accuracy was assessed in the same way as jol accuracy. mean gamma correlations are displayed in table 2. all gamma correlations were different from zero (using one-tailed t-tests). an anova with age group and experimental condition as between-subject factors revealed a main effect of age group (f(3,151) = 2.93; p < .05; η2 = .06). post-hoc tests according to scheffé showed a significant difference between children (.28) and older adults (.44).
furthermore, a main effect of strategy instruction was found (f(1,151) = 4.75; p < .05; η2 = .03). subjects in the strategy instruction condition had higher mean gamma correlations (.41) than participants without such instruction (.33).

3.3 rfn-judgments

in addition, we compared the rfn-judgments that subjects made. because the age groups differed in the number of items, we compared the proportion of trials on which each option was chosen. table 3 shows the proportion of each rfn-judgment as a function of age group and experimental condition. an anova was conducted with rfn-judgment as within-subject factor and age group and experimental condition as between-subject factors. we found a significant main effect of rfn-judgment (f(1,152) = 29.99; p < .001; η2 = .17). paired contrasts revealed that subjects chose “familiarity” (.22) less often than “recollection” (.40) and “no memory” (.38). furthermore, a significant interaction between rfn-judgment and experimental condition was found (f(1,152) = 4.12; p < .05; η2 = .03). separate anovas for each judgment showed that the strategy instruction had a significant effect only for the answers “familiarity” (f(1,158) = 12.63; p < .01; η2 = .07) and “no memory” (f(1,158) = 5.54; p < .05; η2 = .03): subjects in the experimental group chose the “familiarity” option less often (.19) than subjects in the control condition (.26), and the “no memory” option more often (.41) than the control group (.34).
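the reason for comparing proportions rather than raw counts can be shown with a short python sketch. the function name, the option labels as strings, and the item counts (10 and 20) are hypothetical; the text only states that the number of items differed between age groups.

```python
from collections import Counter

OPTIONS = ("recollection", "familiarity", "no memory")


def rfn_proportions(judgments):
    """Return the share of trials on which each RFN option was chosen."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    counts = Counter(judgments)
    return {option: counts[option] / len(judgments) for option in OPTIONS}


# Two made-up participants who saw different numbers of items.
child = ["recollection"] * 4 + ["familiarity"] * 2 + ["no memory"] * 4
adult = ["recollection"] * 8 + ["familiarity"] * 6 + ["no memory"] * 6

child_props = rfn_proportions(child)
adult_props = rfn_proportions(adult)
print(child_props)
print(adult_props)
```

in this example the adult chose “recollection” twice as often as the child in absolute terms, yet both chose it on the same share of their trials, so only the proportions are comparable across groups.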
table 3
rfn-judgments: proportion of trials on which each option was chosen, as a function of age group and experimental condition

age group               recollection    familiarity    no memory
children
  control group         .41 (.24)       .27 (.15)      .32 (.25)
  experimental group    .35 (.23)       .19 (.11)      .45 (.21)
adolescents
  control group         .41 (.19)       .30 (.16)      .29 (.18)
  experimental group    .42 (.18)       .20 (.11)      .38 (.20)
younger adults
  control group         .35 (.16)       .24 (.10)      .40 (.17)
  experimental group    .41 (.22)       .22 (.11)      .38 (.17)
older adults
  control group         .42 (.21)       .22 (.11)      .36 (.16)
  experimental group    .40 (.20)       .15 (.09)      .45 (.22)

standard deviations are in parentheses.

3.4 spontaneous and instructed strategy use

in a last step of analysis, we analyzed the responses to the strategy use questionnaire. table 4 shows the proportion of participants in each age group and experimental condition who reported having used the visual imagery strategy. participants’ open responses were categorized as visual imagery by two independent raters (kappa = .95). we compared the use of the visual imagery strategy in the experimental and the control group. an anova with age group and experimental condition as between-subject factors revealed a significant main effect of age group (f(3,154) = 10.97; p < .001; η2 = .18), with post-hoc analysis showing that younger adults (.83) applied mental imagery more often than the other age groups (children: .48; adolescents: .53; older adults: .55). furthermore, we found a significant main effect of experimental condition (f(1,154) = 221.46; p < .001; η2 = .63): participants who received the strategy instruction used mental imagery much more often than participants in the control condition (.95 vs. .28). the interaction between age group and experimental condition was also significant (f(3,154) = 9.38; p < .001; η2 = .16).
separate analyses for the experimental group and the control group showed no main effect of age group for participants who received the strategy instruction, indicating that participants of all ages were able to transfer the training to the learning process. for subjects in the control group, the anova revealed a main effect of age group (f(3,74) = 12.30; p < .001; η2 = .35). a post-hoc analysis showed that the percentage of younger adults (.65) spontaneously applying a mental imagery strategy was significantly higher than that of the other age groups. more specifically, none of the children, only 5% of the adolescents, and about 25% of the older adults applied such a strategy spontaneously.

table 4
percentage of subjects reporting the use of visual imagery as a function of age group and experimental condition

                      children    adolescents    younger adults    older adults
control group         .00         .05            .65               .25
experimental group    .95         1.00           1.00              .85

4. discussion

the present study is among the first to explore metacognitive monitoring skills across the life-span, and also to investigate the effects of a memory strategy training that is suited in principle to improving skills in this domain. thus our study combined traditional interests of the developmental and cognitive literature on metacognition. in particular, we focused on possible positive effects of strengthening highly diagnostic cues (koriat, 1997). this was achieved by training half of our participants in visual imagery before memorizing item pairs, and by assessing the training effects on monitoring quality, that is, jol and cj differentiation and accuracy. we postulated that instructing subjects to connect idiosyncratic content to items should lead them to rely less on familiarity and to focus instead on recollection processes when monitoring their performance.
an important innovative aspect of our study was its life-span perspective, in that four age groups (children in third grade, adolescents in seventh and eighth grade, younger and older adults) were included in the sample. very few studies, especially with children, have explored the cues underlying monitoring judgments, and more research on the effects of familiarity- and recollection-based cues is needed for all of the age groups included here.

first, the results show that our manipulation of task difficulty across the age groups was successful: recognition rates in both the experimental and the control condition were at a comparable level across the four age groups. thus, the task was suitable for participants from primary school to older adulthood.

secondly, we found significant gains in recognition performance in subjects who underwent the strategy training. this effect was most pronounced for recombined and new item pairs but still substantial for the overall data. these results indicate that our experimental manipulation was successful and are in line with many previous findings: it has been shown for various age groups from primary school to older adulthood that visual imagery is an efficient strategy in paired-associate learning (richardson, 1998; verhaegen et al., 1992), and that its instruction leads to superior recall performance compared to spontaneous use (shing et al., 2009). also in accordance with the literature, self-reported use of visual imagery was higher in the experimental than in the control group, and substantial spontaneous use of visual imagery was only reported by young adults. we acknowledge that we cannot rule out the possibility that pre-training differences had an effect on memory performance. yet, participants in our study were randomly assigned to the experimental and control groups, which substantially reduces the risk of imbalance in potentially confounding factors.
although our sample was at the lower end of recommended sample sizes for randomization (bortz & döring, 2009), the expected positive results of the strategy training on recognition performance and the reported low level of strategy use in the control group speak for a true effect of the strategy training. however, in future research the inclusion of a pre-training measurement would further substantiate the results.

in accord with our hypothesis, we found evidence that both jol differentiation between later correct and incorrect answers and jol accuracy as measured by goodman-kruskal gamma correlations were enhanced by our strategy training in certain age groups. concerning jol differentiation, subjects of all age groups in the experimental group differentiated between correct and incorrect answers, whereas in the control group only adolescents and younger adults gave higher jols before correct than before incorrect responses. additionally, adolescents in the experimental condition descriptively showed a more pronounced discrimination between later correct and incorrect answers than those in the control condition. concerning jol accuracy, the analysis revealed that only children’s accuracy improved significantly through the strategy instruction. although there was a similar tendency in the group of older adults, only a marginal effect of strategy training was found. adolescents’ and younger adults’ gamma correlations were not enhanced by the training.

we also detected the expected positive effect of visual imagery on cj quality. for cj accuracy, we found a significant main effect of strategy instruction, implying that all age groups benefitted from the strategy instruction. concerning differentiation, training effects were found for children and older adults; here, the difference between correct and incorrect answers was about twice as high in the experimental condition as in the control condition.
in contrast, adolescents and younger adults showed about the same amount of differentiation in both conditions.

the strategy training had positive effects on both jols and cjs, and these effects were most pronounced for children and older adults on both monitoring indicators. yet, an impact on cjs could be detected in a broader age range than for jols. this suggests that both jols and cjs rely on cues involved in our strategy instruction but at the same time draw on different sources. for cjs, information from the retrieval process might be most significant, and it is possible that cues based on recollection processes are even more important for accurate cjs than for accurate jols. further research is needed to clarify this point.

in sum, the results confirm our prediction that a strategy training can improve the quality of metamemory monitoring judgments. this finding is in line with outcomes of other studies that have shown a positive influence of strategy use on prospective and retrospective monitoring judgments for children, younger and older adults (hertzog et al., 2010; nietfeld & schraw, 2002; robinson et al., 2006; shing et al., 2009), and it expands the existing literature by training strategy use explicitly and by including two monitoring judgments.

the developmental trends found in our study were also as expected: children and older adults benefitted most from strategy instruction in terms of enhancing both their jol and cj quality, followed by the adolescents. these results are in accordance with developmental trends in production deficits concerning strategy use (naveh-benjamin et al., 2007) and in recollection processes in general (ghetti & angelini, 2008). this outcome also emphasizes the practicability of our training. it proved to be effective yet was simple enough to be understood by elementary school children, and could also be successfully acquired by older adults in a very short time.
still, the strategy training did not yield much improvement in monitoring among younger adults, with the exception of cj accuracy. one possible explanation for this finding is the high level of spontaneous use of visual imagery reported by young adults in the control group. it appears likely that the short strategy training did not greatly improve the already high level of strategy use in young adults. young adults evidently showed high competence in memorizing and in monitoring their recall performance in paired-associate learning without further instruction. it is possible, and should be investigated in further research, that more pronounced effects of a strategy training would be found on more complex tasks. support for this assumption comes from studies in which gains in cj accuracy from a strategy training could be shown for a comprehensive problem-solving task (nietfeld & schraw, 2002).

presumably, further reasons were responsible for the fact that we were not able to confirm the positive effects of the strategy instruction for all age groups and for all indicators of metacognitive monitoring. one possible cause is the influence of the memory paradigm. a recognition task was chosen in this study in order to explore the basis of memory and monitoring processes by collecting rfn-judgments. specifically, participants were asked to rate the quality of their recognition memory as recollection, familiarity or no memory, as an indication of the mode of action of our strategy training. yet, it seems possible that recognition processes make the differentiation of recollection- and familiarity-based cues more difficult than recall does, given that no active memory retrieval was necessary. recall processes seem to offer more cues to increase the accuracy of monitoring judgments, as compared to recognition (buratti & allwood, 2012).
other research that showed a positive effect of strategy use on monitoring accuracy required active memory recall (hertzog et al., 2010; robinson et al., 2006). although shing and colleagues (2009) found positive effects of a strategy training on cjs in a recognition task, they used more complicated stimuli (malay word pairs) than those used in this study. furthermore, they collected metacognitive measures of calibration and not of resolution, as done here.

another explanation for this unexpected outcome could be that we collected delayed jols, which have been shown to be more accurate than immediate jols (nelson & dunlosky, 1991). this seems to be due to the fact that delayed jols in all age groups are based on an active assessment of long-term memory (recollection processes) instead of short-term and long-term memory as in immediate jols. yet, we wanted to explore a possible add-on effect to maximize the quality of monitoring judgments. this in turn makes it more difficult to show an effect than with immediate jols, which are commonly used in many studies (daniels et al., 2009; hertzog, fulton, mandviwala, & dunlosky, 2013; robinson et al., 2006).

a third possibility is that the effects of a short strategy intervention as used here are generally limited. a longer intervention might yield results that exceed the promising findings of our training in all age groups and for both monitoring indicators. this issue would be worth exploring in a follow-up study. yet in general, a strategy instruction proved to be a promising starting point for influencing monitoring processes in different age groups and across various monitoring indicators.

as a mode of operation, we proposed that the strategy instruction should be effective due to a shift in accessible cues. specifically, we assumed that the improvement should arise because monitoring judgments would draw less on familiarity-based cues and increasingly on recollection-based cues.
rfn-judgments confirm that, as predicted, the number of familiarity-based judgments was significantly reduced in the experimental group. this trend was accompanied by more “no memory” responses. we assume that subjects in the experimental group profited from the instruction in that they were able to decide for which memories no reliable cues existed. in such cases, the answer “no memory” was correctly given. participants could have used recall of interactive imagery to discriminate recollection states: if they can recall something about the image created to memorize a word pair, they are more confident that they base their delayed jol on a diagnostic cue. strategy recall seems to have a similar effect on feeling-of-knowing judgments (hertzog, fulton, sinclair, & dunlosky, 2014). thus, although the number of recollection-based memories as perceived by the participants could not be enhanced, the strategy training seems to have increased participants’ awareness of possible cues and enabled them to distinguish more securely between real memories and no memories. this contrast between correct recall and no retrieval at the time of the jol has been shown to be the most important source of the high accuracy of delayed jols (nelson et al., 2004). it appears likely that the number of guesses, which probably fell into the familiarity category, could be successfully reduced by our training. given that no age effects were found, the instruction seems to be effective in a similar way across all age groups in that it reduces the impact of familiarity.

in sum, the present study yields evidence that a strategy training is a suitable means to improve prospective and retrospective monitoring processes throughout the life-span, especially in children and older adults. the instruction used in this study proved to be an economical procedure that could be successfully applied in different age groups.
the training was simple enough to be easily mastered by both elementary school children and older adults. therefore, a transfer to everyday-life situations seems possible.

the findings of the present study emphasize the significance of recollection-based cues, as well as their distinction from other cues, for metacognitive monitoring processes and encourage further research in this direction. an expansion of our findings to more complex stimuli like texts or films and the investigation of the effect of a more elaborate strategy training are of particular interest. this would make it possible to test the relevance for everyday life further. additionally, it would be interesting to explore the effects of a strategy training on larger samples, as our study included relatively small sample sizes per group, and to follow up long-term effects of the training.

we do not exclude the possibility that multiple cues underlie and can significantly influence monitoring processes. yet, along with recent research (ghetti et al., 2008; mccabe & soderstrom, 2011; metcalfe & finn, 2008; toth et al., 2011), we assume that exploring the role of recollection and familiarity processes in mediating the accuracy of monitoring judgments is a promising issue for future research.

improving monitoring processes is of great importance, as monitoring is very closely linked to memory performance. thus, successful monitoring represents a very valuable competence in different learning contexts. our results demonstrate that monitoring occurs from childhood on, but that there is still room for improvement at every age level. at the same time, the findings also illustrate that there are very economical ways to improve metacognitive monitoring in different age groups. they thus indicate a direction which is worth pursuing in future research.

keypoints

approaches to improve metacognitive monitoring in a broad age range covering the life-span are still very rare.
integration of traditional developmental and cognitive research questions, as in this study, is scarce in the metacognitive literature.
our results show that a short training in visual imagery enhances both memory and metamemory performance, especially in children and older adults.
improvements in monitoring seem to be associated with the use of more reliable cues after the strategy training.

acknowledgments

this research was conducted as part of a research project on the development of procedural metacognitive knowledge across the life-span and financed by the german research foundation (dfg-gz. schn 315/45-1). we wish to thank all participating children, adolescents and adults as well as teachers, principals and parents for their cooperation.

references

begg, i., duft, s., lalonde, p., melnick, r., & sanvito, j. (1989). memory predictions are based on ease of processing. journal of memory and language, 28, 610-632. doi: 10.1016/0749-596x(89)90016-8
benjamin, a. s., & bjork, r. a. (1996). retrieval fluency as a metacognitive index. in l. reder (ed.), implicit memory and metacognition (pp. 309-338). hillsdale, nj: erlbaum. doi: 10.1037/0096-3445.127.1.55
bortz, j., & döring, n. (2009). forschungsmethoden und evaluation für human- und sozialwissenschaftler. heidelberg: springer.
buratti, s., & allwood, c. m. (2012). the accuracy of meta-metacognitive judgments: regulating the realism of confidence. cognitive processing, 13, 243-253. doi: 10.1007/s10339-012-0440-5
connor, l. t., dunlosky, j., & hertzog, c. (1997). age-related differences in absolute but not relative metamemory accuracy. psychology and aging, 12(1), 50-71. doi: 10.1037/0882-7974.12.1.50
daniels, k. a., toth, j. p., & hertzog, c. (2009). aging and recollection in the accuracy of judgments of learning. psychology and aging, 24, 494-500. doi: 10.1037/a0015269
dunlosky, j., & metcalfe, j. (2009). metacognition. thousand oaks, ca: sage publications, inc.
flavell, j. h. (1999).
cognitive development: children’s knowledge about the mind. annual review of psychology, 50, 21-45. doi: 10.1146/annurev.psych.50.1.21
ghetti, s., & angelini, l. (2008). the development of recollection and familiarity in childhood and adolescence: evidence from the dual-process signal detection model. child development, 79, 339-358. doi: 10.1111/j.1467-8624.2007.01129.x
ghetti, s., lyons, k. e., lazzarin, f., & cornoldi, c. (2008). the development of metamemory monitoring during retrieval: the case of memory strength and memory absence. journal of experimental child psychology, 99, 157-181. doi: 10.1016/j.jecp.2007.11.001
hertzog, c., fulton, e. k., mandviwala, l., & dunlosky, j. (2013). older adults show deficits in retrieving and decoding associative mediators generated at study. developmental psychology, 49, 1127-1131. doi: 10.1037/a0029414
hertzog, c., fulton, e. k., sinclair, s. m., & dunlosky, j. (2014). recalled aspects of original encoding strategies influence episodic feelings of knowing. memory and cognition, 42, 126-140. doi: 10.3758/s13421-013-0348-z
hertzog, c., sinclair, s. m., & dunlosky, j. (2010). age differences in the monitoring of learning: cross-sectional evidence of spared resolution across the adult life span. developmental psychology, 46, 939-948. doi: 10.1037/a0019812
holmes, a. e., & weaver iii, c. a. (2010). eyewitness memory and misinformation: are remember/know judgments more reliable than subjective confidence? applied psychology in criminal justice, 6, 47-61.
kelley, c. m., & jacoby, l. l. (1996). adult egocentrism: subjective experience versus analytic bases for judgment. journal of memory and language, 35, 157-175. doi: 10.1006/jmla.1996.0009
kelley, c. m., & sahakyan, l. (2003). memory, monitoring, and control in the attainment of memory accuracy. journal of memory and language, 48, 704-721. doi: 10.1016/s0749-596x(02)00504-1
koriat, a. (1997).
monitoring one's own knowledge during study: a cue-utilization approach to judgments of learning. journal of experimental psychology: general, 126, 349-370. doi: 10.1037/0096-3445.126.4.349
koriat, a. (2012). the self-consistency model of subjective confidence. psychological review, 119, 80-113. doi: 10.1037/a0025648
koriat, a., ackerman, r., lockl, k., & schneider, w. (2009). the memorizing effort heuristic in judgments of learning: a developmental perspective. journal of experimental child psychology, 102, 265-279. doi: 10.1016/j.jecp.2008.10.005
koriat, a., & levy-sadot, r. (1999). processes underlying metacognitive judgments: information-based and experience-based monitoring of one's own knowledge. in s. chaiken & y. trope (eds.), dual-process theories in social psychology (pp. 483-502). new york, ny: guilford press.
koriat, a., & shitzer-reichert, r. (2002). metacognitive judgments and their accuracy: insights from the processes underlying judgments of learning in children. in m. izaute, p. chambres, & p.-j. marescaux (eds.), metacognition: process, function, and use (pp. 1-17). new york: kluwer.
leonesio, r. j., & nelson, t. o. (1990). do different metamemory judgments tap the same underlying aspects of memory? journal of experimental psychology, 16, 464-470. doi: 10.1037/0278-7393.16.3.464
matvey, g., dunlosky, j., & guttentag, r. (2001). fluency of retrieval at study affects judgments of learning (jols): an analytic or nonanalytic basis for jols? memory & cognition, 29, 222-233. doi: 10.3758/bf03194916
mccabe, d. p., & soderstrom, n. c. (2011). recollection-based prospective metamemory judgments are more accurate than those based on confidence: judgments of remembering and knowing (jorks). journal of experimental psychology: general, 140, 605-621. doi: 10.1037/a0024014
metcalfe, j., & finn, b. (2008). familiarity and retrieval processes in delayed judgments of learning.
journal of experimental psychology: learning, memory, and cognition, 34, 1084-1097. doi: 10.1037/a0012580 naveh-benjamin, m., brav, t. k., & levy, o. (2007). the associative memory deficit of older adults: the role of strategy utilization. psychology and aging, 22, 202-208. doi: 10.1037/0882-7974.22.1.202 nelson, t. o. (1984). a comparison of current measures of the accuracy of feeling-of-knowing predictions. psychological bulletin, 95, 109-133. doi: 10.1037/0033-2909.95.1.109 nelson, t. o., & dunlosky, j. (1991). when people's judgments of learning (jols) are extremely accurate at predicting subsequent recall: the "delayed-jol effect". psychological science, 2, 267-270. doi: 10.1111/j.1467-9280.1991.tb00147.x nelson, t. o., & narens, l. (1990). metamemory: a theoretical framework and new findings. in g. bower (ed.), the psychology of learning and motivation: advances in research and theory (vol. 26, pp. 125-141). new york: academic press. nelson, t. o., narens, l., & dunlosky, j. (2004). a revised methodology for research on metamemory: prejudgment recall and monitoring (pram). psychological methods, 9, 53-69. doi: 10.1037/1082-989x.9.1.53 nietfeld, j. l., & schraw, g. (2002). the effect of knowledge and strategy training on monitoring accuracy. the journal of educational research, 95(3), 131-142. doi: 10.1080/00220670209596583 pressley, m., & levin, j. r. (1977). developmental differences in subjects' associative-learning strategies and performance: assessing a hypothesis. journal of experimental child psychology, 24, 431-439. doi: 10.1016/0022-0965(77)90089-3 rhodes, m. g., & castel, a. d. (2008). metacognition and part-set cuing: can interference be predicted at retrieval? memory & cognition, 36, 1429-1438. doi: 10.3758/mc.36.8.1429 richardson, j. t. e. (1998). the availability and effectiveness of reported mediators in associative learning: a historical review and an experimental investigation. psychonomic bulletin & review, 5, 597-614.
doi: 10.3758/bf03208837 robinson, a. e., hertzog, c., & dunlosky, j. (2006). aging, encoding fluency, and metacognitive monitoring. aging, neuropsychology, and cognition, 13(3-4), 458-478. doi: 10.1080/13825580600572983 robinson, m. d., johnson, j. t., & robertson, d. a. (2000). process versus content in eyewitness metamemory monitoring. journal of experimental psychology: applied, 6(3), 207-221. doi: 10.1037/1076-898x.6.3.207 roderer, t., & roebers, c. m. (2010). explicit and implicit confidence judgments and developmental differences in metamemory: an eye-tracking approach. metacognition and learning, 5, 229-250. doi: 10.1007/s11409-010-9059-z roebers, c. m. (2002). confidence judgments in children’s and adults’ event recall and suggestibility. developmental psychology, 38, 1052-1067. doi: 10.1037/0012-1649.38.6.1052 roebers, c. m., von der linden, n., schneider, w., & howie, p. (2007). children’s metamemorial judgments in an event recall task. journal of experimental child psychology, 97, 117-137. doi: 10.1016/j.jecp.2006.12.006 schneider, w. (2010). metacognition and memory development in childhood and adolescence. in h. s. waters & w. schneider (eds.), metacognition, strategy use, and instruction (pp. 54-81). new york: guilford press. schneider, w. (2015). memory development from early childhood through emerging adulthood. new york: springer. schneider, w., visé, m., lockl, k., & nelson, t. o. (2000). developmental trends in children's memory monitoring: evidence from a judgment-of-learning (jol) task. cognitive development, 15, 115-134. doi: 10.1016/s0885-2014(00)00024-1 shing, y. l., werkle-bergner, m., li, s.-c., & lindenberger, u. (2009). committing memory errors with high confidence: older adults do but children don't. memory, 17, 169-179. doi: 10.1080/09658210802190596 son, l. k., & metcalfe, j. (2000). metacognitive and control strategies in study-time allocation.
journal of experimental psychology: learning, memory, and cognition, 26, 204-221. doi: 10.1037/0278-7393.26.1.204 souchay, c., bacon, e., & danion, j. (2006). metamemory in schizophrenia: an exploration of the feeling-of-knowing state. journal of clinical and experimental neuropsychology, 28, 828-840. doi: 10.1080/13803390591000846 toth, j. p., daniels, k. a., & solinger, l. a. (2011). what you know can hurt you: effects of age and prior knowledge on jol accuracy. psychology and aging, 26, 919-931. doi: 10.1037/a0023379 touron, d. r., hertzog, c., & speagle, j. z. (2010). subjective learning discounts test type: evidence from an associative learning and transfer task. experimental psychology, 327-337. doi: 10.1027/1618-3169/a000039 verhaeghen, p., marcoen, a., & goossens, l. (1992). improving memory performance in the aged through mnemonic training: a meta-analytic study. psychology and aging, 7, 242-251. doi: 10.1037/0882-7974.7.2.242 weinert, f. e., & schneider, w. (1996). entwicklung des gedächtnisses. in d. albert & k.-h. stapf (hrsg.), enzyklopädie der psychologie, themenbereich c, serie ii, band 4 (s. 433-487). göttingen: hogrefe. willoughby, t., porter, l., belsito, l., & yearsley, t. (1999). use of elaboration strategies by students in grades two, four and six. the elementary school journal, 99(3), 221-231. wong, j. t., cramer, s. j., & gallo, d. a. (2012). age-related reduction of the confidence-accuracy relationship in episodic memory: effects of recollection quality and retrieval monitoring. psychology and aging, 27(4), 1053-1065. doi: 10.1037/a0027686 zakay, d., & tuvia, r. (1998). choice latency times as determinants of post-decisional confidence. acta psychologica, 98, 103-115. doi: 10.1016/s0001-6918(97)00037-1
frontline learning research vol. 6 no. 2 (2018) 92-113
issn 2295-3159

promoting deep learning through online feedback in spocs

renée m. filius a, renske a.m. de kleijn c, sabine g. uijl d, frans j. prins c, harold v.m. van rijen b, diederick e. grobbee a

a university medical center utrecht, julius center for health sciences and primary care, the netherlands
b university medical center utrecht, biomedical sciences department, the netherlands
c utrecht university, department of education, the netherlands
d utrecht university, university college, the netherlands

article received 8 march 2018 / revised 18 april / accepted 6 september / available online 12 november

abstract

higher education aims for deep learning and increasingly uses a specific form of online education: small private online courses (spocs). to overcome challenges that instructors face in order to promote deep learning through that format, the use of feedback may have significant potential. we interviewed eleven instructors and four students and organized a focus group to formulate scalable design propositions for instructors in spocs to promote deep learning. propositions have been formulated according to the cimo-logic. this study resulted in identification of four mechanisms by which the desired outcome (deep learning) can be achieved, which we describe here along with proposed interventions. results show that the “online learning interaction model” can be deepened with these mechanisms: 1) feeling personally committed, 2) asking and providing relevant feedback, 3) probing back and forth, and 4) understanding one’s own learning process. to activate these mechanisms, scalable feedback interventions are described in three categories. results in this relatively young field of spocs also show that feedback as a dialogical process may contribute to solving the current challenges of instructors in spocs to achieve deep learning with their students.
keywords: online learning; deep learning; peer feedback; spocs; teaching/learning strategies.

corresponding author: r.m.filius@uu.nl doi: 10.14786/flr.v6i2.350

1. introduction

deep learning involves critical thinking, integrating what the student is learning with what he or she already knows, and creating new connections (biggs, kember, & leung, 2001; entwistle, 1991; marton & saljo, 1997; hounsell, 1997) and is related to higher-order thinking skills. promoting deep learning is an important task for higher education (biggs & tang, 2011; nicolls, 2002; lynch, mcnamara, & seery, 2012; ramsden, 1992), which is increasingly conducted online (geitz, brinke, & kirschner, 2015). small private online courses (spocs) are a distinctive form of online education that has been used in higher education, especially over the last decade (uijl, filius, & ten cate, 2017). recently, filius, de kleijn, uijl, prins, van rijen and grobbee (2018) found that instructors experience specific challenges when trying to promote deep learning in spocs. their study resulted in a description of five main challenges for instructors: alignment of learning activities, insights into student needs, adaptivity of teaching strategy, social cohesion, and creation of dialogue. these challenges are due to a lack of facial contact and visual cues, as online learning tends to involve mostly asynchronous written interaction, and to the fact that the course material is usually developed and set before the start of the course. to overcome such challenges to promoting deep learning in online education, the incorporation of feedback as a pedagogical strategy may have significant potential, which is currently not optimally exploited (lynch, mcnamara, & seery, 2012; rushton, 2005). following carless (2011), we take a broad definition of feedback as “all dialogue to support learning in both formal and informal situations” (askew & lodge, 2000, p. 1).
this illustrates how we view feedback as a two-way form of interaction and not as a one-way comment from one person to another. it is generally agreed that feedback plays an important role in higher education (nicol & macfarlane-dick, 2006). feedback leads to the development of higher-order skills (davies & berrow, 1998), to connecting new knowledge to what students already know, and to knowledge construction (nicol, 2009). engaging students in peer feedback helps develop skills for reflection, self-regulation, and critical thinking (boud, 2001; dochy, segers, & sluijsmans, 1999; lin, liu, & yuan, 2001; p. m. sadler & good, 2006). feedback in spocs may be even more important than in face-to-face classes, because it increases the student-instructor interaction and student-student interaction, and thus compensates for the potential geographical disconnect in online courses that may affect students’ retention (dennen, aubteen darabi, & smith, 2007; richardson, koehler, besser, caskarlu, lim, & mueller, 2015). however, nowadays instructors are under pressure to provide high-quality feedback to students in a prompt manner, often to large and diverse cohorts (allan & bentley, 2012; nicol, 2009; planar & moya, 2016). even though spocs involve small groups, the number of parallel courses that run at the same time and the diversity of students may be high, which makes the provision of feedback very time-consuming (crook, mauchline, maw, lawson, drinkwater, lundqvist, … and park, 2012). therefore, this study aims to explore how instructors’ challenges can be overcome when providing feedback, by developing design propositions for instructors in spocs to promote deep learning.

2. design propositions to promote deep learning in spocs

2.1 deep learning

the distinction between deep learning and surface learning as students’ approaches to studying has been supported by the results of previous research (biggs, kember, & leung, 2001; entwistle, 1991; marton & saljo, 1997).
deep and surface learning are considered to be two extremes of a continuum. surface learning indicates that the learner simply memorizes new ideas. deep learning is defined as the process of actively integrating new ideas into the existing cognitive structure through critical thinking, integrating what is learned with what was already known, and creating new connections between concepts (aharony, 2006; biggs, 1999; hall, ramsay, & raven, 2004). according to garrison, anderson, and archer (2001), in order to promote deep learning, the whole person should be engaged cognitively, socially, and affectively in the learning process. a deep learning approach is more likely to result in better retention and transfer of knowledge (ramsden & moses, 1992) and to lead to high-quality learning outcomes such as a good understanding of the discipline and critical thinking skills (athanassiou, mcnett, & harvey, 2003; athanassiou, 2003; biggs, 1999; booth, luckett, & mladenovic, 1999; lindblom-ylänne, 1999; ramsden & entwistle, 1983; trigwell, prosser, & waterhouse, 1999). students are unlikely to experience high-quality learning outcomes, or develop appropriate skills and competences, through a surface approach to learning (hall et al., 2004).

2.2 using feedback to promote deep learning

there are several instruments that instructors can use to promote deep learning, such as concept maps (hay, 2015), cross-cultural chat (osman & herring, 2007), podcasting (pegrum, bartle, & longnecker, 2014) and online asynchronous discussions (du, harvard & li, 2005). one of the most powerful instruments for instructors to influence learning is feedback (hattie & timperley, 2007; kluger & denisi, 1996). we argue that feedback through dialogue between instructors and students, among peers, or perhaps even between student and computer may promote deep learning.
the purpose of feedback is to reduce the discrepancies between the students’ current understanding or performance and the understanding or performance that is being aimed for (hattie & timperley, 2007). according to hattie and timperley, feedback is information provided by a source (e.g., teacher, peer, book, parent, self, experience) regarding aspects of one’s performance or understanding. however, once the feedback has been provided, the receiver needs to process and respond to it. the way the student receives the feedback is just as important as how the provider intended it (ilgen, fisher, & taylor, 1979). ilgen and colleagues composed a model in which they divided the student’s processing of feedback into different stages. emphasis was put on those aspects of feedback that influence: a) the way feedback is perceived, b) its acceptance by the recipient, and c) the willingness of the recipient to respond to the feedback (ilgen et al., 1979). in line with this model, according to nicol (2010), carless, salter, yang, and lam (2011), boud and molloy (2013), and planar and moya (2016), feedback can be viewed as two-directional and needs to constitute a dialogue between the person who facilitates it and the one who receives it. it must explicitly promote self-regulation and a proactive attitude on the part of the student towards it; at the same time, it needs to focus on the learning process and involve peers. according to geitz et al. (2015), feedback should be supported by dialogue and by activities that not only inform students about their current performance, but also teach them to seek and ask for feedback on future performances. this will put students more in control. it will also enable them to add meaning to the feedback and to discuss the feedback as equals with their peers.
2.3 the role of the instructor and the student

this student-centered approach assumes that the student, rather than the instructor, has become the center of the learning process. the instructor has become a facilitator who guides the learning process. garrison, anderson, and archer (2000) developed the community of inquiry framework, which sheds more light on the role of the instructor in influencing students’ deep learning approaches. in order to promote deep learning, the instructor should aim at three interdependent structural elements of the framework: social, cognitive, and teaching presence. social presence reflects the development of climate and interpersonal relationships in the community. cognitive presence provides a description of the progressive phases of practical inquiry leading to resolution of a problem or dilemma. teaching presence provides leadership throughout the course or study. these three elements that the instructor should focus on in online education show similarities with the “online learning interaction model” from ke and xie (2009). whereas garrison and colleagues (2000) focus on the teaching activities of the instructor, ke and xie focus on the learning activities of the students. both view interaction as a core indicator for deep learning. ke and xie (2009) distinguish three different types of interaction of students in an online course: 1) social interaction, 2) knowledge construction, and 3) regulation of learning. their model is based on concepts for deep learning in adult education and helps to examine the quality of online education. even though instructors may view interaction as essential to deep learning, given the high student-staff ratio it can be difficult for the instructor to engage in dialogue with students. thus, instructors look for alternative feedback strategies that are efficient, effective, and less time-consuming (allan & bentley, 2012) and that can be implemented in spocs.
for example, peer feedback strategies have been shown to be beneficial to deep learning (anderson & rourke, 2002; boud, cohen, & sampson, 1999; moon, 2013). the combination of feedback strategies and the specific context of spocs will lead to a set of design propositions specifically useful for promoting deep learning in spocs.

2.4 design propositions and cimo-logic

design propositions are heuristic statements about how and why a pedagogical intervention works in a certain context (plomp & nieveen, 2009). a design proposition is intended to be transparent, comprehensive, and described in such a way as to make clear under which conditions it lends itself to generalization for other contexts. in this study, the design propositions will be formulated according to the cimo-logic (van aken, 2007; van den akker, 1999) used in design literature (e.g. denyer, tranfield, & van aken, 2008) and several recent studies (bronkhorst, meijer, koster, & vermunt, 2011; brouwer, brekelmans, nieuwenhuis, & simons, 2012; dobber, akkerman, verloop, admiraal, & vermunt, 2012). a design proposition describes the specific context to which it applies, the intervention proposed, the mechanism it triggers, and the outcome thereby achieved: cimo-logic. the causal relation between the intervention and outcome in the context is (potentially) more plausible when all cimo components are described (brouwer et al., 2012). this inclusion of context dependency and mechanisms triggered is why the cimo-logic is preferred over other ways of specifying design propositions that exist in the literature, which are often limited to specification of intervention and outcome. cimo-logic determines that a design principle has the following structure: “in this class of problematic contexts, use this intervention type to invoke these generative mechanism(s), to deliver these outcome(s)” (denyer et al., 2008, p. 395).
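as a minimal sketch of the structure quoted above, a design proposition can be held as a four-field record and rendered into the template sentence. the class name, the field names, and the `render` method are illustrative assumptions and do not come from the cimo literature; the example values paraphrase this paper's own illustrative proposition about group work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignProposition:
    """One CIMO-style design proposition (names here are hypothetical)."""

    context: str       # C: the class of problematic contexts
    intervention: str  # I: the intervention type to use
    mechanism: str     # M: the generative mechanism the intervention invokes
    outcome: str       # O: the outcome to deliver

    def render(self) -> str:
        """Fill the four fields into the template sentence quoted above."""
        return (
            f"in {self.context}, use {self.intervention} "
            f"to invoke {self.mechanism}, to deliver {self.outcome}."
        )


# Values paraphrase the paper's own example; they are not new findings.
example = DesignProposition(
    context="a spoc in which students should respond to each other's contributions",
    intervention="support for group work",
    mechanism="probing back and forth",
    outcome="deep learning",
)

print(example.render())
```

keeping the mechanism as a required field mirrors the argument made above: a proposition that records only intervention and outcome cannot say why the intervention works in that context.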
for example, “if you have a spoc in which you want students to respond to each other’s contributions and try to look for common understanding (context), support group work (intervention type) to promote deep learning (intended outcome) through probing back and forth (mechanism).” figure 1 shows how the cimo-logic has been applied in this study. the context is defined by the specific challenges that instructors in spocs experience when aiming to promote deep learning. the contexts elucidate the context dependency of the intervention. interventions are purposeful measures (products, processes, or activities) that are formulated by the designer (or instructor) in order to solve a design problem or need (denyer et al., 2008; midgley, 2000), for example the need for deep learning. van aken (2004) indicates that the key question is not so much whether the intervention works, but what it is about the intervention that makes it work. why does an intervention lead to a certain outcome in a specific context? this is what the mechanisms describe. outcomes are the result of the interventions.

figure 1. cimo logic (based on van den akker, 1999).

2.5 research question

we believe taking a design proposition perspective in which interventions, outcomes, and mechanisms are investigated in relation to each other is rather unique, and will provide contextualized conclusions that have both practical and conceptual value. therefore, this paper aims to address the question: “how and why can deep learning in higher education spocs be promoted using scalable feedback interventions?” feedback interventions consist of information that is externally generated and includes tips for improvement (kluger & denisi, 1996). in this study, only feedback interventions that are scalable have been included, which refers to the requirement that it must be possible to increase the number of students involved without increasing the workload of the instructors.
by investigating the mechanisms, we aim to answer why the intervention will (not) promote deep learning.

3. methods

3.1 design

the study design was qualitative and exploratory and used individual interviews with instructors in spocs representing participants from different fields of study as well as students. since this study focuses on the design propositions for instructors, the interviews with the students have solely been used to substantiate the interviews with the instructors. this triangulation of the findings supported multiple perspectives rather than only the instructors’ perspective. moreover, a focus group representing experts from different disciplines was added. according to powell and single (1996), in cases where the existing knowledge of a subject is inadequate, as is the case here, the use of a focus group is especially useful and can be employed to gather diverse ideas about possible feedback interventions. the supportive, congenial, non-judgmental setting offered by the focus group enhanced the likelihood of collecting the diverse and spontaneous opinions that eluded the in-depth interviews (powell & single, 1996). the study was approved by the dutch ethical board for research in education (nvmo, the netherlands association for medical education, approval no. 210). the nvmo is an independent association that carries out activities for anyone involved in medical and health care education in the netherlands and flanders (belgium).

3.2 participants

3.2.1 individual interviews

the data used in this study were taken from the same dataset as a previous study (filius et al., 2018) for the individual questions with the instructors. each study used different parts of this dataset. concerning the selection of participants, we aimed for maximal variation and theoretical sampling (guba, 1981).
therefore, the first author asked the heads of the education and it departments at 4 institutions to recommend instructors and students from their institutions with experience in teaching or participating in spocs. from these recommendations we selected instructors and students in spocs with varying levels of experience in years of teaching in or following spocs. we expected age and experience to be relatively large influencers, more than, for example, backgrounds. in addition, we included instructors that we considered to be experts and who are known for being keynote speakers at relevant international conferences about online education. we expected them to have a broad view of developments among instructors and to increase the chance that we included as many experiences as possible. both the participating instructors and students represent different universities and virtual learning environments. a maximum of 2 of the same universities and a maximum of 2 of the same virtual learning environments were represented, which resulted in 10 different universities and 8 different virtual learning environments. the size of the purposive sample of instructors was determined by data saturation, as the collection of more data appeared to have no additional interpretive worth (guest, bunce, & johnson, 2006). in the case of the students, we were looking for counter evidence for the findings of the interviews with the instructors. after four interviews we had not found any counter evidence, so we decided not to conduct any additional interviews. all of the 11 invited instructors and 4 invited students agreed to be interviewed. all instructors (4 female and 7 male) were involved in teaching online courses in higher education. the average age of instructors was 51.8 years (sd=20.0), average teaching experience was 15 years (sd=19.6) compared with their average experience with spocs of 10.4 years (sd=6.8).
six instructors had 2 years or less experience with spocs, the other 5 instructors had 10 years or more experience with spocs and online distance education. two of the instructors are also researchers in the field of online education. additionally, 4 students (3 female and 1 male) were involved, ranging in age from 28 to 52, with an average age of 39 years (sd=9). two of them had participated in just one spoc; the others had participated in more spocs, varying in duration and study load.

3.2.2 focus group session

in total 10 professionals, other than the interviewed instructors, engaged in the focus group session. they were selected using specific-criterion sampling, which is a type of purposive sampling and selection method in which one concentrates on people with specific characteristics (palys, 2008). we interviewed 10 professionals from multiple disciplines and areas of expertise who are known for their open-mindedness, to fill in the gaps with more unconventional interventions. their ages ranged from 23 to 52 years. all of them work in art, technology, and/or education, a number of them being at the intersection of several disciplines. the three disciplines were evenly represented. their job positions are: journalist, artist, product manager of moocs, researcher, educational platform manager, educational technologist, and game designer. some of them were also students or instructors. all participants took part on a voluntary basis.

3.3 procedure

3.3.1 individual interviews

participants were informed of the study’s purpose and approach both in the invitation e-mail and at the start of the interview. this included an explanation of the outcome ‘deep learning’ and ‘scalable interventions’. during the interviews, the interviewer asked each participant to name several examples of deep learning, to compare these with findings in literature to determine whether their understanding corresponded to our previous study.
hardly any differences emerged in this respect. each participant signed a consent form. interviews were based on an open interview scheme following a qualitative approach (cohen, manion, & morrison, 2013; cresswell, 2007). this was done so as to do justice to the complexity of the topic as well as to the nature of encapsulated expert knowledge (berliner, 2001), since the in-depth nature of open interviewing allows the informants to answer from their own frame of reference (bogdan & biklen, 2003; cohen et al., 2013). the interviews lasted an average of one hour. the interview questions for instructors are shown in table 1. the same questions were asked of students, but from their perspective. questions were related to the cimo method by asking for the context, intervention used, mechanism activated, and outcome achieved. the deep learning process has been operationalised as the initiation of critical thinking, integrating what the student is learning with what he or she already knows and creating new connections. these three deep learning activities are mental processes which, when initiated, are considered as 'deep learning outcome'. specific attention was paid to what interventions have been used and which mechanisms triggered the deep learning activities. by subsequently asking for three statements or golden rules about providing feedback to promote deep learning, participants were encouraged to speak freely about their ideas on what might help to promote deep learning in spocs and why this might help. other questions asked to all participants were to prompt and/or probe for additional information.

table 1. interview questions supplemented with probing questions

3.3.2 focus group session

during the focus group session, a short introduction was provided to present the results of the interviews and to explain the definitions of spocs, feedback, and deep learning.
participants were informed about the summarized results of the interviews in terms of contexts, mechanisms, and desired outcomes. then they were asked to brainstorm in three rounds about the results of the interviews. every round involved different group compositions. following their suggestions in the small groups, they collaboratively discussed the interventions and mechanisms in more depth in order to conclude how feedback may promote deep learning in spocs.

3.4 analysis

the analysis of the data proceeded in stages, using nvivo to code and retrieve the data. first, the interviews and focus group session were audio recorded and transcribed. to avoid misrepresentation and misinterpretation of interviewees’ statements, the transcript and a summary of the transcription were sent to each participant for member checking (poortman & schildkamp, 2012). the focus group participants received a report for verification, which was created based on the transcript and notes. all participants agreed with the transcribed content. second, the transcripts of the interviews and focus group session were inductively coded into meaningful categories by the first author, using open coding (cresswell, 2007). text fragments whose code was debatable according to the first author, which came to less than 2.5% of all texts, were discussed by the full research team. next, each meaningful category was classified under the theme intervention or the theme mechanism, according to the cimo-logic. then the first author moved to more selective coding stages according to an iterative process. based on the previous round of analysis, the codes were revised. on the basis of the data, some codes were merged, deleted or reformulated. subsequently, all data were analyzed again, now with the new codes.
considering the open and grounded nature of this analysis (bogdan & biklen, 2003) at every coding stage, all different categories were discussed by the research team until agreement on the categories’ content, as well as the codes, was reached. to enhance reliability in coding, an independent researcher also analyzed a random sample of approximately 10 percent of the data for calculating the inter-rater reliability. the percentage of agreement was 93%. internal validity was further enhanced due to the description of the results, which were context-rich, meaningful, and thick. external validity was promoted by including respondents’ quotes and by describing the coherence with the theoretical framework. reasoning from the perspective of the cimo-logic, the interventions that were derived from the data into meaningful categories are suggestions from respondents on how online feedback could overcome the problems mentioned in the specific context of a spoc. only interventions that are scalable, that is, without being very time-consuming, were selected as meaningful categories, in light of the constraints of shrinking staff budgets and expanding student numbers. for example, feedback interventions such as direct conversations with videoconferencing tools between instructor and student have been excluded for this reason, despite their potential in achieving deep learning. the mechanisms that derived from the data shed light on why interventions lead to the desired outcome, which is deep learning. each of the mechanisms was classified into one of the categories of ke and xie’s online learning interaction model (2009): 1) social interaction, 2) knowledge construction, and 3) regulation of learning. to ensure quality in all of the steps described, an audit was conducted by an independent researcher concerning all steps of data gathering and analysis (akkerman, admiraal, brekelmans, & oost, 2008). the audit had both a formative and a summative function. 
as a consequence, the auditor assessed the steps taken several times during the study and at the end of the study. this resulted in an audit report with questions and answers, mostly about the analysis of the data. for that reason, there have been some adjustments in the description of the analysis in this article. thereafter, the auditor reviewed the study again and affirmed it as being visible, comprehensible and transparent. according to the auditor, decisions are explicated, communicated and substantiated, and they are acceptable according to standards, values and norms. 4. results in the results of this study, we describe design propositions to overcome challenges in promoting deep learning in spocs according to the cimo-logic. the design propositions consist of the context (specific challenges in spocs) in which feedback interventions will trigger student mechanisms that will lead to the desired outcome (deep learning). we start by describing four main student mechanisms by which deep learning (the desired outcome) can be achieved in spocs (context). after that we address how these mechanisms can be triggered by feedback interventions. the letter after each quote refers to either an instructor (i) or student (s). where the suggestions of students were additional to those of instructors, it is explicitly mentioned that they originated from students. 4.1 student mechanisms the mechanisms in this study are the processes that are internal to the student. they describe how students engage in learning activities, which largely determines the quality of the learning outcomes they attain (vermunt & verloop, 1999). knowledge concerning the mechanisms sheds light on why interventions lead to the desired outcome, which is deep learning. the mechanisms are: 1) feeling personally committed, 2) asking and providing relevant feedback, 3) probing back and forth, and 4) understanding one’s own learning process. 
each of the mechanisms has been categorized according to the online learning interaction model (ke & xie, 2009) as a) social, b) knowledge construction, or c) regulation. 4.1.1 mechanism 1: feeling personally committed (category: social) if students are personally addressed, they feel personally committed and accept the feedback more easily. according to the instructors, possibilities to do so in online education have not been optimally utilized yet. one of the instructors said: “one of the benefits of online learning, i think, is the transparency. because students write assignments, receive and give feedback, it is easy to get the picture: he is there, they are there, and those guys over there still don’t get it” (i8). another instructor explained, “here’s what i find is the benefit: in a classroom situation you rarely have the opportunity to ask, to focus on what every single student thinks or what every student is thinking about that question. in a classroom you only have a limited amount of time and you may have three, four, five students answer that question, but you don’t know what every student is thinking. in an online course you have the opportunity to get that student to respond to that–every single student to respond to that question and you have the opportunity to provide one-on-one feedback and ask those questions. in a face-to-face classroom i would never know those students who weren’t thinking… you only see the stars, basically” (i7). students will prefer to choose a deep learning approach once they feel personally committed, which can be achieved through tailored feedback: “the individualization, the differentiation that you can give to students in an online environment is so much greater than you can do in a face-to-face classroom.” (i4). 
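the categorization of the four mechanisms can also be written down as a small lookup table. the sketch below is illustrative only: it assumes python, and the names `Category`, `MECHANISMS` and `mechanisms_in` are hypothetical. the mechanisms and their categories are those reported in section 4.1 (mechanisms 2 to 4 are described in the subsections that follow).

```python
from enum import Enum


class Category(Enum):
    """The three categories of the online learning interaction model."""
    SOCIAL = "social interaction"
    KNOWLEDGE_CONSTRUCTION = "knowledge construction"
    REGULATION = "regulation of learning"


# Mechanism -> category, as reported in section 4.1.
# The dictionary layout itself is an illustrative assumption.
MECHANISMS = {
    "feeling personally committed": Category.SOCIAL,
    "asking and providing relevant feedback": Category.KNOWLEDGE_CONSTRUCTION,
    "probing back and forth": Category.KNOWLEDGE_CONSTRUCTION,
    "understanding one's own learning process": Category.REGULATION,
}


def mechanisms_in(category):
    """Return the mechanisms classified under the given category."""
    return [name for name, cat in MECHANISMS.items() if cat is category]


if __name__ == "__main__":
    for category in Category:
        print(category.value, "->", mechanisms_in(category))
```

a lookup of this kind makes it easy to see that two of the four mechanisms fall under knowledge construction, while social interaction and regulation of learning are each represented by one mechanism.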
4.1.2 mechanism 2: asking and providing relevant feedback (category: knowledge construction) to learn how to focus on a deep learning approach, students indicate that it helps them to learn how to ask for feedback, but also how to provide peer feedback that promotes deep learning. instructors confirm this. one of them adds: “i think it is very instructional for students to provide feedback, for themselves. that they learn how to grade such a piece of work, what criteria are being used. and they will have to keep doing so, later in their life, when they are working at the university or elsewhere” (i6). students said that they haven’t been taught how to provide meaningful feedback and that it is hard to learn it oneself. instruction will thus be useful. compared to face-to-face education, students tended to ask for feedback more frequently, just because it is easier since there seems to be an opportunity 24 hours per day. both instructors and students also tend to provide feedback faster in online education because the virtual learning environment enables them to be very quick. both instructors and students think that this fast way of asking for and providing feedback may promote more of a surface approach to learning. and because the number of feedback requests is so high, it is difficult for students all to get involved in a dialogue with the instructor. an instructor explains how he deals with the large number: “we selected the most important issues and also some examples, and that was what we discussed” (i5). thus, according to the members of the focus group, it may help students to learn how to prioritize feedback requests. 4.1.3 mechanism 3: probing back and forth (category: knowledge construction) in order for deep learning to occur, students and instructors experienced a need for a back-and-forth probing to take place. 
by presenting ideas and getting feedback on these ideas by ping-ponging back and forth with peers and/or instructor, students thought deeply and got the opportunity to combine what they already knew with new knowledge. it required an environment in which students felt safe and interacted comfortably with each other and with the instructor. according to a student: “you need to build a relationship with each other in order to be motivated and to be able to accept the feedback, so someone must be open to receiving feedback” (s1). another instructor explained: “the feedback that works best is the feedback in which you can keep asking questions after each response from the student. as a dialogue. because that really forces the student to think deeply” (i1). back-and-forth probing can be either synchronous or asynchronous, but most respondents preferred it as synchronous: “it becomes snappier, it is easier to ask questions right away, to help the student to take the necessary steps and to think deeper” (i8). 4.1.4 mechanism 4: understanding one’s own learning process (category: learning regulation) both instructors and students expressed the view that deep learning can be promoted by letting students apply their knowledge, for example, in a scenario or case study. students will have to try to apply new information in other contexts, which enables them to create new knowledge and to make connections with prior knowledge and new concepts. they will have to go through various steps and receive feedback on each step. by doing so, they engage themselves in meaningful ways that enable them to reflect deeply on the learning activity and the feedback they have received. creating the right feedback for each step to be taken requires forward thinking. one of the instructors explains: “i found that very little deep learning occurs online anyway, unless there is some type of a scenario, or they have to apply it in a case study. in other words, it’s forward thinking. 
i would call it that the deep learning occurs when you have opportunities for forward thinking, forward looking. ‘what would you do if…? what would happen if…? what’s the projection if this?’ and it’s a little bit of what-if/then kind of thinking, that i think precedes all of the other learning. and without that, i don’t think that it really progresses further” (i4). this mechanism helps students to be prepared for opportunities to develop the capacity to regulate their own learning as they progress through higher education. 4.2 triggering mechanisms through feedback interventions the mechanisms described above can be triggered by several feedback interventions, which are described in the following three categories: 1) feedback management, 2) peer feedback, and 3) automatic feedback. the mechanisms and interventions are summarized in figure 2. figure 2. interventions and mechanisms according to the instructors and students. 4.2.1 feedback management interventions the feedback management interventions describe how to manage the monitoring and provision of feedback to and among students online in such a way that deep learning is promoted. for each intervention, the dominant mechanisms that were identified in this study are indicated in italics. intervention a: collect student information in advance in order to make students feel personally committed and to estimate what feedback is needed, it helps to collect student data before the start of the course. student data involved learning characteristics, such as the education level, results on a pre-test, information on expectations, personal learning objectives, and motivation. collecting this data benefited the feedback provided, because it enabled instructors to adjust their feedback to the needs of the students and thus make it more specific. the relatively convenient availability of student data in spocs compared to face-to-face education may compensate in part for the lack of facial contact. 
for instructors in spocs, knowing more about their students helped them to formulate the feedback to make it more tailored to the student’s needs. specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 2. table 2 specific suggestions of intervention a intervention b: monitor progress using a dashboard instructors monitored students’ progress during their education using a dashboard. the dashboard provided the instructor with an analysis of student data such as their contributions to assignments and discussion forums, questions, completion rates, and grades. it enabled instructors to intervene during the course and provide specific personalized formative feedback, for example when students skipped certain necessary steps or when they tended to think in the wrong direction. according to instructors, receiving personalized feedback helps students to feel more personally committed and may help them to understand their own learning progress better, especially when the dashboard is visible to the students themselves, as members of the focus group suggested. specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 3. table 3 specific suggestions of intervention b intervention c: bring requests back to the essentials participants in the focus group suggested reducing the number of feedback requests and letting students prioritize the issues they want to receive feedback on. instructors suggested teaching students how to ask for the right feedback and guiding them during this learning process by reflecting on the type of feedback questions they ask. instructors in spocs expect this to be useful in aligning the learning activities with the learning goals and the assessment goals so that they all promote deep learning. moreover, it will help students to ask for (more) relevant feedback. 
specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 4. table 4 specific suggestions of intervention c intervention d: discuss and rate the quality of the feedback students suggest that teaching them how to provide relevant feedback may promote deep learning. to do so, participants in the focus group suggested letting students discuss and rate the quality of the feedback they provide and receive. by discussing and rewarding the quality of the feedback provided, students aim to learn how to focus on deep learning and how to increase the quality of their feedback. this might also help to give students recognition for the effort they make to provide good feedback. according to the interviewed respondents, feedback to promote deep learning should include many questions to elicit deep learning. a discussion could start with an instruction on, for example, what questions are helpful to promote deep learning, such as what-if/then questions; for example, “what would you do if…?” “what would happen if…?” “what’s the projection if this...?” specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 5. table 5 specific suggestions of intervention d 4.2.2 peer feedback types amongst the participants, peer feedback is considered as an appropriate and scalable intervention to activate the mechanism “asking and providing relevant feedback” and, once delivered in dialogue form, the mechanism “probing back and forth.” however, it may also trigger other useful mechanisms. dominant mechanisms have been indicated in italics below. different types of peer feedback to promote deep learning can be distinguished. intervention e: encourage asynchronous oral peer feedback (audio or video) instructors encouraged the involvement of peers in feedback processes and invited them to provide their feedback in spoken form. 
even though nearly all interviewed instructors and students used only written feedback in online education, several instructors and students mentioned the expected potential of oral peer feedback. it was quicker and more personal, and using the voice and inflection made it easier to be critical, to deliver bad and good news, and to add nuances. in contrast to the written feedback, it added the richness of tone of voice, and, by using video even of facial expressions, it made students feel personally committed and more connected to the course material. and according to one of the instructors, students listened to it more, because they typically accessed it on their smartphones and their tablets. specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 6. table 6 specific suggestions of intervention e intervention f: encourage written asynchronous peer feedback teaching students how to provide written peer feedback that is focused on deep learning and is provided as a dialogue was recommended by both instructors and students and confirmed by members of the focus group. this creates awareness about the type of feedback that can be given and stimulates critical thinking, questioning, and reflecting. when providing feedback in written form, there is more time to think about it thoroughly and to formulate it carefully. doing so in a dialogue form by probing back and forth, students can ask each other questions, reflect, and respond to each other, which encourages deep learning. compared to oral feedback, written peer feedback was found to promote deep learning even more effectively because of the more precise type of feedback students are able to provide. according to both the interviewed instructors and students, students often learn more from providing feedback than from receiving feedback. 
specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 7. table 7 specific suggestions of intervention f intervention g: support group work both instructors and students mentioned online group work as a learning method in which deep learning could be promoted through feedback. instructors steered the students towards different group assignments and stimulated personal commitment and interaction. by doing so, students felt motivated and encouraged to be engaged, to reflect and to explicate what they have learned. the instructor taught students to suspend their opinions to create a dialogue and to construct questions in such a way that higher-order thinking is necessary for the others to answer the questions. this not only stimulated back and forth probing, but also made students feel personally committed, which may have motivated students to work just a little harder. specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 8. table 8 specific suggestions of intervention g intervention h: provide organized synchronous feedback instructors organized sessions in which students discussed their work and their feedback. the synchronous character made students feel more personally committed than written feedback. the simultaneous communication enabled back and forth probing. the prompt feedback gave the students the opportunity to adjust their performance immediately. both students and instructors said that they appreciated the opportunity to ask for immediate clarification in such a way that the feedback process became a dialogue. specific suggestions of how to implement this intervention mentioned in interviews and/or focus group have been added in table 9. 
table 9 specific suggestions of intervention h 4.2.3 automatic feedback intervention i: add scenario-based multiple choice questions add scenario-based multiple choice questions, aimed at deep learning, to the course design. scenario-based multiple choice questions contain follow-up questions and may be represented by a tree structure. questions should be asked in such a way that students are encouraged to synthesize information, draw conclusions, support findings, and reflect on them. online students appreciate multiple choice questions because of the active method and the immediate feedback that provides them with understanding of their own learning process. even though none of the respondents have experience with multiple choice questions specifically aimed at deep learning, most think it will be possible. it requires much precision and thinking very carefully about the questions and the responses, and will therefore be time-consuming during the development phase. however, if the number of students is large enough, the time investment will be worth it. a specific suggestion of how to implement this intervention mentioned in interviews and focus group has been added in table 10. table 10 specific suggestion of intervention i 5. discussion promoting deep learning is an important task for higher education, which is increasingly conducted online. spocs may be a form of online learning that has much potential for deep learning because of their small groups and relatively high number of interaction possibilities. in a previous study (filius et al., 2018), we showed that instructors experience specific challenges when trying to promote deep learning in spocs. this previous study resulted in a description of five main challenges: alignment of learning activities, insights into student needs, adaptivity in teaching strategy, social cohesion, and creating dialogue. to meet these challenges, the incorporation of feedback may have significant potential. 
therefore, the aim of this study was to provide scalable design propositions for instructors in spocs to promote deep learning through online feedback. design propositions have been formulated according to the cimo-logic. specific attention was paid to the mechanisms behind the interventions as they are central to the plausibility of a design principle (van aken, 2004). the results match with the categorization used by the online learning interaction model of ke and xie (2009), which also aims at deep learning. their three categories could be extended by the mechanisms found in this study. we suggest that interaction in the category “social interaction” may promote deep learning if it makes students feel personally committed. to make them feel that way, it helps students to receive adapted and individualized feedback. online learning interaction in the category “knowledge construction” should, in order to promote deep learning, be aimed at probing back and forth, as a dialogical process. this is in line with previous studies such as the work of rakoczy, harks, klieme, blum, and hochweber (2013), who indicate that receiving feedback is just as important as providing feedback. to fully exploit the feedback, students should be actively engaged in the feedback dialogue. in that same “knowledge construction” category we argue that it is important for students to learn how and when to ask for relevant feedback. this is supported by nicol (2010), who argues that getting students to request feedback, to respond to feedback, and to actively connect feedback to their assignments might result in students’ paying more attention to, and being able to use, instructor feedback. geitz et al. (2015) suggest that this may be explained by the fact that learning how and when to ask for exactly what type of feedback helps students to be more in control and to add personal meaning to the feedback. 
quality of feedback is important, but the quality of the interaction with the feedback may be even more important. moreover, it helps instructors to manage their time effectively. regarding the third category, “regulation of learning,” interaction to promote deep learning is especially useful when it provides students with more insight into their own learning process. this has been confirmed by other research: students must be equipped with the skills to think for themselves, to set their own goals, and to make improvements to their work while it is being produced (andrade, du, & mycek, 2010; molloy & boud, 2013; narciss, 2013; d. r. sadler, 2013). students need to develop awareness and responsiveness so they can detect anomalies or problems for themselves (d. r. sadler, 2013). according to topping (1998), these self-regulation skills are skills that students will need not only during their higher education, but also during their future life. students who are more effective at self-regulation produce better feedback or are more able to use the feedback they generate to achieve their desired goals (butler & winne, 1995). interestingly, it has been shown that peer feedback helps students to obtain these self-regulation skills even better than instructor feedback does (planar & moya, 2016). peer feedback may also help instructors to manage their time effectively. with the current high student-staff ratio, it may be difficult for instructors to engage in dialogue with students. therefore we specifically aimed for scalable interventions. this possibly excluded several instructor-student interventions. results suggest that scalability occurs in three categories of interventions. the first category concerns feedback management, which seeks to reduce the range of tasks of the instructor or to better facilitate the instructor. 
as feedback should be adaptive in order to be effective (nicol, 2010) and adaptive feedback is considered to be a challenge in the specific context of spocs (filius et al., 2018), the interventions in this category offer possibilities to allow the provision of adaptive feedback to become more feasible. the second category concerns peer feedback, which has been shown to have much potential for promoting deep learning. boud et al. (1999) suggested that working with peers rather than with the instructor may promote higher-order thinking. anderson and rourke (2002) confirmed that discussions by peers were useful in achieving higher-order, but not lower-order, learning objectives because the controversial perspectives offered by other peers disturbed students’ initial understanding of the content and therefore prompted them to process it thoroughly. based on the results of this study, we suggest that the mechanisms found may play an important role in determining whether the peer feedback interventions will lead to deep learning. automatic feedback is the third category. although automatic feedback can be provided for most constructed response items (benson, 2010), the use to specifically promote deep learning has, to the best of our knowledge, not yet been fully explored. since both students and instructors have expectations that this may lead to deep learning, this paper may lead to further investigation of the use of automatic feedback to promote deep learning in spocs. how might instructors use the findings in this paper? one practical proposal is that instructors in spocs examine current feedback practices in relation to the interventions and mechanisms as described above. especially, we expect that combinations of several feedback interventions, triggering multiple mechanisms, may support deep learning in spocs. an examination of this kind might help identify where feedback practices might be strengthened. 
however, the design propositions presented here do not exhaust all interventions that instructors might perform to promote deep learning in spocs. they merely provide a starting point and emphasize the importance of framing feedback as a dialogical process with active engagement of students. the research challenge is to refine these design propositions, identify gaps, and gather further evidence about the potential of feedback to promote deep learning. learning in an online environment can constitute a positive springboard to the new role that instructors need to take on in an online education model where the student is at the center of the learning process (planar & moya, 2016). given this new role, it is crucial to develop and analyze learning methods that enable a greater amount of dialogue among the students in the learning process (planar & moya, 2016). the present study provides relevant insights into how and why deep learning can be promoted in spocs. since this study is exploratory in nature, we recommend focusing subsequent research on examining the findings on a larger scale. for a follow-up study, we also recommend using different methods for assessing deep learning, such as grades or academic performance in general. furthermore, we deliberately chose to focus primarily on the perspective of the instructors. therefore, the perspective of the students has been used only as a supplement and thus we have limited the number of students to four. a next study could include the perspective of the students and compare it with the findings in this study. future research should be aimed at how feedback interventions can be made more suitable for promoting deep learning while also taking into account the specific learning mechanisms that should be activated within the different contexts and the workload that instructors experience. 
future research could also include combinations with instruments other than feedback, such as collaborative assignments, and integrate earlier research into, for example, concept maps (hay, 2015), cross-cultural chat (osman & herring, 2007), podcasting (pegrum, bartle, & longnecker, 2014) and online asynchronous discussions (du, havard, & li, 2005). as we explore the relatively young field of spocs, the results of this study show that feedback as a dialogical process may contribute to solving the current challenges of instructors in spocs to achieve deep learning with their students. specific attention has been paid to the mechanisms that are internal to the student and can be triggered by feedback interventions. findings concerning the mechanisms shed light on why interventions lead to the desired outcome, which is deep learning. it is essential to continue this line of research and to explore systematically the effects of implementing the design principles, both on learning processes and on learning performance. acknowledgements the authors would like to thank rianne bouwmeester phd keypoints scalable design propositions for instructors in spocs to promote deep learning through online feedback four mechanisms by which deep learning can be achieved student mechanisms triggered by feedback interventions fostering of increased dialogue among the students quality of the interaction with feedback more important than quality of feedback itself references aharony, n. (2006). the use of deep and surface learning strategies among students learning english as a foreign language in an internet environment. british journal of educational psychology, 76(4), 851-866. http://dx.doi.org/10.1348/000709905x79158 akkerman, s., admiraal, w., brekelmans, m., & oost, h. (2008). auditing quality of research in social sciences. quality & quantity, 42(2), 257-274. http://dx.doi.org/10.1007/s11135-006-9044-4 allan, r., & bentley, s. (2012, april). 
feedback mechanisms: efficient and effective use of technology or a waste of time and effort? paper presented at the stem annual conference, imperial college, london. anderson, t., & rourke, l. (2002). using peer teams to lead online discussions. journal of interactive media in education, 1, 1-21. andrade, h. l., du, y., & mycek, k. (2010). rubric-referenced self-assessment and middle school students’ writing. assessment in education: principles, policy & practice, 17(2), 199-214. http://dx.doi.org/10.1080/09695941003696172 askew, s., & lodge, c. (2000). gifts, ping-pong and loops–linking feedback and learning. in s. askew (ed.), feedback for learning (1st ed., pp. 1-17). london: routledge, falmer. http://dx.doi.org/10.4324/9780203017678 athanassiou, n., mcnett, j. m., & harvey, c. (2003). critical thinking in the management classroom: bloom's taxonomy as a learning tool. journal of management education, 27(5), 533-555. http://dx.doi.org/10.1177/1052562903252515 benson, a. d. (2010). assessing participant learning in online environments. facilitating learning in online environments: new directions for adult and continuing education, number 100, 103, 69. http://dx.doi.org/10.1002/ace.120 berliner, d. c. (2001). learning about and learning from expert teachers. international journal of educational research, 35(5), 463-482. http://dx.doi.org/10.1016/s0883-0355(02)00004-6 biggs, j. (1999). what the student does: teaching for enhanced learning. higher education research & development, 18(1), 57-75. http://dx.doi.org/10.1080/07294360.2012.642839 biggs, j., kember, d., & leung, d. y. (2001). the revised two-factor study process questionnaire: r-spq-2f. british journal of educational psychology, 71(1), 133-149. http://dx.doi.org/10.1348/000709901158433 biggs, j., & tang, c. (2011). teaching for quality learning at university. berkshire: the society for research into higher education and open university press. bogdan, r., & biklen, s. k. (2003). 
qualitative research for education: an introduction to theories and methods. new york: pearson. booth, p., luckett, p., & mladenovic, r. (1999). the quality of learning in accounting education: the impact of approaches to learning on academic performance. accounting education, 8(4), 277-300. http://dx.doi.org/10.1080/096392899330801 boud, d. (2001). peer learning and assessment. in d. boud, r. cohen, & j. sampson (eds.), peer learning in higher education (1st ed., pp. 67-84). london: kogan page limited. boud, d., cohen, r., & sampson, j. (1999). peer learning and assessment. assessment & evaluation in higher education, 24(4), 413-426. http://dx.doi.org/10.1080/0260293990240405 boud, d., & molloy, e. (2013). rethinking models of feedback for learning: the challenge of design. assessment & evaluation in higher education, 38(6), 698-712. http://dx.doi.org/10.1080/02602938.2012.691462 bronkhorst, l. h., meijer, p. c., koster, b., & vermunt, j. d. (2011). fostering meaning oriented learning and deliberate practice in teacher education. teaching and teacher education, 27(7), 1120-1130. http://dx.doi.org/10.1016/j.tate.2011.05.008 brouwer, p., brekelmans, m., nieuwenhuis, l., & simons, r. (2012). fostering teacher community development: a review of design principles and a case study of an innovative interdisciplinary team. learning environments research, 15(3), 319-344. http://dx.doi.org/10.1007/s10984-012-9119-1 butler, d. l., & winne, p. h. (1995). feedback and self-regulated learning: a theoretical synthesis. review of educational research, 65(3), 245-281. http://dx.doi.org/10.3102/00346543065003245 carless, d., salter, d., yang, m., & lam, j. (2011). developing sustainable feedback practices. studies in higher education, 36(4), 395-407. http://dx.doi.org/10.1080/03075071003642449 cohen, l., manion, l., & morrison, k. (2013). research methods in education. london: routledge. http://dx.doi.org/10.4324/9781315456539 cresswell, j. (2007). 
qualitative inquiry and research design: choosing among five perspectives. london: sage publishing. crook, a., mauchline, a., maw, s., lawson, c., drinkwater, r., lundqvist, k., ... & park, j. (2012). the use of video technology for providing feedback to students: can it enhance the feedback experience for staff and students? computers & education, 58(1), 386-396. http://dx.doi.org/10.1016/j.compedu.2011.08.025 davies, r., & berrow, t. (1998). an evaluation of the use of computer supported peer review for developing higher-level skills. computers & education, 30(1), 111-115. http://dx.doi.org/10.1016/s0360-1315(97)00086-9 dennen, v. p., aubteen darabi, a., & smith, l. j. (2007). instructor-learner interaction in online courses: the relative perceived importance of particular instructor actions on performance and satisfaction. distance education, 28(1), 65-79. http://dx.doi.org/10.1080/01587910701305319 denyer, d., tranfield, d., & van aken, j. e. (2008). developing design propositions through research synthesis. organization studies, 29 (3), 393-413. http://dx.doi.org/10.1177/0170840607088020 dobber, m., akkerman, s. f., verloop, n., admiraal, w., & vermunt, j. d. (2012). developing designs for community development in four types of student teacher groups. learning environments research, 15(3), 279-297. http://dx.doi.org/10.1007/s10984-012-9116-4 dochy, f., segers, m., & sluijsmans, d. (1999). the use of self-, peer and co-assessment in higher education: a review. studies in higher education, 24(3), 331-350. http://dx.doi.org/10.1080/03075079912331379935 du, j., havard, b., & li, h. (2005). dynamic online discussion: task‐oriented interaction for deep learning. educational media international, 42(3), 207-218. http://dx.doi.org/10.1080/09523980500161221 entwistle, n. j. (1991). approaches to learning and perceptions of the learning environment. higher education, 22(3), 201-204. http://dx.doi.org/10.1007/bf00132287 fan, x., miller, b. c., park, k. e., winward, b. 
w., christensen, m., grotevant, h. d., & tai, r. h. (2006). an exploratory study about inaccuracy and invalidity in adolescent self-report surveys. field methods, 18(3), 223-244. http://dx.doi.org/10.1177/152822x06289161 filius, r.m., de kleijn, r.a.m., uijl, s.g., prins, f.j., van rijen, h.v.m. and grobbee, d.e. (2018). challenges concerning deep learning in spocs. international journal of technology enhanced learning, 10(1-2), 111-127. http://dx.doi.org/10.1504/ijtel.2018.088341 garrison, d. r., anderson, t., & archer, w. (2000). critical inquiry in a text-based environment: computer conferencing in higher education. internet and higher education, 2(2–3), 87−105. http://dx.doi.org/10.1016/s1096-7516(00)00016-6 garrison, d. r., anderson, t., & archer, w. (2001). critical thinking, cognitive presence, and computer conferencing in distance education. american journal of distance education, 15(1), 7-23. http://dx.doi.org/10.1080/08923640109527071 garrison, d. r., anderson, t., & archer, w. (1999). critical inquiry in a text-based environment: computer conferencing in higher education. the internet and higher education, 2(2), 87-105. http://dx.doi.org/10.1016/s1096-7516(00)00016-6 geitz, g., brinke, d. j., & kirschner, p. a. (2015). goal orientation, deep learning, and sustainable feedback in higher business education. journal of teaching in international business, 26(4), 273-292. http://dx.doi.org/10.1080/08975930.2015.1128375 guba, e. g. (1981). criteria for assessing the trustworthiness of naturalistic inquiries. ectj, 29(2), 75. guest, g., bunce, a., & johnson, l. (2006). how many interviews are enough? an experiment with data saturation and variability. field methods, 18(1), 59-82. hall, m., ramsay, a., & raven, j. (2004). changing the learning environment to promote deep learning approaches in first-year accounting students.accounting education, 13(4), 489-505. http://dx.doi.org/10.1080/0963928042000306837 hattie, j., & timperley, h. (2007). 
the power of feedback. review of educational research, 77(1), 81-112. http://dx.doi.org/10.3102/003465430298487 hay, d.b. (2007). using concept maps to measure deep, surface and non-learning outcomes. studies in higher education, 32(1), 39-57. http://dx.doi.org/10.1080/03075070601099432 hounsell, d. 1997. contrasting conceptions of essay-writing. in the experience of learning, edited by f. marton, d. hounsell, and n. entwistle, 106–125. edinburgh: scottish academic press. ilgen, d. r., fisher, c. d., & taylor, m. s. (1979). consequences of individual feedback on behavior in organizations. journal of applied psychology, 64(4), 349. http://dx.doi.org/10.1037/0021-9010.64.4.349 ke, f., & xie, k. (2009). toward deep learning for adult students in online courses. the internet and higher education, 12(3), 136-145. http://dx.doi.org/10.1016/j.iheduc.2009.08.001 kluger, a. n., & denisi, a. (1996). the effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory. psychological bulletin, 119(2), 254-284. http://dx.doi.org/10.1037/0033-2909.119.2.254 lin, s. s., liu, e. z., & yuan, s. (2001). web-based peer assessment: feedback for students with various thinking-styles. journal of computer assisted learning, 17(4), 420-432. http://dx.doi.org/10.1046/j.0266-4909.2001.00198.x lindblom-ylänne, s. (1999). studying in a traditional medical curriculum-study success, orientations to studying and problems that arise. helsinki: printing house. lynch, r., mcnamara, p. m., & seery, n. (2012). promoting deep learning in a teacher education programme through self-and peer-assessment and feedback.european journal of teacher education, 35(2), 179-197. http://dx.doi.org/10.1080/02619768.2011.643396 marton, f., & saljo, r. (1997). approaches to learning. in f. marton, d. hounsell, & n. j. entwistle (eds.), the experience of learning: implications for teaching and studying in higher education (1sted., pp. 39-58). 
edinburgh: scottish academic press. midgley, g. (2000). systemic intervention. in midgley, g. (ed.),systemic intervention: philosophy, methodology, and practice(1 sted., pp. 113-133). new york: springer us. http://dx.doi.org/10.1007/978-1-4615-4201-8 molloy, e., & boud, d. (2013). changing conceptions of feedback. in e. molloy & d. boud (eds.), feedback in higher and professional education: understanding it and doing it well (1sted., pp. 11-33). london: routledge. moon, j. a. (2013). reflection in learning and professional development: theory and practice. london: routledge. http://dx.doi.org/10.4324/9780203822296 narciss, s. (2013). designing and evaluating tutoring feedback strategies for digital learning. digital education review,23, 7-26. nicol, d. (2009). assessment for learner self-regulation: enhancing achievement in the first year using learning technologies. assessment & evaluation in higher education, 34(3), 335-352. http://dx.doi.org/10.1080/02602930802255139 nicol, d. (2010). from monologue to dialogue: improving written feedback processes in mass higher education. assessment & evaluation in higher education, 35(5), 501-517. http://dx.doi.org/10.1080/02602931003786559 nicolls, g. (2002). developing teaching and learning in higher education.london: routeledge falmer. http://dx.doi.org/10.4324/9780203469231 osman, g., & herring, s. c. (2007). interaction, facilitation, and deep learning in cross-cultural chat: a case study. the internet and higher education, 10(2), 125-141. http://dx.doi.org/10.1016/j.iheduc.2007.03.004 palys, t. (2008). purposive sampling.the sage encyclopedia of qualitative research methods, 2, 697-698. pegrum, m., bartle, e., & longnecker, n. (2015). can creative podcasting promote deep learning? the use of podcasting for learning content in an undergraduate science unit. british journal of educational technology,46(1), 142-152. http://dx.doi.org/10.1111/bjet.12133 planar, d., & moya, s. (2016). 
the effectiveness of instructor personalized and formative feedback provided by instructor in an online setting: some unresolved issues. electronic journal of e-learning, 14(3), 196-203. plomp, t. & nieveen, n. (eds.) (2009). an introduction to educational design research: proceedings of the seminar conducted at the east china normal university, shanghai. enschede, the netherlands: slo – the netherlands institute for curriculum development. poortman, c., & schildkamp, k. (2012). alternative quality standards in qualitative research? quality & quantity, 46(6), 1727-1751. http://dx.doi.org/10.1007/s11135-011-9555-5 powell, r. a., & single, h. m. (1996). focus groups. international journal for quality in health care, 8(5), 499-504. http://dx.doi.org/10.1093/intqhc/8.5.499 rakoczy, k., harks, b., klieme, e., blum, w., & hochweber, j. (2013). written feedback in mathematics: mediated by students’ perception, moderated by goal orientation. learning and instruction, 27, 63-73. http://dx.doi.org/10.1016/j.learninstruc.2013.03.002 ramsden, p. (1992). learning to teach in higher education. london: routledge. http://dx.doi.org/10.4324/9780203413937 ramsden, p., & entwistle, n. (1983). understanding student learning.kent: croom helm. ramsden, p., & moses, i. (1992). associations between research and teaching in australian higher education. higher education, 23(3), 273-295. http://dx.doi.org/10.1007/bf00145017 richardson, j. c., koehler, a. a., besser, e. d., caskurlu, s., lim, j., & mueller, c. m. (2015). conceptualizing and investigating instructor presence in online learning environments. the international review of research in open and distributed learning, 16 (3), 256-297. http://dx.doi.org/10.19173/irrodl.v16i3.2123 rushton, a. (2005). formative assessment: a key to deep learning? medical teacher, 27(6), 509-513. http://dx.doi.org/10.1080/01421590500129159 sadler, d. r. (2013). opening up feedback. in s. merry, m. price, d. carless, & m. 
taras (eds.), reconceptualising feedback in higher education: developing dialogue with students (1sted., pp. 54-63). london: routledge. http://dx.doi.org/10.4324/9780203522813 sadler, p. m., & good, e. (2006). the impact of self-and peer-grading on student learning. educational assessment, 11(1), 1-31. http://dx.doi.org/10.1207/s15326977ea1101_1 topping, k. (1998). peer assessment between students in colleges and universities. review of educational research, 68(3), 249-276. http://dx.doi.org/10.3102/00346543068003249 trigwell, k., prosser, m., & waterhouse, f. (1999). relations between teachers’ approaches to teaching and students’ approaches to learning. higher education, 37(1), 57-70. uijl, s., filius, r., & ten cate, o. (2017). student interaction in small private online courses. medical science educator, 1-6. http://dx.doi.org/10.1007/s40670-017-0380-x van aken, j. e. (2004). management research based on the paradigm of the design sciences: the quest for field-tested and grounded technological rules. journal of management studies, 41(2), 219-246. http://dx.doi.org/10.1111/j.1467-6486.2004.00430.x van aken, j. e. (2007). developing organization studies as an applied science using a triple learning approach. paper presented at the third organization studies summer workshop, greece. van den akker, j. j. h. (1999). principles and methods of development research. in van den akker, j. j. h., branch, r. m. gustafson, k., nieveen, n. & plomp, t. (eds.),design approaches and tools in education and training(1 sted., pp. 1-14). dordrecht: springer netherlands. http://dx.doi.org/10.1007/978-94-011-4255-7 vermunt, j. d., & verloop, n. (1999). congruence and friction between learning and teaching. learning and instruction, 9(3), 257-280. 
frontline learning research 5 special issue ‘learning through networks’ (2014) 72-80 issn 2295-3159

corresponding author: maarten de laat, open university of the netherlands, welten institute, valkenburgerweg 177, 6419 at, heerlen, the netherlands, maarten.delaat@ou.nl
doi: http://dx.doi.org/10.14786/flr.v2i2.122

unfolding perspectives on networked professional learning: exploring ties and time

maarten de laat a, jan-willem strijbos b
a open university of the netherlands, welten institute, the netherlands
b ludwig-maximilians-university of munich, department of psychology, germany

article received 26 june 2014 / revised and accepted 27 june 2014 / available online 15 july 2014

abstract

networked learning and learning networks are commonplace concepts in most contemporary discourse on learning in the 21st century. this special issue provides a collection of studies that address the need for a growing body of empirical work to extend the limited understanding of the use and benefits of networks in relation to learning and professional development. in this article we attempt to offer a synthesis of the studies presented in this special issue and reflect on their findings. the studies in this issue present a rich combination of networked professional learning research addressing issues related to the composition and structure of learning networks, their content and activities, showing how multi-faceted research in the field of networked learning really is. based on the findings and methods used in the articles in this issue, we articulate some recommendations for further research.
the recommendations are focused on the need for advanced multi-level analysis to understand the complexity of learning ties, the need for employing a multi-method research approach to triangulate and contextualize findings, the need to conduct process and time-based analysis, and finally the need to further develop a theory and toolkit for applying social network analysis in the context of networked learning.

keywords: networked professional learning, networked learning, professional development, informal-formal learning.

1. introduction

the domain of networked learning research has been around for some time. the term has been used predominantly in the uk, where the research by steeples and jones (2002) and goodyear, banks, hodgson and mcconnell (2004) played a central role in the early days. originally there was a strong focus on higher education, but nowadays a networked learning approach to understanding learning practices has extended to include learning in formal, non-formal and informal settings (hodgson, de laat, mcconnell, & ryberg, 2014). according to hodgson et al. (2014), networked learning refers to learning through connections between learners, between learners and their tutors, and between a learning community and its learning resources. within networked learning, learners have always been seen as proactive and engaging agents. many contemporary perspectives on networked learning derive from critical and humanistic traditions (dewey, 1916; freire, 1970; illich, 1971; mead, 1934) positing that learning is social, takes place in communities and networks, is a shared practice, involves negotiation, and requires dialogue (hodgson, mcconnell, & dirckinck-holmfeld, 2012). often, digital technology is used to support networked learning processes (goodyear et al., 2004).
the field of networked learning aims to understand the pedagogical values and beliefs underpinning networked learning in order to advance teaching and learning practices and the design of technologies supporting such practices. the focus is on understanding how relations between learners influence teaching and learning in physical, online and/or blended settings.

this special issue is an example of how a networked approach to learning has spread beyond education, since all the articles address questions around professional development, in this paper termed “networked professional learning”. this special issue forms an important and timely collection of articles, especially because there is a strong interest in the promise and value of networked professional learning. there is considerable consensus that professionals organise and carry out their own professional development effectively through their own social networks and communities (cross & parker, 2004; duguid, 2005; hargreaves & fullan, 2012; weinberger, 2011; wenger, 1998). however, we lack empirical evidence about people’s specific networked learning experiences. in particular, it is not well understood how professionals build and maintain networked connections for learning, what the composition of these networks is, whether and how these learning relationships create value, and how to assess the outcomes of learning through networks in the context of professional development.

research brought together in this special issue advances our understanding of networked professional learning, allowing us to reflect on and contribute to networked learning theory and helping us to develop and facilitate networked learning in practice. each article investigates learning from a relational point of view, in formal, informal or mixed settings. this final article offers a reflection on the unfolding perspectives and research presented by the articles collected in this special issue.

2. exploring networked professional learning

the articles in this special issue are focused on understanding how social networks influence professional development in networks and communities.

vaessen, van den beemt and de laat (this issue) present a conceptual literature review to uncover some underlying mechanisms and factors that influence usage of networked learning in the context of teacher professional development. they explicitly explore the tension between formal and informal learning. they argue that the increased complexity of work requires professionals to use their networks to access and/or develop knowledge and expertise to stay up to date and function successfully. understanding the role and impact of these informal social networks on professional development can foster a better relationship – if necessary – with the traditional, yet dominant, formal professional development activities informed by acquisition and transfer of knowledge via expert-driven, pre-planned courses. vaessen et al.’s literature review provides a broader framework for understanding professional development through participation in social networks, setting the context for the other articles in the special issue that examine particular aspects of “networked professional learning” in greater detail.

pataraia, falconer, margaryan, littlejohn and fincher (this issue) investigate academics’ learning through their personal professional networks. pataraia et al. build on roxå and mårtensson’s (2009) research on teacher networks in academic contexts, focusing on conversations about teaching. more specifically, pataraia et al. examined whether the composition of personal networks (i.e., the proximity of people with whom one is connected) and characteristics of interactions in these networks (i.e., what is exchanged and how it is valued) may support change of teaching practice in universities.
this important descriptive research showed that the networks of academics were small, discipline-specific and strongly localised. based on interview data from two studies, they conclude that the academics interacted most frequently with closely proximate colleagues, typically from the same discipline. these findings support the notion that homophily (degree of similarity) influences the establishment of ties and the development of networks.

hytonen, palonen and hakkarainen (this issue) investigated network patterns and structures that contribute to professionals’ cognitive centrality within a network. the context of their study was a professional training course in the field of energy efficiency. cognitive centrality was based on ties that represented who people contacted for professional advice over the course of twelve months; as such, the networked ties constitute the product of the networked learning. more specifically, hytonen et al. examined the central actors within the network and their learning connections in order to identify possible factors that could explain cognitive centrality. their study showed that cognitive centrality is influenced by several factors, such as personal characteristics, expertise, and the organisation that the actor represents, but a single decisive factor could not be found. these findings emphasise the complexity of social learning, suggesting that learning is highly contextualized and situated.

rehm, gijselaers and segers (this issue) examined the transferability of knowledge in relation to the hierarchical network positions of members of an online community of learners during a professional development training program. rehm et al. addressed the notion that participants’ hierarchical positions within the organization can have an effect on the collaborative processes within communities of learning.
they showed that higher in- and out-degree and centrality scores were associated with higher hierarchical positions within the organization. their longitudinal analysis indicated that these trends were established relatively early on during the professional development programme.

the studies present a rich combination of networked professional learning research addressing issues related to the composition and structure of learning networks, their content and activities, showing how multi-faceted research in the field of networked learning really is. based on the findings reported and methods used, the following sections articulate some recommendations for further research.

3. unfolding networked professional learning

the articles in this special issue provided us with snapshots of networked professional learning and details about the constitution of the learning networks in a variety of contexts. combined, these articles challenge the naïve view that large(r) networks with many ties, or very elaborate networks with many ties of a specific type (e.g., weak vs. strong), are better and/or preferable by default. needless to say, professionals take part in and maintain many networked relationships, but in essence networks are always about something, focused on a particular problem or shared interest. the “whole might be greater than the sum”, but the merit of the research presented in this special issue is that it helps us understand how the particular (sub)networks or networked activities that professionals take part in contribute to their learning. for example, pataraia et al., and to some extent also hytonen et al., clearly show that professionals maintain many networked relationships with a variety of people for a number of reasons. done from an ego perspective, this work shows that professionals use their relations for exchanging information and ideas, to talk about work-related problems and to seek advice.
rather than focussing on the impact and effects of networking in general, it is very important to understand in great detail “what goes on in particular networks” and see how participation in networks affects learning. although the articles present findings at different levels of analysis and network scale (ego-personal network, sub-network, and/or whole network), there are interesting connections between the findings of these different studies that merit reflection. in the following synthesis we will try to address these differences in theory, method and network scale and explore if and how these different levels can be connected.

boud and hager (2012) highlighted the importance of uncovering the ways in which people participate in social settings (networks) through which they seek to co-create knowledge and become better professionals. all articles in this issue take a social perspective on learning. in reflecting on professionals’ network positions and the role of these networks in the learning processes, these studies draw on the “participation” metaphor of learning (as opposed to the acquisition metaphor, see sfard, 1998, for an elaborate discussion on these metaphors). while rehm et al. concentrate on the transfer of knowledge amongst members of a community of learners, they too position professional learning as a process of collaborative knowledge creation in social networks.

social participation and network building are predominantly seen as informal activities promoted, for example, through professional autonomy (cross & parker, 2004; kessels, 2012). however, vaessen et al. specifically argue that the underlying mechanisms for networked learning are found in both formal and informal settings. they further argue that networked learning is most effective when located within work practices. in the workplace, learning is collaborative and situated within social relationships.
networked learning is most effective in work settings in which professionals have high levels of autonomy, trust, openness and accountability and where there is an organisational culture of management promoting collaboration, discursive and open communication, and bottom-up learning and change.

the findings presented by vaessen et al. are to some extent reflected, but also criticised, in the studies by pataraia et al., hytonen et al., and rehm et al. for example, the finding by pataraia et al. that academics’ teaching networks were small, discipline-specific and strongly localised reveals that the establishment of connections with others is influenced by proximity, homophily, as well as perceived relevance and anticipated value of these connections. pataraia and colleagues’ empirical data seem to suggest that academics’ teaching networks are predominantly formed around strong ties. in a similar vein, findings by hytonen et al. show that cognitive centrality of core participants is affected by a multitude of factors, including personal characteristics (e.g., expertise, engagement), openness, and their organisational background. finally, rehm et al. show that characteristics of the formal work setting – i.e. people’s hierarchical position – influence interaction patterns in an informal setting. the findings are in line with vaessen et al. in the sense that a similar structure (hierarchy) in the formal setting affects networked learning ties in the informal setting, but not necessarily as intended (although rehm et al. do not comment on this aspect). rehm et al. concluded that more senior professionals could draw more actively upon the input of colleagues, allowing less senior participants to gradually move towards the centre of a network. likewise, vaessen et al. concluded that the network(s) transcend organisational boundaries, while rehm et al. indicate that this process may also benefit from some facilitation and/or intervention.
both agree that management may need to promote networked learning by opening up organisational structures where management and community members can learn together.

finally, vaessen et al. indicate that informal networks thrive in open practices, in which strong and weak ties co-exist (granovetter, 1973). such open network practices and a “culture of learning” that is facilitated and promoted by management appear especially relevant for professional learning (price, 2013). open practices consist of networks that are collections of individuals across organisational, spatial and disciplinary boundaries, who come together to create and share a body of knowledge (de laat, schreurs, & nijland, 2014). open networks typically focus on developing, distributing and applying knowledge (pugh & prusak, 2013). open network members connect around a common goal and share social and operational norms. they typically participate out of common interest and a shared purpose rather than because of contract, quid pro quo or hierarchy. they are not bound or confined by shared identities, and knowledge and meaning are not retained in the way they are in communities of practice. the relationship between the members is much looser and more dynamic, yet effective in the creation of new ideas. open network practices offer professionals a more dynamic platform to connect with relevant peers who can help them to stay up to date than communities of practice do. a further feature of such open network practices is that they are self-directed and non-hierarchical. wellman’s (2002) notion of networked individualism emphasizes the point that professionals have a great ability to act on their own, to solve their problems and organise their lives, but they do this in a networked way with the help of friends and other relationships. the diversity of sources in a professional’s network is also echoed in the findings by pataraia et al.
and hytonen et al.

although rather implicitly, the articles in this special issue suggest several avenues for further research. in the next subsections we discuss some of these directions.

3.1 need to clarify the “what” and “why” of a “learning tie”

there is a clear need to develop theoretically-based and differentiated qualifications of the meaning of a tie, that is, to investigate the “what” and “why” of a tie. in this special issue, pataraia et al., hytonen et al. and rehm et al. explicitly unfold the meaning of a tie. pataraia et al. and hytonen et al. focused on both the “what” (content of a tie) and “why” (explanation for a tie or structure of personal/ego-network or the entire network), whereas rehm et al. focused only on the “why”. furthermore, networked learning ties can be treated both as relations that connect people and as outcomes of relations (haythornthwaite & de laat, 2012). in the first instance, the tie refers to relational ties used for learning, such as a student learning from a teacher, students or professionals learning from peers, or novice professionals learning from experts. an example of networked learning ties as outcomes is when a group collectively acquires a competence in a certain domain that helps them to deal with new situations. as relational ties can represent both the process and the product of learning, there is a clear need to separate them or at least treat each tie as a compound construct consisting of several layers of process and product components. for example, at the individual level, a tie may consist of 30% on-going learning activities, 40% current project work, 20% personal bonds, and 10% status. a multi-layered perspective on ties allows for (multilevel) multiple regression approaches to understand the multifaceted nature of ties.
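the compound view of a tie described above can be sketched as a small data structure. this is only an illustration under stated assumptions: the layer names, the `Tie` class, and the actor names are hypothetical and are not taken from any of the studies; the proportions simply reuse the 30/40/20/10 example from the text.

```python
from dataclasses import dataclass

# Hypothetical layer names, mirroring the illustrative percentages in the text.
LAYERS = ("learning", "project_work", "personal_bond", "status")


@dataclass(frozen=True)
class Tie:
    """A tie between two actors, described as proportions over several layers."""

    source: str
    target: str
    shares: dict  # layer name -> proportion of the tie (assumed to sum to 1)

    def __post_init__(self):
        unknown = set(self.shares) - set(LAYERS)
        if unknown:
            raise ValueError(f"unknown layers: {sorted(unknown)}")
        total = sum(self.shares.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"layer shares must sum to 1, got {total}")

    def dominant_layer(self):
        """Return the layer that makes up the largest share of the tie."""
        return max(self.shares, key=self.shares.get)

    def as_row(self):
        """Return the shares in a fixed layer order, e.g. as predictor values."""
        return [self.shares.get(layer, 0.0) for layer in LAYERS]


# Invented example actors; the shares follow the example in the text.
tie = Tie(
    "anna",
    "ben",
    {"learning": 0.30, "project_work": 0.40, "personal_bond": 0.20, "status": 0.10},
)
print(tie.dominant_layer())
print(tie.as_row())
```

rows produced by `as_row` for many ties could, in principle, serve as predictors in the kind of (multilevel) regression mentioned above; how such proportions would actually be measured is left open here.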
the conceptualisation of ties as multi-layered constructs also opens up new directions regarding how these ties can be afforded, fostered, and facilitated through social interaction, design for learning, and technology.

3.2 need to examine networks at multiple levels and the interplay between levels

combined, the articles in this special issue cover all possible levels of scale. pataraia et al. investigated the individual level in terms of academics’ personal teaching networks and the characteristics of their interactions with colleagues. in a slightly different way, hytonen et al. examined the individual level to understand both the structure and heterogeneity of central participants’ personal networks. they analysed the entire network to identify which other participants (alters) connected to the cognitively-central actors and to examine the associated network clusters and the degree of collaboration within these. finally, rehm et al. investigated networks as communities of learners, adopting a whole network analysis approach to explore the positions within these online communities based on participants’ rank and hierarchical position within the organization. although these articles cover the range of possible levels – personal (ego), sub-network (community or larger cluster) and entire network – the explicit comparison or investigation of the interplay between various levels was not attempted. it is conceivable, for example, that an individual’s personal network may be low in density, yet the individual may hold a key brokering position in the entire network. examining the interplay between levels might be a promising direction in future research in professional networked learning. referring back to the issue of the “whole being greater than the sum”, research on the interplay of levels will help to uncover how to potentially assess the nature of learning ties for the individual, a particular network and the organization.
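the contrast drawn in section 3.2 between a sparse personal network and a brokering position in the entire network can be made concrete with a minimal sketch. everything in it is an assumption for illustration: the toy network and actor names are invented, "ego density" is taken as the share of possible ties among an actor's alters that actually exist, and "brokering" is approximated by checking whether removing the actor disconnects an otherwise connected network. other operationalisations (e.g., betweenness centrality) are equally possible.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected network as a dict mapping each actor to a set of alters."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def ego_density(graph, ego):
    """Share of possible ties among ego's alters that are present (ego excluded)."""
    alters = graph[ego]
    if len(alters) < 2:
        return 0.0
    pairs = list(combinations(sorted(alters), 2))
    tied = sum(1 for u, v in pairs if v in graph[u])
    return tied / len(pairs)


def is_broker(graph, node):
    """True if removing node splits the rest of a (connected) network apart."""
    remaining = set(graph) - {node}
    if not remaining:
        return False
    start = next(iter(remaining))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search that never passes through node
        current = queue.popleft()
        for neighbour in graph[current] - {node}:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen != remaining


# Invented toy network: two tightly knit clusters joined only through actor "x".
network = build_graph([
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("d", "e"), ("d", "f"), ("e", "f"),
    ("x", "a"), ("x", "d"),
])

for actor in ("x", "b"):
    print(actor, ego_density(network, actor), is_broker(network, actor))
```

in this toy network the hypothetical actor x has the sparsest possible personal network (its two alters are not tied to each other) while being the only link between the clusters, whereas actor b has a fully connected personal network but no brokering role.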
within human resource development (hrd) – especially from a formal management point of view – there is interest in monitoring and assessing networked learning in order to validate and award it. the immediate response seems to be to try to assess networks, similar to registering attendance of professional development programmes, rather than to focus on the value that is created through networks and communities (wenger, trayner, & de laat, 2011). multi-level research on the value of learning ties can help assess the outcome of networked professional learning in relation to different stakeholders.

3.3 need for extending the methodological toolkit

as there are different ways to conceptualise learning ties, there are different analysis techniques to study them. these techniques can address, for example, who learns from whom, what learners learn from each other, the kinds of interactions between learners, the direction of ties, the flow of resources, and the frequency of interactions. several of these aspects are related to communication and information patterns, whereas others directly deal with learning itself. networked learning studies often address these aspects; however, some reflection is required on how we may be more critical and cautious about the way in which network analysis is used to understand learning ties. a popular method for studying networked (professional) learning is the use of social network analysis (sna). two studies in this issue applied sna (hytonen et al.; rehm et al.) to understand the network structures or dynamics. sna has become rather popular for trying to understand learning ties, but we have to remain cautious about its application. sna was developed to understand, for example, the flow of information or communication across networks – i.e., more factual data. if person a passes something on to person b, traditional sna assumes that person b has received it. however, when learning is concerned this assumption may not hold.
first, the extent to which whatever was passed on was received may be uncertain. second, the contribution of information to the actual learning process of the receiver is uncertain. hence, the network theory or operationalization of indicators behind the tests that researchers conduct may have different implications. does density in a communication network imply the same as density in a learning network? what does the shortest path mean in terms of “learning”? are all sna indicators by default useful indicators of learning? a more advanced theory of sna is needed to guide studies on “social network learning analysis” (snla). sna is a very flexible method, but it requires a solid theoretical framework to enable interpretation of findings. in the absence of a solid theoretical framework of learning through networks, researchers rely on conceptualisation from related research domains. when applying analysis techniques that reflect a different theoretical orientation, researchers risk type i and ii errors. furthermore, despite the ease with which network visualisations can be produced from sna, such visualisations should be approached with more restraint when interpreting the network structures. another approach would be the application of multilevel analyses, discussed by rehm et al. (this issue). an example is the recent study by eberle, stegmann and fischer (2014), who investigated legitimate peripheral participation (a well-known construct introduced by lave and wenger (1991) to describe learning processes in communities of practice), in terms of support structures used to foster newcomers' participation. they applied a 2-level model, which included 14 student councils (communities) and 68 newcomers.
they found that exposure time (duration of community membership) and the support structure of “accessibility of community knowledge” positively predicted the newcomers' participation, whereas community size and “recruitment strategies” negatively predicted participation. finally, the instruments and methods applied in the articles in this special issue reflect that a multimethod approach is required to investigate networked professional learning and obtain a more complete understanding of the nature of “learning” reflected by the ties and the indicators that sna offers. a potential direction would be the combination of sna, content analysis of communication, and a contextual analysis through interviews (de laat & lally, 2003). the contributions by pataraia et al. and hytonen et al. are examples of such a contextualised approach to understanding network structures. rehm et al. acknowledge that their study would have benefitted from content analysis to help uncover how the “what” of the tie might have impacted network position and exchange of knowledge.

3.4 need to examine networked learning over time

over the past decade, the issue of time has slowly developed into a more focal point of research on interactive learning processes. an early contribution in this respect is the work by de laat and lally (2003), who identified changes in both interactive and tutoring patterns within a community of learners, by distinguishing between the early, middle and end phase of the community experience. similarly, the notion of time is receiving more attention in the domain of (small) group collaborative learning, where learning is studied longitudinally, in terms of sequences of actions (suthers, dwyer, medina, & vatrapu, 2010), specific timeframes such as days, weeks or months (arrow, henry, poole, wheelan, & moreland, 2005; reimann, 2009), or in terms of activities in a learning environment over time (schümmer, strijbos, & berkel, 2005).
vaessen et al., pataraia et al., and hytonen et al. implicitly refer to issues of time. vaessen et al. describe professional development as an “ongoing process”, arguing that “networking skills need to be developed over time”. pataraia and colleagues' data were collected over a 12-month time span. they argue that “the temporal component of interactions determines the strength of ties”, and that the networks are not only influenced by proximity and/or discipline, but also have a “historical or temporal component”. hytonen et al. analysed data collected after a 12-month period following a training programme. their study focused on small group and community level aspects, but the development of the networks of cognitively central participants was not part of their aim. in contrast, rehm et al. explicitly adopted a longitudinal analytical lens when analyzing reply structures in online communities collected over a 14-week time span in terms of two blocks of about six weeks. their analysis showed that the more central positioning of senior management was established relatively early on and persisted – in fact slightly increased – over time.

4. closing remarks

the articles comprising this special issue have advanced our understanding of networked professional learning. the empirical studies provided detailed accounts of the structure and focus of networked learning at various levels (ego, sub-network and whole network). they improve our understanding of the characteristics of networked learning and contribute to a much-needed empirical knowledge base in this area of research. the literature review provided by vaessen et al. helps to broaden our horizon and to situate the findings of the other articles, by offering the mechanisms that influence networked professional learning. the three empirical studies, although addressing different levels of scale, reinforce and supplement each other. for example, where pataraia et al.
find that professionals maintain multiple networks for their development, hytonen et al. identify several sub-networks centralized around key actors. the emergence of these personal networks hinges on expertise, interest, enthusiasm, competency, familiarity, organizational background, as well as hierarchy and formal organizational relationships and structures. simultaneously the studies (implicitly) generated some directions for future research that we elaborated upon: (a) the need to clarify the “what” and “why” of “learning tie”, (b) the need to examine networks at multiple levels and the interplay between levels, (c) the need for extending the methodological toolkit, and (d) the need to examine networked learning over time. the importance of professional autonomy and cross-boundary collaboration that seems to foster networked professional learning brings the emergence of open practices into perspective. both professionals and organizations are increasingly becoming aware that knowledge and innovation processes are not bounded by the organizational context and that boundary crossing becomes an important aspect of professional development. this raises further questions about how to monitor, promote and assess networked professional learning. the naïve view of “the more contacts the merrier” is too simplistic. studies in this special issue have shown that networks are personal (probably small and localized), centralized around shared topics, interests and key members, driven by professional autonomy, and that they negotiate both informal and formal settings influenced by hierarchical organizational structures. these findings shed some light on how networks operate and create value, based on which knowledge about how to facilitate and manage networked professional learning can be inferred.

keypoints

the studies in this issue present a rich combination of networked professional learning research addressing issues related to the composition and structure of learning networks, their content and activities, showing how multi-faceted research in the field of networked learning really is.
need for advanced multi-level analysis to understand the complexity of learning ties
need for employing a multi-method research approach to triangulate and contextualize findings
need to conduct process and time-based analysis
need to further develop a theory and toolkit for applying social network analysis in the context of networked learning

references

arrow, h., henry, k. b., poole, m. s., wheelan, s., & moreland, r. (2005). traces, trajectories, and timing. in m. s. poole & a. b. hollingshead (eds.), theories of small groups: interdisciplinary perspectives (pp. 313-367). thousand oaks, ca: sage.
boud, d., & hager, p. (2012). re-thinking continuing professional development through changing metaphors and location in professional practice. studies in continuing education, 34(1), 17-30. doi: 10.1080/0158037x.2011.608656
cross, r. l., & parker, a. (2004). the hidden power of social networks: understanding how work really gets done in organizations. boston, ma: harvard business school press.
de laat, m., & lally, v. (2003). complexity, theory, and praxis: researching collaborative learning and tutoring processes in a networked learning community. instructional science, 31(1-2), 7-39. doi: 10.1023/a:1022596100142
de laat, m. f., schreurs, b., & nijland, f. (2014). communities of practice and value creation in networks. in r. f. poell, t. rocco & g. roth (eds.), the routledge companion to human resource development (pp. 249-257). new york: routledge.
dewey, j. (1916). democracy and education: an introduction to the philosophy of education. new york: the macmillan company.
duguid, p. (2005). the art of knowing: social and tacit dimensions of knowledge and the limits of the community of practice. the information society, 21(2), 109-118. doi: 10.1080/01972240590925311
eberle, j., stegmann, k., & fischer, f. (2014). legitimate peripheral participation in communities of practice: participation support structures for newcomers in faculty student councils. the journal of the learning sciences, 23(2), 216-244. doi: 10.1080/10508406.2014.883978
freire, p. (1970). pedagogy of the oppressed. new york: continuum.
goodyear, p., banks, s., hodgson, v., & mcconnell, d. (eds.). (2004). advances in research on networked learning. dordrecht, the netherlands: kluwer academic publishers.
granovetter, m. s. (1973). the strength of weak ties. the american journal of sociology, 78(6), 1360-1380. http://www.jstor.org/stable/2776392
hargreaves, a., & fullan, m. (2012). professional capital: transforming teaching in every school. new york: teachers college press.
haythornthwaite, c., & de laat, m. f. (2012). social network informed design for learning with educational technology. in a. olofson & o. lindberg (eds.), informed design of educational technologies in higher education: enhanced learning and teaching (pp. 352-374). hershey, pa: igi-global.
hodgson, v., de laat, m. f., mcconnell, d., & ryberg, t. (2014). researching design, experience and practice of networked learning: an overview. in v. hodgson, m. f. de laat, d. mcconnell & t. ryberg (eds.), the design, experience and practice of networked learning (pp. 1-26). dordrecht, the netherlands: springer.
hodgson, v., mcconnell, d., & dirckinck-holmfeld, l. (2012). the theory, practice and pedagogy of networked learning. in l. dirckinck-holmfeld, v. hodgson & d. mcconnell (eds.), exploring the theory, pedagogy and practice of networked learning (pp. 291-305). new york: springer.
illich, i. (1971). deschooling society. manchester, uk: pelican books.
kessels, j. w. m. (2012). leiderschapspraktijken in een professionele ruimte [leadership practice in a professional space]. heerlen, the netherlands: ruud de moor centrum, open university of the netherlands.
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge, ma: cambridge university press.
mead, g. (1934). mind, self & society from the standpoint of a social behaviorist. chicago, il: university of chicago press.
price, d. (2013). open: how we’ll work, live and learn in the future [ebook, kindle edition]. crux publishing.
pugh, k., & prusak, l. (2013). designing effective knowledge networks. retrieved october 24, 2013, from http://sloanreview.mit.edu/article/designing-effective-knowledge-networks
reimann, p. (2009). time is precious: variable- and event-centered approaches to process analysis in cscl research. international journal of computer-supported collaborative learning, 4(3), 239-257. doi: 10.1007/s11412-009-9070-z
roxå, t., & mårtensson, k. (2009). significant conversations and significant networks: exploring the backstage of the teaching arena. studies in higher education, 34(5), 547-559. doi: 10.1080/03075070802597200
schümmer, t., strijbos, j. w., & berkel, t. (2005). measuring group interaction during cscl. in t. koschmann, d. suthers & t. w. chan (eds.), computer supported collaborative learning 2005: the next 10 years! (pp. 567-576). mahwah, nj: lawrence erlbaum associates.
sfard, a. (1998). on two metaphors for learning and the dangers of choosing just one. educational researcher, 27(2), 4-13. doi: 10.3102/0013189x027002004
steeples, c., & jones, c. (2002). networked learning: perspectives and issues. london: springer.
suthers, d., dwyer, n., medina, r., & vatrapu, r. (2010). a framework for conceptualizing, representing, and analyzing distributed interaction. international journal of computer-supported collaborative learning, 5(1), 5-42. doi: 10.1007/s11412-009-9081-9
weinberger, d. (2011). too big to know: rethinking knowledge now that the facts aren’t the facts, experts are everywhere, and the smartest person in the room is the room. new york: basic books.
wenger, e. (1998). communities of practice: learning, meaning, and identity. cambridge, ma: cambridge university press.
wenger, e., trayner, b., & de laat, m. (2011). telling stories about the value of communities and networks: a toolkit. heerlen, the netherlands: ruud de moor centrum, open university of the netherlands.
wellman, b. (2002). little boxes, glocalization, and networked individualism. in m. tanabe, p. van den besselaar & t. ishida (eds.), digital cities ii: computational and sociological approaches (pp. 10-25). berlin: springer.

frontline learning research vol. 5 no. 2 (2017) 78-98 issn 2295-3159
corresponding author: kendra geeraerts, gratiekapelstraat 10, 2000 antwerp, belgium email: kendra.geeraerts@uantwerpen.be doi: http://dx.doi.org/10.14786/flr.v5i2.293

intergenerational professional relationships in elementary school teams: a social network approach

kendra geeraerts a, piet van den bossche a,b, jan vanhoof a, nienke moolenaar c
a university of antwerp, belgium
b university of maastricht, the netherlands
c university of utrecht, the netherlands

article received 20 february / revised 22 august / accepted 22 august / available online 8 september

abstract

this paper examines the extent to which school team members’ professional relationships are affected by being part of a certain generational cohort. these professional relationships provide opportunities for intergenerational knowledge flows and can therefore be relevant for intergenerational learning. nowadays these topics have gained more attention due to worldwide demographic changes such as increased retirement rates and high levels of teacher dropout.
data were gathered through a survey with socio-metric questions among 299 school team members in 15 elementary schools in the netherlands. using social network analysis, in particular p2 modelling, we analysed the effect of being part of a generational cohort on teachers’ likelihood of having professional relationships in networks such as discussing work, asking and providing advice, and collaboration. findings indicate that generational cohorts based on chronological age do matter in the formation of work related ties. these findings also support the importance of focusing on different professional networks since different age dynamics can be at play. our findings also show that school team members of the youngest cohort tend to form intra-generational relationships, whereas older generational cohort members prefer inter-generational relationships. this study is innovative due to its application of social network analysis to investigate intergenerational knowledge flows.

keywords: intergenerational learning; school teams; teacher development; social network analysis; p2 modelling

1. introduction

nowadays, the role of knowledge management within schools as an organizational context has received more attention due to its potential to encourage innovative practices and to avoid knowledge loss within school teams (thambi & o'toole, 2012). knowledge loss can occur when workers leave the profession without sharing their knowledge or without turning implicit knowledge into an explicit mode. similar to other countries, schools in the netherlands are confronted with a large outflow of older teachers and a challenge to retain young teachers into the teaching profession. as compared to secondary school teams, elementary schools are often smaller organizations and characterized by a more cohesive organizational culture (johnson, 1990). also, the tasks of elementary school teachers show more similarities than for secondary school teachers.
therefore, we assume elementary school teams to be a fruitful context for exchange of knowledge which offers an interesting case to investigate how professional relationships are shaped. facilitating intergenerational learning interactions seems to be a promising way to prevent knowledge loss within organizations (gerpott, lehmann-willenbrock, & voelpel, 2016; ropes, 2013; starks, 2013). intergenerational learning in school teams is mainly conceptualised as an interactive process between teachers of different generations that results in learning from one or both parties (novotný & brücknerová, 2014; ropes, 2011). in this study, we refer to generations of teachers by using the term generational cohort, which in turn refers to being born in the same chronological time period. individuals of generational cohorts are found to possess different kinds of knowledge (gerpott et al., 2016). previous research within the context of school teams showed that teachers’ knowledge varies depending on their generational cohort or level of experience. for instance, young teachers are perceived to possess innovative teaching methods and ict skills, while teachers of the oldest generational cohort are perceived to have excellent classroom management skills and content knowledge (geeraerts, vanhoof, & van den bossche, 2016). simultaneously, classroom management skills are known to be a challenge for beginning teachers (wolff, van den bogert, jarodzka, & boshuizen, 2015). these findings make age diversity and intergenerational learning within school teams relevant. moolenaar (2010) highlighted the importance of interactions between school team members to facilitate knowledge sharing and learning. consequently, intergenerational knowledge sharing can be understood as a socio-constructive process in which interaction plays a facilitating role (geeraerts et al., 2016; novotný & brücknerová, 2014). 
this implies that school team members must be aware of resources such as information, knowledge and expertise of their colleagues, and make use of their social relationships to access these assets. dynamics of sending and receiving professional relationships between or within generational cohorts have the potential to facilitate knowledge flows within school teams. these flows can be influenced by the fact that different generations of teachers also differ in life and work experiences or feelings. for example, a great number of early career teachers face a practice shock that is accompanied by feelings of uncertainty (pillen, beijaard, & den brok, 2013; stokking, leenders, de jong, & van tartwijk, 2003). in addition, young teachers fear being perceived as incompetent or vulnerable by their more experienced colleagues, also described as feeling evaluated (kelchtermans & ballet, 2002). on the contrary, older teachers or experienced teachers are perceived to have a high level of self-confidence in their profession (geeraerts et al., 2016). consequently, we expect that teachers of different generations show differences in the formation of professional relationships. in this study, we examine the formation of relationships in elementary school teams in the netherlands, focusing on discussing work, asking and providing advice, and collaboration. we label these relationships as professional relationships and question them from an intergenerational perspective, meaning that these relationships can be formed within or across different generations of school team members. hereto, we apply methods from social network analysis (sna) enabling us to investigate social relationships and to look more deeply into interactions among school team members. when investigating professional relationships we focus on sending and receiving relationships. 
specifically, this study investigates to what extent school team members of different generational cohorts differ in the number of professional relationships they send and receive, and secondly, how being part of the same generational cohort affects the likelihood of engaging in professional relationships. in the following, we start with framing our work within the literature of generations, and school team members’ social relationships. the latter part builds on social capital theory and social network theory.

2. theoretical framework

2.1. generational diversity among school team members

the concept of generations was first introduced by mannheim (1952), who referred to a generation as a group of individuals who share mutual social and historical events during their lifespan. according to edge (2014) school teams include three generations of teachers and leaders: baby boomers (1946-65), generation x (1966-80) and generation y (1981-2003). in the literature, there are many inconsistencies concerning the labels of these generational cohorts and the boundaries used to determine a generational cohort. in their study on intergenerational learning of teachers, brücknerová and novotný (2016) relied on teachers’ own perceptions of themselves as a member of a particular generation. most studies used the chronological dimension of age to frame school team members’ generational cohorts (edge, descours, & frayman, 2016; geeraerts et al., 2016). looking at school teams through a generational lens sheds light on the level of age diversity within school teams (brücknerová & novotný, 2016). previous studies in other contexts have already recognized the benefits of age diversity in teams since individuals contribute different kinds of information, knowledge, skills, and expertise to the team (gerpott, lehmann-willenbrock, & voelpel, 2016; williams & o'reilly, 1998).
the traditional point of view in which older workers are perceived as experts is questioned nowadays. fuller and unwin (2004) acknowledged that young workers have already developed different kinds of knowledge and skills, and also have higher educational levels than many of their older counterparts with whom they interact in the workplace. this finding draws attention to the importance of the bidirectional character of the learning process. previous research by geeraerts et al. (2016) showed that teachers of different generational cohorts are perceived to possess different kinds of knowledge. whereas young teachers were seen as a knowledge source for innovative teaching methods and ict skills, teachers older than 50 were rather associated with classroom management skills and subject knowledge. many studies have focused on differences between novice teachers and experienced ones (wolff et al., 2015), but studies in which the interactions between both parties are investigated are rather scarce. thus, interactions between teachers of different generations can provide opportunities to learn from each other’s knowledge, especially when these learning processes are characterized by bidirectional interactions instead of unidirectional ones. these findings underline the added value of the formation of relationships across different generations of teachers for the construction and transfer of knowledge, and raise important questions on knowledge management within school teams.

2.2. the social side of learning across generations

in this paper, we argue that professional relationships between teachers of different generations may provide opportunities to learn from each other. this notion builds on a more 'social' interpretation on how learning takes place. lee et al. (2004) highlight that learning is a complex concept due to different approaches to learning.
the traditional approach to learning describes 'learning as acquisition’ and builds on cognitive psychology and behaviourism (sfard, 1998). this learning as acquisition approach focusses mainly on the individual. a more recent approach refers to 'learning as social participation’ and focusses on learning through social relations and participation of individuals within communities of practice (lave & wenger, 1991; sfard, 1998). this implies that learning can be understood as a social, interactive process of co-construction. learning in the workplace is mainly informal and involves learning from colleagues on the job (eraut, 2004). knowledge acquisition and access to information from others are equally important contributors to learning processes in workplaces (ashton, 2004). therefore, many researchers in the field of workplace learning emphasize the importance of social relationships for informal learning (doornbos, bolhuis, & simons, 2004; eraut, 2004; tynjälä, 2008). also in terms of teacher learning, the social side of learning is emphasised based on the idea that cognition is situated in nature (e.g. kwakman, 2003; lohman, 2000, 2006; meredith, van den noortgate, struyve, gielen, & kyndt, 2017; van waes, van den bossche, moolenaar, de maeyer & van petegem, 2015). a shift from a focus on the individual towards a more social approach contributed to the popularity of social network research methods to investigate relationships of teachers (baker-doyle, 2015). we will now further explore how knowledge sharing among generational cohorts may contribute to intergenerational learning by first zooming in on teachers' relationships, and then on some relevant social network concepts and dynamics.

2.3. school team members’ social relationships

the attention for social relationships between teachers is underpinned by research on the importance of social capital for school improvement and instructional reform (spillane, kim, & frank, 2012).
social capital within an organization reflects an investment in social relationships through which valuable resources such as knowledge, information and expertise can be accessed, borrowed, or leveraged (daly, 2010; lin, 1999). the general concept of social capital provides a framework to conceptualize how individuals have access to resources (e.g. information, expertise) by the web of social relationships surrounding them, thereby offering (or hindering) opportunities for (intergenerational) learning. borgatti, everett, and johnson (2013) distinguish two main types of relations: relational states and relational events. relational states refer to continuously persistent relationships between individuals (e.g. being a colleague, teacher or friend), whereas relational events refer to discrete events such as interactions (e.g. asking a colleague for advice). the outcomes of interactions are flows, and can contain, for instance, information, knowledge, and expertise (borgatti et al., 2013). we build on earlier work, suggesting that social relationships offer opportunities for knowledge creation, knowledge retention, and knowledge transfer (argote, mcevily, & reagans, 2003). in school teams, collegial relationships have the potential to initiate occasions to learn from each other within or between generational cohorts. informal learning interactions between teachers occur in the form of engaging in dialogue, collaborating, sharing resources such as information, lesson materials, ideas, advice, etc. (baker-doyle, 2015; kwakman, 2003; lohman, 2006). accordingly, knowledge flows are the result of interactions between two teachers through which information is exchanged. in addition, social relationships can be described in terms of the content of the relationship. ibarra (1993, 1995) distinguishes between expressive and instrumental relationships. this distinction also applies to school teams (moolenaar, 2010). 
whereas expressive relationships do not directly aim at work related issues (e.g. friendship), instrumental relationships do aim to achieve organisational goals. for instance, work related discussions and asking questions are interactions that enable exchange of expertise (gerpott et al., 2016). furthermore, spillane et al. (2012) see advice and information seeking relationships as critical for teachers’ professional development and for knowledge development. also, relationships in terms of teacher collaboration can be seen as an indicator of informal learning within school teams (richter, kunter, klusmann, lüdtke, & baumert, 2011). all of these relationships: discussing work, asking advice, providing advice, and collaboration, provide opportunities for intergenerational learning and can be labelled as instrumental (geeraerts et al., 2016; novotný & brücknerová, 2014). moolenaar (2010) found only partial overlap between these different networks, which highlights the semi-unique character of these networks as a source for exchange of knowledge, expertise, teaching materials, and other resources valuable to teacher geeraerts et al | f l r 82 learning and school performance. this study focuses on instrumental relationships and labels them as professional relationships due to its professional nature and closer link to organisational benefits. taken together, professional relationships among school team members are essential since they provide access to social resources such as knowledge, information, and expertise. school team members can only benefit from these resources when they have access to them through social interactions, and these interactions are facilitated by social relationships. a research method that allows us to investigate the formation of school team members’ relationships is social network analysis. in the following, we discuss degree centrality and network homophily as two important network concepts. 2.4. social network concepts and dynamics 2.4.1. 
degree centrality social network research has the potential to reveal the underlying network structure in an organization so that more insight into the exchange of resources within an organization can be established (cross, parker, & borgatti, 2002). in this study, networks are represented by school teams in which school team members are the actors or nodes. centrality is a commonly used concept within social network research that focuses on the position of a node or actor within a network (borgatti et al., 2013). the concept of centrality identifies the structural importance of a node, by looking at how many connections or relations one node has to other nodes. within social network research, these relationships are often referred to as ties. since our study aims to investigate relationships, we do not approach our data from the perspective of centrality as such, but rather from the perspective of interactions, measured by degree centrality. degree centrality is a frequently used measure for relationships within networks, and it refers to the number of ties a node has to other nodes. in a directed graph or network, degree centrality has two types: indegree and outdegree (borgatti et al., 2013). indegree centrality involves the number of incoming ties of an actor within a network. it can be seen as a measure of individual popularity (in the case of a positive tie network), since this measure is the number of colleagues by whom the respondent, or school team member, was nominated. outdegree centrality counts the number of outgoing ties of an actor within a network. it involves the number of colleagues nominated by the respondent, or school team member, which suggests a measure of individual activity (borgatti et al., 2013). consequently, degree centrality refers to the individual node level. a normalized in- and outdegree score can be interpreted as the percentage of relationships that school team members maintain within the whole network.
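The indegree and outdegree measures described above can be illustrated with a minimal sketch. It assumes a binary, directed adjacency matrix in which `adj[i][j] == 1` means that member i nominated member j, and it assumes that normalization divides by the maximum possible degree, n - 1. The toy four-person network and all function names are hypothetical and not taken from the study.

```python
from typing import Dict, List


def degree_centrality(adj: List[List[int]]) -> Dict[str, List[float]]:
    """Raw and normalized in/outdegree for a binary directed network.

    adj[i][j] == 1 means actor i sends a tie to (nominates) actor j.
    Normalized scores are percentages of the n - 1 other actors
    (an assumed normalization, chosen for illustration).
    """
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two actors")
    # Outdegree: row sums, ignoring the diagonal (no self-nominations).
    outdegree = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    # Indegree: column sums, ignoring the diagonal.
    indegree = [sum(adj[i][j] for i in range(n) if i != j) for j in range(n)]
    scale = 100.0 / (n - 1)
    return {
        "outdegree": outdegree,
        "indegree": indegree,
        "norm_outdegree": [d * scale for d in outdegree],
        "norm_indegree": [d * scale for d in indegree],
    }


# Hypothetical 'asking advice' network with four school team members.
toy_network = [
    [0, 1, 1, 0],  # member 0 nominates members 1 and 2
    [1, 0, 0, 0],  # member 1 nominates member 0
    [0, 1, 0, 0],  # member 2 nominates member 1
    [0, 0, 0, 0],  # member 3 nominates nobody
]
centrality = degree_centrality(toy_network)
```

In this toy network, member 1 is nominated by two of the three other members, so that member's normalized indegree is about 67%.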
we focus on the calculation of degree centrality within the networks of discussing work, asking and providing advice, and collaboration, since these networks can be seen as a potential indicator of learning within school teams. moreover, we assume that there might be differences in degree centrality between different generational cohorts. for instance, kelchtermans (2006) mentioned that asking for advice from a colleague might be seen as a request for help, which is accepted for young or beginning teachers but not for experienced ones. this might imply that teachers of the youngest cohort are more likely to form more ties to ask advice than teachers of the older generational cohorts. spillane et al. (2012) see advice and information relationships as critical for teachers’ professional development and for knowledge development. in their study, more experienced teachers were less likely to receive advice and information from other colleagues, as compared to early career teachers (spillane et al., 2012). in terms of teacher collaboration, previous research of richter et al. (2011) showed that young teachers tend to collaborate more frequently than older colleagues. teacher collaboration seems to decrease with age (richter et al., 2011). regarding work related conversations, beginning teachers are more likely to interact with colleagues in order to overcome professional challenges and to exchange teaching ideas, as compared to experienced colleagues (grangeat & gray, 2007). 2.4.2. network homophily the concept of network homophily, often referred to by the proverbial expression ‘birds of a feather flock together’, captures the idea that individuals are more likely to have ties with others who are similar to themselves on attributes such as age, race, gender, education, and values, than with individuals that are dissimilar to them (feld, 1982; mcpherson, smith-lovin, & cook, 2001).
a study of marsden (1988) revealed that the greater the age difference between individuals, the less likely they were to discuss important matters with each other. in particular, the youngest age cohorts tend to have confiding relationships with individuals of their own cohort. whereas degree centrality focused on the individual level, network homophily refers to similarities between individuals and therefore focuses on the dyad level within networks. these dyads can be mutual, asymmetric or null dyads (wasserman & faust, 1994). in mutual dyads, actors within the network choose each other; ties are reciprocated. an asymmetric dyad refers to a one-directional tie in which one actor chooses the other actor, without being reciprocal. a null dyad indicates the absence of a tie between two actors. when dyads occur between actors with similar attributes, in our case being part of the same generational cohort, this can be labelled as homophily. individuals with similar background characteristics are more likely to have mutual experiences which in turn results in shared knowledge (reagans & mcevily, 2003). according to reagans and mcevily (2003) this common knowledge has a positive effect on knowledge transfer. therefore, we expect that teachers of the same generational cohort are more likely to engage in work related or so-called instrumental interactions. this concept of network homophily is also supported by the ideas of social identity theory (tajfel & turner, 1986). this theory follows a similar reasoning, suggesting that individuals have more positive perceptions towards people who are similar to them, compared to people who are dissimilar. this results in categorizations of in (“us”) and out (“them”) groups. within work-group diversity research, williams and o'reilly (1998) referred to this way of categorizing as a social categorization perspective. 
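The three dyad types mentioned above (mutual, asymmetric, null) can be counted directly from a directed adjacency matrix. The sketch below is only a descriptive illustration under assumed inputs: the toy network, the cohort labels, and the function name are hypothetical, and the count of same-cohort tied dyads is not the model-based homophily estimate used in the study.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def dyad_census(adj: List[List[int]], cohorts: List[str]) -> Tuple[Dict[str, int], int]:
    """Count mutual, asymmetric and null dyads in a binary directed network.

    Also returns how many dyads with at least one tie connect two actors
    of the same cohort (a purely descriptive indication of homophily).
    """
    census = {"mutual": 0, "asymmetric": 0, "null": 0}
    same_cohort_tied = 0
    labels = ("null", "asymmetric", "mutual")
    for i, j in combinations(range(len(adj)), 2):
        ties = adj[i][j] + adj[j][i]  # 0, 1 or 2 ties within the pair
        census[labels[ties]] += 1
        if ties > 0 and cohorts[i] == cohorts[j]:
            same_cohort_tied += 1
    return census, same_cohort_tied


# Hypothetical network: member 0 and member 1 choose each other.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
toy_cohorts = ["young", "young", "middle", "old"]
census, same_cohort_tied = dyad_census(toy_adj, toy_cohorts)
```

Here the only mutual dyad is the one between the two members of the young cohort, which is the pattern that a homophily effect would make more likely.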
similarity of age characteristics can be seen as a trigger for these in- and out-group categorizations (dencker, joshi, & martocchio, 2007). consequently, teachers might be more likely to form ties with colleagues of the same generational cohort. this implies that resources such as information and knowledge tend to flow within the particular generational cohort. following this reasoning, both outgoing and incoming relationships, also referred to as ties of sender and receiver, occur within a generational cohort rather than between different generational cohorts. 3. research questions we investigate whether belonging to a certain generation affects individual school team members’ likelihood of having relationships in networks through which resources can flow. more specifically, we focus on four instrumental networks, referring to professional relationships in which school team members discuss their work, receive or provide advice, and collaborate, since these relationships can be relevant for learning in school teams (geeraerts et al., 2016; novotný & brücknerová, 2014). therefore, the following research questions are set forward: rq1: to what extent do school team members of different generational cohorts differ in the number of professional relationships (indegree and outdegree; sending and receiving relationships)? rq2: to what extent does being part of the same generational cohort affect the likelihood of engaging in professional relationships within school teams (network homophily)? 4. methodology 4.1. sample the data from this study was collected at 15 elementary schools in the netherlands. the data collection was part of a larger project on school improvement in the netherlands in which 53 schools participated. all schools were organised as a cluster of catholic schools, supported by a single catholic school board.
we selected the subsample of schools based on the following criteria: school team size was 10 or more teachers, and in each school, each generational cohort was represented by at least 20% of the respondents. this resulted in a final selection of 15 schools, which enabled us to investigate intergenerational relationships in schools where all generational cohorts were sufficiently represented. the sample consisted of all principals and teachers, including instructional coaches (teachers with specialised instructional tasks, such as emotional/behavioral support), since we wanted this selection to be as close as possible to the core teaching team. the sample did not include temporary and replacement teachers. the overall sample contained 284 teachers and 15 school principals (n=299). generational cohorts are based on chronological age. within this sample, three generational cohorts can be distinguished. the 'young' cohort contained 94 educators aged 35 years old or younger. the 'middle' cohort contained 87 educators from 36 to 50 years old. the 'old' cohort contained 118 educators older than 50 years. most school members (75%) were female. the sample demographics are summarized in table 1.

table 1 sample demographics (n schools= 15, n respondents=299)

| categories | teachers | principals | total number of school team members | percentage |
|---|---|---|---|---|
| generational cohort | | | | |
| young cohort (-36 yrs) | 93 | 1 | 94 | 31% |
| middle cohort (36 – 50 yrs) | 84 | 3 | 87 | 30% |
| old cohort (50+ yrs) | 107 | 11 | 118 | 39% |
| gender | | | | |
| male | 63 | 11 | 74 | 25% |
| female | 221 | 4 | 225 | 75% |
| total number | 284 | 15 | 299 | 100% |

4.2. data collection the survey included questions on job satisfaction, leadership, school team, strategy and policy, processes, citizenship in the classroom and general questions. the section on ‘school team’ contained sociometric questions that questioned the social networks within the school team. network questions used in this study were:
• discussing work: whom do you turn to in order to discuss your work?
• asking advice: whom do you prefer to go to for work related advice?
• providing advice: to whom do you give work related advice?
• collaboration: with whom do you like to collaborate the most?
to answer these sociometric questions, respondents were provided with a list of their school team members. according to marsden (2011), this list helps respondents to recall the alters in their network and so it also minimizes measurement error. in order to contribute to the premise of anonymous analysis of the data, the alter list contained a letter code for each alter (e.g. jessica thompson = ab). when completing the survey, respondents were asked to indicate this letter code instead of mentioning the name and surname of the alter. there was no limitation to the number of colleagues a respondent could indicate as part of his/her network. an example of the visualization of a school team network is displayed in figure 1. figure 1. example of ‘asking advice’ network [footnote 1]. 4.3. measures our dependent variable is the existence or the absence of a professional relationship between two school team members (a dyad). concretely, for every pair of school team members i and j, a value of 1 represents a relationship between i and j. for instance, i provides advice to j. a value of 0 indicates the absence of a tie between i and j. the mathematical representation of these relationships is an adjacency matrix composed of 0s and 1s (van duijn & vermunt, 2006). 4.3.1. individual level measures involve characteristics of the individual school team members. generational cohort: in line with the findings of a study by richter et al. (2011), based on spearman’s correlation we found a high correlation between age and the number of years of teacher experience within education or within the school, r=0.84 and r=0.52 respectively. this implies that these measures are nearly interchangeable and suggests that the number of teachers who enter the teaching profession at later stages in their life is limited.
consequently, we only take into account the variable of generational cohort based on chronological age. this measure refers to three age related categories: young, middle and old cohort. in the survey, teachers were asked to indicate their age (1 = 20-25 years, 2 = 26-30 years, 3 = 31-35 years, 4 = 36-40 years, 5 = 41-45 years, 6 = 46-50 years, 7 = 51-55 years, 8 = older than 55). these age categories were first recoded into three categories that are in line with generational cohorts that can be found in the literature under the labels of generation y, generation x, and the baby boomers generation (young = 20-35 years, middle = 36-50 years, old = 51 and older) (edge, 2014; geeraerts et al., 2016; glass, 2007; novotný & brücknerová, 2014). consequently, this categorical variable contains three categories that each cover approximately 15 years. finally, two dummy variables were generated, in which ‘middle’ and ‘old’ were contrasted with the ‘young’ cohort. [footnote 1: the youngest cohort is represented by circles, the middle cohort by squares, and the oldest cohort by triangles.] gender: this individual feature is coded in the following way: a value of 0 refers to male school team members, and a value of 1 refers to female school team members. function: this measure takes a value of 0 for a teacher position, and a value of 1 for a principal position. 4.3.2. dyadic level measures, also named relationship covariates. generational cohort similarity: three dummy variables of a generational cohort are used: youngest cohort, middle cohort, oldest cohort. the ‘absolute difference’ function in the p2 module is used to investigate the likelihood of a relationship when actors are part of a different generational cohort, in other words, not being part of the same generational cohort. 4.4. data analysis in order to respond to rq1, social network properties at the individual level were calculated by using software package ucinet 6.0 (borgatti et al., 2013).
normalized degree centrality, both indegree and outdegree, was calculated. these measures can be interpreted as percentages. consequently, these normalized in- and outdegree scores have a value from 0 to 100, in which 0 indicates the absence of relationships, and 100 refers to being tied to the entire school team. the variations in the percentages of incoming and outgoing relationships are provided by the standard deviations of the normalized in- and outdegree measures. in addition, the statistical software program ibm spss 24 was used to measure the effects of individual node characteristics on the network properties as a dependent variable. a one-way analysis of variance (anova) was conducted on each network question. for comparisons of the mean scores among the different generational cohorts, tukey’s honestly significant difference post hoc test (α=0.05) was used. regarding rq2, we used the p2 package within the social network software stocnet (boer et al., 2006). by using p2 modelling, we investigated dyadic ties as the dependent variable. dyadic level factors focus on similarities and differences. the p2 model is a model for the statistical analysis of directed binary relationship data with actor and/or dyadic covariates (boer et al., 2006; zijlstra & van duijn, 2003). as such, a p2 model is designed to predict the likelihood of the formation of social relationships (e.g. work discussions) between pairs of actors (e.g. teachers and principals) based on individual and dyadic variables (e.g. belonging to a generational cohort) (boer et al., 2006; zijlstra & van duijn, 2003). p2 models can be seen as a type of logistic regression model that takes the dependency between relationships from one to another actor into account (lazega & van duijn, 1997). ordinary logistic regression models cannot be used here since the assumption of data independence is violated.
p2 modelling specifically focusses on complete, directed networks, which implies that every actor within the network can have ties with all other actors. however, the model can handle (some) missing data (van duijn & vermunt, 2006). the multilevel variant of the p2 model is an extension of the p2 model which can be used for the analysis of multiple networks. parameter estimates of the p2 model and the multilevel p2 model derive from the markov chain monte carlo (mcmc) procedures which are integrated in the p2 module of the social network analysis software stocnet (boer et al., 2006; zijlstra, van duijn, & snijders, 2006). the p2 model is designed to compute the likelihood of sending a relationship (cf. out-degree; called sender effect), receiving a relationship (cf. in-degree; called a receiver effect), and the likelihood of engaging in a relationship based on dyadic similarity (cf. homophily; called a reciprocity effect). a positive significant parameter estimate indicates a positive effect of the variable on the likelihood to form a relationship. for example, a positive significant sender effect of gender (male/female) indicates that female teachers have a higher likelihood to send relationships within the network than male teachers. in order to investigate homophily effects, the p2 software constructs dyadic matrices based on the absolute difference between two actors within the network. for instance, a dyad between a school team member of the youngest generational cohort and a school team member of the middle generational cohort represents a relationship between school team members of a different generational cohort. on the middle cohort dummy, the absolute difference between a member of the youngest or oldest cohort (dummy variable=0) and a member of the middle cohort (dummy variable=1) is 1. in this example, a negative parameter estimate would suggest that a difference in generational cohort is related to a lower likelihood of having relationships.
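The construction of a dyadic covariate from the absolute difference on a cohort dummy can be sketched as follows. This is an illustration of the idea only, not the stocnet implementation: the helper names and the four-person example are hypothetical.

```python
from typing import List


def cohort_dummy(cohorts: List[str], target: str) -> List[int]:
    """1 if the actor belongs to the target cohort, 0 otherwise."""
    return [1 if cohort == target else 0 for cohort in cohorts]


def absolute_difference(values: List[int]) -> List[List[int]]:
    """Dyadic matrix whose cell (i, j) is |values[i] - values[j]|."""
    n = len(values)
    return [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]


# Hypothetical school team with four members.
team_cohorts = ["young", "middle", "old", "middle"]
middle_dummy = cohort_dummy(team_cohorts, "middle")  # [0, 1, 0, 1]
middle_diff = absolute_difference(middle_dummy)

# middle_diff[0][1] is 1: a young and a middle member differ on this dummy.
# middle_diff[1][3] is 0: both members belong to the middle cohort.
# middle_diff[0][2] is 0: a young and an old member both score 0 on this dummy.
```

The last case shows why a single dummy cannot separate all cohort combinations: a young and an old member have a difference of 0 on the middle cohort dummy, so each cohort needs its own dyadic matrix.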
consequently, a negative parameter suggests that relationships between members of the same generational cohort are more likely to occur. as such, a negative reciprocity effect signals the existence of a homophily effect. regarding the significance level of the parameter estimates, the p2 output in stocnet does not directly provide this information. an additional wald test needs to be calculated by dividing the parameter estimate by the standard error of the estimator. when the ratio is smaller than -2 or larger than 2, the effect is significant at the 0.05 level. 5. findings 5.1. to what extent do generational cohorts differ in the number of professional relationships? the first part of the study investigated if teachers of different generational cohorts show differences in degree centrality measures of the network questions. a one-way analysis of variance (anova) was conducted on each network question. the independent variable was generational cohort and the dependent variables were indegree centrality and outdegree centrality. 5.1.1. the number of outgoing professional relationships no significant generational cohort differences were found with regard to normalized outdegree centrality measures of the networks, as displayed in table 2. this implies that school team members of different generational cohorts do not differ significantly with regard to the average number of ties sent within the networks of discussing work, asking advice, providing advice, and collaboration.

table 2 anova of normalized outdegree (n schools=15, n respondents=299)

| network | young (1) m | young (1) sd | middle (2) m | middle (2) sd | old (3) m | old (3) sd | anova main effect f | p |
|---|---|---|---|---|---|---|---|---|
| discussing work | 0.32 | 0.19 | 0.33 | 0.22 | 0.27 | 0.19 | 2.448 | 0.088 |
| asking advice | 0.20 | 0.15 | 0.17 | 0.14 | 0.16 | 0.13 | 2.304 | 0.102 |
| providing advice | 0.20 | 0.19 | 0.22 | 0.23 | 0.19 | 0.19 | 0.460 | 0.632 |
| collaboration | 0.27 | 0.20 | 0.27 | 0.24 | 0.25 | 0.22 | 0.362 | 0.697 |

5.1.2.
the number of incoming professional relationships regarding normalized indegree centrality, the main effect analysis found a statistically significant generational cohort difference in the network question ‘providing advice’ [f(2, 297)=9.003, p<0.001] and the network question ‘collaboration’ [f(2, 297)=9.367, p<0.001]. no significant differences were found for the networks of ‘discussing work’ and ‘asking advice’. descriptives of the mean values for the normalized indegree of the generational cohorts are displayed in table 3.

table 3 anova of normalized indegree (n schools=15, n respondents=299)

| network | young (1) m | young (1) sd | middle (2) m | middle (2) sd | old (3) m | old (3) sd | anova main effect f | p | post hoc tukey |
|---|---|---|---|---|---|---|---|---|---|
| discussing work “being chosen to discuss work with” | 0.32 | 0.21 | 0.29 | 0.17 | 0.30 | 0.19 | 0.530 | 0.589 | |
| asking advice “being asked for advice” | 0.16 | 0.18 | 0.18 | 0.16 | 0.18 | 0.16 | 0.633 | 0.532 | |
| providing advice “being provided with advice” | 0.24 | 0.14 | 0.21 | 0.12 | 0.17 | 0.12 | 9.003** | <0.001 | 1 > 3 ** |
| collaboration “being chosen to collaborate with” | 0.32 | 0.18 | 0.25 | 0.12 | 0.23 | 0.14 | 9.367** | <0.001 | 1 > 3 *; 1 > 2 ** |

* significant at 0.05 level ** significant at 0.01 level

tukey’s honestly significant difference post hoc test (α=0.05) was used for comparisons of the mean scores among the different generational cohorts. regarding the indegree of ‘providing advice’, members of the youngest cohort (m=0.24, sd=0.14) were found to have significantly higher ratings than members of the oldest cohort (m=0.17, sd=0.12). this implies that the youngest school team members receive advice from more colleagues than the members of the oldest generational cohort do. whereas young school team members are tied to on average 24% of the network, the oldest cohort is tied to on average 17% of the school team. there were no statistically significant differences found between the other generational cohorts in terms of indegree of providing advice.
regarding the indegree of ‘collaboration’, the youngest group of team members (m=0.32, sd=0.18) was found to have significantly higher scores than members of the middle and oldest cohorts (m=0.25, sd=0.12; and m=0.23, sd=0.14, respectively). this implies that the youngest team members are chosen by more colleagues to collaborate with, as compared to the two older generational cohorts. young school team members are chosen by on average 32% of their network, in contrast to 25% for the middle cohort and 23% for the oldest cohort. the above described anova analysis and findings provide insight into the average number of outgoing and incoming ties. however, they do not show which generational cohort sends ties to and receives ties from which cohort. in addition, the anova did not control for gender and function, which might give a different picture of sending and receiving ties. also, the occurrence of a homophily effect could not be tested by the previous analysis. therefore, we ran the p2 analysis to further our understanding of these dynamics. parameter estimates of the multilevel p2 models for investigating the effect of individual and dyadic level demographics on the likelihood of having relationships within the networks of discussing work, asking advice, providing advice, and collaboration, are presented in table 4.
table 4 the effect of sender and receiver demographic variables on the likelihood of having relationships within the networks of discussing work, asking advice, providing advice, and collaboration. parameter estimates of the multilevel p2 models.
(n=299)

| network | discussing work pe (se) | asking advice pe (se) | providing advice pe (se) | collaboration pe (se) |
|---|---|---|---|---|
| overall effects | | | | |
| density | -1.98 (0.22) | -2.12 (0.15) | -2.51 (0.22) | -2.17 (0.25) |
| reciprocity | 2.63 (0.18) | 2.21 (0.17) | 2.11 (0.18) | 2.21 (0.21) |
| sender covariates | | | | |
| middle cohort | 0.10 (0.21) | -0.36 (0.22) | 0.40 (0.23) | 0.29 (0.27) |
| old cohort | -0.28 (0.18) | -0.39 (0.13) | 0.47 (0.19) | 0.15 (0.21) |
| gender | 0.20 (0.15) | 0.00 (0.00) | 0.25 (0.18) | 0.25 (0.21) |
| function | -0.09 (0.31) | 0.62 (0.33) | -0.06 (0.30) | 0.86 (0.36) |
| receiver covariates | | | | |
| middle cohort | -0.16 (0.19) | 0.28 (0.12) | -0.37 (0.14) | -0.44 (0.17) |
| oldest cohort | -0.08 (0.16) | 0.07 (0.15) | -0.60 (0.13) | -0.55 (0.14) |
| gender | 0.15 (0.17) | 0.00 (0.00) | 0.24 (0.13) | 0.21 (0.14) |
| function | 1.35 (0.29) | 0.97 (0.21) | -0.38 (0.25) | -0.17 (0.28) |
| relationship covariates | | | | |
| youngest cohort | -0.21 (0.06) | -0.18 (0.08) | -0.15 (0.08) | -0.15 (0.07) |
| middle cohort | 0.01 (0.07) | 0.08 (0.08) | 0.03 (0.09) | 0.10 (0.08) |
| oldest cohort | -0.10 (0.06) | -0.08 (0.07) | -0.04 (0.08) | -0.09 (0.07) |
| random effects | | | | |
| sender variance | 1.19 (0.15) | 0.72 (0.11) | 0.54 (0.18) | 1.78 (0.22) |
| receiver variance | 1.06 (0.13) | 1.18 (0.16) | 0.36 (0.08) | 0.68 (0.11) |
| covariance | -0.78 (0.12) | -0.62 (0.11) | -0.58 (0.11) | -0.71 (0.13) |

note: pe= parameter estimate; se= standard error; bold typeface refers to a significant pe; n=9309 dyadic relations from 299 school team members in 15 elementary schools

first of all, overall effects show negative density effects and positive reciprocity effects within the four networks. these findings suggest that the networks are overall rather sparse, meaning that the likelihood of having a tie is lower than 50% for the reference group, which are dyads of young male teachers. the positive parameter estimates of reciprocity indicate a tendency of reciprocated ties instead of unidirectional ties throughout the different networks.
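The Wald ratio described in the data analysis section (parameter estimate divided by its standard error, with a ratio below -2 or above 2 read as significant at the 0.05 level) can be applied to the estimates in table 4. The sketch below does this for the youngest cohort relationship covariate, using the values reported in the table; the function and variable names are hypothetical.

```python
from typing import Dict, Tuple


def wald_ratio(estimate: float, standard_error: float) -> float:
    """Parameter estimate divided by its standard error."""
    return estimate / standard_error


def is_significant(estimate: float, standard_error: float, threshold: float = 2.0) -> bool:
    """Rule of thumb from the text: |ratio| > 2 means significant at 0.05."""
    return abs(wald_ratio(estimate, standard_error)) > threshold


# Youngest cohort relationship covariate, (pe, se) per network, from table 4.
youngest_cohort: Dict[str, Tuple[float, float]] = {
    "discussing work": (-0.21, 0.06),
    "asking advice": (-0.18, 0.08),
    "providing advice": (-0.15, 0.08),
    "collaboration": (-0.15, 0.07),
}

significant_networks = [
    network
    for network, (pe, se) in youngest_cohort.items()
    if is_significant(pe, se)
]
```

Under this rule the estimate for providing advice (ratio of about -1.9) falls just short of the threshold, while the other three networks exceed it.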
with regard to the random effects, the positive and significant effects of sender and receiver variance indicate that there is considerable variation among school team members in the amount of ties they send and receive within the four networks. the negative sender-receiver covariance suggests that school team members who report to send more ties have a lower likelihood of receiving ties within their network, when allowing for differences between schools. 5.1.3. the likelihood to send professional relationships looking at the sender covariates, we found no significant effects for the network discussing work. in other words, none of the individual characteristics affected the likelihood of sending ties in a positive or negative way. more specifically, school team members of the middle or oldest cohort did not send more ties than young school team members, female not more than male, and school principals not more than teachers. within the other three networks, results indicated that some of the individual characteristics affected the likelihood of sending relationships. within the network of asking advice, being part of the oldest cohort decreases the likelihood of asking advice. in addition, the oldest cohort tends to send significantly more relationships of providing advice. within the network of collaboration, results reveal that being a school principal increases the likelihood of sending collaboration ties. 5.1.4. the likelihood to receive professional relationships regarding the receiver covariates, significant effects were found in all the four networks. being a school principal increases the likelihood of receiving relationships in discussing work and asking advice networks. also, being part of the middle cohort increases the likelihood of receiving relationships in the network of asking advice. this implies that principals and teachers of the middle cohort are more likely to be sought out for advice. 
in addition, the network of ‘providing advice’ shows that teachers of the middle and oldest cohort have a lower likelihood to receive advice relationships. the same trend can be found within the network of collaboration. both individual characteristics, being part of the middle and being part of the oldest cohort, decrease the likelihood of receiving collaboration ties. for both networks, providing advice and collaboration, the effect of being part of the oldest cohort is stronger than the effect of being part of the middle cohort. this suggests that being part of the oldest cohort decreases the likelihood of the formation of a tie to a greater extent than being part of the middle cohort does. gender did not affect any of the sender and receiver relationships in the four instrumental networks. 5.2. to what extent does being part of the same generational cohort affect the likelihood of engaging in professional relationships? regarding the effects of the relationship covariates, homophily effects can be found for the youngest cohort within all the networks. this finding suggests that school team members of the youngest cohort are more likely to form ties with colleagues of the same generational cohort than with colleagues of the middle or the oldest generational cohort. this tendency of homophily occurs at the level of significance for the youngest generational cohort in the networks of discussing work, asking advice, and collaboration. the other generational cohorts, middle cohort and oldest cohort, do not show a significant homophily effect in any of the networks. relationships in these cohorts can therefore be described as heterogeneous. we conclude that, in particular, school team members of the youngest cohort tend to form intra-generational ties, whereas older generational cohort members form inter-generational ties. 6.
conclusion and discussion in the present study we have focused on the role of being part of a generational cohort in the formation of professional relationships within elementary school teams in the netherlands. our first research question focused on differences between generational cohorts in terms of professional relationships being sent or received. to answer this question, we included both findings from descriptive anova analysis of degree centrality measures, and from a p2 model based on probability distributions. the latter analysis gave a slightly different picture which we explain by the fact that our p2 model provides a more sophisticated investigation of sender and receiver tendencies within the networks. therefore, we elaborate our conclusions primarily on the basis of this p2 model and indicate how they are in line with the anova analyses. when looking for differences in sending and receiving relationships, we noticed a significant role of generational cohort within the networks of asking advice, providing advice, and collaboration. within the network of discussing work we did not find any significant impact of belonging to a certain generation, neither in the anova nor in the p2 findings. next, we discuss the findings for each of the respective networks studied. regarding asking and providing advice, our results reveal that the middle cohort can be seen as an important source of advice within the school team. this is the cohort that is asked by most colleagues for advice. on the other hand, we noticed that the oldest cohort sees itself as an important provider of advice, since this cohort provides more colleagues with advice. it must be noted that members of the oldest cohort provide advice to colleagues who did not necessarily ask for it. the middle cohort does not necessarily provide more advice, but they are asked for it by more colleagues. this finding underlines the importance of the middle cohort as a source of knowledge within the school team.
in addition, we found that the youngest cohort is provided with advice by more colleagues, as compared to their older counterparts. relationships within the networks asking advice and providing advice can be interpreted as complementary. when looking at these networks asking and providing advice, it can be questioned to what extent providing advice is the result of asking advice. further research might pay attention to what degree providing advice is a voluntary action or the result of being asked for advice. when looking to the effects of generational cohort within these two advice networks, we also recognize the complementary tendencies. being part of the oldest cohort did decrease the likelihood of sending relationships for asking advice, and did increase the likelihood of sending relationships for providing advice. a second complementary tendency for asking and providing advice was found for the middle cohort. while being part of the middle cohort increased the likelihood of receiving relationships within the network asking advice, it decreased the likelihood of receiving relationships within the network of providing advice. being part of the oldest cohort decreased the likelihood of being provided with advice even more than the middle cohort. the anova results were supporting these findings, revealing that young school team members were more provided with advice than their oldest counterparts. a similar tendency has been observed by spillane et al. (2012), where more experienced colleagues were less likely to receive advice. our findings of both asking and providing advice relationships do relate to the traditional mentor models in which older or experienced teachers serve as knowledge providers to younger ones. within recent developments and ideas on intergenerational learning, knowledge supplies and demands are seen as important for all generational cohorts. 
consequently, our findings draw attention to the importance of stimulating intergenerational relationships.
within the last network, collaboration, we found that being part of the oldest cohort decreased the likelihood of receiving collaboration ties. a decrease in receiving collaboration relationships was also found for the middle cohort. these findings were also supported by the anova results. the youngest cohort is the most preferred cohort to collaborate with. previous research revealed that older teachers have positive perceptions of their youngest counterparts in terms of enthusiasm and creativity (geeraerts et al., 2016). these positive perceptions of young teachers might contribute to the preference to collaborate with them. we found similarities in the tendencies of sending and receiving relationships between the providing advice network and the collaboration network. this raises questions about the existence of overlap between both networks. further research might investigate the extent to which instrumental relationships overlap.
whereas gender did not affect the likelihood of sending or receiving professional relationships, the other control variable, function, did. principals tend to name more colleagues with whom they prefer to collaborate. in addition, principals were chosen by more colleagues to discuss work with and to ask for advice. further research might foreground the role of the school principal and investigate, for instance, the network position of the principal within professional school team networks and the role of the principal’s age.
regarding our first research question, we conclude that, for some networks, generational cohorts based on chronological age do matter in the formation of professional relationships.
these findings also underline the importance of focussing on different instrumental networks, since different age dynamics can be at play, and therefore support the recommendation of moolenaar (2010) to approach different professional networks as unique networks. future research might include expressive relationships in addition to instrumental relationships, for instance by including friendship relationships. in this way, the extent to which instrumental relationships are explained by expressive relationships can be investigated.
our second research question captured the mechanisms of homophily within elementary school teams. teachers of the youngest cohort in particular seem to form relationships within their own generation for discussing work, asking advice, and collaboration, which is in line with the homophily effects for young individuals in other contexts found by marsden (1988). from the viewpoint of intergenerational learning and intergenerational knowledge sharing, this is a worrying finding. it suggests that facilitating bidirectional intergenerational relationships is important for practice. further research, for instance using qualitative methods, may dive deeper into the reasons why young teachers have this tendency. factors such as the level of trust, a safe and respectful climate, as well as perceptions of being evaluated or hierarchical perceptions of seniority are worth taking into account. in addition, it is worth investigating how this tendency towards homophily relates to early career teacher dropout and challenges such as practice shock or feelings of uncertainty (pillen, beijaard, & den brok, 2013; stokking, leenders, de jong, & van tartwijk, 2003). also, literature on intergenerational relationships often focusses on age bias and generational stereotyping within organizations (e.g. king & bryant, 2017; rupp, vodanovich, & credé, 2006).
the effects of age stereotyping in school teams on the formation of professional relationships and intergenerational knowledge exchange might offer interesting starting points for further research.
this study has limitations that suggest additional paths for future research. first of all, sending and receiving advice ties are only general indicators of knowledge flows. the content of the advice ties has not been included in this study. given the idea that teachers of different generations are able to provide different kinds of information and knowledge, it might be interesting to investigate content-related advice relationships, for instance by asking: whom do you go to for advice on classroom management? this would provide information on which knowledge can be seen as a demand or a supply for a certain generational cohort.
a second limitation is that we did not have information about the amount of advice shared and the frequency of interactions within the networks. an actor can receive more advice from one colleague than from a number of different colleagues. further research can more explicitly map the strength of teacher relationships across generational cohorts by looking at the frequency, length, and duration of contact (e.g. van waes et al., 2015). also, the relevance of the received advice has not been discussed yet: to what extent is the advice provided valuable to a teacher? exchange of information or advice is no guarantee that learning occurs. for instance, kyndt, vermeire, and cabus (2016) did not find a significant relationship of knowledge acquisition and access to information with informal workplace learning outcomes. this finding underlines the importance of further evaluating the relevance of the advice that is being shared (e.g. van der rijt, van den bossche, van de wiel, de maeyer, gijselaers, & segers, 2013; van der rijt, van den bossche, van de wiel, segers, & gijselaers, 2012).
similarly, information on the quality of relationships between generational cohorts would be a valuable contribution for further research. this also opens up the discussion on the ‘unknown side of intergenerational learning’, which refers to the idea that intergenerational learning does not necessarily lead to positive learning outcomes. this is also connected to newly emerging social network research on ‘negative ties’. an interesting perspective on knowledge sharing within school teams might be to investigate the opposite, for instance knowledge hiding. further research might focus on reasons why school team members would shield knowledge from their school team.
previous research suggested not taking too narrow an approach to ‘generation’, but also taking into account factors such as work tenure and years of job experience (geeraerts et al., 2016; kooij, de lange, jansen, & dikkers, 2008). we did not include years of experience since this variable was too highly correlated with our age variable, which might cause problems of multicollinearity. we would argue that the conceptualisation and operationalization of generations of teachers need further elaboration. for instance, future research might focus on the relevance of the age boundaries of generational cohorts or examine whether there is a linear effect of age. the division into generational cohorts in this study was based on previous studies within the context of school teams (edge, 2014; geeraerts et al., 2016). in this study, our generational cohorts are diverse. for instance, the youngest cohort includes both inexperienced teachers in their induction phase and teachers with 10 years of experience. this distinction has not been included in this study, but it potentially offers an important perspective to further unravel the complexity of teacher generations.
due to the selection of our sample, in which we targeted balance in terms of the presence of three generational cohorts within school teams, the generalizability of our study is limited to schools that are characterized by an equal distribution of generational cohorts. an interesting path for further studies is to look at schools with different constellations of generational cohorts and examine whether these different constellations paint a similar picture regarding the formation of intergenerational relationships within school teams. would our results hold if a certain generational cohort were absent from a school team? differences in the age demographic profiles of school teams might raise interesting questions for further research.
also, our analysis technique (multilevel p2 modelling) has certain limitations, since this model is restricted to dyadic relationships (spillane et al., 2012). the extent to which relationships between pairs of teachers are influenced by their relationships within the larger structure of the network (e.g. triads) is not taken into account. based on our findings, future studies might hypothesize more complex social network models and, for instance, use exponential random graph models (ergm) to explore triadic structures in intergenerational relationships.
to conclude, we state that being part of a generational cohort based on chronological age does matter within these elementary school teams and that it plays a role in the formation of teachers’ professional relationships. both researchers and practitioners may regard social networks as a valuable concept to contextualise and investigate teacher interactions in order to further understand and support teacher intergenerational learning and teachers’ professional development in general.
keypoints
being part of a generational cohort affects the formation of relationships within the networks of asking advice, providing advice, and collaboration.
different generation dynamics are at play within different networks.
young teachers are more likely to be mentioned as a preferred partner to collaborate with.
teachers of the oldest cohort are less likely to ask for advice, while teachers of the middle cohort are more likely to be asked for advice.
young teachers in particular tend to form relationships within their own cohort.
acknowledgments
we would like to thank prof. dr. marijtje van duijn (university of groningen, the netherlands) for her helpful comments on our p2 model.
references
argote, l., mcevily, b., & reagans, r. (2003). managing knowledge in organizations: an integrative framework and review of emerging themes. management science, 49(4), 571-582. doi:10.1287/mnsc.49.4.571.14424
ashton, d. n. (2004). the impact of organisational structure and practices on learning in the workplace. international journal of training and development, 8(1), 43-53. doi:10.1111/j.1360-3736.2004.00195.x
baker-doyle, k. (2015). no teacher is an island: how social networks shape teacher quality. in g. k. letendre & a. w. wiseman (eds.), promoting and sustaining a quality teacher workforce (international perspectives on education and society, vol. 27, pp. 367-383).
boer, p., huisman, m., snijders, t. a. b., steglich, c., wichers, l. h. y., & zeggelink, e. p. h. (2006). stocnet: an open software system for the advanced statistical analysis of social networks. version 1.7. groningen: ics/science plus.
borgatti, s. p., everett, m. g., & johnson, j. c. (2013). analyzing social networks. london: sage publications ltd.
brücknerová, k., & novotný, p. (2016). intergenerational learning among teachers: overt and covert forms of continuing professional development. professional development in education, 1-20. doi:10.1080/19415257.2016.1194876
carolan, b. v. (2014). social network analysis and education. theory, methods & applications. los angeles: sage.
choo, c. w. (1998). the knowing organization: how organizations use information to construct the meaning, create knowledge, and make decisions. new york, ny: oxford university press.
cross, r., parker, a., & borgatti, s. p. (2002). making invisible work visible: using social network analysis to support strategic collaboration. california management review, 44(2), 25-46.
daly, a. j. (2010). social network theory and educational change. cambridge: harvard education press.
dencker, j. c., joshi, a., & martocchio, j. j. (2007). employee benefits as context for intergenerational conflict. human resource management review, 17(2), 208-220. doi:10.1016/j.hrmr.2007.04.002
doornbos, a. j., bolhuis, s., & simons, p. r. j. (2004). modeling work-related learning on the basis of intentionality and developmental relatedness: a noneducational perspective. human resource development review, 3(3), 250-274. doi:10.1177/1534484304268107
edge, k. (2014). a review of the empirical generations at work research: implications for school leaders and future research. school leadership & management, 34(2), 136-155. doi:10.1080/13632434.2013.869206
edge, k., descours, k., & frayman, k. (2016). generation x school leaders as agents of care: leader and teacher perspectives from toronto, new york city and london. in k. leithwood, j. sun & k. pollock (eds.), how school leaders contribute to student success. springer international publishing.
eraut, m. (2004). informal learning in the workplace. studies in continuing education, 26(2), 247-273. doi:10.1080/158037042000225245
feld, s. l. (1982). social structural determinants of similarity among associates. american sociological review, 47(6), 797-801. retrieved from http://www.jstor.org/stable/2095216
fuller, a., & unwin, l. (2004). young people as teachers and learners in the workplace: challenging the novice-expert dichotomy. international journal of training and development, 8(1), 32-42. doi:10.1111/j.1360-3736.2004.00194.x
geeraerts, k., vanhoof, j., & van den bossche, p. (2016). teachers' perceptions of intergenerational knowledge flows. teaching and teacher education, 56, 150-161. doi:10.1016/j.tate.2016.01.024
gerpott, f. h., lehmann-willenbrock, n., & voelpel, s. c. (2016). a phase model of intergenerational learning in organizations. academy of management learning & education. doi:10.5465/amle.2015.0185
glass, a. (2007). understanding generational differences for competitive success. industrial and commercial training, 39(2), 98-103. doi:10.1108/00197850710732424
grangeat, m., & gray, p. (2007). factors influencing teachers' professional competence development. journal of vocational education & training, 59(4), 485-501. doi:10.1080/13636820701650943
johnson, j. c. (1990). the primacy and potential of high school departments. in m. w. mclaughlin, j. e. talbert & n. bascia (eds.), the contexts of teaching in secondary schools (pp. 167-184). new york: teachers college press.
kelchtermans, g. (2006). teacher collaboration and collegiality as workplace conditions. a review. zeitschrift für pädagogik, 52(2), 220-237.
kelchtermans, g., & ballet, k. (2002). the micropolitics of teacher induction. a narrative-biographical study on teacher socialisation. teaching and teacher education, 18, 105-120.
king, s. p., & bryant, f. b. (2017). the workplace intergenerational climate scale (wics): a self-report instrument measuring ageism in the workplace. journal of organizational behavior, 38(1), 124-151. doi:10.1002/job.2118
kooij, d., de lange, a., jansen, p., & dikkers, j. (2008). older workers' motivation to continue to work: five meanings of age. a conceptual review. journal of managerial psychology, 23(4), 364-394. doi:10.1108/02683940810869015
kwakman, k. (2003). factors affecting teachers' participation in professional learning activities. teaching and teacher education, 9(2), 149-170. doi:10.1016/s0742-051x(02)00101-4
kyndt, e., vermeire, e., & cabus, s. (2016). informal workplace learning among nurses. organisational learning conditions and personal characteristics that predict learning outcomes. journal of workplace learning, 28(7), 435-450. doi:10.1108/jwl-06-2015-0052
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge: cambridge university press.
lazega, e., & van duijn, m. (1997). position in formal structure, personal characteristics and choices of advisors in a law firm: a logistic regression model for dyadic network data. social networks, 19, 375-397.
lee, t., fuller, a., ashton, d. n., butler, p., felstead, a., unwin, l., & walters, s. (2004). workplace learning: main themes & perspectives. learning as work research paper 2. university of huddersfield.
lin, n. (1999). building a network theory of social capital. connections, 22(1), 28-51.
lohman, m. c. (2000). environmental inhibitors to informal learning in the workplace: a case study of public school teachers. adult education quarterly, 50(2), 83-101. doi:10.1177/07417130022086928
lohman, m. c. (2006). factors influencing teachers' engagement in informal learning activities. journal of workplace learning, 18(3), 141-156. doi:10.1108/13665620610654577
mannheim, k. (1952). essays on the sociology of knowledge. london, uk: routledge & kegan paul.
marsden, p. v. (1988). homogeneity in confiding relations. social networks, 10, 57-76.
marsden, p. v. (2011). survey methods for network data. in j. scott & p. j. carrington (eds.), the sage handbook of social network analysis (pp. 310-388). thousand oaks, ca: sage publications.
mcpherson, j. m., smith-lovin, l., & cook, j. m. (2001). birds of a feather: homophily in social networks. annual review of sociology, 27, 415-444. doi:10.1146/annurev.soc.27.1.415
meredith, c., van den noortgate, w., struyve, c., gielen, s., & kyndt, e. (2017). information seeking in secondary schools: a multilevel network approach. social networks, 50, 35-45. doi:10.1016/j.socnet.2017.03.006
moolenaar, n. m. (2010). ties with potential. nature, antecedents, and consequences of social networks in school teams. (doctoral dissertation), university of amsterdam.
novotný, p., & brücknerová, k. (2014). intergenerational learning among teachers: an interaction perspective. studia paedagogica, 19(4). doi:10.5817/sp2014-4-3
pillen, m., beijaard, d., & den brok, p. (2013). tensions in beginning teachers' professional identity development, accompanying feelings and coping strategies. european journal of teacher education, 36(3), 240-260. doi:10.1080/02619768.2012.696192
reagans, r., & mcevily, b. (2003). network structure and knowledge transfer: the effects of cohesion and range. administrative science quarterly, 48(2), 240-267. doi:10.2307/3556658
richter, d., kunter, m., klusmann, u., lüdtke, o., & baumert, j. (2011). professional development across the teaching career: teachers' uptake of formal and informal learning opportunities. teaching and teacher education, 27, 116-126. doi:10.1016/j.tate.2010.07.008
ropes, d. (2011). intergenerational learning in organisations. a research framework. in cedefop (ed.), working and ageing. guidance and counselling for mature learners. luxembourg: publications office of the european union.
ropes, d. (2013). intergenerational learning in organizations. european journal of training and development, 37(8), 713-727. doi:10.1108/ejtd-11-2012-0081
rupp, d. e., vodanovich, s. j., & credé, m. (2006). age bias in the workplace: the impact of ageism and causal attributions. journal of applied social psychology, 36(6), 1337-1364. doi:10.1111/j.0021-9029.2006.00062.x
sfard, a. (1998). on two metaphors for learning and the dangers of choosing just one. educational researcher, 27(2), 4-13. doi:10.3102/0013189x027002004
spillane, j. p., kim, c. m., & frank, k. a. (2012). instructional advice and information providing and receiving behavior in elementary schools: exploring tie formation as a building block in social capital development. american educational research journal, 49(6), 1112-1145. doi:10.3102/0002831212459339
starks, a. (2013). the forthcoming generational workforce transition and rethinking organizational knowledge transfer. journal of intergenerational relationships, 11, 223-237. doi:10.1080/15350770.2013.810494
stokking, k., leenders, f., de jong, j., & van tartwijk, j. (2003). from student to teacher: reducing practice shock and early dropout in the teaching profession. european journal of teacher education, 26(3), 329-350. doi:10.1080/0261976032000128175
tajfel, h., & turner, j. c. (1986). the social identity theory of intergroup behavior. in s. worchel & w. g. austin (eds.), psychology of intergroup relations (pp. 7-24). chicago, il: nelson-hall.
thambi, m., & o'toole, p. (2012). applying a knowledge management taxonomy to secondary schools. school leadership & management, 32(1), 91-102. doi:10.1080/13632434.2011.642350
tynjälä, p. (2008). perspectives into learning at the workplace. educational research review, 3, 130-154. doi:10.1016/j.edurev.2007.12.001
van duijn, m., & vermunt, j. k. (2006). what is special about social network analysis? methodology, 2(1), 2-6. doi:10.1027/1614-1881.2.1.2
van der rijt, j., van den bossche, p., van de wiel, m. w. j., de maeyer, s., gijselaers, w. h., & segers, m. s. r. (2013). asking for help: a relational perspective on help seeking in the workplace. vocations and learning, 6, 259-279. doi:10.1007/s12186-012-9095-8
van der rijt, j., van den bossche, p., van de wiel, m. w. j., segers, m., & gijselaers, w. h. (2012). the role of individual and organizational characteristics in feedback-seeking behaviour in the initial career stage. human resource development international, 15(3), 283-301. doi:10.1080/13678868.2012.689216
van knippenberg, d., de dreu, k. k. w., & homan, a. c. (2004). work group diversity and group performance: an integrative model and research agenda. journal of applied psychology, 89(6), 1008-1022. doi:10.1037/0021-9010.89.6.1008
van waes, s., van den bossche, p., moolenaar, n. m., de maeyer, s., & van petegem, p. (2015). know-who? linking faculty's networks to stages of instructional development. higher education, 70(5), 807-826. doi:10.1007/s10734-015-9868-8
wasserman, s., & faust, k. (1994). social network analysis: methods and applications. new york: cambridge university press.
williams, k. y., & o'reilly, c. a. (1998). demography and diversity in organizations: a review of 40 years of research. research in organizational behavior, 20, 77-140.
wolff, c. e., van den bogert, n., jarodzka, h., & boshuizen, h. p. a. (2015). keeping an eye on learning: differences between expert and novice teachers' representations of classroom management events. journal of teacher education, 66(1), 68-85. doi:10.1177/0022487114549810
zijlstra, b. j. h., & van duijn, m. a. j. (2003). manual p2. version 2.0.0.7. groningen: iec progamma/university of groningen.
zijlstra, b. j. h., van duijn, m. a. j., & snijders, t. a. b. (2006). the multilevel p2 model: a random effects model for the analysis of multiple social networks. methodology, 2(1), 42-47. doi:10.1027/1614-1881.2.1.42
editorial
the journey to proficiency: exploring new objective methodologies to capture the process of learning and professional development
christian harteis, ellen kok, halszka jarodzka
introduction
over the last decades, educational research has established different foci on learning, all of them aiming at understanding how learning takes place and how its outcomes can be improved by instruction. they can be distinguished by their object of observation: (a) research focusing on the input to learning processes, in particular content and teacher or learner characteristics influencing learning; (b) research focusing on the learning processes themselves; and (c) research focusing on learning outcomes. what has varied over time is the emphasis placed on each of these foci. for example, the current interest in international comparisons of educational systems (e.g., timss, isglu, pisa) reflects research with a focus on learning outcomes.
within the last two decades, the focus of research on learning and instruction has shifted from an emphasis on outcomes back to the processes that underlie learning. however, in contrast to the process research of the 1960s to 1980s, which investigated school-based classroom teaching (shulman, 1986), current process research spreads across the entire lifetime and across all skill levels, from kids studying a textbook for 20 minutes to professionals developing extraordinary expertise over years. however varied these fields may seem, their common aim is to understand mental structures and their changes through learning, including the social, motivational, and emotional aspects influencing learning processes. for decades, however, it was extremely challenging to capture these processes in a meaningful manner.
recent developments in software methodology and hardware technology have opened fascinating opportunities for educational research. on the one hand, data analysis methodology, such as machine learning or data sequence comparison and string detection, is reaching a level at which it can be applied to new research questions within educational science. on the other hand, a variety of sensors have made measurements originating from highly specialized fundamental research (e.g., electroencephalography, cardiovascular measures, infrared eye-tracking) seemingly accessible and applicable. for instance, in the early 2000s companies started building ‘plug & play’ eye trackers with ready-to-use analysis software that claimed to guide its users intuitively. more recently, relatively cheap wearable devices for measuring brain activity (eeg) and electrodermal activity have become available. it is the combination of decreasing prices and sizes of sensor systems, increasing usability of operating and analysis software, and the development of novel, easier-to-apply analysis methodologies that has lowered the threshold for applying these measures in the area of education.
hence, many researchers started utilizing these methodologies in a wide area of educational research. however, it quickly turned out that neither was the use of the hardware as easy as the sellers claimed, nor was the analysis of the data as straightforward. all online measures of learning create a kind of data that is not comparable with traditional qualitative or quantitative empirical data. many online measures collect longitudinal data at millisecond resolution and thus generate thousands of data values. while researchers found that these measures offer many opportunities, they also faced many challenges. these comprise a variety of problems, e.g. detecting meaningful events in high-frequency measures, combining process measures of different granularities, synchronizing measures, capturing the sequential nature of learning processes, and defining reasonable frequencies for statistically analyzing skewed, multilevel data sets. what keeps happening, though, is that researchers face the same or similar problems over and over again, as they cannot easily get access to the progress made by others in their field. due to this lack of exchange, researchers often have to re-invent the wheel. on top of that, online measures of learning operate at a granularity that does not easily match the predictions that can be made from our current theoretical models of learning and expertise development. online measures provide data on a very micro level of learning, whereas theoretical models usually address the macro level of development. thus, researchers who use process measures need ways to exchange thoughts, not just about methodological issues, but also about how methodological choices relate to theoretical models.
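to make the first of these challenges concrete, the following minimal sketch shows one simple way of detecting events in a high-frequency gaze recording: consecutive samples whose point-to-point speed stays below a threshold are grouped into fixation-like events. this is a simplified velocity-threshold approach chosen for illustration only, not a method reported by the contributions to this issue. the function name `detect_slow_events`, the units (pixels and milliseconds) and both thresholds are assumptions.

```python
import math


def detect_slow_events(times_ms, xs, ys, max_speed, min_duration_ms):
    """Group consecutive slow-moving gaze samples into (start_ms, end_ms) events.

    Assumes strictly increasing timestamps in milliseconds and gaze
    coordinates in pixels; max_speed is in pixels per millisecond.
    All thresholds are hypothetical and would need tuning per device.
    """
    events = []
    start = None  # start time of the event currently being built, if any
    for i in range(1, len(times_ms)):
        dt = times_ms[i] - times_ms[i - 1]
        dist = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        slow = dist / dt <= max_speed
        if slow and start is None:
            # a new event begins at the earlier of the two samples
            start = times_ms[i - 1]
        elif not slow and start is not None:
            # a fast movement ends the event at the previous sample
            end = times_ms[i - 1]
            if end - start >= min_duration_ms:
                events.append((start, end))
            start = None
    if start is not None:
        # close an event that is still open when the recording ends
        end = times_ms[-1]
        if end - start >= min_duration_ms:
            events.append((start, end))
    return events


# hypothetical 500 hz recording (one sample every 2 ms): the gaze rests at
# x = 100, then jumps to x = 400 between the samples at 100 ms and 102 ms.
times = list(range(0, 202, 2))
xs = [100.0 if t <= 100 else 400.0 for t in times]
ys = [100.0] * len(times)
print(detect_slow_events(times, xs, ys, max_speed=1.0, min_duration_ms=50))
# prints [(0, 100), (102, 200)]
```

even in this toy case, the result depends on two thresholds that have to be chosen by the researcher, which illustrates why such methodological choices deserve explicit discussion.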
hence, researchers started to establish more or less formal groups that aim at sharing their experiences (e.g., the earli sig 27 ‘online processes of learning’). interest in these measures also rose within already existing communities, such as the earli sig 14 ‘learning and professional development’. we argue that this exchange is crucial for meaningful and fruitful further development within educational research, in particular as the usage of such techniques is growing. to give just one example, the use of eye tracking is growing rapidly, with over 700,000 google scholar hits in the past decade (~370,000 in 1998-2007, ~66,000 in 1988-1997, and fewer than 30,000 publications before 1987).
therefore, it is important to go beyond purely talking about experiences with process measures. what is needed is explorative, methodology-focused research in order to initiate negotiation within the scientific community of educational researchers on how these novel approaches to data collection and data analysis contribute to the community’s state of knowledge on learning and instruction. thus far, the challenges of using process measures go unnoticed, as it is hardly possible to discuss them in traditional empirical study papers, which focus on knowledge structures and their changes through learning rather than on methodological developments. the current special issue in frontline learning research is a first step towards filling this gap.
about this special issue
this special issue aims at exploring possibilities of using process data on learning in different contexts and at critically discussing these methods with respect to their explanatory power for learning and expertise development, and for gaining insight into the underlying processes operating within cognitive structures. the contributions present experiences in applying new methodologies and put the findings up for discussion. the goal of all contributions is to reflect on the strengths and limitations of their measures and to provide a statement on how informative their data can be for researching learning. the special issue addresses the broad readership within the earli community. the idea for this special issue resulted from two well-attended and highly appreciated sig-invited symposia (sig 14 and sig 27) at the earli conference in tampere, finland, in 2017. a public call yielded additional contributions, and we invited three discussants to reflect upon the articles. now, more than one year after this conference, this special issue provides a broad set of papers reporting and reflecting on selected approaches to online measures of learning processes. we intentionally considered contributions from various fields and domains of educational research. the earli community represents educational research covering the entire life-span, conducted under laboratory conditions as well as in the field. all contributors were encouraged to discuss carefully if and how the selected methodological approaches and data can be informative not only for the context of the respective paper but for the educational research community in general. hence, we hope that everybody can find novel, interesting and fruitful information within this special issue.
the methods discussed in this special issue cover neuroscience topics, such as the possibilities of neuroscience for education (van atteveldt et al., this issue) and eeg (scharinger, this issue). the special issue also discusses different applications of eye tracking, such as machine learning to analyze eye tracking data (garcia moreno-esteva et al., this issue; harteis et al., this issue), its limitations in investigating web search processes (salmeron et al., this issue), or the challenge of combining it with musical performance (puurtinnen et al., this issue). further contributions discuss the use of physiological data, such as skin conductance (eteläpelto et al.; nokelainen et al.; both this issue), and combine it with further measures to study collaboration (hoogeboom et al., this issue), logfile analyses to study blended learning (van laer & ellen, this issue), or even prosodic analyses of conversations in classrooms (hämäläinen et al., this issue). several contributions specifically use multimodal aspects to study collaboration (hoogeboom et al., this issue), observational data for analyzing teachers’ behavior (donker et al., this issue), or self-regulated learning (järvenoja et al., this issue).
overarching themes
taken together, the contributions to this special issue discuss opportunities and limitations of process measures, their combination with each other, and their combination with conventional measures for analyzing learning processes. they reveal the following overarching challenges.
objective data
one important benefit of process measures as presented here is that they do not rely on self-reports and, thus, can be considered objective data. we must keep in mind, though, that their interpretation remains subject to the judgment of the experimenter. furthermore, these methods open opportunities to gather data about unconscious regulation processes, whereas self-reports necessarily provide access only to what participants are aware of. it is important to keep in mind, however, that sometimes subjective data is more appropriate for a given research question. a clear understanding of what type of data fits a certain research question remains the crucial challenge of utilizing online measures.
multimodal data
as several contributions to this special issue reveal, one important development is the increased use of multimodal designs. such designs aim to gain a broader understanding of real-world learning situations by using different types of data (e.g., a combination of psychophysiological data, video data and data from sociometric badges). major challenges here are that the added value of each of those types of data for the research question needs to be clear, and that the data is often collected at different levels (e.g., some data at the team level and other data at the individual level) as well as at different sampling frequencies (e.g., eye-tracking data can be measured at 500 hz, skin conductance levels are measured at 4 hz, whereas team effectiveness is only measured once). synchronization of data can be challenging, both on a practical level (i.e., data files should use the same time representation) and on a sampling level (i.e., data from different participants need to be synchronized).
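to make the sampling-frequency problem concrete, the following minimal sketch brings a 500 hz stream and a 4 hz stream onto one shared time base by averaging the faster stream within 250 ms windows. it is an illustration under stated assumptions, not a procedure taken from any contribution: the function names, the data values, and the premise that both recordings start at the same instant on a shared clock are all hypothetical.

```python
from statistics import mean


def downsample_by_window(samples, source_hz, target_hz):
    """Average consecutive samples so a source_hz stream matches target_hz.

    Assumes source_hz is an integer multiple of target_hz and that the
    first sample was taken at t = 0 on the shared clock.
    """
    if source_hz % target_hz != 0:
        raise ValueError("source_hz must be an integer multiple of target_hz")
    window = source_hz // target_hz  # 500 Hz -> 4 Hz gives 125 samples per window
    usable = len(samples) - len(samples) % window  # drop an incomplete last window
    return [mean(samples[i:i + window]) for i in range(0, usable, window)]


def align(eye_500hz, skin_4hz):
    """Pair each 4 Hz skin-conductance sample with the mean of the
    eye-tracking samples that fall into the same 250 ms window."""
    eye_4hz = downsample_by_window(eye_500hz, 500, 4)
    n = min(len(eye_4hz), len(skin_4hz))  # keep only windows present in both streams
    # first tuple element: start time of the window in seconds
    return [(i / 4, eye_4hz[i], skin_4hz[i]) for i in range(n)]


# two seconds of invented data: pupil size at 500 Hz, skin conductance at 4 Hz
eye = [3.0] * 500 + [4.0] * 500
skin = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4]
rows = align(eye, skin)
```

averaging down to the slowest stream is only one possible design choice; it discards the fine temporal resolution of the faster signal, so whether it is acceptable depends on the research question.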
an important aspect of many of these process measures is that they are potentially personal data (i.e., they may be traced back to one specific person). this is particularly the case if several data sets are collected from one participant. in such a case, the new gdpr regulations come into play (https://eugdpr.org). we should also keep in mind that even if a process data set is not yet easily traceable back to one person, it may easily become so with the fast development of machine learning techniques in the near future.
analysis
those large data sets also call for an improvement in our techniques of analysis, as the current conventional statistics do not allow for a non-biased analysis of large numbers of variables at the same time. furthermore, data is often averaged or summed up over time, so the temporal order of data might get lost, while the process of interest could be reflected in exactly that temporal order. data mining and machine learning techniques are one important option; they apply complex algorithms based on different mathematical models than inferential statistics and provide novel opportunities for revealing hidden patterns within huge data sets. hence, they allow for testing the predictive value of sets of variables for certain outcome measures and, thus, make it possible to quantify and statistically test which online measures predict the outcome.
at the same time, a careful theory-driven decision as to what variables should reflect the processes of interest is still critical, and the fact that large numbers of variables are available should not tempt us to simply report large numbers of analyses, and cherry-pick the interesting (significant) effects for discussion.
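the point that averaging over time can hide the temporal order of a process can be illustrated with a small sketch. the two series below are invented and contain exactly the same values, so their means are identical; an order-sensitive summary (here the lag-1 autocorrelation, chosen only as one simple example) tells them apart.

```python
from statistics import mean


def lag1_autocorrelation(series):
    """Order-sensitive summary: how strongly each value resembles its predecessor."""
    m = mean(series)
    numerator = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    denominator = sum((x - m) ** 2 for x in series)
    return numerator / denominator


rising = [1, 2, 3, 4, 5, 6]        # steady increase over time
alternating = [1, 6, 2, 5, 3, 4]   # same values, different temporal order

same_mean = mean(rising) == mean(alternating)  # aggregation cannot separate the two
r_rising = lag1_autocorrelation(rising)            # positive: neighbours are similar
r_alternating = lag1_autocorrelation(alternating)  # negative: neighbours flip
```

which order-sensitive measure is appropriate remains a theory-driven decision; the sketch only shows that such information exists and is lost in a mean.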
ecological validity
many of the authors argue that their designs allow for collecting data in the field (instead of laboratory environments) and, as such, increase ecological validity, in particular because many of the measures do not disturb natural task performance. the underlying assumption is that ecological validity, the extent to which the study approximates ‘the real world’, predicts external validity, the extent to which the study generalizes to ‘the real world’. however, studies within this special issue showcase that ecological validity does not necessarily translate into external validity (e.g., fluctuations in skin conductance under ‘natural’ conditions might also reflect body posture or movement instead of mental effort). it depends on the context and the research interest whether, and to what extent, fuzzy data are acceptable or precise data are required. on the one hand, it is argued that a compromised quality of, e.g., easy-to-use eeg apparatus is acceptable in the context of the new opportunities for neuroimaging in ‘real-world’ settings; on the other hand, it is argued that low data quality and the confounding effects that could occur in ecologically valid environments may occlude real effects and, thus, compromise external validity.
hence, it is often useful to start from hypotheses generated by laboratory research and investigate these in real-world settings. however, the opposite could also provide insights: taking important but tentative findings from field studies and bringing these into the lab (including, where possible, the real-life complexity) to understand the mechanisms involved in more detail.
more ecologically valid testing environments are mostly useful when appropriate analysis techniques are available. the fixation-related eeg frequency band power analysis that scharinger introduced (in this special issue), for example, is an analysis technique that makes it possible to investigate multimedia learning environments using eeg, where previous eeg research on reading required the presentation of single words instead of free reading tasks.
coupling of high-level theoretical models to fine-grained data streams
common theoretical concepts of learning and development describe processes that usually last (much) longer than milliseconds. online measures, however, have the particular quality of providing data at a very high resolution. it is important to keep the different granularity in mind when developing research questions and designing research settings. there may be theoretical frameworks that require less precision than others: understanding visual expertise and pattern recognition, on the one hand, focuses on detailed phenomena of physical behavior that may remain under the surface of consciousness. this case requires quite a precise coupling of theories of vision with the data streams. on the other hand, investigating the importance of emotions for learning may allow more fuzziness, as long as the duration of an emotional state is considered less important for learning processes than the pure occurrence of an emotion. hence, there is neither ‘the’ challenge of coupling theories with data, nor is there a ‘one-size-fits-all’ solution. in the developing field of researching learning with process data the full breadth of opportunities can be found. what concretely is to be considered crucially depends on necessary theoretical decisions.
conclusions and outlook
all contributions have put their findings up for discussion and reflected on strengths and limitations of the measures applied in their studies. as such, this special issue provides a valuable resource for any researcher who already works with process measures or is starting to work with them. based on the contributions, we derived suggestions on how to implement new methods and technologies in our applied field of educational science in a meaningful way.
choosing a method
when thinking about using a new method or technology for research, we must always ask ourselves why we need it. is it to address a hypothesis that cannot be addressed otherwise? is it to explore thus far hidden processes? or is it rather to simply try out a fancy, new technology that was thrown at us? whichever the answer may be, we need to be clear about it. based on the experiences gathered in this special issue, we strongly advise always considering how this new methodology or technology will help to approach the research question. moreover, we recommend being cautious when drawn towards new gadgets out of pure curiosity, and we advise always staying as low tech as possible and as high tech as necessary.
implementing a methodology
when researchers start to use a new method, it is critical that they understand where the method comes from and what its history is (i.e., in which fields is it already successfully applied and how?). this often helps to understand why certain approaches were chosen and decisions were made. how is this technique currently used in its ‘home’ domain? how are the experiments set up? what are the analysis techniques? even though many of the ‘new’ methods are (relatively) new to educational research, there is often a large body of research available in other domains that can inform the researchers. it is important to get to know this field in order to avoid serious mistakes caused by unfamiliarity with it, as these might result in invalid data recording or analysis. we suggest always beginning by cooperating with an expert from the original field.
analysis
since computers and software have developed similarly rapidly, it is now possible to utilize computing power and software algorithms for completely new procedures of data analysis (e.g., big data, data mining). since we, as educational researchers, might not have the necessary background in statistics, computer science or data science to execute those kinds of procedures, it is central that we collaborate with researchers and practitioners from other fields. the combination of different types of expertise is central for progress in the use of process measures.
interpretation
an important challenge is how to connect these new methodologies, which measure fine-grained processes, with our theories, which make statements on more macro levels. how can we derive predictions from our theories for these measures, and how can we make meaningful statements for our theories from these empirical findings? these questions should not only guide our choice of research methodology, but also challenge us to further develop and specify existing theories and frameworks.
drawing conclusions
to conclude, we hope that this special issue provides a starting point for more methodological papers in educational sciences, which critically discuss the application of (new) technological approaches and process measures, including validity information for that (set of) measure(s) as well as practical advice for their use.
this field is developing rapidly, so we also tried to realize this special issue quite swiftly in order to contribute to the start of the scientific discourse on those issues. we are aware, though, that the development has just started and that we are far away from fully understanding the potential and limitations of these new kinds of data and measurements. hence, we hope to see more methodological publications discussing new ways of capturing learning processes in the future!
acknowledgements
the guest editors would like to thank the reviewers of the special issue.
references
shulman, l. s. (1986). paradigms and research programs in the study of teaching. in m. wittrock (ed.), handbook of research on teaching (pp. 3-36). new york: macmillan.
frontline learning research vol. 6 no. 3 (2018) 204-227 issn 2295-3159
it’s not only what you say, but how you say it: investigating the potential of prosodic analysis as a method to study teacher’s talk
raija hämäläinen a, bram de wever b, teija waaramaa c, anne-maria laukkanen c, joni lämsä a
a university of jyväskylä, finland
b ghent university, belgium
c university of tampere, finland
article received 13 may 2018 / revised 22 october / accepted 28 november / available online 19 december
abstract
in this study, we introduce new insights into prosodic analyses as an emerging method to study what happens in classroom interactions. we claim that the prosodic aspects (features of speech such as intonation, volume and pace) of talk are important, but under-represented in the learning sciences. these prosodic aspects may be used to complement, intensify or even reverse the linguistic content of speech. thus far, most research on classrooms has focused on the content (what is said) rather than on understanding the meaning of the prosodic features (how it is said) of talk. in this study, we introduce prosodic analyses as a method to study classroom discussions. our exploratory experiment focuses on the prosodic perspective of teacher’s talk to shed light on classroom interactions. we present a case in which we align prosodic features with the content of teacher's talk during a nine-week physics course. this article shows that prosodic analyses may have added value for research on learning and professional development. namely, we illustrate that acting in an authentic classroom setting might trigger specific prosodic aspects in teacher's talk. we further found indications that the teacher applied different voice prosody regarding certain patterns of classroom talk. for the future, we suggest that a combination of content and prosodic analysis is a promising tool for gaining new insights into classroom interactions.
keywords: teacher’s talk; prosodic analysis
corresponding author: raija.h.hamalainen@jyu.fi
doi: https://doi.org/10.14786/flr.v6i3.372
1. introduction
multiple methods and techniques are required to understand what happens in classrooms, and while many researchers have investigated the content of talk, for example with content analysis (de wever et al., 2006; hämäläinen & de wever, 2013) and discourse or conversation analysis (mercer & dawes, 2014; warwick, vrikki, vermunt, mercer, & van halem, 2016), few researchers have attempted to understand the prosodic features of talk (gweon et al., 2013). the present study methodologically bridges the gap between two research domains to advance research on classroom discussions via the analysis of prosodic features. this analysis focuses on elements such as intonation and pitch in teacher’s talk in an authentic classroom context from a sociocultural perspective.
1.1 teachers’ talk in the classroom: educational dialogues and teacher monologues
according to mercer, dawes, and staarman (2009), authentic classroom situations typically involve non-dialectic teacher monologues and educational dialogues. in non-dialectic situations, typically only the teacher talks (teacher monologue). in classroom contexts, in addition to educational dialogues, non-dialectic situations may be necessary and an intriguing way to stimulate learning. on the other hand, nowadays, the role of the teacher is changing from only providing knowledge to also supporting students’ knowledge construction activities. during educational dialogue (also referred to as productive classroom talk, which has an analogous meaning; muhonen et al., 2017), both students and teachers talk. according to mercer (1995), classrooms set up ‘sceneries’ of educational dialogues where, ideally, teachers and their students will collaboratively discuss the topic about which they are learning.
in these educational dialogues, teachers engage their students in discussions that include a series of questions and answers. for sociocultural research, the creation of meaning is inherently an intrapersonal process, and ways of thinking are embedded in particular ways of using language (littleton & mercer, 2013). dialogue may thus be said to be more than ‘just talk’ (o’connor & michaels, 2007). dialogue is talk that is productive and, therefore, should be the central interest of analysis (grounded on the work of vygotsky, 1987). according to muhonen et al. (2017), previous studies on learning have indicated that the quality of teacher-student dialogue is associated with the growth of students’ understanding (alexander, 2001; lemke, 1990; mortimer & scott, 2003). as a direct result, the field particularly needs to understand teacher talk during educational dialogues that creates opportunities to promote learning (see also mortimer & scott, 2003). educational dialogues are influenced first by differential power relations between teachers and students (lemke, 1990) and second by differential knowledge relations, for example by being either the ‘primary’ knower (typically a teacher) or a ‘secondary’ knower (typically a student) regarding the topic under discussion (berry, 1981). the teacher also plays a special role in guaranteeing that students benefit from classroom activities (nassaji & wells, 2000). some scholars have argued that in (science) classrooms, educational dialogues are likely to follow triadic dialogue patterns (lemke, 1990), especially during whole-class discussions (lemke, 1990; mehan, 1978; mortimer & scott, 2003; salloum & boujaoude, 2017). for example, according to nassaji and wells (2000), educational dialogues typically proceed along an initiation-response-feedback (i-r-f) pattern, which includes three phases. 
first, the teacher initiates a question (usually with a known answer); second, one or more students respond to that question; third, the teacher evaluates the answers, provides feedback and may or may not ask for follow-up questions or activities. wells (1993) further highlights that the students’ responses are a crucial element of i-r-f, since without such responses, there is no exchange (dialogue). sinclair and coulthard (1975) have also called the third stage (feedback) ‘follow-up’ and further define three types of follow-up acts: (1) accept or reject, (2) evaluate and (3) comment, which includes exemplifying, expanding, and justifying.
1.2 acoustic speech research
from the research area of acoustic speech and voice research, it is known that prosodic features affect speech (later referred to as talk) perception and discussion. by prosodic features, we mean vocal characteristics like pitch variation and stress pattern, pausing, tempo, mean pitch and loudness, and vocal quality. prosody refers to the intentional or unintentional use of these characteristics to convey the meaning of an utterance (see the links in the appendix for more information on prosody). for example, prosody may signal one’s psychophysiological activity level and emotions. changes in pitch and/or loudness and tempo may reflect changes in activity or arousal level (vilkman & manninen, 1986; laukkanen et al., 1997; waaramaa et al., 2010; waaramaa et al., 2014). activity or arousal level can be low, moderate, or high. typically, a positive emotion of joy and a negative emotion of anger have a high arousal level, while tenderness and sadness have a low arousal level. a high arousal level is typically expressed by high pitch and loudness and a firmer (more pressed, tense) voice quality (laukkanen et al., 1997; waaramaa et al., 2010). in a low arousal level, the mean pitch and loudness are lower, and the voice quality is less firm (softer, laxer). the valence of the emotion (i.e.
whether it is positive, negative, or neutral) may be conveyed by a complex combination of features including voice timbre (e.g. a brighter voice is associated with a more positive emotion than a darker voice, possibly because smiling makes the voice timbre brighter) (laukkanen et al., 1997; waaramaa et al., 2006).
1.3 analysing the prosody of teacher talk from the methodological perspective
to understand what happens in classrooms, we have to take into account that perceptions of paralinguistic and nonverbal characteristics of talk are, to a large extent, subconscious (see e.g. zald, 2003). these subconscious characteristics influence discussion processes, as according to brazil (1978), intonation in discourse illustrates the interaction. various studies have shown the effect of prosodic aspects on the listeners’ opinion of the speaker and perceptions of his/her personality (e.g. addington, 1968; lukkarila et al., 2012; scherer, 1972; zellner keller, 2004). prosodic (or suprasegmental) features intensify what is said or add meaning to the segmentals (phonemes). additionally, several studies have indicated that prosodic features can even reverse the meaning of a message (see e.g. laver, 1991; lehiste, 1970; scherer & giles, 1979) and some studies have shown that when linguistic-semantic content and prosodic-paralinguistic content are contradictory, the latter usually wins (lyons, 1977). therefore, we need a better understanding of the role that prosodic aspects play in teacher’s talk. divergent intonation patterns are also known to be used to praise and encourage students, to minimise the embarrassing effect of a wrong answer and to open and close discussions (hellermann, 2003). furthermore, intonation may be used to mark text cohesion (e.g. halliday & hasan, 1976), which improves speech comprehension. studies have also shown that a teacher’s dysphonic voice quality (e.g.
irregularities in the sound signal, perceived as vocal fry or hoarseness) negatively affects students’ comprehension of instruction (see imhof et al., 2014; lyberg åhlander et al., 2014; rogerson & dodd, 2005). on the other hand, the classroom activity is also affected by environmental factors, like the size of the student group and classroom acoustics. for instance, a noisy environment requires a higher volume. consequently, a teacher may unintentionally modify other prosodic aspects, like intonation or voice quality, which may restrict the natural use of prosodic variation in conveying content or even provoke contradictory connotations. this may influence teacher’s talk and is another reason to investigate possibilities and limits of prosodic analysis as an approach for gaining insights into classroom talk.
1.4 aims
the motivation for this article is that the content features of talk might not be sufficient for describing and understanding the true nature of classroom activities. therefore, methodological development is needed. this article aims to identify novel methods for studying teacher talk, combining both content and prosodic perspectives in this analysis. we concretize the methodological approach in light of two selected physics lessons. special attention will be paid to the applicability and restrictions of the selected methods for analysing the features of talk. to illuminate the method, we advanced two research questions:
(rq1) how were the prosodic features of teacher talk influenced by the contextual factors of the authentic classroom, such as noisy classroom conditions?
(rq2) how did the teacher’s use of prosody vary between different kinds of talk patterns?
2. method
2.1 context and data
the present work is an exploratory case study based on data obtained in an authentic classroom setting.
twenty-seven seventh-grade students and a teacher worked in a computer-supported inquiry science classroom during a nine-week physics course (27 hours of teaching and studying). one researcher observed the lessons and took ethnographic field notes during classroom observations (derry et al., 2010). based on these observation notes, we selected two lessons that included different types of teacher’s talk (see next section) for further analysis. the two respective 45-minute videos of physics lessons, from which both the teacher’s and the students’ talk were transcribed, served as data for the present study. the teacher played a central role in planning the practical organisation of the project, and she was not given any specific instructions regarding her role as a teacher in the project. the teacher was fully responsible for implementing the instructional design without interference from the researchers. one video camera and three audio recording systems recorded the lessons.
2.2 analysing teacher’s talk
in this study, we are interested in teacher talk in classrooms as a medium for pedagogy (e.g. kumpulainen et al., 2010). first, our analysis focused on how the prosodic features of teacher talk were influenced by the contextual factors of an authentic classroom (rq1). this investigation is necessary because most prosodic research has been conducted in settings in which participants use their ‘natural’ voice, while teachers may use their voice differently in classroom conditions, as the classroom is a specific condition with rather high levels of noise due to students. second, we examined whether (and how) a teacher’s use of prosody varied between different kinds of talk patterns (educational dialogues and teacher monologues, rq2). in-depth qualitative analysis and descriptive statistics were used to analyse and interpret the teacher’s talk. the identified 110 episodes of educational dialogue (n=42) and teacher monologues (n=68) were analysed.
frequency counts and illustrative qualitative analyses were combined to explore the teacher talk in detail. the data analysis of classroom talk was grounded in educational dialogues and teacher monologues; it was also adjusted to sociocultural discourse analysis (mercer & dawes, 2014; niemi, 2016). the talk was analysed sequentially, which means that each utterance in a selected sequence is understood and viewed in relation to the previous utterance in the ongoing discussion. according to linell (1998), analytical descriptions are thus oriented towards the dialectical achievements of the participants. we identified key episodes related to educational dialogues based on triadic dialogue (an i-r-f pattern, lemke, 1990) and teacher monologues. as nassaji and wells (2000) have noted, however, this basic structure of triadic dialogue can be used for many purposes, particularly because the nature of the feedback the teacher is providing may vary. in our analysis, we focussed on these variations of teacher’s feedback. we based this analysis on sinclair and coulthard’s (1975) follow-up acts – accept/reject, evaluate and comment – however, we further developed the analysis of follow-up moves. we argue that sinclair and coulthard’s (1975) follow-up moves may not fully account for the influence of feedback variations that are present in the current inquiry-based science classrooms. teacher talk has radically changed since 1975, especially in the context of inquiry-influenced science classrooms. while the role of the teacher was once that of knowledge provider, classroom talk today is more based on shared discussions in which teachers try to trigger and support their students’ knowledge construction activities. for example, whether a teacher accepts or rejects a student’s response will make a difference. 
we thus broadened the analysis of the i-r-f pattern and labelled follow-up moves as cumulative, promotive and disputational i-r-f patterns (see table 1 for more details) (see also an analysis of student-student collaboration: exploratory, cumulative and disputational talk, mercer & wegerif, 1999).
table 1. the talk patterns used for coding transcribed data
to form a meaningful unit of analysis, typically several utterances, both from the teacher and student, needed to be combined. consequently, there were cases in which sequentially analysed utterances had characteristics of two or more classes from table 1. in these situations, the coding of units of analysis was based on the teacher’s talk. in addition to three educational dialogue patterns, there were episodes of teacher monologues when only/mostly the teacher spoke. during teacher monologues, the teacher, for example, gave physics-related information without dialogic discussion with students. she also organised the groups to get them to work more effectively. these patterns of talk are referred to as teacher presentations and group organising, respectively. finally, there was ‘other’ talk. all these classes with their descriptions are presented in table 1. the coding was done with software for qualitative data analysis (atlas.ti). this coding (six classes, see table 1) allowed an examination of educational dialogues and teacher monologues, which enabled talk episodes to be identified as patterns for frequency counts. from the coded data, the durations of the units of analysis were determined as measured in seconds. subsequently, based on the coded frequency counts, teacher talk patterns were selected for the intonation analysis. we did not conduct an intonation analysis for ‘other’ talk because this group was so heterogeneous in its contexts; for example, the teacher discussed with her colleague (without the students hearing it) a student who probably ran away from school.
in summary, in our analysis we first identified talk episodes, such as (1) cumulative, (2) promotive, and (3) disputational i-r-f patterns in the educational dialogues, as well as (4) teacher presentation and (5) group organising in the teacher monologues. after that, we investigated how the prosody related to these patterns of talk was characterised and whether variations could be identified. the educational dialogues and teacher monologues took place in finnish, and the researchers translated the excerpts presented in this article. pseudonyms were used to report the results. our analysis revealed typical patterns of how the teacher used her voice regarding different classroom situations. we aimed to select representative excerpts of classroom talk. there are several reasons for selecting these specific excerpts. first, in line with how the prosodic features of teacher talk were influenced by contextual factors (rq1), we selected excerpts that illustrate prosodic challenges that emerged when teachers acted in authentic classroom settings and that may be a useful starting point for future studies. additionally, to illustrate how the teacher’s use of prosody varies between different kinds of talk patterns (rq2), excerpts were selected in accordance with the theoretical perspective regarding educational dialogues and teacher monologues. due to space limitations, only some examples of the talk are illustrated in detail. however, we do not claim that the episodes presented here are necessarily typical of the larger sample. rather, they were chosen in view of our aim to illustrate the method developed and show that prosodic analyses may be an interesting avenue for further research. however, to increase reliability, the data excerpts and the analyses have been actively discussed within our research group. critical comments and joint analysis efforts have contributed to strengthening the validity of the empirical analysis.
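the bookkeeping behind the coding step described in this section (six classes, frequency counts, durations in seconds) can be sketched as follows. this is only an illustrative reconstruction: the study used atlas.ti, and the data layout, the function name `summarise` and the episode values below are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict
from enum import Enum


class TalkPattern(Enum):
    # the six classes named in the text; table 1 holds the full descriptions
    CUMULATIVE_IRF = "cumulative i-r-f"
    PROMOTIVE_IRF = "promotive i-r-f"
    DISPUTATIONAL_IRF = "disputational i-r-f"
    TEACHER_PRESENTATION = "teacher presentation"
    GROUP_ORGANISING = "group organising"
    OTHER = "other"


# the three i-r-f patterns together form the educational dialogues
EDUCATIONAL_DIALOGUE = {
    TalkPattern.CUMULATIVE_IRF,
    TalkPattern.PROMOTIVE_IRF,
    TalkPattern.DISPUTATIONAL_IRF,
}


def summarise(episodes):
    """Return frequency counts and total durations (seconds) per talk pattern.

    episodes: iterable of (pattern, start_s, end_s) tuples (assumed layout).
    """
    counts = Counter()
    seconds = defaultdict(float)
    for pattern, start_s, end_s in episodes:
        counts[pattern] += 1
        seconds[pattern] += end_s - start_s
    return counts, dict(seconds)


# invented episodes, for illustration only
episodes = [
    (TalkPattern.TEACHER_PRESENTATION, 0.0, 42.0),
    (TalkPattern.CUMULATIVE_IRF, 42.0, 60.0),
    (TalkPattern.GROUP_ORGANISING, 60.0, 75.0),
    (TalkPattern.CUMULATIVE_IRF, 75.0, 95.0),
    (TalkPattern.OTHER, 95.0, 100.0),
]
counts, seconds = summarise(episodes)
dialogue_episodes = sum(n for p, n in counts.items() if p in EDUCATIONAL_DIALOGUE)
```

keeping ‘other’ as an explicit class makes it straightforward to exclude it from a later step, as was done for the intonation analysis.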
2.3 acoustic analysis
in the prosodic approach to our data analysis, we focused on pitch and intonation. additionally, voice quality was addressed, as it may change involuntarily as a consequence of environmental challenges. for readers who are unfamiliar with the measures within prosody research, we briefly explain the terminology in the next section. readers who are familiar with this terminology can move directly to the section ‘analyses in the present study’.
2.3.1. introduction to acoustic analysis
human voice production can be divided into three main parts: power source, vibrator and filter. airflow from the lungs provides the power source for vocal fold vibration. the vocal tract (space from the vocal folds to the lips and nostrils) acts as a filter, thereby colouring and amplifying the sound produced by vocal fold vibration. through articulation, we modify the vocal tract to produce speech. speech consists of linguistic content and paralinguistic cues (prosodic, suprasegmental). linguistic content refers to the words used, and paralinguistic cues are the way the words are expressed and what other sounds or modifications are included (laughter, crying, smiling). paralinguistic content contains prosodic elements. prosodic features are said to be suprasegmental, as they are properties of speech units larger than the individual segment. it is necessary to distinguish between the personal, background characteristics that belong to an individual’s voice (for example, one’s habits influencing the pitch range) and the independently variable prosodic features that are used contrastively to communicate meaning (for example, the use of changes in pitch to distinguish questions from statements). we can alter our vocal production in many ways. the human voice, like sound in general, has three basic characteristics: pitch (how high or low the sound is), loudness (how soft or loud it is) and quality (timbre/colour, i.e.
whether the sound is dark or bright, or sounds tense or lax). in addition, sounds have duration (how long or short the signal is). prosodic characteristics consist of the manipulation of these aspects. thus, they include the average pitch and loudness used by the speaker, as well as variations in pitch and loudness during sentences. variations in pitch during a sentence are called intonation. variations of loudness during a word or a sentence are used to stress (highlight) the important parts of the message. the manipulations of temporal aspects in talk include altering the duration of sounds and talk tempo and talk rhythm, consisting of temporal aspects and pausing. besides these characteristics, we can change our vocal quality: we can speak in a tenser way or a laxer way, or we can use various types of vocal fold vibration: chest voice, falsetto or vocal fry. furthermore, prosodic aspects are used to complement, intensify or even contradict one’s conversational content. prosodic aspects also convey (even subconsciously or involuntarily) information about our psychophysiological state, with aspects like mood, emotion and attitude, as well as our physical status (age, gender, health etc.). voices can be studied acoustically. the characteristic of sound that is perceived as pitch originates from the fundamental frequency (f0). this, in turn, in human voice production, corresponds to the number of vocal fold vibrations per second. it is measured in hertz (hz); one vibration per second is 1 hz. the faster the vocal folds vibrate, that is the more vibrations they produce per second, the higher a pitch we hear. the average f0 of a male speaking voice is about 120 hz and a female voice about 200 hz; however, the frequencies (i.e. the pitch use) may be somewhat language and culture dependent (pépiot, 2013). 
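because perceived pitch is roughly logarithmic in frequency, the distance between two f0 values is commonly expressed in semitones (twelve per octave) rather than in hertz. the following is a minimal python sketch of that conversion, assuming twelve-tone equal temperament; the function name `semitone_interval` is ours for illustration and is not part of the software used in the study. applied to the approximate male and female averages given above (120 hz and 200 hz), it yields a gap of just under nine semitones.

```python
import math


def semitone_interval(f_low_hz: float, f_high_hz: float) -> float:
    """Signed interval from f_low_hz to f_high_hz in equal-tempered semitones."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One octave (a doubling of frequency) spans 12 semitones.
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Approximate average speaking F0 values quoted in the text.
male_f0_hz = 120.0
female_f0_hz = 200.0
print(round(semitone_interval(male_f0_hz, female_f0_hz), 1))  # just under 9 semitones
```

the same function can be used to check any pitch range reported as a pair of minimum and maximum f0 values.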
below, in figure 1, we present an example of the pitch curve, or f0 curve, made from a sentence read by a female speaker: ‘to my chagrin (in finnish: ‘harmikseni’), i did not find the basket there any longer.’: in this example, the mean f0 of the low-pitched finnish female speaker is 165 hz, which corresponds to the e note (e3) on a musical scale. the highest peak (f0 maximum) at the beginning of the sentence is 308 hz (dis1 or dis4), and the lowest f0 at the end of the sentence is 122 hz (h or h2). thus, the pitch variation range in this sentence is approximately 16 semitones (a semitone, i.e. a half step, is the smallest interval in western tonal music). in general, peaks on an f0 curve are associated with sentence stress (emphasis). figure 1. an example of pitch curve, or the f0 curve, and analysis. x-axis: time (in seconds), y-axis: fundamental frequency (f0), pitch in hertz (hz, 1 hz is the inverse of the duration of one vocal fold vibration, thus telling how many vibrations fit in one second of time). the high peak that is clearly seen at the beginning results from sentence stress placed on the word ‘harmikseni’ (in english: ‘to my chagrin’). the main acoustic correlate of perceived loudness is the sound pressure level (spl), which is most often measured in decibels (db). voice quality, in turn, can be acoustically studied, for example, with spectrum analysis. there, the sound is divided into components (harmonics). these components are present simultaneously. how strong these components are in relation to each other affects how the voice sounds, that is, whether the voice’s timbre is bright or dark. in a bright voice, there are stronger components in the high-frequency range compared to a darker voice. the overall tilt of the spectrum tells how the voice is produced. the tilt is steeper in a soft or breathy voice (see figure 2). figure 2. 
an illustration of two examples of long-term average spectra (ltas) from two [a:] vowel samples from the same female speaker. the solid line describes ltas from a pressed voice, while the dotted line shows a breathy voice. the spectrum tilts more steeply in the sample with a breathy voice. this means that the harmonic energy declines faster as a function of frequency. the curves drawn on top of the spectrum (green for breathy and red for pressed) show this phenomenon. the higher the peaks are in the spectrum, the more energy there is in the expression, and the louder the voice sounds. 2.3.2. analyses in the present study in the present study, a fundamental frequency (f0) analysis was conducted to give physical correlates of pitch and intonation for sentences that were classified to represent the five teacher-talk patterns mentioned in table 1: cumulative, promotive, and disputational i-r-f patterns; teacher presentations; and group organising talk. the f0 analysis was performed using praat software (boersma & weenink, 2006, version 6.0.21). we studied the f0 curve both qualitatively and quantitatively. in the latter approach, we measured the mean, range and standard deviation (sd) of f0. the sd of f0 illustrates f0 variation in intonation more reliably than the f0 range, as the latter can be affected by unintentional voice quality-related matters (like the use of vocal fry with a very low f0) or random errors in the automatic f0 analysis. voice quality was illustrated through ltas and praat analysis. 3. results 3.1 prosodic challenges when teachers act in authentic classroom settings (rq1) our methodological approach offers possibilities to show prosodic challenges that emerged when teachers acted in authentic classroom setting. we found that when the teacher interacted with students in authentic (meaning often rather noisy) classroom conditions, she used a loud voice, which led to a heightened pitch. figure 3 illustrates this phenomenon in terms of an f0 curve. 
figure 3 shows a part of teacher presentation that is interrupted by a question from a student and followed by the teacher’s answer to the question.
teacher: speaking of next week’s exam that we thought about today…
student: so, when will it be next week? (in finnish: ‘eli millon se on ens’ viikolla?’)
teacher: on wednesday of next week.
here, the teacher’s mean f0 is 296 hz (d1) in the beginning, which represents her general way of speaking to the whole class. in the end part of the curve, the mean f0 is 220 hz (a), as she answers a student’s question, seemingly using her natural conversational volume. thus, at the beginning she raised her pitch by circa 5 semitones to speak to all the students. this may also restrict the habitual livelier use of pitch variation, as can be seen by comparing the beginning part of the f0 curve in figure 3 with the f0 curves seen in the other figures. in figure 4, we illustrate this exchange in terms of voice quality. the black line shows a spectrum of the part: ‘speaking of next week’s exam that we thought about today…’ in this part, the teacher speaks loudly, and her perceptual vocal quality is pressed (see e.g. kankare et al., 2012; waaramaa & kankare, 2013; waaramaa et al., 2014). the grey line shows the spectrum of the part: ‘on wednesday of next week.’ in this part, the teacher’s voice sounds more relaxed. thus, in pressed phonation, there is more sound energy in the higher frequency range than in the ordinary phonation type. the use of a pressed vocal quality places a greater biomechanical load on the vocal folds than the use of an ordinary, relaxed voice. additionally, this example illustrates that a teacher may unintentionally modify prosodic aspects, which may restrict the natural use of prosodic variation in conveying content or even provoke contradictory connotations. for instance, a teacher may involuntarily sound angry when trying to get his/her voice heard over background noise.
this, in turn, may influence how the teacher’s talk is interpreted in the classroom.
figure 3. an example of how the teacher raises her pitch by circa 5 semitones (left part of this figure) when raising her voice to speak to the whole group of students (compared to when she is speaking to a single student, right part of the figure).
figure 4. two long-term average spectra of teacher talk. y-axis: mean sound energy in db, x-axis: frequency in hz. the black line represents a pressed and loud voice (addressing the complete group of students), while the grey line represents a relaxed voice (talking to an individual student).
3.2 differences in a teacher’s use of prosody (rq2)
in table 2, we summarise the f0 characteristics in the five patterns of talk studied (see table 1 for a description of the patterns). although this small sample does not allow us to claim direct correspondences between the patterns of talk and intonation patterns, some typical features could be identified in this exploratory case study. in general, cumulative i-r-f patterns seemed to use a moderate mean pitch level and a moderate pitch variation (see table 2), with frequent word stresses that were realised using the same pitch pattern (see figure 5). promotive i-r-f patterns showed a low mean pitch, a wide pitch range and large emphatic sentence stresses (high peaks in the f0 curve; see figure 6). disputational i-r-f patterns seemed to use a moderate mean pitch level with large pitch variation and strong emphatic stresses (see figures 7 and 8). teacher presentation resulted in a high mean pitch level with small pitch variation (figure 9), while group organising was characterised by a relatively high mean pitch level with moderate pitch variation (figure 10). in the following sub-sections, our methodological approach is exemplified with empirical examples.
we demonstrate how language was manifested in various talk patterns and what kinds of prosodic features were typical for each type of talk pattern. for each type of talk listed in tables 1 and 2, we first introduce a representative example of a talk episode, followed by a representative figure (graph of a prosodic phase) illustrating how the teacher’s intonation is used and varies. within the selected representative sentences, bold words refer to stressed words in the sentence.
table 2. teacher’s f0 (pitch) variation and range in hz and in semitones.
3.2.1. teacher’s talk prosody in educational dialogues
cumulative i-r-f patterns were rare and emerged nine times (8.2%) with a total duration of 333 seconds (12.2%). they involved speakers in pleasant, uncritical exchanges that built towards a common understanding through accumulated repetition and confirmation. from a prosodic perspective, cumulative i-r-f patterns were associated with a relatively narrow pitch range and frequent word stresses that were realised using the same pitch pattern. typically, the intonation pattern repeated itself, without extreme deviations from the mean pitch level. the following excerpt 1 of a cumulative i-r-f pattern shows how the teacher built the dialogue cumulatively. first, she confirmed: ‘well, now, it’s here’, and she asked what material a jar of jam is made from. the student pekka responded. this was followed by a new question from the teacher and another response from pekka. subsequently, the teacher inquired what happens to materials when they are heated, and she accepted a trivial answer from joel, a student: ‘the material gets warmer’ (i.e. constructing positively). then, the teacher tried to remind the students of a video in which the same phenomenon was shown. in this way, the common knowledge was constructed further. the teacher repeated her student elvira’s answer that, when matter becomes warmer, it expands (confirmation and repetition).
figure 5 is selected from excerpt 1 to show a typical example of the flow of the cumulative i-r-f pattern. in this example, the mean f0 is 232 hz, and the sd is 46 hz, that is, seven semitones, and the total f0 range is 112–335 hz. furthermore, the graphic lines show that the intonation pattern repeats itself, without extreme deviations from the mean pitch level. between the vertical lines is a sentence: ‘expands (in finnish: ‘laajenee’), when matter becomes warmer (in finnish: ‘lämpenee’), it expands (in finnish: ‘laajenee’). now, when the jar is made of glass, the lid is made of metal. does it expand in the same way?’ excerpt 1: an example of cumulative i-r-f patterns teacher: well, now, it’s here. think about a jar; what material is a jar usually made of? a jar of jam. pekka: glass. teacher: and what material is the lid of the jar, typically? pekka: metal. teacher: think about glass and metal; when you put the jar under hot water, what happens to it? what happens to matter when it becomes warmer? when getting warmer, matter, what…? joel: it becomes warmer. teacher: becomes warmer, but at the same time …? quite in the beginning we made those, there was … you have the kind of a video there, with the hole and the bullet, and the bullet is heated. elvira: expands. teacher: expands, when matter becomes warmer, it expands. now, when the jar is made of glass, the lid is made of metal. does it expand in the same way? liisa: the lid gets larger. teacher: yes, the lid gets larger than the jar, so then we get it open. figure 5. an example of a cumulative i-r-f pattern. during promotive i-r-f patterns, the teacher engaged constructively with students’ ideas, trying to trigger productive collaboration (to trigger exploratory talk, see, mercer & wegerif, 1999). promotive i-r-f patterns were applied fairly actively (n=20, 18.2%; total duration of 11.4 minutes, 25%). 
the prosodic analysis illuminated that promotive i-r-f patterns were associated with a wider pitch range than cumulative i-r-f patterns and larger emphatic sentence stresses than the cumulative i-r-f patterns presented previously. the following excerpt 2 is a typical example of a discussion between the teacher and her students. the teacher asks her students to think of the three materials and infer which expands differently from the others. the teacher urges her students to consider a phenomenon and converse about it, and she offers an immediate verbal response to her students’ reactions. as we can see, the teacher engages students to actively encounter the phenomenon: ‘now, think of these three materials. can you infer which one of these expands differently from the others?’ when juuso’s response is correct, the teacher becomes excited and immediately gives him positive feedback: ‘correct! well done!’ the teacher continues to explore the issue and asks her students to make a hypothesis regarding what happens to metals when they are heated. in this case, the teacher tries to engage her students to consider and justify their answers (aiming to promote students’ exploratory talk). she also offers alternative hypotheses: ‘well, what happens to metals when they exp … become warmer? do they shrink, stretch or not change?’ after ilkka replies correctly, the teacher again provides positive feedback. figure 6 highlights the discussion between the teacher and a student. for figure 6, the mean f0 is 199 hz, and sd 55hz, i.e. 9.8 semitones, and the total range is 98–310 hz. we can see that promotive i-r-f patterns showed a wider pitch range and larger emphatic sentence stresses than the cumulative i-r-f patterns presented previously. furthermore, an emotional state of excitement involves a high arousal level that is seen in high f0 peaks in the intonation curve: ‘correct! well…’ (in finnish: ‘aivan! 
hyvin…’). the two high peaks in the pitch curve at the end of the sample reflect stressed words, expressing the teacher’s excitement when getting a correct answer from her student: ‘correct! well done!’ (in finnish: ‘aivan! hyvin pää(telty)!’). the last syllables in the parentheses are expressed in a whisper. the excerpt also shows how the intonation curve drops in a question in finnish (the first half of the picture).
excerpt 2: an example of promotive i-r-f patterns
teacher: if you think of these three materials, could you infer which one of those expands differently from the others?
juuso: well, water.
teacher: correct! well concluded!
juuso: do i win a prize?
teacher: nope. well, what happens to metals when they exp … become warmer? do they shrink, stretch or not change?
ilkka: they stretch.
teacher: good! the first page is completed.
figure 6. an example of a promotive i-r-f pattern.
finally, disputational i-r-f patterns were characterised by disagreements, by the teacher disagreeing with and taking a critical approach to student responses, and by short, often confrontational, interchanges from the students. they emerged 13 times (11.8%) with a total duration of 346 seconds (12.7%). figures 7 and 8 below illustrate that during disputational i-r-f patterns, the pitch variation seemed to be the greatest and the sentence stresses the strongest. in the following excerpt 3, we can see how the teacher gets frustrated when trying to motivate the students to think about the insulation properties of a thermos bottle. the teacher asks many times if the students could explain why the inner surface of the thermos bottle is glossy, without giving them sufficient resources. there is also evidence that the students are not listening actively to the teacher, as markus asks regarding the glossy surface: ‘on the inside, you mean?’ even though the teacher has consistently been talking about the inner surface of the thermos bottle.
finally, when markus responds to the question, the teacher disagrees with him: ‘that isn’t enough, markus, that it keeps the drink warm.’ in practice, the teacher demands more information from markus. of the interactive patterns, pitch variation seems to be largest and sentence stresses strongest for disputational i-r-f -patterns. for figure 7, the mean f0 is 230 hz, and sd 66 hz, i.e. 10 semitones, and the total range is 84–346 hz. in figure 7, we can also notice that the sentence stress is on the word ‘enough’ (in finnish: ‘riitä’), which is indicated by the highest peak in the intonation curve: ‘that isn’t enough (in finnish: ‘toi ei riitä...’), markus, that it keeps the drink warm.’ when comparing figure 7 to figure 6, high f0 peaks can be seen in both figures reflecting a high arousal level in the teacher’s talk. however, in figure 6, a positive emotion of excitement was expressed and in figure 7 a negative emotion, perhaps frustration considering the content of the teacher’s talk. thus, the emotional valence (positive, negative or neutral) cannot be directly concluded from the acoustic cues of the arousal level e.g. related to intonation (bänziger & scherer 2005). figure 8 below illuminates how this disputational i-r-f pattern continues with similar significant prosody variation. here, the teacher is still not happy with the students’ study process, and she is still demanding more from the students. in the episode represented in figure 8, the teacher’s mean f0 was 210 hz, the sd was 46 hz, i.e. ca 8 semitones, and the range was 122–317 hz. 
thus, in the continuation of disputational i-r-f pattern the pitch variation continues to be great and sentence stresses strong: ‘it is kind of a fact (in finnish: ‘fakta’) why a vacuum bottle is used (in finnish: ‘käytetään’), but you should now explain why the glossy (in finnish: ‘kiiltävä’) interior is helpful there.’ the very low f0 values in the total range reflect the use of vocal fry phonation, especially in sentence endings. excerpt 3: an example of disputational i-r-f -patterns jan: a thermos bottle is a bottle that is heat insulated from its environment as much as possible. teacher: yes. does it explain why the inner surface is made glossy? it is one of the many ways by which it is made a good insulation, the structure of the whole bottle, but… (off-topic discussions between the students) teacher: yes, but why does the glossy surface keep it warm unlike a red surface, for example? jan: i don’t know. teacher: and you cannot find any explanation, can you, if you go and study the material there? [miscellaneous noise] teacher: hey – now! you were supposed to think now of the glossy inner surface of a thermos bottle. what might be [i know] the reason? well? markus: on the inside, you mean? teacher: uhum, on the inside, yes. have you ever looked into a thermos bottle? [miscellaneous noise, the teacher’s comments to one student.] think; search for information. it can be found there in the heat transfer mechanisms section. teacher: you can’t search for anything, can you? markus: me? yes, watch out; it’s coming soon…keeps warm…. teacher: that isn’t enough, markus, that it keeps the drink warm. it’s like a fact why people use a thermos bottle, but you should now explain why the glossy surface helps there. so, because what...? markus: i don’t know. teacher: well, it cannot be necessarily found from wikipedia now. [miscellaneous noise] teacher: it is kind of a fact why a vacuum bottle is used, but you should now explain why the glossy interior is helpful there. 
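the text does not state how a standard deviation in hertz is converted to semitones. the reported values are, however, closely reproduced by taking the interval between the mean minus one sd and the mean plus one sd. the following python sketch checks that assumed convention against the values quoted for figures 5 to 10; the function name `sd_span_semitones` and the convention itself are our inference from the reported numbers, not a documented step of the analysis.

```python
import math


def sd_span_semitones(mean_hz: float, sd_hz: float) -> float:
    """Interval in semitones between (mean - sd) and (mean + sd).

    Assumed convention, inferred from the reported numbers.
    """
    if not 0 <= sd_hz < mean_hz:
        raise ValueError("sd must be non-negative and smaller than the mean")
    return 12.0 * math.log2((mean_hz + sd_hz) / (mean_hz - sd_hz))


# (figure, mean F0 in Hz, SD in Hz, semitone value reported in the text)
REPORTED = [
    ("figure 5", 232, 46, 7.0),
    ("figure 6", 199, 55, 9.8),
    ("figure 7", 230, 66, 10.0),
    ("figure 8", 210, 46, 8.0),
    ("figure 9", 290, 37, 4.4),
    ("figure 10", 248, 55, 8.0),
]

for label, mean_hz, sd_hz, reported_st in REPORTED:
    computed_st = sd_span_semitones(mean_hz, sd_hz)
    print(f"{label}: computed {computed_st:.1f} st, reported {reported_st} st")
```

under this assumption, every computed value lies within half a semitone of the reported one, which is consistent with rounding.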
figure 7. an example of a disputational i-r-f pattern.
figure 8. another example of a disputational i-r-f pattern, a continuation from figure 7.
3.2.2. teacher’s talk prosody in teacher monologues
the teacher’s talk initiated by a student’s question or her talk at the beginning of different sections of the lesson was labelled as teacher presentation. typical examples of teacher presentation arose when the teacher gave general instructions so that the students could start working on their tasks, when she interrupted the students’ work so that they could begin reviewing the correct answers to the problems, or when she gave a short lecture about the theory after students faced challenges while solving a problem. in general, it can be said that when there was a need for straightforward instruction (mercer, 1995), the teacher’s talk had characteristics of teacher presentation. overall, teacher presentation was a common type of talk, comprising one quarter of the total duration of different talk patterns (n=27, 24.5%; total duration 11.5 min, 25.3%). as an example, in the following excerpt 4 the teacher does not provide physics-related information, but rather general instruction to the whole group about the lesson plan of the day. she also gives general feedback to the students about combining information from different sources when taking their previous exam. there are no attempts (e.g. questions) to invite students to take part in this discussion, so the conversation can be referred to as teacher monologue. figure 9 displays an example of teacher presentation talk: ‘there were some minor difficulties in the answers (in finnish: ‘pieniä ongelmia vastauksessa’) to the exam last week (in finnish: ‘viime viikon’). you were not able to combine (in finnish: ‘osannu yhdistellä’) pieces of information …’ the mean pitch of the teacher’s talk is relatively high (290 hz), and there are no wide changes in f0 during intonation.
the range of f0 variation comprises frequencies from 193 to 374 hz, and the sd is 37 hz, i.e. about 4.4 semitones. excerpt 4: an example of teacher presentation teacher: there were some minor difficulties in the answers to the exam last week. you were not able to combine pieces of information (or) find it, so i think that we will practice a little for the exam. there are similar types of questions to those that will be on the exam, so let’s take a look (at these). first, go through (the problems) yourself or with a pair or a group, and think about how you would answer. together, try to find what kind of answer would be good when you have to combine pieces of information from many resources now. figure 9. an example of teacher presentation talk. in addition to teacher presentation, there were 31 units of analysis (28.2 %) belonging to group organising, but the total duration (9.0 min, 19.8 %) of those units was not that high, mainly because excerpts were usually short remarks and comments from the teacher to the students relating to their behaviour and studying methods. group organising, like teacher presentation, was characterised by teacher monologue. in general, there was a need for group organising at regular intervals when students solved the problems themselves (6.5 min, 25.4 %, cf. teacher presentation: 3.9 min, 15.4 %). when the section of the lesson was more teacher-led by nature (the students started to go through the correct answers with the teacher), the teacher had to guide the students less frequently to concentrate on the teaching (2.5 min, 12.5 %, cf. teacher presentation: 7.6 min, 38.0 %). in the following excerpt 5, the teacher explains to the students that the problems they are going to solve are a rehearsal for the exam. 
when a student points out not having received a problem sheet, the teacher says, with a twinkle in her eyes, that some of the students may have taken more than one (same) problem sheet, as there were not enough papers in the stack. even though there are utterances both from the teacher and the student, there is no typical triadic dialogue visible, and the conversation can be referred to as teacher monologue without true collaboration between the participants. the main motive of the teacher was to get the problem sheets to everyone so that the students could start revising physics-related issues for the exam. the situation below illustrates the use of prosody during group organising. in figure 10, we can see how the teacher says that the exercise ‘is a rehearsal (in finnish: ‘harjoittelua’) for the exam’, and a student responds that ‘i didn’t get one’ (in finnish: ‘mä en saanu.’). then, the teacher answers that ‘there should be (more) in the stack (in finnish: ‘siin pinos’), so perhaps somebody took more than one. see, the most hard-working (in finnish: ‘ahkerimmat’) students do two (papers).’ as we can see from figure 10, the teacher’s mean f0 was 248 hz; the absolute minimum was 56 hz, representing vocal fry (creaky sound); the lowest f0 without vocal fry was 175 hz; and the maximum was 419 hz. thus, the total frequency range excluding vocal fry was 15 semitones, i.e. 1¼ octaves. the sd of f0, which reflects the intonation range more reliably, was 55 hz, i.e. approximately 8 semitones.
excerpt 5: an example of group organising
teacher: this is a rehearsal for the exam.
jenna: i didn’t get (one).
teacher: there should be (more) in the stack, so perhaps somebody took more than one. see, the most hard-working students do two (papers).
figure 10. an example of group organising.
4. discussion
the present study is a first step towards developing a new method of analysing classroom interaction. we investigate the potential of prosodic analyses of teacher talk.
our argument is that the analysis of prosodic features has been underrepresented when classroom talk is analysed. we claim that in addition to analysing the content of talk (focusing on what is said), analysing the prosodic features of talk (focusing on how something is said, thus considering elements such as intonation, volume, and pace) is also important. in some cases, it might be as important as, or even more important than, what is actually said. for example, a teacher asking a student, ‘what do you think about this?’ may be simply inquiring about a student’s opinion. however, the same question, in exactly the same wording, can be conveyed in such a way (by changing intonation and stressing other words) that a student really feels involved in the discussion process and appreciates being invited to share his/her ideas. at the same time, exactly the same words can be pronounced in such a way that the student feels threatened and reprimanded for not listening attentively. in this article, the methodological development is grounded in the notion that voices can be studied acoustically. we focused on a teacher’s talk in an authentic classroom, and two research questions were addressed in relation to the general aim of investigating the potential of prosodic analysis. knowing that a classroom is far from a laboratory setting, we investigated how the prosodic features of teacher talk were influenced by the contextual factors of the authentic classroom (i.e. an often quite noisy environment) (rq1). with our methodological approach, we were able to illustrate some specific prosodic challenges. our findings showed that when the teacher acted in the authentic classroom setting, she often used her voice in a different way.
the results show that when addressing the whole class, she raised her voice, resulting in a more pressed voice (indicated by a higher pitch) than on other occasions, such as talking to a student one-to-one or guiding a small group of students. in the latter situations, the voice was more relaxed and thus closer to her natural voice. we argue that how teachers use their voice may have an influence on teachers’ health, teacher-student interaction, and classroom climate. firstly, the risk of vocal fatigue increases when using a pressed voice (e.g. kankare et al., 2012). secondly, a pressed voice quality is related to the expression of anger (e.g. laukkanen et al., 1997; waaramaa et al., 2010; 2014). therefore, speaking in a large and noisy classroom may lead to involuntary and misleading prosodic characteristics. these characteristics can be interpreted as shouting in anger, which may negatively affect teacher-student interaction and classroom climate (see section 3.1). this may be disconcerting, as high-quality teacher–student interaction and a supportive classroom climate are possible protective factors against the negative impacts of learning (kiuru et al., 2012). the second research question focused on how the teacher’s use of prosody varies between different kinds of talk patterns (rq2). therefore, we first identified talk episodes, such as (1) cumulative, (2) promotive, and (3) disputational i-r-f patterns in the educational dialogues, and (4) teacher presentation and (5) group organising in the teacher monologues. next, we examined how the prosody related to these patterns of talk was characterised and whether differences could be identified. we found that cumulative i-r-f patterns seemed to use less pitch variation and that word stress patterns were often repeated. in contrast, a wide pitch range and clear emphatic sentence stresses with large f0 jumps characterised disputational as well as promotive i-r-f patterns.
thus, the intonation pattern in cumulative i-r-f patterns seems to reflect continuation, while the strong emphatic stresses with relatively wide pitch intervals mark a contrast, e.g. between the student’s answer and the teacher’s instruction. in promotive i-r-f patterns, some strong accents with high pitch peaks were used to acknowledge correct answers and to give support. regarding teacher monologues, teacher presentation used a high mean pitch, a narrower pitch variation and a more pressed voice quality, while group organising was characterised by a relatively high mean pitch level with moderate pitch variation. in sum, by combining the prosodic and content characteristics of the teacher’s talk, we were able to identify initial variations in how the teacher used her voice in diverse educational dialogues and teacher monologues.
4.1 limitations and critical issues
the strength of this study is that, along with studying the content of the talk, it pays attention to the potential offered by the prosodic perspective on teacher’s talk, which has rarely been explored to date. however, there are several limitations and critical issues to consider, as this study was an initial attempt to illustrate how the teacher’s intonation varies depending on the situation. first, this study is exploratory in nature, and although we were able to show that different prosodic characteristics are somehow related to patterns of talk that are distinctive content-wise, additional explorative and hypothesis-testing research is needed to analyse this relationship more specifically. second, as this case study, like case studies in general, is based on a small sample, all limitations thereof should be duly considered. moreover, there are three more limitations to our study that are related to the use of prosodic analysis in general in this type of research setting.
the third limitation concerns the use of acoustic speech methodology in an authentic classroom setting and comprises three problematic aspects: (1) although the technology is available, it is not necessarily easy to obtain the hardware needed (especially on a larger scale) and to use this hardware to capture voices in classrooms without compromising the authenticity of the setting; (2) teachers may tend to use their voice in different ways depending on the specific conditions within the classroom; and (3) authentic classroom conditions may hamper the quality of audio recordings and thus limit the usability of the method. the fourth and fifth limitations also pertain to acoustic speech research in general, and both may make it more difficult to establish a catalogue of normative data on classroom interactions. the fourth limitation is that the interpretation of how voice is used might be culturally bound (waaramaa, 2014; waaramaa & leisiö, 2013), and this aspect was not considered in this study. on a general level, this limitation might also make it more difficult to compare findings from different classrooms around the world. the fifth limitation is that language specificity might form another barrier to the comparability of research findings. specifically, when studying collaboration, language is always the central aspect under investigation, and with regard to prosodic analysis, the prosodic features of different languages may have specific characteristics (see method section). therefore, we briefly discuss the specifics of the intonation patterns and features of the finnish language (compared to other languages) in the remainder of this paragraph. in finnish, sentence stress and intonation do not serve linguistic purposes to the same extent as, e.g., in swedish or english, as finnish takes advantage of enclitics.
the intonation curve is typically declining in statements, and a relatively smooth and high intonation pattern is used to express continuation (aaltonen & wiik, 1979). a pitch rise in sentence endings has been regarded as atypical for the finnish language, even though lately it has become a characteristic of teenagers’ talk (routarinne, 2003a, 2003b; härkönen, 2016). in general, finnish talk has been described as characterised by soft phonation, a low mean pitch, small intervals in intonation and a relatively ‘tame’ expression of emotions (hakulinen, 1979). on the other hand, despite the differences between languages, prosody has been analysed in other fields, e.g. in therapist–patient dialogues (e.g. leszcz, 2017). from this research, we know that differences between languages can be considered and dealt with. in the present study, we investigated the intonation pattern of a teacher’s talk, and our findings can be considered to be in line with earlier results, e.g. those reported by o’connor and arnold (1973) for the english language. however, even though we can take differences between languages into account, it could be interesting to explore the value of prosodic analysis for analysing teacher talk in different languages.

4.2 directions for future research

in this section we put forward several opportunities that this new method may bring, and we relate these to some elements to be further developed and explored in future research. first of all, we see the potential for developing our methodological approach towards (semi-)automatic analysis of audio and video data. in a first phase, a possible methodological application could be to identify interesting discussion phases based on the prosodic features, which can then be further analysed and interpreted by educational researchers.
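as a rough illustration of what such a first-phase, semi-automatic pre-selection could look like, the following python sketch flags talk episodes whose pitch range is wide. it is a minimal sketch under stated assumptions and not the procedure used in this study: the function names, the episode labels, the f0 values and the 8-semitone threshold are all hypothetical, and the f0 contours are assumed to have been extracted beforehand (e.g. with a pitch tracker), with 0 marking unvoiced frames.

```python
import math


def pitch_range_semitones(f0_hz):
    """Return the span between the lowest and highest voiced f0 value, in semitones."""
    # Assumption: a value of 0 marks an unvoiced frame and is ignored.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        return 0.0
    # One octave (a doubling of frequency) corresponds to 12 semitones.
    return 12.0 * math.log2(max(voiced) / min(voiced))


def flag_wide_range_episodes(episodes, threshold_st=8.0):
    """Return the labels of episodes whose pitch range exceeds the threshold."""
    # The threshold is purely illustrative; a real value would have to be calibrated.
    return [label for label, f0 in episodes if pitch_range_semitones(f0) > threshold_st]


# Hypothetical episodes: (label, f0 contour in Hz).
episodes = [
    ("episode_a", [200.0, 210.0, 205.0, 0.0, 215.0]),
    ("episode_b", [180.0, 260.0, 0.0, 340.0, 190.0]),
]
flagged = flag_wide_range_episodes(episodes)
print(flagged)
```

the flagged episodes would still need to be inspected and interpreted by researchers; the sketch only narrows down where to look.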
based on the results of this explorative study, we are optimistic that using prosodic analysis in such a semi-automatic way is a likely future application. even if the identified phases still need to be interpreted by researchers, this is a promising avenue: researchers often have an enormous amount of data, and being able to use prosodic analyses to pre-process and reduce this amount of data for manual coding would be useful. in a second phase, future research could focus on investigating whether it is possible to move to fully automatic analyses of talk, based on prosodic analyses. a second opportunity and direction for further research is to broaden the scope to also analyse students’ talk. while our study focused on teacher’s talk, we suggest that the next methodological step should be taken by combining prosodic and content analysis to study students’ talk, and more specifically the student–student dialogues that take place as a part of collaborative learning. in this respect, prosodic analyses could be applied to identify different types of talk or collaboration on the one hand, while on the other hand they could be used to capture students’ (and also teachers’) emotions. earlier research in the field of therapist–patient dialogues has shown that, in addition to verbal, non-verbal, para-verbal, implicit and explicit communication, prosodic analyses are useful in capturing and predicting emotions (leszcz, 2017). the role of emotions has often been underestimated and could be of great importance (isohätälä et al., 2017). what is particularly important is the consideration of how, when and why students’ emotions arise and how they shape interaction (student-student/teacher-student) and affect students’ dedication towards collaboration and learning. this may be associated with what is said in the classroom context and how it is said (e.g. our results about the teacher involuntarily sounding angry).
recently, positive activating emotions have been shown to be related to good academic success (postareff et al., 2017), so being able to capture and analyse emotions through prosodic analyses – and in a next phase to do this ad hoc, on the fly, and to provide teachers with this information through a learning analytics powered dashboard – could be a very interesting and valuable future application. related to this, a third opportunity that we put forward is the development of new digital tools to support teachers (see also harteis, 2018). automatic prosodic analyses could be an interesting feature to inform teachers about students’ collaborative discussions, e.g. by signalling group processes to the teachers. if students’ voices could be interpreted on the fly, data from these analyses could be used to create process indicators in an automatic way. by adding information from automated prosodic analysis, existing tools could be extended. as an example, we can think of how a lantern device (dillenbourg et al., 2011) could be fed with prosodic data. the lantern device of dillenbourg and colleagues (2011) is a small device with leds that is controlled by students. it allows them to indicate which exercise or phase of a collaborative activity they are working on (i.e. by changing the colour of the lamp) and, if they have questions for the instructor, to signal this to the instructor (i.e. by making the lantern blink). the blinking rate furthermore increases over time, allowing the instructors to see how long students have been waiting for them (for more details, we refer to dillenbourg et al., 2011). the goal of the device was to provide the instructors with some awareness of the teams’ behaviour.
in their implementation, students controlled the tool themselves, but in future extensions, based on automated on-the-fly analyses of the prosodic features of students’ collaborative discussions, the tool could provide additional useful information about collaboration processes for instructors. finally, a fourth opportunity is related to teacher training and teachers’ professional development. being able to capture, interpret and understand students’ emotions on the fly while engaged in technology-enhanced collaborative learning may be helpful for teachers’ professional development. this is also related to the question of how emotional valence (whether positive, neutral, or negative) can be derived from the teacher’s talk. typically, vocal emotions are studied first for their arousal level and second for their valence. in the present investigation, we concentrated on arousal level, displayed by intonation curves. in our future research, we will scrutinise the teacher’s vocal expression of valence: how the teacher uses his/her voice to convey emotions related to the content of the talk, e.g. when encouraging the students or when expressing contentment or disappointment, and how the valence expressed is associated with the teacher’s talk. in this respect, research needs to focus on triangulating data sources. so far, there is research available focusing on physiological measures of emotions (e.g. with smart rings, see http://www.moodmetric.com) or heart rate variability measures (see e.g. https://www.firstbeat.com/en/) and self-report measures of emotions (see oksanen & hämäläinen, 2010; castellar et al., 2014). we argue that prosodic analyses could be added in combination with these methods, as another source to triangulate from.
to conclude, there is a current trend of exploring more advanced methods to capture social, cognitive, and emotional features of classroom talk, as these novel approaches are needed to meet the analytical challenges of making sense of the processes of learning and instruction (damsa & ludvigsen, 2016). the present exploratory study can in this view be seen as one contribution. we showed that acting in an authentic classroom setting might trigger specific prosodic aspects in the teacher's talk. additionally, we were able to identify differences in how the teacher used her voice and relate those to diverse educational talk patterns. we believe that prosodic analyses may be one novel approach that allows us to understand learning and instruction processes better.

keypoints
− multiple methods and techniques are required to understand what happens in classrooms.
− prosodic aspects of talk (features of speech such as intonation, volume, and pace) are under-represented in the field of the learning sciences.
− we introduce prosodic analyses as a method to study teacher talk in the classroom.
− we showed that the teacher’s prosody varied depending on different patterns of talk that were identified based on the content.
− this article shows that prosodic analyses may have an added value for research on learning and professional development.

acknowledgments
this work was supported by the academy of finland under grant numbers 292466 and 318095 [the multidisciplinary research on learning and teaching profile of jyu] and by the emil aaltonen foundation and the finnish cultural foundation.

references
aaltonen, o., & wiik, k. (1979). suomen jatkuvuuden intonaatiosta. in p. hurme (ed.), jyväskylän yliopiston suomen kielen ja viestinnän laitoksen julkaisuja, 18 1. fonetiikan päivät (the first finnish phonetics symposium) (pp. 23-33). alexander, r. j. (2001). culture and pedagogy: international comparisons in primary education (pp. 391-528). oxford: blackwell. addington, d. w. (1968).
the relationship of selected vocal characteristics to personality perception. speech monographs, 35(4), 492-503. berry, m. (1981). systemic linguistics and discourse analysis: a multi-layered approach to exchange structure. studies in discourse analysis, 1, 20-145. bänziger, t., & scherer, k. r. (2005). the role of intonation in emotional expressions. speech communication, 46(3), 252-267. boersma, p., & weenink, d. (2006). praat: doing phonetics by computer. brazil, d. c. (1978). discourse intonation ii, discourse analysis monographs ii (1st ed.). birmingham: university of birmingham, english language research. castellar, e. n., oksanen, k., & van looy, j. (2014). assessing game experience: heart rate variability, in-game behavior and self-report measures. in 2014 sixth international workshop on quality of multimedia experience (qomex) (pp. 292-296). de wever, b., schellens, t., valcke, m., & van keer, h. (2006). content analysis schemes to analyze transcripts of online asynchronous discussion groups: a review. computers & education, 46(1), 6-28. derry, s. j., pea, r. d., barron, b., engle, r. a., erickson, f., goldman, r., . . . sherin, b. l. (2010). conducting video research in the learning sciences: guidance on selection, analysis, technology, and ethics. journal of the learning sciences, 19(1), 3-53. hakulinen, l. (1979). suomen kielen rakenne ja kehitys (4th ed.). helsinki, finland: otava. halliday, m. a. k., & hasan, r. (1976). cohesion in english. london: longman. harteis, c. (2018). machines, change, work: an educational view on the digitalization of work. in c. harteis (ed.), the impact of digitalization in the workplace – an educational view (pp. 1-10). dordrecht: springer. hämäläinen, r., & de wever, b. (2013). vocational education approach: new tel settings – new prospects for teachers’ instructional activities? international journal of computer-supported collaborative learning, 8(3), 271-291. härkönen, r. (2016).
tilanteen vaikutus 14-vuotiaiden puheen akustisiin ja perkeptuaalisiin piirteisiin (acoustical and perceptual analysis of the situational effect on 14 year-olds' speech). hellermann, j. (2003). the interactive work of prosody in the irf exchange: teacher repetition in feedback moves. language in society, 32(1), 79-104. imhof, m., välikoski, t., laukkanen, a., & orlob, k. (2014). cognition and interpersonal communication: the effect of voice quality on information processing and person perception. studies in communication sciences, 14(1), 37-44. isohätälä, j., järvenoja, h., & järvelä, s. (2017). socially shared regulation of learning and participation in social interaction in collaborative learning. international journal of educational research, 81, 11-24. kankare, e., laukkanen, a., ilomäki, i., miettinen, a., & pylkkänen, t. (2012). electroglottographic contact quotient in different phonation types using different amplitude threshold levels. logopedics phoniatrics vocology, 37(3), 127-132. kiuru, n., poikkeus, a. m., lerkkanen, m. k., pakarinen, e., siekkinen, m., ahonen, t., & nurmi, j. e. (2012). teacher-perceived supportive classroom climate protects against detrimental impact of reading disability risk on peer rejection. learning and instruction, 22(5), 331-339. kumpulainen, k., & lipponen, l. (2010). productive interaction as agentic participation in dialogic enquiry. in k. littleton & c. howe (eds.), educational dialogues: understanding and promoting productive interaction (pp. 48-63). london: routledge. laukkanen, a., vilkman, e., alku, p., & oksanen, h. (1997). on the perception of emotions in speech: the role of voice quality. logopedics phoniatrics vocology, 36(3), 465-475. laver, j. (1991). voice quality and indexical information. in j. laver (ed.), the gift of speech. papers in the analysis of speech and voice (pp. 147-161). edinburgh: edinburgh university press. lehiste, i. (1970). suprasegmentals. cambridge, massachusetts: mit press. lemke, j. l.
(1990). talking science: language, learning, and values. new jersey, usa: ablex publishing corporation. leszcz, m. (2017). how understanding attachment enhances group therapist effectiveness. international journal of group psychotherapy, 67(2), 280-287. linell, p. (1998). approaching dialogue: talk, interaction and contexts in dialogical perspectives. amsterdam; philadelphia, pa: j. benjamins pub. co. littleton, k., & mercer, n. (2013). interthinking: putting talk to work. london: routledge. lukkarila, p., laukkanen, a., & palo, p. (2012). influence of the intentional voice quality on the impression of female speaker. logopedics phoniatrics vocology, 37(4), 158-166. lyberg-åhlander, v., haake, m., brännström, j., schötz, s., & sahlén, b. (2015). does the speaker's voice quality influence children's performance on a language comprehension test? international journal of speech-language pathology, 17(1), 63-73. lyons, j. (1977). semantics. cambridge, uk: cambridge university press. mehan, h. (1978). structuring school structure. harvard educational review, 48(1), 32-64. mercer, n. (1995). the guided construction of knowledge: talk amongst teachers and learners. clevedon, avon, england: multilingual matters. mercer, n., & dawes, l. (2014). the study of talk between teachers and students, from the 1970s until the 2010s. oxford review of education, 40(4), 430-445. mercer, n., dawes, l., & staarman, j. k. (2009). dialogic teaching in the primary science classroom. language and education, 23(4), 353-369. mercer, n., wegerif, r., & dawes, l. (1999). children's talk and the development of reasoning in the classroom. british educational research journal, 25(1), 95-111. mortimer, e., & scott, p. (2003). meaning making in secondary science classrooms (1st ed.). berkshire, england: open university press. muhonen, h., rasku-puttonen, h., pakarinen, e., poikkeus, a., & lerkkanen, m. (2017). knowledge-building patterns in educational dialogue.
international journal of educational research, 81, 25-37. nassaji, h., & wells, g. (2000). what's the use of 'triadic dialogue'?: an investigation of teacher-student interaction. applied linguistics, 21(3), 376-406. niemi, k. (2016). moral beings and becomings: children's moral practices in classroom peer interaction. o’connor, c., & michaels, s. (2007). when is dialogue ‘dialogic’? human development, 50(5), 275-285. o'connor, j. d., & arnold, g. f. (1973). intonation in colloquial english (2nd ed.). london: longman. oksanen, k., & hämäläinen, r. (2010). using psychophysiological methods in the research of collaborative learning games. in international symposium on collaborative learning and argumentation (icla 2010). pépiot, e. (2013). voice, speech and gender: male-female acoustic differences and cross-language variation in english and french speakers. xvèmes rencontres jeunes chercheurs de l’ed 268, hal id: halshs-00764811. postareff, l., mattsson, m., lindblom-ylänne, s., & hailikari, t. (2017). the complex relationship between emotions, approaches to learning, study success and study progress during the transition to university. higher education, 73(3), 441-457. rogerson, j., & dodd, b. (2005). is there an effect of dysphonic teachers' voices on children's processing of spoken language? journal of voice, 19(1), 47-60. routarinne, s. (2003a). parenteesit ja nouseva sävelkulku keskustelun kielioppiin. virittäjä, 107(3), 398. routarinne, s. (2003b). tytöt äänessä. parenteesit ja nouseva sävelkulku kertojan vuorovaikutuskeinoina. helsinki, finland: suomalaisen kirjallisuuden seura. salloum, s., & boujaoude, s. (2017). the use of triadic dialogue in the science classroom: a teacher negotiating conceptual learning with teaching to the test. research in science education. scherer, k. r. (1972). judging personality from voice: a cross-cultural approach to an old issue in interpersonal perception. journal of personality, 40(2), 191-210.
scherer, k. r., & giles, h. (1979). social markers in speech. cambridge, uk: cambridge university press. sinclair, j. m., & coulthard, r. m. (1975). towards an analysis of discourse: the english used by teachers and pupils (1st ed.). london: oxford university press. vilkman, e., & manninen, o. (1986). changes in prosodic features of speech due to environmental factors. speech communication, 5(3-4), 331-345. vygotsky, l. s. (1987). thinking and speech. in r. w. rieber & a. s. carton (eds.) (1st ed.). new york: plenum. waaramaa, t. (2014). perception of emotional nonsense sentences in china, egypt, estonia, finland, russia, sweden, and the usa. logopedics phoniatrics vocology, 40(3), 129-135. waaramaa, t., alku, p., & laukkanen, a. (2006). the role of f3 in the vocal expression of emotions. logopedics phoniatrics vocology, 31(4), 153-156. waaramaa, t., laukkanen, a., airas, m., & alku, p. (2010). perception of emotional valences and activity levels from vowel segments of continuous speech. journal of voice, 24(1), 30-38. waaramaa, t., palo, p., & kankare, e. (2014). emotions in freely varying and mono-pitched vowels, acoustic and egg analyses. logopedics phoniatrics vocology, 40(4), 156-170. waaramaa, t., & kankare, e. (2013). acoustic and egg analyses of emotional utterances. logopedics phoniatrics vocology, 38(1), 11-18. waaramaa, t., & leisiö, t. (2013). perception of emotionally loaded vocal expressions and its connection to responses to music. a cross-cultural investigation: estonia, finland, sweden, russia, and the usa. frontiers in psychology, 4, 344. warwick, p., vrikki, m., vermunt, j. d., mercer, n., & van halem, n. (2016). connecting observations of student and teacher learning: an examination of dialogic processes in lesson study discussions in mathematics. zdm, 48(4), 555-569. zald, d. h. (2003). the human amygdala and the emotional evaluation of sensory stimuli (review). brain research reviews, 41, 88-123. zellner keller, b. (2004).
prosodic styles and personality styles: are the two interrelated? in proceedings of the 2nd international conference on speech prosody – sp2004 (pp. 383-386).

appendix
db: http://science.howstuffworks.com/question124.htm. dec 12th 2016.
frequency: https://www.merriamwebster.com/dictionary/frequency#medicaldictionary. dec 12th 2016.
fundamental frequency (f0 and pitch): https://www.researchgate.net/post/what_is_pitch_or_pitch_frequency_of_a_speech_signal. dec 12th 2016.
hertz (hz): https://www.merriam-webster.com/dictionary/hertz#medicaldictionary. dec 12th 2016.
loudness: http://hyperphysics.phy-astr.gsu.edu/hbase/sound/loud.html. dec 12th 2016.
pitch: https://www.merriam-webster.com/dictionary/pitch#medicaldictionary. dec 12th 2016.
prosody: https://www.merriam-webster.com/dictionary/prosody. dec 12th 2016; http://grammar.about.com/od/pq/g/prosodyterm.htm. dec 14th 2016.
semi-tone: https://www.merriam-webster.com/dictionary/semi-tone. dec 12th 2016.

frontline learning research 3 (2014) 50-63 issn 2295-3159
corresponding author: markus gebhardt, school of education, tu münchen, arcisstraße 21, 80333 münchen, germany, markus.gebhardt@tum.de http://dx.doi.org/10.14786/flr.v2i1.73

basic arithmetical skills of students with learning disabilities in the secondary special schools: an exploratory study covering fifth to ninth grade

markus gebhardt (a), fabian zehner (a), marco g. p. hessels (b)
(a) tu münchen, germany; (b) university of geneva, switzerland

article received 27th november 2013 / revised 17th march 2014 / accepted 17th march 2014 / available online 25th april 2014

abstract
the mission of german special schools is to enhance the education of students with special educational needs in the area of learning (sen-l).
however, recent studies indicate that students with sen-l from special schools show difficulties in basic arithmetical operations, and the development of basic mathematical skills during secondary special school is not guaranteed. this study presents a newly developed test of basic arithmetical skills, based on already established tests. the test examines the arithmetical skills of students with sen-l from fifth to ninth grade. the sample consisted of 110 students from three special schools in munich. testing took place in january and june 2013. the test proves to be an effective tool that reliably and precisely assesses students’ performance across different grades. the test items can be used without creating floor and ceiling effects among fifth to ninth grade students with sen-l. the items’ conformity to the dichotomous rasch model is demonstrated. the students’ skills turn out to be very heterogeneous, both overall and within grades. many of the students do not even master basic arithmetical skills that are taught in primary school, although achievement improves in higher grades.

keywords: arithmetical skills; curriculum based measurement; special needs

1. schooling of students with sen

the schooling of children with special educational needs (sen) is a controversial issue in school policies (european agency for development in special needs education, 2007). it has been shown that students in integrative educational settings show superior school performance (particularly in mathematics) and, in the long run, show greater social skills than students in special schools (baker, wang, & walberg, 1995; carlberg & kavele, 1980; eckhart, haeberlin, sahli lozano, & blanc, 2011; haeberlin, blanc, eckhart, & sahli-lozano, 2012; haeberlin, bless, moser, & klaghofer, 1991; merk, 1982; wang & baker, 1986).
longitudinal research among students with sen in german-speaking regions showed a delay in school achievement of at least two years compared to children of a corresponding grade in a regular school (haeberlin et al., 1991). the hamburg school trials showed that the performance gap appeared in second grade and increased up to fourth grade, even in classes with particularly good inclusive care (hinz, katzenbach, rauer, schuck, wocken, & wudtke, 1998). cross-sectional studies confirm these findings (tent, witt, bürger, & zschoche-lieberum, 1991; wocken, 2000, 2005; wocken & gröhlich, 2007). seventh grade students with sen-l in special schools did not meet the requirements set for fifth grade students in a general-education secondary school (hauptschule; wocken, 2000). in germany in 2010, however, only 22% of the students with sen and 23% of the students with sen in the area of learning (sen-l) were in integrative settings (sekretariat der ständigen konferenz der kultusminister der länder in der bundesrepublik deutschland, 2010). nevertheless, the integration rate is rising slowly. in the usa, the statistics about the school performance of students with sen draw a similar picture. in the special education elementary longitudinal study (seels; schiller, sandford, & blackorby, 2008), children with sen between the ages of 10 and 17 (n = 5400) were observed over a period of six years. results showed that 60% of students with learning disabilities (ld) in segregated settings and 32% of students with ld in integrative classes achieved the lowest performance level in mathematics (lower than the 20th percentile; schiller et al., 2008). in secondary school, the performance gap between the students with and without sen continues to widen.
in ninth grade, the delay ranges from 3 to 4.9 years on average for students with ld, 1 to 3 years for students with emotional disturbance and more than five years for students with intellectual disabilities (blackorby, chorost, garza, & guzman, 2003). the individual growth over three school years varies widely, but in general, there are no significant differences in the magnitude of growth between the students with different types of sen (blackorby et al., 2003). this kind of longitudinal study is missing in the german-speaking countries.

2. identification of students with sen-l in germany

in almost all school systems, children with sen are identified to give them a legal right to additional resources and support in school, but the concepts of ld vary widely from country to country. as a consequence, the size of the population of children with diagnosed ld differs from country to country (sideridis, 2007). in the usa, for example, 5% of the student population is classified as having ld (hallahan, lloyd, kauffman, weiss, & martinez, 2005). in germany, 3% of all students are identified as students with sen-l (kmk statistics, 2010). these students have basic difficulties in various learning areas. traditionally, in german-speaking countries, in addition to pervasive difficulties in school learning, an iq below 85 (but above 70, thus excluding intellectual disability) was considered the most effective diagnostic criterion of sen-l, since this allowed a general “objective” assessment of a child’s cognitive performance without using school indicators (grünke, 2004). the categorization of students with sen-l in germany is similar to the international definition of ld by lloyd, keller, and hung (2007). this definition refers to significant academic difficulties in school for which neither other disabilities (e.g., sensory impairment, intellectual disability or emotional and behavioral disorders) nor a lack of schooling can be identified as the cause (lloyd et al., 2007).
students with diagnosed dyslexia or dyscalculia are not identified as students with sen in germany (büttner & hasselhorn, 2011). identification of students with sen-l and, therefore, the allocation of special educational resources to the school only applies to children with severe learning difficulties (klauer & lauth, 1997; schröder, 2008). since the diagnosis of sen-l appears to be determined not by somatic-medical causes but by the specific criteria of a given school system, it is under constant pressure to prove its legitimacy. iq testing has been criticized since the 1970s (bundschuh, 2010), both by psychologists and, especially, by teachers and educational practitioners, and consequently, iq is no longer used as the sole indicator of sen-l in present governmental recommendations in germany. nevertheless, many researchers still regard low intellectual abilities as the most important aspect of diagnosing sen-l (kretschmann, 2006) and recommend the administration of a language-free iq test in addition to standardized academic achievement tests as part of the diagnostic process (kany & schöler, 2009; kottmann, 2006). we hope that the instrument under construction that is presented in this article will provide an additional means for improved objective diagnosis of sen-l in the future.

3. basic mathematical skills

one third of the students with sen-l who have graduated from special schools cannot handle numbers adequately and also have great trouble solving simple division tasks (lehmann & hoffmann, 2009). students show problems with the understanding of word problems, division, the decimal system, and the doubling or halving of numbers (moser opitz, 2007). the lack of elementary arithmetic skills is mainly responsible for mathematical difficulties in secondary school.
basic mathematical skills require knowledge of quantity and numbers as well as operation rules (ehlert, fritz, arndt, & leutner, 2013; ennemoser, krajewski, & schmidt, 2011). a cross-sectional study by krajewski and ennemoser (2010) showed that basic skills are not only acquired in elementary school, but also trained in secondary school classes. however, the level of mastery of these basic skills among students in different school tracks is very diverse. fifth graders in gymnasium (grammar school) show better mastery of basic skills than students in the eighth grade of hauptschule (lower track of secondary school; ennemoser et al., 2011). only one study conducted in integrative classes includes students with sen. this austrian study, carried out in urban integrative classes, showed that the level of basic skills was also very heterogeneous (gebhardt, schwab, schaupp, rossmann, & gasteiger-klicpera, 2012). even pupils without sen-l had great difficulties in basic arithmetic. as a matter of fact, more than 30% of the regular students (without sen) in fifth grade scored more than one standard deviation below the mean on a standardized school test (lower than the 16th percentile). students with sen-l were able to solve tasks regarding additions and subtractions, but had significant problems with tasks concerning multiplications and divisions in the number range up to 10,000 (gebhardt et al., 2012). in german-speaking regions, research on the academic performance of students with sen-l is mostly performed in intervention studies (hecht, sinner, kuhl, & ennemoser, 2011; moog, 1993, 1995; moog & schulz, 1997, 2005; sinner & kuhl, 2010). these studies generally observed significant effects immediately following the interventions, but follow-up results again showed large differences between students with sen-l and regular students with learning difficulties.
when the training in basic mathematical skills ended, the students with sen-l regressed to the same low level they showed before the intervention (hecht et al., 2011; sinner & kuhl, 2010). all intervention studies used grade-based standardized school tests, which were constructed with classical test theory. however, when surveying these various studies, which show the specific difficulties of students with sen-l, it becomes clear that it would be very useful to have one diagnostic tool that addresses the various arithmetic sub-skills and that is specifically tailored to this special population.

4. research question

special needs students show a one- to three-year delay in their development of basic arithmetic skills. the problem with standardized school tests is that they were developed and standardized for average students in the regular curriculum and, as a consequence, have difficulty displaying the academic growth of students with sen-l. adapting such tests raises challenges with respect to the measurement’s discriminatory power (e.g., ceiling and floor effects). another possibility is to use curriculum-based measurements (cbm) to examine academic growth of students with sen (deno, 2003). tests that are currently available were constructed with classical test theory. however, to measure academic progress, item response theory would be the better option (klauer, 2011; wilbert & linnemann, 2011), since these models avoid certain methodological flaws that are associated with tests constructed with classical test theory (such as unreliability of the change scores and incomparability of the scale units of the subsequent measures). our goal is to longitudinally assess the students’ arithmetic skills and to evaluate the achievements of students of different ages, both criterion-based and norm-based. this can be achieved by using instruments that show conformity to specific models from item response theory.
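for orientation, the dichotomous rasch model expresses the probability of a correct response as a logistic function of the difference between a person's ability and an item's difficulty, both on the logit scale. the following python sketch is illustrative only: the function names and all numerical values are hypothetical and not taken from the study. it also shows why comparisons between persons do not depend on the item used, which is the idea behind specific objectivity.

```python
import math

def rasch_probability(theta: float, beta: float) -> float:
    """P(correct) under the dichotomous Rasch model; theta and beta are in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def log_odds(p: float) -> float:
    """Logit transform of a probability."""
    return math.log(p / (1.0 - p))

# hypothetical abilities and item difficulties (assumptions, not study data)
student_a, student_b = 1.5, -0.5
easy_item, hard_item = -1.0, 2.0

# the log-odds gap between the two students is the same on every item
gap_easy = (log_odds(rasch_probability(student_a, easy_item))
            - log_odds(rasch_probability(student_b, easy_item)))
gap_hard = (log_odds(rasch_probability(student_a, hard_item))
            - log_odds(rasch_probability(student_b, hard_item)))

print(round(gap_easy, 6), round(gap_hard, 6))
```

under the model, both gaps equal the ability difference of the two students, regardless of how easy or hard the item is.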
the instrument assessing basic arithmetical skills that was developed in the longitudinal study on student development in integrative classes, silke (schulische integration im längsschnitt – kompetenzentwicklung bei schülerinnen mit und ohne spf in der sekundarstufe i; academic integration in a longitudinal study – development of competences of students with and without sen in secondary schools; gebhardt, 2013; gebhardt, schwab, krammer, & gasteiger-klicpera, 2012; gebhardt, schwab, schaupp et al., 2012; schwab, 2013), is used in this study to assess sen-l students in separate special schools. in contrast to students without sen, students in these special secondary schools are still explicitly taught elementary arithmetical skills, and these need to be addressed in the test. the aims of this pilot study, hence, are the following:

− apply the instrument assessing basic arithmetical skills to assess the arithmetical skills of a sample of sen-l students and evaluate the scale’s conformity to the dichotomous rasch model.
− explore the instrument’s characteristics regarding discriminatory power, as well as classical psychometric criteria.
− explore the basic arithmetical achievement of students with sen-l in special schools, especially with respect to its development across the secondary school grades (cross-sectional), across one school year (longitudinal), as well as the interaction between these two factors.

5. method

5.1 design and sample

the study was carried out in three special schools in munich in january and june 2012, which constitute the middle (t1) and the end (t2) of the school term, respectively. at both times of measurement, 62 male and 48 female students (n = 110) with sen-l from fifth to ninth grade were tested with the same instruments. at t1, students were 13.9 years old on average (sd = 1.6). students took the tests in groups in sessions of about 15 to 20 minutes, but they could take as much time as needed.
if a student did not answer an item, the test administrator reminded the student to do their very best to do so. as all items use free response formats, guessing behavior can be neglected. table 1 shows the distribution of the sample across grades.

table 1
distribution of participants across school grades

grade   n            female   male   age m (sd)
5       20 (18%)     35%      65%    11.9 (0.6)
6       23 (21%)     48%      52%    13.1 (0.7)
7       14 (13%)     43%      57%    13.8 (0.6)
8       33 (30%)     48%      52%    15.0 (0.6)
9       16 (15%)     38%      62%    16.0 (0.6)
total   110 (100%)   44%      56%    13.9 (1.6)

5.2 instruments

on the basis of the arithmetic tests eggenberger rechentest 3+ (ert 3+; holzer, schaupp, & lenart, 2010) and ert 4+ (schaupp, lenart, & holzer, 2010), an instrument was devised that consists of the ert scales as well as additional, newly constructed items to handle the large heterogeneity in the target population and to avoid floor and ceiling effects. the ert was originally designed to assess arithmetical skills at the end of the third (3+) and the fourth grade (4+) of elementary school. ennemoser et al. (2011) differentiate arithmetic skills into knowledge of quantity as well as numbers and operation rules. in the currently devised instrument this differentiation is reflected in its subtests: knowledge of quantity is represented by the subtests writing numbers from dictation and number series; numbers and operation rules is represented by the subtests basic arithmetical skills and word problems. for the adapted instrument, the 12 items of the ert 4+ subtest number series, which measures knowledge about the place-value system, were used. furthermore, the subtest basic numeracy (comprising 13 items) was used, dealing with addition, subtraction, multiplication and division. the placeholder task is another subtest taken from ert 4+, consisting of 6 items in which 2 numbers are given and the student has to find the third (e.g., ___ + 8 = 21).
the subtest word problems comprises 9 items and was taken from ert 3+ to match the students’ levels and to avoid floor effects. table 2 presents the final instrument with its four subtests.

table 2
subtests of the final instrument before item-selection procedure

subtest                          origin                    n items
basic arithmetical skills        ert 4+: basic numeracy    13
                                 ert 4+: placeholder       6
                                 constructed by authors    15
word problems                    ert 3+: word problems     9
precursors
number series                    ert 4+: number series     12
                                 constructed by authors    2
writing numbers from dictation   constructed by authors    14

5.3 analyses

to test the subtests’ unidimensionality, the data were checked for conformity to the dichotomous rasch model. that is, all items pertaining to the same subtest were scaled in one model. then, to check the models’ conformity with regard to specific objectivity, the independence of item parameters across subsamples was evaluated. these subsamples were chosen using two split criteria: raw score median (thus creating two achievement groups) and gender (kubinger, 2005). andersen’s likelihood ratio test (lrt; andersen, 1973), which is based on conditional maximum likelihood estimates, was used to indicate items’ conformity or non-conformity. for testing the items’ fit to the model, the so-called wald test was used, which indicates the item parameter’s deviance from the model while taking the estimates’ standard error into account (fischer & scheiblechner, 1970). all analyses reported in this article were conducted with the software R (r core team, 2013), more specifically with the package eRm (mair, hatzinger, & maier, 2012), which was used for estimating item parameters and calculating goodness-of-fit tests, and the package PP (reif, 2012) for estimating person parameters. to analyze students’ ability and development in arithmetical skills, the person (ability) parameters were estimated using the item parameters from t1.
this made it possible to estimate person parameters for t1 as well as for t2 and, consequently, to map these abilities on one scale. in this case, warm’s weighted likelihood estimates were used, as these allow for the estimation of extreme abilities, especially regarding possible 0 scores in the sen-l group.

6. results

6.1 scaling and item-selection procedure

the scaling process was based on the data of t1 and afterwards crosschecked with the data of t2, taking into account its interdependency. after removing two items from the subtest word problems and one item from each of the other subtests, all items showed conformity to the dichotomous rasch model. the subsequent quasi-cross-validation using t2 data was also successful. the gender effect reached significance only for word problems and writing numbers from dictation; all other tests were not significant. table 3 presents the statistical values of the final rasch models for the four subtests. the andersen lrts were not significant for the final selection of items (a 1% level of significance was chosen to avoid accumulation of type-i errors; cf. kubinger, 2005), which indicates conformity to the dichotomous rasch model, both with respect to t1 and t2 data. as the andersen lrt uses cml estimates, item parameters could not be estimated for items that were solved by all or by none of the students in a subsample (the number of such items is labeled na in table 3).
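as a side note on the person-parameter estimation described in section 5.3: the warm estimates mentioned there are commonly implemented as weighted likelihood estimates, which remain finite even for a raw score of zero or a perfect score. the python sketch below illustrates the idea for the rasch model with item difficulties treated as known. it is not the implementation of the PP package; the function names, the bisection search and the item difficulties are assumptions made for illustration.

```python
import math

def _p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def wle_score_function(theta, raw_score, difficulties):
    """Warm's weighted likelihood equation: ML score plus bias correction J / (2 I)."""
    ps = [_p(theta, b) for b in difficulties]
    info = sum(p * (1 - p) for p in ps)               # test information I
    j = sum(p * (1 - p) * (1 - 2 * p) for p in ps)    # correction term J
    return raw_score - sum(ps) + j / (2 * info)

def warm_wle(raw_score, difficulties, lo=-10.0, hi=10.0, iterations=100):
    """Solve the weighted likelihood equation by bisection on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if wle_score_function(mid, raw_score, difficulties) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# hypothetical item difficulties in logits (not the study's item parameters)
item_difficulties = [-1.0, 0.0, 1.0]
zero_score_estimate = warm_wle(0, item_difficulties)
perfect_score_estimate = warm_wle(3, item_difficulties)
print(zero_score_estimate, perfect_score_estimate)
```

an ordinary maximum likelihood estimate would diverge for both of these scores, whereas the weighted version returns a finite value on the logit scale.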
table 3
statistical values of the final rasch models for the four subtests

subtest                          time   split criterion    lrt χ²   df   χ² crit. (α = .01)   p     items removed   na
basic arithmetical skills        t1     raw score median   42.6     28   48.3                 .04   1               3
                                 t1     gender             45.8     31   52.2                 .04                   0
                                 t2     raw score median   31.1     28   48.3                 .32                   3
                                 t2     gender             23.5     32   53.5                 .86                   0
number series                    t1     raw score median   9.4      10   23.2                 .50   1               2
                                 t1     gender             16.2     11   24.7                 .14                   1
                                 t2     raw score median   10.8     8    20.1                 .22                   4
                                 t2     gender             22.6     11   24.7                 .02                   1
word problems                    t1     raw score median   6.6      3    11.3                 .08   2               3
                                 t1     gender             12.1     5    15.1                 .03                   1
                                 t2     raw score median   5.0      4    13.3                 .28                   2
                                 t2     gender             14.6     5    15.1                 .01                   1
writing numbers from dictation   t1     raw score median   8.8      6    16.8                 .18   1               6
                                 t1     gender             12.5     11   24.7                 .33                   1
                                 t2     raw score median   16.8     7    18.5                 .02                   5
                                 t2     gender             25.7     12   26.2                 .01                   0

note. all tests were not significant at the 1% level, indicating conformity to the rasch model. the na column indicates the number of items that could not be evaluated due to 0% or 100% correct in the subsample.

to illustrate the results of the item-selection procedure, a graphical representation of the model check of the subtest basic arithmetical skills at t2 is shown in figure 1. nearly all items are situated in the region of acceptable deviance, which is indicated by the gray control line. acceptable deviance is defined in regard to the standard error of estimations in the respective area on the logit scale (cf. wright & stone, 1999). furthermore, the standard errors of the estimations appear to be in an acceptable range (min = 0.2, mean = 0.3, max = 0.8, across all subtests and both times of measurement).

figure 1. graphical model checks of the subtest basic arithmetical skills (top) and number series (bottom) by raw score-median (left) and gender (right). the gray line indicates the limit of acceptable deviance for single items (cf. text).
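the p values in table 3 relate each lrt statistic to the upper tail of a χ² distribution with the given degrees of freedom. the following python sketch shows one way to compute that tail probability with the standard library only, using the recurrence for integer degrees of freedom. the function name is an assumption, and the sketch is not the routine used by the authors; the example values (42.6 with 28 degrees of freedom) are taken from the first row of table 3.

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2               # closed form for df = 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1    # closed form for df = 1
    # recurrence: Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q

# first row of table 3: basic arithmetical skills, t1, raw score median split
p_value = chi2_upper_tail(42.6, 28)
conforms_at_one_percent = p_value > 0.01
print(round(p_value, 3), conforms_at_one_percent)
```

a test statistic below the critical value listed in the table corresponds to a p value above .01, which is the criterion the text applies for conformity.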
finally, the total instrument with the four subtests comprising 33, 8, 12 and 13 items, respectively, also showed conformity to the dichotomous rasch model. table 4 shows that the items present a wide range of difficulty levels, both overall and across grades in nearly every subtest, leading to a reliable assessment across a broad range of ability. only the subtest writing numbers from dictation shows a narrower range of item difficulty for 9th graders, which might lead to a small ceiling effect for these students.

table 4
proportion correct within subtests across grades, including all selected items

        basic arithmetic     number series        word problems        writing numbers
grade   lo     hi     m      lo     hi     m      lo     hi     m      lo     hi     m
5       .00    .85    .27    .00    1.00   .40    .00    .65    .24    .11    .95    .50
6       .00    .91    .39    .04    1.00   .57    .00    .91    .30    .30    1.00   .69
7       .00    .93    .46    .14    1.00   .70    .00    1.00   .41    .36    1.00   .79
8       .00    .97    .54    .21    .97    .67    .03    1.00   .44    .42    1.00   .81
9       .00    1.00   .68    .44    1.00   .85    .19    1.00   .62    .81    1.00   .97

note. lo = lowest value, hi = highest value, m = mean

the subtest reliabilities (cronbach α) are presented on the diagonal of table 5. the reliabilities vary from .72 to .92, which is above the conventional cut-off value of .80, except for the subtest word problems, of which the reliability is still very acceptable. it should be mentioned that items that conform to the rasch model are, as such, internally consistent because unidimensionality is included in the theoretical formulation of the model. table 5 further reports high inter-correlations between the subtests, ranging from .64 between number series and writing numbers from dictation to .74 between number series and word problems.

table 5
reliabilities and inter-correlations between the subtests at t1

                              (1)    (2)    (3)    (4)
basic arithmetic skills (1)   .92    .75    .74    .72
number series (2)                    .86    .67    .64
word problems (3)                           .72    .66
writing numbers (4)                                .85

note. the subtests’ cronbach α is presented on the diagonal.
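the reliability coefficient reported on the diagonal of table 5 can be computed from a students-by-items score matrix. the python sketch below shows the standard formula for cronbach α; the function name and the small response matrix are hypothetical and serve only to illustrate the calculation, not to reproduce the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per student, one column per item)."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# hypothetical dichotomous responses of five students to four items (1 = correct)
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

α increases when the items covary strongly relative to their individual variances, which is why a longer, homogeneous subtest such as basic arithmetical skills tends to reach higher values than a short one.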
6.2 basic arithmetical achievement of students with sen-l

students’ achievement, in the form of their person (ability) parameter, was very heterogeneous in every subtest and in every grade. person parameters referring to the subtest basic arithmetical skills showed standard deviations from 1.7 (on the logit scale) in grade six to 2.4 in grade nine. the dispersion in achievement did not show a trend across grades in terms of reduced or increased standard deviations. linear regression shows that achievement in every subtest at t1 is predicted by grade (α = .05), with effects ranging from β = 0.47 in word problems to β = 0.58 in basic arithmetical skills. these relations were also significant at t2, but with smaller effect sizes, now ranging from β = 0.37 in word problems to β = 0.41 in writing numbers from dictation. the moderate relationships between grade and ability confirm the instrument’s developmental validity. however, it must be noted that students from grades 7 and 8 showed very similar levels of achievement in every subtest and at both measurement points, except for basic arithmetical skills, in which 8th graders scored 0.7 logits higher than 7th graders at t1; this difference vanished at t2. when shifting from cross-sectional analysis to a longitudinal analysis of the development of achievement from t1 to t2, further differences between the subtests become evident. two subtests appeared to group together with regard to development of mean achievement: in the basic arithmetical skills and the writing numbers from dictation subtests, students from lower grades somewhat improved over time, while those from higher grades regressed (see figure 2). in the other two subtests, number series and word problems, students from every grade improved over time. however, these are descriptive tendencies, and in terms of significance only number series showed a longitudinal main effect (d = 0.22).
for this subtest, a repeated-measures anova shows that the factor time plays a significant role, f(1, 101) = 8.6, p < .01, η² = .08. although the interaction term did not reach significance, students from grade five in particular increased in their achievements (+1.3 logits). a significant interaction effect between development and grade was found in basic arithmetical skills: f(1, 101) = 3.9, p = .01, η² = .14. this indicates that students in lower grades improve their basic arithmetical skills over time while those in higher grades do not, or even drop in performance (d = 0.34 for 5th graders, d = 0.20 for 6th graders, d = -0.12 for 7th graders, d = -0.53 for 8th graders, d = -0.41 for 9th graders).

figure 2. development of person (ability) parameter distributions from t1 to t2 of the two subtests basic arithmetical skills (left) and number series (right), for each grade separately.

research in the field of special education is particularly interested in the students’ performance variations. table 6 shows the students’ mean ability parameters on all 4 subtests and for all grades separately, in the context of temporal development. the fifth and sixth graders show improvement on all subtests. however, the mean scores of students in 7th, 8th and 9th grade decreased in basic arithmetical skills and writing numbers from dictation, but remained stable or improved on number series and word problems. overall, a regression-to-the-mean effect was found, i.e., students with low scores tended to improve their scores whereas students with high scores tended to show a decrease at t2. this was confirmed by weak to moderate negative correlations between learning gains (t2 − t1) and achievement at t1 in basic arithmetic skills (r = -.55), number series (r = -.38), word problems (r = -.38) and writing numbers (r = -.35).
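a negative correlation between gains and baseline scores can arise from measurement error alone, even when no student actually changes. the python simulation below is a minimal sketch of this regression-to-the-mean mechanism; the sample size, the error variance and the function names are assumptions chosen for illustration and do not model the study's data.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_gain_correlation(n=2000, error_sd=1.0, seed=1):
    """Correlate (t2 - t1) with t1 when true ability is stable and only noise differs."""
    rng = random.Random(seed)
    baseline, gains = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)                 # stable true ability, no real change
        score_t1 = ability + rng.gauss(0.0, error_sd)
        score_t2 = ability + rng.gauss(0.0, error_sd)
        baseline.append(score_t1)
        gains.append(score_t2 - score_t1)
    return pearson_r(baseline, gains)

gain_baseline_r = simulate_gain_correlation()
print(round(gain_baseline_r, 2))
```

with equal variance for ability and error, the expected correlation in this toy setting is about -.5, so negative values of the size reported above do not by themselves indicate that high achievers lost skills.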
table 6
mean (m) values and standard deviations (sd) of person parameters per grade at t1 and t2

        basic arithmetic skills           number series
grade   m t1    m t2    sd t1   sd t2     m t1    m t2    sd t1   sd t2
5       -2.0    -1.4    2.2     1.7       -1.0    0.4     2.4     1.9
6       -0.6    -0.4    1.0     1.4       0.7     1.2     1.8     2.2
7       -0.2    -0.4    1.6     2.2       1.6     1.7     1.1     1.1
8       0.5     -0.2    1.5     1.3       1.4     1.9     1.8     1.8
9       1.7     1.2     1.1     1.2       3.2     3.3     1.5     1.5

        word problems                     writing numbers f. dictation
grade   m t1    m t2    sd t1   sd t2     m t1    m t2    sd t1   sd t2
5       -2.4    -1.9    2.2     2.5       0.3     0.6     2.2     2.6
6       -1.6    -1.3    1.7     1.7       2.0     2.3     2.0     2.0
7       -0.7    -0.3    1.9     2.4       3.0     2.6     1.5     1.8
8       -0.4    -0.4    1.7     2.3       3.0     2.4     1.8     2.0
9       1.0     1.3     2.4     2.1       4.8     4.3     1.0     1.3

7. discussion

the instrument described in this article showed conformity to the dichotomous rasch model. it also did not show remarkable ceiling or floor effects and, thus, allowed the measurement of basic arithmetical performance of students with sen-l in special schools. only the newly constructed subtest writing numbers from dictation showed a somewhat narrow range of item difficulties for 9th graders. this is not unexpected, since these students should already have acquired the basic competence of knowledge of quantity (krajewski & ennemoser, 2010). it is also questionable whether additional, more difficult items would measure the same construct. two items of the subtest word problems, which was taken from the ert 3+, had to be rejected, and this scale should be further improved. nevertheless, the instrument showed results similar to those found in the silke study in integrative classes (gebhardt, 2013; gebhardt, schwab, schaupp et al., 2012; schwab, 2013) and allowed a first exploration of the basic performance of students with sen-l in special schools. generally, students with sen-l lag several years behind their peers without sen.
they are still learning what the other students learn in primary school, and especially the basics of multiplication and division are taught to them in secondary school (see also moser opitz, 2007). the inter-correlations of the subtests showed that the performance levels were similar across the subtests and, empirically, it would be sufficient to describe a student with only one scale score, indicating arithmetical ability. however, since the subtest scores are indicative of the development of different arithmetical skills, these should provide support for fitting an appropriate arithmetic curriculum to students with sen-l. thus, the results should help improve the construction of real curriculum-based measurement of arithmetic for students with learning disabilities. the instrument discriminated between the grades. although the grade level showed medium effects on all subtests at t1 and t2, the heterogeneity of student performance within the grades was very large. this means that it is necessary to have different mathematical problems with varying levels of complexity available to be able to foster the mathematical abilities of all students (moser opitz, 2007). similar findings were described previously in several intervention studies (hecht et al., 2011; moog & schulz, 1997, 2005; sinner & kuhl, 2010), but until now, the arithmetical performance of students with sen had not been measured with a rasch-scaled standardized test. one important finding of the longitudinal results was that students from every grade improved on the subtests number series and word problems, while only the 5th and 6th graders improved on the subtests writing numbers from dictation and basic arithmetical skills. this might be explained by the fact that the curriculum in 5th and 6th grade includes teaching basic arithmetical skills, whereas the curriculum of grades 7 to 9 prepares the students for vocational training.
in these grades, basic skills are no longer explicitly trained, but instead, new operations such as fractions are introduced. as the old skills are not explicitly consolidated, basic arithmetic skills from 3rd grade in primary school (including writing numbers from dictation) may again become a challenge for students in the 9th grade of special schools (see, e.g., steiner, 2009). another factor influencing the results might be that the special school students who are performing well in 5th and/or 6th grade can attend integrative classes in 7th grade. since such students “disappear” to other classes or schools, the cross-sectional data presented here cannot be interpreted in the same way as real longitudinal data. the present data must be viewed as exploratory, also in view of the relatively small sample that was included in this study. a much larger sample must be tested to draw stronger conclusions. finally, the development of basic arithmetical skills in this study was relatively limited. this underlines the challenge of teaching basic arithmetical skills in special schools and the currently rather limited success. instruments such as the one presented in this article, which allow the continuous measurement of a series of arithmetical skills in secondary special education, may help to further develop evidence-based interventions that are tailored to the needs of the students. when measurement and intervention are adapted to the needs of the students, they can jointly help in improving the students’ arithmetic abilities.

keypoints

students with special educational needs from special schools show difficulties in basic arithmetical operations. a newly developed rasch-scaled instrument allows the reliable measurement of basic arithmetical skills of students with sen-l in secondary education. students’ skills turn out to be very heterogeneous, both overall and within grades.
many students do not even master arithmetical skills that are taught in primary school, although achievement improves in higher grades.

references

andersen, e. b. (1973). a goodness of fit test for the rasch model. psychometrika, 38(1), 123–140. doi: 10.1007/bf02291180
baker, e. t., wang, m. c., & walberg, h. j. (1995). the effect of inclusion on learning. educational leadership, 52(4), 33–35.
blackorby, j., chorost, m., garza, n., & guzman, a. m. (2003). the academic performance of secondary school students with disabilities. in u.s. department of education (eds.), the achievement of youth with disabilities during secondary school. a report from the national longitudinal transition study 2. menlo park, ca: sri international. retrieved from http://www.seels.net/designdocs/seels_w1w3_final.pdf
bundschuh, k. (2010). einführung in die sonderpädagogische diagnostik (7th ed.). münchen: e. reinhardt.
büttner, g., & hasselhorn, m. (2011). learning disabilities: debates on definitions, causes, subtypes and responses. international journal of disability, development and education, 58(1), 75–87. doi: 10.1080/1034912x.2011.548476
carlberg, c., & kavale, k. (1980). the efficacy of special versus regular class placement for exceptional children: a meta-analysis. the journal of special education, 14(3), 295–309. doi: 10.1177/002246698001400304
deno (2003). curriculum-based measurement. journal of special education, 37, 184–192. doi: 10.1177/00224669030370030801
eckhart, m., haeberlin, u., sahli lozano, c., & blanc, p. (2011). langzeitwirkungen der schulischen integration. [long-term effects of school integration]. bern, switzerland: haupt verlag.
ehlert, a., fritz, a., arndt, d., & leutner, d. (2013). arithmetische basiskompetenzen von schülerinnen und schülern in den klassen 5 bis 7 der sekundarstufe. journal für mathematik-didaktik, 34(2), 237–263. doi: 10.1007/s13138-013-0055-0
ennemoser, m., krajewski, k., & schmidt, s. (2011).
entwicklung und bedeutung von menge-zahlenkompetenzen und eines basalen konventions- und regelwissens in der klasse 5 bis 9. zeitschrift für entwicklungspsychologie und pädagogische psychologie, 34(4), 228–242. doi: 10.1026/00498637/a000055
european agency for development in special needs education (2007). assessment in inclusive settings. key issues for policy and practice. odense, denmark: european agency for development in special needs education.
fischer, g. h., & scheiblechner, h. h. (1970). algorithmen und programme für das probabilistische testmodel von rasch. [algorithms and programs for rasch’s probabilistic test model.]. psychologische beiträge, 12, 23–51.
gebhardt, m. (2013). integration und schulische leistungen in grazer sekundarstufenklassen. eine empirische explorative pilotstudie. wien: lit verlag.
gebhardt, m., schwab, s., krammer, m., & gasteiger-klicpera, b. (2012). achievement and integration of students with special needs (sen) in the fifth grade. journal of special education and rehabilitation, 13, 7–19. doi: 10.2478/v10215-011-0022-6 http://dx.doi.org/10.2478/v10215-011-0022-6
gebhardt, m., schwab, s., schaupp, h., rossmann, p., & gasteiger-klicpera, b. (2012). heterogene gruppen in mathematischen grundfertigkeiten: eine explorative erkundung der fähigkeiten im grundrechnen in integrationsklassen der 5. schulstufe. zeitschrift für inklusion (online), (1-2). available from www.inklusion-online.net/index.php/inklusion/article/view/155/147
grünke, m. (2004). lernbehinderung. in g. w. lauth, & m. grünke (eds.), interventionen bei lernstörungen. förderung, training und therapie in der praxis (pp. 65–77). göttingen: hogrefe, verl. für psychologie.
haeberlin, u., blanc, p., eckhart, m., & sahli-lozano, c. (2012, may). intégration scolaire d'enfants en difficultés d'apprentissage: effets à long terme. information sur la recherche éducationnelle, csre, n° 12:021.
[school integration of children with learning difficulties: long-term effects. information about educational research, csre, n° 12:021]. retrieved may 31, 2013 from www.skbfcsre.ch/de/bildungsforschung/datenbank/.
haeberlin, u., bless, g., moser, u., & klaghofer, r. (1991). die integration von lernbehinderten: versuche, theorien, forschungen, enttäuschungen, hoffnungen. bern: haupt.
hallahan, d. p., lloyd, j. w., kauffman, j. m., weiss, m. p., & martinez. (2005). learning disabilities: foundations, characteristics, and effective teaching. needham heights: allyn & bacon.
hecht, t., sinner, d., kuhl, j., & ennemoser, m. (2011). differenzielle effekte eines trainings der mathematischen basiskompetenzen bei kognitiv schwachen grundschülern und schülern der förderschule mit dem schwerpunkt lernen – reanalysen zweier studien. empirische sonderpädagogik, (4), 308–323.
hinz, a., katzenbach, d., rauer, w., schuck, k. d., wocken, h., & wudtke, h. (1998). die integrative grundschule im sozialen brennpunkt: ergebnisse eines hamburger schulversuchs. hamburg: hamburger buchwerkstatt.
holzer, n., schaupp, h., & lenart, f. (2010). eggenberger rechentest (ert 3+): diagnostikum für dyskalkulie für das ende der 3. schulstufe bis mitte der 4. schulstufe. bern: huber.
kany, w., & schöler, h. (2009). diagnostik schulischer lern- und leistungsschwierigkeiten: ein leitfaden mit einer anleitung zur gutachtenerstellung. stuttgart: kohlhammer.
klauer (2011). lernverlaufsdiagnostik – konzept, schwierigkeiten und möglichkeiten. empirische sonderpädagogik, (3), 207–224.
klauer, k. j., & lauth, g. w. (1997). lernbehinderungen und leistungsschwierigkeiten bei schülern. in f. e. weinert (eds.), enzyklopädie der psychologie, psychologie des unterrichts und der schule (pp. 701–738). göttingen: hogrefe.
kubinger, k. d. (2005). psychological test calibration using the rasch model – some critical suggestions on traditional approaches. international journal of testing, 5(4), 377–394.
lauth, g.
w., & grünke, m. (eds.). (2004). interventionen bei lernstörungen: förderung, training und therapie in der praxis. göttingen: hogrefe, verl. für psychologie.
kottmann, b. (2006). selektion in die sonderschule: das verfahren zur feststellung von sonderpädagogischem förderbedarf als gegenstand empirischer forschung. bad heilbrunn: klinkhardt.
krajewski, k., & ennemoser, m. (2010). entwicklung mathematischer basiskompetenzen in der sekundarstufe. empirische pädagogik, 24(4), 353–370.
kretschmann, r. (2006). diagnostik bei lernbehinderungen. in u. petermann, & f. petermann (eds.), diagnostik sonderpädagogischen förderbedarfs (pp. 139–162). göttingen: hogrefe.
lehmann, r., & hoffmann, e. (2009). berliner erhebung arbeitsrelevanter basiskompetenzen von schülerinnen und schüler und schüler mit dem förderbedarf „lernen“. münster: waxmann.
lloyd, j. w., keller, c., & hung, l. (2007). international understanding of learning disabilities. learning disabilities research & practice, 22(3), 159–160. doi: 10.1111/j.1540-5826.2007.00240.
mair, p., hatzinger, r., & maier, m. j. (2012). eRm: extended rasch modeling. R package version 0.15–1.
merk (1982). lernschwierigkeiten – zur effizienz von fördermaßnahmen an grund- und lernbehindertenschulen. heilpädagogische forschung, 1, s. 53–69.
moog, w. (1993). schwachstellen beim addieren – eine erhebung bei lernbehinderten sonderschülern. zeitschrift für heilpädagogik, 44, 534–554.
moog, w. (1995). flexibilisierung von zahlbegriffen und zählhandlungen – ein übungsprogramm. heilpädagogische forschung, 21(3), 113–121.
moog, w., & schulz, a. (1997). das dortmunder zahlbegriffstraining – lernwirksamkeit bei rechenschwachen grundschülern. sonderpädagogik, 27(2), 60–68.
moog, w., & schulz, a. (2005). zahlen begreifen: diagnose und förderung bei kindern mit rechenschwäche (2. überarb. aufl.). weinheim: beltz.
moser opitz, e. (2007).
rechenschwäche – dyskalkulie: theoretische klärungen und empirische studien an betroffenen schülerinnen und schüler. bern: haupt.
r core team. (2013). R: a language and environment for statistical computing. r foundation for statistical computing: vienna, austria.
reif, m. (2012). PP: person parameter estimation. R package version 0.2.
schaupp, h., lenart, f., & holzer, n. (2010). eggenberger rechentest (ert 4+): diagnostikum für dyskalkulie für das ende der 4. schulstufe bis der mitte der 5. schulstufe. bern: huber.
schiller, e., sanford, c., & blackorby, j. (2008). the achievements of youth with disabilities during secondary school: a report from the national longitudinal transition study 2 (u.s. department of education, hrsg.). retrieved from http://www.seels.net/info_reports/seels_learndisability_%20spec_topic_report.12.19.08ww _final.pdf.
schröder, u. (2008). lernbehindertenpädagogik: grundlagen und perspektiven sonderpädagogischer lernhilfe (2nd ed.). stuttgart: kohlhammer.
schwab, s. (2013). schulische integration, soziale partizipation und emotionales wohlbefinden in der schule ergebnisse einer empirischen längsschnittstudie. berlin: lit-verlag.
sekretariat der ständigen konferenz der kultusminister der länder in der bundesrepublik deutschland (2010). sonderpädagogische förderung in schulen 1999 bis 2008: dokumentation nr. 189 – märz 2010. retrieved from http://www.kmk.org/fileadmin/pdf/statistik/dok_189_sopaefoe_2008.pdf.
sideridis, g. d. (2007). international approaches to learning disabilities: more alike or more different? learning disabilities research & practice, 22(3), 210–215. doi: 10.1111/j.1540-5826.2007.00249.x
sinner, d., & kuhl, j. (2010). förderung mathematischer basiskompetenzen in der grundstufe der schule für lernhilfe. zeitschrift für entwicklungspsychologie und pädagogische psychologie, 42(4), 241–251.
steiner, g. (2009). forgetting while learning: a plea for specific consolidation. journal of cognitive education and psychology, 8, 117–127.
tent, l., witt, m., bürger, w., & zschoche-lieberum, c. (1991). ist die schule für lernbehinderte überholt? heilpädagogische forschung, (1), 289-320. wang, m. c. & baker, e. t. (1985-86). mainstreaming programs: design features and effects. the journal of special education, 19(4), 503–521. wilbert, j., & linnemann, m. (2011). kriterien zur analyse eines tests zur lernverlaufsdiagnostik. empirische sonderpädagogik, 3, 225–242. wocken, h. (2005). andere länder, andere schüler?: vergleichende untersuchungen von förderschülern in den bundesländern brandenburg, hamburg und niedersachen. potsdam. retrieved from http://bidok.uibk.ac.at/library/wocken-forschungsbericht.html wocken, h. (2007). fördert förderschule? eine empirische rundreise durch schulen für „optimale förderung“. in i. demmer-dieckmann, & a. textor (eds.), integrationsforschung und bildungspolitik im dialog (pp. 35–60). bad heilbrunn: klinkhardt. wocken, h. & gröhlich, c. (2009). kompetenzen von schülerinnen und schülern an hamburger förderschulen. in: w. bos, m. bonsen & c. gröhlich (hrsg.), kess 7 – kompetenzen und einstellungen von schülerinnen und schülern an hamburger schulen zu beginn der jahrgangsstufe 7 (pp. 133–142). münster: waxmann. wright, b.d., & stone, m.h. (1999). measurement essentials. wide range inc.: wilmington. retrieved from: http://www.rasch.org/measess/me-all.pdf frontline learning research 5 (2014) 92 114 issn 2295-3159 corresponding author: andria andiliou, academic staff developer, university of bristol, andria.andiliou@bristol.ac.uk http://dx.doi.org/10.14786/flr.v2i3.87 92 | f l r creative solutions and their evaluation: comparing the effects of explanation and argumentation tasks on student reflections andria andiliou a , p. 
karen murphy b
a university of bristol, united kingdom
b the pennsylvania state university, usa
article received 12 february 2014 / revised 24 march 2014 / accepted 15 june 2014 / available online 27 june 2014

abstract

creative problem solving, which results in novel and effective ideas or products, is most advanced when learners can analyze, evaluate, and refine their ideas to improve creative solutions. the purpose of this investigation was to examine creative problem solving performance in undergraduate students and to determine the tasks that support critical self-evaluations of creative solutions by comparing alternative types of reflective tasks. participants (n = 103) first provided demographic information and responded to individual difference measures (i.e., divergent thinking, need for cognition, and beliefs about creative outcomes) and then read a problem scenario in which they assumed the role of a high school teacher who was asked to design a creative college preparatory course. next, participants completed either an explanation reflective task or an argument-based reflective task. finally, participants evaluated their proposed course by rating it on characteristics that describe the originality and effectiveness of creative solutions. findings confirmed the role of divergent thinking as a positive predictor of the originality of a creative solution, whereas need for cognition and academic major were positive predictors of the effectiveness of a creative solution. participants rated their creative solutions differentially depending on their beliefs and the type of reflective task. those whose beliefs aligned better with conceptualizations of creative outcomes assessed the originality and effectiveness of their solution more positively.
the findings indicate that the argumentation task could potentially promote reflective and critical thinking about a creative solution, as participants who completed the argumentation task evaluated their solution more conservatively.

keywords: creative problem solving; creativity beliefs; self-evaluation; reflection; argument diagrams

1. introduction

creative problem solving is manifested in everyday life situations as well as in academic contexts (diakidoy & constantinou, 2001). it represents a goal-directed cognitive process that results in the production of original and effective solutions when no obvious solution method is available (antiliou, 2012). consider the example of a mud engineer trying to find an innovative solution to prevent oil leaks from an underwater well, the example of a community center manager attempting to design activities that address the needs of a diverse community, or a group of preschool children trying to improvise the rules of a game to accommodate more players. in all cases, the situations call for novel but still effective solutions.

many countries around the world have identified the development of creative thinking across subject areas as a core student learning outcome (diakidoy & constantinou, 2001) and have pushed for problem-based approaches that provide students with opportunities to construct creative solutions to authentic problems. however, students are often overwhelmed when they are asked to submit creative assignments or generate creative solutions. arguably, one possible explanation for these negative reactions is that students have to rely on their creativity beliefs, but they are uncertain about the characteristics of creative outcomes. consequently, they are unsure about how creative their proposed solution is or how to determine the creativeness of their solution. unfortunately, the role of beliefs has not been adequately explored with respect to creative problem solving performance.
as such, the first goal of the present study was to examine the contribution of cognitive and affective individual difference variables, including an individual’s beliefs, to creative performance.

besides creative performance per se, researchers have explored the self-evaluations of the proposed solutions, and research evidence suggests that students’ evaluations of self-generated solutions are superficial rather than reflective (runco & chand, 1994). in addition, students tend to engage in case-building about a solution. instead of critically judging a solution, students argue and justify their solutions by discounting potential obstacles and consequences or curtailing the importance or extensiveness of the problem (byrne, shipman, & mumford, 2010; dailey & mumford, 2006; nussbaum, 2008). in order to promote more critical reasoning and reflective evaluations in problem solving, researchers have examined the effectiveness of structure supports such as prompts (chen & bradshaw, 2007; ge & land, 2003), directions (ferretti, macarthur, & dowdy, 2000; nussbaum & sinatra, 2003; nussbaum & kardash, 2005), cases (choi & lee, 2009; hernandez-serano & jonassen, 2003), visual representations (nussbaum, 2008; nussbaum & schraw, 2007), collaborative reasoning, and argumentation tools and tasks (cho & jonassen, 2002). these types of structure supports engage students in thinking about other perspectives, opinions, and approaches to the problem. this is particularly the case when the structure support involves argumentation as a mechanism to elaborate and make explicit the reasoning underlying the problem solving and to foster reflection about a solution (andriessen, 2006; oh & jonassen, 2007). argumentation tools, one type of structure support, were found to significantly improve students’ argumentation skills and group problem solving, with some marginal effects on individual problem solving (cho & jonassen, 2001; oh & jonassen, 2007; uribe, klein, & sullivan, 2003).
however, past research has not specified how argumentation tools, and specifically argument diagrams, influence the self-evaluation of a solution when the problem calls for creative solutions that are original and effective. thus, the second goal of this study was to address this gap in the literature by investigating the effects of an argument diagram on the self-evaluation of a creative solution that participants proposed for a course design problem.

1.1 creative problem solving

several models have been proposed to describe creative problem solving, including the simplex model of creative process (basadur et al., 1994), the creative problem solving framework (isaksen et al., 1994), and the model of creative thought (mumford et al., 1991). in the simplex model the problem solver moves in cycles of ideation and evaluation that occur in different phases of problem solving, during which the learner generates and formulates the problem, solves the problem, and implements the relevant, appropriate, and original ideas (runco & chand, 1994). according to the creative problem solving framework, the learner needs to understand the problem, generate solution ideas, and plan for action by developing solutions that could be effectively implemented (treffinger, 1995). in mumford et al.’s (1991) model of creative thought the learner combines and reorganizes categories or concepts to develop a new understanding of the problem (ideas), which are then evaluated and implemented. these models illustrate that creative problem solving evolves across several cognitive subprocesses initiated by the construction of a problem space, the generation of ideas, and the evaluation of a selected solution. creative problem solving is a form of ill-structured problem solving, which results in the production of original and effective solutions (antiliou, 2012).
based on a review of the empirical literatures on creative and ill-structured problem solving, certain individual difference variables were expected to affect the creativity of a solution with respect to its originality and effectiveness. specifically, divergent thinking, that is, the ability to generate multiple ideas, was found to be predictive of creative problem solving performance (e.g., hunter et al., 2008; reiter-palmon et al., 2009; 1997). in addition, research evidence indicated that need for cognition, which represents an individual’s tendency to engage in and enjoy effortful cognitive endeavours (cacioppo, petty, & kao, 1984), also predicts creative problem solving performance (butler et al., 2003; hunter et al., 2008; osburn & mumford, 2006).

students’ domain knowledge and their beliefs about the characteristics of creative outcomes can also potentially influence creative problem solving performance. research evidence suggests that problem solvers’ perceptions of a task and their domain knowledge impact the search for relevant information, the representation of the problem, and the evaluation of potential solutions (jonassen, 1997; voss et al., 1991). also, evidence indicates that knowledge of the important concepts and principles of a domain contributes to better performance on ill-structured tasks (shin, jonassen, & mcgee, 2003; voss & post, 1988) and serves as the foundation of creative solutions (weisberg, 2006).

evaluation is a component of problem solving and represents the metacognitive process during which problem solvers reflect on and assess a proposed solution. evaluation is essential for ill-structured problems that require a creative solution because it is the process by which problem solvers can determine whether a proposed solution meets the creative criteria of originality and effectiveness.
according to voss and colleagues (1981), argumentation is a means for problem solvers to evaluate a solution more analytically by elaborating and clarifying the solution and identifying its limitations. researchers have experimented with graphic organizers such as argument diagrams in order to promote critical and reflective thinking in writing tasks. for example, nussbaum and schraw (2007) found that argument diagrams supported better integration of arguments and counterarguments in writing tasks. for the present study, our second aim was to determine whether an argument task (i.e., argument diagram) promotes more reflective and critical self-evaluations of a potentially creative solution in comparison to an explanation task.

1.2 explanation and argumentation for critical thinking

explanation and argument tasks have been used to promote and assess understanding, critical thinking, conceptual change, and problem solving (reznitskaya, anderson, & kuo, 2007; jonassen & kim, 2010; nussbaum & sinatra, 2003; wiley & voss, 1999). explanation is a constructive learning activity during which learners elaborate and clarify an idea by explaining it to themselves; it has been found to lead to enhanced learning, more accurate self-assessments, and more effective problem solving (fonseca & chi, 2011). when learners elaborate, they generate inferences and integrate information with prior knowledge. the self-explanation effect was found to be positive for learners with both low and high prior knowledge. a possible interpretation of this result is that self-explaining allows learners with low prior knowledge to generate inferences that fill their knowledge gaps, and allows learners with high prior knowledge to repair their existing mental models (chi, 2000; fonseca & chi, 2011).
research findings indicate that self-explanation can be a powerful learning strategy due to the underlying cognitive mechanisms that allow learners to identify and remedy knowledge gaps by generating inferences and to develop and repair their knowledge representation models. thus, when learners move beyond simple knowledge-telling with summaries or paraphrased statements and engage in self-explanation through inference generation and knowledge integration, they seem to gain a deeper understanding. however, even though self-explanation to oneself represents a constructive learning activity, it was found to be somewhat less effective in learning and problem solving tasks when compared to more interactive learning activities such as responding to question prompts, explaining to someone else, and discussing with a partner to generate collaborative explanations. in the present study, within the context of a problem solving task, an explanation prompt was compared with an argument task to examine the degree to which the tasks promote reflective thinking when evaluating a proposed creative solution.

argumentation was conceptualized by kuhn (1991) as the cognitive process of formulating and weighing the arguments for and against a course of action, a point of view, or a solution to a problem. argumentation skills comprise the skills to generate reasons, offer evidence, and provide counterarguments and rebuttals. three theoretical frameworks have been applied to analyze argumentation based on rhetorical and dialectical arguments in educational settings. rhetorical arguments are put forward to persuade or convince others about a claim or proposition without consideration of alternative positions (toulmin, 1958). dialectical arguments are based on the dialogue between supporters of alternative positions during a dialogue game or a discussion (jonassen & kim, 2010).
through dialectical argumentation within an individual or within a group, individuals resolve differences, compromise between multiple opinions, and convince others of the advantages of a position. researchers in the learning sciences draw primarily on three theoretical approaches to analyze and evaluate the quality of argumentation: toulmin’s rhetorical argumentation framework, pragma-dialectics (van eemeren & grootendorst, 1992), and walton’s (2000) dialogue theory.

toulmin has proposed an argument scheme to describe the structure of effective argumentation that includes a sequential set of components: a claim that expresses the position, facts or opinions that serve as data in support of the claim, a warrant as justification, and elaborative elements such as a backing, qualifier, and rebuttal to potential counterclaims. although toulmin’s framework is useful for analyzing rhetorical argumentation to determine the soundness and effectiveness of an individual’s line of reasoning (andriessen, 2006), it has two primary limitations: its complexity (e.g., warrants are often implicit) and its focus on the perspective of one proponent (leitão, 2003; van eemeren & grootendorst, 1999) instead of argumentation as “a discourse phenomenon” (andriessen, 2006), especially with reference to educational contexts.

two theoretical models that are more applicable to the dialectical nature of argumentation as it is manifested in educational contexts are pragma-dialectics (van eemeren & grootendorst, 1992) and dialogue theory (walton, 2000). according to pragma-dialectics, argumentation is a means of resolving differences of opinion through critical discussions that evolve in four stages: people present their positions at the confrontation stage, assume their roles and agree on procedures at the opening stage, defend and challenge positions during the argumentation stage, and decide at the concluding stage who has prevailed in the critical discussion.
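
toulmin’s argument scheme described above can be pictured as a small record with one required claim, a list of supporting data, and optional elaborative elements. the following is a minimal sketch in python under stated assumptions: the class name `ToulminArgument`, its field names, the `missing_components` helper, and the example course-design argument are all hypothetical illustrations and are not part of toulmin’s framework or of the materials used in this study.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToulminArgument:
    """Hypothetical record of the components named in Toulmin's scheme."""
    claim: str                                     # the position being advanced
    data: List[str] = field(default_factory=list)  # facts or opinions supporting the claim
    warrant: Optional[str] = None                  # justification linking data to claim (often implicit)
    backing: Optional[str] = None                  # support for the warrant
    qualifier: Optional[str] = None                # limits on the strength of the claim
    rebuttal: Optional[str] = None                 # response to potential counterclaims

    def missing_components(self) -> List[str]:
        """List the components that have not been made explicit."""
        missing = []
        if not self.data:
            missing.append("data")
        for name in ("warrant", "backing", "qualifier", "rebuttal"):
            if getattr(self, name) is None:
                missing.append(name)
        return missing


# Illustrative argument about a course design (invented for this sketch).
argument = ToulminArgument(
    claim="the proposed course prepares students for college",
    data=["students practice planning long-term projects"],
    warrant="project planning is needed for college coursework",
)
print(argument.missing_components())
```

a record like this makes one of the limitations noted above visible: a warrant left as `None` corresponds to a justification the arguer never stated explicitly.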
another collaborative view of argumentative discourse is conceptualized in walton’s (2000) dialogue theory, which holds that argumentation is a goal-directed and interactive dialogical activity during which individuals reason together about arguments to generate one proposed solution. walton (2000) identified specific forms of dialogue (e.g., information seeking, negotiation, persuasion, inquiry) along with argumentation schemes comprised of critical questions and moves to model and support argumentation. educators can draw on the schemes suggested in dialogue theory to plan, organize, and evaluate classroom and online discussions, and use argumentation as a vehicle for critical thinking and problem solving. in the present study an overarching critical question was used to stimulate student argumentation with an imaginary group of stakeholders with the purpose of exploring the potential of a creative solution to an authentic problem.

1.2.1 supporting argumentation

researchers have documented that acquiring the skills to argue effectively is challenging both for adolescents and young adults (felton & kuhn, 2001; reznitskaya et al., 2001). in order to engage students in argumentation and promote the development of argumentation skills, educational researchers have designed and experimented with argumentation supports in the contexts of reading, writing, and problem solving tasks. among these argumentation supports are directions, computerized and face-to-face collaborative argumentation, and visual argumentation aids.

goal directions have been used as a means to promote reflective argumentation in problem solving and writing tasks. for example, nussbaum and sinatra (2003) asked undergraduate students who provided a wrong answer to a physics problem in which they had to predict the path of a falling object to counter-argue by providing reasons why a person would hold an opposing position.
the researchers found that students who proposed counterarguments had a more integrated understanding of the problem situation and the important underlying concepts. the effectiveness of directions in supporting argumentation varied based on the goal they conveyed. when students were instructed to persuade an audience instead of explaining their position or solution, evidence indicated that they engaged in case-building: the overall quality of writing was poorer, with fewer counterarguments but more reasons in justification of their position. alternatively, more specific directions that guided students to generate complete arguments were more effective in facilitating student argumentation. when nussbaum and kardash (2005) gave goal directions that varied in generality (i.e., opinion, reason, counterargue/rebut), they found that the group that received more specific directions to persuade by generating reasons, evidence, counterclaims, and rebuttals produced writing that was of better overall quality and more balanced, and that contained more counterarguments and rebuttals. in a follow-up experiment, nussbaum and kardash (2005) compared the effects of two types of goal directions (i.e., express an opinion or persuade an audience) and the effects of a two-sided non-refutational text. undergraduate students who were directed to express an opinion and read the text produced essays of better overall quality, wrote more elaborative arguments, and offered more counterarguments in comparison to those who were directed to persuade, as the text stimulated students’ thinking. directions to persuade had a significant negative effect on the overall quality of argumentation only for students who did not read the text.
researchers raised caution about persuasion directions, as it is possible that students rely on a misconception that they are more effective in convincing an audience by elaborating on their position than by raising counterarguments (ferretti, macarthur, & dowdy, 2000; nussbaum & kardash, 2005). in order to promote more balanced and reflective reasoning, researchers have utilized other structure supports such as collaborative reasoning discussions and visual argumentation aids, including computer-scaffolding tools, outlines, and diagrams.

researchers have investigated the effect of collaborative argumentation, both computerized and face-to-face, to facilitate students’ critical reasoning and argumentation within the context of problem solving tasks. in two exemplar studies researchers examined the effects of argumentation scaffolds and question prompts on the quality of argumentation, the group problem solving performance, and transfer to individual problem solving (cho & jonassen, 2003; oh & jonassen, 2007). typical argumentation scaffolds included sentence openers that helped to explicate a solution, agree or disagree with a solution, put forward evidence, and elaborate on the solution. in addition, guidance questions functioned as scaffolds (e.g., how can you verify the accuracy or value of your solution?) in the collaborative discussion environments. researchers found that the argumentation scaffolds improved the quality of the discussion in terms of the number of argument components, including claims on how to solve the problem and evidence to support the solution (cho & jonassen, 2003; oh & jonassen, 2007). there was also improvement in the overall quality of problem solving subprocesses, including problem definition, selection of relevant information, hypothesis generation and testing, and solution development and evaluation. however, in both studies the researchers did not detect significant transfer effects of argumentation scaffolding on individual problem solving.
this suggests that learners may need long-term and more comprehensive opportunities for extended engagement in collaborative problem solving to effectively transfer and apply argumentation skills in individual problem solving.

collaborative discourse was also effective in facilitating argumentation when combined with instruction on basic argumentation concepts and reading of multiple texts. in a study of middle school students, marttunen and laurinen (2006) found that after reading three texts and participating in pair conversations on the topic of genetically modified organisms, the student-constructed argumentation diagrams were more elaborative and reflective, as they included more themes and arguments. moreover, kim (2001) found that incorporating a metacognitive group monitoring activity in collaborative reasoning discussions contributed to more dialogic and reflective student writing. in addition, the counterarguments and rebuttals increased in the post-discussion essays, and the essays provided evidence that students were attentive to their reasoning by reflecting on and evaluating their position. in conclusion, guided opportunities in which learners participate in argument-based discourse facilitate development and internalization of argumentation knowledge and skills, and improve both the quality of the arguments and the peer dialogues.

visual argumentation aids such as argument diagrams have been utilized by researchers to promote coherent and organized argumentation with well-integrated arguments and counterarguments in support of a final position. in a series of studies nussbaum and schraw (2007) examined the effectiveness of a graphic organizer to guide more balanced and reflective argumentation. they found that both instruction about the criteria of a good argument and the graphic organizer improved the quality of writing and increased the number of counterarguments and the overall integration score.
however, the students who used the graphic organizer preferred to apply refutation as an integration strategy, whereas students who received criteria instruction primarily used weighing and synthesizing opposing perspectives into a creative position. in a follow-up study that aimed to help students become more metacognitively reflective and explore perspectives on an issue before integrating them into a final position, nussbaum (2008) modified the graphic organizer into an argumentation vee diagram (avd). in this study nussbaum examined whether an elaborative intervention that combines the diagram with instruction on how to integrate arguments and counterarguments and with discussion of the criteria for evaluating the strength of arguments and counterarguments results in better argumentation and has a transfer effect. the experimental group improved their writing in terms of integration over three sessions, using the synthesis strategy most frequently, but there was no significant transfer effect to a task in which the diagram was removed. in another study, when fifth graders collaboratively used an argument diagram, they generated more coherent arguments than when they collaborated to list pro-con positions (schwarz, neuman, & biezuner, 2000). thus, the studies provide evidence that argument diagrams have the potential to stimulate consideration of counterarguments and facilitate more elaborated and coherent argumentation, but more research is needed to determine whether the use of diagrams enhances reflective and critical thinking about complex issues.

virtual graphic tools have also been used to support student argumentation and engagement in critical discussions. typically, computerized argumentation diagrams have the capacity to represent both the components of an argument and relations of support and disagreement (jonassen & kim, 2010).
in their study of the vcri argumentation tool, munneke, van amelsvoort, and andriessen (2003) examined how argumentative diagrams, constructed either individually in advance or collaboratively during an electronic discussion, supported student interaction aimed at writing a collaborative text on genetically modified organisms. the researchers found that diagrams constructed in advance helped students to focus their subsequent discussions on argumentation and were used as information sources, while collaborative diagrams were also used for note-taking to summarize the discussion. thus, the diagram helped to maintain focus and functioned as an aid for organizing and maintaining coherence during the discussion. munneke and colleagues (2003) noted, though, that even if diagrams stimulated collaborative discussions, argumentation was still one-sided, as most diagrams were very unbalanced. another study, conducted by easterday, aleven, and scheines (2007), provided further evidence of the effectiveness of argument diagrams as a graphic organizer for argumentation. learners in this experimental study who analyzed public policy problems using a causal diagram organized their perceptions of the arguments better than students who only read about the problem in a text. however, students who used the diagramming tool learned more about constructing causal arguments, as they were engaged in a more constructive activity while using the tool to formulate their arguments. as newell and colleagues (2011) argued in their review of the studies on teaching and learning argumentation, diagrams, whether printed or virtual, help learners manage the complexities of argumentation, especially the task of considering alternative perspectives and integrating arguments with counterarguments, but more evidence is needed on whether they facilitate more critical and reflective thinking.
1.3 purpose of the study

the purpose of the study was to examine creative problem solving performance in undergraduate students and compare how alternative tasks (i.e., explanation or argumentation) support reflective self-evaluations of creative solutions. two research questions guided our investigation:

1. how do individual differences in divergent thinking, need for cognition, beliefs about creative outcomes, and academic major impact the creativity of a solution with respect to its (a) originality and (b) effectiveness?
2. to what extent does a reflective task (i.e., an explanation task or an argumentation task) differentially support the students’ self-evaluation of their creative solution?

based on the review of the literature, the following five hypotheses were proposed:

hypothesis 1.1. students who are strong divergent thinkers and high in need for cognition will propose creative solutions that are both original and effective.
hypothesis 1.2. students who conceptualize creative solutions as both original and effective will develop a solution that is highly effective and may or may not be as original. these students will also evaluate their creative solutions more positively.
hypothesis 1.3. students who possess more extensive prior knowledge, based on their academic major, will develop highly effective solutions.
hypothesis 2.1. for students who complete the argumentation task, the effectiveness of the proposed creative solution will strongly and positively predict the self-evaluation of the solution with respect to its effectiveness.
hypothesis 2.2. for students who respond to the explanation task, the effectiveness of their proposed creative solution will be less predictive of their self-evaluation of the effectiveness of the solution.

2. method

the purpose of this study was to explore creative problem solving performance in undergraduate students and compare alternative tasks that support reflective self-evaluations of their proposed creative solutions. the study used a single-factor, between-groups design with two comparison groups (i.e., explanation or argumentation task).

2.1 participants

participants were recruited from an undergraduate educational psychology course at a public research university in the united states. the completion rate was 82%, with 103 volunteers completing the study. the sample comprised primarily sophomores (52%) and freshmen (30%), the majority were female (n = 88), and more than half of the participants were education majors (57%); the remaining 43% were non-education majors (e.g., communication sciences and disorders, kinesiology). the demographics were comparable to most introductory courses required for teacher certification.

2.2 measures

2.2.1 demographics

respondents completed a demographic cover page in which they provided background information including their academic major, academic classification, courses they completed in preparation for the transition to college, and courses pertaining to curriculum and instruction that they had enrolled in or completed. participants also listed and described their teaching experiences.

2.2.2 divergent thinking

the two divergent thinking tasks were derived from the tasks in guilford’s consequences test a (christensen, merrifield, & guilford, 1953). for each task students had two minutes to generate as many possible results as they could for each of these hypothetical scenarios: (a) what would happen if a new invention makes it unnecessary for people to eat? (b) what would happen if a new invention makes it unnecessary for people to
the responses were scored for ideational fluency, operationalized as the number of distinct valid ideas recorded by a respondent, excluding duplicates and ideas that were irrelevant because the scenario had been misinterpreted. on average, participants generated m1=6.11 (sd=2.46) ideas on the first divergent thinking task and m2=5.52 (sd=2.27) on the second. because the internal consistency of the two scores was marginal (α=.63), they were entered as separate divergent thinking indicators in the data analysis.

2.2.3 beliefs questionnaire

a 28-item likert scale designed for the purposes of this study was administered to gauge participants’ beliefs about creative outcomes with reference to a creative course. twelve items targeted characteristics of a creative course related to its (a) originality (i.e., innovative, unusual, original, novel, unique, and imaginative) and (b) effectiveness (i.e., successful, affordable, effective, implementable, goal-directed, and feasible). these characteristics are recurring terms that describe creative outcomes in the extant literature on creativity and creative problem solving. the remaining 16 items were distracters. an example of an item on the belief scale is: “creative high school courses are implementable.” participants rated the belief scale items with a score ranging from not very (0) to very (5) to indicate how typical the characteristic is of a creative course.

a factor analysis with principal axis factoring (paf) and a promax rotation was conducted with the 12 characteristics of creative courses to determine the underlying structure of the belief scale. the promax rotation was selected because it aids the interpretation of factor analysis results when the factors are believed to be correlated, as in this case (r12=.52). three factors were extracted with eigenvalues of 37.066, 10.688, and 6.996, respectively, all of which exceeded the criterion of 1.0 based on the kaiser-guttman rule (guttman, 1954; kaiser, 1960).
however, only two underlying factors were detected in the scree plot. the characteristic affordable was the only one that loaded on the third factor, which explained 6.996% of the variance. this characteristic was the only item that targeted financial aspects of a creative course, which is possibly why it failed to load on the first two factors, which represented the effectiveness and originality of a course. thus, this item was removed and a second factor analysis was conducted with the remaining 11 items.

two factors emerged from the final factor analysis with eigenvalues of 4.828 and 1.497, which explained 39.984% and 9.917% of the variation in the data, respectively. as evidenced in table 1, nine items had loadings greater than the harman criterion value of .40. the two detected factors represent underlying characteristics of creative courses, with the first factor representing the effectiveness dimension and the second factor representing the originality dimension of a creative course (r12=.57). the characteristics that underlie the effectiveness of a creative course were successful, effective, innovative, implementable, and feasible. the characteristics that underlie the originality of a creative course were imaginative, unique, novel, and original.

a belief scale with the nine items was formulated, with acceptable internal consistency (α=.87). the internal consistencies of the two component subscales were α1=.84 for effectiveness and α2=.81 for originality. the composite score for the entire belief scale ranged from 0 to 45, with higher scores indicating beliefs in agreement with current conceptualizations of creative outcomes in the literature. the average belief score was m=26.89 (sd=7.29), suggesting that participants’ beliefs were in moderate alignment with these conceptualizations.
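the internal consistency indices reported above are presumably cronbach’s alpha coefficients, and the composite score is a simple sum of the item ratings. as a minimal sketch of both computations, assuming the ratings are stored as one row per participant with one 0-5 rating per item (the function name `cronbach_alpha` and the toy ratings are hypothetical and are not the study’s data):

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of participant rows (one rating per item)."""
    k = len(scores[0])  # number of items on the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # sample variance of each item across participants
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # sample variance of the composite (summed) score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# hypothetical ratings on a 0-5 scale: 5 participants x 3 items (NOT study data)
toy = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(toy)
composite = [sum(row) for row in toy]  # composite score per participant
```

with nine items rated from 0 to 5, the same summation yields the 0 to 45 composite range described above.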
table 1
coefficients for the factor analysis with promax rotation for the beliefs scale

characteristic             effectiveness   originality
successful                      .998          -.153
effective                       .959          -.195
innovative                      .605           .070
implementable                   .509           .181
feasible                        .440           .169
imaginative                     .004           .812
unique                          .081           .757
novel                           .095           .619
original                        .233           .619
unusual                        -.231           .367
goal-directed                   .325           .307
eigenvalues                    4.828          1.497
percentage of variance        39.984          9.917

note. factor loadings >.40 are in boldface.

2.2.4. need for cognition scale

the 18-item abbreviated need for cognition scale (cacioppo, petty, & kao, 1984, p. 306) was administered (α=.79) to assess participants’ tendency to engage in and enjoy effortful cognitive endeavours. an example item from the scale is the following: “i would prefer complex to simple problems”. participants rated the statements with a score ranging from not very much (0) to very much (5). a composite need for cognition score was calculated, ranging from 0 to 90. on average, participants manifested moderate to low need for cognition, m=48.5 (sd=10.46).

2.2.5 solution self-evaluation questionnaire

finally, participants evaluated their creative course on a 16-item likert scale questionnaire developed for the study, which consisted of two distracter items and 14 items that represented criteria of a creative solution with respect to its originality (i.e., innovative, unusual, original, imaginative, novel, unique, and risky) and effectiveness (i.e., effective, successful, affordable, implementable, goal-directed, feasible, and organized). these items reflected descriptive characteristics of creative outcomes identified in the extant theoretical and empirical literature on creativity and creative problem solving. participants rated how creative their proposed solution was based on the aforementioned characteristics on a scale ranging from not very (0) to very (5).
a factor analysis with principal axis factoring (paf) and a promax rotation was conducted with the 14 items after the distracters were removed. two factors emerged with eigenvalues equal to 6.086 and 2.452, which explained 40.58% and 14.01% of the variance, respectively. the factor intercorrelation was moderate (r12=.49). table 2 summarizes the loadings on the two factors based on the pattern matrix. based on the results of the factor analysis, two subscales were formulated: the originality self-evaluation subscale and the effectiveness self-evaluation subscale, with seven items each. both subscales ranged from 0 to 35 and had acceptable internal consistency indices of α1=.87 and α2=.88, respectively. on average, participants evaluated their proposed course solution as low in originality, m=18.96 (sd=6.73), and moderate in effectiveness, m=26.69 (sd=5.38).

table 2
coefficients for the exploratory factor analysis with promax rotation for the self-evaluation scale

characteristic             course effectiveness   course originality
effective                          .81                   .04
successful                         .78                   .10
affordable                         .74                  -.30
organized                          .73                   .10
goal-directed                      .69                   .00
implementable                      .69                  -.09
feasible                           .65                   .04
unique                             .08                   .86
imaginative                       -.02                   .80
unusual                           -.13                   .75
original                           .13                   .68
novel                              .06                   .68
risky                             -.40                   .60
innovative                         .34                   .56
eigenvalues                       6.09                  2.45
percentage of variance           40.58                 14.01

note. factor loadings >.40 are in boldface.

2.3 procedure

participants completed the study through an online survey system (qualtrics) that randomly assigned them to one of two conditions: the explanation task (n1=53) or the argumentation task (n2=50). the study was self-paced, and students completed it in one sitting. students first provided demographic information and responded to two counterbalanced divergent thinking tasks. next, they completed the beliefs questionnaire and the need for cognition scale.
then all participants read the same problem scenario and, in response to it, developed a creative course as a solution to the problem described in the scenario. participants then responded to a reflective task (i.e., an explanation or an argumentation task) about their proposed creative course. finally, all participants evaluated the creativity of their course by rating it on a scale with a set of characteristics that describe the originality and effectiveness of a creative solution.

2.4 problem solving task

the problem scenario was originally developed by hunter and his colleagues (2008) for a study of undergraduate students’ idea generation and problem solving. the specific scenario was selected for two reasons: (a) it had previously been used with undergraduate students and had yielded acceptable interrater agreement scores (0.70-0.80) with respect to the originality and quality scores assigned to the solution, and (b) the embedded problem solving task is ill-structured, as it requires students to extract the important and relevant information from the scenario, identify the parameters and constraints for solving the problem, apply personal beliefs about creative courses and creative teaching, draw on their knowledge and experiences to define the problem, make judgments, and establish criteria for evaluation.

the scenario required participants to assume the role of a high school teacher asked to design a creative college preparatory course for the high school’s seniors to better prepare them for college and reduce the college dropout rate among this high school’s graduates. the final paragraph in the problem scenario explained the task: “in her description of the requirements for the course, the principal makes one point very clear, the senior prep course needs to be a creative high school course designed to prepare the high school students for college. she emphasized that you need to take a creative approach in designing and teaching the course.
the principal has asked you to 1) identify the overall goal of the course and 2) list and describe the specific learning activities that you will include in the course.”

for the purposes of the study, we modified the problem scenario in two ways. first, any descriptions or conceptualizations of a creative course were removed so that participants would rely on their own beliefs and understandings of a creative course. we still emphasized in the scenario that the problem solver needs to take a creative approach in designing and teaching the course. second, in the final paragraph we identified the two specific tasks that participants had to complete after reading the scenario.

we conducted two pilot studies followed by two focus group discussions in order to gather evidence for the comprehensibility and the face validity of the problem scenario and the problem solving task. the pilot study participants were representative of the sample (i.e., non-education and education majors), and in general the students found the directions clear and the scenario understandable. they also acknowledged the authenticity of the problem scenario, pointing out that it challenged them to provide a solution to a real-life problem: the fact that high school students are not prepared for the academic, social, and emotional challenges of the transition to college. pilot study participants said that once they had read the problem scenario they had to pause and reflect on what their own needs were when they moved to college and on what is important for a student to succeed in college. overall, the authenticity of the problem scenario and its relevance to students’ recent college transition experiences seem to have motivated participants to engage with the task, as they agreed on the importance of designing a high school course to prepare students for college.

2.4.1. coding

a coding scheme was developed to summarize the responses that participants provided to the problem solving task. specifically, the scheme was used to code the learning activities participants suggested for their creative high school course. an iterative procedure was followed to develop the coding scheme and establish its validity. the researcher and an independent coder (coder a) applied a keyword content analysis approach to identify the task-relevant units within a response. a task-relevant unit was defined as any distinct task-relevant statement that captured learning activities that participants generated for their high school course. a learning activity was defined as any learning experience, enactive (i.e., actual doing) or vicarious (i.e., students observe, listen or are engaged in other ways), designed for the learners to attain an instructional goal such as the acquisition of information, knowledge, skills, abilities, attitudes and strategies (andiliou, 2012, p. 66).

the first author began by reading all of the responses to generate an initial set of coding categories summarizing what participants had recorded in response to the question prompt. this initial review revealed that participants recorded learning activities as well as assessment activities and other instruction and course design elements such as materials, educational technology, and learning goals. next, the researcher gave coder a directions for independently developing her own version of the coding scheme. the directions included the problem scenario, the problem solving task, a set of coding guidelines and an example of a coded response. coder a then read the entire set of responses to independently generate a second version of the coding scheme.
in the two discussions that followed between the first author and coder a, the two independent coders analyzed, compared, and synthesized the two alternative coding schemes to generate a merged coding guide that included a coding scheme and a set of guidelines with definitions, assumptions, and decision rules. there was consensus that the coding scheme should capture task-relevant units that identified not only learning and assessment activities but also other instruction/course design elements (e.g., instructional materials or educational technology). a total of 47 coding categories were included in the coding scheme, summarized under ten overarching categories including discussion-based activities, problem solving activities, experiential learning activities, and reading and writing assignments (see table 3 for the complete list of categories). the 47 coding categories represented learning or assessment activities, as well as other instruction/course design elements. six of the 47 coding categories were further divided into four additional specific subcodings. for example, the category modelling had two subcodes: instructor model and other models. likewise, the category expository writing had two subcodes: extended and brief. for coding categories with more specific subcodes, the more specific subcode was applied. the coding guide was used for a trial coding (10% of the responses), followed by a discussion to resolve potential differences, refine the scheme, and clarify the decision rules.

following the development and validation of the coding scheme, the intercoder agreement for the reliability of the coded responses was examined. the researcher and coder a independently coded another 20% of the responses for this purpose. in each response, the coders (a) identified the total number of task-relevant units and (b) coded the task-relevant units, including the learning activities or other instruction/course design elements.
a total of n=349 valid task-relevant units were recorded by the participants. the intercoder agreement for the total number of task-relevant units was α=.79 and for the type of code assigned to a task-relevant unit was α=.72. both indices were above the moderate criterion of .70 selected for the conservative kalpha coefficient of intercoder agreement. in a discussion that followed, the coders first resolved disagreements on the number of task-relevant units and then disagreements on the assigned codes in order to reach consensus.

2.4.2. scoring

the creativity of a course that participants proposed was operationalized with respect to the average originality and effectiveness of the valid task-relevant units. each valid task-relevant unit that a participant recorded was scored for its originality and effectiveness. among the valid task-relevant units, 315 units were classified as learning or assessment activities. another 34 task-relevant units represented instruction/course design elements, which referred to aspects of instruction or course design including materials, educational technology, the structure of the course, and the learning environment.

originality: originality was defined as the rareness of occurrence of a task-relevant unit within the pool of valid units (n=349) generated by all participants. the originality score (x) assigned to a valid task-relevant unit (i) was the rareness proportion of the specific code within the pool of (a) learning/assessment activity units or (b) instruction/course design elements, depending on the nature of the coded unit. for example, a task-relevant unit with the code instructor modelling appeared 55 times, so its proportion of occurrence within the pool of learning/assessment activities was 55/315=0.18 and its rareness of occurrence was 1-0.18=.82.
similarly, a task-relevant unit with the code educational technologies appeared 6 times within the pool of the 34 instruction/course design elements; thus, its proportion of occurrence was 6/34=0.18 and its rareness of occurrence was 1-0.18=.82. the average originality score for the solution proposed by a participant (j) was:

$$\bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_i$$

that is, the average originality score ($\bar{x}_j$) is the sum of the rareness proportions ($x_i$) of every ith valid task-relevant unit divided by the number of valid task-relevant units ($m$) recorded by the participant.

effectiveness: the potential effectiveness of a learning activity was defined as the degree to which a learning or assessment activity or other instruction/course design element could contribute to a smooth transition and academic success during the first years of college. an effectiveness rubric was developed to operationalize and score each task-relevant unit (see appendix). the rubric was developed by drawing on the literatures of instructional design and of college transition and persistence (eggen & kauchak, 2010; goldrick-rab et al., 2007; louie, 2007; pritchard et al., 2007; roe clark, 2005). the effectiveness scores ranged from inadequate (0) to strong (4) effectiveness. the effectiveness of a learning activity or other task-relevant unit was considered strong if (a) it targeted important information, knowledge, abilities, skills, or strategies for a smooth transition and academic success in the first years of college and (b) it strongly aligned with (i.e., was directly relevant to) the identified overall goal of the course. examples of potentially effective activities include those that targeted writing, note taking, and test taking skills, but also coping strategies and interpersonal skills. in order to establish the reliability of the effectiveness scores, another colleague was trained to serve as a second rater using a subset of the pilot data.
next, the researcher and the second rater independently scored two subsets of the data to reach an acceptable interrater agreement level (α=.82). when all valid task-relevant units had been scored for their potential effectiveness, an average effectiveness score for a solution was estimated by applying the following formula:

$$\bar{y}_j = \frac{1}{m}\sum_{i=1}^{m} y_i$$

that is, for each participant (j), the average effectiveness score ($\bar{y}_j$) is the sum of the effectiveness scores ($y_i$) of every ith valid task-relevant unit that the participant generated divided by the number of valid units ($m$) that the participant proposed.

2.5.1 reflective task

in both experimental conditions participants completed one of two alternative post problem solving tasks. students in the explanation condition (n1=53) were directed to “provide an explanation of their high school course to the school board members”. the explanation task required a short written response to this prompt. students in the argumentation condition (n2=50) completed an argument diagram. the argumentation diagram is a modified argumentation vee diagram¹ (nussbaum, 2008) adapted (a) for the online administration of the study and (b) to constrain participants to use weighing as the integration strategy between the arguments and counterarguments. figure 1 presents the argumentation diagram administered in this study. the overarching question asked whether the proposed course is a potentially creative course. the participants generated reasons in favour of their creative course and corresponding potential objections of the school board. then they were directed to reread the reasons and objections and decide for each pair whether the reason or the objection was stronger. participants could offer up to 5 pairs of reasons and objections.
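the originality and effectiveness scoring described in section 2.4.2 can be sketched in a few lines. the sketch below assumes that each valid unit is stored as a (pool, code, effectiveness rating) tuple; the function names, the pool labels, and the example participant are hypothetical, and only the counts 55 of 315 and 6 of 34 are taken from the text.

```python
from statistics import mean

# pool sizes reported in the text: 315 learning/assessment units, 34 design elements
POOL_SIZES = {"activity": 315, "design": 34}

# the two code frequencies quoted in the text; a full analysis would list every code
CODE_COUNTS = {
    ("activity", "instructor modelling"): 55,
    ("design", "educational technologies"): 6,
}


def rareness(pool, code):
    """Originality of one unit: 1 minus the code's proportion within its pool."""
    return 1 - CODE_COUNTS[(pool, code)] / POOL_SIZES[pool]


def average_scores(units):
    """Mean originality and mean effectiveness over one participant's valid units."""
    originality = mean(rareness(pool, code) for pool, code, _ in units)
    effectiveness = mean(rating for _, _, rating in units)
    return originality, effectiveness


# hypothetical participant with two valid units; the 0-4 ratings are made up
participant = [
    ("activity", "instructor modelling", 3),
    ("design", "educational technologies", 4),
]

avg_originality, avg_effectiveness = average_scores(participant)
```

without intermediate rounding, the rareness of instructor modelling comes out at about .83; the text rounds the proportion 55/315 to 0.18 first and therefore reports .82.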
¹ two pilot studies (n1=19; n2=9) and focus group discussions were conducted to ensure that the modified argumentation diagram is comprehensible and that participants are able to complete the diagram. please contact the first author for information on the pilot studies.

figure 1. the argumentation diagram utilized in the study.

3. results

the purpose of the study was to examine problem solving performance and identify reflective tasks that better support students’ self-evaluations of their proposed creative solutions. participants completed a set of individual difference measures online before responding to the problem solving task, in which they assumed the role of a high school teacher who was asked to design a creative college preparatory course for the high school’s senior students. the participants identified the overall goal of their high school course and generated specific learning activities for the course. they then completed an explanation or argumentation reflective task and rated the creative course in terms of its originality and effectiveness. a summary of the creative solutions that participants put forward is followed by the descriptive statistics and the statistical models used to answer the two research questions.

3.1 the creative solutions

participants designed a creative course to reduce the high college dropout rate among the high school’s graduates and better prepare them for the transition to college. participants listed and described specific learning activities that they would implement in their course. among the most widely referenced learning activities within the pool of valid task-relevant units generated by the respondents were instructor-led activities in which the instructor or another more experienced individual (e.g., a guest speaker) was responsible for providing instructional support such as presenting content and sharing experiences.
equally popular were activities that were based on experiential learning, such as simulations, fieldtrips, and student presentations.

table 3
frequency of occurrence of overarching categories within valid task-relevant units (n=349)

overarching category          frequency (f)   percentage (%)
discussion                         18              5.16
warm up                             4              1.15
instructor led                     94             26.93
problem solving                     6              1.72
experiential learning              93             26.65
research                           13              3.72
writing assignments                58             16.62
reading assignments                 3              0.86
classroom assessment               26              7.45
instruction/course design          34              9.74

other learning activities included writing (i.e., expository, persuasive, reflective, organizational aids) and reading assignments (e.g., textbooks, articles, or reports), discussions including student-centered discussions, debates, and discussions with experts. moreover, participants identified research activities, for example searching information about an academic topic, and searching about potential careers and colleges, and learning activities based on problem solving (e.g., decision making). classroom assessments such as formative, summative, and diagnostic assessments were included in participants’ proposed learning activities (n1=26; 7.45%). several students (n2=34; 9.74%) suggested other instructional or course design elements such as materials and educational technologies. it is possible that these students had interpreted the prompt more broadly than intended such that they provided ideas about how they would organize the course and plan instruction to attain the goals of the creative course.

3.2 predictors of creative solutions

in the first research question we examined the extent to which individual difference variables including divergent thinking, need for cognition, beliefs about creative outcomes, and academic major impact the creativity of a proposed solution in terms of its average originality and potential effectiveness.
across the sample, the mean average originality score was high, m=0.9 (sd=0.09), and the mean average effectiveness score was moderate, m=3.23 (sd=0.59). descriptive statistics for the three continuous predictors are presented in table 4. with respect to their academic major, 57% of the sample were education majors and 43% non-education majors (e.g., communications, kinesiology, or human development). participants exhibited moderate divergent thinking ability, and on average they generated six valid ideas. considerably more variability was evident in participants’ need for cognition, which was moderate to low.

table 4
means and standard deviations of predictors of creative solutions

variables                    m       sd      range
divergent thinking (i)       6.11    2.42
divergent thinking (ii)      5.54    2.23
need for cognition          48.50   10.50    0-90
beliefs                     26.92    7.31    0-45

moreover, the mean score for students’ beliefs about creative outcomes was m=26.89 (sd=7.29), which indicates that participants’ beliefs were somewhat in alignment with conceptualizations in the literature. participants rated highly the characteristics of creative outcomes pertaining to their effectiveness [i.e., feasible, m=3.98 (sd=1.24); effective, m=3.42 (sd=1.04); and successful, m=3.23 (sd=1.12)]. they also acknowledged as important the characteristics describing the originality of a creative course, for example, innovative, m=3.25 (sd=1.21), and imaginative, m=3.02 (sd=1.05). this result suggests that participants took into consideration the context of schooling and appreciated not only the originality but also the effectiveness of a course as an important quality of a creative course.

two separate regression models were conducted to determine the predictors of the average originality and effectiveness of a solution, since the correlation between the two outcome variables of average originality and potential effectiveness was non-significant (r=.07, p=.5).
because the assumptions concerning the residuals were violated, an ordinal regression model was conducted instead of a multiple regression to determine the predictors of average solution originality. thus, the dependent variable was transformed into an ordinal variable with three levels of average originality (i.e., low, moderate, high) to determine the cumulative odds ratio of proposing a solution of high originality. high average originality (≥.94) was manifested by 52 participants, moderate average originality (.86≤ y ≤.93) was exhibited by 30 participants, and 21 participants scored low (≤.85) in average originality.

table 5
predictors of solution originality

variable                     estimate   wald    p     confidence intervals
threshold
  low                         -.828     0.47   .49    [-3.19, 1.53]
  moderate                     .572     0.23   .63    [-1.78, 2.93]
parameter
  divergent thinking (ii)      0.20*    4.82   .03    [.02, 0.38]
  beliefs                      0.01      .03   .47    [-0.05, 0.02]
  academic major               0.50     0.02   .90    [-0.73, 0.83]
  need for cognition          -0.01     0.51   .86    [-0.05, 0.02]

the initial full ordinal regression model was non-significant. the non-significant predictors were removed stepwise and the ordinal regression model reached significance (-2ll=67.66, χ2(4)=5.08, p=0.02), with divergent thinking (task ii) being the only significant predictor (table 5). divergent thinking positively predicted average solution originality: for each unit increase in divergent thinking, the cumulative odds of developing a solution of poorer originality (low or moderate) were multiplied by a factor of 0.82, that is, exp(-0.20)≈0.82.

a multiple linear regression with the same individual difference variables as predictors was performed (see table 6) to determine their effect on the average effectiveness of creative solutions. the full model was significant but explained a modest amount of variation [f(4,97)=3.51, p=0.01, r2=0.13]. academic major and need for cognition positively predicted the average effectiveness of a creative solution.
for education majors, a solution was on average more effective than a solution proposed by a non-education major (b=0.23, β=0.24, p=.02). in addition, need for cognition positively predicted solution effectiveness (b=0.01, β=0.21, p=.03).

table 6
predictors of solution effectiveness

variable                     b        β       p       confidence intervals
constant                     2.72**           <.001   [2.18, 3.27]
divergent thinking (ii)      0.03     0.12    .22     [-0.02, 0.07]
beliefs                     -0.01    -0.1     .32     [-0.02, 0.01]
academic major               0.23     0.24    .02*    [0.04, 0.43]
need for cognition           0.01     0.21    .03*    [0.001, 0.02]

3.3 creative solution self-evaluation

in the second research question we explored the effect of two alternative reflective tasks, the explanation task and the argumentation task, on the self-evaluations of the creative solution with respect to its originality and effectiveness. a multivariate multiple regression (mmr) was performed with four predictors as covariates and the type of reflective task (1=explanation, 2=argumentation) as the fixed factor in the model. the covariates were beliefs about creative outcomes, academic major, and the average assigned originality and effectiveness scores. the mmr analysis was conducted because the two outcome variables, namely the effectiveness and originality self-evaluations, were significantly and positively correlated (r12=.38, p<.001).

table 7
predictors of creative solutions self-evaluations

variable                 hotelling’s trace    f       p       partial η2   observed power
intercept                     .20           11.67    .001       .20           .99
beliefs                       .27           17.31    .001*      .27          1.00
average originality           .03            1.34    .27        .03           .28
average effectiveness         .001            .001   .99        .001          .05
academic major                .02             .77    .47        .02           .18
condition                     .11            5.77    .004*      .11           .86

the mmr model was significant [f(2,94)=11.67, p<.001, η2=.20]. the type of reflective task [f(2,93)=5.77, p=.004, η2=.11] and beliefs about creative outcomes [f(2,93)=17.31, p<.001, η2=.27] had a significant effect on the self-evaluations (see table 7).
specifically, the type of reflective task significantly predicted the effectiveness self-evaluations [f(1,94)=11.23, p<.001, η2=.11]. participants in the argumentation condition rated the effectiveness of their creative course 2.81 points lower [p=.001, 95% ci (1.14, 4.47)] than participants in the explanation condition when all other predictors were equal. thus, the argument diagram was a structured support that seems to have promoted more conservative self-evaluations of the proposed creative solution. participants’ beliefs about the characteristics of creative outcomes were a significant positive predictor of the self-evaluations of originality [f(1,94)=14.63, p<.001, η2=.14] and effectiveness [f(1,94)=28.74, p<.001, η2=.23] of a proposed creative solution. in fact, participants whose beliefs better aligned with the current conceptualizations of creative outcomes rated the creativity of their solution higher in terms of both its originality and its effectiveness.

4. discussion

students across education levels are challenged to acquire complex cognitive skills including creative thinking. in this study we examined the individual difference variables that contribute to creative performance in problem solving with respect to the originality and effectiveness of a proposed creative solution. in addition, we attempted to address a gap in the literature related to the effect of argumentation tasks on the self-evaluation of creative solutions.

the major contribution of the present study is the development of the creative solution self-evaluation questionnaire, a reliable rating scale that can be administered by teachers and used by students to evaluate creative solutions, ideas, and products with respect to a set of originality and effectiveness criteria.
however, the self-evaluation scale needs to be further validated to determine whether it yields the same underlying structure for creative outcomes across fields. it is also possible that additional criteria have to be met for an outcome to be judged as creative in a different field, since influential individuals in each field evaluate ideas based on some consensus about the contribution of an idea to the field (andiliou, 2010; csikszentmihalyi, 1999). our investigation also contributes to the research efforts to identify the cognitive and affective variables that predict the creative performance of novices. the findings of the study align with previous research findings regarding the predictors of creative performance. divergent thinking was reported as a predictor of creative problem solving (diakidoy & constantinou, 2001; hunter et al., 2008; reiter-palmon et al., 1997), and this ability to generate various, distinct responses to a divergent thinking task was found in this study to be the single significant predictor of the originality of a proposed creative solution. the effectiveness of a creative solution was predicted by affective and cognitive variables, specifically, need for cognition and academic major. the findings add to the existing evidence, which shows that individuals with high need for cognition perform more effectively when solving complex problems (butler et al., 2003; nair & ramnarayan, 2000; osburn & mumford, 2006). however, it is worrisome that participants reported moderate to low need for cognition, since this cognitive disposition to enjoy effortful and challenging endeavours represents a prerequisite for lifelong learning and continued professional development, especially for future educators. academic major served as a prior knowledge proxy and it positively predicted the effectiveness of the creative solution.
in the future, researchers who aim to examine creative problem solving in specific disciplines could administer measures of domain knowledge such as the pedagogical/psychological (ppk) knowledge measure of general pedagogical knowledge (voss, kunter, & baumert, 2011) instead of relying on proxies of prior knowledge, which was a limitation in this study. in the present study we also aimed to investigate the type of tasks that support more reflective self-evaluations of creative solutions. the findings of the study provide some indication that argumentation tasks facilitate more critical self-evaluations of the effectiveness of creative solutions. participants who completed the argument diagram rated the effectiveness of their course more conservatively in comparison to those who responded to the explanation prompt, possibly because the argument diagram provided a structural support for students to elaborate, reflect more deeply, and critically analyze their proposed solution by considering alternative perspectives held by other stakeholders (jonassen & kim, 2010; nussbaum & sinatra, 2003; suthers, 2001). further, andriessen cited baker (2004) to argue that argumentation is a mechanism through which students not only provide explanations but also prepare a justification to explicitly describe their rationale, which fosters better reflection. munneke and colleagues (2003) also argued that, as a knowledge representation tool, a diagram explicitly presents the structure of argumentation, thus providing an overview and making components and perspectives more visible. the fact that participants were more conservative about their solutions after completing the argumentation diagram provides some evidence for voss’s (1981) idea that argumentation is a mechanism that allows students to not only elaborate and clarify the solution but also identify potential limitations, thus becoming more critical of their solution.
however, a study based on a think aloud procedure is needed to provide stronger evidence on how students reflect on their solutions and whether the argument diagram itself promotes more reflective self-evaluations. given that in the present study the design of the argument diagram guided students to apply the weighing argument-counterargument integration strategy, a follow-up study with a think aloud methodology would allow for a more authentic assessment of the integration strategies (e.g., synthesis, refutation, and minimization) that students choose to apply in tasks that require a creative solution that realizes benefits and minimizes disadvantages. the findings of the study confirm the important role of beliefs as an affective variable that impacts problem solving with regard to the self-evaluations of a proposed creative solution rather than creative performance per se. in fact, participants whose beliefs about the characteristics of creative outcomes aligned better with current conceptualizations in the literature rated both the originality and effectiveness of their solutions more positively. this finding signals the need for educators to pay more attention to affective dimensions of learning, including students’ beliefs, since they inform critical thinking such as the self-evaluations of solutions. teachers need to provide students with opportunities to explicate, contradict, and enrich their beliefs through classroom discussions, encounters with creative individuals, and exposure to examples of creative work across domains. when practitioners realize that students’ beliefs are narrow or naïve, they can also administer rating scales in advance to provide students with criteria for their self-evaluations. the finding also confirms that ontological beliefs about the nature of creative outcomes play an important role in the self-evaluation process.
educational researchers have shown interest in examining how epistemological beliefs impact problem solving performance (lodewyk, 2007; muis, 2008; oh & jonassen, 2007), but further research can be conducted in other knowledge domains by using approaches such as think aloud protocols, interviews, and classroom discourse to provide additional evidence on the role of ontological beliefs in creative problem solving, in which learners have to draw on their creativity beliefs to define the problem and establish criteria to evaluate a potentially creative solution. thinking as argument is implicated in the beliefs that people hold, the judgments they make, and the conclusions they come to; it arises every time a significant decision must be made (jonassen & kim, 2010, p.439). drawing on the findings of this study, we encourage educators who aim to facilitate students’ critical thinking to use argument-based tasks in the form of diagrams to support students in generating, organizing, and evaluating their arguments and counterarguments in order to make more reflective evaluations during problem solving.

keypoints
creative solutions were operationalized as original and effective, and innovative procedures were used to measure these dimensions of creative outcomes.
the theoretical frame emerged from a literature review, which integrates two lines of research, namely ill-structured and creative problem solving.
the findings confirmed that argumentation diagrams can support reflective critical evaluations not only in writing tasks but in problem solving as well.
a reliable self-evaluation scale was developed to assess characteristics of creative solutions, which students and teachers can use to evaluate creativity.

references
anderson, l. w., & krathwohl, d. r. (eds.). (2001). a taxonomy of learning, teaching, and assessment: a revision of bloom's taxonomy of educational objectives. new york: longman.
andriessen, j. (2006). arguing to learn. in k. sawyer (ed.),
handbook of the learning sciences (pp. 443-459). cambridge: cambridge university press.
andiliou, a., & murphy, p. k. (2010). examining variations among researchers’ and teachers’ conceptualizations of creativity: a review and synthesis of contemporary research. educational research review, 4(3), 201-219.
andiliou, a. (2012). the effect of an argumentation diagram on the self-evaluation of a creative solution (unpublished doctoral dissertation). the pennsylvania state university, university park, pa.
basadur, m., runco, m. a., & vega, l. a. (2000). understanding how creative thinking skills, attitudes and behaviors work together: a causal process model. journal of creative behavior, 34(2), 77-100.
butler, a. b., scherer, l. l., & reiter-palmon, r. (2003). effects of solution elicitation aids and need for cognition on the generation of solutions to ill-structured problems. creativity research journal, 15(2-3), 235-244. doi:10.1207/s15326934crj152&3_13.
byrne, c. l., shipman, a. s., & mumford, m. d. (2010). the effects of forecasting on creative problem-solving: an experimental study. creativity research journal, 22(2), 119-138.
cacioppo, j. t., petty, r. e., & kao, c. f. (1984). the efficient assessment of need for cognition. journal of personality assessment, 48(3), 306-307.
chen, c., & bradshaw, a. c. (2007). the effect of web-based question prompts on scaffolding knowledge integration and ill-structured problem solving. journal of research on technology in education, 39(4), 359-375.
chi, m. t. h. (2000). self-explaining expository texts: the dual processes of generating inferences and repairing mental models. in r. glaser (ed.), advances in instructional psychology. hillsdale, nj: lawrence erlbaum associates.
cho, k., & jonassen, d. h. (2003). the effects of argumentation scaffolds on argumentation and problem solving. educational technology research and development, 50(3), 5-22.
christensen, p. r., merrifield, p. r., & guilford, j. p. (1953). consequences form a-1.
beverly hills, ca: sheridan supply.
csikszentmihalyi, m. (1999). implications of a systems perspective for the study of creativity. in r. j. sternberg (ed.), handbook of creativity (pp. 313-335). new york, ny: cambridge university press.
dailey, l. r., & mumford, m. d. (2006). evaluative aspects of creative thought: errors in appraising the implications of new ideas. creativity research journal, 18(3), 367-384.
diakidoy, i. n., & constantinou, c. p. (2001). creativity in physics: response fluency and task specificity. creativity research journal special issue: commemorating guilford's 1950 presidential address, 13(3-4), 401-410.
eggen, p., & kauchak, d. (2010). educational psychology: windows on classrooms (8th ed.). new jersey: pearson education.
easterday, m. w., aleven, v., & scheines, r. (2007). tis better to construct or to receive? effect of diagrams on analysis of social policy. in r. luckin, k. r. koedinger, & j. greer (eds.), proceedings of the 13th international conference on artificial intelligence in education (pp. 93-100). amsterdam: ios.
felton, m., & kuhn, d. (2001). the development of argumentative discourse skill. discourse processes, 32(2&3), 135-153.
ferretti, r. p., macarthur, c. a., & dowdy, n. s. (2000). the effects of an elaborated goal on the persuasive writing of students with learning disabilities and their normally achieving peers. journal of educational psychology, 92(4), 694-702.
fonseca, b. a., & chi, t. h. (2011). instruction based on self-explanation. in r. e. mayer & p. alexander (eds.), handbook of research for learning and instruction. new york, ny: routledge.
ge, x., chen, c., & davis, k. a. (2005). scaffolding novice instructional designers' problem-solving processes using question prompts in a web-based learning environment. journal of educational computing research, 33(2), 219-248.
ge, x., & land, s. m. (2003).
scaffolding students' problem-solving processes in an ill-structured task using question prompts and peer interactions. educational technology research and development, 51(1), 21-38. doi:10.1007/bf02504515.
goldrick-rab, s., carter, d. f., & wagner, r. w. (2007). what higher education has to say about the transition to college. teachers college record, 109(10), 2444-2481.
hunter, s. t., bedell-avers, k. e., hunsicker, c. m., mumford, m. d., & ligon, g. s. (2008). applying multiple knowledge structures in creative thought: effects on idea generation and problem-solving. creativity research journal, 20(2), 137-154.
isaksen, s. g., & treffinger, d. j. (1985). creative problem solving: the basic course. buffalo, ny: bearly limited.
jonassen, d. h. (1997). instructional design models for well-structured and ill-structured problem-solving learning outcomes. educational technology: research & development, 45(1), 65-94.
jonassen, d. h., & kim, b. (2010). arguing to learn and learning to argue: design justifications and guidelines. educational technology: research & development, 58, 439-457.
kim, s. (2001). the effects of group monitoring on transfer of learning in small group discussions. unpublished doctoral dissertation, university of illinois at urbana-champaign.
kuhn, d. (1991). the skills of argument. cambridge, uk: cambridge university press.
leitão, s. (2003). evaluating and selecting counterarguments. written communication, 20, 269-306.
lodewyk, k. r. (2007). relations among epistemological beliefs, academic achievement, and task performance in secondary school students. educational psychology, 27(3), 307-327.
louie, v. (2007). who makes the transition to college? why we should care, what we know and what we need to do. teachers college record, 109(10), 2222-2251.
marttunen, m., & laurinen, l. (2006). collaborative learning through argument visualisation in secondary school. in s. n. hogan (ed.), trends in learning research (pp. 119-138).
hauppauge, ny, us: nova science publishers.
muis, k. r. (2008). epistemic profiles and self-regulated learning: examining relations in the context of mathematics problem solving. contemporary educational psychology, 33, 177-208.
mumford, m. d., mobley, m. i., uhlman, c. e., reiter-palmon, r., & doares, l. m. (1991). process analytic models of creative thought. creativity research journal, 4, 91-122.
munneke, l., van amelsvoort, m., & andriessen, j. (2003). the role of diagrams in collaborative argumentation-based learning. international journal of educational research, 39, 113-131.
nair, k. u., & ramnarayan, s. (2000). individual differences in need for cognition and complex problem solving. journal of research in personality, 34(3), 305-328.
newell, g. e., beach, r., smith, j., & vanderheide, j. (2011). teaching and learning argumentative reading and writing: a review of research. reading research quarterly, 46(3), 273-304.
nussbaum, e. m. (2008). using argumentation vee diagrams (avds) for promoting argument-counterargument integration in reflective writing. journal of educational psychology, 100(3), 549-565.
nussbaum, e. m., & schraw, g. (2007). promoting argument-counterargument integration in students’ writing. journal of experimental education, 76, 59-92.
nussbaum, e. m., & kardash, c. m. (2005). the effects of goal instructions and text on the generation of counterarguments during writing. journal of educational psychology, 97, 157-169.
nussbaum, e. m., & sinatra, g. m. (2003). argument and conceptual engagement. contemporary educational psychology, 28, 384-395. doi:10.1016/s0361-476x(02)00038-3
oh, s., & jonassen, d. h. (2007). scaffolding online argumentation during problem solving. journal of computer assisted learning, 23(2), 95-110.
osburn, h. k., & mumford, m. d. (2006). creativity and planning: training interventions to develop creative problem-solving skills. creativity research journal, 18(2), 173-190.
pritchard, m.
e., wilson, g., & yamnitz, b. (2007). what predicts adjustment among college students? a longitudinal panel study. journal of american college health, 56(1), 15-21.
reiter-palmon, r., illies, m. y., cross, l. k., buboltz, c., & nimps, t. (2009). creativity and domain specificity: the effect of task type on multiple indexes of creative problem-solving. psychology of aesthetics, creativity, and the arts, 3(2), 73-80.
reiter-palmon, r., mumford, m. d., o'connor boes, j., & runco, m. a. (1997). problem construction and creativity: the role of ability, cue consistency and active processing. creativity research journal, 10(1), 9-23.
reznitskaya, a., anderson, r. c., mcnurlen, b., nguyen-jahiel, k., archondidou, a., & kim, s. y. (2001). influence of oral discussion on written argument. discourse processes, 32(2&3), 155-175.
roe clark, m. (2005). negotiating the freshman year: challenges and strategies among first-year college students. journal of college student development, 46(3), 296.
runco, m. a., & chand, i. (1994). problem finding, evaluative thinking, and creativity. in m. a. runco (ed.), problem finding, problem solving, and creativity (pp. 40-76). westport, ct, us: ablex publishing.
schwarz, b. b., neuman, y., & biezuner, s. (2000). two wrongs may make a right ...if they argue together! cognition and instruction, 18(4), 461-494.
shin, n., jonassen, d. h., & mcgee, s. (2003). predictors of well-structured and ill-structured problem solving in an astronomy simulation. journal of research in science teaching, 40(1), 6-33.
suthers, d. d. (2001). towards a systematic study of representational guidance for collaborative learning discourse. journal of universal computer science, 7(3), 254–277.
toulmin, s. e. (1958). the uses of argument. cambridge, england: cambridge university press.
uribe, d., klein, j. d., & sullivan, h. (2003). the effect of computer-mediated collaborative learning on solving ill-defined problems.
educational technology research & development, 51(1), 5-19.
van eemeren, f., & grootendorst, r. (1999). developments in argumentation theory. in j. andriessen & p. coirier (eds.), foundations of argumentative text processing (pp. 43-57). amsterdam: amsterdam university press.
van eemeren, f. h., & grootendorst, r. (1992). argumentation, communication, and fallacies: a pragma-dialectical perspective. hillsdale, nj: lawrence erlbaum associates.
voss, j. f., & post, t. a. (1988). on the solving of ill-structured problems. in m. t. h. chi, r. glaser, & m. j. farr (eds.), the nature of expertise (pp. 261-285). hillsdale, nj: lawrence erlbaum associates.
voss, j. f., wolfe, c. r., lawrence, j. a., & engle, r. a. (1991). from representation to decision: an analysis of problem solving in international relations. in r. j. sternberg & p. a. frensch (eds.), complex problem solving: principles and mechanisms (pp. 119-158). hillsdale, nj, england: lawrence erlbaum associates.
voss, t., kunter, m., & baumert, j. (2011). assessing teacher candidates’ general pedagogical and psychological knowledge: test construction and validation. journal of educational psychology, 103(4), 952-969.
walton, d. (2000). the place of dialogue theory in logic, computer science, and communication studies. synthese, 123, 327-346.
walton, d. n. (1996). argumentation schemes for presumptive reasoning. mahwah, nj: lawrence erlbaum associates.
weisberg, r. w. (2006). expertise and reason in creative thinking: evidence from case studies and the laboratory. in j. c. kaufman & j. baer (eds.), creativity and reason in cognitive development (pp. 7-42). new york: cambridge university press.
wiley, j., & voss, j. f. (1999). constructing arguments from multiple sources: tasks that promote understanding and not just memory for text. journal of educational psychology, 91(2), 301-311.
appendix
effectiveness scoring rubric

table 8
effectiveness scoring rubric for the coded task-relevant units

| score | descriptor |
|---|---|
| 4-strong | the learning activity or other instruction/course design element targets important information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. strongly aligns with the overall goal of the course. |
| 3-moderate (weak/strong) or (strong/weak) | the learning activity or other instruction/course design element targets somewhat important information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. strongly aligns with the overall goal of the course. or: the learning activity or other instruction/course design element targets important information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. weakly aligns with the overall goal of the course. |
| 2-weak (weak/weak) | the learning activity or other instruction/course design element targets somewhat important information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. weakly aligns with the overall goal of the course. |
| 1-insufficient (weak/inadequate) or (inadequate/weak) | the learning activity or other instruction/course design element targets somewhat important information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. does not align with the overall goal of the course. or: the learning activity or other instruction/course design element does not target information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. weakly aligns with the overall goal of the course. |
| 0-inadequate (inadequate/inadequate) | the learning activity or other instruction/course design element does not target information, knowledge, abilities, skills and strategies for academic success in college or smooth transition to college. does not align with the overall goal of the course. |

frontline learning research vol.4 no. 4 special issue (2016) 30-38
issn 2295-3159
corresponding author: william r. penuel, school of education, university of colorado boulder, ucb 249, boulder, co 80305 usa. email: william.penuel@colorado.edu
doi: http://dx.doi.org/10.14786/flr.v4i4.205

a social practice theory of learning and becoming across contexts and time

william r. penuel, daniela k. digiacomo, katie van horne, ben kirshner
university of colorado, united states

article received 7 september / revised 9 august / accepted 12 september / available online 19 december

abstract

this paper presents a social practice theory of learning and becoming across contexts and time. our perspective is rooted in the danish tradition of critical psychology (dreier, 1997; mørck & huniche, 2006; nissen, 2005), and we use social practice theory to interpret the pathway of one adolescent whom we followed as part of a longitudinal study of interest-related learning. a social practice theory calls out the ways people pursue diverse concerns, become aware of new possibilities for action as they move across settings of practice, and learn as they adjust contributions to the flow of ongoing activity and to fit demands and structures of local institutions. it also highlights the ways that existing institutional structures of practice frame the choices people make about how and where to participate in activities.
this perspective on learning is potentially transformative, in that it provides a way to promote equity by surfacing issues associated with linkages among settings of practice, networks of actors who support persons’ movement across settings, and diversities in structures of practices that shape opportunities to learn and become.

keywords: social practice theory; learning; agency; equity

penuel et al | f l r 31

1. introduction

education systems around the world are experimenting with different ways to prepare youth to participate in, and lead, the global “knowledge economy.” in the united states, where we live, this has meant a push for higher quality science, technology, engineering, and math (stem) learning opportunities, more stem graduates from universities, and greater access to stem learning for groups that have been historically excluded from quality schooling. although the equity-related goals of expanding stem access are laudable, we worry that too many stem policies and educational interventions rest on flawed, inaccurate understandings of what it means to develop and sustain a learning pathway. put briefly, where curriculum developers privilege narrowly defined maps of cognitive learning progressions required for deep understanding, we propose a more expansive view of pathways that embeds learning and development in the broader context of social practice. ours is a social, materialist conception of pathways that has roots within the danish school of critical psychology (dreier, 1997; mørck & huniche, 2006; nissen, 2005). learning pathways unfold over time across multiple settings. the temporal dimension of learning is always situated within the evolution of broader social practices and institutions. as such, learning is part of persons’ “changing participation in changing practices” (lave, 1996, p. 150).
the spatial dimension involves a form of movement of actors within and across the different social contexts of their lives (gutiérrez, 2008). as practices evolve and spaces are reorganized, people pursue diverse and evolving concerns and imagine new possibilities for themselves, thereby becoming different people by gathering, adopting, and pursuing stances toward what they do and value (dreier, 2008). at the same time, political borders and historically rooted but persistent inequities shape who people become and where they can move; for many, these constraints are experienced as disruptions and threats to their well-being and social worlds that must be actively managed (gonzales, 2011; mørck, 2010). historically and still today, research on learning has proceeded from a different set of assumptions about the nature of learning pathways. for example, many studies of learning in neuroscience and psychology focus on brief experiments that take place in a single setting (e.g., mcdaniel, agarwal, huelser, mcdermott, & roediger iii, 2011). in education, where there is more research on complex classroom processes that unfold over time, few learning researchers have studied how these processes are embedded within changing social and political structures of schools and societies (lave & mcdermott, 2002; penuel, 2016). there is limited attention to how inequities are reproduced within educational systems through the production of failure, that is, by making educational success into a scarce resource and disproportionately identifying individuals from disenfranchised groups “disabled” or “failures” who should be held to account for their own plight (lave, 1996; varenne & mcdermott, 1998). in this conceptual paper, we argue for social practice theory to guide the study of learning pathways across setting and time, presenting a case study analysis that illustrates how it might be applied to the study of learning over time and across settings. 
social practice theory (dreier, 1999, 2008; holland & lave, 2009; lave, 2012) emphasizes the importance of tracing pathways of participation across varied contexts and over time. we elaborate this theory in the context of a longitudinal study of students’ interest-related learning. we argue that analyzing the visibility, continuities, and discontinuities of learning pathways from youth’s own point of view provides a valuable framework for diagnosing inequity in access to participation in valued social practices. the assumption is that structures of practice present both constraints and possibilities for action to persons; the task of the analyst is to identify specific conditions that are relevant to the lives of persons in practice, what those mean to persons and practices, and the reasons for persons’ actions (jefferson & huniche, 2009; mørck & huniche, 2006).

2. studying learning pathways in social practices

social practice theory begins with the premise that people participate in multiple and variable social contexts. participation is neither constant nor bound to a particular place: people “participate for longer or shorter stretches of time, on a regular or occasional basis and for various reasons in several contexts” (dreier, 2008, p. 38). the name pathway captures the ideas that movement and direction are both important aspects of the temporal and spatial dimensions of participation in practice (dreier), though social practice theory views these pathways in concrete, social, and material terms and not as idealized progressions (penuel, 2016). similarly, a learning pathway can be described in terms of how people move across borders between the familiar and the unfamiliar (gutiérrez, 2008) and in terms of a telos, that is, the direction of movement or change (lave, 1996).
from the perspective of social practice theory, the movement and direction of learning is not chosen in isolation from available social and institutional structures, but in relation to them. as dreier (2008) writes, people must “take into account the structuring of social practice into particular contexts with particular links and use those links in directing their trajectories [pathways] across them” (p. 38). from the standpoint of social practice theory, the complexity and diversity of practices in which people participate is not necessarily a burden but is an enriching aspect of life. by moving across settings of social practice, people are able to pursue diverse concerns and become aware of new possibilities for action and arrangements for participation in practice (dreier, 2008). in addition, they are confronted with dilemmas and contradictions that motivate change and learning (engeström & sannino, 2010; mørck, 2010). people learn by adjusting their contributions to activities to one another (o'connor & allen, 2010) and to fit the demands and structures of local institutions (dreier, 2009). people also learn by inventing new ways to participate in practice, molding it into new cultural forms through our participation (calabrese barton & tan, 2009; gutiérrez, baquedano-lopez, & tejada, 2000). existing institutional structures of practice frame the choices people make about how and where to participate in activities. directing their learning pathways requires that people distribute their engagement across different settings, according to the suitability of each setting’s institutional arrangements for pursuing a particular concern and how the settings are linked to valued practices in other settings (dreier, 2008). these institutional arrangements themselves vary with respect to roles and possibilities for action, requirements for access to those roles, and persistent patterns of privilege, exclusion, and marginalization (lave & mcdermott, 2002). 
any one institutional setting for participation, then, is both a place for learning and a connection point in a more spatially and temporally extended pathway. for our analysis, then, we used social practice theory because it allowed us to make sense of youth’s suspended participation alongside attending carefully to youth’s oft-ambiguous relationship to contentious structures of practice and histories (holland & lave, 2009). social practice theory also demands that we pay “particular attention to differences among participants, and to the ongoing struggles that develop across activities around those differences” (holland & lave, 2009, p. 5). in this respect, a social practice account differs in emphasis from what might emerge from an analysis of activity systems from a cultural-historical activity theory (chat) perspective, because the unit of analysis is persons in practice. in chat, the unit of both analysis and intervention is the activity system (cole & engeström, 2006). at the same time, social practice theory has common roots with chat, in that both draw from accounts of human activity developed by marx and engels (marx & engels, 1848/1998). in addition, both share a commitment to collaborative engagement of researchers with practice to expand possibilities for learning and movement across contexts (gutiérrez, 2008; gutiérrez & vossoughi, 2010; jurow & shea, 2015; mørck, 2010).1

3. supporting and studying science learning pathways: the case of jerome

a number of science education researchers have documented ways that youth are active participants in directing learning pathways in science in ways that are in close alignment with how dreier (2008) characterizes peoples’ efforts to distribute their engagements across settings (bricker & bell, 2014; crowley, barron, knutson, & martin, in press; polman & miller, 2010).
as bell and colleagues (bell, tzou, bricker, & baines, 2012) write, “science learners need to figure out how to adapt their abilities, interests, and identities across a diverse set of locations on a routine basis as they attempt to accomplish their goals or respond to the interests of other social actors” (p. 270). they further argue that learners’ efforts are strongly shaped by competing value systems that valorize some practices over others, variations in supports available from guides, and others’ recognition (and misrecognition) of their racial and class identities and perceived abilities. as we illustrate through the case of jerome (a pseudonym), the particular pattern of engagements and stance toward science-related pursuits emerges from a kind of “dance of agency” (pickering, 1995) as he initiates movement toward science-related futures within a particular context that enables development toward some directions but not others, leading him to re-configure his engagements in relation to how that context responds to his initiative (see also carlone, 2004).

3.1. context for the case analysis

for the duration of the study, jerome was a participant in a program called pathways into science (also a pseudonym) at a science museum in a large city in the u.s. west. the museum’s program is like many others in large science museums across the country that provide learning opportunities to youth from groups that are underrepresented in science. it is funded by a variety of private foundations and individual donors. youth participants must attend public schools in the city where the museum is located, be enrolled as a ninth or tenth grader, and commit to meeting the attendance requirements year-round and over the multiple years for which they are eligible. the program is run as a paid internship in which youth serve as docents for the museum visitors and have opportunities to contribute to science investigations led by resident scientists.
jerome fit well the profile of the type of youth the program seeks to serve. he had a strong interest in science and a strong work ethic that made him well suited to meet the heavy time commitments of the program. his mother was an immigrant to the united states, and during our interview with him, he identified himself as black, which he said made him stand out among his peers and at the museum as different from most other people. at the time of the interview he had just completed his junior year of high school.

3.2. approach to case development

jerome was part of a larger study, which our team has been conducting for the past three years, of youths’ experience of connected learning, an emerging, synthetic model of interest-related learning being investigated by a network of scholars in a range of settings (ito et al., 2013). as we defined it for this study, youth were engaged in interest-related learning if they could identify an activity they enjoyed doing, pursued it over a long period of time, and believed they were getting better at the activity over time or learning from it. the approach we took to analyzing youth learning was a case study approach, in which we focused on an interest-related pursuit as youth’s participation in it transformed over time and as they moved across settings. we drew on interview data we collected from a larger study of 54 youth who were aged 13-17, and through a process of collaborative data organization, coding, representation, and analysis, aimed to illustrate the utility of taking a social practice approach to the study of learning as movement across settings. our interview protocol included questions that elicited youth’s descriptions of their activities and purposes for participation, their current involvement in their activity, the networks (e.g.
linkages and supports) they drew upon when participating in their activity, obstacles they experienced, and how they perceived the future as related to their participation. informed by our theoretical orientation to learning as movement, the interviews purposefully elicited youths’ perspectives on how their participation changed over time and across different settings. because a social practice perspective encouraged a focus on youth agency in distributing their engagements, we focused on how youth themselves viewed their actions, as well as the continuities and discontinuities they experienced as they moved across the settings they traversed (see also akkerman and bakker, 2011). to analyze the data, we began deductively with a set of high-level codes related to the broad outlines of social practice theory, attending specifically to how engagement in the activity transformed over time and across settings. parent codes such as ‘linkages/supports,’ ‘barriers,’ ‘possible futures,’ and ‘identities/roles’ allowed us to get a sense of how youth characterized the many disruptions and opportunities that became more or less relevant to their lives, a central aspect of a social practice approach to analysis. the coding was used as a first step in identifying themes related to how individuals distributed their engagements across multiple settings and over time.
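As a minimal sketch of how a coding scheme of this kind can be tallied, the Python fragment below counts how often each parent code is applied and how often pairs of codes are applied to the same interview excerpt. The four code names are taken from the text; the excerpts, the function names, and the counts are hypothetical and invented purely for illustration. They are not the study's data, and this is not a description of the authors' actual procedure or software.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded excerpts: each set holds the parent codes applied to one
# interview excerpt. Code names come from the text; the data are invented.
coded_excerpts = [
    {"linkages/supports", "possible futures"},
    {"barriers", "identities/roles"},
    {"linkages/supports", "possible futures", "identities/roles"},
    {"barriers"},
]


def code_frequencies(excerpts):
    """Count how many excerpts carry each code."""
    return Counter(code for codes in excerpts for code in codes)


def code_cooccurrences(excerpts):
    """Count how many excerpts carry each unordered pair of codes."""
    pairs = Counter()
    for codes in excerpts:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(codes), 2):
            pairs[pair] += 1
    return pairs


frequencies = code_frequencies(coded_excerpts)
cooccurrences = code_cooccurrences(coded_excerpts)
```

Counts like these would only be a starting point; in a thematic analysis, the coded excerpts themselves still have to be read and interpreted.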
in the analysis for this conceptual article, we focused on themes derived from codes linked to youths’ experiences during their initial engagement with the activity and subsequent engagements with the activity, namely ‘initial relationship to engagement’ as it co-occurred with the child codes within ‘reason/stance/relationship to participation’ such as ‘career,’ ‘friends,’ ‘academics,’ ‘civic engagement,’ and/or ‘skill building and mastery.’ the focus on these codes reflected our desire to make sense of youths’ relationship to varied structures of practice, as well as their different stances toward interest-related pursuits. we then created data displays for each student that listed their descriptions of initial activity, history of involvement in activity, youth articulations of participation within and outside of the program, their future goals, and immediate outcomes, noting also the different settings in which activities took place. our analysis provided strong supporting evidence that, rather than being linear or straightforward, youths’ pathways were often characterized by a shifting and fluid distribution of engagement in a variety of settings over time. for this paper, we chose a case (jerome) that we concluded illustrated well the agency of young people in this process.

3.3. jerome's initial goals for participation in the pathways program

jerome’s story is not one of a simple pipeline into a science career that is institutionally supported and easily navigable for him. he actively worked to set up and select his involvement in science-related programs, and he benefited from access to institutions that helped him to link engagements over time between contexts. before he first started at the science museum, he worked with another program that connected him with numerous different programs around the city that matched his interest in science.
jerome applied to many of these programs in an attempt to find a place where he could explore his interest in science and be exposed to different kinds of sciences. he also decided against another program because it was too familiar, stating, “that’s a lot of people from my neighborhood, so i know them and i always like to get to know other people and see what different places and cities are like.” consistent with jerome’s goal to expand his social network, at the beginning he experienced discontinuities between his community and the museum. he said, “it was different because i wasn’t too accustomed to being around people who weren’t exactly from my neighborhood or my ethnicity.” jerome experienced this new context as complementary to his goals but discontinuous with his prior context, a discontinuity he linked to a sense of racial difference within the program. over time, his sense of difference from others faded as he deepened relationships with others through retreats and a leveled role structure (e.g., youth at level one are mentored by youth at level two). as he has moved up this structure, jerome has taken on leadership roles in the program, joining the leadership council, giving him “more of a say in the program’s direction and working with others on how to improve the program.”

3.4. emergent patterns of engagement and stances

a big part of jerome’s learning within the paid intern program centered on learning how to contribute to the ongoing activities of the museum. interns’ teaching was organized around “stations” connected to natural history exhibits at the museum. he had to work three days a week on the floor of the museum, which he described as “nerve-wracking” since he disliked public speaking.
but he said the push helped him build his public speaking skills, in part because near-peers in the program (interns one level up from him) pushed him to contribute:

when we first started, they would try to encourage you to – you’re at a station and you’re talking about whatever your station is about and the other intern, they are more experienced, so they know what to talk about and to push you out there they will say, “oh, michael can tell you more about it.”

he also got to take part in science investigations at the museum, where he gained a partial view of the ongoing work of scientists there. he came to realize through these experiences that he really liked working in labs with other people and coming up with a question that “you really never know what it is going to be in the end.” at the same time, he developed a stance toward science that pointed away from a career in basic science. “i couldn’t be a botanist,” he said, and then generalized to all scientists, saying, “i see what they do here and confined to the basement or downstairs. most scientists, they work alone and i don’t really like to work alone, but i would want to do it as volunteer work or helping out whatever.” though he had opportunities to work with scientists in the laboratory, it is not clear whether he has accompanied them to places where he might gain a fuller view of the range of activities that characterize scientific practice. jerome did anticipate doing something science-related for his future career and continuing to volunteer in a science museum as an adult, though these pathways are not visible or directly accessible to him, either through the program or through his social network. he expressed a mix of worry and confidence about his direction:

i’ve never been exposed to the steps that it takes to be a doctor or physician or whatever. so i don’t know, is the workload going to be like crazy, but i’ve never experienced something that was like impossible.
so i guess it’s possible.

in addition, his map of colleges and universities was relatively incomplete, focused on highly competitive and well-known national universities like harvard and stanford on one end and a less competitive local public university on the other. he believed that “it doesn’t matter which school you go to, it just matters how good you do at that school,” and that it’s not important to get caught up in the reputation of the school.

3.5. support from outside the program

some of jerome’s confidence likely arose from the fact that he was strongly supported by his family members. jerome described his mother, grandmother, and older brother as all highly supportive of his involvement. his mother was especially supportive of jerome’s decision not to take the “sports route” his peers seemed to be on and that he saw was highly valued. she and other family members encouraged him to go his own way:

they [older brother, grandmother] just tell me to do what i feel is best and are supportive. i was on the flyer for the program the first year i was in here and my mom made like fifty copies and sent them out. my grandma has one hanging in her room and she mailed it to other family members. they’re really happy and supportive in that kind of way, just happy for me in general.

though jerome was happy that the recognition of his accomplishments followed him home, his personal stance toward sharing his accomplishments with others was more complex. he preferred not to talk about them, because he saw it as too self-centered. he said, “it’s weird to talk about yourself. i just do it and move on.” at the conclusion of our interview with him, we asked jerome to characterize his identity: is he a scientist, an intern, or something else? he easily characterized himself as a “student,” saying “i consider myself a student, i just feel like i’m constantly learning, not just in school, but in everyday life.
in the city and just being alive…” jerome connected his identity as a student or an everyday learner to his interaction with people. he expressed confidence in his ability, and also a compulsion to know things, which he said made him an especially good fit for the program he was in.

4. discussion and conclusion

our analysis illustrates one way to use social practice theory to study transformations in youth participation in an interest-related pursuit over time and across settings. in jerome’s case, attending to the different stances toward participation in activity helped illuminate how he was thinking about his future. there was evidence of both continuity and discontinuity between his experience in the museum and his imagined future. as such, his interest in science may be more akin to a “line of practice” of the kind azevedo (2011, 2013) describes, in which a “line” or pathway can be discerned, though participation changes significantly over time. jerome’s opportunity to participate in and move across varied settings was central to his developing sense of future possibilities. social practice theory helped us to see jerome’s distribution of his engagements as central to his interest development. jerome highlighted opportunities to learn how to be a docent on the museum floor, as well as his participation in research, as significant for his learning. he noted, for example, through his observation (“i see what they do here”) that most of the scientists worked alone, in basements, and contrasted that with his own enjoyment of collaboration, interaction, and helping others. although this may not be a completely accurate view of what is often the highly collaborative work of scientists, this observation enabled jerome to channel his energies where he sees the greatest alignment with what he wants to be doing in his everyday work life.
in its emphasis on how youth distribute engagement across different settings, social practice theory differs from traditional “pipeline” metaphors for pathways into stem, which tend to focus on content but not on movement across settings. this social practice analysis, which we have foregrounded in this paper, also shows that interaction with graduates of his program has already prepared him for some of the challenges that he may face in a few years. he learned from near-age peers about how challenging pre-med courses could be and the self-doubt that can creep in during that pre-med experience. jerome’s achievements to this point represent the best of what stem interventions seek to accomplish, in terms of broadening access to the field for someone with limited exposure to it, and cultivating deeper and more nuanced interest in the various pathways related to stem. he has a better sense of what he wants and what he doesn’t want, as well as the threats to achieving that goal. we contend that policy interventions that foreground these issues associated with linkages, networks, and practices represent a promising direction for the field. studying and supporting those interventions requires us, however, to move beyond traditional foci of learning research and look at learning through a social practice lens. we contend that using social practice theory in the analysis of youth’s learning is potentially transformative in this respect, for it highlights potential leverage points for transforming systems to enable broader participation in stem.

key points

social practice theory offers a lens for interpreting interest-related learning pathways across settings and over time.
a social practice account highlights learners’ agency as they pursue diverse concerns across a range of settings, as well as how existing structures of practice constrain agency and limit access to some settings.
a social practice account provides a more expansive framework for understanding how and when persons and practices are mutually constituted in such a way as to broaden access to stem fields.

acknowledgments

funding for this research comes from the john d. and catherine t. macarthur foundation. any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funder.

references

bell, p., tzou, c., bricker, l. a., & baines, a. d. (2012). learning in diversities of structures of social practice: accounting for how, why, and where people learn science. human development, 55, 269-284. doi: 10.1159/000345315.
bricker, l. a., & bell, p. (2014). “what comes to mind when you think of science? the perfumery!”: documenting science-related cultural learning pathways across contexts and timescales. journal of research in science teaching, 51(3), 260-285. doi: 10.1002/tea.21134.
calabrese barton, a., & tan, e. (2009). funds of knowledge, discourses and hybrid space. journal of research in science teaching, 46(1), 50-73. doi: 10.1002/tea.20269.
carlone, h. b. (2004). the cultural production of science in reform-based physics: girls' access, participation, and resistance. journal of research in science teaching, 41(4), 392-414. doi: 10.1002/tea.20006.
cole, m., & engeström, y. (2006). cultural-historical approaches to designing for development. in j. valsiner & a. rosa (eds.), the cambridge handbook on sociocultural psychology (pp. 484-507). new york: cambridge university press.
crowley, k., barron, b. j. s., knutson, k., & martin, c. k. (in press). interest and the development of pathways to science. in k. a. renninger, m. nieswandt, & s. hidi (eds.), interest in mathematics and science learning and related activity. washington, dc: american educational research association.
dreier, o. (1997). subjectivity and social practice. aarhus, denmark: center for health, humanity, and culture.
dreier, o. (1999). personal trajectories of participation across contexts of social practice. outlines: critical social studies, 1(1), 5-32.
dreier, o. (2008). psychotherapy in everyday life. new york: cambridge university press.
dreier, o. (2009). persons in structures of social practice. theory & psychology, 19(2), 193-212. doi: 10.1177/0959354309103539.
engeström, y., & sannino, a. (2010). studies of expansive learning: foundations, findings and future challenges. educational research review, 5, 1-24. doi: 10.1016/j.edurev.2009.12.002.
gonzales, r. g. (2011). learning to be illegal: undocumented youth and shifting legal contexts in the transition to adulthood. american sociological review, 76(4), 602-619. doi: 10.1177/0003122411411901.
gutiérrez, k. d. (2008). developing sociocritical literacy in the third space. reading research quarterly, 43(2), 148-164. doi: 10.1598/rrq.43.2.3.
gutiérrez, k. d., baquedano-lopez, p., & tejada, c. (2000). rethinking diversity: hybridity and hybrid language practices in the third space. mind, culture, and activity, 6(4), 286-303. doi: 10.1080/10749039909524733.
gutiérrez, k. d., & vossoughi, s. (2010). lifting off the ground to return anew: mediated praxis, transformative learning, and social design experiments. journal of teacher education, 61(1-2), 100-117. doi: 10.1177/0022487109347877.
holland, d., & lave, j. (2009). social practice theory and the historical production of persons. actio: an international journal of human activity theory (2), 1-15.
ito, m., gutiérrez, k. d., livingstone, s., penuel, w. r., rhodes, j. e., salen, k., schor, j., sefton-green, j., & watkins, s. c. (2013). connected learning: an agenda for research and design. irvine, ca: digital media and learning research hub.
jefferson, a. m., & huniche, l. (2009). re(searching) for persons in practice: field-based methods for critical psychological practice research. qualitative research in psychology, 6(1-2), 12-27. doi: 10.1080/14780880902896507.
jurow, a. s., & shea, m. (2015). learning in equity-oriented scale-making projects. journal of the learning sciences, 24(2), 287-307. doi: 10.1080/10508406.2015.1004677.
lave, j. (1996). teaching, as learning, in practice. mind, culture, and activity, 3(3), 149-164. doi: 10.1207/s15327884mca0303_2.
lave, j. (2012). changing practice. mind, culture, and activity, 19(2), 156-171. doi: 10.1080/10749039.2012.666317.
lave, j., & mcdermott, r. p. (2002). estranged labor learning. outlines, 1, 19-48.
marx, k., & engels, f. (1848/1998). the german ideology, including theses on feuerbach and introduction to the critique of political economy. amherst, ny: prometheus books.
mcdaniel, m. a., agarwal, p. k., huelser, b. j., mcdermott, k. b., & roediger iii, h. l. (2011). test-enhanced learning in a middle school science classroom: the effects of quiz frequency and placement. journal of educational psychology, 103(2), 399. doi: 10.1037/a0021782.
mørck, l. l. (2010). expansive learning as production of community. in learning research as a human science. yearbook of the national society for the study of education (vol. 109, pp. 176-191). new york, ny: teachers college record.
mørck, l. l., & huniche, l. (2006). critical psychology in a danish context. annual review of critical psychology, 5.
nissen, m. (2005). the subjectivity of participation: sketch of a theory. international journal of critical psychology, 15, 151-179.
o'connor, k., & allen, a.-r. (2010). learning as the organizing of social futures. in learning research as a human science. national society for studies in education, 109(1), 160-175.
penuel, w. r. (2016). studying science and engineering learning in practice. cultural studies of science education, 11(1), 89-104. doi: 10.1007/s11422-014-9632-x.
pickering, a. (1995). the mangle of practice: time, agency, and science. chicago, il: university of chicago press.
polman, j. l., & miller, d. (2010). changing stories: trajectories of identification among african american youth in a science outreach apprenticeship. american educational research journal, 47(4), 879-918. doi: 10.3102/0002831210367513.
varenne, h., & mcdermott, r. p. (1998). successful failure: the school america builds. new york: westview press.
vygotsky, l. s. (1934/1978). mind in society: the development of higher psychological processes. cambridge, ma: harvard university press.
vygotsky, l. s. (1987). thought and language (a. kozulin, trans.). cambridge: cambridge university press.

egloff et al
frontline learning research vol. 7 no. 1 (2019) 1-22
issn 2295-3159

students’ reading ability moderates the effects of teachers’ beliefs on students’ reading progress

frank egloff a, natalie förster a, elmar souvignier a
a university of münster, germany

article received 13 october 2017 / revised 6 september 2018 / accepted 25 october / available online 16 january

abstract

teachers’ beliefs about teaching have been found to affect students’ learning growth. the aim of this study was to investigate effects of teachers’ constructivist and direct-transmissive beliefs on learners’ reading progress and whether these effects are influenced by students’ ability. we measured constructivist and direct-transmissive beliefs of 29 teachers and the progress in reading fluency and reading comprehension of their students (n = 568) at eight points of measurement over one school year. results of three-level latent growth curve modeling revealed that only teachers’ global, but not reading-specific, constructivist beliefs were generally positively related to learners’ progress in reading fluency. beliefs about teaching had no general effect on growth in reading comprehension, but the relation between constructivist beliefs and students’ progress in reading comprehension was affected by students’ prior skills.
teachers with stronger constructivist beliefs effected higher learning growth for high ability compared to low ability learners within their classrooms. no effects were found for direct-transmissive beliefs. this study adds a more differentiated view to findings concerning the effects of teacher beliefs by showing that effects vary depending on the skill under study (fluency vs. comprehension), and that effects of teacher beliefs may depend on students’ ability.

keywords: teacher beliefs, beliefs about teaching, reading comprehension, reading fluency

corresponding author email: egloff@uni-muenster.de
doi: 10.14786/flr.v7i1.336

1. introduction

teachers’ beliefs about teaching have been found to influence teachers’ behaviour and thereby affect student learning (pajares, 1992; peterson, fennema, carpenter, & loef, 1989). two theoretically and empirically distinguishable beliefs about teaching are constructivist and direct-transmissive beliefs. a teacher with strong constructivist beliefs, for example, is convinced that students play an active part in their learning and that they should and will develop their own problem-solving strategies. in contrast, a teacher who holds a direct-transmissive view about teaching believes that a teacher should guide students’ learning process. most of the research on teacher beliefs has shown that teachers’ high constructivist and low direct-transmissive beliefs are positively related to higher learning progress (dubberke, kunter, mcelvany, brunner, & baumert, 2008; peterson et al., 1989; souvignier & mokhlesgerami, 2005; staub & stern, 2002). nevertheless, in a sample of particularly low achieving students, contrary results were found (behrmann & souvignier, 2013). thus, similar to the concept of child x instruction interactions (e.g.
connor, morrison, & petrella, 2004), which assumes that effects of instruction depend on the fit to students’ abilities, effects of teachers’ beliefs might also vary depending on learners’ initial skills. therefore, the goals of our study were to contribute to the research on effects of teachers’ beliefs by studying students in primary school, an as yet under-investigated group, in the domain of reading and, more importantly, to investigate whether these effects are influenced by prior abilities of individual students or the respective classroom.

1.1 teachers’ beliefs about teaching

beliefs are subjective evaluations of whether a specific proposition is true (e.g., pajares, 1992; richardson, 2003). they can be distinguished from knowledge, which is based on logical argumentation, facts and, thus, expert consensus. in contrast, beliefs are non-consensual because they are built on personal emotional experiences (behrmann & souvignier, 2013; nespor, 1987; pajares, 1992; richardson, 2003). teachers’ beliefs are assumed to be very important as they affect what happens in the classroom (e.g., buehl & beck, 2015; dubberke et al., 2008; kagan, 1992; peterson et al., 1989; staub & stern, 2002). they refer to issues that are relevant to teachers’ profession such as their own teaching effectiveness, the nature of knowledge, and how students should be taught (e.g., pajares, 1992). hence, beliefs about teaching, in general, cover all aspects of the spectrum on quality of education (fives, lacatena, & gerard, 2015). within the category of beliefs about teaching, constructivist and direct-transmissive beliefs are most apparent. case studies using analyses of teacher talk during shared planning time (gill & hoffman, 2009) and other qualitative methods mostly revealed constructivist and direct-transmissive beliefs among teachers (see fives et al., 2015; kleickmann, 2007).
furthermore, it was demonstrated that operationalisations of constructivist and direct-transmissive beliefs predicted teaching behaviour as well as students’ learning outcomes (e.g., dubberke et al., 2008; staub & stern, 2002). constructivist views on learning and teaching can be related to cognitive constructivist learning theories (see savery & duffy, 1995). according to this perspective, teachers believe that learners play an active role in the process of studying. the underlying idea of learning is that students actively integrate new information into their existing knowledge. to foster an active and autonomous studying process, the teachers’ task is to provide learners with meaningful learning environments (staub & stern, 2002). in contrast, direct-transmissive views on learning and teaching are related to behavioural-associationist learning theories (see resnick & hall, 1998). according to this view, teachers explicitly instruct and guide the students through new contents. thereby, teachers pre-structure the topics and monitor learners’ progress, which leads to a more passive role of the students in the process of knowledge building (staub & stern, 2002).

1.2 effects of teachers’ beliefs about teaching on students’ progress

teachers’ beliefs affect perception, information processing, judgement, decision making, and the way of teaching (e.g., buehl & beck, 2015; dubberke et al., 2008; kagan, 1992; pajares, 1992; peterson et al., 1989). hattie (2012) concluded that next to commitment, “teachers’ beliefs (…) are the greatest influence on student achievement over which we have some control” (p. 22). to date, several studies have examined the influence of teachers’ constructivist or direct-transmissive beliefs, showing either positive or negative effects on students’ progress. nevertheless, the number of such studies is limited. peterson et al.
(1989) investigated a sample of 39 first-grade math teachers and found constructivist views to be positively related to learners’ mathematical word problem solving. staub and stern (2002) surveyed 27 teachers and similarly demonstrated that constructivist beliefs were positively linked to students’ growth in mathematical word problem solving in second and third grade. studying effects of teachers’ transmissive beliefs in a sample of 155 ninth- and tenth-grade school teachers, dubberke et al. (2008) found a negative relation between strong transmissive beliefs and student achievement in mathematics. in reading, only two studies have been conducted so far. in line with studies in mathematics, data from souvignier and mokhlesgerami (2005) revealed a positive relation between constructivist beliefs and learning growth in reading strategy knowledge for fifth- and sixth-grade students from the highest school track. behrmann and souvignier (2013) studied a sample of particularly low performing sixth and seventh graders. in contrast to prior research, they found advantages of direct-transmissive beliefs concerning learners’ declarative and procedural reading strategy knowledge but not their reading fluency. summing up the findings in the literature, positive relations between teachers’ constructivist beliefs and students’ learning growth have been found in mathematics and reading within groups of average to high performing students, whereas direct-transmissive beliefs seem to be negatively related to student learning. this might be explained by higher cognitive activation and motivational advantages of teaching methods that are in line with constructivist beliefs (e.g., dubberke et al., 2008; savery & duffy, 1995; staub & stern, 2002). nevertheless, the opposite pattern of findings in a sample of particularly low achieving students (behrmann & souvignier, 2013) raises the question of whether these effects depend on students’ initial ability.
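The question of whether an effect depends on students' initial ability is a question of statistical moderation. As a minimal sketch, assuming a simple linear model with a belief-by-ability interaction term (not the multilevel models used in the studies cited here), the Python fragment below shows how a single interaction coefficient lets the effect of a teacher's belief score change sign across ability levels. All function names and coefficient values are hypothetical, chosen only to make the pattern visible; they are not estimates from any study.

```python
def predicted_growth(belief, ability,
                     intercept=1.0, b_belief=0.25,
                     b_ability=0.125, b_interaction=0.5):
    """Predicted learning growth under a linear model with an interaction.

    `belief` and `ability` are assumed to be centred scores (0 = average).
    All coefficients are hypothetical illustration values.
    """
    return (intercept
            + b_belief * belief
            + b_ability * ability
            + b_interaction * belief * ability)


def belief_slope(ability, b_belief=0.25, b_interaction=0.5):
    """Effect of a one-unit increase in belief at a given ability level."""
    return b_belief + b_interaction * ability


# With these invented coefficients, the belief effect is positive for a
# student one unit above average ability and negative for one unit below.
slope_high_ability = belief_slope(ability=1.0)
slope_low_ability = belief_slope(ability=-1.0)
```

In this sketch the slope for belief is not a single number but a function of ability, which is what it means for ability to moderate the effect.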
following the concept of child x instruction interactions, connor et al. (2004) found that high achieving students benefitted from more self-regulated phases of reading, while lower performing students needed more pre-structured teaching. given that beliefs affect teachers’ decision-making process and their teaching, interaction effects like those described for child x instruction might also exist for teacher beliefs. thus, a teacher with high constructivist beliefs who provides much child-managed instruction meets the needs of high but not those of low achieving students. in contrast, a teacher who uses many phases of teacher-managed instruction due to his or her high direct-transmissive beliefs might bring about higher growth for the low but not the high achieving students in the classroom. the short review of studies that analysed the relation between teachers’ beliefs and learning progress reveals that the majority of studies have been conducted in the domain of mathematics. only two studies (behrmann & souvignier, 2013; souvignier & mokhlesgerami, 2005) investigated effects of teachers’ beliefs on growth in students’ reading-related skills. in these studies, reading achievement was assessed specifically according to reading fluency and knowledge of reading strategies. effects on reading comprehension, however, have not been studied yet. therefore, the goal of our study was to broaden the empirical basis concerning effects of teachers’ beliefs in the domain of reading, differentiating between the two key constructs of reading fluency and reading comprehension (national institute of child health and human development [nichd], 2000).

1.3 reading fluency and reading comprehension

reading fluency consists of word recognition accuracy and reading speed (samuels, 1979). it is based on the automation of word recognition (e.g., nichd, 2000) and therefore relies on large amounts of student-driven decoding practice (rasinski, reutzel, chard, & linan-thompson, 2011).
students’ reading fluency increases from grade to grade with a decreasing growth rate over time (parrila, aunola, leskinen, nurmi, & kirby, 2005; tilstra, mcmaster, van den broek, & kendeou, 2009). reading comprehension, in contrast, is a process of constructing a subjective representation of textual information (kintsch, 1998). concerning this skill, two sub-processes can be distinguished. one of them results in a local semantic representation of information explicitly inherent to the text. to build this so-called textbase, the reader has to infer meaning by connecting information from words and sentences within smaller units of a text. the other sub-process connects the semantic information of the textbase with prior knowledge to build a meaningful macrostructure representation of the text, which is called the situation model. this skill is based on elaboration, organization and metacognitive processes and thus cannot be automatised. cromley and azevedo (2007) showed that background knowledge about the content of a text, strategies and vocabulary are important predictors of knowledge-based inferences as well as of reading comprehension itself. longitudinal studies revealed mixed results about growth trajectories. studies by parrila et al. (2005) as well as by farnia and geva (2013) showed that reading comprehension growth decreases over time. nevertheless, studies by tilstra et al. (2009) as well as by nation, cocksey, taylor, and bishop (2010) revealed that growth rates do not decrease over time, but rather follow a linear growth pattern. from the delineation of these two different reading skills, it becomes obvious that they consist of entirely different cognitive processes. while reading fluency is based on automatised word recognition, reading comprehension is based on prior knowledge, metacognitive processes and consciously made inferences.
consequently, successful instructional designs to support these skills vary concerning the amount of teacher and student guided activities. reading interventions designed to support reading fluency like paired reading (topping, 1987) and repeated reading (samuels, 1979) encourage students to play an active role in the learning process. these methods especially rely on extended practice so that there is only little need for direct instruction. reading programs to foster reading comprehension (e.g., brown & pressley, 1994; souvignier & mokhlesgerami, 2006; paris, cross, & lipson, 1984), however, demand an active role of the teacher, who is supposed to directly instruct and model cognitive and metacognitive reading strategies to a relatively large extent. thus, teacher guided instruction may be necessary especially for lower performing students to successfully develop their reading comprehension.

1.4 assessment of teachers’ beliefs about teaching

from a learning theory perspective, constructivist and direct-transmissive beliefs are contradictory (behrmann & souvignier, 2013; oecd, 2009; staub & stern, 2002). thus, the assessment of beliefs has often been conceptualized on a constructivist-transmissive continuum, using only one single scale (e.g., peterson et al., 1989; souvignier & mokhlesgerami, 2005; staub & stern, 2002). dubberke et al. (2008), however, used one scale to assess direct-transmissive beliefs only. likewise, behrmann and souvignier (2013) used separate constructivist and direct-transmissive scales in addition to a constructivist-transmissive continuum scale. this alternative approach follows the argumentation that teachers may hold even potentially contradictory perspectives on effective teaching (see fives et al., 2015). it was found that teachers can endorse constructivist and direct-transmissive views at the same time (organisation for economic co-operation and development [oecd], 2009; fives et al., 2015; snider & roehl, 2007).
teachers’ beliefs about teaching may be inconsistent because of the vast range of different teaching situations that teachers have to face and to consider (behrmann & souvignier, 2013; fives et al., 2015). this is supported by factor analytical results, which show that constructivist as well as direct-transmissive beliefs each build their own factors (e.g., bunting, 1985; woolley, benjamin, & woolley, 2004). thus, conceptualizing and measuring constructivist and direct-transmissive views on teaching with separate scales may be more appropriate (behrmann & souvignier, 2013; buehl & beck, 2015; woolley et al., 2004). nevertheless, strong negative correlations between measures of constructivist and direct-transmissive scales indicate that teachers generally tend to favour one of the two orientations (see behrmann & souvignier, 2013). another important aspect of beliefs about teaching is their specificity with respect to a certain content. beliefs can be conceptualized as content specific (i.e., related to reading instruction) as well as content general. content specific beliefs about teaching may be especially important because they may more precisely apply to content specific teaching situations (peterson et al., 1989; staub & stern, 2002). reading specific beliefs may thus especially affect students’ reading competence growth. nevertheless, global beliefs about teaching may also have an important impact on students’ learning of reading skills because they may affect teaching on a more general level in most teaching situations. given that reading is not limited to a specific subject like maths, it seems reasonable to assess teachers’ beliefs both in a content specific and in a content general way.

1.5 assessment of student progress

to analyse the effects of teachers’ beliefs on learners’ progress, data from longitudinal designs with pre- and posttests on student achievement have usually been used.
difference scores from two measures as an indicator for learning progress, however, have been criticized with respect to limited reliability (willett, 1989). assessing change with multiple points of measurement creates advantages over pre-post measures, because it boosts the reliability of the growth rate estimates (e.g., singer & willett, 2003; willett, 1989). willett (1989) demonstrated that every additional point of assessment helps to deflate standard errors and concluded that “with sufficient waves added, the influence of fallible measurement rapidly dwindles to zero” (p. 598). furthermore, speer and greenbaum (1995) demonstrated that growth modeling based on multiple points of assessment is more sensitive to change than methods based on pre-post measures. in this study, we enhance the reliability of the assessment of growth by modeling learning progress across eight points of measurement over one school year. 1.6 research questions our study addressed three research questions: first, we were interested in general effects of teachers’ beliefs on students’ progress in reading. consistent with previous studies, we expected positive effects from constructivist beliefs and negative effects for direct-transmissive views on reading fluency and comprehension. second, we wanted to investigate if students’ initial achievement moderates the effects of teachers’ beliefs on students’ reading progress. given the different findings in reading with positive effects of either constructivist or direct-transmissive beliefs for high and low achieving students, respectively, we anticipated that constructivist beliefs might be supportive for students with higher reading skills, whereas direct-transmissive beliefs might be more suitable for students with lower reading skills. 
third, given that moderating effects might not only become apparent at the individual level but also in the entire classroom, we also studied whether effects of teachers’ beliefs on classrooms’ reading growth were moderated by the average prior ability of the classroom. in concordance with the second hypothesis, we expected constructivist beliefs to be supportive for classrooms with higher initial reading ability and direct-transmissive beliefs to be more helpful for classrooms with lower initial reading ability. regarding each of the hypotheses, we expected the same effects for growth in reading fluency and reading comprehension.

2. methodology

2.1 participants and procedure

teachers from a previous reading intervention study who voluntarily decided to implement learning progress assessment in the school year 2012-13 were asked to participate in this study. out of 47 teachers, 29 teachers (83% female) agreed to participate. on average they were about 48 years old (m = 47.90 years, sd = 10.99) and had a teaching experience of approximately 22 years (m = 22.45 years, sd = 12.05). the student sample consisted of 568 fourth graders (49% female; 17% with a migration background) from 29 classrooms in 18 german schools. at the first point of measurement, students were approximately 10 years old (m = 9.73 years, sd = 0.48). teachers’ beliefs were assessed with a questionnaire at the beginning of the school year before repeatedly assessing students’ reading skills. participation was voluntary. neither teachers nor students received incentives for participation.

2.2 measures

2.2.1 teacher beliefs

following the procedure by behrmann and souvignier (2013), teachers rated their beliefs on three different scales. the constructivist orientation scale (cos) measures constructivist beliefs with regard to reading instruction.
an example item of the cos is: “in order to learn how to competently handle texts, it is helpful to let students discuss their own text approaches.” internal consistency of this scale was acceptable (cronbach’s α = .75). the global orientation scale (gos) quantifies global constructivist beliefs. an example item of the gos is: “curricular activities should primarily focus on students’ practical learning experiences.” internal consistency of this scale was also acceptable (cronbach’s α = .72). the direct-transmissive orientation scale (dos) measures direct-transmissive beliefs specifically referring to reading instruction. an example item of the dos is: “most students are unable to discover reading strategies on their own, and therefore need explicit instruction.” for this scale, internal consistency proved to be good with cronbach’s α = .82. the cos, gos, and dos measures consist of six items each with a four-point likert scale, ranging from (1) i strongly disagree to (4) i strongly agree (see behrmann & souvignier, 2013). a content general direct-transmissive scale was not provided by behrmann and souvignier (2013) and thus not used in this study.

2.2.2 reading progress

students’ progress in reading fluency and reading comprehension was assessed over one school year using an internet-based tool for learning progress assessment (see förster & souvignier, 2014). at intervals of three weeks, students individually completed one of eight equivalent reading tests during self-study periods at school. each test took about 10 min on average. in each of the eight tests, learners first completed a maze task in which every seventh word of a text was deleted. students were instructed to replace the 24 gaps as quickly as possible by choosing the correct word among three choices. no time limit was given to ensure that all learners had read the complete text.
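the maze format described above can be illustrated with a minimal sketch. it assumes that counting starts at the first word of the text and that punctuation stays attached to the deleted word; the function name `build_maze` is hypothetical, and the selection of the two distractor words per gap is not described in the text and is therefore left out.

```python
def build_maze(text, step=7):
    """Replace every `step`-th word with a numbered gap.

    Returns the gapped text and the list of deleted words (the answer key).
    Assumption: counting starts at the first word; punctuation is not split off.
    """
    words = text.split()
    answers = []
    gapped = []
    for position, word in enumerate(words, start=1):
        if position % step == 0:
            answers.append(word)
            gapped.append(f"[gap {len(answers)}]")
        else:
            gapped.append(word)
    return " ".join(gapped), answers


def minimum_text_length(n_gaps, step=7):
    """Smallest number of words that yields `n_gaps` gaps under this rule."""
    return n_gaps * step
```

under these assumptions, a text needs at least 168 words to produce the 24 gaps mentioned above.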
in addition to the number of correct replacements, we also recorded the time needed to complete the maze task. given the need to simultaneously recognize words and construct meaning from text to fill the gaps, this test format is in accordance with definitions of reading fluency (e.g., samuels, 1979). we measured reading fluency in the current study as the number of correctly selected words within 1 min. after completing the maze task, students answered 16 comprehension questions that referred to the text from the maze. while answering the questions, the complete and correct text was visible. learners were required to choose the correct answer from four choices. following models of text comprehension (e.g., kintsch, 1998), half of the questions addressed text-based information and thus asked for information that was explicitly contained in the text. the other eight questions assessed the construction of a situation model by requiring students to make inferences from the given information. no time constraints were given to complete the task. we used the number of correct answers as the reading comprehension measure. eight tests were applied over the whole assessment period. four of the tests were based on non-fictional texts about animals and the other four on fictional detective stories. fictional and non-fictional texts were administered alternately. prior research has documented the psychometric quality of the reading tests (see souvignier, förster, & salaschek, 2014). internal consistencies were found to be high with cronbach’s α ranging from .86 to .89. in addition, correlations with standardized paper-pencil tests measuring reading fluency (r = .60 to .66) and reading comprehension (r = .63 to .65) revealed satisfying criterion validity. the tests proved sensitive to student improvement, showing significant reading growth over the eight points of measurement.
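the two outcome scores defined in this section can be sketched as follows. this is an illustration only: the function names are hypothetical, and it assumes that completion time is recorded in seconds, which the text does not specify.

```python
def fluency_per_minute(correct_replacements, seconds_needed):
    """Correctly selected maze words per minute.

    Assumption: the recorded completion time is given in seconds.
    """
    if seconds_needed <= 0:
        raise ValueError("completion time must be positive")
    return correct_replacements * 60.0 / seconds_needed


def comprehension_score(responses, answer_key):
    """Number of correctly answered comprehension questions."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key differ in length")
    return sum(given == correct for given, correct in zip(responses, answer_key))
```

for instance, a student who fills all 24 gaps correctly in six minutes would obtain a fluency score of 4 correct words per minute under this definition.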
2.3 data analysis

we removed outliers that were two standard deviations below the average of an individual’s points of measurement because of selective distortion of the data due to guessing, inattention, or failure to make a decision. we also removed outliers that were two standard deviations above the individual average on reading fluency, because fast guesses likely increased these measures. in total, 1.6% of the data were excluded for reading fluency and 0.7% for reading comprehension. data coverage at every point of measurement was consistently higher than 90% with the highest rates of data coverage at the last point of measurement. in total, 6.6% of the reading fluency and 4.4% of the reading comprehension values were missing. we used full information maximum likelihood (fiml) to account for missing data, which has been shown to be particularly useful for structural equation modeling (enders & bandalos, 2001). with this procedure, all existing data are used to estimate model parameters. data were nested in three levels. points of measurement (level 1) were nested in students (level 2), which were nested in classrooms (level 3). thus, we applied a three-level latent growth curve model using mplus 8 (muthén & muthén, 2017). with this analysis, prior competence can be modeled on the level of individual students as well as on the level of classrooms. thereby we accounted for the nested structure of the data and prevented underestimation of standard errors (bryk & raudenbush, 1988). given that linear and non-linear reading growth rates have been reported in the literature (parrila et al., 2005; tilstra et al., 2009), we considered linear, quadratic, and free-loading models for the most suitable curve estimation of our data (bollen & curran, 2006; duncan, duncan, & strycker, 2006). we rejected the free-loading model, because its growth factor can only be interpreted as a measure for progress when the determined shape is close to linearity (duncan et al., 2006).
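the outlier rule stated at the beginning of this section can be sketched in a few lines. this is a hedged illustration, not the authors’ procedure: it assumes that the mean and the sample standard deviation are computed per student across that student’s available points of measurement, and the function name `screen_outliers` is hypothetical.

```python
from statistics import mean, stdev


def screen_outliers(scores, remove_high=False, k=2.0):
    """Set a student's outlying scores to None (treated as missing).

    scores: one student's scores across the points of measurement,
            with None marking values that are already missing.
    remove_high: also drop values more than k SDs above the student's mean
                 (the text describes this for reading fluency only).
    Assumption: the individual mean and sample SD define the cut-offs.
    """
    observed = [s for s in scores if s is not None]
    if len(observed) < 3:
        return list(scores)  # too few values to judge deviations
    centre, spread = mean(observed), stdev(observed)
    lower, upper = centre - k * spread, centre + k * spread
    cleaned = []
    for s in scores:
        if s is None:
            cleaned.append(None)
        elif s < lower or (remove_high and s > upper):
            cleaned.append(None)
        else:
            cleaned.append(s)
    return cleaned
```

values flagged in this way become missing and would then be handled by the missing-data procedure described above.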
after scrutinizing the mean scores of the eight measurement points (see table 4), we only compared fit indices between linear and quadratic models. a linear growth factor is the most legitimate measure of progress for these models (see bollen & curran, 2006). compared to a linear model, a quadratic model additionally has a quadratic growth factor, which is an indicator for an acceleration or deceleration trend in growth over time. its linear factor thereby represents the growth rate at the first measurement point (bollen & curran, 2006). to specify a quadratic model with a comparative linear measure of learning progress over the whole period of the study, we fixed the variance of the quadratic growth factors to zero at the individual and classroom levels. thus, the acceleration or deceleration trend in growth over time was constrained to be equal across all classrooms. at the individual student level, the quadratic factor mean is zero, because lower level (i.e. student level) scores are deviations from means of higher levels (i.e. classroom level) in multilevel modeling. with no variation and a mean of zero, the quadratic growth factor on the individual level remained unspecified. this resulted in a quadratic model with linear growth factors at the individual and classroom level, which could be used as comparable measures of growth over the whole period of measurement. next, solutions of the above described linear and quadratic growth models were compared by using akaike information criterion (aic) and bayesian information criterion (bic). fit indices for reading fluency and reading comprehension suggested an advantage for the quadratic model (see table 1). the classroom level quadratic factor was negative for reading fluency (p < .001) and for reading comprehension (p < .001) indicating decelerated growth over time. based on this result, the quadratic model was selected for the reading fluency and comprehension data. 
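the comparison of the linear and quadratic models rests on the standard definitions of the two information criteria, aic = 2k - 2 ln l and bic = k ln n - 2 ln l, where lower values indicate the preferred model. the sketch below applies these definitions; the log-likelihoods and parameter counts in the usage note are invented for illustration and are not the values from table 1, and the choice of sample size n for the bic in a multilevel model is itself an assumption.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_parameters * math.log(n_observations) - 2 * log_likelihood


def preferred_model(candidates, n_observations):
    """Return the model names with the lowest AIC and the lowest BIC.

    candidates: dict mapping a model name to (log_likelihood, n_parameters).
    """
    by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
    by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_observations))
    return by_aic, by_bic
```

with hypothetical inputs such as `{"linear": (-1000.0, 8), "quadratic": (-980.0, 9)}` and n = 568, both criteria favour the quadratic model, because the gain in log-likelihood outweighs the penalty for the additional parameter.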
table 1 fit indices of linear and quadratic three-level latent growth curve models without covariates

teachers’ beliefs and the interactions of teachers’ beliefs with prior abilities on student and class level were stepwise included into a baseline model resulting in three additional models. all models are shown in table 2 and coefficients are described in table 3. given the three different measures of teachers’ beliefs and the two reading outcomes, we ran 18 models in total. teacher belief data were centred at the grand mean before they were added as covariates to the baseline model in a stepwise procedure (models 1-3, table 6 & 7). this procedure allows for unbiased estimates of higher-level interaction effects (enders & tofighi, 2007). in model 1, only a test for a main effect of teachers’ beliefs on the classrooms’ average learning progress (γ101) was conducted by including the third-level predictor teacher belief (tbj). model 2 additionally tested a cross-level interaction effect (γ201) to analyse if students’ initial achievement moderates effects of teachers’ beliefs on students’ learning gains. thereby, students’ initial reading achievement (r0ij) and progress (r1ij) are determined in relation to their own classrooms’ average initial achievement (β00j) and progress (β10j). to determine the cross-level interaction effect in this way, a random effect of prior skill levels on learning growth was required for the student level (β11j). thus, a regression of individual deviations from class level learning growth (r1ij) on individual deviations from class level prior ability levels (r0ij) was modelled. the parameters r0ij and r1ij thereby indicate the group mean centred individual deviations from the classroom’s prior ability and learning progress, respectively.
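the two centring schemes mentioned here differ only in the mean that is subtracted. the following sketch shows both under simple assumptions: teacher belief scores are centred at the mean of all teachers (grand mean), whereas student scores are centred at the mean of their own classroom (group mean). function names and example values are hypothetical; in the study itself the student-level deviations are model parameters rather than precomputed variables.

```python
from collections import defaultdict
from statistics import mean


def grand_mean_centre(values):
    """Subtract the overall mean from every value."""
    overall = mean(values)
    return [v - overall for v in values]


def group_mean_centre(values, groups):
    """Subtract each group's own mean from the values of its members."""
    members = defaultdict(list)
    for value, group in zip(values, groups):
        members[group].append(value)
    group_means = {group: mean(vals) for group, vals in members.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]
```

after group mean centring, a student’s score expresses only the deviation from the classroom average, so that differences between classrooms no longer enter the student-level predictor.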
having group mean centred lower level predictors (as r0ij in our case) is crucial to estimate cross-level interaction effects because then, estimates are not biased by interaction, which may be potentially present on class level (enders & tofighi, 2007). model 3 furthermore tested if the classrooms’ initial skills moderated the effect of teachers’ beliefs on the classrooms’ learning growth (γ103). in addition, a test for effects of initial competences on learning progress at the classroom level (γ102) was added to model 3 to allow unbiased estimations of the classroom level interaction effect.

table 2 overview of the three-level latent growth curve models

table 3 description of coefficients of model 3

3. results

3.1 descriptive statistics and correlations

table 4 shows the means, standard deviations, and intercorrelations of reading fluency and reading comprehension data at all points of measurement. all measures of reading ability were highly correlated. table 5 presents the descriptive statistics of the belief scales. moderate to strong positive and negative correlations were found between the cos, gos, and dos measures as expected. a mean score of m = 3.36 for the cos and m = 3.53 for the gos on a 1 to 4 scale indicated that teachers on average agreed with the reading specific and global statements of the constructivist orientation scales. the mean score for the dos was in the middle of the scale (m = 2.48).

table 4 intercorrelations, means, and standard deviations of reading fluency and reading comprehension at eight points of measurement (n = 568)

table 5 intercorrelations, means, standard deviations of the cos (n = 29), gos (n = 29) and dos (n = 29) scales

3.2 analyses of longitudinal data

as shown in table 6, the quadratic three-level latent growth curve model for reading fluency without covariates revealed that students reached an average of 3.63 correctly selected gaps per min at the beginning of fourth grade (γ000).
the linear growth (γ100) was 0.39 gaps every three weeks with a moderate deceleration trend indicated by a negative quadratic factor (γ300 = -0.03). we found substantial within-class variance in students’ reading fluency at the beginning of the school year (r0ij = 1.93, p <.01) but no significant variation in linear growth over the course of the school year (r1ij = 0.01, p = 0.15). the same pattern was observed at the classroom level. initial abilities significantly differed between classrooms (u00j = 0.13, p < 0.01), whereas linear growth in reading fluency did not differ between classrooms (u10j = 0.002, p = 0.17). students answered on average 9.79 questions correctly on the reading comprehension test (γ000) at the beginning of fourth grade (see table 5) and had a linear improvement rate of 0.30 answers (γ100) every three weeks on average with a moderate deceleration trend indicated by a negative quadratic factor (γ300 = -0.03). similar to the results found for reading fluency, reading comprehension significantly differed between students from the same classroom at the beginning of the school year (r0ij = 4.91, p <.01), but variation in linear learning growth between individual students of the same classroom was not significant (r1ij = 0.004, p = 0.66). the opposite pattern was found at the classroom level. although no significant differences were found for prior reading comprehension (u00j = 0.66, p = 0.11), different classrooms showed different linear growth in reading comprehension over the school year (u10j = 0.01, p <.01). despite finding some non-significant variances at the student and classroom levels, we analysed the effects of teachers’ beliefs, because adding beliefs as covariates may increase testing power by reducing error variance in the dependent variable (aberson, 2010). 
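the reported fixed effects can be turned into a model-implied average trajectory, which makes the deceleration trend concrete. the sketch below assumes that time is coded 0 to 7 across the eight points of measurement (one unit per three-week interval), which is consistent with the linear factor representing the growth rate at the first measurement point but is not stated explicitly. the coefficients are the rounded values reported above, so the resulting means are approximate.

```python
def implied_mean(intercept, linear, quadratic, t):
    """Model-implied mean at time t for a quadratic growth curve."""
    return intercept + linear * t + quadratic * t ** 2


def growth_rate(linear, quadratic, t):
    """Instantaneous growth rate (first derivative) at time t."""
    return linear + 2 * quadratic * t


# rounded fixed effects as reported in the text: intercept, linear, quadratic
FLUENCY = (3.63, 0.39, -0.03)
COMPREHENSION = (9.79, 0.30, -0.03)

# assumption: time coded 0..7 for the eight points of measurement
fluency_trajectory = [round(implied_mean(*FLUENCY, t), 2) for t in range(8)]
comprehension_trajectory = [round(implied_mean(*COMPREHENSION, t), 2) for t in range(8)]
```

under this coding, the implied growth rate for reading fluency shrinks from 0.39 at the first measurement to about zero between the seventh and eighth measurement, which illustrates what a negative quadratic factor of this size means over one school year.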
table 6 parameters for the three-level latent growth curve baseline models for reading fluency and reading comprehension

3.3 main effects of teacher beliefs

results of the three-level latent growth curve models revealed that teachers’ global constructivist beliefs were positively related to students’ progress in reading fluency (see table 7, model 1 & 2). no significant effects were found for the reading specific cos scale and direct-transmissive beliefs on reading fluency growth. the results for reading comprehension showed that none of the teacher belief scales was significantly related to students’ learning growth (see table 8, model 1 & 2).

table 7 effects of teacher beliefs on progress in reading fluency

3.4 interaction of teacher beliefs with student ability

we also analysed whether students’ prior ability (relative to the average classroom ability) moderated the effects of teachers’ beliefs on students’ deviation from the average growth of the classroom. results for reading fluency indicate that students’ initial skills did not affect the relation between teachers’ beliefs and individual learning growth (see table 7, model 2 & 3). as hypothesized, however, a significant cross-level interaction was found for reading comprehension (see table 8, model 2 & 3). the effect of teachers’ constructivist beliefs on students’ reading growth was positively moderated by students’ prior skills. hence, within their classrooms, teachers with higher constructivist beliefs were associated with higher growth in reading for students with higher prior ability than for students with lower prior ability. we found no interaction for direct-transmissive beliefs (see table 8, model 2 & 3).
table 8 effects of teacher beliefs on progress in reading comprehension

3.5 interaction of teacher beliefs with classroom ability

we additionally tested whether the prior average reading skills of the classroom moderated the effect of teachers’ beliefs on growth in reading fluency and reading comprehension at the classroom level. the results show that the classrooms’ prior competences did not moderate the effect of teachers’ beliefs on classrooms’ learning progress (see table 7 & 8, model 3, respectively). in addition, classrooms’ initial reading fluency and reading comprehension ability had no general effect on learning.

4. discussion

in this study, we used multilevel latent growth curve modeling to investigate effects of teachers’ constructivist and direct-transmissive beliefs on students’ progress in reading fluency and reading comprehension. moreover, we examined whether effects of teachers’ beliefs depended on prior reading ability. we found that teachers’ global constructivist beliefs had a general positive effect on students’ reading fluency but not on their reading comprehension progress. no significant relations were found between reading specific constructivist beliefs or direct-transmissive beliefs and student growth in reading fluency and reading comprehension. as hypothesized, we found an interaction of teacher beliefs and prior abilities. high achieving students, in contrast to low achieving students, benefited in their reading comprehension growth from a teacher who holds high constructivist beliefs. the positive effect of teachers’ global constructivist beliefs on reading fluency was unaffected by prior abilities. thus, effects of teacher beliefs seem to depend on the skill under study and interact with students’ prior ability, which we discuss in the following section.
4.1 interpretation of results

our finding that general constructivist beliefs of teachers were positively related to students’ growth in reading fluency is in line with our hypotheses. the same effect, however, was expected but not found for reading comprehension and no main effects of reading specific teacher beliefs were found. also, whether or not prior abilities moderated effects of teacher beliefs seems to depend on the respective skill. so how can we explain this pattern of results? given the different findings for reading fluency and reading comprehension, one starting point is to reflect on the specific reading skills and effective ways of teaching these skills. while reading fluency is characterized by the automation of word recognition, reading comprehension requires the intentional application of reading strategies to construct a situation model and to connect new information to prior knowledge. consequently, effective reading fluency instruction aims to automatize word recognition, for example by instructing students to (repeatedly) read text passages aloud (e.g., repeated or paired reading; topping, 2006). these decoding practices are mainly student-driven without a particular need of teacher instruction and thus easily match with high constructivist beliefs. effective instruction of reading comprehension, in contrast, is characterized by the explicit instruction of reading strategies by the teacher (souvignier & mokhlesgerami, 2006). this contradicts constructivist views of teaching according to which students should and will develop their own problem-solving strategies. actually, as indicated by our finding that prior student ability moderated effects of global and specific constructivist beliefs on students’ progress in reading comprehension, it seems that high achieving students indeed tend to develop their own effective reading strategies and thus profit when a teacher with high constructivist beliefs teaches them.
low achieving students, however, might be overstrained to self-regulate their reading comprehension without explicit instruction of strategies. following this argumentation, we would expect that teachers with high constructivist beliefs positively affect the development of skills that require student-driven practices to automatize processes (e.g. word recognition or basic mathematical skills) for all students independent of their prior ability. if, however, the skill is not characterized by high automation but requires strategic behaviour, teachers who provide much student-managed but less teacher-managed instruction due to their high constructivist beliefs might positively affect learning for the high but not the low achieving students. this assumption is in line with findings on child x instruction interactions by connor and colleagues, who showed that low achieving students benefit from teacher-managed instruction but high achieving students benefit from child-managed instruction (e.g. connor et al., 2011; 2004). the positive effects of high constructivist beliefs on student learning have been ascribed to higher cognitive activation and motivational advantages of teaching methods used by teachers with high constructivist beliefs (e.g., savery & duffy, 1995; staub & stern, 2002). these positive motivational advantages, however, will likely not occur if students feel overstrained by the task. we assume that the teaching methods of teachers with high constructivist beliefs will probably be more child-managed and less teacher-managed, and that those methods will likely overstrain low achieving students, who need explicit guidance to build up self-regulated strategic reading behaviour. regarding reading fluency, in contrast, most fourth-grade students will be able to cope with the task of just reading a text, which might enhance their automation of word recognition but not their ability to understand texts. 
this would explain why we find general positive effects of constructivist beliefs on growth in reading fluency but not in reading comprehension. reading specific views about teaching had no effects on learning growth, which is in contrast to our hypotheses as well as to findings showing that content specific constructivist beliefs about teaching can affect students’ learning (e.g., staub & stern, 2002; souvignier & mokhlesgerami, 2005). nevertheless, results point in the expected directions. effects of reading specific beliefs on learning growth may be weaker, because compared to global beliefs, they may be rather limited to lessons specifically dedicated to fostering reading skills. we found no effects for direct-transmissive beliefs. thus, although variance in direct-transmissive beliefs was highest, this variance did not explain differences in reading fluency or reading comprehension growth. most prior studies have assessed teacher beliefs using a single continuum. the only study investigating teacher beliefs on separate scales in reading is very specific as the teachers applied a strategy-based reading program (text detectives; behrmann & souvignier, 2013). this pattern of results suggests that differences in direct-transmissive beliefs in contrast to constructivist beliefs might be less influential on teaching behaviour and thus do not explain differences in student learning. finally, it should be noted that the interactions with prior ability were found on the student but not the classroom level. an explanation may be that significant variance in prior abilities was found between students of the same classrooms, while the average reading skills of the classrooms in this study seemed to be similar (see table 6).

4.2 limitations

a limitation of this study is the small variance in student and classroom level reading growth, despite the representative range of classrooms.
the limited variation between classrooms’ growth in reading may have masked effects of teacher beliefs. investigating interaction effects in a sample with a higher variance within and between classrooms would be desirable. moreover, when interpreting our results, one should consider that the belief measures might be affected by social desirability regarding constructivist beliefs, leading to ceiling effects and low variances in the cos and gos scales. assuming that social desirability affected our measures, it is likely that effects of teachers’ beliefs on students’ reading growth were masked rather than increased by potentially inflated standard errors. given that standard errors are similar for effects of constructivist and direct-transmissive beliefs (see tables 7 & 8), however, we assume that the impact of social desirability may be rather negligible. unfortunately, the construct validity of the three belief scales could not be confirmed using confirmatory factor analysis due to the limited teacher sample. nevertheless, the moderate to strong correlations between the three scales indicate that they are partly independent of each other. in the introduction, we stated that content general as well as reading specific beliefs about teaching are relevant. nevertheless, similar to the study conducted by behrmann and souvignier (2013), our study included a global constructivist, but not a global direct-transmissive belief scale. further studies should investigate effects of both global constructivist and direct-transmissive beliefs to fully discriminate between content-specific and global beliefs. analysing effects of teachers’ beliefs on students’ learning is largely based on the assumption that beliefs and classroom behaviour of teachers are closely connected (e.g., buehl & beck, 2015). in research designs without classroom observations, as in our study, the variability of teachers’ instructional activity remains an open issue.
conversely, by not observing teacher behaviour we ensured unimpaired business-as-usual instruction and thus high ecological validity of the study. regarding the external validity of this study, it should be considered that the results and conclusions of this paper are based on reading skill data from a sample of fourth grade classrooms only. consequently, generalization to different grades and content is limited, especially as our results indicate that effects of teachers’ beliefs may depend on the specific skill under study. the teachers of this study had access to the results of their students’ reading tests, which is a feature of the learning progress assessment tool (förster & souvignier, 2014). thus, in addition to their personal impressions, they had an additional objective source of information about the development of their students. we do not assume that the availability of more student information alone is responsible for the effects. moreover, quantity and quality of additional information was the same for all teachers. given that effects of both teacher beliefs and learning progress assessments are assumed to be mediated by instructional behaviour, future studies should investigate the interplay of teacher beliefs, learning progress assessment and instructional behaviour. for example, according to constructivist views, prior knowledge is considered to be particularly important for the learning process (savery & duffy, 1995; staub & stern, 2002). thus, teachers with high constructivist views might be more receptive to additional assessment information.

4.3 conclusion

our study complements existing research in a number of ways. first, we investigated effects of teacher beliefs in the domain of reading and, to our knowledge, provide the first results for effects on reading comprehension. second, we analysed under which conditions which teacher beliefs positively affect students’ reading progress.
the analysis of interactions between student ability and different teacher beliefs on both student and classroom level adds to our knowledge of the interplay between teacher and student variables during the learning process and provides a novel perspective on child x instruction interactions. third, we assessed reading fluency and reading comprehension progress with eight measurements across the school year, thereby ensuring the reliable assessment of student progress. this work partly confirmed general effects of teachers’ beliefs about teaching. global constructivist, but not reading specific, beliefs about teaching had an impact on students’ reading fluency. no general effects of teachers’ beliefs on reading comprehension progress were found, but stronger constructivist beliefs of teachers were associated with higher learning growth in reading comprehension for students with higher prior ability than for lower performing students within the same classrooms. a similar interaction was not found for reading fluency, indicating that effects of teachers’ beliefs on growth in reading fluency are unaffected by prior skills. these skill-specific findings for effects of teachers’ beliefs and their interaction with students’ ability might be explained by differences regarding the optimal instruction of these skills, which corresponds more or less closely to constructivist views of teaching. our study thus adds to our understanding of the conditions under which constructivist teacher beliefs are positively associated with student learning.

keypoints

we investigated effects of teachers’ constructivist and direct-transmissive beliefs on students’ reading fluency and reading comprehension progress. according to child x instruction interactions, we hypothesized that prior abilities moderate the effects of teachers’ beliefs on student learning.
we assessed constructivist and direct-transmissive beliefs on separate scales and modelled reading progress across eight points of measurement over one school year. global constructivist beliefs were positively related to reading fluency only. no other main effects were found. effects of teachers’ constructivist beliefs on students’ growth in reading comprehension depended on students’ prior abilities and were higher for high compared to low achieving students.

references

aberson, c. l. (2010). applied power analysis for the behavioral sciences. new york: routledge.
behrmann, l., & souvignier, e. (2013). pedagogical content beliefs about reading instruction and their relation to gains in student achievement. european journal of psychology of education, 28(3), 1023–1044. doi:10.1007/s10212-012-0152-3
bollen, k. a., & curran, p. j. (2006). latent curve models. a structural equation perspective. hoboken, nj: john wiley & sons, inc.
brown, r., & pressley, m. (1994). self-regulated reading and getting meaning from text: the transactional strategies instructional model and its ongoing validation. in d. schunk & b. zimmerman (eds.), self-regulation of learning and performance: issues and educational applications (pp. 155–180). hillsdale, nj: erlbaum.
bryk, a. s., & raudenbush, s. w. (1988). toward a more appropriate conceptualization of research on school effects: a three-level hierarchical linear model. american journal of education, 97(1), 65–108. doi:10.1086/443913
buehl, m. b., & beck, j. s. (2015). the relationship between teachers’ beliefs and teachers’ practices. in h. fives, & m. g. gill (eds.), international handbook of research on teachers’ beliefs (pp. 66–84). new york: routledge.
bunting, c. e. (1985). dimensionality of teacher educational beliefs: a validation-study. the journal of experimental education, 53(4), 188–192. doi:10.1080/00220973.1985.10806380
connor, c. m., morrison, f. j., fishman, b., giuliani, s., luck, m., underwood, p. s., bayraktar, a., crowe, e. c., & schatschneider, c. (2011). testing the impact of child characteristics x instruction interactions on third graders’ reading comprehension by differentiating literacy instruction. reading research quarterly, 46(3), 189–221. doi:10.1598/rrq.46.3.1
connor, c. m., morrison, f. j., & petrella, j. n. (2004). effective reading comprehension instruction: examining child by instruction interactions. journal of educational psychology, 96(4), 682–698. doi:10.1037/0022-0663.96.4.682
cromley, j. g., & azevedo, r. (2007). testing and refining the direct and inferential mediation model of reading comprehension. journal of educational psychology, 99(2), 311–325. doi:10.1037/0022-0663.99.2.311
dubberke, t., kunter, m., mcelvany, n., brunner, m., & baumert, j. (2008). lerntheoretische überzeugungen von mathematiklehrkräften: einflüsse auf die unterrichtsgestaltung und den lernerfolg von schülerinnen und schülern [mathematics teachers’ beliefs: their impact on instructional quality and student achievement]. zeitschrift für pädagogische psychologie [german journal of educational psychology], 22(34), 193–206. doi:10.1024/1010-0652.22.34.193
duncan, t. e., duncan, s. c., & strycker, l. a. (2006). an introduction to latent variable growth curve modeling: concepts, issues, and application. mahwah, nj: lawrence erlbaum associates.
enders, c. k., & bandalos, d. l. (2001). the relative performance of full information maximum likelihood estimation for missing data in structural equation models. structural equation modeling, 8(3), 430–457. doi:10.1207/s15328007sem0803_5
enders, c. k., & tofighi, d. (2007). centering predictor variables in cross-sectional multilevel models: a new look at an old issue. psychological methods, 12(2), 121. doi:10.1037/1082-989x.12.2.121
farnia, f., & geva, e. (2013). growth and predictors of change in english language learners' reading comprehension. journal of research in reading, 36(4), 389–421. doi:10.1111/jrir.12003
fives, h., lacatena, n., & gerard, c. (2015). teachers’ beliefs about teaching (and learning). in m. g. gill, & h. fives (eds.), international handbook of research on teachers’ beliefs (pp. 249–265). new york: routledge.
förster, n., & souvignier, e. (2014). learning progress assessment and goal setting: effects on reading achievement, reading motivation and reading self-concept. learning and instruction, 32, 91–100. doi:10.1016/j.learninstruc.2014.02.002
gill, m. g., & hoffman, b. (2009). shared planning time: a novel context for studying teachers’ discourse and beliefs about learning and instruction. teachers college record, 111(5), 1242–1273. retrieved from http://www.tcrecord.org/content.asp?contentid=15241
hattie, j. (2012). visible learning for teachers: maximizing impact on learning. london: routledge.
kagan, d. m. (1992). implication of research on teacher belief. educational psychologist, 27(1), 65–90. doi:10.1207/s15326985ep2701_6
kintsch, w. (1998). comprehension: a paradigm for cognition. new york: cambridge university press.
kleickmann, t. (2007). zusammenhänge fachspezifischer vorstellungen von grundschullehrkräften zum lehren und lernen mit fortschritten von schülerinnen und schülern im konzeptuellen naturwissenschaftlichen verständnis [coherences of elementary teachers’ subject-related beliefs on teaching and learning with students’ progresses in scientific comprehension] (doctoral dissertation). retrieved from https://miami.uni-muenster.de/record/642aa4ce-7149-4cdb-a938-c37f3c64cbe2
muthén, l. k., & muthén, b. o. (2017). mplus user’s guide. eighth edition. los angeles: muthén & muthén.
nation, k., cocksey, j., taylor, j. s., & bishop, d. v. (2010). a longitudinal investigation of early reading and language skills in children with poor reading comprehension. journal of child psychology and psychiatry, 51(9), 1031–1039. doi:10.1111/j.1469-7610.2010.02254.x
national institute of child health and human development (2000). teaching children to read: an evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. washington, dc: u.s. government printing office.
nespor, j. (1987). the role of beliefs in the practice of teaching. journal of curriculum studies, 19(4), 317–328. doi:10.1080/0022027870190403
organisation for economic co-operation and development (2009). creating effective teaching and learning environments: first results from talis, oecd. paris, france: oecd. retrieved from http://www.oecd.org/education/school/creatingeffectiveteachingandlearningenvironmentsfirstresultsfromtalis.html
pajares, m. f. (1992). teachers’ beliefs and educational research: cleaning up a messy construct. review of educational research, 62(3), 307–332. doi:10.3102/00346543062003307
paris, s. g., cross, d. r., & lipson, m. y. (1984). informed strategies for learning: a program to improve children’s reading awareness and comprehension. journal of educational psychology, 76, 1239–1252. doi:10.1037/0022-0663.76.6.1239
parrila, r., aunola, k., leskinen, e., nurmi, j. e., & kirby, j. r. (2005). development of individual differences in reading: results from longitudinal studies in english and finnish. journal of educational psychology, 97(3), 299–319. doi:10.1037/0022-0663.97.3.299
peterson, p. l., fennema, e., carpenter, t. p., & loef, m. (1989). teacher's pedagogical content beliefs in mathematics. cognition and instruction, 6(1), 1–40. doi:10.1207/s1532690xci0601_1
rasinski, t. v., reutzel, d. r., chard, d., & linan-thompson, s. (2011). reading fluency. in m. l. kamil, p. d. pearson, birr moija, e., & p. p. afflerbach (eds.), handbook of reading research (vol. iv) (pp. 286–319). new york: routledge.
resnick, l. b., & hall, m. w. (1998). learning organizations for sustainable education reform. daedalus, 127(4), 89–118. retrieved from http://www.jstor.org/stable/20027524?seq=1#page_scan_tab_contents
richardson, v. (2003). preservice teachers’ beliefs. in j. raths, & a. c. mcaninch (eds.), teacher beliefs and classroom performance: the impact of teacher education, volume 6: advances in teacher education (pp. 1–22). greenwich, ct: information age.
samuels, s. j. (1979). the method of repeated readings. the reading teacher, 32(4), 403–408. retrieved from http://www.jstor.org/stable/20194790?seq=1#page_scan_tab_contents
savery, j. r., & duffy, t. m. (1995). problem based learning: an instructional model and its constructivist framework. educational technology, 35(5), 31–38. retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?
singer, j. d., & willett, j. b. (2003). applied longitudinal data analysis: modeling change and event occurrence. new york: oxford university press.
snider, v. e., & roehl, r. (2007). teachers’ beliefs about pedagogy and related issues. psychology in the schools, 44(8), 873–886. doi:10.1002/pits.20272
souvignier, e., förster, n., & salaschek, m. (2014). quop: ein ansatz internetbasierter lernverlaufsdiagnostik mit testkonzepten für lesen und mathematik [quop: an internet based approach with testing concepts for reading and mathematics]. in m. hasselhorn, w. schneider, & u. trautwein (eds.), lernverlaufsdiagnostik [learning progress assessment] (pp. 239–256). goettingen: hogrefe.
souvignier, e., & mokhlesgerami, j. (2005). implementation eines programms zur vermittlung von lesestrategien im deutschunterricht [moving strategy-oriented reading instruction into the classroom: the role of the teacher]. zeitschrift für pädagogische psychologie [german journal of educational psychology], 19(4), 249–261. doi:10.1024/1010-0652.19.4.249
souvignier, e., & mokhlesgerami, j. (2006). using self-regulation as a framework for implementing strategy-instruction to foster reading comprehension. learning & instruction, 16, 57–71. doi:10.1016/j.learninstruc.2005.12.006
speer, d. c., & greenbaum, p. e. (1995). five methods for computing significant individual client change and improvement rates: support for an individual growth curve approach. journal of consulting and clinical psychology, 63(6), 1044–1048. doi:10.1037/0022-006x.63.6.1044
staub, f. c., & stern, e. (2002). the nature of teachers’ pedagogical content beliefs matters for students' achievement gains: quasi-experimental evidence from elementary mathematics. journal of educational psychology, 94(2), 344–355. doi:10.1037/0022-0663.94.2.344
tilstra, j., mcmaster, k., van den broek, p., kendeou, p., & rapp, d. (2009). simple but complex: components of the simple view of reading across grade levels. journal of research in reading, 32(4), 383–401. doi:10.1111/j.1467-9817.2009.01401.x
topping, k. (1987). paired reading: a powerful technique for parent use. the reading teacher, 40(7), 608–614. retrieved from http://www.jstor.org/stable/20199562
topping, k. j. (2006). building reading fluency: cognitive, behavioral, and socioemotional factors and the role of peer-mediated learning. in s. j. samuels, & a. e. farstrup (eds.), what research has to say about fluency instruction (pp. 106–129). newark, de: international reading association.
willett, j. b. (1989). some results on reliability for the longitudinal measurement of change: implications for the design of studies of individual growth. educational and psychological measurement, 49(3), 587–602. doi:10.1177/001316448904900309
woolley, s. l., benjamin, w.-j. j., & woolley, a. w. (2004). construct validity of a self-report measure of teacher beliefs related to constructivist and traditional approaches to teaching and learning. educational and psychological measurement, 64(2), 319–331. doi:10.1177/0013164403261189

frontline learning research vol. 4 no. 3 (2016) 28–43
issn 2295-3159
corresponding author: hanke korpershoek, university of groningen, grote rozenstraat 3, 9712 tg groningen, the netherlands.
e-mail: h.korpershoek@rug.nl
doi: http://dx.doi.org/10.14786/flr.v4i3.182

relationships among motivation, commitment, cognitive capacities, and achievement in secondary education

hanke korpershoek
university of groningen, the netherlands

article received 5 june / revised 8 january / accepted 7 march / available online 10 may

abstract

the aims of the present study were (1) to identify to what extent school motivation and school commitment contributed to the explanation of students’ academic achievement in addition to the effect of students’ cognitive capacities, (2) to find out whether school commitment mediated the relation between school motivation and academic achievement, and (3) to find out whether school motivation mediated the relation between school commitment and academic achievement. new in the field is that perspectives from two different research traditions were adopted, resulting in a selection of variables introduced by identity development theory and by motivational theories on achievement goals. the overall goal was to provide insight into the underlying structure of the relationships among these variables by providing new empirical evidence derived from a large student sample. a sample of more than 6,000 secondary school students from the netherlands was therefore used in the study. path models (structural equation models) were used to analyse the data. fit indices of the final model were satisfactory. this model included students’ cognitive capacities, three motivation factors (performance, social, and extrinsic motivation; mastery was excluded) and one commitment component (in-depth exploration; the ‘commitment’ and ‘reconsideration of commitment’ components were excluded). the results showed small effects of performance (+), social (+), and extrinsic (-) motivation on academic achievement in addition to students’ cognitive capacities. a very small negative effect was found for in-depth exploration.
in-depth exploration mediated the motivation–achievement relationships to a limited extent. suggestions for further research are discussed.

keywords: school motivation; school commitment; cognitive capacities; academic achievement; identity development theory; achievement goal theory

1. introduction

the purpose of the present study was to better understand the underlying structure of the relationships among school motivation, school commitment, and academic achievement of students in secondary education. school motivation is derived from the achievement goal framework. the school commitment construct follows from identity development theory, referring to students’ feelings of being committed to school. the relation between motivation and achievement has received ample attention in the literature (for recent meta-analyses using achievement goal theory see huang, 2012; hulleman, schrager, bodmann, & harackiewicz, 2010). however, within the widely used achievement goal framework, the focus is usually on a limited set of achievement goals (i.e. mastery and performance goals). maehr (1984) suggested that social solidarity goals and extrinsic goals should also be considered when studying achievement goals in educational settings, because students vary widely in their orientations toward learning. therefore, all four suggested achievement goals are investigated in this paper as indicators of students’ school motivation. the relation between school commitment and academic achievement has received far less attention in the literature. building and maintaining relationships with significant others in one’s environment is part of the identity development process (see e.g. klimstra, hale, raaijmakers, branje, & meeus, 2010; kroger, martinussen, & marcia, 2010; meeus, 2011). the school context is one of the most important life domains within which identity formation processes take place.
students enter into various commitments by establishing meaningful relationships with peers and teachers. although it is plausible that the extent to which students feel committed influences students’ overall functioning at school, the literature on this topic is scarce. particularly the commitment construct as defined by identity development theory is not commonly used in educational studies. however, a wide variety of similar constructs (from various theoretical frameworks) have been used to explain student outcomes. that is, school commitment is conceptually related to school engagement (fredricks, blumenfeld, & paris, 2004), school membership (hagborg, 1998; wehlage, rutter, smith, lesko, & fernandez, 1989), school belonging (goodenow & grady, 1993), school relatedness (deci & ryan, 2002), and school connectedness (resnick et al., 1997; shochet, dadds, ham, & montague, 2006). the conceptually closest construct is ‘sense of school belonging’, which is explained further in the theoretical framework. prior studies have shown that students’ sense of school belonging is positively associated with school motivation (e. m. anderman, 2002; l. h. anderman & e. m. anderman, 1999; goodenow & grady, 1993; roeser, midgley, & urdan, 1996; ryan & powelson, 1991) and cognitive outcomes (anderman, 2003; goodenow, 1993; ma, 2003; osterman, 2000; roeser et al., 1996; pittman & richmond, 2007). based on these findings, it is expected that similar results can be found for the relationship between school commitment and academic achievement. 
all in all, the present study aims (1) to identify to what extent school motivation and school commitment contribute to the explanation of students’ academic achievement in addition to the effect of students’ cognitive capacities, (2) to find out whether school commitment mediates the relation between school motivation and academic achievement, and (3) to find out whether school motivation mediates the relation between school commitment and academic achievement. both school motivation and school commitment are, at least theoretically, malleable to some extent, so insight into the (relative) contributions of these variables to students’ academic achievement is relevant for educational practice. moreover, the multiple goal perspective that is adopted in this paper enables us to identify which achievement goals are related to more general academic achievement measures. this focus on general academic achievement is, in our view, important for educational practice, in addition to the more context- or domain-specific studies on student achievement. it is widely known that mastery goals are generally associated with favourable student achievement in class, but it is not clear whether this is also the case for students’ general academic achievement. in this paper, curriculum independent test scores on mathematics and reading comprehension are used as indicators of students’ general academic achievement in the 9th grade of secondary education. these tests give an indication of students’ general academic functioning in secondary education. when relevant relationships are found between multiple achievement goals and students’ general academic achievement, these insights stress the importance of endorsing and stimulating various achievement goals in school. performance motivation, for example, may not be beneficial for students’ school grades in particular school subjects, but it may relate to students’ general academic achievement.
the same line of reasoning applies to the impact of school commitment on student achievement. generally, positive effects are expected, but it is unclear whether these effects are context- or domain-specific or more general in nature. this paper addresses these issues by focusing on the effects of school motivation and school commitment on general academic achievement measures. some factors (e.g. performance motivation) might be weakly related to students’ grades in class, but show stronger relationships with general academic achievement in secondary education. as such, these factors can be seen as appropriate targets for intervention, because they are associated with students’ more general academic functioning. in section 2, the school commitment and school motivation constructs are discussed in more detail before further explaining the present study. insights from various relevant theoretical frameworks are presented in order to clearly explain how the constructs were defined.

2. theoretical framework

2.1 the school commitment construct

a fast-growing body of research now recognizes the significance of fulfilling basic psychological needs of students in education. self-determination theory (sdt) distinguishes the needs for autonomy, competence, and relatedness which, when all three are supported, are associated with favourable outcomes. these needs specify ‘innate psychological nutriments that are essential for ongoing psychological growth, integrity, and well-being’ (deci & ryan, 2000, p. 229). the need for relatedness is suggested to facilitate the process of internalization, which means that people tend to internalize values and practices from contexts (and people within that context) in which they experience a sense of belonging (niemiec & ryan, 2009). the social context is therefore of major importance in facilitating growth processes such as growth in intrinsic motivation and integration of extrinsic motivation among students (deci & ryan, 2000).
moreover, it is said that the need to belong precedes the desire for knowledge (e.g. deci & ryan, 2002). the need for relatedness is therefore seen as a basic and innate psychological need of people. closely linked to these statements about the need for relatedness is the so-called belongingness hypothesis, which states that human beings have a pervasive drive to form and maintain at least a minimum quantity of lasting, positive, and significant interpersonal relationships (baumeister & leary, 1995, p. 497). within the school context, this would imply that students generally have a pervasive drive (or in sdt an innate need) to form and maintain significant interpersonal relationships (e.g. with their teachers and peers). similarly, a sense of school belonging is conceptualized as ‘the extent to which students feel personally accepted, respected, included, and supported by others in the school social environment’ (goodenow & grady, 1993, pp. 60–61). here we can already see that the need for relatedness, the pervasive drive to form and maintain interpersonal relationships, and the need to belong are closely related and, more importantly, are closely linked to identity development processes in the school context. faircloth (2012) stated that ‘identity can be seen as a type of ongoing negotiation of participation, shaped by – and shaping in response – the context(s) in which it occurs.’ (p. 186). the school context is therefore an important factor in shaping adolescents’ identity (eccles & roeser, 2011; lannegrand-willems & bosma, 2006; rich & schachter, 2012). strongly grounded in the work of erikson (1950) and marcia (1966, 1980, 1994), crocetti, rubini, and meeus (2008) developed a three-dimensional model of identity formation that can be used to assess adolescents’ identity formation processes in various life domains (e.g. the school). the model comprises three dimensions.
the first dimension, commitment, is conceptualized as a choice made in an identity-relevant area and as the extent to which one identifies with that choice (crocetti et al., 2008, p. 218). it indicates whether a person feels committed to a certain relationship, for example, to friends or to school in general. meeus (1996) formerly defined commitment as the extent to which young people feel committed to, and derive self-confidence from, a positive self-image and confidence in the future from relationships (p. 585; see also bosma, 1985; meeus & dekovic, 1995; meeus, iedema, & maassen, 2002). note that these definitions show remarkable overlap with the definition of school belonging. both refer to a malleable emotional state and both stress the importance of interpersonal relationships with significant others in obtaining a sense of school belonging or the feeling of school commitment. the second dimension, in-depth exploration, refers to the way in which adolescents deal with existing commitments and how much young people are actively engaged in investigating relationships. the third dimension, reconsideration of commitment, refers to the comparison between current commitments and other possible alternatives and also includes young people’s efforts to change present commitments because they are no longer satisfactory (crocetti et al., 2008, p. 209). together, the three dimensions can be used to characterize students’ (feelings of) commitment to the school in general. in the present study, crocetti et al.’s (2008) framework is used to measure students’ commitment to school. it has a strong theoretical basis and fits our idea that having a sense of commitment (or belonging) is an ongoing process of making and reconsidering commitments, that is, interpersonal relationships with significant others such as teachers and peers (i.e. the school community).
2.2 the school motivation construct

a broad range of motivational theories has attempted to unravel student motivation in educational settings, including achievement goal theory (agt; elliot & mcgregor, 2001) and personal investment theory (pi theory; maehr, 1984). motivational theories vary widely in how they define the concept of motivation and how motivation is operationalized. an oversimplified yet clear definition that can be drawn from agt and pi theory is that motivation refers to students’ general orientation towards learning. this general orientation involves cognitive aspects (e.g. adopting achievement goals) as well as noncognitive aspects (e.g. emotional reactions). for this paper, we focused on the adoption of achievement goals as indicators of students’ school motivation, because this approach takes a multiple goal perspective. it captures many different motivational dimensions (e.g. multiple achievement goals), which gives the opportunity to link students’ school commitment to various dimensions of students’ school motivation. agt emphasizes that students pursue different achievement goals in learning situations, such as mastery goals (focused on gaining knowledge and improving skills) and performance goals (focused on demonstrating their ability) (elliot & mcgregor, 2001). mastery-oriented students (those adopting or striving towards mastery goals) attempt to understand the topic at hand, gain knowledge, and improve their skills (e.g. tapola & niemivirta, 2008), which generally has a positive effect on students’ learning outcomes (huang, 2012). central to this orientation is the belief that effort leads to success (elliot & mcgregor, 2001). performance-oriented students (those adopting performance goals) are more focused on demonstrating their ability (e.g. tapola & niemivirta, 2008). one’s own ability is referenced against the performance of others (elliot & mcgregor, 2001).
the effect of adopting performance goals is less straightforward; both positive and negative effects have been reported (e.g. huang, 2012). maehr (1984) suggested that social solidarity goals and extrinsic goals should also be considered when studying achievement goals in educational settings. maehr’s pi theory includes task goals (mastery), ego goals (performance), social solidarity goals, and extrinsic goals (see also king, ganotice, & watkins, 2014; king, mcinerney, & watkins, 2013; urdan & maehr, 1995). social goals can be referred to as socially grounded reasons for studying, resulting from social concern and social affiliation (king & mcinerney, 2012). social-oriented students – those adopting social goals – are more focused on group learning, for example, studying for the sake of the group (covington, 2000). the relationship with academic achievement has not been studied frequently, though one can expect that the effect on academic achievement is at least positive. deci and ryan (2000) emphasize the importance of studying social goals that can affect achievement, in addition to examining more frequently addressed mastery and performance goals. extrinsic goals refer to the desire for external rewards such as praise and tokens. extrinsic-oriented students – those adopting extrinsic goals – attempt to gain external rewards in learning situations. external rewards then function as an incentive to continue one’s work or task (ryan & deci, 2000). some studies found negative effects of extrinsic motivation on cognitive outcomes (e.g. wolters, yu, & pintrich, 1996). however, as is the case with social goals, the relationship with academic achievement remains largely unclear.
building on the theoretical frameworks of agt and pi theory, the inventory of school motivation (ism; mcinerney & sinclair, 1991, 1992; mcinerney & ali, 2006) was developed in order to capture four motivation dimensions: mastery, performance, social, and extrinsic goals. these four motivation dimensions are used in the present paper.
2.3 relationships between the two constructs
in a previous publication using the same dataset, latent cluster analysis was used to define student groups with different motivational profiles (korpershoek, kuyper, & van der werf, 2015). it was found that the student group with high scores on all motivation dimensions (i.e. adoption of mastery, performance, social, and extrinsic goals) also had high scores on school commitment. moreover, correlations between the four motivation dimensions and school commitment were all positive and small to medium in size (mastery .40; performance .17; social .32; extrinsic .23). there are also theoretical explanations why the associations are rather small. according to self-determination theory (sdt), people tend to pursue goals, domains, and relationships that support their need satisfaction (deci & ryan, 2000). these authors state that relatedness plays a more distal role in the maintenance of intrinsic motivation than autonomy and competence, which more directly influence intrinsic motivation. relatedness is not necessarily a prerequisite for intrinsic motivation, but a ‘needed backdrop’ that makes expression of the innate growth tendency of intrinsic motivation more likely (deci & ryan, 2000, p. 235). prior research also suggests that the two constructs are related to students’ academic achievement. school motivation is found to be a prominent predictor of school grades (e.g. brophy, 2004), but its relation with more objective academic achievement measures (e.g. curriculum-independent achievement tests) is less straightforward.
based on a meta-analysis of 84 studies, huang (2012) found a correlation of .13 between mastery motivation and academic achievement and a correlation of -.00 between performance motivation and academic achievement. correlations varying from -.02 to .09 were reported in korpershoek et al. (2015). korpershoek et al. (2015) also reported small and positive correlations between school commitment (as an overall construct) and academic achievement (.11 for reading comprehension and .13 for mathematics). having a sense of commitment (or belonging) is part of students’ basic psychological need satisfaction. it is therefore suggested to be an essential prerequisite for learning (and thus for academic achievement).
2.4 the present study
an important question that follows from the theoretical framework is to what extent school commitment and school motivation are related, and to what extent they are related to students’ academic achievement. the goal of the path analyses conducted in this paper was to better understand the underlying structure of the relationships among these variables. three conceptual models were tested to identify to what extent school motivation and school commitment contributed to the explanation of students’ academic achievement in addition to the effect of students’ cognitive capacities. a measure of students’ cognitive capacities was included, because this is generally the strongest predictor of students’ academic achievement. motivation and commitment were expected to show additive effects on academic achievement. the first model (model a) includes only direct effects on academic achievement; the two other models also include indirect effects. the first mediation model (model b) includes mediation effects of school commitment on the relation between school motivation and academic achievement. theoretically, this model is the more plausible of the two because of the definition of school commitment used in this study.
osterman (2000), for example, explains that in contexts in which students’ basic psychological needs (such as the need to belong) are met, students will function better (e.g. be more motivated) than in contexts in which their needs are not satisfied. the second mediation model (model c) includes mediation effects of school motivation on the relation between school commitment and academic achievement. there is no strong empirical support for the latter model; however, we sought to unravel the underlying structure of the relationships among motivation, commitment, and academic achievement. therefore, both mediation models were empirically tested.
3. method
3.1 participants
the data used were collected as part of a large-scale study in secondary education in the netherlands, the so-called cool5-18 project (zijsling, keuning, kuyper, van batenburg, & hemker, 2009). the students included in the present paper were selected from a response group of 8,884 9th grade students (from 80 secondary schools throughout the netherlands) who had participated in the overall data collection. the students were on average 16 years old. in the netherlands, all students are expected to enter secondary education and obtain a secondary school diploma (track a or b, see below) or a secondary school diploma (track c) plus an additional diploma in senior secondary vocational education. students start 7th grade (year one of secondary education) in different educational tracks. the track placement is based on the primary school teachers’ recommendation. the lowest track is the preparatory secondary vocational education programme (track c, duration 4 years), which prepares students for senior secondary vocational education. this track is further divided into three sublevels. the senior general secondary education track (track b, duration 5 years) prepares students for higher professional education.
the highest track, pre-university education (track a, duration 6 years), prepares students for university. thus, both tracks a and b prepare for higher education. the students in our sample pursued preparatory vocational secondary education (48%), senior general secondary education (27%), or pre-university education (25%). the sample included similar numbers of boys and girls (each 50%).
3.2 instruments
3.2.1 school commitment
the school commitment scale was part of a paper-and-pencil questionnaire administered at the participating schools. we used an adapted version of the u-gids (utrecht-groningen identity development scale; crocetti et al., 2008). this instrument comprises three subscales: commitment (5 items), in-depth exploration (5 items), and reconsideration of commitment (3 items). sample items are: “my school gives me certainty in life” (commitment), “i think a lot about my school” (in-depth exploration), and “i often think it would be better to try to find a different school” (reconsideration of commitment; reversed scale), with answer categories ranging from 1 (strongly disagree) to 5 (strongly agree). the factor structure was confirmed in a factor analysis. the reliabilities of the subscales were: commitment (α = .86), in-depth exploration (α = .79), and reconsideration of commitment (α = .87).
3.2.2 school motivation
a dutch version of the inventory of school motivation (ism) of mcinerney and ali (2006) was used. the questionnaire used here consisted of 32 items (see ali & mcinerney, 2004 for this subset of items) on a 5-point likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). the items were included in the same questionnaire as the items of the school commitment scale.
factor analysis confirmed the four-factor structure suggested by the literature (mcinerney, dowson, & yeung, 2005; mcinerney, marsh, & yeung, 2003; see also korpershoek, xu, mok, mcinerney, & van der werf, 2015) and resulted in four reliable scales in our sample: mastery motivation (9 items, α = .77), performance motivation (7 items, α = .84), social motivation (7 items, α = .74), and extrinsic motivation (9 items, α = .86). each of these four dimensions is based on two first-order factors. mastery motivation is based on task (e.g. “i like to see that i am improving in my schoolwork”) and effort (e.g. “when i am improving in my schoolwork i try even harder”), performance motivation on competition (e.g. “i work harder if i’m trying to be better than others”) and social power (e.g. “i often try to be the leader of a group”), social motivation on social concern (e.g. “it is very important for students to help each other at school”) and affiliation (e.g. “i prefer to work with other people at school rather than alone”), and extrinsic motivation on praise (e.g. “at school i work best when i am praised”) and token (e.g. “i work hard in class for rewards from the teacher”).
3.2.3 cognitive capacities
students’ score on an intelligence test was used as an indicator of their cognitive capacities. students’ intelligence was estimated based on their performance on the so-called nscct intelligence test (“non-scholastic cognitive capacities test”; van batenburg & van der werf, 2004), which was adapted to the level of 9th grade students (see also zijsling et al., 2009). the test consists of 76 items covering five topics: constructing figures, exclusion, series of numbers, categories, and analogies. the reliability of the test in the overall student sample was .91.
3.2.4 academic achievement
two standardized achievement tests were used to assess the students’ achievements in mathematics and reading comprehension.
the achievement tests were paper-and-pencil tests that were administered at the participating schools. the mathematics test was based on an item bank of 50 multiple choice questions, resulting in three different versions (with 11 anchored items) for students in different educational tracks. the reading comprehension test consisted of several short texts about which multiple choice questions were formulated. an item bank of 46 questions was used (with 11 anchored items). thus, different versions of the mathematics and reading comprehension tests with both anchored and unique items were used for students in the lower and higher educational tracks (for details see zijsling et al., 2009). for cool5-18 three versions of the test have been developed, two for track c students (one for the lowest two levels and one for the highest level within this track) and one for track a and b students. using a one-parameter logistic model (oplm; an item response model), the students’ scores were placed on one performance scale, indicating the percentage of items within the overall item bank that a student was expected to answer correctly (between 0 and 100%), regardless of the track they were in and regardless of the test version. the advantage of using this procedure is that the students’ scores can easily be compared across different test versions (e.g. when comparing the results of track a and track b students, who had taken the same test version). the reliability for the mathematics test was .94 and for the reading comprehension test it was .92. since we attempted to explain students’ academic achievement in general, a latent factor based on both test scores was included in the path models.
3.3 analyses
structural equation modelling was applied to the data. models were estimated with mplus software (version 7) using maximum likelihood (ml) estimation.
model fit indices reported are the chi-square and degrees of freedom values, the root mean square error of approximation (rmsea), the standardized root mean square residual (srmr), the comparative fit index (cfi) and the tucker-lewis index (tli). adequate fit is found when the rmsea values are .06 or lower, srmr values are .08 or lower, and cfi/tli values are .95 or higher (hu & bentler, 1999). first, model a is presented, including only direct effects of the school motivation factors (i.e. four latent variables) and school commitment factors (i.e. three observed variables) on academic achievement. then, models b and c (the mediation models) are presented. nonsignificant paths (p > .01) were deleted step by step to improve model fit.
4. results
table 1 shows the correlations among all variables.
table 1 correlations among all variables
                                 1     2     3     4     5     6     7     8     9
1. cognitive capacities
2. mastery motivation          .04
3. performance motivation      .05   .36
4. social motivation           .09   .47   .14
5. extrinsic motivation       -.02   .51   .56   .36
6. commitment                  .11   .35   .14   .27   .17
7. in-depth exploration       -.05   .35   .24   .26   .31   .30
8. reconsid. of commitment    -.20  -.07   .08  -.10   .07  -.32   .09
9. reading comprehension       .49   .07   .01   .08  -.03   .10  -.02  -.17
10. mathematics                .70   .05   .10   .09  -.01   .13  -.04  -.21   .52
students’ cognitive capacities correlated highly with their scores on the mathematics test (r = .70) and moderately with their scores on the reading comprehension test (r = .49). all other correlations varied from -.01 to .56, with the highest correlations between mastery and social motivation (r = .47), between mastery and extrinsic motivation (r = .51), between performance and extrinsic motivation (r = .56), and between the reading comprehension and mathematics scores (r = .52). finally, the correlations between the school motivation and school commitment components on the one hand and the achievement measures on the other hand were low (the highest correlation was -.21).
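as a cross-check on table 1, the lower triangle can be typed in and ranked by absolute size. the sketch below is illustrative python and not part of the original analysis (which was run in mplus); the helper names `ranked_pairs` and `strongest_link` are hypothetical, and the numbers are simply copied from the table as printed.

```python
# lower triangle of table 1 as printed: row k (0-based) holds the
# correlations of variable k+1 with variables 1..k
LABELS = [
    "cognitive capacities", "mastery motivation", "performance motivation",
    "social motivation", "extrinsic motivation", "commitment",
    "in-depth exploration", "reconsideration of commitment",
    "reading comprehension", "mathematics",
]
LOWER = [
    [],
    [.04],
    [.05, .36],
    [.09, .47, .14],
    [-.02, .51, .56, .36],
    [.11, .35, .14, .27, .17],
    [-.05, .35, .24, .26, .31, .30],
    [-.20, -.07, .08, -.10, .07, -.32, .09],
    [.49, .07, .01, .08, -.03, .10, -.02, -.17],
    [.70, .05, .10, .09, -.01, .13, -.04, -.21, .52],
]


def ranked_pairs(labels, lower):
    """all variable pairs as (r, label_a, label_b), largest |r| first."""
    pairs = [
        (r, labels[j], labels[i])
        for i, row in enumerate(lower)
        for j, r in enumerate(row)
    ]
    return sorted(pairs, key=lambda p: abs(p[0]), reverse=True)


def strongest_link(row_idx, col_idx, labels, lower):
    """largest |r| between two sets of variables.

    every index in row_idx must be greater than every index in col_idx,
    because only the lower triangle is stored.
    """
    candidates = [
        (lower[i][j], labels[j], labels[i]) for i in row_idx for j in col_idx
    ]
    return max(candidates, key=lambda p: abs(p[0]))


if __name__ == "__main__":
    # six strongest correlations in the table
    for r, a, b in ranked_pairs(LABELS, LOWER)[:6]:
        print(f"{r:+.2f}  {a} x {b}")
    # motivation and commitment components (variables 2-8)
    # against the two achievement tests (variables 9-10)
    print(strongest_link([8, 9], range(1, 8), LABELS, LOWER))
```

the last call restricts the search to variables 2 to 8 against the two achievement tests, which is the comparison behind the statement that those correlations are low.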
all variables were initially included in the path models. the first path model (model a1) included direct effects of students’ cognitive capacities, school motivation (4 latent factors: mastery, performance, social, and extrinsic motivation), and school commitment (commitment, in-depth exploration, reconsideration of commitment) on students’ academic achievement. the model did not show adequate fit with regard to the rmsea (.084) and srmr (.091) values and the cfi (.881) and tli (.834) values. deleting the nonsignificant path from reconsideration of commitment to achievement (p = .167) in model a2 did not improve model fit: rmsea (.091), srmr (.098), cfi (.881), tli (.829). subsequently, the other nonsignificant path, that is, from mastery motivation to achievement (p = .026), was deleted in model a3. this model, now only including significant paths, also did not improve model fit (see table 2).
table 2 model fit results of models a3, b2, and c
          model a3            model b2            model c
rmsea     .090 [.087-.094]    .057 [.053-.060]    .126 [.122-.129]
srmr      .087                .034                .099
cfi       .896                .967                .816
tli       .846                .944                .702
aic       191957.338          215718.962          193420.437
bic       192181.762          215974.196          193665.437
χ2 (df)   1938.108 (35)       636.237 (26)        3395.381 (32)
r2        .681                .677                .679
n         6639                7319                6639
note. full information maximum likelihood was used; therefore, the number of students included in the analysis varies per model. missing data are generally missing scores on the intelligence test, because not all schools administered this test. moreover, some individual students did not take the achievement tests or did not fill out the questionnaire (or had too many missing items to construct scale scores).
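the rmsea values in table 2 can be approximately recovered from the reported chi-square, degrees of freedom, and sample size, and the hu and bentler (1999) cut-offs from section 3.3 can then be applied row by row. the sketch below is illustrative python, not the mplus procedure used in the study. it assumes the common textbook point estimate sqrt(max(χ2 − df, 0) / (df · (n − 1))); some programs use n instead of n − 1, so agreement is expected only to within rounding. the names `rmsea` and `meets_cutoffs` are hypothetical.

```python
import math

# values copied from table 2
TABLE2 = {
    "a3": {"chi2": 1938.108, "df": 35, "n": 6639,
           "rmsea": .090, "srmr": .087, "cfi": .896, "tli": .846},
    "b2": {"chi2": 636.237, "df": 26, "n": 7319,
           "rmsea": .057, "srmr": .034, "cfi": .967, "tli": .944},
    "c": {"chi2": 3395.381, "df": 32, "n": 6639,
          "rmsea": .126, "srmr": .099, "cfi": .816, "tli": .702},
}


def rmsea(chi2, df, n):
    """textbook rmsea point estimate (assumed formula, see lead-in)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(fit):
    """apply the cut-offs quoted in section 3.3 to the reported indices."""
    return {
        "rmsea": fit["rmsea"] <= .06,
        "srmr": fit["srmr"] <= .08,
        "cfi": fit["cfi"] >= .95,
        "tli": fit["tli"] >= .95,
    }


if __name__ == "__main__":
    for name, fit in TABLE2.items():
        recomputed = rmsea(fit["chi2"], fit["df"], fit["n"])
        print(name, f"reported={fit['rmsea']:.3f}",
              f"recomputed={recomputed:.4f}", meets_cutoffs(fit))
```

under these assumptions model b2 is the only model that meets the rmsea, srmr, and cfi criteria, with the tli just below .95, which matches the description in the text.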
subsequently, model b was constructed, including the direct effects from model a3 (3 out of 4 school motivation factors: performance, social, and extrinsic motivation; 2 out of 3 school commitment variables: commitment and in-depth exploration) and mediation effects of the school commitment variables on the motivation – achievement relationships. model b1 shows adequate fit: rmsea (.059), srmr (.037), cfi (.959), except for the tli value (.929). however, one direct path, from commitment to achievement, was not significant (p = .215). model b2 therefore shows the results without this variable in the model (see table 2), which significantly improved model fit. the rmsea and srmr values are well below the cut-off values. the cfi value is above the cut-off value of .95 (hu & bentler, 1999), and the tli value almost reaches the cut-off value (.944). model c, the model that included mediation effects of the school motivation factors on the school commitment – achievement relationships, did not fit the data (see table 2). model b2 appeared to be the best-fitting model. figure 1 shows the corresponding path model.
figure 1 path model of model b2 (standardized estimates, standard errors between brackets)
note. all paths are statistically significant at p < .001. the path from extrinsic motivation to in-depth exploration is significant at p < .01.
the strongest predictor of academic achievement was students’ score on the intelligence test (an indicator of students’ cognitive capacities; .813). additionally, performance motivation (.155) and social motivation (.125) showed positive effects on students’ academic achievement. the desire to outperform others (performance motivation) and to learn together with others (social motivation) seems to promote students’ achievement. extrinsic motivation (e.g. learning for praise and tokens) was, however, associated with lower levels of academic achievement (-.161).
the final model included one of the three subscales of school commitment, namely, in-depth exploration. referring to the extent to which students are actively engaged in investigating relationships and the way in which they deal with existing commitments, this variable was negatively related to academic achievement. the size of the effect was quite small (-.047), which indicates that this result needs to be interpreted with some caution. we will return to this issue in the discussion. stronger effects were found for the relationships between the motivational factors and in-depth exploration. higher levels of motivation (performance, social, and extrinsic) were associated with higher levels of in-depth exploration. that is, the higher one’s motivation, the more one thinks about and explores relationships at school. this was particularly the case for social motivation. the final model revealed small significant mediation effects of in-depth exploration on the motivation – achievement relationships, although we would like to stress that the relationship between in-depth exploration and achievement was quite small to begin with. we tested the indirect effects of performance, social, and extrinsic motivation on achievement via in-depth exploration. these indirect effects were negligible: performance motivation -.007 (se = .002, p < .01), social motivation -.014 (se = .004, p < .001), and extrinsic motivation -.004 (se = .002, p < .05).
5. discussion
this study integrated insights from identity development theory and motivational theories on achievement goals in an educational context, using a large student sample. although the constructs that were used in this study have very different theoretical origins, the empirical findings underscore that school motivation (following motivational theories on achievement goals) and school commitment (following identity development theory) are related constructs among secondary school students.
various school motivation factors (i.e. performance, social, and extrinsic motivation) and one school commitment component (i.e. in-depth exploration) each had unique effects on academic achievement in addition to the effect of students’ cognitive capacities. moreover, the school motivation factors were positively related to students’ in-depth exploration. educational studies attempting to explain students’ academic achievement should, therefore, integrate insights from these different theoretical perspectives in their explanatory models to further understand the direct and unique contributions of each of these variables. a positive direct effect was found for social motivation on students’ academic achievement (as suggested by covington, 2000 and deci & ryan, 2000) and a negative direct effect was found for extrinsic motivation (in line with findings presented by wolters et al., 1996), which suggests that it is relevant to study other achievement goals in addition to the more commonly addressed mastery and performance goals (see maehr, 1984). furthermore, a positive effect was found for performance motivation. performance-oriented students, that is, those who, for example, responded that they worked harder when they tried to be better than others, had higher scores on the achievement tests than students with different orientations towards learning. for students’ general academic achievement, it seems beneficial to be (to some extent) oriented towards outperforming others. this finding is in contrast with the results of the meta-analysis of huang (2012), who did not find a significant relationship between performance motivation and achievement. a notable finding was that mastery motivation was the first factor that needed to be deleted from the model (see results section for details).
the findings for mastery and performance motivation are in contrast with the results of the meta-analysis of huang (2012), who found positive relationships between mastery motivation and achievement but not between performance motivation and achievement. presumably, the study design is important here for the interpretation. when outperforming others is students’ general orientation toward learning, performance on a low stakes academic achievement test (which was used in this study) provides students with almost the same opportunities as performance on a high stakes test, namely outperforming others. when mastery is students’ general orientation toward learning, performance on a low stakes test does not imply that actual learning takes place. that is to say, the context does not ask for any learning activities such as trying to master the content. there were no consequences attached to the outcomes of the tests. a more methodological explanation is that the several motivation components were moderately correlated (which was allowed in the path model). particularly the correlations of social motivation with mastery and extrinsic motivation were moderately high, which may have resulted in smaller effects for each of these components. students are not simply mastery- or performance-oriented; they often adopt various achievement goals in learning situations (see also korpershoek et al., 2015). only one of the three school commitment components was included in the final model. the higher students’ score on the in-depth exploration scale, the lower their general academic achievement. this would imply that thinking a lot about school and exploring one’s commitment to school is unfavourable for students’ general academic outcomes, which is not in line with theoretical notions discussed earlier in this paper. as already mentioned in the results section, the size of the effect was rather small (-.047), which is why this result should be interpreted with some caution.
replication of the study is needed to validate these findings. the other two school commitment components (commitment and reconsideration of commitment) were not included in the final model, indicating that those components were not related to students’ general academic achievement. as stated in the introduction, these factors may still be relevant for day-to-day functioning of students in class and presumably also for their school grades in more context- or domain-specific situations. the impact on general academic achievement could, however, not be confirmed. finally, although in-depth exploration mediated the motivation – achievement relationships, the indirect effects of performance, social, and extrinsic motivation on academic achievement via in-depth exploration were negligible. the final model that included these effects showed adequate model fit, but our data did not support the idea that one’s school commitment substantially mediated the motivation – achievement relationships. replication of the study is needed to validate these findings. notwithstanding these critical remarks, model b (including mediation effects of school commitment on the motivation – achievement relation) fitted the data much better than the theoretically less plausible model c (including mediation effects of school motivation on the school commitment – achievement relation). the study contributes to further theory development, particularly by highlighting that some motivational processes (such as adopting mastery goals) and some identity development processes (such as making commitments to people in one’s environment) are presumably more important for situation-specific school contexts than for general school contexts.
that is, in our models, mastery motivation did not show a meaningful relationship with our general academic achievement measures (r < .10), but the correlations between mastery motivation and two school commitment components (commitment and in-depth exploration) were meaningful (both r = .35). these latter findings are more in line with theory (e.g. deci & ryan, 2000; osterman, 2000), because these relationships suggest that motivational processes and students’ identity development processes go, to some extent, hand in hand. model b (including mediation effects of school commitment on the motivation – achievement relation) fitted these theoretical notions, although the relationship between in-depth exploration and achievement we found was quite unexpected. however, in our study, we used curriculum-independent test scores to measure students’ academic achievement rather than situation-specific achievement measures (e.g. student achievement on a domain-specific test in a specific course in secondary education), which might explain this finding. based on our results, one could argue that the theories that we studied to explain differences in student achievement appear less applicable to this more general school context. an important suggestion for further theory development with regard to agt (elliot & mcgregor, 2001) and pi theory (maehr, 1984) is, therefore, to see how and to what extent these motivational theories on achievement goals can capture more general motivational patterns among adolescents in addition to more situation-specific contexts such as classroom learning. additionally, it might be worthwhile to examine different ways to operationalize school motivation (i.e. more situation-specific versus more general) when studying students’ general academic achievement.
with regard to educational practice, the finding that social motivation is positively associated with students’ general academic achievement suggests that social motivation is a suitable target for intervention. although the contribution of this variable to the explanation of students’ general academic achievement is relatively small compared to the effect of students’ cognitive capacities, it showed a meaningful relationship. stimulating students’ social concern, for example, by emphasizing that it is important to help each other at school, may create an atmosphere in which students stimulate each other’s learning processes. in a similar vein, the findings show that students often prefer to work in groups rather than alone (social affiliation). the positive association between social motivation and academic achievement suggests that group work may stimulate student learning. in addition to validating the findings and confirming the final model in future studies, we suggest investigating differential effects on students’ academic achievement. that is, for particular student groups (e.g. for underperforming students) some relationships may be stronger than for other student groups, but more research is needed to investigate this (e.g. by using multigroup analysis). additionally, the addition of other variables in the model, for example, school engagement (see osterman, 2000) and self-efficacy (see hejazi, shahraray, farsinejad, & asgary, 2009), is a relevant topic for future research. various studies propose that sense of school belonging (conceptually related to school commitment) does not directly influence student achievement, but influences student engagement and self-efficacy beliefs, which in turn affect achievement. an important limitation of this paper is that cross-sectional data were used, which rules out examining cause-effect relationships.
that is, the findings confirmed various significant associations, but it is likely that the relationships work both ways. for example, high academic achievement may have a positive impact on students’ motivation as well. further research is therefore needed to understand how these relationships develop over time (e.g. using cross-lagged models). notwithstanding this limitation, the main contribution of this paper lies in the empirically founded argument that the integration of insights from identity development theory and motivational theories enhances our general understanding of student learning and student achievement in secondary school.
keypoints
this paper adopted insights from two different theories, namely identity development theory and achievement goal theory
various motivation and school commitment components were significantly related to students’ academic achievement
cognitive capacity was the strongest predictor of academic achievement among 9th grade secondary school students
the final model included small effects of performance (+), social (+), and extrinsic (-) motivation on students’ academic achievement
in-depth exploration mediated the motivation – achievement relationships to a limited extent
references
ali, j., & mcinerney, d. m. (2004). multidimensional assessment of school motivation. paper presented at the 3rd self research conference, berlin, germany.
anderman, e. m. (2002). school effects on psychological outcomes during adolescence. journal of educational psychology, 94, 795-809. doi: 10.1037//0022-0663.94.4.795
anderman, l. h. (2003). academic and social perceptions as predictors of change in middle school students’ sense of school belonging. journal of experimental education, 72, 5-22. doi: 10.1080/00220970309600877
anderman, l. h., & anderman, e. m. (1999). social predictors of changes in students’ achievement goal orientations. contemporary educational psychology, 24, 21-37. doi: 10.1006/ceps.1998.0978
batenburg, th. a. van, & werf, m. p. c. van der. (2004). nscct: niet schoolse cognitieve capaciteiten test. voor groep 4, 6 en 8 in het basisonderwijs. verantwoording, normering en handleiding. groningen, the netherlands: gion.
baumeister, r. f., & leary, m. r. (1995). the need to belong: desire for interpersonal attachments as a fundamental human motivation. psychological bulletin, 117, 497-529. doi: 10.1037//0033-2909.117.3.497
bosma, h. a. (1985). identity development in adolescence. coping with commitments. unpublished doctoral dissertation. groningen, the netherlands: university of groningen.
brophy, j. (2004). motivating students to learn. mahwah, nj: erlbaum.
covington, m. v. (2000). goal theory, motivation, and school achievement: an integrative review. annual review of psychology, 51, 171-200. doi: 10.1146/annurev.psych.51.1.171
crocetti, e., rubini, m., & meeus, w. (2008). capturing the dynamics of identity formation in various ethnic groups. development and validation of a three-dimensional model. journal of adolescence, 31, 207-222. doi: 10.1016/j.adolescence.2007.09.002
deci, e. l., & ryan, r. m. (2000). the ‘what’ and ‘why’ of goal pursuits: human needs and the self-determination of behavior. psychological inquiry, 11, 227-268. doi: 10.1207/s15327965pli1104_01
deci, e. l., & ryan, r. m. (eds.). (2002). handbook of self-determination theory research. rochester, ny: university of rochester press.
eccles, j. s., & roeser, r. w. (2011). schools as developmental contexts during adolescence. journal of research on adolescence, 21, 225-241. doi: 10.1111/j.1532-7795.2010.00725.x
elliot, a. j., & mcgregor, h. a. (2001). a 2x2 achievement goal framework. journal of personality and social psychology, 80, 501-519. doi: 10.1037//0022-3514.80.3.501
erikson, e. h. (1950). childhood and society. new york: norton.
faircloth, b. s. (2012). “wearing a mask” vs. connecting identity with learning. contemporary educational psychology, 37, 186-194. doi: 10.1016/j.cedpsych.2011.12.003
fredricks, j. a., blumenfeld, p. c., & paris, a. h. (2004). school engagement: potential of the concept, state of evidence. review of educational research, 74, 59-109. doi: 10.3102/00346543074001059
goodenow, c., & grady, k. e. (1993). the relationship of school belonging and friends’ values to academic motivation among urban adolescent students. the journal of experimental education, 62, 60-71. doi: 10.1080/00220973.1993.9943831
hagborg, w. j. (1998). an investigation of a brief measure of school membership. adolescence, 33, 461-468.
hu, l., & bentler, p. m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6, 1-55. doi: 10.1080/10705519909540118
huang, c. (2012). discriminant and criterion-related validity of achievement goals in predicting academic achievement: a meta-analysis. the journal of educational psychology, 104, 48-74. doi: 10.1037/a0026223
hulleman, c. s., schrager, s. m., bodmann, s. m., & harackiewicz, j. m. (2010). a meta-analytic review of achievement goal measures: different labels for the same constructs or different constructs with similar labels? psychological bulletin, 136, 422-449. doi: 10.1037/a0018947
klimstra, t. a., hale, w. w., raaijmakers, q. a. w., branje, s. j. t., & meeus, w. h. j. (2010). identity formation in adolescence: change or stability? journal of youth and adolescence, 39, 150-162. doi: 10.1007/s10964-009-9401-4
king, r. b., & mcinerney, d. m. (2012). including social goals in achievement motivation research: examples from the philippines. online readings in psychology and culture, unit 5. retrieved from http://scholarworks.gvsu.edu/orpc/vol5/iss3/4. doi: 10.9707/2307-0919.1104
king, r. b., ganotice, f. a., & watkins, d. a. (2014). a cross-cultural analysis of achievement and social goals among chinese and filipino students. social psychology of education, 17, 439-455. published online first may 2014. doi: 10.1007/s11218-014-9251-0
king, r. b., mcinerney, d. m., & watkins, d. a. (2012). competitiveness is not that bad…at least in the east: testing the hierarchical model of achievement motivation in the asian setting. international journal of intercultural relations, 36, 446-457. doi: 10.1016/j.ijintrel.2011.10.003
korpershoek, h., kuyper, h., & van der werf, m. p. c. (2015). differences in students’ school motivation: a latent class modelling approach. social psychology of education, 18, 137-163. doi: 10.1007/s11218-014-9274-6
korpershoek, h., xu, j. k., mok, m. m. c., mcinerney, m. d., & van der werf, m. p. c. (2015). testing the multidimensionality of the inventory of school motivation in a dutch student sample. journal of applied measurement, 16, 41-59.
kroger, j., martinussen, m., & marcia, j. e. (2010). identity status change during adolescence and young adulthood: a meta-analysis. journal of adolescence, 33, 683-698. doi: 10.1016/j.adolescence.2009.11.002
lannegrand-willems, l., & bosma, h. (2006). identity development-in-context: the school as an important context for identity development. identity, 6, 85-113. doi: 10.1207/s1532706xid0601_6
ma, x. (2003). sense of belonging to school: can schools make a difference? the journal of educational research, 96, 340-349. doi: 10.1080/00220670309596617
maehr, m. l. (1984). meaning and motivation: toward a theory of personal investment. in c. ames & r. ames (eds.), research on motivation in education, vol. 1 (pp. 115-144). new york: academic press.
marcia, j. e. (1966). development and validation of ego-identity status. journal of personality and social psychology, 3, 551-558. doi: 10.1037/h0023281
marcia, j. e. (1980). identity in adolescence. in j. adelson (ed.), handbook of adolescent psychology (pp. 159-187). new york: wiley.
marcia, j. e. (1994). the empirical study of ego identity. in h. a. bosma, t. l. g. graafsma, h. d. grotevant, & d. j. 
de levita (eds.), identity and development. an interdisciplinary approach (pp. 6780). thousand oaks, ca: sage publications, inc. mcinerney, d. m., & ali, j. (2006). multidimensional and hierarchical assessment of school motivation: cross-cultural validation. educational psychology, 26, 595-612. doi: 10.1080/01443410500342559 mcinerney, d. m., & sinclair, k. e. (1991). cross-cultural testing: inventory of school motivation. educational and psychological measurement, 51, 123-133. doi: 10.1177/0013164491511011 mcinerney, d. m., & sinclair, k. e. (1992). dimensions of school motivation: a cross-cultural validation study. journal of cross-cultural psychology, 23, 389-406. doi: 10.1177/0022022192233009 mcinerney, d. m., dowson, m., & yeung, a. s. (2005). facilitating conditions for school motivation: construct validity and applicability. educational and psychological measurement, 65, 1046-1066. doi: 10.1177/0013164405278561 mcinerney, d. m., marsh, h. w., & yeung, a. s. (2003). toward a hierarchical goal theory model of school motivation. journal of applied measurement, 4, 335-357. meeus, w. (1996). studies on identity development in adolescence: an overview of research and some new data. journal of youth and adolescence, 25, 569-598. doi: 10.1007/bf01537355 meeus, w. (2011). the study of adolescent identity formation 2000-2010: a review of longitudinal research. journal of research on adolescence, 21, 75-94. doi: 10.1111/j.1532-7795.2010.00716.x meeus, w., & dekovic, m. (1995). identity development, parental and peer support in adolescence: results of a national dutch survey. adolescence, 30, 931-944. meeus, w., iedema, j., & maassen, g. h. (2002). commitment and exploration as mechanisms of identity formation. psychological reports, 90, 771-785. doi: 10.2466/pr0.90.3.771-785 niemiec, c. p., & ryan, r. m. (2009). autonomy, competence, and relatedness in the classroom. applying self-determination theory to educational practice. 
theory and research in education, 7, 133-144. doi: 10.1177/1477878509104318 osterman, k. f. (2000). students’ need for belonging in the school community. review of educational research, 70, 323-367. doi: 10.3102/00346543070003323 pittman, l. d., & richmond, a. (2007). academic and psychological functioning in late adolescence: the importance of school belonging. the journal of experimental education, 75, 270-290. doi: 10.3200/jexe.75.4.270-292 resnick, m. d., bearman, p. s., blum, r. w., bauman, k. e., harris, k. m., jones, j., tabor, j., beuhring, t., sieving, r. e., shew, m., ireland, m., bearinger, l. h., & udry, j. r. (1997). protecting adolescents from harm: findings from the national longitudinal study on adolescent health. the journal of the american medical association, 278, 823-832. doi: 10.1001/jama.278.10.823 rich, y., & schachter, e. p. (2012). high school identity climate and student identity development. contemporary educational psychology, 37, 218-228. doi: 10.1016/j.cedpsych.2011.06.002 roeser, r. w., midgley, c., & urdan, t. c. (1996). perceptions of the school psychological environment and early adolescents’ psychological and behavioral functioning in school: the mediating role of goals and belonging. journal of educational psychology, 88, 408-422. doi: 10.1037/0022-0663.88.3.408 ryan, r. m., & deci, e. l. (2000). self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. american psychologist, 55, 68-78. doi: 10.1037//0003066x.55.1.68 korpershoek | f l r 43 ryan, r., & powelson, c. (1991). autonomy and relatedness as fundamental to motivation and education. journal of experimental education, 60, 49-66. doi: 10.1080/00220973.1991.10806579 shochet, i. m., dadds, m. r., ham, d., & montague, r. (2006). school connectedness is an underemphasized parameter in adolescent mental health: results of a community prediction study. journal of clinical child and adolescent psychology, 35, 170-179. 
doi: 10.1207/s15374424jccp3502_1 tapola, a., & niemivirta, m. (2008). the role of achievement goal orientations in students’ perceptions of and preferences for classroom environment. british journal of educational psychology, 78, 291-312. doi: 10.1348/000709907x205272 urdan, t. c., & maehr, m. l. (1995). beyond a two-goal theory of motivation and achievement: a case for social goals. review of educational research, 65, 213-243. doi: 10.3102/00346543065003213 wehlage, g. g., rutter, r. a., smith, g. a., lesko, n., & fernandez, r. r. (1989). reducing the risk: schools as communities of support. new york: falmer press. wolters, c. a., yu, s. l., & pintrich, p. r. (1996). the relation between goal orientation and students’ motivational beliefs and self-regulated learning. learning and individual differences, 8, 211-238. doi: 10.1016/s1041-6080(96)90015-1 zijsling, d., keuning, j., kuyper, h., batenburg, th. van, & hemker, b. (2009). cohortonderzoek cool518. technisch rapport eerste meting in het derde leerjaar van het voortgezet onderwijs [cohort study cool5-18. technical report of the first wave in the 9th grade of secondary education]. groningen/arnhem, the netherlands: gion/cito. codepen strohmaier et al frontline learning research vol.8 no. 1 (2020) 16 32 issn 2295-3159 a comparison of self-reports and electrodermal activity as indicators of mathematics state anxiety. an application of the control-value theory anselm r. strohmaiera, anja schiepe-tiskab & kristina m. 
reiss a, b
a heinz nixdorf chair of mathematics education, tum school of education, technical university of munich, munich, germany
b centre for international student assessment, technical university of munich, munich, germany

article received 11 november 2018 / revised 13 december 2019 / accepted 24 january 2020 / available online 19 february

abstract
in the present study with 86 undergraduate students, we related trait mathematics anxiety (ma) with two indicators of state anxiety: self-reported state anxiety and electrodermal activity (eda). extending existing research, we included appraisals of control and perceived value in hierarchical multiple regression analyses in accordance with the control-value theory of achievement emotions (pekrun, 2006). results showed that trait ma predicted self-reported state anxiety, while no additional variance was explained by including control and value. in contrast, we found no significant relation between trait ma and physiological state anxiety, but a significant, negative three-way interaction effect with control and value. regression coefficients indicated that trait ma predicted physiological state anxiety, but only in the presence of negative perceived control and positive perceived value. thus, our results support the control-value theory for physiological state anxiety, but not for self-reports. they emphasize the need to distinguish between trait and state ma, the advantages of adopting the control-value theory, and the benefits of using eda recording as a supplemental assessment method for state anxiety.

keywords: mathematics anxiety, electrodermal activity, galvanic skin response, control-value theory, state anxiety.

corresponding author: anselm.strohmaier@tum.de
doi: 10.14786/flr.v8i1.427

1. introduction

mathematics anxiety (ma) has a substantial impact on many students’ academic and personal lives.
it influences achievement in mathematics tests and classes (hembree, 1990; ma, 1999; namkung, peng, & lin, 2019). moreover, students with high ma avoid mathematics in everyday life as well as in career and academic choices (dowker, sarkar, & looi, 2016; ma, 1999). ma is common across countries, cultures, and ages (dowker et al., 2016; lee, 2009). in the 2012 study of the programme for international student assessment (pisa), 30% of students reported that they felt helpless when doing a mathematic problem (oecd, 2013b). at the same time, ma is a problem of increasing relevance. on average across oecd countries, ma increased significantly from pisa 2003 to pisa 2012 (oecd, 2013b). thus, for educational research, it is important to understand how ma affects students when doing mathematics. research has elaborated the distinction between (momentary) state anxiety (mastate) and (habitual) trait mathematics anxiety (matrait), assessed through separate self-reports, but the findings left their relationship ambiguous (goetz, bieg, lüdtke, pekrun, & hall, 2013). hence, merely assessing matrait cannot exhaustively explain how ma affects mathematical activities momentarily. then again, directly assessing mastate provides a challenge, because self-reports of state emotions might be unreliable (pekrun & bühner, 2014). among other physiological measures, electrodermal activity (eda; also referred to as galvanic skin response; gsr) had sporadically been used as an indicator for mastate in the 1980s, but its relationship with self-reports of mastate or to matrait remained unclear. in this paper, we addressed this research gap by combining two novel approaches. first, we included and compared both self-reports and eda as measures of mastate. second, we used the control-value theory of achievement emotions (pekrun, 2006) as a framework to test their relation to matrait. 
accordingly, we included appraisals of control and perceived value as moderators of the relation between matrait and mastate.

1.2 mathematics anxiety

ma “involves feelings of tension and anxiety that interfere with the manipulation of numbers and the solving of mathematical problems in a wide variety of ordinary life and academic situations” (richardson & suinn, 1972, p. 551). ma has an adverse effect on cognitive resources, independent of actual abilities (ashcraft, 2007; ashcraft & kirk, 2001; maloney et al., 2013). ashcraft and kirk (2001) found that in a mental addition task, undergraduates with high ma showed a smaller working memory capacity that led to an increase in reaction time and errors. this first finding started an intensive line of research, largely confirming direct effects of ma on performance (for overviews, see dowker et al., 2016; suárez-pellicioni, núñez-peña, & colomé, 2016). this influence is not limited to working memory capacity. for example, maloney, ansari, and fugelsang (2011) found that high ma students suffer from low-level numerical deficits, like a less precise representation of numerical magnitude. although most studies refer to ma as a unidimensional construct, a number of studies reported evidence that it consists of more than one factor, most prominently a cognitive component (“worry”) and an affective component (“emotionality”; e.g., ho et al., 2000; lukowski et al., 2016; wigfield & meece, 1988). these studies typically analyzed the factorial structure of questionnaires and related the dimensions to cognitive outcomes like mathematical achievement (e.g., ashcraft & ridley, 2005; lukowski et al., 2016).

1.3 trait and state mathematics anxiety

while there are a large number of studies on ma, very few of them differentiate between mastate and matrait (goetz, bieg, lüdtke, pekrun, & hall, 2013; goldin, 2014). however, this distinction arguably is important when focusing on the effects of ma during mathematical activities.
self-reports of matrait refer to multiple, generalized mathematical situations (bieg, goetz, wolter, & hall, 2015). in contrast, mastate refers to the specific, current situation. therefore, reports of matrait might be a good predictor for long-term effects of ma on learning or career and course choices (dowker et al., 2016) but do not necessarily accurately predict mastate during specific mathematical activities like tests or classes. when investigating the effects of ma during such activities, directly addressing mastate seems to be more appropriate. studies investigating the role of emotions in mathematics and of ma in particular predominantly focus on trait emotions rather than state emotions (goetz et al, 2013; goldin, 2014). accordingly, an extensive number of findings have been gathered on effects, individual differences, and precursors of matrait (dowker et al., 2016). in contrast, there are fewer studies on mastate, often using qualitative analyses (goldin, 2014). yet, specific mechanisms explaining the impact of mastate have rarely been reported for mathematics (dowker, 2016). the relationship between mastate and matrait is ambiguous. on the one hand, a number of studies indicate a strong positive relation. for high matrait students, mastate is considered a key explanation for a lower working memory capacity (ashcraft & moore, 2009; beilock, 2008). in his meta-analysis, hembree (1990) reports a mean correlation of r = .42 between matrait and state anxiety. however, state anxiety was not necessarily assessed during specific mathematical activities in the four reported studies (e.g. plake & parker, 1982). on the other hand, some studies indicate that there is a notable discrepancy between matrait and mastate. goetz et al. (2013) found that girls systematically report higher levels of matrait, but that this difference is not present in reports of mastate during mathematics tests or classes. 
this difference between reports of matrait and mastate is largely explained by individual beliefs and perceptions of competence (bieg et al., 2015; goetz et al., 2013). another reason for differences between matrait and mastate might be that ma negatively affects achievement in mathematics through long-term avoidance behavior, but not during mathematical activities per se (dowker et al., 2016): to avoid aversive consequences, mastate can even enhance motivation momentarily and lead to an increase in effort and strategy use during mathematics tests (eysenck & calvo, 1992; eysenck, derakshan, santos, & calvo, 2007). this indicates that matrait does not necessarily induce mastate. in general, state anxiety can have various cognitive and motivational-affective effects on learning and performance. zeidner (2014) lists 15 specific deficits in information processing during learning caused by anxiety, which are likely to be transferable to mastate. this includes cognitive deficits in areas like information encoding, information storage and processing, and information retrieval and production. moreover, state anxiety is associated with physiological reactions. however, this has not been described for mastate in particular, but only for state anxiety in general. per definition, state anxiety is a “transitory emotional state consisting of feelings of apprehension, nervousness, and physiological sequelae such as an increased heart rate or respiration” (wiedemann, 2015, p. 808). among other aspects, state anxiety is thus characterized by increased arousal and activation of the autonomic nervous system (steimer, 2002; wiedemann, 2015). accordingly, state anxiety does not only cause cognitive deficits, but also a physiological reaction. in sum, existing studies mostly focus on matrait, while its relation to mastate is left ambiguous. thus, to better understand how ma affects learning not only over a longer period of time but also momentarily, additional research is needed. 
this refers both to the question of the relation between matrait and mastate and to the mechanisms and precursors of mastate in particular. in the following, we propose a theoretical framework for investigating these questions.

1.4 the control-value theory

the control-value theory of achievement emotions (pekrun, 2006) characterizes predictors of achievement emotions, including state anxiety. it states that appraisals of control and the perceived subjective value of an achievement situation are the most proximal predictors of achievement emotions. a low appraisal of control and a simultaneous high perceived value of the task are key determinants of state anxiety. in contrast, trait emotions, environmental factors, or former achievement are considered distal factors and are assumed to have a mostly indirect effect on state emotions. according to the control-value theory, matrait should therefore predict mastate mostly indirectly, in association with low appraisals of control and a high subjective value. several empirical studies support aspects of the control-value theory in mathematics (e.g., niculescu, tempelaar, dailey-hebert, segers, & gijselaers, 2015). frenzel, pekrun, and goetz (2007) found that matrait is associated with a pattern of low competence beliefs paired with high achievement values in mathematics. extending the scope, research about attitudes and beliefs about competence in mathematics offers plenty of evidence supporting the control-value theory for other mathematical achievement emotions (for an overview, see goldin et al., 2016). however, to our knowledge, no study has included both matrait and mastate as well as appraisals of control and perceived value in one model.

1.5 assessing mathematics state anxiety

to assess mastate, research has mostly relied on qualitative approaches (see goldin, 2014, for an overview).
these approaches included retrospective interviews and videotaping, but the reliability of these methods has been questioned (goldin, 2014). using a more quantitative approach, goetz et al. (2013) proposed short self-reports that could be used both for measuring anxiety during tests as well as during classes. the advantage of self-reports is that they can be used conveniently for experience-sampling and might be more reliable than observations. however, self-reports about achievement emotions might disrupt the current activity (goldin, 2014). moreover, it is questionable if self-reports can reflect an accurate evaluation of current emotions. in general, self-reports can only cover aspects of emotions that a person is aware of, depend on the use of language, and are subject to systematic biases, e.g. social desirability (pekrun & bühner, 2014). consequently, other researchers have attempted to use physiological measures to directly investigate ma in performance situations (dowker et al., 2016; hannula, 2016), predominantly using neuropsychological methods (e.g., lyons & beilock, 2012; pletzer, kronbichler, nuerk, & kerschbaum, 2015). these studies revealed that ma activates brain areas linked to fear processing, disgust and pain processing, but they did not distinguish between matrait and mastate (artemenko, daroczy, nuerk, 2015; suárez-pellicioni et al., 2016). state anxiety in general is associated with arousal and stress and with physiological reactions due to the activation of the autonomic nervous system. this leads to an increased heart rate and respiration, among other physiological reactions (steimer, 2002; wiedemann, 2015). this also holds for state anxiety in the context of education (zeidner, 2014). therefore, these specific physiological reactions can be assumed to be an indicator for mastate. 
some studies have assessed heart rate or cortisol secretion to monitor stress levels during mathematical tests (dew, galassi, & galassi, 1984; faust 1992, as cited in ashcraft, 2002; mattarella-micke, mateo, kozak, foster, & beilock, 2011; pletzer, wood, moeller, nuerk, & kerschbaum, 2010; sarkar, dowker, & cohen kadosh, 2014). these studies produced mixed results. mattarella-micke et al. (2011) showed that cortisol secretion can be associated with high performance (for low ma students) or with low performance (for high ma students), probably associated with a working memory overload. in contrast, pletzer et al. (2010) did not find a correlation between cortisol secretion and reports of ma, but used self-reports of matrait, not mastate. a relation between heart rate and state anxiety has been shown in various fields (e.g. kantor, endler, heslegrave, & kocovski, 2001), but has rarely been used in mathematics. faust (as cited in ashcraft, 2002) reported changes in heart rate when a highly math-anxious group performed mathematics tests of increasing difficulty. in contrast, dew, galassi, and galassi (1984) found no substantial relation between heart rate and matrait or mastate. in addition to heart rate, dew, galassi, and galassi (1984) observed physiological arousal during a timed mathematics test assessing participants’ eda. eda are fluctuations in skin conductance due to an increase in sweat gland activity. since sweat gland activity is associated with the autonomic nervous system activity, eda is an established method to assess physiological reactions to arousal, concerns, or stress (boucsein, 2012; naveteur & freixa i baqué, 1987; nikula, 1991). in his overview of the method, boucsein (2012) extensively reviewed applications and correlates of various measures of eda. 
he concludes that eda “can be regarded as a valid indicator for the strength of – mostly negative – emotions, for observing the course of psychological stress, and for objectively determining coping efficacy” (p. 521). therefore, eda can indicate state anxiety by detecting associated physiological reactions (boucsein, 2012). eda has recently been used to observe emotions during educational processes like self-regulated and multimedia learning (dindar et al., 2019; mudrick, taub, azevedo, price, & lester, 2017) and reading (meer, breznitz, & katzir, 2016). dew et al. (1984) used various measures of eda and different scales to assess matrait and mastate, but found no relation between eda and matrait, and only a small relation between eda and mastate for one of their measures of eda. as a possible explanation, they acknowledge that the challenge of comparing cognitively experienced anxiety and physiologically experienced anxiety might need a larger sample than their 31 students. moreover, their study design did not include a baseline measure, which is generally advisable for data quality (boucsein, 2012) and could indicate whether eda is indeed influenced by a mathematical test context. thus, while their theoretical assumptions seem well-founded, the authors argue that their data were not sufficient for a meaningful interpretation (dew et al., 1984). in conclusion, mastate has been assessed through qualitative methods, self-reports, and physiological measures. physiological reactions are a vital aspect of anxiety in general and arguably of mastate in particular, but previous research has not provided clear results concerning the relation between self-reports and physiological measures of mastate, or the relation between mastate and matrait in general.

1.6 the present research

so far, we have discussed that the relation between mastate and matrait is not yet fully understood.
in performance situations, mastate might be more strongly related to processes influencing mathematical thinking, like a reduction of working memory capacity. therefore, taking into account mastate seems important when analyzing effects of ma, but it can be assessed in different ways. while self-reports of mastate are easy to obtain, they might suffer from systematic biases. as an alternative, some studies used physiological measures of stress and arousal instead of self-reports to assess mastate in mathematical performance situations. yet, these studies either did not address both mastate and matrait or, in the case of dew et al. (1984), did not show clear results. moreover, no study has yet included appraisals of control or value to describe the relation between mastate and matrait in accordance with the control-value theory. we consider this a substantial gap in research on ma. we assume that the approach by dew and colleagues (1984) to use eda as an indicator for mastate is more promising today, because the possibilities to record and analyze eda have greatly improved. in particular, the innovations in eda recording offer better possibilities for observing the association between eda and matrait, since they allow mastate to be assessed more reliably and in an authentic environment. at the same time, using the control-value theory offers a better theoretical framework for the correlation between mastate and individual antecedents. it has been supported by a number of studies using self-reports and other methods to assess state anxiety, but to our knowledge, the control-value theory has not yet been utilized to analyze precursors of eda.

1.7 hypotheses

in the present study, we investigated the relation of matrait with two indicators for mastate, the physiological measure eda and self-reported state anxiety. we assessed mastate both in a baseline context (a relaxation exercise) and a mathematics test.
first, we assumed that the mathematics test would lead to an increase in both measures (hypothesis 1) and thus indicate that anxiety is successfully induced by the mathematics test. second, we assumed that there is a relation between self-reported mastate and eda (hypothesis 2). moreover, we anticipated that our findings would replicate the direct association between self-reported mastate and matrait (goetz et al., 2013; hypothesis 3a). we expected to find a similar relation between eda and matrait, since eda should reveal physiological arousal, which in turn is an indicator of mastate (hypothesis 3b). according to the control-value theory, appraisals of control and subjective value were included as predictors. we expected that this would confirm the relation between these appraisals and both measures of mastate (hypotheses 4a and 4b). finally, the relation between mastate and matrait should be stronger when students report low control and high perceived value. thus, we expected a negative three-way interaction between matrait, appraisals of control, and perceived value, on both measures of mastate, respectively (hypotheses 5a and 5b).

2. method

2.1 sample and procedure

a total of 95 undergraduate students participated in the study. they gave written informed consent before participation. the study was conducted according to the ethical principles of psychologists and code of conduct of the american psychological association from 2017. ethics approval was not required by institutional guidelines or national regulations, in line with the guidelines of the german research foundation. due to technical difficulties, 5 participants had to be excluded from the sample. additionally, we excluded 4 students because of deviations of more than 3 sd in one of the assessed measures. the remaining participants were 86 undergraduate students (53 female) from programs other than mathematics, ranging from engineering to nutritional science.
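the exclusion rule described above (deviations of more than 3 sd on any assessed measure) can be illustrated with a minimal sketch. it is not the authors' procedure: the function name `flag_outliers`, the data layout, the toy values, and the use of the sample standard deviation are assumptions made for illustration only.

```python
from statistics import mean, stdev


def flag_outliers(measures, cutoff=3.0):
    """Return the indices of participants deviating more than `cutoff` SD.

    `measures` maps a (hypothetical) measure name to a list of values,
    one value per participant, in the same participant order.
    """
    flagged = set()
    for values in measures.values():
        m, sd = mean(values), stdev(values)  # sample SD (n - 1), an assumption
        if sd == 0:
            continue  # no variation, so nobody can deviate on this measure
        for index, value in enumerate(values):
            if abs(value - m) / sd > cutoff:
                flagged.add(index)
    return sorted(flagged)


# toy data: 20 participants, the last one has an extreme EDA value
toy = {
    "eda_responses_per_minute": [10.0] * 19 + [100.0],
    "self_reported_anxiety": [2.0] * 20,
}
print(flag_outliers(toy))
```

a participant is flagged as soon as one measure exceeds the cutoff, which mirrors the wording "in one of the assessed measures"; whether the original analysis standardized within conditions is not stated in the text.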
mathematics students were not recruited as participants to avoid a bias in their beliefs and attitudes towards mathematics, as well as in their mathematical skills. the mean age was 23.2 years (sd = 4.07). participants were recruited on campus and were paid 15 eur for participation. during recruitment and before the experiment, any indication of mathematical content of the study was avoided. it was described as a study investigating eda during various tasks. the individual sessions of the experiment took place in an office at the university containing only two tables, two chairs, and a closed closet. at the beginning of the experiment, the experimenter familiarized participants with the wristband assessing eda. she then put the device on the wrist of the participant’s non-dominant hand and fitted it comfortably. after recording had started, participants were presented with a 5-minute relaxation exercise via headphones. the exercise facilitated relaxation through breathing exercises, accompanied by an audio track that included sounds from nature to help promote a relaxing environment for the participant. when the participant removed the headphones after the exercise, the experimenter immediately presented the first questionnaire assessing state anxiety. after the participant finished the questionnaire, a first mathematical test was presented. the participant was asked to read the instructions carefully and then wait for the signal to start. all participants had 10 minutes to solve the test and received a short notice after 8 minutes. after the test, the participant answered the second state questionnaire. the procedure was repeated for a second mathematics test. at the end of the experiment, trait and demographic data were assessed.

2.2 mathematics tests

both mathematics tests consisted of six items.
eleven items were taken from a pool of released items from the pisa-study (oecd, 2013a); one item was adopted from the trends in international mathematics and science study (timss, international association for the evaluation of educational achievement [iea], 2013). since research suggests that anxiety might have a larger influence for cognitively demanding tasks (ching, 2017; faust, ashcraft, & fleck, 1996), we composed both tests to be fairly difficult. the overall solution rate of 42% (sd = 21%) suggests that the tests were appropriately demanding. the items covered a broad range of mathematical problems, ranging from geometry to statistics. they were based on the concept of mathematical literacy and therefore covered mathematical competencies beyond mere factual knowledge. the tasks required knowledge that all students should have achieved by the end of their compulsory education. for an overall achievement score, we coded each item according to the coding instructions from pisa and timss (0 = incorrect, 0.5 = partially correct, 1 = correct; oecd, 2013a; iea, 2013) and calculated a sum score for all 12 items. 2.3 study measures we assessed matrait using the anxmat-scale developed for the pisa-studies (five items, e.g. “i feel helpless when doing a mathematics problem”, α = .87; oecd, 2005). participants answered on a 4-point likert scale from 1, strongly disagree to 4, strongly agree. we assessed self-reported mastate twice during the experiment according to goetz et al., 2013, asking if participants felt anxious in the previous situation (1, definitely not to 4, definitely). appraisals of control and perceived values were assessed after both tests and were task-specific. for appraisals of control, we used two items accounting for the controllability and probability of outcomes (e.g. “i think my competence in this area is …”, α = .78) on a 7-point likert-scale (1, low to 9, high; engeser & rheinberg, 2008; pekrun & perry, 2014). 
appraisals of perceived value were assessed with the four-item cognitive preferences-scale by kehr, von rosenstiel, and bles (1997) on a 7-point likert-scale (e.g. “it is important to me to solve the exercises”; 1, not at all to 9, very much; α = .85). for eda data collection during the relaxation exercise and the tests, we used an empatica e4 wristband. the wristband is worn like a watch and measures skin conductance with two stainless steel electrodes at the inner wrist. the exosomatic, non-invasive sensor applies a very small, non-perceptible alternating current with a peak value of 100 μA at 1 V and a frequency of 8 Hz. the 4 Hz signal is recorded on an integrated flash memory. 2.4 eda data analyses eda signals consist of two components. the tonic signal is influenced by medium-term factors like room temperature or physiological characteristics of the individual. it provides a level of skin conductance that is rather stable over several seconds. even though the tonic signal can be an indicator of stress or anxiety, the phasic component of the signal is better suited to comparing eda between individuals and is commonly used as an indicator of state anxiety (boucsein, 2012). phasic components of the eda signal are usually called responses, since they reflect a short peak in the signal. responses can be specific responses to a stimulation, for example a bursting balloon. however, there are also phasic responses that are not associated with any specific external stimulation and are hence called nonspecific. the frequency of these nonspecific responses in skin conductance is associated with stress and anxiety and is one of the most common measures of eda (boucsein, 2012). the phasic and the tonic components of an eda signal overlap and need to be decomposed for analyses. data processing was carried out using matlab (v9.2.0) and the matlab-based software ledalab (v3.4.9). the software applies continuous decomposition analysis to extract the phasic signal (benedek & kaernbach, 2010).
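As a rough illustration of how a response frequency can be derived once a phasic signal is available, the following Python sketch counts local peaks above an amplitude threshold of 0.01 μS and divides the count by the recording duration in minutes. It is not the ledalab procedure: the function names, the toy signal, and the simplification of treating the height of a local maximum as the response amplitude are assumptions made for illustration only. The 4 Hz sampling rate matches the signal described above.

```python
from typing import Sequence


def count_responses(phasic: Sequence[float], threshold: float = 0.01) -> int:
    """Count local maxima of the phasic signal that exceed the threshold (in microsiemens)."""
    count = 0
    for i in range(1, len(phasic) - 1):
        is_peak = phasic[i - 1] < phasic[i] >= phasic[i + 1]
        if is_peak and phasic[i] > threshold:
            count += 1
    return count


def scr_freq(phasic: Sequence[float], sampling_rate_hz: float = 4.0,
             threshold: float = 0.01) -> float:
    """Responses per minute: number of counted peaks divided by the duration in minutes."""
    duration_min = len(phasic) / sampling_rate_hz / 60.0
    return count_responses(phasic, threshold) / duration_min


# Toy phasic signal (hypothetical): one minute at 4 Hz, i.e. 240 samples.
signal = [0.0] * 240
signal[20] = 0.05    # counted
signal[100] = 0.02   # counted
signal[180] = 0.005  # below threshold, ignored
signal[200] = 0.30   # counted

print(scr_freq(signal))  # responses per minute for the toy signal
```

Dividing by the duration makes phases of different length (a 5-minute relaxation exercise versus a 10-minute test) comparable on the same per-minute scale.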
after the extraction, any peak in the phasic signal larger than 0.01 μS is counted as a response (boucsein, 2012). for both phases of the experiment (relaxation and test), the number of responses is then summed up and divided by the duration of the phase in minutes. the result is the frequency of nonspecific skin conductance responses per minute (scr.freq). scr.freq served as the measure for physiological mastate. 2.5 analyses for hypothesis 1, we conducted a repeated measures anova to test for differences in state anxiety between the relaxation exercise and the test. to assess the relation between the two measures of mastate and their relation to matrait (hypotheses 2 and 3), we calculated the correlations controlling for gender, achievement, and the respective baseline measures (see sect. 3.1). for hypotheses 4 and 5, we adopted a 5-step hierarchical multiple regression model for both measures of mastate as outcome variables (self-reported and physiological mastate). all predictors except gender were z-standardized before the analyses. in step 1, we included the control variables as predictors. in step 2, we additionally included matrait. in accordance with the control-value theory, step 3 included appraisals of control and subjective value. in step 4, we included the interaction term between control and subjective value. finally, step 5 included the interaction terms between matrait and appraisals of control and subjective value, respectively. additionally, we included the three-way interaction between matrait, control, and subjective value. 3. results 3.1 control variables gender differences exist in self-reports of ma (dowker et al., 2016). moreover, because of physiological differences in the sweat gland density and activity, women tend to display a weaker eda reactivity than men (boucsein, 2012).
accordingly, our results revealed significant gender differences, with females showing weaker eda, t(84) = 2.93, p = .004, reporting higher matrait, t(84) = -2.38, p = .020, and lower control, t(84) = 2.37, p = .020. no significant gender differences were found regarding self-reports of mastate, t(84) = -0.49, p = .626, and perceived value, t(84) = 0.02, p = .984. because of this general influence of gender, we included gender as a control variable in all following analyses. in addition, achievement is associated both with trait anxiety (ma, 1999) and with physiological reaction (mattarella-micke et al., 2011). in our data, we similarly found a significant relation between the test score and reports of matrait, r(86) = -.28, p = .008, self-reports of mastate, r(86) = -.31, p = .004, and control, r(86) = .51, p < .001, respectively, but no significant relation between the test score and eda, r(86) = .15, p = .177, and perceived value, r(86) = .07, p = .518, respectively. since our analyses focused on the interplay of matrait and mastate, irrespective of achievement, we also controlled for the test score in the following analyses. for both measures of mastate (self-reports and eda), we used the data from the relaxation exercise as respective baseline measures. 3.2 main analyses 3.2.1 descriptive results table 1 provides the means and standard deviations for matrait and appraisals of control and perceived value. additionally, mean scores and standard deviations for both measures of mastate during the relaxation exercise and the test are included. for both measures, mastate was significantly higher during the test compared to the relaxation exercise, confirming hypothesis 1. while physiological mastate increased from 15.43 events per minute to 20.04 events per minute (f(1, 85) = 10.53, p = .002, η² = .10), self-reported anxiety increased from 1.37 to 1.62 (f(1, 85) = 17.23, p < .001, η² = .17).
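With only two within-subject conditions (relaxation exercise and test), a repeated measures anova is equivalent to a paired comparison: the F statistic has 1 and n - 1 degrees of freedom and equals the squared paired t statistic, so 86 participants give 85 error degrees of freedom. The Python sketch below illustrates this relation and one common effect size, partial eta squared, computed as F / (F + error df). The function name and the ratings are made up for illustration, and it is an assumption that partial eta squared is the effect size of interest here.

```python
import math
from statistics import mean, stdev


def paired_f(baseline, test):
    """F statistic, error df, and partial eta squared for two repeated measurements."""
    diffs = [t - b for b, t in zip(baseline, test)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    f_stat = t_stat ** 2  # F(1, n - 1) equals the squared paired t
    df_error = n - 1
    partial_eta_sq = f_stat / (f_stat + df_error)
    return f_stat, df_error, partial_eta_sq


# Made-up ratings for five hypothetical participants (not data from the study).
baseline = [1, 1, 2, 1, 2]
test = [2, 1, 3, 2, 2]
f_stat, df_error, partial_eta_sq = paired_f(baseline, test)
print(f"F(1, {df_error}) = {f_stat:.2f}, partial eta squared = {partial_eta_sq:.2f}")
```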
table 1 descriptive statistics and differences between mastate in relaxation exercise and tests note. the unit for physiological state mathematics anxiety is scr.freq [1/min]. **p < .01 ***p < .001. 3.2.2 correlations table 2 provides correlations between all measures. all correlations were controlled for gender and test score and for the respective mastate baseline during the relaxation exercise. contrary to hypothesis 2, no significant correlation was observed between the two measures of mastate (r = .06, p = .63). matrait showed a moderate and significant correlation with self-reported mastate (r = .34, p = .002), but not with physiological mastate (r = .08, p = .48), which supports hypothesis 3a, but not 3b. including appraisals of control and perceived values, matrait correlated moderately and significantly with control and value (r = -.38, p < .001; r = .24, p = .029). a significant, moderate correlation emerged between appraisals of control and self-reported mastate (r = -.29, p = .008), but not physiological mastate (r = .05, p = .65). in contrast, appraisals of the perceived value were significantly related to physiological mastate (r = .29, p = .007), but not to self-reported mastate (r = .16, p = .15). appraisals of control and perceived value showed no significant relation (r = -.01, p = .95). table 2 correlations between measures of anxiety and appraisals of control and perceived value note. correlations of the two measures of state mathematics anxiety are controlled for their respective baseline. all correlations are controlled for gender and test score. n = 86. *p < .05 **p < .01 ***p < .001. 3.2.3 hierarchical multiple regression results of the hierarchical multiple regressions are reported in table 3. it displays only the predictors added in each step. for the full hierarchical models, see appendix a.1. for the two regressions, we used the two measures of mastate as outcome measures respectively. 
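The logic of a stepwise (hierarchical) regression, in which blocks of predictors are added cumulatively and the gain in explained variance is tracked per step, can be sketched as follows. This is a minimal pure-Python illustration on randomly generated data, not a reproduction of the reported analysis: the variable names, the simulated effect sizes, the helper functions, and the choice to build interaction terms as products of z-standardized predictors are all assumptions.

```python
import random
from statistics import mean, pstdev


def zscore(xs):
    """z-standardize a list of numbers."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def product(*cols):
    """Element-wise product of columns, used to build interaction terms."""
    out = [1.0] * len(cols[0])
    for col in cols:
        out = [o * c for o, c in zip(out, col)]
    return out


def ols_r2(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * p for x, p in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(steps, y):
    """Return (R^2, delta R^2) after each step; predictors accumulate across steps."""
    used, previous, results = [], 0.0, []
    for added in steps:
        used = used + added
        r2 = ols_r2(used, y)
        results.append((r2, r2 - previous))
        previous = r2
    return results


random.seed(1)
n = 86
gender = [float(random.randint(0, 1)) for _ in range(n)]  # left unstandardized
score, baseline, trait, control, value = (
    zscore([random.gauss(0, 1) for _ in range(n)]) for _ in range(5)
)
# Simulated outcome with an arbitrary baseline, value, and three-way effect.
state = [0.6 * b + 0.2 * v - 0.25 * t * c * v + random.gauss(0, 0.5)
         for b, t, c, v in zip(baseline, trait, control, value)]

steps = [
    [gender, score, baseline],                       # step 1: control variables
    [trait],                                         # step 2: trait anxiety
    [control, value],                                # step 3: appraisals
    [product(control, value)],                       # step 4: control x value
    [product(trait, control), product(trait, value),
     product(trait, control, value)],                # step 5: trait interactions
]
results = hierarchical_r2(steps, state)
for number, (r2, delta) in enumerate(results, start=1):
    print(f"step {number}: R^2 = {r2:.3f}, delta R^2 = {delta:.3f}")
```

Because each step contains all predictors of the previous one, R² can never decrease from step to step; the question at each step is whether the increase is larger than expected by chance.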
inclusion of the control variables explained 58% of the variance in physiological mastate during the test (p < .001), and 24% of the variance in self-reported mastate (p < .001). for self-reported mastate, step 2 revealed a significant relation between matrait and self-reported mastate (β = 0.32, p = .002) that explained an additional 9% of the variance in self-reported mastate (p = .002). step 3 did not confirm a relation between appraisals of control or perceived value and self-reported mastate (β = -0.20, p = .090; β = 0.09, p = .37). step 4 did not reveal an interaction effect of control x value (β = -0.00, p = .98), and step 5 revealed no three-way interaction effect of matrait x control x value (β = -0.15, p = .20). similarly, the interaction effects matrait x control and matrait x value were not significant (β = -0.06, p = .59; β = -0.23, p = .053). these findings do not support hypothesis 4a or 5a. overall, the predictors explained 39% of the variance in self-reported mastate (p < .001). we conducted the same hierarchical multiple regression for physiological mastate. in contrast to self-reported mastate, step 2 did not reveal a significant relation with matrait (β = 0.06, p = .48). in step 3, adding appraisals of control and perceived value increased r² significantly by 4% (p = .033). in this step, perceived value had a significant positive relation with physiological mastate (β = 0.19, p = .009), while no relation was found for control (β = -0.04, p = .65). again, step 4 did not reveal an interaction of control and value on mastate (β = 0.04, p = .65). in contrast to self-reported mastate, step 5 revealed a negative three-way interaction effect of matrait x control x value (β = -0.23, p = .008), while the interaction effects matrait x control and matrait x value were not significant (β = -0.06, p = .44; β = -0.01, p = .96). these effects explained an additional 4% of the variance in physiological mastate (p = .041).
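A three-way interaction is easiest to read as a set of conditional slopes. In a model with z-standardized predictors, the slope of the trait predictor at given levels of control and value is the trait coefficient plus each trait interaction coefficient multiplied by the corresponding moderator value. The Python sketch below evaluates this at one standard deviation above and below the mean of each moderator. The coefficients are hypothetical values chosen only to show the mechanics; they are not the estimates of the full step-5 model, and the function name is likewise an assumption.

```python
def trait_slope(b_trait, b_trait_control, b_trait_value, b_three_way, control, value):
    """Conditional slope of the trait predictor at given moderator values (in z units)."""
    return (b_trait
            + b_trait_control * control
            + b_trait_value * value
            + b_three_way * control * value)


# Hypothetical standardized coefficients, chosen only for illustration.
coefficients = dict(b_trait=0.10, b_trait_control=-0.10,
                    b_trait_value=0.10, b_three_way=-0.20)

for control, value in [(-1, 1), (1, 1), (-1, -1), (1, -1)]:
    slope = trait_slope(control=control, value=value, **coefficients)
    print(f"control = {control:+d} SD, value = {value:+d} SD: slope = {slope:+.2f}")
```

With a negative three-way coefficient, the product control x value is negative for low control combined with high value, so that combination raises the slope of the trait predictor.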
the three-way interaction effect is displayed in figure 1 (right). for comparison, figure 1 (left) displays the non-significant interaction for self-reported mastate. because there was no significant direct relation between matrait and physiological mastate, the slopes are less steep in figure 1 (right) than for self-reported mastate. however, it illustrates that the slope of matrait on physiological mastate increases for students appraising low control and high perceived value at the same time. these results support hypothesis 5b, but not hypothesis 4b. overall, the predictors explained 66% of the variance in physiological mastate (p < .001). table 3 hierarchical multiple regression analyses for physiological mastate and self-reported mastate figure 1. relation between matrait and mastate in dependence of control and perceived value. 4. discussion 4.1 measures of state anxiety in mathematics tests in line with hypothesis 1, we found significant differences between the relaxation exercise and the test for both measures of mastate. this implies that the mathematics test induced anxiety compared to the relaxation exercise. however, descriptive analyses showed that self-reports of mastate were relatively low in our study. this might be due to the fact that the experiment was a low-stakes test for the participants. we would assume that our result might emerge even stronger in a high-stakes test situation. in contrast to our hypothesis, students’ self-reports about mastate and their physiological mastate were not significantly related. our assumption had been that even though self-reports and physiological measures might differ to some extent, they should still refer to a similar mastate and hence be related. judging from our results, the two measures might refer to conceptually different aspects of mastate. 
some researchers suggest that matrait is a multidimensional construct, usually differentiating between a cognitive and an affective dimension (lukowski et al., 2016; wigfield & meece, 1988). similarly, physiological mastate and self-reported mastate as assessed in this study might refer to different facets of mastate. consequently, they might not necessarily be related. for example, eda might be more associated with arousal and an affective, emotional dimension of mastate. in contrast, self-reports might be more related to a cognitive dimension of mastate that is associated with worries and cognitive resources (ashcraft & moore, 2009; beilock, 2008; liebert & morris, 1967). future research could include a multi-dimensional assessment of ma to address this possibility. additionally, the measures might differ because of their differing mode of assessment (pekrun & bühner, 2014). self-reports might not be able to paint an adequate picture of achievement emotions, especially for a highly physiological emotion like anxiety (pekrun & bühner, 2014). furthermore, self-reports of mastate might be subject to biases like social desirability (pekrun & bühner, 2014) or stereotypes (goetz et al., 2013). 4.2 the relation between mastate and matrait in line with goetz et al. (2013), we found a significant relation between matrait and self-reported mastate which was within the range of previous findings reported by hembree (1990). students with higher matrait also reported higher mastate during a mathematical test. however, we did not find a relation between matrait and physiological mastate. this finding is contrary to hypothesis 3b but is in line with previous findings by dew et al. (1984). dew et al. (1984) proposed two explanations. first, the results might be viewed as questioning the construct validity of matrait scales. since these scales have been further validated since then and worked as expected with regard to self-reported mastate, this explanation seems unlikely. 
alternatively, since students reporting mastate need to evaluate their perceived anxiety cognitively, it is assumed that they might in part refer to generalized beliefs about mathematics. this might include the same resources as their evaluation of matrait (bieg, goetz, & lipnevich, 2014; goetz et al., 2013), or students might even refer directly to their matrait when trying to evaluate mastate. this would increase the relation between self-reported mastate and matrait, but not between physiological mastate and matrait. 4.3 the control-value theory according to the control-value theory (pekrun, 2006), mastate should primarily be determined by appraisals of control and perceived value. these appraisals should also moderate the relation between matrait and mastate. for the two measures of mastate, the application of the control-value theory in the present study produced diverging results. both matrait and self-reported mastate were related to appraisals of control. nevertheless, the hierarchical multiple regression did not produce signs that appraisals of control or value play an important role for the relation between matrait and self-reported mastate. rather, this relation seemed to be direct. hence, we did not find support for the control-value theory for self-reported mastate. as was proposed above, the relation between matrait and self-reported mastate might be increased by the similar mode of assessment. the resulting direct relation could overweight a possible indirect effect of appraisals of control and perceived value. in contrast, physiological mastate showed a different pattern. in opposition to self-reported mastate, we did not find a direct correlation with matrait. however, we found strong support for the control-value theory in this second multiple regression analysis. first, perceived value was related to mastate, independent of matrait. 
second, including appraisals of control and perceived value explained an additional 8% of the variance in physiological mastate, which indicates a substantial contribution to its emergence. third, the interplay between matrait, control, and value was also observed as expected. as illustrated in figure 1 (right), high matrait was related to high mastate, but only when students appraised their control as low and their perceived value as high. this effect is in line with the control-value theory, since matrait is considered a distal antecedent, whereas appraisals of control and perceived value are considered proximal causes of mastate. these results further support the notion that the causal relation between matrait and the two measures of mastate might be conceptually different. 4.4 limitations using eda comes with some inherent limitations, and only some of them can be overcome. for example, eda is subject to physiological gender differences. this limits its usefulness for investigating the gender gap in ma. even when controlling for a baseline value, differences in reactivity exist. in general, a large variance between students’ eda makes comparisons more difficult. in our study, we assessed the baseline value during a relatively short period of time. a more reliable value could be obtained through several hours or days of baseline recordings (boucsein, 2012). of course, such a study requires much more time. lastly, even though ma is common in students of all ages, the findings from our specific sample should not be overgeneralized. it needs to be verified whether eda recording can be useful in schools and for specific groups of students, for example high-anxiety students or younger students. more generally, our test did not seem to induce a very strong emotional reaction. in order to generalize our findings to high-stakes testing, which might cause more mastate, additional studies are needed. moreover, we followed goetz et al. (2013) in using a single-item scale to assess self-reported mastate.
this keeps the disruption of the participants at a minimum but might result in some inaccuracies. our results indicate that the scale was working properly, but future studies might try to assess mastate at more occasions or check if the one-item scale is appropriately precise. similarly, a number of different questionnaires exist to assess ma and general test anxiety. comparing these questionnaires regarding their relation to eda, particularly regarding cognitive and affective dimensions of these scales, could help to explain the absent relation between self-reported and physiological mastate. the relation between eda and physiological arousal has been well established by previous research (boucsein, 2012). however, other factors than mastate might additionally influence eda during mathematics tests. future research could incorporate additional state measures that assess cognitive load or situational motivation to further narrow down the processes associated with eda reactivity, and might support these findings through qualitative data like interviews or think-aloud-protocols. until the validity of eda as a measure of physiological mastate is fully understood, results will always require a cautious discussion of limitations and different explanations. the control-value theory is generalizable to various achievement emotions, including both trait and state emotions (pekrun, 2006). in the current cross-sectional study, we focused on mastate as an outcome, and task-specific appraisals of control and perceived value as moderators. consequently, we considered matrait as a distal predictor. however, future studies could also consider matrait as an outcome itself. for analyzing effects of general appraisals of control and perceived value towards mathematics as predictors of matrait, longitudinal designs would be more advantageous. 4.5 conclusion our study combined several innovative approaches that have emerged in research on ma within the last years. 
with the distinction between matrait and mastate, we differentiated between two different facets of ma. further, through the adoption of the control-value theory, we compared eda recordings and common self-reports as tools for observing mastate and investigated their unique relations to matrait. overall, we found that eda was related to matrait, but that this relation became visible only when taking appraisals of control and perceived value into account. students reporting high matrait were not necessarily more physiologically anxious during mathematical activities. rather, a pattern of appraisals of low control and high perceived value accompanied that relation. hence, with regard to eda, our results were in line with the control-value theory, which, on the other hand, was not supported by self-reported measures of mastate. in sum, our findings match the plea by goetz et al. (2013) to consistently distinguish between matrait and mastate in research on ma, as well as to additionally include physiological data in assessing emotions in learning. furthermore, our results indicate that self-reports and physiological measures might refer to different aspects of mastate. thus, our results support theoretical considerations and empirical findings that self-reports of mastate should be interpreted cautiously. ultimately, we cannot decide whether self-reports or eda captured actual mastate. rather, the two measures both seem to be related to matrait, but in different ways. therefore, we cannot conclude that eda can make self-reports obsolete, but we propose that the assessment of eda can provide additional information about underlying affective aspects of mastate. because of recent technical advances in recording and analyses of eda, the method seems to offer a convenient addition to the common practice of self-reports. furthermore, the advances in eda-recording offer the possibility to conduct studies in the classroom during regular classes with hardly any disruption.
we believe that our study can be a first step in this promising direction of in vivo research on ma. as a next step, the relation to mathematical achievement should be investigated. in the present study, we used a mathematics test, the goal of which was to trigger ma, but that was not designed to diagnose mathematical achievement in detail. our preliminary results indicate that achievement might be differently associated with self-reported and physiological mastate, but a study that assesses mathematical performance in more detail is needed to shed light on this question. additionally, achievement under conditions of mastate and no mastate should be compared in a within-subject design, since eda shows a notable variance between subjects. similarly, using tests that are not mathematical could help to determine how specifically mastate is linked to mathematics. at the same time, using eda for other domains or test anxiety in general, possibly using the control-value theory, might be an interesting and fruitful perspective for future research. lastly, the relation to working memory capacity, which has proven to be a key factor in the effects of ma, should be taken into account. ultimately, this knowledge could be used to design longitudinal and intervention studies that use eda to observe the role of mastate for learning processes or create ways to decrease mastate in mathematics tests, possibly without necessarily tackling matrait. with a number of questions remaining unanswered, our study is merely a first step in including eda as an indicator for mastate. nevertheless, the results illustrate that self-reports only comprise one perspective on the multi-faceted phenomenon of mathematics anxiety, and that including eda can be uniquely insightful. keypoints we did not find a correlation between eda and measures of state anxiety or trait mathematics anxiety, respectively.
self-reported state anxiety correlated significantly with trait anxiety independent of appraisals of control and perceived value, which is in contrast to the control-value theory. in line with the control-value theory, trait mathematics anxiety predicted physiological state anxiety when high perceived value and low control of the achievement situation were reported. acknowledgments we would like to thank ashley l. johnson and kathrin ebenhöh for their contributions during data collection. this research was funded by the federal ministry of education and research (bmbf) and the standing conference of the ministers of education and cultural affairs of the länder in the federal republic of germany (kmk) [grant number zib2016]. references artemenko, c., daroczy, g., & nuerk, h.-c. (2015). neural correlates of math anxiety: an overview and implications. frontiers in psychology, 6, 1333. doi:10.3389/fpsyg.2015.01333 ashcraft, m. h. (2002). math anxiety: personal, educational, and cognitive consequences. current directions in psychological science, 11(5), 181-185. doi:10.1111/1467-8721.00196 ashcraft, m. h. (2007). is math anxiety a mathematical learning disability? in d. b. berch & m. m. m. mazzocco (eds.), why is math so hard for some children? the nature and origins of mathematical learning difficulties and disabilities (pp. 329-348). baltimore: brookes. ashcraft, m. h., & kirk, e. p. (2001). the relationships among working memory, math anxiety, and performance. journal of experimental psychology: general, 130(2), 224–237. doi:10.1037//0096-3445.130.2.224 ashcraft, m. h., & moore, a. m. (2009). mathematics anxiety and the affective drop in performance. journal of psychoeducational assessment, 27(3), 197-205. doi:10.1177/0734282908330580 ashcraft, m. h., & ridley, k. s. (2005). math anxiety and its cognitive consequences: a tutorial review. in j. i. d. campbell (ed.), handbook of mathematical cognition (pp. 315-327). new york, ny: psychology press. beilock, s. l. (2008).
math performance in stressful situations. current directions in psychological science, 17(5), 339–343. doi:10.1111/j.1467-8721.2008.00602.x benedek, m., & kaernbach, c. (2010). a continuous measure of phasic electrodermal activity. journal of neuroscience methods, 190(1), 80-91. doi:10.1016/j.jneumeth.2010.04.028 bieg, m., goetz, t., & lipnevich, a. a. (2014). what students think they feel differs from what they really feel: academic self-concept moderates the discrepancy between students’ trait and state emotional self-reports. plos one, 9(3). doi:10.1371/journal.pone.0092563 bieg, m., goetz, t., wolter, i., & hall, n. c. (2015). gender stereotype endorsement differentially predicts girls’ and boys’ trait-state discrepancy in math anxiety. frontiers in psychology, 6(1404). doi:10.3389/fpsyg.2015.01404 boucsein, w. (2012). electrodermal activity (2nd ed.). new york: springer. ching, b. h.-h. (2017). mathematics anxiety and working memory: longitudinal associations with mathematical performance in chinese children. contemporary educational psychology, 51, 99-113. doi:10.1016/j.cedpsych.2017.06.006 dew, k. m. h., galassi, j. p., & galassi, m. d. (1984). math anxiety: relation with situational test anxiety, performance, physiological arousal, and math avoidance behaviour. journal of counseling psychology, 31(4), 580-583. doi:10.1037/0022-0167.31.4.580 dowker, a., sarkar, a., & looi, c. y. (2016). mathematics anxiety: what have we learned in 60 years? frontiers in psychology, 7, 508. doi:10.3389/fpsyg.2016.00508 dindar, m., malmberg, j., järvelä, s., haataja, e., & kirschner, p. a. (2019). matching self-reports with electrodermal activity data: investigating temporal changes in self-regulated learning. education and information technologies. doi:10.1007/s10639-019-10059-5 engeser, s., & rheinberg, f. (2008). flow, performance and moderators of challenge-skill balance. motivation and emotion, 32(3), 158-172. doi:10.1007/s11031-008-9102-4 eysenck, m. w., & calvo, m. g.
(1992). anxiety and performance: the processing efficiency theory. cognition and emotion, 6(6), 409. doi:10.1080/02699939208409696 eysenck, m. w., derakshan, n., santos, r., & calvo, m. g. (2007). anxiety and cognitive performance: attentional control theory. emotion, 7(2), 336-353. doi:10.1037/1528-3542.7.2.336 faust, m. w., ashcraft, m. h., & fleck, d. e. (1996). mathematics anxiety effects in simple and complex addition. mathematical cognition, 2, 25-62. doi:10.1080/135467996387534 frenzel, a. c., pekrun, r., & goetz, t. (2007). girls and mathematics: a “hopeless” issue? a control-value approach to gender differences in emotions towards mathematics. european journal of psychology of education, 22(4), 497-514. doi:10.1007/bf03173468 goetz, t., bieg, m., lüdtke, o., pekrun, r., & hall, n. c. (2013). do girls really experience more anxiety in mathematics? psychological science, 24(10), 2079–2087. doi:10.1177/0956797613486989 goldin, g. a. (2014). perspectives on emotion in mathematical engagement, learning, and problem solving. in r. pekrun & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 391-414). new york: routledge. goldin, g. a., hannula, m. s., heyd-metzuyanim, e., jansen, a., kaasila, r., lutovac, s., . . . zhang, q. (eds.). (2016). attitudes, beliefs, motivation and identity in mathematics education. an overview of the field and future directions. hamburg: springer open. hannula, m. s. (2016). introduction. in g. a. goldin, m. s. hannula, e. heyd-metzuyanim, a. jansen, r. kaasila, s. lutovac, p. di martino, f. morselli, j. a. middleton, m. pantziara, & q. zhang (eds.), attitudes, beliefs, motivation and identity in mathematics education. an overview of the field and future directions (pp. 1-2). hamburg: springer open. hembree, r. (1990). the nature, effects and relief of mathematics anxiety. journal for research in mathematics education, 21. doi:10.2307/749455 ho, h.-z., senturk, d., lam, a. g., zimmer, j.
m., hong, s., okamoto, y., . . . wang, c.-p. (2000). the affective and cognitive dimensions of math anxiety: a cross-national study. journal for research in mathematics education, 31(3), 362-379. doi:10.2307/749811 international association for the evaluation of educational achievement [iea] (2013). timss 2011 assessment. chestnut hill, ma: timss & pirls international study center. kantor, l., endler, n. s., heslegrave, r. j., kocovski, n. l. (2001). validating self-report measures of state and trait anxiety against a physiological measure. current psychology 20(3), 207-215. doi: 10.1007/s12144-001-1007-2 kehr, h. m., von rosenstiel, l., & bles, p. (1997). zielbindung, subjektive fähigkeiten und intrinsische motivation [goal commitment, subjective abilities, and intrinsic motivation]. paper presented at the 16th colloquium of motivational psychology, potsdam, germany. lee, j. (2009). universals and specifics of math self-concept, math self-efficacy, and math anxiety across 41 pisa 2003 participating countries. learning and individual differences, 19(3), 355–365. doi:10.1016/j.lindif.2008.10.009 liebert, r. m., & morris, l. w. (1967). cognitive and emotional components of test anxiety: a distinction and some initial data. psychological reports, 20(3), 975-978. doi:10.2466/pr0.1967.20.3.975 lukowski, s. l., ditrapani, j., jeon, m., wang, z., schenker, v. j., doran, m. m., . . . petrill, s. a. (2016). multidimensionality in the measurement of math-specific anxiety and its relationship with mathematical performance. learning and individual differences. doi:10.1016/j.lindif.2016.07.007 lyons, i. m., & beilock, s. l. (2012). mathematics anxiety: separating the math from the anxiety. cerebral cortex, 22, 2102-2110. doi:10.1093/cercor/bhr289 ma, x. (1999). a meta-analysis of the relationship between anxiety toward mathematics and achievement in mathematics. journal for research in mathematics education, 30(5), 520-540. doi:10.2307/749772 maloney, e. 
a., ansari, d., & fugelsang, j. a. (2011). the effect of mathematics anxiety on the processing of numerical magnitude. quarterly journal of experimental psychology, 64(1), 10-16. doi:10.1080/17470218.2010.533278 maloney, e. a., schaeffer, m. w., & beilock, s. l. (2013). mathematics anxiety and stereotype threat: shared mechanisms, negative consequences and promising interventions. research in mathematics education, 15(2), 115–128. doi:10.1080/14794802.2013.797744 mattarella-micke, a., mateo, j., kozak, m. n., foster, k., & beilock, s. l. (2011). choke or thrive? the relation between salvary cortisol and math performance depends on individual differences in working memory and math-anxiety. emotion, 11(4), 1000-1005. doi:10.1037/a0023224 meer, y., breznitz, z., & katzir, t. (2016). calibration of self-reports of anxiety and physiological measures of anxiety while reading in adults with and without readig disability. dyslexia, 22, 267-284. doi:10.1002/dys.1532 mudrick, n. v., taub, m., azevedo, r., price, m. j., & lester, j. (2017). can physiology indicate cognitive, affective, metacognitive, and motivational self-regulated learning processes during multimedia learning? paper presented at the annual meeting of the american educational research association (aera), san antonio, tx. namkung, j. m., peng, p., & lin, x. (2019). the relation between mathematics anxiety and mathematics performance among school-aged students: a meta-analysis. review of educational research, 89(3), 459–496. doi:10.3102/0034654319843494 naveteur, j., & freixa i baqué, e. (1987). individual differences in electrodermal activity as a function of subjects’ anxiety. personality and individual differences, 8(5), 615-626. doi:10.1016/0191-8869(87)90059-6 niculescu, a. c., tempelaar, d., dailey-hebert, a., segers, m., gijselaers, w. (2015). exploring the antecedents of learning-related emotions and their relations with achievement outcomes. frontline learning research 3 (1), 1-17. 
doi:10.14786/flr.v3i1.136 nikula, r. (1991). psychological correlates of nonspecific skin conductance responses. psychophysiology, 28(1), 86-90. doi:10.1111/j.1469-8986.1991.tb03392.x ng, e. l., & lee, k. (2015). effects of trait test anxiety and state anxiety on children's working memory task performance. learning and individual differences, 40, 141-148. doi:10.1016/j.lindif.2015.04.007 oecd. (2005). pisa 2003 technical report. retrieved from http://www.oecd.org/edu/school/programmeforinternationalstudentassessmentpisa/35188570.pdf oecd. (2013a). pisa 2012 released mathematics items. retrieved from https://www.oecd.org/pisa/pisaproducts/pisa2012-2006-rel-items-maths-eng.pdf oecd. (2013b). pisa 2012 results: ready to learn: students ’engagement, drive and self-beliefs (volume iii): pisa, oecd publishing. pekrun, r. (2006). the control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice. educational psychology review, 18(4), 315–341. doi:10.1007/s10648-006-9029-9 pekrun, r., & bühner, m. (2014). self-report masures of academic emotions. in r. pekrun & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 561-579). new york: routledge. pekrun, r., lichtenfeld, s., marsh, h. w., murayama, k., & götz, t. (2017). achievement emotions and academic performance: longitudinal models of reciprocal effects. child development, 88(5), 1653-1670. doi:10.1111/cdev.12704 pekrun, r., & perry, r. p. (2014). control-value theory of achievement emotions. in r. pekrun & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 120-141). new york: routledge. plake, b. s., & parker, c. s. (1982). the development and validation of a revised version of the mathematics anxiety rating scale. educational and psychological measurement, 42(2), 551-557. doi:10.1177/001316448204200218 pletzer, b., wood, g., moeller, k., nuerk, h. c., & kerschbaum, h. h. (2010). 
predictors of performance in a real-life statistics examination depend on the individual cortisol profile. biological psychology, 85, 410-416. doi:10.1016/j.biopsycho.2010.08.015 pletzer, b., kronbichler, m., nuerk, h.-c., & kerschbaum, h. h. (2015). mathematics anxiety reduces default mode network deactivation in response to numerical tasks. frontiers in human neuroscience, 9, 202. doi:10.3389/fnhum.2015.00202 richardson, f. c., & suinn, r. m. (1972). the mathematics anxiety rating scale: psychometric data. journal of counseling psychology, 19(6), 551–554. doi:10.1037/h0033456 sarkar, a., dowker, a., & cohen kadosh, r. (2014). cognitive enhancement or cognitive cost: trait-specific outcomes of brain stimulation in the case of mathematics anxiety. the journal of neuroscience, 34, 16605–16610. doi:10.1523/jneurosci.3129-14.2014 steimer, t. (2002). the biology of fearand anxiety-related behaviors. dialogues in clinical neuroscience, 4(3), 231-249. suárez-pellicioni, m., núñez-peña, m. i., & colomé, à. (2016). math anxiety: a review of its cognitive consequences, psychophysiological correlates, and brain bases. cognitive, affective, & behavioral neuroscience, 16(1), 3-22. doi:10.3758/s13415-015-0370-7 wiedemann, k. (2015). anxiety and anxiety disorders. in j. d. wright (ed.), international encyclopedia of the social & behavioral sciences (second edition) (pp. 804-810). amsterdam: elsevier. wigfield, a., & meece, j. l. (1988). math anxiety in elementary and secondary school students. journal of educational psychology, 80 (2), 210-216. doi:10.1037/0022-0663.80.2.210 zeidner, m. (2014). anxiety in education. in r. pekrun & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 120-141). new york: routledge. table a.1 full hierarchical multiple regression analyses for physiological mastate and self-reported mastate codepen testers et al publication frontline learning research vol.7 no. 
1 (2019) 23-42 issn 2295-3159
from monocontextual to multicontextual transfer: organizational determinants of the intention to transfer generic information literacy competences to multiple contexts
laurent testers a, andreas gegenfurtner b, rolf van geel c, saskia brand-gruwel c, d
a breda university of applied sciences, the netherlands
b technische hochschule deggendorf, germany
c open university of the netherlands, the netherlands
d welten institute research centre for learning, teaching and technology
article received 2 may 2018 / revised 28 september / accepted 1 october / available online 24 january 2019
abstract
an important goal of educational designers is to achieve long-term transfer of learning, that is, the learner's application of newly acquired competencies. extensive research during more than a century shows that, especially in formal educational settings, this fundamental aspect of education often occurs poorly or not at all, leading to what is called a transfer problem. to address this transfer problem, the present study examines intentions to transfer learning to multiple contexts; this focus on multiple transfer contexts extends previous research focusing on a single transfer context, typically the workplace. the present study aimed to estimate the influence of five organizational variables (peer support, supervisor support, opportunity to use, openness to change, and feedback) on pre-training intention to transfer prospective learning in two different transfer contexts: study and work. participants were 303 students at an open university starting a digital course in information literacy. the model was tested using structural equation modelling. the results indicated that, before starting the course, supervisor support and feedback were considered the strongest predictors of intention to transfer new learning in both the study and the work contexts.
this research is amongst the first in the training literature to address multicontextuality, examining intentions to transfer generic competences to the two transfer contexts study and work within one single study.
keywords: transfer of learning; training; intention to transfer; information literacy
corresponding author: laurent.testers@ou.nl doi: 10.14786/flr.v7i1.359
1. introduction
it is widely accepted that transfer of learning, the application of what has been learned in new situations, is essential to all forms of education and organizational training. for over a century, however, ample research in disciplines like human resource development (hrd), psychology, and education has shown that transfer is not self-evident. although it may occur fluently and unconsciously in daily life, it often seems to take place sparsely or not at all in formal educational settings (ford, yelon, & billington, 2011; haskell, 2001; katsioloudes, 2015). the aim of this study is to offer means to enhance transfer of learning and to contribute to solving this so-called transfer problem (baldwin & ford, 1988; haskell, 2001) by complementing previous research in various ways.
1.1 monocontextual and multicontextual transfer
in previous studies transfer is typically investigated within one single context, predominantly education or work. this monocontextual perspective is expressed, amongst others, in the various definitions of transfer. the focus on a single context, however, does not always reflect reality, for example when it concerns training of generic competences that are meant for, or can be applied in, multiple contexts, like information literacy, in which the participants were going to be trained. in various educational settings there is a strong emphasis on, or desire for, the application of newly gained generic competences not only in the student's formal study context but also in their current or prospective work context.
the growing emphasis on lifelong learning also encourages the teaching of more generic, multicontextual competences that can be applied not only in an educational study context but also in work and private life. when educational designers are aware of these various contexts and gain insight into their specific features, they are better able to design programmes that are attuned to the students’ needs and realities. this might include showing recognizable examples of aspects of intended transfer contexts that encourage or hamper the application of new learning, or facilitating discussions amongst students on how to use or tackle these conditions in their specific situation, which in turn might boost their self-efficacy and self-confidence when transferring new learning. the present study investigated the factors that, already before the course, influenced the participants' intention to transfer these widely applicable competences to multiple contexts, more specifically to both their study and work contexts.
1.2 intention to transfer
the best way to measure the influence of specific variables on the transfer process would be to relate them directly to actual transfer, although there are different opinions on the what, how, and when of this assessment. here we endorse the viewpoint that actual learning must have taken place before it can be transferred. however, specific circumstances might not allow transfer to be measured directly. this might relate to the kind of skills that are involved (blume, ford, baldwin, & huang, 2010). a so-called closed skill that can only be executed in one specific way, like pressing a button at a specific moment, is relatively easy to monitor. this becomes more difficult with so-called open skills, like critically evaluating information, that can be applied in various ways depending on the context and the creativity of the learner.
furthermore, information literacy, the set of skills that respondents in this study were about to learn, is considered to be a complex higher-order cognitive skill (brand-gruwel, wopereis, & vermetten, 2005; reece, 2007). the application of information literate behaviour, like formulating an adequate information question, efficiently and effectively using information sources, and critically evaluating, synthesising and applying information, is therefore mainly a conscious and intentional process, especially when it involves information literacy novices. also, the relative autonomy of the student might inhibit the actual measurement of transfer. it will be easier to measure transfer from a person whose behaviour is closely monitored by peers or supervisors than from a person who works relatively autonomously. the aforementioned circumstances apply to the present study, making it difficult to actually measure transfer. we have therefore focussed on the students' intention to transfer learning and on variables that might influence this intention. the concept of intention is sometimes used interchangeably with motivation (foxon, 1994) or as a dimension of motivation (gegenfurtner, 2013). in the present study we adopt a phase perspective and consider motivation and intention to be different concepts and consecutive steps within a motivational process continuum (al-eisa, furayyan, & alhemoud, 2009; quesada-pallarès & gegenfurtner, 2015). according to ajzen (1991), intentions "capture the motivational factors that influence a behavior" (p. 181). intrinsic and extrinsic motives tell us more about the why of a specific behaviour, while intention is the resulting commitment, propensity or willingness to actually transfer or apply it (hutchins, nimon, bates, & holton, 2013).
some established behavioural theories, including the goal-setting theory of locke and latham (1990), triandis' theory of interpersonal behaviour (1980), and ajzen's social psychological theory of planned behavior, acknowledge the role of intentions as an important and reliable predictor of behaviour. this is confirmed by, amongst others, an extensive meta-analysis by sheeran (2002). ajzen (1991), who considers behavioural change the result of behavioural intention, defines intentions as "indications of how hard people are willing to try, how much of an effort they are planning to exert, in order to perform the behavior. as a general rule, the stronger the intention to engage in a behavior, the more likely should be its performance" (p. 181). gollwitzer (1993), who extended the research on intentions by differentiating between goal intentions and their successive situation-specific implementation intentions, considered intention to be "the best predictor of behaviour since it is a commitment to achieve respective outcomes and performance of relevant behaviour" (p. 145). in their review article, cheng and hampson (2008) concluded that transfer models that put an emphasis on the trainee's intention to transfer were missing. according to hutchins and colleagues (2013), only little research exists on the relationship between various transfer factors and intention to transfer. the present study aimed at expanding the knowledge on intention to transfer by investigating the influence of five organizational factors, described below, that are considered to be antecedents of transfer: peer support, supervisor support, opportunity to use, openness to change, and feedback. this might provide instructional designers with additional tools to enhance transfer of learning.
1.3 organizational factors
based on a comprehensive literature review, baldwin and ford (1988) defined three domains of variables that are important to the transfer process: (a) trainee characteristics, (b) training design, and (c) work environment. in the literature the latter is also referred to as transfer environment, organizational environment, situational factors or constraints, or transfer climate, where the last refers to "components of the work environment that are specifically and intentionally directed at the transfer of training" (nijman, 2004, p. 19). many studies affirm that components of the transfer climate will directly or indirectly influence the application of what has been learned (ajzen & madden, 1986; egan, 2004; facteau, dobbins, russell, ladd, & kudisch, 1995; ford, quinones, sego, & sorra, 1992; horner, 2008; hutchins & burke, 2007; kontoghiorghes, 2004; lim & johnson, 2006; rouiller & goldstein, 1993; smith-jentsch, salas, & brannick, 2001; tracey, tannenbaum, & kavanagh, 1995). these studies have identified a broad spectrum of aspects of the transfer environment that may support or inhibit the transfer of learning. rouiller and goldstein (1993) conceptualised transfer climate as "those situations and consequences that either inhibit or help to facilitate the transfer of what has been learned in training into the job situation" (p. 379). they identified situational cues like manager goals, peer support, available facilities, and the opportunity to practice, and consequences like positive or negative feedback and sanctions from both managers and peers when applying acquired skills after training.
other factors that are mentioned in the literature are, amongst others, performance coaching, strategic link, accountability, openness to change, differentiated work environments, post-training goal setting, relapse prevention training, workplace design features, level of job autonomy and workload (aguinis & kraiger, 2009; baldwin & ford, 1988; bates, holton, & hatala, 2012; blume, ford, baldwin, & huang, 2010; burke & hutchins, 2007; de rijdt, stes, vleuten, & dochy, 2013; grossman & salas, 2011; hutchins et al., 2013; nijman, 2004). the support factors that have been used in this study have been derived from review studies (baldwin & ford, 1988; blume et al., 2010; cheng & hampson, 2008; ford & weissbein, 1997; ling & yusof, 2017; merriam & leahy, 2005; salas & cannon-bowers, 2001; tonhäuser & büker, 2016) and from the learning transfer system inventory (ltsi) by holton and colleagues (bates et al., 2012; holton, bates, & ruona, 2000). in his theory of planned behavior (1991), ajzen considers intentions to be a strong predictor of behaviour and acknowledges the relevance of how "important others", like teachers, colleagues, and supervisors whose approval we desire, feel about our behaviour. this subjective norm or "perceived social pressure to engage or not engage in a specific behaviour" (ajzen, 1991, p. 188) is based on perceived expectations or normative beliefs of these important others and their resulting support or disapproval. in this study we focus on the factors peer support, supervisor support, opportunity to use, openness to change, and feedback; each is described in turn.
peer and supervisor support
a well-established or even critical direct or mediating predictor of transfer is the support trainees receive when they apply what they have learned (aguinis & kraiger, 2009; baldwin & ford, 1988; blume et al., 2010; burke & hutchins, 2007; grossman & salas, 2011; nijman, 2004; smith-jentsch et al., 2001; tracey et al., 1995).
although support is not always clearly conceptualized (govaerts & dochy, 2014), in our study it is defined as the reinforcement of the use of newly acquired knowledge, skills and attitudes, in this case during the students' study or on their job. support may be provided in various ways: at a managerial level by creating an innovative, open organizational atmosphere, but also at a more personal level by supervisors, peers, subordinates, and even friends or family. the expression of social support before, during or after an intervention can be manifold. peer support focuses more on the actual application of new learning by offering assistance, coaching, encouragement, and feedback. supervisors can additionally offer support by setting proximal and distal learning goals, by behavioural modelling, by discussing training content, and by actual involvement in a course or training itself (nijman, nijhof, wognum, & veldkamp, 2006; russ-eft, 2002). in our study support refers to encouragement and positive appreciation of the application of new learning by both peers and supervisors. although at first glance it might seem obvious that people will be more inclined to use what they have learned when they are supported in doing so, previous studies show a more differentiated picture. some studies show no or a non-significant impact of peer or supervisor support on transfer (chiaburu & marinova, 2005; den ouden, 1992; homklin, takahashi, & techakanont, 2014; nijman et al., 2006; van der klink, gielen, & nauta, 2001; velada et al., 2007). others report indirect effects of support on transfer via mediating factors like trainee's self-efficacy (mathieu, martineau, & tannenbaum, 1993), mastery goal orientation, motivation to transfer (chiaburu, van dam, & hutchins, 2010; kontoghiorghes, 2004; massenberg, spurk, & kauffeld, 2015; seyler, holton, bates, burnett, & carvalho, 1998; tracey, hinkin, tannenbaum, & mathieu, 2001) or intention to transfer (den ouden, 1992; hoekstra, 1998).
the ultimate effect of support on the transfer of learning may depend on aspects like the cultural or organizational context (holton, chen, & naquin, 2003), timing, or the extent to which trainees identify themselves with the supportive persons (pidd, 2004), and may be different for peer and for supervisor support. especially for adult learners, the participants in this study, managerial and peer support seem to be crucial to transfer (merriam & leahy, 2005). the aforementioned mixed results justify further research in order to clarify the role of peer and supervisor support in the transfer process. in the present study we have used assessments of social support by 'peers' and 'supervisors' in two contexts: study and work. in the study context fellow students fulfilled the role of peers and lecturers the role of supervisors; in the work context colleagues fulfilled the role of peers and superiors the role of supervisors.
opportunity to use
opportunity to use refers to "the extent to which a trainee is provided with or actively obtains work experiences relevant to the tasks for which he or she was trained" (ford et al., 1992, p. 512). providing trainees with the time, resources, facilities, and tasks to practice and rehearse their newly gained competences is considered to be a strong or even critical predictor of transfer of learning (ford et al., 1992; lim & johnson, 2002; wexley & latham, 1991). put differently, in their extensive literature review burke and hutchins (2007) concluded that the absence of opportunity to use was rated the biggest obstacle to transfer. broad and newstrom (1992) found the lack of reinforcement the most significant of nine transfer barriers. it might not be surprising that there is a strong relationship between the opportunity to practice and the support or reinforcement offered by peers (russ-eft, 2002) and supervisors (ford et al., 1992).
previous research and models show that opportunity to use enhances transfer of learning not only directly (baldwin & ford, 1988; holton, 2005) but also indirectly, for example via the mediating variable motivation (cheng & hampson, 2008; massenberg, schulte, & kauffeld, 2016; seyler et al., 1998).
openness to change
in our so-called information or knowledge based societies, knowledge is developed at an ever-increasing pace. consequently, for individuals and organizations alike it is important to stay up-to-date with the latest developments. renowned organizations like unesco, the european commission, and the world bank emphasise the importance of lifelong learning. openness to change reflects an innovative, continuous-learning culture that embraces and supports new developments at various levels. it encourages members to apply newly gained knowledge, skills, and behaviours in order to improve individual and, as a result, organizational performance. it is also one of the training-general or environmental constructs in the learning transfer system inventory by holton and colleagues, although formulated there as 'resistance to change' and defined as 'the extent to which prevailing group norms are perceived by individuals to resist or discourage the use of skills and knowledge required in training' (holton et al., 2000, p. 346). openness to change can be expressed at an individual level by peers and supervisors, within a team or group, and also at a more general organizational level, for example by communicating values and norms that emphasise the importance of continuous learning and innovation, by offering training programmes, and by facilitating the development and exchange of knowledge. in the present study we refer to openness at a team and a more general organizational level, asking, for example, whether or not respondents experienced an open atmosphere that enables changes in the ways things are normally done.
previous studies have identified a direct relationship between openness to change and transfer of learning (gilpin-jackson & bushe, 2007; katsioloudes, 2015).
feedback
the oxford dictionary (2018) defines feedback as 'information about the reaction to a product, a person's performance of a task etc. which is used as a basis for improvement'. performance feedback, often merged with performance coaching, has been examined extensively and is generally considered to be an important direct or mediating predictor of both learning and transfer (clarke, 2002; reinhold et al., 2018; rouiller & goldstein, 1993; smith-jentsch et al., 2001; van den bossche, segers, & jansen, 2010; velada, caetano, michel, lyons, & kavanagh, 2007). some studies, however, found mixed effects (alvero, bucklin, & austin, 2001; gabelica, van den bossche, segers, & gijselaers, 2012) or no or even negative effects (lim & johnson, 2002) of feedback on transfer. in their description of characteristics of a positive transfer climate, rouiller and goldstein (1993) mention feedback as a consequence that, together with various situational cues, influences the transfer process. feedback is also an essential aspect of instructional systems design, for example in gagné's conditions of learning (1970), and of learning theories like bandura's social learning theory (1977). interestingly, it is often considered to be an important expression of both supervisor and peer support (lim & johnson, 2002). one can make a distinction between positive or negative, intrinsic or extrinsic, and process or performance feedback, given at various moments in time, with different frequencies, and related to various aspects of the task performance. in the present study feedback is considered to be a one-dimensional social support construct related to performance. according to hutchins and colleagues (2013), only a few studies have investigated the relationship between feedback and intention, and van den bossche et al.
(2010) conclude that feedback has received relatively little attention in transfer research.
1.4 research question and hypotheses
to support the design of educational interventions that enhance transfer of learning to multiple contexts, this study aimed at estimating the extent to which peer support, supervisor support, opportunity to use, openness to change, and feedback predict intention to transfer. as a novel contribution to the literature on transfer of training, the present study compares two different transfer contexts: study and work. we hypothesized positive relationships of peer support (hypothesis 1), supervisor support (hypothesis 2), opportunity to use (hypothesis 3), openness to change (hypothesis 4), and feedback (hypothesis 5) with intention to transfer. due to a lack of previous research addressing multiple transfer contexts, however, no hypotheses were formulated about whether these relationships would be stronger in the study or the work transfer context. figure 1 presents the hypothesized model structure.
figure 1. hypothesized relationships in the transfer contexts study and work.
2. methods
2.1 participants
participants in this study were 303 adult students of the premaster learning sciences at the open university of the netherlands. most students were in their first year of study. besides studying at the open university, students mainly worked as teachers or tutors in primary and secondary education, higher education, and training. table 1 presents the demographic characteristics of the sample, including gender, age, years of work experience, and work type.
table 1 demographic characteristics of the study participants
2.2 training program and procedure
the training program students were about to take was a web-based course on information literacy for social scientists (4.3 ects, equal to 120 hours of study), which was mandatory for students to prepare them for their studies (wopereis, frerejean, & brand-gruwel, 2015, 2016).
the course was designed according to the four-component instructional design (4c/id) model (van merriënboer & kirschner, 2018). during the course students work on five authentic tasks with varying support and report on their task solution steps by means of a process worksheet (brand-gruwel, wopereis, & walraven, 2009). students are then provided with feedback on their performance. an example task is: 'imagine you are a teacher in primary education and you want to know more about how to stimulate and support collaborative learning amongst your students. study four information sources using the checklist 'critical reading' and write a short essay (600 words) in which you answer your research questions and critically reflect on them.' after being provided with information about the course content, but before the course had actually started, students completed a survey instrument that was embedded in the electronic learning environment and integrated in the course curriculum as task 0. beforehand they received instructions on how to complete the instrument and were assured that their responses would be used confidentially and for research purposes only.
2.3 measures
the measures were collected with a multi-item questionnaire that was administered as a web-based online survey. a likert-type 7-point response format was used for all scales, ranging from 1 = “do not agree at all” to 7 = “totally agree”. the independent variables were peer support, supervisor support, opportunity to use, openness to change, and feedback. the dependent variable was intention to transfer. to afford comparability, the number of variables was identical in the two transfer contexts study and work. appendix 1 presents all scales. table 2 shows the number of items, the reliability coefficient (cronbach's alpha), and a sample item per scale.
table 2 number of items, reliability estimates, and example items of all scales
2.4 analysis
initial data screening (cf.
kline, 2015) at item level revealed univariate and multivariate normality, linearity, and heteroscedasticity of the data. missing values were missing at random and treated with em imputation (allison, 2003). there were no multivariate outlying cases. in order to examine the structure of the items related to intention to transfer, we first conducted several exploratory factor analyses (ml, oblimin). these analyses were performed separately for the study and work contexts. several items were removed from the initial item pools until an unambiguous 'simple structure' emerged (cf. thurstone, 1947). these factor analyses suggested a five-factor structure, but the items about 'opportunity to use' displayed substantial secondary loadings in both contexts, blurring the picture of a 'simple' structure. therefore, we eventually decided to discard all these items from the final model. in the discussion section we will elaborate on this in more detail. tables 3 and 4 present the final (exploratory) five-factor solution, disclosing the same five factors in both transfer contexts. the quality and utility of this model are supported by the amount of variance explained in both contexts: 79.42 % (study context) and 83.27 % (work context).
table 3 factor loadings of all scales in the transfer contexts study and work
the multiple regression analysis (mra) model of figure 1 was tested by structural equation modeling (eqs version 6.3). however, instead of a (simple) path model, a so-called 'hybrid mra model' was examined, which incorporates a measurement model (cf. confirmatory factor analysis, cfa) as well as direct 'causal' effects (cf. mra model) (kline, 2015). this hybrid model was examined separately for both contexts (study and work). five goodness-of-fit indices were used to estimate the extent to which the hypothesized model structure fitted the entered data.
these fit indices were the χ² statistic to estimate absolute fit as well as the comparative fit index (cfi), the incremental fit index (ifi), the standardized root-mean-square residual (srmr), and the root-mean-square error of approximation (rmsea) together with its 90 % confidence interval to estimate relative fit. we followed the recommendations of hu and bentler (1999) for cut-off criteria, with cfi > 0.95, ifi > 0.95, srmr < 0.08, and rmsea < 0.06 indicating acceptable model fit.

table 4 explained total variance of factors in the transfer contexts study and work

3. results

the study aimed to estimate the extent to which peer support (hypothesis 1), supervisor support (hypothesis 2), openness to change (hypothesis 4), and feedback (hypothesis 5) influenced pre-training intention to transfer in the two transfer contexts study and work. table 5 presents the means, standard deviations, reliability estimates, and intercorrelations among all factors.

table 5 correlation matrix of all variables

the five-factor model yielded an acceptable fit in both transfer contexts. table 6 presents the psychometric properties. in the transfer context study, χ² was 99.43 (df = 80), cfi = 0.99, ifi = 0.99, srmr = 0.04, and rmsea = 0.03 (90 % ci = 0.00, 0.05). in the transfer context work, χ² was 206.39 (df = 80), cfi = 0.96, ifi = 0.96, srmr = 0.04, and rmsea = 0.07 (90 % ci = 0.06, 0.08). these estimates suggest acceptable model fit that was slightly better for the transfer context study than for work.

table 6 goodness-of-fit indices of the structural models in the transfer contexts study and work

figures 2 and 3 present the model parameter estimates of the structural relations among factors for the transfer contexts study and work, respectively.
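as a rough check of the fit statistics reported above, the rmsea point estimate can be recomputed from the chi-square value and its degrees of freedom. the sketch below is a minimal illustration, not the eqs procedure used in the study. it assumes the common formula rmsea = sqrt(max(χ² − df, 0) / (df · (n − 1))), it assumes that all 303 participants contributed to both models, and the helper names (rmsea, check_fit) are hypothetical. the cut-offs are the hu and bentler (1999) values quoted in the text.

```python
import math

# Cut-off criteria quoted in the text (Hu & Bentler, 1999).
CUTOFFS = {"cfi": 0.95, "ifi": 0.95, "srmr": 0.08, "rmsea": 0.06}


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate using the N - 1 convention (an assumption here)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def check_fit(cfi: float, ifi: float, srmr: float, rmsea_value: float) -> dict:
    """Return, per index, whether the value satisfies the quoted cut-off."""
    return {
        "cfi": cfi > CUTOFFS["cfi"],
        "ifi": ifi > CUTOFFS["ifi"],
        "srmr": srmr < CUTOFFS["srmr"],
        "rmsea": rmsea_value < CUTOFFS["rmsea"],
    }


N = 303  # assumption: the full sample was used in both transfer contexts

study = rmsea(99.43, 80, N)   # chi-square and df reported for the study context
work = rmsea(206.39, 80, N)   # chi-square and df reported for the work context

print("study:", round(study, 2), check_fit(0.99, 0.99, 0.04, study))
print("work: ", round(work, 2), check_fit(0.96, 0.96, 0.04, work))
```

under these assumptions the recomputed values round to the reported 0.03 and 0.07. the work-context rmsea lies slightly above the 0.06 cut-off, which is in line with the somewhat weaker fit noted for that context.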
in the transfer context study, intention to transfer was positively predicted by feedback (β = 0.37, p < 0.01), supervisor support (β = 0.31, p < 0.01), and openness to change (β = 0.10, p < 0.05); the relationship between peer support and intention to transfer (β = 0.04) was statistically non-significant. in the transfer context work, intention to transfer was predicted by supervisor support (β = 0.31, p < 0.01) and feedback (β = 0.19, p < 0.01); the relationships of intention to transfer with peer support (β = -0.05) and openness to change (β = -0.04) were statistically non-significant.

figure 2. measurement and structural model parameter estimates of transfer context: study

figure 3. measurement and structural model parameter estimates of transfer context: work

a comparison between the two transfer contexts study and work indicates that the model parameter estimates between the independent and dependent variables were higher in the study context. table 7 presents the differences between beta coefficients for all variables. the largest difference emerged for feedback (study context: β = 0.37, work context: β = 0.19, δ = 0.18), followed by openness to change (study context: β = 0.10, work context: β = -0.04, δ = 0.14). these analyses tend to indicate the benefits of examining multiple transfer contexts when estimating organizational predictors of intention to transfer.

table 7 comparison of beta coefficients between the two transfer contexts study and work

4. discussion

the goal of this study was to complement previous research on the intention to transfer new learning by investigating the students' pre-course perception of the influence of five support-related variables on their intention to transfer learning to both their study and their work context. in educational settings transfer of learning is typically measured during a test after the training.
the first finding of our study is that already before the actual training, in this case a course in generic information literacy competences, five of the eight variables examined across the study and work contexts were significant predictors of the students’ intention to transfer the competences they were about to learn during the course. from the literature we learn that transfer is a dynamic process with a temporal dimension, influenced by a multitude of variables, not only during and after, but also already before an intervention (baldwin & ford, 1988; broad, 2005; burke & hutchins, 2008; gegenfurtner, veermans, festner, & gruber, 2009; grossman & salas, 2011; holton & baldwin, 2003). this study confirms this influence in the first, pre-training stage of the transfer process. instructional designers might take this into account when framing and presenting a training (baldwin & magjuka, 1991). preceding a training, for example when designing the course programme but also later on during the course, they might communicate to the students the importance of the variables that appear to be significant predictors of intention to transfer in the students' study and work context and how they are integrated into the course. follow-up studies will investigate the other temporal dimensions. a second finding, from the model parameter estimates, is that the beta coefficients differ between the two transfer contexts study and work. this is especially relevant in situations that involve education in so-called generic competences like information literacy that can be, or are meant to be, applied in various contexts, be it study, work or private life. this finding tends to indicate that, for educational designers, it is beneficial to gain insight into the conditions of, and main actors within, the intended transfer contexts.
one option would be to involve the input of former students when designing and framing a specific course, for example by using course evaluation reports. a third finding refers to the variable 'opportunity to use' that was dropped from this study. in previous reviews and studies opportunity appeared to be one of the strongest predictors of transfer. in our exploratory factor analysis it loaded onto one factor together with intention to transfer. correlation analysis also showed a relatively strong positive relationship. this might be difficult to explain when looking at the items that were used for both constructs. the ones used for intention referred to the respondents' plans or efforts to apply new learning, while items for opportunity to use specifically referred to workload and the availability of sufficient facilities and opportunities to apply new learning. follow-up interviews with respondents might have shed more light on why they have interpreted both variables in the same manner. looking at the differences between the two transfer contexts in more detail we notice that in the study context feedback (0.37, hypothesis 5), supervisor support (0.31, hypothesis 2), and openness to change (0.10, hypothesis 4) appear to be significant moderate predictors of the students' intention to transfer learning. this is not surprising as especially in an educational context, where students are gaining new competencies, feedback and the support by supervisors, in this case lecturers, is considered essential to, and even a precondition for the proper application of new learning. communicating, for example in the course guide, that both are important aspects of the course might boost the students' pre-course self-confidence and subsequently their intention to apply their newly acquired competences. another aspect within the study context that is significant for the students' intention to transfer, although with a significantly lower beta coefficient, is openness to change. 
this might be understandable as behavioural change is part and parcel of educational settings and openness to these changes might therefore be taken for granted by the students. the specific educational setting in this research, distance education at the open university with little or no direct contact between students, might explain why peer support isn't seen as significant for the intention to transfer setting. in the students' work context supervisor support (0.31, hypothesis 2) and feedback (0.19, hypothesis 5) appear to be significant for the students' intention to apply new learning. one explanation might be that doing a master programme beside work is a challenging, time and energy consuming activity that will not only improve the learners' individual competences but will also be beneficial for the quality of the organization at large. students therefore might expect that when applying these new competences this will be acknowledged and facilitated by the organization in the person of the supervisors. when it involves an in-company training instructional designers might pay special attention to the organizational support that is available or might have to be organized in order to enhance transfer of learning. as educational designers generally don't have any influence on the supportive conditions within the students' work context they might integrate discussions about these aspects in the course, asking students how relevant feedback and supervisor support are for them, if and how both are available in their organization and in case it is lacking, what steps might be taken to organize it. these discussions might also give an indication of what kind of support is expected. extending the current research by interviewing respondents might also give an explanation of why support of colleagues and an open, innovative team or organizational climate are not relevant for their application of new learning. 4.1. 
theoretical implications the theoretical relevance of this study is that it adds a novel dimension to the conceptual development of transfer studies. a multicontextual perspective on transfer is currently absent from the training literature, as are studies that systematically compare transfer to more than one context within one sample. we expect that studies addressing this multicontextuality of transfer will continue to confirm or challenge previous monocontextual research on the importance of environmental predictors in specific settings. 4.2 practical implications the practical value of this study lies in the field of educational design. the results confirm that transfer of learning isn't only happening after an intervention. it is a longitudinal process in which various aspects may influence the learners' intention to transfer new learning not only after but already before the actual intervention. furthermore, from a lifelong learning perspective and especially when generic competences are involved, it is important for educational designers not only to focus on the educational setting of an intervention but also to pay attention to aspects of current or prospective transfer contexts, be it study, work or private life, that might be relevant for the transfer process. in order to enhance multicontextual transfer, learners could be involved in the design process for their knowledge of the transfer context. 4.3 limitations and directions for future research one limitation of this study is the use of self-report surveys. although there are valid arguments against using self-report like social desirability or common method bias, some specific situations might prevent additional external measurements. this can be the case when it involves a construct like intention that is difficult to estimate with behavioural measures, or when it involves more autonomously working respondents. 
previous studies on the other hand indicate that respondents are very well capable of reporting on their own transfer process (chiaburu & tekleab, 2005; devos, dumay, bonami, bates, & holton, 2007; facteau et al., 1995; velada et al., 2007) and that they themselves know best which variables they consider relevant for their intention to transfer. we have attempted to limit undesirable bias by communicating that data was collected anonymously and by offering the opportunity to answer the surveys electronically and therefore in private. for future research however data triangulation using additional measurement instruments would be advisable. another limitation is the setting in which this study has taken place namely adult learners in a distance education context. distance learning creates specific conditions that may have implications for some transfer related variables like peer support. furthermore, according to heery (1996, p. 5) non-traditional students like distance learners "often show an unusual degree of motivation and commitment”. we therefore recommend future studies to be carried out in other learning contexts, for example at regular universities of applied sciences where learning is also directed towards transfer of new learning to study and work contexts. finally, the course and also the integrated survey were mandatory. various studies recommend voluntary enrolment in order to enhance motivation and transfer (gegenfurtner, könings, kosmajac, & gebhardt, 2016) while others argue that mandatory participation enhances transfer as it expresses the importance of the course to the organization (baldwin & magjuka, 1991). future research might investigate the implications of these differences in more detail. 
future steps in this research will investigate the influence of learner characteristics and intervention variables on the intention to transfer learning to multiple contexts, as well as their longitudinal development over time, measured directly after and three months after the course. despite the extensive body of knowledge on transfer of learning, including on the intention to transfer, studies typically have focussed on transfer within one specific context. a multicontextual focus however is in place when it concerns transfer of generic competences that can be, or are intended to be applied in more than one context. this has been confirmed by the results of the present study. furthermore, in contexts where it’s difficult to measure actual transfer, intention can function as a valuable precursor to and predictor of transfer. the present study indicates that variables and items used in the study offer a valuable contribution to the design of a practical instrument to measure the influence of a selection of key factors on the transfer process. this instrument in turn will help instructional designers to create educational interventions that enhance the transfer of learning. keypoints the present study extends previous research on transfer of learning by introducing a multicontextual perspective when designing training of generic competences that are meant to be applied in more than one context, for example study, work or daily life. we hypothesised positive relationships of five organizational variables on students' pre-training intention to transfer generic competences from the prospective training to both their study and work context. participants were 303 students at an open university starting a course in information literacy. data was collected by means of pre-course self-reports and analysed using structural equation modelling. 
before starting the course, supervisor support and feedback were the strongest predictors of intention to transfer new learning in both the study and the work context. our study confirmed the presumption that transfer of learning is a process that already starts before an actual training.

references

aguinis, h., & kraiger, k. (2009). benefits of training and development for individuals and teams, organizations, and society. annual review of psychology, 60, 451-474. doi:10.1146/annurev.psych.60.110707.163505 ajzen, i. (1991). the theory of planned behavior. organizational behavior and human decision processes, 50(2), 179-211. doi:10.1016/0749-5978(91)90020-t ajzen, i., & madden, t. j. (1986). prediction of goal-directed behavior: attitudes, intentions, and perceived behavioural control. journal of experimental social psychology, 22(5), 453-474. doi:10.1016/0022-1031(86)90045-4 al-eisa, a. s., furayyan, m. a., & alhemoud, a. m. (2009). an empirical examination of the effects of self-efficacy, supervisor support and motivation to learn on transfer intention. management decision, 47(8), 1221–1244. doi:10.1108/00251740910984514 allison, p. d. (2003). missing data techniques for structural equation modeling. journal of abnormal psychology, 112(4), 545-557. doi:10.1037/0021-843x.112.4.545 alvero, a. m., bucklin, b. r., & austin, j. (2001). an objective review of the effectiveness and essential characteristics of performance feedback in organizational settings (1985-1998). journal of organizational behavior management, 21(1), 3-29. doi:10.1300/j075v21n01_02 baldwin, t. t., & ford, j. k. (1988). transfer of training: a review and directions for future research. personnel psychology, 41(1), 63-105. doi:10.1111/j.1744-6570.1988.tb00632.x baldwin, t. t., & magjuka, r. j. (1991). organizational training and signals of importance: linking pretraining perceptions to intentions to transfer. human resource development quarterly, 2(1), 25-36.
doi:10.1002/hrdq.3920020106 bandura, a. (1977). social learning theory (englewood cliffs, nj: prentice hall). bates, r. a., holton lll, e. f., & hatala, j. p. (2012). a revised learning transfer system inventory: factorial replication and validation. human resource development international, 15(5), 549-569. doi:10.1080/13678868.2012.726872 blume, b. d., ford, j. k., baldwin, t. t., & huang, j. l. (2010). transfer of training: a meta-analytic review. journal of management, 36(4), 1065-1105. doi:10.1177/0149206309352880 brand-gruwel, s., wopereis, i., & vermetten, y. (2005). information problem solving by experts and novices: analysis of a complex cognitive skill. computers in human behaviour, 21(3), 487-508. doi:10.1016/j.chb.2004.10.005 brand-gruwel, s., wopereis, i., & walraven, a. (2009). a descriptive model of information problem solving while using internet. computers & education, 53(4), 1207-1217. doi:10.1016/j.compedu .2009.06.004 brinkerhoff, r. o., & montesino, m. u. (1995). partnerships for training transfer: lessons from a corporate study. human resource development quarterly, 6(3), 263-274. doi:10.1002/hrdq.3920060305 broad, m. l., & newstrom, j. w. (1992). transfer of training: action packed strategies to ensure high payoff from training investments. reading, ma: addison-wesley. broad, m. l. (2005). beyond transfer of training: engaging systems to improve performance . john wiley & sons. burke, l. a., & hutchins, h. m. (2007). training transfer: an integrative literature review. human resource development review, 6(3), 263-296. doi:10.1177/1534484307303035 cheng, e. w. l., & hampson, i. (2008). transfer of training: a review and new insights. international journal of management review, 10(4), 327-341. doi:10.1111/j.1468-2370.2007.00230.x chiaburu, d. s., van dam, k., & hutchins, h. m. (2010). social support in the workplace and training transfer: a longitudinal analysis.international journal of selection and assessment, 18(2), 187–200. 
doi:10.1111/j.1468-2389.2010.00500.x. chiaburu, d. s., & marinova, s. v. (2005). what predicts skill transfer? an exploratory study of goal orientation, training self‐efficacy and organizational supports. international journal of training and development, 9(2), 110-123. doi:10.1111/j.1468-2419.2005.00225.x chiaburu, d. s., & tekleab, a. g. (2005). individual and contextual influences on multiple dimensions of training effectiveness. journal of european industrial training, 29(8), 604-626. doi:10.1108/03090590510627085 clarke, n. (2002). job/work environment factors influencing training transfer within a human service agency: some indicative support for baldwin and ford’s transfer climate construct. international journal of training and development, 6(3), 146–162. doi:10.1111/1468-2419.00156 cohen, s., underwood, l. g., & gottlieb, b. h. (eds.). (2000). social support measurement and intervention. oxford: oxford university press. colquitt, j. a., lepine, j. a., & noe, r. a. (2000). toward an integrative theory of training motivation: a meta-analytic path analysis of 20 years of research. journal of applied psychology, 85(5), 678–707. doi:10.1037//0021-9010.85.5.678 cromwell, s. e., & kolb, j. a. (2004). an examination of work-environment support factors affecting transfer of supervisory skills training to the workplace. human resource development quarterly, 15(4), 449–471. doi:10.1002/hrdq.1115 den ouden, m. d. (1992). transfer na bedrijfsopleidingen: een veldonderzoek naar de rol van voornemens, sociale normen, beheersing en sociale steun bij opleidingstransfer. doctoral dissertation. amsterdam: thesis publishers. de rijdt, c., stes, a., van der vleuten, c., & dochy, f. (2013). influencing variables and moderators of transfer of learning to the workplace within the area of staff development in higher education: research review. educational research review, 8(0), 48-74. doi:10.1016/j.edurev.2012.05.007 devos, c., dumay, x., bonami, m., bates, r., & holton, e. 
(2007). the learning transfer system inventory (ltsi) translated into french: internal structure and predictive validity. international journal of training and development, 11(3), 181-199. doi:10.1111/j.1468-2419.2007.00280.x egan, t. m., yang, b., & bartlett, k. r. (2004). the effects of organizational learning culture and job satisfaction on motivation to transfer learning and turnover intention. human resource development quarterly, 15(3), 279-301. doi:10.1002/hrdq.1104 facteau, j. d., dobbins, g. h., russell, j. e., ladd, r. t., & kudisch, j. d. (1995). the influence of general perceptions of the training environment on pretraining motivation and perceived training transfer. journal of management, 21(1), 1-25. doi:10.1177/014920639502100101 ford, j.k. (1997). advances in training research and practice: an historical perspective. in ford, j., kozlowski, s., kraiger, k., salas, e., & teachout, m. (eds), improving training effectiveness in work organizations (pp. 1-18) mahwah, nj: lawrence erlbaum associates ford, j. k., quinones, m. a., sego, d. j., & sorra, j. (1992). factors affecting the opportunity to perform trained tasks on the job. personnel psychology, 45(3), 511-527. doi:10.1111 /j.17446570.1992.tb00858.x ford, j.k., & weissbein, d. (1997). transfer of training: an updated review. performance improvement quarterly, 10(2) , 22–41. doi:10.1111/j.1937-8327.1997.tb00047.x ford, j. k., yelon, s. l., & billington, a. q. (2011). how much is transferred from training to the job? the 10% delusion as a catalyst for thinking about transfer. performance improvement quarterly, 24(2), 7-24. doi:10.1002/piq.20108 foxon, m. j. (1994). a process approach to transfer of training: part 2; using action planning to facilitate the transfer of training. australian journal of education and technology, 10(1), 1-18. doi:10.14742/ajet.2080 gabelica, c., van den bossche, p., segers, m., & gijselaers, w. (2012). feedback, a powerful lever in teams: a review. 
educational research review, 7(2), 123-144. doi:10.1016/j.edurev.2011.11.003 gegenfurtner, a. (2011). motivation and transfer in professional training: a meta-analysis of the moderating effects of knowledge type, instruction, and assessment conditions. educational research review, 6 (3), 153–168. doi:10.1016/j.edurev.2011.04.001 gegenfurtner, a. (2013). dimensions of motivation to transfer: a longitudinal analysis of their influence on retention, transfer, and attitude change. vocations and learning, 6(2), 187-205. doi:10.1007/s12186 -012-9084-y gegenfurtner, a., könings, k. d., kosmajac, n., & gebhardt, m. (2016). voluntary or mandatory training participation as a moderator in the relationship between goal orientations and transfer of training. international journal of training and development, 20(4), 290-301. doi:10.1111/ijtd.12089 gegenfurtner, a., veermans, k., festner, d., & gruber, h. (2009). integrative literature review: motivation to transfer training: an integrative literature review. human resource development review, 8(3), 403-423. doi:10.1177/1534484309335970 gilpin-jackson, y., & bushe, g. r. (2007). leadership development training transfer: a case study of posttraining determinants. the journal of management development, 26(10), 980-1004. doi:10.1108/02621710710833423 govaerts, n., & dochy, f. (2014). disentangling the role of the supervisor in transfer of training. educational research review, 12, 77-93. doi:10.1016/j.edurev.2014.05.002 grossman, r., & salas, e. (2011). the transfer of training: what really matters. international journal of training and development, 15(2), 103-120. doi:10.1111/j.1468-2419.2011.00373.x haskell, e. (2001). transfer of learning: cognition, instruction, and reasoning. san diego, usa: academic press. heery, m. (1996). academic library services to non-traditional students. library management, 17(5), 3-13. doi:10.1108/01435129610119584 hoekstra, m. r. (1999). gedragsbeïnvloeding door cursussen. 
een studie naar de effecten van persoons-, cursusen omgevingskenmerke. [influencing behaviour through training programmes: a study of the effects of personal, training programme and environmental characteristics]. amsterdam, the netherlands: vrije universiteit holton iii, e. f. (2005). holton's evaluation model: new evidence and construct elaborations. advances in developing human resources, 7(1), 37-54. doi:10.1177/1523422304272080 holton, iii, e. f., & baldwin, t. t. (2003). making transfer happen. in holton iii, e. f., & baldwin, t. t. (eds.) improving learning transfer in organizations (pp. 3-15). san francisco, ca: jossy-bass. holton iii, e. f., bates, r. a., & ruona, w. e. (2000). development of a generalized learning transfer system inventory. human resource development quarterly, 11(4), 333-360. doi: 10.1002/1532-1096(200024)11:4<333::aid-hrdq2>3.0.co;2-p holton iii, e. f., chen, h. c., & naquin, s. a. (2003). an examination of learning transfer system characteristics across organizational settings. human resource development quarterly, 14(4), 459–482. doi:10.1002/hrdq.1079 homklin, t., takahashi, y., & techakanont, k. (2014). the influence of social and organizational support on transfer of training: evidence from thailand, international journal of training and development, 18(2), 116-131. doi:10.1111/ijtd.12031 horner, m. t. (2010). toward an understanding of when and why situational constraints influence performance . doctoral dissertation, texas a & m university. hu, l. t., & bentler, p. m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6(1), 1-55. doi:10.1080/10705519909540118 hutchins, h. m. and burke, l. a. (2007), identifying trainers' knowledge of training transfer research findings – closing the gap between research and practice. international journal of training and development, 11 (4), 236–264. doi:10.1111/j.1468-2419.2007.00288.x hutchins, h. 
m., nimon, k., bates, r., & holton iii, e. (2013). can the ltsi predict transfer performance? testing intent to transfer as a proximal transfer of training outcome. international journal of selection and assessment, 21(3), 251-263. doi:10.1111/ijsa.12035 katsioloudes, v. (2015). supervisory ratings as a measure of training transfer: testing the predictive validity of the learning transfer system inventory . [doctoral dissertation]. kline, r. b. (2015). principles and practice of structural equation modeling (4th ed.). new york: guilford. knowles, m., holton iii, e., & swanson, r. (1998). the adult learner (5th ed.). houston, tx: gulf publishing. kontoghiorghes, c. (2004). reconceptualizing the learning transfer conceptual framework: empirical validation of a new systemic model.international journal of training and development, 8(3), 210-221. doi:10.1111/j.1360-3736.2004.00209.x lim, d. h., & johnson, s. d. (2002). trainee perceptions of factors that influence learning transfer. international journal of training and development, 6(1), 36-48. doi:10.1111/1468-2419.00148 ling, g. j., & yusof, h. m. (2017). a review of the linkage between supervisory support and training transfer. sains humanika, 9(1-3). doi:10.11113/sh.v9n1-3.1140 massenberg, a. c., schulte, e. m., & kauffeld, s. (2017). never too early: learning transfer system factors affecting motivation to transfer before and after training programs. human resource development quarterly, 28(1), 55-85. doi:10.1002/hrdq.21256 massenberg, a. c., spurk, d., & kauffeld, s. (2015). social support at the workplace, motivation to transfer and training transfer: a multilevel indirect effects model. international journal of training and development, 19(3), 161-178. doi:10.1111/ijtd.12054 mathieu, j. e., martineau, j. w., & tannenbaum, s. i. (1993). individual and situational influences on the development of self-efficacy: implications for training effectiveness. personnel psychology, 46(1), 125–147. 
doi:10.1111/j.1744-6570.1993.tb00870.x mathieu, j.e., tannenbaum, s.i., & salas, e. (1992). influences of individual and situational characteristics on measures of training effectiveness. academy of management journal, 35(4), 828–847. doi:10.2307/256317 merriam, s. b., & leahy, b. (2005). learning transfer: a review of the research in adult education and training. paace journal of lifelong learning, 14(1), 1-24. naquin, s. s., & baldwin, t. t. (2003). managing transfer before learning begins: the transfer-ready learner. in e. f. holton, iii & t. t. baldwin (eds), improving learning transfer in organizations (pp. 80–96). san francisco, usa: jossey-bass. ng, k. h. (2013). the influence of supervisory and peer support on the transfer of training. studies in business & economics 8(3), 82-97. nijman, d. j. (2004). supporting transfer of training: effects of the supervisor [doctoral dissertation].(phd), enschede, the netherlands: universiteit twente. retrieved from http://doc.utwente.nl/76049/ nijman, d., nijhof, w., wognum, a., & veldkamp, b. (2006). exploring differential effects of supervisor support on transfer of training. journal of european industrial training, 30(7), 529–549. doi:10 .1108/03090590610704394 noe, r. a. (1986). trainees' attributes and attitudes: neglected influences on training effectiveness. academy of management journal, 11(4), 736-749. doi:10.5465/amr.1986.4283922 oxford dictionaries, british & world english, retrieved february 18, 2018 from https://en.oxforddictionaries.com/definition/feedback pidd, k. (2004). the impact of workplace support and identity on training transfer: a case study of drug and alcohol safety training in australia. international journal on training and development, 8(4), 274–288. doi:10.1111/j.1360-3736.2004.00214.x reece, g. j. (2007). critical thinking and cognitive transfer: implications for the development of online information literacy tutorials. research strategies, 20, 482-493. 
doi:10.1016/j.resstr.2006.12.018 reinhold, s., gegenfurtner, a., & lewalter, d. (2018). social support and motivation to transfer as predictors of training transfer: testing full and partial mediation using meta-analytic structural equation modeling. international journal of training and development, 22(1). doi:10.1111/ijtd.12115 rouiller, j. z., & goldstein, i. l. (1993). the relationship between organizational transfer climate and positive transfer of training. human resource development quarterly, 4(4), 377-390. doi:10.1002/hrdq.3920040408 russ-eft, d. (2002). a typology of training design and work environment factors affecting workplace learning and transfer. human resource development review, 1(1), 45-65. doi:10.1177/1534484302011003 salas, e., & cannon-bowers, j. a. (2001). the science of training: a decade of progress. annual review of psychology, 52, 471-499. doi:10.1146/annurev.psych.52.1.471 seyler, d. l., holton iii, e. f., bates, r. a., burnett, m. f., & carvalho, m. a. (1998). factors affecting motivation to transfer training. international journal of training and development, 2(1), 2-16. doi:10.1111/1468-2419.00031 sheeran, p. (2002). intention-behavior relations: a conceptual and empirical review. in w. stroebe, & m. hewstone (eds.), european review of social psychology, vol. 12 (pp. 1–36). london: wiley. simons, p. r. j. (1999). transfer of learning: paradoxes for learners. international journal of educational research, 31(7), 577-589. doi:10.1016/s0883-0355(99)00025-7 smith-jentsch, k. a., salas, e., & brannick, m. t. (2001). to transfer or not to transfer? investigating the combined effects of trainee characteristics, team leader support, and team climate.
journal of applied psychology, 86(2), 279-292. doi:10.1037/0021-9010.86.2.279 taylor, p. j., russ-eft, d. f., & chan, d. w. l. (2005). a meta-analytic review of behavior modeling training, journal of applied psychology, 90(4), 692–709. doi:10.1037/0021-9010 .90.4.692 thayer, p. w., & teachout, m. s. (1995). a climate for transfer model. brooks air force base, texas. thurstone, l. l. (1947). multiple-factor analysis. chicago: the university of chicago press. tonhäuser, c., & büker, l. (2016). determinants of transfer of training: a comprehensive literature review.international journal for research in vocational education and training, 3(2), 127-165. doi:10.13152/ijrvet.3.2.4 tracey, j. b., hinkin, t. r., tannenbaum, s., & mathieu, j. e. (2001). the influence of individual characteristics and the work environment on varying levels of training outcomes. human resources development quarterly, 12(1), 5–23. doi:10.1002/1532-1096(200101/02)12:1<5::aid-hrdq2>3.0.co;2-j tracey, j. b., tannenbaum, s. i., & kavanagh, m.j. (1995). applying trained skills on the job: the importance of the work environment. journal of applied psychology, 80(2), 239–252. doi:10.1037/0021-9010.80.2.239 triandis, h. c. (1980). values, attitudes, and interpersonal behavior. in h. e. howe & m. page (eds.), nebraska symposium of motivation, 27 (pp. 195–259). lincoln, ne: university of nebraska press. tziner, a., haccoun, r. r., & kadish, a. (1991). personal and situational characteristics influencing the effectiveness of transfer of training improvement strategies, journal of occupational psychology, 64(2), 167–77. doi:10.1111/j.2044-8325.1991.tb00551.x van den bossche, p., segers, m., & jansen, n. (2010). transfer of training: the role of feedback in supportive social networks, international journal of training & development, 14 (2), 81–94. doi:10.1111/j.1468-2419.2010.00343.x van der klink, m., gielen, e., & nauta, c. (2001). supervisory support as a major condition to enhance transfer. 
international journal of training and development, 5(1), 52-63. doi: 10.1111/1468-2419.00121 van merriënboer, j. j. g., & kirschner, p. a. (2018). ten steps to complex learning: a systematic approach to four-component instructional design . new york, ny: routledge. velada, r., caetano, a., michel, j. w., lyons, b. d., & kavanagh, m. j. (2007). the effects of training design, individual characteristics and work environment on transfer of training. international journal of training & development, 11(4), 282-294. doi:10.1111/j.1468-2419.2007.00286.x wexley k. n., & latham g. p (1991). developing and training human resources in organizations. new york: harper-collins. wopereis, i., frerejean, j., & brand-gruwel, s. (2015). information problem solving instruction in higher education: a case study on instructional design. communications in computer and information science, 552, 293-302. doi:10.1007/978-3-319-28197-1_30 wopereis, i., frerejean, j., & brand-gruwel, s. (2016). teacher perspectives on whole-task information literacy instruction. communications in computer and information science, 676, 678-687). doi: 10.1007/978-3-319-52162-6_66 van laer et elen publication frontline learning research vol.6 no. 3 (2018) 228 issn 2295-3159 towards a methodological framework for sequence analysis in the field of self-regulated learning stijn van laer a& jan eelena aku leuven, belgium article received 13 may 2018 / revised 30 august / accepted 26 september / available online 19 december abstract in recent decades, conceptualizations and operationalizations of self-regulated learning (srl) have shifted from srl as an aptitude to srl as an event. alongside this shift, increased technological capability has introduced computer log files to the investigation of srl, uncovering new research avenues. one such avenue investigates the time-related characteristics of srl through learners’ behavioural sequences. 
although sequence analysis is still relatively new in srl research, other fields have fruitful traditions in its application and may serve as a basis for applications in the field of srl. ten years of investigating srl through sequence analysis have produced a wide range of methodological approaches. while this variety of methods illustrates the diversity of opportunities, it also indicates the lack of consensus regarding the most appropriate approaches, often resulting in difficult-to-understand methods and non-transparent ways of reporting. since the introduction of sequence analysis in the field of srl, researchers have been emphasizing the need for a methodological framework to guide its application. yet, to date, no such framework has been proposed, hindering our progress through (1) transparent methods and (2) comparative studies to (3) empirical and ecological applications. to help overcome this issue, this manuscript discusses the basis of a methodological framework for the use of sequence analysis in srl research. first, we make a case for why such a framework is necessary; second, we propose a set of guidelines which could serve as a starting point for the construction of a framework.

keywords: computer log files; sequence analysis; self-regulated learning; methodological framework

corresponding author: stijn.vanlaer@kuleuven.be
doi: https://doi.org/10.14786/flr.v6i3.367

acknowledgments

we would like to acknowledge the support of the project “adult learners online”, funded by the agency for science and technology (project number: sbo 140029), which made this research possible.

1. introduction

over the last five decades, multiple theoretical conceptualizations and practical operationalizations have been proposed for self-regulated learning (srl), shifting the focus from srl as an aptitude to srl as an event (e.g., endedijk, brekelmans, sleegers, & vermunt, 2016; panadero, klug, & järvelä, 2016; winne, 2016).
besides this shift, technological developments have meant that computer log files now have a role to play in investigations of learners’ srl. from both theoretical and practical perspectives, computer log files are an interesting avenue for investigating learners’ srl (e.g., azevedo & hadwin, 2005; winne, 2005; zimmerman & schunk, 2001). on the one hand, their sequenced structure means that computer log files possess time-related characteristics relevant to the current conceptualization of srl as an event (e.g., azevedo, 2014; ben-eliyahu & bernacki, 2015; molenaar & järvelä, 2014). on the other hand, their unobtrusive nature enables us to observe traces of srl in learners’ behaviour in ecologically valid contexts (e.g., bourbonnais et al., 2006; hine, 2011). while sequence-based analysis has only become popular as a means of investigating the time-related characteristics of srl within the last ten years, other fields of research (e.g., bioinformatics, chemistry, marketing, and sociology) have longstanding traditions in the use of such analyses. insights gained from these fields may serve as a basis for applying sequence analysis in investigations of srl (e.g., perer & wang, 2014; winne & baker, 2013). a decade of log file sequence analysis in srl research has produced a large amount of relevant work (e.g., azevedo, taub, & mudrick, 2015; bannert, molenaar, azevedo, järvelä, & gašević, 2017; molenaar & järvelä, 2014; roll & winne, 2015) and a variety of methodological approaches. while theory-driven approaches for example prefer to recode log files to theoretically meaningful events, predefine the length of an ideal sequence, or set the threshold for significance (e.g., roll & winne, 2015; winne, 2010), data-driven approaches often prefer to extract the most common sequences from the data, regardless of their content and length (e.g., bannert et al., 2017; beheshitha, gašević, & hatala, 2015). 
differences with regard to the statistical analyses used can also be found. some researchers investigate the occurrence of, for example, particular sub-sequences as varying from learner to learner and apply multi-level analysis (e.g., taub, azevedo, bouchet, & khosravifar, 2014; taub, azevedo, bradbury, millar, & lester, 2017), while others focus on clusters of learners and instead apply chi-square analysis (e.g., van laer & elen, 2016; van laer, jiang, & elen, 2018) or analysis of variance. still others argue that statistical analysis based on sub-sequences is insufficient to establish a full picture of learners’ learning patterns and prefer to use stochastic models based on the entire sequences to operationalize the investigation of learners’ behaviour (e.g., bannert, sonnenberg, mengelkamp, & pieger, 2015; sonnenberg & bannert, 2015). this multitude of approaches demonstrates not only the diversity of opportunities, but also the lack of consensus regarding the most appropriate methods. this lack of consensus often results in fragmentation, leading to non-transparent research practices and research reports, hampering the validation and testing of methods and thus the advancement of the investigation of srl through sequence analysis. in line with this observation, researchers have, since 2014, been emphasizing the need for a methodological framework to guide the application of log file sequence analysis in srl research (e.g., azevedo, 2014; bannert, reimann, & sonnenberg, 2014; molenaar & järvelä, 2014). such a methodological framework could, on the one hand, provide a decision-tree-like approach to choosing which analysis to perform when (schnaubert, heimbuch, & bodemer, 2016) and, on the other hand, offer guidelines for reporting on each of the steps taken and considerations made.
yet, to date, no methodological frameworks have been proposed (e.g., segedy & biswas, 2015; winne, 2014), hindering our ability to validate and duplicate studies, and so to demonstrate progress in the use of sequence analysis in srl research and in our search for the most appropriate methods (kuhn, 2012). therefore, in this manuscript we discuss a methodological framework for the application of sequence analysis in the field of srl. to do so, we first make a case for why such a methodological framework is necessary. secondly, we propose a set of guidelines which may serve as a starting point for the construction of a framework. with a methodological framework in place, the investigation of time-related characteristics in srl using sequence analysis could evolve towards (1) the use of transparent methods, (2) comparative studies, and (3) empirical and ecological applications, supporting both research and practice. in what follows, we first define sequence analysis, elaborate on its link to srl and introduce the most common phases in its operationalization, providing an illustrative example from one of our own studies. the illustrative example used in this manuscript is not intended as a good practice, but as a demonstration of the complexity of sequence analyses and the decisions to be made. at the end of this introductory section, we outline the operational efforts made in the search for tangible proof of progress in sequence analysis as a method for investigating learners’ srl. based on insights gathered from the introductory section, the second section proposes a set of guidelines upon which framework construction can be based. in the third and final section, we elaborate on the implications for research and practice and suggest further directions in the construction of a methodological framework for sequence analysis in the field of srl.

1.1 sequence analysis

a sequence (β) is an ordered list of elements (β = < a, c, b, d, e, g, c, e, d, b, g >) (zhou, xu, nesbit, & winne, 2010). such elements can be physical, behavioural, or conceptual in nature. the analysis of a sequence makes it possible to discover hidden time-related relations between different sequences, parts of these sequences, and the individual elements within these sequences (antunes & oliveira, 2001). sequence analysis is therefore indispensable in many application domains (e.g., bioinformatics, chemistry, marketing, sociology, and education) (liu, dev, dontcheva, & hoffman, 2016) and approaches are plentiful. for example, in bioinformatics, sequence analysis is the process of investigating a deoxyribonucleic acid (dna) sequence to understand its features, function, structure, or evolution (e.g., lubahn et al., 1988; stackebrandt & goebel, 1994). in chemistry, sequence analysis comprises the determination of the sequence of a polymer formed of several monomers (e.g., martin, shabanowitz, hunt, & marto, 2000; van krevelen & te nijenhuis, 2009). in marketing, sequence analysis is in turn often used in analytical customer relationship management applications, such as next-product-to-buy models (e.g., kumar, venkatesan, & reinartz, 2004; prinzie & van den poel, 2007). in sociology, sequence methods are increasingly used to study life-course and career trajectories, patterns of organizational and national development, conversation and interaction structure, and the problem of work and family synchrony (e.g., bonin, vogel, & campbell, 2014; stark & vedres, 2012). finally, in recent years the field of education has also gained interest in the investigation of sequence data. sequence-analysis methods have been increasingly used in the context of data analysis to investigate learning processes (reimann, markauskaite, & bannert, 2014).
one distinct area of learning research in which sequence-analysis methods have been used is srl research, in particular for studying regulation and metacognition in students' learning through computer log files (e.g., azevedo, moos, johnson, & chauncey, 2010; zhou et al., 2010). computer log files gathered from learners’ interaction with online learning environments are the best known and potentially the least obtrusive way of gathering data with regard to learners’ srl behaviour. such log files are gathered through clickstreams. clickstreams are also known as click paths, or the route that learners choose when clicking or navigating through an online learning environment. a clickstream is a list of pages visited by a learner, presented in the order the pages are visited (also defined as the 'succession of mouse clicks' that each learner makes). based on the sequence of the visited pages, researchers attempt to map learners’ srl processes.

1.2 self-regulated learning and its measurement

as learning in general is seen as an activity performed by learners rather than something happening to them as a result of instruction (e.g., bandura, 1989; oliver & trigwell, 2005), it entails a self-regulated process by means of which learners regulate their behaviour according to the instructional demands (zimmerman & schunk, 2001). to be successful learners, learners need to self-regulate their learning. this assumption is evidenced by a substantial body of literature showing scores on self-regulated-learning-related variables to be strongly positively correlated with, and causally related to, scores on performance-related variables (e.g., daniela, 2015; lin, coburn, & eisenberg, 2016). the theoretical conceptualization of srl evolved from srl as an aptitude to srl as an event.
the aptitude approach, on the one hand, sees srl as in-person, across situations (aggregated over or abstracted from behaviour), and stable from a certain age onwards (e.g., veenman, 2007; winne & perry, 2000). the event approach, on the other hand, conceptualizes srl as a cyclical process unfolding in roughly three phases (forethought, performance and evaluation) (e.g., boekaerts, 1992), influenced by variables internal and external to the learner (e.g., winne & hadwin, 1998). additionally, the event approach sees srl as covert in nature, so that it has to be inferred from learners’ behaviours and behavioural consequences (e.g., veenman, prins, & verheij, 2003). although both approaches are still used in research, over the past three decades the event approach gained considerable interest and dominated the investigation of learners’ srl (see: puustinen & pulkkinen, 2001). in line with the shift in the conceptualization of srl, the measurement approaches used to capture it evolved as well. measurements shifted from single measurements administered before or after the execution of a task to continuous measurements administered during the execution of the task (winne & perry, 2000). the latter type of measurement is referred to as on-line measurement, whereas the former is referred to as off-line measurement of srl (pintrich, 2004). following the shift in the conceptualization of srl, the use of off-line measurements based on learners’ perceptions (e.g., self-reports) came under pressure (endedijk et al., 2016). this is mainly because these types of measurements assume that learners are capable of predicting, reflecting on, or estimating in general terms (before or after a task) how they will act in a certain context, and consequently rely on learners’ perceptions of their own srl rather than on the srl they actually exhibit (veenman, bavelaar, de wolf, & van haaren, 2014).
the interest in sequence data and sequence analysis in srl taps into the cyclical nature of srl and has been particularly fuelled by improvements in technical capabilities. the recording of learning-related behavioural data that are suitable for quantitative analysis has become almost effortless and unobtrusive for the learners in computer-based learning environments (winne, nesbit, & popowich, 2017), making this type of data particularly interesting for both practice and research. examples of this usefulness are the mining of theory-based patterns from big data to identify srl strategies in massive open online courses (maldonado-mahauad, pérez-sanagustín, kizilcec, morales, & munoz-gama, 2018), the finding of traces of srl in activity streams (cicchinelli et al., 2018), or the assessment of online learning material and its relation to learners’ quantitative behaviour patterns and their effects on motivation and learning performance (yang, li, & xing, 2018). applications are plentiful.

1.3 sequence analysis in the field of self-regulated learning

in the field of srl in general, sequence analysis refers to a sequence as an ordering of observable behavioural events preceded and followed by an unknown behavioural state (e.g., du, plaisant, spring, & shneiderman, 2016; köck & paramythis, 2011). simply put, each change of state is an event, and each event implies a change of state (müller, studer, gabadinho, & ritschard, 2010). for example, an assumed behavioural state could be reading a content page in an online learning environment, while clicking the calendar tool would be an observable behavioural event that changes the behavioural state of a learner to viewing the calendar page. through the investigation of ordered observable behavioural events (a sequence), researchers try to gain insights into the unknown behavioural states learners are in (molenaar & järvelä, 2014).
this investigation leads to three types of research questions: (1) questions about the nature of the sequences of the observed events, (2) questions about variables that affect those sequences, and (3) questions about the variables affected by the sequences (abbott & tsay, 2000). to gain a brief insight into the variety of approaches that can be used to handle each of these questions, below we illustrate how their investigation can be operationalized. this will be done based on three common phases in the investigation of sequential data in the field of srl (e.g., liu et al., 2016; zhou, 2016). these phases are: (1) the pre-processing phase, (2) the mining and characterization phase, and (3) the analysis phase. next, we provide an illustrative example highlighting (1) the complexity of sequence analysis, (2) the theoretical and methodological choices and considerations to be made, and (3) the reporting of the methods used, illustrating the need for agreed-upon frameworks to be able to conduct and report sequence analyses transparently.

1.3.1 data structure

as the raw computer log file data gathered functions as the input and absolute basis for sequence analysis (coronel & morris, 2016), we will start with the description of the data structure of sequence data before elaborating on the different phases of sequence analysis itself. the most common data format of raw computer-log-file data is the time-stamped event (tse) format (gabadinho, ritschard, mueller, & studer, 2011). a tse-dataset contains at least three columns: (1) the timestamp of the observable behavioural event, (2) a personal identifier of the learner, and (3) an event name (of the observable behavioural event). examples of event names are the names of each element in the online learning environment (e.g., discussion forum, content page, exercise, etc.), areas on the screen learners clicked, or clicking actions (e.g., caprotti, 2017; cicchinelli et al., 2018; maldonado-mahauad et al., 2018).
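
the three-column tse structure described above can be sketched in a few lines of python. this is only an illustration under stated assumptions: the comma-separated layout, the iso-formatted timestamps, the learner identifiers and the event names are all hypothetical and do not come from any of the cited studies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimeStampedEvent:
    """One row of a TSE dataset (field names are illustrative assumptions)."""
    timestamp: datetime  # (1) when the observable behavioural event occurred
    learner_id: str      # (2) personal identifier of the learner
    event_name: str      # (3) name of the observable behavioural event


def parse_tse_row(row: str) -> TimeStampedEvent:
    """Parse one 'timestamp,learner_id,event_name' line (assumed layout)."""
    timestamp, learner_id, event_name = row.strip().split(",", 2)
    return TimeStampedEvent(datetime.fromisoformat(timestamp), learner_id, event_name)


# Hypothetical raw log lines, invented purely for illustration.
raw_log = [
    "2018-03-01T09:00:05,learner_01,content page",
    "2018-03-01T09:02:40,learner_01,calendar",
    "2018-03-01T09:01:10,learner_02,discussion forum",
]
events = [parse_tse_row(row) for row in raw_log]
```

real log files usually carry more columns (session identifiers, urls, and so on); the point here is only that timestamp, learner and event name are the minimum needed for the phases that follow.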

1.3.2 pre-processing

in a first phase of the sequence-analysis method the raw data is pre-processed (e.g., zhou, 2016). this pre-processing phase generally consists of two steps. the first step relates to the question: “is there a need for recoding the raw data?” if there is a need (i.e., deductive approaches) to recode the raw data, this happens using an action library. such a library specifies the links between the observed events and the concepts to which they are recoded. examples of these practices include action libraries based on the strict recoding of events or clusters of events using srl theories (e.g., winne et al., 2017). such action libraries can be based either on think-aloud coding schemes or on purely theoretical conceptualizations (e.g., bannert et al., 2015; taub et al., 2017). another illustration is the coding of computer log files using a tool-related coding scheme (e.g., lust, 2012; lust, vandewaetere, ceulemans, elen, & clarebout, 2011; siadaty, gašević, & hatala, 2016). if there is no need (i.e., inductive approaches) to recode the data, the raw data (as is) can be used (e.g., kurki, järvenoja, järvelä, & mykkänen, 2017; van laer & elen, 2016). the second step in the pre-processing phase involves assigning an ordered list of events to each learner, resulting in a single sequence per learner (gabadinho et al., 2011). while the chronological ordering of the observed events suffices for the investigation of the sequential nature of such a sequence, the investigation of the temporal characteristics of a sequence will require the calculation and addition of the distance (time) between consecutive events to the sequence. based on the compilation of a single sequence per learner, sub-sequences and models can be mined and characterized.

1.3.3 mining and characterization

after the raw data is pre-processed to a single sequence per learner, research questions related to the characteristics of sequences can be investigated.
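
before turning to mining, the two pre-processing steps can be sketched as follows. this is a minimal python sketch, not the procedure of any cited study: the action library and its labels are hypothetical, timestamps are assumed to be plain seconds, and unmapped events are simply kept under their raw name.

```python
from collections import defaultdict

# Hypothetical action library: raw event name -> recoded, theory-based label.
ACTION_LIBRARY = {
    "calendar": "planning",
    "content page": "reading",
    "exercise": "practising",
}


def build_sequences(tse_rows, action_library=None, with_gaps=False):
    """Turn (timestamp_seconds, learner_id, event_name) rows into one
    chronologically ordered sequence per learner.

    action_library=None mimics an inductive approach (raw events are kept);
    passing a mapping mimics a deductive approach (events are recoded).
    with_gaps=True also stores the time elapsed since the previous event.
    """
    per_learner = defaultdict(list)
    for timestamp, learner_id, event_name in tse_rows:
        if action_library is not None:
            # Step 1 (optional): recode; unmapped events keep their raw name.
            event_name = action_library.get(event_name, event_name)
        per_learner[learner_id].append((timestamp, event_name))

    sequences = {}
    for learner_id, items in per_learner.items():
        # Step 2: order the events chronologically for each learner.
        items.sort(key=lambda item: item[0])
        if with_gaps:
            sequences[learner_id] = [
                (name, (ts - items[i - 1][0]) if i > 0 else 0)
                for i, (ts, name) in enumerate(items)
            ]
        else:
            sequences[learner_id] = [name for _, name in items]
    return sequences


rows = [(30, "l1", "calendar"), (10, "l1", "content page"), (20, "l2", "exercise")]
raw_sequences = build_sequences(rows)
recoded_sequences = build_sequences(rows, ACTION_LIBRARY)
timed_sequences = build_sequences(rows, with_gaps=True)
```

keeping recoding optional makes the choice between an inductive and a deductive approach explicit in the call itself, which is exactly the kind of decision that should be reported.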
research questions are plentiful and pertain to the investigation of either whole sequences (β) or sub-sequences (α). a sub-sequence (α) is part of a sequence (β) if the sub-sequence (α) can either directly (α = < b, d, e, g >) or indirectly (α = < c, e, g, e >) be formed from the sequence (β = < a, c, b, d, e, g, c, e, d, b, g >) (zhou et al., 2010). the most common approach to mining and characterizing sequences and sub-sequences is called the algorithmic approach (e.g., kinnebrew, loretz, & biswas, 2013; perez et al., 2017; poole, lambert, murase, asencio, & mcdonald, 2016). this approach assumes the relation between the different events is unknown and therefore attempts to create meaning from the events that have already occurred by investigating the statistical relationships among them (breiman, 2001). efficient algorithms for discovering these characteristics have been proposed in the statistical literature. the prominent algorithms are those of bettini, wang, and jajodia (1996), srikant and agrawal (1996), mannila, toivonen, and verkamo (1997), zaki (2001) and masseglia, teisseire, and poncelet (2002). all the algorithms require parameter settings. examples of these parameter settings are (1) time constraints of the occurrence of an event or sub-sequence, (2) a method for counting the occurrences of events and sub-sequences, and (3) a threshold for the identification of frequently occurring events and sub-sequences. once the parameters are defined, off-the-shelf software tools make it possible to apply algorithmic approaches and to identify typical sequences (models), frequent events, and frequent sub-sequences. common platforms for performing this identification include prom (process mining workbench), developed by van der aalst (2016), spam (sequential pattern mining) by ayres, flannick, gehrke, and yiu (2002), or the traminer (trace mining in r) package developed for r-statistics by gabadinho et al. (2011).
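
to make the notions of direct and indirect sub-sequences and of a frequency threshold concrete, the following python sketch may help. it is a deliberately naive illustration and not one of the cited algorithms or tools: the function names are hypothetical, only adjacent (gap-free) sub-sequences of a fixed length are mined, and each sub-sequence is counted at most once per learner.

```python
from collections import Counter


def is_indirect_subsequence(alpha, beta):
    """True if alpha can be formed from beta in order, with gaps allowed."""
    remaining = iter(beta)
    # 'element in remaining' consumes the iterator up to the first match,
    # so later elements of alpha can only match later positions of beta.
    return all(element in remaining for element in alpha)


def is_direct_subsequence(alpha, beta):
    """True if alpha occurs in beta as a run of adjacent elements."""
    alpha, beta = list(alpha), list(beta)
    n = len(alpha)
    return any(beta[i:i + n] == alpha for i in range(len(beta) - n + 1))


def frequent_subsequences(sequences, length, min_support):
    """Adjacent sub-sequences of a given length shown by at least a fraction
    min_support of the learners."""
    counts = Counter()
    for seq in sequences.values():
        # Counting choice (an assumption): once per learner, not per occurrence.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        counts.update(seen)
    threshold = min_support * len(sequences)
    return {sub: count for sub, count in counts.items() if count >= threshold}


beta = list("acbdegcedbg")
toy_sequences = {"l1": list("abcab"), "l2": list("abd"), "l3": list("cab"), "l4": list("ddd")}
frequent = frequent_subsequences(toy_sequences, length=2, min_support=0.5)
```

the three parameter types listed above map onto the sketch directly: adjacency is the time constraint, once-per-learner is the counting method, and `min_support` is the threshold. changing any of them changes which sub-sequences count as frequent.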
an extensive overview of algorithmic tools can be found in slater, joksimović, kovanovic, baker, and gasevic (2017). besides the algorithmic approach described above, there are also other approaches (for an extensive overview see: poole et al. (2016)). examples are theory-driven approaches (e.g., cleary, 2011), which hypothesize the characteristics of sequences and sub-sequences, and stochastic approaches focussing on whole-sequence modelling (e.g., biswas, jeong, kinnebrew, sulcer, & roscoe, 2010; jeong, biswas, johnson, & howard, 2010). once sequences and sub-sequences are mined and characterized, they can be used as variables in statistical analyses.

1.3.4 analysis

when investigating sequences in the light of srl, we may be interested to know how sequences or sub-sequences are impacted by variables internal or external to the learners (e.g., winne & baker, 2013). for example, when providing an instructional intervention to learners, we might want to see not only the change in learners’ learning outcomes but also the change in the occurrence of particular sequences or sub-sequences. another example might be that we want to compare sequences or sub-sequences of learners with low or high motivation (e.g., duffy & azevedo, 2015; jovanović, gašević, dawson, pardo, & mirriahi, 2017). in other words, we may want to explore which sequences or sub-sequences discriminate most when different groups’ sub-sequences or averaged sequences are compared. to answer such questions, various approaches have been proposed for the incorporation of sub-sequences and sequences as dependent variables. the approach of studer, mueller, ritschard, and gabadinho (2010) consists of measuring the strength of association of each sequence or sub-sequence with the considered covariate and selecting the sequences or sub-sequences with the strongest association. the association is measured with the pearson chi-square test of independence. the most discriminant one is the one with the highest chi-square.
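
the idea of ranking sub-sequences by their chi-square association with a grouping variable can be sketched in plain python. this is an illustration of the general principle only, not the implementation of studer et al. (2010): the data structures and names are assumptions, the presence or absence of a sub-sequence per learner is taken as given, and no p-values or corrections for multiple testing are computed.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells whose expected count is zero
                statistic += (observed - expected) ** 2 / expected
    return statistic


def rank_by_discrimination(presence, groups):
    """Rank sub-sequences by the chi-square between showing the sub-sequence
    (yes/no) and group membership.

    presence: {sub_sequence: {learner_id: bool}}  (hypothetical structure)
    groups:   {learner_id: group_label}
    """
    labels = sorted(set(groups.values()))
    scores = {}
    for sub, shown in presence.items():
        table = [
            [
                sum(1 for learner, group in groups.items()
                    if group == label and shown.get(learner, False) == flag)
                for label in labels
            ]
            for flag in (True, False)
        ]
        scores[sub] = chi_square(table)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


groups = {"l1": "cue", "l2": "cue", "l3": "control", "l4": "control"}
presence = {
    "A": {"l1": True, "l2": True, "l3": False, "l4": False},
    "B": {"l1": True, "l2": False, "l3": True, "l4": False},
}
ranking = rank_by_discrimination(presence, groups)
```

in this toy example sub-sequence "A" is shown only by learners in one group and therefore ranks first, while "B" is spread evenly and scores zero. with samples this small a chi-square statistic is of course not interpretable; the example only shows the mechanics.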
another approach, proposed by kinnebrew et al. (2013), relies on multiple comparisons by t-test statistics between groups based on the considered covariate. the t-test is not used to prove that the groups of sequences differ. instead, it is employed as a heuristic for identifying more interesting sub-sequences in an exploratory analysis. this is done, for example, by determining with 95% confidence that the occurrence of a frequent sub-sequence differs between the groups. besides these common methods, multi-level modelling (e.g., taub et al., 2016), regression analyses (e.g., segedy, kinnebrew, & biswas, 2015), or spearman correlation analysis (e.g., kizilcec, pérez-sanagustín, & maldonado, 2017) are also used. besides the investigation of variables influencing sequences, we can also investigate the influence of sequences on another variable. in the field of srl an example could be the impact of the occurrence of a specific sequence on group performance (e.g., molenaar & chiu, 2015). such research questions investigate the dissimilarities between different sequences (e.g., abbott & tsay, 2000; aisenbrey & fasang, 2010). these dissimilarities are commonly measured using the optimal matching edit distance. the optimal matching edit distance is defined as the minimal cost of transforming one sequence into the other (e.g., biemann & wolf, 2009; mazon, rossi, & toledo, 2014). the transformation operations considered by the optimal matching edit distance are (1) the insertion or deletion of an event and (2) a change in the temporal distance between events, each of which contributes to transforming one sequence or sub-sequence into another. event-dependent costs can be specified both for the insertion or deletion of an event and for a one-unit change in the temporal distance of given events. together, the insertion / deletion and temporal distance costs result in a distance matrix between the sequences themselves.
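
the principle of an edit distance as the minimal cost of transforming one sequence into another can be sketched with a short dynamic-programming routine. note the simplification: the python sketch below uses the classic pair of operations, insertion / deletion and substitution, with constant costs, and does not model the temporal-distance cost described above. the cost values and function names are assumptions chosen for illustration.

```python
def optimal_matching_distance(seq_a, seq_b, indel_cost=1.0, substitution_cost=2.0):
    """Minimal total cost of turning seq_a into seq_b using insertions,
    deletions and substitutions (simplified; no temporal-distance cost)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost  # delete the first i events of seq_a
    for j in range(1, cols):
        dist[0][j] = j * indel_cost  # insert the first j events of seq_b
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                        # deletion
                dist[i][j - 1] + indel_cost,                        # insertion
                dist[i - 1][j - 1] + (0.0 if same else substitution_cost),
            )
    return dist[-1][-1]


def distance_matrix(sequences, **costs):
    """Pairwise distances between all sequences, in sorted order of their ids."""
    ids = sorted(sequences)
    matrix = [
        [optimal_matching_distance(sequences[a], sequences[b], **costs) for b in ids]
        for a in ids
    ]
    return ids, matrix


learner_ids, matrix = distance_matrix({"l1": list("abc"), "l2": list("abd"), "l3": list("ab")})
```

the resulting distances depend entirely on the chosen costs, which is one reason why cost settings need to be reported alongside the results.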
this matrix can then be used in classification methods as well as in scaling methods to investigate the relation between various sequences (e.g., maldonado-mahauad et al., 2018; segedy et al., 2015).

1.4 an illustrative example of sequence analysis

earlier we provided a condensed overview of different choices to be made at each phase of the sequence analysis process. to further illustrate the complexity of sequence analysis, the choices to be made, and the reporting of the methods used, we provide an example of a study applying sequence analysis. in the study presented, we investigated the impact of reflection cues on learners’ srl. an event approach to srl was chosen, focussing on srl’s cyclical, influenceable, and covert nature. srl was operationalized through learners’ learning behaviour and learners’ learning outcomes. two research questions were addressed: the first one investigated the impact of reflection cues on (a) learners’ learning behaviour and (b) learners’ learning outcomes. the second research question investigated how learners’ learning outcomes related to learners’ learning behaviour. to answer these questions, a 2x2 mixed factorial design was applied and data was gathered from 60 learners in second-chance adult education. half of the group was exposed to additional cues for reflection; the learners in the control group were not. learners’ behavioural data consisted of computer-log-file data gathered through an online learning environment in an ecologically valid setting. learners’ learning outcomes were assessed through cognitive (domain knowledge), motivational (goal orientation), and metacognitive (learning effort and learning confidence) tests and questionnaires. the computer-log-file data gathered had the tse-format. the event names were actions learners could perform in the online learning environment (i.e., post in the discussion forum; submit assignment; etc.).
as a unit of analysis we used the entire eight-week course, and instructional stability throughout the eight weeks was described using the instrument of van laer and elen (2018). no validated operationalizations of sequence analysis based on the conceptualizations of the cyclical, influenceable, or covert nature of srl could be retrieved to direct the operationalization of our investigation. to deal with this lack of operationalizations, we decided to follow an approach staying as close to the observed data as possible. an inductive rather than a deductive approach was followed to avoid non-transparent alignment between conceptualization and operationalization. in line with this approach, we limited the assumptions made by (1) taking into account only observed overt events, (2) focussing only on the sequential aspects of computer-log-file data, and (3) not conceptualizing the evolution of srl over the course, which resulted in a focus on directly observable patterns via frequent sub-sequences rather than on the extraction of behavioural models. the pre-processing of the data resulted in one (eight-week-long, approximately 10000 events) sequence of ordered raw behavioural events per learner. no recoding was applied, nor was time between events calculated. in the mining and characterization phase, the traminer algorithm (gabadinho et al., 2011) was used in r-statistics to investigate learners’ sequences through the investigation of directly observable patterns via frequent sub-sequences. the identification of frequent sub-sequences was based on (1) the time constraints of the occurrence of events in the observed sub-sequences, (2) a counting method for counting the occurrences of sub-sequences, and (3) a threshold for the identification of frequently occurring sub-sequences.
as only directly observable sub-sequences were targeted, the parameter for the distance between events was set to one, meaning that only events observed directly before or after a certain event could be seen as part of a sub-sequence. the counting method was selected arbitrarily and was based on the occurrence of sub-sequences over the different learners. the frequency threshold was set to 25%, meaning that at least 25% of the learners should exhibit a sub-sequence for it to be counted as frequently occurring. in total, 688 frequent sub-sequences were observed. next, we investigated the relationship of the frequent sub-sequences with (1) the condition learners were in (impact of cues on behaviour) and (2) learners’ learning outcomes (relations between outcomes and behaviour). in the analysis phase, the frequent sub-sequences were used as dependent variables. for this analysis, chi-square tests were used, relating the frequent sub-sequences that discriminate between groups to the variables that define the groups (condition and learners’ learning outcomes). based on these tests, effect sizes were calculated using cramer’s v. cramer’s v expresses the strength of the relation between a certain discriminating frequent sub-sequence and the learners’ characteristics and takes a value between zero and one. the closer to one, the stronger the relation. cohen (1988) refers to small (≤.30), medium (≥.30 and ≤.50), and large (≥.50) effect sizes. with regard to the first research question, on the impact of reflection cues on (a) learners’ learning behaviour and (b) learners’ learning outcomes, learners in the experimental condition were shown to make significantly more use of sub-sequences consisting of events related to assignments and tasks, communication, and assessment. furthermore, both conditions showed a significant increase in domain knowledge and learning confidence and a decrease in performance goal approach.
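the effect size mentioned above can be made concrete with a short sketch. for a contingency table with r rows, c columns, and n observations, cramer’s v is the square root of chi-square divided by n times min(r - 1, c - 1). the python function below computes it from raw counts without a continuity correction. the counts in the example are invented for illustration (30 learners per condition, cross-tabulated by whether they exhibit one frequent sub-sequence) and are not results from the study; the function name is hypothetical.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    smallest_dimension = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square / (n * smallest_dimension))


# Invented counts: rows = condition (experimental, control),
# columns = sub-sequence exhibited (yes, no).
example_table = [[20, 10], [10, 20]]
effect_size = cramers_v(example_table)
print(round(effect_size, 3))
```

with these invented counts the value is one third, which would fall in the medium band of the thresholds cited above.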
learners in the experimental condition, who received cues for reflection, scored significantly higher on performance goal approach compared to the learners in the control condition. as for the interaction effect between time and condition, learners in the experimental condition scored significantly higher for performance avoidance approach compared to their counterparts. this result was unexpected in the light of the aim of the study (van laer et al., 2018). finally, with regard to the second research question, on how learners’ learning outcomes related to learners’ learning behaviour, it became clear that changes in learning behaviour seemed to be linked to learning outcomes (performance avoidance approach). results showed that differences in learners’ learning behaviour were observed when learners had different performance avoidance approach scores.

1.5 towards tangible proof of progress

as research aims at either building or testing theory, the research cycle moves from description, to explanation, to testing, with repeated iterations through this cycle (van der merwe, 2013). throughout this iterative process, descriptive models are expanded into explanatory frameworks that are tested against reality until they are eventually developed into theories as research study builds upon research study. the result is to validate and add confidence to previous findings, or else invalidate them and force researchers to develop more valid or more complete theories (meredith, 1993). in this way both (1) the theoretical conceptualizations of the theory under investigation and (2) their operationalization through measurements are continuously updated and refined. as illustrated throughout the paragraphs above, different operationalizations of sequence analysis can be made. the illustrative example has shown one of these operationalizations.
to be able to monitor methodologies’ evolution towards tangible proof of progress and so secure the iterative research cycle, the literature on advances in research methodology (e.g., beach & pedersen, 2013; lupia & alter, 2014) proposes three indications of such an evolution. the first one is the transparency of the method applied (moravcsik, 2014). transparently reported methods permit scholars to assess research and to communicate with one another. unless other scholars can examine evidence, parse the analysis, and understand the processes by which evidence and theories were chosen, why should they trust and thus expend the time and effort to scrutinize, critique, debate, or extend existing research? as demonstrated earlier, a lot of explorative work on the use of sequence analysis has been done, yet most of the studies do not seem to report in detail on the different phases of sequence analysis or on the parameter settings involved in each of them, hampering a thorough study of the method applied. when literature on the investigation of srl through sequence analysis reports on the log file data structure (e.g., biswas, roscoe, jeong, & sulcer, 2009; duffy & azevedo, 2015; lazakidou & retalis, 2010), it often does this through elaborating on the events traced: clicks, pages, specific cognitive or metacognitive activities, and so on. despite the information on the events traced, additional information on the structure of the data, such as the timestamp interval or type of timestamps, session identifiers, etc., is hardly provided. this information is important to distinguish which pre-processing steps are possible or desirable (e.g., calculation of time between events, grouping of learners or individuals, etc.). 
with regard to this pre-processing phase, in the best cases researchers acknowledge that they developed a set of filters or recoding algorithms to remove irrelevant information from the raw log files, with the aim of presenting the relevant information in a compact format that is suitable for further analysis (e.g., jeske, backhaus, & stamov roßnagel, 2014; paans, molenaar, segers, & verhoeven, 2018). nonetheless, they hardly ever elaborate on which information was discarded and what made the researcher assume this information could be classified as irrelevant. when, for example, action libraries are used (e.g., bannert et al., 2014; goldberg et al., 2014), researchers elaborate on the coding scheme but fail to state how the ‘raw’ events were recoded and how reliable this recoding was. without this information it is impossible to distinguish which coding scheme is most reliable and works best for which data. in line with this, no studies seem to be available which argue for the selection of a certain coding scheme or elaborate on why a certain coding scheme is preferable over another. with regard to the mining and characterization of sequences, most of the current literature seems to indicate which algorithms are used to identify or mine (frequent) sub-sequences. nonetheless, the authors rarely seem to address the assumptions underlying the algorithm used (e.g., balderas, dodero, palomo-duarte, & ruiz-rube, 2015; lan & lu, 2017) or the procedure followed to select the appropriate algorithm (e.g., kizilcec et al., 2017; maldonado-mahauad et al., 2018), let alone the parameter settings applied when mining for sub-sequences. the same is the case when the (frequent) sub-sequences are used as dependent or independent variables. the analysis methods used to answer similar research questions vary from researcher to researcher (ahmadpour & khaasteh, 2017; cerezo, esteban, sánchez-santillán, & núñez, 2017; chen, breslow, & deboer, 2018).
traditional cluster analysis and predictive apriori algorithms are used to identify sets of successful learner and environmental characteristics impacting performance, without any explanation of why a certain approach might be considered superior to another. the multitude of methods and the observation of the lack of transparency make the studies unreplicable. the second characteristic relates to the availability of comparative research designs (e.g., bureau & salomonsen, 2012; peterson, 2005). comparison is one of the most powerful tools used in intellectual inquiry, since an observation made repeatedly is given more credence than a single observation. put simply, as argued by mills, van de bunt, and de bruijn (2006) the main goal of comparative research is to search for or identify variance or similarity. although there is quite some comparative research on the measurement of srl, the majority of it focusses at best on the comparison of online behaviour event measurements (i.e., sequence analysis) with offline perception event measurements (i.e., self-reports) (e.g., cho & yoo, 2017; hadwin, nesbit, jamieson-noel, code, & winne, 2007). even at the most basic level of comparison, namely the use of different coding schemes to recode the ‘raw’ event captured in log files, there seems to be hardly any evidence on which coding scheme results in the most accurate results under which conditions (azevedo, 2014). although there are useful summaries of approaches and tools (e.g., slater et al., 2017) as well as ample ideas on how to apply sequence analysis (e.g., azevedo et al., 2010; winne, 2018; winne et al., 2017), no discussion of different sequence analysis methods could be found in the field of srl. based on this observation, research comparing different sequence analysis approaches for srl seems to be missing. this would lead to the identification of commonalities and differences between methods adding to the validation of the method. 
the third and final characteristic is the application of a method in empirical, ecologically valid settings (e.g., chambless & ollendick, 2001; rotter, 1954). only when methods can be applied in different contexts and situations can they propel and, more importantly, validate investigations. although there have been attempts in the field of self-regulated learning to operationalize insights drawn from experimental settings in ecologically valid empirical contexts, these attempts are mainly based on a mixture of insights obtained from the experimental setting, accompanied by a data-driven approach to overcome the gaps left by the experimental approach (e.g., parameter settings, coding events, identification of sub-sequences) (e.g., hsu, 2018; ifenthaler, gibson, & dobozy, 2018; taub, azevedo, bradbury, millar, & lester, 2018). no clear attempts to apply and transfer insights between settings seem to have been made so far. in summary, it becomes clear that a lot of work still needs to be done. it seems that none of the three indications for tangible proof of progress has yet been achieved for the use of sequence analysis in the field of srl. in line with this finding, azevedo (2014) already pointed out that researchers investigating sequence data recoded the data differently, made diverse statistical and theoretical assumptions regarding the data collected, and too easily drew inferences from sequentially and temporally unfolding data. his call for action was in vain and has since been repeated multiple times (e.g., molenaar & järvelä, 2014; winne et al., 2017), supported by the expression of the need for standards and frameworks to align the investigation of sequence data in the field of srl (e.g., bannert et al., 2015). partly, the aim of this manuscript is to add to the body of literature calling for standards, protocols, and frameworks that can be tested and validated.
by providing general guidelines contributing to a framework for the use and reporting of sequence analysis for srl, this manuscript aims to propel the establishment of sequence analysis as a research method.

1.6 problem statement

as illustrated in the current section, sequence analysis in the field of srl is an umbrella term covering a large variety of approaches. with regard to the log file data format and the pre-processing phase, it often seems unclear how researchers devise and deploy the data scrubbing, cleansing, recoding, or cleaning processes (e.g., clarke, 2016; müller, naumann, & freytag, 2003; rahm & do, 2000). with regard to the mining and characterization phase and to the analysis phase, hardly any explicit references seem to be made to the basis of specific decisions. current research makes it hard to distinguish which parameter settings were derived from literature and which ones were set arbitrarily, or why this is the case. no arguments are given on why a certain approach is preferred over another one (e.g., poole et al., 2016; stark & vedres, 2012). the multitude of approaches and considerations does not yet seem to be condensed into a transparent methodological framework for sequence analysis, nor do these approaches seem to contribute yet to tangible proof of progress in the investigation of learners’ srl using sequence analysis based on computer log files. without such a framework it is impossible to test, falsify, or modify approaches to sequence analysis and so spark the investigation of learning through sequence analysis. in the next section we propose a set of guidelines functioning as a potential starting point for the construction of a transparent methodological framework to communicate approaches of sequence analysis in the field of srl.

2. guidelines on the use of sequence analysis in the field of self-regulated learning

as illustrated throughout the introductory section of this manuscript, the operationalization of sequence analysis to investigate learners’ srl using computer log files is shaped through many practical choices and theoretical assumptions. whereas in the previous sections we highlighted the need for a methodological framework, in what follows we propose a set of guidelines functioning as a potential starting point for the construction of such a framework. we offer guidelines in two main areas. the first one relates to the alignment of the conceptualization and the operationalization of the different components of srl. the second one relates to the enactment of the operationalization of the selected sequence analysis approach.

2.1 alignment of conceptualization and operationalization

singleton, straits, and straits (1999) see alignment of conceptualization and operationalization as one of the key features of scientific and methodological success. they refer to the process of conceptualization as the act of defining the different components of a phenomenon under investigation and to operationalization as the practical result of the conceptualization act. in line with this definition, we explore the operational impact of the current conceptualization of srl. as presented in the introduction, current conceptualizations of srl focus on its cyclical, influenceable, and covert nature (e.g., winne & hadwin, 1998). in what follows we relate these three general conceptions to practical consequences in the operationalization of the selected sequence analysis approach. even when these three conceptions are not at the basis of the conceptualization of srl, the conceptions below might shed light on the relation between the conceptualization of the phenomenon under investigation and the selected operationalization of sequence analysis.
2.1.1 the cyclical nature of self-regulated learning

the idea of srl unfolding in different cyclical phases raises questions concerning (1) the dynamics of this cyclical srl process, (2) the sequential patterns within it, as well as (3) the development of the cycle over time. each of these questions brings the notions of sequentiality and temporality to the discourse on srl (e.g., molenaar & järvelä, 2014). while in the literature on srl the ‘temporal’ and ‘sequential’ notions are often used interchangeably (knight, wise, & chen, 2017), literature on sequence analysis from other fields of research makes a clear distinction. temporality refers to the passage of elapsed time and comes with a collection of related concepts such as duration, rate, and acceleration (blikstein, 2011; haythornthwaite & gruzd, 2012). sequentiality is used to refer to the order of events and transitions between different events, without explicit reference needed to duration or passages of time (biswas et al., 2010; halatchliyski, hecking, goehnert, & hoppe, 2014). as a first consideration, when for example incorporating only sequential characteristics of srl, the construction of a single sequence per learner in the pre-processing phase will consist of the chronological ordering of events, assuming the time between each event’s timestamp to be of secondary interest. when in contrast temporal characteristics of srl are also taken into account, the construction of learners’ single sequences will include the calculation of the time between the consecutive events’ timestamps and the inclusion of these calculations in further analyses. the latter poses additional conceptual questions about the status of this calculation. does it, for example, represent a single hidden unknown state or is it instead an indication of involvement with the environment? a second consideration relates to the developmental characteristic of the behaviour observed.
when srl-development over time is assumed (e.g., andrade & evans, 2015; huang, klein, & beck, 2017), a whole-sequence approach might be preferred over a sub-sequence approach that does not consider such a development (in the time frame of investigation) (winne & hadwin, 1998). both will affect further analyses. as demonstrated above while briefly reviewing the conceptualization of the cyclical nature of srl, a clear link between (1) the conceptualization of the sequential and temporal characteristics of srl and its practical operationalization and (2) the developmental characteristics of srl over time and its operationalization in practice seems necessary to be able to study the sequence analyses applied.

2.1.2 the influenceable nature of self-regulated learning

with regard to how srl comes to be, recent event theories regard srl as influenced by variables internal and external to the learner (veenman, van hout-wolters, & afflerbach, 2006). in general, research identifies three major sets of internal variables influencing srl: cognitive (e.g., zimmerman, 1986, 1990, 1998; zimmerman & pons, 1986), metacognitive (borkowski, carr, rellinger, & pressley, 1990; pressley, levin, & mcdaniel, 1987), and motivational variables (e.g., butler & winne, 1995; schraw, crippen, & hartley, 2006; schraw & moshman, 1995; zimmerman, 2000). a substantial body of literature identifies external variables at different grain-size levels influencing learners’ srl. dignath and büttner (2008), for example, point out in their meta-analysis that (1) instruction of cognitive strategies (i.e., rehearsal, elaboration, and organizational strategies) affected learners’ srl significantly. the same was observed for (2) instruction of metacognitive strategies (i.e., planning, monitoring, and evaluation), (3) promoting metacognitive reflection, and (4) instruction of motivation strategies.
another example is the literature review of van laer and elen (2016) identifying seven attributes of learning environments that support learners’ srl. the combinations of the abovementioned internal and external variables make up the timeframe in which srl needs to be investigated. under this conceptualization, each change in variables internal and/or external to the learner will influence learners’ srl (e.g., greene & azevedo, 2007; winne & hadwin, 1998). thus, without the appropriate timeframe in which to investigate learners’ srl, insights might be hard to gather. so, the main consideration with regard to the influenceable nature of srl relates to the unit of analysis. the size of the allowed timeframe affects the operationalization of sequence analysis, for example through the parameter settings while mining and characterizing sequences and sub-sequences. when, for example, both internal and external variables are regarded as stable throughout the investigative trial, the timeframe might stretch over the entire trial. if in contrast the variables are assumed to vary at a certain rate, the timeframe might need to match this rate as closely as possible. as illustrated before, the characterization of sequences and sub-sequences relies amongst others on specifications with regard to the time constraints on the occurrence of an event or sub-sequence. the time dimension raises questions concerning the maximal distance between events (molenaar & järvelä, 2014). moreover, we may consider two or more events to form a relevant sequence or sub-sequence only if they occur within a given distance of each other (maximal timespan), for example when they occur in a time window in which variables both internal and external to the learner are assumed to be constant.
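the maximal-distance constraint can be sketched in a few lines of python. the function below lists the event pairs a given setting would allow to form a sub-sequence, with distance counted in events rather than in elapsed time; a time-based window would compare timestamps instead. the function name, the parameter `max_distance`, and the three event names are assumptions made for illustration.

```python
def pairs_within_distance(events, max_distance=1):
    """List (earlier, later) event pairs whose positions differ by at most max_distance."""
    pairs = []
    for position, earlier in enumerate(events):
        # Only look ahead as far as the maximal distance allows.
        for later in events[position + 1:position + 1 + max_distance]:
            pairs.append((earlier, later))
    return pairs


# Hypothetical sequence of one learner.
events = ["view_content", "post_forum", "complete_exercise"]
adjacent_only = pairs_within_distance(events, max_distance=1)
wider_window = pairs_within_distance(events, max_distance=2)
print(adjacent_only)
print(wider_window)
```

with a distance of one, viewing content and completing the exercise are never paired; with a distance of two they are, so the same log yields different sub-sequences depending on this single setting.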
for example completing an exercise right after viewing a content related page might not mean the same as completing that same exercise twenty events after viewing the content related page (e.g., du et al., 2016; jovanovic, pardo, mirriahi, dawson, & gašević, 2017). to conclude, it seems that to be able to study the sequence analyses applied, the operationalization of the unit of analysis (e.g., through parameter settings) as conceptualized through the influenceable nature of srl needs to be elaborated. elaborating on the relation between unit of analysis and parameter settings allows us to assess the suitability of the decisions made. 2.1.3 covert nature of self-regulated learning current conceptualizations assume srl operates at different levels of the cognitive system and so regulates lower order cognitive processes that, in turn, shape learners’ overt cognitive behaviour (roth, ogrin, & schmitz, 2016). this conceptualization results in the assumption that srl occurring in each of the srl phases cannot be directly observed as it manifests in overt cognitive behaviours (williamson, 2015) and through behavioural consequences like learners’ learning outcomes (veenman & alexander, 2011). for instance, when a learner recalculates the outcome of a mathematical equation, it is assumed that a srl monitoring or evaluation process must have preceded this overt cognitive activity of recalculation. as illustrated in the introductory section, conceptualization of the covert nature of srl can be based on the relation between the overt behavioural events with srl-influencing constructs (e.g., tool-use, engagement, etc.) or directly with srl-related activities (e.g., goal-setting and planning, monitoring, etc.) (e.g., azevedo et al., 2015; bannert et al., 2015; lust, 2012). 
depending on the conceptualization of (a) the covert nature of srl and (b) how this nature can be uncovered through the overt behavioural events observed through computer log files, a link will be constituted with the operationalization of this covert nature through the establishment of action libraries (zhou, 2016). when an action library is used, overt behavioural events (or sets of events) are recoded into more meaningful learner behaviour or srl behaviour. although this practice is common when following a deductive research approach, action libraries often have a different level of granularity as they are developed through other online event measurements (i.e., think aloud trials) (azevedo, 2014). comparing the micro-level approach followed through using computer log files with the codes abstracted from think aloud trials might be problematic given the different grain-size (e.g., al mamun, lawrie, & wright, 2017). in summary, when operationalizing the covert nature of srl via recoding data using different grain-sizes (i.e., computer log files and think aloud data), the relationship between observed behaviour and assigned codes needs to be made explicit and communicated transparently.

2.2 enactment of the operationalization of sequence analysis

proctor, powell, and mcmillen (2013) see enactment as the step following the conceptualization and operationalization of a research method. they identify enactment as the systematic application of the operationalized concepts. as illustrated before, the quest for tangible proof of progress lies in transparently reported methods and procedures permitting scholars to assess research and to communicate with one another (e.g., beach & pedersen, 2013; lupia & alter, 2014). with regard to the enactment of the operationalization of the selected sequence approach, we focus on two components: a systematic account of the operationalization and transparent parameter settings.
2.2.1 systematic account of the operationalization

the operationalization of sequence analysis starts from gathered data with a specific structure and unfolds roughly in three phases. in the first phase, the pre-processing phase, a single sequence per learner is constructed. in the second phase, sequences and frequent sub-sequences are mined and characterized. in the third phase, the analysis phase, the identified sequences and sub-sequences can function as either dependent or independent variables. although general approaches such as the one described above have been usefully proposed by, for example, zhou (2016) and liu et al. (2016), current research rarely goes beyond the level of generality of this approach. as demonstrated, a multitude of decisions need to be made in the chain of sequence analysis (roll & winne, 2015). keeping systematicity and transparency in mind, detailed accounts of each of these phases would increase replicability (e.g., moravcsik, 2014). systematic accounts of the enactment of the operationalization of sequence analysis might start with a description of the raw data gathered. this means reporting not only on the environment in which the data was gathered, but also on the actual structure of the dataset extracted, including for example the database structure. in the pre-processing phase, systematicity and transparency might be accomplished through elaborating (among others) on the data cleaning process, the recoding procedure (if applicable), and the transformations applied (if applicable). with regard to the mining and characterization of the sequences and sub-sequences, this might be done through a clear account of the different steps taken in the mining and characterization process, including for example a detailed explanation of the algorithm used and the parameters set. finally, in the analysis phase, transparency and systematicity might be accomplished by the presentation of the output format of the previous phase and the analysis approach chosen (with its key figures).
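one way to make such a systematic account concrete is to record it as structured data next to the analysis. the python sketch below is a hypothetical reporting record, not an established standard: the class names, field names, and the `basis` labels are assumptions, and the values are taken from the illustrative example in section 1.4. each mined parameter carries a note on whether it was a design choice or set arbitrarily, so that readers can tell the two apart.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterSetting:
    name: str
    value: object
    basis: str       # hypothetical labels: "theory", "design", or "arbitrary"
    rationale: str


@dataclass
class SequenceAnalysisReport:
    raw_data: dict          # structure of the gathered data
    pre_processing: dict    # cleaning, recoding, transformations
    mining: list = field(default_factory=list)    # ParameterSetting entries
    analysis: dict = field(default_factory=dict)  # output format and analysis approach

    def settings_with_basis(self, basis):
        """Names of the mining parameters that were set on the given basis."""
        return [setting.name for setting in self.mining if setting.basis == basis]


report = SequenceAnalysisReport(
    raw_data={"format": "tse", "unit_of_analysis": "entire eight-week course"},
    pre_processing={"recoding": "none", "time_between_events": "not calculated"},
    mining=[
        ParameterSetting("distance_between_events", 1, "design",
                         "only directly observable sub-sequences were targeted"),
        ParameterSetting("counting_method", "occurrence over learners", "arbitrary",
                         "selected arbitrarily"),
        ParameterSetting("frequency_threshold", 0.25, "arbitrary",
                         "arbitrary cut-off"),
    ],
    analysis={"test": "chi-square", "effect_size": "cramer's v"},
)
arbitrary = report.settings_with_basis("arbitrary")
print(arbitrary)
```

a record of this kind could be published as supplementary material, which would let other researchers rerun the mining step with the same settings or vary one setting at a time.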
2.2.2 transparent parameter settings

it is clear that the conceptualization of a theory cannot account for each variation in operationalization nor for the justification of each parameter setting (bannert et al., 2017). nonetheless, decisions need to be made to derive useful approaches. regardless of the inductive or deductive approach to the conceptualization of srl, transparency and courage on the part of the researchers are essential to report in detail which parameter settings are derived from theory and which are arbitrary (e.g., tsai, shen, & tsai, 2011). the degree to which this is possible on either side is irrelevant to the argument, as long as it is clear which parameter settings are used for what reason or considering what assumption. an example of such a practice was presented in the illustrative case. no theoretical evidence could be found to determine the frequency threshold for identifying frequent sub-sequences. in this case, it was reported that an arbitrary cut-off was set at 25%.

3. implications and conclusions

although the use of sequence analysis in the field of self-regulation only dates from the last decade, a lot of valuable work has been done to propel the investigation of learners’ srl through sequence analysis. despite these efforts, no methodological framework seems to be available for the systematic application and reporting of sequence analysis in the field of srl. because of the lack of such a framework, tangible proof of progress is difficult to achieve and so the evolution of sequence analysis to investigate srl seems to be hampered. to illustrate the need for such a methodological framework, we provided in the introduction of this manuscript a brief overview of the variability of current operationalizations and illustrated one such approach.
from the pre-processing phase onwards, over the mining and characterization phase, up to the use of the identified sequences and sub-sequences as dependent and independent variables, a multitude of conceptual and operational choices need to be made. therefore, in this manuscript we aimed to foster discussion on a methodological framework for the application of sequence analysis in the field of srl which would make replication, falsification, and validation possible. to do so, in addition to the case built in the introduction section, we proposed in the previous section a set of guidelines functioning as a potential starting point for the construction of a framework. these guidelines were centred on two key areas. the first area focussed on the alignment of the conceptualization of the different components of srl and the operationalization of the selected sequence analysis approach (e.g., singleton et al., 1999). the other area focussed on the enactment of the operationalization of the selected sequence analysis approach (e.g., proctor et al., 2013). with regard to the former, four guidelines were proposed relating to the current conceptualization of srl: (1) the sequential and temporal characteristics of srl; (2) the development through time of srl; (3) the unit of analysis imposed by the factors influencing srl; (4) the matching granularity as linked to the covert nature of srl. with regard to the enactment of the operationalization of the sequence analysis approach, two guidelines were proposed: (1) the systematic account of the operationalization; and (2) the transparent communication of parameter settings. although this manuscript does not claim to provide solutions nor to be exhaustive with regard to possible approaches to sequence analysis in the field of srl, on the one hand it highlights the need for a transparent and systematic methodological approach.
on the other hand, it also identifies guidelines that might function as a basis for further construction of such a methodological framework for the use of sequence analysis in the field of srl. keeping in mind the nature of the guidelines provided, we might wonder whether the guidelines are simply ‘common sense’ and applicable to many other research methods. the latter is certainly the case, yet guidelines have not been constructed before for the use of sequence analysis for srl. with regard to the ‘ordinariness’ of the guidelines provided in this manuscript, there is an abundance of systematic methodological literature reviews (e.g., kallio, pietilä, johnson, & kangasniemi, 2016; kelly, lesh, & baek, 2014; mertens, 2014) illustrating that guidelines similar to the ones suggested in this manuscript might be beneficial for a broad range of research methods and also that without such guidelines, the quest for tangible proof of progress is more than likely to be unsuccessful.

3.1 implications

when a methodological framework is in place, the investigation of time-related characteristics in srl using sequence analysis might evolve towards (1) transparent methods, (2) comparative studies, and (3) empirical and ecological applications, supporting both research and practice. such a framework enables researchers to describe and compare current approaches to sequence analysis. a solid description of these approaches allows their reproduction and validation, first in similar and later in different contexts. the former will allow us to apply the insights gathered not only under very strict conditions in one particular situation but is likely to foster empirical and ecologically valid trials. the latter would be useful for researchers and practitioners using sequence analysis, for example to inform the design of their course.
with a methodological framework at their disposal, they can more easily select the sequence analysis approach most appropriate for their needs. the sooner we are able to compare, validate, and establish sequence analysis methods, the more quickly we can make progress in the investigation of srl through learners’ learning behaviour.

3.2 further directions

as it was not our aim to present a fully developed methodological framework for the use of sequence analysis in the field of srl, future investigations might further detail the conceptual assumptions related to the investigation of time-related characteristics of srl and their relation to methodological operationalizations. this could be done by incorporating more theoretical research on the investigation of self-regulated learning and extracting the possible methodological consequences for the operationalization of the theoretical conceptualizations proposed. another avenue might be the integration of non-content-related research on sequence analysis to further investigate the operational possibilities of the method and the conceptual assumptions made by choosing one particular approach over another. by doing so, a protocol, standard, or framework for the use of sequence analysis can be established, which can then be tested, validated, and modified to further optimize the use of sequence analysis for the investigation of srl.

3.3 conclusions

first, the manuscript built a case for the need for a methodological framework for the application of sequence analysis in the field of self-regulated learning. secondly, it provided a ground for further discussion on the construction of such a methodological framework, raising questions about both the conceptualization and the operationalization of sequence analysis in the field of srl. additionally, it provided guidelines and possible directions supporting sequence analysis as a method in the field of srl.
applying sequence analysis in the field of srl in a more systematic and transparent way might support the development of the method towards greater transparency, comparative studies, and empirical and ecological applications, supporting both research and practice. as demonstrated throughout the manuscript, it seems that it is not the amount of data gathered that will help us gain insights, but rather the way we analyse the data and the thoroughness of that analysis. this manuscript by no means implies that the overview of operationalizations is exhaustive or complete, nor does it pretend to provide a best practice or exemplar through the illustrative example. instead, it aimed to provide a transparent and verifiable framework for discussing a method we see as potentially powerful for investigating learners’ srl. such a framework challenges the assumptions made and the approaches taken, and may thereby advance the investigation of learners’ srl through computer log files. in conclusion, the guidelines proposed, and their underlying call for transparency and systematicity through the construction of a methodological framework for the use of sequence analysis in the field of srl, can potentially extend beyond the use of sequence analysis for computer log files to other log file methods investigating the temporal and sequential nature of phenomena. as the literature on the investigation of, for example, eye movement (e.g., kiefer, giannopoulos, raubal, & duchowski, 2017; lorigo et al., 2008) or skin conductance (e.g., el‐sheikh, 2007; haufler et al., 2017) log files seems to experience similar issues, these fields of research might also benefit from the general guidelines formulated in this manuscript (e.g., kuhn, 2012).
although the focus in this manuscript is on only one rather specific approach to sequence analysis, the quest for transparency and systematicity relates to all investigations, especially those using new, complex methods that require abundant data processing in order to yield meaningful results.

keypoints

there is a substantial increase in the use of sequence analysis methods for investigating self-regulated learning.
there is hardly any tangible proof of progress of sequence analysis as a method.
alignment of conceptualization and operationalization, and transparent enactment, are desirable.
when a methodological framework is in place, sequence analysis is more likely to grow as a method.

acknowledgments

we would like to acknowledge the support of the project “adult learners online” funded by the agency for science and technology (project number: sbo 140029), which made this research possible.

references

abbott, a., & tsay, a. (2000). sequence analysis and optimal matching methods in sociology: review and prospect. sociological methods & research, 29(1), 3-33. doi: 10.1177/0049124100029001001 ahmadpour, z., & khaasteh, r. (2017). writing behaviors and critical thinking styles: the case of blended learning. khazar journal of humanities & social sciences, 20(1). aisenbrey, s., & fasang, a. e. (2010). new life for old ideas: the "second wave" of sequence analysis bringing the "course" back into the life course. sociological methods & research, 38(3), 420-462. al mamun, m. a., lawrie, g., & wright, t. (2017). factors affecting student engagement in self-directed online learning module. paper presented at the proceedings of the australian conference on science and mathematics education. andrade, m. s., & evans, n. w. (2015). developing self-regulated learners. esl readers and writers in higher education: understanding challenges, providing support, 113. antunes, c. m., & oliveira, a. l. (2001).
temporal data mining: an overview. paper presented at the kdd workshop on temporal data mining. ayres, j., flannick, j., gehrke, j., & yiu, t. (2002). sequential pattern mining using a bitmap representation. paper presented at the proceedings of the eighth acm sigkdd international conference on knowledge discovery and data mining. azevedo, r. (2014). issues in dealing with sequential and temporal characteristics of self- and socially-regulated learning. metacognition and learning, 9(2), 217-228. azevedo, r., & hadwin, a. f. (2005). scaffolding self-regulated learning and metacognition–implications for the design of computer-based scaffolds: springer. azevedo, r., moos, d. c., johnson, a. m., & chauncey, a. d. (2010). measuring cognitive and metacognitive regulatory processes during hypermedia learning: issues and challenges. educational psychologist, 45(4), 210-223. doi: 10.1080/00461520.2010.515934 azevedo, r., taub, m., & mudrick, n. (2015). technologies supporting self-regulated learning. the sage encyclopedia of educational technology, 731-734. balderas, a., dodero, j. m., palomo-duarte, m., & ruiz-rube, i. (2015). a domain specific language for online learning competence assessments. international journal of engineering education, 31(3), 851-862. bandura, a. (1989). human agency in social cognitive theory. american psychologist, 44(9), 1175. bannert, m., molenaar, i., azevedo, r., järvelä, s., & gašević, d. (2017). relevance of learning analytics to measure and support students' learning in adaptive educational technologies. paper presented at the proceedings of the seventh international learning analytics & knowledge conference. bannert, m., reimann, p., & sonnenberg, c. (2014). process mining techniques for analysing patterns and strategies in students’ self-regulated learning. metacognition and learning, 9(2), 161-185. bannert, m., sonnenberg, c., mengelkamp, c., & pieger, e. (2015).
short-and long-term effects of students’ self-directed metacognitive prompts on navigation behavior and learning performance. computers in human behavior, 52, 293-306. beach, d., & pedersen, r. b. (2013). process-tracing methods: foundations and guidelines: university of michigan press. beheshitha, s. s., gašević, d., & hatala, m. (2015). a process mining approach to linking the study of aptitude and event facets of self-regulated learning. paper presented at the proceedings of the fifth international conference on learning analytics and knowledge. ben-eliyahu, a., & bernacki, m. l. (2015). addressing complexities in self-regulated learning: a focus on contextual factors, contingencies, and dynamic relations. metacognition and learning, 10(1), 1-13. doi: 10.1007/s11409-015-9134-6 bettini, c., wang, x. s., & jajodia, s. (1996). testing complex temporal relationships involving multiple granularities and its application to data mining. paper presented at the proceedings of the fifteenth acm sigact-sigmod-sigart symposium on principles of database systems. biemann, t., & wolf, j. (2009). career patterns of top management team members in five countries: an optimal matching analysis. the international journal of human resource management, 20(5), 975-991. doi: 10.1080/09585190902850190 biswas, g., jeong, h., kinnebrew, j. s., sulcer, b., & roscoe, r. (2010). measuring self-regulated learning skills through social interactions in a teachable agent environment. research and practice in technology enhanced learning, 5(02), 123-152. biswas, g., roscoe, r., jeong, h., & sulcer, b. (2009). promoting self-regulated learning skills in agent-based learning environments. paper presented at the proceedings of the 17th international conference on computers in education. blikstein, p. (2011). using learning analytics to assess students' behavior in open-ended programming tasks. paper presented at the proceedings of the 1st international conference on learning analytics and knowledge. 
boekaerts, m. (1992). the adaptable learning process: initiating and maintaining behavioural change. applied psychology, 41(4), 377–397. doi: 10.1111/j.1464-0597.1992.tb00713.x bonin, f., vogel, c., & campbell, n. (2014). social sequence analysis: temporal sequences in interactional conversations. paper presented at the cognitive infocommunications (coginfocom), 2014 5th ieee conference on. borkowski, j. g., carr, m., rellinger, e., & pressley, m. (1990). self-regulated cognition: interdependence of metacognition, attributions, and self-esteem. dimensions of thinking and cognitive instruction, 1, 53-92. bourbonnais, s., hamel, e. b., lindsay, b. g., liu, c., stankiewitz, j., & truong, t. c. (2006). method, system, and program for merging log entries from multiple recovery log files: google patents. breiman, l. (2001). statistical modeling: the two cultures (with comments and a rejoinder by the author). statistical science, 16(3), 199-231. bureau, v., & salomonsen, h. h. (2012). comparing comparative research designs. butler, d. l., & winne, p. h. (1995). feedback and self-regulated learning: a theoretical synthesis. review of educational research, 65(3), 245-281. doi: 10.2307/1170684 caprotti, o. (2017). shapes of educational data in an online calculus course. journal of learning analytics, 4(2), 76-90. cerezo, r., esteban, m., sánchez-santillán, m., & núñez, j. c. (2017). procrastinating behavior in computer-based learning environments to predict performance: a case study in moodle. frontiers in psychology, 8, 1403. chambless, d. l., & ollendick, t. h. (2001). empirically supported psychological interventions: controversies and evidence. annual review of psychology, 52(1), 685-716. doi: 10.1146/annurev.psych.52.1.685 chen, x., breslow, l., & deboer, j. (2018). analyzing productive learning behaviors for students using immediate corrective feedback in a blended learning environment. computers & education, 117, 59-74. cho, m.-h., & yoo, j. s. (2017).
exploring online students’ self-regulated learning with self-reported surveys and log files: a data mining approach. interactive learning environments, 25(8), 970-982. cicchinelli, a., veas, e., pardo, a., pammer-schindler, v., fessl, a., barreiros, c., & lindstädt, s. (2018). finding traces of self-regulated learning in activity streams. clarke, r. (2016). big data, big risks. information systems journal, 26(1), 77-90. doi: 10.1111/isj.12088 cleary, t. j. (2011). emergence of self-regulated learning microanalysis. handbook of self-regulation of learning and performance, 329-345. coronel, c., & morris, s. (2016). database systems: design, implementation, & management: cengage learning. daniela, p. (2015). the relationship between self-regulation, motivation and performance at secondary school students. procedia-social and behavioral sciences, 191, 2549-2553. doi: 10.1016/j.sbspro.2015.04.410 dignath, c., & büttner, g. (2008). components of fostering self-regulated learning among students. a meta-analysis on intervention studies at primary and secondary school level. metacognition and learning, 3(3), 231-264. doi: 10.1007/s11409-008-9029-x du, f., plaisant, c., spring, n., & shneiderman, b. (2016). eventaction: visual analytics for temporal event sequence recommendation. paper presented at the visual analytics science and technology (vast), 2016 ieee conference on. duffy, m. c., & azevedo, r. (2015). motivation matters: interactions between achievement goals and agent scaffolding for self-regulated learning within an intelligent tutoring system. computers in human behavior, 52, 338-348. el‐sheikh, m. (2007). children's skin conductance level and reactivity: are these measures stable over time and across tasks? developmental psychobiology, 49(2), 180-186. endedijk, m. d., brekelmans, m., sleegers, p., & vermunt, j. d. (2016). measuring students’ self-regulated learning in professional education: bridging the gap between event and aptitude measurements. 
quality & quantity, 50(5), 2141-2164. gabadinho, a., ritschard, g., mueller, n. s., & studer, m. (2011). analyzing and visualizing state sequences in r with traminer. journal of statistical software, 40(4), 1-37. goldberg, b., sottilare, r., roll, i., lajoie, s., poitras, e., biswas, g., . . . long, y. (2014). enhancing self-regulated learning through metacognitively-aware intelligent tutoring systems: boulder, co: international society of the learning sciences. greene, j. a., & azevedo, r. (2007). a theoretical review of winne and hadwin's model of self-regulated learning: new perspectives and directions. review of educational research, 77(3), 334-372. doi: 10.3102/003465430303953 hadwin, a. f., nesbit, j. c., jamieson-noel, d., code, j., & winne, p. h. (2007). examining trace data to explore self-regulated learning. metacognition and learning, 2(2-3), 107-124. halatchliyski, i., hecking, t., goehnert, t., & hoppe, h. u. (2014). analyzing the main paths of knowledge evolution and contributor roles in an open learning community. journal of learning analytics, 1(2), 72-93. haufler, a. j., lewis, g. f., davila, m. i., westhelle, f., gavrilis, j., bryce, c. i., . . . mcdaniel, w. (2017). biobehavioral insights into adaptive behavior in complex and dynamic operational settings: lessons learned from the soldier performance and effective, adaptable response (spear) task. frontiers in medicine, 4, 217. haythornthwaite, c., & gruzd, a. (2012). exploring patterns and configurations in networked learning texts. paper presented at the system science (hicss), 2012 45th hawaii international conference on. hine, c. (2011). internet research and unobtrusive methods. social research update(61), 1. hsu, t.-c. (2018). behavioural sequential analysis of using an instant response application to enhance peer interactions in a flipped classroom. interactive learning environments, 26(1), 91-105. huang, n., klein, m., & beck, a. (2017). 
measuring student teachers development of metacognition and self-regulated learning in professional dialogue. ecer 2017. ifenthaler, d., gibson, d., & dobozy, e. (2018). informing learning design through analytics: applying network graph analysis. australasian journal of educational technology, 34(2). jeong, h., biswas, g., johnson, j., & howard, l. (2010). analysis of productive learning behaviors in a structured inquiry cycle using hidden markov models. paper presented at the educational data mining 2010. jeske, d., backhaus, j., & stamov roßnagel, c. (2014). self‐regulation during e‐learning: using behavioural evidence from navigation log files. journal of computer assisted learning, 30(3), 272-284. jovanović, j., gašević, d., dawson, s., pardo, a., & mirriahi, n. (2017). learning analytics to unveil learning strategies in a flipped classroom. the internet and higher education, 33, 74-85. doi: 10.1016/j.iheduc.2017.02.001 jovanovic, j., pardo, a., mirriahi, n., dawson, s., & gašević, d. (2017). an analytics-based framework to support teaching and learning in a flipped classroom. learning analytics in the classroom: translating learning analytics research for teachers. oxon: routledge . kallio, h., pietilä, a. m., johnson, m., & kangasniemi, m. (2016). systematic methodological review: developing a framework for a qualitative semi‐structured interview guide. journal of advanced nursing, 72 (12), 2954-2965. kelly, a. e., lesh, r. a., & baek, j. y. (2014). handbook of design research methods in education: innovations in science, technology, engineering, and mathematics learning and teaching : routledge. kiefer, p., giannopoulos, i., raubal, m., & duchowski, a. (2017). eye tracking for spatial research: cognition, computation, challenges. spatial cognition & computation, 17(1-2), 1-19. doi: 10.1080/13875868.2016.1254634 kinnebrew, j. s., loretz, k. m., & biswas, g. (2013). 
a contextualized, differential sequence mining method to derive students' learning behavior patterns. journal of educational data mining, 5(1), 190-219. kizilcec, r. f., pérez-sanagustín, m., & maldonado, j. j. (2017). self-regulated learning strategies predict learner behavior and goal attainment in massive open online courses. computers & education, 104, 18-33. doi: 10.1016/j.compedu.2016.10.001 knight, s., wise, a. f., & chen, b. (2017). time for change: why learning analytics needs temporal analysis. journal of learning analytics, 4(3), 7-17. köck, m., & paramythis, a. (2011). activity sequence modelling and dynamic clustering for personalized e-learning. user modeling and user-adapted interaction, 21(1-2), 51-97. doi: 10.1007/s11257-010-9087-z kuhn, t. s. (2012). the structure of scientific revolutions: university of chicago press. kumar, v., venkatesan, r., & reinartz, w. (2004). a purchase sequence analysis framework for targeting products, customers and time period. forthcoming in journal of marketing. kurki, k., järvenoja, h., järvelä, s., & mykkänen, a. (2017). young children’s use of emotion and behaviour regulation strategies in socio-emotionally challenging day-care situations. early childhood research quarterly, 41, 50-62. lan, m., & lu, j. (2017). assessing the effectiveness of self-regulated learning in moocs using macro-level behavioural sequence data. paper presented at the emoocs-wip. lazakidou, g., & retalis, s. (2010). using computer supported collaborative learning strategies for helping students acquire self-regulated problem-solving skills in mathematics. computers & education, 54(1), 3-13. lin, b., coburn, s. s., & eisenberg, n. (2016). self-regulation and reading achievement. the cognitive development of reading and reading comprehension, 67-86. liu, z., dev, h., dontcheva, m., & hoffman, m. (2016). mining, pruning and visualizing frequent patterns for temporal event sequence analysis.
paper presented at the proceedings of the ieee vis 2016 workshop on temporal & sequential event analysis. lorigo, l., haridasan, m., brynjarsdóttir, h., xia, l., joachims, t., gay, g., . . . pan, b. (2008). eye tracking and online search: lessons learned and challenges ahead. journal of the association for information science and technology, 59 (7), 1041-1052. doi: 10.1002/asi.20794 lubahn, d. b., joseph, d. r., sar, m., tan, j.-a., higgs, h. n., larson, r. e., . . . wilson, e. m. (1988). the human androgen receptor: complementary deoxyribonucleic acid cloning, sequence analysis and gene expression in prostate. molecular endocrinology, 2(12), 1265-1275. doi: 10.1210/mend-2-12-1265 lupia, a., & alter, g. (2014). data access and research transparency in the quantitative tradition. ps: political science & politics, 47(1), 54-59. doi: 10.1017/s1049096513001728 lust, g. (2012). opening the black box. students' tool-use within a technology-enhanced learning environment: an ecological-valid approach. lust, g., vandewaetere, m., ceulemans, e., elen, j., & clarebout, g. (2011). tool-use in a blended undergraduate course: in search of user profiles. computers & education, 57(3), 2135-2144. doi: 10.1016/j.compedu.2011.05.010 maldonado-mahauad, j., pérez-sanagustín, m., kizilcec, r. f., morales, n., & munoz-gama, j. (2018). mining theory-based patterns from big data: identifying self-regulated learning strategies in massive open online courses. computers in human behavior, 80, 179-196. mannila, h., toivonen, h., & verkamo, a. i. (1997). discovery of frequent episodes in event sequences. data mining and knowledge discovery, 1(3), 259-289. doi: 10.1023/a:1009748302351 martin, s. e., shabanowitz, j., hunt, d. f., & marto, j. a. (2000). subfemtomole ms and ms/ms peptide sequence analysis using nano-hplc micro-esi fourier transform ion cyclotron resonance mass spectrometry. analytical chemistry, 72(18), 4266-4274. doi: 10.1021/ac000497v masseglia, f., teisseire, m., & poncelet, p. 
(2002). real time web usage mining with a distributed navigation analysis. paper presented at the research issues in data engineering: engineering e-commerce/e-business systems, 2002. ride-2ec 2002. proceedings. twelfth international workshop on. mazon, j. m., rossi, j. d., & toledo, j. (2014). an optimal matching problem for the euclidean distance. siam journal on mathematical analysis, 46(1), 233-255. doi: 10.1137/120901465 meredith, j. (1993). theory building through conceptual methods. international journal of operations & production management, 13(5), 3-11. mertens, d. m. (2014). research and evaluation in education and psychology: integrating diversity with quantitative, qualitative, and mixed methods: sage publications. mills, m., van de bunt, g. g., & de bruijn, j. (2006). comparative research: persistent problems and promising solutions. international sociology, 21(5), 619-631. doi: 10.1177/0268580906067833 molenaar, i., & chiu, m. m. (2015). effects of sequences of socially regulated learning on group performance. paper presented at the proceedings of the fifth international conference on learning analytics and knowledge. molenaar, i., & järvelä, s. (2014). sequential and temporal characteristics of self and socially regulated learning. metacognition and learning, 9(2), 75-85. doi: 10.1007/s11409-014-9114-2 moravcsik, a. (2014). transparency: the revolution in qualitative research. ps: political science & politics, 47(1), 48-53. doi: 10.1017/s1049096513001789 müller, h., naumann, f., & freytag, j.-c. (2003). data quality in genome databases. müller, n. s., studer, m., gabadinho, a., & ritschard, g. (2010). analyse de séquences d'événements avec traminer. paper presented at the egc. oliver, m., & trigwell, k. (2005). can ‘blended learning’ be redeemed? e-learning, 2(1), 17–26. doi: 10.2304/elea.2005.2.1.17 paans, c., molenaar, i., segers, e., & verhoeven, l. (2018). temporal variation in children's self-regulated hypermedia learning.
computers in human behavior. panadero, e., klug, j., & järvelä, s. (2016). third wave of measurement in the self-regulated learning field: when measurement and intervention come hand in hand. scandinavian journal of educational research, 60(6), 723-735. doi: 10.1080/00313831.2015.1066436 perer, a., & wang, f. (2014). frequence: interactive mining and visualization of temporal frequent event sequences. paper presented at the proceedings of the 19th international conference on intelligent user interfaces. perez, s., massey-allard, j., butler, d., ives, j., bonn, d., yee, n., & roll, i. (2017). identifying productive inquiry in virtual labs using sequence mining. paper presented at the international conference on artificial intelligence in education. peterson, r. a. (2005). problems in comparative research: the example of omnivorousness. poetics, 33(5), 257-282. doi: 10.1016/j.poetic.2005.10.002 pintrich, p. r. (2004). a conceptual framework for assessing motivation and self-regulated learning in college students. educational psychology review, 16(4), 385-407. doi: 10.1007/s10648-004-0006-x poole, m. s., lambert, n., murase, t., asencio, r., & mcdonald, j. (2016). sequential analysis of processes. the sage handbook of process organization studies, 254. pressley, m., levin, j. r., & mcdaniel, m. a. (1987). remembering versus inferring what a word means: mnemonic and contextual approaches. prinzie, a., & van den poel, d. (2007). predicting home-appliance acquisition sequences: markov/markov for discrimination and survival analysis for modeling sequential information in nptb models. decision support systems, 44(1), 28-45. doi: 10.1016/j.dss.2007.02.008 proctor, e. k., powell, b. j., & mcmillen, j. c. (2013). implementation strategies: recommendations for specifying and reporting. implementation science, 8(1), 139. doi: 10.1186/1748-5908-8-139 puustinen, m., & pulkkinen, l. (2001). models of self-regulated learning: a review. 
scandinavian journal of educational research, 45(3), 269-286. doi: 10.1080/00313830120074206 rahm, e., & do, h. h. (2000). data cleaning: problems and current approaches. ieee data eng. bull., 23(4), 3-13. reimann, p., markauskaite, l., & bannert, m. (2014). e‐research and learning theory: what do sequence and process mining methods contribute? british journal of educational technology, 45(3), 528-540. roll, i., & winne, p. h. (2015). understanding, evaluating, and supporting self-regulated learning using learning analytics. journal of learning analytics, 2(1), 7-12. roth, a., ogrin, s., & schmitz, b. (2016). assessing self-regulated learning in higher education: a systematic literature review of self-report instruments. educational assessment, evaluation and accountability, 28(3), 225-250. doi: 10.1007/s11092-015-9229-2 rotter, j. b. (1954). social learning and clinical psychology. schnaubert, l., heimbuch, s., & bodemer, d. (2016). extracting selection strategies: comparing measures to analyze sequential data. paper presented at the earli sig27 measuring learning online, oulu, finland. schraw, g., crippen, k. j., & hartley, k. (2006). promoting self-regulation in science education: metacognition as part of a broader perspective on learning. research in science education, 36(1-2), 111-139. doi: 10.1007/s11165-005-3917-8 schraw, g., & moshman, d. (1995). metacognitive theories. educational psychology review, 7(4), 351-371. doi: 10.1007/bf02212307 segedy, j. r., & biswas, g. (2015). towards using coherence analysis to scaffold students in open-ended learning environments. paper presented at the aied workshops. segedy, j. r., kinnebrew, j. s., & biswas, g. (2015). using coherence analysis to characterize self-regulated learning behaviours in open-ended learning environments. journal of learning analytics, 2(1), 13-48. siadaty, m., gašević, d., & hatala, m. (2016).
associations between technological scaffolding and micro-level processes of self-regulated learning: a workplace study. computers in human behavior, 55, 1007-1019. doi: 10.1016/j.chb.2015.10.035 singleton, r., straits, b. c., & straits, m. (1999). approaches to social research oxford university press. new york and oxford. slater, s., joksimović, s., kovanovic, v., baker, r. s., & gasevic, d. (2017). tools for educational data mining: a review. journal of educational and behavioral statistics, 42(1), 85-106. doi: 10.3102/1076998616666808 sonnenberg, c., & bannert, m. (2015). discovering the effects of metacognitive prompts on the sequential structure of srl-processes using process mining techniques. journal of learning analytics, 2(1), 72-100. srikant, r., & agrawal, r. (1996). mining sequential patterns: generalizations and performance improvements. paper presented at the international conference on extending database technology. stackebrandt, e., & goebel, b. (1994). taxonomic note: a place for dna-dna reassociation and 16s rrna sequence analysis in the present species definition in bacteriology. international journal of systematic and evolutionary microbiology, 44 (4), 846-849. stark, d., & vedres, b. (2012). social sequence analysis. the emergence of organizations and markets, 347-364. studer, m., mueller, n. s., ritschard, g., & gabadinho, a. (2010). classer, discriminer et visualiser des séquences d'événements. paper presented at the egc. taub, m., azevedo, r., bouchet, f., & khosravifar, b. (2014). can the use of cognitive and metacognitive self-regulated learning strategies be predicted by learners’ levels of prior knowledge in hypermedia-learning environments? computers in human behavior, 39, 356-367. taub, m., azevedo, r., bradbury, a. e., millar, g. c., & lester, j. (2017). using sequence mining to reveal the efficiency in scientific reasoning during stem learning with a game-based learning environment. learning and instruction. 
taub, m., azevedo, r., bradbury, a. e., millar, g. c., & lester, j. (2018). using sequence mining to reveal the efficiency in scientific reasoning during stem learning with a game-based learning environment. learning and instruction, 54, 93-103. taub, m., mudrick, n. v., azevedo, r., millar, g. c., rowe, j., & lester, j. (2016). using multi-level modeling with eye-tracking data to predict metacognitive monitoring and self-regulated learning with crystal island. paper presented at the international conference on intelligent tutoring systems. tsai, c.-w., shen, p.-d., & tsai, m.-c. (2011). developing an appropriate design of blended learning with web-enabled self-regulated learning to enhance students' learning and thoughts regarding online learning. behaviour & information technology, 30(2), 261-271. doi: 10.1080/0144929x.2010.514359 van der aalst, w. m. (2016). process mining: data science in action: springer. van der merwe, w. a. j. (2013). towards a conceptual model of the relationship between corporate trust and corporate reputation. university of pretoria. van krevelen, d. w., & te nijenhuis, k. (2009). properties of polymers: their correlation with chemical structure; their numerical estimation and prediction from additive group contributions: elsevier. van laer, s., & elen, j. (2016). adults’ self-regulatory behaviour profiles in blended learning environments and their implications for design. technology, knowledge and learning, 1-31. van laer, s., & elen, j. (2018). an instrumentalized framework for supporting learners’ self-regulation in blended learning environments. in m. j. spector, b. b. lockee, & m. d. childress (eds.), learning, design, and technology: an international compendium of theory, research, practice, and policy: springer, cham. van laer, s., jiang, l., & elen, j. (2018). the effect of cues for reflection on learners’ self-regulated learning through changes in learners’ learning behaviour and outcomes. computers & education, under review.
veenman, m. v. (2007). the assessment and instruction of self-regulation in computer-based environments: a discussion. metacognition and learning, 2(2), 177-183. veenman, m. v., & alexander, p. (2011). learning to self-monitor and self-regulate. handbook of research on learning and instruction, 197-218. veenman, m. v., bavelaar, l., de wolf, l., & van haaren, m. g. (2014). the on-line assessment of metacognitive skills in a computerized learning environment. learning and individual differences, 29, 123-130. doi: 10.1016/j.lindif.2013.01.003 veenman, m. v., prins, f. j., & verheij, j. (2003). learning styles: self‐reports versus thinking‐aloud measures. british journal of educational psychology, 73(3), 357-372. veenman, m. v., van hout-wolters, b. h., & afflerbach, p. (2006). metacognition and learning: conceptual and methodological considerations. metacognition and learning, 1(1), 3-14. williamson, g. (2015). self-regulated learning: an overview of metacognition, motivation and behaviour. winne, p. (2016). self-regulated learning. sfu educational review, 1(1). winne, p. h. (2005). key issues in modeling and applying research on self‐regulated learning. applied psychology, 54(2), 232-238. winne, p. h. (2010). improving measurements of self-regulated learning. educational psychologist, 45(4), 267-276. winne, p. h. (2014). issues in researching self-regulated learning as patterns of events. metacognition and learning, 9(2), 229-237. doi: 10.1007/s11409-014-9113-3 winne, p. h. (2018). theorizing and researching levels of processing in self‐regulated learning. british journal of educational psychology, 88(1), 9-20. winne, p. h., & baker, r. s. (2013). the potentials of educational data mining for researching metacognition, motivation and self-regulated learning. journal of educational data mining, 5(1), 1-8. winne, p. h., & hadwin, a. f. (1998). studying as self-regulated learning. metacognition in educational theory and practice, 93, 27–30. winne, p.
h., nesbit, j. c., & popowich, f. (2017). nstudy: a system for researching information problem solving. technology, knowledge and learning, 22(3), 369-376. doi: 10.1007/s10758-017-9327-y winne, p. h., & perry, n. e. (2000). measuring self-regulated learning. yang, x., li, j., & xing, b. (2018). behavioral patterns of knowledge construction in online cooperative translation activities. the internet and higher education, 36, 13-21. doi: 10.1016/j.iheduc.2017.08.003 zaki, m. j. (2001). spade: an efficient algorithm for mining frequent sequences. machine learning, 42(1-2), 31-60. doi: 10.1023/a:1007652502315 zhou, m. (2016). data pre-processing of student e-learning logs information science and applications (icisa) 2016(pp. 1007-1012): springer. zhou, m., xu, y., nesbit, j. c., & winne, p. h. (2010). sequential pattern analysis of learning logs: methodology and applications. handbook of educational data mining, 107, 107-121. zimmerman, b. j. (1986). becoming a self-regulated learner: which are the key subprocesses? contemporary educational psychology, 11, 307–313. zimmerman, b. j. (1990). self-regulated learning and academic achievement: an overview. educational psychologist, 25(1), 3–17. doi: 10.1207/s15326985ep2501_2 zimmerman, b. j. (1998). academic studing and the development of personal skill: a self-regulatory perspective. educational psychologist, 33 , 73–86. zimmerman, b. j. (2000). self-efficacy: an essential motive to learn. contemporary educational psychololgy, 25(1), 82-91. doi: 10.1006/ceps.1999.1016 zimmerman, b. j., & pons, m. m. (1986). development of a structured interview for assessing student use of self-regulated learning strategies. american educational research journal, 23(4), 614-628. doi: 10.3102/00028312023004614 zimmerman, b. j., & schunk, d. h. (2001). self-regulated learning and academic achievement: theoretical perspectives : routledge. 
frontline learning research special issue vol. 8 no. 5 (2020) 47-69 issn 2295-3159 happy victimizing in emerging adulthood: reconstruction of a developmental phenomenon? eveline gutzwiller-helfenfinger a & brigitte latzko b a university of duisburg-essen, germany b university of leipzig, germany article received 1 june 2018 / revised 7 november / accepted 21 november 2019 / available online 1 july 2020 abstract this study contributes to a developmental approach focusing on emotions as being of key significance in explaining the happy victimizer pattern (hv pattern) among adults. based on findings from our own research on moral emotions within the happy victimizer paradigm, we claim that a purely cognitive approach to explain the hv is overly narrow. instead, we argue that emotion attributions serve as a source for moral motivation. by identifying new dimensions (i.e., deontic judgment; own action choice; self-constructed emotion attributions) to explain the complexity of moral functioning in emerging adulthood, the current study contributes to a theoretical and methodological framework that integrates both cognitive and emotional processes to bridge the gap between moral thought, emotion, and action with the aim of fostering moral learning across the lifespan. keywords: happy victimizer phenomenon; adult moral development; moral emotions; developmentally appropriate assessment corresponding author email: eveline.gutzwiller-helfenfinger@uni-due.de doi: https://www.doi.org/10.14786/flr.v8i5.382 1. introduction the happy victimizer phenomenon (hvp) denotes the empirical finding that children aged four to seven ascribe positive emotions like satisfaction or happiness to a rule transgressor although they know that a moral rule was broken (arsenio & kramer, 1992; arsenio & lover, 1995; nunner-winkler & sodian, 1988; nunner-winkler, 1999, 2012). in contrast, older children ascribe negative emotions like guilt or shame to the rule transgressor. 
the classical explanation according to nunner-winkler and sodian (1988) states that moral cognitions (making and justifying judgments) develop earlier than moral emotions. the attribution of positive emotions to a rule transgressor is interpreted as indicating a lack of moral motivation (nunner-winkler & sodian, 1988), the lack of moral motivation representing a lack of readiness to act on moral commitments (thorkildsen, 2013). based on the assumption that moral emotions like guilt can be seen as showing that the self not only knows a moral rule, but also feels committed to it (gibbard, 2002; malti, 2010; malti, gummerum, keller, & buchmann, 2009), the hvp can also be interpreted as a lack of moral commitment accompanying the absence of negative (i.e., moral) emotion attributions. owing to the well-established finding that by age eight or nine a shift towards negative emotion attributions can be observed, the hvp is also understood as representing a developmental transition, with the attribution of positive emotions being replaced by the attribution of negative emotions in the course of moral development (arsenio et al., 2006; lagattuta, 2005; krettenauer, malti, & sokol, 2008). according to this understanding, the hvp is restricted to pre- and early school age. findings from recent studies, however, challenge this position, as identical reasoning patterns (i.e., judging a transgression as wrong while attributing positive emotions to the transgressor) can be found for adolescents and adults as well (e.g., heinrichs, minnameier, gutzwiller-helfenfinger, & latzko, 2015; krettenauer, asendorpf, & nunner-winkler, 2013; krettenauer & eichler, 2006; nunner-winkler, 2007). therefore, the question arises how the occurrence of these reasoning patterns in adolescence and adulthood can be explained. researchers have offered different explanations for the occurrence of the hv pattern in adolescence and adulthood. 
heinrichs and colleagues (2015) for example argue that specific factors of a given moral conflict and its associated context might influence individuals’ respective evaluation (situation-specificity). minnameier and schmidt (2013) conceptualise the hv pattern in adolescence and adulthood as a particular moral judgment structure triggered by situation-specific factors and used by individuals to adjust to the requirements of the situation (adjustment-focused). from a perspective of the development of moral emotions, we raise the question whether the classical interpretation of the hv as a lack of moral motivation (nunner-winkler & sodian, 1988; nunner-winkler, 2007; 2013a) also applies to adolescence and adulthood. a second issue to be resolved is whether the hvp actually does disappear in the course of development. this is a relevant question: if the hvp represents a developmental transition (e.g., krettenauer et al., 2008) which is experienced by all children (i.e., a normative transition), then we need longitudinal research to indicate whether adults showing the hv pattern are displaying delayed or even dysfunctional moral development. first longitudinal findings have not yielded a clear picture (krettenauer et al., 2013). however, the basic question whether the judgment-emotion attribution-justification patterns found in adults really represent the hvp (as documented for children) can already be approached by cross-sectional research. the aim of our study is to investigate whether the hvp or similar judgment-emotion attribution-justification patterns can be found in adults, and if so, how these patterns might be explained. more specifically, the question is whether the explanation used with children, that is, a lack of moral motivation, also applies to adults. to address these aims, we also need to consider measurement issues related to the age-sensitive assessment of moral rule knowledge and moral emotions. 
1.1 assessing the happy victimizer phenomenon in childhood in the classical study by nunner-winkler and sodian (1988), the hvp as such was studied systematically for the first time, although the term “happy victimizer” was not yet used. the authors sought to replicate and extend findings from a previous study by barden, zelko, duncan and masters (1980) who had observed that children aged four to five ascribed mainly positive emotions (mostly happiness) to a protagonist whose theft passed undetected, whereas children aged nine to ten and twelve to thirteen attributed negative emotions, especially fear and sadness. nunner-winkler and sodian (1988) sought to more deeply investigate children’s understanding of moral emotions: “if the developmental trend observed by barden et al. (1980) is a stable phenomenon, this may point to an important change in children's conceptions of the determinants of emotion between the preschool and the elementary school years” (nunner-winkler & sodian, 1988, p. 1324). they devised and implemented a series of three experiments to pursue this issue. the first experiment served as a test of the generality of the expected emotion attribution patterns and was intended also to offer first explanations for these patterns. children aged 4, 6 and 8 were given two emotion attribution tasks and one moral judgment task, all of them including pictures to assist children’s understanding. the emotion attribution tasks included two parallel stories about a protagonist taking or not taking sweets or chestnuts from another child in that child’s absence (the term “stealing” was not used). each story was therefore available in a moral (not taking) and an immoral version (taking the sweets/chestnuts). counterbalancing was used in that children were presented only one version of each story, that is, either the moral version of the first and the immoral version of the second story, or vice versa. 
for each story, children were first told that the protagonist considered taking the sweets/chestnuts and then asked whether the protagonist was allowed to do so. this question was used to control for children’s rule understanding, namely, whether they knew that one is not allowed to take another’s belongings. afterwards, they were asked to tell how the protagonist felt. although the experimenter did not ask children to justify their emotion attributions, most of them did so. accordingly, justifications were included in analyses. in the subsequent moral judgment task, children were told a story about two children, each of whom had stolen a toy car from a friend. the experimenter asked whether it was right for them to have taken the car or not, and why. then, pictures of both protagonists were shown, with one having a happy face (because s/he now has the beautiful car) and the other having a sad face (because s/he is sorry for taking the car). children were then asked whether the happy or the sad child was worse, or whether both were the same and asked to justify their judgment. in the second experiment, the potential influence of the salience of morality in a given context on children’s emotion attributions was addressed in a sample of children aged 4 to 5. salience of morality was manipulated along the dimensions of (a) tangibility of profit of an immoral action (achieving possession of a desired object vs. managing to annoy another child); and (b) severity of transgression (telling a lie in story 1 vs. physically harming another child in story 2). story 1 (telling a lie) was acted out using two puppets. story 2 (harming) was narrated and accompanied by coloured drawings. half of the children were assigned to the tangible and non-tangible profit conditions, respectively, and stories were told in counterbalanced order. understanding of the story was checked directly after introducing the rule transgression. 
after each story, first the test question (“what do you think? how does [protagonist] feel now?” “why?”) and then the control question tapping rule understanding (“what do you think about what [protagonist] did: was it right or was it not right?” “why?”) were asked. after finishing both stories children were asked whether the protagonist who sent another in the wrong direction or the protagonist who pushed another child from the swing was worse or whether they were both the same. experiment 3 investigated whether the attribution of positive emotions to a wrongdoer was limited to instances of intentional harm in a sample of 4-to-5-year-olds. four contrasting conditions were defined: intentional harm by ill-motivated actor; unintentional harm by ill-motivated actor; unintentional harm by neutrally motivated actor; and bystander witnessing someone being hurt. the harm done always referred to physical injury in the context of children playing together. four story frames were constructed, and each frame was used to create four stories representing the four contrasting conditions. movable coloured figures were used to enact the stories. each child was presented four different stories, one for each of the experimental conditions (including counterbalancing and randomisation of story frames and experimental conditions, respectively). after each story, first emotion attributions (“what do you think? how does [protagonist] feel now?” “why?”) and then moral judgments (“was [protagonist] bad or was she [he] not bad?” “why?”) were elicited. if emotions were not attributed spontaneously, the experimenter asked: “do you think [protagonist] is happy or do you think [protagonist] is sad?” “why?”. after eliciting emotion attributions and moral judgments, a control question referring to intentionality of harm was asked: “did [protagonist] hurt [victim] intentionally or did he [she] hurt him [her] not intentionally?”. 
for the bystander condition, the control question was: “did [protagonist] hurt [victim] or did he [she] watch [victim] being hurt?” to sum up, slightly different methods (stimulus materials, nature and functions of the questions asked, and sequence of the questions) were used in each experiment. in experiment 1, moral emotion attributions and moral judgment were assessed separately. in the emotion attribution task (coming before the moral judgment task) children’s moral rule understanding was determined before eliciting emotion attributions. in experiment 2, children’s emotion attributions were elicited before determining their moral rule understanding. moral judgment was assessed last, after finishing the two stories involved, but not in a separate task. in experiment 3, children’s emotion attributions were elicited before moral judgments. again, the latter were not assessed in a separate task. also, control questions were used for intentionality or bystander perspective. especially the nature and function as well as the sequence of the questions used pose a challenge when it comes to deconstructing the hv. interestingly, nunner-winkler and sodian (1988) never used the term “happy victimizer” for the phenomenon they explored but mentioned a “happy wrongdoer”. in subsequent research, a more standardised assessment procedure was developed, leading to the following prototype: presenting the story; assessment of rule understanding; introducing the transgression; asking for a moral judgment and its justification; and eliciting the attribution of an emotion to the perpetrator and the justification of that emotion. 
variations included for example asking control questions ensuring understanding of the story (e.g., gasser & keller, 2009), probing for deserved punishment (e.g., smetana, toth, cicchetti, bruce, kane, & daddis, 1999), severity of transgression (e.g., smetana et al., 1999), evaluation of interpersonal consequences (e.g., malti & keller, 2009), rule and authority independence (e.g., malti, gasser, & gutzwiller-helfenfinger, 2010), or including the perspective of self-as-perpetrator in addition to other-as-perpetrator (e.g., keller, lourenço, malti, & saalbach, 2003). often, following nunner-winkler and sodian (1988), additional materials like pictures (e.g., keller, 2006), cartoons (e.g., malti & keller, 2009) or figurines and dolls (e.g., woolgar, steele, steele, yabsley, & fonagy, 2001) were used to illustrate or even enact the story vignettes and support children’s understanding. for further discussions of methodological variations see for example krettenauer, malti and sokol (2008). some variation can also be found for the assessment of moral judgments and emotion attributions in particular. with respect to morally judging the transgression (sometimes also termed understanding of rule validity to emphasise the necessity of knowing a rule and its validity before being able to judge its transgression as wrong), various, slightly differing probes were used, for example “is it right what x (victimizer) did? why/why not?” (e.g., keller et al., 2003; malti et al., 2009); “is it right to do what the victimizer did? why/why not?” (malti & keller, 2009); “is it right or wrong to do x? why?” (e.g., gasser & keller, 2009); or “is it okay or not okay for the child to do x? why?”. action choice and its moral evaluation were used in one out of four vignettes in the study by malti & keller (2009): “how does the protagonist decide in this situation? why?” “is this the right decision or not? 
why?” thus, whereas nunner-winkler and sodian (1988) used the concept of “being allowed”, other researchers focused also on the “rightness” of an act, the dichotomies of “right vs. wrong” or “okay vs. not okay”, or, very rarely, asked for an action decision and its justification. with respect to eliciting emotion attributions (also called emotion expectancies to specify that emotions attributed represent what children expect others to feel in a given situation), again, slightly differing probes were utilised. regarding attributing emotions to the transgressor, probes like “how does x (victimizer) feel at the end of the story? why does s/he feel this way?” (e.g., keller et al., 2003); “how does the victimizer feel? why?” (e.g., malti & keller, 2009); or “how do you think this child will feel after s/he (x)es? why?” (malti et al., 2010) were used. when attributing emotions to the self as transgressor, probes included “how would you feel if you had done that? why would you feel that way?” (e.g., malti et al., 2009) or “how would you feel if you did x? why would you feel that way?” (e.g., gasser & keller, 2009). in some studies, attribution of victims’ emotions was also elicited (e.g., gasser, malti, & gutzwiller-helfenfinger, 2012) using simple probes like “how does the victim feel? (why?)”. sometimes, children could freely attribute emotions (e.g., gutzwiller-helfenfinger, gasser, & malti, 2010), whereas in some studies they were either presented with response scales including a variety of emotions (e.g., gasser et al., 2012) or depicting gradations between “good” and “bad” (in some cases accompanied by schematically drawn faces, e.g., in the study by arsenio & kramer, 1992). 
to sum up, while nunner-winkler and sodian used a general probe about the way the protagonist feels, introduced by “what do you think…?”, other researchers also used more direct probes not stressing participants’ thinking, specifications of the point in time (at the end of the story, after s/he xes) or – in the case of self-as-perpetrator – formulations including conditionals (“would”), sometimes including also pre-defined response scales. as indicated above there is no doubt that happy victimizer patterns can be found in adolescents and adults. a large part of the studies addressing the hv either in adolescence or adulthood from a developmental psychological perspective have focused on its relationship with antisocial or aggressive behaviour (for a review see malti & krettenauer, 2013). empirical findings indicating that immoral conduct (e.g., breaking rules, aggressive behaviour) is, in part, related to a lack of moral emotions (malti & krettenauer, 2013) underline the assumption that moral emotions have a great impact on regulating social interaction. moral emotions are considered to be the key elements of socio-emotional competences, because they help children and adolescents (and even adults) to anticipate the outcomes of socio-moral events and adjust their social interaction accordingly (arsenio, gold, & adams, 2006; malti, 2010). within this context moral emotions have the power to regulate social interaction in the sense of providing the motivation to do the good and avoid doing the bad (kroll & egan, 2004). therefore, investigating the hv, particularly the question how moral emotions impact adolescents’ and adults’ social behaviour, is of great significance in explaining the causes of adaptive and maladaptive behaviour. krettenauer et al. (2013) reported a significant relationship between moral emotion attributions at the ages of 18 and 23 with antisocial conduct at age 23. 
additionally, they found that moral emotion attributions at the ages of 18 and 23 predicted antisocial conduct at age 23, both directly and indirectly. a study by perren and gutzwiller-helfenfinger (2012) showed that moral emotion attributions (a lack of remorse) predicted both traditional and cyberbullying in adolescents aged 12-19. moreover, a vast body of research on moral disengagement in children, adolescents and adults has consistently shown that the selective activation of moral distancing processes enables individuals to feel indifference and even happiness when harming others’ welfare (for an overview, see bandura, 2016). finally, a set of studies on cheater’s high indicated that although individuals predicted that they would feel guilty and experience increased negative affect after acting unethically, those individuals who actually did cheat consistently experienced more positive feelings than those who did not (ruedy, moore, gino, & schweitzer, 2013). however, two major issues are still unresolved, one of them of a conceptual and one of them of a methodological nature. first, conceptually, most of these studies have used the hv as an explanatory variable, shedding light on the role moral motivation (to be exact: emotion attributions) plays in explaining negative behaviour in children and youth. only few studies have investigated the hv in adolescence or adulthood as a phenomenon in its own right (nunner-winkler, 2007; minnameier, 2012) and sought to explain it. accordingly, the question why hv patterns can be found in children, adolescents, and adults still remains unresolved, as remains also the question whether the childhood phenomenon is identical with the hv patterns occurring in later life. a purely cognitive-structural explanation, postulating that the hvp can be reconstructed as a specific moral judgment structure, that is, level 2b reasoning (cf. 
minnameier, 2012; minnameier & schmidt, 2013), is not sufficient and falls short because both the phenomenon and its explanation are assessed on the basis of one and the same judgment-emotion attribution-justification pattern. more precisely, the pattern is used both to “diagnose” the hvp and to identify the respective moral judgment structure, leading to circular reasoning. presently we do not know whether the specific moral judgment structure (2b) can also be found in persons not displaying the hvp. only if the judgment structure cannot be found in persons not displaying the hvp can we conclude that this specific structure is found only within the hvp and might thus offer a potential explanation thereof. both the transitional and the dysfunctional hypotheses, however, are in line with the assumption that developmental changes in moral motivation cannot be explained solely as caused by changes in moral judgment (krettenauer & montada, 2005). instead, moral rule knowledge and self-evaluating moral emotions are increasingly coordinated (krettenauer & montada, 2005). according to blasi (1993, 1999) and damon and hart (1988) these processes are based on the development of a moral self or a moral identity, respectively. one’s moral identity shows itself to the extent that moral notions, such as being fair, just, and good are important to one’s self-understanding (blasi, 1984). or, conversely, when the self is not constructed or defined with reference to moral categories and shows no commitment to moral values, one does not have a moral identity (see lapsley & narvaez, 2004, p.192). taken together this position points towards a “weak” moral self as a further explanation supporting the dysfunctional hypothesis (cf. krettenauer, 2012) and raises the question whether hv is associated with a lower commitment to moral values. a second issue relates to the measurement of hv across different age groups. 
we have to keep in mind that investigating a developmental phenomenon necessitates the implementation of developmentally appropriate, that is, sensitive, assessment methods. to learn more about the hv across different age groups, we must ensure that we adequately measure the underlying moral mechanisms. only then can we ascertain whether the phenomenon (as a judgment-emotion attribution-justification pattern) actually can be found in adults, that is, in analogy to findings for children. thus, we have to analyse the conceptual foundations of the phenomenon as well as its assessment across the various age groups. accordingly, we will first analyse the hvp, that is, dissect it into its constituent parts on the basis of the measurement method used in the original studies on children. this will enable us to critically discuss existing findings on the hv pattern in adults. afterwards, we will transfer the operationalisation of the individual components as used for investigating children to adulthood, that is, develop an operationalisation of these components that represents a developmentally appropriate assessment suited to study the phenomenon in adults. this new measurement approach will then be used to investigate a) whether and how the phenomenon (judging a transgression as wrong while attributing positive emotions to the transgressor) manifests itself in young adults, addressing the reconstruction of the phenomenon; and b) whether the moral reasoning structures found actually represent the hvp or whether they reproduce a specific judgment-emotion attribution-justification pattern which resembles the hvp on the surface but means something different on the conceptual level. the core research questions we pursue are: what patterns of moral judgments, emotion attributions, and associated justifications do we find in young adults? can the hvp be reconstructed for adults on the basis of our findings? 
additionally, we wanted to explore the potential relationship between participants’ patterns of moral judgments, emotion attributions, and associated justifications and their commitment to moral values. we hypothesised that participants displaying hv reasoning patterns would show a lower level of commitment to moral values than participants not displaying hv reasoning patterns. 2. method 2.1 participants a total of 285 pre-service teachers enrolled in a teacher education programme at the university of leipzig (germany) participated in the study. participants’ age ranged from 19 to 40 years (mean age = 21.74, sd = 2.59), with 98.2% of the sample being 29 years or younger. 69% of participants were female. this corresponds well with the overall gender balance of the german pre-service teacher population. 60% of participants were enrolled in a secondary i programme, whereas 40% attended a special needs education programme focusing on socio-emotional development. most of the students studied humanities (21%) and languages (25%), only a few studied subjects related to natural sciences (5%). study participation took place in the context of a lecture on developmental psychology and was voluntary and anonymous. 2.2 instruments and procedure following krettenauer and eichler’s (2006) argumentation relating to a potential social desirability bias in adolescents’ and emerging adults’ responses in an individual interview setting, we decided to use a half-standardised paper-and-pencil questionnaire. this provides a more anonymous setting in which participants feel free to share their (written) reflections about rule transgressions and the emotions involved with the researcher. the questionnaire consisted of three parts: in the first part, participants were asked to work through two vignettes, each reflecting a moral norm conflicting with personal desires in order to assess their moral judgements, emotion attributions, and their respective justifications. 
in the second part, participants’ moral values were assessed. in the third part, participants provided some general sociodemographic information. 2.2.1 moral judgments, emotion attributions, and justifications in line with the traditional happy victimizer paradigm three vignettes describing the following moral rule transgressions were used: keeping excess change money (10 euros) after buying a new bicycle light (gutzwiller-helfenfinger & perren, 2015; 2016) (this vignette was called the “change money” vignette; see also study 2 in heinrichs, gutzwiller-helfenfinger, latzko, minnameier, & döring, this issue); breaking one’s promise to wait for a prior customer while selling the motorbike to someone offering a better price (see döbert & nunner-winkler, 1983; see also study 1 in heinrichs, gutzwiller-helfenfinger et al., this issue) (this vignette was called the “the motorbike” vignette); lying to a potential customer to prevent him/her from employing another company (minnameier & schmidt, 2013) (this vignette was called the “lying to a customer” vignette). the “change money” vignette involved a passive moral temptation, where a protagonist has no intention to transgress and only realises that s/he might do so as a result of specific circumstances (gutzwiller-helfenfinger & perren, 2015; 2016; heinrichs, minnameier, gutzwiller-helfenfinger, & latzko, 2015), while the other scenarios involved proactive transgressions. each participant worked through a combination of two vignettes. the combinations were as follows: change money-the motorbike, lying to a customer-change money, the motorbike-lying to a customer. female participants received female versions (jana, maria, petra) whereas male participants received corresponding male versions (jan, mark, peter). the order of vignettes per combination was counterbalanced. the english version of the vignettes can be found in the appendix. 
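the counterbalancing scheme described above can be made concrete with a short sketch. it is an illustration only, not the authors' procedure: the text states the three pairings, that order was counterbalanced, and that versions were matched to participant gender, but it does not say how questionnaires were allocated. the simple rotation used here, and the names `build_booklets` and `assign`, are assumptions.

```python
from itertools import cycle

# The three vignette pairings named in the text.
COMBINATIONS = [
    ("change money", "the motorbike"),
    ("lying to a customer", "change money"),
    ("the motorbike", "lying to a customer"),
]


def build_booklets():
    """Return the six questionnaire types: each pairing in both orders."""
    booklets = []
    for first, second in COMBINATIONS:
        booklets.append((first, second))
        booklets.append((second, first))  # reversed order for counterbalancing
    return booklets


def assign(participants):
    """Allocate questionnaire types by rotation (an assumed allocation rule).

    participants: list of (participant_id, gender) tuples, where gender is
    "female" or "male" and selects the matching protagonist version.
    """
    rotation = cycle(build_booklets())
    return [
        {"id": pid, "version": gender, "vignettes": next(rotation)}
        for pid, gender in participants
    ]


if __name__ == "__main__":
    sample = [(i, "female" if i % 3 else "male") for i in range(12)]
    for row in assign(sample):
        print(row)
```

with a number of participants that is a multiple of six, every pairing and every order occurs equally often; random allocation within blocks of six would be an equally plausible alternative.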
to provide developmentally appropriate assessment of adults’ moral competencies in the context of the vignettes, we used an extended measurement approach regarding moral rule understanding and moral emotions. moral rule understanding was not only assessed by asking participants to judge the transgression, but, to gain insights into participants’ initial construction of the situation, by asking them to make a deontic judgment and to justify it. to give more room to potential complexity and richness of emotion attributions, participants had to indicate first whether they ascribed positive, negative, or mixed (i.e., both positive and negative) emotions to the protagonist and then to specify (i.e., construct) the emotion(s) attributed and justify them. the exact procedure was as follows: for each vignette, participants were first asked to make a deontic judgment: they had to indicate what the protagonist should do by ticking the appropriate box (transgress, not transgress) and to justify their judgment. for example, in the “change money” vignette, they had to indicate whether jana should keep the money or give it back, and write down the reason for doing so in their own words. afterwards, the vignette was continued by saying that the protagonist had transgressed the moral rule. in the “change money” vignette, this was expressed as follows: “let us suppose that jana kept the money”. participants then had to judge the rule transgression (classical happy victimizer judgment): they indicated whether the behaviour was “wrong” or “right” by marking the appropriate answer; afterwards, they had to tick one out of three boxes marked “good”, “bad”, and “mixed” to indicate how the protagonist felt after the transgression. additionally, they were asked to specify the exact emotion(s) in their own words and to write in their own words why the protagonist felt that way. finally, participants had to make a self-judgment. 
they had to indicate what they themselves would do in the given situation, that is, transgress or not transgress, by ticking the appropriate box (e.g., keep or give back the money); justify this decision in their own words; attribute emotions to themselves by ticking “good”, “bad”, or “mixed”; specify the exact emotion(s) in their own words; and justify the emotion attribution(s) in their own words.

2.2.2 moral values

the second part of the questionnaire assessed the values participants were committed to by using the ideal self values scale (pratt, hunsberger, pancer, & alisat, 2003). the scale includes the following twelve values: “polite and courteous”, “trustworthy”, “good citizen”, “honest/truthful”, “ambitious/hard-working”, “be open and communicate”, “careful/cautious”, “independent”, “kind and caring”, “fair and just”, “loyal”, “integrity”. six of these values belong explicitly to the moral domain (“trustworthy”, “good citizen”, “honest/truthful”, “kind and caring”, “fair and just”, “integrity”) and represent a general index of commitment to a moral valuing self (campbell, 2004; pratt et al., 2003). participants had to indicate how important the twelve values were for their own life on a 6-point likert scale (0=unimportant; 6=important). afterwards, they had to pick and rank the three most important values.

2.3 analyses

emotion attributions included a general attribution (good, mixed, bad) as well as an emotion specification (i.e., construction of the respective emotion/s) and a related justification of the emotion specification. emotion specifications were categorised separately from justifications. if specifications of “good” or “bad” included more than one emotion, the most concrete and/or most complex was used, following the classification of emotions according to harris (2008). an example for “bad” was “tense, anxious, unwell”. in this case, anxiety was used because it was the most concrete emotion. an example for “good” was “good, proud”.
here, pride was coded because pride was both the most concrete and the most complex emotion. an overview of positive and negative emotion specifications categorised according to their complexity is given in the results section (see also tables 5 and 6). in line with study 2 in heinrichs, gutzwiller-helfenfinger et al. (this issue), justifications of judgments and emotion attributions were content analysed using categories from research within the happy victimizer paradigm. as no new inductive categories were found for the change money vignette in relation to the categories identified in study 2 in heinrichs, gutzwiller-helfenfinger et al. (this issue), the existing categories were summarised into the following category groups: morality (i.e., referring to moral principles), empathy towards the victim, consideration of consequences, law and order, hedonism, blaming the victim, and affective distancing (i.e., stating that the protagonists’ emotions cannot be inferred). inter-rater reliability including two independent raters (10% of scenarios) was high (percentage of perfect agreement = 96.8 %, cohen’s kappa κ = .81). inter-rater reliability for emotion specifications was perfect (percentage of perfect agreement = 100%; cohen’s kappa κ = 1.0). data from the ideal self values scale were analysed for internal consistency. for both the moral (6 items) and the non-moral (6 items) subscales, cronbach’s alpha was calculated. both subscales had only moderate internal consistency (.50 for the moral and .47 for the nonmoral subscales, respectively). accordingly, the six moral values were used as single-item measures of the respective values. 3. results in order to reconstruct the happy victimizer phenomenon, we combined its basic and constitutive elements in a step-by-step procedure. in a first step, to refer back to the original hvp, we included only data from the classical happy victimizer condition referring to the evaluation of the rule transgression. 
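The reliability statistics reported in the analyses section (percentage of perfect agreement, Cohen's kappa, Cronbach's alpha) follow standard formulas. The sketch below shows one way to compute them from raw codes and item scores. It is not the authors' analysis script: the function names and the toy data are assumptions made purely for illustration, and the population variance is used throughout (alpha is unaffected as long as the same variance estimator is used for items and totals).

```python
from collections import Counter
from statistics import pvariance


def percent_agreement(rater1, rater2):
    """Share of cases in which both raters assigned the same category."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(rater1)
    p_observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_chance = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per scale item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # sum score per respondent
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


# Hypothetical toy data, not taken from the study.
codes_a = ["hedonism", "morality", "hedonism", "morality"]
codes_b = ["hedonism", "morality", "morality", "morality"]
kappa = cohens_kappa(codes_a, codes_b)
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
```

With the toy codes, the raters agree on three of four cases while chance agreement is .50, so kappa is .50; the two perfectly parallel items yield an alpha of 1.0.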
accordingly, we identified participants who judged the transgression as wrong while attributing positive or mixed emotions to the perpetrator. table 1 shows the distribution of participants across the categories of “pure happy victimizer”, “mixed happy victimizer” (attributing mixed emotions) and “no happy victimizer” for the first vignette. (to include all participants, the distribution is shown across all vignettes given in the first situation.) as can be seen, almost no one judged the respective transgression as wrong while attributing purely positive emotions (2.2%), indicating that very few participants displayed the pure happy victimizer pattern. however, almost half of the participants judged the transgression as wrong and attributed mixed, that is, positive and negative, emotions (42.5%).

table 1. distribution of participants in the classical happy victimizer condition across vignettes for the first vignette presented

if we break this down for the individual vignettes, we see that the distribution of participants across the three hv categories differs, with the “lying to a customer” vignette having the highest number of participants belonging in the no hv category (table 2).

table 2. distribution of participants in the classical happy victimizer condition across situations 1 and 2 for each vignette

additionally, we took a closer look at the data from the 104 participants categorised as no hv in the classical happy victimizer condition (see footnote 4) in the “change money” vignette. of those 104 participants, 50 (48.1%) actually said that it was okay for jana to keep the money. of these 50, 14 participants attributed positive, 3 attributed negative, and the remaining 33 attributed mixed emotions.
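The categorisation used in this first step can be written down as a small decision rule. The following is a minimal sketch, assuming each response is recorded as a pair of a judgment (“wrong” or “right”) and a general emotion attribution (“good”, “bad”, or “mixed”). The function names, labels, and toy responses are illustrative assumptions, not the authors' coding scheme or data.

```python
from collections import Counter

JUDGMENTS = {"wrong", "right"}
EMOTIONS = {"good", "bad", "mixed"}
CATEGORIES = ("pure hv", "mixed hv", "no hv")


def classify_hv(judgment, emotion):
    """Classify one response in the classical happy victimizer condition."""
    if judgment not in JUDGMENTS or emotion not in EMOTIONS:
        raise ValueError(f"unexpected response: {judgment!r}, {emotion!r}")
    if judgment == "wrong" and emotion == "good":
        return "pure hv"   # transgression judged wrong, purely positive emotions
    if judgment == "wrong" and emotion == "mixed":
        return "mixed hv"  # transgression judged wrong, positive and negative emotions
    return "no hv"         # negative emotions only, or transgression judged as right


def hv_distribution(responses):
    """Percentage of responses per category, rounded to one decimal."""
    counts = Counter(classify_hv(j, e) for j, e in responses)
    n = len(responses)
    return {cat: round(100 * counts[cat] / n, 1) for cat in CATEGORIES}


# Hypothetical toy responses, not the study data.
toy = [("wrong", "good"), ("wrong", "mixed"), ("wrong", "bad"), ("right", "good")]
distribution = hv_distribution(toy)
```

Note that under this rule the “no hv” category contains two different kinds of respondents: those who judged the transgression as wrong and attributed negative emotions, and those who judged it as right. This is why the closer look at the “no hv” group is informative.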
similar distributions were found for the no hv category in the “the motorbike” and the “lying to a customer” vignettes.

table 3. adding the deontic judgment to the classical happy victimizer condition across vignettes for situation 1

in a second step, we added the data from the deontic judgment condition to see what the distribution of hv patterns as displayed in table 1 would look like. this meant that we now considered whether participants had also spontaneously said in the deontic judgment that the protagonist should not transgress (i.e., give back the money in the “change money” vignette), representing a more appropriate assessment of moral rule understanding in adults. thus, in the “pure hv” category we now had those participants who had initially (deontic judgment) said that the protagonist should not transgress and who afterwards, when the transgression had been introduced (see footnote 5), said that the transgression was not okay but attributed positive emotions to the protagonist who had transgressed. only 2.2% of participants actually belonged in that category. however, the “mixed hv” category increased, with 64.5% of participants initially saying that the protagonist should not transgress and afterwards judging the transgression as wrong while still attributing mixed (i.e., positive and negative) emotions to the protagonist. the “no hv” category, accordingly, had shrunk to 33.3% (see table 3). a chi square test revealed a significant change of the distribution of the different hv patterns (“pure”, “mixed”, “no”) by judgment condition, that is, classical vs. deontic, χ²(4, 279) = 317.38, p < .001. in a third step, we focused on the justifications participants gave for the positive emotions they attributed to the perpetrator in the classical hv condition. due to the vignette effect reported above, we decided to perform these in-depth analyses for individual vignettes.
we selected the “change money” vignette because the situation depicted there (getting too much change) was closest to participants’ everyday life-experience. table 4 justifications of positive emotions in the change money vignette for the pure and mixed hv categories a these instances were formulated in the negative, for example, having no empathy for the victim. according to the classical findings by nunner-winkler and sodian (1988), positive emotions should be justified by hedonistic reasons. as most participants attributing positive emotions actually attributed mixed emotions, that is, positive and negative emotions (see above), we again included both participants showing the “pure” and participants showing the “mixed” hv pattern. hedonistic reasons were the predominant category used to justify positive emotions (80.9%; see table 4). still, most of the other justification categories were also used in this vignette, though rather infrequently. table 5 specification and justification of emotion attribution “good” the analysis of the specifications and justifications of positive emotions in the “change money” vignette for the pure and mixed hv categories revealed that one and the same emotion attributed, that is “good”, assumed different meanings. to illustrate this finding, three different specifications and justifications of the response category “good” are shown in table 5. thus, “good” meant happiness in response 190 whereas in 275 it expresses the rejection of any concern for the shop assistant. in response 2, “good” assumed the meaning of “feeling comfortable”. to find out whether the various specifications of “good” represented different levels of emotional complexity, we summarised and categorised them on the basis of harris’ (2008) taxonomy of emotions (see also pons, harris, & de rosnay, 2004) (see table 6). 
there is agreement among experts that emotions run at different levels of complexity, for example basic/primary, secondary, and tertiary level emotions (parrott, 2001). all seven levels of complexity were found, with the majority of specifications covering levels 1 and 2, the lowest two levels of complexity.

table 6. emotion specification of the emotion attribution “good”

the same analysis was performed for negative emotions. table 7 displays the summarised emotion specifications for “bad” categorised according to harris (2008). here, seven out of ten levels of complexity were covered. while a large part of specifications covered the lowest three levels, a substantial portion (32.9%) fell on level six, referring to bad conscience and guilt, indicating more complexity and differentiation for specifications of “bad”.

table 7. emotion specification of the emotion attribution “bad”

in a fourth and last step we explored the potential relationship between participants’ hv status assessed in the classical hv condition in the “change money” vignette and the moral values they identified as relevant to themselves. due to the low internal consistency of the moral subscale, separate univariate anovas were performed for each item (value), that is, the degree to which that value was considered important to the self (0=unimportant to 6=important). as cell size was <5 for several cells, the hv categorisation was collapsed into no hv and hv. a significant difference was found only for honest/truthful: participants categorised as hv ascribed more importance to honest/truthful for themselves than participants categorised as no hv (µ_hv = 5.17, sd_hv = .89, n = 47; µ_nohv = 4.58, sd_nohv = 1.25, n = 98; f[1, 98] = 6.94, p = .01, η² = .07).

4. discussion

the present study had two aims. first, we wanted to investigate whether the happy victimizer phenomenon can be found in emerging adults, and if so, whether the explanation used with children, that is, a lack of moral motivation, also applies to emerging adults.
to achieve this, we reconstructed the hv in a step-by-step analytic procedure based on written data from 285 pre-service teachers working through a set of hypothetical vignettes. as our second aim, we wanted to explore the potential relationship between participants’ patterns of moral judgments, emotion attributions, and associated justifications and their commitment to moral values. our stepwise reconstruction of the hv using a developmentally appropriate measurement approach in our sample of emerging adults yielded a number of noteworthy findings. results from our first step indicated that in vignette 1 in the classical hv condition virtually no one (2.2% of participants) judged the transgression as wrong while attributing positive emotions to the perpetrator (i.e., displayed the classical hv reasoning pattern). thus, the classical hv phenomenon hardly ever emerged in our emerging adult sample. however, two fifths were identified as falling into the “mixed” hv category, attributing both positive and negative emotions to the perpetrator while judging the transgression as wrong, confirming earlier findings involving adult samples (heinrichs et al., 2015). hence, this result implies that the classical hv phenomenon can only insufficiently be used to characterise (emerging) adults’ moral functioning in the context of hypothetical vignettes. still, the relatively high percentage of response patterns falling into the mixed hv category indicates that the happy victimizer research paradigm is relevant for the study of moral functioning in emerging adulthood. however, it is necessary to use a measurement approach going beyond the classical assessment procedure to capture the potential complexities of (emerging) adults’ moral functioning. this point will be elaborated on in more detail in the subsequent sections. 
analysing the distribution of patterns in the individual vignettes we found that, while the classical hv pattern was low for all three vignettes, the proportion of participants showing the mixed hv pattern differed across vignettes. it seems that the specific vignette contexts contributed to the interpretation of the respective rule transgressions in the situations depicted. there is substantial earlier research showing that both the situations and contexts involved and the specific moral principles underlying hypothetical vignettes influence the way they are interpreted and judged (cf. nunner-winkler, 2013b). in this regard, our results also confirm earlier findings on the context and situation specificity of moral judgments (krettenauer & johnston, 2011). additionally, analyses indicated that for all three vignettes about half of those participants falling into the “no hv” group said that it was okay for the perpetrator to transgress, for example, to keep the change money in the “change money” vignette. within the classical hv paradigm involving (young) children this would be seen as indicating insufficient rule knowledge or rule understanding (e.g., nunner-winkler & sodian, 1988). in a sample of emerging adults, it would be absurd to think that this is actually the case. moreover, participants said that the rule should be transgressed, for example, that jana should keep the money, in the deontic judgment condition which came before the transgression was introduced in the classical hv condition. similar findings, that is, participants saying that it is okay for the perpetrator to transgress, were reported by heinrichs et al. (2015) and by heinrichs, gutzwiller-helfenfinger et al. (this issue) for studies 2, 3, and 4. in the case of study 2, the sample consisted of 14-year-olds, indicating that such transgression-friendly judgments can already be found in adolescence.
that these transgression-friendly judgments were found also in the deontic judgment and in the self judgment conditions in studies 2, 3, and 4 in heinrichs, gutzwiller-helfenfinger et al. (this issue) further indicates that we cannot assume that this result is due to a methodological artefact relating to the use of the classical hv condition, where a transgression is stated as “fait accompli”. results from our second step show that, across vignettes, when more deeply examining emerging adult participants’ rule knowledge by adding the judgment made in the deontic judgment condition, the proportion of participants identified as belonging in the mixed hv category significantly increased, whereas the no hv category shrank. no change was found for the “pure” hv category. thus, in the “mixed” hv category we now had participants initially saying that the protagonist should not transgress and who afterwards judged the transgression as wrong but still attributed mixed (i.e., positive and negative) emotions to the protagonist. accordingly, we can say that insufficient rule knowledge or rule understanding very probably does not lie at the heart of the “pure” and “mixed” hv reasoning patterns. the third analytical step, addressing the justifications participants gave for the positive emotions they attributed to the perpetrator in the classical hv condition in the “change money” vignette, yielded that participants in the “pure” and “mixed” hv categories predominantly mentioned hedonistic reasons. this is in line with the classical finding by nunner-winkler & sodian (1988) as well as subsequent research. nevertheless, in almost 20% of the cases additional justification categories emerged, indicating that in our emerging adult sample the classical pattern of positive emotion attributions as justified by hedonistic reasons is not the only pattern included in participants’ constructions. 
it seems that those participants’ socio-moral meaning making moved beyond the classical pattern, suggesting that they were able to construct alternative interpretations of why a protagonist feels good after breaking a moral rule. of course, the nature of the vignette may have played a vital role in stimulating these interpretations. in the “change money” vignette, no pro-active, planned rule transgression occurs and no negative duty is violated. instead, the protagonist is thrown into the situation, that is, tempted not to give back some money s/he mistakenly receives. this is reflected in the responses of participants classified as displaying one of the hv patterns in the classical hv condition. for example, id 275 argued that jana feels good about keeping the money “because it is not her fault, her own advantage counts for more than the shop assistant’s stupidity”. justifications such as these reflect specific strategies of moral disengagement (cf. bandura, 2016) which make it possible for an individual to feel good after breaking a moral rule by cognitive reconstruction of the situation: the behaviour or its consequences are reconstructed as less harmful, the individual’s responsibility is denied or weakened, or the victim is blamed or denigrated (e.g., gutzwiller-helfenfinger, 2015a). further, when analysing specifications and justifications of positive (“good”) and negative (“bad”) emotion selections, we found that a broad range of meanings was associated with those specifications. for example, “good” could mean happiness, feeling unconcerned, or feeling comfortable. this differentiation of meanings was further confirmed when we categorised the specifications according to complexity after harris (2008). in the case of positive emotions, all seven levels of complexity were found, with the vast majority of specifications (94.2%) covering the lowest two levels. a different picture emerged for negative emotions.
while almost all levels were used (eight out of ten), and while the majority of specifications belonged to the lowest three levels (59.2%), still 39.4% of specifications related to guilt and mention of a bad conscience, the sixth level of complexity. it seems that for negative emotions, guilty feelings are both salient and relevant in participants’ constructions of the situation. thus, the “change money” vignette, despite its context of a passive moral temptation (no intention to transgress) and its relation to a positive duty (which has a weaker moral appeal than negative duties, e.g., belliotti, 1981; see study 2 in heinrichs, gutzwiller-helfenfinger et al., this issue), still triggers also guilty feelings in participants’ evaluation of the transgression. on the methodological level, our results suggest that it is important to have (emerging adult) participants construct (specify) their emotion attributions, instead of offering a pre-selection without asking for any specification, in order to more closely examine their moral functioning. moreover, emerging adults’ use of also more complex levels of emotions going along with the use of justifications other than hedonism provides evidence that emerging adults’ moral meaning-making is differentiated, complex, and in some cases even sophisticated. this can be seen as an indication of complex reasoning processes taking place when emerging adults evaluated this hypothetical moral vignette. and although the results of their reasoning find expression in written form only, still these written answers are sufficient to reflect the processual nature of participants’ responding. thus, what might look similar for children and emerging adults when considering the surface (judging the transgression as wrong while attributing positive emotions) carries differential depths of understanding. 
accordingly, while the hv framework is still relevant in studying emerging adults’ moral functioning, our results indicate that the patterns found mean something different from the phenomenon as identified in young children. consequently, we argue for using different terms in order to mark this developmentally relevant difference. we suggest reserving the term “happy victimizer phenomenon” for studies with preschool and young school children. based on the findings in this study as well as those from study 2 in heinrichs, gutzwiller-helfenfinger et al. (this issue), which used a similar measurement approach, we suggest using the term “happy victimizer pattern” for adolescents and emerging adults. the second aim of our study was to explore the potential relationship between participants’ patterns of moral judgments, emotion attributions, and associated justifications and their commitment to moral values, the latter representing a moral self. here, results were modest. the poor measurement properties of the ideal self values scale (pratt et al., 2003) as determined from our data did not make it possible to create a moral subscale. it is possible that effects of culture may have influenced the interpretation of these items, as the scale was developed in a us context while we used it in a sample of german pre-service teachers. this calls for a validation of the scale for the european and, more specifically, for the german context. accordingly, analyses could only be performed on the level of individual items. of the six moral items, only the item “honest/truthful” yielded a significant effect: participants categorised as displaying hv patterns (mixed or pure) saw this value as more important for their own life than those categorised as no hv. however, collapsing hv categories due to the small cell size of the pure hv category is not really satisfactory, because the two categories carry different meanings.
in the pure category, only positive emotions were attributed, whereas participants in the mixed category also attributed negative emotions, reflecting their inner struggle to make meaning of the situation. that those participants displaying hv patterns assigned more personal relevance to honesty/truthfulness implies that there is a discrepancy between the abstract importance they assign to this value and the concrete evaluation of a situation where this value becomes relevant. it is possible that, due to the more open nature of the passive moral temptation, these participants were not sufficiently aware that the situation was morally relevant, and that an orientation towards honesty/truthfulness might guide the interpretation of the situation by attributing guilty feelings (i.e., negative emotions) to the rule transgressor. hence, this would raise the issue of moral sensitivity (rest, narvaez, bebeau, and thoma, 1999) and its relation to a moral self (blasi, 1993). in any case, our findings underline the claim by krettenauer et al. (2008) that it is necessary to explore the link between moral emotion expectancies and the development of the moral self empirically. it is probable that moral emotion expectancies are intimately linked to the development of the moral self. to our knowledge, hardly any empirical research has analysed this relationship directly (e.g., see krettenauer, campbell, & hertz, 2013), so the ideas presented here remain largely theoretical. to sum up, our findings suggest that moral emotions play an important role in emerging adults’ evaluations of morally relevant situations and moral rule transgressions.
by identifying new dimensions (i.e., deontic judgment; own action choice; self-constructed emotion attributions) to explain the complexity of moral functioning in emerging adulthood, the current studies contribute to a theoretical (and methodological) framework that integrates both cognitive and emotional processes to bridge the gap between moral thought, emotion, and action (malti & latzko, 2010). second, our results imply that emotions also play a central role in emerging adulthood when it comes to explaining the complexity of moral functioning. a purely cognitive-structural approach as suggested by minnameier (e.g., 2012) is not sufficient to explain our findings. very basically, even if the pattern of judging the transgression as wrong while attributing positive emotions is construed as reflecting a specific substage of moral judgment and thus seen as basically a cognitive phenomenon, the occurrence of mixed emotions and the associated mixed hv pattern cannot be grasped by this approach. the construction or attribution of negative emotions together with positive emotions is not envisaged there. finally, we cannot say that emerging adults displayed the classical hv phenomenon, which in itself would be an indicator that either the developmental transition had not been made or that participants would display a dysfunctional morality. however, many of them showed complex mixed hv reasoning patterns, at least in the context of passive moral temptations, suggesting that emerging adults’ moral functioning often includes internal struggles and ambivalence when thinking about moral issues. we need more research, including longitudinal research, to explore the morality of emerging adults and potential developmental trajectories across adulthood.

4.1 limitations and outlook

there are several limitations to our study. first, we used self-report data to assess participants’ moral functioning.
especially regarding the ideal self values scale, it is possible that, despite the anonymous setting, participants’ answers may have been influenced by social desirability. however, participants’ answers in the hypothetical vignettes included many socially undesirable instances relating to the breach of moral rules in connection with showing no indications of guilt. yet for further studies it would be important to include a measure of social desirability to rule out this possibility and thereby strengthen the internal validity of our measurement. second, including a convenience sample of pre-service teachers implies that we can generalise our findings only to a certain extent. although our results confirm earlier findings, still the data were collected in a rather homogeneous, well-educated university sample. accordingly, at this point we cannot say whether the hv patterns can also be found in the general population of (german) emerging adults. for future studies, we need to include more diverse samples and also assess other relevant personal characteristics like for example socio-economic status, level of education, or migration background in order to more deeply explore (emerging) adult moral functioning and potential mediating or moderating factors. this also means that it is necessary to include further potentially relevant variables associated with moral functioning like empathy, social perspective-taking, moral sensitivity, or interpersonal problem solving. moreover, we need to use also behavioural measures, for example in the context of experimental settings (malti & latzko, 2017) to bridge the gap between moral functioning in hypothetical contexts and actual moral behaviour. our findings have practical implications relating to moral learning and development. 
by using passive moral temptations, we stimulated participants to explore the boundaries of morality, that is, situations where right and wrong are not as clear-cut as for example in situations where a negative duty like stealing, lying, etc. is violated. this resulted in a surprisingly high proportion of participants showing mixed hv reasoning patterns, indicating that our scenarios stimulated participants to think more deeply about the issues raised in the vignettes. accordingly, such materials may be well suited for use with children, adolescents, and emerging adults to stimulate their moral growth. in line with the kohlbergian tradition, we state that it is not the direction of a moral judgment or evaluation per se but the reasoning process which both reflects and stimulates moral growth. we assume that when it comes to the important and often neglected issue of vertical moral development (cf. schuster, 2001), that is, learning to transfer moral principles and reasoning to other domains, contexts, and situations, scenarios including passive moral temptations may be especially fruitful. thus, while we do not offer a «plus one» stimulation in the vygotskyan sense (blatt & kohlberg, 2006) that is, a stimulation based on a higher developmental level, we argue that passive moral temptations may offer a «plus horizon», that is, a vertical, horizon-extending stimulation. as everyday moral situations involve a high level of variation, for example lying to a stranger vs. lying to a friend or stealing out of hunger vs. stealing just for fun, it is important that individuals of all ages are offered multiple opportunities to practice their reasoning skills in a variety of educational settings. these may include well-established approaches like conflict discussions, role-play, creative writing, and so on. the important issue is that shades and gradations of meaning can be explored, reflected upon, and experienced (cf. gutzwiller-helfenfinger, 2015b). 
keypoints

a clear distinction needs to be made between the happy victimizer phenomenon as relating to young children’s, and the happy victimizer pattern as relating to emerging adults’ (and adolescents’) moral functioning, respectively. despite some similarities on the surface level, the respective reasoning (i.e., judgment-justification-emotion attribution-justification) patterns are developmentally distinct.

exploring the link between moral emotion attributions and the commitment to moral values is a promising pathway for further research on (adult) moral functioning.

the assessment of (emerging) adults’ moral reasoning tapping both cognitive and emotional components necessitates the use of developmentally appropriate or sensitive assessment approaches.

our findings emphasise the situation specificity of moral emotion attributions.

further research is needed to explore the potential of educating moral emotions.

footnotes

1 we use hvp as an abbreviation to refer to the happy victimizer phenomenon.
2 we use the term emerging adulthood to refer to our own sample and refer to adolescence, adulthood, and young adulthood when using the terminology of the research cited.
3 it seems that arsenio & kramer (1992) were the first to use this term.
4 in the present paper, the term “condition” refers to the specific form of assessment. for example, the deontic judgment condition refers to the assessment of the deontic judgment.
5 participants had to make a deontic judgment before the transgression was introduced and afterwards had to judge the transgression.

references

arsenio, w. f., gold, j., & adams, e. (2006). children’s conceptions and displays for moral emotions. in m. killen & j. g. smetana (eds.), handbook of moral development (pp. 581–609). mahwah, nj: erlbaum.
arsenio, w. f., & kramer, r. (1992). victimizers and their victims: children’s conceptions of the mixed emotional consequences of moral transgressions. child development, 63(4), 915–927.
doi: 10.2307/1131243 arsenio, w. f., & lover, a. (1995). children’s conceptions of sociomoral affect: happy victimizers, mixed emotions, and other expectancies. in m. killen & d. hart (eds.), morality in everyday life: developmental perspectives (pp. 87–128). cambridge: cambridge university press. barden, c. r., zelko, f. a., duncan, w. s., & masters, j. c. (1980). children’s consensual knowledge about the experiential determinants of emotion. journal of personality and social psychology, 39(5), 968–976. https://doi.org/10.1037/0022-3514.39.5.968 blasi, a. (1984). moral identity: its role in moral functioning. in w. kurtines, & j. gewirtz (eds.), morality, moral behavior and moral development (pp. 128 –139). new york: wiley. blasi, a. (1993). the development of identity: some implications for moral functioning. in g. g. noam & t. e. wren (eds.), the moral self (pp. 99 – 122). cambridge, ma: mit press. blasi, a. (1999). emotions and moral motivation. journal for the theory of social behaviour, 29(1), 1-19. doi:10.1111/1468-5914.00088 bandura, a. (2016). moral disengagement: how people do harm and live with themselves. new york: worth publishers. blatt, m. & kohlberg, l. (2006). the effects of classroom moral discussion upon children's level of moral judgment. journal of moral education, 4(2), 129-161. doi: 10.1080/0305724750040207 campbell, k. m. (2004). moral identity, youth engagement, and discussions with parents and peers . st. catharines, ontario, canada: brock university. damon, w., & hart, d. (1988). self-understanding in childhood and adolescence. cambridge: cambridge university press. döbert, r. & nunner-winkler, g. (1983). moralisches urteilsniveau und verlässlichkeit. die familie als lernumwelt für kognitive und motivationale aspekte des moralischen bewusstseins in der adoleszenz. in g. lind, h.h. hartmann, & r. wakenhut (hrsg.), moralisches urteil und soziale umwelt [moral judgment and social environment] (s. 95-122). weinheim und basel: beltz verlag. 
gasser, l., & keller, m. (2009). are the competent the morally good? perspective taking and moral motivation of children involved in bullying. social development, 18, 798–816. doi: 10.1111/j.1467-9507.2008.00516.x gasser, l., malti, t., & gutzwiller-helfenfinger, e. (2012) aggressive and nonaggressive children's moral judgments and moral emotion attributions in situations involving retaliation and unprovoked aggression. the journal of genetic psychology, 173(4), 417-439. doi: 10.1080/00221325.2011.614650 gibbard, a. (2002). normative and recognitional concepts. philosophy and phenomenological research, 64(1), 151-167. doi: 10.1111/j.1933-1592.2002.tb00148.x gutzwiller-helfenfinger, e. (2015a). moral disengagement and aggression. comments on the special issue. merrill palmer quarterly, 61(1), 192-211. doi: 10.13110/merrpalmquar1982.61.1.0192 gutzwiller-helfenfinger, e. (2015b). die wirkung von erweitertem rollenspiel auf soziale perspektivenübernahme und antisoziales verhalten. in t. malti, & s. perren (hrsg),soziale kompetenz bei kindern und jugendlichen [social competence in children and adolescents] (s. 244-261; 2. überarb. und erw. aufl.). stuttgart: kohlhammer. gutzwiller-helfenfinger, e., & perren, s. (2015). adolescents’ evaluations of passive moral temptations – relations to bully-victim problems. paper presented in the symposium morality and bully-victim problems (chairs: l. kollérova & d. strohmeier). 17th european conference on developmental psychology (ecdp), braga (portugal), september 8-12, 2015. gutzwiller-helfenfinger, e., & perren, s. (2016). the relationship between adolescents’ bully-victim problems and their use of mechanisms of moral disengagement in the context of passive moral temptations. paper presented in the symposium moral disengagement in the production of aggression (chairs: k. runions & e. gutzwiller-helfenfinger). 22nd world meeting of the international society for research on aggression (isra), sydney (australia), july 19-23, 2016. 
gutzwiller-helfenfinger, e., gasser, l., & malti, t. (2010). moral emotions and moral judgments in children’s narratives: comparing real-life and hypothetical transgressions. new directions for child and adolescent development, 129, 11–31. doi: 10.1002/cd.273 harris, p.l. (2008). children’s understanding of emotions. in m. lewis, j. haviland-jones, & l. f. barrett (eds.), handbook of emotions (3rd ed., pp. 320-331). new york: the guilford press. heinrichs, k., minnameier, g., gutzwiller-helfenfinger, e., & latzko, b. (2015). „don’t worry, be happy?" das happy-victimizer-phänomen im berufsund wirtschaftspädagogischen kontext. zeitschrift für berufs-und wirtschaftspädagogik, 111(1), 31–55. keller, i. (2006). moralische motivation bei achtjährigen kindern. zusammenhänge von moralischer motivation mit sozialverhalten und der beliebtheit unter gleichaltrigen (forschungsbericht nr. 3 aus der reihe z-proso). zürich: pädagogisches institut der universität zürich. keller, m., lourenço, o., malti, t., & saalbach, h. (2003). the multifaceted phenomenon of ‘happy victimizers’: a cross-cultural comparison of moral emotions. british journal of developmental psychology, 21 , 1–18. doi: 10.1348/026151003321164582 krettenauer, t. (2012). linking moral emotion attributions with behavior: why “(un)happy victimizers” and “(un)happy moralists” act the way they feel. new directions for youth development, 136, 59-74. doi: 10.1002/yd krettenauer, t., asendorpf, j. b., & nunner-winkler, g. (2013). moral emotion attributions and personality traits as long-term predictors of antisocial conduct in early adulthood: findings from a 20-year longitudinal study. international journal of behavioral development, 37, 192–201. doi: 10.1177/0165025412472409 krettenauer, t., campbell, s., & hertz, s. (2013). moral emotions and the development of the moral self in childhood. european journal of developmental psychology, 10(2), 159-173. doi: 10.1080/17405629.2012.762750 krettenauer, t., & eichler, d. 
(2006). adolescents' self-attributed emotions following a moral transgression: relations with delinquency, confidence in moral judgment, and age. british journal of developmental psychology, 24, 489–506. doi: 10.1348/026151005x50825 krettenauer, t. & johnston, (2011). positively versus negatively charged moral emotion expectancies in adolescence: the role of situational context and the developing moral self. british journal of developmental psychology 29(3), 475-488. doi: 10.1348/026151010x508083 krettenauer, t., malti, t., & sokol, b. (2008). the development of moral emotion expectancies and the happy victimizer phenomenon: a critical review of theory and application. european journal of developmental science, 2, 221–235. doi: 10.3233/dev-2008-2303 krettenauer, t. & montada, l. (2005). entwicklung von moral und verantwortlichkeit. in j. b. asendorpf (hrsg.), soziale, emotionale und persönlichkeitsentwicklung [social, emotional, and personality development] (s. 141-189). göttingen: hogrefe. kroll, j., & egan, e. (2004). psychiatry, moral worry, and the moral emotions. journal of psychiatric practice, 10(6), 352-360. doi: 10.1097/00131746-200411000-00003 lagattuta, k. h. (2005). when you shouldn’t do what you want to do: young children’s understanding of desires, rules, and emotions. child development, 76(3), 713–733. doi: 10.1111/j.1467-8624.2005.00873.x lapsley, d. k., & narvaez, d. (2004). a social-cognitive approach to the moral personality. in, d. k. lapsley & d. narvaez (eds.), moral development, self and identity (pp. 189 212). mahwah, n. j.: erlbaum. malti, t., gasser, l., & buchmann, m. (2009). aggressive and prosocial children’s emotion attributions and moral reasoning. aggressive behavior, 35, 90–102. doi: 10.1002/ab.20289 malti, t., gasser, l., & gutzwiller-helfenfinger, e. (2010). children’s interpretive understanding, moral judgments, and emotion attributions: relations to social behaviour. british journal of developmental psychology, 28, 275–292. 
doi: 10.1348/026151009x403838 malti, t., & keller, m. (2009). the relation of elementary-school children’s externalizing behaviour to emotion attributions, evaluations of consequences, and moral reasoning. european journal of developmental psychology, 6(5), 592–614. malti, t., keller, m., gummerum, m., & buchmann, m. (2009). children’s moral motivation, sympathy, and prosocial behavior. child development, 80(2), 442–460. malti, t., & krettenauer, t. (2013). the relation of moral emotion attributions to prosocial and antisocial behavior: a meta-analysis. child development, 84(2), 397–412. doi: 10.1111/j.1467-8624.2012.01851.x malti, t., & latzko, b. (2010). children's moral emotions and moral cognition: towards an integrative perspective. new directions for child and adolescent development, 129, 1-10. malti, t., & latzko, b. (2017). moral emotions. in j. stein (ed.), reference module on neuroscience and biobehavioral psychology. oxford, uk: elsevier. doi: 10.1016/b978-0-12-809324-5.06491-9 minnameier, g. (2012). a cognitive approach to the ‘happy victimiser’. journal of moral education, 41, 491-508. doi: 10.1080/03057240.2012.700893 minnameier, g., & schmidt, s. (2013). situational moral adjustment and the happy victimizer. european journal of developmental psychology, 10, 253-268. doi: 10.1080/17405629.2013.765797 nunner-winkler, g. (2007). development of moral motivation from childhood to early adulthood. journal of moral education, 36(4), 399–414. https://doi.org/10.1080/03057240701687970 nunner-winkler, g. (2013a). moralische entwicklung. in m. stamm & d. edelmann (eds.), handbuch frühkindliche bildungsforschung [handbook of early childhood educational research] (pp. 653–665). wiesbaden: springer fachmedien. https://doi.org/10.1007/978-3-531-19066-2 nunner-winkler, g. (2013b). moral motivation and the happy victimizer phenomenon. in k. heinrichs, f. oser & t. lovat (eds.), handbook of moral motivation. theories, models, applications (pp. 267-288). 
rotterdam: sense publishers. nunner-winkler, g., & sodian, b. (1988). children’s understanding of moral emotions. child development, 59(5), 1323–1338. doi: 10.2307/1130495 parrott, w. (2001). emotions in social psychology. philadelphia: psychology press. perren, s., & gutzwiller-helfenfinger, e. (2012). cyberbullying and traditional bullying in adolescence: differential roles of moral disengagement, moral emotions, and moral values. european journal of developmental psychology, 9(2), 195-209. doi: 10.1080/17405629.2011.643168 pons, f., harris, p.l., & de rosnay (2004). emotion comprehension between 3 and 11 years: developmental periods and hierarchical organization. european journal of developmental psychology, 1(2), 127–152. doi: 10.1080/17405620344000022 pratt, m. w., hunsberger, b., pancer, s. m., & alisat, s. (2003). a longitudinal analysis of personal values socialization: correlates of a moral self-ideal in late adolescence. social development, 12, 563–585. doi: 10.1111/1467-9507.00249 ruedy, n. e., moore, c., gino, f., & schweitzer, m. e. (2013). the cheater’s high: the unexpected affective benefits of unethical behavior. journal of personality and social psychology, 105(4), 531. schuster, p. (2001). von der theorie zur praxis – wege zur unterrichtspraktischen umsetzung des ansatzes von kohlberg. in w. edelstein, f. oser, & p. schuster (eds.), moralische erziehung in der schule. entwicklungspsychologie und pädagogische praxis [moral development in school. developmental psychology and educational practice] (pp. 177-212). weinheim und basel: beltz. smetana, j. g., toth, s. l., cicchetti, d., bruce, j., kane, p., & daddis, c. (1999). maltreated and nonmaltreated preschoolers’ conceptions of hypothetical and actual moral transgressions. developmental psychology, 35, 269-281. doi: 10.1037/0012-1649.35.1.269 thorkildsen, t. a. (2013). motivation as the readiness to act on moral commitments. in k. heinrichs, f. oser & t. lovat (eds.), handbook of moral motivation.
theories, models, applications (pp. 83-96). rotterdam: sense publishers. woolgar, m., steele, h., steele, m., yabsley, s., & fonagy, p. (2001). children's play narrative responses to hypothetical dilemmas and their awareness of moral emotions. british journal of developmental psychology, 19(1), 115-128.
appendix: hypothetical scenarios used
change money
jana uses her bike every day to go to school. she urgently needs a new tail light. she buys a suitable tail light in a bike shop nearby. she chooses one that costs 32 euros. jana pays cash with a 50-euro bill and, when leaving the shop, notices that the shop assistant gave her 10 euros too much in change. what should jana do (keep the money / return the money)? why? please justify your choice. suppose jana keeps the money. is it okay for jana to keep the money (okay / not okay)? how does she feel (good / mixed feelings [both good and bad], bad)? please name jana’s feelings precisely. please justify your assessment. what would you do in this situation (keep the money / return the money)? how would you feel (good / mixed feelings [both good and bad], bad)? please name your feelings precisely. please justify your assessment.
the motorbike
peter offers his motorbike for sale. he wants to sell it for 800 euros. a young man is interested in the bike. he beats peter down to 700 euros. the two men come to an agreement. however, the young man does not have enough cash on him. but he promises to be back with the money in half an hour. peter says: “agreed, i will wait for you.” a short time afterwards, though, another customer joins peter. he is prepared to pay the 800 euros in cash right on the spot. what should peter do (sell the motorbike to the new customer / wait for the first customer)? why? please justify your choice. suppose peter sells the motorbike to the new customer. is it okay for peter to sell the motorbike to the new customer (okay / not okay)? how does he feel (good / mixed feelings [both good and bad], bad)?
please name peter’s feelings precisely. please justify your assessment. what would you do in this situation (sell the motorbike to the new customer / wait for the first customer)? how would you feel (good / mixed feelings [both good and bad], bad)? please name your feelings precisely. please justify your assessment.
lying to a customer
maria has founded an enterprise in an innovative technology sector. the enterprise is in a critical start-up phase. maria struggles with financial straits and a fluctuating order situation. she has just overcome a slack season. now the business is running smoothly again. she receives an order that needs to be processed at very short notice. maria knows already that she will not be able to meet the deadline and will have to stave off the customer. moreover, when attending a start-up workshop, she happened to learn about a rival company that would be able to process the order both more speedily and reliably. she ponders whether to inform the customer about the rival company or whether to keep the order in her company. what should maria do (inform the customer / not inform the customer)? why? please justify your choice. suppose maria does not inform the customer. is it okay for maria not to inform the customer (okay / not okay)? how does she feel (good / mixed feelings [both good and bad], bad)? please name maria’s feelings precisely. please justify your assessment. what would you do in this situation (inform the customer / not inform the customer)? how would you feel (good / mixed feelings [both good and bad], bad)? please name your feelings precisely. please justify your assessment.
frontline learning research vol. 3 no. 4 (2015) 95-109 issn 2295-3159
training the brain or tending a garden?
students’ metaphors of learning predict self-reported learning patterns
elisabeth wegner, matthias nückles
university of freiburg, germany
article received 18 september / revised 30 november / accepted 6 december / available online 20 january
abstract
conceptions of learning are seen as an important factor in shaping students’ patterns of learning. however, conceptions are often implicit and difficult to assess. metaphors have been proposed as a method to assess conceptions, because metaphors are closely linked to the conceptual system. therefore, in our study we assessed which conceptions of learning are visible in students’ metaphors of learning and examined whether these metaphors predict differences in students’ learning patterns. altogether, n = 91 students of educational science from a german university filled in a questionnaire on their personal metaphors of learning, their learning strategy use, epistemological beliefs, and their motivation. four kinds of metaphors could be differentiated: regulation-related metaphors, learning as knowledge acquisition, learning as problem solving, or as personality development. a discriminant analysis revealed that students with personality development metaphors and with problem solving metaphors were more intrinsically motivated and more aware of the relativism of knowledge than students with regulation-related or knowledge acquisition metaphors. students with personality development metaphors differed from students with problem solving metaphors in their stronger use of deep processing strategies, their lower extrinsic motivation and their stronger rejection of a dualism of knowledge. the study demonstrates that metaphors of learning are a suitable tool for assessing students’ conceptions of learning and gives new insights on using this innovative method as an assessment tool.
keywords: conceptions of learning; metaphors; learning patterns; approaches to learning
corresponding author: dr.
elisabeth wegner, universität freiburg, institut für erziehungswissenschaft, rempartstr. 11, d-79085 freiburg, germany. phone: +49(0)761 / 203 97550, fax: +49(0)761 / 203 2458, email: elisabeth.wegner@ezw.uni-freiburg.de doi: http://dx.doi.org/10.14786/flr.v3i4.212
wegner & nückles | f l r 96
“learning is like rowing against the current. as soon as you stop, you drift back again.” benjamin britten (1913-76)
“the roots of education are bitter, but the fruit is sweet” (aristotle, 384-322 b.c.)
1. introduction
a great number of proverbs tell us in metaphors what learning is like, how learning occurs, and what the benefits of learning are. such metaphorical expressions have received a lot of attention from researchers from domains as diverse as philosophy (black, 1993), cognitive science (gick & holyoak, 1980) or cognitive linguistics (lakoff & johnson, 1980), because metaphors have been identified as being more than a deviation from the ‘normal use’ of language. instead, metaphors are closely linked to the way our conceptual system is structured, thus being one of the basic mechanisms through which we perceive the world (lakoff & johnson, 1980). in the context of cognitively oriented research, conceptual metaphors are usually defined as a situation or an object x that shares a similarity with a situation or an object y (“x is like y”). the situation or object x that is characterized by the metaphor is called the “target”, and the situation or object y that is the medium of comparison, the “source” of the metaphor. because conceptual metaphors are based on the detection of similarities of new experiences with familiar experiences, they help to understand novel information or concepts (gentner & holyoak, 1997, p. 32). for example, britten’s metaphor of learning as rowing against the current helps to convey the importance of learning continuously.
however, metaphors only partially structure an experience, because the target and source of a metaphor never match completely. obviously, the rowing metaphor leaves out important other aspects of learning, such as that learning produces positive outcomes, as in aristotle’s metaphor of education, or that learning requires the learner to link new information to existing knowledge, which becomes visible in a metaphor such as “learning is like weaving a net”. according to lakoff and johnson’s conceptual metaphor theory, the metaphors that are used also feed back into our conceptual systems. for example, the metaphors “time is a resource” and “work is a resource” bring us to the realisation that leisure time is also a resource, thus influencing our concepts of leisure to be perceived as a valuable good that must not be wasted (lakoff & johnson, 1980). thus metaphors act as a lens through which we perceive the world around us. landau, meier and keefer (2010) suggest that metaphors are so fundamental for human thinking, that in order to understand individuals’ actions with regard to abstract social concepts, such as justice, spirituality, or happiness, it is central to look at how individuals structure these concepts metaphorically: “…metaphor is a cognitive tool that people routinely use to interpret and evaluate information related to those abstract concepts. put simply, a metaphor-enriched perspective suggests that a complete account of the meanings people give to abstract, socially relevant concepts requires an understanding not only of their schematic knowledge about those concepts in isolation but also how they structure those concepts in terms of superficially dissimilar, relatively more concrete concepts.” (p.1047) therefore, we assume that metaphors could be an important tool to assess how students structure their concepts of learning. 
the aim of the current study was therefore to assess which kinds of metaphors students use to describe learning and which impact the metaphors have on students’ learning. so far, there is only very little research assessing students’ metaphors of learning. however, we find ample research on students’ conceptions of learning. therefore, we will first outline findings on conceptions of learning and their role for how students learn. afterwards we will elaborate on how metaphors and conceptions might relate to each other and how metaphors have been used to assess conceptions. finally we will present evidence from our study indicating that indeed the metaphors that students use relate to their self-reported learning activities, their motivation and their epistemological beliefs, that is, their beliefs about knowledge and knowing.
1.1 conceptions of learning
conceptions can be defined as an “individual’s personal and therefore variable response to a concept” (entwistle & peterson, 2004, p. 408). conceptions are usually understood as systems of beliefs (e.g. marton & säljö, 1976; richardson, 2007), which act as a filter for cognition (see pajares, 1992). the kind of conception of learning a student holds organizes the student's perception of learning environments, the interpretation of learning tasks, the expectations towards teaching staff and other students, motivation and also the choice of learning strategies (pajares, 1992). early studies on students’ conceptions of learning (säljö, 1979) differentiated between five different conceptions, ranging from reproductive conceptions such as understanding learning as the acquisition of factual information and as memorizing what has been learned, through learning as the application and use of knowledge, to meaning oriented conceptions such as understanding what has been learned and as seeing things in a different way.
according to the phenomenographic perspective, conceptions are understood as qualitatively distinct categories, but “higher” conceptions such as “developing as a person” subsume lower conceptions, such as “acquisition of knowledge”. individuals develop towards more advanced conceptions (e.g. marton & säljö, 1976). other researchers do not assume a developmental order of distinct categories of conceptions (e.g. richardson, 2007). later research focused on how students’ understanding of learning is related to students’ use of learning strategies, their learning motivation and their epistemological beliefs. in this productive area of research, two merging research frameworks can be discerned (vanthournout, donche, gijbels & van petegem, 2014), namely the learning patterns framework (vermunt, 1996; vermunt & vermetten, 2004) and the approaches to learning framework (e.g. entwistle & peterson, 2004; entwistle & ramsden, 1983). both frameworks are based on the assumption that there are, on the one hand, different dimensions of learning on which students individually vary (such as their use of learning strategies, their learning motivation or their self-regulation strategies), but that on the other hand, these dimensions form systematic clusters, which are called learning patterns or approaches to learning. richardson (2011) assumes that students’ conceptions of learning are important for forming these systematic clusters. interestingly, in both research frameworks, we find a pattern that is characterized by an intrinsic interest in studying and in learning contents (deep approach / meaning-directed learning pattern). students with this pattern use deep processing strategies and have a high level of self-regulation, and according to vermunt (1996), this pattern is characterized by a mental model of learning as construction of knowledge.
also, both frameworks describe an opposing pattern in which students have the major intention to cope with course requirements, are externally motivated, see contents as unrelated bits of knowledge and fail to see the meaning or value of the contents. this goes hand in hand with learning strategies that focus on rehearsal and involve little reflection, and also with a feeling of pressure and anxiety (surface approach / reproduction-directed learning pattern). this pattern is based on the mental model of learning as intake of knowledge (vermunt, 1996). both frameworks also describe, apart from these two more or less identical types of students, additional patterns or approaches. within the learning patterns framework, vermunt (1996) describes a type of student with an undirected learning pattern. this pattern is characterized by a lack of regulation, ambivalent motivation, and no identifiable mental model of learning. the other type of student described by vermunt (1996) has an application-directed learning pattern, which is based on the mental model of learning as the use of knowledge, an intrinsic (vocational) orientation and concrete processing of information. the approaches to learning framework additionally includes a strategic approach (biggs, 1987; entwistle, tait, & mccune, 2000). this approach is characterized by a strong motivation to do well in the course and to complete the degree in order to accomplish personal goals. students with a strategic approach organize their studying well, manage their time effectively, and are alert to assessment requirements and criteria (virtanen & lindblom-ylänne, 2010). in a comprehensive review, vermunt and vermetten (2004) found that an undirected pattern/surface approach leads to the worst studying results; the best studying results are yielded by the meaning-directed pattern/deep approach. the reproduction-directed pattern and the application-directed pattern had no clear relation to studying success.
1.2 assessing conceptions of learning
taken together, we can draw from research that conceptions play an important role in shaping students’ learning, and thus have an impact on their studying. however, assessment of conceptions is not as simple as it seems. as we have pointed out above, conceptions are partly implicit and therefore difficult to assess. interviews, which could also be used to assess implicit aspects of conceptions, are time consuming and not suitable for large scale studies. often, questionnaires with dimensional assessment scales such as the inventory of learning styles (ils, vermunt, 1994) are used to assign students to distinct groups by using cluster analysis (e.g. parpala, lindblom-ylänne, komulainen, litmanen, & hirsto, 2010; entwistle & mccune, 2013; richardson, 2007). however, the technique of cluster analysis carries the risk of methodological artefacts, because general answer tendencies might account for correlations between two variables (richardson, 2011). for example, some persons tend to agree rather than to disagree on items (acquiescent response style), whereas others tend to choose extreme response categories on all scales. this can result in clusters not based on differences in the assessed dimensions, but on the general answer tendencies. another problem is that clusters can only be determined post hoc in large samples, but it is difficult to make an individual diagnosis of conceptions. consequently, assessment techniques are needed that can determine conceptions of learning at the level of the individual. given the important role of metaphors for our cognitive system, it comes as no surprise that recently in the area of teacher education and of higher education in general, metaphors have become increasingly popular for assessing implicit constructs such as conceptions (löfström, nevgi, wegner, & karm, 2015).
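the concern about general answer tendencies in questionnaire data can be illustrated with a small simulation. the following is a minimal sketch under stated assumptions, not an analysis from the study: the two scales, the normally distributed shared agreement bias, and the function names `pearson` and `simulate` are all hypothetical and chosen only for illustration. two scales that are unrelated by construction become correlated once every respondent adds the same personal agreement tendency to both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_respondents=2000, bias_sd=1.0, seed=0):
    """Correlation between two scales whose true scores are independent.

    Assumption (illustrative only): each respondent has one agreement
    tendency that is added to the answers on both scales.
    """
    rng = random.Random(seed)
    scale_a, scale_b = [], []
    for _ in range(n_respondents):
        bias = rng.gauss(0.0, bias_sd)  # shared answer tendency
        true_a = rng.gauss(0.0, 1.0)    # true score on scale A
        true_b = rng.gauss(0.0, 1.0)    # true score on scale B, independent of A
        scale_a.append(true_a + bias)
        scale_b.append(true_b + bias)
    return pearson(scale_a, scale_b)


r_without_bias = simulate(bias_sd=0.0)  # no shared tendency
r_with_bias = simulate(bias_sd=1.0)     # shared tendency as large as the true-score spread
print(f"correlation without response style: {r_without_bias:.2f}")
print(f"correlation with response style:    {r_with_bias:.2f}")
```

under these assumptions, the expected correlation with a bias of standard deviation 1 is 1 / (1 + 1) = 0.5, although the true scores are uncorrelated. a cluster analysis run on such data could therefore separate respondents by answer tendency rather than by the assessed dimensions.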
1.3 using metaphors for understanding conceptions
to use metaphors to assess conceptions of learning, we need to take a closer look into how metaphors and conceptions are assumed to relate to each other, and how this has been exploited in research. unfortunately, educational researchers using metaphors often do not explicate which relation between metaphors and conceptions they assume. this is problematic because lakoff and johnson’s cognitive metaphor theory, which is still the most prominent metaphor theory, allows for different assumptions about the relation between metaphors and cognition. murphy (1996) describes a “strong version” of this theory, stating “that some concepts are not understood via their own representations but instead by (metaphoric) reference to a different domain” (p. 201). this would imply that a metaphor of learning is identical to a conception of learning. a person who describes learning as the construction of a skyscraper would then literally have the conception that learning is construction. in contrast, the “weak version” of cognitive metaphor theory assumes that both the source and the target concept of the metaphor are more or less developed separate cognitive structures. under this view, a certain conception is the reason why a person can identify features that are mappings between one's own conception and a certain metaphor (haser, 2005). thus, a person who has the conception of learning as a construction of knowledge would single out identical features between learning and building a skyscraper, but not between learning and eating, and thus prefer to describe learning as building a skyscraper rather than as eating a cake. in research using metaphors for assessing conceptions we can find works based on the “strong” and the “weak” versions of cognitive metaphor theory.
those researchers who are interested in examining the development and change of conceptions, for example in the context of educational development programs (e.g. bullough, 1991; clandinin, 1985), tend to argue on the basis of a strong version of cognitive metaphor theory because they usually assume that changing the metaphor a person uses also leads to a change in the person’s conception. in contrast, researchers using metaphors mainly for assessment of conceptions (e.g. saban, kocbeker & saban, 2007; patchen & crawford, 2011) usually argue on the basis of a weak version of the cognitive metaphor theory, assuming that metaphors help to express or to identify an underlying conception. based on the longstanding tradition of research on conceptions of teaching and learning (e.g. gow & kember, 1993; vermunt & vermetten, 2004), we assume that there are indeed underlying conceptions that are separate from metaphors, and thus would adhere to a weak version of cognitive metaphor theory. we assume that the underlying conception enables or prompts a person to identify structural mappings between one's own conception and the metaphor. two principally different approaches can be discerned in assessing conceptions via metaphors (löfström et al., 2015). on the one hand, researchers themselves generate metaphors and use them as a stimulus for assessing conceptions. for example, some researchers have developed questionnaires in which participants are asked to rate metaphors (e.g., lehmann, 2012). others have asked participants to reflect on preselected written metaphors (e.g., visser-wijnveen, van driel, van der rijst, verloop & visser, 2009) or metaphorical pictures (ben-peretz, mendelson & kron, 2003), and analysed the participants’ responses with regard to the underlying conception. in both cases, the participants mapped preselected metaphors to their own conception.
on the other hand, researchers also asked participants to produce metaphors on their own, and then analysed these metaphors according to their conceptual content. for example, saban et al. (2007) asked more than 1000 students to write down a metaphor on being a teacher and identified six dominant conceptual mappings for the metaphors: knowledge provider, craftsperson, facilitator, nurturer, counsellor and democratic leader. interestingly, these conceptual categories are similar to conceptions of teaching as described in the “teaching perspectives inventory” by collins and pratt (2011), namely, transmission, apprenticeship, developmental, nurturing and social reform. other studies (patchen & crawford, 2011; wegner & nückles, 2015a) classified metaphors based on the two scientific paradigms of learning as acquisition vs. as participation according to sfard (1998). only a few studies focus on students’ metaphors. inbar (1996) asked more than 400 students for metaphors on ‘being a student’ and on ‘teachers’. a great proportion of metaphors was related to feeling imprisoned in school, showing largely negative emotions towards school. marsch (2009) analysed high school students’ metaphors of biology learning. she found that most students conveyed an idea of learning as intake of knowledge. in a longitudinal study with students from educational science, wegner and nückles (2015b) found that students adapted their metaphors of learning to university learning culture in the course of their first year of studying. while in the first year, the most frequently used metaphor of learning was “collecting”, the most frequently used metaphor in the second year described learning as “discovering”. 
even though empirical studies do indicate that different views on teaching or learning are visible in metaphors, and there are theoretical arguments for a close relationship between metaphors and conceptions, there are few studies which really validate whether different metaphors also account for differences in underlying conceptions, and even fewer which validate whether they also account for differences in actual practice. moreover, all of the existing validation studies are case studies with very small samples, or just report data on selected cases illustrating their hypotheses (e.g. mahlios, massengill-shaw, & barry, 2010; bullough, 1991; marsch, 2009; thomas & mcrobbie, 1999). some larger studies link metaphors of teaching to other self-reported data, but not to practice (wegner & nückles, 2015a; löfström & poom-valickis, 2013). thus, there is a need for empirical studies validating whether metaphors of learning can indeed be an indicator for conceptions of learning, and whether students’ metaphors of learning really relate to how students learn in terms of which learning strategies they use and what their motivation is.

1.4 summary and aims of the study

in sum, we can conclude that students’ learning patterns are influenced by students’ individual understanding of what learning is, that is, their conceptions of learning. first evidence from studies on conceptions of teaching indicates that metaphors might also be an appropriate and helpful tool for assessing conceptions of learning, and that differences in metaphors of learning are also associated with differences in students’ learning practice. however, so far there are only a few studies analysing the relation between metaphors and practice, and these studies are based only on small samples.
in our study, we aimed at closing these gaps by (a) exploring whether the different conceptions of learning as they have been described in the literature are also visible in the metaphors that students use to describe learning, and (b) examining whether differences in the conceptual content of the metaphors account for differences in learning practice, such as the use of learning strategies, study motivation, and epistemological beliefs.

2. methods

2.1 participants and procedure

ninety-one students of educational science from a german university took part in the study (78.1% female and 21.9% male, m_age = 23.81 years, sd_age = 3.38). all students were first given a short example of what we meant by metaphor, and were then asked to write down their metaphors of learning. afterwards they filled in questionnaires on learning-related measures. all measurements took place in university courses in the institute of educational science and were set at the beginning of a lesson.

2.2 questionnaires

for assessing learning-related measures, we chose questionnaires on motivation, learning strategies and epistemological beliefs which are well established for german language speakers and which address central aspects of learning patterns and approaches to learning (for an overview of the scales and their reliabilities, see table 1).

table 1
scales of the questionnaires, scale reliability (cronbach’s ɑ), mean values (m), standard deviation (sd) and number of items.

scale | sample item | ɑ | m | sd | no. of items
intrinsic motivation | i don’t need a reward for completing the study tasks because they are fun. | .734 | 4.96 | 0.91 | 5
extrinsic motivation | i will be quite proud when i have completed my degree. | .726 | 5.82 | 0.96 | 3
organisation | i draw tables and graphs in order to structure the contents of the subject. | .791 | 3.68 | 0.61 | 8
elaboration | i try to relate new concepts or theories to familiar concepts or theories. | .742 | 3.67 | 0.55 | 8
critical thinking | i examine whether theories, interpretations or conclusions are sufficiently grounded. | .873 | 3.11 | 0.68 | 8
rehearsal | i re-read my notes again and again. | .827 | 3.14 | 0.75 | 7
metacognitive strategies | before i start with learning, i try to plan which contents i do need to know and which i don’t. | .745 | 3.59 | 0.47 | 11
time management | i schedule time slots for studying. | .897 | 2.99 | 0.95 | 5
learning with others | i work on texts and tasks together with my colleagues. | .849 | 3.41 | 0.77 | 7
relativism | scientific research shows that there is one right answer to most problems. | .635 | 1.69 | 0.40 | 6
dualism | if two scientists have a different opinion on a matter, one of them has to be wrong. | .613 | 1.68 | 0.43 | 4

the use of learning strategies was assessed by seven scales of a german questionnaire (list; wild & schiefele, 1994) which is based on the motivated strategies for learning questionnaire (mslq; pintrich, smith, garcia, & mckeachie, 1993). participants rated on a five-point rating scale how often they engage in certain learning activities (“in the following, we would like to know about how you learn. you will find a list of learning activities. please indicate for each activity, how often it occurs when you are learning. you can rate the frequency between very seldom (1) and very often (5)”). the selected activities addressed four cognitive strategies (organization of contents, elaboration of contents, rehearsal, critical thinking), metacognitive strategies, use of time management strategies and the frequency of learning with others. motivational orientation was assessed by two scales of the intrinsic motivation inventory (imi; deci & ryan, 2003), one on intrinsic motivation and one on the extrinsic value of studying, in a version adapted to the context of higher education.
students were instructed to rate the items on a seven-point rating scale ranging from completely disagree to completely agree (“please indicate for each statement how much you agree. […] these questionnaires are not evaluated! there are no ‘right’ or ‘wrong’ answers.”). epistemological beliefs in general were assessed by a german questionnaire on epistemological beliefs (köller, watermann, trautwein, & lüdtke, 2004). it comprises two dimensions, “dualism” (sample item: “if two scientists have a different opinion on a matter, one of them has to be wrong.”) and “relativism” (sample item: “scientific insights that seem true today can turn out to be wrong”). participants had to rate the statements on a four-point rating scale ranging from totally disagree (= 1) to totally agree (= 4).

2.3 assessment and analysis of metaphors

following saban et al. (2007), students had to complete the prompt “learning is like… because…”. in order to enrich the answers, we added the prompt “the goal of learning is…”. metaphors were analysed following chi’s (1997) recommendations on coding verbal data. two metaphors were excluded from the analysis because they were only fragments. one metaphor as a whole was defined as the unit of analysis, that is, the complete answer consisting of the source and explanation of the metaphor, because sometimes the same source was associated with different kinds of explanations (e.g. “learning is like food: you need it for survival” vs. “learning is like eating food: if you eat too much, you get sick”). we then inductively developed a system of categories within a team of two researchers. all decisions were also discussed within a larger research team, consisting of four researchers in total. as in other studies (e.g.
inbar, 1996; leavy, mcsorley & boté, 2007), we found a large number of metaphors without conceptual content that merely related to aspects of regulating one’s own learning and motivation, such as “learning is like jumping into cold water. usually you don’t want to do it, but once you get started, it’s always good”. therefore, regulation-related metaphors were first separated from other metaphors. in the second step, the remaining metaphors were classified according to the conceptual content. we distinguished three different kinds of metaphors: learning as acquisition of knowledge vs. learning as problem solving vs. learning as development of personality (see table 2). for each category, a short description was written down with examples. then, half of the metaphors (n = 43) were coded by a second independent person. inter-rater reliability as measured by cohen’s κ was very good (κ = .81).

table 2
categories of metaphors, description and anchoring examples for each category of metaphor

category | description | example | n
regulation-related metaphors | the metaphor and its explanation refer to self-regulation aspects and do not contain any information about cognitive processes or further goals in learning. | “learning is like jumping into cold water. usually you don’t want to do it, but once you get started, it’s always good.” / “learning is like climbing a mountain. some hills are steep, and others are easy to walk.” | 25 (28.1%)
acquisition of knowledge | learning consists of the acquisition of something (= knowledge). there is no further indication that the acquired knowledge is used for something. | “learning is like building a library with your own books. you start with one shelf and while you get more and more books you also need more shelves.” / “learning is like solving a jigsaw puzzle … the goal is to solve the jigsaw puzzle and to get the complete picture.” | 27 (30.3%)
problem solving | learning consists of the acquisition of something (= skills and knowledge) which are necessary to solve certain problems, to be prepared for future challenges or to be able to work in a certain job. | “learning is like food – you need it for survival. without it you cannot deal with new problems.” / “learning is like getting a closet with lots of clothes. at the beginning of your life you have only a few pieces of clothes, later you get more and more […]. the goal is to buy, to select, to sort, to categorize the clothes so you can use them and wear them when you need them.” | 14 (15.7%)
development of personality | learning consists of developing something existing further, in order to develop one's own personality or new perspectives. | “learning is like exploring other countries. you get to know new cultures and new perspectives, and you widen your horizon.” / “learning is like a plant that is growing, because you thrive and prosper inside.” | 23 (25.8%)
total | | | 89 (100%)

3. results

conceptions of learning as described in literature were visible in our metaphors. of the four categories of metaphors, knowledge acquisition was the most common (30.3%), followed closely by regulation-related metaphors (28.1%). personality development metaphors were used by 25.8% of the students, and only 15.7% of the students used metaphors which focused on learning as a prerequisite for solving problems (table 3).

in the next step, we determined whether students with different kinds of metaphors differed with regard to their epistemological beliefs, their study motivation, and their learning strategies. an overall manova with type of metaphor as independent measure, and epistemological beliefs, motivation and learning strategies as dependent measures showed a significant multivariate effect of metaphor type, f(33, 231) = 2.31, p < .001, η² = .25 (see table 3 for an overview of the descriptive data for the four kinds of metaphors).

separate univariate anovas revealed significant differences for intrinsic motivation, f(3, 85) = 4.31, p < .01, η² = .13, for dualism, f(3, 85) = 2.78, p < .05, η² = .09, and for the use of rehearsal strategies, f(3, 85) = 4.31, p < .01, η² = .14. students with problem solving and personality development metaphors indicated a higher intrinsic motivation than students with regulation-related or knowledge acquisition metaphors (see table 3). students with personality development metaphors had the lowest scores on the dualism scale, while students with knowledge acquisition metaphors had the highest, indicating that students with knowledge acquisition metaphors believed much more strongly that knowledge is either true or false than students with personality development metaphors. students with knowledge acquisition metaphors also had the strongest tendency to use rehearsal strategies, followed by students with regulation-related and personality development metaphors. students with problem-solving metaphors had the lowest scores on this scale.

table 3
means and standard deviation for study motivation, epistemological beliefs and learning strategies for each group of metaphors (cells show m (sd))

measure | regulation-related | acquisition of knowledge | problem solving | development of personality
intrinsic motivation | 4.70 (0.76) | 4.66 (0.98) | 5.36 (0.90) | 5.33 (0.81)
extrinsic motivation | 5.70 (1.11) | 5.91 (0.89) | 6.11 (0.66) | 5.70 (1.03)
relativism | 1.77 (0.42) | 1.80 (0.44) | 1.63 (0.42) | 1.52 (0.29)
dualism | 1.66 (0.37) | 1.82 (0.46) | 1.75 (0.38) | 1.49 (0.43)
critical thinking | 3.06 (0.75) | 3.01 (0.58) | 3.01 (0.71) | 3.35 (0.71)
learning with others | 3.42 (0.68) | 3.50 (0.86) | 3.00 (0.62) | 3.58 (0.80)
elaboration | 3.70 (0.53) | 3.69 (0.54) | 3.57 (0.48) | 3.68 (0.65)
organisation | 3.53 (0.37) | 3.80 (0.58) | 3.68 (0.83) | 3.71 (0.71)
rehearsal | 3.05 (0.75) | 3.52 (0.57) | 2.70 (0.82) | 3.06 (0.75)
metacognitive strategies | 3.57 (0.43) | 3.75 (0.55) | 3.43 (0.46) | 3.52 (0.42)
time management | 2.73 (0.78) | 3.29 (0.95) | 3.30 (0.92) | 2.75 (1.06)

to better understand the overall differences between the groups, and the patterns of motivation, epistemology and learning strategies for each group, we performed a discriminant analysis with epistemological beliefs, motivation, and learning strategies as predictors and the kind of metaphor as criterion. it resulted in three discriminant functions. the first discriminant function explained more than half of the variance (56.8%, canonical r² = .38); the second discriminant function explained one third of the variance (33.2%, canonical r² = .26). the third discriminant function explained the remaining 9.9% of the variance (canonical r² = .09). together, the three functions significantly differentiated between the metaphor types (wilks’ λ = .41, χ²(33) = 71.89, p < .001).
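the effect sizes and discriminant statistics reported in this section are linked by standard identities, which allows a rough plausibility check. the following python sketch is an illustration only and not part of the reported analysis; the function names are hypothetical. it assumes (a) a one-way anova, for which η² = f·df1 / (f·df1 + df2), and (b) that the eigenvalue of each discriminant function equals r² / (1 − r²) for its canonical correlation, so that wilks' λ is the product of the (1 − r²) terms. because the published canonical r² values are rounded to two decimals, the results can only approximate the reported figures.

```python
from math import prod


def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared of a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f_value * df_between) / (f_value * df_between + df_within)


def wilks_lambda(canonical_r2):
    """Wilks' lambda for a set of discriminant functions: product of (1 - R^2)."""
    return prod(1.0 - r2 for r2 in canonical_r2)


def variance_shares(canonical_r2):
    """Share of explained variance per function, based on eigenvalues R^2 / (1 - R^2)."""
    eigenvalues = [r2 / (1.0 - r2) for r2 in canonical_r2]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Inputs copied from the text; the canonical R^2 values are rounded as published.
canonical_r2 = [0.38, 0.26, 0.09]

print(round(eta_squared_from_f(4.31, 3, 85), 2))  # intrinsic motivation, F(3, 85) = 4.31
print(round(eta_squared_from_f(2.78, 3, 85), 2))  # dualism, F(3, 85) = 2.78
print(round(wilks_lambda(canonical_r2), 2))       # all three functions together
print(round(wilks_lambda(canonical_r2[1:]), 2))   # after removing the first function
print([round(100 * s, 1) for s in variance_shares(canonical_r2)])
```

with these rounded inputs, the sketch gives η² ≈ .13 and .09, λ ≈ .42 and .67, and variance shares of about 57.6%, 33.0% and 9.3%. these values are close to, but not identical with, the reported ones; the remaining gaps are plausibly due to the rounding of the canonical r² values.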
after removing the first function, the remaining two functions still contributed significantly to the classification of the metaphors (wilks’ λ = .66, χ²(20) = 33.12, p = .03). however, the last function on its own could not differentiate between the metaphors. figure 1 shows the distribution of the four metaphors along the two separating functions. correlations of the predicting variables with each canonical discriminant function are given in table 4.

a closer look at the discriminant functions revealed that the first function separated the students with regulation-related and with knowledge acquisition metaphors from the students with personality development and problem solving metaphors, whereas the second function mainly separated the students with problem solving metaphors from the students with personality development metaphors (see fig. 1). the first function was associated with high beliefs in the certainty of knowledge (i.e., low relativism) and with low intrinsic motivation (see table 4 and fig. 2), thus indicating that students with knowledge acquisition and regulation-related metaphors were less intrinsically motivated and believed to a higher extent that knowledge is certain and unambiguous. the second function correlated positively with extrinsic motivation and the belief in the dualism of knowledge, and negatively with a preference for critical thinking, for learning with other students and for elaboration of contents (see table 4). this indicates that students with problem solving metaphors were more extrinsically motivated, believed more strongly that knowledge is either wrong or right, and were less inclined to think critically about the contents or to discuss them with colleagues than students with personality development metaphors.

figure 1. plot of the group centroids of the four metaphors with regard to the two discriminant functions.
function 1 separates problem solving and personality development metaphors from the regulation-related and knowledge acquisition metaphors. function 2 separates personality development from the problem solving metaphors.

table 4
correlations between discriminant functions and the predicting variables. an asterisk marks the highest correlating function for each predictor variable.

predictor | function 1: intrinsic motivation (-) and variability of knowledge | function 2: extrinsic motivation, deep processing (-), dualism | function 3: structured learning
intrinsic motivation | -.467* | -.144 | .191
relativism (general certainty beliefs) | .299* | .250 | -.198
dualism (beliefs in simple knowledge) | .179 | .460* | .140
learning with others | .172 | -.316* | .263
critical thinking | -.107 | -.309* | .140
extrinsic motivation | -.068 | .253* | .149
elaboration | .062 | -.097* | -.078
rehearsal | .424 | -.003 | .702*
organisation | -.008 | .051 | .475*
time management | -.002 | .421 | .461*
metacognitive strategies | .269 | .060 | .357*

the last canonical discriminant function helped to differentiate the four groups only together with the second function. on this function, we found high loadings of measures indicating structured learning, such as the strategies of organization and rehearsal, metacognitive strategies and time management (see fig. 2). the function differentiated between regulation-related metaphors and knowledge acquisition metaphors, with students with knowledge acquisition metaphors showing more use of structured learning than students with regulation-related metaphors, that is, students with metaphors which just focus on aspects relating to the regulation of their learning or their motivation rather than on the results or the process of learning. however, as noted above, the third function could not discriminate between the groups on its own.

figure 2. z-standardized mean values for each of the metaphor categories.
the variables of the first function are printed in black/bold (discriminating between the regulation-related and knowledge acquisition metaphors on the one hand, and the problem-solving and the personality development metaphors on the other hand). variables of the second function are given in hatched/italics (discriminating between problem solving metaphors and personality development metaphors).

4. discussion and conclusion

in our study, we could distinguish four kinds of metaphors of learning, namely metaphors focusing on regulation aspects of learning, metaphors expressing the idea of learning as knowledge acquisition and the idea of learning as personality development, and metaphors focusing on learning as a prerequisite for problem solving. students’ metaphors of learning predicted different patterns of motivation, epistemology and use of learning strategies. students with problem solving and with personality development metaphors differed in their intrinsic motivation and their awareness of the tentativeness of knowledge from students with knowledge acquisition and with regulation-related metaphors. students with personality development metaphors could be separated from students with problem solving metaphors by their use of deep processing strategies, their belief in the dualism of knowledge and their extrinsic motivation. finally, students with knowledge acquisition metaphors had a tendency to engage more in structured learning activities than students with regulation-related metaphors, though not significantly so.

metaphors of learning predicted study motivation, epistemological beliefs and learning strategies. this implies that metaphors can be used to detect differences in conceptions of learning. the predicted learning patterns mirror in some respects both the learning patterns and the approaches to learning model.
personality development metaphors seem to predict a meaning-directed learning pattern or a deep approach, because students with personality development metaphors displayed a high intrinsic study motivation and a high awareness of the tentativeness and the complexity of knowledge, and reported making much use of deep processing strategies. this finding confirms results from entwistle and mccune (2013), who found that there is a certain group of students that have a ‘disposition to understand for oneself’. this disposition seems to be based on the view of learning as development of personality.

students with problem solving metaphors have similarities with students with an application-directed learning pattern as described by vermunt (1996), because the application-directed mental model of learning is based on the use of knowledge as well. problem-solving metaphors were also associated with strong extrinsic motivation for studying; unlike students with the application-directed learning pattern, however, students with these metaphors were also more intrinsically motivated. on the other hand, students with problem-solving metaphors made only average use of concrete processing strategies such as elaboration of contents, which would have been expected in an application-directed learning pattern.

knowledge acquisition metaphors seem to be similar to vermunt’s rehearsal-directed learning pattern, because they are also characterized by a mental model of intake of knowledge. like students with a rehearsal-directed learning pattern, students with knowledge acquisition metaphors had an extrinsic study motivation and believed in the stability of knowledge. however, unlike students with a rehearsal-directed learning pattern, students with knowledge acquisition metaphors in our sample also described the use of deep learning strategies and structured their learning activities strongly.
in this respect, they seem more similar to the strategic approach described by biggs (1987), which is characterized by good organization, good time management, and alertness to the assessment requirements and criteria. this would indicate that acquisition and elaboration of knowledge are seen as the dominant requirement within the degree under consideration.

finally, regulation-related metaphors have a great overlap with the undirected learning pattern. the metaphors did not convey a mental model of learning, and the students had little intrinsic motivation and did not engage in deep or structured learning activities. this is interesting in several respects. on the one hand, these findings mirror those of other studies in which participants described metaphors with no apparent match to conceptions of teaching or learning. for example, in inbar's (1996) study with high school students, most metaphors were related to emotional aspects of learning and did not reveal anything about underlying conceptions of learning. similarly, in their study on teacher candidates’ metaphors of teaching, leavy et al. (2007) report a great number of metaphors that did “not refer to components central to the practice of teaching, but referred to what teaching meant to the individuals themselves (e.g. ‘teaching is like running a marathon; you train, sweat, and prepare for this great race but once you’re in it, you just keep going strong until the end’)” (p. 1226). in one group of the sample, 30% of the metaphors were ‘self-referential’. such self-referential metaphors can be found in many studies using metaphors (e.g. leavy et al., 2007; zapata & lacorte, 2007; löfström & poom-valickis, 2013). findings from our study might be a first indicator that the participants who use such self-referential, emotional or motivational metaphors have not yet developed a differentiated, explicable conception which can be communicated by a metaphor.
considering the unorganized use of learning strategies in this group, the finding could be interpreted to mean that the lack of an elaborated conception of learning is a major problem for developing adequate learning strategies. consequently, for these students the challenges of self-regulation are the most distinct experience of learning. however, further research is needed to confirm this hypothesis.

of course, some limitations have to be borne in mind. our study only assessed self-report data on participants’ use of learning strategies in general. therefore, we do not know whether students’ answers reflect their actual practice of learning or what they think they should do. also, course requirements, which strongly influence how students actually learn, need to be considered (vermetten, lodewijks & vermunt, 1999). however, if students were biased in their answers on their learning strategy use, the differences in self-report data between the four kinds of metaphors indicate at least that students differ in what they think is a socially desirable answer. another limitation is that we assessed metaphors just in one context at one point in time. so we cannot draw conclusions about whether metaphors are stable across contexts or over time, as conceptions would be. nevertheless, metaphors seem to be a promising research tool which should receive further attention in research on conceptions of learning, because it seems indeed to matter whether students see learning as a matter of training their brains or tending their gardens.

keypoints

students’ “metaphors of learning” discriminated between different profiles of motivation, epistemological beliefs and use of learning strategies.
different categories of metaphors could be linked to both learning patterns and approaches to learning.
students describing learning in terms of personality development shared similarities with deep approach learners and meaning-directed learners.
students focussing in their metaphors on only the regulation aspects of learning shared similarities with undirected learners.
students describing learning in terms of knowledge acquisition shared similarities either with rehearsal-directed learners or with learners with a strategic approach.

references

biggs, j. b. (1987). student approaches to learning and studying. research monograph. australian council for educational research ltd., radford house, frederick st., hawthorn 3122, australia.
ben-peretz, m., mendelson, n., & kron, f. w. (2003). how teachers in different educational contexts view their roles. teaching and teacher education, 19(2), 277–290. doi:10.1016/s0742-051x(02)00100-2
black, m. (1993). more about metaphor. in a. ortony (ed.), metaphor and thought (2nd edition) (pp. 19–41).
bullough, r. v. (1991). exploring personal teaching metaphors in preservice teacher education. journal of teacher education, 42(1), 43–51. doi:10.1177/002248719104200107
chi, m. t. h. (1997). quantifying qualitative analyses of verbal data: a practical guide. the journal of the learning sciences, 6(3), 271–315. doi:10.1207/s15327809jls0603_1
clandinin, d. j. (1985). personal practical knowledge: a study of teachers' classroom images. curriculum inquiry, 15(4), 361–385. doi:10.2307/1179683
collins, j. b., & pratt, d. d. (2011). the teaching perspectives inventory at 10 years and 100,000 respondents: reliability and validity of a teacher self-report inventory. adult education quarterly, 61(4), 358–375. doi:10.1177/0741713610392763
deci, e. l. & ryan, r. m. (2003). intrinsic motivation inventory (imi). retrieved in oct. 2012 from http://www.selfdeterminationtheory.org/questionnaires/
entwistle, n. j., & peterson, e. r. (2004). conceptions of learning and knowledge in higher education: relationships with study behaviour and influences of learning environments. international journal of educational research, 41(6), 407–428. doi:10.1016/j.ijer.2005.08.009
entwistle, n. j., & ramsden, p. (1983). understanding student learning. london and canberra: croom helm.
entwistle, n., & mccune, v. (2013). the disposition to understand for oneself at university: integrating learning processes with motivation and metacognition. british journal of educational psychology, 83(2), 267–279. doi:10.1111/bjep.12010
entwistle, n., tait, h., & mccune, v. (2000). patterns of response to an approaches to studying inventory across contrasting groups and contexts. european journal of psychology of education, 15(1), 33–48. doi:10.1007/bf03173165
gick, m. l., & holyoak, k. j. (1980). analogical problem solving. cognitive psychology, 12(3), 306–355. doi:10.1016/0010-0285(80)90013-4
gentner, d., & holyoak, k. j. (1997). reasoning and learning by analogy: introduction. american psychologist, 52(1), 32–34. doi:10.1037/0003-066x.52.1.32
gow, l., & kember, d. (1993). conceptions of teaching and their relationship to student learning. british journal of educational psychology, 63(1), 20–23. doi:10.1111/j.2044-8279.1993.tb01039.x
haser, v. (2005). metaphor, metonymy, and experientialist philosophy: challenging cognitive semantics (vol. 49). walter de gruyter.
inbar, d. e. (1996). the free educational prison: metaphors and images. educational research, 38(1), 77–92. doi:10.1080/0013188960380106
köller, o., watermann, r., trautwein, u. & lüdtke, o. (2004). wege zur hochschulreife in baden-württemberg: tosca – eine untersuchung an allgemein bildenden und beruflichen gymnasien. opladen: leske + budrich.
lakoff, g., & johnson, m. (1980). conceptual metaphor in everyday language. the journal of philosophy, 77(8), 453–486. doi:10.2307/2025464
landau, m. j., meier, b. p., & keefer, l. a. (2010). a metaphor-enriched social cognition. psychological bulletin, 136(6), 1045–1067. doi:10.1037/a0020970
leavy, a. m., mcsorley, f. a., & boté, l. a. (2007). an examination of what metaphor construction reveals about the evolution of preservice teachers’ beliefs about teaching and learning. teaching and teacher education, 23(7), 1217–1233. doi:10.1016/j.tate.2006.07.016
lehmann, b. (2012). entwicklung eines instruments zur erfassung unterrichtsbezogener metaphern [development of an instrument for assessing teaching-related metaphors]. in: faßhauer, u., fürstenau, b., wuttke, e. (eds.): berufs- und wirtschaftspädagogische analysen – aktuelle forschungen zur beruflichen bildung [analyses in pedagogy of economics and vocation – current research in vocational training] (pp. 127–139). opladen: leske + budrich.
löfström, e., nevgi, a., wegner, e., & karm, m. (2015). images in research on teaching and learning in higher education. in: j. huisman & m. tight (eds.), theory and method in higher education research, volume 1 (pp. 191–212). emerald group publishing limited.
löfström, e., & poom-valickis, k. (2013). beliefs about teaching: persistent or malleable? a longitudinal study of prospective student teachers' beliefs. teaching and teacher education, 35, 104–113. doi:10.1016/j.tate.2013.06.004
mahlios, m., massengill-shaw, d., & barry, a. (2010). making sense of teaching through metaphors: a review across three studies. teachers and teaching: theory and practice, 16(1), 49–71. doi:10.1080/13540600903475645
marsch, s. (2009). metaphern des lehrens und lernens: vom denken, reden und handeln bei biologielehrern [metaphors of teaching and learning: thinking, talking and practice of biology teachers] (dissertation). freie universität berlin, berlin.
marton, f., & säljö, r. (1976). on qualitative differences in learning: i – outcome and process. british journal of educational psychology, 46(1), 4–11. doi:10.1111/j.2044-8279.1976.tb02980.x
murphy, g. l. (1996). on metaphoric representation. cognition, 60(2), 173–204. doi:10.1016/0010-0277(96)00711-1
pajares, f. m. (1992). teachers' beliefs and educational research: cleaning up a messy construct. review of educational research, 62(3), 307. doi:10.3102/00346543062003307
parpala, a., lindblom-ylänne, s., komulainen, e., litmanen, t., & hirsto, l. (2010). students' approaches to learning and their experiences of the teaching–learning environment in different disciplines. british journal of educational psychology, 80(2), 269–282. doi:10.1348/000709909x476946
patchen, t., & crawford, t. (2011). from gardeners to tour guides: the epistemological struggle revealed in teacher-generated metaphors of teaching. journal of teacher education, 62(3), 286–298. doi:10.1177/0022487110396716
pintrich, p. r., smith, d. a. f., garcia, t., & mckeachie, w. j. (1993). reliability and predictive validity of the motivated strategies for learning questionnaire (mslq). educational and psychological measurement, 53(3), 801–813. doi:10.1177/0013164493053003024
richardson, j. t. e. (2007). mental models of learning in distance education. british journal of educational psychology, 77(2), 253–270. doi:10.1348/000709906x110557
richardson, j. t. e. (2011). approaches to studying, conceptions of learning and learning styles in higher education. learning and individual differences, 21(3), 288–293. doi:10.1016/j.lindif.2010.11.015
saban, a., kocbeker, b. n., & saban, a. (2007). prospective teachers' conceptions of teaching and learning revealed through metaphor analysis. learning and instruction, 17(2), 123–139. doi:10.1016/j.learninginstruc.2007.01.003
säljö, r. (1979). learning about learning. higher education, 8(4), 443–451. doi:10.1007/bf01680533
sfard, a. (1998). on two metaphors for learning and the dangers of choosing just one. educational researcher, 27(2), 4–13. doi:10.3102/0013189x027002004
thomas, g. p., & mcrobbie, c. j. (1999). using metaphor to probe students' conceptions of chemistry learning. international journal of science education, 21(6), 667–685. doi:10.1080/095006999290507
vanthournout, g., donche, v., gijbels, d., & van petegem, p. (2014). (dis)similarities in research on learning approaches and learning patterns. in d. gijbels, v. donche, j. t. e. richardson, & j. d. vermunt (eds.), learning patterns in higher education: dimensions and research perspectives (pp. 11–32). new york, oxford: routledge.
vermunt, j. d. (1994). scoring key for the inventory of learning styles (ils) in higher education. tilburg: tilburg university.
vermunt, j. d. (1996). metacognitive, cognitive and affective aspects of learning styles and strategies: a phenomenographic analysis. higher education, 31(1), 25–50. doi:10.2307/3447707
vermunt, j. d., & vermetten, y. (2004). patterns in student learning: relationships between learning strategies, conceptions of learning, and learning orientations. educational psychology review, 16(4), 359–384. doi:10.1007/s10648-004-0005-y
vermetten, y. j., lodewijks, h. g., & vermunt, j. d. (1999). consistency and variability of learning strategies in different university courses. higher education, 37(1), 1–21. doi:10.1023/a:1003573727713
virtanen, v., & lindblom-ylänne, s. (2010). university students’ and teachers’ conceptions of teaching and learning in the biosciences. instructional science, 38(4), 355–370. doi:10.1007/s11251-008-9088-z
visser-wijnveen, g., van driel, j., van der rijst, r., verloop, n., & visser, a. (2009). the relationship between academics' conceptions of knowledge, research and teaching – a metaphor study. teaching in higher education, 14(6), 673–686. doi:10.1080/13562510903315340
wegner, e., & nückles, m. (2015a). knowledge acquisition or participation in communities of practice? academics’ metaphors of teaching and learning at the university. studies in higher education, 40(4), 624–643. doi:10.1080/03075079.2013.842213
wegner, e. & nückles, m. (2015b). from eating to discovering: how metaphors of learning change during students’ enculturation.
zeitschrift für hochschulentwicklung, 10 (4), 145-166. wild, k.-p., & schiefele, u. (1994). lernstrategien im studium: ergebnisse zur faktorenstruktur und reliabilität eines neuen fragebogens: [learning strategies of university students: factor structure and reliability of a new questionnaire.]. zeitschrift für differentielle und diagnostische psychologie, 15(4), 185–200. zapata, g. c., & lacorte, m. (2007). preservice and inservice instructors' metaphorical constructions of second language teachers. foreign language annals, 40(3), 521–534. doi:10.1111/j.19449720.2007.tb02873.x stahl publication frontline learning research vol.7 no. 3 (2019) 27 63 issn 2295-3159 epistemic beliefs and googling tore ståhl a aarcada university of applied sciences helsinki; university of tampere, finland. article received 26 september 2018 / revised 4 april / accepted 5 july / available online 18 july abstract with the introduction of internet as a source of information, parents have observed youngsters’ tendency to prefer internet as a source, and almost a reluctance to learn in advance since “you can look it up when needed”. questions arise, such as ‘are these phenomena symptoms of changing beliefs about knowledge and learning? is it at all possible to learn on a deeper level simply by looking up the basic facts, without memorizing them?’ within an existing line of investigation, epistemic beliefs have been described as a set of dimensions. although internet-based information and internet as a source of information have been acknowledged, studies so far have not explored how dealing with internet-based information relates to other epistemic beliefs dimensions. to capture how users view internet-based information per se but also in relation to other epistemic beliefs, i suggest three new dimensions, out of which the most crucial is labelled ‘internet reliance’. 
offloading memory using memory aids is not a new phenomenon but the ‘internet reliance’ dimension indicates that especially internet-reliant users may be confusing external information with personal knowledge, with all the risks it may entail. besides including beliefs about learning, this study also challenges earlier assumptions regarding uncorrelated dimensions.

keywords: epistemic beliefs; internet; constructivism; outsourcing knowledge; factor analysis

corresponding author: tore.stahl@arcada.fi
doi: 10.14786/flr.v7i3.417

1. introduction and aim of study

during the last decade, most people will have heard youngsters respond to a question with the acronyms jfgi or giyf (“just f…g google it” and “google is your friend”, see https://en.wiktionary.org/wiki/jfgi). for most adults, expecting a proper answer, this response was surprising, puzzling and perhaps even offensive. the response is, however, an illustration of the gap between the parent generation’s “you should know this”-view on knowledge, and the young generation’s stance “i’ll look it up when i need it”. with the introduction of easy and ubiquitous access to information over internet, the attitude of looking it up when one needs it became common, especially among frequent internet-users.

given that the young generation born after the mid 1980’s grew up surrounded by information and communications technologies (hereafter ict), the interesting question is, has the easy and ubiquitous access to information actually influenced their view on knowledge, knowing and learning? during the first decade of this millennium, the so-called digital natives of the net generation were supposed to hold characteristics such as being constantly on-line, being ict savvy and being at home on social media (e.g. prensky, 2001; siemens, 2005).
indeed, the youngsters differ from their parent generation in that they lack a personal history of the time before mobile phones, internet and search engines (gunter, rowlands, & nicholas, 2009, p. 3), not to mention smart phones. large parts of the youngsters within this cohort embrace the opportunities provided by ict, e.g. preferring internet-based information instead of books (cf. osf, 2010; purcell et al., 2012, p. 4). still, several studies have pointed out the heterogeneity within the generation (cf. jones & hosein, 2010; van den beemt, akkerman, & simons, 2011). also among the students participating in the present study, large differences occurred regarding both self-reported ict and media use patterns and performance-based ict skills (ståhl, 2017). within education, the easy and ubiquitous access to information raises concerns about how and upon which information students build their knowledge, since they seem to accept the veracity of on-line information too easily, and lack the skills of thinking critically and synthesizing the information found on-line (purcell et al., 2012, pp. 26-27). the vast popularity of search engines (with covert operating logics) in combination with users’ lacking critique has considerable epistemic implications, as demonstrated in the theoretical work and the studies cited below (section knowledge and information in the internet era). the present study will build upon the above studies that confirm the existence of the jfgi phenomenon. existing self-report instruments for measuring epistemic beliefs are not capable of capturing signs indicating internet-induced changes in the views of knowledge and learning. especially the digital natives’ ways of dealing with knowledge and learning have been described in literature (some examples in section hypothesized dimensions) but so far, this topic has been scarcely approached from an epistemic point of view. this topic calls for empirical investigation, which requires instruments. 
this paper will describe how the existing dimensions (structure and certainty of knowledge, innate learning ability and omniscient authority) are extended with the new dimensions constructivist approach, internet reliance and learning by dialogue. creating a validated instrument requires more than one round and therefore, the aim of this endeavour is an initial exploration of how new dimensions might contribute to a better description of how today’s higher education learners in an internet-saturated context view knowledge and learning. contemporary research regarding epistemic beliefs largely subscribes to epistemic beliefs being limited to beliefs about knowledge, and not about learning. the present study will deviate from this view by also exploring views about learning. in doing so, this study contributes to the discussion by looking beyond the knowledge dimensions of epistemic beliefs, and by describing the connection between beliefs about knowledge and beliefs about learning, a connection that is necessary to illuminate consequences for educational practice.

2. personal knowledge, external information

to provide a rationale for the present study, this section will 1) review some studies regarding knowledge, information and epistemic beliefs in the internet era, 2) review epistemic beliefs as a research area, 3) review some arguments regarding learning as part of epistemic beliefs, and 4) discuss why domain specificity and justification of knowledge were omitted from the study at this stage.

2.1 knowledge and information in the internet era

george siemens tried to grasp the impact of technology and the decreasing half-life of knowledge by introducing connectivism as a new learning theory for the digital age. he suggested supplementing the existing forms of propositional (knowing-that) and procedural (knowing-how) knowledge with ‘knowing-where’ and ‘knowing-who’, i.e. an understanding of where to find knowledge.
according to siemens, since we cannot experience everything or store all knowledge ourselves, we store knowledge in other people and in non-human appliances. the key is connectedness, and the knowledge is distributed (downes, 2007, p. 84; siemens, 2005). connectivism was apparently neither a learning nor a knowledge theory but rather a pedagogical view but still, the connectivist ideas resemble the concept of distributed mind, which suggests that knowledge can reside in people, in tools, and in cultural settings, and that the potential lies in the combination of those (cf. shaffer & clinton, 2006). the results of an experimental study by sparrow and her team suggest that internet has become a kind of extension to our individual memory system. if the net is available, we do not bother to memorize the information itself but rather, where to find the information, as when youngsters respond: “jfgi!” we are becoming increasingly symbiotic with our computer-based tools, growing into interconnected systems that remember less by knowing information than by knowing where to find the information. (sparrow, liu, & wegner, 2011) the concept of the extended mind (clark & chalmers, 1998) suggests that human cognition may extend beyond the brain and include elements from social and technological environments (cf. siemens, 2005). applying the concept to the context of the web opens up for the concept of the web-extended mind, which includes the idea that “… the informational and technological elements of the web can, at least on occasion, constitute part of the material supervenience base for (at least some of) a human agent’s mental states and processes” (smart, 2012, p. 451). the mere existence of the web does not automatically make it part of a person’s extended mind but in addition, three criteria need to be met: the availability criterion, the trust criterion and the accessibility criterion (clark & chalmers, 1998; smart, 2012). 
considering the development that has taken place within the web and smart phone contexts since smart wrote his article, we have reason to suspect that users often regard these criteria as met, and too easily incorporate on-line information into their personal body of knowledge: due to internet-capable smartphones, the availability and the accessibility criteria are easily met. the problematic part is the trust criterion: on-line information is too easily endorsed and too rarely subject to critical scrutiny (purcell, brenner, & rainie, 2012, pp. 10-11). this is especially problematic since e.g. google made personalized search the default option for all users in 2009 (simpson, 2012, p. 437). the personalization of search results performed by search engines means that the results are tailored to what will probably interest the enquirer, and that those hits that do not fit the enquirer’s profile are ranked down or even omitted. according to thomas simpson (2012), the epistemic significance of search engines lies in their acting as surrogate experts, firstly as they assist the enquirer in finding sources and secondly as they orient the enquirer to supposedly relevant sources of information (the expert role is also discussed by fisher, goddu, & keil, 2015, below). the problematic aspect here is that by filtering and ranking the results, the search engine implies a judgment about what is relevant, without the enquirer having either insight into, or the possibility to influence, the criteria for judgement. as simpson (2012, p. 427) puts it: “… objectivity may require telling enquirers what they do not want to hear, or are not immediately interested in” (my emphasis) (also see hinman, 2008). therefore, simpson regards personalization as an actual threat to objectivity.
by leaving out relevant voices, the tailored search results contribute to an epistemic bubble, and the operating logics of search engines combined with the enquirers’ ignorance increase the risk of the enquirer being trapped in an epistemic bubble or even an echo chamber (nguyen, 2018). the complexity of the objectivity problem is illustrated by the findings of purcell, brenner, & rainie (2012): although a majority in their study disapproved of search engines collecting information about their searches, 23-29% thought that using the information for personalizing search results was a positive feature (pp. 19-21). further, on average two thirds of the participants believed that the information provided by search engines was fair and unbiased: the younger the participants, the more they relied on search engines’ objectivity (pp. 10-11). a further aspect illustrating the objectivity problem is the ritualization described by bhatt & mackenzie (2019), i.e. students’ information seeking practices being largely motivated by adhering to what they call the rules of the game. these rules can be appropriate in the beginning to introduce students to the knowledge creation practices of the discipline but when retained too long, they may inhibit the development of students’ information seeking skills and trust in their own capacity to consider the justification of the information they find. in an experimental study, fisher et al. (2015) highlight the risks embedded in ubiquitous access to information, which may blur the boundaries between personal knowledge and external information, thus creating an illusion of possessing personal understanding. further, their results suggest that some individuals tend to regard internet as an expert regardless of domain. these results pose a true challenge for education at all levels, at least if we consider personal and integrated knowledge, instead of loose bits of information, as the objective of education and learning.
miller & record (2013) discuss the covert operating logics of search engines and their epistemic implications using a framework building upon a responsibilist account of justified belief. according to this, an epistemically responsible enquirer will aim at having true beliefs and will therefore perform all the necessary actions to collect sufficient evidence to support his belief, such as checking a broad enough range of e.g. web pages and comparing them to other types of sources (cf. bråten, brandmo, & kammerer, 2018). there are, however, three cases where the enquirer may fail to acquire justification for his belief: 1) the enquirer neglects performing a proper search, 2) the enquirer performs a proper enquiry, but the results do not support his belief or 3) the activity to justify his belief is not possible, e.g. due to lack or impracticability of a technology. assuming that an enquirer is literate enough to avoid the first case, he can still fail as in cases 2 and 3. in cases of internet searches the problem is that, due to the covert search logics, the enquirer may not even know that he has failed. he may believe that he has performed a proper search but, due to the search engine’s filtering and ranking, the results may not provide the full picture of facts required to justify or rule out the belief. furthermore, due to the covert operating logic, it is impracticable (case #3) for the enquirer to assess the quality of the set of sources provided by the search engine. as shown above, the past decades’ technological development has induced changes in how individuals acquire information, and blurred the boundaries between personal knowledge and external information. the problem is not about using external memory aids or systems for offloading information (säljö, 2012). 
as säljö explains, man started developing external symbolic storages and artificial memory systems thousands of years ago, and memory aids such as otto’s physical notebook (smart, 2012) or address books in smartphones are everyday tools used to offload information from our memory. however, there is a risk that (especially young) users not only offload information but perhaps even outsource cognitive processes, since they may lack the epistemic competencies and practices required in this new information ecology (cf. bhatt & mackenzie, 2019; fisher et al., 2015; säljö, 2012; sparrow et al., 2011).

to provide a rationale for the approach of this study, the following sections will briefly review 1) epistemic beliefs as a research area, 2) how epistemic beliefs may relate to learning and 3) dimensions and tools for measuring epistemic beliefs. these sections also aim at explaining how this study was delimited and why some aspects, albeit frequently discussed in other studies, were not included in this study.

2.2 epistemic beliefs as a research area

william g. perry’s (1970) study of college students’ ideas regarding source and certainty of knowledge is commonly regarded as the starting point for research on epistemic beliefs or personal epistemology. over the past decades, epistemic beliefs have been conceptualized in different ways (cf. schraw, 2013). some researchers conceive them as broad and developing in a stage-like manner. other researchers conceive them as a set of more or less independent dimensions expressing beliefs about knowledge and learning, marlene schommer (1990; 1993) being the first in this line of research. the term ‘epistemic beliefs’ will be used here since the study will focus on the respondents’ (implicit and unconscious) views of knowledge, not their theories of knowledge or epistemology (cf. kitchener, 2002; hofer, 2008, p. 5). the works during the 1990s of marlene schommer (1990; 1993, later schommer-aikins) and barbara k. hofer and paul r.
pintrich (1997) in developing research around epistemological theories are important to acknowledge. during the first decade of this century, research around epistemic beliefs increased and extended from perry’s original north american, white, elite, male college students context to other age groups and geographical and cultural contexts. for extensive overviews, please see the works by hofer & pintrich (2002), niessen, vermunt, abma, widdershoven, & van der vleuten (2004), debacker, crowson, beesley, thoma, & hestevold (2008) and khine (2008). further, see the more recent works by schraw (2013), greene, sandoval, & bråten (2016), bernholt, gruber, & moschner (2017) and knight et al. (2017), out of which the four latter were not yet available at the time of planning this study.

domain-specificity and domain differences have been issues throughout the years. the initial assumption, that one’s epistemic beliefs are general across domains, has been questioned and instead, it has been suggested that one can hold different epistemic beliefs, depending on the field of knowledge one is dealing with (muis, bendixen, & haerle, 2006). the longitudinal study by trautwein & lüdtke (2007), albeit focusing on the certainty dimension only, confirmed the hard-soft difference but also that students aiming at certain college programmes differed regarding their beliefs already at the end of their upper secondary education. in their large review, muis et al. (2006) noted that empirical research had been presented in support of both domain-general and domain-specific epistemic beliefs, and that they may co-exist and possibly interact. the suggestions by muis et al. were strongly supported by both hofer (2006) and alexander (2006). to conclude, i acknowledge the co-existence of and interaction between domain-general and domain-specific epistemic beliefs. the question regarding domain-generality vs. domain-specificity was, however, not the focus of the present study.
the development of self-report instruments for measuring epistemic beliefs has encountered several challenges. in his review article, schraw (2013) notes that there has been disagreement about the underlying conceptual structure, and that replications of exploratory factor analyses have often failed (hereafter, efa and cfa will be used for exploratory and confirmatory factor analysis, respectively). common problems have been that items load in an unexpected manner, often resulting in fewer factors or another factor structure than anticipated in the underlying conceptual model, that too few items load per factor, and that the resulting model shows a low explanation score (schraw, 2013).

in the present study, i will subscribe to the line of research that considers the concept of epistemic beliefs as multidimensional. assuming that hitherto described dimension sets are not sufficient to describe epistemic beliefs in the new information ecology, i attempt to introduce some new dimensions. the aim of testing new dimensions required starting on a general level and therefore, the questionnaire items (except for the internet-related items) did not refer to any specific discipline or context (section instrument construction).

2.3 epistemic beliefs and learning

alongside motivation and cognitive styles, the concept of epistemic beliefs is an important factor affecting learning and study success. hofer & pintrich (1997) called for more research to understand how students’ epistemic beliefs may influence learning performance. further, they suggested that the type of learning tasks may shape the students’ epistemic beliefs, as shown later by kienhues, bromme, & stahl (2008). brownlee, walker, lennox, exley, & pearce (2009) approached the topic of epistemic beliefs qualitatively, and their results highlight that first-year students may hold subjectivist or objectivist core beliefs that may decrease their ability to engage in critical thinking, required in higher education. walker et al.
(2009) also approached first-year students and identified some students as being at risk of having difficulties in higher education due to their naïve beliefs about learning and knowing. there is also evidence suggesting cultural differences. zhang & watkins (2001) observed that chinese students’ cognitive-developmental patterns were the opposite of the patterns observed in the u.s. sample. their results also indicate that epistemic beliefs are not static but developing (cf. kienhues et al., 2008). further, hofer (2008, pp. 11-12) observed differences between japanese and us college students such that us students had more sophisticated beliefs about the factors describing certainty, simplicity, source and justification of knowledge. education is moving towards methods of teaching and learning that often involve using internet-based resources (e.g. the flipped classroom, knewton, 2011). these methods require more self-regulation on the part of the student, and e.g. bråten (2008, pp. 369-370) highlights the risk that students with naïve epistemic beliefs may tend towards over-reliance on internet-based resources. regarding the changes in teaching methods, it is worth noting that the teachers’ choices of pedagogical activities and learning settings are also influenced, perhaps unconsciously, by the teacher’s own epistemic beliefs (palmer & marra, 2008, p. 337). an overall awareness regarding epistemic beliefs is called for among teachers at all levels of education. an interesting attempt to support this awareness is the theoretical model linking epistemic beliefs and self-regulation suggested by muis (2007), where epistemic beliefs facilitate self-regulation and play a crucial role in all four phases (task definition, goal setting, enactment and evaluation) of the learning process.
an example from a constructivist education context (pbl) is the study by otting, zwaal, tempelaar, & gijselaers (2010), where the results showed a connection between conceptions of expert knowledge and traditional conceptions of teaching and learning on one hand, and on the other hand a connection between learning effort and a constructivist conception of teaching and learning. the examples above illustrate that there is much going on within the educational context, most importantly that education is moving from being teacher- and subject-centred towards being more student- and learning-centred. the development of the technological structures around ict is increasingly beyond the control of the educational system. however, learning analytics is an area where education is actively applying ict: the core characteristic is the generation of high-resolution data about various types of [learning] actions (knight, wise, & chen, 2017), and applying knowledge from multidisciplinary perspectives such as business intelligence, web analytics and data mining for analysis purposes (ferguson, 2012). thus, learning analytics can generate real-time individual and group performance information with potential to support teachers’ decision-making (knight, wise, & chen, 2017). knight et al. (2017) present a novel approach as they explore how students’ epistemic beliefs predict e.g. students’ search behaviour (traced using learning analytics methods). their results did not show a convincing predictive value, whereas the results by pieschl, stallmann, & bromme (2014) were a bit more encouraging. this issue is further commented on in section internet-specific epistemic beliefs.

2.4 dimensions and measurement

within the line of investigation that regards epistemic beliefs as multidimensional, self-report instruments have been developed to capture the dimensions of epistemic beliefs.
in her original 63-item schommer epistemological questionnaire (seq), schommer (1990; 1998) suggested the dimensions simple knowledge, certain knowledge, innate ability, quick learning and omniscient authority. using efa, schommer managed to extract four but not the omniscient authority dimension. thus, the dimensions described views on both knowledge and learning. several authors (e.g. hofer & pintrich, 1997) have criticized schommer for not performing factor analysis on the 63 original items but using 12 subscale scores (packages) based on those items, as variables. still, schommer’s questionnaire has been the starting point for a large part of later development regarding questionnaire-based instruments (for an overview, please see niessen et al., 2004), out of which the following instruments, besides the seq, were used as reference in the present study:

wood & kardasch (2002) developed the epistemological beliefs survey (ebs) containing 38 items, out of which 32 stemmed from or resembled items in the seq, and covering two seq dimensions.

schraw, bendixen, & dunkle (2002) developed the epistemic beliefs inventory (ebi) containing 28 items, out of which 17 stemmed from or resembled items in the seq. ebi reflected the same dimensions as the seq.

moschner, gruber, & studienstiftungsarbeitsgruppe epi (2005) developed the 43-item fragebogen zur erfassung epistemischer überzeugungen (questionnaire for capturing epistemic beliefs, hereafter fee) containing nine items from seq. fee included the three seq dimensions certainty of knowledge, learning ability and omniscient authority. additionally, the fee proposed five new dimensions labelled social aspects of knowledge, value of knowledge, culture related aspects of knowledge, gender related approaches to knowledge and reflective nature of knowledge.
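the criticism of factor analysis on subscale scores mentioned above can be illustrated with a minimal python sketch. it is not schommer’s procedure: the item names, the subscale names and the item-to-subscale mapping are hypothetical, and only the general idea (averaging items into subscale scores before the factor analysis) is taken from the text.

```python
from statistics import mean

# Hypothetical item-to-subscale mapping (assumption for illustration only).
# The SEQ groups 63 items into 12 subscales; that mapping is not
# reproduced here.
SUBSCALES = {
    "subscale_a": ["q1", "q2", "q3"],
    "subscale_b": ["q4", "q5"],
    "subscale_c": ["q6", "q7"],
}


def subscale_scores(response, subscales=SUBSCALES):
    """Average one respondent's item ratings into one score per subscale."""
    return {
        name: mean(response[item] for item in items)
        for name, items in subscales.items()
    }


def analysis_variables(responses, use_subscales):
    """Return the rows that a factor analysis would receive.

    use_subscales=True  -> one column per subscale (the criticized approach)
    use_subscales=False -> one column per original item
    """
    if use_subscales:
        return [subscale_scores(r) for r in responses]
    return [dict(r) for r in responses]


# One made-up respondent on a 1-5 agreement scale.
respondent = {"q1": 4, "q2": 2, "q3": 3, "q4": 5, "q5": 1, "q6": 2, "q7": 2}

item_level = analysis_variables([respondent], use_subscales=False)
subscale_level = analysis_variables([respondent], use_subscales=True)

print(len(item_level[0]), "item variables")
print(len(subscale_level[0]), "subscale variables")
print(subscale_level[0])
```

in this made-up case, items q4 and q5 receive opposite ratings (5 and 1), yet subscale_b shows a neutral score. a factor analysis on subscale scores cannot reveal whether the individual items behave as the conceptual model predicts, which is one way to read the criticism.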
2.4.1 knowledge, knowing and learning

the discussion whether epistemic beliefs should be limited to beliefs about knowledge and knowing, or whether beliefs about learning should be included, has been ongoing throughout the decades. hofer & pintrich (1997) recommended excluding beliefs about learning for the sake of clarity of the concept of epistemic beliefs. instead, they retained certainty and simplicity of knowledge (describing the nature of knowledge) and proposed the dimensions source of knowledge and justification for knowing to describe the nature of knowing. schommer introduced an embedded systemic model that included beliefs about ways of knowing, interplaying with beliefs about knowledge and beliefs about learning, i.e. beliefs about knowledge and learning as separate constructs but within the same system (schommer-aikins, 2004). sandoval (2005) warned against conflation of the concepts. although beliefs about knowledge will probably influence one’s beliefs about learning, sandoval proposed that they should be investigated as separate constructs. in a comment to the discussion, elby (2009) suggested that it is too early to decide and therefore, views on learning should at least for the time being be included in the concept of epistemic beliefs for further empirical and theoretical development. for the present study, data were collected regarding beliefs about both knowledge and learning and consequently, the analyses include both aspects. this approach is also supported by previous research presented in the section epistemic beliefs and learning.

2.4.2 internet-specific epistemic beliefs

the point of departure for this study, the tendency not to look up information until needed and to rely on internet-based sources, is close to the research regarding internet-specific epistemic beliefs by bråten, strømsø and their teams.
in 2005, they developed the internet specific epistemic questionnaire (iseq: bråten, strømsø, & samuelstuen, 2005), which was based on the four dimensions described by hofer & pintrich (1997) and thus omitted learning dimensions. in performing efa, bråten et al. used maximum likelihood (hereafter ml) as extraction method together with an oblique rotation method but extracted, however, only two factors. they labelled the first one general internet epistemology, which included beliefs concerning the certainty and simplicity of internet-based knowledge, as well as beliefs concerning the internet as a source of knowledge, i.e. three dimensions in one factor. the second factor was labelled justification for knowing and described whether internet-based knowledge claims could be accepted without critical evaluation, or whether they should be critically evaluated using multiple sources, reasoning and prior knowledge. all eighteen iseq items referred to the internet and thus, all questions connected explicitly and exclusively to the internet context. further, when reading the iseq general internet epistemology items it seems obvious that they do not actually reflect the certainty or structure of knowledge (cf. corresponding items in table 1) but rather, they mainly express the coverage and availability of information on the internet. thus, the iseq seems to leave questions open about the respondent’s beliefs regarding certainty and simplicity of knowledge in general, about the beliefs regarding other sources of knowledge, and how these beliefs relate to each other; unanswered questions constituting a research gap. in a subsequent study, bråten & strømsø (2006) applied parts of the seq (schommer, 1990), but not the iseq, to explore the connection between epistemic beliefs and internet-based search and communication activities. it turned out, for example, that students who believed in quick learning tended to overlook the importance of critically evaluating web-based resources. 
in another study, based on 17 out of 18 items in the iseq item set, the authors extracted only three factors (using ml and direct oblimin): certainty and source of knowledge, justification for knowing and structure of knowledge (strømsø & bråten, 2010). the iseq has also been applied in other contexts and for other purposes: karimi (2014), exploring the connection between internet-specific epistemic beliefs and grammar achievement, extracted the same three factors as strømsø & bråten (2010), although with varimax rotation. chiu, liang, & tsai (2013) used a chinese translation of the iseq, and applied an efa method (apparently with oblique rotation) but upon only twelve items. these authors did, however, not extract iseq dimensions as described by bråten et al. (2005) but instead, the four dimensions originally suggested by hofer & pintrich (1997), i.e. certainty, simplicity and source of knowledge and justification for knowing, but using items specifically denoting an internet-based context. kammerer & gerjets (2012) applied iseq to categorize users for comparison, but they only used eight items attributed to the iseq-dimension certainty and source of knowledge and thus, did not test the factor structure proposed in the original iseq. the study by knight et al. (2017) exemplifies a research approach linking epistemic beliefs with log data analytics. they used the iseq in an extensive study to explore whether the two-factor iseq scores could predict e.g. trustworthiness ratings of internet-based sources or traced search behaviour. according to their results, the factor scores did not predict search behaviour, and they had only small predictive value for trustworthiness rating. the approach by knight et al. 
is interesting and relevant but raises the question: could the connections to search behaviour have turned out differently had they not used the two-factor iseq, where the general internet epistemology factor contains a mix of certainty, structure and source of knowledge? e.g. the results by pieschl et al. (2014), indicate that students’ epistemic beliefs influence how they approach complex tasks. to conclude, epistemic beliefs have been explored also in relation to internet-based information, but the picture is disparate. the studies referred to above, as well as many other studies, suffer from the problems addressed by schraw (2013). the studies published prior to the present data collection (bråten et al., 2005; bråten & strømsø, 2006; strømsø & bråten, 2010) focused on beliefs about internet-based information without actually relating these beliefs to beliefs about knowledge based on other information sources. the studies referred to above also leave the question open, whether internet should be regarded as an authority or knowledge source, or a specific context (cf. grossnickle peterson, alexander, & list, 2017, p. 262). 2.4.3 justification for knowing hofer & pintrich (1997) introduced justification for knowing as a dimension, which was later supported by several researchers. both alexander (2006) and greene, azevedo, & torney-purta (2008) have noted that this dimension is least developed, and that exploring justification is more challenging than exploring other dimensions. this assumption seems well founded considering the complexity of the justification aspect, e.g. in terms of the responsibilist account of justified belief suggested by miller & record (2013) (see section personal knowledge, external information). greene et al. (2008) point out two aspects that are part of the challenge in investigating the justification dimension. 
first, considering the number of different kinds of justification identified in philosophy, justification as part of the epistemic beliefs model will probably need to be described by multiple factors rather than one single factor. further, greene et al. suggest that a person needs to have a sophisticated ontology of a domain before issues of justification, such as critical thinking, become relevant. this seems congruent both with bloom’s original cognitive process dimensions and especially with the knowledge dimensions described later by krathwohl (2002): issues of justification are probably far more relevant when applying, analysing or evaluating conceptual knowledge than when recalling facts. the above suggestion by greene et al. is reflected in a recent study by bråten, brandmo & kammerer (2018), where they delimit the context to internet and the domain to that of educational topics within teacher education. their study focuses solely on the justification dimension, approaching it as a three-dimensional concept including justification by authority, justification by multiple sources and justification against prior personal knowledge and reasoning. as a result, they present the validated internet-specific epistemic justification inventory (isej). against the background of the considerations referred to above, and the fact that epistemic beliefs in the new information ecology were totally uncharted territory, it seemed appropriate to leave the justification dimension outside this investigation. hence, the fee instrument (moschner et al., 2005) was chosen as a starting point (see section instrument construction). 2.5 research questions capturing all dimensions of epistemic beliefs (or cognition, cf. greene et al., 2008) while at the same time adding and testing new dimensions would be both adventurous and beyond this study. 
therefore, while acknowledging that epistemic beliefs consist of multiple dimensions developing over time, this study adopts a narrow focus on capturing a snapshot of the participants’ current epistemic beliefs, including beliefs in internet-based information. thus, the justification dimension as well as the topics regarding subject-, domain-, discipline-, culture- or gender-specificity of epistemic beliefs (see e.g. debacker et al., 2008) are beyond the scope of this study. the approach of this study is openly explorative in testing whether it is possible, overall, to extend the existing instruments and their dimension sets with new dimensions of epistemic beliefs, and specifically to capture such ways of relating to knowledge that have become common among frequent internet users during the past decades. further, this study will explore the relation between existing epistemic dimensions and those describing internet-based knowledge and knowing. unlike the iseq (bråten et al., 2005), this study does not aim to explore how individuals justify internet-based information, but rather to explore whether and to what extent individuals rely on and prefer internet-based information sources, and how this preference relates to other epistemic dimensions. the investigation is framed in a single research question: (how) can the set of epistemic beliefs dimensions be extended so that it also expresses a googling attitude? the research question is openly phrased since, although research on epistemic beliefs has been going on for some time, the proposed dimensions are in uncharted territory. for the sake of clarity, i will use the term original dimensions for those dimensions described in or stemming from schommer’s seq (1990). hypothesized dimensions will be used to denote suggested dimensions until their existence has been confirmed, after which they are denoted as novel dimensions or scales in the proposed model, which is the endpoint of the present study. 
3 material and methods by way of introduction to this section, i provide a rough outline for instrument construction and data collection. the first version of the instrument was created for the data collection in august 2011. the instrument was evaluated so that a revised version was used for the second data collection in august 2012, which resulted in the material being reported here. the usability and validity of data from 2012 are commented on in the discussion section. it should be noted that after the current data were collected, new studies describing further development have been published. the present instrument was, naturally, based on instruments that were published and available prior to 2012. 3.1 instrument construction the fee questionnaire developed by moschner et al. (2005) combined experiences from previous instruments and also contained some potentially interesting extensions. therefore, the fee was taken as the point of departure for constructing the first version of the on-line survey called ‘me and my knowledge’. a replication using the fee-specific items was performed on the first data set collected in 2011 (reported in ståhl & mildén, 2017). due to unsuccessful replication, the instrument was revised prior to the 2012 data collection: the five new dimensions suggested in fee were omitted, and structure of knowledge items as well as some other items were included, based on item level analysis. in addition, some items describing the hypothesized subscales were reversely phrased. table 1 shows the entire instrument, item descriptives and item associations before and after analyses. since swedish and english are the working languages of the university, the questionnaire was set up in both languages. to ensure comprehensibility, both swedish-speaking domestic and english-speaking international students were involved in read-aloud sessions during instrument construction. 
an important aspect of the cultural adaptation of the questionnaire was rephrasing the questions into first person present tense, as suggested e.g. by kitchener (2002) and schommer-aikins (2004, p. 23). the main motive was to ensure a first-person perspective: the phrasing should clearly signal that the researchers were interested in knowing what the student herself thinks, not what she thinks that people in general think, or what is socially desirable to think about a topic. during the read-aloud sessions, the students provided valuable feedback acknowledging the need for cultural adaptation and inducing some further rephrasing. overall, the students’ feedback supported the choice to use direct and active wording. the items were consistently generic (not domain- or discipline-specific), and the instructions did in no way refer to relating the responses to any specific subject, academic field or context (cf. wood & kardash, 2002, p. 244; muis et al., 2006, p. 25). table 1 questionnaire items in original and hypothesized dimensions, including item descriptives and item use in the proposed model table footnotes a) the number after 'k' refers to the page number (03-14) in the web questionnaire. the number after 'f' refers to the original fee numbering. b) able learning ability; auth omniscient authority; cert certainty of knowledge; constr constructivist approach; dia learning by dialogue; int internet reliance; struct structure of knowledge c) item phrasing is reverse compared to other items in the same dimension. 3.1.1 previously established dimensions the fee questionnaire (moschner et al., 2005) included the original seq dimensions certainty of knowledge, omniscient authority and learning ability. unfortunately, the dimension structure (or simplicity) of knowledge was excluded from the fee but was included in the 2012 survey being reported here (table 1). justification of knowledge should undoubtedly be a part of the epistemic beliefs dimension set. 
however, the new students (see section participants and data collection) that were involved as informants could hardly be expected to possess a sophisticated ontology of the domain they were just entering to study (cf. greene et al., 2008). based upon this, upon previously presented considerations (section justification for knowing) and upon the scope of the study, the justification dimension was omitted at this stage. 3.1.2 hypothesized dimensions out of the five new dimensions suggested in the fee, only reflective nature of knowledge was used in this study, and the items associated with it were rephrased to reflect reflective nature of learning. this dimension deals with the learning aspect and was intended to express a reflective stance towards new knowledge. the debate regarding digital natives did not produce an actual definition for digital natives but instead, researchers published different descriptions about how the (supposedly) digital generation acted and behaved (cf. ståhl, 2017). therefore, the dimensions described below were constructed with a starting point in descriptions regarding attitudes towards knowledge and learning, as reported in various studies. the instrument also set out to test whether the suggested attributes could be identified within this sample. the descriptions of connectivism (downes, 2007; siemens, 2005; 2006, pp. 31, 91) together with anderson & balsamo (2008, p. 244) stating that “they treat their affiliation networks as informal delphi groups” have contributed to the items proposed to describe a connectivist approach to learning (hereafter the short forms connectivist approach and constructivist approach will be used). a constructivist approach to learning has not yet been suggested in previous instruments, although some items in the dimension knowledge construction and modification suggested by wood & kardash (2002, p. 250) and the dimension reflective nature of knowledge suggested by moschner et al. 
(2005) point in this direction. the writings of siemens (2006, pp. 6, 20, 31) have also provided input to the items proposed to describe a constructivist approach. anderson & balsamo (2008, p. 244) described the young generation as “…knowing and being confident where to find information once they need it”. siemens (2006, p. 31) described deciding what to memorise and choosing what to learn as characteristics in connectivist learning, inspiring the construction of items describing the hypothesized dimension just-in-time learning. reliance on internet is an integral part of the googling mind-set. at the time of planning this research the iseq had been introduced (bråten et al., 2005) but as mentioned above (section internet-specific epistemic beliefs), the iseq items focussed exclusively on internet-based information. thus, the five items concerning internet-based knowledge in the present instrument were generated from literature regarding the so-called digital natives and the net generation (prensky, 2001; siemens, 2005; anderson & balsamo, 2008), and their preference for internet sources instead of printed sources (cf. head & eisenberg, 2010; purcell et al., 2012, p. 33). the items were phrased to express how the googling mind-set reflects a reliance on the idea that any information you need can always be found on the internet and accordingly, the dimension was labelled internet reliance. siemens (2006, pp. 16, 31, 56, 117) described valuing diversity as a central trait in connectivism, which requires interaction (downes, 2007, p. 78) and also involves exposing oneself to and valuing different opinions, all contributing to the individual learning process. this trait, requiring “… the widest possible spectrum of points of view…” (siemens, 2006, p. 16), can be regarded as an expression of both a general scholarly approach and also the epistemic development from realist over absolutist and multiplist to evaluativist understanding (kuhn & weinstock, 2002, p. 124). 
the present instrument includes four previously described and six hypothesized dimensions, altogether 60 items (table 1). 3.2 participants and data collection the study was part of a university development project with the objective of collecting information about the new students’ mind-sets to develop teaching and learning practices. the university’s board on ethics approved the project research plan, including procedures for data collection, analysis and reporting. data were collected among all new students in august 2011 and 2012 (n = 476/440). since epistemic beliefs can change through intervention (cf. kienhues et al., 2008), it was crucial to get a “snapshot” of the students’ epistemic beliefs by collecting data during the very first week of the semester, before the students were exposed to study subjects or pedagogical influences at the university. data collection was organised during compulsory and scheduled ict level test sessions, where students first completed another survey called ‘ict, media and me’, then the compulsory ict driving license level tests and finally the survey ‘me and my knowledge’. figure 1 on-line questionnaire screenshot the students were introduced to the objectives of the project, and informed orally and in writing that although the ict level tests were compulsory, the surveys were voluntary and did not include any financial or other incentives. due to the survey being an operationalization of the university’s statutory obligation to continuously develop its education, informed consent was registered following a simplified procedure. the students were informed that by (performing the action of) filling in the questionnaire, they express their consent for the data being used for the purposes described in the information sheet and in the description of the scientific research data file as required in the legislation concerning personal data in research (personal data act, 1999). 
accordingly, the students had the opportunity to withdraw their permission by contacting the researcher by a given date, after which the data set was anonymized. the students were also introduced to the functionality of the questionnaires and informed that support was provided if needed. the survey was presented in an on-line questionnaire using a 6-point likert-type response format (figure 1). when applying the 63-item seq, wood & kardash (2002, p. 244) received student comments indicating respondents’ difficulties in understanding certain items. although some researchers (e.g. martin, 2005, p. 728) discourage the use of ‘don’t know’ options, the scale in this questionnaire was supplemented with two non-substantial options, ‘don’t know’ and ‘don’t understand’. this is partly supported by muis et al. (2006, p. 25), noting that it has not been empirically studied what individuals actually think as they fill out questionnaires. providing both options was especially important when introducing new items, since these options provided information regarding comprehensibility, potentially valuable when considering items to exclude (cf. finch, immekus, & french, 2016, p. 144). further, the non-substantial options were placed on both sides of the substantial options in order not to distort the visual midpoint of the likert-type response format (cf. tourangeau, couper, & conrad, 2004). in survey presentation, it was necessary to prevent fatigue effect and satisficing (cf. cape, 2010), and any effect where question context or order might influence question interpretation (cf. martin, 2005, p. 726; tourangeau et al., 2004). therefore, a progress indicator was included and the items were distributed over twelve pages containing four to six items each, which also improved readability. further, to prevent inter-item influence, each subscale’s items were distributed over different pages (e.g. 
the page in figure 1 containing items from five subscales) and the survey service was set to randomise item order within each page. 3.3 research data and sample characteristics the present study is based on data collected in 2012, where 371 students chose to complete the survey ‘me and my knowledge’. only those cases containing substantial responses to more than 70% of the items were retained for further analyses (n = 348). the 23 excluded cases had responded only to first-page items and were therefore regarded as dropouts. the complete data set with 371 cases exhibited missing values increasing from 4.2% up to 11.7% on page level, whereas this trend in the 348-case subsample developed from 2.5% to 7.0%. this, together with the dropouts, indicates that most respondents who started the survey also completed it, and that an actual fatigue effect was avoided. on item level, the portion of missing values ranged from 1.4% to 10.1%, where the two certainty of knowledge items k13_2f44 and k14_2f49 (table 1) showed the highest portions of missing values, mostly ‘don’t know’ responses. the highest ‘don’t understand’ portions occurred for three items representing the dimensions just-in-time learning (k04_1), constructivist approach (k07_2) and connectivist approach (k08_5). since the questionnaire applied a likert-type response format producing data on an ordinal scale, it is not meaningful to analyse distribution or assess normality on item level (cf. carifio & perla, 2007) but instead, analysis of the actual scales is postponed to the discussion section. for those calling for an item level analysis it can be mentioned that for each item, the response value ranged over the whole scale (1..6). the items showed a standard deviation between 1.02 and 1.66, a skewness between -1.23 and 0.91 and a kurtosis between -1.26 and 1.38. the criterion of the skewness and kurtosis value being within the range ±1 was met regarding 57 and 55 items, respectively. 
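the case-retention rule described above (keeping only cases with substantial responses to more than 70% of the items) can be illustrated with a minimal sketch. the response coding used here (integers 1 to 6 for substantial answers, "DK" for ‘don’t know’, "DU" for ‘don’t understand’, None for unanswered items) and the function names are assumptions made for illustration only; they do not describe the actual data file or software used in the study.

```python
# Hypothetical response coding (an assumption, not the study's data format):
# integers 1-6 are substantial answers; "DK" = don't know,
# "DU" = don't understand, None = item left unanswered.
NON_SUBSTANTIAL = {"DK", "DU", None}


def substantial_share(responses):
    """Return the fraction of items answered with a substantial (1-6) value."""
    answered = sum(1 for r in responses if r not in NON_SUBSTANTIAL)
    return answered / len(responses)


def retain_cases(cases, threshold=0.70):
    """Keep only cases whose substantial share is strictly above the threshold."""
    return [case for case in cases if substantial_share(case) > threshold]


# Toy data: three invented respondents with ten items each.
cases = [
    [4, 5, 3, 6, 2, 4, 5, 3, "DK", "DU"],      # 8 of 10 substantial: retained
    [4, 5, 3, 6, 2, 4, 5, "DK", "DU", None],   # 7 of 10: not above 70%, dropped
    [4, 5, None, None, None, None, None, None, None, None],  # dropout
]
kept = retain_cases(cases)
print(len(kept))
```

the comparison is strict because the text specifies "more than 70%"; a case with exactly 70% substantial responses is therefore excluded in this sketch.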
the shapiro-wilk test suggested non-normal distribution, whereas the kolmogorov-smirnov test suggested normal distribution throughout all items. a visual inspection of histograms, normal q-q plots and box plots showed that the items were approximately normally distributed. for the items showing skewness or kurtosis outside the ±1 range, the deviation was minor and further, the sample size was large enough to reduce a possible detrimental effect (cf. hair, black, babin, & anderson, 2010). based on the aforementioned criteria, the items were considered as normally distributed. the current 348-case subsample holds students from twelve degree programmes, both domestic and international students (86.8% / 13.2%), and a gender distribution holding 66% female students. the average age was 21.7 with 91% being born in 1986-1995. for this study, sample demographics should be reviewed in relation to access to internet resources. internet and publicly available search engines were launched already in the mid-1990s and during the following ten years, search engine use was established (http://www.searchenginehistory.com/). 2011-2012 were the very years when internet services, previously available via computers, became truly ubiquitous due to 3g/4g-connected smartphones becoming everyday tools, and finnish net operators offering affordable 3g/4g-subscriptions including generous mobile data. the mobile phone prevalence within both cohorts was close to 100%. smartphone as a concept was not yet established and thus, the corresponding survey item was phrased “my mobile phone is connected to the internet”. from 2011 to 2012, the portion of users across the cohorts having an internet-connected phone increased among domestic students from 48.7 to 81.3% and among international students from 60.7 to 90.6%, within the total cohorts from 50.2 to 82.5%. 
this corresponds well with the national statistics, according to which 53% of those aged 16-24 had a smartphone in the spring of 2011 (osf, 2011). at the time of data collection, the respondents had been exposed to computers, mobile phones and internet for on average 12, 10 and 9 years respectively. to conclude, the sample can be regarded as a rather typical net generation cohort. 3.4 analysis methods fabrigar, wegener, maccallum, & strahan (1999) argue that principal component analysis is not a true method of factor analysis. they recommend the use of maximum likelihood, as later supported by osborne (2014, p. 9) and finch et al. (2016, p. 131). thus, the analysis procedure starts with an efa with ml as extraction method, followed by a validation procedure including efa and cfa on split halves of the sample (fabrigar et al., 1999; fokkema & greiff, 2017; knight et al., 2017; leal-soto & ferrer-urbina, 2017; osborne, 2014, pp. 6, 119-120; tang, 2010). for all statistical tests, a significance level of .05 was used and in efa and cfa procedures, the absolute loading value .32 was used as the threshold when assessing item (non-)loadings and cross-loadings (cf. finch et al., 2016, p. 143). in table and diagram presentations, loadings <.32 are generally not displayed although during efa procedures, low loadings were not suppressed since that may cause loss of valuable information (such as item k14_5, table 3). the spss software package (spss, 2016b) was used for efa procedures, and the cfa procedures were performed using the amos software package (spss, 2016a). to build analysis on true data, missing data values were not imputed, since any kind of replaced or imputed values are, after all, only estimates. this choice was made at the cost of listwise deletion reducing the number of cases in efa, and missing values thwarting the use of modification indices to support refinement in cfa. 
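the loading threshold described above can be illustrated with a short sketch that sorts an item into low-loading, single-loading or cross-loading, based on its row in a pattern matrix. the function name, the example loadings and the choice to treat an absolute loading of exactly .32 as salient are assumptions made for illustration; the analyses in the study itself were run in spss.

```python
def classify_item(loadings, threshold=0.32):
    """Classify one item by how many factors it loads on saliently.

    `loadings` is the item's row of the pattern matrix (one value per factor).
    A loading counts as salient when its absolute value reaches the threshold
    (treating exactly .32 as salient is an assumption of this sketch).
    Returns a (label, salient_factor_indices) tuple.
    """
    salient = [i for i, value in enumerate(loadings) if abs(value) >= threshold]
    if not salient:
        return "low", salient
    if len(salient) == 1:
        return "single", salient
    return "cross", salient


# Invented example rows for a three-factor solution.
print(classify_item([0.61, 0.10, -0.05]))   # loads on the first factor only
print(classify_item([0.45, -0.38, 0.02]))   # salient on two factors
print(classify_item([0.20, 0.31, 0.10]))    # no salient loading
```

in a stepwise refinement, items labelled "low" or "cross" would be candidates for removal, subject to the conceptual considerations the text describes.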
4 results in this section, the results are presented together with analyses, since some results inform the subsequent steps. the reasoning behind e.g. item disposal, number of factors and factor labelling will be presented in conjunction with efa on the complete item set. 4.1 original and hypothesized dimensions of epistemic beliefs 4.1.1 replicating original dimensions for replication purposes, efa was first performed on the 27 items associated with the original dimensions. dysfunctional items (zero, low and cross-loading) were stepwise discarded (cf. finch et al., 2016, pp. 143-144). a model based on 18 items showed good fit indices and was interpreted as a successful replication (despite ml extraction and promax rotation). 4.1.2 emerging dimensions the 60-item questionnaire contained 33 items that were associated with six hypothesized dimensions: reflective nature of learning, connectivist approach, just-in-time learning, constructivist approach, internet reliance and valuing diversity (table 1). seeking inspiration from bråten et al. (2005) and trautwein & lüdtke (2007) who analysed only one or two factors, this item subset was initially factor analysed separately in order to identify dysfunctional items. the efa on the hypothesized dimensions was truly exploratory, including different rotation methods, varying the number of extracted factors and stepwise reduction of dysfunctional items (finch et al., 2016, pp. 143-144; osborne, 2014, pp. 17, 30-33). using ml, promax rotation and listwise deletion, the efa resulted in a four-factor model based on 23 items (n=191). the model showed good fit indices (eigenvalues 6.35 .. 1.5, 51% of variance explained; kmo=.864, bartlett's chi-square=1397, df=253, sig.<.000, goodness-of-fit test chi-square=182.7, df=167, sig.=.192) and reflected three of the hypothesized dimensions: constructivist approach, connectivist approach, internet reliance and a fourth one, now labelled learning by dialogue. 
each item loaded strongly on one factor without cross-loadings. throughout the various models, constructivist approach and connectivist approach correlated strongly, and several of the other factors correlated weakly with each other (table 2). table 2 factor correlation matrix, four-factor model based on 23 new items 4.1.3 an extended set of dimensions since inter-factor correlation occurred in all the previous analyses, efas on the complete item set were performed using ml extraction and promax or oblimin as oblique rotation methods (cf. finch et al., 2016, p. 133; knight et al., 2017; osborne, 2014, pp. 30-33; strømsø & bråten, 2010). the model was stepwise refined by removing low-loading and cross-loading items while simultaneously assessing their conceptual relevance, their communality estimates and their internal consistency within the anticipated scale. during the process, the six reversely phrased items occurring in four dimensions (table 1) were discarded due to dysfunctionality. thus, within all the hypothesized dimensions, the items were unidirectional. the refinement procedure boiled down to a model with seven factors and 26 items that fit the data reasonably well (table 3). both original and hypothesized dimensions appeared distinctly without cross-loadings (except for items k10_7 and k14_5), each dimension loaded on at least 3 items, and 25 out of 26 items loaded (>.32) on the anticipated factor. table 3 the proposed efa model based on 26 items table footnotes ml, promax rotation converged in 16 iterations; listwise deletion, n=195, eigenvalues 4.85 .. 1.01, 59.3% of variance explained; kmo=.782, bartlett's chi-square=1468, df=325, sig. <.000, goodness-of-fit test chi-square=169.8, df=164, sig.=.361 a) item loading not consistent with hypothesized dimension but conceptually coherent. from this model, several items were dropped due to loading weakly or inconsistently with the hypothesized dimension. 
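the degrees of freedom reported for the two efa models above can be cross-checked with the standard formulas: bartlett's test of sphericity has p(p-1)/2 degrees of freedom for p items, and the ml goodness-of-fit test has ((p-m)^2 - (p+m))/2 degrees of freedom for p items and m factors. the following sketch, with function names chosen for illustration, reproduces the reported values for the 23-item four-factor model and the 26-item seven-factor model.

```python
def bartlett_df(p):
    """Degrees of freedom of Bartlett's test of sphericity for p variables."""
    return p * (p - 1) // 2


def ml_gof_df(p, m):
    """Degrees of freedom of the ML goodness-of-fit test, p variables, m factors."""
    return ((p - m) ** 2 - (p + m)) // 2


# 23 items, four factors (reported: df=253 and df=167)
print(bartlett_df(23), ml_gof_df(23, 4))

# 26 items, seven factors (reported: df=325 and df=164)
print(bartlett_df(26), ml_gof_df(26, 7))
```

both pairs of values agree with the figures given in the text and in the table 3 footnotes.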
the retained original items loaded on the same factors as in the eq, except for item k04_4f04 that was originally associated with omniscient authority. the current loading on certainty of knowledge can be regarded as conceptually coherent. the issues regarding which items to discard, how to decide on the number of factors, and how to label the subscales require some comments. as osborne (2014, pp. 17-18) notes, efa is a low-stakes procedure and expressly exploratory. accordingly, i entered the process with 60 items, a hypothesized underlying conceptual model, and used the statistical package (spss, 2016b) to provide suggestions for a factor model. during efa iterations, the dysfunctional items were eventually revealed and discarded. besides varying extraction and rotation, the search for an adequate number of factors included extracting factor sets ranging from two factors below up to two factors above the number suggested by the scree plot elbow (osborne, 2014, p. 18). this method provided valuable information: increasing the number of factors caused related items to split over several factors, whereas reducing factors caused items to pile up on one factor, usually then holding items from dimensions that at the end turned out to correlate. thus, the search for a factor model included weighing of theory, scree plot, item loadings and communalities, eigenvalues, internal consistencies and conceptual considerations. during the efa iterations, the hypothesized dimensions did not turn out quite as anticipated, which is only part of the nature in explorative work (cf. osborne, 2014, p. 17). in most of the explored models, the five hypothesized dimensions boiled down to three (table 3). 
the dimension learning by dialogue holds items from the suggested dimensions valuing diversity and connectivist approach, whereas the dimension constructivist approach besides its own items also holds items from the hypothesized dimensions valuing diversity and reflective nature of learning. all the three items originally associated to internet reliance consistently loaded on that dimension (table 1). retaining the items k14_5 and k10_7 violates the rule of using only strong, single-loading items and requires commenting. the internal consistency test showed that deleting the item k14_5 entailed a slightly improved alpha value, but at the cost of reducing the factor internet reliance to only two items. further, since the item communality value was reasonably good, the connection to structure of knowledge occurred also in the path diagram, and the cfa indicated that discarding the item impaired fit indices, there was enough support for retaining the item. as expected, the item k10_7 loaded strongly on the structure of knowledge dimension, but surprisingly also on constructivist approach. the item was retained since discarding it would have impaired the structure of knowledge internal consistency considerably. the split half efa suggested single-loading on structure of knowledge, whereas the cfa suggested a connection to constructivist approach and indicated that discarding the item impaired fit indices. three of the original dimensions (except learning ability) correlated weakly with each other. further, constructivist approach correlated strongly (.506) with learning ability and weakly (.282) with learning by dialogue. 4.2 evaluating the extended instrument for the purpose of evaluating the stability of the model presented in table 3, the data set was randomly split into two equal halves a and b, that were subject to efa and cfa, respectively (cf. fokkema & greiff, 2017, p. 401). 
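the random split described above can be sketched as follows. this is a minimal illustration only, not the procedure actually used in the study: the function name `split_half`, the fixed seed and the ten respondent ids are hypothetical, and the original split was presumably made in the statistical package.

```python
import random


def split_half(case_ids, seed=0):
    """Randomly split case identifiers into two halves, A and B.

    The seed is an assumption added for reproducibility; the study does
    not report how its random split was seeded.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    # Half A would go to the exploratory analysis, half B to the confirmatory one.
    return sorted(shuffled[:mid]), sorted(shuffled[mid:])


# Ten hypothetical respondent ids, for illustration only.
half_a, half_b = split_half(range(1, 11), seed=42)
print(half_a, half_b)
```

each case ends up in exactly one half, so the efa on half a and the cfa on half b never share respondents, which is the point of an internal replication.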
4.2.1 exploratory split half an efa was performed with 26 items on the split half a using the same methods as in the initial model. the model arrived at (table 4) did not show a one-to-one correspondence to the initial model (table 3) but resembled it strongly, with 22 out of 26 items loading as anticipated. all the hypothesized dimensions were reflected in the seven factors, although two of the constructivist approach items loaded on the learning ability factor (a). further, two items loaded on unexpected factors (b), and the learning by dialogue item k03_5 caused a heywood case. still, the fit indices suggested that the proposed model, appearing almost similar in both oblimin and promax rotation, fit also the split data set fairly well. table 4 exploratory factor analysis on split half a table footnotes a) item loading not consistent with hypothesized dimension but conceptually coherent b) item loading not consistent with hypothesized dimension, vague conceptual coherence ml, oblimin rotation converged in 12 iterations; listwise deletion n=96, eigenvalues 5.00 .. 1.06, 62.9% of variance explained; kmo=.706; bartlett's chi-square=928, df=325, sig. <.000; goodness-of-fit test chi-square=179.5, df=164, sig.=.193 as in the initial model, the constructivist approach factor correlated strongly with the learning ability factor (.494) and omniscient authority correlated with certainty of knowledge (.363). 4.2.2 confirmatory split half cfa was performed on the same 26 items as the previous efa but on the split half b (n=174) of the data set. in the first step, conceptually irrelevant and low connections between latent variables were removed, which resulted in an initial model with partly insufficient fit indices. assessing model fit and choice of cut-off criteria (in brackets) follow the recommendations by schreiber, nora, stage, barlow, & king (2006) and hooper, coughlan, & mullen (2008). 
since the data set contained empty cells, it was not possible to utilize the feature where the amos software would provide suggestions for modification (spss, 2016a). instead, model refinement was performed manually, partly following loadings and correlations indicated in the previous efa models (tables 3 and 4), and partly by adding and removing connections based on conceptual considerations in an exploratory manner. thus, some connections between latent variables, although weak, were retained, under the condition that they were conceptually defensible and contributed to improving fit indices. then again, in some cases conceptually defensible connections had to be discarded if their loading value was low (mainly <.32) and retaining them impaired the fit indices. the procedure resulted in a conceptually defensible path diagram (figure 2), similar to the efa models (tables 3 and 4) and reasonable although not perfect fit indices (chi-square/df=1.504, rmsea=.054, tli=.821, cfi=.853, pclose=.260). the item k13_6 caused a minor heywood case (1.01), which was accepted since any attempt to manipulate constraints impaired fit indices.

figure 2. simplified cfa path diagram based on 26 items and split half b data set (cut-off criteria in brackets). chi-square/df=1.504 (<2), rmsea=.054 (<.06), tli=.821 (≥.95), cfi=.853 (≥.95), pclose=.260 (≥.05).

5 discussion

5.1 construct validity

after the seven dimensions had been identified (table 3), they appeared stable throughout the succeeding analyses. in some cases, items associated with the dimensions constructivist approach and learning ability cross-loaded. this phenomenon is conceptually coherent considering the strong correlation between these factors that, in turn, is possibly due to a latent second-level variable. the internal replications by exploratory and confirmatory factor analyses on randomized split halves provide information speaking in favour of the proposed factor model.
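as a cross-check, the degrees of freedom and the rmsea reported with tables 3 and 4 and figure 2 can be recomputed from the other reported numbers. the sketch below uses the textbook formulas for bartlett's test, for the ml goodness-of-fit test in efa, and for the rmsea point estimate; that amos and spss applied exactly these formulas (in particular n-1 rather than n in the rmsea) is an assumption, and the function names are hypothetical.

```python
import math


def bartlett_df(p):
    """Degrees of freedom of Bartlett's test of sphericity for p items."""
    return p * (p - 1) // 2


def ml_efa_df(p, m):
    """Degrees of freedom of the ML goodness-of-fit test, p items, m factors."""
    return ((p - m) ** 2 - (p + m)) // 2


def rmsea(chi2_per_df, n):
    """RMSEA point estimate from chi-square/df and sample size n.

    Uses n - 1 in the denominator (an assumption; some programs use n).
    """
    return math.sqrt(max(chi2_per_df - 1.0, 0.0) / (n - 1))


print(bartlett_df(26))                # 325, as reported for tables 3 and 4
print(ml_efa_df(26, 7))               # 164, as reported for tables 3 and 4
print(round(169.8 / 164, 3))          # chi-square/df of the initial EFA model
print(round(rmsea(1.504, 174), 3))    # 0.054, as reported for the CFA
```

under these assumptions the reported df values follow from 26 items and seven factors, and the cfa values chi-square/df=1.504 and n=174 reproduce rmsea=.054.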
since the suggested construct holds good or reasonable fit indices and behaves in a consistent manner throughout the different analyses, it can be regarded as holding initial construct validity. 'initial' here means that the present study was only a first attempt to launch the hypothesized dimensions, and further research (with new data and adjusted items) is still required to test the generalizability of the construct (finch et al., 2016, pp. 127-128). further testing should also involve a diverse student population with regard to domains and cultural background.

5.2 content validity

in general, the factors reflect both original and hypothesized dimensions. in all efa models (tables 3 and 4) as well as in the cfa path diagram (figure 2), the highest loading on each factor occurred on one of the anticipated items. regarding the original dimensions, it turned out that 12 out of the 27 items reflected the anticipated construct and one item loaded differently than in previous studies (see rightmost column in table 1). within the hypothesized scales, 13 items were included in the novel scales. to answer the question of whether and to what extent the factors actually describe the dimensions, this section presents comments regarding each dimension in the model (table 3, figure 2). the original dimensions retained their original labels, and the labelling of the novel dimensions is commented on in section novel dimensions. choices regarding factor model and number of factors were discussed in section an extended set of dimensions.

5.2.1 original dimensions

learning ability

three of the four items in the learning ability dimension proved stable across most analyses and models, whereas the item k12_9 was dropped at an early stage. in some efa models, this factor attracted items from the constructivist approach dimension, which is consistent with the strong correlation between these dimensions (table 3, figure 2).

omniscient authority

in her earliest studies, schommer (1990; 1998) reported this dimension as difficult to capture, whereas schraw et al. (2002, p. 267) and moschner et al. (2005) were able to identify this dimension. in this sample, the authority dimension manifested clearly in all models, and the items k09_3f29, k12_5f42 and k14_7 loaded consistently on the omniscient authority factor.

structure of knowledge

throughout the analyses, most of the structure of knowledge items loaded as expected. item k12_7 often loaded on the learning ability factor but for this item, a connection to learning is not far-fetched; combining information across sources may express an active stance towards learning, rather than a view of the structure of knowledge. thus, this item may be an example of a phrasing containing something that might be called keyword shifting, where the keyword “combining” is perceived differently: some respondents recognize the active learning approach, whereas others see it as an expression of knowledge as bits and pieces that can be combined or kept isolated. during the refinement process, item k12_7 as well as several other items were discarded, leaving four items to represent this dimension. looking at the discarded vs. retained items (table 1) does, however, raise some questions. it seems unfortunate to discard the items k05_7, k06_8, k07_8 and k12_7, since they indeed express a knowledge aspect, i.e. a very clear stance regarding the structure of knowledge as isolated facts vs. information that can or should be combined into larger entities. then again, the retained items seem to focus very much on the actions and behaviour on the part of the teacher, almost like introducing a teaching aspect to epistemic beliefs, besides the knowledge and learning aspects. interestingly, five out of the six discarded items (k05_7, k06_8, k07_8, k09_7 and k12_7) stem from the original eq.
certainty of knowledge in most models, items k03_8, k04_2f13 and k13_2f44 loaded as expected on the certainty of knowledge factor, but often accompanied by items k04_4f04 and k05_1f15, originally associated with omniscient authority. the items k04_4f04 and k05_1f15 (discarded) may be examples of items with keyword shifting of another kind. here, the respondent may pay attention either to the “experts/ teachers” as authorities, or rather focus on “same answers / same understanding”, the latter option connecting more to knowledge being certain. this observation shows similarity to the eq subset avoid ambiguity loading on simple (structure of) knowledge instead of certain knowledge (cf. schommer, 1990; wood & kardash, 2002, p. 241). it should be mentioned that the discarded items k13_2f44 and k14_2f49 were the ones to show the highest ‘don't know’ portions. 5.2.2 novel dimensions out of the six hypothesized dimensions, three survived the efa and cfa iterations. constructivist approach and internet reliance were retained and learning by dialogue was introduced as the third dimension. the crucial question is, whether the novel dimensions are defensible. do they reflect the constructs, and are the constructs relevant and credible? constructivist approach to learning the novel dimension constructivist approach to learning did not turn out as anticipated but instead, in the proposed model it holds items also from the hypothesized dimensions valuing diversity and reflective nature of learning (table 1). most of the seven items loading on this factor in the initial model (table 3) proved stable throughout the different analyses. in the split half efa, the items k05_2 and k07_2 loaded on learning ability, which is both conceptually coherent as well as understandable considering the strong correlation between these dimensions. 
to some extent, the constructivist approach can be regarded as an antithesis to omniscient authority; a naïve stance on the omniscient authority dimension would entail a belief that knowledge is handed down by some authority, which implicitly would exclude the possibility of the individual constructing knowledge herself. however, if these dimensions were opposite to each other, they would also correlate negatively, which was not the case. the explanation may lie therein that the items that were used to operationalize the omniscient authority dimension mainly focus on how the respondent relates to authorities and to the knowledge handed down by them. the items do not actually provide information about to which extent the respondent thinks it is possible to construct knowledge. thus, the constructivist approach dimension can rather be regarded as a supplement to the omniscient authority dimension and furthermore, whereas the omniscient authority dimension expresses a knowledge (source) aspect, the constructivist approach dimension expresses a learning (as construction) aspect. learning by dialogue in the initial efa (table 3), this dimension contained items from the hypothesized dimensions valuing diversity (k03_5, k03_7) and connectivist approach (k04_3), all with strong loadings. in the efa on split half a, the item k03_5 caused a heywood case while both other items loaded weakly and moreover, k04_3 loaded on the omniscient authority factor, which is conceptually questionable (table 4). the cfa path diagram on split half b (figure 2) shows rather weak loadings on this latent variable, but the correlation with constructivist approach is strong, which is conceptually coherent. the discarded items, especially k04_5 and k08_5, that were suggested to describe connectivist approach and valuing diversity, might still be worth testing after rephrasing. 
despite some instability, probably due to low number of cases in the split halves, this dimension can still be defended since it expresses an aspect not expressed in the previous dimensions, namely learning as a social process where the interaction with others, also those representing divergent opinions, is central. internet reliance internet reliance contains three of the five items originally associated to this dimension. some items associated to just-in-time learning might have been associated to this dimension but were discarded due to instability. the items k12_6 and k13_6 proved stable across the analyses, whereas k14_5 loaded weakly and cross-loaded in the initial model and loaded on certainty of knowledge in the split half efa. in the split half cfa, k13_6 caused a heywood case and k14_5 loaded weakly on this dimension. adding a connection from structure of knowledge to k14_5 improved fit indices. the corresponding loading also occurred in the initial efa model (table 3). the fact that iseq items (bråten et al., 2005) were not included to a larger extent may be surprising. however, a closer look shows that the three items included in the present instrument resemble the iseq general internet epistemology items strongly, and basically cover the same topics. the items in this dimension were presented from a naïve perspective and were slightly skewed to the right, indicating that the respondents were not quite as convinced of internet as the digital natives debate may have suggested. 5.3 correlating dimensions the question whether the dimensions correlate or not has been an issue throughout the years within this line of investigation. in her first studies, schommer (1990; 1998) used only varimax rotation and apparently assumed non-correlating dimensions. one might ask if the idea of a set of “more or less independent dimensions” (schommer, 1990) has created an expectation of the dimensions being uncorrelated? schraw et al. (2002, p. 
265) analysed their material using both orthogonal and oblique rotation, but concluded that the factors did not correlate. still, their principal component analyses with varimax rotation revealed a weak positive correlation between the omniscient authority and simplicity/structure of knowledge dimensions (p. 269). then again, wood & kardash (2002, p. 252) found moderate to strong inter-factor correlations using factor analysis. wood & kardash (2002, p. 239) also discourage from limiting exploration to orthogonal rotation methods, since forcing inter-correlated factors into an orthogonal model will cause items to cross-load, and the attempt to find a simple structure will fail. otting et al. (2010) identified a relation between expert knowledge (cf. omniscient authority), certainty of knowledge and traditional conceptions of teaching and learning. accordingly, they also identified a relation between learning effort (cf. learning ability) and constructivist conceptions of teaching and learning (cf. the constructivist approach identified in the present study). thus, since the first explorations in the present study indicated that at least some factors correlate, it was obvious that oblique rotation methods should be used (cf. finch et al., 2016, pp. 133, 142; osborne, 2014, pp. 30-33) to allow the factors to correlate, and as it turned out, they did. throughout the analyses (tables 2, 3, 4 and figure 2), the naïvely oriented original dimensions omniscient authority, structure of knowledge and certainty of knowledge correlated with each other. this is in line with the findings by bråten et al. who merged these dimensions into a factor labelled general internet epistemology, but also raises the question if the general internet epistemology factor (bråten et al., 2005; knight et al., 2017) actually suggests a second-level latent variable? 
across the novel dimensions, correlations occurred between learning by dialogue and constructivist approach, although they were surprisingly weak. then again, constructivist approach always correlated strongly with learning ability, which was also confirmed in cfa (figure 2). the internet reliance dimension correlated weakly with certainty of knowledge and structure of knowledge (in cfa only with the latter), which may seem surprising but is still coherent when taking a closer look at the single items. believing that you can get almost all information about a subject by googling one or two internet sources, and that they can provide you with a clearer picture (than books), will probably go hand in hand with a belief in knowledge being certain and structured. the overall weak correlations to internet reliance may also suggest that this dimension develops “more or less independently” as schommer (1990) originally suggested. the strong correlation between constructivist approach and learning ability is coherent, since believing in everyone’s ability to learn how to learn is part of the constructivist view where the metacognitive component, the learner’s awareness of her/his own learning, is central. the correlation between constructivist approach and learning by dialogue is also coherent. the constructivist approach regards learning as a process of reasoning and construction, where meaning and interpretation are often negotiated in social settings, in dialogue with other learners, and learning is enriched by multiple views and perspectives. the correlation is lower than anticipated, which may be due to learning by dialogue being represented by only three items. to conclude, limiting the efa to orthogonal rotation methods would have concealed the inter-factor relations reported here and perhaps also forced the items to load on inappropriate factors (cf. osborne, 2014, pp. 30-33).
5.4 methodological considerations 5.4.1 scale considerations data and sample have been partly described in the section research data and sample characteristics. due to elimination of cases with a high portion of non-response, the items used for analysis contained between 91.7 and 98.6% substantial responses. since the purpose was to form subscales, it is appropriate to inspect the characteristics and normality of the subscales (cf. carifio & perla, 2007). table 5 comparison of subscale item means, number of items and subscale internal consistencies in fee (moschner et al., 2005), in hypothesized and in proposed model subscales table footnotes a) means and alpha values are calculated excluding six items with reverse phrasing, cf. table 1 b) containing one item also from omniscient authority c) containing items also from reflective nature of learning and valuing diversity d) containing items from connectivist networking and valuing diversity the internal consistencies were analysed both for the hypothesized subscales (54 unidirectional items, table 1) and for the proposed model subscales (26 items, table 3). as illustrated in table 5, the subscales showed large variations; for two of the novel dimensions, the internal consistency index was acceptable. however, for the subscales certainty of knowledge and learning by dialogue, the alpha values were disappointingly low, although not necessarily poor compared to earlier studies (e.g. schommer, 1993; schraw et al., 2002, pp. 266-267; wood & kardash, 2002, p. 253). however, as wood & kardash (2002, p. 237) point out, a low internal consistency value should not too hastily be taken as a motive to discard a subscale. rather, a low value should encourage increasing the number of items and developing them such that they can more precisely express the respondent’s stance on a specific matter. further, as osborne (2014, p. 
105) notes, the alpha values (table 5) rather express properties of the sample than properties of the instrument. carifio & perla (2007) recommend 6-8 items for each factor in efa, and the presented model can be criticized for not reaching up to that recommendation. further, the items within each subscale were unidirectional and thus, the lack of reversely phrased items can be criticized (cf. carifio & perla, 2007). however, comparing items within the hypothesized subscales (table 1, still containing bidirectional items) shows that on average, the sophistically oriented items score higher than naïvely oriented items, which indicates that the items measure accurately. as factor analyses often show (e.g. bråten et al., 2005; chiu et al., 2013; leal-soto & ferrer-urbina, 2017; schraw, 2013), the model arrived at in the exploratory procedure is not necessarily identical with the hypothesized conceptual model, regarding neither item set nor factor structure, as was the case here. the results of an efa are not sufficient to confirm a model (osborne, 2014, pp. 19, 49) and therefore, the model arrived at was subject to an internal replication, i.e. efa and cfa on randomized split half data sets (table 4, figure 2). these analyses largely hold the same factor structure as the initial efa model, thereby confirming it. in both split halves, the same inter-factor correlations as in the initial model recurred, which also applies for the cross-loading items k10_7 and k14_5. 5.4.2 data considerations in addition to methodological issues discussed above, the usability and relevance of the current data set (stemming from 2012) should be assessed against the aim of the study and the research question, while taking into account if and to which extent the past years’ technological development has changed the cognitive operating environment. 
firstly, the aim of the study, as expressed in the research question, was not a validated version of a new instrument but rather, a first exploration of new epistemic dimensions that might contribute to a more nuanced epistemic profile, especially regarding the googling attitude. for this purpose, the data set proved sufficient. secondly, the googling attitude is highly dependent on access to internet and search engines. as reported earlier (section research data and sample characteristics), not much has changed on that point. by 2012, internet penetration within the sample and in finland had long been close to 100% (osf, 2010; osf, 2011), and the majority of the informants had a long history of internet exposure. after 2012, the width of services over mobile devices has undeniably increased beyond browsers and search engines to various applications, probably inducing use habits that rely even more on ubiquity. it is not far-fetched to assume that users today may be even more prone than in 2012 to consult internet-based sources. consequently, if the current research data can demonstrate even weak signs of a googling attitude, then the data fulfils its purpose and one may assume that a newer set of data would reveal even clearer signs. to conclude, the current data set has served the aim and provided an answer to the research question of the current study as for the current sample. as further elaborated in the concluding section, i did not produce a validated instrument. still, the results corroborate the initial assumption about a connection between the googling attitude and epistemic beliefs and encourage further development along this line. should we choose to regard the current results simply as expressing the 2012 state of affairs, the results will still be relevant for historical comparison. 
6 conclusions 6.1 dimensions and constructs in the present study, five novel dimensions were introduced and operationalized in 33 items, based both on literature about so-called digital natives and learning in the digital era as well as empirical observations. three novel dimensions, described by thirteen items, survived the process; constructivist approach, internet reliance and learning by dialogue. the dimension constructivist learning approach appeared as a rather stable dimension, correlating strongly with learning ability and moderately with learning by dialogue. these correlations are conceptually coherent, as is the lack of correlation to omniscient authority. the latter suggests that having a constructivist learning approach does not exclude believing that an omniscient authority can be an important source of knowledge but rather, the constructivist learning approach can be regarded as a learning aspect supplementing omniscient authority, describing a knowledge aspect. learning by dialogue was mainly inspired by the connectivist model suggested by siemens (2005; 2006), but during the analyses a picture emerged, where this dimension mainly deals with learning and construction of knowledge as a social process. just as the dimension constructivist learning approach, learning by dialogue provides a learning aspect not captured by previously described dimensions. internet reliance poses a dimension with a knowledge aspect, not covered by previous instruments, and is probably the dimension that most of all expresses the googling attitude referred to in the introduction. furthermore, it expresses a way of relating to knowledge that has not been possible before. indeed, during the pre-internet era it was possible to offload your memory to books or other external media. however, due to access, time and distance barriers, “looking it up in a book” was not an option of the same range as “looking it up on the net” (cf. fisher et al., 2015). 
thus, since the introduction of the internet, it is in fact possible to refrain from memorizing and instead to offload one’s memory and to rely on finding the information on the net, immediately and as soon as you need it, which is not a problem per se. the problems and risks lie in the confusion of knowledge and information, where the ubiquitous access to information creates the illusion of possessing personal knowledge (fisher et al., 2015). as technology develops and becomes more powerful, this problem is accentuated, since not only information storage but also information processing is outsourced, thereby changing our epistemic practices (säljö, 2012; sparrow et al., 2011). the confusion of knowledge and information can also be viewed from the perspective of cognitive processing as described e.g. in the extended version of bloom’s taxonomy (krathwohl, 2002). if a person is to achieve a deeper level of knowing about a topic, the first level, remembering or ‘knowing-that’, is always a prerequisite for moving on to understanding, applying, analysing, evaluating and creating. in this perspective, the googling attitude suggests a ‘knowing-where’ (siemens, 2006, p. 10), which can be regarded as a stage of external information, possibly preceding remembering. however, not until that external information has been memorized and transformed into a ‘knowing-that’ as part of the personal body of information can it enable the following levels of knowing.

6.2 epistemic awareness and educational practice

muis et al. (2006, p. 42) have drawn our attention to the point that students should be made aware of their epistemic beliefs, since this awareness may be important for epistemic change. the same challenge has recently been addressed by bhatt & mackenzie (2019) but now with focus on the internet context and digital literacy. thus, epistemic awareness is a component in epistemic competence for both teachers and learners.
much of the pedagogical potential of the novel dimensions can be deduced from the intersection of changing pedagogies and the new learning environments emerging with new ict and media. many teaching methods and learning activities, such as the flipped classroom (cf. knewton, 2011) and pbl (cf. otting et al., 2010), increase the demands on students' self-regulation and their ict and media literacy (cf. muis, 2007; brownlee et al., 2009; walker et al., 2009; bhatt & mackenzie, 2019). thus, if a study programme is built e.g. upon pbl, it is useful to know to what extent the students in a new group actually have a constructivist approach and readiness for learning by dialogue, and how to support students’ self-directedness. should it turn out that many students lack these prerequisites, appropriate interventions can be applied to develop their epistemic mind-sets on these dimensions, thereby improving their academic performance. increased understanding regarding both teachers' and students' epistemic beliefs has been called for (cf. palmer & marra, 2008, p. 345). if the novel dimensions can increase awareness regarding the connection between epistemic beliefs and learning tasks over changes in epistemic beliefs by intervention (cf. kienhues et al., 2008), they have the potential of contributing to instruction and learning strategies that are better aligned to both learning objectives and the learners’ epistemic orientations. due to internationalization and student mobility, classes will increasingly hold students and teachers with diverse cultural backgrounds. thus, if epistemic beliefs are dependent on cultural background as suggested by e.g. zhang & watkins (2001) and hofer (2008, pp. 11-12), then awareness about this connection is increasingly important for the teacher to support and guide the learning processes in a multicultural class with students holding diverse, culturally induced, epistemic orientations.
the most crucial finding of this study is the introduction of the internet reliance dimension. identifying students with a naïve stance on this dimension may prove important especially if these students are over-reliant on internet-based resources (cf. bråten, 2008, pp. 369-370). if so, they are at risk of developing an ever-narrowing worldview and an epistemology of ignorance resulting from the ranked and filtered results provided by search engines (cf. bhatt & mackenzie, 2019; hinman, 2008, p. 73; nguyen, 2018).

6.3 future research

the results presented above respond to the openly phrased research question by confirming that it is indeed possible to extend epistemic dimensions so that they also express the googling attitude. this is, however, only part of the answer: the novel dimensions need to be further tested e.g. by exploring whether they show between-groups variations congruent with the googling attitude they are expected to express. a connection between epistemic beliefs and academic performance has been suggested (e.g. aditomo, 2018). if the instrument for measuring epistemic beliefs can be developed to measure more precisely, it will probably have a predictive value in assessing each student’s epistemic competence in relation to study context, and a value for teaching practices in supporting students’ epistemic competencies by appropriate choice of learning activities. on this point, the picture is disparate with both encouraging (pieschl et al., 2014) and discouraging (knight et al., 2017) results and thus, epistemic beliefs as predictors of learning behaviour seem an under-researched area. however, net-based learning environments (lms, vle) that have started to include learning analytics features will provide better possibilities to investigate the connection between students’ epistemic dimensions and trace data from authentic learning contexts, i.e. courses.
there are indicators suggesting that epistemic beliefs dimensions should be measured on a sufficiently fine-grained level, since coarsely composed dimensions, such as the general internet epistemology (knight et al., 2017), will blur the picture. the study by trautwein & lüdtke (2007), focusing on the certainty dimension, is an interesting initiative in this line. the recent study by bråten, brandmo & kammerer (2018) expresses what we might call increased granularity: besides focusing only on the justification dimension, they divide it into three sub-dimensions: justification by authority, by multiple sources and by personal knowledge. these examples, together with earlier replication problems (schraw, 2013), expose a challenging tension: should we measure epistemic beliefs as a set of dimensions or as separate constructs?

the proposed model (table 3), confirmed by internal replication (table 4 and figure 2), shows fit indices that are not ideal but sufficient to encourage further development. despite deficiencies, the model provides an interesting input to the debate on whether epistemic beliefs should include only views on knowledge, or also views on learning. the cfa path diagram (figure 2) illustrates this debate: it shows two groups of latent variables, the upper group describing views on learning, and the lower one describing views on knowledge. it is not far-fetched to imagine two second-level latent variables, influencing views on knowledge and views on learning, respectively (cf. section correlating dimensions). the correlations within the two groups of latent variables, especially the strong correlation between constructivist approach and learning ability, also point in this direction, and exploring second-level latent constructs is a topic for further investigation. topics dealing with the instrument itself include 1) developing the instrument such that each dimension would be represented by more than only three items (cf.
carifio & perla, 2007), 2) improving items with low loadings, and 3) exploring the discarded items regarding common features that might have contributed to their dysfunctionality. in addition to these topics, the functionality of the model should be tested by exploring how well the dimensions distinguish different learners. this will be done by exploring whether and to what extent dimensional group differences can be identified, e.g. across users representing different digital orientations or study domains. the extensions to the epistemic beliefs instrument and the proposed (but not validated) model are, needless to say, only a beginning. considering the twenty years of history with seq and its successors gives an idea of the work that still lies ahead.

keypoints

the novel dimension internet reliance may help in identifying learners who are over-reliant on internet-based resources.
beliefs about learning contribute to describing one’s epistemic orientation, although they are not regarded as part of the epistemic beliefs concept.
although assumed to develop independently, the epistemic beliefs dimensions correlate when using an appropriate rotation method.
the novel dimensions contribute to an epistemic awareness and to adapting instruction and learning practices to learners’ epistemic orientations.
an increasingly international learning context and multicultural student body require awareness about culturally induced epistemic orientations.

acknowledgments

this research was funded by föreningen konstsamfundet, koulutusrahasto and svenska kulturfonden. i am grateful for the support provided by arcada through filip levälahti and all participating students during the data collection process, and for the support from the meda project through matteo stocchetti. i am grateful also for the feedback provided by my supervisors marita mäkelä and eero sormunen, and by anonymous reviewers to earlier drafts of this paper.

references

aditomo, a. (2018).
epistemic beliefs and academic performance across soft and hard disciplines in the first year of college. journal of further and higher education, 42(4), 482-496. doi:10.1080/0309877x.2017.1281892 alexander, p. a. (2006). what would dewey say? channeling dewey on the issue of specificity of epistemic beliefs: a response to muis, bendixen, and haerle (2006). educational psychology review, 18(1), 55-65. doi:10.1007/s10648-006-9002-7 anderson, s., & balsamo, a. (2008). a pedagogy for original synners. in t. mcpherson (ed.), digital young, innovation, and the unexpected (1st ed., pp. 241-259). cambridge, ma: the mit press. doi:10.1162/dmal.9780262633598.241 bernholt, a., gruber, h., & moschner, b. (eds.). (2017). wissen und lernen. wie epistemische überzeugungen schule, universität und arbeitswelt beeinflussen [knowing and learning. the influence of epistemic beliefs on schools, universities and working life]. münster: waxmann verlag. bhatt, i., & mackenzie, a. (2019). just google it! digital literacy and the epistemology of ignorance. teaching in higher education, 24 (3), 302-317. doi:10.1080/13562517.2018.1547276 bråten, i. (2008). personal epistemology, understanding of multiple texts, and learning within internet technologies. in m. s. khine (ed.), knowing, knowledge and beliefs: epistemological studies across diverse cultures (pp. 351-376). dordrecht: springer. doi:10.1007/978-1-4020-6596-5_17 bråten, i., brandmo, c., & kammerer, y. (2018). a validation study of the internet-specific epistemic justification inventory with norwegian preservice teachers. journal of educational computing research, (onlinefirst), 1-24. doi:10.1177/0735633118769438 bråten, i., & strømsø, h. i. (2006). epistemological beliefs, interest, and gender as predictors of internet-based learning activities. computers in human behavior, 22(6), 1027-1042. doi:10.1016/j.chb.2004.03.026 bråten, i., strømsø, h. i., & samuelstuen, m. s. (2005). 
the relationship between internet-specific epistemological beliefs and learning within internet technologies. journal of educational computing research, 33(2), 141-171. doi:10.2190/e763-x0ln-6nmf-cb86 brownlee, j., walker, s., lennox, s., exley, b., & pearce, s. (2009). the first year university experience: using personal epistemology to understand effective learning and teaching in higher education. higher education, 58(5), 599-618. doi:10.1007/s10734-009-9212-2 cape, p. (2010). questionnaire length, fatigue effects and response quality revisited. paper presented at the re:think 2010: the arf 56th annual convention, ny. retrieved from https://www.surveysampling.com/ carifio, j., & perla, r. j. (2007). ten common misunderstandings, misconceptions, persistent myths and urban legends about likert scales and likert response formats and their antidotes. journal of social sciences, 3(3), 106-116. doi:10.3844/jssp.2007.106.116 chiu, y., liang, j., & tsai, c. (2013). internet-specific epistemic beliefs and self-regulated learning in online academic information searching. metacognition and learning, 8(3), 235-260. doi:10.1007/s11409-013-9103-x clark, a., & chalmers, d. (1998). the extended mind. analysis, 58(1), 7-19. doi:10.1093/analys/58.1.7 debacker, t. k., crowson, h. m., beesley, a. d., thoma, s. j., & hestevold, n. l. (2008). the challenge of measuring epistemic beliefs: an analysis of three self-report instruments. journal of experimental education, 76(3), 281-312. doi:10.3200/jexe.76.3.281-314 downes, s. (2007). an introduction to connective knowledge. paper presented at the media, knowledge and education: exploring new spaces, relations and dynamics in digital media ecologies, ed. t. hug, innsbruck university press, innsbruck, austria, 2007, june 25-26, pp. 77-102. elby, a. (2009). defining personal epistemology: a response to hofer & pintrich (1997) and sandoval (2005). journal of the learning sciences, 18(1), 138-149.
doi:10.1080/10508400802581684 fabrigar, l. r., wegener, d. t., maccallum, r. c., & strahan, e. j. (1999). evaluating the use of exploratory factor analysis in psychological research. psychological methods, 4(3), 272-299. ferguson, r. (2012). learning analytics: drivers, developments and challenges. international journal of technology enhanced learning, 4(5), 304-317. doi:10.1504/ijtel.2012.051816 finch, w. h., immekus, j. c., & french, b. f. (2016). applied psychometrics using spss and amos. charlotte, nc: information age publishing inc. fisher, m., goddu, m. k., & keil, f. c. (2015). searching for explanations: how the internet inflates estimates of internal knowledge. journal of experimental psychology, general, 144(3), 674-687. doi:10.1037/xge0000070 fokkema, m., & greiff, s. (2017). how performing pca and cfa on the same data equals trouble. european journal of psychological assessment, 33(6), 399-402. doi:10.1027/1015-5759/a000460 greene, j. a., azevedo, r., & torney-purta, j. (2008). modeling epistemic and ontological cognition: philosophical perspectives and methodological directions. educational psychologist, 43(3), 142-160. doi:10.1080/00461520802178458 greene, j. a., sandoval, w. a., & bråten, i. (eds.). (2016). handbook of epistemic cognition. new york: routledge. doi:10.4324/9781315795225. retrieved from http://ebookcentral.proquest.com/ grossnickle peterson, e., alexander, p. a., & list, a. (2017). the argument for epistemic competence. in a. bernholt, h. gruber & b. moschner (eds.), wissen und lernen. wie epistemische überzeugungen schule, universität und arbeitswelt beeinflussen (pp. 255-270). münster: waxmann verlag. gunter, b., rowlands, i., & nicholas, d. (2009). the google generation. are ict innovations changing information-seeking behaviour? cambridge: chandos publishing. retrieved from http://ebookcentral.proquest.com/ hair, j. f., black, w. c., babin, b. j., & anderson, r. e. (2010). multivariate data analysis: a global perspective (7th ed.). 
upper saddle river (n.j.): prentice hall. head, a. j., & eisenberg, m. b. (2010). how today’s college students use wikipedia for course-related research. first monday, 15(3). doi:10.5210/fm.v15i3.2830 hinman, l. m. (2008). searching ethics: the role of search engines in the construction and distribution of knowledge. in a. spink, & m. zimmer (eds.), web search multidisciplinary perspectives (pp. 67-76). berlin, heidelberg: springer. doi:10.1007/978-3-540-75829-7_3. retrieved from https://ebookcentral.proquest.com/ hofer, b. k. (2006). beliefs about knowledge and knowing: integrating domain specificity and domain generality: a response to muis, bendixen, and haerle (2006). educational psychology review, 18(1), 67-76. doi:10.1007/s10648-006-9000-9 hofer, b. k. (2008). personal epistemology and culture. in m. s. khine (ed.), knowing, knowledge and beliefs: epistemological studies across diverse cultures (pp. 3-22). dordrecht: springer. doi:10.1007/978-1-4020-6596-5_1 hofer, b. k., & pintrich, p. r. (1997). the development of epistemological theories: beliefs about knowledge and knowing and their relation to learning. review of educational research, 67(1), 88-140. doi:10.2307/1170620 hofer, b. k., & pintrich, p. r. (eds.). (2002). personal epistemology: the psychology of beliefs about knowledge and knowing . mahwah, n.j: l. erlbaum associates. doi:10.4324/9781410604316 hooper, d., coughlan, j., & mullen, m. (2008). structural equation modelling: guidelines for determining model fit. electronic journal of business research methods, 6(1), 53-60. retrieved from http://www.ejbrm.com/volume6/issue1/p53 jones, c., & hosein, a. (2010). profiling university students' use of technology: where is the net generation divide? international journal of technology, knowledge & society, 6 (3), 43-58. doi:10.18848/1832-3669/cgp/v06i03/56097 kammerer, y., & gerjets, p. (2012). 
effects of search interface and internet-specific epistemic beliefs on source evaluations during web search for medical information: an eye-tracking study. behaviour & information technology, 31(1), 83-97. doi:10.1080/0144929x.2011.599040 karimi, m. n. (2014). efl students' grammar achievement in a hypermedia context: exploring the role of internet-specific personal epistemology. system, 42, 1-11. doi:10.1016/j.system.2013.10.017 khine, m. s. (ed.). (2008). knowing, knowledge and beliefs: epistemological studies across diverse cultures . dordrecht: springer. doi:10.1007/978-1-4020-6596-5. retrieved from https://ebookcentral.proquest.com/ kienhues, d., bromme, r., & stahl, e. (2008). changing epistemological beliefs: the unexpected impact of a short-term intervention. british journal of educational psychology, 78(4), 545-565. doi:10.1348/000709907x268589 kitchener, r. f. (2002). folk epistemology: an introduction. new ideas in psychology, 20(2–3), 89-105. doi:10.1016/s0732-118x(02)00003-x knewton. (2011). the flipped classroom infographic. retrieved from http://www.knewton.com/flipped-classroom/, 03.08.2012 knight, s., rienties, b., littleton, k., mitsui, m., tempelaar, d., & shah, c. (2017). the relationship of (perceived) epistemic cognition to interaction with resources on the internet. computers in human behavior, 73, 507-518. doi:10.1016/j.chb.2017.04.014 knight, s., wise, a. f., & chen, b. (2017). time for change: why learning analytics needs temporal analysis. journal of learning analytics, 4(3), 7–17. doi:10.18608/jla.2017.43.2 krathwohl, d. r. (2002). a revision of bloom's taxonomy: an overview. theory into practice, 41(4), 212-218. doi:10.1207/s15430421tip4104_2 kuhn, d., & weinstock, m. (2002). what is epistemological thinking and why does it matter? in b. k. hofer, & p. r. pintrich (eds.), personal epistemology: the psychology of beliefs about knowledge and knowing (pp. 121-144). mahwah, n.j: l. erlbaum associates. leal-soto, f., & ferrer-urbina, r. 
(2017). three-factor structure for epistemic belief inventory: a cross-validation study. plos one, 12(3), 1-16. doi:10.1371/journal.pone.0173295 martin, e. (2005). survey questionnaire construction. in k. kempf-leonard (ed.), encyclopedia of social measurement (pp. 723-732). new york: elsevier. doi:10.1016/b0-12-369398-5/00433-3. retrieved from http://www.sciencedirect.com/ miller, b., & record, i. (2013). justified belief in a digital age: on the epistemic implications of secret internet technologies. episteme, 10(02), 117-134. doi:10.1017/epi.2013.11 moschner, b., gruber, h., & studienstiftungsarbeitsgruppe epi. (2005). epistemologische überzeugungen. forschungsbericht nr. 18. regensburg: universität regensburg, lehrstuhl für lehr-lern-forschung. retrieved from https://portal.uni-regensburg.de/48/ muis, k. r. (2007). the role of epistemic beliefs in self-regulated learning. educational psychologist, 42(3), 173-190. doi:10.1080/00461520701416306 muis, k. r., bendixen, l. d., & haerle, f. c. (2006). domain-generality and domain-specificity in personal epistemology research: philosophical and empirical reflections in the development of a theoretical framework. educational psychology review, 18(1), 3-54. doi:10.1007/s10648-006-9003-6 nguyen, c. t. (2018). echo chambers and epistemic bubbles. episteme, (firstview, sept 13, 2018). doi:10.1017/epi.2018.32 niessen, t., vermunt, j., abma, t., widdershoven, g., & van der vleuten, c. (2004). on the nature and form of epistemologies: revealing hidden assumptions through an analysis of instrument design. european journal of school psychology, 2(1-2), 39-64. osborne, j. w. (2014). best practices in exploratory factor analysis. retrieved from http://pareonline.net/ osf. (2010). use of information and communications technology. official statistics finland. retrieved from http://www.stat.fi/til/sutivi/2010/sutivi_2010_2010-10-26_tie_001_en.html osf. (2011). use of information and communications technology by individuals. 
official statistics finland. retrieved from http://www.stat.fi/til/sutivi/2011/sutivi_2011_2011-11-02_tie_001_en.html otting, h., zwaal, w., tempelaar, d., & gijselaers, w. (2010). the structural relationship between students' epistemological beliefs and conceptions of teaching and learning. studies in higher education, 35(7), 741-760. doi:10.1080/03075070903383203 palmer, b., & marra, r. m. (2008). individual domain-specific epistemologies: implications for educational practice. in m. s. khine (ed.), knowing, knowledge and beliefs: epistemological studies across diverse cultures (pp. 325-350). dordrecht: springer. doi:10.1007/978-1-4020-6596-5_16 perry, w. g. (1970). forms of intellectual and ethical development in the college years: a scheme . new york: holt, rinehart and winston. personal data act of 22.4.1999. retrieved from http://www.finlex.fi/fi/laki/ajantasa/1999/19990523 pieschl, s., stallmann, f., & bromme, r. (2014). high school students' adaptation of task definitions, goals and plans to task complexity the impact of epistemic beliefs. psychological topics, 23(1), 31-52. doi:10.31820/pt prensky, m. (2001). digital natives, digital immigrants part 1. on the horizon, 9(5), 1-6. doi:10.1108/10748120110424816 purcell, k., brenner, j., & rainie, l. (2012). search engine use 2012. washington, dc: pew research center. retrieved from http://www.pewinternet.org/2012/03/09/search-engine-use-2012/ purcell, k., rainie, l., heaps, a., buchanan, j., friedrich, l., jacklin, a., . . . zickuhr, k. (2012). how teens do research in the digital world. washington dc: pew research center. retrieved from http://www.pewinternet.org/2012/11/01/how-teens-do-research-in-the-digital-world/ säljö, r. (2012). literacy, digital literacy and epistemic practices: the co-evolution of hybrid minds and external memory systems. nordic journal of digital literacy, 7(1), 5-19. retrieved from http://www.idunn.no/ts/dk/2012/01/art08 sandoval, w. a. (2005). 
understanding students' practical epistemologies and their influence on learning through inquiry. science education, 89(4), 634-656. doi:10.1002/sce.20065 schommer, m. (1990). effects of beliefs about the nature of knowledge on comprehension. journal of educational psychology, 82(3), 498-504. doi:10.1037/0022-0663.82.3.498 schommer, m. (1993). epistemological development and academic performance among secondary students. journal of educational psychology, 85 (3), 406-411. doi:10.1037/0022-0663.85.3.406 schommer, m. (1998). the influence of age and education on epistemological beliefs. british journal of educational psychology, 68(4), 551-562. doi:10.1111/j.2044-8279.1998.tb01311.x schommer-aikins, m. (2004). explaining the epistemological belief system: introducing the embedded systemic model and coordinated research approach. educational psychologist, 39(1), 19-29. doi:10.1207/s15326985ep3901_3 schraw, g. (2013). conceptual integration and measurement of epistemological and ontological beliefs in educational research. isrn education, vol. 2013, 1-19. doi:10.1155/2013/327680 schraw, g., bendixen, l., & dunkle, m. e. (2002). development and validation of the epistemic belief inventory (ebi). in b. k. hofer, & p. r. pintrich (eds.), personal epistemology: the psychology of beliefs about knowledge and knowing (pp. 261-275). mahwah, n.j: l. erlbaum associates. schreiber, j. b., nora, a., stage, f. k., barlow, e. a., & king, j. (2006). reporting structural equation modeling and confirmatory factor analysis results: a review. the journal of educational research, 99(6), 323-337. doi:10.3200/joer.99.6.323-338 shaffer, d. w., & clinton, k. a. (2006). toolforthoughts: reexamining thinking in the digital age. mind, culture, and activity, 13(4), 283-300. doi:10.1207/s15327884mca1304_2 siemens, g. (2005). connectivism: a learning theory for the digital age. international journal of instructional technology and distance learning, 2 (1), 3-10. siemens, g. (2006). 
knowing knowledge, elearnspace. retrieved from http://www.elearnspace.org/ simpson, t. w. (2012). evaluating google as an epistemic tool. metaphilosophy, 43(4), 426-445. doi:10.1111/j.1467-9973.2012.01759.x smart, p. r. (2012). the web-extended mind. metaphilosophy, 43(4), 446-463. doi:10.1111/j.1467-9973.2012.01756.x sparrow, b., liu, j., & wegner, d. m. (2011). google effects on memory: cognitive consequences of having information at our fingertips. science, 333(6043), 776-778. doi:10.1126/science.1207745 spss. (2016a). amos 24.0 [computer software]. chicago, il: spss inc., ibm corporation. spss. (2016b). spss 24.0 [computer software]. chicago, il: spss inc., ibm corporation. ståhl, t. (2017). how ict savvy are digital natives actually? nordic journal of digital literacy, 12(3), 89-108. doi:10.18261/issn.1891-943x-2017-03-04 ståhl, t., & mildén, p. (2017). applying the fee to explore epistemic beliefs among students. in a. bernholt, h. gruber & b. moschner (eds.), wissen und lernen. wie epistemische überzeugungen schule, universität und arbeitswelt beeinflussen (pp. 59-97). münster: waxmann verlag strømsø, h. i., & bråten, i. (2010). the role of personal epistemology in the self-regulation of internet-based learning. metacognition and learning, 5(1), 91-111. doi:10.1007/s11409-009-9043-7 tang, j. (2010). exploratory and confirmatory factor analysis of epistemic beliefs questionnaire about mathematics for chinese junior middle school students. journal of mathematics education, 3(2), 89-105. tourangeau, r., couper, m. p., & conrad, f. (2004). spacing, position, and order: interpretive heuristics for visual features of survey questions. public opinion quarterly, 68(3), 368-393. doi:10.1093/poq/nfh035 trautwein, u., & lüdtke, o. (2007). epistemological beliefs, school achievement, and college major: a large-scale longitudinal study on the impact of certainty beliefs. contemporary educational psychology, 32(3), 348-366. 
doi:10.1016/j.cedpsych.2005.11.003 van den beemt, a., akkerman, s., & simons, p. r. j. (2011). patterns of interactive media use among contemporary youth. journal of computer assisted learning, 27(2), 103-118. doi:10.1111/j.1365-2729.2010.00384.x walker, s., brownlee, j., lennox, s., exley, b., howells, k., & cocker, f. (2009). understanding first year university students: personal epistemology and learning. teaching education, 20(3), 243-256. doi:10.1080/10476210802559350 wood, p., & kardash, c. (2002). critical elements in the design and analysis of studies of epistemology. in b. k. hofer, & p. r. pintrich (eds.), personal epistemology: the psychology of beliefs about knowledge and knowing (pp. 231-260). mahwah, n.j: l. erlbaum associates. zhang, l., & watkins, d. (2001). cognitive development and student approaches to learning: an investigation of perry's theory with chinese and u.s. university students. higher education, 41(3), 239-261. doi:10.1023/a:1004151226395

frontline learning research vol.4 no. 3 (2016) 1-27 issn 2295-3159

learning actions, objects and types of interaction: a methodological analysis of expansive learning among pre-service teachers

juhana rantavuori1, yrjö engeström, lasse lipponen
university of helsinki, finland

article received 8 may / revised 24 march / accepted 6 april / available online 10 may

abstract

the paper analyzes a collaborative learning process among finnish pre-service teachers planning their own learning in a self-regulated way. the study builds on cultural-historical activity theory and the theory of expansive learning, integrating for the first time an analysis of learning actions and an analysis of types of interaction. we examine the theory of expansive learning as a possible conceptual and methodological framework for understanding this type of collaborative learning. the task of the paper is primarily methodological.
we believe that cultural-historical activity theory needs to be turned into methods and procedures of systematic empirical analysis, and this article examines one such methodological solution. at the same time, we aim to uncover some substantive dynamics of expansive learning in collaborative teacher education oriented at open-ended problems and tasks. an almost complete expansive mini-cycle of learning actions appeared in the pre-service teachers’ meeting. however, an analysis of the steps of formation of the shared object revealed a more complex iterative process. as the expansive learning process moved epistemically from questioning to analysis, modeling and implementation, it also moved interactionally from coordination to cooperation and communication. yet there was no mechanical correspondence between specific learning actions and specific types of interaction. transitions and disturbances were crucial for the dynamics of expansive learning. a full assessment of a potentially expansive mini-cycle of learning calls for extending the time scale of the analysis.

keywords: activity theory; expansive learning; learning actions; types of interaction; object; disturbances

1 corresponding author: juhana rantavuori, center for research on activity, development, and learning, institute of behavioural sciences, faculty of behavioural sciences, po box 9 (siltavuorenpenger 1a), fi-00014 university of helsinki, finland. e-mail: juhana.rantavuori@helsinki.fi. doi: http://dx.doi.org/10.14786/flr.v4i3.174

rantavuori et al | f l r 2

1. introduction

open-ended and problem-based collaborative learning is becoming an increasingly important challenge for many contexts in which learners face complex problems for which pre-existing standard solutions are not sufficient (bereiter & scardamalia, 1993). we argue that it is not enough to promote collaborative and problem-oriented learning in general.
theoretically ambitious models and empirically rigorous methods are needed for the design and assessment of such learning processes (see goldman, 2014). in this paper, we will analyze a collaborative learning process among finnish pre-service teachers. in this particular process, the pre-service teachers were responsible for planning their own learning actions and goals. we examine the theory of expansive learning (engeström, 2015) as a possible conceptual and methodological framework for understanding this type of learning. more generally, our study contributes to research on learning and interaction in activity systems, especially to how learning and interaction are connected in open-ended problem solving. activity systems are systems where people engage in solving problems or making or designing something (greeno, 2011; greeno & engeström, 2014). they are “dynamic, open, semiotic system(s) of meaningful actions and meaning-making processes” (lemke, 1990, p. 191). an activity system can be as small as an individual working with a computer, or as large as an organization having hundreds of employees. in our case, the activity system is a group of pre-service teachers, working on an open-ended problem solving task in a self-regulated way. the task of the paper is primarily methodological. we believe that cultural-historical activity theory needs to be turned into methods and procedures of systematic empirical analysis. therefore, the aim of the paper is to contribute to the construction of a methodology for analyzing dynamics of expansive learning. a new methodological framework created in this study is tested in the analysis of a planning meeting of a pre-service teacher group. our study is focused on two important aspects of expansive learning, namely types and sequences of expansive learning actions (engeström & sannino, 2010) and types and sequences of object-oriented interaction (engeström, 2008; fichtner, 1984; raiethel, 1983).
our aim is to understand what kinds of learning actions pre-service teachers conduct and in what types of interaction they engage in a collaborative learning process characterized by self-regulation and open-ended problem-solving. expansive learning actions have been studied in detail previously (e.g., engeström, rantavuori, & kerosuo, 2013; foot, 2001; nilsson, 2003; seppänen, 2004), and so have types of object-oriented interaction (e.g., de lange, 2011; saari, 1995). however, no studies have thus far combined learning actions and types of interaction into an integrated analysis. to fully understand the nature of open-ended and problem-based collaborative learning, and to develop the methodology of expansive learning, we need to combine these two analyses. studies of expansive learning have often been based on interventions, such as change laboratories (virkkunen & newnham, 2013), deliberately designed to implement expansive learning (e.g., engeström et al., 2013). this was not the case in the process we analyze in this paper. in this sense, our case resembles an earlier study of innovative learning in industrial work teams (engeström, 2008, pp. 118–168). the assumption of these studies is that features of expansive learning may be found in processes in which the learners face a problem or task that needs to be defined by the learners themselves and has no predefined procedure to follow or correct solution to aim at. furthermore, these studies see an inherent tension and conflict of motives in these learning processes between the safe and easy but probably rather unproductive option of following the available routine script in dealing with the task on the one hand, and the risky and difficult but possibly very productive option of turning the task into a new, expanded object and way of working on the other hand. our study examines a single learning session.
full-fledged cycles of expansive learning consist of mini-cycles which may be detected and fostered within single learning sessions or other compact sequences of learning efforts. thus, from the point of view of the theory of expansive learning, our study addresses three interrelated methodological challenges: (a) combining and integrating for the first time an analysis of learning actions and an analysis of types of interaction, (b) examining possible features of expansive learning in a process which was not designed to accomplish expansive learning by deliberate intervention, and (c) examining possible evidence for a mini-cycle of expansive learning within a single learning session. in other words, the task of this article is to explore and elaborate on the explanatory potential of the theory of expansive learning in a context of learning to which it has not been usually applied, and to develop methodological tools for examining the potential of the theory in a systematic manner. in addition, the task of the article is to show what role the mutual interaction between the participants plays in the expansive learning process. in what follows, we will first present the theoretical framework, the methodology used in the study, and the research questions. after that we describe the context of the study and the data collected. we then analyze our data in four sections, each devoted to one of our four research questions. finally, we discuss our findings and consider their methodological implications for the framework of expansive learning and for research on learning more generally.

2. theoretical framework

2.1 theory of expansive learning

sfard (1998) suggested that there are two basic metaphors of learning competing for dominance: the acquisition metaphor and the participation metaphor.
the key dimension underlying sfard’s dichotomy is derived from the question: is the learner to be understood primarily as an individual or as a community? this is an important dimension, largely inspired by the notion of community of practice put forward by lave and wenger (1991) and wenger (1998). however, an attempt to construct a one-dimensional conceptual space for the identification, analysis and comparison of theories is bound to eliminate too much of the complexity of the field of learning. the theory of expansive learning puts the primacy on communities as learners, on transformation and creation of culture, on horizontal movement and hybridization, and on the formation of theoretical concepts. in fact, from the point of view of expansive learning, both acquisition-based and participation-based approaches share much of the same conservative bias. both have little to say about transformation and creation of culture. both acquisition-based and participation-based approaches depict learning primarily as one-way movement from incompetence to competence, with little serious analysis devoted to horizontal movement and hybridization. acquisition-based approaches may ostensibly value theoretical concepts, but their very theory of concepts is quite uniformly empiricist and formal (davydov, 1990). participation-based approaches are commonly suspicious if not hostile toward the formation of theoretical concepts, largely because these approaches, too, see theoretical concepts mainly as formal ‘bookish’ abstractions. so the theory of expansive learning must rely on its own metaphor: expansion. the core idea is qualitatively different from both acquisition and participation. in expansive learning, learners learn something that is not yet there. in other words, the learners construct a new object and concept for their collective activity, and implement this new object and concept in practice.
traditional modes of learning deal with tasks in which the contents to be learned are well known ahead of time by those who design, manage, and implement various programs of learning. when whole collective activity systems, such as work processes and organizations, need to redefine themselves, traditional modes of learning are not enough. nobody knows exactly what needs to be learned. the design of the new activity and the acquisition of the knowledge and skills it requires are increasingly intertwined. in expansive learning activity, they merge. relying on activity theory, the theory of expansive learning is foundationally an object-oriented theory. in other words, the object is both resistant raw material and the future-oriented purpose of an activity. the object is the true carrier of the motive of the activity. thus, in expansive learning activity, motives and motivation are not sought primarily inside individual subjects – they are in the object to be transformed and expanded. in educational settings, the students’ object is a contradictory unity of meaningful knowledge (use value) and grades (exchange value). a powerful object of learning has expansive potential to go beyond the exchange value, being typically an open-ended problem or challenge that has relevance for the learners not limited to reproducing predefined correct answers. such an object of learning typically also goes beyond verbal formulations, requiring transformative material actions of experimentation, modeling, and implementation in practice. the theory of expansive learning is based on the dialectics of ascending from the abstract to the concrete (engeström & sannino, 2010). this is a method of grasping the essence of an object by tracing and theoretically reproducing the logic of its development, that is, its historical formation through the emergence and resolution of its inner contradictions.
a new theoretical idea or concept is initially produced in the form of an abstract, simple explanatory relationship, a germ cell. this initial abstraction is enriched and transformed step-by-step into a concrete system of multiple, constantly developing manifestations. in an expansive learning cycle, the initial simple idea is transformed into a complex object, a new form of practice. a successful expansive cycle produces a new theoretical concept – theoretically grasped practice – concrete in its systemic richness and multiplicity of manifestations. the expansive cycle begins with individual subjects questioning the accepted practice, and it gradually expands into a collective effort. in educational contexts, the most well-known example of ascending from the abstract to the concrete is davydov’s (1990) work on elementary school mathematics learning. for davydov, the germ cell of mathematics is the real number, which is a particular case of a general relationship of quantities, where one of them is taken as a measure for computing the other. a number is obtained by the general formula a/c = n, in which n is any number, a is any object represented as a quantity, and c is any measure (davydov, 1990, pp. 361–362). from working out and operating with this foundational relationship, or abstract germ cell, davydov built a whole curriculum that resulted in a mastery of a rich and concrete diversity of mathematical phenomena and tasks (schmittau & morris, 2004). in subsequent studies of expansive learning, the learning challenge has often been more problematic, stemming from contradictions that need to be resolved. in these studies, the germ cell is initially not known by the instructor-interventionists themselves; it has to be discovered and modeled by the participants investigating and transforming their activity and knowledge domain (engeström & sannino, 2010). expansive learning may be described as a stepwise process that involves seven phases called learning actions.
together these actions form an expansive cycle. this sequential model should be understood as an idealized tool for analyzing elements of expansive learning; real cycles of expansive learning do not neatly follow the order depicted in the theoretical model. process theories of learning are unavoidably to some extent prescriptive in that they advocate some optimal or desirable model of the learning process. this carries the risk of self-fulfilling prophecy, that is, a design-oriented researcher may impose his or her theoretical model on learners and instructors and seek confirmation for the model from evidence stemming from such pre-designed practice. there are good ways to keep this tendency in check (engeström & sannino, 2012). in the present study, the learning process was not designed to follow the theoretical model of expansive learning to begin with. an ideal-typical sequence of learning actions in an expansive cycle can be described as follows (engeström & sannino, 2010, p. 7). the first action of an expansive cycle is that of questioning, criticizing, or rejecting some aspects of accepted practice and existing wisdom. the second action is that of analyzing the situation. analysis involves mental, discursive, or practical transformation of the situation in order to discover causes or explanatory mechanisms. analysis evokes “why” questions and explanatory principles. one type of analysis is historical-genetic; it seeks to explain the situation by tracing its origination and evolution. another type of analysis is actual-empirical; it seeks to explain the situation by constructing a picture of its inner systemic relations. the third action is that of modeling the newly found explanatory relationship in some publicly observable and transmittable form. this means constructing an explicit, simplified model of the new idea that explains and offers a solution to the problematic situation.
the fourth action is that of examining the model, running, operating, and experimenting on it in order to fully grasp its dynamics, potentials, and limitations. the fifth action is that of implementing the model, concretizing it by means of practical applications, enrichments, and conceptual extensions. the sixth and seventh actions are those of reflecting on and evaluating the process and consolidating its outcomes into a new, stable form of practice. the model of expansive learning is useful when we try to understand open-ended learning processes in which the problem and its solution are not predefined, and the participants must learn something that “is not yet there”, that is, to generate and appropriate culturally new practices and knowledge. expansive learning has mostly been studied in relatively long-term transformations and interventions. however, “large-scale cycles involve numerous smaller cycles of learning actions” (engeström & sannino, 2010). such a mini-cycle may take place within a single intensive meeting of a group charged with a task of analyzing and solving a problem important for the development of its overall activity (e.g., engeström, 2008). although the theory of expansive learning proposes that full-fledged sequences of expansive learning actions typically take the shape of relatively predictable cycles, the cycle of expansive learning is not a universal formula of phases or stages. in fact, one probably never finds a concrete collective learning process which purely follows the ideal-typical model. the model is a heuristic conceptual tool derived from the logic of ascending from the abstract to the concrete. every time one examines or facilitates a potentially expansive learning process with the help of the model, one tests, criticizes and hopefully enriches the theoretical ideas of the model. learning processes are never purely expansive.
they contain both expansive and non-expansive phases, steps forward and back, and digressions from expanding the object of activity (engeström et al., 2013). in the study of innovative learning in industrial work teams (engeström, 2008, pp. 118–168), two such non-expansive actions were identified, namely formulating/debating a problem and reinforcing existing practice. a change laboratory process in a finnish library (engeström et al., 2013) revealed three non-expansive actions, namely informing, clarifying, and summarizing. in this study we followed the criteria of these previous studies for identifying the non-expansive learning actions. in expansive learning the emergence of a new expanding object is decisive. if such a new object was not found, the learning action was identified as non-expansive. these actions were then named descriptively, on the basis of their contents, without aiming at a theoretically systematic categorization. however, these non-expansive actions are not inimical or opposite to expansive learning, but unnecessary elements of the epistemic process of ascending from the abstract to the concrete.

2.2 object-oriented interaction

the learning actions of the expansive cycle do not dictate what kinds of social interaction are involved in the learning process. to capture this aspect, we used the framework of three types of object-oriented interaction, namely coordination, cooperation, and communication. these three types of interaction can be understood as qualitatively different types of epistemological subject–object–subject relations (raeithel, 1983; fichtner, 1984; engeström, 2008). one basic way to define collaboration is to distinguish it from cooperation. according to dillenbourg, baker, blaye, and o'malley (1996), cooperation is accomplished by the division of labor among the participants; each person is responsible for a portion of the problem-solving task.
by contrast, collaboration is “a coordinated, synchronous activity that is the result of a continued attempt to construct and maintain a shared conception of a problem” (roschelle & teasley, 1995, p. 70). in this article cooperation and collaboration are used as specific concepts which are part of the analytical framework of three qualitative types of interaction. therefore, our intention is not to participate in the larger ongoing discussion concerning the use of the concepts of cooperation and collaboration in educational research. coordination is the “default” mode of interaction in groups, experienced as business-as-usual. in coordination, each participant focuses on and performs his or her own scripted role and tasks. the script, coded in written rules, plans, and agendas or engraved in tacitly assumed traditions, coordinates the participants’ actions as if from behind their backs, without being questioned or discussed. each participant has his or her own partial object or task; the possible shared object is not articulated and participants engage in dialogue mainly to maintain and adjust boundaries between their respective tasks and roles. cooperation is typically initiated when the participants face a discoordination, that is, a disturbance or problem that cannot be fixed simply by returning to the prescribed script. in cooperative interactions, participants focus on a shared problem, trying to find mutually acceptable ways to understand, conceptualize, and work on it. in this mode, the given script is temporarily suspended and actions are driven by the demands of the shared object. participants address each other dialogically and there is typically a marked increase in the intensity of the discourse, often manifested in overlapping talk and similar indications of increased engagement. cooperation may remain a mere attempt, typically when a participant initiates it but receives no or only minimal responses from the interlocutors.
such an attempt often stands out as a disturbance in that it deviates from the standard script of the interaction. interaction may also take the shape of pseudo-cooperation. in this case, participants interact in a way that resembles cooperation; they address and respond to one another, often talking about something that is perceived as problematic. however, pseudo-cooperation focuses on a substitute object, often an “eternal issue” that can be discussed ad infinitum without ever approaching a resolution. pseudo-cooperation commonly resembles collective venting, sometimes also grumbling or complaining. communication is usually initiated when the participants experience recurring conflicts or breakdowns in their coordination and cooperation. in communication, the participants question and examine their own patterns of interaction in relation to their shared object. as a result, both the object and the script are reconceptualized. phases of this self-reflective and transformative type are rare in interaction and difficult to sustain without the mobilization of novel resources, such as shared documentation, plans, or outside help. overall, the framework of expansive learning calls attention to transitions between types of interaction. as the transitions are typically triggered by discoordinations, conflicts, ruptures and breakdowns, the analysis of types of interaction needs to pay special attention to these kinds of disturbances. often when coordination is interrupted or breaks down, it turns into a cooperation attempt or communication attempt which may or may not lead to a phase of full-fledged cooperation or communication. fluid, pulsating movement from coordination to cooperation and communication and back should be a hallmark of expansive learning characterized by a longitudinal effort to redefine the object of the collective activity.
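the emphasis on transitions between types of interaction can be illustrated with a minimal sketch. it assumes that each episode of a session has already been coded with one interaction type; the short codes, the function name find_transitions and the example sequence are hypothetical and serve only as an illustration, they are not data or tools from the study.

```python
# Short codes for the interaction types described above.
# The abbreviations are illustrative assumptions, not a fixed coding standard.
INTERACTION_TYPES = {
    "coord": "coordination",
    "coopa": "cooperation attempt",
    "pcoop": "pseudo-cooperation",
    "coop": "cooperation",
    "coma": "communication attempt",
    "com": "communication",
}


def find_transitions(codes):
    """Return (position, from_code, to_code) for every change of interaction type."""
    unknown = set(codes) - set(INTERACTION_TYPES)
    if unknown:
        raise ValueError(f"unknown interaction codes: {sorted(unknown)}")
    transitions = []
    for i in range(1, len(codes)):
        if codes[i] != codes[i - 1]:
            transitions.append((i, codes[i - 1], codes[i]))
    return transitions


# Hypothetical sequence of coded episodes, NOT data from the study.
episodes = ["coord", "coord", "coopa", "coord", "coop", "com", "coord"]
transitions = find_transitions(episodes)

for position, source, target in transitions:
    print(f"episode {position}: {INTERACTION_TYPES[source]} -> {INTERACTION_TYPES[target]}")
```

a list of this kind only locates the points of transition; whether a transition was triggered by a disturbance still has to be judged from the transcript itself.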
2.3 object formation

expansive learning is a process of identifying, articulating, reconceptualizing and expanding the object of the activity. in her activity-theoretical study of an elementary school teacher team planning and implementing an innovative curriculum unit, kärkkäinen (1999) identified three phases in the formation of the object of planning. shifts from one phase to the next one were described as turning points, characterized by clusters of disturbances and questioning. a simplified ideal-typical sequence of the formation of the object in expansive learning may be depicted with the help of figure 1.

figure 1. ideal-typical phases of the formation of the object in expansive learning.

in the first phase depicted in figure 1, the object of the activity may be in crisis due to fragmentation and routinization that prevent the practitioners from facing and embracing new challenges and opportunities in their activity. alternatively, the object may be in such an embryonic state of emergence that it is only vaguely and diffusely grasped and understood by the participants. in the second phase of figure 1, the participants articulate, conceptualize and model a new object for their activity. this new object is typically still a relatively abstract initial idea or principle, a “germ cell”, the expansive implications and potentials of which are not yet realized. in the third phase, the new object is expanded and made concrete, in other words, its manifold practical consequences, extensions, and applications are integrated into a complex totality.

3. research questions

to analyze and understand the pre-service teachers’ collaborative learning process, we pose the questions enumerated in table 1.
our research questions are driven by our methodological interest in examining the analytical potential of the framework of expansive learning with data from a learning context which was not deliberately designed to follow the guidelines of expansive learning. thus, the methodological questions in table 1 are of primary importance. the substantive questions may be read as tools with which the methodological questions are approached and made concrete.

table 1
research questions

| methodological research questions | auxiliary substantive questions |
|---|---|
| 1. how does the conceptual framework of expansive learning actions work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? | 1. which expansive learning actions can be identified in the learning process of the pre-service teacher group? |
| 2. how does the conceptual framework of the object formation work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? | 2. how was the shared object formed in the learning process of the pre-service teacher group? |
| 3. how does the conceptual framework of types of object-oriented interaction work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? | 3. how were the types of interaction and transitions between them manifested during the collaborative learning process? |
| 4. how does the integration of conceptual frameworks of expansive learning actions and types of interaction work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? | 4. what was the relationship between expansive learning actions and types of interaction? |

4. participants and context of the study

the participants in the study were six pre-service teachers. they were enrolled in a class teacher education program (primary school level) with an annual intake of ten students, with educational psychology as their major. the nearest equivalent to the term class teacher outside of finland is a primary school teacher (uk) or an elementary school teacher (usa). at the time of the data collection, the students were in their fourth year. in class teacher education at the university of helsinki, students complete a master of arts (education), the completion of which takes approximately five years. the class teacher education at the university of helsinki consists of two different study programs. the major subject may be either education or educational psychology. the core contents of the major subject studies in educational psychology include working as a member of a group and interaction skills; learning, growth, and development; curriculum work and learning to deal with the reality of school life; as well as learning to conduct research. the students in this program study intensively as a small group for approximately three years, applying self-regulated, collaborative learning as one of their main approaches (see eteläpelto, littleton, lahti, & wirtanen, 2005; lipponen & kumpulainen, 2011). the pre-service teachers who participated in this study were thus already socialized into working and interacting within a pedagogical culture that built on collective discussion and collaboration on open-ended and largely self-designed tasks. their activity was that of a new type of university study characterized by self-directed collaborative planning and implementation. however, this new activity existed side by side with the traditional type of university study, characterized by individual work on assignments given from above. a tension between these two scripts is an inherent feature of the activity analyzed here.
in this article, we analyze a meeting of the pre-service teachers’ group at the beginning of a three-month course. this course was part of the large study module called “multidisciplinary studies of school subjects taught in the comprehensive school”. during this study module students studied all 13 subjects which are taught in the primary school (grades 1–6). usually each subject is taught in its own separate course by the subject expert (teacher educator). in the teacher education program analyzed in this paper the entire study module was arranged in a multidisciplinary way. at the beginning of the module the student group chose three multidisciplinary themes: “sustainable development”, “human being” and “time”. the selection of themes was a process in which the student group together created a joint conception of the important phenomena of the world. therefore this study module was also called “the deepening and widening of the world view”. under each theme one integrating course was created which comprised several school subjects and subject experts. the idea was that the students and subject experts would work together in a collaborative way under the common integrating theme. the group was responsible for the planning and implementation of the contents and working procedures of each course. the first two courses of the study module (“sustainable development” and “human being”) were conducted during the second and third year. the last course (“time”) was conducted in the fourth year. the data of this study were collected from this last course. during the course the group investigated the concept of time from multiple disciplinary perspectives, integrating the subject disciplines of mother tongue, handicrafts, history, and multiculturalism into their design. as a final product of their course the members of the group agreed to produce a short theater performance. based on this initial plan they discussed the substantive idea of the theater play.
they also discussed what kinds of expertise were needed in the course and invited appropriate experts (teacher educators) to join in the course. four teacher educators representing the subject disciplines listed above participated in the course in the role of experts and supervisors. the pre-service teachers and teacher educators all met as a group six times during the course. during the meetings general guidelines for the course were created, the students’ plans and ideas were discussed, and the final product was evaluated. during the course the pre-service teachers also met at least once a week without the subject experts to discuss their progress on the task and to prepare for the next meeting with subject experts. additionally, the students met some of the subject experts privately a few times during the course.

5. data collection and analysis

our data corpus consists of six video-recorded meetings in which only the pre-service teachers were present, comprising a total of 12 hours of video. from this corpus, we selected the first officially scheduled two-hour meeting for detailed transcription and analysis. the selection was based on preliminary viewing of all the videos that resulted in content logs (jordan & henderson, 1995). we decided to focus on phases in which the pre-service teachers conducted planning and talked about planning. earlier studies of expansive learning (e.g., engeström, 2008, pp. 118–168) have demonstrated that features of expansive learning may be found when participants face an open-ended problem-solving task, such as a need to plan something that is new for them. since an analysis combining the framework of expansive learning actions and the framework of types of interaction was new and needed to be carefully tested as a methodological solution, we decided to concentrate on a single meeting. focusing on a single meeting runs the risk that no meaningful mini-cycle of expansion is accomplished in such a limited time.
our preliminary viewing of the video data convinced us that this meeting was rich in learning actions and types of interaction and would be worth a detailed analysis in spite of the risk. the procedure of our data analysis consisted of four steps, schematically depicted in figure 2. figure 2 is a summary of the steps of our analysis, not a representation of the conceptual structure of expansive learning. the four steps depicted in figure 2 stem from our specific research questions. they are not meant to represent a general procedure to be applied in all analyses of expansive learning.

figure 2. steps in the analysis of the data.

as a first step, we identified expansive and non-expansive learning actions in the meeting by (a) discerning the topical episodes based on their substantive contents, (b) analyzing the turns of talk within each topical episode in terms of actions and formulating a preliminary description of the actions, and (c) specifying the epistemic function of each action in the stream of learning actions. a learning action typically consisted of an interactive effort that contained more than one turn of talk but was usually shorter than a topical episode. learning actions which did not correspond to the characteristics of any of the expansive learning actions and did not contain an attempt at questioning or explicating the shared object were categorized as non-expansive. as a second step, we examined the succession of the learning actions in relation to the phases of the formation of the object. in other words, we checked which object the learning actions were directed at and what possible phases and turning points emerged in the formation of the object. as a third step, we identified types of interaction in the data. an interaction type for each topical episode was tentatively named by examining the nature of exchanges in the episode and by identifying possible shared or individual objects of the participants.
next, disturbances, that is, unintentional deviations from the script, were identified. finally, points of transition from one type of interaction to another were examined in greater detail. as a fourth step, to investigate the relationship between expansive learning actions and types of interaction, we brought the two analyses together. next we show briefly, with the help of transcript excerpts, how the three analysis methods mentioned above were applied to the data. the students had agreed earlier that the main task for the course would be the preparation of a short theater performance. thus the students needed to write a script for the theater play together. in the next excerpt (table 2) the students are discussing whether some common frames or guidelines are needed for the writing of the script.

table 2
an example of the analyses of learning actions, types of interaction and object formation

| turn | transcription | learning action | type of interaction/disturbance | object formation |
|---|---|---|---|---|
| 145 | mark: shall we frame this in some way, i mean, if we go backwards in time [in the story], sort of... | ae | dist/coopa | to |
| 146 | ann: well, somebody can go ten years forward [in his/her story], if tina goes 50 years forward [in her story]. | ae | dist/coopa | to |
| 147 | john: i’m also getting curious whether we have some common guidelines or does everybody just choose “i will do this” or “i will do that.” is our plan again that i choose it [the story] to take place in ten years’ time, and you choose it [your story] to take place after 20 years. i don’t know if it makes any sense. | ae | dist/coopa | to |
| 148 | mark: (inaudible) | ae | dist/coopa | to |
| 149 | john: i tried to suggest this system with the panelists [teacher educators]. or are we going to go through any of those reference points with the panelists. that is a principled decision... | ae | dist/coopa | to |
| 150 | ann: how about if one just begins working [on his or her own story] even if we others don’t know what the time or the place [where the story is situated]. if one could begin to create a personality or a role for the main character. then one does not necessarily need the time. it could be the way to go forward with producing the fictional text. | ae | coord | to |

legend: ae = analyzing: actual-empirical analysis; dist = disturbance; coopa = cooperation attempt; coord = coordination; to = transitional object

in this excerpt we identified one expansive learning action, namely actual-empirical analysis. during the learning action of analysis the painstaking process of problem finding and problem definition took place. mark (turn 145) highlighted the problem that common frames are needed for the joint writing process. john (turn 147) emphasized that it might be problematic if everybody could freely choose the topic for their writing. in the analysis of interaction this excerpt was seen as a disturbance. the group’s meeting started with coordination-type interaction; each participant was concentrating on presenting their own idea and perspective. this coordination was disturbed as two participants, mark and john, made a cooperation attempt by challenging the group’s initial plan which they saw as too vague and non-specific. the cooperation attempt of mark and john did not get a response from the other participants and the interaction returned to the coordination mode (turn 150). in the analysis of object formation we concluded that in this excerpt the initial diffuse object, named “time”, was already transformed into the transitional object named “theater play”. in the next excerpt (table 3) the student group was talking about problems of collaboration and found a possible explanation in the group’s shared history. in our analysis of expansive learning this was identified to be a learning action of reflecting on the process.
the student group was evaluating its own activity in a reflective way. in the analysis of interaction this sequence was identified to represent communication. john (turn 456) was tracing the problems in collaboration to the beginning of the group's life cycle, a phase in which the principle of individual freedom of choice dominated. john recognized that this habit of freedom of choice had now become a problem when the group needed to plan a collective project. the initial way of working now became an obstacle to interaction and collaboration. in the analysis of object formation, the transitional object “theater play” was identified also in this excerpt.

table 3
an example of the analyses of learning actions, types of interaction and object formation

| turn | transcription | learning action | type of interaction/disturbance | object formation |
|---|---|---|---|---|
| 456 | john: just recently we were so excited and explaining to the lions [another student group] how we had such great freedom in the beginning. but however, that freedom is causing us problems now. although there is lots of freedom in this unit, it is unlikely that anyone in this unit will have as much freedom as we had. although, this [freedom] is a positive thing in many ways, one negative aspect [of it] is probably that we have sort of become the conquerors of the world, who can do whatever they feel like – and “that’s how i’m going to do it” | r | com | to |
| 457 | tina: and whenever i’m up to it… | r | com | to |
| 458 | john: and when i’m up to it. and if i’m not up to it, nobody can tell me that “you have to do it” | r | com | to |

legend: r = reflecting on the process; com = communication; to = transitional object

6. expansive learning actions

in our data, we could identify all the learning actions of the expansive cycle except consolidating the new practice.
the absence of consolidation is an obvious consequence of focusing on a single meeting: the modeling of a new solution had just begun and the initial idea had not matured enough yet to be consolidated and generalized into a new and stable practice. the results of the analysis of learning actions are summarized in table 4. the meeting started with an episode that did not correspond to the characteristics of any of the expansive learning actions. in this episode, the pre-service teachers discussed practical preparations for the next meeting with subject experts without an attempt at questioning or explicating the object. we gave this non-expansive action the tentative name maintaining the existing practice to describe its character without making any particular theoretical assumptions. the notion of existing practice refers here to routine practices of planning and preparation within teacher education.

table 4
types and frequencies of expansive and non-expansive learning actions in the pre-service teachers’ meeting

| type of learning action | number of learning actions | number of turns of talk |
|---|---|---|
| maintaining the existing practice [a] | 1 | 65 |
| questioning | 1 | 8 |
| analyzing: actual-empirical analysis | 8 | 243 |
| analyzing: historical analysis | 1 | 2 |
| modeling a new solution | 2 | 6 |
| examining the new model | 5 | 45 |
| implementing the new model | 8 | 127 |
| reflecting on the process | 5 | 49 |
| different topic [b] | – | 258 |
| total | 31 | 802 |

[a] non-expansive learning actions are indicated by italics.
[b] conversation not related to the group’s assignment (planning of the course).

as table 4 shows, the most common expansive learning actions in the meeting were analyzing, specifically actual-empirical analysis, and implementing the new model; both occurred 8 times.
the large number of actions and speaking turns related to actual-empirical analysis indicates that problem finding and problem definition played a central role in the meeting – an emphasis to be expected at the beginning of the expansive learning process. interestingly enough, implementing the new model, reflecting on the process, and examining the new model formed the other dominant block of expansive learning actions. this indicates that instead of only focusing on the early learning actions of the expansive cycle, the group went indeed through an entire mini-cycle of expansive learning in the meeting. on the other hand, the low frequencies of questioning and modeling the new solution indicate that perhaps the shared object constructed in this first meeting was still only very preliminary and would invoke further questioning and re-modeling as the process went on. frequencies of expansive learning actions tell only a part of the story. the more important issue is the way in which the learning actions flow forward and form a meaningful order within a session. by meaningful order we refer to the general directionality of the theoretically formulated expansive cycle (see engeström et al., 2013). in table 5 we give a condensed overview of the progression of expansive learning actions and their contents in the meeting. 
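the turns-of-talk column of table 4 can be recomputed from the turn ranges that table 5 lists for each learning action. the following python sketch is only an illustration and not part of the original analysis: the segment list is transcribed from table 5, and the names `SEGMENTS` and `turns_per_code` are hypothetical names introduced here.

```python
from collections import Counter

# (first turn, last turn, learning-action code), transcribed from table 5.
# The variable names are illustrative assumptions, not the authors' tooling.
SEGMENTS = [
    (1, 65, "MEP"), (66, 73, "Q"), (74, 80, "AE"), (81, 98, "AE"),
    (99, 154, "AE"), (155, 173, "AE"), (174, 175, "HA"), (176, 225, "AE"),
    (226, 281, "AE"), (282, 312, "AE"), (313, 315, "M"), (316, 321, "AE"),
    (322, 324, "M"), (325, 343, "E"), (344, 354, "R"), (355, 358, "I"),
    (359, 362, "R"), (363, 372, "E"), (373, 378, "R"), (379, 384, "E"),
    (385, 402, "I"), (403, 416, "R"), (417, 429, "I"), (430, 434, "I"),
    (435, 437, "I"), (438, 447, "E"), (448, 461, "R"), (461, 467, "I"),
    (468, 470, "I"), (471, 713, "DT"), (714, 787, "I"), (788, 802, "DT"),
]

# Sum the inclusive length of every segment per learning-action code.
turns_per_code = Counter()
for first, last, code in SEGMENTS:
    turns_per_code[code] += last - first + 1

for code, n_turns in turns_per_code.items():
    print(f"{code}: {n_turns} turns")
```

the per-action sums agree with the turns-of-talk column of table 4. the ranges 448–461 and 461–467 both include turn 461, so the segment lengths add up to 803 although the transcript has 802 turns.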
table 5
succession of expansive and non-expansive learning actions and their contents in the pre-service teacher group’s meeting

| turns of talk | contents | learning action |
|---|---|---|
| 1–65 | practicalities concerning the next meeting with subject experts are discussed | *mep* ᵃ |
| 66–73 | tina: “have we completely forgotten the starting point?” | q |
| 74–80 | planning of the theater play begins | ae |
| 81–98 | division of instructional resources for the course | ae |
| 99–154 | agreement on the joint writing task | ae |
| 155–173 | setting the story in the future | ae |
| 174–175 | the contents of the previous course considered as starting point | ha |
| 176–225 | setting the story in the future (continued) | ae |
| 226–281 | disagreement whether story should be situated in future or in history | ae |
| 282–312 | negotiation on the starting point of the story ends up in deadlock | ae |
| 313–315 | ann suggests that the theme “making a choice” should be in everyone’s story; she gets no response | m |
| 316–321 | creation of a unified story seems impossible | ae |
| 322–324 | ann demands again a response to her suggestion; this time other participants are responding | m |
| 325–343 | ann’s idea is accepted and discussion begins on how to include “making a choice” in each participant’s story | e |
| 344–354 | participants discuss the group’s way of working and state that collaboration is possible but it takes time | r |
| 355–358 | need for the virtual learning environment (fle) to make things work is acknowledged | i |
| 359–362 | sheila states that it is problematic if everyone can still write what one wants without any common frame | r |
| 363–372 | realization that experts of different fields have different perspectives on important moments in history | e |
| 373–378 | sheila emphasizes the need for a common starting point for the writing; the themes/topics are too general to guide the writing process | r |
| 379–384 | realization that important turning points in history should be discussed with teacher educators | e |
| 385–402 | decision that the shared plan should be moved into the virtual platform | i |
| 403–416 | realization that preparing a theater play forces the participants to collaborate | r |
| 417–429 | realization that jointly prepared questions for the expert interviews are needed | i |
| 430–434 | decision to inform teacher educators about today’s decisions | i |
| 435–437 | decision: we have to start using the fle [virtual learning environment] | i |
| 438–447 | realization: what we teach today in school should be also relevant for the pupils in future | e |
| 448–461 | john: we had such great freedom at the beginning and that freedom is causing us problems now | r |
| 461–467 | decision: tina should send her text to everybody | i |
| 468–470 | realization: we have to decide whom to interview | i |
| [471–713] | [talk about subject matters unrelated to the planning of the next meeting and the course] | [dt] |
| 714–787 | organizing the expert interviews and sending an email to subject experts | i |
| [788–802] | [talk about practicalities unrelated to the planning of the next meeting and the course] | [dt] |

legend: mep = maintaining existing practice; q = questioning; ae = analyzing: actual-empirical analysis; ha = analyzing: historical analysis; m = modeling a new solution; e = examining the new model; i = implementing the model; r = reflecting on the process; dt = different topic
ᵃ non-expansive learning actions are indicated by italics.

table 5 shows that the learning actions of the expansive cycle were taken by and large in the order predicted in the theory. to be sure, there were iterations, such as the sequence analyzing –> modeling –> analyzing –> modeling in turns 282–324. also, reflecting on the process was interspersed among actions of examining and implementing the new model in the latter part of the meeting. such iterations are not incompatible with the general model of the expansive cycle, but they represent an interesting challenge for further research. it seems that the expansive mini-cycle was in this case composed of two main parts.
we might call these (1) working on the problem (turns 66 to 324) and (2) working on a new model (turns 325 to 470 and turns 714 to 787). during the first part, problem finding, problem definition and the formulation of a tentative solution dominated the discussion. this included the learning actions of questioning, actual-empirical and historical analysis, and modeling a new solution. during the second part, the solution idea was refined into practical applications and procedures. this included the learning actions of examining and implementing the new model and reflecting on the process. the learning action modeling a new solution formed a turning point and bridging phase between the two main parts. overall, the succession of learning actions in table 5 looks almost like a perfect expansive mini-cycle. however, closer scrutiny reveals that the cycle is not at all perfect. for this scrutiny, we need to trace the steps of the formation of the object.

7. phases of object formation

the initial object of the work of the group was “time”. this had been agreed upon in general terms in the group already in the spring. in the fall, before starting the officially scheduled meetings for planning and implementing the course, the pre-service teachers had an informal meeting in a café in which they came up with the idea of producing a small theater play as an outcome of the course. in their first officially scheduled meeting, the first non-expansive learning action, maintaining the existing practice (turns 63 to 65, excerpt 1), represents routine-like planning. it consisted of discussion about how to proceed, with no reference to the shared object. the pre-service teachers first articulated their object in terms of the theater play (turns 66 to 69).

excerpt 1

63 sheila: have we planned at all the agenda for the next meetings? how about if everybody would prepare something for a certain meeting. how many are we... five?
64 john: six... tom [member of the group who is absent].
65 sheila: tom, so we are six all together. how about if one or two people take charge of one meeting. or if it’s well structured, i don’t mind if everyone would prepare for a certain meeting a presentation. then we would use three meetings for this and then we will have two presentations for each meeting. it [a meeting] is three hours, so it means one and a half hours for each person.
66 tina: have we completely forgotten the starting point, or forgotten the idea that came up last time? well, you [addressing sheila] did not hear all of it. were you taking care of some other business at the time?
67 sheila: could you explain briefly your understanding of it?
68 tina: we were developing that idea of the theater play.
69 sheila: hm.

in turn 66, tina challenged the group’s routine-like mode of working and reminded the participants of a shared starting point discussed in a preceding informal preliminary meeting: “have we completely forgotten the starting point, or forgotten the idea that came up last time?” this is the first articulation of the group’s emerging object: the theater play. however, the emerging object remained quite vague, like a formal shell to be filled with contents. it was not yet a substantive principle or a “germ cell”. in this sense, we may characterize it as a transitional object.

the second turning point in the formation of the object took place much later, starting from turn 313 (excerpt 2). the pre-service teachers had discussed the theater play idea for a lengthy period, circling around the idea that each participant would produce his or her own story and pondering the difficulty of providing coherence and continuity to a text produced this way. ann then initiated actions of modeling in which the participants articulated the second version of their emerging consciously shared object.
in this phase, the new object took the shape of the principle of “making a choice” – potentially a substantive germ cell for a new model.

excerpt 2

313 ann: i might have a theme to suggest.
314 sheila: go ahead.
315 ann: what if there would be a shared theme of “making a choice” in all of these [individual stories]? that could be done in different ways. the consequences of the choice can be seen later in how the story develops. even though this can be difficult to execute. still, even if characters and situations [in the individual stories] were different, the “making of a choice” would be a connecting link [between the individual stories].
[…]
322 ann: now i would like to hear comments about my recent idea. instead of just everybody being silent, i would like to hear some responses like: “i’m not sure...”, or “yes, sounds good...”, or “i would like to...”.
323 sheila: would you explain it briefly one more time?
324 ann: what we should decide now is the connecting element [between the stories]; if everyone starts to write on their own, the connecting element could be making a choice. in every story the theme would be making a choice. this would be visible always, as we move further in time…
325 tina: …it is choices that have impact…
326 john: …they are the ones that have impact.
327 mark: …how would we establish continuity between persons, or is it just any act of making a choice?
328 john: that’s just what we should create together.
329 ann: the continuity is in the fact that in what comes we will see the consequences of the previous choice.
330 john: and those of the previous, previous choices.
331 tina: like for example my choices.

excerpt 2 is important in that the vague and diffuse initial object – the notion of time – and the formal transitional object of a theater play were now turned into a much more focused idea, that of making a choice.
the notion of choice was connected to the original notion of time by realizing that choices have consequences that are revealed in time: “in what comes we will see the consequences of the previous choice.”

from table 5 (section 6) one might infer that the new object, making a choice, was systematically examined and implemented from this point on. however, this was not the case. the phase that followed immediately after the examination of the newly articulated object of making a choice (turns 344 to 354) consisted of reflecting on the process, specifically on the possibility of genuine collaboration – but no reference was made to the idea of making a choice. the next phase (turns 355 to 358) focused on the implementation of the plan by means of the virtual learning environment fle – again, with no reference to making a choice. in fact, until the very end of the meeting, the object of making a choice was no longer mentioned by the participants. the actions of examining and implementing the model actually referred to the transitional object of the theater play, not to the principle of making a choice. the latter was seemingly forgotten, and the process circled back to the transitional object. in other words, the proposed germ cell was encapsulated, not elaborated on and expanded. the stepwise formation of the object in this meeting may be summarized with the help of figure 3.

figure 3. actual steps in the formation of the object in the pre-service teachers’ meeting.

the steps depicted in figure 3 testify to the iterative and non-linear character of expansive learning. in our previous study conducted in a library context (engeström et al., 2013), we identified such an iterative and non-linear loop of the expansive learning cycle.
in the first six sessions, the learning actions occurred in line with the general sequence of the theoretical model of expansive learning, but in the last two sessions the expansive learning cycle started again from the beginning. in a similar way, object formation does not follow the ideal-typical phases as formulated in figure 1 (section 2.3), and sometimes the process can collapse and turn backwards. a single meeting is not likely to produce a neat full-fledged expansive cycle: “miniature cycles of innovative learning should be regarded as potentially expansive” (engeström, 2008). the potential is realized – or not realized – in the longer process.

8. types of interaction

only a few studies have applied the framework of three types of object-oriented interaction. these studies (engeström, 2008, pp. 49–85; saari, 1995; de lange, 2011) have demonstrated that the most common type of interaction is coordination, the second most common is cooperation, and the rarest type is communication. further, these studies also revealed the important role of disturbances in the analysis of types of interaction.

we identified all three main types of interaction – coordination, cooperation, and communication – in our data. we also found three phases of pseudo-cooperation. as shown in table 6, the most common type of interaction was coordination, comprising 227 turns of talk. 105 turns represented cooperation, and 47 turns pseudo-cooperation. communication occurred in only 10 turns. this low number of communication turns indicates that reconceptualizing the script and mode of interaction in relation to the shared object of activity was very challenging for the participants.
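to put these counts in proportion, the share of each type of interaction among the 389 on-task interaction turns can be computed directly. the following python lines are a minimal sketch using only the four turn counts reported above; the variable names `interaction_turns` and `shares` are hypothetical and introduced here for illustration.

```python
# Turn counts per type of interaction as reported in the text (table 6).
# Names are illustrative assumptions, not part of the original study.
interaction_turns = {
    "coordination": 227,
    "cooperation": 105,
    "pseudo-cooperation": 47,
    "communication": 10,
}

total = sum(interaction_turns.values())

# Percentage share of each type, rounded to one decimal place.
shares = {
    kind: round(100 * count / total, 1)
    for kind, count in interaction_turns.items()
}

for kind, share in shares.items():
    print(f"{kind}: {share}% of {total} turns")
```

under these counts, coordination accounts for well over half of the interaction turns, while communication stays below three percent.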
table 6
types of interaction in the meeting of the pre-service teachers’ group

| type of interaction | phases | turns |
|---|---|---|
| coordination | 5 | 227 |
| cooperation | 6 | 105 |
| pseudo-cooperation | 3 | 47 |
| communication | 1 | 10 |
| total (types of interaction) | 15 | 389 |
| different topic | 2 | 258 |

as pointed out above, a transition from one type of interaction to another often passes through a short phase of disturbances. disturbances may lead to disintegration, contraction, or expansion in the process. in our data, we identified a number of conflicts. in addition to those, we also examined cooperation attempts and communication attempts as disturbances. the frequencies of these disturbance types are presented in table 7.

table 7
types and frequencies of disturbances in the student group’s meeting

| disturbance | episodes | turns |
|---|---|---|
| conflict | 3 | 10 |
| cooperation attempt | 7 | 40 |
| communication attempt | 5 | 32 |
| total | 15 | 82 |

table 8 presents the temporal succession of the types of interaction in the meeting. the idea of theater play as a transitional object was invoked in turns 66 to 73. the subsequent turns 74 to 144 represent a return to coordination. the participants brought up different resource issues (time, help from teacher educators, the virtual learning environment) that did not generate a common thread and problem to be jointly tackled. questions about the allocation of time for the preparation of the theater play were raised and ruminated about but not answered: “but how much time do we have to reserve for it [preparing of the theater play], extra days, for the work it out, because it takes...?” (turn 71) “how much time have we reserved? we have booked fridays from nine to three. after the panel meetings there is always time and...” (turn 81). this does not look very efficient; one might argue that it looks more like discoordination than coordination.
however, the standard script of planning in meetings is often indeed inefficient, an example being prolonged episodes in which the participants try to agree on the date and time of the next meeting, each one bringing up disconnected concerns and constraints that make the decision-making look rather absurd. in this way, coordination in meetings quite often comes close to its own limits; such episodes could easily collapse into discoordination or erupt into open conflict.

as already noted in the discussion of table 5 (section 6), the group’s interaction seems to have consisted of two main parts. we might call the first part (turns 1 to 324) “coordinative interaction” and the second part (turns 325 to 470) “cooperative interaction”. characteristic of the first part was that the transitional object of theater play did not function as a truly shared object. the first part also contained a pseudo-cooperation phase and several cooperation attempts interpreted as disturbances. the second part is more problematic. there was a notable increase in cooperation and communication attempts. but as we know from the preceding section, after the brief phase of cooperation based on the object of making a choice, the remaining phases of cooperation and communication attempts were actually focused on the transitional object of theater play. in this light, the second part of the meeting was not simply a continuation of the first part but rather a circling back to the earlier object.
table 8
types of interaction and disturbances in the student group’s meeting

| turns | contents | type of interaction / disturbance |
|---|---|---|
| 1–65 | practicalities concerning the next meeting with subject experts are discussed | coordination |
| 66–73 | a disagreement between the participants about the common starting point | *cooperation attempt* ᵃ |
| 74–144 | a discussion of how to proceed with a joint preparation of a theater play | coordination |
| 145–149 | a criticism that the guidelines for the joint writing task are missing | *cooperation attempt* |
| 150–154 | a suggestion that the same protagonist in every story could be a link between different stories | coordination |
| 155–164 | taking tina’s story as a common starting point | pseudo-cooperation |
| 165 | is it possible to have something else than just science fiction in the story | *cooperation attempt* |
| 166–178 | a development of the idea of the story that takes place in future | pseudo-cooperation |
| 179–189 | a disagreement about how much emphasis one should put on the future in his or her story | *cooperation attempt* |
| 190–213 | a development of the story situated in future continues | pseudo-cooperation |
| 214–222 | should we have the same central character in every story? | *cooperation attempt* |
| 223–225 | john does not want to situate his story in the future | *conflict* |
| 226–311 | unsuccessful attempts to find a connecting theme for the shared story | coordination |
| 312 | seems impossible to write a shared story | *conflict* |
| 313–315 | ann suggests that everyone’s story should have one unified theme, “making a choice”, but gets no response | *cooperation attempt* |
| 316–321 | seems that participants only want to work individually without binding structure | *conflict* |
| 322–324 | ann demands again a response to her suggestion in a more determined way, and this time other participants are responding | *cooperation attempt* |
| 325–343 | ann’s idea is accepted and discussion begins on how to connect making a choice in each participant’s story | cooperation |
| 344–351 | participants discuss the group’s way of working and state that collaboration is possible but it takes time | *communication attempt* |
| 352–358 | john says that one needs to follow others’ work too if he/she wants his/her story to work | cooperation |
| 359–362 | integrating theme is missing | *communication attempt* |
| 363–372 | chosen perspectives for the story can sometimes be too narrow | cooperation |
| 373–378 | sheila: our themes and topics are too general to guide our writing process | *communication attempt* |
| 379–406 | different ideas concerning joint story writing are considered | cooperation |
| 407–416 | previously we did not have a common goal which forces us now to collaborate | *communication attempt* |
| 417–447 | preparing interviews; informing teachers; getting a virtual learning platform; visioning pupils’ needs in future | cooperation |
| 448–451 | earlier we used to have only individual goals? | *communication attempt* |
| 452–461 | john sees that in the early stage of the group’s work it was given so much freedom that collaboration is now difficult | communication |
| 461–470 | tina’s story is chosen as one starting point and the interviews of the experts are organized | cooperation |
| [471–713] | [talk about subject matters unrelated to the planning of the course] | [different topic] |
| 714–787 | organizing the expert interviews and sending an email to subject experts | coordination |
| [788–802] | [talk about practicalities unrelated to the planning of the course] | [different topic] |

ᵃ disturbances are indicated by italics.

9. dynamics between learning and interaction

as a result of the analysis of expansive learning actions, the meeting was tentatively divided into two main parts, “working on a problem” and “working on a model”. in a similar way, in the analysis of types of interaction, the meeting was divided into two main parts, “coordinative interaction” and “cooperative interaction”. the transition from the first part to the second part took place at the same point in both analyses.

when the two analyses were merged in table 9, it became possible to divide the meeting into three parts. the first part of the meeting may be called “coordinated working on a problem”, the second part “transition from coordinated working to cooperative working”, and the third part “cooperative working on a model”.
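merging two turn-level codings of this kind amounts to overlaying two segmentations of the same transcript and cutting at every boundary found in either one. the python sketch below illustrates the idea for the turns around the transition (282 to 343), using the turn ranges given in tables 5 and 8. it is an assumption-laden illustration rather than the procedure the study actually used, and the function name `overlay` and both list names are hypothetical.

```python
# (first turn, last turn, label) for the turns around the transition,
# transcribed from tables 5 and 8. All names here are illustrative only.
learning_actions = [
    (282, 312, "actual-empirical analysis"),
    (313, 315, "modeling a new solution"),
    (316, 321, "actual-empirical analysis"),
    (322, 324, "modeling a new solution"),
    (325, 343, "examining the new model"),
]
interaction_types = [
    (226, 311, "coordination"),
    (312, 312, "conflict"),
    (313, 315, "cooperation attempt"),
    (316, 321, "conflict"),
    (322, 324, "cooperation attempt"),
    (325, 343, "cooperation"),
]


def overlay(a, b):
    """Cut two segmentations at every boundary and pair up their labels."""
    lo = max(a[0][0], b[0][0])    # first turn covered by both codings
    hi = min(a[-1][1], b[-1][1])  # last turn covered by both codings
    starts = sorted({s for s, _, _ in a + b if lo <= s <= hi} | {lo})
    ends = [s - 1 for s in starts[1:]] + [hi]

    def label(segments, turn):
        # Label of the segment that contains the given turn.
        return next(l for s, e, l in segments if s <= turn <= e)

    return [(s, e, label(a, s), label(b, s)) for s, e in zip(starts, ends)]


merged = overlay(learning_actions, interaction_types)
for first, last, action, interaction in merged:
    print(f"{first}-{last}: {action} / {interaction}")
```

in this excerpt the boundaries at turns 313, 316, 322 and 325 appear in both codings, which is the kind of coincidence the merged table makes visible.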
table 9
merged analyses of learning actions and types of interaction

| turns | learning action (analysis of expansive learning) | turns | type of interaction / disturbance (analysis of interaction) |
|---|---|---|---|
| 01–65 | *maintaining the existing practice* ᵃ | 01–65 | coordination |
| first part: “coordinated working on a problem” (turns 66–324) | | | |
| 66–73 | questioning | 66–73 | *cooperation attempt* ᵇ |
| 74–80 | actual-empirical analysis | 74–144 | coordination |
| 81–98 | actual-empirical analysis | | (cont.) |
| 99–154 | actual-empirical analysis | | (cont.) |
| | (cont.) | 145–149 | *cooperation attempt* |
| | (cont.) | 150–154 | coordination |
| 155–173 | actual-empirical analysis | 155–164 | pseudo-cooperation |
| | (cont.) | 165 | *cooperation attempt* |
| 174–175 | historical analysis | 166–178 | pseudo-cooperation |
| 176–225 | actual-empirical analysis | 179–189 | *cooperation attempt* |
| | (cont.) | 190–213 | pseudo-cooperation |
| | (cont.) | 214–222 | *cooperation attempt* |
| | (cont.) | 223–225 | *conflict* |
| 226–281 | actual-empirical analysis | 226–311 | coordination |
| 282–312 | actual-empirical analysis | 312 | *conflict* |
| second part: “transition from coordinated working to cooperative working” (turns 313–324) | | | |
| 313–315 | modeling a new solution | 313–315 | *cooperation attempt* |
| 316–321 | actual-empirical analysis | 316–321 | *conflict* |
| 322–324 | modeling a new solution | 322–324 | *cooperation attempt* |
| third part: “cooperative working on a model” (turns 325–470) | | | |
| 325–343 | examining the new model | 325–343 | cooperation |
| 344–351 | reflecting on the process | 344–351 | *communication attempt* |
| 352–354 | examining the new model | 352–358 | cooperation |
| 355–358 | implementing the model | | (cont.) |
| 359–362 | reflecting on the process | 359–362 | *communication attempt* |
| 363–372 | examining the new model | 363–372 | cooperation |
| 373–378 | reflecting on the process | 373–378 | *communication attempt* |
| 379–384 | examining the new model | 379–406 | cooperation |
| 385–402 | implementing the model | | (cont.) |
| 403–416 | reflecting on the process | 407–416 | *communication attempt* |
| 417–429 | implementing the model | 417–447 | cooperation |
| 430–434 | implementing the model | | (cont.) |
| 435–437 | implementing the model | | (cont.) |
| 438–447 | examining the new model | | (cont.) |
| 448–461 | reflecting on the process | 448–451 | *communication attempt* |
| | (cont.) | 452–461 | communication |
| 461–467 | implementing the model | 461–470 | cooperation |
| 468–470 | implementing the model | | (cont.) |
| [471–713] | [different topic] | [471–713] | [different topic] |
| 714–787 | implementing the model | 714–787 | coordination |
| [788–802] | [different topic] | [788–802] | [different topic] |

ᵃ non-expansive learning actions are indicated by italics.
ᵇ disturbances are indicated by italics.

table 9 would seem to indicate that as an expansive learning process moves epistemically from questioning to analysis, modeling, implementation and reflection on the process, it also moves interactionally from coordination to cooperation and at least attempted communication. on the other hand, there is no deterministic or mechanical correspondence between specific learning actions and specific types of interaction. epistemic actions that serve an expansive function from the point of view of the entire cycle may be performed in a coordinated manner that makes them look rather unproductive within their own limited confines. and superficially productive forms of interaction may on closer analysis turn out to be phases of pseudo-cooperation that serve to avoid the core issues rather than tackle and solve them.

in table 9, there is a long phase (74–312) containing several learning actions of actual-empirical analysis and one learning action of historical analysis. this phase consists of different types of interaction (coordination and pseudo-cooperation) and several disturbances (cooperation attempts and conflicts), which shows that a mechanical correspondence between specific learning actions and specific types of interaction does not exist. in section 8, part of this phase (turns 74–144) is analyzed in more detail, and it is possible to see how joint planning looks inefficient and unproductive as the participants brought up different resource issues that did not generate a common thread and problem to be jointly tackled.
however, in our data the learning actions of questioning were typically interpreted as cooperation attempts. the learning actions of actual-empirical analysis were interactionally more heterogeneous, containing coordination and pseudo-cooperation types of interaction as well as several cooperation attempts and conflicts. the learning actions of modeling were typically interpreted as cooperation attempts, whereas the learning actions of examining and implementing the new model were typically interpreted as cooperation. the learning actions of reflecting on the process were typically categorized as communication attempts or, in one case, as communication.

probably the most important lesson from the integrated analysis is the importance of transitions and disturbances. these may be small in terms of time and number of speaking turns, but they are crucial for the understanding of the dynamics of the learning process. the fact that the short sequence of turns 313 to 324 was identified as a turning point in both analyses testifies to this.

10. conclusions

in this paper, we have examined the theory of expansive learning (engeström, 2015) as a conceptual and methodological framework for understanding open-ended and problem-based collaborative learning among finnish pre-service teachers. our study was focused especially on two aspects of expansive learning, namely types and sequences of expansive learning actions (e.g., engeström & sannino, 2010) and types and sequences of object-oriented interaction (engeström, 2008; fichtner, 1984; raeithel, 1983). the task of the paper is primarily methodological. we believe that cultural-historical activity theory needs to be turned into methods and procedures of systematic empirical analysis. therefore, the aim of the paper is to develop a methodology for analyzing the dynamics of expansive learning.
the new methodological framework created in this study is tested in the analysis of a planning meeting of a pre-service teacher group. our research questions are driven by our methodological interest in examining the analytical potential of the framework of expansive learning with data from a learning context which was not deliberately designed to follow the guidelines of expansive learning. for this reason, our methodological research questions are accompanied by auxiliary substantive questions. the substantive questions may be read as vehicles with which the methodological questions are approached and made concrete.

the first methodological question of our study was: how does the conceptual framework of expansive learning actions work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? our analysis showed that it is possible to apply the framework of expansive learning actions to a learning process confined to a single meeting in which expansive learning was not deliberately induced. this indicates that expansive learning can also take place as a naturally occurring process without formative interventions such as the change laboratory method (engeström & sannino, 2010). expansive learning processes can be analyzed in minute detail at the level of conversational turns and episodes, and an almost complete mini-cycle of expansive learning can be fulfilled during a single meeting.

our analysis of expansive learning actions showed that an almost complete expansive learning cycle appeared in the pre-service teachers’ meeting. six out of the seven learning actions of the expansive learning cycle appeared in the meeting in a meaningful order. only the last expansive learning action, consolidating the new practice, was missing from the learning process – an outcome to be expected in light of the fact that our analysis was restricted to a mini-cycle accomplished within a single meeting.
the low frequencies of the actions of questioning and modeling the new solution indicate that perhaps the shared object constructed in this first meeting was still only preliminary and would invoke further questioning and re-modeling as the process went on. it seems that the mini-cycle of expansive learning consisted of two main parts, namely “working on a problem” and “working on a new model”. the learning action of modeling a new solution functioned as a transition phase and bridge between the two main parts.

an interesting methodological finding is that learning actions in this meeting roughly followed the order proposed by the theory. there were some iterative back-and-forth movements between the learning actions of analyzing and modeling and, in a similar fashion, between the learning actions of examining, implementing and reflecting (see engeström et al., 2013). this implies that learning actions may often take place in clusters in which the first three learning actions (questioning, analyzing and modeling) tend to occur together and, in a similar fashion, the next three learning actions (examining, implementing and reflecting) tend to occur together in iterative clusters. the two-part structure of the expansive learning cycle observed in this study also supports this finding.

most theories of learning take the initial existence of a fairly clear problem, task or assignment as a given. this means that the phase of problem finding and definition of the object is not included in the focus of the analysis. in expansive learning this phase of “working on the problem” is essential. on the other hand, many theories of learning also ignore or exclude the actions of implementation. jerome bruner (1974, p. 233) pointed out that if we really want to study the conditions of learning, we need to follow our subjects far longer than is usual in laboratory experiments or test-driven classrooms.
we need to see what the learners will do with their new insights, how knowledge is turned into actions. in this respect, the actions of implementation in the second part of the process of expansive learning are of utmost importance. the second methodological question of our study was: how does the conceptual framework of the object formation work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? first, tracing the formation of the object is an indispensable methodological step in the analysis of expansive learning; and second, no matter how promising and powerful features of expansion we may find in a single learning session, a full assessment of such a mini-cycle requires an analysis of the entire multi-session learning process. an ideal-typical process of expansive object formation moves from a routinized, fragmented or diffuse initial object to a consciously articulated and shared “germ cell” object, to an expanded concrete object (figure 1, section 2.3). the steps of object formation were different in the meeting we analyzed. the initial diffuse object (“time”) was first transformed into a formal transitional object (“theater play”). subsequently a potential “germ cell” object (the principle of “making a choice”) was formulated by the participants – but it was abandoned and the participants returned to the transitional object (figure 3, section 7). in other words, what looked like a nearly perfect expansive mini-cycle turned out to be a more complex iterative process. the third methodological question of our study was: how does the conceptual framework of types of object-oriented interaction work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? 
the framework of types of object-oriented interaction seems a promising method for opening up the dynamics of collaborative learning. the analysis of interaction shows clearly that the participants were not just focused on completing the task but also on their mutual interaction. the diversity of types of interaction found in the data indicates that different types of interaction are needed especially when students are trying to complete a vague and open-ended task. all three types of interaction – coordination, cooperation, and communication – occurred in the meeting of the pre-service teacher group. in addition, three phases of pseudo-cooperation were identified. this finding supports our conclusion that this form of learning was indeed rich and potentially powerful. we identified several transitions between types of interaction that were marked by disturbances, either in the form of conflicts or in the form of cooperation and communication attempts that were not picked up and sustained by other members of the group. the analysis of types of interaction also indicated that the meeting was divided into two parts, “coordinative interaction” and “cooperative interaction”. however, as pointed out above, the second part was not simply a continuation of the first part but more like circling back to the earlier transitional object. the fourth methodological question of our study was: how does the integration of conceptual frameworks of expansive learning actions and types of interaction work in the analysis of data from a single session of collaborative learning not deliberately designed to follow the guidelines of expansive learning? a key methodological finding of this study is that while necessary, both epistemic learning actions and types of interaction are in themselves insufficient windows into expansive learning. what is needed as a connecting link is tracing steps in the formation of the object.
in our study the analysis of the formation of the object revealed that what looked like a neatly linear expansive mini-cycle was in fact a more complex and iterative movement between different versions of the object. the true expansive potential of a mini-cycle can only be discovered by extending the time scale and scope of the analysis. our analysis indicates that as an expansive learning process moves epistemically from questioning to analysis, modeling, implementation and reflection on the process, it also moves interactionally from coordination to cooperation and at least attempted communication. on the other hand, there is no deterministic or mechanical correspondence between specific learning actions and specific types of interaction. the merging of the analysis of learning actions and the analysis of types of interaction led us to identify three parts in the meeting, namely “coordinated working on a problem”, “transition from coordinated working to cooperative working”, and “cooperative working on a model”. our integrated analysis highlights the importance of transitions and disturbances in expansive learning. these are often small in terms of time and number of speaking turns but crucial for the dynamics of the learning process. these kinds of findings concerning the complex character of the expansive learning process have not been reported in previous studies and can thus serve as methodological supports for further research. in the theory of expansive learning, contradictions are seen as the driving force of transformation (engeström & sannino, 2010). although our analysis did not specifically focus on contradictions, we can see in the pre-service teachers’ meeting a pervasive tension between two scripts. the explicit script of the meeting was that of self-determined collaboration on a complex open-ended task, in which planning, design and implementation are unified.
this script was challenged and interrupted by the traditional script of studying individually to complete given assignments, in which planning and design are reduced to technical and logistic arrangements. this tension seems to be behind the frequent disturbances observed in the meeting, and the group’s difficulties in constructing a new shared object may be understood against this background. the model of expansive learning is not a universal formula of phases or stages. one probably never finds a learning process that strictly follows the ideal-typical model of expansive learning. whenever one examines or facilitates a learning process with the help of the model, one tests, criticizes and hopefully enriches the theoretical ideas of expansive learning. the theory of expansive learning has mainly been applied to large-scale transformations in activity systems, often spanning a period of 2 or 3 years. in this study, however, an expansive learning cycle was applied to analyze a single learning session, lasting only two hours. this raises a critical question: can a mini-cycle of learning be characterized as expansive? our analysis demonstrates that a mini-cycle of innovative learning can be, to some extent, expansive. the emergence of such mini-cycles does not itself guarantee that a larger expansive cycle takes place. small cycles may remain isolated events, and the overall cycle of expansion may become stagnant or regressive or even fall apart. the occurrence of a full-fledged expansive learning cycle is challenging to achieve, and it typically requires concentrated effort of deliberate interventions. with these reservations in mind, the theory of expansive learning can be applied as a framework for analyzing small-scale innovative learning processes as well. moreover, our study contributes to research on learning and interaction in activity systems. most studies on activity systems focus either on learning or on interaction, keeping the two relatively separate. 
our aim was to produce a fine-grained analysis of the dynamics between expansive learning actions and types of interaction. the qualitative transition in the pre-service teachers’ learning took place at the same point in both learning actions and in types of interaction. this indicates that cycles of expansive learning actions and progressions of types of interaction are closely intertwined. whether the dynamics and qualitative transitions are similar in other contexts as well needs to be explored in future studies.

keypoints
a new methodological framework was created for analyzing dynamics of expansive learning.
a new method is tested in the analysis of a planning meeting of a pre-service teacher group.
tracing the formation of the object is an indispensable methodological step in the analysis of expansive learning.
the analysis of the group’s interaction highlights the importance of transitions and disturbances in expansive learning.
this study offers a methodological lens for examining innovative forms of learning in various contexts.

references
bereiter, c., & scardamalia, m. (1993). surpassing ourselves: an inquiry into the nature and implications of expertise. chicago: open court.
bruner, j. s. (1974). beyond the information given. london: george allen & unwin ltd.
davydov, v. v. (1990). types of generalization in instruction: logical and psychological problems in the structuring of school curricula. reston: national council of teachers of mathematics.
dillenbourg, p., baker, m., blaye, a., & o’malley, c. (1996). the evolution of research on collaborative learning. in h. spada, & p. reimann (eds.), learning in humans and machines (pp. 189–211). oxford: elsevier science.
engeström, y. (2008). from teams to knots: activity-theoretical studies of collaboration and learning at work. cambridge: cambridge university press.
engeström, y. (2015). learning by expanding: an activity-theoretical approach to developmental research. cambridge: cambridge university press.
engeström, y., rantavuori, j., & kerosuo, h. (2013). expansive learning in a library: actions, cycles and deviations from instructional intentions. vocations and learning, 6(1), 81–106. doi:10.1007/s12186-012-9089-6
engeström, y., & sannino, a. (2010). studies of expansive learning: foundations, findings and future challenges. educational research review, 5, 1–24. doi:10.1016/j.edurev.2009.12.002
engeström, y., & sannino, a. (2012). whatever happened to process theories of learning? learning, culture and social interaction, 1(1), 45–56. doi:10.1016/j.lcsi.2012.03.002
eteläpelto, a., littleton, k., lahti, j., & wirtanen, s. (2005). students’ accounts of their participation in an intensive long-term learning community. international journal of educational research, 43(3), 183–207. doi:10.1016/j.ijer.2006.06.011
fichtner, b. (1984). co-ordination, co-operation and communication in the formation of theoretical concepts in instruction. in m. hedegaard, p. hakkarainen, & y. engeström (eds.), learning and teaching on a scientific basis: methodological and epistemological aspects of the activity theory of learning and teaching. aarhus: aarhus universitet, psykologisk institut.
foot, k. (2001). cultural-historical activity theory as practical theory: illuminating the development of a conflict monitoring network. communication theory, 11(1), 56–83. doi:10.1111/j.1468-2885.2001.tb00233.x
goldman, s. r. (2014). perspectives on learning: methodologies for exploring learning processes and outcomes. frontline learning research, 2(4), 46–55. doi:10.14786/flr.v2i4.117
greeno, j. g. (2011). a situative perspective on cognition and learning in interaction. in t. koschmann (ed.), theories of learning and studies of instructional practice (vol. 1, pp. 41–71). new york: springer.
greeno, j. g., & engeström, y. (2014). learning in activity. in r. k. sawyer (ed.), the cambridge handbook of the learning sciences (2nd ed., pp. 128–147). cambridge: cambridge university press.
jordan, b., & henderson, a. (1995). interaction analysis: foundations and practice. the journal of the learning sciences, 4, 39–103.
kärkkäinen, m. (1999). teams as breakers of traditional work practices: a longitudinal study of planning and implementing curriculum units in elementary school teacher teams. helsinki: university of helsinki, department of education.
de lange, t. (2011). formal and non-formal digital practices: institutionalizing transactional learning spaces in a media classroom. learning, media and technology, 36(3), 251–275. doi:10.1080/17439884.2011.549827
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge: cambridge university press.
lemke, j. l. (1990). talking science. norwood, nj: ablex.
lipponen, l., & kumpulainen, k. (2011). acting as accountable authors: creating interactional spaces for agency work in teacher education. teaching and teacher education, 27(5), 812–819. doi:10.1016/j.tate.2011.01.001
nilsson, m. (2003). transformation through integration: an activity theoretical analysis of school development as integration of child care institutions and elementary school. karlskrona: blekinge institute of technology.
raeithel, a. (1983). tätigkeit, arbeit und praxis. frankfurt am main: campus.
roschelle, j., & teasley, s. (1995). the construction of shared knowledge in collaborative problem solving. in c. e. o’malley (ed.), computer-supported collaborative learning. heidelberg: springer-verlag.
saari, e. (1995). voidaanko tutkimusryhmiä perustaa? tapaustutkimus valtion teknillisen tutkimuskeskuksen metallilaboratorion ryhmäkokeilusta vuosina 1989–1991. [could the research groups be founded? a case study of metals laboratory’s team experimentation in 1989–1991 at the technical research centre of finland] vtt tiedotteita 1627. espoo: vtt offsetpaino.
schmittau, j., & morris, a. (2004). the development of algebra in the elementary mathematics curriculum of v.v. davydov. the mathematics educator, 8(1), 60–87.
seppänen, l. (2004). learning challenges in organic vegetable farming: an activity-theoretical study of on-farm practices. helsinki: university of helsinki, institute for rural research and training.
sfard, a. (1998). on two metaphors of learning and the dangers of choosing just one. educational researcher, 27(2), 4–13. doi:10.3102/0013189x027002004
virkkunen, j., & newnham, d. s. (2013). the change laboratory: a tool for collaborative development of work and education. rotterdam, the netherlands: sense publishers.
wenger, e. (1998). communities of practice: learning, meaning and identity. cambridge: cambridge university press.

etelapelto et al publication frontline learning research vol. 6 no. 3 (2018) 6–36 issn 2295-3159

a multi-componential methodology for exploring emotions in learning: using self-reports, behaviour registration, and physiological indicators as complementary data

anneli eteläpelto a, virpi-liisa kykyri a, markku penttonen a, päivi hökkä a, susanna paloniemi a, katja vähäsantanen a, tuomas eteläpelto b, vesa lappalainen b

a faculty of education and psychology, university of jyväskylä, finland
b faculty of information technology, university of jyväskylä, finland

article received 22 may 2018 / revised 30 august / accepted 1 october / available online 7 december

abstract
studies on emotions in learning are often based on interviews conducted after the learning. these do not capture the multi-componential nature of emotions, nor how emotions are related to the processes of learning. we see emotions as dimensional, multi-componential responses to personally meaningful events and situations. in this methodologically advanced pilot study we developed a multi-componential methodology, capable of providing complementary information on emotions in professional learning.
for this purpose, we used a within-subject design applied to a single individual, with a focus on emotions during professional learning. within a laboratory setting, the subject was shown personally meaningful video extracts from a learning situation in which she had previously participated. the data were gathered through (i) self-reports of emotions via the emotion circle (ec) online assessment tool, (ii) measures of autonomic nervous system (ans) activity obtained via electrodermal activity (eda) and heart rate variability (hrv), (iii) behavioural registration of facial expression and gaze, and (iv) the stimulated recall interview (sri). self-reports of emotions via ec, and also the emotion-driven sri, were found to be productive, not only in detailing and explaining emotions experienced during the viewing of the videos, but also in bringing about reflective learning and novel insights. eda and hrv provided complementary information on the subject’s ans activity during the learning process. we present conclusions and future challenges in applying a multi-componential methodology to research emotions within professional learning. keywords: learning process; emotions; multimethod measuring; on-line self-reports; autonomic nervous system info. corresponding author mail: anneli.etelapelto@jyu.fi doi: https://doi.org/10.14786/flr.v6i3.379 acknowledgment professor anneli eteläpelto’s highly-meritorious professional career in the field of adult education at university of jyväskylä (finland) will continue in her new role as an emerita. we greatly appreciate her years of dedication to research, and especially all the inspiring ideas she has introduced over the years! among other roles, she has served the academic community as the coordinator of earli sig14 “learning and professional development” and hosted the sig meeting in 2008. further, she is renowned for her repeated successes in securing prestigious grants from the academy of finland and elsewhere. 
anyone who has been fortunate enough to collaborate with anneli has probably found her an inspiring colleague and, sometimes, surprising in her contributions. in her new role as a professor emerita, she will undoubtedly continue to contribute actively to the discourse on learning and professional development. this special issue describes a starting point for a new direction of research, which she has actively advocated, and we hope she will remain involved in these developments in the future, to see the vision she advocated fully realised. all the best and warm regards on behalf of all your colleagues! christian, raija, erno, and stephen

1. introduction

there is fairly convincing evidence on the vital role of emotions in learning. emotions are related to motivational processes, self-efficacy, and active engagement, each of which has a salient role in productive learning in schooling contexts (pekrun, elliot & maier, 2006). in research on adult learning, positive emotions have been found to broaden the scope of perception, whereas anxiety and fear have been connected to a narrowing of the perception and curiosity necessary for active and agentic learning (fredrickson & branigan, 2002; hökkä, vähäsantanen, paloniemi & eteläpelto, 2017; perry, 2006; storbeck & maswood, 2015; sung & yih, 2015). in team-based learning, social and self-conscious emotions such as compassion, love, shame, anxiety, and anger have been found to influence how team members see each other, and how they perceive the future of the team (homan, van kleef & sanchez-burks, 2015). furthermore, it has been shown that in work organizations emotions critically influence the work-related learning manifested in job performance, motivation, creativity, decision-making, turnover, psychological wellbeing, teamwork, and leadership (barsade & gibson, 2007).
despite convincing findings on the vital role of emotions in human learning, there has been a lack of research on the role of emotions in professional learning settings. these are characterized by processes of active influencing and developing, and by the negotiation of professional identity (eteläpelto, vähäsantanen, hökkä & paloniemi, 2013; vähäsantanen, räikkönen, paloniemi, hökkä & eteläpelto, 2018). recent studies have emphasized that professional learning occurs, in particular, via collaboration within social interaction, plus experimentation and reflection regarding one’s professional mission and practices (zwart et al., 2015). moreover, to achieve useful outcomes, social learning processes should be designed in such a manner that professionals have sufficient resources to shape their own professional identity and work (eteläpelto, vähäsantanen, hökkä & paloniemi, 2014). this touches on the notion that if meaningful learning is to be achieved, it is crucial to acknowledge and promote agency, and negotiations on professional identity (philpott & oates, 2017). here we emphasize reflection and insight concerning one’s own ways of thinking and interacting as important means of professional learning, after particular events. this kind of learning could manifest itself, for example, in insights into one’s ways of thinking and acting, plus the reasons for these, with greater overall self-awareness emerging as a result (vall et al., 2018). note that we have here adopted the term ‘professional learning’ as denoting the individual as an active and reflective participant, that is, someone who is responsible for learning and for constructing change at a personal level within a given context (labone & long, 2016; eteläpelto, 2017; vähäsantanen, paloniemi, hökkä & eteläpelto, 2017a). 
recent studies have shown that processes of group-based identity learning are particularly imbued with strong emotions, and that these emotions strongly influence actual learning outcomes, manifested as renegotiated work identities (vähäsantanen, hökkä, paloniemi & eteläpelto, 2017b; vähäsantanen, hökkä, paloniemi, herranen & eteläpelto, 2017). however, the findings on emotions are based on interview data collected after the learning experiences. thus, they do not take into account the multi-componential nature of the emotions (involving subjective experience, the autonomic nervous system, and behavioural changes), or how these are related to the processes of learning. a truly multi-componential understanding of emotions would appear to require a multi-componential approach to the measuring of emotions. the question then arises of how to overcome the methodological challenges of multi-componential measurement, and how to apply methodologically advanced tools to investigate emotions and related learning. in fact, there are now unobtrusive, technologically advanced tools such as face reading and gaze analysis (e.g. azevedo et al., 2013, 2016; zembylas & schultz, 2016) which, together with psychophysiological and self-assessment methods, can be used in the simultaneous collection of multiple data from different subsystems of emotions. however, in such simultaneous measurements, one must address questions of complementarity, interchangeability, validity, and reliability. in addition, there are challenges regarding the differing time windows of the various modalities. in addition to evidence on the relationships between learning and emotion, there has long been discussion on how emotions trigger or inhibit human behaviour (hommel, moors, sander & deonna, 2017), and how the emergence of emotions is closely connected to autonomic nervous system (ans) activity (levenson, 2014; mauss et al., 2005). 
recent discussion on emotion has been critical of mind-body dualism, pointing to evidence on how closely these interact, especially in the affective domain. it has long been recognized that ans has a central role in emotions. as levenson (2014, p. 100) has noted, ‘when it comes to emotions, all roads lead to ans’. the ans functions via two opposite but interacting regulation systems, i.e. the sympathetic and the parasympathetic nervous system. in experiences of fear and anxiety, the sympathetic nervous system produces the ‘fight or flight’ response, whereas the parasympathetic nervous system is responsible for calming our body and mind (‘rest and digest’). the ans is thus closely allied to the human experience of emotions, and the two branches of the ans are closely intertwined with behavioural responses, including changes in facial expressions and the active focusing of attention. it should be noted that the multi-componential nature of emotions has been accepted in a wide range of theoretical models of emotions (kreibig, 2010; mauss & robinson, 2009; gendolla, 2017). since early research on emotions, there has been a consensual understanding on emotions as comprising the following subsystems: (i) subjective experiences, (ii) psychophysiological responses emerging from the unconscious functioning of the ans, and (iii) behavioural and action-related manifestations. although there is agreement on the multi-componential nature of emotions, there is no agreement on the relations between the components of subjective experience, the ans, and behavioural changes. cognitive theories of emotions assume that there is a top-down relationship between levels in the subject’s cognitive assessment of emotions, and that this influences the responses of the ans and behavioural phenomena (e.g. lazarus, 1991). in contrast with this, evolutionarist-functionalist theories view emotions as organizing the activity of the ans and other physiological systems (levenson, 2014). 
the relationships between the different subsystems are often addressed in terms of coherence, referring to the coordination, or association, of a person’s experiential, behavioural, and physiological responses as an emotion unfolds over time (mauss et al., 2005). despite active theoretical discussion of such coherence, and of the nature of relationships between the three response systems, there is a lack of empirical evidence concerning how emotions organize activity within the ans, and between the ans and other response systems, including facial expression and subjective experience (levenson, 2014). nevertheless, empirical research has revealed that subsystem coherence and synchronization vary between different emotions (kreibig, 2010; levenson, 2014). it has also been found that different emotions (such as fear and joy) may activate different patterns of ans activity. however, research does not agree on the patterning, coherence, and specificity of the different subsystems (levenson, 2014). as indicated above, agreement on the multi-componential nature of emotions implies multi-method measurement of emotions. however, disagreement on the direction of influence between mind, behaviour, and ans responses brings challenges in interpreting the relationships between data derived from different subsystems. this implies a need for a pilot study which would address the relevant methodological challenges. we therefore sought to construct a multimethod research setting within which we could collect information simultaneously from the different subsystems of emotions operating within professional learning. in this article we consider the kinds of information that appear to emerge from such a methodology. our aim is to form some tentative conclusions on how the methods used may provide complementary data, allowing us to capture different aspects of emotions, plus their role within work-related learning processes. 
the study reported here was based on an understanding of emotions as multi-componential phenomena, manifested simultaneously in subjective experience and in the unconscious autonomic nervous system, and also in behaviour (with nonverbal behaviour operating somewhere in between these, being partly conscious and partly unconscious). the study aimed to capture emotions simultaneously on three levels, namely (i) subjective experience, (ii) ans, and (iii) behaviour. the methods can be summarized as follows (see also figure 2, section 5.1): (i) video-recorded episodes of previous learning situations were presented to the subject, here referred to as ‘lisa’ (a pseudonym), in a laboratory setting. her subjective experiences of these were elicited using an on-line application called emotion circle (ec), developed for the self-assessment of emotions concurrently with watching videos (see section 5.2.1). the videos were of episodes from a training programme in which lisa had participated previously. the episodes were selected on the basis that lisa had perceived them as personally meaningful, and as containing highly significant events for her learning within the programme. the concurrent assessments of emotions (plus their connections to learning) were elaborated using the stimulated recall interview (sri) method (kagan, krathwohl & miller, 1963; kykyri et al., 2017). the sri took place immediately after lisa had given her ec assessment. (ii) ans activity was measured using one of the most reliable indicators of arousal emerging from sympathetic nervous system activity, namely electrodermal activity (eda), as measured via skin conductance (sc). in addition, heart rate (hr) and breathing were registered as indicators of ans activity involving the parasympathetic nervous system, manifested as heart rate variability (hrv). (iii) behavioural indicators, which are partly regulated by the autonomic nervous system, include facial expressions.
these were captured through non-intrusive video-recordings, capable of being analysed via the computational methods of facereader (facereader version 6.1., 2015). in addition, the focus and direction of visual attention were registered using gaze recording (begaze version 3.7. manual, 2016). in this single case study, a within-subject design was used. a within-subject design is recommended by researchers investigating concurrent responses and the coherence of multiple sub-systems of emotions (levenson, 2014; mauss & robinson, 2009; kreibig, 2010). our purpose was to pilot the research procedure and multi-componential methodology required when one is investigating concurrent measures applicable to changing emotions, especially those emotions that occur within professional learning processes. in the best case, this would give us insights into the subjective experience of emotion – observed concurrently with ans functioning, and the behavioural manifestations of emotions – plus indications on how the measures may provide information on emotions in learning. we anticipated that the study conducted on ‘lisa’ would contribute to an understanding of the kinds of complementarity that can occur between the indicators used and the three different systems under study (subjective experience, the ans, and behaviour). the following section specifies what is known about emotions in terms of (i) the sub-components of emotion, (ii) dimensional vs. discrete models, and (iii) how emotions are related to action. based on this specification, we present our understanding and definition of emotions, necessary for addressing the validity of the methods used.

2. the conceptualization of emotion

from darwin (1872/1965) onward, researchers on emotions (ekman, 1992; lazarus, 1991; levenson, 1994) have argued that emotions involve coordinated changes across experiential, behavioural, and physiological response systems (mauss et al., 2005).
tomkins (1962) has suggested that emotions are sets of organized responses that (when activated simultaneously) are capable of simultaneously capturing widely distributed organs (the face, the heart, and the endocrines) and imposing on them a specific pattern of correlated responses. most, but not all, of the theorists above have taken a functional perspective, proposing that by imposing coherence across response systems, emotions facilitate the organism’s response to environmental demands, and prepare the organism for a set of diverse actions (levenson, 1994, 2003; mauss et al., 2005). for many theorists, a defining feature of emotion is response system coherence (mauss et al., 2005). this refers to the coordination, association, concordance, or organization of the response tendencies pertaining to a person’s experiential, behavioural, and physiological responses as the emotion unfolds over time (ekman, 1992; lazarus, 1991; levenson, 1994; scherer, 2009; tomkins, 1962). the notion that response coherence is a core feature of emotion suggests two corollaries. first of all, response coherence should increase as the intensity of emotion increases. weak emotions may provoke little coordination of the response systems, whereas strong emotions may provoke greater coordination. secondly, different emotions should be associated with different patterns of experiential, behavioural, and physiological response, tailored to meet the demands of different situations (mauss et al., 2005). for instance, amusement might be associated with facial displays of amusement, increased body activity, and a commensurate pattern of increased cardiovascular and electrodermal responding. by contrast, sadness might be associated with facial displays of sadness, decreased body activity, and a commensurate pattern of decreased cardiovascular and electrodermal response (mauss & robinson, 2009). 
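as a minimal sketch of how response coherence of this kind could be quantified within one person, the following python example correlates time series from different response systems. the function names (`pearson_r`, `pairwise_coherence`), the series labels, and all data values are hypothetical and chosen for illustration only; they do not come from the studies cited above, and a plain pearson correlation is only one of several possible coherence indices.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        raise ValueError("a constant series has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def pairwise_coherence(series):
    """Correlate every pair of response-system time series for one subject.

    `series` maps a (hypothetical) label to values sampled at the same
    time points; the result maps each label pair to its correlation.
    """
    names = sorted(series)
    return {
        (a, b): pearson_r(series[a], series[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Made-up illustration data, not measurements from any study.
example = {
    "experience": [1, 2, 3, 4, 5],
    "skin_conductance": [2.0, 2.5, 3.1, 3.9, 5.2],
    "smile_intensity": [5, 4, 3, 2, 1],
}
coherence = pairwise_coherence(example)
```

under these assumptions, a coefficient close to 1 or -1 would indicate tightly coupled response systems, whereas coefficients close to 0 would indicate little coordination between them.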
nevertheless, in a review by mauss and robinson (2009), the researchers observed that, in contrast with the theoretically assumed coherence of the response systems, the empirical findings have been mixed. in fact, psychophysiologists have long emphasized the weak correlations between experiential and physiological response systems, and even between various measures within the physiological response system. along similar lines, more recent studies have found relatively modest correlations between experiential, behavioural, and physiological measures in the context of specific emotional states such as fear (mauss et al., 2005). in recent discussions on emotions, there has been a debate between discrete and dimensional models of emotions. discrete models have assumed that different emotions are associated with discrete and invariant patterns of response within each response system. by contrast, dimensional models assume that measures of emotional response reflect dimensions rather than discrete states. dimensional perspectives argue that emotional states are organized by underlying factors such as valence and arousal (e.g. barrett & russell, 1999). discrete emotion perspectives, by contrast, suggest that each emotion (e.g. anger, sadness, joy) has unique experiential, physiological, and behavioural correlates. having conducted a review, mauss and robinson (2009) concluded that research tends to support the dimensional perspective. thus, the recent consensual, componential model of emotions conceptualizes emotions as dimensional phenomena comprising experiential, physiological, and behavioural responses to personally meaningful stimuli (mauss & robinson, 2009; russell, 2005). in such a model (see figure 1), an emotional response begins with appraisal of the personal significance of an event (lazarus, 1991), which in turn gives rise to an emotional response involving subjective experience, physiology, and behaviour (frijda, 1986). figure 1. 
a consensual component model of emotional response (modified from mauss & robinson, 2009)

a dimensional model of emotions is also present in the circumplex model of emotion (russell, 2005), which we have used as a framework in developing the emotion circle (see section 5.2.1). the emotion circle is a tool aimed at the on-line assessment and self-reporting of emotions (eteläpelto et al., 2017). the circumplex model proposes that all affective states arise from cognitive interpretations of core neural sensations, which are the product of two independent neurophysiological systems. in the circumplex model, the vertical dimension depicts the level of arousal, whereas the horizontal dimension depicts the valence (pleasure vs. displeasure) of emotions. posner, russell and peterson (2005) argue that the circumplex model of affect is consistent with many recent findings from behavioural, cognitive neuroscience, neuroimaging, and developmental studies of affect.

over many years there has been discussion on whether and how emotions influence action, including active learning. in a summary of recent research, hommel, moors, sander, and deonna (2017) conclude that there is little doubt that emotions influence action. emotions can influence the motivation process, and thus action, by fulfilling at least three functions. first of all, the emotions experienced can function as strong need-like motivational states. secondly, anticipated emotions can function as incentives, and justify action. thirdly, emotions can give information on progress in goal pursuit, permitting behaviour calibration (hommel et al., 2017). although the various theories assume a causal relation between emotion and action, they nevertheless disagree on the direction of this causal relation. feeling theories, such as that of james (1884), see action as partly the cause of emotion. by contrast, other theories take emotion to be the cause of action.
unsurprisingly, theories that see actions as part of emotions expect the processes involved in generating emotions to play a role in action as well (hommel et al., 2017). in the present article, emotions are understood as dimensional and multi-componential (i.e. experiential, physiological, and behavioural) responses to a personally meaningful antecedent event or situation, causing changes in the quality of subjective feeling, expressive behaviour, and physiological activation (kreibig, 2010; levenson, 2014; mauss & robinson, 2009). the multi-componential and dimensional understanding of emotions does not imply coherence between the different subsystems of emotion. neither does it specify the causal relationships that may exist between changes in the three components of emotions. however, coherence might increase as the intensity of the emotion increases. a dimensional understanding means that major variation in the subsystems of emotions may occur in terms of intensity and valence, without separate emotions necessarily having a specific fingerprint within or across different subsystems. furthermore, we assume that emotions are closely intertwined with the active processes of learning, even if the direction of influence or the specific causal links cannot be hypothesized. a lack of coherence between the subsystems of emotion can be manifested for several reasons. for example, subjective experiences of emotions can change without changes in autonomic nervous system responses, and ans changes may occur without concomitant changes in subjective experience. the reasons for this could derive, for example, from the lack of a subject’s faithful report on the emotional states experienced (kreibig, 2010). alternatively, couplings between the subsystems might be absent in self-reports of emotion because of a subliminal stimulus, unconscious emotions, or emotion regulation (kreibig, 2010; öhman, carlson, lundqvist & ingvar, 2007). 
what does this multi-component and dimensional understanding of emotion imply for the measurement of emotions? mauss and robinson (2009) suggest that the lack of strong convergence among multiple measures of emotion implies that the construct of ‘emotion’ cannot be captured with any one measure considered in isolation. they conclude that the more measures of emotion that are obtained, and the better tailored they are to the particular context and research question, the more likely it is that a researcher will learn from a particular study. since different measures of emotion appear to be sensitive to different dimensional aspects of an emotional state (with eda sensitive to arousal, facial expression sensitive to valence, etc.), they can be expected to act in a complementary way.

to sum up, a multiple-component, dimensional understanding of emotions (with not much coherence between the response systems) implies first of all that we need to measure all three response systems. thus, we need to use multiple methods in such a way that they cover all three subsystems (subjective experience, behaviour, and the ans). secondly, the dimensional understanding implies that we need to focus on the dimensions of both intensity and valence. thirdly, our interest in investigating emotions in connection with learning processes implies that we need to measure how emotions elicit learning processes and outcomes – and vice versa, i.e. how learning processes elicit emotions. the following section addresses the strengths and limitations of different measures as they encompass subjective experience, the ans, and behavioural subsystems of emotion.

3. the strengths and limitations of self-reporting, physiological, and behavioural indicators of emotions in learning

the self-reporting of emotions has several clear strengths. pekrun (2016) suggests that self-reports can render more differentiated assessments of emotions than any other current method.
self-reports are especially important for a nuanced description of emotions and thoughts. from a practical perspective, self-reporting is a more economical means of data collection than (for example) observations (pekrun, 2016). however, self-reports also have clear disadvantages, since they are limited to those emotional responses that a participant is aware of, i.e. to emotions that are within her or his conscious awareness. thus, they do not capture unconscious emotional processes (for unconscious emotions see winkielman & berridge, 2004). in addition, if self-reports are collected retrospectively, memory bias is common. kreibig (2010) suggests that self-reports of emotions are likely to be more valid to the extent that they relate to currently experienced emotions. even in this case, however, there are concerns that not all individuals are aware of and/or capable of reporting on their momentary emotional states (mauss & robinson, 2009). in addition, there are differences between cultures in the understanding of terms relating to emotions, as well as between individuals sharing the same cultural background. an additional threat to validity emerges from the human tendency towards impression-management, which is always present in the social context of data collection (azevedo et al., 2016). the bias pertaining to impression-management and the displaying of emotions might actually be particularly strong among professionals such as teachers and leaders, whose professional work competences comprise the skills of displaying and regulating their own emotions (e.g. kreibig, 2010). nevertheless, even if it were easy to change how one reports one’s emotions for the purposes of impression-management, it may be much more difficult to change one’s level of physical activation in efforts at coping and impression-management (azevedo et al., 2016).
from this point of view, physiological and behavioural indicators are needed to increase validity and reliability in the measuring of emotions. all this would imply that self-report methods should not be used alone, and should be supplemented with psychophysiological and behavioural indicators. this would also accord with an understanding of emotions as taking place in three subsystems which do not cohere, such that the measures of one cannot substitute for another. azevedo et al. (2013, 2016) have suggested the use of electrodermal activity (eda), which has long been used as a marker of sympathetic nervous system activity. in addition, researchers addressing emotions in connection with learning have suggested the use of facial expressions and eye-tracking to detect, identify, and classify affective states during learning (azevedo et al., 2013, 2016). eda refers to changes in the skin’s ability to conduct electricity. it can be measured by well-established and non-intrusive techniques that provide unique information on emotional arousal, increased cognitive work load, and task-engagement. overall, the higher the conductance level rises, the more elevated the subject’s emotional arousal becomes (huhtamäki et al., 2017; laitila et al., 2018). it has been suggested that if eda is a good predictor of cognitive load, task difficulty, and task engagement, it might also be used for predicting boredom, disengagement, and frustration in learning (azevedo et al., 2013, 2016). a feature of eda is that it is highly sensitive to small changes in low states of anxiety (boucsein, 2012; hugdahl, 1995). in addition, eda is responsive to behavioural inhibition and to defensive strategies. thus, eda increases if thoughts are suppressed or if emotional expressions are inhibited. 
nevertheless, an increase in eda is not equivalent to the occurrence of emotions with negative valence or stress; in fact, eda is also responsive to increased activation within the body in the case of happy excitement (benedek & kaernbach, 2010; karvonen, 2017). the limitations involved in using eda often centre on the difficulty of extracting information from the ‘raw’ eda signal. in addition, there can be significant daily variation in participants’ eda response, with eda levels increasing linearly throughout the day, and variation even from one season to another. from such considerations, azevedo et al. (2013, 2016) recommend including eda, but not limiting oneself to it, in efforts to measure ans activity.

a central technique for studying ans emotion responses has been heart rate variability (hrv), which refers to a variety of methods for assessing the beat-to-beat change in the heart over time (quintana & heathers, 2014). hrv describes the variation between consecutive inter-beat intervals. both sympathetic and parasympathetic branches of the ans are involved in the regulation of heart rate (hr). sympathetic nervous system (sns) activity increases hr and decreases hrv, whereas parasympathetic nervous system (pns) activity decreases hr and increases hrv (berntson et al., 1997; tarvainen et al., 2018). hrv is commonly used as a tool in assessing cardiac autonomic regulation, and it has been used in a range of wellbeing applications to evaluate the functioning and balance of the ans (tarvainen et al., 2018). it has been suggested that hrv can be used to investigate the relationships between autonomic regulation and interpersonal interaction (porges, 2001). increased hrv has been found to indicate a feeling of safety. in group-based learning contexts, such a feeling has been connected to having space for self-reflection and the emergence of new ideas.
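to make the notion of variation between consecutive inter-beat intervals concrete, the following minimal python sketch derives inter-beat intervals from detected r-peak positions and computes two widely used time-domain indices, sdnn and rmssd. it is an illustration under stated assumptions, not the analysis pipeline of the studies cited: r-peak detection is assumed to have been done already, the function names are hypothetical, and the default sampling rate of 1000 hz is simply a convenient choice that makes one sample equal one millisecond.

```python
from math import sqrt


def inter_beat_intervals_ms(r_peak_samples, fs=1000):
    """Inter-beat intervals in milliseconds from R-peak sample indices.

    `fs` is the sampling rate in Hz (an assumed default, see lead-in).
    """
    return [
        (later - earlier) * 1000.0 / fs
        for earlier, later in zip(r_peak_samples, r_peak_samples[1:])
    ]


def mean_hr_bpm(ibis_ms):
    """Mean heart rate in beats per minute."""
    return 60000.0 / (sum(ibis_ms) / len(ibis_ms))


def sdnn(ibis_ms):
    """Sample standard deviation of the inter-beat intervals (ms)."""
    n = len(ibis_ms)
    if n < 2:
        raise ValueError("at least two intervals are needed")
    mean = sum(ibis_ms) / n
    return sqrt(sum((v - mean) ** 2 for v in ibis_ms) / (n - 1))


def rmssd(ibis_ms):
    """Root mean square of successive interval differences (ms)."""
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    if not diffs:
        raise ValueError("at least two intervals are needed")
    return sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    # Invented R-peak positions (sample indices at 1000 Hz).
    peaks = [0, 800, 1700, 2500, 3350, 4150]
    ibis = inter_beat_intervals_ms(peaks)
    print("ibi (ms):", ibis)
    print(f"mean hr: {mean_hr_bpm(ibis):.1f} bpm")
    print(f"sdnn: {sdnn(ibis):.1f} ms, rmssd: {rmssd(ibis):.1f} ms")
```

because hrv differs widely between individuals, values such as these are more readily compared across episodes for the same person than across persons.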
in contrast, reduced hrv has been observed in disorders characterized by poor social cognition and emotion regulation (quintana & heathers, 2014). despite this, there are severe limitations concerning the use of hrv. because of the large inter-individual variation in hrv, within-subject designs have been suggested as more appropriate. in addition, hrv is affected by respiratory depth and frequency, with both breathing and blood pressure regulation having their own directly mediated relationships with hrv. thus, social-emotional tasks that induce changes in respiratory time variables and/or depth may indirectly influence hrv. in addition, a number of studies have shown continuous focused attention (and thus tasks with increased attentional demands) to reduce hrv, primarily because of changes in respiratory depth and frequency. this creates further difficulties for interpretation (quintana & heathers, 2014; quintana, alvares & heathers, 2016).

the starting point for hrv analysis is the electrocardiogram (ecg) recording, from which the hrv time series can be extracted. in the formulation of hrv time series, the fundamental issue is determination of the heart beat period (tarvainen et al., 2018).

in addition to the ans measures, researchers addressing emotions connected to learning have suggested the use of facial expressions and eye-tracking. these might allow researchers to detect, identify, and classify affective states during learning (azevedo et al., 2013, 2016). in fact, the use of facial expression is a fundamental process-oriented approach in the detection and classification of affective states (azevedo et al., 2016; ekman, 1992). in everyday social interaction, facial expression has a clear communicative function, but it also functions as an automatic and observable indicator of expressed emotion. recently, data collection on facial expression has been conducted through software linked to a video data stream of the learner’s facial expressions.
in addition, there are now many comprehensive, widely supported methods to objectively describe the facial expressions of emotions, based on ekman’s facial action coding system (facs). these analyse basic emotions described as (for example) enjoyment, fear, anger, sadness, disgust, and they indicate how the emotions evolve over time. commercial applications for facial expression recognition software have been developed to automate the coding process involved. despite the use of automated software to register the duration, fluctuation, transition, and dynamics of affective states, azevedo et al. (2016) emphasize that several disadvantages should be considered. first of all, the systems rely heavily on the quality of the video stream. there must not be shadows on the faces. moreover, eyeglasses can make it difficult to measure some emotions, and can increase the likelihood of interpretation as a neutral gesture. in addition, postures and body movement are limited, the software can only identify one face at a time, and the results are restricted to sets of specific predefined emotions. despite this, azevedo et al. (2013; 2016) suggest that the benefits of automatic facial expression recognition software outweigh the limitations. it permits non-intrusive, reasonably valid, and concurrent measurements, which can be integrated with other process data channels. thus, the method has clear potential significance. eye tracking is commonly used in learning research. the data can provide detailed description of the areas and focus of the subject’s interest (azevedo et al., 2016; hautala, loberg, hietanen, nummenmaa & astikainen, 2016). eye-tracking data can be used to measure the learner’s fixations (where?), fixation durations (how long?), saccades (eye movements from one fixation to another), plus the order and the gaze patterns (multiple fixations) that occur during learning. 
from these data, we can gain insights into what learners pay visual attention to during learning, and this can be both indicative and predictive of emotions (bondareva, conati & feyzi-behnagh, 2013). nevertheless, despite the strengths of eye-tracking data for research on emotion and learning, eye tracking alone cannot definitely indicate what elicited the emotion; nor can one determine what the effect of this emotion may be on subsequent activity. thus, eye-tracking data must be supplemented by other data collection methods if one is to capture the subject’s sense-making processes, as they take place in the reciprocal relationship between learning and emotions.

the stimulated recall interview (sri) method has been increasingly used as the video-recording of learning situations has become more popular (kagan, krathwohl & miller, 1963). especially in group learning contexts, when one can organize subsequent viewing of the group situation and of one’s ways of acting in the situation, there is the cognitive capacity and space to gain new insights, via reflection and re-evaluation of one’s own actions (huhtamäki et al., 2017; lyle, 2003; vall et al., 2018). in teacher education, the video-assisted procedure has long been used to increase self-reflection, and thus possibilities to learn about oneself as a teacher (fuller & manning, 1973). these goals (involving an increase in self-reflection, and an understanding of oneself as a leader) were in fact the learning goals in the leadership coaching program addressed in this study. hence, sri was expected to be a useful method for our purposes. we expected that viewing the videotaped learning situations in connection with the perceived emotions would provide space for reflection on the original learning situation (at a first viewing) with insights into the emotions experienced (at a second viewing).

4. research questions and aims of the study

the research questions can be specified as follows:

1. what kinds of information do concurrent self-reports, indicators of ans, and behavioural measures of emotion provide, in terms of understanding a person’s emotional responses within professional learning?
2. to what extent does the emotion-driven stimulated recall interview (sri) promote reflection and hence learning, related to the original learning situation?

the prime aim of the study was to form tentative conclusions on how self-reports, indicators of ans, behavioural measures of emotion, and the emotion-driven stimulated recall interview (sri) may provide complementary information on emotions in professional learning. a further aim was to elaborate the potentials and challenges of a multi-componential methodology for researching emotions in professional learning.

5. methods

5.1. design, procedure, and stimulus material

the multi-method measurement procedure described below was designed for ultimate use in wider data collection. in the present case we applied the procedure and data collection setting to a single subject. a practical purpose in this pilot study was thus to elaborate what might need to be changed or developed for wider data collection. the ethical committee of jyväskylä university evaluated the design and approved the study. all the participants mentioned in the article, and all participants at other times, gave their informed consent regarding participation. figure 2 gives a general description of the procedure (including the stimulus material and data collection methods) used in this study. figure 2 shows that in session i, the subject viewed four successive video-recorded episodes (a1, b1, b2, a2) and assessed these via the on-line emotion circle tool, described in detail in section 5.2.1.
in session ii, which followed immediately after session i, an emotion-driven stimulated recall interview (sri) was conducted while the subject again viewed the same four episodes (a1, b1, b2, a2), which were shown together with the saved ec data from her emotion assessments in session i. psychophysiological data were collected via measurement of eda, hrv, and respiration. electrodes measuring eda were attached to the left hand, hr electrodes to the chest, and the respiration belt around the chest (see figure 2). behavioural data from eye movement were gathered via the begaze version 3.7 (smi) recorder, which was located at the lower edge of the display. the face reader 6.1 (noldus) video camera (which was used for recording the subject’s facial expressions) was located behind the screen (see figure 2).

figure 2. procedures used in this study.

sessions i and ii took place under laboratory conditions in april 2018. in contrast, the selected episodes (a1, b1, b2, a2), which were used in the laboratory settings as stimulus material, were selected from the original learning settings. these had taken place five years earlier (in 2012–2013). our subject, lisa, had participated in this earlier group-based training platform (constituting a leadership coaching program). the program was constructed to cultivate (i) participants’ professional identities, (ii) their ways of managing social relationships within the work community, and (iii) their professional communication (hökkä, vähäsantanen, paloniemi & eteläpelto, 2017; vähäsantanen, paloniemi, hökkä & eteläpelto, 2017). the program had twelve workshops in all. it covered a period of eleven months, with one day per month allotted to it. the selected episodes b1 and b2 were those which lisa had mentioned as representing the most important and meaningful learning situations for her, personally, within the program. episodes a1 and a2 were selected as representing neutral episodes.
these were episodes which lisa had not mentioned at all. all of them came from the seventh workshop. the procedure of selecting and ordering the episodes to be viewed was constructed according to the abba model. this meant that at the beginning and at the end of the session there were emotionally neutral episodes. the episodes lisa had selected as most influential in terms of her learning (b1 and b2) were placed in the middle of the temporal continuum. the episodes differed in length, being chosen in such a way as to comprise coherent authentic learning episodes, bearing in mind that they would otherwise have been too fragmented for the subject to understand and evaluate. under laboratory conditions, the four episodes were played consecutively (total duration = 17 min. 23 s.). the contents of the episodes were as follows:

episode a1 (2 min.) depicted an informal get-together situation before the formal start of the training. in this episode, group members came to the seminar room in which the group learning session would take place. about half of the participants, plus the trainer, were already standing there face to face. they were chatting and talking informally with each other. one by one, the remaining participants walked into the room. some of them went directly to sit down in chairs placed in the form of a half circle. there was a mix of voices, so it was impossible to differentiate individual voices, although one could see the people who entered the room.

episode b1 (9 min. 23 s.) depicted a situation in which some of the participants (who were close to each other as colleagues) talked about their present feelings using symbolic object working. this took place in such a way that the participants, plus the trainer (13 persons altogether), were sitting in chairs which were in the form of a circle. in the middle of the circle there was an empty chair.
this symbolized an empty place in which each participant could place an imagined object, one that best represented a central issue in their life at that moment. at the start of episode b1, a participant, ‘bertha’, started to describe how busy she was in her work at that moment, and how she had tried to organize a weekend trip in order to recover a little; however, this had merely caused more stress and internal conflict in her mind. the trainer put some questions to bertha concerning the issues that would emerge if there was space for something other than work. after this, another person (‘caroline’, a close colleague of the previous speaker) started to talk. she described very similar feelings of having too much work, with very stressful feelings connected, for example, to the salary negotiations taking place in her work organization. caroline continued to talk about serious issues regarding her health and wellbeing. while talking about these stress-induced health issues, and how they were connected also to family issues, caroline burst into tears. after this she expressed surprise about the strong emotions connected to her situation. immediately after the start of the weeping, lisa stood up. she picked up a tissue from a side table and gave it to the crying person. one by one, two other close colleagues of caroline started to cry (indicating emotional contagion). the remaining participants looked very serious. the trainer put further questions concerning how far caroline had listened to her own mind, and how she could take more time to take care of her own health and wellbeing. the situation ended with the talk of a third person, who was also crying. at the end of the episode, that person made a joke. this caused people to laugh, and feel more relieved.

episode b2 (4 min.) depicted a situation exemplifying how to work with difficult cases.
the work had started with a preliminary task in which each participant had called to mind, from their personal work history, a difficult colleague. they had described the situation in writing before the session and sent it to the trainer. in this episode, participants first sat face to face, in the form of a circle. the subject of the present study, ‘lisa’, role-played her difficult case using a drama method. she took the role of the difficult person, changing her voice and way of talking to resemble that of the difficult person. after this role-play, in the next part of the episode, the participants turned their chairs 180 degrees, so that they were no longer face to face. from this ‘turning chair’ position, lisa presented what she was actually thinking, and what she would have liked to say (as her authentic self) to the difficult person. she was then saying aloud what she would have wanted to say to the imagined difficult person, and what she actually could not have said in real life. after this, the chairs were once again turned to the face-to-face position. lisa now explained to the whole group the kind of history she had had with the difficult case. at the beginning of each piece, the trainer always put a question to be answered. in presenting her reasons regarding the difficulty of the person, lisa also described how she had at last become empowered to set limits to the negative and destructive behaviour of the difficult person.

episode a2 (2 min.) depicted a pair-discussion session of the whole group. the participants were sitting in chairs, which were in the form of a circle. each participant was in active discussion with the person next to them. there was a mix of voices; hence, one could not differentiate individual voices. however, one could see smiling faces, hear loud laughter, and observe the participants’ active concentration during discussion with their pairs.

5.2. data collection

5.2.1. self-reports via the emotion circle (ec)

self-reports were collected from lisa via the emotion circle (ec) on-line assessment tool. this was developed for concurrent assessment of the quality and intensity of emotions. here it should be noted that there has been a lack of valid, user-friendly tools to capture changing emotions during learning processes. in our research project, we developed the ec on-line application for the self-assessment and reporting of individual shifting emotions within professional learning settings (eteläpelto et al., 2017). ec utilizes a colourful graphic interface containing 12 written emotion words (figure 3). these are presented in line with the circumplex model of emotions (russell, 2005). the circumplex model has been designed as a single-item scale providing a quick means of assessing affect along the dimensions of pleasure – displeasure (valence) and arousal – sleepiness. pleasure is considered to be the bipolar opposite of displeasure, and the subjective feeling of arousal to be the bipolar opposite of sleepiness. these dimensions are further considered to be orthogonal (i.e. independent) and thus conceptually separated (russell, weiss & mendelsohn, 1989). the single-item scale based on these two dimensions is envisaged as an instrument that will be short and easy to complete in assessing the subjective experience of continuously and rapidly fluctuating emotions in repeated-measures design (russell, weiss & mendelsohn, 1989). one can anticipate that multiple-item checklists or questionnaires would be too time-consuming and distracting for the purpose of reporting continuously changing emotions within the learning process. in the circumplex model of emotions, valence is described along the x axis so that on the left there are unpleasant (negative) and on the right pleasant (positive) emotions.
arousal is described along the y axis so that on the upper segment (containing plus values) there are emotion constructs characterized by high arousal (‘hot’ emotions) whereas on the lower segment (containing minus values) there are emotion constructs characterized by low arousal (‘cold’ emotions). in accordance with the circumplex model of emotions, different emotion words are placed in the emotion circle (ec) (see figure 3). this means that on the upper right quadrant of the circle, there are emotions characterized by pleasant activation and activated pleasure, such as excitement, surprise, and joy. on the lower right quadrant there are emotions characterized by deactivated pleasure and pleasant deactivation (safety, compassion, courage). on the upper left quadrant there are emotions characterized by unpleasant activation and by activated displeasure (irritation, anxiety, and fear). on the lower left quadrant there are emotions characterized by deactivated displeasure and by unpleasant deactivation (shame, frustration, and sadness). the intensity of emotions is depicted in the ec via degrees of colour saturation. in the middle of the ec the colours are lighter, denoting less intensity of emotion. as one moves to the circumference, the colours become more saturated, denoting more intense emotions. figure 3. the display of the emotion circle (ec). figure 3 shows the ec display. the emotion constructs used in the emotion circle were selected on the basis of interviews conducted with the 11 participants of the leadership coaching program. within the interviews, the interviewees were first presented with an open question concerning their perceived emotions during the program. they were then shown a list of 28 emotion constructs that were expected to represent the most common emotions felt during the program. these emotions were based on the final interviews conducted at the end of the program in 2013 (hökkä, vähäsantanen, paloniemi & eteläpelto, 2017). 
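the quadrant layout of the emotion circle described above can be sketched in a few lines of python. this is a hypothetical illustration, not the ec application itself: it assumes that a click is recorded as a (valence, arousal) pair normalised to the unit circle, that intensity corresponds to the distance from the centre (mirroring the increasing colour saturation towards the circumference), and that points lying exactly on an axis are assigned to the pleasant or high-arousal side. only the twelve emotion words and their quadrants are taken from the text.

```python
from math import hypot

# Emotion words per quadrant, as listed in the text.
EC_WORDS = {
    ("pleasant", "high"): ("excitement", "surprise", "joy"),
    ("pleasant", "low"): ("safety", "compassion", "courage"),
    ("unpleasant", "high"): ("irritation", "anxiety", "fear"),
    ("unpleasant", "low"): ("shame", "frustration", "sadness"),
}


def describe_click(valence, arousal):
    """Map a normalised click position to quadrant, words and intensity.

    valence: -1 (unpleasant) .. +1 (pleasant), the x axis.
    arousal: -1 (low)        .. +1 (high), the y axis.
    The coordinate convention and the tie-breaking on the axes are
    assumptions made for this sketch.
    """
    radius = hypot(valence, arousal)
    if radius > 1.0 + 1e-9:
        raise ValueError("click lies outside the circle")
    valence_label = "pleasant" if valence >= 0 else "unpleasant"
    arousal_label = "high" if arousal >= 0 else "low"
    return {
        "valence": valence_label,
        "arousal": arousal_label,
        "intensity": min(radius, 1.0),  # 0 = centre, 1 = circumference
        "words": EC_WORDS[(valence_label, arousal_label)],
    }


if __name__ == "__main__":
    # A click in the upper left quadrant, fairly close to the edge.
    print(describe_click(-0.6, 0.5))
```

in practice, each click would also carry a timestamp so that the self-report can be aligned with the other process data.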
in selecting the relevant emotion constructs, we also utilized prior studies on emotions in leaders’ identity negotiation (e.g. winkler, 2018). self-reports have been criticized on the grounds of the subjects’ tendency to report more positive emotions because of a social preference for these. because of this, ec included roughly the same number of positive (ec right side) and negative (ec left side) emotionally-related words. it was anticipated that this would help to counteract any tendency towards the reporting of positive rather than negative emotions. ec automatically saves the process data (time and object of clicking) from the subject’s assessments. this assessment video (in the present case video recording 1) can be shown to the subject immediately after the assessment. the ec application makes it possible to collect subjects’ self-assessments of their situation-specific emotions, including also data on the quality and intensity of the emotions, plus their dynamic continuity. it transforms and displays the process in such a way that it can be synchronized with other (physiological and behavioural) process data, collected at the same time. the ec also makes it possible to show the subjects’ recorded assessments together with the video-recording of the situation which the subjects have assessed. this is needed for stimulated recall. in using ec, we seek to avoid the memory bias of retrospective interviews. when we used the ec, the subject, lisa, was first given general instructions on it, with opportunities also to practise the use of it. she was asked to click on the emotion word or words which in each situation represented her subjective experience of the emotion. in order to guarantee that she had properly understood the use of ec, she was asked to imagine some emotion, then to click on the ec accordingly. 
after lisa had confirmed that the use of ec was easy for her, we played the recorded episodes in the order a1, b1, b2, a2, using ec to collect her self-reports, including the nature and intensity of her emotions.

5.2.2. behavioural data

we obtained behavioural and expressive data from lisa using automatic gaze recordings (via begaze) and video-recording of gestures (via noldus face reader). the methods used are unobtrusive. they include a gaze recorder, located at the lower edge of the display. a video camera is used to collect data on facial expression (face reader data). this is located behind the display, with an additional light placed on the upper side of the display to prevent shadows on the face.

5.2.3. autonomic nervous system recordings

autonomic nervous system (ans) recordings were taken from lisa during the video viewing and assessment situation. recording continued during the stimulated recall session which followed the viewing and assessment. during the ans recording sessions the following signals were recorded using the quickamp amplifier and data acquisition system (brain products, gilching, germany). the electrocardiogram (ecg) was recorded with two ag/agcl electrodes (ambu neuroline 710, ballerup, denmark), attached above and below the heart, with a similar ground electrode attached over the stomach. electrodermal activity (eda) during the session was recorded via two skin conductance (sc) electrodes (el507, biopac systems, california, usa) on lisa’s non-dominant palm, below the first and fourth digits. the palm was chosen as the location because, in piloting and in previous research (karvonen, 2017), it was found that there is less measurement error from hand movements when the electrodes are in that area, as compared to the fingertips. sc was determined using 0.5 v constant voltage (gsr sensor, brain products, gilching, germany).
respiration during the session was registered via a fabric belt (respiratory effort sensor, spes medica, genoa, italy). this was fastened on top of lisa’s clothes, on the lower chest area. however, respiration was not analysed in this study, because the data quality was found to be inadequate (the belt had been attached too loosely, as was discovered after the session). eda and respiration were amplified in dc mode, but the ecg was 0.5 hz high-pass filtered. signals were acquired with a sampling frequency of 1000 hz, using a data acquisition program (brainvision recorder, brain products, gilching, germany). a custom-made marker unit was used to synchronize the ans measures to the video.

5.2.4. the stimulated recall interview (sri)

immediately after lisa’s assessment of her emotions during viewing of the video-recorded learning episodes, a video- and ec-assisted stimulated recall interview (sri; see kagan, krathwohl & miller, 1963) was conducted. in this interview, the video episodes were shown to her, along with her emotion assessments given via ec. she was encouraged to share her thoughts, feelings, and reasons at any time while watching the videos, including her assessments of the emotions that she had experienced while watching the videos, and during the original situation as she recalled it. when she started to speak, the video was stopped to give her time to explain and describe her thoughts. to assist in this phase, lisa was given general instructions, in the form of a question to consider, as follows: what thoughts, feelings, or bodily sensations did you have while watching the video and assessing your emotion connected to it? we assumed that she could naturally have forgotten many of the feelings she had experienced during the original learning sessions (which had taken place five years previously), and that she might simply describe the thoughts, feelings, and sensations evoked by watching the session videos.
nevertheless, it seemed reasonable to suppose that she might be able to recollect some intense emotions from the original learning sessions. for this reason, she was further asked to specify whether she had had a particular thought or feeling in the original learning session, or whether that thought or feeling emerged only now in the assessment session. the same ans measures were recorded during the sri as during the emotion assessment session, and the interview was recorded with a video camera. after the stimulated recall interview, lisa was further asked to comment on the user-friendliness of the emotion circle, and of the data collection sessions as a whole. 5.3 analysis of the data so far, there have not been many analytical (and especially statistical) techniques which would be relevant for analysing data from single-case research, characterized by the rapid and randomly determined alternation of conditions or processes. in their review of existing analytical techniques in single-case design, manlov and onghena (2017) found visual analysis, constituting the classical way of analysing single-case data, to be the most frequently applied technique. it has been suggested that in most cases, visual analysis is sufficient to demonstrate evidence of a relationship between conditions and outcome variables (kratochwill et al., 2013). a baseline phase is necessary to represent the control condition and to provide a clear basis for comparison. visual analysis allows comparisons between and within phases, indicating levels, trends, and variability, with data also on overlaps, the immediacy of the effect, and the consistency of patterns. visual analysis can be used to suggest a functional relation between conditions and the target behaviour, and it can indicate the most salient features of the data. 
visual data can be used as an initial step in the analysis, and it can be regarded as complementary to statistical analysis in making sense of the data obtained (manlov & onghena, 2017). in this study we used visual analysis to compare the levels, trends, and variability of the ans data on eda and hrv, within and between the four time segments (episodes a1, b1, b2, and a2), referring also to the pause segment, which was used to give a baseline value for the comparisons. visual analysis was also used to carry out a general assessment of the data patterns in eda and hrv. statistical analysis was used as complementary to visual analysis in the analyses of eda data. from the eda data, statistically significant (p<.05) values were calculated. in addition, visual analysis was applied in the descriptive analysis of the concurrent variation of different data modalities. this was done in respect of the ans data and the on-line self-report data on emotions collected via the emotion circle. in the visual analysis addressing the complementarity of ans data and self-reports, a figure was constructed using the time stamps written on the excel file. this indicated the exact time points and the specific emotion assessed with ec while the subject reported her emotion. in this visual description, emotions with a positive valence (excitement, surprise, joy, courage, compassion, safety) were placed on the upper side of the horizontal time axis, while emotions with a negative valence (sadness, frustration, shame, fear, anxiety, and irritation) were placed under the time axis. specific colours connected to the different emotions in ec were also presented in the visual description (figure 5.). from the ec data we calculated the absolute and percentual frequencies of clicks on different emotion words within each episode. data from the subject’s eda were analysed with the ledalab program (version 3.4.6) written in matlab (benedek & kaernbach, 2010; www.ledalab.de). 
before the analysis, the sampling rate was reduced to 10 hz, which was high enough to represent rapid changes in sc related to sns activation. the rapid components of sc were extracted as skin conductance responses (scrs) and written to an excel file. the scrs were normalized by computing the average and standard deviation of the session, and calculating z-scores. values larger than 2.0 were considered to be statistically significant at the p<.05 level (given that 5% of the values have that property; they can thus be considered to represent statistically significant sns activation). the rationale behind the analysis of the statistically significant eda peaks is based on the assumption that eda can track rapid and unconscious changes in sympathetic nervous system (sns) activity in very brief time windows. an increase in sns activity is related to the increased physiological arousal that accompanies most emotions and also preparation for action (boucsein, 2012; kreibig, 2010). in particular, rapid changes in eda (measured as skin conductance responses, and indicated by increased sweating, especially in palms, fingers, and feet) are thought to be a direct measure of the phasic neuronal activity of the sns (benedek & kaernbach, 2010). laitila et al. (2018) have demonstrated the added value gained from analysing the eda responses that occur in the social interaction of a couple therapy session, as a means to detect important moments of change at individual and interpersonal level. in our subjectively meaningful social learning sessions, we were also interested in detecting critical moments of change that had made the selected episodes b1 and b2 personally meaningful for lisa’s learning within the leadership coaching program. in line with the analysis of laitila et al. 
(2018), we expected that those values of eda which represented rare (p<.05) high peaks would reveal exceptionally high unconscious emotional arousal of the subject, and thus point to critical moments of learning. in the analyses of lisa’s eda, we compared the numbers of statistically significant high peaks between episodes of about the same length. the baseline value during the pause was used in the comparison. nevertheless, more important than mere detection of moments of change is determination of what has led to these moments, thus placing the focus on the learning process itself (laitila et al., 2018). this implies that if eda data are to be made sense of in terms of learning, they need to be complemented with other kinds of data. in the present study, ans data from eda (indicating rapid peaks in the activity of the sns) were complemented with hrv data, indicating the activity of the parasympathetic nervous system (pns). in contrast with the sns responses manifested in eda, the pns responds much more slowly, possibly over minutes rather than seconds. thus, during fairly short episodes (from 2 to 9 minutes) there could be instances of overlap from one episode to another. this is a feature that needs to be considered in interpreting hrv trends and changes.

hrv was analysed in the time domain using the kubios hrv premium program (version 3.1; www.kubios.com). first of all, r peaks were detected in the ecg to determine the intervals between consecutive heart beats. possible artefacts were removed automatically. rr intervals were determined, and root mean squares of successive rr interval differences (rmssd) were determined for 60-second windows, starting from the beginning of the session, and covering the whole session in 10-second steps. the hrv was not normalized (in order to keep the relevant values transparent in milliseconds).

face reader is based on the circumplex model of emotion.
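the z-score thresholding of skin conductance responses described in the eda analysis above can be illustrated with a short sketch. this is not the ledalab procedure itself; it only shows, under stated assumptions, how scr amplitudes might be standardized over a session and how values above 2.0 might be flagged. the function name `significant_scr_peaks`, the use of the population standard deviation, and the example amplitudes are hypothetical choices made for illustration.

```python
import numpy as np


def significant_scr_peaks(scr_amplitudes, threshold=2.0):
    """Standardize SCR amplitudes over the session and flag high values.

    Assumptions (not stated in the source): the population standard
    deviation (ddof=0) is used, and only values above the threshold
    (one-sided) are flagged.
    """
    scr = np.asarray(scr_amplitudes, dtype=float)
    sd = scr.std(ddof=0)
    if scr.size == 0 or sd == 0:
        # No variation in the session: nothing can be flagged.
        return np.zeros_like(scr), np.array([], dtype=int)
    z = (scr - scr.mean()) / sd
    peak_idx = np.flatnonzero(z > threshold)
    return z, peak_idx


# Hypothetical session: mostly small responses and one large response.
example = [0.1] * 20 + [5.0]
z_scores, peaks = significant_scr_peaks(example)
print("indices of flagged responses:", peaks.tolist())
```

counting the flagged indices that fall within each episode's time range would then give the per-episode numbers of high peaks that are compared in the analysis.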
the model describes emotions in a two-dimensional circular space, containing arousal on the vertical axis and valence on the horizontal axis. the centre of the circle represents a neutral valence and a medium level of arousal. the circumplex model of face reader version 6.1 is based on the model described by russell (1980). the valence in face reader indicates whether the emotional status of the subject is positive or negative. ‘happiness’ is the only positive emotion, while ‘sadness’, ‘anger’, ‘fear’, and ‘disgust’ are considered to be negative emotions. ‘surprise’ can be either positive or negative. the valence is calculated as the intensity of ‘happiness’ minus the intensity of the negative emotion with the highest intensity. for instance, if the intensity of ‘happiness’ is 0.8 and the intensities of ‘sadness’, ‘anger’, ‘fear’, and ‘disgust’ are 0.1, 0.0, 0.05, and 0.05 respectively, then the valence is 0.8 minus 0.1, that is, 0.7 (see face reader version 6.1 reference manual, 2015, pp. 80–81).

the focus of visual attention was obtained via begaze, together with the subject’s assessments conducted with the on-line emotion circle. these process data were analysed using a process analysis, conducted via the video recordings, as depicted in the attached video of episode a2 (see video 1). the process analysis derived from the video recordings demonstrates how the focus of visual attention always preceded the subject’s selection of the emotion words she clicked. this information can be used in further analyses of the subject’s selection process, as well as in analyses of the usability of the emotion circle. in this study, the data were used to demonstrate the complementarity of the two relevant data modalities (comprising the more objective behavioural gaze data, and the subjective self-assessment data derived via the emotion circle).

video 1.
assessment process (2 min 3 s) as depicted for the process analysis, showing the assessments conducted via the emotion circle (red square) and the focus of gaze (blue circle). data from episode a2 (mp4 file).

for the purposes of detailed process analysis, data derived from different methods can be further transformed via the open broadcaster software, changing the data into the mp4 format. files in this format can be moved to the observer xt12 for simultaneous display. in this way different data sets can be scrutinized simultaneously, and observed moment by moment on the screen.

the sri interview data were transcribed verbatim, amounting to 3.5 pages (a4, 1.5 line spacing). the transcribed data were read and re-read by the second and first authors. we first identified verbal expressions of emotions, plus the explanations given for these. secondly, we identified self-reflective or other reflective contents of the utterances. finally, we focused on displays of insight and novel ideas which were not mentioned in the video-recorded episodes.

6. findings

the findings are presented here in line with the research questions. we first describe the kinds of information available from concurrent self-reports, obtained via the emotion circle, the ans indicators, and the behavioural measures of emotion. these shed light on the components of emotions and learning processes, and the emotional responses that occur in professional learning (6.1). thereafter (6.2) we present findings based on the emotion-driven stimulated recall interview (sri), with comments on how these were connected to the learning situations in question.

6.1. information elicited via concurrent self-reports, ans indicators, and behaviour

here we first describe findings from lisa’s self-reports on emotions, derived via the on-line emotion circle (6.1.1).
we then address the facial expressions obtained via the automatic face reader; these functioned as complementary data to verify the self-reported emotions (6.1.2). ans data concerning electrodermal activity (eda), and heart rate variability (hrv), are presented in 6.1.3. 6.1.1. self-reporting of emotions via the on-line emotion circle (ec) the analysis of lisa’s self-reporting of emotions via the on-line emotion circle (done while watching the selected episodes of videotaped learning sessions) showed that during the four episodes (17 min. 23 s. in total), she clicked on emotions 134 times. table 1 shows that the clicks covered the whole time period fairly evenly. as expected, the quality of the emotions was different between the four episodes. emotionally neutral episodes (a1 and a2) were placed at the beginning and end of the session, with b1 and b2 being placed centrally. these were the episodes arousing the strongest emotions, and also the episodes which lisa had selected as most influential for her learning. regarding the contents of the emotions reported via ec, table 1 shows that only one of the given emotion words (irritation) was not reported at all. the most used emotion word was compassion. lisa would have liked to add to the ec the experience of feeling guilt, especially in episode b1. table 1 shows that emotion words with positive valence (used 94 times) were used more than twice as much as those with negative valence (used 40 times). table 1 further shows that the quality of lisa’s emotions was quite different between the four episodes. episodes a1 and a2 were fairly positive overall, since all the reported emotions exhibited positive valence (characterized by intrinsic pleasantness). as opposed to this, episode b1 included strong negative and unpleasant emotions, such as anxiety, sadness, and fear. these were accompanied by a high degree of compassion. 
by contrast, episode b2 was assessed as fairly positive in terms of emotions such as courage, surprise, joy, and excitement. in addition, episode b2 was reported as displaying safety and as including also compassion.

table 1. number (absolute frequencies and percentages) of emotion words used in lisa’s assessment of four successive video-recorded episodes (a1, b1, b2, a2) via the on-line emotion circle (ec).

while watching the videos, the perceived emotions were assessed by clicking on the emotion words given in the emotion circle (session i). however, this is a somewhat limited way of expressing one’s emotional experience. nor, taken by itself, does it indicate or give information on the reasons for the specific reported emotions. because of this, the ec assessments were further elaborated (in session ii) using the stimulated recall interview (sri) method (6.2).

6.1.2. face reader and valence of emotions

differences between the episodes in terms of the valence of emotions were further analysed with the face reader (see figure 4). it confirmed the findings based on lisa’s self-reports via ec. episode b2 (figure 4 right) was characterized by positive valence, whereas episode b1 (figure 4 left) was characterized by very low and even negative valence. it should be noted that the glasses worn by the subject may have caused errors in the recognition of her facial expressions; this would explain why the findings showed a large amount of neutral emotion. empty spaces in the graph indicate that the face reader software was not able to identify lisa’s expressions because she was temporarily facing away from the video.

figure 4. screen captures of the valence of emotions within episodes b1 (left) and b2 (right).

6.1.3. ans responses: eda and hrv

figure 5 shows the standardized eda values and hrv values calculated for the four successive episodes (a1, b1, b2, a2) in session i. in addition, in the middle of episode b1, there was a pause. this was not planned, and was due to technical problems.
in fact, the sound of the video suddenly disappeared six minutes from the start of the viewing of the video, and four minutes from the start of episode b1, during the assessment of lisa’s emotions via ec. the technician then tried to work out the reason for the problem and make technical adjustments. this took eight minutes. during this time lisa could do nothing but sit and wait for the adjustments to be completed. in our analysis of the ans data, we realized that this technical failure provided valuable baseline data concerning lisa’s eda and hrv measures. at the start of the pause there was a rapid decrease in eda. this remained low over the next six minutes. just before the end of the pause there were some new attempts to recover the sound, with confusion and discussion concerning the functioning of the system. lisa participated in these discussions, as indicated by some increase in eda at the start of the pause. however, as figure 5 shows, during the pause, her eda values were at the lowest level for the entire data gathered during the session. the low eda values during the pause could be connected to her passive (non-agentic) motivational state (being unable to do anything). this is in accordance with kreibig’s (2010) suggestion concerning low eda: that it is characteristic of a passive state of action, and thus with deactivation of the sympathetic nervous system. if the eda levels of the pause situation are compared to the eda levels during episodes a1 and a2 (which were selected to represent emotionally neutral episodes) one can detect a clear difference between the pause and the ‘neutral’ episodes. especially in episode a1 (which came at the beginning of session i), the eda levels were relatively high, with two statistically significant (p<.05) peaks within this episode. these peaks apparently manifested lisa’s pleasure at seeing other participants in the video, coming into the room in which the training had taken place. 
episode a1 came at the start of the task of viewing and assessing emotions while viewing. hence, this represented a novel way of working, one that might set an additional load (manifested as an increased eda level). this is evident if one compares the eda levels between episodes a1 and a2. the latter was at the end of assessment session i, i.e. at a point when lisa was already familiar with the task. in the case of episode a2, she also knew that this was the last part of session i. she might therefore have felt more relieved than at the start of the session. nevertheless, if we focus (figure 5) on the eda levels during the personally meaningful episodes b1 and b2, we can see that there were many statistically significant high peaks in b1 (actually two before the pause, and 16 after the pause). in addition, the frequency of these peaks was fairly dense, and the frequency increased in the course of the episode b1. as described above, in lisa’s self-report concerning her emotions, episode b1 was described mostly in words with negative valence, such as anxiety, sadness, and fear. here, she reported a high level of compassion with the person who was talking about her health problems. in the sri, lisa said that she felt as if she were the body of that person. thus, the high eda values here appear to be connected with the subjective experience of high negative stress (kreibig, 2010). nevertheless, the high eda levels were not connected merely to stress with negative valence. figure 5 further shows that high eda peaks were present also in episode b2, which was subjectively assessed as having fairly positive valence (i.e. in terms of self-reports). episode b2 was described as including the subjective experience of courage, surprise, joy, and excitement. lisa also reported feelings of safety and compassion (see figure 5). this indicates that high eda is not connected merely to emotions with negative valence, and can be related also to emotions with positive valence. 
in general, lisa’s eda levels here seemed to be related to active cognitive and affective work while viewing episodes that were personally highly important to her, and to assessing her emotions while watching these. these situations, depicted in episodes b1 and b2, were also those which she had selected as highly meaningful for herself in terms of her perceived learning outcomes within the leadership coaching program. the eda peaks, which were connected to watching these episodes, would thus indicate critical incidents in the learning processes, but also a high intensity of emotion. this is in accordance with prior research on eda and its connections with the level of arousal, as manifested within active learning processes, and also in states of intense emotion (kreibig, 2010). figure 5 also shows the values and changes in hrv in the course of the initial assessment session (i). as compared to eda, which mainly indicates the activity of the more rapid sympathetic nervous system, hrv indicates the (generally more slowly fluctuating) activity of the parasympathetic nervous system. hrv was in the present case calculated in 60 s time windows, which were moved forward in 10-second steps to cover the whole session. the calculation of hrv as the root mean square of successive heart beat differences (rmssd) is not usually performed for periods shorter than 60 seconds (usually including 60–100 beats). hence, computationally, the changes in the hrv could not be as rapid as in eda, in which ten samples were used to represent phasic skin conductance responses in a second. an increase in hrv is a general marker of relaxing (stein, bosner, kleiger & conger, 1994). nevertheless, there is also some natural variation in the hrv, and it is thus not directly connected to the events that take place in a given situation. figure 5 shows that there was slowly changing variation in the hrv within session i. 
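the sliding-window rmssd computation described above (60-second windows moved forward in 10-second steps) can be sketched as follows. this is an illustration only and not the kubios implementation: the function name `windowed_rmssd`, the rule of assigning each rr interval to a window by the time at which it ends, and the example rr series are all assumptions made for the sketch.

```python
import numpy as np


def windowed_rmssd(rr_ms, window_s=60.0, step_s=10.0):
    """RMSSD (in ms) over sliding windows of an RR-interval series.

    rr_ms: successive RR intervals in milliseconds.
    Returns a list of (window_start_s, rmssd_ms) tuples.
    Assumption: an RR interval belongs to a window if the beat that
    ends the interval falls inside that window.
    """
    rr = np.asarray(rr_ms, dtype=float)
    if rr.size == 0:
        return []
    beat_times = np.cumsum(rr) / 1000.0  # end time of each interval, in s
    total_s = beat_times[-1]
    results = []
    start = 0.0
    while start + window_s <= total_s + 1e-9:
        in_window = (beat_times > start) & (beat_times <= start + window_s)
        window_rr = rr[in_window]
        if window_rr.size >= 2:
            diffs = np.diff(window_rr)  # successive differences
            rmssd = float(np.sqrt(np.mean(diffs ** 2)))
        else:
            rmssd = float("nan")  # too few beats for a difference
        results.append((start, rmssd))
        start += step_s
    return results


# Hypothetical RR series alternating between 800 ms and 900 ms.
example_rr = [800, 900] * 100
for window_start, value in windowed_rmssd(example_rr)[:3]:
    print(window_start, value)
```

because consecutive windows share 50 of their 60 seconds, the resulting series changes slowly by construction, which is consistent with the remark that hrv cannot change as rapidly as eda.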
at the beginning of the session, during episode a1, the hrv was very high, thus functioning as a marker of a relaxed body. however, at the end of episode a1, the hrv dramatically decreased. it increased again at the start of the next episode, b1. thereafter, the hrv again decreased during the two eda peaks before the pause. at the beginning of the pause, when there was confusion arising from the loss of the sound for technical reasons, the hrv again steeply decreased. after the start of this unexpected technical failure, the hrv then increased during the next six minutes of the pause time. this level of hrv provided lisa’s individual baseline in the passive situation, in which deactivation of the sympathetic nervous system was indicated by low eda levels. after the pause, when episode b1 (characterized by negative valence) continued, the hrv tended to decrease throughout the episode, although the decrease was not linear. especially at the end of episode b1, when there were many high and dense eda peaks, the hrv was clearly decreasing.

figure 5. self-assessments of emotions via ec, eda, and hrv during session i.

the decrease in the hrv continued further in episode b2, which was characterized by positive valence, but also by high eda peaks. in the subjective self-assessment of episode b2, lisa gave many reports of surprise together with courage (see figure 5). however, towards the end of episode b2, her hrv started to increase. here it should be noted that episode b2 was much shorter (4 min) than episode b1 (9 min 23 sec). since activation of the parasympathetic nervous system, associated with relaxing and calming down (and indicated by an increase in hrv), takes place fairly slowly, this may explain the delay in the hrv increase within episode b2. the next subsection addresses the extent to which the emotion-driven stimulated recall interview (sri) promotes reflection, and hence learning, related to the original learning situation.

6.2.
findings from the stimulated recall interview (sri) the assessments given via emotion circle (ec) together with the video-recorded learning episodes were used as the stimulus material for the sri. in the sri, we asked lisa to explain her thoughts, feelings, and bodily sensations, plus her reasons for her assessments via ec. with this procedure (sri) we mainly aimed to elicit the connections between emotions and the video-recorded learning situations. our findings, based on a qualitative content analysis of the transcribed interview data, showed that the emotion-driven sri produced very rich and heterogeneous descriptions, comprising self-reflection, other-related reflection, new emotion words, plus comments concerning the sri method. these descriptions illustrated reasons for specific emotions, and the connections between the reported emotions and the details of the learning situations and processes. in addition, the sri produced new insights concerning the meanings given by lisa to her own past behaviour, as well as the change in lisa’s behaviour during the learning assignment. table 2 provides a summary of the findings concerning the four episodes (a1, b1, b2, a2), including also a description of the emotional tone during the sri. a1 in the sri for this episode, the emotions lisa mentioned most frequently were ‘surprise’ and ‘joy’. she commented on the reasons behind these emotions in a neutral and calm manner. she had felt surprised and happy at seeing many familiar individuals arriving at the training venue. there were no intense emotional expressions or reflective comments relating to this episode. b1 in the sri for this episode, lisa provided nuanced descriptions, involving: (i) reasons for specific emotions, (ii) new emotions (i.e. emotions which she had not selected in the ec). she also gave reflective comments on the interactions while she viewed them. 
the comments involved (i) self-reflection on her own activities and bodily experiences, (ii) reflection on the activities of the other participants and the trainer, (iii) interpretative comments concerning other participants’ behaviour, (iv) reflections on the group atmosphere. in this episode, one participant was talking about her health problems. while watching the video, lisa re-lived strong emotions, i.e. ‘anxiety’, ‘sadness’, and ‘fear’. she described her re-lived emotions (and the self-reflection connected to these) as follows: …now i’m getting very anxious and i feel compassion, because this is my group…and i’m responsible for the fact that they have worked too hard, and i feel compassion and guilt, which is not included as an option in the ec. i swing between anxiety and compassion, but guilt is the strongest feeling, since i truly see that people are working too hard, but i wonder how…i haven’t realized that [sighing] so that it feels really bad…that i can feel i’ve failed… and i can feel sadness… lisa also described embodied experiences, in comments such as ‘surely, at that point my electrodermal activity was at a high level’, ‘i tried to take a deep breath in that situation’, or ‘there i definitely have tears in my eyes’. lisa suggested the addition of a new emotion word, ‘guilt’, which was not available in the ec. in addition, within the sri, she named the emotions ‘irritation’, and ‘joy’, which she had not selected in the ec assessment phase, even though these emotion words had been options in the ec. in the sri for b1, lisa also reflected on the behaviour of the group, as follows: ….then i look at the entire group, the way everybody is sort of frozen, or fortunately, it provides space [for the emotional expression] and in this sense there is safety in the group, so that people can empathize with each other and feel relieved... [smiling] b2: this episode consisted of working with ‘a difficult case’ using a drama method. 
in the related sri, lisa provided diverse descriptions, including (i) reasons for the specific emotions, (ii) self-reflection and self-analysis, (iii) reflection on the group, and (iv) a new insight concerning the reasons why she herself perceived the case as so difficult. the corresponding sri started with lisa’s self-reflection and self-analysis connected to the selected ec emotion of ‘surprise’, and to how this surprise emerged from her role-playing of a difficult character. she described this as follows: …so i’m wondering and i’m surprised, wondering if that’s me, i’m so bad at role-playing, but ok, i’m surprised that it’s as if oh my god…i have that difficult person in my mind all the time the one i’m role-playing…but also i’m astonished, about whether it’s me that’s speaking there or whether it’s my role character….actually i’m a bit ashamed about whether i’m so bad at role-playing but since i really am bad at role-playing… in the citation above, there is also a comment on her own behaviour and on issues which are bothering her. these troubling thoughts emerged especially in relation to the other group members, and what they might think about her role-playing of a difficult person. this further created self-doubt concerning what she had said in the group. she expressed the troubled feelings, including reflection on the group, as follows: …and then i look at those group members wondering what they’re thinking because at that time [referring to the original group learning situation] i couldn’t see it while i was concentrating on pretending to be that role character, so what i was saying … i’ve really got such a that.. uh uh that that somehow i’ve spoken inappropriately… after this, lisa spoke in a way that implied balancing between the feeling of being troubled and the emotion of being courageous (the emotion she had chosen most often in the ec). she vacillated between a feeling of having been courageous, and a feeling of being ashamed of her courage.
this can be seen as an attempt on her part to seek different interpretations of herself. there seems to be a struggle between, on the one hand, finding her own possible space, within which she can allow herself to be courageous, and on the other hand, social embarrassment, to the point of shame. the reflections here seem to involve tacit (previously unverbalized or unrecognized) emotion. however, within the sri, her talk took a positive direction, ending in laughter. in this way, the balance swung to a positive feeling, and to the conclusion that in fact she had not humiliated herself. after this, a novel insight emerged on the reasons why lisa had perceived the difficult case as so difficult. this insight emerged together with the reflective talk on her emotions, plus the reasons for these. in the sri (while watching the situation again in the video and elaborating her emotion of being courageous), she pondered on her behaviour as follows: … i’m surprised at how courageous i am [in this learning assignment]. um, i was pondering a lot …about whether i was going to dare to say it… that until then i was, well, that she was once my teacher…. when i started my studies in the department… and now [in this learning assignment] i could at last dare to tell her … even though she had been an authority figure to me… well okay i was really courageous… i still keep wondering how and why i was so bold to as say to her that here we are all in the same problematic situation, so why, i guess she had put herself above all the others, and she was a university lecturer when i first started to work in the department. 
well this has been for me i guess a kind of empowering moment and a really big issue, that i was able to tell her what i was thinking, since i have had a tendency to try to please everyone, and especially her, so that [earlier, when we worked together] i did not, i didn’t dare [laughs]… slightly later in the sri, lisa confirmed her new insight (which only emerged during the sri) as follows: ...now i look at this from a distance, this is how i have been acting… well maybe there’s some new insight about why this person was for me, well since she had previously been a great authority for me then that’s why it was such a big thing that after twenty or thirty years i had the courage to say to her you can’t act like that, tell people they can’t come to a group if they haven’t done their doctoral dissertation, just for that reason, come and make other people depressed, say you people are stupid [laughs] yeah this was something to remember… lisa also confirmed that this new idea, and the emotions attached to it, were merely those that emerged in the present situation (ec plus sri). she could remember that in the original learning setting (five years previously) she had role-played that person, but she did not remember the emotions she experienced at that time.

a2: the sri for this episode produced utterances that were descriptive and fairly neutral – but also moderately positive – concerning the group atmosphere. they encompassed feelings of being safe, joyful, and surprised. in addition, she presented descriptive comments concerning individual participants in the group.

table 2 summary of findings from the stimulated recall interview (sri) concerning the four episodes a1, b1, b2 and a2.

7. conclusions regarding complementarity and further challenges in using multiple methods

our aim in this pilot study was to gain a preliminary understanding of how self-reports, indicators of ans, behavioural measures of emotion, and the emotion-driven stimulated recall interview (sri) may provide complementary information on the function of emotions in professional learning. in conducting the study, we wished also to elaborate the potentials and challenges of the multi-componential methodology we applied, in terms of researching emotions in professional learning. our interest in investigating emotions in relation to learning processes implies a need for a bi-directional perspective, involving how emotions elicit learning processes and outcomes, and how learning processes elicit emotions. in future discussion on emotions in learning we also need to consider recent discussion and disagreement between, on the one hand, scholars who think that emotions are universal and similar over different times and cultures (ekman, 1992), and on the other hand, those who see emotions as historically and culturally determined, and thus learned in socio-cultural contexts (barrett, 2006). the methods and tools used to measure emotions from facial expressions (such as the facereader used in this study) are based on ekman’s (1992) idea of basic emotions and on the possibility of measuring them universally via facial movements. however, this may be misleading, bearing in mind the criticisms presented against the conception of universality in facial movements as indicators of emotions. human capabilities for emotion regulation, and individual differences in emotional intelligence (manifested as the ability to display emotions), can be expected to influence the presentation of emotions. there is, in fact, considerable evidence concerning the learning of emotions in cultural contexts, and this applies also to the learning and use of emotion words (barrett, 2006).
the on-line self-reports (given via ec) and the sris based on these emerged as productive, not only in reporting and explaining one’s emotions during the learning process, but also in terms of promoting new reflective learning. this new learning was evidenced in what our subject, lisa, said in her sris, and especially in her elaborations of the emotions experienced while viewing the episodes. lisa’s self-reports via the ec, and behavioural data derived from her facial expressions, indicated that the selected episodes b1 and b2 were both connected with intense emotions. they also indicated a difference in valence, with the first of these (b1) being characterized by negative valence, and the second (b2) by positive valence. this was evident on the basis of her self-reports, and was validated by behavioural data derived from her facial expressions. the findings here include considerable measurement error, due to our subject’s need to use glasses. nevertheless, they are in line with previous studies showing high agreement between facial expressions and self-report data (harley et al., 2015). in the sri, lisa elaborated on her reasons for her emotion choices in the ec. while viewing the video clips, she could explain what had provoked the emotions. in the episode imbued with emotions of negative valence (b1), lisa reported compassion, anxiety, sadness, and fear. the next episode (b2), which was characterized by emotions with positive valence (surprise, courage, joy, compassion, excitement, and safety) seemed to produce new self-reflective ideas on the previous power relations operating between herself and a particular difficult person, and on why the relationship had been so difficult for her. this finding is in line with previous suggestions regarding the sri method as a means of promoting self-reflection (vall et al., 2018). however, the special feature in the present study was the use of the sri to focus specifically on emotions. 
the emotion-driven sri seemed to produce – in conjunction with reflections by lisa on herself, other members, the group interaction, and the reasons for her specific emotions – important insights concerning the causes underlying her ‘difficult case’. such novel learning, involving new insights, seems to support the productive nature of emotion-driven reflection on personally meaningful learning episodes. while strong emotions may exist as (partly unconscious) rapid events at particular moments, the elaboration of these moments afterwards, i.e. from a distance, appears to provide a productive basis for identity learning. the video-recorded episodes from personally meaningful learning settings unfolded first from a distance, in watching and assessing them via the ec, and then slowly in the sri. they thus provided options to re-live and re-analyse the connections between the situation in which the emotions arose and the emotional responses that followed (see figure 1). all in all, this implies that within a methodology of presenting stimulus material, it is highly productive to select personally meaningful episodes for further elaboration, and to elaborate these from the perspective of emotions. the self-report method used in this study (combining the on-line ec reporting of emotions with the sri focusing on personally meaningful learning episodes) provided a powerful learning setting. it brought about deepened self-reflection, and novel insights, notably in the episode characterized by emotions with positive valence. overall, we would suggest that a combination of the on-line reporting of emotions with the emotion-driven sri is a promising methodology for investigating how emotions are connected to learning. in addition, such a combination can promote reflective insights that are of value in learning about one’s own identity.
there is thus notable pedagogical potential in the sri, which appeared to bring about self-reflection, other-oriented reflection, insight, and learning. this finding resonates with the studies by vall et al. (2018) and huhtamäki et al. (2017), who noticed that the video-assisted sri stimulated reflection and insight, while facilitating therapeutic processing in couple therapy clients. during the learning process in session 1, the psychophysiological indicators eda and hrv were found to provide different and complementary information on the subject’s autonomic nervous system (ans) activity. the eda, which indicates the activity level of the sympathetic nervous system, was found to be at a high level during the viewing and assessing of the videotaped learning episodes. the lowest eda occurred during the pause, i.e. during a passive situation when there was nothing to do. comparison of the eda responses between the watched episodes showed that the highest eda peaks occurred during the viewing of personally significant episodes b1 and b2. in episode b1 (characterized by emotions with negative valence), there were many significantly high peaks, and the peaks were close to each other (i.e. dense). nevertheless, high eda peaks were also present in the other personally significant episode, i.e. the one characterized by positive emotions (b2). in this study, eda could be used to distinguish the personally meaningful episodes b1 and b2 from the passive and more neutral episodes a1 and a2. however, eda did not distinguish between the emotional valences (positive vs. negative) of the situation. for this purpose, we need other complementary methods, such as self-reports. another ans measure used here was heart rate variability (hrv), which is a measure of the activity of the parasympathetic nervous system (pns). 
in the present study, hrv increased (indicating high activity in the pns), thus pointing to processes of calming and relaxing during the (unintended) pause, as well as during emotionally neutral episodes a1 and a2. in contrast, hrv was found to decrease (thus indicating deactivation of the pns) in the emotionally intense episodes b1 and b2. one very interesting aspect was the decrease in hrv also during the transition from episode b1 (imbued with negative emotions) to episode b2 (imbued with positive emotions). however, the findings here should be treated with caution, given that this is a single-subject case study, and also that sensitivity to breathing (which is a feature of hrv) could have affected the hrv observations. in the present study we were unable to obtain breathing data due to a technical failure. hence, for future collection and analysis of hrv data it will be necessary to improve the reliability of the relevant equipment. the behavioural data (via facereader, gaze) were used merely to increase the reliability of the self-reporting data. in the facial expression analysis we utilized – in line with our dimensional understanding of emotions – only the indicator of valence. as regards gaze data, in the present study these data were collected but not further analysed. however, the collection of such data would have utility in showing the focus of attention. in the future, we intend to use gaze data in addressing further the usability of the ec. our efforts to construct measurement procedures for use under laboratory conditions produced many insights regarding technical details. these covered the proper installation of devices (such as, in the present case, the breathing belt, which did not give proper data because it was fastened too loosely).
furthermore, from the unplanned loss of the sound in the video display, and from the ans data collected during the pause, we learned that we actually require such a pause situation at the end of data collection sessions, to obtain the baseline for the subject’s ans data. one central challenge in this kind of multimethod measurement is the synchronisation of different devices. if there are problems with this, analysis of the data becomes very laborious. furthermore, in analysing ans data there are different time windows in eda and hrv. eda, which measures the responses of rapid sympathetic response systems, responds quickly (e.g. in a fight-flight situation). this means that the eda response occurs within just a few seconds of the stimulus situation or event. by contrast, hrv measures the activity of the (slowly responding) parasympathetic nervous system. this means that in computing hrv, the time window cannot be less than one minute. hence, the temporal connection to a specific event or situation remains less exact with hrv than with eda. for the future development of the emotion circle (ec), we gained much information from the emotion-driven stimulated recall interview (sri). it provided practical information concerning the usability and selection of emotion words. for the future, we need to test the optimal number of emotion words, and also how to take into account the saturation of colours in the ec. in addition, there is a need to test a range of pictorial or iconic ways of displaying emotions in the ec. in the future, if our purpose is to measure simultaneously all three components of emotions (subjective experience, ans, and behaviour) within the processes of learning, we shall need to recognize that setting up this kind of multi-method measuring system will require a multidisciplinary group of researchers, comprising experts in educational sciences, psychology, psycho-physiology, and information technology.
in addition, there will be a need for different kinds of inter-professional practical support and technical services. for future research on emotions in learning, we would suggest a focus on the continuities of emotional processes (in terms of the bodily activity of the ans, plus valence, and the intensity of the experienced emotions). the aim will be to create conditions that are optimal for researching and promoting supportive emotions in professional learning.

keypoints
this study developed a multi-componential methodology to measure emotions in learning.
an on-line assessment tool, emotion circle (ec), was developed for the self-reporting of emotions during learning.
the multimethod research design provided complementary information on the experiential, physiological, and behavioural components of emotions.
the emotion-driven sri revealed connections between emotions and learning, and was productive in bringing about reflective learning and novel insights.

acknowledgments
the authors are grateful to the reviewers of the manuscript, and to donald adamson who polished the language of the article. they also wish to thank lauri viljanto and petri kinnunen for technical assistance, and hanna liljapelto who drew the figures. this work was supported by the academy of finland [grant number 288925 the role of emotions in agentic learning at work].

references
azevedo, r., harley, j., trevors, g., duffy, m., feyzi-behnagh, m., & bouchet, f. et al. (2013). using trace data to examine the complex roles of cognitive, metacognitive, and emotional self-regulatory processes during learning with multi-agent systems. in r. azevedo & v. aleven (eds.), international handbook of metacognition and learning technologies (pp. 427-449). new york: springer. azevedo, r., taub, m., mudrick, farnsworth, & martin, s. (2016). interdisciplinary research methods used to investigate emotions with advanced learning technologies. in m. zembylas & p.
schutz (eds.), methodological advances in research on emotion and education (pp. 231-243). dordrecht: springer. barsade, d., & gibson, s. g. (2007). why does affect matter in organizations? academy of management perspectives, 21, 36–59. barrett, l. f. (2006). are emotions natural kinds? perspectives on psychological science, 1, 28-58. barrett, l. f. & russell, j. a. (1999). the structure of current affect. controversies and emerging consensus. current directions in psychological science, 8, 10–14. begaze version 3.7. (2016). manual. smi sensomotoric instruments. benedek, m. & kaernbach, c. (2010). a continuous measure of phasic electrodermal activity. journal of neuroscience methods, 190, 80-91. bondareva, d., conati, c., & feyzi-behnagh, r. (2013). inferring learning from gaze data during interaction with an environment to support self-regulated learning. international conference on artificial intelligence in education (pp. 229-239). springer. boucsein, w. (2012). electrodermal activity. new york: springer. berntson, g. g., bigger, jr., eckberg, d. l., grossman, p., kaufman, p. g., malik, m., nagaraja, n. h., porges, s. w., saul, j. p., stone, p. h. & van den molen, m. w. (1997). heart rate variability: origins, methods, and interpretive caveats. psychophysiology, 34, 623-648. darwin, c.r. (1872/1965). the expression of emotions in man and animals. london, uk: john murray. dimaggio, g., lysaker, p., carcione, a., nicolo, g., & semerari, a. (2008). know yourself and you shall know the other to a certain extent: multiple paths of influence of self-reflection on mindreading. consciousness and cognition, 17, 778-789. ekman, p. (1992). argument for basic emotions. cognition and emotion, 6(3-4), 169-200. eteläpelto, a. (2017). emerging conceptualisations on professional agency and learning. in m. goller, & s. paloniemi (eds.), agency at work: an agentic perspective on professional learning and development (pp. 183-201). springer: cham.
eteläpelto, a., hökkä, p., paloniemi, s., vähäsantanen, k., lappalainen, v., eteläpelto, t., & niskanen, k. (2017). developing an on-line application, emotion circle (ec) for the self-assessment of emotions in agentic learning at work. in l. g. chova, a. l. martínez, & i. c. torres (eds.), iceri 2017 proceedings:10th international conference of education, research and innovation. seville, spain (pp. 7763-7769). eteläpelto, a., vähäsantanen, k., hökkä, p. & paloniemi, s. (2013). what is agency? conceptualizing professional agency at work. educational research review, 10,45-65. http://authors.elsevier.com/sd/article/s1747938x13000274 eteläpelto, a., vähäsantanen, k., hökkä, p. & paloniemi, s. (2014). identity and agency in professional learning. in s. billett, c. harteis and h. gruber (eds.), international handbook of research in professional and practice-based learning volume 2 (pp. 645-672). dordrecht: springer. facereader version 6.1. (2015). reference manual. wageningen, nl: noldus information technology. fredrickson, b. l., & branigan, c. (2002). positive emotions broaden the scope of attention and thought-action repertoires. cognition and emotion,16, 313-332. frijda, n. h. (1986). the emotions. cambridge, uk: cambridge university press. fuller, f. f., & manning, b. a. (1973). self-confrontation reviewed: a video tape. a case study. journal of counseling psychology, 10(3), 237-243. gendolla, g. h. e. (2017). do emotions influence action? of course, they are hypo-phenomena of motivation. emotion review, 9(4), 348-350. harley, j.m., bouchet, f., hussain, m.s., azevedo, r. & calvo, r. (2015). a multi-componential analysis of emotions during complex learning with an intelligent multi-agent system. computers in human behavior, 48,615-625. hautala, j., loberg, o., hietanen, j. k., nummenmaa, l., & astikainen, p. (2016). effects of conversation content on viewing dyadic conversations. journal of eye movement research, 9(7), 1-12. 
hökkä, p., vähäsantanen, k., paloniemi, s., & eteläpelto, a. (2017). the reciprocal relationship between emotions and agency in the workplace . in m. goller, & s. paloniemi (eds.), agency at work: an agentic perspective on professional learning and development (pp. 161-181). springer: cham. doi:10.1007/978-3-319-60943-0_9 homan, a.c., van kleef, g.a. & sanchez-burks, j. (2015). team members’ emotional displays as indicators of team functioning. cognition and emotion, 30(1), 134-149. hommel, b., moors, a., sander, d., & deonna, j. (2017). emotion meets action: towards an integration of research and theory. emotion review, 9(4), 295-298. huhtamäki, h., lehtinen, r., kykyri, v-l., penttonen, m., karvonen, a., kaartinen, j., & seikkula, j. (2017). oivaltamisen hetket pariterapia-asiakkaiden jälkihaastatteluissa.[moments of insight in the stimulated recall interview of couple therapy clients]. perheterapia,33(4), 36-50. hugdahl, k. (1995). psychophysiology: the mind-body perspective. cambridge, ma: harvard university press. james, w. (1884). what is emotion? mind, 9,188-205. kagan, n., krathwohl, d. r., & miller, r. (1963). stimulated recall in therapy using video tape: a case study. journal of counseling psychology, 10,237-243. karvonen, a. (2017). sympathetic nervous system synchrony between participants of couple therapy. jyväskylä studies in education, psychology and social research599. university of jyväskylä. jyväskylä university printing house, jyväskylä, finland. kratochwill, t.r., hitchcock, j.h., horner, r.h., levin, j.r., odom, s.l., rindskopf, d.m., et al. (2013). single-case intervention research design standards. remedial and special education, 34,26-38. kreibig, s. d. (2010). autonomic nervous system activity in emotion: a review. biological psychology, 84(3), 394–421. kykyri, v.-l., karvonen, a., wahlström, j., kaartinen, j., penttonen, m., & seikkula, j. (2017). 
soft prosody and embodied attunement in therapeutic interaction: a multimethod case study of a moment of change. journal of constructivist psychology, 30(3), 211-234. doi:10.1080/10720537.2016.1183538 labone, e., & long, j. (2016). features of effective professional learning: a case study of the implementation of a system-based professional learning model. professional development in education, 42(1), 54–77. laitila, a., vall, b., penttonen, m., karvonen, a., kykyri, v-l., tsatsishvili, v., kaartinen, j. & seikkula, j. (2018). the added value of studying embodied responses in couple therapy research: a case study. family process, 57, 1-13. lazarus, r. s. (1991). emotion and adaptation. new york: oxford university press. levenson, r. w. (1994). human emotion: a functional view. in p. ekman & r. j. davidson (eds.), the nature of emotion: fundamental questions (pp. 123-126). new york: oxford university press. levenson, r. w. (2003). autonomic specificity and emotion. in r. j. davidson, k. r. scherer & h. h. goldsmith (eds.), handbook of affective sciences (pp. 212-224). new york: oxford university press. levenson, r. w. (2014). autonomic nervous system and emotion. emotion review, 6(2), 100-112. lyle, j. (2003). stimulated recall: a report on its use in naturalistic research. british educational research journal, 29(6), 861-878. manolov, r. & onghena, p. (2017). analyzing data from single-case alternating treatment designs. psychological methods, 16 (march), 1-25. mauss, i. b., levenson, r. w., mccarter, l., wilhelm, f. h., & gross, j. j. (2005). the tie that binds? coherence among emotion experience, behaviour and physiology. emotion, 5, 175-190. mauss, i. b., & robinson, m. d. (2009). measures of emotion: a review. cognition and emotion, 23, 209-237. pekrun, r. (2016). using self-report to assess emotions in education. in m. zembylas & p. a. schutz (eds.), methodological advances in research on emotion and education. springer international publishing.
pekrun, r., elliot, a. j., & maier, m. a. (2006). achievement goals and discrete achievement emotions: a theoretical model and prospective test. journal of educational psychology, 98(3), 583-597. perry, d.p. (2006). fear and learning. trauma-related factors in the adult education process . new directions for adult and continuing education, 110, 21-27. philpott, c., & oates, c. (2017). teacher agency and professional learning communities: what can learning rounds in scotland teach us? professional development in education,43(3), 318–333. porges, s.w. (2001). the polyvagal theory: phylogenetic substrates of a social nervous system. international journal of psychophysiology, 42, 123-146. posner, j., russell, j. a. & peterson, b. s. (2005). the circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. development and psychopathology, 17(3), 715-734. quintana, d. s. & heathers, j. a. j. (2014). considerations in the assessment of heart rate variability in biobehavioral research. frontiers in psychology, 5, 805. doi: 10.3389/fpsyg.2014.00805 quintana, d. s., alvares, g. a. & heathers, j. a. j. (2016). guidelines for reporting articles on psychiatry and heart rate variability (graph): recommendations to advance research communication. transl. psychiatry, 6, 803. russell,j. a.(1980). a circumplex model of affect. journal of personality and social psychology,39, 1161–1178. russell, j. a. (1989). affect grid: a single-item scale of pleasure and arousal. journal of personality and social psychology,57 (3), 493-502. russell, j. a. (2003). core affect and the psychological construct of emotion. psychological review, 110(1), 145-172. russell, j. a. (2005). emotion in human consciousness is built on core affect. journal of consciousness studies,12(8-10), 26-42. russell, j. a., barrett, l. f. (1999). core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant. 
journal of personality and social psychology, 76(5), 805-819. scherer, k. r. (2009). the dynamic architecture of emotion: evidence for the component process model. cognition and emotion, 23,1307-1351. stein, p. k., bosner, m. s., kleiger, r. e., & conger, b. m. (1994). heart rate variability: a measure of cardiac autonomic tone. american heart journal, 127,1376–1381. storbeck, j. & maswood, r. (2015). happiness increases verbal and spatial working memory capacity where sadness does not: emotion, working memory and executive control. cognition and emotion, 30, 925–938. sung, b., & yih, j. (2015). does interest broaden or narrow attentional scope? cognition and emotion, 30(8), 1485-1494. tarvainen, m., lipponen, j., niskanen, j.-p., & ranta-aho, p.o. (2018). kubios hrv(ver. 3.1.) user’s guide. kubios oy (limited company) /www.kubios.com/ tomkins, s. s. (1962). affect, imagery, consciousness. vol. 1. the positive affects. new york: springer. vähäsantanen, k., paloniemi, s., hökkä, p., & eteläpelto, a. (2017). agentic perspective on fostering work-related learning. studies in continuing education, 39(3), 251–267. http://www.tandfonline.com/doi/full/10.1080/0158037x.2017.1310097 vähäsantanen, k., hökkä, p., paloniemi, s., herranen, s., & eteläpelto, a. (2017). professional learning and agency in an identity coaching programme. professional development in education, 43(4), 514-536. http://www.tandfonline.com/eprint/fyumgibsbiigezx5mxzp/full vähäsantanen, k., paloniemi, s., hökkä, p., & eteläpelto, a. (2017) . an agency-promoting learning arena for developing shared work practices . in m. goller, & s. paloniemi (eds.), agency at work: an agentic perspective on professional learning and development (pp. 351–371).springer: cham. doi:10.1007/978-3-319-60943-0_18 vähäsantanen, k., räikkönen, e., paloniemi, s. hökkä, p., & eteläpelto, a. (2018). a novel instrument to measure the multidimensional structure of professional agency. 
vocations and learning, https://doi.org/10.1007/s12186-018-9210-6 vall, b., laitila, a., borcsa, m., kykyri, v.l., karvonen, a., kaartinen, j., penttonen, m., & seikkula, j. (2018). stimulated recall interviews: how can the research interview contribute to new therapeutic practices? revista argentina de clínica psicológica, xxvii, 2, 274-293, doi: 10.24205/03276716.2018.1068 winkielman, p., & berridge, k. c. (2004). unconscious emotion. current directions in psychological science, 13, 120-123. winkler, i. (2018). identity work and emotions: a review. international journal of management reviews, 20, 120–133. öhman, a., carlsson, k., lundqvist, d., & ingvar, m. (2007). on the unconscious subcortical origin of human fear. physiology and behavior, 92, 180-185. zembylas, m., & schutz, p. a. (eds.). (2016). methodological advances in research on emotion and education. springer international publishing. zwart, r.c., korthagen, f.a.j, & attema-noordewiever, s., (2015). a strength-based approach to teacher development. professional development in education, 41(3), 579–566.

frontline learning research special issue vol. 9 no. 2 (2021) 145-169 issn 2295-3159

widening participation? (re)searching institutional pathways in higher education for migrant students: the cases of sweden and italy

giulia messina dahlberg1, sylvi vigmo1 & alessio surian2
1university of gothenburg, sweden
2university of padova, italy

article received 17 may 2020 / revised 30 november / accepted 4 december / available online 11 march 2021

abstract
the aim of this study is to shed light on the ways in which transitions and support are framed in policy contexts in relation to widening participation in higher education (he) in sweden and italy. more specifically, this study investigates the ways in which the discourse about the inclusion of migrant students in he is framed in relation to the kinds of support for this group offered in two higher educational institutions, in sweden and italy.
furthermore, the study sheds light on the ways in which policy ideas about transition and widening participation are enmeshed in the students’ narratives and how they affect their experiences of participation, normalization and marginalization in he. the analysis includes two datasets: i) national policy, laws and regulations and webpages of a selection of national universities and university colleges; and ii) ethnographically generated data that builds upon a case-study design and consists of audio recordings of informal discussions and interviews with students. we are, in this study, interested in framing diversity in terms of a move beyond the naturalization of hegemonic stances where labelled “others” (e.g. based on cultural/ethnic background, functionality, socio-economic status) are treated as essentialized or mutually exclusive categories. one of the central, frontline contributions of this study lies in its attempt to scrutinise processes of inclusion and marginalisation with a broad analytical gaze. this allowed us to analyse the mismatch between the range of support provided, and the actual needs and challenges that migrant students meet in their transition to and participation in higher education in two european countries.

keywords: transitions; widening participation; migrant students; higher education; vertical case studies.

info
corresponding author email: giulia.messina.dahlberg@gu.se
doi: https://doi.org/10.14786/flr.v9i2.655

1. introduction

1.1 disentangling diversity: the case of the intersection of higher education with migration

due to the migration waves in the past three decades in the geopolitical spaces of europe, student heterogeneity has gained new dimensions in addition to social and economic mobility. european countries have faced and are facing similar challenges with regard to the inclusion and integration of migrants in all sectors of society, not least in higher education (he).
mobility is here understood as a human condition wherein contemporary migration is no longer framed as a “unidirectional migrant passage” (guo, 2015, p.9) but rather as “the multiple and circular migration across transnational spaces” (p. 7). such a precarious condition, we argue, may afford and/or prevent individuals’ opportunities for full participation and socialization in society in a range of different ways. over the past century, higher educational institutions have started to engage with a compelling agenda for the inclusion and integration of an increasingly “diverse” student population. in the geopolitical space of sweden, issues of openness and inclusion have marked educational policy since world-war ii, not least in regard to higher education. a government proposal “the open university” (prop. 2001/02:15) discusses widening recruitment and participation where fundamental issues include the need for the student population to reflect the makeup of society generally, and a belief that all levels of diversity, including the heterogeneity of ideas, beliefs and approaches, is key for achieving academic excellence. in italy, higher educational institutions’ policies concerning the inclusion and promotion of diversity, are informed by regulations and resources by the ministry of interior and by the ministry of research and university. for instance, in 2020, migrants who were granted international protection by the ministry of interior were offered 100 study grants to access italian universities. furthermore, unhcr produced a specific manifesto for the inclusive university that addresses the inclusion and integration of refugee students and academics (unhcr, 2019), signed by 43 italian universities so far. 
there are, however, several tensions between how the discourse about diversity, integration and widening participation gets framed in policy documents like the ones briefly presented above, and the ways in which practices of inclusion and support play out in the everyday life of students and academics. in the present study, the university is understood as a complex site where political, economic and social interests intersect (bacevic, 2019; tummons & beach, 2019). this involves the idea that the university is playing a major role in the global corporatisation, marketisation and massification of (higher) education on the one hand (giroux, 2010; tight, 2019), and the changing and growing societal expectations on he to support democracy and equity in the 21st century on the other (see also fox, baker, charitonos, jack, & moser-mercer, 2020). furthermore, the fields of intercultural studies and global education have yet to come to terms with diverging understandings of cultural differences according to reference frameworks that prioritize value rubrics, cultural mobility and “effective” cross-cultural communication in contrast with social mobility and critical decolonial perspectives (walsh, 2012). the research presented in this study is an attempt to shed light on and unpack some of these tensions and contradictions. taking the above as points of departure, we are, in this study, interested in framing diversity in terms of a move beyond the naturalization of hegemonic stances where labelled “others” (e.g. based on cultural/ethnic background, functionality, socio-economic status) are treated as essentialized and, even worse, mutually exclusive categories. we argue for a conceptualisation of identity, and therefore also diversity, in terms of an intricate process in which the agency of human beings does not exclusively reside in the single individual, but is also always situated, i.e. 
related to contexts, from specific micro-moments, to macro-structures, of which policies are an important part (see also ecclestone, biesta & hughes, 2010). diversity, from such a line of thinking, is something that gets done in a practice that is entangled with other practices, both at the micro and macro analytical level. finding (new) ways to explore and understand practices from such analytical positions is, we argue, of crucial importance when dealing with the study of the many dimensions of life for marginalized groups in society. in such a line of thinking, practices of being and becoming a student are linked to institutional contexts, social regulations and many other practices in which the social and the material are entangled (gherardi, 2017). such a “relational epistemology” (law, 1994), we argue, highlights the performative dimension of sociomateriality and provides us with the theoretical tools for the study of identity as being in a constant state of becoming, i.e. fluid, ever changing and situated, rather than essentialistic, fixed or static (hancock, 2016). and yet, “fixed” binary categories (e.g. either you are a migrant or not) are sine qua non conditions and play a crucial role in the ways in which support services, special programmes and educational arrangements are planned and operationalised in policy as well as in practice. furthermore, diversity is always entrenched in a fabric of ideologies and hegemonies (see e.g. messina dahlberg & bagga-gupta, 2019). who is “diverse” is often framed as in need of some sort of institutionalised support in policy (bagga-gupta, messina dahlberg & winther, 2016; bagga-gupta, messina dahlberg & vigmo, 2020), and this is thought of and done differently in different communities. 
the aim of this study is to shed light on the ways in which policy ideas about widening participation, diversity and inclusion in he are related to practices in the study of students’ institutional pathways in two european countries by answering the following questions:

what are the ways in which the discourse about the inclusion of students in he is framed in relation to the kinds of support for migrant or refugee students offered in two higher educational institutions, in sweden and italy?

in what ways are policy ideas about transition and widening participation enmeshed in the students’ narratives and how do they affect their experiences of participation, normalization and marginalization in he?

given the aim and the research questions outlined above, we use a comparative approach to illustrate the ways in which widening participation, diversity and mobility are context-sensitive, local phenomena that also respond to a global discourse of growing societal expectations on he to include a diverse student population. the analysis is based on two selected european countries, sweden and italy, located at opposite ends of the european map. italy’s geographical position is strategically central for migration waves to europe that originate from the global south. swedish immigration has historically been characterised by labour migration and an integration policy largely based on notions of equality and diversity (in terms of multiculturality and integration, rather than assimilation, see also kupský, 2017). thus, the two countries’ ideological and political agendas build upon different cultural and historical frames of reference that, in turn, affect the ways in which inclusion and integration are framed in policy and implemented in practice. such perspectives, we argue, deserve further scrutiny from a comparative and multi-scale perspective.
the rather ambitious endeavour to investigate policies and their mutual relations to practice in two different nation-states across time and space has been undertaken through the study of two datasets. these have allowed us to partially connect different levels of analysis from micro to macro levels and to provide further insights on the ways in which students’ lived experiences are mutually entangled with current national and supranational policies about transitions and widening participation in he.

1.2. universities as boundary spaces

there is an extensive body of research concerning the intersection of higher education and migration, where the focus lies on migrant students’ lived experience. students are often framed in the literature in terms of being in a “betwixt space”, a space between home and he, which risks leading to a sense of not belonging anywhere (dunwoodie, kaukko, wilkinson, reimer & webb, 2020; hope, 2017) or as undergoing a “transitional experience” in the physical context of the university (macfarlane, 2016). similarly, the work by gennep and maskalaki (middleton, 2018) focuses on “liminality” as the observation and analysis of factors clustering around the “rites of passage” that mark an individual’s transition from one status to another; and on “emplacement”, i.e. a sense of common identity and place in relation to boundary spaces (such as institutional educational spaces). political and policy initiatives that aim to widen participation in he have been investigated in relation to how potential students experience and navigate ways into academic studies, and what barriers can negatively affect pathways into he (farenga, 2018; lopez gavira & moriña, 2015; macfarlane, 2016). furthermore, challenges for first-generation as well as low-income students are linked to conditions for guidance and support during the attempt by higher educational institutions to reach these student groups (brown, wohn & ellison, 2016).
however, access to information needs contextualization and “translation” from a more knowledgeable person to situate and link that information to students’ everyday practice and social network practices, which thus become important access points for the student (brown, wohn & ellison, 2016). a qualitative interview study in germany (schneider, 2018) focuses on syrian asylum seekers and refugees (asrs) applying to german higher education. the general entry requirements were linked to constraining financial costs given in the asylum seekers’ and refugees’ accounts of their socioeconomic situation, and were closely connected to an identity marker of being a refugee. schneider (2018) draws the conclusion that the formal university admission requirements would “morph into practically insurmountable barriers” (p. 468). other barriers were the language requirements, and the complete lack of recognition of previously achieved skills and qualifications, unless academic documents were translated into german for verification. further insights provided through research reinforce the necessity to widen the perspective to integrate the multifaceted social practices present in students’ diverse life trajectories (hope, 2017; kyndt, donche, trigwell & lindblom-ylänne, 2017; trigwell, 2017). taylor and harris-evans (2018) highlight that the “entangled” and “irregular” transition processes must be revisited and reconceptualized to involve students’ lived realities. similarly, gale and parker (2014) point to a lack of research on the “transition as becoming” metaphor, and argue that students’ lives and their realities should become involved. this also means ensuring that other actors, who influence the transition processes, are included to address the complexities found in the non-linear processes of transition across time and space (kyndt et al., 2017).
revisiting mainstream norms, and how these impact conditions for transition processes, calls for more diverse approaches and flexibility concerning the investigation of how students receive support (taylor & harris-evans, 2018), and what it means for “becoming, being and achieving” as a student in he, as hope (2017) succinctly puts it. adopting a linear perspective for how to adapt and conform to being a student in he departs from the notion that the student her/himself is involved in the attempt to fit the university student norm (taylor & harris-evans, 2018). to address these challenges, a more fine-grained theoretical as well as holistic approach can contribute to balance perspectives and support the development of transition practices (kyndt et al., 2017). to conclude, the research presented in this section highlights a central challenge for migrant students, i.e. the realization and recognition of the value of their competences and abilities which are now not incorporated in the set of skills that seem to be required to participate in and complete mainstream higher educational programmes. there is, however, a glaring lack of research that focuses on how such a deficit perspective varies and impacts on students’ educational pathways in terms of transitions and participation in different disciplines and professional/vocational programmes (mangan & winter, 2017; harvey & mallman, 2019). comparative studies that focus on educational landscapes in countries in europe that have received large numbers of migrants over the past decades are also scarce. furthermore, a general concern raised in the literature is the epistemological and methodological issue of integrating different aspects that have an impact on students’ educational pathways and transitions to he across analytical levels (gravett, kinchin, & winstone, 2020).
this article aims to help fill this gap through the analysis of two datasets, where the intersections between policy and practice are scrutinized in a comparative study between sweden and italy. in the remainder of the paper, section 2 discusses the vertical case study as the methodological approach. the results are presented in section 3. the study ends with a discussion of the results and their implications for the inclusion and integration of a diverse student population in higher education.

2. method: vertical case study

the overarching aim of this study is to investigate the ways in which policy ideas are recontextualized in local sites of practice. to address this aim and the research questions we use a multi-sited (marcus, 1995), comparative approach to ethnographic studies of educational practices that is no longer bound to a specific “group” or a community but, in line with our take on diversity and identity outlined above, “reframes the basic unit of cultural analysis as processual and iterative” (eisenhart, 2017, p.142), thus expanding the contextualization of ethnography to focus analytically on how policies, practices and institutionalized arrangements in one site move and are entangled with other contexts and timescales. bartlett and vavrus (2014) called such an approach “vertical case study”, but they emphasise that the term “vertical” is a remnant of how they initially conceptualised the approach, which now incorporates, besides the vertical, also horizontal and transversal elements of comparison.

figure 1: illustration of the vertical case study approach (adapted from bartlett & vavrus, 2014).

given the aim of this study, such a comparative approach deals, on the horizontal axis (figure 1), with a variety of contexts that includes case studies with students from the spaces of sweden and italy at selected universities.
university a (unia) and university c (unic) are located in sweden, while university b (unib) is located in italy. the transversal axis represents policy implementation across time and that, we argue, represents the dimension that may assist us in the endeavour to come away from research methods that, more or less exclusively, focus upon a particular singular group or population, to reframe interpretive logics and representational techniques as travelling “across time, space and level, rather than as the characteristics or lifeways of bounded groups” (eisenhart, 2017, p.135). the data for the vertical case study outlined here was created through two complementary methods: analysis of policy of support (dataset 1) and case studies of three emblematic students from a broader dataset of 13 participants (dataset 2). the first method was employed to analyse policy content and its connection across levels and sites. the second method was used to analyse the students’ narratives of their lived experiences of transitions to he. these methods used together allowed us to deal with micro and macro levels of analysis and to shed light on the relations across levels but also on their potential to engage with multiple spatial fields or sites, represented by the blue, waved line in figure 1. thus, the multi-sited ethnographies reported in this study offer means of examining the ways in which the participants in the case studies make sense of their transitions towards becoming university students when the conditions to achieve such transformations are constituted and constrained by their connection with policy, educational activities, support services, and technologies. the policy of support (dataset 1) corresponds to the vertical axis of comparison (macro-micro). 
the analysis of dataset 2 connects the horizontal axes (policy implementation across sites) in terms of the analysis of students’ narratives of transitions and participation in higher education at their local practices (micro-level). the nature and process of data creation of the two datasets are detailed further in the next two sub-sections.

2.1 dataset 1

the nature of dataset 1, created using an ethnographic approach by means of connection across websites and documents, lends itself to envisaging the analytical scaling model from macro to micro in terms of entanglements, rather than levels or layers. this logic also gives further fuel to the debate about the notion of analytical levels and on the kinds of (dis)advantages that it may bring, in relation to an alternative perspective according to which there are always macro aspects in the micro and vice versa (see e.g. faubion & marcus, 2009; tsing, 2005). dataset 1 includes a selection of the provision of support services in he as it is framed and described in webpages and policy documents in a selection of national universities in sweden (unia) and italy (unib).

figure 2. vertical entanglement of policy. overview of the principal nodes and connections.

figure 2 illustrates the entanglements from the first searches using a selection of keywords in the different webpages (e.g. the “entry points” box in light grey). the illustration develops in two opposite vertical directions, starting from the entry points box in the middle. on the top half, the path in the italian data is represented, while the bottom half shows the principal nodes in the swedish data. figure 2 could be envisaged as an hourglass, of which the entry points box constitutes the narrowest part. the metaphor of the hourglass also allows us to illustrate the vertical movement “across scales” as well as search words that have been a central aspect in a distillation process in terms of “an operation of extraction” (marres & weltevrede, 2013, p.
2019) that, we argue, operationalises the process of creating representative and relevant data. figure 2 represents different aspects of this process. firstly, it shows the paths undertaken, from the first search words, to the documents found in the searches. secondly, it is an illustration of the tensions that exist in the attempt to present different parts of the data according to a logic of scales, from micro to macro, or local to global, and vice versa. figure 2 also visualises the results of the searches that will be presented in further detail in section 3.1.

2.2 dataset 2

in dataset 2, we specifically draw upon data from two projects, led by the two co-authors in this study. both projects focus upon students’ transitions to he and consist of ethnographically generated data that build upon a case-study design (middleton, 2018; yin, 2018). the data includes audio recordings of informal discussions and interviews with 13 participants as well as document data in settings in the nation-states of sweden and italy. the data focused upon in this dataset consists, in particular, of three semi-structured interviews carried out with three participants, two in sweden and one in italy. the participants represent the breadth among the student group focused upon in this study (migrants and refugees) and two of them are students at the universities that were selected for the creation of dataset 1. dataset 2 corresponds to the horizontal axis of comparison in the vertical case study approach, i.e. the comparison of policy implementation at the local level, across sites. the participants’ backgrounds range from recent asylum seekers, to students migrating in early childhood and those migrating as teenagers. the set of procedures that guided the analysis of dataset 2 was inductively oriented, and aimed at identifying “patterns across the dataset” (braun & clarke, 2006; braun, clarke & weate, 2016).
firstly, we familiarized ourselves with the data by revisiting the transcriptions in several collaborative steps, as a reflective process, while taking care to remain open to alternative patterns and in search of overlapping themes in the recorded interviews. this was followed by further refinements by critically approaching our initially suggested themes, and as a result of continued analysis (braun & clarke, 2006; braun, clarke & weate, 2016), we reached the results presented in section 3. the portraits based on the demographic parameters of the cases used in this study are spelt out below.

m, born in turkey, is now in her mid-20s. m came to sweden as a toddler. though she has gone through the swedish educational system, her parents were working and were not aware of the expectation that parents engage in supporting and encouraging children’s learning also from home. the parents’ level of education is very low, and according to herself, she comes from a so-called socio-economically exposed area. her education path before entering university was marked by challenges, starting with the fact that she learned to read very late. she has been awarded a bachelor degree in pedagogy, which was the programme that opened the door to university studies, but not her first choice. after m gained her bachelor degree, she did not apply for a job, and, now more confident as a university student, she decided to continue her studies. at present she is finalizing her second bachelor in the field of organization and staff development, focusing on human resources, with a strong foundation in sociology and human work science, at unic, sweden.

f, who is now in her late 30s, was born in the kurdish part of iran. she came to sweden at thirteen, after being smuggled across the border between iran and turkey, and was sent to turkish prison for a period, as she was deemed to be an adult. f was brought up by her grandparents, while her father had immigrated to sweden for political reasons.
due to her grandparents preventing her from attending the village school, f only had sporadic periods at school. f studied on her own at home, and took tests up to year six without her grandparents’ knowledge. in sweden, f was placed in a preparation class for a short time, and then placed in a class according to her age and not based on skills. as f did not understand what teachers were saying, the years in secondary school were lost. after attending adult classes (upper secondary level) to get grades required for university studies, f applied for freestanding courses that were later compiled into a bachelor degree in psychology, at unia. after her bachelor exam, f worked at the swedish migration agency for a couple of years, handling new migrants’ and immigrants’ applications, in particular due to f’s familiarity with kurdish and persian as her two first languages. at present she is finalising her master programme in public administration, management and guidance, at unia, sweden.

a is a 34-year-old syrian man with refugee status, who at the time was enrolled in the second year of a bachelor degree in political science at unib, italy. he had arrived in italy after deserting the syrian army and a transition period in lebanon, where he taught himself english and found a way to relate to the wider world and to earn some money by teaching arabic to english-speaking people. in syria, after secondary school, he had been forced to study computer science although it was not what he wanted to do. his secondary school marks did not allow him to study political science (his first choice) at university. computer science was the only possible choice. he is a gifted storyteller, interested in writing with a focus on reporting about syria. he supports himself by translating, teaching arabic, and when necessary by working as a waiter.
2.3 analysis across axes of comparison

a vertical case study approach presents a set of challenges in terms of producing (re)presentation of the results of the analysis that include important details that, put together, make a coherent whole. another challenge has been to re-think the “global/local antinomy” (bartlett & vavrus, 2014, p.134) in order to become sensitive in our analysis about the ways in which policy gets its materiality not only through inscriptions, but also through encounters and practices. finally, even our focus on a particular group and selected nation states has had methodological implications, in terms of moving away from clearly bounded research sites on the one hand, and using the same bounded notions of nation-states or specific named-groups, on the other. in order to attend to the complexity of the different dimensions in a vertical case study, not least in the use of two datasets, we used several techniques. we engaged with the analyses of the datasets recursively, in that they shed light on one another as the themes and interesting patterns were identified in the data. for instance, the analyses of selected chunks in the data were discussed in data sessions in relation to the ways in which they contributed to shed light on the different dimensions of a widening participation agenda in both countries, i.e. how they differed or overlapped. thus, another important gap that this study aims at filling by this frontline mixed-method approach is the one between the “official and local enactment [of policy] and the proliferation of unintended consequences” (fenwick, richard & sawchuk, 2011, p.115). in addition, a key element of such a generative and collaborative process at a vertical level, across micro and macro, and at a horizontal level, across sites, was to inductively identify key overarching meaning patterns that were further screened through thematic analysis (braun & clarke, 2006, 2019; braun, clarke, & weate, 2016).
the results of the analyses are presented in section 3, in two separate sub-sections, starting with the vertical axis of comparison and a focus on dataset 1 in section 3.1, and moving to the horizontal axis and the analysis of policy implementation across sites in the narratives of our selected case studies in dataset 2, in section 3.2. the final section, 4, presents an overarching discussion and some implications that go beyond the nation-state focus of sweden and italy.

3. results

3.1 tracing vertical policy entanglements (macro-micro): unia and unib

in this section, we present the results from the analysis of dataset 1, i.e. provision of support services for migrants and refugees. focus lies on the vertical axis of comparison (macro-micro) starting from the “entry points” (see figure 1) in the search conducted in each university. we start by presenting the results from the entry points, first in unia and then in unib, across the different documents to trace their entanglements across scales and sites on the vertical and horizontal axes of comparison (see figures 1 & 2).

3.1.1 unia

the analysis of unia webpages shows that issues of access and inclusion overlap across different groups where the categories of “migrants”, “refugees”, “newly arrived” and “asylum seekers” blend together and are not mutually exclusive in the swedish data. figure 3 is an illustration of one of the results from the entry points in the unia searches.

figure 3. university a. capture of the full-size screenshot of the website (left). zooming in the central area of the page (right). translation from swedish in the text box.

the page that contains information about a mentoring programme for newly arrived migrants (see figure 3) is addressed to university students that have the opportunity, by participating in this programme, to support and guide the newly arrived students in their transition to he.
the students are matched with mentors who can avail of an introduction course during which they learn about leadership, communication and intercultural competence. the text also aims at making this event (that is offered regularly every term at unia) and the role as mentor into something that many students aspire to participate in. mutual learning is raised as part of the process for both mentors and “adepts”, but “på ett roligt och nyttigt sätt” (sw:1 “in a fun and useful way”). another webpage at unia provides information about initiatives for newly arrived and integration that imply a number of educational paths especially tailored for “utländska” (sw: foreigners) and “nyanlända” (sw: newly arrived). the common denominator of the ways in which such paths are presented in the documents is the use of formulations that put the spotlight on the orientation of such programmes towards specific groups and with a specific aim. for instance, the national programme for school and pre-school teachers especially targeting migrants and newly arrived are called “snabbspår” (sw: fast track). similar programmes for other professions imply that the applicant has a competence and/or a formal education from the country of origin that can be validated as the corresponding qualifications required to enter the programme according to mainstream swedish requirements. this process is called “validering av reell kompetens” (sw: validation of actual competence) and is currently a much-debated issue in the higher educational landscape in sweden because it puts the spotlight on the mainly pragmatic and instrumental view of competence, especially when it is related to a particular professional occupation. this is discussed in depth in the swedish report by the swedish council for higher education (uhr): “kan excellens uppnås i homogena studentgrupper”? (sw: can excellence be achieved in homogeneous student groups?) from 2016 (uhr, rapport 2016). 
in the uhr document (2016), universities are reported to have raised issues concerning the need for the establishment of a network with other authorities to allow “nyanlända” to rapidly enter he. thus, the issue of “speed” when it comes to migrant students transitioning in and out of he is prominent in the swedish data. one year after the publication of the uhr report (2016), the issue of widening participation (rather than only recruitment) was also included as part of national policy (promemoria u2017/03082/uh), as a necessity for successful transitions across educational pathways. this implies a use of resources that take cognizance of the different functions in he, from study counselling and the routines for evaluation of “actual competence” (see above) to the kinds of pedagogical efforts required in this endeavour. the 2017 promemoria caused a lively debate among university faculty in sweden at the end of the same year and resulted in a cancellation of the proposition to change the formulation in the swedish higher education act from widening recruitment to widening participation.

3.1.2 unib

in the data from unib (italy) the page about inclusion for refugees can be accessed through a search in the university website using the term “refugees” (see figure 2). the page about inclusion and reception of refugees is rather representative of unib webpages: it provides a short framing text with a section called “resources and opportunities” that includes a series of internal links to other pages (see figure 4).

figure 4. university b. capture of the full-size screenshot of the website (left). translation from swedish in the text boxes (right).

these links lead to pages with information about, for instance, “scholars at risk”, student scholarships, and the manifesto for the inclusive university2 (see also figure 3).
when entering the page about the manifesto, words like “accoglienza”, “diritti”, “parità” and “inclusione” (it:3 welcoming, rights, equity and inclusion) are used to frame an agenda of inclusion, openness and widening participation in which “scuola e università offrono un'importante opportunità per i giovani rifugiati, rappresentando un passaggio fondamentale nel loro percorso di inclusione sociale”4. according to this logic, the university (along with school) becomes the fundamental site where a “rite of passage” is made possible, from the status of “refugee” to that of “university student”. on this page, a link embedded in the body of the text leads the reader to a pdf document in which the official manifesto is presented. the text is a programmatic document and presents general principles and outlines that the participating universities commit to following. in this document too, which covers students, researchers and teachers with refugee status, the terminology of hospitality, inclusion and integration dominates the text. diversity is framed as the appreciation of cultural difference and as a source of academic enrichment for the university. the manifesto ends with a quotation from canto xvii of dante’s paradiso (heaven) in the divine comedy. the chosen verses allude to dante’s exile and the challenges of being forced to leave one’s home.

tu lascerai ogne cosa diletta
più caramente; e questo è quello strale
che l'arco de lo essilio pria saetta.
tu proverai sì come sa di sale
lo pane altrui, e come è duro calle
lo scendere e 'l salir per l'altrui scale.5
(dante, paradiso, canto xvii)

the manifesto contains a number of external links to other unhcr documents and to the inhere initiative (he supporting refugees in europe) (www.inhereproject.eu), which includes several longer documents, e.g. the guidelines for university staff members 6 and the good practice catalogue 7.
the inhere recommendations represent an international document that aims at providing general guidelines to the 29 participating universities in europe. the “good practice catalogue” focuses on a number of issues and challenges for individuals with refugee status who aim to participate in academic life as students, teachers or researchers. such issues include, for instance, support for recognition of previous degrees, “equitable and wide” access, financial support, language and bridging courses, integration, employment and employability, online learning, humanitarian work and collaboration among institutions and across sectors. the section about access to he links to article 27 of an eu directive (2011/95/eu) on standards for the qualification of third-country nationals or stateless persons as beneficiaries of international protection, according to which all eu member states shall grant access under the same conditions as for third-country nationals who are legally resident. the section then explores the issue of access and its boundaries further, more specifically:

granting equitable and wide access to higher education involves more than providing tuition-free degrees, and a multitude of diverse policy tools and institutional measures exist for non-traditional or disadvantaged learners. in addition to scholarships and financial support, measures may target refugees via outreach activities, and also beyond recruitment to provide general information to the potential refugee students about the higher education system and its opportunities, consulting them through mentoring programmes and helping them navigate through the application procedures. (https://www.inhereproject.eu/wp-content/uploads/2017/08/inhere-gpc_en.pdf.pdf p.6)

support to promote and grant access goes, according to this logic, beyond policy and financial support.
also, in the extract above, the refugee status is framed as belonging to another dimension that should be kept separate from the “ordinary” support for the inclusion of “non-traditional” or “disadvantaged” learners. who these latter groups include more specifically is not clear in the document, but what makes the refugee status out of the ordinary is students’ need to be informed, guided and mentored to find their way along the intricate path of higher education, not least when it comes to the application procedures. issues of inclusion and support are seemingly “differently special” for refugees as compared to other groups at risk of being marginalized. such specificity for the status of a person who has been granted asylum, who was forced, against her/his will, to suddenly leave all that she/he cares for “to taste the bitterness of others’ bread”, as the quotation from dante’s canto succinctly illustrates, is a rather prominent feature in the unib data and in the further documents at national and international level that have been accessed in the search process. this trend is also reinforced by the terminology used in the documents, irrespective of the language variety (italian or english), which draws on semantic traits that allude to hospitality, generosity and openness as gifts from the virtuous and prosperous donor to a person facing integration difficulties at different levels. the refugee is a person whose cultural and educational background should be welcomed, respected and elevated as a source of stimulating exchange and mutual learning.

to conclude, based on our comparison, the results of the analysis of dataset 1 shed light on the different ways in which migrant students are framed as in need of support, mentoring, welcome, or other special arrangements specifically tailored for students who differ from the mainstream, i.e. from individuals who are purported to belong to a homogeneous, monolingual norm related to a nation-state.
the virtuous character of the host country in unib (italy) in terms of hospitality and generosity is less visible in the swedish data, and is instead replaced by a logic of efficiency and speed in the process, from reception to integration. within the local higher education institution, as well as across universities and nation-states, the results show that different services and targeted programmes (addressing students’ mobility on the one hand, and students with a migrant background on the other) run the risk of creating parallel welcome and support approaches concerning student conditions that share the same “cultural diversity” core issue. the results also contribute to the discussion on the relevance of conceptualizing the study of diversity and of transitions to he in scalar terms (global/local) (see also enders, 2004). the emergence of global or supra-national policies like arqus and inhere relates to a discourse of globalization and internationalization of he (along with the idea of the migrant student as in need of special support), and offers insights into the patterns created by national governmental policies in the everyday lives of students. it is to shed light on this latter dimension that we move to the next result section of this study.

3.2 tracing students’ patterns of transitions to higher education (policy implementation across sites)

in this sub-section we present the results from the analysis of dataset 2, i.e. the interviews and discussions together with three students, two in sweden and one in italy. this dataset sheds light on the second question posed in this study, i.e. the investigation of the ways in which policy ideas about transition and widening participation are enmeshed in the students’ narratives and how they affect the students’ experiences of participation, normalization and marginalization in he.
3.2.1 “breaking the bubble”: transitions as conquering seemingly unreachable spaces

the transitional paths to higher education involved leaving familiar, local spaces and stepping out of the comfort zone, to encounter and develop an understanding of the mainstream norms that underpin studies at university level. to conquer these unknown spaces, a shift from the local and micro perspective to a new context of individual growth, i.e. learning what this implied in practice, became a prerequisite for transitions. it is evident how previous patterns of participation in education affect and become entangled with how experiences of transitioning to higher education evolve over time, as illustrated in the narratives. when m accounted for what it is like being a student at the university, she referred to her experiences when she first entered the bachelor programme in pedagogy at unia. at the time of applying, she was unsure about what to study, and just accepted the programme that she had put first on her ranking list. she described herself as coming from a socio-economically exposed area, and she attended secondary and upper secondary level in the same suburb, with the same people. the transition to university was a challenge in many ways. her account illustrated the social and contextual boundaries, and how previous life trajectories impact transition to university. m and her friends decided to apply elsewhere to break the bubble. m mentioned her transition to university studies as something very big in her world, and that now she was a grown-up. this change also implied thinking about what she was saying.

now it’s the university world, now think differently, behave maturely, maybe not say everything you are thinking. it was like entering a new identity, a new role. now you need to present a façade that you don’t really want to from within.

here it becomes relevant to frame m’s narrative in terms of “transition as becoming” (gale & parker, 2014, see also biesta, 2016), i.e.
as a transforming process of identity requiring the learning of “the rites of passage”, to adapt to and participate according to norms for participation as a student. these changes also impacted on m’s ways of being with others, and, in order to challenge herself, she made other decisions, to learn and develop as a person. in the case of m, the “breaking the bubble” experience has given her self-confidence. it is evident in her account that such a transition has meant, for her, learning how to be as well as behave in accordance with the norms set for university studies, norms that have a high currency in terms of societal expectations and the ways in which she could handle different situations in her everyday life. at the beginning of her university studies, she sought out people like her, to stay in a comfort zone with people sharing similar life trajectories. university studies entailed placing herself in uncomfortable situations, and everything felt strange.

i felt as if i had come to an empty, it was like creating, finding a new empty sheet that i had to fill in, and start writing and shape this, be courageous and dare to be brave when being in uncomfortable situations that you will meet.

the notion of the “empty sheet” as a metaphor for emptiness that needed to be “filled” by education is similar to the ways in which f described her own stance to transitioning, and being responsible for creating a space for her student identity.

i have learned what academic is expected to look like, you do not have to feel inferior in certain contexts, like i did before i started my studies at the university, when i was hired as a receptionist, where i noticed that people were thinking about me, you have no education, i was nothing to them.
the metaphor of “breaking the bubble” into a world that is partly concealed and surrounded by an aura of enchanting mystery is relevant to understanding the kinds of shortcomings, mismatches and, at times, absurd situations that our cases faced when dealing with transition to he. here, gatekeeping is a key aspect of this process as it determines, for the students, whether they are able to see opportunities as well as who, what, where and when these can become accessible realities.

3.2.2 gatekeeping on possibilities, expectations, and ambitions to overcome “insurmountable barriers”

the processes of transitioning from previous education to enter an individual pathway through the university system, and with expectations of enrolment, were enmeshed with several unforeseeable challenges at a macro policy level. with little or no insight into university policy regulations, and with gatekeeping enabled by existing structures, overcoming barriers and crossing boundaries depended on encountering key persons who could support becoming a legitimate participant in higher education. starting with low grades from the upper secondary level, f met with difficulties when applying to the programme she was interested in, a bachelor programme in psychology to which she was not admitted. somebody informed her about the possibility of applying to freestanding courses that could be compiled into an exam later. f referred to this as her opportunity, an opportunity that also added extra stress during the whole study period, as she was not guaranteed a place in these courses. each term, each course was framed by this uncertainty.

it would have been easier for me if i would have been admitted to the programme from the beginning. that would have reduced my stress during all terms, on my own trying to find out what courses i could apply to and keep my fingers crossed hoping to be admitted. this was the case for each course.
it’s not that easy to be admitted to a programme when you don’t have a pass with distinction on everything.

below, f referred back to her experiences working at the swedish migration agency, to make sense of her own challenges in transitioning to higher education. f recalled the many accounts given by refugees and immigrants entering sweden and going through the required processes. f pointed to swedish bureaucracy as seriously hampering conditions and as time-consuming.

the swedes don’t like when they hear an immigrant say i want to study, i want to study, i want to have a good job and things like that. i feel as if they become envious at once, you have come here and you are not expected to, you are to work with some sh-t job, you are not expected to have a better job than us, or something in that direction.

f had experienced this kind of attitude herself and had also heard about it from immigrants she had met.

i have experienced this kind of feeling everywhere, not only linked to myself, when an immigrant has said i want to do this, and this, many swedes often smile scornfully. maybe that is also one reason why you don’t get help, it is not, people come from other countries, do not think about educating yourself, it is all about finding a job. that is what they want, not educate them.

policies of inclusion and integration were here instantiated in f’s accounts of the ways in which she and other “immigrants” were faced with the normalized expectation that quick access to paid labour, rather than individual growth, was the primary purpose of any educational path. while this may be the aim and focus for other student groups when pondering the possibility of starting an academic programme of study, in the case of marginalized groups the choice of the academic path seemed to obey a norm where practical gains, rather than preferences, dominated.
a similar issue, but with a different outcome, was present in the data from italy and in the case of a, whose tertiary education records from syria show that he attended a computer science programme. such records are misleading and far less telling and relevant when compared with the informal education and professional competences that he recently acquired. computer science was not what a wanted to study at the university. according to the syrian formal education system, a’s secondary school marks did not allow him to enrol in the political science degree at the university. therefore, in order to pursue his studies further, computer science was the only possible choice. political science and media/journalism were a’s passion and main drive in relation to his studies. while escaping syria and living in lebanon, a taught himself english from scratch, mainly by watching youtube videos. a developed his competences as a teacher by using english to offer face-to-face as well as online arabic lessons. at the same time, he started to develop his skills as a journalist and translator. thus, a’s competences developed through informal and nonformal learning, which do not appear in his formal records, included fluency in a range of languages as well as experience as a teacher and journalist. these were not recognized in the formal process of enrolment in a higher education programme, although they may be relevant to a’s prospective study path and future career profile. however, although very long and difficult, the “processo di riconoscimento dei crediti formativi” (it: educational credits recognition process) eventually resulted in a’s enrolment in the political science programme in unib. transitioning from a life pre-he to in-he implies several challenges and constraints, which include existential and ontological dimensions as well as pragmatic ones with immediate consequences for one’s quality of life.
for a this concerned practical/logistical aspects of living and studying in unib as well as aspects of academic and content organisation. the latter implied an understanding of the rationale behind the disciplinary composition of the political science bachelor degree course. unib offered a the option of being tutored by another student during this phase. according to a, this peer support proved to be helpful.

when i applied, the tutor (from morocco) helped me a lot, we became friends. the fact that he knows my language and he knows the rules is very useful. it helped me to understand what the study plan is. it is even more useful when he studies the same course.

for instance, for a it was hard to see the relevance of the enhanced role of the history area, especially history beyond contemporary and modern events. as a student with a syrian schooling background this was an unexpected feature of the course. it was relevant to be able to talk about it and to be introduced to the rationale behind this curriculum choice by somebody who could speak “his language” and would understand a’s specific schooling background. the relationship between students’ multilingual competence and transitioning towards legitimate patterns of participation in he was a recurrent theme in the accounts.

3.2.3 overlooking multilingualism as a bridge for transitioning to he

in spite of the european commission’s multilingualism policy8, now extended beyond the european languages and spatial borders to include multilingualism mirroring a society increasingly impacted by migration, there was little evidence in the narratives of local policies or practices for language support. on the contrary, the participants’ lived experience illustrated a lack of understanding at university level of the present systemic barriers and how these negatively affected the conditions for studying.
f, who said she lost several years of schooling, made frequent references to how language use in university studies was challenging. her references to lacking the linguistic capital necessary to succeed were accompanied by references to the lack of support to develop these skills, a constant feature of her lived reality as a student. to address this barrier, f developed strategies for coping with the advanced scientific language in the course literature by adopting online resources.

often and most of us who come from these countries are not that good in english, we don’t have it with us. so, when you finally learn swedish, you have problems with english as well. what has helped me was that now there is google translate, so today there are more things around. i don’t think i would have managed if it was like when i came to sweden, when these resources didn’t exist. i have hardly studied any english.

when f compared herself with fellow students who had attended the swedish school system, she reasoned about writing, the higher demands she had to struggle with, and the strategies she adopted. f has been forced to put at least double the effort into her writing, and she constantly has to remind herself about the different characters of assignments, and what is required from the different writing genres in the various assignment formats. the concrete strategy for handling the reading and understanding of texts in english became copy-and-paste into google translate. f is aware of its shortcomings but she said that without this resource, she would not have succeeded. f’s experiences highlight a consistent barrier concerning migrants’ linguistic competence, which they were unable to capitalize upon due to “insufficient pedagogical and relational approaches” (harvey & mallman, 2019).
furthermore, in their study, harvey and mallman (2019) highlight that the linguistic capital of new migrants is the “most difficult for [them] to realize the potential of, due to insufficient pedagogical and relational approaches within the institution” (p.10). due to the different epistemological and ontological dimensions that are at the core of different disciplines and their traditions, our analysis shows that linguistic competence (along with other, so-called general competences) is related to the voice and tradition (in terms of expectations) of a discipline. in other words, different educational programmes, and the variety of course subjects and tasks therein, put different expectations on students’ capability to use an academic language, e.g. crafting an argument and applying specific, technical terminology and abstract concepts. while such expectations may seem to lie beyond other so-called general competences, such as transcultural sensitivity or critical and analytical thinking, proficiency in language production is central to providing students with the “right” tools for legitimate participation in terms of “learning to talk the talk”, as seedhouse succinctly puts it (seedhouse, 2008; see also lave & wenger, 1991). in the field of language studies, neologisms like superdiversity and translanguaging (vertovec, 2017; garcia, 2009) have been introduced to acknowledge the totality of the linguistic repertoire of individuals, especially when dealing with southern multilingualism, i.e. the kind of linguistic competences that are usually not recognized in institutions in the global north. however, translanguaging has lately become a synonym for a pedagogical approach, more than an analytical construct (pavlenko, 2018; bagga-gupta & messina dahlberg, 2018).
this is in line with the invitation formulated by olson (2003) to formal education institutions to re-consider the ways they mediate “between the formal institutions of society – law, government, economy, science – and the interests, beliefs, and intentions of persons” (p.285). in unib (italy), a suggested that being able to study and take the exam in english would help migrant students to better understand the textbooks and to stay on track with the exam schedule.

luckily some professors have accepted to allow students to have their exam in english. examples include professors teaching english, history of international relations, history of political doctrines, culture and religion, sociology. but most of the teaching materials are in italian. it would be important to have handbooks in english as well.

the issue of the language of study is closely related to the issue of the pace and the grading of the study career.

in another case, a professor did not accept to have the exam in english although he would accept to do an oral exam. this is a step forward because even if i am able to study in italian, i am very slow when i have to write in italian so i am wasting time even if i know the answer (and therefore getting lower marks).

holding an oral exam would be better than a written exam, according to a. part of the rationale for opting for an oral rather than a written exam lies in the fact that writing the answers to the exam’s questions in italian could be a slower process, even when the content of the answer is known by the student. being “on track” and following the programme of study in terms of, for instance, earning the right number of ects each academic semester is a constantly present dimension in the life of all students, but especially so when students have been granted support on account of their socio-economic status as newly arrived migrants.

3.2.4 what and who is framing the support? academic support contradictions

while reasoning about ways into higher education and the support therein, m (unic, sweden) argued that more attention should be given to the individual and her or his context, to find available options or alternative ways forward.

i value higher education a lot, but it doesn’t mean that it shapes your identity. it should not feel impossible to study, to me it was all about finding my way into studies. it is so multidimensional, it is not possible just to connect with having low grades, and this person cannot continue studies. grades are not enough, you need a personal meeting.

m’s account pointed to potential students’ ways of navigating their way to university studies (see e.g. farenga, 2018; lopez gavira & moriña, 2015; macfarlane, 2016). it also pointed to the challenges at a formal, policy level when the structures and regulations for access and application presented in the analysis of dataset 1 do not take into account the life trajectories and lived realities of students who diverge from the norm those structures express. one example of such a mismatch was provided by a (unib, italy), who recalled that during his first academic year he was entitled to have his meals for free in the university canteens. to do so he had to certify his low income. at the end of the academic year, he was asked to pay the money back because during that year he was not able to pass enough exams, i.e. the equivalent of 24 ects. as a result, in his own words:

i am paying the money back through several instalments. i paid for one part of it. every 2-3 months i am paying part of it. i probably still owe 400-600 euros. i respect the rule.

this kind of support was made available when it was actually relevant to the student’s needs, but at a time (the very first year) when he was likely to be unable to comply with the conditions attached to the support that was being offered.
these types of mismatches between the institutional attempt to provide support and the actual conditions of those who might benefit from such support provide evidence of the need for he institutions to acquire a more comprehensive and complex view of the condition of migrant students and to involve them in negotiating and defining more appropriate and viable ways to address students’ equality. in the remainder of the paper, we will further discuss the issues that arose in the analyses of both datasets, as well as the pedagogical implications that may derive from the study of the entanglements of complex phenomena across analytical scales.

4. discussion

the aim of this study was to investigate the intersections between policy and practice in relation to widening participation and transitions to higher education in two european countries that have faced and are still facing challenges related to the integration and inclusion of students whose identity positionings are not in conformity with the mainstream. furthermore, this study is an attempt to take a step further in finding relevant analytical paths as well as methodologies that may shed light on these phenomena by bringing together and juxtaposing macro and micro analytical levels. the focus has been on the textures of practices (gherardi, 2006) in which migrant students are entangled in their everyday lives as participants in he. we have focused upon the analyses of two datasets, one including national and international policy in selected online webpages, and the other consisting of case studies that focus on three students, enrolled in three higher educational institutions in sweden and italy.
a comparison of the ways in which the selected universities in the respective countries deal with the challenges connected to the recent wave of migration related to global crises has been incorporated in a research design in which the focus lies at the intersection of two dimensions, one stretching across nation-states (nationally and locally) and the other across analytical scales (macro and micro). there are a number of reflections that we can make from the analyses carried out in this study. firstly, higher education pedagogy in times of transition must meet the kinds of societal challenges related to (higher) education for the masses (see e.g. tight, 2019), as well as to uncertain career paths and questions of relevance, especially for diverse student cohorts. secondly, what we have framed here under the common term of migrant students is not a homogeneous group in either country. and yet, services of “special” support for students are only available for those who can produce evidence of belonging to named social groups, like refugees and newly arrived migrants, but also, more generally, students whose abilities do not conform with the current formal understanding of what a “normal student” may be. thus, if we accept the theoretical argument that identity and diversity are dimensions of human life that are constantly fluctuating and related to historical and cultural understandings of what different labels are and may mean, what does that leave us with, in terms of the study of a complex assemblage like the university (bacevic, 2019)? the problem of “one higher education for all” automatically implies the creation of parallel systems within the systems, or else alternative solutions for the inclusion of a wide and diverse student population.
we argue that one important pedagogical implication for universities globally is to reflect on their role in the purported “widening participation” agenda in a way that takes into account its political dimensions at the macro level (in terms of policy and infrastructures) as well as the micro level (in terms of what is done in the classroom). this study contributes to the research that attempts to scrutinise processes of inclusion and marginalisation with a broad analytical gaze, which allowed us to analyse, among other things, a “catch-22 situation”, i.e. a mismatch between the range of support, including policies of inclusion from educational institutions, and the actual needs and challenges that the individuals belonging to the group focused upon here meet in their transition to and participation in higher education (in two european countries) (see also dangoisse, de clercq, van meenen, chartier & nils, 2019; dunwoodie et al., 2020). furthermore, sociomaterial analyses like the one conducted in this study question the assumption that policy and standards developed at a national level and implemented at the local level have separate logics or that there exists “an ontological distinction between the scalar level of the local, regional, national and global” (fenwick, edwards & sawchuk, 2011). the resulting image, thus, is far from sharp. real problems, as ingold (2018) succinctly puts it, seldom arise with a self-contained solution already inside them. real problems have no solution. for instance, in our analysis of the italian data, being granted free-meal vouchers during the first year of study, provided that the student completes a certain amount of ects, implies, rather than the solution to a problem of accessibility and equity, the creation of further issues when the student, who was not in a position to complete the credits during the first year, must reimburse the costs of all vouchers.
the problem of access to academic language is present in the analysis of both datasets in terms of a lack of formal support services. as an alternative to such formal support, the students reported using digital technology and internet access to find ways to eventually enter and participate in he in legitimate ways. in the case of f, in sweden, google translate was a crucial component for her to be able to compensate for a lack of support in the development of an academic language where a familiarity with english, as well as swedish, constitutes an important dimension. similarly, in the italian data, a was able to learn english by himself, watching youtube videos. furthermore, the analysis of policy ideas has also shown that the promotion of mobility (including social mobility) in he, as it is framed in the two countries, has important consequences for the inclusion of marginalized groups and their transitions to he. one important ideology that has clearly emerged in the analysis of the swedish and italian data is the notion of “speed”. speed is here understood as the sine qua non condition for transitions for all students. it is especially present in the discourse about the inclusion of marginalized groups and, more specifically, for students with a diverse ethnic background, like newly arrived migrants and refugees. while the rhetoric around speed differs in the analyses of policy in the two countries (being rather prominent and more straightforwardly presented in the swedish data), it permeates the expectation of a widening participation agenda in both countries. speed means getting swiftly enrolled in the “right” higher educational programme, earning university ects and a degree, and eventually obtaining a relevant job position. getting everything “right” seems to be a rather prominent issue in the ways in which our cases have handled their transitions to he, in terms of a “breaking the bubble” experience.
according to this logic, the university becomes an enchanted place that has the potential to open up new possibilities that arise at the students’ horizon. here, connection to key persons (often friends and/or peers at the university) that possess a practical knowledge about the most successful path towards inclusion and swift transitions to he, was of paramount importance for all our cases in both contexts. practices of transitions from a life pre-he to in-he are related to the ways in which competences (both formal and nonformal) are acknowledged in policy documents and this is framed differently in the two countries. while in sweden practices of recognition of informal competences, so-called “reell kompetens”, have been present for a long time, at least at the outset, within the scope to widen the recruitment to higher education, this feature was not present in the italian dataset 1. the notions of “reell kompetens” (sw: informal competence) and “crediti formativi” (it: formative credits) differ in that the former recognizes the informal dimension of the competence acquired, whereas the latter does not. a declared policy of recognition of informal competence in sweden, however, does not mean that higher education is automatically more open or that the process of the inclusion and recognition of these competences in the students’ formal records is without issues. as we have seen in f’s report, this is far from the actual situation for all students that lack formal credits or marks in their lives pre-he. in fact, a eventually did manage to enrol in political science at unib (italy), albeit after a long process.
in other words, standardized practices of recognition of previous competences (be they gained through formal education or not) are, in our analysis of both datasets and across countries, rather than the solution to the issue of widening recruitment and participation, the result of efforts to accommodate an ideology in which mobility and fluidity of individuals and ideas have high currency, especially in europe after the bologna agreement at the end of the 1990s. this is also related to issues of integration, wherein the understanding of diversity varies in the two contexts of this study: from openness as a result of generosity and humanitarian efforts, where diversity is welcome as a source of mutual enrichment (italy); to openness as the necessary condition to create diverse groups in academia (sweden) that will, in turn, reflect the makeup of society at large, where, according to this logic, diversity is the norm. to conclude, the analytical argument that constitutes one of the central, frontline contributions of this study is that, in order to shed light on the kinds of opportunities that arise at the students’ horizon, it is relevant to compare and understand the ways in which policy implementation and appropriation take place in practice and across sites. however, one important result that emerged in such an endeavour is that the boundaries between these dimensions are not to be conceived as dividing membranes that clearly mark what is global and what is local, what is macro and micro, or what is labelled as belonging to one nation-state or another.
rather, the vertical case study approach used here has shown the complex entanglements of, for instance, policies of standardization, and the ways in which students make sense of situations in which such policies become practices of standardization in terms of, for instance i) standards in relation to what the students should deliver to pass a course and ii) standards of the support delivered to help the students to do so. transitions and widening participation, and the challenges that these have entailed for the students, are, in other words, not only affected by policy, they are, in fact, produced by a policy of inclusion. our analyses have confirmed and carefully illustrated the general discourse in policy where migrant students are individuals whose specific experiences make access to and participation in he distinct for them (perry & mallozzi, 2011). this deficit perspective still frames the issue as a one-way “aid” relation between the higher education organization and the students in both countries. our analysis suggests that alternative frames could become part of a more adaptable and sensitive approach to issues related to how to accommodate a diverse student population in he. such alternative frames include a shift in perspective that takes into account the students’ competences and (formal and nonformal) educational backgrounds in relation to their study and career plans. this could be achieved by including and analysing students’ autobiographical data to allow a better understanding of their communities of interest, transnational networks, and skills (see also ünlüsoy & de haan, 2020). this informal and nonformal curriculum indicates the importance for he institutions to make available appropriate autobiographical and competence recognition and validation tools. 
furthermore, being aware of such skills would offer university staff and faculty opportunities to find ways forward in the curricula’s international dimensions as well as to explore and to acknowledge potential contributions by migrant students to develop courses and assessment approaches across global and local dimensions. key points processes of inclusion in he at the macro level inevitably incorporate processes of marginalization at the micro level. boundaries between analytical scales (macro-micro, global-local) are not hermetic. a rhetoric of speed in transitioning to and across higher education permeates the expectations of what successful transition to he may entail. proficiency in language production is crucial to provide migrant students with the “right” tools for legitimate participation. higher education institutions need to elaborate appropriate competence and autobiographical recognition and validation tools for migrant students. footnotes 1 sw: = original in swedish. 2 https://www.unhcr.it/wp-content/uploads/2019/11/manifesto-delluniversita-inclusiva_unhcr.pdf 3 it: = original in italian. 4 it: school and university offer an important opportunity for young refugees, thus representing a crucial transition in their path of social inclusion. (our translation) 5 it: you shall leave everything you love most dearly: / this is the arrow that the bow of exile shoots first. / you shall know the bitter (salty) taste / of others’ bread, and know / how hard a path it is to continuously / descend and ascend others’ stairs.
(our translation) 6 https://www.inhereproject.eu/wp-content/uploads/2018/09/inhere_guidelines_en.pdf 7 https://www.inhereproject.eu/wp-content/uploads/2017/08/inhere-gpc_en.pdf.pdf 8 https://ec.europa.eu/education/policies/multilingualism/about-multilingualism-policy_en list of abbreviations asrs: asylum seekers and refugees; he: higher education; inhere: higher education supporting refugees in europe initiative; uhr: universitets- och högskolerådet (sw: swedish council for higher education); unia: university a (sweden); unib: university b (italy); unic: university c (sweden); unhcr: united nations high commissioner for refugees; unicore: university corridors for refugees project. references bagga-gupta, s., messina dahlberg, g. & winther, y. (2016). disabling and enabling technologies for learning in higher education for all: issues and challenges for whom? informatics, 3(21). doi: 10.3390/informatics3040021 bagga-gupta, s., messina dahlberg, g., & vigmo, s. (2020). equity and social justice for whom and by whom in contemporary higher education? situated-distributed policies of inclusion/integration in sweden. learning and teaching. the international journal of higher education in the social sciences, 13(3), 82-110. doi: 10.3167/latiss.2020.130306 bagga-gupta, s. & messina dahlberg, g. (2018). meaning-making or heterogeneity in the areas of language and identity? the case of translanguaging and nyanlända (newly-arrived) across time and space. international journal of multilingualism, 15(4), 383-411. doi: 10.1080/14790718.2018.1468446 bacevic, j. (2019). with or without u? assemblage theory and (de)territorialising the university. globalisation, societies and education, 17(1), 78-91. doi: 10.1080/14767724.2018.1498323 bartlett, l., & vavrus, f. (2014). transversing the vertical case study: a methodological approach to studies of educational policy as practice. anthropology and education quarterly, 45(2), 131-147. doi: 10.1111/aeq.12055 biesta, g. j. j. (2016).
good education in an age of measurement: ethics, politics, democracy. london: routledge. braun, v., & clarke, v. (2006). using thematic analysis in psychology, qualitative research in psychology, 3(2), 77-101. doi: 10.1191/1478088706qp063oa braun, v., & clarke, v. (2019). reflecting on reflexive thematic analysis. qualitative research in sport, exercise and health, 11(4), 589-597. doi: 10.1080/2159676x.2019.1628806 braun, v., clarke, v. & weate, p. (2016). using thematic analysis in sport and exercise research. in b. smith & a. c. sparkes (eds.), routledge handbook of qualitative research in sport and exercise (pp. 191-205). london: routledge. brown, m., g., wohn, d. y., & ellison, n. (2016). without a map: college access and the online practices of youth from low-income communities. computers & education, 92-93, 104-116. doi: 10.1016/j.compedu.2015.10.001 dangoisse, f., clercq, m. d., meenen, f. v., chartier, l., & nils, f. (2020). when disability becomes ability to navigate the transition to higher education: a comparison of students with and without disabilities. european journal of special needs education, 35(4), 513-528. doi: 10.1080/08856257.2019.1708642 dunwoodie, k., kaukko, m., wilkinson, j., reimer, k., & webb, s. (2020). widening university access for students of asylum-seeking backgrounds:(mis) recognition in an australian context. higher education policy, 1-22. ecclestone, k., biesta, g. & hughes. m. (2010) (eds.) transitions and learning through the lifecourse. london: routledge eisenhart, m. (2017). a matter of scale: multi-scale ethnographic research on education in the united states. ethnography and education, 12 (2), 134-147. doi: 10.1080/17457823.2016.1257947 enders, j. (2004). higher education, internationalisation, and the nation-state: recent developments and challenges to governance theory. higher education, 47(3), 361-382. doi: 10.1023/b:high.0000016461.98676.30 eu directive 2011/95/eu. 
directive 2011/95/eu of the european parliament and the council. official journal of the european union l337/10 en . https://eur-lex.europa.eu/lexuriserv/lexuriserv.do?uri=oj:l:2011:337:0009:0026:en:pdf farenga, s. a. (2018). early struggles, peer groups and eventual success: an artful inquiry into unpacking transitions into university of widening participation students. widening participation and lifelong learning, 20(1), 60-78. doi: 10.5456/wpll.20.1.60 faubion, j. d., & marcus, g. e. (2009) (eds.). fieldwork is not what it used to be. learning anthropology’s methods in a time of transition. ithaca: cornell university press. fenwick, t., edwards, r., & sawchuck, p. (2011). emerging approaches to educational research. tracing the sociomaterial. london: routledge. fox, a., baker, s., charitonos, k., jack, v., & moser-mercer, b. (2020). ethics-in-practice in fragile contexts: research in education for displaced persons, refugees and asylum seekers. british educational research journal. doi: 10.1002/berj.3618. gale, t., & parker, s. (2014) navigating change: a typology of student transition in higher education. studies in higher education, 39 (5), 734-753, doi: 10.1080/03075079.2012.721351 garcía, o. (2009). bilingual education in the 21st century: a global perspective. oxford: blackwell. gherardi, s. (2017). sociomateriality in posthuman practice theory. in a. hui, t. schatzki, & e. shove (eds.), the nexus of practices. connections, constellations, practitioners (pp. 38-51). london: routledge. giroux, h. a. (2010). bare pedagogy and the scourge of neoliberalism: rethinking higher education as a democratic public sphere, educational forum, 74(3), 184–196. doi: 10.1080/00131725.2010.483897 gravett, k., kinchin, i. m., & winstone, n. e. (2020). frailty in transition? troubling the norms, boundaries and limitations of transition theory and practice. higher education research & development, 1-17. doi: 10.1080/07294360.2020.1721442 guo, s. (2015). 
the changing nature of adult education in the age of transnational migration: toward a model of recognitive adult education. in s. guo & e. lange (eds.), transnational migration, social inclusion and adult education (pp. 7–17). new directions for adult and continuing education, no. 146. san francisco, ca: jossey-bass. hancock, a-m. (2016). intersectionality. an intellectual history. oxford: oxford university press. harvey, a., & mallman, m. (2019). beyond cultural capital: understanding the strengths of new migrants within higher education. policy futures in education, 17(5), 657-673. doi: 10.1177/1478210318822180 hope, j. (2017). cutting rough diamonds: the transition experiences first generation students in higher education. in e. kyndt, v. donche, k. trigwell, and s. lindblom-ylänne (eds.),higher education transitions – theory and research (pp. 85-100) . new york: routledge. ingold, t. (2017). anthropology and/as education. new york: routledge. kupský, a. (2017). history and changes of swedish migration policy. journal of geography, politics and society, 7(3), 50–56. doi: 10.4467/24512249jg.17.027.7183 kyndt, e., donche, v., trigwell, k., & lindblom-ylänne, s. (2017). understanding higher education transitions: why theory, research and practice matter. in e. kyndt, v. donche, k. trigwell, and s. lindblom-ylänne (eds.),higher education transitions – theory and research (pp. 306-319) . new york: routledge. lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge, ma: cambridge university press. law, j. (2004). after method: mess in social research. london and new york: routledge. lopez gavira, r. & moriña, a. (2015). hidden voices in higher education: inclusive policies and practices in social science and law classrooms. international journal of inclusive education, 19(4), 365-378. doi: 10.1080/13603116.2014.935812 macfarlane, k. (2016). 
transition through immersion in he: an evaluation of how a transition and immersion programme for school pupils embeds a culture of the university experience for key stakeholders. widening participation and lifelong learning, 18(3), 63-73. doi: 10.5456/wpll.18.3.63 mangan, d. & winter, l. a. (2017). (in)validation and (mis)recognition in higher education: the experiences of students from refugee backgrounds. international journal of lifelong education, 36(4), 486-502. doi: 10.1080/02601370.2017.1287131 marcus, g. e. (1995). ethnography in/of the world system: the emergence of multi-sited ethnography. annual review of anthropology, 24, 95-117. doi: 10.1146/annurev.an.24.100195.000523 marres, n., & weltevrede, e. (2013). scraping the social? issues in live social research. journal of cultural economy, 6(3), 313-335. doi: 10.1080/17530350.2013.772070 messina dahlberg, g. & bagga-gupta, s. (2019). on the quest to “go beyond” a bounded view of language. research in the intersections of the educational sciences, language studies and deaf studies domains 1997-2018. deafness and education international, 21(2-3), 74-98. doi: 10.1080/14643154.2018.1561782 middleton, a. (2018). reimagining spaces for learning in higher education. london: palgrave. olson, d.r. (2003). psychological theory and educational reform. how school remakes mind and society. cambridge: cambridge university press. pavlenko, a. (2018). superdiversity and why it isn’t. reflections on terminological innovations and academic branding. in s. breidbach, l. küster & b. schmenk (eds.).sloganizations in language education discourse (pp. 142-168) . bristol: multilingual matters. perry, k. h., mallozzi, c. a. (2011). ‘are you able … to learn?’: power and access to higher education for african refugees in the usa. power and education, 3(3), 249–262. doi: 10.2304/power.2011.3.3.249 promemoria u2017/03082/uh. brett deltagande i högskoleutbildning. 
available at: http://www.regeringen.se/rattsdokument/departementsserien-och-promemorior/2017/07/brett deltagande-i-hogskoleutbildning/ prop. 2001/02:15. den öppna högskolan. available at: https://www.riksdagen.se/sv/dokument-lagar/dokument/proposition/den-oppna-hogskolan_g p0315d2 ramsay, g., & baker, s. (2019). higher education and students from a refugee background: a meta-scoping study. refugee survey quarterly. doi: 10.1093/rsq/hdy018 schneider, l. (2018). access and aspirations: syrian refugees’ experiences of entering higher education in germany. research in comparative & international education, 13(3), 457-478. doi: 10.1177/1745499918784764 seedhouse, p. (2008). learning to talk the talk: conversation analysis as a tool for induction of trainee teachers. in s. garton & k. richards (eds.), professional encounters in tesol (pp. 42-57). basingstoke: palgrave macmillan. taylor, c. a., & harris-evans, j. (2018). reconceptualising transition to higher education with deleuze and guattari. studies in higher education, 43, 1254–1267. doi: 10.1080/03075079.2016.1242567 tight, m. (2019). mass higher education and massification. higher education policy, 2019(32), 93-108. doi: 10.1057/s41307-017-0075-3 trigwell, k. (2017). transitions within university: concepts and cases. in e. kyndt, v. donche, k. trigwell, and s. lindblom-ylänne (eds.), higher education transitions – theory and research (pp. 121-130). new york: routledge. tsing, a. (2005). friction: an ethnography of global connection. princeton: princeton university press. tummons, j. & beach, d. (2019). ethnography, materiality, and the principle of symmetry: problematising anthropocentrism and interactionism in the ethnography of education. ethnography and education, 15(3), 286–299. doi: 10.1080/17457823.2019.1683756 uhr rapport. (2016). kan excellens uppnås i homogena grupper? en redovisning av regeringsuppdraget att kartlägga och analysera lärosätenas arbete med breddad rekrytering och breddat deltagande.
available at: https://www.uhr.se/globalassets/_uhr.se/publikationer/2016/uhr-kan-excellens-uppnas-i-homogena-studentgrupper.pdf unhcr (2019). manifesto dell’università inclusiva. available at: https://www.unhcr.it/wp-content/uploads/2019/11/manifesto-delluniversita-inclusiva_unhcr.pdf ünlüsoy, a., & de haan, m. (2020). turkish-dutch teens’ networked configurations for learning. frontline learning research, 8(2), 109-130. doi: 10.14786/flr.v8i2.423 vertovec, s. (2017). super-diversity. london: routledge. walsh, c. (2012). interculturalidad crítica/pedagogía decolonial. revista de educação técnica e tecnológica em ciências agrícolas, 3(6), 25-42. yin, r. k. (2018). case study research and applications. design & methods. 6th ed. thousand oaks: sage. frontline learning research 3 (2014) 78-82 issn 2295-3159 corresponding author: tomi silander, xerox research centre europe, www.xrce.xerox.com, tomi.silander@xrce.xerox.com ; petri nokelainen, university of tampere, www.uta.fi/edu, petri.nokelainen@uta.fi. http://dx.doi.org/10.14786/flr.v2i1.107 78 | f l r using new models to analyze complex regularities of the world: commentary on musso et al. (2013) petri nokelainen a , tomi silander b a university of tampere, finland b xerox research centre europe, france abstract this commentary to the recent article by musso et al. (2013) discusses issues related to model fitting, comparison of classification accuracy of generative and discriminative models, and two (or more) cultures of data modeling. we start by questioning the extremely high classification accuracy with empirical data from a complex domain. there is a risk that we model perfect nonsense perfectly. our second concern is related to the relevance of comparing multilayer perceptron neural networks and linear discriminant analysis classification accuracy indices. we find this problematic, as it is like comparing apples and oranges.
it would have been easier to interpret the model and the variable (group) importances if the authors had compared mlp to some discriminative classifier, such as group lasso logistic regression. finally, we conclude our commentary with a discussion about the predictive properties of the adopted data modeling approach. keywords: artificial neural networks; commentary; model-fit; generative and discriminative models; algorithmic data modeling 1. introduction statistical methods are constantly developed, not only within statistics, but also in other disciplines, such as physics, economics, bioinformatics, linguistics, and computer science. we therefore are very sympathetic to the attempts to promote new methods for analyzing educational data. however, also in this research field, for years there has been an emphasis on predictive modeling, for example, to learn structures from the data (nokelainen, silander, ruohotie, & tirri, 2007; tirri, nokelainen, & komulainen, 2013) and to predict class membership (nokelainen & ruohotie, 2009; nokelainen, tirri, campbell, & walberg, 2007; villaverde, godoy, & amandi, 2006). the recent boom of data analytics has further increased the efforts on this front. one methodological rationale behind this development is that predictiveness guards against over-fitting and serves as a natural criterion for the quality of the model. classical statistical literature did not emphasize this aspect, since models were kept relatively simple to avoid over-fitting and to keep the calculations reasonable. in addition, much of the theory concerned asymptotic behavior, in which case over-fitting is usually not an issue. increased computing power now allows more complicated models, such as bayesian, fuzzy and neural networks, to be used. while the increased flexibility brings benefits, there are also possibilities to make new kinds of errors in the analysis.
since we share the enthusiasm to promote new methods, we also feel that it is of utmost importance to perform the analyses with these new methods using extremely high methodological standards. in this respect, we find some of the procedures followed in the recent article by musso, kyndt, cascallar and dochy (2013) problematic. before discussing these issues in detail, we wish to indicate that we agree with edelsbrunner and schneider’s (2013) previous commentary on this article where they state that there are other data analysis techniques with properties similar to those of anns, but without the drawbacks. 2. fitting to the test data our first concern is the reported 100% classification accuracy in such a complex domain, and the lack of thorough discussion of this issue. multilayer perceptron (mlp) neural networks are universal function approximators (lek & guegan, 1999). with enough twisting of the parameters, one can use them to implement any classification rule (schittenkopf, deco, & brauer, 1997). consequently, the networks could in theory also be designed to explain a version of the dataset in which the gpa scores were randomly assigned to the students. what knowledge does such a model (that can explain anything) extract from the real world? one persuasive answer does indeed lie in prediction. only the regularities help one to generalize beyond the training sample, that is, to predict. but here one needs to be very careful. to do this right, the data must first be split into two parts and then the model must be built using only the first part. the testing should be done with the second part of the data – the part that was not used in the model building process at all. the big question is: can we trust ourselves to refrain from “cheating” (using the test data)?
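the point about fit versus generalization can be made concrete with a small sketch. this is only an illustration under stated assumptions, not the procedure of musso et al. (2013): the data are hypothetical (400 simulated "students", 18 random predictors, class labels assigned at random), and a 1-nearest-neighbour memoriser stands in for an over-flexible network. the split into training and test parts is made once, before any modelling.

```python
import random


def nearest_neighbour_label(train_x, train_y, x):
    """Return the label of the training point closest to x (a pure memoriser)."""
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], x))
    return train_y[min(range(len(train_x)), key=squared_distance)]


def accuracy(train_x, train_y, xs, ys):
    """Share of points in (xs, ys) whose label the memoriser reproduces."""
    hits = sum(nearest_neighbour_label(train_x, train_y, x) == y
               for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 400 "students", 18 predictors, labels assigned at random,
# so there is no regularity to learn.
X = [[rng.random() for _ in range(18)] for _ in range(400)]
y = [rng.randint(0, 1) for _ in range(400)]

# Split once, before any modelling; the test part is used exactly once.
train_X, test_X = X[:300], X[300:]
train_y, test_y = y[:300], y[300:]

train_acc = accuracy(train_X, train_y, train_X, train_y)  # fit to the sample
test_acc = accuracy(train_X, train_y, test_X, test_y)     # generalization

print(f"fit on training part:  {train_acc:.2f}")
print(f"accuracy on test part: {test_acc:.2f}")
```

the memoriser reproduces every random training label, while its accuracy on the untouched test part should stay near chance. the held-out estimate remains honest only as long as the test part plays no role in model building; tuning the model or the split against it would be exactly the "cheating" mentioned above.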
to avoid such use of the test data, it would be best to gather the test data after building the model, or to separate it from the training data in the very beginning, and give it to somebody else who will then, after the model has been built, test the accuracy of the model – once and for all! the paper by musso and her colleagues (2013) practically acknowledges that such a discipline was not rigorously followed. the network structure and learning parameters were adjusted to maximize the accuracy in test data. many models were tested to achieve this. even the division of the data into training and test samples was manipulated in order to “… maximize the training sample while preserving the appearance of all detected patterns in the testing sample …” (musso et al., 2013, p. 60). now, one cannot totally exclude the possibility that the authors actually promote this methodology as a sound one. take a maximally flexible model family, find the most parsimonious model that fits 100% to the data, and then analyze the model. but if that were the case, why torture oneself with the tedious manual work to find 100% fit (yes fit, not generalization) to the test data? it would be easier to just fit to the whole data set – but that would break the illusion of prediction. 3. comparison with the linear discriminant analysis we find that the authors’ decision to compare the model to other models sets a very good example that should more often be followed in educational research. such comparisons are widely used in machine learning (e.g., demšar, 2006). however, comparing the multilayer perceptron and the discriminant analysis raises some questions. behind the linear discriminant analysis is a linear discriminant model that defines a joint probability distribution for the whole 19-variate (18 independent variables + gpa class) data vector.
such joint probability distributions can be used for classification, since the conditional probability p(gpa-class | predictors) is proportional to the joint distribution p(gpa-class & predictors). these kinds of classifiers are usually called generative classifiers (e.g., xue & titterington, 2008), since they are based on models that can be used to sample the whole (19-variate) data vectors. mlps are not generative classifiers, but so-called discriminative classifiers. they are built to directly estimate the conditional distribution p(gpa-class | predictors) without modeling the relationships among the predictors. (reading the paper sometimes makes you feel that the authors claim otherwise.) while the linear discriminant model da1 used in the musso et al. (2013) paper has about 2*18 + 18*18 = 360 parameters, the neural network model has 18*15*2 = 540 parameters. the difference in the number of parameters is not huge, but all the parameters of the mlp are used for modeling the conditional distribution, while the parameters in the linear discriminant model also take care of modeling the relationships between variables. since the linear discriminant is also a predictive classifier, one cannot but wonder why the confusion matrices for linear discriminants were not reported. those numbers surely would have fitted in the same space without any problem. on the other hand, it is plausible that any differences found are due to one classifier being generative and the other discriminative. it would have been much more meaningful to compare the mlp to some discriminative classifier such as a logistic regression, or better yet, some sparse version of it such as the group lasso with interaction terms (meier, van de geer, & bühlmann, 2008) that would make interpreting the model and the variable (group) importances much easier. furthermore, the musso et al. (2013) paper is very unclear about how the variable importances have been calculated.
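returning to the generative route described earlier in this section, where the class posterior is obtained by normalising the joint distribution, the idea can be sketched in a few lines. this is a minimal illustration under stated assumptions: a single hypothetical predictor, two hypothetical classes ("low" and "high"), gaussian class-conditional densities with a shared standard deviation (in the spirit of a linear discriminant model), and made-up priors and means. it is not the model fitted in the commented article.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_from_joint(x, priors, means, sd):
    """p(class | x), obtained by normalising the joint p(class, x) = p(class) * p(x | class)."""
    joint = {c: priors[c] * gaussian_pdf(x, means[c], sd) for c in priors}
    total = sum(joint.values())  # this sum is p(x), the marginal of the predictor
    return {c: value / total for c, value in joint.items()}


# Hypothetical parameters, chosen only for illustration.
priors = {"low": 0.5, "high": 0.5}
means = {"low": -1.0, "high": 1.0}

at_midpoint = posterior_from_joint(0.0, priors, means, sd=1.0)
far_right = posterior_from_joint(2.0, priors, means, sd=1.0)

print(at_midpoint)
print(far_right)
```

a generative classifier of this kind must model how the predictors are distributed within each class before it can classify, whereas a discriminative classifier such as logistic regression would estimate p(class | x) directly and spend no parameters on the distribution of x itself.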
attempts to follow the references on variable importance only led to statements like “this has been implemented in software x” or to an unpublished technical report by one of the authors. 4. two cultures according to breiman (2001b), there are two statistical modeling cultures. the data modeling culture assumes that the data are generated by a given stochastic data model (such as linear or logistic regression). the algorithmic modeling culture treats the data mechanism as unknown, using, for example, decision trees and neural networks. although the first of these two cultures, focusing on data models, still dominates, many fields outside statistics are rapidly adopting a wide variety of tools. neural networks are often considered as black-box models that do not offer a good explanation and understanding of the domain (correa, bielza, & pamies-teixeira, 2009). consequently, such models are sometimes hastily deemed unsuitable for much of science. we would like to take the opportunity to say a word for such black-box models along the lines expressed by a statistician, leo breiman (1928-2005). the world may not be a simple place. while among the simple theories there are those that most closely approximate the complex reality, it is a priori possible that none of those simple theories, even the best of them, approximate the situation well. if the model does not predict well, one can argue that it has not captured the regularities of the world, so what insight would understanding and interpreting such a model offer us? (breiman, 2001b.) most of the statistical community would agree that interpretation makes sense only if the model is reasonably good (and we mean generalization, not just fit to the sample). edelsbrunner and schneider (2013) indicate in their commentary on this article that whenever possible, more theory-driven data modeling techniques should be preferred.
however, if we limit ourselves to the models that can be easily interpreted, we may end up discarding models that truly capture important regularities of the domain. there are, then, two different strategies for extracting true knowledge from the world. the first one is a classical one in which we try to find a well predicting model among the easily interpretable ones. this is the path that should always be attempted. unfortunately, we suspect that it was not seriously pursued in the article by musso et al. (2013). it is also possible to try to build a well predicting model, even if it is not that easy to interpret, and then put more effort into squeezing the knowledge out of the model. one could argue that this is what happens when you ask a doctor why she made the diagnosis she did. the answer will (only) be some approximation of the real reason. still, doctors are considered useful. our educated guess is that such a procedure is behind the independent variable importance measures featured in the article. naturally, such procedures should be carefully documented in order to understand what kind of information we have managed to extract from the model. the article leaves the impression that artificial neural networks are claimed to be somehow especially good for inferring how different complex patterns of variables affect the outcome. however, the presented results list only univariate importance of variables. how could that possibly tell us anything relevant about complex patterns? neural networks are by no means the only black-box models that can be successful in prediction. many ensemble learning based or motivated methods, for instance random decision forests (breiman, 2001a) and bayesian additive regression trees (chipman, george, & mcculloch, 2010), are among such models.
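the ensemble idea can be illustrated with a deliberately small sketch. it is an assumption-laden simplification and not breiman's random forest algorithm: each base model is a one-split decision stump on a single hypothetical predictor, each stump is fitted to a bootstrap resample of the same data (bagging), and the ensemble predicts by majority vote. the data, the 10% label noise and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Pick the threshold t (among observed x values) with the fewest training
    errors for the rule 'predict 1 if x > t, else 0'."""
    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))
    return min(xs, key=errors)


def fit_bagged_stumps(xs, ys, n_models, rng):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict(stumps, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: the true class is 1 when x > 0.5, with 10% of labels flipped.
xs = [rng.random() for _ in range(200)]
ys = [int(x > 0.5) if rng.random() > 0.1 else 1 - int(x > 0.5) for x in xs]

stumps = fit_bagged_stumps(xs, ys, n_models=25, rng=rng)  # odd number avoids tied votes
print(predict(stumps, 0.9), predict(stumps, 0.1))
```

each stump sees a slightly different resample and therefore settles on a slightly different threshold; the vote averages out some of that instability, which is the basic mechanism that tree ensembles exploit on a much larger scale.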
ensemble methods have reached very high classification accuracies by using several decision trees (or growing a forest of them) on the same data instead of a single-tree predictor. ever-increasing data sizes (e.g., massive open online courses, moocs, may have 100 000 students with all their data gathered automatically in digital form) and increasing computer power may well shift the focus from small, simple and understandable models to big, complex black-box models. but hopefully some of that computing power can also be used to extract understandable (even if not always very close to truth) approximations of the true complex regularities of the world.
key points
artificial neural networks (ann) certainly provide interesting modeling possibilities for educational scientists, but they also set certain challenges for the design of the study and interpretation of the results.
the article sets a very good example by comparing the results of ann with a conventional data modeling approach, but the comparison should have been made between two discriminative classifiers.
ensemble methods provide a modern and powerful alternative to neural networks, as they use the predictions of several models built during the learning process instead of a single model.
references
breiman, l. (2001a). random forests. machine learning, 45, 5–32. doi:10.1023/a:1010933404324
breiman, l. (2001b). statistical modeling: the two cultures. statistical science, 16(3), 199–231. doi:10.1214/ss/1009213726
chipman, h. a., george, e. i., & mcculloch, r. e. (2010). bart: bayesian additive regression trees. the annals of applied statistics, 4(1), 266–298. doi:10.1214/09-aoas285
demšar, j. (2006). statistical comparison of classifiers over multiple data sets. journal of machine learning research, 7, 1–30.
correa, m., bielza, c., & pamies-teixeira, j. (2009). comparison of bayesian networks and artificial neural networks for quality detection in a machining process.
expert systems with applications, 36, 7270–7279. doi:10.1016/j.eswa.2008.09.024
lek, s., & guegan, j. f. (1999). artificial neural networks as a tool in ecological modelling, an introduction. ecological modelling, 120, 65–73. doi:10.1016/s0304-3800(99)00092-7
meier, l., van de geer, s., & bühlmann, p. (2008). the group lasso for logistic regression. journal of the royal statistical society: series b, 70(part 1), 53–71. doi:10.1111/j.1467-9868.2007.00627.x
musso, m. f., kyndt, e., cascallar, e. c., & dochy, f. (2013). predicting general academic performance and identifying differential contribution of participating variables using artificial neural networks. frontline learning research, 1, 42–71. doi:10.14786/flr.v1i1.13
nokelainen, p., silander, t., ruohotie, p., & tirri, h. (2007). investigating the number of non-linear and multi-modal relationships between observed variables measuring a growth-oriented atmosphere. quality & quantity, 41(6), 869–890. doi:10.1007/s11135-006-9030-x
nokelainen, p., & ruohotie, p. (2009). non-linear modeling of growth prerequisites in a finnish polytechnic institution of higher education. journal of workplace learning, 21(1), 36–57. doi:10.1108/13665620910924907
nokelainen, p., tirri, k., campbell, j. r., & walberg, h. (2007). factors that contribute or hinder academic productivity: comparing two groups of most and least successful olympians. educational research and evaluation, 13(6), 483–500. doi:10.1080/13803610701785931
schittenkopf, c., deco, g., & brauer, w. (1997). two strategies to avoid overfitting in feedforward networks. neural networks, 10(3), 505–516. doi:10.1016/s0893-6080(96)00086-x
schneider, m., & edelsbrunner, p. (2013). modelling for prediction vs. modelling for understanding: commentary on musso et al. (2013). frontline learning research, 1(2), 99–101. doi:10.14786/flr.v1i2.74
tirri, k., nokelainen, p., & komulainen, e. (2013). multiple intelligences: can they be measured?
psychological test and assessment modeling, 55(4), 438–461. doi:10.1007/978-94-6091-758-5_1
villaverde, j. e., godoy, d., & amandi, a. (2006). learning styles’ recognition in e-learning environments with feed-forward neural networks. journal of computer assisted learning, 22, 197–206. doi:10.1111/j.1365-2729.2006.00169.x
xue, j-h., & titterington, d. m. (2008). comment on “on discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes”. neural processing letters, 28(3), 169–187. doi:10.1007/s11063-008-9088-7
frontline learning research 7 (2014) 16 issn 2295-3159
corresponding author: frank fischer (frank.fischer@psy.lmu.de) & sanna järvelä (sanna.jarvela@oulu.fi)
doi: http://dx.doi.org/10.14786/flr.v2i4.131
methodological advances in research on learning and instruction and in the learning sciences
frank fischer a, sanna järvelä b
a university of munich, germany
b university of oulu, finland
recent years have seen a dynamic growth of research communities addressing conditions, processes and outcomes of learning in formal and informal environments. two of them have markedly advanced the field: the community on research on learning and instruction that has been organized in the european association for research on learning and instruction (earli), and the learning sciences community, including the computer-supported collaborative learning community, organised in the international society of the learning sciences (isls). in this special issue we bring together excellent young researchers from these two communities who are currently contributing to advancing the methodology. we are convinced that the methodological developments in these two communities have much in common, as the core phenomena under investigation and the core questions are related to conditions, processes and outcomes of learning. common to both of these communities is that they have strong roots in cognitive science.
however, we also assume that there are substantial differences in these methodological developments, as the foci of the two communities differ in important respects. most importantly, the learning sciences have strong theoretical roots in situative cognition and socio-cultural approaches focusing on learning activities in authentic contexts. the main assumption underlying this focus is that knowledge is represented in activity structures rather than solely in the head (greeno, 2006). therefore, removing the activities from the social and physical contexts to which they belong will change their nature and, hence, research would lead to invalid results, because only a part of the knowledge that is relevant for effectively participating in a practice can be investigated. given these assumptions, it comes as no surprise that learning sciences research focuses on learning in authentic activities in context rather than on settings stripped of context for reasons of control in experimental studies. besides experiments and mixed-method approaches, a core methodology is design-based research (dbr), which originated in the clear need for alternatives to deductive-experimental methods for early phases of such field research. it follows a cyclic process with the goal of improving a practice and developing a modest and local theory. dbr has its origins in seminal papers by ann brown (1992) and by allan collins (1992) as well as in influences coming from computer science (see hoadley & van haneghan, 2011). as knowledge is seen to be tied to activities in practices rather than to a single individual, units of analysis beyond the individual (e.g., network, or activity) are the rule rather than the exception in learning sciences research. explorations of different units of analysis are happening in both communities, of course, but they are more pronounced in the learning sciences community.
due to the theoretical roots in socio-cultural thinking and situative cognition, the relation of the social and material environment to individual cognition is at the core of theorizing in the learning sciences. this is perhaps most obvious in research on computer-supported collaborative learning (see dillenbourg, järvelä & fischer, 2009). as the activities or practices are seen as the core medium of knowing, and the practices differ a lot between communities, domains and disciplines, research in the learning sciences has an important focus on disciplinary practices (e.g. herrenkohl & cornelius, 2013). as the use of tools is a key feature of any community, tool appropriation and use are important foci in learning sciences research. in the learning sciences, the concept of tool is often very broadly defined, ranging from tools like scientific concepts to digital technologies.
research in the learning and instruction community is characterized by a strong connection of basic research to applied field studies. the field has deeper roots in experimental psychology and the general psychology of learning and motivation. traditionally, research on learning and instruction has focused on basic processes of cognition and learning and then applied these principles to teaching and learning practices. for example, understanding metacognitive processes in human learning (flavell, 1979) has led many research groups to design effective interventions for classroom contexts (azevedo & hadwin, 2005). research on self-regulated learning has also tried to integrate empirical evidence on basic processes of cognition, motivation and emotion into broader applications and interventions in classrooms, where the teacher’s role, students’ activities and features of the learning environment have been synchronized to serve learning (e.g., dignath, büttner & langfeldt, 2008).
in recent years, basic research on learning and instruction has been helpful for designing powerful learning environments, where knowledge about students’ cognitive, motivational and emotional processes and their individual differences has been applied to instructional design. for example, knowledge on scientific reasoning and on worked-out examples has been applied in developing guidance for inquiry learning (mulder, lazonder & de jong, 2014) and collaborative learning (kollar, ufer, reichersdorfer, vogel, fischer & reiss, 2014). in the learning and instruction community one of the current strong emphases is on methodological orientations linking learning research to brain research in the natural sciences. the educational neuroscience movement seems to be more pronounced in research on learning and instruction than in the learning sciences. this is consistent with the deeper roots of learning and instruction research in general and experimental psychology, which has developed a strong neuroscience orientation in recent years. in addition, methodologies are being developed that address the temporal characteristics of learning. in both communities, quantitative approaches to the analysis of temporal aspects of the learning process have been developed in recent years. it is argued that the explanatory power and the validity of the analyses can be improved dramatically by including the time information that has typically been neglected in many studies on individual and collaborative learning. in research on learning and instruction, this new focus has originated as a consequence of a conceptual shift, as molenaar (this volume, p. xx) puts it: “constructs formerly viewed as personal traits, such as self-regulated learning and motivation, are now conceptualized as a series of events that unfold over time”. there are several arguments in support of this point also in recent publications in the learning sciences (e.g., reimann, 2009).
there are four main potentials for innovation resulting from these developments for learning research, whether situated in research on learning and instruction or in learning sciences research.
potential #1: increased gain in scientific understanding through more “messy studies” when investigating “real” learning in new fields. it seems inadequate to adopt a purely deductive experimental approach in fields where the set of potentially influential variables is unknown. learning research is not an exception here; the same applies to other fields such as physics, where pioneering research at the edges of current scientific knowledge is more “messy” as well (wieman, 2014). dbr approaches, although still in their infancy, might well develop into a standard methodology for pioneering research on “real learning” in authentic settings, also in research on learning and instruction. in this special issue, svihla (this volume) reports on recent developments in dbr that address the issues of scalability and generalizability: design-based implementation research (dbir). this might be a promising alternative to randomized-trial approaches to implementation research in fields where the set of influential and to-be-controlled variables in real formal and informal learning environments is far from clear. because of their design focus, dbr and dbir might contribute to advancing learning research beyond generating new scientific knowledge: they might have the potential to build bridges into practice and increase the credibility and trustworthiness of learning research. an alternative approach is suggested by stegmann (this volume), who addresses the issue of control in studies of complex, collaborative learning environments. he argues for a more systematic use of nomological networks on the conceptual level in connection with as-controlled-as-possible empirical studies that include measures of learning processes as their methodological core.
potential #2: more comprehensive understanding of learning phenomena through the use of methodologies that can handle multiple units of analysis and include process analyses. units like the activity, the group or the collective could become standard for questions that transcend the individual’s learning. it will be a challenge to deal conceptually with this paradigm shift: talking about “learning” also with respect to super-individual units. for example, should team learning be considered as a whole, or should the term “learning” be reserved for the individual, with different concepts used to describe what is happening in activities or collectives? an even more far-reaching question is to what extent phenomena on super-individual levels should be traced back (or be reduced, as some would prefer to say) to the individual contribution, i.e. social phenomena are treated as a result of interacting individuals, and the phenomena can be fully explained by the individual contributions and reactions. increasingly there is research arguing that some social phenomena in contexts of learning cannot be reasonably reduced to the individuals involved (cress, held & kimmerle, 2013; eberle, stegmann & fischer, 2014; stahl, 2006). in this special issue, stegmann’s (this volume) work additionally addresses this aspect. he describes measures of individual cognition and argumentative discourse in computer-supported small groups and exemplifies approaches to a synchronized analysis of individual cognition and group discourse to address the mutual impact. we argue that systematically employing units of analysis other than the individual in learning research would not only advance research on learning in context, but also help to build bridges into other social sciences that are sometimes hesitant because of the exclusivity of the individual-centric perspective of some learning researchers.
potential #3: overcoming overreliance on self-reports: from personal constructs to series of interactions unfolding over time. many learning researchers are currently working on developing alternative conceptualisations of well-established psychological constructs such as self-regulation or motivation. relying solely on self-reports in questionnaires to measure personal constructs has shortcomings (e.g. zimmerman, 2008), such as low predictive value for behaviour in real problem-solving situations. learning researchers have therefore begun to develop methodological approaches that use behaviour or interaction in problem-solving situations as indicators for these constructs. an example from research in the learning sciences is dan hickey’s work on disciplinary engagement in a discussion (filsecker & hickey, 2014) as a complementary measure of motivation. in this special issue, inge molenaar’s work represents this broader issue. she focuses on the temporal characteristics of learning processes that are typically missed when only self-report measures are used or observational data are aggregated into frequencies over the whole learning process under consideration. recent advances in the use of computer-generated trace data for understanding patterns and processes of students’ learning (malmberg, järvenoja & järvelä, 2013) have also advanced the instructional design field in developing scaffolding and prompts for computer-supported learning (järvelä & hadwin, 2013).
potential #4: building bridges between research on learning and cognitive neuroscience. there have been discussions about whether the gap between education and neuroscience might require a bridge too far. however, recent advances in cognitive neuroscience are encouraging.
researchers in learning and instruction and in the learning sciences are increasingly interested in the biological basis of the learning phenomena under investigation, and some of these ideas have already been applied, e.g. to mathematics learning (hannula, lepola & lehtinen, 2010). in the learning sciences and the learning and instruction community there is increasing awareness of the possibilities to analyse processes that are not readily accessible to behavioural research. one can hope that in the future, researchers on learning and instruction and in the learning sciences will be able to successfully point out interesting learning phenomena to neuroscientists (varma, mccandliss & schwartz, 2008). these often complex and dynamic phenomena are typically highly challenging for contemporary neuroscientists. at the same time one can hope that researchers in learning and instruction as well as in the learning sciences will become more receptive to stimuli coming from unexplained phenomena in neuroimaging studies on cognition and learning. de smedt (this volume) addresses these questions and elaborates on some convincing examples from mathematics learning that give evidence for a productive interaction between research on learning and instruction and cognitive neuroscience. he argues that the successful interaction crucially depends on finding the right level of resolution or granularity when involving neuroscience methods. we argue that it is now a good point in time to start exploring this interaction more systematically, from both research on learning and instruction and the learning sciences. this would enhance the interface of learning research to the natural sciences. at this interface there is a considerable potential for innovation.
conclusion
research on learning and instruction and research in the learning sciences have seen considerable methodological advancements in recent years.
although a certain specialisation can be seen due to differences in some of the basic assumptions, we see good reasons for transferring these innovations between the research communities. we see four potentials for innovation for learning research resulting from these methodological developments: (1) increased gain in scientific understanding through more “messy studies” when investigating “real” learning in new fields, (2) more comprehensive understanding of learning phenomena through the use of methodologies that can handle multiple units of analysis and entail process analyses, (3) overcoming overreliance on self-reports: from personal constructs of learning and motivation to series of interactions unfolding over time, and (4) building bridges between research on learning and cognitive neuroscience. the contributions to this special issue each address one of these potentials.
references
azevedo, r. & hadwin, a. f. (2005). scaffolding self-regulated learning and metacognition – implications for the design of computer-based scaffolds. instructional science, 33(5), 367–379. doi:10.1007/s11251-005-1272-9
brown, a. l. (1992). design experiments: theoretical and methodological challenges in creating complex interventions in classroom settings. journal of the learning sciences, 2(2), 141–178. doi:10.1207/s15327809jls0202_2
collins, a. (1992). toward a design science of education. in e. scanlon & t. o'shea (eds.), new directions in educational technology (pp. 15–22). new york: springer. doi:10.1007/978-3-642-77750-9_2
cress, u., held, c., & kimmerle, j. (2013). the collective knowledge of social tags: direct and indirect influences on navigation, learning, and information processing. computers & education, 60(1), 59–73. doi:10.1016/j.compedu.2012.06.015
dignath, c., büttner, g. & langfeldt, h.-p. (2008). how can primary school students acquire self-regulated learning most efficiently?
a meta-analysis on interventions that aim at fostering self-regulation. educational research review, 3, 101–129. doi:10.1016/j.edurev.2008.02.003
dillenbourg, p., järvelä, s., & fischer, f. (2009). the evolution of research on computer-supported collaborative learning. in n. balacheff et al. (eds.), technology-enhanced learning (pp. 3–19). springer, the netherlands. doi:10.1007/978-1-4020-9827-7_1
eberle, j., stegmann, k., & fischer, f. (2014). legitimate peripheral participation in communities of practice: participation support structures for newcomers in faculty student councils. journal of the learning sciences, 23(2), 1–29. doi:10.1080/10508406.2014.883978
greeno, j. g. (2006). learning in activity. in r. k. sawyer (ed.), the cambridge handbook of the learning sciences (pp. 79–96). new york: cambridge university press.
filsecker, m., & hickey, d. t. (2014). a multilevel analysis of the effects of external rewards on elementary students' motivation, engagement and learning in an educational game. computers & education, 75, 136–148. doi:10.1016/j.compedu.2014.02.008
flavell, j. h. (1979). metacognition and cognitive monitoring: a new area of cognitive-developmental inquiry. american psychologist, 34, 906–911. doi:10.1037/0003-066x.34.10.906
hannula, m., lepola, j. & lehtinen, e. (2010). spontaneous focusing on numerosity as a domain-specific predictor of arithmetical skills. journal of experimental child psychology, 107, 394–406. doi:10.1016/j.jecp.2010.06.004
herrenkohl, l. r. & cornelius, l. (2013). investigating elementary students' scientific and historical argumentation. journal of the learning sciences, 22(3), 413–461. doi:10.1080/10508406.2013.799475
hoadley, c. & van haneghan, j. (2011). the learning sciences: where they came from and what it means for instructional designers. in r. a. reiser & j. v. dempsey (eds.), trends and issues in instructional design and technology (3rd ed., pp. 53–63). new york: pearson.
järvelä, s. & hadwin, a. (2013).
new frontiers: regulating learning in cscl. educational psychologist, 48(1), 25–39. doi:10.1080/00461520.2012.748006
kollar, i., ufer, s., reichersdorfer, e., vogel, f., fischer, f., & reiss, k. (2014). effects of collaboration scripts and heuristic worked examples on the acquisition of mathematical argumentation skills of teacher students with different levels of prior achievement. learning and instruction, 32, 22–36. doi:10.1016/j.learninstruc.2014.01.003
malmberg, j., järvenoja, h. & järvelä, s. (2013). patterns in elementary school students’ strategic actions in varying learning situations. instructional science, 41, 933–954. doi:10.1007/s11251-012-9262-1
mulder, y. g., lazonder, a. w., & de jong, t. (2014). using heuristic worked examples to promote inquiry-based learning. learning and instruction, 29, 56–64. doi:10.1016/j.learninstruc.2013.08.001
reimann, p. (2009). time is precious: variable- and event-centred approaches to process analysis in cscl research. international journal of computer-supported collaborative learning, 4(3), 239–257. doi:10.1007/s11412-009-9070-z
stahl, g. (2006). group cognition. cambridge, ma: mit press.
varma, s., mccandliss, b. d., & schwartz, d. l. (2008). scientific and pragmatic challenges for bridging education and neuroscience. educational researcher, 37, 140–152. doi:10.3102/0013189x08317687
wieman, c. e. (2014). the similarities between research in education and research in the hard sciences. educational researcher, 43(1), 12–14. doi:10.3102/0013189x13520294
zimmerman, b. j. (2008). investigating self-regulation and motivation: historical background, methodological developments, and future prospects. american educational research journal, 45(1), 166–183. doi:10.3102/0002831207312909
frontline learning research vol. 10 no.
1 (2022) 46–75 issn 2295-3159
corresponding author: cheryl jones, college of science, health, engineering and education, murdoch university, western australia. c.a.jones@murdoch.edu.au doi:https://doi.org/10.14786/flr.v10i1.851
interpersonal affect in groupwork: a comparative case study of two small groups with contrasting group dynamics outcomes
cheryl jones1, simone volet1, deborah pino-pasternak2 & olli-pekka heinimäki3
1 murdoch university, australia
2 university of canberra, australia
3 university of turku, finland
article received 28 april 2021 / article revised 24 february 2022 / accepted 4 july / available online 2 august 2022
abstract
teamwork capabilities are essential for 21st century life, with groupwork emerging as a fruitful context to develop these skills. case studies that explore interpersonal affect dynamics in authentic higher education groupwork settings can highlight collaborative skills development needs. this comparative case study traced the sociodynamic evolution of two groups of first-year university students to investigate the highly divergent collaborative outcomes of the two groups, which reported starkly contrasting group dynamics (negative and dysfunctional or positive and collaborative). mixed methods (video-recorded observations of five groupwork labs over one semester, and group interviews) provided interpersonal affect data as real-time visible behaviours, and the felt experiences and perceptions of participants. the study traced interpersonal affect dynamics in the natural fluctuation of not just task-focused (on-task), but also explicitly relational (off-task) interactions, which revealed their function in both task participation and group dynamics. findings illustrate visible interpersonal affect behaviours that manifested and evolved over time as interactive patterns, and group dynamics outcomes.
fine-grained analysis of interactions unveiled interpersonal affect as a collective, evolving process, and the mechanism through which one group started and stayed highly positive and collaborative over the semester. the other group showed a tendency towards splitting to undertake tasks early, leading to low group-level interpersonal attentiveness, and over time, subgroups emerged through interactions both off-task and on-task. the study made visible the pervasive nature of interpersonal affect as enacted through seemingly inconsequential everyday behaviours that supported the relational and task-based needs of groupwork, and those behaviours which impeded collaboration.
keywords: group dynamics; interpersonal affect; groupwork; socioemotional interaction; higher education
1. introduction
while it is generally believed that engaging in groupwork in higher education prepares students for future teamwork (curşeu et al., 2018), the relational realm of their social interactions (i.e., interpersonal dynamics) can be highly challenging (näykki et al., 2014). as bakhtiar et al. (2018) have argued, not only academic performance but also process outcomes, such as students’ interpersonal experiences, are an essential part of the collaboration picture, given that perceived experiences influence attitudes towards future groupwork. research on groupwork learning processes has traditionally theorised social interaction in terms of the dual function of the cognitive for performing shared learning tasks, and the socioemotional for social (i.e., relational) performance (isohätälä et al., 2019; kreijns et al., 2003). while such distinctions may serve analytical purposes, the interdependent nature of the cognitive and socioemotional processes of groupwork is also widely recognised, although the inherently relational aspect has traditionally been a secondary focus for groupwork research (baker et al., 2013). yet, as baker et al.
(2013) note, for some students, the social can be particularly salient, and as students often struggle with the social dynamics of groupwork (näykki et al., 2014) there is a need to better understand these aspects through case studies that closely examine real-time interactions in authentic group situations. relying exclusively on post hoc individual self-report data cannot shed light on the social dynamics as manifest through the interdependent actions that unfold between participants. further, the function of affect as an inherently interpersonal phenomenon in social interaction is often overlooked due to its pervasive, hidden-in-plain-sight nature, yet as barsade and knight’s (2015) review of group affect research has found, it is an important part of understanding the group dynamics puzzle. the present research was grounded in a perspective of interpersonal affect as inherently social (i.e., relational) and dynamically evolving over time (jones et al., 2021; mesquita & boiger, 2014) to examine the starkly contrasting social dynamics outcomes reported by two groups, each with four members, who were first-year teacher education students undertaking a mandatory introductory science unit. the aim was to understand the function of interpersonal affect, as visible behavioural phenomena enacted by participants within their moment-to-moment interactions, and how it could explain the contrasting perceptions of the two groups regarding their group dynamics as negative or positive. the following sections present the conceptual framework that guided the present study, and selected research in social and educational psychology that has empirically examined affect phenomena as social and dynamic in groupwork situations using dynamic methods (i.e., observations).
1.1 affect as inherently interpersonal and temporally evolving phenomena in social interaction
affect has traditionally been studied as individual (i.e., intrapersonal) phenomena incorporating a range of affective states, such as the preferences, attitudes, moods, affect dispositions, interpersonal stances, and emotions that individuals experience or express (scherer, 2005, p. 704). the american psychological association, for example, defines affect as:
“any experience of feeling or emotion, ranging from suffering to elation, from the simplest to the most complex sensations of feeling, and from the most normal to the most pathological emotional reactions. often described in terms of positive affect or negative affect, both mood and emotion are considered affective states.” (american psychological association, n.d.)
a paradigm shift in recent decades “from intrapersonal to interpersonal” perspectives, however, reflects growing recognition of the social nature of emotions (van kleef, 2021, p. 91) and affect phenomena more broadly (kuppens, 2015). hess and hareli (2019), for example, posit that even the most routine of everyday social encounters involve some emotion exchange, which acts as the “communicative signals” (p. 2) that coordinate social interaction (van kleef, 2021), and which are often taken for granted due to their pervasive presence. according to philosopher sheets-johnstone (2009), affect in its most fundamental sense compels avoidance or engagement and can be seen as “responsivity,
the nature of affect as interpersonally manifest and evolving over time in social interaction is highlighted in mesquita and boiger’s (2014) sociodynamic model of emotions, which describes emotions as arising through interaction with others and serving an important role in cultivating the cohesion of the sociocultural contexts in which they occur. according to mesquita and boiger (2014), affect and social interaction “form one system” (p. 298) such that the affect (i.e., emotions; moods) that arises during social encounters is not reducible to an individual’s experience or expression; it is part of the interpersonal situation as it unfolds in groupwork. this sociodynamic perspective highlights how affect is collectively cocreated (e.g., as group climate, conflict, or mood), as demonstrated in social psychology research that has focused on its visible nature in social contexts. for example, bartel and saavedra (2000) posited that for the relational effect of group mood to manifest it must be communicated in social interaction through visible behaviours. they demonstrated group mood as perceptible phenomena in 70 diverse workgroups using an observation instrument based on affect valence and activation, their observations aligning with participants’ self-reports. barsade’s (2002) experimental study with university students then showed a contagion effect of affect as dynamically evolving in group interactions, in which a confederate enacted bartel and saavedra’s (2000) behavioural indicators. the relational effect of affect phenomena (i.e., as group mood, climate, tone) has subsequently been illustrated in education and workplace contexts (barsade & knight, 2015). as slaby (2016) observed, “relational affect is often more a matter of specific modes of interaction – various ways of being- and acting-together in a situation, modes of joint or co-comportment – regardless of whether these modes of interaction assume the shape of a specific emotion type or not” (p. 8).
based on the above, the construct interpersonal affect was conceptualised in the present study as visible negative or positive behaviours in order to explore their manifestation, and function in the fundamentally relational realm of groupwork. studying the function of affect in social interaction also requires its conceptualization as dynamically manifesting and evolving over time (kuppens, 2015). reviews of the research on affect in groups have highlighted its dynamic temporal nature, such as the way in which affect during early group life has been shown to impact how groups interact and develop going forward (barsade & knight, 2015). observational analyses of group interaction in collaborative learning contexts (e.g., bakhtiar et al., 2018; kwon et al., 2014) have found that socioemotional interactions influenced ongoing group interaction, including group learning processes. for example, observations of group interaction have found that negative socioemotional interactions can influence group learning processes over time, such as reducing task engagement (e.g., näykki et al., 2014). affect can also evolve as temporal interactive patterns. for example, järvenoja et al. (2019) reported temporal interaction patterns in an explorational study of emotional regulation processes, which found groups exhibited three types of challenges – cognitive; emotional and motivational; social context and interaction – that evolved as temporal patterns in the absence of any perceptible collective emotion regulation. more studies of groups’ real time interactions, that shed light on affective processes as they spontaneously arise and unfold are needed, as “collaborative learning is a temporally unfolding process, and as such, can only be captured as a series of interactions emerging over time” (isohätälä et al., 2019, p. 833). 
furthermore, observations of group interactions often examine episodes, such as socioemotional hotspots, emotion regulation processes, or interactions from one meeting, and studies that trace affect phenomena as they arise moment-to-moment and sequentially unfold over longer periods are also needed to explore how they emerge as collective, group-level relational dynamics.

1.2 interpersonal affect and the importance of group dynamics outcomes

recent studies on emotion regulation in collaborative learning have been instrumental in highlighting the pervasive nature of affect as innately interpersonal in groupwork, unveiling positive socioemotional behaviours that are important in supporting the quality of group learning processes. for example, widely agreed socioemotional behaviours found to support group learning, and which are innately relational, include providing encouragement (e.g., bakhtiar et al., 2018; isohätälä et al., 2018; järvenoja et al., 2019; kwon et al., 2014; lobczowski et al., 2021), and displaying respect (e.g., bakhtiar et al., 2018; isohätälä et al., 2018) towards one another. conversely, socioemotional behaviours found to hinder group learning processes include undermining, rejecting, or overruling others’ contributions (e.g., bakhtiar et al., 2018; näykki et al., 2014). groupwork often also involves socioemotional challenges such as participants’ anxiety, and frustration (järvenoja et al., 2019), and emotion regulation strategies that can have an unfavourable impact, for example complaining, or venting, which can spread among participants (lobczowski et al., 2021).
research has also identified conflict emergence due to inadequate regulation of relational challenges, proving detrimental to task engagement (näykki et al., 2014), and groups also often avoid the critical argumentation needed for problem solving in favour of maintaining positive relations (e.g., isohätälä et al., 2018; sohr et al., 2018). this suggests that participants can struggle to balance the task and relational demands of collaboration (näykki et al., 2014). research on emotion regulation processes has thus shown the ubiquitous presence of affect and its important function in the quality of joint learning processes, and their deeply intertwined nature with the fundamental relational realm of groupwork. yet, as garcía et al. (2020) note, it remains the case that typically, “socioemotional interactions are studied in function of the results of the task and not as a phenomenon of interest in itself” (p. 209) and interactions not necessarily oriented to the learning task also warrant attention (järvenoja et al., 2017). thus, there remains much to be understood about the jointly manifested nature of affect phenomena in groups’ social dynamics, and the subsequent impact of interpersonal affect on participants’ subjective experience of these dynamics. the importance of participants’ subjective experiences of their group dynamics was underscored by bakhtiar et al.
(2018, p.59) who argued that “although performance is commonly used as an indicator of productive collaboration, another important indicator is group members’ perceptions of their experience, as these perceptions are carried forward as beliefs and knowledge informing approaches to future collaborative work.” in the present study, following poupore (2018), groups’ alternative negative or positive dynamics were conceptualised as outcomes on the grounds that the group dynamics of each meeting are viewed as micro-outcomes which serve as inputs to subsequent meetings, and critically, the self-reports of participants at semester end. group dynamics can be broadly understood as: the processes, operations, and changes that occur within social groups, which affect patterns of affiliation, communication, conflict, conformity, decision making, influence, leadership, norm formation, and power. the term…emphasizes the power of the fluid, ever-changing forces that characterise interpersonal groups. (american psychological association, n.d.) according to reviews of the group affect literature (e.g., barsade & knight, 2015; knight & eisenkraft, 2015), understanding group dynamics outcomes requires examining affect phenomena. this was a key finding of barsade’s (2002) study, which showed that affect dynamically evolved over the course of a meeting, impacting the group dynamics of the experimental groups. forsyth (2014) describes group dynamics as “the influential actions, processes, and changes that occur” (p. 2). in the present study, interpersonal affect is thus examined as influential actions (i.e., negative or positive behaviours) that unfolded at the micro-temporal level, and which evolved over time into macro-temporal interactive patterns that contributed to the groups’ contrasting dynamics outcomes. 
as the present study focused primarily on the relational (otherwise known as affective) realm, group interaction was extended beyond traditional task focus to incorporate off-task interactions. kreijns et al. (2003) argued that off-task interaction is typically affect-laden, less formal, and a space where people can establish relationships. according to vygotsky (1978), the intersubjectivity that occurs in social interaction is fundamental to human relations (garcía et al., 2020), thus off-task interaction equally has relevance in understanding how the starkly contrasting groups intersubjectively cocreated their social understanding. along this line, barkaoui et al. (2008) adopted a vygotskian perspective for their analysis of off-task interaction, arguing that all interaction, including affective (i.e., relational) interaction off-task, is germane to collaboration. other empirical research has also shown that off-task interaction influences ongoing interaction. for example, in experimental research with university students, pre-meeting small-talk influenced ongoing positive socioemotional interactions (yoerger et al., 2018), and in workplace teams, informal (e.g., sports, weather) chat was found to be infused with interpersonal affect that had a positive relational and task impact (gorse & emmitt, 2009). in the present study, which focused primarily on the relational realm of groups, all of the off-task interactions were therefore conceptualised as innately affective, as described further below in section 2.4 observational data coding.

1.3 the present study

this study explored the interpersonal affect of two groups of first-year university students who reported starkly contrasting group dynamics outcomes (negative and dysfunctional; positive and collaborative).
the aim was to examine the extent to which the two groups’ contrasting perceived dynamics outcomes could be understood in relation to the visible interpersonal affect that arose and evolved in their ongoing interactions. in the present study, off-task interactions were distinguished from those on-task to enable the exploration of how interpersonal affect in relational talk off-task may influence not only ongoing interpersonal affect, but also participants’ evolving task participation, given that learning was, after all, the groups’ raison d’être (thus foregrounding the relational function of interpersonal affect without ignoring its task function). the present study therefore traced interpersonal affect, in a phenomenological sense following the participants’ interpersonal affect in social interaction, in its natural fluctuation off-task and on-task. task participation was operationalised as participant/s contributing to the group task, evidenced through nonverbal and verbal communications (isohätälä et al., 2019) or undertaking task functions (e.g., interacting with materials, with or on behalf of the group). absence of task participation, in turn, was apparent by participant/s talking off-task. as groups naturally fluctuate between more or less informal modes, from spontaneous small-talk to task participation, the evolution of interpersonal affect is sequentially interwoven throughout the fabric of these two broad intersecting domains. off-task and on-task interactions comprise the whole social context, expected to provide unique insights into the relational function of interpersonal affect as ontologically unfolding in groupwork. two research questions guided this study:

rq1: how does interpersonal affect manifest and evolve over time in the off-task and on-task interactions of two groups that reported contrasting group dynamics outcomes following their groupwork?

rq2: what kind of interpersonal affect phenomena characterise the fluctuation of off-task and on-task interaction?
2. methodology

2.1 research design

this comparative case study (yin, 2018) explored two small groups that were video-recorded undertaking shared science activities and then interviewed at the end of semester, yielding both external observations and participants’ own perspectives. following näykki et al.’s (2014) suggestion, a comparative case study was used to examine moment-to-moment interpersonal affect behaviours that, taken together, contributed to participants’ perceived negative or positive group dynamics outcomes. these kinds of everyday, hidden in plain sight phenomena are often only made apparent through contrast (mills et al., 2012), and could contribute to better understanding interpersonal dynamics given that participants’ perceptions of their group interaction influence their engagement in ongoing interaction.

2.2 participants and context

data for the study are a subset of a larger research project conducted within an introductory science unit for first-year teacher education students, in which the students were filmed during five groupwork labs. twenty-two groups, spread across six different lab classes with different teachers, remained intact with their four members attending all classes over the semester (no natural attrition). two case groups were selected as the focal point of this study, based on their starkly contrasting (negative or positive) self-reported group dynamics in their group interviews at semester end. specifically, one group repeatedly expressed highly positive dynamics and an enjoyable experience, while the other reported salient negative events and ongoing interpersonal tensions. the two groups were from the same lab class, hence had the same teacher and lab conditions, limiting confounds potentially associated with different teachers, therefore making them highly suitable for this comparative analysis.
group a comprised two females and two males, and group b, three females and one male. each group had one mature-aged student (over 25-years-old), and three under 25-years-old. the students self-selected into groups; however, as this was a first-year unit, they typically did not know one another well yet, and tended to form groups with those seated nearby when instructed to do so. students were also asked to stay in their groups for the semester but could discuss with the teacher if they wanted to change. participants were also advised that they could withdraw from the research at any time. approval for the research was provided by the university’s human research ethics committee and the research was conducted in accordance with the national research code of conduct. participants provided written consent for video-recordings and interviews. all names used are pseudonyms. the research context was a science unit aimed to develop first-year student teachers’ knowledge of fundamental concepts in chemistry, earth sciences, and physics, and understanding of scientific inquiry including practical experimental skills of planning and conducting investigations. weekly lab activities consisted of one two-hour class, in which learning tasks were undertaken in small groups, using everyday materials for hands-on experiments. the groups were advised to work together (i.e., not to split but to work as an intact group on their activities). details of the five labs’ science activities can be found in appendix a.

2.3 data sources

2.3.1 video-recorded observations

of nine groupwork labs undertaken, five were video-recorded, filming the groups in their initial three weeks working together, then mid-semester, and their final group activity, providing a macro-temporal perspective spanning the semester. the teacher-instructed labs included collective hands-on experimenting followed by group science reasoning.
activities included shared planning, experimentation, and conceptual reasoning. almost eight hours (479.5 minutes) of video-footage of the two groups in five labs were coded and analysed (see table 2 for number of coded interactions and breakdown by off-task and on-task). the duration of coded observations for each activity ranged from 31-55 minutes.

2.3.2 group interviews

conversation style focus group interviews (approximately one hour with each group separately) were audio-recorded and transcribed. they elicited participants’ feelings and perceptions regarding their groupwork. although all participants were present for all video-recorded labs, one participant in both groups declined the interview. participants were invited to start the conversation with a general question asked to ignite discussion: “what would you like to share [with us] about your experience in the labs?” then followed conversation that was inspired by a video-stimulated recall interview approach (sherin, 2004). videoclips were shown to stimulate informal discussion and directly tap group members’ elucidations of their group interactions, followed by the question, “what would you like to say about this episode?”

2.4 observational data coding

the video-recordings were systematically coded using the observer xt behavioural coding software. a coding scheme was developed to exhaustively, and exclusively, parse group interactions into one of 18 discrete codes (see appendix b) and trace visible interpersonal affect as concrete behaviours (data examples for each code are provided in appendix c). the unit of analysis was a discrete verbal behaviour (i.e., single utterance) or nonverbal behaviour. coding was undertaken at the individual level as each participant could be enacting different behaviours (off-task or on-task) at the same time.
the scheme was informed by a review of observational research in education and workplace contexts (e.g., jones et al., 2021), with codes adapted from studies in collaborative learning (e.g., isohätälä et al., 2019; rogat & linnenbrink-garcia, 2011), and kauffeld and lehmann-willenbrock’s (2012) act4teams instrument, which has been extensively validated in workplace and university contexts. while only some on-task behaviours included visible manifestations of affect, with a positive or negative valence, all off-task behaviours were conceptualised as affective. off-task codes were exploratory in nature to tap relational small-talk, humour, and laughter targeted to the group (i.e., positive interpersonal affect), or otherwise non-inclusive behaviours (e.g., whispered side-talk), conceptualised as negative interpersonal affect. the non-affective category (fifth column of appendix b) represents behaviours with no overtly obvious affective valence. the code empty-talk (adapted from kauffeld & lehmann-willenbrock, 2012) was used 103 times of 6,500 coded behaviours (1.6%), therefore was excluded from analysis, as reviewing their occurrence suggested these instances did not impact group interaction. the codes, representing broad interpersonal affect behavioural types, and valence, aimed to tap interpersonal affect as it occurs in the kinds of hidden in plain sight, everyday behaviours of social interaction, what slaby (2016) referred to as the relational affect that is reflected in the ways we behave and act together in a situation, and which do not necessarily always involve expression of a particular emotion. in the off-task categories, for example, codes are generic (e.g., “small-talk”, “humour”, see appendix b). 
within on-task categories (positive, negative) codes have several behaviours (identified as salient affect behaviours in empirical research as discussed above) grouped together (e.g., “abrupt, curt or rude behaviours; interrupting to over-rule; ignoring”) as the aim was to denote the general type of interpersonal affect behaviours, and valence. for example, “complaining; negative utterances” can be directed to objects, or the task, whereas “criticizing/running someone down” is clearly directed towards other/s, therefore a different kind of interpersonal affect.

initial exploratory data analysis by the first author inspired a draft of the coding scheme, which was trialled with a second researcher (fourth author) from a dissimilar sociocultural milieu as we considered that researchers from diverse sociocultural contexts might contribute richly distinct viewpoints on interpersonal affect, and address observer biases. the trial process included joint viewing of video-clips, sharing conceptual and empirical understanding of events (rogat & linnenbrink-garcia, 2013), then an iterative process involving individually coding test data followed by joint meetings, and further independent coding and comparison. following this, systematic coding, including inter-rater reliability coding (hallgren, 2012), was conducted. to assess inter-rater reliability, a portion of each video was coded by two researchers (1,753 behaviours, 27%). segments for inter-rater coding were randomly selected to comprise a portion from each lab for each group. the overall average inter-rater reliability produced a cohen’s kappa of κ=.86. by coding category, off-task agreement overall was κ=.92 (positive κ=.90; negative κ=1.0). on-task agreement overall was κ=.85 (negative κ=.82; positive κ=.85). table 1 lists the breakdown of inter-rater agreement by group and lab.
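as an illustration of the agreement statistic reported above, the following is a minimal sketch of cohen's kappa for two coders who labelled the same behaviours. it is not the study's own procedure: the function name `cohens_kappa` and the six example codes are hypothetical, and the observer xt software may compute agreement differently (e.g., with time-based tolerance windows).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same sequence of behaviours."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of behaviours")
    n = len(rater_a)
    # Observed agreement: share of behaviours given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over codes.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for six behaviours (illustrative only, not study data).
coder_1 = ["small-talk", "side-talk", "humour", "small-talk", "side-talk", "laughter"]
coder_2 = ["small-talk", "side-talk", "humour", "humour", "side-talk", "laughter"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

kappa corrects raw agreement for the agreement expected by chance, which is why it is generally preferred over simple percentage agreement when some codes are far more frequent than others.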
disagreements were resolved through discussion and repeated observations, and a small portion (n=43) of highly ambiguous behaviours were coded collaboratively rather than individually (rogat & linnenbrink-garcia, 2013). these were typically in the on-task negative interpersonal affect category, and involved unravelling ambiguous episodes (i.e., several exchanges) that appeared to have some abrupt, curt or rude, behaviours in the context of the interactive flow, such as whether other/s had been deliberately, or inadvertently, ignored in discussion.

table 1
inter-rater reliability agreement (cohen’s kappa)

               week 2   week 3   week 4   week 7   week 12   all labs total
group a        .85      .92      .81      .90      .90       .89
group b        .89      .76      .82      .75      .73       .81
both groups    .87      .88      .81      .86      .87       .86

2.5 data analysis

2.5.1 frequency analysis

the coded data were exported from observer xt for each group by lab, and their frequencies tabulated and analysed. the data analysis comprised three steps, which reflect the gradual zooming into the data, starting by focusing on the groups’ interactions to identify the extent of off-task and on-task interaction. next, moving to their interpersonal affect within interactions, the coded data were analysed by group, in each lab to develop a picture of the groups’ evolutionary trajectories over the semester. the groups’ evolutionary trajectories were analysed in terms of off-task and on-task interactions, and interpersonal affect, to highlight the emergence of interactive patterns over time, for each group. then, the analysis focused on the breakdown of the visible behaviours that were coded as evidence of interpersonal affect in each group.
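the tabulation step described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' analysis script: the row layout (group, lab week, interaction type, valence), the function name `interaction_shares`, and the six example rows are all hypothetical stand-ins for an export of coded behaviours.

```python
from collections import Counter

# Hypothetical export rows: (group, lab week, interaction type, valence).
# "none" marks on-task behaviours with no visible affective valence.
coded_behaviours = [
    ("a", 2, "on-task", "none"),
    ("a", 2, "on-task", "positive"),
    ("a", 2, "on-task", "none"),
    ("a", 2, "off-task", "positive"),
    ("b", 2, "on-task", "negative"),
    ("b", 2, "off-task", "negative"),
]


def interaction_shares(rows):
    """Percentage of each interaction type within every (group, week) cell."""
    counts = Counter((group, week, kind) for group, week, kind, _ in rows)
    totals = Counter((group, week) for group, week, _, _ in rows)
    return {key: 100.0 * n / totals[key[:2]] for key, n in counts.items()}


shares = interaction_shares(coded_behaviours)
for (group, week, kind), pct in sorted(shares.items()):
    print(f"group {group}, week {week}, {kind}: {pct:.1f}%")
```

repeating the same grouping with valence or with individual behaviour codes as the final key would correspond to the second and third zoom levels of the analysis.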
2.5.2 qualitative analysis of interpersonal affect in the fluctuation of off-task and on-task interaction

the coded observations were then qualitatively analysed in the observer xt by first temporally segmenting each of the ten videos (two groups in five labs), into 30-second segments with brief descriptive labels for an overview of the entire dataset for each group (isohätälä et al., 2018; näykki et al., 2014). the video-recordings were also transcribed in the observer xt. an “elaborated running record” (rogat & linnenbrink-garcia, 2013, p. 105) additionally documented salient nonverbal phenomena (i.e., orientation to other/s, eye-gaze, spatial and material use). common episodes across groups were identified in each lab, which highlighted salient comparative events and evidence of interpersonal affect in the fluctuation of off-task and on-task interactions.

2.5.3 qualitative analysis of group interviews

the interviews provided a perspective of interpersonal affect as the felt experiences of the participants, and their interpretations of their own and others’ interactions regarding task and relational aspects of their groupwork experience. qualitative content analysis of the focus group interviews (huber, 2020) was conducted after the video-recordings had been coded and fully analysed in the above steps. the analysis was undertaken in two phases. first, content of participants’ talk was explored in terms of: i) members’ own feeling states (negative or positive valence) expressed regarding relational aspects of their groupwork, or; ii) interpersonal perceptions about other/s (negative or positive valence) or about others’ affect state/s; iii) negative or positive comments about the learning tasks; iv) negative or positive comments about task interactions; and v) perceptions of how they got on as a group.
the interview transcripts were then explored for any other phenomena that may be insightful regarding the groups’ interpersonal dynamics, such as whether participants exhibited agreement regarding their perceptions.

3. results

3.1 interpersonal affect in off-task and on-task interactions, by valence, and over time (rq1)

the findings addressing the first research question are reported in three sub-sections, reflecting the focus of the three data analysis steps: interactions; interpersonal affect; and visible interpersonal affect behaviours. two main group differences emerged, consistent with the self-reports of contrasting negative (group b) and positive (group a) group dynamics. first, interpersonal affect was overall more negative than positive in group b, and highly positive overall in group a; and, secondly, group a’s interactions both off-task and on-task exhibited minimal presence of the side conversations evident in group b.

3.1.1 interactions: by off-task and on-task, and evolution over time

the breakdown of off-task and on-task interactions overall is presented in table 2, showing that off-task interactions comprised over 20% of all interactions in each group.

table 2
breakdown of off-task and on-task interactions overall

                     group a           group b           total
                     frequency (%)     frequency (%)     frequency (%)
total interaction    3,672 (100.0)     2,828 (100.0)     6,500 (100.0)
off-task             1,124 (30.6)      628 (22.2)        1,752 (27.0)
on-task              2,484 (67.7)      2,161 (76.4)      4,645 (71.5)

note:
1. the total of off-task and on-task does not equal 100% as empty talk was excluded from analyses due to evident minimal impact on group interaction and minimal appearance in both groups.
2. all off-task interactions were conceptualised as inherently capturing interpersonal affect.
3. not all on-task interactions exhibited visible affect.
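note 1 to table 2 can be checked with simple arithmetic: the gap between each group's total and its off-task plus on-task counts should correspond to the excluded empty-talk code. the sketch below uses the frequencies from table 2; the dictionary layout and variable names are illustrative assumptions, and the per-group split of the remainder is derived here rather than reported in the text.

```python
# Frequencies taken from table 2; the data structure itself is an assumption.
table_2 = {
    "group a": {"total": 3672, "off-task": 1124, "on-task": 2484},
    "group b": {"total": 2828, "off-task": 628, "on-task": 2161},
}

# Behaviours counted in the total but in neither off-task nor on-task rows.
remainders = {
    group: row["total"] - row["off-task"] - row["on-task"]
    for group, row in table_2.items()
}
total_excluded = sum(remainders.values())
total_coded = sum(row["total"] for row in table_2.values())
excluded_share = 100.0 * total_excluded / total_coded

print(remainders)
print(f"excluded: {total_excluded} of {total_coded} ({excluded_share:.1f}%)")
```

the combined remainder agrees with the 103 empty-talk behaviours (1.6% of 6,500) described in section 2.4.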
off-task interactions were present in both groups in every lab, with a remarkably similar temporal pattern in the frequencies of off-task and on-task interaction across groups over the five labs, shown in figure 1. this similarity suggests the presence of common contextual factors (i.e., the task activities) and therefore the need to look beyond task characteristics (casciaro & lobo, 2008) to understand what contributed to the starkly differing valence of the interpersonal affect across groups.

figure 1. evolution of off-task and on-task interactions over time.

3.1.2 interpersonal affect: by valence, off-task and on-task, and its evolution over time

concerning valence, the two groups were starkly different in their overall interpersonal affect, within both off-task and on-task interactions. as shown in the upper part of table 3, group a exhibited overall 91.0% positive and 9.0% negative, and group b 47.0% positive and 53.0% negative interpersonal affect. these findings align with the two groups’ self-reports of their dynamics at semester end (reported later in section 3.3.3), suggesting that the visible interpersonal affect behaviours identified through the coding (reported in section 3.1.3) contributed to the groups’ contrasting social dynamics outcomes.
table 3
breakdown of interpersonal affect by valence overall and within off-task and on-task interactions

interpersonal affect by valence    group a frequency (%)    group b frequency (%)
overall (off-task + on-task)
  positive                         1,748 (91.0)             534 (47.0)
  negative                         173 (9.0)                600 (53.0)
off-task                           1,124 (100.0)            628 (100.0)
  positive                         1,024 (91.1)             224 (35.7)
  negative                         100 (8.9)                404 (64.3)
on-task                            797 (100.0)              506 (100.0)
  positive                         724 (91.0)               310 (61.3)
  negative                         73 (9.0)                 196 (38.7)

[figure 1 data labels (% of interactions; y-axis 0–100), weeks 2, 3, 4, 7, 12. group a: on-task 63, 76.3, 79.2, 55.7, 68; off-task 35, 22.5, 19, 41.5, 31. group b: on-task 78.3, 88.4, 81, 61, 73.4; off-task 19.4, 10.6, 18.5, 36.3, 26.]

the breakdown of interpersonal affect by valence within off-task and on-task interactions is shown in the lower sections of table 3. the findings for group a are particularly striking, being almost identical for off-task and on-task (i.e., around 91.0% positive and 9.0% negative). in contrast, group b displays more negative than positive interpersonal affect overall (53%), and somewhat opposite findings across off-task and on-task, specifically, off-task being 35.7% positive and 64.3% negative, and on-task interpersonal affect 61.3% positive and 38.7% negative. the patterns of group b suggest that here too interpersonal affect may traverse off-task and on-task. both groups exhibited more positive than negative interpersonal affect when on-task (see second last row of table 3). however, off-task and on-task interaction could operate at the same time (e.g., member/s in side-talk and other/s on-task) and likewise negative or positive interpersonal affect could coincide, showing that the full picture of how interpersonal affect in off-task and on-task interactions intersected during groupwork is more complex. this is examined qualitatively in rq2.
a temporal overview of the breakdown of interpersonal affect by valence within off-task and on-task interactions is shown in figure 2. within on-task interactions the breakdown by valence across labs shows a systematically higher percentage of positive interpersonal affect in group a than in group b but a relatively consistent pattern over time in both positive and negative interpersonal affect across groups. in contrast, within off-task interactions, both groups display noticeable fluxes across the labs, but for group a it is in regard to their positive interpersonal affect while for group b it is in regard to their negative interpersonal affect, with off-task interactions a relatively high source of negative interpersonal affect. of note, however, in the first lab, both groups’ positive interpersonal affect dominated their interaction, especially off-task. yet, group b also started with around 5% negative interpersonal affect off-task and on-task (10.7% combined), which in the second lab rose slightly, and increased off-task thereafter. in contrast, figure 2 shows group a’s negative interpersonal affect was 3.5% for off-task and on-task combined, and remained under 5% over time, with more positive interpersonal affect in both off-task and on-task interactions across labs. this raises the question of the interpersonal affect arising in early group life, and its potential function in the divergent dynamics of the groups over time, which is qualitatively explored in rq2.

figure 2. evolution of interpersonal affect by valence in off-task and on-task interactions over time.

3.1.3 visible interpersonal affect behaviours: by valence, off-task and on-task, and their evolution over time

zooming in to the visible interpersonal affect behaviours of the two groups, figure 3 presents a temporal overview for each group of their positive or negative interpersonal affect behaviours off-task.
[figure 2 panels: group a and group b; series: pos off-task, neg off-task, pos on-task, neg on-task; horizontal axis: wk.2, wk.3, wk.4, wk.7, wk.12; vertical axis: %]

(a full breakdown of the two groups’ positive and negative interpersonal affect behavioural codes off-task and on-task over time is provided in a table in appendix d.) regarding off-task interactions, the most striking group difference was the way in which group a started, and stayed positive over time, compared with group b. figure 3 shows the difference between the two groups off-task (inherently affective relational interactions) regarding side-talk. overall, this comprised half (50.3%) of all off-task interaction of group b, compared to 5.6% in group a. side-talk is off-task chat that innately excludes member/s because its content is not inclusive, or due to its low volume (i.e., whispering), or corporeal positioning (e.g., turned away from other/s). figure 3 shows that side-talk emerged in group b’s first lab (3.2%), increasing steadily over time to peak in week seven (18%). its manifestation and evolution as a pervasive interpersonal affect behaviour over time in group b, and its impact on group dynamics and task participation are examined in rq2.

in contrast, figure 3 shows that in their first lab group a exhibited a lot of positive small-talk, involving humour and laughter. small-talk was relationally positive by its characteristics (e.g., content and volume were group-inclusive), and remained comparatively high in group a over time. group b’s positive small-talk was far less frequent, with group-level off-task humour and laughter consistently lower and decreasing over time.

figure 3. evolution of visible positive and negative interpersonal affect behaviours in off-task interactions.
regarding on-task interactions, interpersonal affect was coded into five negative and five positive behaviours; for clarity of presentation these are shown separately (figures 4 and 5, respectively), giving an overview of each group over the semester. scrutinising negative interpersonal affect behaviours within on-task interactions (figure 4) reveals a key intergroup difference: the absence of splitting the group in group a. in group b, splitting was apparent in the first lab, but decreased over time. as side-talk showed a temporal increase (figure 3), the possibility of a link between these two behaviours across off-task and on-task interaction is examined in rq2.

considering positive interpersonal affect behaviours within on-task interaction (figure 5), across groups (although varying in frequency), efforts in lightening the atmosphere (e.g., task-related humour) featured most, and typically followed a similar pattern. in group a, laughter closely tracked lightening the atmosphere over time, suggesting member/s responding to lightening contributions, such as responding to a task-related joke with laughter. in contrast, in group b although laughter followed lightening at the beginning, it steadily decreased over time, but peaked in the final lab, suggesting there typically was not the same response to lightening contributions as in group a.

[figure 3 panels: group a and group b; series: small talk (pos), humour, laughter, neg small talk, side talk, on phone; horizontal axis: wk.2, wk.3, wk.4, wk.7, wk.12; vertical axis: %]

figure 4. evolution of visible negative interpersonal affect behaviours in on-task interactions.

figure 5. evolution of visible positive interpersonal affect behaviours in on-task interactions.
in summary, an evolutionary perspective of interpersonal affect behaviours indicates that overall, group a, both off-task and on-task, started and remained positive over time. in contrast, group b started somewhat positive but the negative interpersonal affect evident from the beginning appeared to take root, increasing over the semester, most evidently off-task. overall, the coding analysis shows that interpersonal affect was pervasive across groups, with its valence transcending off-task and on-task interaction, in both groups. the presence of both off-task and on-task interaction took a remarkably similar pattern across groups over the semester, indicating similar task conditions, and therefore the need to look beyond the task to explore the groups’ different interpersonal affect trajectories, and contrasting dynamics outcomes.

3.2 manifestation of interpersonal affect in the fluctuation of off-task and on-task interaction (rq2)

qualitative analysis explored how the interpersonal affect behaviours identified in the first research question actually manifested in dynamic interactions and contributed to the contrasting group dynamics outcomes reported by participants at semester end.
the interplay of off-task interactions in their natural fluctuation with on-task interactions was explored, focusing in particular on the contrast of side-talk (which was over half of all off-task interaction in group b, and just 5.6% in group a).

[figure 4 panels: group a and group b; series: complaining / neg talk, criticise / run down, abrupt / curt / ignore, disruptor, splitting group; horizontal axis: wk.2, wk.3, wk.4, wk.7, wk.12; vertical axis: %]

[figure 5 panels: group a and group b; series: inclusive, praise/support, enthusiasm/interest, lighten atmosphere, laughter; horizontal axis: wk.2, wk.3, wk.4, wk.7, wk.12; vertical axis: %]

3.2.1 developing social cohesion

early in their first lab, both groups started with a high task focus, with longer episodes of social (off-task) chat occurring late in the lab when students finished the task (e.g., they had cleaned activity materials away, and had stopped discussing their activity outcomes). their first task involved preparation, and observations of two products (see appendix a for task information). the following brief excerpts present each group’s first moments, as they commenced. in group b, one member suggested splitting to manage the task’s two experiments. the teacher, overhearing, instructed that groups undertake the activities together:

excerpt 1 group b initial interactions
nick     alright [standing as teacher finishes explaining activities; glances to nell, then to lisa and abby, who are talking quietly together]
lisa     [responding] alright, i’ll get the stuff for oobleck
nick     i’ll try silly slime, i’m no cook!
nell     well do we want to split in twos: two make the oobleck and two make the silly slime?
nick     okay, good idea!
lisa     no [frowning, mouth turned down]
nell     [responds to lisa] or, make it all together?
lisa     i don’t want to miss out on making both of them [smiles]
teacher  [overhearing, tells the group, also reiterating to the class]: no, make it all together

group b thus started amicably, as did group a, yet subtle differences were apparent:

excerpt 2 group a initial interactions
eric     alright [standing as teacher finishes explaining]
anna     let’s go!
eric     yeah? let’s go grab the stuff [all four stand]
anna     okay. i’ll grab the playdough
eric     take that over there so we can just check it [points to lab manual]
anna     yep
eric     singing: check yourself before you wreck yourself [as they go together to the materials table]

group a commenced similarly to group b, with anna saying, “i’ll grab…”, to which eric immediately responded, “so we can…check…”, subtly reframing the materials gathering as a collective process. a few minutes later, they returned together with materials for their first product, and subsequently together collected materials for the second activity. in group a, the collective start provided task affordance for group relational development (i.e., social and task cohesion). this was evident in the way that each product’s preparation involved all members working together, with relatively high on-task humour and laughter (see figure 5) that involved all four participants.

conversely, group b participants returned separately in dyads a few minutes apart, each with a tray of materials. they commenced working collectively (therefore not coded “split-group”). yet, embedded in their language was an implicit reference to dyadic ownership of product preparation (i.e., the two activities), which they all expressed. for example, nell: “did you guys get the kettle water?” lisa: “no.
ours is just normal water;” nick: “oh, we needed hot water;” lisa: “shall we do yours first?” abby: “are we doing ours yet?” accompanying these kinds of comments, a split-group also emerged occasionally, as lisa quietly discussed on-task with abby, sometimes subtly resisting nick’s contributions. for example, at the beginning of their first lab, nick extended his arm to assist lisa, who was mixing some of the materials she and abby had collected. lisa told nick that she just needed a spoon, and he immediately dropped his arm. lisa’s manner was not overtly curt, but nor was it inclusive (coded conservatively task na) and these kinds of borderline interactions became the norm in the group (e.g., “yeah wait”, “hang on”, “just read the-!”, “you’re reading the wrong one!”).

in the second lab, latent dyadic subgroups emerged again in group b, with nell asking nick to help gather materials, while lisa and abby sat chatting off-task for five minutes, making no move to join the task. there was also no attempt to include them; thus the latent dyads of the previous week were tacitly endorsed by all participants, and their pattern of commencing labs with off-task and on-task dyads continued over the semester. in contrast, group a collected materials together, side-talk was usually brief, and interactions were characterised by positive interpersonal affect such as small-talk involving humour and laughter that was group-level (i.e., involved all four members). a qualitative difference between the two groups regarding their off-task interaction, distinguishing side-talk in group b from the small-talk typical of group a, was its low volume, and (generally the same) side-talkers, sometimes turned towards one another exclusively. group b’s split group on-task and side-talk off-task characterised early low social cohesion.
this appeared to create procedural confusion with at times two dyads interacting separately, sometimes not knowing what the other was doing or saying. within this context, occasionally other negative interpersonal affect behaviours arose (e.g., highly directive interactions, ignoring) as member/s tried to ascertain what had been done, where they were up to, and so on. in this way the dyadic interactions contributed to the overall group dynamics, not only as non-cohesiveness but also an undertone of tension that occasionally surfaced as the visible negative interpersonal affect behaviours reported in section 3.1. this established the basis for ongoing interactions and highlighted a key group-level relational difference between the two groups, whereby group a for the most part interacted as a group, and group b interacted increasingly in dyads.

3.2.2 interpersonal affect in the evolution of group dynamics and task participation over time

the analysis of the interplay of interpersonal affect in the fluctuation of off-task and on-task interaction over the semester highlighted another key difference between the groups in how each group’s interactive dynamics evolved over time. group-level attentiveness to one another was consistently evident in group a. in contrast, in group b, low attentiveness to one another as a group appeared exacerbated by subgroup emergence. the contrast in group-level interpersonal attentiveness is illustrated in the following brief excerpts from week four, in which groups had to plan, conduct, and document an experiment. the first excerpt is characteristic of group b’s communication:

excerpt 3 inattentiveness in group b
nell     so, what’s our hypothesis? [reading aloud from lab book as abby was verbalising a hypothesis, which nell ignores]
lisa     missy here [signals that abby has a hypothesis] here, just, go on! [encourages abby to continue.
nell glances briefly at lisa, then to nick, who is writing]
abby     briefly laughs [appears shy, quietly spoken, looking down at her writing]
abby     the more we increase the vinegar the ... [starts reading hypothesis again; nell ignores abby, looks to nick]
nick     if we increase the vinegar volume the reaction...decrease [abby stops speaking as nick speaks]
nell     the quicker the reaction rate
nick     yeah. the reaction rate should quicken

following these interactions, lisa and abby had a brief, quiet exchange. in excerpt 4 below, group a had discussed and agreed their experiment, then sam suggested an alternative, but without justification. using science reasoning, eric and anna opposed the idea, to which sam remarked “yeah, okay”, looking downwards, and becoming quiet. a few minutes later, eric appeared to take responsibility for group harmony, checking if sam was happy with their decision:

excerpt 4 attentiveness in group a
eric     are you happy with that sam?
sam      yeah, i just don’t know how you’d- it should be alright, it should be alright
eric     what’s your question?
sam      how long it will actually be in the air though…to get a good measurement. we can give it a go and then we’ll find out
eric     … we’ve got the trial
sam      there’s only one way to find out anyway so, as i say [emphasizing his contribution]
anna     yeah, let’s just trial then modify

here, eric exhibits interpersonal attentiveness, as sam had been quiet and appeared withdrawn as he looked downwards and stopped interacting with the group for almost two minutes (1 minute, 54 seconds). during this time eric continually made task-related jokes (lightening the atmosphere). after asking if sam is happy with the group decision, eric showed further interest in sam’s thoughts: “what’s your question?” following this, suzi initiated an off-task relational episode, which appeared to reweave the social fabric of the group.
this was evident by all members engaging in the talk and sharing personal information. later, in a similarly challenging episode, another off-task relational conversation followed. the excerpts are characteristic of each group’s myriad, fleeting yet pervasive behaviours of interpersonal affect that together cocreated each group’s social space, illustrating how group a participants routinely exhibited interpersonal attentiveness. conversely, group b participants unintentionally, and deliberately, ignored (i.e., inattentiveness) one another. the analysis revealed the way in which interpersonal attentiveness, highlighted by its consistent presence in group a and its relative absence in group b, was a subtle but relevant form of positive interpersonal affect in the groups’ face-to-face task interactions. the groups’ contrasting interpersonal affect was further emphasized later in the semester in week seven when an off-task peak across groups (e.g., see figure 3) occurred. according to conversation across groups, this appeared due to a combination of two broader contextual factors. first, students had just returned following their first practicum, which permeated off-task conversations. second, the task (electrical circuits) was considered challenging, stated by teachers and students alike. in group b, after initially working as a group, the subgroups emerged with one dyad increasingly off-task and the other on-task, and tensions surfaced (e.g., lisa: “she doesn’t want my help, i’m not smart enough for this”, nell: “what? well, you’re more than welcome to try!”). in contrast, in group a, although there was also uneven task participation with two members doing the lion’s share of making electrical circuits, the group typically engaged together more in small-talk while exploring with the electrical circuits. 
summarising, the qualitative analysis showed how interpersonal affect behaviours in the interplay of groups’ off-task and on-task interaction in early group life evolved into their diverging relational trajectories (group dynamics) and task participation. specifically, in group b, the early appearance of side-talk off-task and splitting the group on-task, although minimal in the first lab, evolved as an implicit interactive (social) norm, and in contrast, group a started, and stayed positive and intact as a group.

3.3.3 self-report interpretations of groupwork experience

the groups’ overall negative or positive interpersonal affect extended into the focus group interviews, and group a expressed being “lucky” regarding their positive experience of their groupwork. group b members, complaining about their groupwork experience, reported that researchers would see plenty of off-task chat, summarily dismissing the frequent side-talk (e.g., explaining that they “just like talking while doing our work”). overall, group a’s self-reports largely aligned with researchers’ observations. group b members reported negative experiences, which aligned with the observational data, but also displayed a lack of awareness regarding their own behaviours in cocreating the group’s social dynamics.

group a commenced with a focus on their learning experiences, agreeing that the experiments were fun, but the conceptual reasoning highly challenging. this involved each group discussing and producing a group reasoning statement linking their observations and results of experiments using everyday household materials, with the relevant science concepts. group a participants discussed how their group context supported their science learning through this activity.
for example, anna: “i think at the beginning i felt really nervous” but members were “bouncing ideas off each other…i think the confidence came from the groupwork and actually just, having fun.” the others agreed, suggesting that suzi too (absent from the interview) had enjoyed their groupwork. watching video-clips of their final lab, they commented on their task participation, anna reflecting, “it was good that we all contributed…”, eric agreeing: “yeah. you can see that everyone’s really involved…it was good.” they explained how over the semester they engaged everyone by rotating critical task elements including turn taking with leading the conceptual reasoning talk and documenting their joint reasoning statement, sam noting, “i think by the time we started rotating we were all pretty comfortable with each other.”

participants discussed their positive interaction off-task, which anna believed had supported her learning: “for me…the contact of the group…we had a little chat and then we got into it…”, adding that it had changed her negative perspective of groupwork. they all reflected that their interactions off-task enabled them to relate well across the age-divide through showing reciprocal interest in one another’s diverse leisure pursuits, thus revealing how they utilised the affordances of off-task chats to bridge their individual differences. the oldest member praised his peers as “champs” in this regard which meant that the group members “were able to relate and talk about things other than just science”, noting awareness of the discord other groups experienced.

in contrast, group b participants commenced with “where’s the smart one?” (nell) referring to nick who was absent, then briefly mentioned that they all “hated” the conceptual reasoning before moving swiftly to the relational realm: “it’s hard work being in groups” (nell).
lisa and abby discussed how they liked chatting (off-task) as they worked, saying “that’s just what we do” (abby), which they perceived nick disapproved of (what they referred to as “gossiping”) and so ignored them. they suggested nick was too task-focused (nell: “he’s like nuh, it has to be all science”). yet, their first lab together also shows nick initiating small-talk with the group, which he continued to do over the semester. they were all vocal regarding their perceptions that nick had ignored lisa and abby’s task contributions from the start, lisa repeatedly stating frustration about feeling unheard, while abby commented: “i just gave up saying anything because he didn’t even listen to me. so, i said nothing.” watching video-clips of their final lab elicited further relational dynamics comments, lisa stating: “i was getting so frustrated this day” because nick insisted his boat would be the group boat. however, lisa and nell then explained “we sort of just gave up and we were like, nick, you just make the boat” (on behalf of the group), which reflects more closely what actually occurred. they attributed the negative group dynamics to nick, and ultimately to age difference. while age difference was also present in group a, it was reported as unproblematic (e.g., the above-mentioned comment of the oldest group a member praising his peers as “champs” for how they all engaged as a group both on-task and off-task). the three group b members repeatedly commented, “we were very chilled, all three of us”; “we’re just really relaxed people…” (referring to the high amount of side-talk). yet, they also repeatedly mentioned how “angry” lisa would become during their groupwork, at odds with the “relaxed” comments, and the bickering (e.g., abrupt, curt) between lisa and nell that appeared in all five labs.
this went unmentioned in the interview, potentially highlighting a limitation of group interviews, where member/s may not be comfortable fully expressing their real feelings and perceptions of their groupwork experience, although abby and nell expressed feeling uncomfortable when lisa became angry (which was attributed to her frustration with nick ignoring her task contributions). in sum, group a’s accounts largely aligned with researchers’ observed salience of interpersonal affect dynamics in the interplay of off-task and on-task interactions. conversely, although group b’s accounts reflected their negative dynamics, their self-reports diverged from researchers’ observations (coding frequencies and fine-grained qualitative analysis), which indicated that all members had contributed to (cocreated) the group’s dynamics.

4. discussion

the present study focused on the relational realm of groupwork, emphasizing the important function of interpersonal affect as collectively manifest and dynamically cocreated by all members in the social dynamic of groups. the analysis of interactions confirmed self-reports regarding participants’ perceived negative or positive group dynamics outcomes, showing that interpersonal affect which arose early in off-task and on-task interactions swiftly became interactive patterns that shaped task participation and group dynamics outcomes. this study extends the limited case study research on affect in group interaction as it unfolds in real time, unveiling the microlevel interpersonal affect behaviours that evolve as group patterns, and their function in the collaborative variability that continues to be reported in groupwork (e.g., lobczowski et al., 2021).
the coding scheme was instrumental for capturing the frequency, valence, and temporal evolution of interpersonal affect through behaviours manifest in the natural fluctuation of off-task and on-task interaction for a more complete picture of how social dynamics sequentially unfolded (langer-osuna et al., 2020). while group dynamics research has largely relied on static, post hoc self-report methods, this process-oriented study provided a dynamic perspective (vriesema & mccaslin, 2020) that unveiled how in both groups, visible interpersonal affect behaviours comprised a vital piece of the group dynamics puzzle (barsade & knight, 2015). insights afforded through a sociodynamic perspective of affect in the relational realm of groupwork are considered below in terms of key findings, and their implications for groupwork in higher education. the sociodynamic conceptual lens illuminated the innately interpersonal nature of affect as a jointly manifest and pervasive component of group interaction that was irreducible to any one participant (mesquita & boiger, 2014). its perceptible nature (bartel & saavedra, 2000) in the behaviours of participants was traced as dynamically woven through the natural ebb and flow of the groups’ on-task and off-task interactions that cocreated each group’s social space (langer-osuna et al., 2020). the visible nature of interpersonal affect can be viewed through philosopher sheets-johnstone’s (2009) perspective of affect as fundamentally compelling actors’ towards or away from (the group), reflecting the way in which “relational affect” is manifest through interactions that are not always emotion expressions (slaby, 2016). likewise, social cohesion is broadly defined as “the attraction of members to one another and to the group as a whole” (forsyth, 2014, p. 136). 
importantly, our systematic, microlevel analysis revealed the collective interpersonal affect behaviours that evolved so differently in the two groups, with the opportunity for social cohesion thwarted early in group b despite some positive efforts (evidenced in the coding results). alternatively, social cohesion developed early in group a and was sustained all semester, withstanding inevitable challenges (e.g., during week four, illustrated in excerpt 4). the finding that early interpersonal affect shaped both groups’ interactive patterns (i.e., negative or positive) over the semester aligns with previous studies that have identified the influence of early affect in ongoing group processes (e.g., bakhtiar et al., 2018; kwon et al., 2014; näykki et al., 2014), reflecting group development theories regarding the tenuous nature of early group life (braun et al., 2020). it highlights the critical role of early interpersonal affect as enacted behavioural phenomena, for the ongoing function of groups.

a key finding was the difference between the two groups of early latent subgroup emergence in group b while group a started, and stayed, intact. fine-grained analysis illuminated how in group b subgroups developed through seemingly inconsequential behaviours that solidified into increasingly negative interpersonal affect over time. group dynamics scholars have cautioned about the propensity of subgroups for creating tension and conflict (forsyth, 2014), which the present study not only affirmed but also showed how such subgroups actually emerged. the development of subgroups is underexamined, yet their presence has been observed as unhelpful in higher education groupwork. for example, näykki et al.’s (2014) case study of group conflict showed participants providing dyadic support for one another, which was not advantageous at group-level, and tensions ultimately diminished the group’s task engagement.
in the present study, group b, by their own admission, left one member to do the group task alone in the final lab. conversely, in group a, qualitative analysis revealed that task functions were undertaken dyadically only rarely and briefly, and then always in different dyads, which appeared a fruitful way of preventing subgroups from inadvertently developing. this is important for students and educators to be aware of since group tasks often involve some activity dispersion. furthermore, the socially complex dynamic of subgroups, and their consequences also need to be better understood. the off-task and on-task dyads that were present in group b not only reduced all-group task participation but also, importantly, decreased participants’ opportunities for improving their social dynamics, and collaboration skills. in group b this appeared to create a kind of spiral effect, not only increasing negative interpersonal affect but also further entrenching the subgroups. the detrimental impact of the subgroups echoes collaborative learning literature highlighting the importance of working truly together on a task (e.g., dillenbourg, 1999; summers & volet, 2010). extending the research, which has shown the widespread propensity for students to divide tasks, reducing opportunities for joint engagement (oţoiu et al., 2019), the present study also revealed the important relational implications this can have, including subgroup development off-task. especially in first-year university the opportunity to establish subgroup friendships off-task while working within a group may be enticing but, as the present study suggests, can be detrimental for group dynamics and task participation.
moreover, the qualitative analysis also showed that side-talk appeared even before tension was evident, signalling its potential in contributing to subgroup emergence also in positive groups, therefore participants need to be aware that seemingly inconsequential side-talk can be counterproductive if frequent and prolonged. indeed, one group a member made a significant contribution to side-talk, which was typically responded to only briefly, preventing its establishment as a relational dynamic, and the potential for subgroup development through off-task chat. alternatively, off-task talk when at whole group-level, enhanced group cohesion (barkaoui et al., 2008). during their interview, group a members themselves attributed the social cohesion they had developed as helpful to what they acknowledged as the challenging task of their group science reasoning. in contrast, at their interview group b participants expressed their aversion to the science reasoning in each lab, which the video-recordings showed at times appeared exacerbated by members not responding to each other’s contributions. this may have fuelled perceptions of this aspect of their groupwork as highly negative (rather than challenging) since strong emotion was also expressed about being ignored. the fine-grained qualitative analysis revealed that another key difference between the two groups was interpersonal attentiveness, with its relative absence in group b a key source of aggravation. as external observers it was relatively easier to discern attentiveness in group a’s responses to one another when coding interactions. in group b, systematic lack of acknowledgement of contributions came from all members, exacerbated by nonverbal behaviours such as low eye-contact, making it difficult to distinguish if participants were deliberately ignored or literally unheard. it appeared a combination of both, stemming from the subgroups and in turn further cementing them. 
do and schallert’s (2004) study of affect in class discussions found that adult students “tuned out” for numerous reasons, including if discussion was off-track, or to manage negative affect. when side-talk, and non-responsiveness on-task arose early, group b participants may at times have mentally tuned out to one another. in the literature, interpersonal attentiveness has sometimes been observed as active listening, categorised as a positive socioemotional behaviour (e.g., garcía et al., 2020; isohätälä et al., 2018; rogat & linnenbrink-garcia, 2011). the term interpersonal attentiveness adopted in the present study acknowledges the reciprocal nature of active listening. this is consistent with scherer’s (2005) affect phenomena typology, which includes the interpersonal stances actors adopt in social interaction, such as an “active listening attitude” (garcía et al., 2020, p. 217) displayed through behaviours including eye gaze, nodding, and verbal responses (isohätälä et al., 2018). these were apparent in group a’s high frequency of positive interpersonal affect over the semester (e.g., responses of laughter, reciprocal lightening comments) during task interaction. in group b, an early tendency towards splitting the group, which evolved into subgroup emergence, increased other negative interpersonal affect behaviours, further reducing group-level interpersonal attentiveness. this led to the frustration and anger expressed in the interview; although these were reported as stemming from one participant, the analysis revealed that low attentiveness was visible early from all members (i.e., group-level). group a neither exhibited nor reported being or feeling unheard.
the importance of attentiveness for productive collaboration and positive group dynamics outcomes has been shown in various contexts (e.g., barron, 2003; garcía et al., 2020; rogat & linnenbrink-garcia, 2011; ucan & webb, 2015) and this innately relational (i.e., interpersonal) aspect of groupwork deserves more empirical attention.

5. limitations, future research, and conclusion

being an in-depth case study, our sample of participants was necessarily small, but our real-time behavioural data (n=6,500 frequencies) were substantial, capturing a broad range of interpersonal affect behaviours during both off-task and on-task interaction, providing a full picture of group dynamics. examining the contrast groups in five labs over a semester unveiled the wide range of interpersonal affect behaviours that otherwise would not have been revealed as sociodynamically manifest, their evolution as interactive patterns over time, and their function in task participation and the group dynamics outcomes of each group. the explorative case study design means that while the findings cannot be generalised to other groupwork situations, the contrasting nature of the groups contributes to observational studies that help to explain variability in groupwork outcomes through a detailed exploration of a wide range of interpersonal affect behaviours. theoretically, the collective cocreation of interpersonal affect in groupwork, manifest through as yet underexplored, taken-for-granted behaviours, may be more pervasive and influential than is currently understood. although some participants referred to their prior groupwork experiences in the interviews (three in group b, and two in group a) and in video-recordings, the study did not include individual participant data such as prior experience of groupwork.
it focused instead on the cocreated, collective nature of interpersonal affect in relational dynamics, given it is now typical in educational contexts and in the workplace that actors are expected to enter groups with different levels of collaborative experience as well as other individual differences, such as knowledge, attitudes towards groupwork, and goals. however, future research that also includes individual-level background data could provide important insights into how particular individual differences interplay to influence affect and other group dynamics. related to this point, how affect functions as interpersonal phenomena in socioculturally diverse groupwork settings is an important research area (kuppens et al., 2017; lehmann-willenbrock et al., 2014) of increasing relevance for education, the workplace, and social life more broadly. research on university students’ intercultural social interaction (e.g., ujitani & volet, 2008), for example, has shown that humour expression that is culturally insensitive can result in hurt feelings and misunderstandings.

individual interviews might also have provided further insight in the present study, as students could be reluctant to fully share their real feelings with their peers. having one participant absent in each group interview is a limitation indicative of “messy” real-life research. the focus group interviews did, however, provide a window into the way in which each group spoke of an absent member, reflecting the contrasting group dynamics, and the way in which three group b members had co-constructed their own social meaning of their group dynamics. the study highlights the value of combining self-report and observations for gaining insight into group dynamics (vriesema & mccaslin, 2020), providing affect data as participants’ internal feelings and perceptions, and as visibly unfolding sociodynamic phenomena (barsade, 2002).
importantly, their combined analysis revealed that participants were, variously, more or less aware of their own behaviours and how they themselves created their group experiences and outcomes, revealing the extent to which “perceptions and actual behaviours are related to one another” (lehmann-willenbrock & chiu, 2018, p. 1156). riebe et al. (2016, p. 639) report in their review of higher education teamwork pedagogy that the literature also typically lacks “recognition that students [themselves] have a significant role to play when it comes to the achievement of teamwork learning outcomes.” distinguishing off-task from on-task interactions for analytical purposes confirmed barkaoui et al.’s (2008) finding that all interaction is part of collaboration and therefore should be more widely incorporated into research (langer-osuna et al., 2020). the present study also unveiled actual behavioural referents of interpersonal affect as dynamically evolving joint action in groupwork. perhaps most striking is that, even in close face-to-face proximity around their worktable over an entire semester, group b participants complained of feeling unheard. according to ferreira (2021), a key issue for collaborative learning is whether participants are actually able to “understand what it takes to participate in joint action” (p. 1466). this may be better understood, ferreira (2021) proposes, by adopting an embodied perspective that can more deeply incorporate the function of nonverbal phenomena to provide new insights, such as how bodies are utilised in ways that foster or impede collaboration. this could be a fruitful avenue for exploring more deeply the joint nature (barron, 2003) of interpersonal attentiveness and how it develops as a group norm.
in this study, we explored interpersonal affect in the relational realm of group interaction in a comparative case study with two small groups of students in the same class, who reported contrasting group dynamics outcomes (negative and dysfunctional; positive and collaborative). systematic coding traced the sequential flow of interpersonal affect in groups’ social interaction as it naturally ebbed and flowed through task focused (on-task) and more informal (off-task) interactions, revealing specific behaviours which arose during early group interaction that were formative for the different relational pathways that each group took over time. the study unveiled seemingly routine, everyday interpersonal affect behaviours (e.g., side-talk; humour; laughter) on a micro time-scale which, taken together, unfolded over the semester as interactive patterns, and group dynamics outcomes.

keypoints

exploring the interplay of off-task and on-task interactions enabled unique insights into interpersonal affect in groupwork.
interpersonal affect emergent in groups’ first meetings served as affective inputs to subsequent meetings.
interpersonal affect behaviours evolved over time into macro-temporal interactive patterns and group process outcomes.
combining observations and self-report data revealed variance in participants’ awareness of their own behaviours in co-creating their group dynamics.
the study revealed that subgroups can emerge during off-task or on-task interaction, proving detrimental to group cohesion.

acknowledgments

the authors would like to thank the student and teacher participants in the research. the first author was supported by an australian government research training program (rtp) scholarship through the college of science, health, engineering and education, murdoch university, western australia.
the work of the second and third authors was supported by the australian research council, under the discovery award (dp150101142), and the fourth author was supported by the turku university foundation (no. 080805).

references

american psychological association. (n.d.). affect. in apa dictionary of psychology. retrieved 23 january 2020 from https://dictionary.apa.org/affect
baker, m., andriessen, j., & järvelä, s. (eds.). (2013). affective learning together: social and emotional dimensions of collaborative learning, pp. 1-30. oxon, uk: routledge.
bakhtiar, a., webster, e. a., & hadwin, a. f. (2018). regulation and socio-emotional interactions in a positive and a negative group climate. metacognition and learning, 13(1), 57-90. https://doi.org/10.1007/s11409-017-9178-x
barkaoui, k., so, m., & suzuki, w. (2008). is it relevant? the role of off-task talk in collaborative learning. journal of applied linguistics, 5(1), 31-54. https://doi.org/10.1558/japl.v5i1.31
barron, b. (2003). when smart groups fail. journal of the learning sciences, 12(3), 307-359. https://doi.org/10.1207/s15327809jls1203_1
barsade, s. g. (2002). the ripple effect: emotional contagion and its influence on group behavior. administrative science quarterly, 47(4), 644-675. https://doi.org/10.2307/3094912
barsade, s. g., & knight, a. p. (2015). group affect. annual review of organizational psychology and organizational behavior, 2, 21-46. https://doi.org/10.1177/0963721412438352
bartel, c. a., & saavedra, r. (2000). the collective construction of work group moods. administrative science quarterly, 45(2), 197-231. https://doi.org/10.2307/2667070
braun, m. t., kozlowski, s. w., brown, t. a., & deshon, r. p. (2020). exploring the dynamic team cohesion–performance and coordination–performance relationships of newly formed teams. small group research, 51(5), 551-580. https://doi.org/10.1177/1046496420907157
casciaro, t., & lobo, m. s. (2008). when competence is irrelevant: the role of interpersonal affect in task-related ties. administrative science quarterly, 53(4), 655-684. https://doi.org/10.2189/asqu.53.4.655
curşeu, p. l., chappin, m. m., & jansen, r. j. (2018). gender diversity and motivation in collaborative learning groups: the mediating role of group discussion quality. social psychology of education, 21, 289-302. https://doi.org/10.1007/s11218-017-9419-5
dillenbourg, p. (1999). what do you mean by collaborative learning? in p. dillenbourg (ed.), collaborative learning: cognitive and computational approaches (pp. 1-19). elsevier.
do, s. l., & schallert, d. l. (2004). emotions and classroom talk: toward a model of the role of affect in students’ experiences of classroom discussions. journal of educational psychology, 96(4), 619-634. https://doi.org/10.1037/0022-0663.96.4.619
ferreira, j. m. (2021). what if we look at the body? an embodied perspective of collaborative learning. educational psychology review, 33(4), 1455-1473. https://doi.org/10.1007/s10648-021-09607-8
forsyth, d. r. (2014). group dynamics (6th ed.). wadsworth cengage learning.
garcía, a., olivares, h., simão, l. m., & dominguez, a. l. (2020). socioemotional interactions in collaborative learning: an analysis from the perspective of semiotic cultural psychology. culture & psychology, 1-19. https://doi.org/10.1177/1354067x20976513
gorse, c. a., & emmitt, s. (2009). informal interaction in construction progress meetings. construction management and economics, 27(10), 983-993. https://doi.org/10.1080/01446190903179710
hallgren, k. a. (2012). computing inter-rater reliability for observational data: an overview and tutorial. tutorials in quantitative methods for psychology, 8(1), 23-34. https://doi.org/10.20982/tqmp.08.1.p023
hess, u., & hareli, s. (2019). the emotion-based inferences in context (ebic) model. in u. hess & s. hareli (eds.), the social nature of emotion expression: what emotions can tell us about the world (pp. 1-5). springer. https://doi.org/10.1007/978-3-030-32968-6
huber, m. (2020). video-based content analysis. in m. huber & d. e. froehlich (eds.), analyzing group interactions: a guidebook for qualitative, quantitative and mixed methods (pp. 37-48). routledge.
isohätälä, j., näykki, p., & järvelä, s. (2019). cognitive and socio-emotional interaction in collaborative learning: exploring fluctuations in students’ participation. scandinavian journal of educational research, 64(6), 831-851. https://doi.org/10.1080/00313831.2019.1623310
isohätälä, j., näykki, p., järvelä, s., & baker, m. j. (2018). striking a balance: socio-emotional processes during argumentation in collaborative learning interaction. learning, culture, and social interaction, 16, 1-19. https://doi.org/10.1016/j.lcsi.2017.09.003
järvenoja, h., järvelä, s., & malmberg, j. (2017). supporting groups’ emotion and motivation regulation during collaborative learning. learning and instruction, 70, 101090. https://doi.org/10.1016/j.learninstruc.2017.11.004
järvenoja, h., näykki, p., & törmänen, t. (2019). emotional regulation in collaborative learning: when do higher education students activate group level regulation in the face of challenges? studies in higher education, 44(10), 1747-1757. https://doi.org/10.1080/03075079.2019.1665318
jones, c., volet, s., & pino-pasternak, d. (2021). observational research in face-to-face small groupwork: capturing affect as socio-dynamic interpersonal phenomena. small group research, 52(3), 341-376. https://doi.org/10.1177/104649642098592
kauffeld, s., & lehmann-willenbrock, n. (2012). meetings matter: effects of team meetings on team and organizational success. small group research, 43(2), 128-156. https://doi.org/10.1177/1046496411429599
knight, a. p., & eisenkraft, n. (2015). positive is usually good, negative is not always bad: the effects of group affect on social integration and task performance. journal of applied psychology, 100(4), 1214-1227. https://doi.org/10.1037/apl0000006
kreijns, k., kirschner, p. a., & jochems, w. (2003). identifying the pitfalls for social interaction in computer-supported collaborative learning environments: a review of the research. computers in human behavior, 19(3), 335-353. https://doi.org/10.1016/s0747-5632(02)00057-2
kuppens, p. (2015). it’s about time: a special section on affect dynamics. emotion review, 7(4), 297-300. https://doi.org/10.1177/1754073915590947
kuppens, p., tuerlinckx, f., yik, m., koval, p., coosemans, j., zeng, k. j., & russell, j. a. (2017). the relation between valence and arousal in subjective experience varies with personality and culture. journal of personality, 85(4), 530-542. https://doi.org/10.1111/jopy.12258
kwon, k., liu, y-h., & johnson, l. p. (2014). group regulation and social-emotional interactions observed in computer supported collaborative learning: comparison between good vs. poor collaborators. computers & education, 78, 185-200. https://doi.org/10.1016/j.compedu.2014.06.004
langer-osuna, j. m., gargroetzi, e., munson, j., & chavez, r. (2020). exploring the role of off-task activity on students’ collaborative dynamics. journal of educational psychology, 112(3), 514-532. http://dx.doi.org/10.1037/edu0000464
lehmann-willenbrock, n., allen, j. a., & meinecke, a. l. (2014). observing culture: differences in u.s.-american and german team meeting behaviors. group processes & intergroup relations, 17(2), 252-271. https://doi.org/10.1177/1368430213497066
lehmann-willenbrock, n., & chiu, m. m. (2018). igniting and resolving content disagreements during team interactions: a statistical discourse analysis of team dynamics at work. journal of organizational behavior, 39(9), 1142-1162. https://doi.org/10.1002/job.2256
lobczowski, n. g., lyons, k., greene, j. a., & mclaughlin, j. e. (2021). socioemotional regulation strategies in a project-based learning environment. contemporary educational psychology, 65, 101968. https://doi.org/10.1016/j.cedpsych.2021.101968
mesquita, b., & boiger, m. (2014). emotions in context: a sociodynamic model of emotions. emotion review, 6(4), 298-302. https://doi.org/10.1177/1754073914534480
mills, a. j., durepos, g., & wiebe, e. (eds.). (2012). comparative case study. in encyclopedia of case study research (pp. 175-176). sage publications. https://doi.org/10.4135/9781412957397
näykki, p., järvelä, s., kirschner, p. a., & järvenoja, h. (2014). socio-emotional conflict in collaborative learning: a process-oriented case study in a higher education context. international journal of educational research, 68, 1-14. https://doi.org/10.1016/j.ijer.2014.07.001
oțoiu, c., rațiu, l., & rus, c. l. (2019). rivals when we work together: team rivalry effects on performance in collaborative learning groups. administrative sciences, 9(3), 61. https://doi.org/10.3390/admsci9030061
poupore, g. (2018). a complex systems investigation of group work dynamics in l2 interactive tasks. the modern language journal, 102(2), 350-370. https://doi.org/10.1111/modl.12467
riebe, l., girardi, a., & whitsed, c. (2016). a systematic literature review of teamwork pedagogy in higher education. small group research, 47(2), 619-664. https://doi.org/10.1177/1046496416665221
rogat, t. k., & linnenbrink-garcia, l. (2011). socially shared regulation in collaborative groups: an analysis of the interplay between quality of social regulation and group processes. cognition and instruction, 29(4), 375-415. https://doi.org/10.1080/07370008.2011.607930
rogat, t. k., & linnenbrink-garcia, l. (2013). understanding the quality variation of socially shared regulation: a focus on methodology. in m. vauras & s. volet (eds.), interpersonal regulation of learning and motivation: methodological advances (pp. 102-125). routledge.
scherer, k. r. (2005). what are emotions? and how can they be measured? social science information, 44(4), 695-729. https://doi.org/10.1177/0539018405058216
sherin, m. g. (2004). new perspectives on the role of video in teacher education. in j. brophy (ed.), advances in research on teaching: vol. 10. using video in teacher education (pp. 1-27). elsevier.
sheets-johnstone, m. (2009). animation: the fundamental, essential, and properly descriptive concept. continental philosophy review, 42(3), 375-400. https://doi.org/10.1007/s11007-009-9109-x
slaby, j. (2016). relational affect. working paper sfb 1171 affective societies 02/16. http://edocs.fu-berlin.de/docs/receive/fudocs_series_000000000562
sohr, e. r., gupta, a., & elby, a. (2018). taking an escape hatch: managing tension in group discourse. science education, 102(5), 883-916. https://doi.org/10.1002/sce.21448
summers, m., & volet, s. (2010). group work does not necessarily equal collaborative learning: evidence from observations and self-reports. european journal of psychology of education, 25(4), 473-492. https://doi.org/10.1007/s10212-010-0026-5
ucan, s., & webb, m. (2015). social regulation of learning during collaborative inquiry learning in science: how does it emerge and what are its functions? international journal of science education, 37(15), 2503-2532. https://doi.org/10.1080/09500693.2015.1083634
ujitani, e., & volet, s. (2008). socio-emotional challenges in international education: insight into reciprocal understanding and intercultural relational development. journal of research in international education, 7(3), 279-303. https://doi.org/10.1177/1475240908099975
van kleef, g. a. (2021).
comment: moving (further) beyond private experience: on the radicalization of the social approach to emotions and the emancipation of verbal emotional expressions. emotion review, 13(2), 90-94. https://doi.org/10.1177/1754073921991231
vriesema, c. c., & mccaslin, m. (2020). experience and meaning in small-group contexts: fusing observational and self-report data to capture self and other dynamics. frontline learning research, 8(3), 126-139. https://doi.org/10.14786/flr.v8i3.493
vygotsky, l. s. (1978). mind in society: the development of higher psychological processes. harvard university press.
yin, r. k. (2018). case study research and applications: design and methods (6th ed.). sage publications inc.
yoerger, m., allen, j. a., & crowe, j. (2018). the impact of premeeting talk on group performance. small group research, 49(2), 226-258. https://doi.org/10.1177/1046496417744883

appendix a
group task and activities in five labs

appendix b
coding scheme for individual-level interpersonal affect behaviours in groupwork

appendix c
coding scheme for individual-level interpersonal affect behaviours in groupwork with data examples

behavioural category and code: example

off-task positive
small talk: “how does everyone else feel? good?”; “do you have a cold?”; “no i’ve got allergies”; “running late, were you?”; “yeah, i missed the train”
humour and joking: “it’s like a star wars punishment?”; “you’re gonna be a quality dad, you’ve already got your dad jokes ready!”
laughter: laughter that is related to the social chat

off-task negative
small talk negative: “i’m just tired. i’m sick. we’re all sick”; “yeah, don’t become teachers you’ll be sick”; “i swear it’s getting worse as the day goes on”
side-talk: “we’ll have to ask [name] how his surgery went…”; “he seems like a nice guy” [looking at other/s phone]
using mobile phone: using phone for personal purposes: scrolling; texting; talking on phone

on-task positive
inclusiveness: “now, [name], you can do the honours it you want”; “i think we’ve got this! all over it!”; “everyone else agree with that?”
offering praise or support: “ooh, look at his prep skills, it’s immaculate!”; “that’s so good [name]!”; “do you want help?”; “oh yeah, good point!”
showing enthusiasm, interest: “i’m still blasting rockets in the air, it’s still cool!”; “this should be interesting. listen, listen! it’s sizzling!”; “that’s awesome! i wonder why it’s flashing like that…”
lightening the atmosphere: “the hot air from my mouth could keep it up in the air!”; “yeah, we just don’t have the power captain”; “oh, she might have madness to her reasoning”
laughter: laughter that is related to the task focus

on-task negative
complaining; negative expressions: “i’m bored”; “i can’t be bothered!”; “i’m starting to hate this experiment”
criticising/running someone down: “…your handwriting’s driving me insane”; “no, it does! she’s wrong!”; “we spend half an hour organising what we’re gonna do”
abrupt, curt or rude behaviour; interrupting to over-rule; ignoring: “shut up!”; “noooo, hang on!”; “you’re reading the wrong one!”; “oh no. it’s a circuit love. not just to play…there’s a certain way to connect things!”
disrupting: “you calm down!” [said jokingly to member reading aloud conceptual question, stopping task discussion from proceeding]; “are you going to your class today?” [spoken as member is science explaining, disrupting conceptual reasoning]
splitting group: “well do we want to split in two’s? two make the oobleck and two make the psylli slime?”; “we’ll continue doing this if you guys want to do that”

non-affective
task interaction [non-affective]: “what did you write?”; “okay put green in the middle there and then we create the series circuit”; “it doesn’t do anything. it’s an insulator. it doesn’t conduct”
empty talk: “what was i gonna say?”; “okay i need to write my name…”

appendix d
breakdown of interpersonal affect behavioural codes by frequency (and %) over time in five labs

frontline learning research 6 (2014) 25-33 issn 2295-3159
corresponding author: karsten stegmann, department psychology, lmu munich, leopoldstrasse 13, 80802 munich, germany, email: stegmann@lmu.de
doi: http://dx.doi.org/10.14786/flr.v2i4.112

constructing nomological nets on the basis of process analyses to strengthen cscl research

karsten stegmann a
a department psychology, lmu munich, germany

article received 26 may 2014 / revised 6 august 2014 / accepted 29 september 2014 / available online 23 december 2014

abstract

due to the nature of collaborative learning, realising perfectly controlled experiments often requires an unreasonable amount of resources and sometimes it is not possible at all. against this background, i propose to augment experimental designs that are as good as feasible with a nomological net of relations between instructional support (intervention), learning processes and learning outcomes. nomological networks are known from construct validity. in construct validity, the relations between variables (e.g. group differences, correlation matrices) are used to provide evidence for the validity of a measure. by adding multiple process and outcome variables together with the corresponding relations between intervention, process and outcome, the validity of causal relations found can be strengthened. i suggest adopting quality criteria from good research designs to evaluate the nomological nets.
the resulting net needs to be (1) theory grounded, (2) situational, (3) feasible, (4) redundant, and (5) efficient. by making these nomological nets explicit and by designing them according to the presented criteria, cscl research becomes more potent: the risk of inconclusive results is reduced, while results that form a consistent nomological net can be interpreted with greater confidence, even if the experimental design has some flaws. if this becomes standard in cscl research, it can be expected to contribute significantly more to knowledge accumulation in this area of research.

keywords: construct validity; nomological net; research design; computer-supported collaborative learning (cscl)

1. the role of process analyses in cscl research

approaches to computer-supported collaborative learning (cscl) are mainly based on three assumptions. first, collaborative learning outperforms (under particular circumstances, e.g. with specific support) other methods when it comes to learning outcomes. usually, specific collaborative activities like argumentation (e.g. clark, d'angelo, & menekse, 2009), transactive co-construction (e.g. molinari et al., 2013; weinberger & fischer, 2006), reciprocal teaching (palincsar & brown, 1984) and collaborative concept mapping (van boxtel, van der linden, roelofs, & erkens, 2002) are considered to be positively related to individual cognitive processes of learning. second, computer support enables both certain learning activities (e.g. simulation-based inquiry learning; de jong & van joolingen, 1998) and more direct support for certain activities (e.g. scaffolds as an inherent, but adaptive, component of the learning environment; cf. koschmann, 1994). technology makes accessible natural systems and phenomena that would otherwise be invisible and therefore impossible to experience (e.g. the heart of an engine or magnetism on an atomic level; cf. fischer, lowe, & schwan, 2008).
technology also enables us to facilitate learning processes by different means, e.g. by making various resources accessible (e.g. osborne & hennessey, 2003), scaffolding specific individual processes like the construction of single arguments (stegmann, wecker, weinberger, & fischer, 2012), or offering ways to communicate and collaborate (wegerif, 2002). third, the combination of collaborative learning and technology can have positive interaction effects that go beyond the simple combination of main effects. on the one hand, the quality of collaborative learning processes is lifted through adaptive scaffolds that positively moderate the positive effects of collaborative learning. on the other hand, the effects of technology functions (like access to various resources) on learning outcomes are boosted through collaborative learning (cf. weinberger, stegmann, & fischer, 2010). set against this background, cscl research aims to provide knowledge about how technology can support collaborative learning processes (and thereby learning outcomes on an individual as well as a group level; cf. stahl, 2006) most effectively. on the one hand, the problems that may arise through collaboration or the use of technology have to be minimised, while, on the other, the use of technology resources and collaborative learning processes has to be optimised. the effect of cscl on learning outcomes is therefore mediated by processes that occur during the collaborative learning phase. this general model can be described in a triangle of hypotheses (cf. wecker, stegmann, & fischer, 2012; fig. 1): (a) instructional/technological support facilitates learning activities; (b) facilitated learning activities have positive effects on learning outcomes; and (c) mediated by learning activities, instructional/technological support has a positive effect on learning outcomes. figure 1: general triangle of hypotheses in cscl research. 
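as a minimal illustration of the triangle of hypotheses (not taken from the original article), the three relations can be written as regression paths on a small, entirely hypothetical data set. the variable names, the toy values and the choice of ordinary least squares are assumptions made for this sketch; the text itself does not prescribe a particular analysis method.

```python
from statistics import mean


def slope(x, y):
    """OLS slope of y regressed on a single predictor x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def two_predictor_slopes(x, m, y):
    """OLS slopes of y regressed on x and m together (with intercept)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    det = sxx * smm - sxm ** 2  # zero only if x and m are perfectly collinear
    slope_x = (smm * sxy - sxm * smy) / det
    slope_m = (sxx * smy - sxm * sxy) / det
    return slope_x, slope_m


# Hypothetical toy data, for illustration only (not from any study).
support = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = no support, 1 = instructional support
activity = [2, 3, 1, 2, 4, 5, 3, 4]  # e.g. frequency of a learning activity
outcome = [5, 6, 4, 6, 8, 9, 7, 9]   # e.g. post-test score

a = slope(support, activity)                                   # (a) support -> activity
direct, b = two_predictor_slopes(support, activity, outcome)   # b: (b) activity -> outcome
total = slope(support, outcome)                                # total effect of support
indirect = a * b                                               # (c) effect mediated by activity

print(f"a={a:.3f}  b={b:.3f}  direct={direct:.3f}  "
      f"indirect={indirect:.3f}  total={total:.3f}")
```

in ordinary least squares the total effect equals the direct effect plus the product of paths a and b, which offers a quick consistency check on such a decomposition.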
to test this triangle of hypotheses and to allow researchers to infer cause-effect relations, three conditions must be fulfilled (cf. cook & campbell, 1979): (a) when the causing variable varies, the affected variable must vary too (covariation); (b) the cause must occur before the effect occurs (temporal precedence); and (c) no plausible alternative explanations exist. while the first two conditions can be met in cscl research rather easily, the third condition is very difficult to meet. cscl research often takes place in field-like settings and even studies with a rather high level of control (e.g., weinberger, marttunen, laurinen, & stegmann, 2013) are much less controlled than classical psychological experiments. the adherence to instructional advice, for example, is usually not enforced. testing the effect of an intervention on learning activities is, therefore, experimentally a variation check, but semantically the test of whether the way the instruction is realised is able to induce the intended behaviour. due to the nature of collaborative learning, realising perfectly controlled experiments requires an unreasonable amount of resources and sometimes it is not possible at all. just imagine a jigsaw experiment (for a detailed description of the jigsaw method see aronson, 1978). in a jigsaw script, the content to be learnt is split into, for example, four subtopics. groups of four prepare one of the four topics and finally four new groups are formed with one learner from each of the previous groups and learners teach their subtopic to the other group members. in this experiment, individual learning, unscripted collaborative learning and collaborative learning using the jigsaw method are compared. in the individual learning condition, 32 subjects are enough if a large effect with 80% power is expected.
in the condition with unscripted collaborative learning in groups of four, the optimal number of subjects, due to the nestedness of the data, might be 128 learners in 32 groups. in the jigsaw condition, 512 subjects are required because 16 subjects learn collaboratively in each of 32 groups. according to maas and hox (2005), a number of more than 50 groups is needed for acceptable statistical multilevel analyses. with 64 groups across two conditions, this criterion is fulfilled. in total, this simple one-factorial design with three conditions requires 672 subjects (32 + 128 + 512) to detect large effects. and still it would not be free of confounding factors, e.g. the type of support is confounded with group size. while a condition with unscripted learners in groups of 16 would be possible, a condition with a jigsaw script in groups of four is not possible. this example illustrates the inherent problems of cscl research in excluding plausible alternative explanations through experimental design. against this background, i propose to augment experimental designs that are as good as feasible with a nomological net of relations (hypotheses) between instructional support (intervention), learning processes and learning outcomes.

2. criteria for nomological nets as the basis of high quality cscl research

nomological networks are known from construct validity (cf. cronbach & meehl, 1955). in construct validity, the relations between variables (e.g. group differences, correlation matrices) are used to provide evidence for the validity of a measure. i would like to utilise this idea to validate the (causal) relation between interventions, mediators and outcome variables. the smallest nomological net possible comprises just two variables and one relation, but does not yet additionally validate a causal relation. the net, however, becomes stronger the more ties and knots are part of the net.
the stronger the net, the stronger the confidence in the validity of causal relations between variables. the smallest net that can increase the confidence in a causal relation in cscl research is the triangle of hypotheses described previously with three knots and three relations. by adding multiple process and outcome variables together with the corresponding relations between intervention, process and outcome, the validity of causal relations found can be strengthened. the development of such a nomological net requires some general quality criteria that allow evaluation of the net. i suggest adopting quality criteria from good research designs as described, for example, by trochim and land (1982). according to these authors, good designs are (1) theory grounded, (2) situational, (3) feasible, (4) redundant, and (5) efficient. in the following sections, i provide a short explanation of the criteria and some illustrating examples from cscl research for each criterion.

2.1 theory grounded

the nomological net needs to be theory grounded. for each of the relations, a directional effect needs to be explicable by a theory and may be backed up by previous empirical findings. the question, for example, concerning the extent to which learners mutually influence one another has attracted considerable attention in cscl research. particular focus has been placed on the degree to which groups of learners share a mutual understanding via social interaction. accordingly, attempts have been made to quantify this process, which is referred to as knowledge convergence, based on analyses of text-based knowledge-building processes. mäkitalo-siegl and colleagues (2012) traced so-called knowledge pieces through collaboration. the transfer of knowledge pieces from one learner to another was measured by comparing the knowledge pieces mentioned by single learners before, during and after collaboration.
the relations in the corresponding nomological net would be that collaborative learners share more knowledge pieces during collaboration and that these shared knowledge pieces are known better by group members after collaboration. weinberger, stegmann and fischer (2007) presented an approach with additional quantitative measures of the convergence of prior knowledge, collaborative processes and acquired knowledge based on fine-grained (i.e. at the level of inferences) analyses of text-based data sources (pretest, text-based online discussion, post-test). the authors suggest using the coefficient of variation of the number of different inferences within a group of learners (analogous to knowledge pieces) as an indicator of knowledge divergence. applying these measures provided insight into the relationship between the processes and outcomes of collaborative knowledge construction (e.g., weinberger, stegmann, & fischer, 2005; zottmann et al., 2013). the nomological net may comprise, for example, the relation that learners with high divergence during online discussions (i.e. contributing different as opposed to identical inferences) are more likely to share knowledge after collaboration than learners with high convergence during online discussions. in real life, however, not all assumed relations show up as expected in empirical research. the net is, of course, stronger if all hypotheses formulated a priori are confirmed. in practice, a net often needs to be adapted post hoc to explain the results at hand. in these cases, additional explanatory variables and relations may be added to achieve a consistent net. the new relations need, of course, to be theory grounded as well. by adding further mediators, moderators and/or suppressors, the results can form a consistent net again. as a by-product, the adaptation of a net is a further development of the initial theory.
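the divergence indicator mentioned above can be illustrated with a short python sketch. it assumes that the indicator is the standard deviation of the per-learner inference counts divided by their mean, and it uses the population standard deviation; the cited authors may compute it differently, and the example counts are invented.

```python
from statistics import mean, pstdev


def variation_coefficient(inference_counts):
    """Standard deviation divided by the mean of per-learner inference counts."""
    average = mean(inference_counts)
    if average == 0:
        raise ValueError("undefined when no learner contributed any inference")
    # Assumption: population standard deviation (pstdev), not the sample version.
    return pstdev(inference_counts) / average


# Hypothetical groups of three learners: number of different inferences each.
homogeneous_group = [4, 4, 4]     # identical contributions, no divergence
heterogeneous_group = [1, 4, 7]   # unequal contributions, higher divergence

print(variation_coefficient(homogeneous_group))             # 0.0
print(round(variation_coefficient(heterogeneous_group), 3))  # 0.612
```

a value of zero means that all learners contributed the same number of inferences; larger values indicate that the group members diverge more strongly.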
2.2 situational

the manipulated/measured variables as well as the relations between them need to be situationally defined. the interventions, the process variables and, to a large extent, the outcome variables in cscl research are highly situational, i.e. they depend on the situation, content and context at hand. the intervention is usually one realisation out of an endless number of possible realisations of a specific theoretical principle (e.g. an implementation of the jigsaw method; cf. wecker, 2013). the process and outcome variables may be labelled in rather general terms (such as “content quality” or “quality of argumentation”; cf. stegmann, weinberger, & fischer, 2007), but the concrete operationalisation requires the inclusion of bottom-up criteria that derive from the (learning) material and raw process data. in most cases, especially in the case of process variables, general (i.e. less situational) measures of skills or competences (like those applied in pisa studies; cf. kobarg, prenzel, & seidel, 2011) are not suitable, because they measure features (abilities) of persons, not activities. it is, therefore, necessary to apply measures with a high content validity. such measures usually need to be developed individually for each situation. it is necessary to quantify the qualities of collaborative activities with respect to multiple quality dimensions. a detailed description of such a multidimensional approach for the qualitative coding of online discussions (maqcod) can be found in stegmann and fischer (2011). collaborative (learning) activities are, for example, often analysed in terms of the content quality or quality of the argumentation (e.g. weinberger & fischer, 2006). depending on the dimension, the grain size of the analysis needs to be defined (cf. strijbos, martens, prins, & jochems, 2006). the quality of the argumentation, for example, could be defined for the entire discourse, single messages or even single arguments.
which grain size is most suitable depends on the theoretically defined relationship between the quality dimension to be analysed and the intended learning outcome. if, for example, the theoretical model assumes that formulating arguments with grounds and warrants is a core collaborative learning activity, single arguments rather than complete conversations need to be the focus. along with grain size, the categories need to be defined. single arguments can, for instance, be coded in terms of whether they are grounded or not. the definitions of the dimensions, grain size and categories per dimension need to be carefully documented. this may comprise: segmentation rules and examples for the application of these rules; the names of dimensions and categories; rules about when to assign a specific category; and examples of when a category applies and when it does not. this documentation forms the basis for an objective, reliable and content-valid coding of learning activities and, thereby, for the inclusion of process variables in a nomological net.

2.3 feasible

the inclusion of process variables in a nomological net requires that it is feasible to extract the variable from the activities recorded during cscl. it is, for example, problematic to measure the depth of cognitive processing just by analysing written cscl discourse data. researchers may argue that a sophisticated, well-elaborated argument can be regarded as an indicator of deep cognitive processing. the argument, however, might be just “copied” from a different source (e.g. a learning partner or prior knowledge) without deep cognitive processing. such a measure, therefore, does not have sufficient content validity to be included in a nomological net with the intended function. this is not an argument against the variable “depth of cognitive processing” in general. the requirement is to measure the variable in a content-valid way, i.e. as directly as possible.
if data sources such as think-aloud protocols are available (e.g. stegmann, wecker, weinberger, & fischer, 2012), it might be adequate to include such a variable in a nomological net.

2.4 redundant

as in the cockpit of an aeroplane, central components of the nomological net might be redundant, i.e. implemented several times. in cscl research, this redundancy is often regarded as a methodological challenge rather than a strength of the research design. an inherent feature of collaborative learning is the nestedness of learners in groups and in time. learners are part of a group and thereby features of the group affect learning. in addition, the knowledge and skills of the single learner as well as of the group change over time (wise & chiu, 2011). the activities of learners in a group are affected not only by the initial features of the single learners and the collaborative learning phase, but also by the activities that the single learner and the group performed previously. furthermore, instructional support such as collaboration scripts affects the relationship between previous and current activities. the nestedness of learners in groups and in time is not only an issue if researchers aim to understand why learners learn in a certain way; learners and groups also change over time. the script theory of guidance (stog; fischer, kollar, stegmann, & wecker, 2013), for example, assumes that instructional support for collaborative learning needs to be adapted consecutively to ensure the optimal fit between the skills of the single learners/the group and the instructional support. as a result, a nomological net may include relations that reflect such ideas but add time as a moderator of the effect of an intervention on the process of collaborative learning. as already raised in the jigsaw study example, (quantitative) research on cscl often requires many more participants because learners who learned in groups cannot be regarded as independent observations.
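the participant numbers from the jigsaw example, and the loss of information caused by non-independent observations, can be made concrete with a rough calculation. the sketch below re-derives the counts stated in that example and applies the standard design effect for equally sized groups, 1 + (m - 1) × icc, which is brought in here only for illustration; the intraclass correlation of 0.25 is a purely hypothetical value.

```python
def design_effect(group_size: int, icc: float) -> float:
    """Design effect for equally sized groups: 1 + (m - 1) * icc."""
    return 1 + (group_size - 1) * icc


def effective_sample_size(n_learners: int, group_size: int, icc: float) -> float:
    """Number of independent observations the nested sample is 'worth'."""
    return n_learners / design_effect(group_size, icc)


# Participant counts from the jigsaw example.
individual = 32
unscripted = 32 * 4   # 32 groups of four learners
jigsaw = 32 * 16      # 32 units in which 16 learners learn together
total = individual + unscripted + jigsaw
print(unscripted, jigsaw, total)  # 128 512 672

# Hypothetical intraclass correlation; real values must be estimated from data.
icc = 0.25
print(round(effective_sample_size(unscripted, 4, icc), 1))  # 73.1
print(round(effective_sample_size(jigsaw, 16, icc), 1))     # 107.8
```

under this assumed correlation, the 512 learners of the jigsaw condition carry roughly as much independent information as 108 unrelated learners, which illustrates why nested designs need so many participants.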
from the viewpoint of the nomological net, the relation, for example, between an intervention at group level and a specific process is redundant within a group. for an intervention that aims to facilitate, for example, argumentation sequences (e.g., stegmann, weinberger, & fischer, 2011; jeong, clark, sampson & menekse, 2010), a positive relation between the intervention and the number of argument-counterargument-synthesis sequences may be added to the nomological net at group level. on an individual level (i.e. the level of group members), positive relations between the intervention and the number of contributed counterarguments and syntheses may be added to the net. furthermore, scaffolds examined in cscl research often focus on specific processes that occur multiple times during collaborative learning. if an intervention such as a collaboration script aims to support the quality of each single argument (cf. stegmann, wecker, weinberger, & fischer, 2012) with respect to grounds and warrants, the effect is expected to show up on the level of each argument (as the probability that a claim is supported by ground and/or warrant), on the level of each individual learner (as the share of grounded/warranted claims contributed by an individual), and on the group level (as a higher argumentative quality of the discussion).

2.5 efficient

an important aspect, finally, is the efficiency of the testing of the relations specified in the nomological net. efficiency is in general determined by two factors: the usefulness of a result and the extent of resources required to reach the result. this criterion seems to contradict the examples given for the previously described criteria. these criteria require rather qualitative analyses of processes on multiple levels including time series. many more resources (i.e. technology to record data, space to archive the data, manpower to develop coding manuals and to analyse data) need to be spent.
in cscl research, however, the digital learning environment in which the learning activities under examination usually take place can reduce the amount of resources. in particular, assessing and analysing data are supported by technology. technologies like ibeacon or active rfid chips allow learners to be traced, as well as their interaction with one another and with the artefacts to learn from, for example in the context of museums. eberle and colleagues (2013), for example, traced the activities of conference participants using active rfid chips to examine the relation between interactions among conference participants, planned future collaboration right after the conference and collaborative publications two years after the conference. in scenarios with computer-mediated communication, the communication can easily be recorded. the opportunity to log data in technology-enhanced learning environments can easily produce an amount of data that exceeds the limits for meaningful human analyses. the development in the area of machine learning technology, however, enables researchers to train algorithms, supervised or unsupervised, to analyse digital data according to multiple dimensions such as quality of argumentation, content quality or emotions. to apply these algorithms to the data at hand, in a first step, features of the learning processes need to be extracted. in the case of written discourse data, for example, the number of specific words or word pairs, the punctuation or the line length are extracted. this step can be easily performed using tools such as taghelper (rosé et al., 2008) or lightside (mayfield & rosé, 2012). in a second step, these features are used in conjunction with a human coding that serves as training material to build models that are able to measure the respective quality. recently, mu and colleagues (2012) presented the acodea framework, which may serve as a blueprint on how to apply this technology in cscl research.
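the two steps described above can be sketched in a few lines of python. this is a deliberately simplified stand-in, not the procedure implemented in taghelper, lightside or acodea: it extracts only word counts and a punctuation count, and it replaces the trained model with a nearest-centroid classifier. the training messages, the codes 'grounded' and 'ungrounded' and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def extract_features(message: str) -> Counter:
    """Step 1: word counts plus a punctuation count for one message."""
    features = Counter(re.findall(r"[a-z']+", message.lower()))
    features["__punctuation__"] = sum(message.count(mark) for mark in ".,;:!?")
    return features


def train_centroids(messages, human_codes):
    """Step 2: average the feature vectors of all messages sharing a human code."""
    sums, counts = defaultdict(Counter), Counter()
    for message, code in zip(messages, human_codes):
        sums[code].update(extract_features(message))
        counts[code] += 1
    return {
        code: {name: value / counts[code] for name, value in total.items()}
        for code, total in sums.items()
    }


def classify(message, centroids):
    """Assign the code whose centroid is closest (squared Euclidean distance)."""
    features = extract_features(message)

    def distance(centroid):
        names = set(features) | set(centroid)
        return sum((features.get(n, 0) - centroid.get(n, 0)) ** 2 for n in names)

    return min(centroids, key=lambda code: distance(centroids[code]))


# Invented training material: messages with codes assigned by a human coder.
TRAINING_MESSAGES = [
    "i think x because the data show it",
    "x is true because the study says so",
    "x is simply true",
    "i think y",
]
HUMAN_CODES = ["grounded", "grounded", "ungrounded", "ungrounded"]

centroids = train_centroids(TRAINING_MESSAGES, HUMAN_CODES)
print(classify("y is true because the data say so", centroids))  # grounded
print(classify("i think z", centroids))                          # ungrounded
```

with realistic data, the agreement between such automatic codes and an independent human coding would have to be checked on messages that were not used for training before the codes enter a nomological net.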
the empirical results presented by mu and colleagues (2012) show that this procedure enables objective analyses of texts that were not previously used for training. the results obtained were at the same level as those produced by human coding, and were in some cases even better than the agreement between human coders. while the described application of technology in the research process contributes to a reduction of required resources, i further argue that the usefulness of the results is increased by testing a comprehensive nomological net in comparison to results not embedded in a net. testing the three types of hypotheses of the general triangle of hypotheses (cf. fig. 1) increases the probability that a study will produce useful results (and not just because three times more hypotheses are tested). if all of the three hypotheses are significant, this can be regarded as a validation of the underlying theoretical model. however, if one or two of the three hypotheses fail (e.g. the relationship between learning activities and learning outcomes), but the others are significant, the findings provide a starting point for explanations that may improve the initial theoretical model. it is only if all of the hypotheses fail to be significant that the empirical results will be completely inconclusive regarding generalisability and causal relations. nevertheless, a more in-depth analysis of learning activities and post hoc adaptation of the nomological net still provides insights into the mechanisms of learning, regardless of significant effects on learning outcomes.

3. conclusion

the general structure of cscl research can be described using the introduced triangle of hypotheses. therefore, nomological nets are an inherent feature of cscl research.
by making these nomological nets explicit and by designing them according to the presented criteria, the research becomes more potent: the risk of inconclusive results is reduced, while results that form a coherent nomological net can be interpreted with stronger confidence even if the experimental design has some flaws. this, however, is by no means an argument to conduct studies with an easily improvable experimental design or to skip experimental variation completely. an experimental design that is as good as possible is the basic prerequisite for the nomological net to contribute to strengthening the confidence in causal interpretations of effects. the suggestion to use a nomological net as described is, nevertheless, not limited to quantitative research approaches. some if not all relations might be examined with qualitative methods. the effect on the confidence in the interpretation is the same as in quantitative methods: it increases. actually, i would expect quantitative and qualitative methods to be used in a complementary way to form nomological nets in cscl research. the explication of the nomological net, therefore, should become obligatory in reports and presentations on research in cscl. studies that aim to provide evidence for causal relations need to report effects on both processes and outcomes, not just one or the other. the processes have to be analysed in a way that ensures content validity. the (statistical) analyses have to make use of the multilevel structure of the process data. new technologies have to be applied to cope with the vast amount of data. if this becomes standard in cscl research, the field can be expected to contribute significantly more to knowledge accumulation in this area of research.

keypoints
realising perfectly controlled experiments often requires an unreasonable amount of resources and sometimes it is not possible at all.
by adding multiple process and outcome variables together with the corresponding relations between intervention, process and outcome into a nomological net, the validity of causal relations found can be strengthened.
the resulting nomological net needs to be (1) theory grounded, (2) situational, (3) feasible, (4) redundant, and (5) efficient.
incorporating nomological nets reduces the risk of inconclusive results, while results that form a consistent nomological net can be interpreted with stronger confidence, even if the experimental design has some flaws.

references
aronson, e. (1978). the jigsaw classroom. london: sage.
clark, d. b., d'angelo, c. m., & menekse, m. (2009). initial structuring of online discussions to improve learning and argumentation: incorporating students' own explanations as seed comments versus an augmented-preset approach to seeding discussions. journal of science education and technology, 18, 321-333. doi:10.1007/s10956-009-9159-1
cook, t. d., & campbell, d. t. (1979). quasi-experimentation: design and analysis issues for field settings. boston, ma: houghton mifflin.
cronbach, l. j., & meehl, p. e. (1955). construct validity in psychological tests. psychological bulletin, 52(4), 281-302. doi:10.1037/h0040957
de jong, t., & van joolingen, w. r. (1998). scientific discovery learning with computer simulations of conceptual domains. review of educational research, 68, 179-201. doi:10.3102/00346543068002179
eberle, j., stegmann, k., lund, k., barrat, a., sailer, m., & fischer, f. (2013). fostering learning and collaboration in a scientific community – evidence from an experiment using rfid devices to measure collaborative processes. in n. rummel, m. kapur, m. nathan, & s. puntambekar (eds.), to see the world and a grain of sand: learning across levels of space, time, and scale: cscl 2013 conference proceedings volume 1 - full papers & symposia (pp. 169-175). international society of the learning sciences.
jeong, a., clark, d. b., sampson, v.
d., & menekse, m. (2010). sequential analysis of scientific argumentation in asynchronous online discussion environments. in s. puntambekar, g. erkens & c. hmelo-silver (eds.), analyzing interactions in cscl: methodologies, approaches and issues. berlin: springer. doi:10.1007/978-1-4419-7710-6_10
fischer, f., kollar, i., stegmann, k., & wecker, c. (2013). toward a script theory of guidance in computer-supported collaborative learning. educational psychologist, 48(1), 56-66. doi:10.1080/00461520.2012.748005
fischer, s., lowe, r. k., & schwan, s. (2008). effects of presentation speed of a dynamic visualization on the understanding of a mechanical system. applied cognitive psychology, 22(8), 1126-1141. doi:10.1002/acp.1426
kobarg, m., prenzel, m., & seidel, t. (2011). an international comparison of science teaching and learning. further results from pisa 2006. münster: waxmann verlag.
koschmann, t. d. (1994). toward a theory of computer support for collaborative learning. the journal of the learning sciences, 3(3), 219-225. doi:10.1207/s15327809jls0303_1
maas, c. j. m., & hox, j. (2005). sufficient sample sizes for multilevel modeling. methodology, 1, 86–92. doi:10.1027/1614-1881.1.3.86
mayfield, e., & rosé, c. p. (2012). lightside: open source machine learning for text accessible to non-experts. in m. d. shermis & j. burstein (eds.), handbook of automated essay grading (pp. 124-135). new york: routledge.
mäkitalo-siegl, k., stegmann, k., frete, a., & streng, s. (2012). orchestrating computer-supported collaborative learning: effects of knowledge sharing and shared knowledge. in s. abramovich (ed.), computers in education (pp. 75-91). commack, ny: nova science publishers.
molinari, g., chanel, g., betrancourt, m., pun, t., & bozelle, c. (2013). emotion feedback during computer-mediated collaboration: effects on self-reported emotions and perceived interaction. in n. rummel, m. kapur, m. nathan, & s. puntambekar
(eds.), to see the world and a grain of sand: learning across levels of space, time, and scale: cscl 2013 conference proceedings volume 1 - full papers & symposia (pp. 336-343). international society of the learning sciences.
mu, j., stegmann, k., mayfield, e., rosé, c., & fischer, f. (2012). the acodea framework: developing segmentation and classification schemes for fully automatic analysis of online discussions. international journal of computer-supported collaborative learning, 7(2), 285–305. doi:10.1007/s11412-012-9147-y
osborne, j., & hennessy, s. (2003). literature review in science education and the role of ict: promise, problems and future directions. bristol: nesta futurelab. retrieved from http://www.futurelab.org.uk/resources/publications_reports_articles/literature_reviews/literature_review380
palincsar, a. s., & brown, a. l. (1984). reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. cognition and instruction, 1(2), 117-175. doi:10.1207/s1532690xci0102_1
rosé, c. p., wang, y. c., arguello, j., stegmann, k., weinberger, a., & fischer, f. (2008). analyzing collaborative learning processes automatically: exploiting the advances of computational linguistics in computer-supported collaborative learning. international journal of computer-supported collaborative learning, 3(3), 237-271. doi:10.1007/s11412-007-9034-0
stahl, g. (2006). group cognition. cambridge, ma: mit press.
stegmann, k., & fischer, f. (2011). quantifying qualities in collaborative knowledge construction: the analysis of online discussions. in s. puntambekar, g. erkens & c. hmelo-silver (eds.), analyzing interactions in cscl: methods, approaches and issues (pp. 247-268). new york: springer. doi:10.1007/978-1-4419-7710-6_12
stegmann, k., weinberger, a., & fischer, f. (2011).
aktives lernen durch argumentieren: evidenz für das modell der argumentativen wissenskonstruktion in online-diskussionen [active learning by argumentation: evidence for the model of argumentative knowledge construction in online discussions]. unterrichtswissenschaft, 39(3), 231–244.
stegmann, k., wecker, c., weinberger, a., & fischer, f. (2012). collaborative argumentation and cognitive elaboration in a computer-supported collaborative learning environment. instructional science, 40(2), 297-323. doi:10.1007/s11251-011-9174-5
stegmann, k., weinberger, a., & fischer, f. (2007). facilitating argumentative knowledge construction with computer-supported collaboration scripts. international journal of computer-supported collaborative learning, 2(4), 421-447. doi:10.1007/978-0-387-36949-5_12
strijbos, j.-w., martens, r. l., prins, f. j., & jochems, w. m. g. (2006). content analysis: what are they talking about? computers & education, 46(1), 29-48. doi:10.1016/j.compedu.2005.04.002
trochim, w., & land, d. (1982). designing designs for research. the researcher, 1(1), 1-6.
van boxtel, c., van der linden, j., roelofs, e., & erkens, g. (2002). collaborative concept mapping: provoking and supporting meaningful discourse. theory into practice, 41(1), 40-46. doi:10.1207/s15430421tip4101_7
wecker, c. (2013). how to support prescriptive statements by empirical research: some missing parts. educational psychology review, 25(1), 1-18. doi:10.1007/s10648-012-9208-9
wecker, c., stegmann, k., & fischer, f. (2012). lern- und kooperationsprozesse: warum sind sie interessant und wie können sie analysiert werden? [learning and cooperation processes in case-based learning. interesting issues and analysis approaches]. report: zeitschrift für weiterbildungsforschung, 35(3), 30-41. doi:10.3278/rep1203w
wegerif, r. (2002). thinking skills, technology and learning: a review of the literature for nesta futurelab. bristol: nesta futurelab.
retrieved from: http://www.futurelab.org.uk/resources/publications_reports_articles/literature_reviews/literature_review394
weinberger, a., & fischer, f. (2006). a framework to analyze argumentative knowledge construction in computer-supported collaborative learning. computers & education, 46(1), 71-95. doi:10.1016/j.compedu.2005.04.003
weinberger, a., marttunen, m., laurinen, l., & stegmann, k. (2013). inducing socio-cognitive conflict in finnish and german groups of online learners by cscl script. international journal of computer-supported collaborative learning, 8(3), 333-349. doi:10.1007/s11412-013-9173-4
weinberger, a., stegmann, k., & fischer, f. (2005). computer-supported collaborative learning in higher education: scripts for argumentative knowledge construction in distributed groups. in t. koschmann, d. d. suthers & t.-w. chan (eds.), computer supported collaborative learning 2005: the next 10 years! proceedings of the international conference on computer supported collaborative learning 2005 (pp. 717-726). mahwah, nj: lawrence erlbaum. doi:10.3115/1149293.1149387
weinberger, a., stegmann, k., & fischer, f. (2007). knowledge convergence in collaborative learning: concepts and assessment. learning and instruction, 17(4), 416-426. doi:10.1016/j.learninstruc.2007.03.007
weinberger, a., stegmann, k., & fischer, f. (2010). learning to argue online: scripted groups surpass individuals (unscripted groups do not). computers in human behavior, 26(4), 506-515. doi:10.1016/j.chb.2009.08.007
wise, a. f., & chiu, m. m. (2011). analyzing temporal patterns of knowledge construction in a role-based online discussion. international journal of computer-supported collaborative learning, 6(3), 445-470. doi:10.1007/s11412-011-9120-1
zottmann, j., stegmann, k., strijbos, j. w., vogel, f., wecker, c., & fischer, f. (2013). computer-supported collaborative learning with digital video cases in teacher education: the impact of teaching experience on knowledge convergence.
computers in human behavior, 29(5), 2100-2108. doi:10.1016/j.chb.2013.04.014

frontline learning research 2 (2013) 86-98 issn 2295-3159
corresponding author: katrien vangrieken, university leuven: centre for research on professional learning & development and lifelong learning, dekenstraat 2, bus 3772, 3000 leuven, belgium, katrien.vangrieken@student.kuleuven.be
http://dx.doi.org/10.14786/flr.v1i2.23

team entitativity and teacher teams in schools: towards a typology
katrien vangrieken a, filip dochy a, elisabeth raes a, eva kyndt a
a university of leuven, belgium
article received 17 april 2014 / revised 16 december 2014 / accepted 18 december 2014 / available online 20 december 2014

abstract
in this article we summarise research that discusses ‘teacher teams’. the central questions guiding this study are ‘how is the term “teacher team” used and defined in previous research?’ and ‘what types of teacher teams has previous research identified or explored?’. we attempted to answer these questions by searching literature on teacher teams and comparing what these articles present as being teacher teams. we attempted to further grasp the concept of teacher teams by creating a typology for defining different types of teacher teams. overall, the literature pertaining to teacher teams appeared to be characterised by a considerable amount of haziness, and teacher ‘teams’ mostly do not seem to be proper ‘teams’ when the criteria of a team as defined by cohen and bailey (1997) are kept in mind. the proposed typology, characterising the groups of teachers by their task, whether they are organised in a disciplinary or interdisciplinary way, whether they are situated within or across grades, their temporal duration and degree of team entitativity or ‘teamness’, appears to be a useful framework to further clarify different sorts of teacher ‘teams’.
keywords: teams; teacher teams; typology; entitativity

1. introduction: beyond ‘egg-crate’ schools – teams in schools

overall, teaming in schools appears to be quite a challenge, not least because of a long-standing culture of teacher isolation and individualism in schools (gajda & koliba, 2008). teachers may feel that their autonomy is threatened by collaboration and that conflicts that they previously tended to avoid come to the surface (somech, 2008). teachers appear to be predominantly confined to their classroom, in which they work in isolation, as such creating what lortie (1975, in westheimer, 2008) calls ‘egg-crate’ schools. despite this prevalent resistance to collaborate, many studies point to positive effects of a team structure in schools. teaming in schools appears to be a broad and rather vague concept with varying interpretations in the literature. nonetheless, it is of vital theoretical and practical importance to clarify this concept. in order to be able to properly discuss teacher teams it is essential to have a clear view on what such teams actually are and whether it is warrantable to speak of ‘teacher teams’ in general or whether there are different types of teacher teams. cohen and bailey (1997) already pointed at the importance of team types in discussing their results. thus, the first aim of this article is to look at how the term ‘teacher teams’ is defined in previous research and what types of teams were explored in former scientific inquiry. this matters because there might be several types of teams in schools and these might possess different levels of ‘team entitativity’ (the degree to which a ‘team’ actually is a team, the ‘teamness’ of teams).
a clear typology could thus be useful for drawing warranted conclusions from former research that are applicable to a specific subset of teams, since different types of teams may have different characteristics and thus different conclusions (for practice) may be justified. aside from the description of a few rather vague categories, previous research on teacher teams seems to lack a clear typological framework for conceptualising the complexity of the concept of teacher teams. thus, the second aim of the article is to present a typology for using the team concept in schools. the following sections discuss what ‘teacher teams’ are and seek to clarify the prevailing conceptual confusion concerning this sort of team.

2. defining teams

among the large number of existing definitions of ‘teams’, the one formulated by cohen and bailey (1997) seems to be the most comprehensive and the most widely used in research on teamwork and team learning (e.g. dochy, gijbels, raes, & kyndt, 2014; decuyper, dochy, & van den bossche, 2010). these authors described a team as follows: “a team is a collection of individuals who are interdependent in their tasks, who share responsibility for outcomes, who see themselves and who are seen by others as an intact social entity embedded in one or more larger social systems (for example, business unit or the corporation), and who manage their relationships across organizational boundaries” (p. 241). teams thus have to meet these six criteria. teams are seen as different from groups and are mostly defined more narrowly. as such, van den bossche, gijselaers, segers, and kirschner (2006) stated that ‘a team is more than a group of people in the same space, physical or virtual’ (p. 490). salas, burke, and cannon-bowers (2001) argued that teams differ from groups in task interdependence, structure and time span.
in this sense, all teams are groups when groups are seen as sharing a common social categorisation and identity (raes, kyndt, decuyper, van den bossche, & dochy, submitted). but not all groups are teams, since a team has characteristics that need not be present to define a group.
3. on the use of the term ‘team’
the articles primarily using the term ‘team’ show considerable diversity in the interpretation of this term, which leads to a lack of conceptual clarity. few studies clearly define what they mean when they speak of a ‘team’ or a ‘teacher team’. several authors appeared to use the term ‘team’ without specifying what they mean by it or who or what these so-called ‘teams’ include. a large number of studies were more exploratory in nature and did not start by giving a definition of ‘teams’ but described the teams under study (e.g. gunn & king, 2003; hackmann, petzko, valentine, clark, nori, & lucas, 2002; meirink, imants, meijer, & verloop, 2010; somech, 2005). even authors who focused on other denominations often seemed to use the term ‘team’ somewhere in their article: some studies started by writing about ‘collaboration’, ‘community’, ‘department’ or ‘critical friends group’ and later referred to ‘teams’ (mostly as a form of collaboration) without giving further explanation (e.g. achinstein, 2002; avila de lima, 2001; curry, 2008; datnow, 2011; dickinson, 2009; kelchtermans, 2006; leonard & leonard, 2003; lomos, hofman, & bosker, 2011; scribner, hager, & warne, 2002; visscher & witziers, 2004; williams, 2010). this mix-up of different terms is confirmed by westheimer (2008), who mentioned that schools use different denominations to describe collaboration between teachers, one of them being ‘teams’.
there thus appears to be a misconception among teachers regarding the term ‘team’, since it can be doubted whether what teachers define as a ‘team’ matches any of the criteria for teams mentioned in the contemporary team literature (smith, 2009). smith (2009) furthermore concludes that, in the perception of teachers, teamwork is depicted as mere collaboration between friends. amidst the lack of clarity that accompanies the use of the term ‘team’, a few distinctions can be made between the different interpretations and uses of the term. this points to the importance of creating a teacher team typology: in order to draw clear and justified conclusions concerning teacher teams, it is essential to clarify to which sort of teacher team these pertain.
3.1 existing teacher team categories and team typologies
3.1.1 existing teacher team categories
some other authors have already pointed to the diversity in the types of teams existing in schools. supovitz (2002), for instance, stated that teams can be organised in different ways: they can be organised by grade level or vertically (across grades), teams can loop (teachers stay with the same students for several years), the members can stay in fixed grade levels, or teams can have mixed configurations. pounder (1998) mentioned management teams or school advisory groups, special services teams, and interdisciplinary instructional teams. park, henkin, and egley (2005) distinguished three comparable types of school teams: governance teams, instructional teams and planning teams. governance teams do not have an instructional task but usually develop policies to meet specific needs of local communities. principals, teachers, parents, and community members are the primary decision makers (ellis & fouts, 1994, in park et al., 2005). according to buckley (2000, in park et al., 2005), instructional teams serve to realise flexible scheduling of instruction and higher integration of subject matters.
this sort of team can be organised according to grade level or subject. planning teams are organised to tackle specific problems, which can be temporary or more complex and long term. drach-zahavy and somech (2002) noted that teams in schools serve different purposes and distinguished between management teams, instruction teams and pedagogic teams. the management teams are involved with administrative issues and participate in the management of the school. the instruction teams gather around a subject area, and their ultimate goal is to enhance teaching effectiveness. finally, the authors state that pedagogic teams consist of teachers who teach in the same class, and these teams are focused on improving the pedagogic decisions on specific pupils. the teams in the study of tonso, jung, and colombo (2006) could be sorted into administrative teams, grade level teaching teams (which were further divided into mixed content subteams) and social service teams. smith (2009) focused on science teachers’ conceptions of teams and teamwork and listed eight possible teams that can emerge in a school setting. management teams are charged with administrative issues, pedagogic teams are based on teachers teaching the same class, instructional teams are based on subject matter and serve to foster teacher effectiveness, while interdisciplinary teams gather teachers from different subject areas who collaborate in teaching and learning. appraisal teams provide assistance in making sense of problem situations. in informational teams the members merely exchange information that is needed to perform the teaching profession, instrumental teams provide practical support, and emotional teams form a supportive network with encouraging words and sympathetic understanding. as might be clear, the last three types of teams could be seen as less of an actual ‘team’ than the first five.
thus, the existing categories seem to focus primarily on the task of the teacher team to distinguish different types (drach-zahavy & somech, 2002; park et al., 2005; pounder, 1998; tonso et al., 2006). only supovitz (2002) explicitly focused on the organisational differences between the teams. this study attempts to broaden the focus beyond the task domain, covering organisational as well as task features.
3.1.2 team typologies
cohen and bailey (1997), devine, clayton, philips, dunford and melner (1999) and hollenbeck, beersma and schouten (2012) presented typologies of teams in general (not focused on teacher teams specifically). cohen and bailey (1997) distinguished four types of teams: work, parallel, project and management teams. devine et al. (1999) presented a dimensional approach to a team typology using the dimensions product type and temporal duration. crossing these two dimensions results in four team types: ad hoc project teams, ongoing project teams, ad hoc production teams and ongoing production teams. hollenbeck et al. (2012) sought to transcend different existing team typologies, relying on a dimensional scaling approach based on three underlying constructs: skill differentiation, authority differentiation and temporal stability.
3.2 transcending the different existing categorisations: a typology
several authors thus pointed to the existing diversity in teacher teams and distinguished different ‘categories’ of teams. these different categorisations overlap to some extent, and the teams mentioned in the literature appear to fit into these categories to a certain degree. for that reason, the abovementioned existing categories, together with typologies of teams in general (not teacher teams specifically) (devine et al., 1999; hollenbeck et al., 2012; cohen & bailey, 1997), will serve as a starting point for the typology developed here.
they will be supplemented with other categories and dimensions that play an important role in the literature on teacher teams. overall, the following defining features of a teacher team typology (presented in appendix 1, table 1) appear to be important.
3.2.1 task
first, teacher teams may have tasks pertaining to governance or management. pounder (1998) stated that management teams may include representative teachers, school support staff and parents or community members. their main responsibility is advising the principal or other administrators in problem solving, planning and decision-making concerning school improvement. according to ellis and fouts (1994, in park et al., 2005), governance teams develop policies to meet specific needs of local communities. principals, teachers, parents and community members are the primary decision makers. thus, although teachers may be part of such teams, other representatives are often included as well.
secondly, instruction appears to be a very important task for teacher teams: overall, teacher teams show a primary focus on instruction and student learning. here, instruction is understood as the tasks teams perform that are directly related to student instruction. for further clarification, two subtasks are distinguished: instruction/teaching and planning of instruction. instruction/teaching includes all tasks of teachers directly pertaining to the instruction of a particular group of students. this includes collaborating on the instruction, evaluation, and follow-up of a particular group of students. planning of instruction is understood as collaboratively planning instruction in general and is not necessarily limited to a common group of students. tasks here generally entail the planning, coordinating and evaluating of curriculum (flowers, mertens, & mulhall, 2000; mertens & flowers, 2004; gunn & king, 2003; yisrael, 2008).
it may also include planning concerning student assignment (flexible grouping strategies) and scheduling (conley, fauske, & pounder, 2004), which are needed before the instructional process can start.
a third task, problem-specific planning, is inspired by the typology of park et al. (2005), who stated that planning teams are responsible for tackling specific problems and can be of a temporary or a longer-lasting nature. smith (2009) spoke of appraisal teams, which offer assistance in making sense of problem situations and as such can be related to this task category. although teams in the articles under study have an array of different decision-making responsibilities and do tackle specific problems, these specific tasks are mainly coupled with a more general task such as instruction. this type of task is thus not clearly delineated from the other tasks.
fourthly, the task of teacher teams may pertain to pedagogy, as it is one of the team types distinguished by drach-zahavy and somech (2002) and smith (2009). this task can be related to supporting student learning and managing student behaviour (e.g. crow & pounder, 2000; supovitz, 2002; watson, 2005), to communication with parents (e.g. crow & pounder, 2000; flowers et al., 2000) or, more generally, to a discussion of teaching and pedagogy and the challenges experienced by teachers (e.g. havnes, 2009).
a fifth, related task of teacher teams may include special or social services. pounder (1998) stated that special services teams are responsible for the evaluation, placement, and educational plans of exceptional students. they may include special education teachers, professional support staff, administrators, representative parents, and others. the responsibilities of this type of team are not limited to purely educational tasks but stretch further into the social and psychological functioning of students.
both types of teams (special and social services) can be seen as similar to some extent (the social service team in the study of tonso et al. (2006), for example, included a special education teacher) and might be integrated into one team in some schools; their tasks are thus seen as belonging to the same task category.
sixthly, teacher teams can have tasks related to innovation and school reform (meirink et al., 2010). quite often teams are associated with school reform or innovation. euwema and van der waals (2007) pointed out that the environment of schools is increasingly dynamic and complex, and that this will lead to a decrease in the predictability of developments, causing increased pressure on the ability of the school to adapt and innovate. the authors pointed to these developments as an important, although not the only, reason for organising schoolwork in teams. meirink et al. (2010) and meirink (2007) spoke of temporary ‘innovative teams’ that are responsible for designing and experimenting with new teaching practices. overall, enhancing teacher collaboration appears to be a rather ‘recent’ innovative attempt in a few countries; organising teachers in teams is one of the ways to accomplish this goal.
some studies, such as watson (2005), spoke of learning teams, in which the learning of teachers is of central importance. this can entail teachers learning about teaching practice: watson (2005), for instance, stated that the professional learning teams in his study are involved in the implementation of a school improvement process. such teams are sometimes referred to as (professional) learning communities (saunders, goldenberg, & gallimore, 2009; dickinson, 2009; cheng & ko, 2009; williams, 2010) or communities of practice (curry, 2008). this is closely linked to the category ‘innovation’ and shows that the boundaries between the different task categories can be blurred.
overall, the (professional) learning teams discussed in the literature are directed towards improving student performance. this forms a bridge between the task of learning and that of instruction: teachers need to learn in order to improve their instruction and thus enhance student learning.
finally, some studies appear to mention a merely material or practical ‘task’ when discussing teacher teams. for example, main and bryer (2005) pointed to the sharing of physical space and/or resources as being part of the ‘task’ of teacher teams, and smith (2009) referred to instrumental teams, which provide practical support. this clearly forms a rather weak basis for teacher teams, and a grouping of teachers with a merely material or practical basis for collaboration can hardly be considered as working in proper ‘teams’. smith (2009) furthermore pointed to informational teams and emotional teams. in the former, members merely exchange information that they need in order to perform the teaching profession. the latter provide a supportive framework with encouraging words and sympathetic understanding. it becomes clear that these tasks by themselves are likewise not enough to justifiably speak of an actual ‘team’, and as such they are mentioned in this task category.
3.2.2 discipline level: disciplinary or interdisciplinary
a second important distinction is whether teacher teams are organised on a disciplinary or an interdisciplinary basis (teachers teaching the same or different subjects). this can be linked to the dimension of skill differentiation mentioned by hollenbeck et al. (2012): members have more or less specialised knowledge or functional capacities that make them more or less difficult to replace. in interdisciplinary teams, teachers accordingly have expertise in different subject areas. in some school contexts this distinction may be less relevant.
for example, in primary or elementary schools teachers are responsible for teaching all courses and are thus not specialists in one or more disciplines. in such contexts it appears irrelevant to speak of disciplinary or interdisciplinary teams, since every teacher is responsible for all disciplines to be taught. an exception could be when a different teacher, such as one responsible for teaching crafts or music or a special education teacher, is included in the team. when teachers are not the only team members, ‘interdisciplinary’ refers to the fact that the team comprises people from different professions (e.g. nurses, social workers, specialists).
3.2.3 grade level: cross or within grade level
another important distinction that can be made in teams of teachers is whether they are situated at a grade level (responsible for students in the same grade level) or not (responsible for students across grades). pounder (1998) states that a common middle school structure appears to consist of interdisciplinary grade-level teams. it should thus not come as a surprise that quite a lot of the studies (of those that clarify these characteristics) referred to such teams in middle schools.
3.2.4 temporal duration
concerning temporal duration (whether the teams are designed to be temporary or to last for a longer time period), only two studies explicitly refer to temporary teams (meirink, 2007; meirink et al., 2010). most other studies seem to refer to teams that are more long-term (the temporal duration of the collaboration is not given), except for drach-zahavy and somech (2002) and somech (2005), who mentioned that the teams under study had already worked together for at least one year.
3.2.5 team entitativity
a final and vital feature of teacher teams is a dimension that is captured in the term ‘entitativity’ (campbell, 1958). this term covers whether an aggregate of persons actually behaves as a system.
according to campbell, entitativity includes ‘the perception that a social aggregate is a coherent, unified and meaningful entity’ (haslam, rothschild, & ernst, 2004, p. 65). it entails the degree of being a unity or a coherent whole and thus represents the interdependence that is present in groups or teams (campbell, 1958). ohlsson (2013) also states that teams possessing a strong level of interdependence see themselves as an actual team. in this article, team entitativity is conceived as the degree to which a collection of individuals is an actual team as described by cohen and bailey (1997). the criteria in the definition of cohen and bailey (1997) will serve as a basic measure of team entitativity. these six criteria entail: a collection of individuals; who are interdependent in their tasks; share responsibilities for outcomes; see themselves and are seen by others as an intact social entity; are embedded in one or more social systems; and manage their relationships across organisational boundaries (which will be referred to as boundary crossing). the more criteria the teams meet and the more strongly they fulfil them, the higher their degree of team entitativity or ‘teamness’ will be. the concept of team entitativity is further elaborated upon in the review article of vangrieken, dochy and raes (submitted).
4. conclusion
this short article tried to answer the questions: ‘how is the term “teacher teams” used and defined in previous research?’ and ‘what types of teacher teams has previous research identified or explored?’. the study results in the following. first, starting from a comprehensive definition of ‘teams’ that provides clear criteria against which it can be assessed whether groups can rightfully be called ‘teams’ (cohen & bailey, 1997), we find that teacher teams in the literature in most cases do not meet these criteria, or at least that often no effort is made to make definitions and characteristics of groups of teachers sufficiently explicit.
a clear-cut, unambiguous definition of teams in schools or teacher teams appears to be lacking. different authors discussing teacher teams tend to use different interpretations of the term ‘team’ and seem to discuss different types of teams. most of the articles lack an insightful definition of what they mean when they use the term ‘team’, which makes interpreting the results of their research quite challenging. moreover, no single description of teacher teams met all criteria of a team as described by cohen and bailey (1997). so, ‘teacher groups’ appear to be mostly ‘groups’ instead of highly entitative ‘teams’. this finding is in line with a conclusion drawn by smith (2009), who stated that teams as they are usually defined outside education are perceived as dysfunctional in the experiences of science teachers: they do not exist or do not work in schools. smith (2009) furthermore stated that although the teachers in the study experience membership of multiple teams, it can be questioned whether these so-called teams really exist in the meaning of ‘teams’ as described in the conventional team literature. in the latter, a team is presumed to be much more than a collection of individual teachers who are gathered around their timetabled subjects, staffroom or science department (smith, 2009). as a consequence, it would be interesting to find out what criteria are really met by the so-called teacher teams in the literature. it would be reasonable to argue that some teacher groups discussed in the literature are more of a ‘team’ than others in the sense that they meet more of the aforementioned criteria (previously referred to as team entitativity). at this point, it is difficult to assess the degree of team entitativity of teams described in the literature, given the vague information in most studies. future studies on teacher teams should go deeper into the real origin and scope of the teams.
a typology for teacher teams can be based on the following axes: task (governance/management, instruction, problem-specific planning, pedagogy, special/social services, innovation/school reform, learning, material/practical), discipline level (disciplinary or interdisciplinary), grade level (within or cross grade level), temporal duration (temporary or lasting) and team entitativity (low, moderate or high). as a consequence, an overarching typology is proposed in figure 1.
figure 1. typology.
there appears to be a vast amount of variation in ‘teacher teams’, with variety in both the task and the organisation of the teams. the framework discussed above appears to be useful in trying to clarify what these ‘teams’ consist of. by specifying all of these distinctions, which is lacking in many studies, a rather clear description can be given of what sort of teacher team is under study.
keypoints
there appears to be a lack of clarity and a large variety within the concept of teacher teams. they can have a large diversity of tasks and can be organised in different ways. the lack of a clear description makes it difficult to draw warranted conclusions, since it may not be justified to make generalisations across different types of teacher teams.
when using the definition of teams by cohen and bailey (1997) to assess whether ‘teacher teams’ can rightfully be called ‘teams’, we conclude that groupings of teachers are mostly ‘groups’ rather than ‘teams’, since ‘teams’ are often defined quite vaguely in the literature and these descriptions hardly ever meet all criteria mentioned in this definition.
a typology for teacher teams based on the axes of task, discipline level, grade level, temporal duration and team entitativity is a useful framework to describe the sort of teacher team under study. this creates some clarity amid the indistinctness surrounding the use of the term ‘teacher team’.
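the typology axes and the six criteria of cohen and bailey (1997) can be made concrete as a small data structure. the following python sketch is only an illustration and not part of the original study: the class name, the field names and, in particular, the cut-off points that map the number of criteria met onto ‘low’, ‘moderate’ or ‘high’ entitativity are assumptions made for the example. the sketch also ignores how strongly each criterion is fulfilled, which the article treats as part of entitativity.

```python
from dataclasses import dataclass

# Task categories named on the 'task' axis of the typology.
TASKS = frozenset({
    "governance/management", "instruction", "problem-specific planning",
    "pedagogy", "special/social services", "innovation/school reform",
    "learning", "material/practical",
})

# The six criteria taken from the definition of Cohen and Bailey (1997).
CRITERIA = frozenset({
    "collection of individuals",
    "task interdependence",
    "shared responsibility for outcomes",
    "intact social entity",
    "embedded in larger social systems",
    "boundary crossing",
})


@dataclass(frozen=True)
class TeacherTeam:
    """Hypothetical record describing one teacher team on the typology axes."""
    task: str
    interdisciplinary: bool   # discipline level
    cross_grade: bool         # grade level
    temporary: bool           # temporal duration
    criteria_met: frozenset = frozenset()

    def __post_init__(self):
        if self.task not in TASKS:
            raise ValueError(f"unknown task category: {self.task!r}")
        unknown = set(self.criteria_met) - CRITERIA
        if unknown:
            raise ValueError(f"unknown criteria: {sorted(unknown)}")

    def entitativity(self) -> str:
        """Map the number of criteria met onto a level.

        The thresholds (0-2 low, 3-4 moderate, 5-6 high) are an assumption
        for illustration; the article does not specify cut-off points.
        """
        met = len(self.criteria_met)
        if met <= 2:
            return "low"
        if met <= 4:
            return "moderate"
        return "high"


team = TeacherTeam(
    task="instruction",
    interdisciplinary=True,
    cross_grade=False,
    temporary=False,
    criteria_met=frozenset({
        "collection of individuals",
        "task interdependence",
        "shared responsibility for outcomes",
    }),
)
print(team.entitativity())
```

describing each team under study in this explicit form would make it visible which axes a study leaves unspecified, which is the gap the typology is meant to address.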
94 | f l r references achinstein, b. (2002). conflict amid community: the micropolitics of teacher collaboration. teacher college record, 104(3), 421-455. retrieved from http://www.tcrecord.org/ avila de lima, j. (2001). forgetting about friendship: using conflict in teacher communities as a catalyst for school change. journal of educational change, 2, 97-122. retrieved from http://link.springer.com bertrand, l., roberts, r.a., & buchanan, r. (2006). striving for success: teacher perspectives of a vertical team initiative. national forum of teacher education journal-electronic, 16(3). retrieved from http://www.allthingsplc.info brouwer, p. (2011). collaboration in teacher teams (doctoral dissertation). retrieved from http://dspace.library.uu.nl/handle/1874/214140 brouwer, p., brekelmans, m., nieuwenhuis, l., & simons, r.j. (2012). fostering teacher community development: a review of design principles and a case study of an innovative interdisciplinary team. learning environments research, 15(3), 319-344. doi:10.1007/s10984-012-9119-1 campbell, d.t. (1958). common fate, similarity, and other indices of the status of aggregates of persons as social entities. behavioral science, 3(1), 14-25. retrieved from http://librilinks.libis.be carroll, t.g., & foster, e. (2008). learning teams: creating what‟s next. national commission on teaching and america‟s future. retrieved from http://nctaf.org cheng, l.p., & ko, h. (2009). teacher-team development in a school-based professional development program. the mathematics educator, 19(1), 8-17. retrieved from http://math.nie.edu.sg/ame/matheduc/ cohen, s.g., bailey, d.e. (1997). what makes teams work: group effectiveness research from the shop floor to the executive suite. journal of management, 23(3), 239-290. conley, s., fauske, j. & d.g. pounder (2004). teacher work group effectiveness. educational administration quarterly, 40(5), 663-703. doi:10.1177/0013161x04268841 crow, m.g., & d.g. pounder (2000). 
interdisciplinary teacher teams: context, design, and process. educational administration quarterly, 36(2), 216-254. doi:10.1177/0013161x00362004 curry, m. (2008). critical friends groups: the possibilities and limitations embedded in teacher professional communities aimed at instructional improvement and school reform. teachers college record, 110(4), 733-774. retrieved from http://www.tcrecord.org/ datnow, a. (2011). collaboration and contrived collegiality: revisiting hargreaves in the age of accountability. journal of educational change, 12(2), 147-158. doi:10.1007/s10833-011-9154-1 decuyper, s., dochy, f., & van den bossche, p. (2010). grasping the dynamic complexity of team learning: an integrative model for effective team learning in organisations. educational research review, 5, 111-133. doi:10.1016/j.edurev.2010.02.002 devine, d.j., clayton, l.d., philips, j.l., dunford, b.b., & melner, s.b. (1999). teams in organizations: prevalence, characteristics, and effectiveness. small group research, 30, 678-711. doi:10.1177/104649649903000602 dickinson, e.b. (2009). the impact of collaborative teacher teaming on teacher learning (specialist project). retrieved from http://digitalcommons.wku.edu dochy, f., gijbels, d., raes, e., & kyndt, e. (2014). team learning in education and professional organisations. in s. billet, c. harteis & h. gruber (eds.), international handbook of research in professional and practice-based learning. the netherlands: springer. drach-zahavy, a., & somech, a. (2002). team heterogeneity and its relationship with team support and team effectiveness. journal of educational administration, 40(1), 44 – 66. doi:10.1108/09578230210415643 euwema, m.c., & van der waals, j. (2007). teams in scholen. samen werkt het beter. leusden: bmc. flowers, n., mertens, s.b., & mulhall, p.f. (2000). what makes interdisciplinary teams effective? middle school journal, 31(4), 53-56. retrieved from http://limo.libis.be gajda, r., & koliba, c.j. (2008). 
evaluating and improving the quality of teacher collaboration: a fieldtested framework for secondary school leaders. nassp bulletin, 92(2), 133-153. doi:10.1177/0192636508320990 k. vangrieken et al. 95 | f l r gunn, j.h., & king, m.b. (2003). trouble in paradise: power, conflict and community in an interdisciplinary teacher team. urban education, 38(2), 173-195. doi:10.1177/0042085902250466 hackmann, d.g., petzko, v.n., valentine, j.w., clark, d.c., nori, j.r., & lucas, s.e. (2002). beyond interdisciplinary teaming: findings and implications of the nassp national middle level study. nassp bulletin, 86(632), 33-47. doi:10.1177/019263650208663204 haslam, n., rothschild, l., & ernst, d. (2004). essentialism and entitativity: structures of beliefs about the ontology of social categories. in v. yzerbyt, c.m. judd & o. corneille (eds.), the psychology of group perception: perceived variability, entitativity and essentialism (pp. 61-78). new york (ny): psychology press. havnes, a. (2009): talk, planning and decision‐making in interdisciplinary teacher teams: a case study. teachers and teaching: theory and practice, 15(1), 155-176. doi:10.1080/13540600802661360 hollenbeck, j.r., beersma, b., & schouten, m.e. (2012). beyond team types and taxonomies: a dimensional scaling conceptualization for team description. academy of management review, 37(1), 82-106. doi:10.5465/amr.2010.0181 kelchtermans, g. (2006). teacher collaboration and collegiality as workplace conditions. a review. zeitschrift für pädagogik, 52(2), 220-237. retrieved from http://www.pedocs.de leonard, l., & leonard, p. (2003). the continuing problem with collaboration: teachers talk. current issues in education, 6(15). retrieved from http://cie.asu.edu lomos, c., hofman, r.h., & bosker, r.j. (2011). the relationship between departments as professional learning communities and student achievement in secondary schools. teaching and teacher education, 27, 722-731. doi:10.1016/j.tate.2010.12.003 main, k. (2007). 
a year long study of the formation and development of middle school teaching teams (doctoral dissertation). retrieved from https://www120.secure.griffith.edu.au/rch/file/64a6473e-3a2bf149-bd30-6e2033bbef0f/1/02whole.pdf main, k., & bryer, f. (2005). what does a „good‟ teaching team look like in a middle school classroom? stimulating the “action” as participants in participatory research, 2, 196-204. retrieved from http://www.griffith.edu.au meirink, j.a. (2007). individual teacher learning in a context of collaboration in teams (doctoral dissertation). retrieved from https://openaccess.leidenuniv.nl meirink, j.a., imants, j., meijer, p.c., & verloop, n. (2010). teacher learning and collaboration in innovative teams. cambridge journal of education, 40(2), 161-181. doi:10.1080/0305764x.2010.481256 mertens, s.b., & flowers, n. (2004). research summary: interdisciplinary teaming. retrieved from http://www.nmsa.org ohlsson, j. (2013). team learning: collective reflection processes in teacher teams. the journal of workplace learning, 25(5), 296-309. doi:10.1108/jwl-feb-2012-0011 park, s., henkin, a.b., & egley, r. (2005).teacher team commitment, teamwork and trust: exploring associations. journal of educational administration, 43(5), 462 – 479. doi:10.1108/09578230510615233 pounder, d. (1998). chapter 5: teacher teams: redesigning teacher‟s work for collaboration. in d. pounder (ed.), restructuring schools for collaboration: promises and pitfalls (pp. 65-88). albany: state university of new york press. raes, e., kyndt, e., decuyper, s., van den bossche, p., & dochy, f. (submitted). group development and team learning: how development stages affect team-level learning behavior. human resource development quarterly. rone, b.c. (2009). the impact of the data team structure on collaborative teams and student achievement (doctoral dissertation). retrieved from http://proquest.umi.com salas, e., burke, c.s., & cannon-bowers, j.a. (2000). teamwork: emerging principles. 
international journal of management reviews, 2(4), 339-356. saunders, w.m., goldenberg, c.n., & gallimore, r. (2009). increasing achievement by focusing grade-level teams on improving classroom learning: a prospective, quasi-experimental study of title i schools. am educational research journal, 46(4). doi:10.3102/0002831209333185 k. vangrieken et al. 96 | f l r scribner, j.p., hager, d.r. & warne, t.r. (2002). the paradox of professional community: tales form two high schools. educational administration quarterly, 38(45). doi:10.1177/0013161x02381003 smith, g. (2009). if teams are so good... science teachers‟ conceptions of teams and teamwork (doctoral dissertation). retrieved from http://eprints.qut.edu.au somech, a. (2005). teachers‟ personal and team empowerment and their relations to organizational outcomes: contradictory or compatible constructs? educational administration quarterly, 41(2), 237266. doi:10.1177/0013161x04269592 somech, a. (2008). managing conflict in school teams: the impact of task and goal interdependence on conflict management and team effectiveness. educational administration quarterly, 44. doi:10.1177/0013161x08318957 supovitz, j.a. (2002). developing communities of instructional practice. teachers college record, 104(8), 1591-1626. retrieved from http://www.tcrecord.org tonso, k.l., jung, m.l. & m. colombo (2006). “it‟s hard answering your calling”: teacher teams in a restructuring urban middle school. research in middle level education, 30(1), 1-22. retrieved from http://www.eric.ed.gov van den bossche, p., gijselaers, w.h., segers, m., & kirschner, p.a. (2006). social and cognitive factors driving teamwork in collaborative learning environments: team learning beliefs and behaviors. small group research, 37(5), 490-521. doi:10.1177/1046496406292938 vangrieken, k., dochy, f., & raes, e. (submitted). teacher teams and collaboration: a review. visscher, a.j., & witziers, b. (2004). subject departments as professional communities? 
british educational research journal, 30(6), 785-800. doi:10.1080/0141192042000279503
watson, s. t. (2005). teacher collaboration and school reform: distributing leadership through the use of professional learning teams (doctoral dissertation). retrieved from http://edt.missouri.edu
westheimer, j. (2008). chapter 41: learning among colleagues: teacher community and the shared enterprise of education. in m. cochran-smith, s. feiman-nemser, & j. mcintyre (eds.). handbook of research on teacher education. reston, va: association of teacher educators; lanham, md: rowman
wigglesworth, m. (2011). the effects of teacher collaboration on students understanding: relating to high school earth science concepts. montana: lap lambert academic publishing.
williams, m.l. (2010). teacher collaboration as professional development in a large, suburban high school (doctoral dissertation). retrieved from http://digitalcommons.unl.edu
yisrael, s.b. (2008). a qualitative case study: the positive impact interdisciplinary teaming has on teacher morale (doctoral dissertation). retrieved from http://etd.ohiolink.edu

appendix

table 1. typology framework

task

  governance/management [note 1]

  instruction (according to grade level or subject)

    instruction/teaching
      - examining individual student work generated from common formative assessments (rone, 2009)
      - developing instruction to address the academic needs of students (saunders et al., 2009)
      - keep track of the progress and revise instruction (saunders et al., 2009)
      - studying previous test data of students (bertrand, roberts, & buchanan, 2006)
      - coordinating instruction, communication and assessment for a common group of students (flowers et al., 2000)
      - developing and implementing interdisciplinary curriculum and teaching strategies based on the developmental needs of the children (crow & pounder, 2000)
      - developing coordinated interventions and management strategies to tackle problems considering student learning (crow & pounder, 2000)
      - team teaching (brouwer, 2011; brouwer, brekelmans, & nieuwenhuis, 2012; main, 2007; main & bryer, 2005)
      - coherent curriculum development (organisation of education and discussing students) (brouwer, 2011; brouwer et al., 2012)

    planning instruction
      - planning, coordinating and evaluating of curriculum and instruction across academic areas (yisrael, 2008; mertens & flowers, 2004)
      - planning curriculum and developing assessments (gunn & king, 2003)
      - realising common goals across different classes (main & bryer, 2005)
      - set and share academic goals (saunders et al., 2009)
      - collaboratively planning and administering assessment (main & bryer, 2005)
      - development and implementation of the subject matter (somech, 2008)
      - collaborating on instructional strategies (wigglesworth, 2011; supovitz, 2002)
      - evaluating collaboratively constructed materials (wigglesworth, 2011)
      - developing course syllabi and benchmark tests (bertrand et al., 2006)
      - planning interdisciplinary teaching (havnes, 2009)
      - coordinating individual subject-specific teaching (havnes, 2009)
      - work together to plan, design, integrate and implement shared instructional methods, curricula and assessment targeted towards curricular and pedagogical alignment (watson, 2005)
      - decision-making authorities considering curricular emphasis and coordination (conley et al., 2004)
      - decision-making authorities considering student class assignment and flexible grouping strategies, student assessment (conley et al., 2004)
      - decision-making authorities considering curricular and co-curricular scheduling (conley et al., 2004)

    problem-specific planning (tackle specific problems, temporary or long term)
      - planning teams are responsible for tackling specific problems and can be of a temporary or a longer lasting nature (park et al., 2005)

  pedagogy
      - developing coordinated interventions and management strategies to tackle problems considering student learning and/or behavior (crow & pounder, 2000)
      - providing coordinated communication with parents (crow & pounder, 2000)
      - building-wide support and intervention programs for students, monitor the effectiveness of these programs and make improvement recommendations (watson, 2005)
      - communication (with families) (mertens & flowers, 2004)
      - continually exploring their curricular and pedagogical strategies and the influences of these on student learning (supovitz, 2002)
      - discussing teaching, practice, the challenges they experience as teachers, and pedagogy (havnes, 2009)
      - decision-making authorities considering student management and behavioural interventions (conley et al., 2004)
      - decision-making authorities considering coordinated parent communication (conley et al., 2004)

  special/social services [note 2]

  innovation and school reform
      - designing and experimenting with new teaching practices (meirink et al., 2010)

  learning
      - collaboratively learning (saunders et al., 2009; watson, 2005)
      - sharing expertise and experience across generations (carroll & foster, 2008)

  material/practical: these teams lack a shared task, but share, for example, resources. this collaboration is mostly realised for practicality reasons.
      - budgetary allocation (main & bryer, 2005)
      - sharing of resources and/or physical space (main & bryer, 2005)
      - practical support (smith, 2009)

discipline level
  interdisciplinary: teachers from different subject areas are part of the team.
  disciplinary: teachers from the same subject area are part of the team.

grade level
  within-grade level: teachers responsible for students from the same grade level
  cross-grade level: teachers responsible for students from different grade levels

temporal duration
  temporary: for a definite time/project
  lasting

team entitativity
  low (1-2 criteria met)
  moderate (3-4 criteria met)
  high (5-6 criteria met)

note 1: as mentioned earlier, management and special services teams were not the focal point of this study because these often include other team members than just teachers; for that reason no literature considering this type of teams is discussed here.
note 2: see note 1.

frontline learning research vol. 4 no. 1 (2016) 58-77 issn 2295-3159
corresponding author: barbara vokatis, department of elementary education and reading, state university of new york at oneonta, ravine parkway 108, oneonta, ny 13820, usa, email: barbara.vokatis@oneonta.edu
doi: http://dx.doi.org/10.14786/flr.v4i1.223

the professional identity of three innovative teachers engaging in sustained knowledge building using technology

barbara vokatis a, jianwei zhang b
a state university of new york at oneonta, united states
b state university of new york at albany, united states

article received 30 october / revised 15 february / accepted 16 february / available online 14 april

abstract

diffusing inquiry-based pedagogy in schools for deep and lasting change requires teacher transformation and capacity building. this study characterizes the professional identity of three elementary school teachers who have productively engaged in inquiry-based classroom practice using knowledge building pedagogy and knowledge forum, a collaborative online environment.
grounded theory analysis of teacher interviews, supplemented with field observations, highlights five distinctive features of the teachers’ identity: (a) teachers as professional knowledge builders to explore new visions of teaching for continual improvement of knowledge building; (b) teachers as co-learners to form symmetrical relationships with students so they can take on the highest level of responsibility; (c) teachers as problem-solvers and barrier-breakers holding a proactive stance toward the contexts of practice; (d) teachers as members of a professional community that encourages collaboration, innovation, and continual improvement; and (e) an empowering relationship with the principal who supports teacher innovation and collaboration.

keywords: teacher identity; knowledge building; inquiry learning; problem-centred pedagogy; technology

1. introduction

current education reforms require that teachers have the capacity to incorporate authentic inquiry practices by which students construct powerful explanations and designs to address authentic problems (national research council, 2012). while various professional development resources are initiated to help teachers understand inquiry-based teaching strategies and technologies, knowledge of teaching methods alone does not suffice for becoming a responsive and thoughtful inquiry-based educator (fairbanks, duffy, faircloth, ye, levin, rohr, & stein, 2010). the complexity of teaching demands that teachers develop adaptive expertise (bransford, darling-hammond, & lepage, 2005) and capacity to engage in wise and collaborative improvisation in response to students’ evolving thinking and changing needs (little, lampert, graziani, borko, clark, & wong, 2007; sawyer, 2004). therefore, good inquiry-based teaching cannot be reduced to prescriptive techniques, but comes from the identity of the teacher as a whole person (palmer, 1997).
this aspect of teaching gives educators a sense of their selfhood as dynamically connected with their students and subject areas, allowing them to have a clear vision of themselves and what is important for them to accomplish with children (duffy, 2005), and to have the soul and agency to overcome obstacles as they constantly look for ways to respond to the needs and thinking of students (fairbanks et al., 2010). the goal of this study is to provide a detailed account of the professional identity of three elementary school teachers who have been working persistently and productively with knowledge building pedagogy and technology, one of the most influential computer-supported collaborative inquiry programs to cultivate creative knowledge work among students (scardamalia & bereiter, 2006). our conceptual framework guiding this study is two-fold, focusing on teacher identity and knowledge building, respectively.

1.1 conceptualizing teacher identity

the concept of teacher identity refers to how teachers identify themselves as teachers, including who they are as professionals, and who they strive and are empowered to become in a constant process of reflecting on their practices and experiences. teacher identity is not a static entity; a teacher constantly constructs and develops a reflective sense of self through looking into his or her practice and life of teaching, as a mirror (palmer, 1997). teachers teach who they are (clandinin & huber, 2005; palmer, 1997); a teacher’s identity is associated with his or her distinctive set of practices (gee, 2001), such as inquiry-based teaching. in this sense, teacher identity is intertwined with teacher practices (enyedy, goldberg, & welsh, 2005).
teachers’ professional identity arises out of their various types of teaching practices across contexts in which they construct holistic views of themselves in relation to students, colleagues, professional purposes, and circumstances of teaching (beijaard, meijer, & verloop, 2004; dillabough, 1999; olsen, 2008). in this sense, teacher identity differs from teachers’ specific practices and functional roles: their roles are associated with specific jobs and skills of teaching while teacher identity is a more personal entity that indicates how one identifies himself or herself as a teacher (mayer, 1999). our conceptualization of identity draws heavily on the work of gee (2001) and other researchers who highlight a number of important characteristics of teacher identity (beijaard et al., 2004; connelly & clandinin, 2000; rodgers & scott, 2008; sfard & prusak, 2005). first, teacher identity is an ongoing process in which a person interprets and reinterprets oneself as a certain kind of person and is also recognized as a certain kind of person in a particular context (gee, 2001). it is not limited to answering the question “who am i at this moment?”, but also entails answering the question: “who do i want to become?” (beijaard et al., 2004). thus, teachers need to constantly explore and reflect on who they are as professionals based on their experiences (antonek, mccormick, & donato, 1997; brooke, 1994) and actively look for new ways to define their professional work to approach important educational issues (coldron & smith, 1999). they frame and develop who they are through reflective story telling about what they strive for and do as teachers: “stories to live by” that are shaped by the past and project into their ongoing lives and works (connelly & clandinin, 2000). as its second feature, teacher identity is shaped by multiple contexts of teaching practices (beijaard et al., 2004; rodgers & scott, 2008).
contexts entail larger socio-cultural-historical processes that influence teachers’ identity (varelas, house, & wenzel, 2005), personal histories that alter teachers’ beliefs and values, the culture of the institution, including the history of the institution, and values held by its administrators and other members. through reflecting on their practice and identity, teachers “become more in tune with their sense of self and with a deep understanding of how this self fits into a larger context which involves others” (beauchamp & thomas, 2009, p. 182). therefore, teacher identity involves sub-identities that are reflected in their relationships with peer teachers, students, administrators, and other members of their school communities (beijaard et al., 2004). examining a teacher’s identity requires understanding how the teacher forms certain relationships with his or her students, peer teachers, and school administrators and positions himself or herself toward the context such as the curriculum, school policy, and physical school environment. social relationships are crucial to identity, because to have an identity one must be recognized as a particular “kind of person” by others (gee, 2001). how a teacher identifies himself or herself stems from the nature of social interactions the teacher has with his or her peers and others (dillabough, 1999). within multiple contexts, a teacher forms multiple relationships that bring forth multiple aspects of himself or herself (gee & crawford, 1998; rodgers & scott, 2008). teacher identity is co-constructed “through engagement with others in cultural practice” (smagorinsky, cook, moore, jackson, & fry, 2004, p. 21). a teacher’s identity influences how he or she negotiates his or her role in relation to administration, curriculum and students, and these relationships further influence the teacher’s identity (enyedy et al., 2005). 
in an autographic study, brooke (1994) also found that becoming a professional teacher involves interacting with others’ views. the conflict between one’s own images of teaching and peers’ expectations of what makes a professional teacher may lead to deep reflection on identity (volkmann & anderson, 1998). a final, critical feature of teachers’ identity pertains to their agency and voice to shape their own professional paths (rodgers & scott, 2008). teachers’ professional identity develops as a result of the negotiation between given factors, such as existing social structures and policies, and teachers’ agency-driven participation and engagement with educational resources and ideas (coldron & smith, 1999). agency is the empowerment to act (holland, lachicotte, skinner, & caine, 1998). a teacher with agency not only knows how to act within the existing world of education but also acts upon and remakes the world in line with his or her vision. such agency results from a teacher’s realization of his or her identity (beauchamp & thomas, 2009; parkinson, 2008) and further drives his or her continual efforts to explore and form new identities as he or she goes beyond current classroom practices. in this study, we investigated the identity of inquiry-based teachers who have engaged in knowledge building pedagogy and technology as a school-wide innovation implemented over a decade. school-based professional development has been provided to support their ongoing reflection on their knowledge building practices in classrooms as well as what it means to be inquiry-based, knowledge building teachers. this study examines how the teachers reshape their identity through their long-term engagement in knowledge building practices. in line with the above conceptualization, we analyse their reflective stories about what they strive for through their unique set of practices (gee, 2001), supported by their relationships with their students, colleagues, and the principal.

1.2 teacher identity in inquiry-based, knowledge building communities

with schools increasingly incorporating problem-centred, inquiry-based pedagogy to develop students’ productive knowledge and higher-order competencies (e.g. creativity, collaboration, and other 21st century competencies), research on teacher learning and development needs to understand the professional identity of teachers who are dedicated to implementing and sustaining inquiry-based learning in their practice. the literature on inquiry-based, collaborative learning suggests various new roles to be played by the teacher: a designer, facilitator, mentor, modeller of authentic inquiry processes, and a partner or co-learner who co-engages in the inquiry processes with his or her students, valuing students as collaborative contributors while fostering their ownership and agency (belland, glazewski, & richardson, 2008; brush & saye, 2000; crawford, 2000; hjalmarson & diefes-dux, 2008; hmelo-silver & burrows, 2006; lunn & solomon, 2000; mills, 2014; tabak & baumgartner, 2004; zhang & sun, 2011; zhang, hong, scardamalia, & morley, 2011). however, existing research on teacher learning to support inquiry-based classroom innovation primarily focuses on teacher knowledge and practices (see fishman, davis, & chan, 2014 for a review), with scarce research efforts centring on the professional identity of teachers who are innovative, persistent, and productive in implementing inquiry-based learning (enyedy et al., 2005). teacher identity aligned with inquiry-based pedagogy allows teachers to pursue and persist in implementing adaptive, responsive teaching (duffy, 2005; fairbanks et al., 2010) and to continually invent new and more productive practices (davis, 2006).
therefore, in order to better understand teacher identity in support of inquiry-based pedagogy, the present study examines the professional identity of teachers who are deeply engaged in continuous classroom innovation using knowledge building pedagogy and technology (scardamalia & bereiter, 2006). the knowledge building pedagogy belongs to the larger family of problem-centered, inquiry-based learning programs, with a particular focus on designing inquiry following authentic knowledge creation processes (scardamalia & bereiter, 2006). beyond project-based inquiry that requires students to address pre-defined problems and tasks, knowledge building pedagogy approaches inquiry as progressive problem solving achieved by sustained collective discourse: students identify new and deeper problems as old ones are addressed, driving sustained advancement of collective understandings. students work as a knowledge building community to engage in such sustained inquiry and discourse and collectively advance the “state of the art” of their community’s collective knowledge (scardamalia & bereiter, 2006). they identify deepening problems of understanding, develop and contribute ideas to a public space, engage in collaborative discourse and experimentation, and use a wide variety of resources to advance their ideas. a networked knowledge building environment—knowledge forum, formerly known as csile (computer-supported intentional learning environment)—has been developed to support knowledge building discourse and processes (see scardamalia & bereiter, 2006). knowledge forum provides a collective knowledge space that gives student ideas a public, permanent representation. students contribute diverse ideas to ongoing conversations and collectively advance the ideas through constructive criticisms, mutual build-on, and progressive problem solving, with new and deeper challenges identified as their understanding is advanced (bereiter, 2002). 
specifically, students record ideas in views (workspaces). these workspaces correspond with their focal goals. students write notes in these views in order to contribute their ideas, data, and related information using text and graphics. knowledge forum has supportive features for knowledge-building discourse that allow students to co-author, build on, and annotate notes. students can also create reference links with citations to existing notes, as well as add keywords and create rise-above notes to summarize and advance their discussions (scardamalia, 2004). knowledge forum scaffolds additionally support individual contributions and learning as well as collaboration, turning higher-level knowledge processes over to students. customizable scaffolds are designed to support various knowledge processes, such as using the sentence starters “my theory,” “i need to understand,” “this theory cannot explain,” “a better theory,” and “putting our knowledge together” to support theory development (scardamalia, 2004). knowledge building is a dynamic, social activity system in which students interact with diverse people and ideas to advance their collective knowledge. the social and cognitive complexity of this process requires a principle-based, adaptive approach to classroom design and practice, which differs from procedure-based inquiry designs that require students and their teacher to work on pre-defined project tasks following pre-scripted procedures and timelines (zhang et al., 2011). knowledge building in classrooms is guided by a set of 12 knowledge building principles, including epistemic agency, real ideas and authentic problems, continual idea improvement, collective responsibility for community knowledge, knowledge building discourse, and constructive use of authoritative sources (scardamalia, 2002).
epistemic agency allows students to set goals for learning, initiate and sustain knowledge advancement, and engage in higher-level knowledge work normally left to the teacher. the principle of real ideas and authentic problems allows students to identify problems that stem from their curiosity and efforts to understand the world. the principle of continual idea improvement treats ideas as ever improvable, not simply rejected or accepted. the principle of collective responsibility for community knowledge places a responsibility on all participants for contributing to community goals and advancing community knowledge, not only individual learning. the principle of knowledge building discourse asks students to engage in discursive practices whose goals are not only to share, but also to transform and advance knowledge. the idea behind the principle of constructive use of authoritative sources is accessing and critically evaluating sources of information. students use these sources to support and refine their ideas, not just to find “the answer.” the 12 principles and corresponding support in knowledge forum create affordances for knowledge building in a community. in a knowledge building initiative focusing on one or more deep curriculum areas, teachers and their students co-construct goals of inquiry based on progressive questions from students, and design knowledge building activities in light of the principles. resources and supports are in place for teachers to share lesson examples and reflect on knowledge building processes using real-time analytic data about idea contributions and social interactions. teachers have used the feedback generated by these automated tools to facilitate reflection and improve practice.
this principle-based approach gives teachers a high level of ownership over classroom practice and innovation, so they can continually improve and adapt classroom designs and pedagogical understandings to enable increasingly productive knowledge building experiences among their students (chan, 2011; zhang et al., 2011). this study examines the professional identity of a group of teachers from an elementary school that has been implementing knowledge building pedagogy and technology for more than a decade. our previous analysis based on rich data collection over eight years demonstrated the teachers’ continual improvement of inquiry practice as reflected in students’ active and collaborative engagement in knowledge building (zhang et al., 2011). enabling such sustained innovation and improvement, the teachers formed into a professional knowledge building community themselves to discuss advances and challenges, co-design and test classroom designs, reflect on their practice based on data collected, and continually deepen their understanding of knowledge building principles that inform new possibilities of improvement (zhang et al., 2011). the purpose of this study is to investigate the professional identity of these teachers in the context of knowledge building classrooms. our research question asks: what characterizes the professional identity of these teachers who are dedicated to and capable of sustained innovation using knowledge building pedagogy and technology?

2. method

this case study was conducted as a part of a larger research initiative to examine the enactment of knowledge building as a principle-based innovation at an elementary school over a decade: the dr. eric jackman institute of child study laboratory school in toronto (zhang et al., 2011). the school was established in 1926, partly inspired by the work of john dewey.
it enrols approximately 200 students from nursery (pre-k), junior kindergarten, senior kindergarten, to grade 6, with 22 students on average per class. most families come from a middle class background and pay a tuition fee. as a laboratory school, jackman ics has been involved in initiating and disseminating new ideas related to improving education. it makes daily contributions to teacher training, providing internship opportunities for graduate students in the programs of child development and education. knowledge building pedagogy and csile/knowledge forum were first introduced in 1994, tested by a few classrooms between 1996 and 2000, and adopted across the entire school from 2000 onward. for the larger study, we analysed the knowledge building initiatives facilitated by the teachers over eight years focused on core scientific themes as well as social topics. the analysis of student online discourse in knowledge forum demonstrated increasing levels of collaborative knowledge advancement associated with years of teachers’ experience. teacher interviews, reflection journals, and on-site observations helped to elaborate the teachers’ efforts and school conditions. qualitative analysis revealed that the teachers continually and collaboratively worked on improving their practice and deepening their understanding of knowledge building pedagogy. through experimenting with new ideas and openly sharing them with other teachers, they developed adaptive expertise (crawford, schlager, toyama, riel, & vahey, 2005; zhang et al., 2011). the present study re-analysed the interviews with three teachers to understand their professional identity. these teachers were chosen because they represented different grade levels, had the most extensive experience with knowledge building pedagogy at their school, and were often requested to provide mentoring support to other teachers from the international network of knowledge building communities.
the teachers included raphael, a male teacher teaching grades 4-6; zanna, a female teacher teaching grades 2-3; and cadence, a female teacher teaching kindergarten (these are pseudonyms). all the teachers were middle-aged and had over five years of experience in teaching. each interview took approximately 30 minutes, focusing on how the teachers approached and improved their classroom practices to better support knowledge building. example interview questions included: how do you see your role as a teacher? what are the three most important qualities you would like to develop in your students? what are the major things you do to develop these qualities? what have been your three most important improvements in your teaching in the past years? in what way do you see your colleagues/principal as supportive of your efforts for seeking innovation in teaching? the analysis of identity was mainly based on the interviews in which the teachers reflect on who they are and what they strive for. the analysis was contextualized by observational records of the teachers’ classroom practices, which were systematically analysed for the larger project. the interviews were fully transcribed, and then analysed using a grounded theory approach (strauss & corbin, 1998). the grounded theory approach suits the subject of this research because, although teacher professional identity has been investigated in the literature in light of theories of identity, inquiry-based teachers with high levels of innovativeness have never been examined for this purpose. we also argue that incorporating already existing ideas from literature regarding what constitutes teacher identity not only does not “compromise methodological ‘purity’” (dunne, 2011, p. 113), but “can actually enhance rigor” (p. 113). the literature review provides a clear rationale for the study and a specific research approach (coyne & cowley, 2006; mcghee, marland, & atkinson, 2007). 
secondly, it is helpful in contextualizing the study (mccann & clark, 2003), provides the researcher with an aim (urquhart, 2007), and makes known how the phenomenon has been researched (denzin, 2002; mcmenamin, 2006). thirdly, it is helpful in developing ‘sensitising concepts’ (coffey & atkinson, 1996; mccann & clark, 2003) and promoting “clarity in thinking about concepts and possible theory development” (henwood & pidgeon, 2006, p. 350). following procedures of grounded theory analysis, the first author first read and re-read the interview transcriptions, and created open codes that reflect specific features of the teachers’ identity. these codes were then categorized into primary themes that include subthemes to capture prominent features of the teachers’ professional identity in connection with their knowledge building practice. the creation of the themes was informed by teacher identity and knowledge building as our two-fold conceptual framework and by the features of teacher identity reviewed in the beginning of this article while remaining open to possible new aspects of teacher identity in the contexts of knowledge building. the two authors then co-reviewed the open codes and initial themes and subthemes and discussed any disagreements. 
for example, several raw codes such as classroom as a community of researchers, teacher as an authentic co-learner, and faith in students developed into the following subtheme: “teacher as a co-learner: sustaining students-driven inquiry through symmetrical teacher-student interactions in which the teacher is not an intellectual authority but a co-learner.” this subtheme then became the core aspect of the following theme: “teachers as co-learners: forming symmetrical relationships with students so they can take on the highest level of responsibility for learning and knowledge advancement.” in grounded theory analysis, the degree of agreement between researchers is not as important as “the content of disagreements and the insights that discussion can provide for refining coding frames” (barbour, 2001). in addition, we employed a reflexive approach to the analysis to ensure reliability and validity (barry, britten, barber, bradley, & stevenson, 1999). schwandt (1997) specifies reflexivity as two-fold. the first aspect involves being part of the setting, context, and a phenomenon being researched. the second is “[a] process of self reflection of one’s biases, theoretical predispositions, preferences and so forth” (p. 135). we used reflexivity “to move us outward to achieve an expansion of understanding” (barry, britten, barber, bradley, & stevenson, 1999, p. 30). we were reflexive not only by keeping reflexive diaries and recording analytic decisions in memos, but also by being reflexive about every decision we made (mason, 1996). in addition, as a team, we engaged in group reflexivity, making sure that there was a dialogue between our individual reflexivities and our group reflexivity (barry et al., 1999). in negotiation of our ideas, we developed a dialectic that improved our thinking.
that is, by sharing and negotiating our thinking and differences, we thought through our positions and justified them, and if an argument could not be justified, it became apparent that it was weak (barry et al., 1999). the themes and subthemes were then refined and further validated through relating and comparing the themes, checking data against the themes, and triangulating the identified themes with data from the teachers’ journals and field observations. the refined themes and subthemes are elaborated in results.

3. results

the data analysis identified five overarching themes, each involving a number of subthemes, that characterize the professional identity of the knowledge building teachers. these themes are summarized in table 1 and elaborated below.

table 1
themes and subthemes that characterize the identity of the knowledge building teachers

theme: teachers as professional knowledge builders to explore new visions of teaching: viewing teaching as ever improvable to open new possibilities for student knowledge building and development
subthemes:
  a vision of teaching for lifelong learning and whole child development (e.g. intellectual curiosity, creative problem solving, caring and collective responsibility, open-mindedness) beyond curriculum coverage;
  a strong belief that pedagogical knowledge and practice need to be continually built and refined to foster increasingly productive knowledge building;
  an adaptive, open approach to teaching so new classroom arrangements, procedures, and technologies are continually tested and flexibly adapted and integrated in the service of knowledge building and inquiry.

theme: teachers as co-learners: forming symmetrical relationships with students so they can take on the highest level of responsibility for learning and knowledge advancement
subthemes:
  teacher as a co-learner: sustaining students-driven inquiry through symmetrical teacher-student interactions in which the teacher is not an intellectual authority but a co-learner;
  student agency: honouring students as research team members who are responsible for proposing goals and ideas for research, building and assessing theories, and designing experiments and other activities;
  collective engagement: respecting and engaging each student as a contributive member of a knowledge building community;
  students-driven discourse: striving for spontaneous, idea-centred conversations co-improvised by all community members in both face-to-face and online environment, with the teacher as one of them.

theme: teachers as problem-solvers and barrier-breakers: holding a proactive stance toward the contexts of practice to address challenges, constraints, and barriers for continual improvement
subthemes:
  a commitment to developing context-adaptive strategies to make knowledge building possible and effective across age groups and classroom settings;
  a barrier-breaking attitude to address practical challenges such as time limit and technical problems through flexible and integrated arrangements.

theme: teachers as members of a professional community that encourages collaboration, innovation, and continual improvement: building collaborative relationships with colleagues to share, discuss, design, and reflect on innovative classroom practices
subthemes:
  a shared focus on continual improvement and invention in teaching beyond routine procedures;
  conviction that improvement is achieved through collaborative efforts;
  continual, professional knowledge building discourse that supports collaborative problem solving and ideation in teaching;
  boldness to share and reflect on both successes and failures;
  confidence in accepting risk-taking as inevitable in experimenting with new approaches;
  a hybrid identity that integrates practice with research for continual improvement of teaching.

theme: an empowering relationship with the principal: perceiving the principal as both a leader and a professional colleague who supports teacher innovation and collaboration
subthemes:
  democratic, supportive, and professionally centred relationship with the principal as crucial in all teachers’ undertakings;
  empowerment to innovate resulted from the relationship in which collaborative experimentation and risk-taking are valued and encouraged.

3.1 teachers as professional knowledge builders to explore new visions of teaching: viewing teaching as ever improvable to open new possibilities for student knowledge building and development

as a crucial aspect of who they are and mean to achieve, the teachers’ comments in the interviews reveal a perception of themselves as professional knowledge builders who are committed to explore innovative visions of teaching to open new possibilities for student knowledge building and development. first, the focal teachers see themselves as teachers for whole child development and lifelong learning, not just to cover the curriculum. 
specifically, they are committed to developing crucial qualities that go beyond curriculum content coverage, such as curiosity, intellectual thinking skills, creative problem solving, social caring, collective responsibility, and open-mindedness. they stress that these qualities are important for student development and further needed for students to engage in productive knowledge building. developing curiosity is one of the most important qualities that constitute who they are as young children’s educators. raphael (teaching grades 4-6) recognizes the limitation of the curriculum in stimulating children’s natural curiosity that drives inquiry learning; thus, in his classrooms, he particularly encourages students to ask deeper and deeper questions and engage in curiosity-driven inquiry. similarly, cadence and zanna (grades 2-3) mention the importance of exploring questions that are asked by children and of their interest. specifically, they elaborate that this way of teaching, contextualized and situated in children’s lives, is aligned with children’s needs and curiosity, leading to active engagement and deep exploration, which is the essence of inquiry. as zanna says, she wants to instil “a love of learning so that when they come to school they don’t see the work of school as being just for school but they see that it is important for their life.” at the same time, zanna underscores that this attitude to teaching is a part of who she always was: “i really tried to hold onto it in the public school system, but to come back here and find everyone like-minded it has just brought me right back to the way children learn best, developmentally appropriate practice.” curiosity, according to the teachers, as a tenet that stimulates desires to investigate something in depth, is vital in problem solving and nurturing an inquiring mind. teaching how to solve problems also starts very early. 
it is in kindergarten where children learn that problems can be solved with their efforts and that there are specific words and strategies that can help them. when problems are brought to the group, through conversation children find out how to solve them. cadence, the kindergarten teacher, stresses, “we talk a lot. we bring problems of understanding, social issues to the group... no matter what kind of problem it is, you can solve something piece by piece... you can bring it to the community and see that you aren't just addressing the small problem that came up or that conflict, you are actually figuring out how to figure out all conflicts.” zanna adds, “we have a class meeting every friday where we sit in a circle and we talk about the week and one thing that kids can do is to raise up a problem they have.” at the same time, raphael underscores that major knowledge advancements may not be achieved all the time as he would hope, but at least he observes “the desire to go deeper” among his students as a result of their engagement in the inquiry activities. developing independence in thinking is another important part of what the teachers mean to achieve. cadence mentions developing independence and open-mindedness in thinking very early in children’s education: “i want them to be knowing that they can act independently; they don't need to have a teacher there, guiding them in the whole way, and telling them what they're doing is right or wrong.” for zanna, developing independence is also important: “i want the kids to rely on each other so they don't feel they need to come to me. 
i don't want to be centre of the class.” raphael stresses that he wants to instill in children a disposition of “seeking the information as opposed to children thinking that things have to come to them either from a teacher or a book.” aligned with their commitment to growing care, curiosity, and independent thinking in their students, the teachers focus their role on creating a community of knowledge builders who share collective responsibility. the teachers stress collective responsibility that allows students to keep moving forward in their common pursuit of knowledge and hold each other accountable without the teacher stepping in all the time. cadence mentions that her kindergarteners already learn that being a part of a community entails certain behaviours, such as respecting everybody as a member of the community and valuing everyone’s ideas. zanna adds, “i want them to take responsibility for their actions, not to blame others or to say ‘oh i didn’t do it’ but to make good choices and when they don’t make good choices to admit to it.” raphael gives a specific example of such collective responsibility, “…the children will say ‘you’ve been researching for two days and you haven’t written anything on the database yet, we need to know what you’ve done. we’ve given you this time and you have to give us back some information’ and so that completely changes the nature of the community it is very responsible and works as a unit and where they are leading it themselves.” to explore and achieve their vision of teaching, the teachers embark on a sustained, reflective journey to explore and build new knowledge about their profession. in the interviews, the teachers comment that they constantly rediscover what it means to be a knowledge building teacher. their understanding of knowledge building pedagogy has been constantly evolving over time. in this process, they are all learners. 
raphael made a comment that encapsulates this critical belief in rethinking, refinement, and improvement: “but five years from now, we can come back and say: ‘i don't know what i was talking about then, and this feels like knowledge building!’ so there is constant improvement. just like ideas are improvable, the process of knowledge building is improvable.... you're constantly going deeper in what this means... none of us is the learned. we're all learners.” cadence adds: “i never try to think that worked really well, i'm going to do the same thing again. i always look for ways to improve my practice.” the teachers’ comments on new advances they have made in their classrooms demonstrate their adaptive, open approach to teaching by which new classroom arrangements, procedures, and technologies are continually tested, flexibly adapted, and integrated to serve and strengthen knowledge building and inquiry. what these teachers have learned and experienced in this process further strengthens their identity as knowledge building teachers. for zanna, continual testing of adaptive approaches is vital to her professional identity. she has experimented with various strategies to support student knowledge building discourse both in face-to-face interactions and in knowledge forum and found out that giving children the opportunity of recording their ideas in the online space creates the possibility to revisit the ideas later for further exploration. she expresses her thoughts in this regard, “...because we have the software, the questions live there and they do get answered.” reflecting on his experimentations in the classroom, raphael comments on the changes he has made to develop dynamic collaboration structures for knowledge building over three years. 
he began with collaboration in fixed small groups in the first year, evolved to collaboration in interacting groups, and eventually to opportunistic collaboration among students based on emergent needs without fixed small groups. analysis of the online discourse showed that increased connectivity and productivity among his students resulted from these changes (see zhang, scardamalia, reeve, & messina, 2009; zhang & messina, 2010). raphael has also changed the way classroom conversations are organized; instead of scheduling them in advance, he decided to allow the conversations to emerge naturally, when children felt that they had something important to discuss. similarly, the use of technology in his classroom has gone from teacher-directed tasks to trusting children and allowing them to decide if an idea or question is suitable and important for discussion, online or face-to-face. as all the teachers stress, such efforts to continually improve and innovate teaching are an important characteristic of who they are. as new notions and strategies of teaching continue to develop, the teachers further redefine their relationship with students in the classroom.

3.2 teachers as co-learners: forming symmetrical relationships with students so they can take on the highest level of responsibility for learning and knowledge advancement

the focal teachers identify themselves as co-learners who honour students as research team members for collective knowledge building. the teachers describe themselves as authentic members of the classroom community who co-engage in the knowledge building and problem solving processes with students. raphael stresses his position: “we are a community of researchers in the classroom…i try to do things that i don’t know the answer to so that it becomes an authentic kb [knowledge building] experience for me as well, so that i can say to the students ‘i’m not exactly sure. let’s find out.’” zanna adds, “i don't want to be centre of the class. 
i want to be another member of the community.” cadence also underscores this distinctive relationship that she creates in her classroom, saying: “you're not always the intellectual authority.” perceiving themselves as co-learners leads to a more symmetrical relationship with students. they honour students as research team members who have the agency and capability of proposing goals and topics for research, building and evaluating theories, designing experiments, and forming collaborative groups. raphael underscores this point with enthusiasm: “imagine if a child feels that from the very beginning they could add by connecting things in an interesting way… they might be adding a new perspective, a new theory.” the teachers trust that children can take on high-level responsibility in the classroom to generate deepening questions and ideas. they communicate this trust to their students during activities and encourage children to ask questions that would direct and deepen their collective inquiry. as zanna notes: “children could put their ideas in the pocket if you want to talk about. so the students have more agency in what's happening in the [knowledge building] talks. it's not me deciding, but they identify ‘we want to talk about this,’ ‘we want to put the view up on the wall.’ whatever they want to do.” the symmetrical relationship with students is reflected in students-driven, open-ended discourse in the classroom and on knowledge forum, which are not pre-scripted by the teacher but co-improvised by all community members—with the teacher as one of them. the teachers comment on the importance of respecting diverse ideas. they model their respect of student ideas in the classroom and further create a community that respects diverse voices from all members. the diverse ideas are treated as the driving force to deepen classroom discussions. 
cadence underscores her openness to follow children’s deepening questions and ideas and let them explore these questions for sustained inquiry and discourse. she describes how children’s interest in how trees breathe led to a three-month investigation:

it was the very first day of school. i thought it would be interesting to do a study of trees. … and i tried to think where it might go…every year in the fall, [students] bring in different colours of leaves, they look at the shapes...i think i would probably be talking about leaves and colours and maybe get to the cells... so the very first day, i started asking kids what they knew about trees. and as they told me about different parts of trees, i drew on a piece of chart paper. so someone said branches...twigs...and then a child said: "lungs." and i just stopped… it's such a clear way that puts me in an interesting position. so i said: "where would i put the lungs?" and she said: "i don't know. they have to breathe, don't they? they're alive." and for the next months, we looked into how trees breathe. that's how it caught children's interests in the class. ... and it was amazing to notice that you don't have to have these arbitrary barriers, that you can study so many things: do literacy and drama, deep thinking, and specific experiments... so for me it was a huge moment as a teacher to realize just how much you can blast open the possibilities of depth and time.

in the above example, when one of the students mentioned lungs as a part of a tree, the teacher did not just say that trees do not have lungs but treated this as a real and meaningful idea, which has the potential to stimulate investigations of how trees breathe and live. through acknowledging the student’s idea and asking a question “where would i put the lungs?” the teacher helped the students to recognize an authentic problem, leading to a deep inquiry beyond the teacher’s imagination. 
in addition to their trust in student agency and capability to generate questions and ideas for deep inquiry, the teachers further encourage students to take on high-level responsibilities that are usually enacted by the teacher in traditional classrooms. these include giving input to high-level decisions about what needs to be studied, through what activities, who will do what, when, and how the online discussion space should be structured and used. the teachers all stress that it is important to encourage children to propose specific problems for discussions as opposed to following teacher-set topics and schedules. the example coming from the classroom of raphael is particularly striking. at the beginning, he simply planned and scheduled knowledge building talks, a structure co-developed by the teachers to facilitate interactive discourse focusing on advancement of ideas beyond information sharing (see zhang et al., 2011). then, he hung pockets in the classroom to encourage students to drop a note when they have important problems or knowledge advances to talk about. through this and other changes, the knowledge building talks in his classroom have become much more spontaneous and organic, with continual improvement of ideas as the focus. in our classroom observations, we captured chunks of metacognitive discourse embedded in the ongoing classroom dialogues, which focus on issues such as: are we making progress? what are the areas that need more research? what kinds of information should be recorded in knowledge forum? student input to these questions leads to collective decisions about how the community should focus and refine their knowledge building work in the next phase. the focal teachers see knowledge forum as an enabler for the shift of high-level responsibility to students. for example, zanna values the use of knowledge forum to support sustained discourse and further make students’ questions and progress visible for reflection. 
the software provides a space where “the questions live there and they do get answered.” she expresses that before the use of knowledge forum, it was hard to trace which questions were answered. with knowledge forum’s scaffolds, marking questions using “i need to understand” and theories using “my theory” and “a better theory,” both teachers and their students can trace progress in addressing progressive questions and find areas that need deeper contributions, assisting collective decision making about unfolding directions and deeper actions. while all three teachers emphasize the importance of a symmetrical relationship with their students, their personal styles vary. zanna prefers to be a quiet speaker in the classroom. raphael emphasizes that it is ok to intervene in a classroom discussion actively when needed, such as to recall and model the rules of contribution.

3.3 teachers as problem-solvers and barrier-breakers: holding a proactive stance toward the contexts of practice to address challenges, constraints, and barriers for continual improvement

developing innovative practices in line with their visions of teaching requires the teachers to face and address a range of challenges and barriers resulting from the contexts, such as time limit and school schedule, subject area limitations, age differences, and technology malfunction. instead of being defeated by the challenges and barriers, the teachers become active problem-solvers and barrier-breakers. while facing various challenges, the teachers have developed adaptive strategies to make knowledge building possible and effective across student age groups and classrooms, and they speak about these efforts as a part of who they are. as they strive to engage students of all ages in knowledge building, they need to develop adaptive ways of addressing the challenge of developmental differences. 
for younger students, the teachers especially focus on modelling tenets that are foundational for knowledge building, such as developing respect for others and different perspectives in order to adapt knowledge building principles to younger students. cadence provides the support when she models how children should respectfully converse with each other, “i try to model a lot of my expectations for the children. so when we sit on the carpet, i don't sit on the chair...i think it's an important thing for me because i'm at their level... i hope i'm showing what kinds of comments, what kinds of questions have value for the whole group, and that, again, every voice needs to be heard.” such modelling does not need to be as intensive for older students. but raphael underscores that while he strives to be just a member of the community, he never forgets about his modeling role, “...we are still modelling for children.” another significant challenge the teachers have encountered is how to foster deep inquiry in different subject areas within typical time constraints. the teachers comment on a strategy they have developed to integrate different subjects into a sustained knowledge building initiative that addresses core contents of all the areas for integrated understanding. one of the most striking examples comes from cadence who integrated several subjects under one big topic: trees and how they breathe. she says, “...you can study so many things: do literacy and drama, and deep thinking, and specific experiments, every kinds of learning we want the children to do, you can actually do as one topic, because if it's a good topic, like trees and how they breathe, it is so rich, there're so many directions you can go.” efficiently integrating different subjects allowed the teachers to reallocate the time needed for each subject while further fostering the connected understandings among their students. 
the intensive use of knowledge forum and other technology tools also requires the teachers to solve emergent problems related to technology use. the teachers comment on their constant experimentations to find meaningful ways to use technology for knowledge building and address issues of unproductive technology use. raphael shared a story that a few of his students once refused to write on knowledge forum. by sitting down to listen to the students’ concerns, he realized that the problem resulted from his procedural use of knowledge forum: students were assigned to write online based on a preset schedule when they might not have deep ideas to contribute. raphael made a change to encourage students to use the technology only when it is necessary, focusing on contributing important ideas instead of simple facts from books. doing so helped to increase student engagement. he then reflected on what this struggle taught him, “so we have to really be careful of how we use the technology, that is not for the sake of technology. it has to be for the sake of knowledge building.” in terms of technological reliability, the teachers also need to learn to solve various technical problems themselves (e.g. internet connection, forgetting passwords) due to the lack of a full time technology support specialist. they treat this challenge as an opportunity to model to children how problems can be solved and create alternative arrangements when technology does not work. raphael stresses his persistent and proactive approach to solving problems with a “strong stomach.” he says: “it puts you in a role where you have to be happy all the time with technology, and that's a lot of work. the children are watching you. they could give up easily, because the frustrations sometimes are huge. so we need always to be able to be flexible... 
it’s about saying ‘oh ok that’s not working, let’s do this over here....”

even at this laboratory school, one that has a supportive context for innovative classroom practices, the teachers experience challenges and struggles that they have to address in order to effectively implement knowledge building in their classrooms. working collaboratively as a community helps them to share and address the challenges with mutual social support.

3.4 teachers as members of a professional community that encourages collaboration, innovation, and continual improvement: building collaborative relationships with colleagues to share, discuss, design, and reflect on innovative classroom practices

teachers’ innovative collaboration with colleagues and considering themselves as not only teachers but also researchers is another major aspect of who they are as professionals. the teachers see themselves and their teaching as a part of a professional community of teachers they work with. as zanna says, “...everyone here is so interested in their teaching and improving it.” furthermore, all teachers underscore that forming such a professional group is crucial to innovation and improvement of their teaching. raphael comments: “it creates an environment where you say something and even by just talking about it you are improving your understanding.” cadence adds, “anytime i have an idea, a question and i want to connect with another class or another teacher you pretty much have people who are willing to go ahead and do it.” moreover, conducting professionally oriented discourse at weekly knowledge building meetings, which supports their collaborative problem solving and formation of new ideas, shapes who they are as professionals. at the meetings, they exchange their classroom designs, insights, and challenges, ask questions, and continually develop better understanding and strategies for deeper and more productive knowledge building. 
raphael stresses, “…each one of these [meetings] has completely changed me, my practice, you can, we all know, you can create a kb [knowledge building] environment and not be a knowledge-builder yourself. and to truly understand you need to be immersed in a kb experience yourself.” they also stress that this community, which strives for excellence in teaching, is open to sharing both successes and failures. such freedom and boldness in terms of talking about both successes and failures comes from the teachers’ shared belief that risk-taking is inevitable in experimenting with new approaches that lead to the improvement of teaching practice. in this community, as zanna underscores, “there is not that sort of pretending that everything is going great. people bring their problems up and admit when things aren’t going well.” these teachers also identify themselves as both teachers and researchers and stress that researching their own practice, as well as working with other researchers, is a critical component of good teaching that promotes innovation, refinement, and change. raphael elaborates on this important connection between teaching and researching, “the researcher part informs the teaching and the teaching informs the researcher part of me.” zanna sums up, “for me research goes along really well with teaching ... good teachers constantly reflect on their teaching and think about how to improve it... it is just a natural part of good teaching.”

3.5 an empowering relationship with the principal: perceiving the principal as both a leader and a professional colleague who supports teacher innovation and collaboration for continual improvement

the focal teachers develop a supportive relationship with the principal who is professionally instead of administratively oriented. 
through weekly knowledge building meetings and informal ongoing interactions, the teachers share with the principal and other colleagues their teaching expertise, ideas and designs, understanding of children’s development and needs, and vision of innovative teaching. the principal participates in the professional dialogues and gives her input. cadence comments on this sharing, “if you have an idea, you present it to her [the principal] and it makes sense to her, she is going to back you up. she may have questions about it and ask you to think about it in a slightly different way that has more value, but she is really going to support it.” zanna underscores that she gains support from the principal to sustain her innovative practices, saying: “she is a fabulous leader and it allows me to teach the way i want to teach, be innovative, and reflect on my practice.” raphael expresses the essence of this relationship, “we are incredibly empowered. we are given a lot of support, but with that comes a huge amount of responsibility as well.”

4. discussion

the present study sought to illuminate the professional identity of three teachers who have been working persistently and productively with knowledge building pedagogy and technology. while the existing literature on teachers in inquiry-based settings focuses on investigating teacher practices and strategies to facilitate collaborative inquiry and the interplay with teacher knowledge, beliefs, and goals (see fishman et al., 2014 for a review), this study is the first to examine the new professional identity of teachers who have engaged in sustained knowledge building pedagogy and classroom innovation for multiple years. our interview data captured their reflective story telling about what they do and mean to achieve as teachers (connelly & clandinin, 2000). 
as the review of literature suggests, teachers constantly construct and refine their reflective sense of self through looking into their practice of teaching (antonek, mccormick, & donato, 1997; brooke, 1994; palmer, 1997). through their long-term engagement in knowledge building pedagogy and technology, as a distinctive set of practices (gee, 2001), the focal teachers in this study develop new understandings of who they are and what it means to be inquiry-based knowledge building teachers. specifically, the analysis elaborates five important distinctive facets of the teachers’ identity that fit into the larger context of practice (beauchamp & thomas, 2009) involving their students, colleagues, and principal. first, beyond routine implementers of teaching, the teachers are professional knowledge builders who explore new visions and possibilities of teaching and test new and adaptive teaching designs for continual improvement. their visions of teaching concentrate on whole child development, including intellectual curiosity, creative problem solving, caring, respect, collective responsibility, and open-mindedness. they see knowledge building pedagogy and technology as supporting their visions. since there are no given classroom procedures for achieving these high-order learning outcomes, the teachers have to work as professional knowledge builders to develop and improve specific designs in light of principles of knowledge building. in this ongoing process (gee, 2001), they are empowered to engage in constant interpretation and reinterpretation (beijaard et al., 2004) of who they are as knowledge building teachers. they are deeply aware that who they are as teachers constantly changes because of their strong sense that they need to become educators who help children to engage in increasingly productive knowledge building. 
this mindset allows the teachers to develop an adaptive approach that demonstrates itself in readiness and openness to change existing classroom arrangements and processes and test new and improved strategies, including new ways to use technology. their identity as vision-directed professional knowledge builders is consistent with and supported by their knowledge building practices, which require high-level dynamics and adaptation in classroom work. the teachers reflect on who they are and what they do in the context of knowledge building pedagogy, which in itself demands that teachers build knowledge about their pedagogy and develop it. a related aspect of the teachers’ identity has to do with how they position themselves in relation to the contextual challenges and constraints of their work (beijaard et al., 2004; gee, 2001; rodgers & scott, 2008; varelas et al., 2005). the literature suggests that teachers often see obstacles and contextual constraints as preventing them from innovation and change. the teachers in this study actively identify and address challenges instead of avoiding them. they identify themselves as problem solvers and barrier-breakers who continuously develop adaptive strategies to make knowledge building possible and productive. they approach obstacles in a proactive way so they can solve the problems with their colleagues and students, transform obstacles into innovative ideas and opportunities, and implement and improve knowledge building under new conditions. such a proactive stance is an important part of the teachers’ identity, allowing them to resolve dilemmas and make decisions to sustain student-centred inquiry (see also, enyedy et al., 2005). another important characteristic of the professional identity that the focal teachers display is reflected in their social relationships with others (beijaard et al., 2004; gee, 2001; sfard & prusak, 2005). 
through forming relationships with students, peer teachers, and administrators, the teachers brought forth multiple identities (gee, 2001), or aspects of oneself (rodgers & scott, 2008), that are connected to “their performances in society” (gee, 2001, p. 99). first, the teachers’ relationships with students constitute their identity. their understanding of themselves as a certain kind of professional deeply involves students (beauchamp & thomas, 2009) and forming certain kinds of relationships with them. different from traditional authoritative roles, they identify themselves as co-learners with their students in a community of knowledge builders. this results in a symmetrical relationship with students in which students assume high-level agency for continually evolving knowledge building. this relationship is aligned with the way the teachers approach classroom discussions in both face-to-face and online settings through knowledge forum, not as teacher-planned conversations but student-driven, spontaneous, and co-improvised conversations driven by students’ authentic questions and ideas. such a symmetrical relationship has been evidenced to some extent in research on inquiry-oriented teachers; however, this symmetry was either not always sustained (enyedy et al., 2005) or was incidental (crawford, 2000; tabak & baumgartner, 2004). yet another essential attribute of teachers’ identity is deeply intertwined with the kind of relationships they form with other teachers and with their principal. their sense of belonging to a professional community, as the teachers express in their reflective narratives, substantially strengthens their bold vision of innovative and adventurous teaching (cohen, 1989). 
in the collaborative team, devoted to inquiry and improvement of teaching, the teachers also display a hybrid identity (bereiter, 2002) by describing themselves not only as teachers but also researchers who research their own practice, in cooperation with other teachers and researchers, to continually advance their pedagogical insights and strategies. increasingly effective knowledge building practice is what results from this multidimensional identity that involves co-developing better understanding and designs of classroom practices, supporting each other to solve problems and take risks, sharing successes and failures based on formal and informal data collection, and challenging one another in an atmosphere of mutual respect and sharing. supporting their exploration and improvement of classroom practices to facilitate knowledge building, the teachers further develop a democratic and professionally oriented relationship with their principal. underpinning this relationship is a mutual understanding that continual innovation and experimentation are necessary for educational improvement and that teaching needs to be coupled with research. these teachers treat the principal both as a leader who is devoted to the school and as an educator who can always share ideas and expertise and engage in professional conversation with teachers. this type of relationship results in teachers’ seeing themselves as empowered to pursue teaching according to their vision, take risks to experiment with innovative approaches, and collaborate and share with their principal and other colleagues about advances and challenges. this kind of democratic relationship with the principal and its direct connection to how teachers perceive and identify themselves has never been evidenced in literature on teacher identity. 
these various aspects of teacher identity appear to be deeply connected to one another, depicting a coherent image of the teacher’s self in the contexts of inquiry-based, knowledge building classrooms. the teachers’ identity as vision-driven professional knowledge builders who continually improve classroom practice is supported by their role as problem-solvers and barrier-breakers to address contextual challenges and constraints and by their relationships with their students, peers, and principal. they co-construct their identity through engagement with their students, peers, and principal in transformative cultural practices (smagorinsky et al., 2004), which focus on collaborative knowledge building. through co-engaging with their students in knowledge building and reflecting on such experiences, they notice and are impressed by the deep ideas and active thinking of their students, which further reinforce their trust in student potential and agency and help the teachers to envision new possibilities to further engage students’ responsibility through improved classroom designs. through ongoing dialogue at weekly meetings that focus on knowledge building progress, strategies, and challenges, the teachers support and acknowledge one another as professional knowledge builders, problem solvers, and co-learners. these identities are further empowered by the democratic relationship with the principal who facilitates a supportive school culture for sustained innovation (zhang et al., 2011). with these important characteristics of who they are and what they strive for, the teachers are able to be persistent and productive in implementing and improving knowledge building practices and addressing various challenges on an ongoing basis. at the point of this study, the teachers were still searching for effective ways to implement knowledge building in mathematics, drawing on a set of strategies tested.
5. implications
in conclusion, this study of the three teachers who have been engaging in knowledge building pedagogy for continual innovation contributes to understanding the new professional identity of inquiry-based teachers in computer-supported collaborative classrooms. the teachers’ identity is multifaceted, as vision-driven professional knowledge builders, problem solvers, co-learners with students, and innovative collaborators with colleagues. such identity is co-constructed through sustained engagement in the pedagogical practice of knowledge building that both the teachers and administrators value as beneficial for children’s development as well as the constant improvement of teaching in spite of challenges. it is further shaped and sustained through the symmetrical relationship with students, innovative collaboration with other teachers, and the democratic relationship with the principal. teacher development initiatives to support authentic inquiry practices and educational innovations need to nurture the new aspects of teacher identity identified in this study. the best form of professional development is probably to create collaborative, professional knowledge building communities among teachers in which such important new identities are valued and enacted, as featured in this study. in the field of computer-supported collaborative learning, researchers are developing innovative efforts to create reflective communities and professional networks among teachers so they can develop the capacity to implement collaborative knowledge building among their students (chan, 2011; laferrière, breuleux, allaire, hamel, law, et al., in press). a primary focus of such communities is on engaging teachers in collaborative sharing of pedagogical understandings and co-creation of classroom designs (voogt, laferrière, breuleux, itow, hickey, & mckenney, 2015). 
in light of the findings of this study, researchers may additionally test systematic efforts to help teachers reflect on and transform their professional identity, including their vision of teaching, stance toward classroom practice, and relationships with their students, colleagues, and administrators. such identity-focused reflection may be designed using a narrative approach (sfard & prusak, 2005) to engage teachers in collaborative story telling about who they are now and who they hope to become professionally, always in relation to their context that includes both the practice and the types of relationships they build with others. comparing the stories among different teachers, including their principal, and reflecting on the stories in relation to the principles of collaborative inquiry and knowledge building may create valuable opportunities for teachers to transform their identity and practices. we are interested to test this possibility in our future studies. as a potential shortcoming of this study, the findings were generated based on the analysis of a small sample of teachers who have been implementing knowledge building as a specific model of inquiry-based pedagogy. the results are limited to understanding teacher identity in the contexts of sustained, open-ended inquiry in which the teachers are not charged to follow a set of tasks and procedural steps designed by researchers and curriculum developers, but to design, improvise, and deepen the inquiry process as it unfolds, based on interactive input from students. future research needs to look at other innovative groups of teachers, at different stages of their career, to see similarities and differences in how they understand and perform their identities, and conduct deeper analyses of teacher identity performed in their classroom practices. 
keypoints
bringing computer-supported collaborative knowledge building into classrooms requires new professional identities of teachers.
the new professional identities involve teachers as vision-driven professional knowledge builders; as problem-solvers to address contextual challenges; as co-learners with students; and as innovative collaborators with colleagues.
teacher development efforts to support authentic inquiry and knowledge building need to nurture these new aspects of teacher identity.
the best form of professional development is probably to create collaborative, professional knowledge building communities among teachers in which such important new identities are valued, enacted, and reflected upon.
acknowledgments
this research was supported by the national science foundation (iis #1441479). any opinions expressed in this paper are those of the authors and do not necessarily reflect the views of the national science foundation. part of the findings has been presented at the international conference on computer supported collaborative learning (cscl 2015, gothenburg, sweden). the authors would like to thank the teachers, principal, and students of the dr. eric jackman institute of child study of the university of toronto for the insights, accomplishments, and research opportunities enabled by their work.
references
antonek, j. l., mccormick, d. e., & donato, r. (1997). the student teacher portfolio as autobiography: developing a professional identity. the modern language journal, 81(1), 15–27. doi: 10.2307/329158 barbour, r. (2001). checklists for improving rigour in qualitative research: a case of the tail wagging the dog? british medical journal, 322, 1115–1117. doi: http://dx.doi.org/10.1136/bmj.322.7294.1115 barry, c. a., britten, n., barber, n., bradley, c., & stevenson, f. (1999). using reflexivity to optimize teamwork in qualitative research. qualitative health research, 9(1), 26–44. beauchamp, c., & thomas, l. (2009). 
understanding teacher identity: an overview of issues in the literature and implications for teacher education. cambridge journal of education, 39(2), 175–189. doi:10.1080/03057640902902252 beijaard, d., meijer, p. c., & verloop, n. (2004). reconsidering research on teachers' professional identity. teaching & teacher education: an international journal of research and studies, 20(2), 107–128. doi:10.1016/j.tate.2003.07.001 belland, b. r., glazewski, k. d., & richardson, j. c. (2008). a scaffolding framework to support the construction of evidence-based arguments among middle school students. educational technology research and development, 56(4), 401–422. doi:10.1007/s11423-007-9074-1 bereiter, c. (2002). education and mind in the knowledge age. mahwah, nj: erlbaum. bransford, j., darling-hammond, l., & lepage, p. (2005). introduction. in l. darling-hammond & j. bransford (eds.), preparing teachers for a changing world: what teachers should learn and be able to do (pp. 1–39). san francisco, ca: jossey-bass. brooke, g. e. (1994). my personal journey toward professionalism. young children, 49(6), 69–71. brush, t., & saye, j. (2000). implementation and evaluation of a student-centered learning unit: a case study. educational technology research and development, 48(3), 79–100. chan, c. k. k. (2011). bridging research and practice: implementing and sustaining knowledge building in hong kong classrooms. international journal of computer-supported collaborative learning, 6(2), 147–186. doi:10.1007/s11412-011-9121-0 clandinin, d. j., & huber, m. (2005). shifting stories to live by: interweaving the personal and the professional in teachers’ lives. in d. beijaard, p. meijer, g. morine-dershimer, & h. tillema (eds.), teacher professional development in changing conditions (pp. 43–61). dordrecht: springer. coffey, a., & atkinson, p. (1996). making sense of qualitative data: complementary research strategies. thousand oaks, ca: sage. cohen, d. k. 
(1989). teaching practice: plus que ça change.... in p. w. jackson (ed.), contributing to educational change: perspectives on research and practice (pp. 27–84). berkeley, ca: mccutchan. coldron, j., & smith, r. (1999). active location in teachers’ construction of their professional identities. journal of curriculum studies, 31(6), 711–726. doi:10.1080/002202799182954 connelly, f. m., & clandinin, d. j. (2000). shaping a professional identity: stories of education practice. london, on: althouse press. coyne, i., & cowley, s. (2006). using grounded theory to research parent participation. journal of research in nursing, 11(6), 501–515. doi: 10.1177/1744987106065831 crawford, b. a. (2000). embracing the essence of inquiry: new roles for science teachers. journal of research in science teaching, 37, 916–937. doi: 10.1002/1098-2736(200011)37:9<916::aid-tea4>3.0.co;2-2 crawford, v. m., schlager, m., toyama, y., riel, m., & vahey, p. (2005, april). characterizing adaptive expertise in science teaching: report on a laboratory study of teacher reasoning. paper presented at the annual meeting of the american educational research association, montreal, canada. davis, e. (2006). characterizing productive reflection among preservice elementary teachers: seeing what matters. teaching and teacher education, 22, 281–301. doi:10.1016/j.tate.2005.11.005 denzin, n. k. (2002). the interpretive process. in m. huberman & m. b. miles (eds.), the qualitative researcher’s companion (pp. 340–368). thousand oaks, ca: sage. dillabough, j. a. (1999). gender politics and conceptions of the modern teacher: women, identity and professionalism. british journal of sociology of education, 20(3), 373–394. duffy, g. (2005). metacognition and the development of reading teachers. in c. block, s. israel, k. kinnucan-welsch, & k. bauserman (eds.), metacognition and literacy learning (pp. 299–314). mahwah, nj: lawrence erlbaum. dunne, c. (2011). the place of the literature review in grounded theory research. 
international journal of social research methodology, 14(2), 111–124. doi: 10.1080/13645579.2010.494930 enyedy, n., goldberg, j., & welsh, k. m. (2005). complex dilemmas of identity and practice. science education, 90(1), 68–93. doi: 10.1002/sce.20096 fairbanks, c. m., duffy, g. g., faircloth, b. s., ye, h., levin, b., rohr, j., & stein, c. (2010). beyond knowledge: exploring why some teachers are more thoughtfully adaptive than others. journal of teacher education, 61(1/2), 161–171. doi: 10.1177/0022487109347874 fishman, b. j., davis, e. a., & chan, c. k. k. (2014). a learning sciences perspective on teacher learning research. in r. k. sawyer (ed.), cambridge handbook of the learning sciences (2nd ed., pp. 750–769). new york: cambridge university press. gee, j. p. (2001). identity as an analytic lens for research in education. in w. g. secada (ed.), review of research in education, vol. 25 (pp. 99–125). washington, dc: american educational research association. gee, j., & crawford, v. (1998). two kinds of teenagers: language, identity, and social class. in d. alvermann, k. hinchman, d. moore, s. phelps, & d. waff (eds.), reconceptualizing the literacies in adolescents’ lives (pp. 225–245). mahwah, nj: erlbaum. henwood, k., & pidgeon, n. (2006). grounded theory. in g. m. breakwell, s. hammond, c. fife-shaw, & j. a. smith (eds.), research methods in psychology (3rd ed., pp. 342–365). thousand oaks, ca: sage. hjalmarson, m. a., & diefes-dux, h. (2008). teacher as designer: a framework for teacher analysis of mathematical model-eliciting activities. interdisciplinary journal of problem-based learning, 2(1), 57–78. doi:10.7771/1541-5015.1051 hmelo-silver, c. e., & barrows, h. s. (2006). goals and strategies of a problem-based learning facilitator. interdisciplinary journal of problem-based learning, 1(1), 21–39. doi:10.7771/1541-5015.1004 holland, d., lachicotte, w., skinner, d., & cain, c. (1998). identity and agency in cultural worlds. 
cambridge, ma: harvard university press. laferrière, t., breuleux, a., allaire, s., hamel, c., law, n., montané, m., hernandez, o., turcotte, s., & scardamalia, m. (in press). the knowledge building international project (kbip): scaling up professional development for effective uses of collaborative technologies. in c.-k. looi & l. w. teh (eds.), scaling educational innovations. new york: springer. little, j. w., lampert, m., graziani, f., borko, h., clark, k. k., & wong, n. (2007, april). conceptualizing and investigating the practice of facilitation in content-oriented teacher professional development. symposium conducted at the annual meeting of the american educational research association, chicago. lunn, s., & solomon, j. (2000). primary teachers' thinking about the english national curriculum for science: autobiographies, warrants, and autonomy. journal of research in science teaching, 37(10), 1043–1056. doi: 10.1002/1098-2736(200012)37:10<1043::aid-tea2>3.0.co;2-s mason, j. (1996). qualitative researching. london: sage ltd. mayer, d. (1999) building teaching identities: implications for pre-service teacher education. paper presented to the australian association for research in education, melbourne. mccann, t., & clark, e. (2003). grounded theory in nursing research: part 1 – methodology. nurse researcher, 11(2), 7–18. mcghee, g., marland, g. r., & atkinson, j. (2007). grounded theory research: literature reviewing and reflexivity. journal of advanced nursing, 60(3), 334–342. doi: 10.1111/j.1365-2648.2007.04436.x mcmenamin, i. (2006). process and text: teaching students to review the literature. ps: political science and politics, 39(1), 133–135. mills, h. (2014). learning for real. portsmouth, nh: heinemann. national research council (2012). a framework for k-12 science education: practices, crosscutting concepts, and core ideas. washington, dc: the national academies press. olsen, b. (2008). introducing teacher identity and this volume. 
teacher education quarterly, 35(3), 3–6. palmer, p. j. (1997). the courage to teach: exploring the inner landscape of a teacher’s life. san francisco, ca: jossey-bass publishers. parkison, p. (2008). space for performing teacher identity: through the lens of kafka and hegel. teachers and teaching: theory and practice, 14(1), 51–60. doi:10.1080/13540600701837640 rodgers, c. r., & scott, k. h. (2008). development of the personal self and professional identity in learning to teach. in m. cochran-smith & s. feiman-nemser (eds.), handbook of research in teacher education (pp. 732–755). mahwah, nj: lawrence erlbaum. schwandt, t. (1997). qualitative inquiry: a dictionary of terms. thousand oaks, ca: sage. sfard, a., & prusak, a. (2005). telling identities: in search of an analytic tool for investigating learning as a culturally shaped activity. educational researcher, 34(4), 14–22. doi: 10.3102/0013189x034004014 sawyer, r. (2004). creative teaching: collaborative improvisation. educational researcher, 33, 12–20. doi: 10.3102/0013189x033002012 scardamalia, m., & bereiter, c. (2006). knowledge building: theory, pedagogy, and technology. in r. k. sawyer (ed.), cambridge handbook of the learning sciences (pp. 97–115). new york: cambridge university press. smagorinsky, p., cook, l. s., moore, c., jackson, a. y., & fry, p. g. (2004). tensions in learning to teach: accommodations and the development of a teaching identity. journal of teacher education, 55(1), 8–24. doi: 10.1177/0022487103260067 strauss, a., & corbin, j. (1998). basics of qualitative research: techniques and procedures for developing grounded theory (2nd ed.). newbury park, ca: sage. tabak, i., & baumgartner, e. (2004). the teacher as partner: exploring participant structures, symmetry, and identity work in scaffolding. cognition and instruction, 22(4), 393–429. urquhart, c. (2007). the evolving nature of grounded theory method: the case of the information systems discipline. in a. 
bryant & k. charmaz (eds.), the sage handbook of grounded theory (pp. 339–360). london: sage. varelas, m., house, r., & wenzel, s. (2005). beginning teachers immersed into science: scientist and science teacher identities. science education, 89(3), 492–516. doi: 10.1002/sce.20047 volkmann, m. j., & anderson, m. a. (1998). creating professional identity: dilemmas and metaphors of a first-year chemistry teacher. science education, 82(3), 293–310. doi: 10.1002/(sici)1098-237x(199806)82:3<293::aid-sce1>3.0.co;2-7 voogt, j., laferrière, t., breuleux, a., itow, r. c., hickey, d. t., & mckenney, s. (2015). collaborative design as a form of professional development. instructional science, 43, 259–282. zhang, j., hong, h.-y., scardamalia, m., teo, c. l., & morley, e. a. (2011). sustaining knowledge building as a principle-based innovation at an elementary school. journal of the learning sciences, 20(2), 262–307. doi: 10.1080/10508406.2011.528317 zhang, j., & messina, r. (2010). collaborative productivity as self-sustaining processes in a grade 4 knowledge building community. in k. gomez, j. radinsky, & l. lyons (eds.), proceedings of the 9th international conference of the learning sciences (pp. 49–56). chicago, il: international society of the learning sciences. zhang, j., scardamalia, m., reeve, r., & messina, r. (2009). designs for collective cognitive responsibility in knowledge building communities. journal of the learning sciences, 18, 7–44. doi:10.1080/10508400802581676 zhang, j., & sun, y. (2011). reading for idea advancement in a grade 4 knowledge building community. instructional science, 39(4), 429–452. doi: 10.1007/s11251-010-9135-4
winne publication frontline learning research vol 6 no. 3 special issue (2018) 250-258 issn 2295-3159 discussion paradigmatic issues in state-of-the-art research using process data philip h. 
winne a a faculty of education, simon fraser university, canada
abstract
learning science is enthusiastically adopting new instruments to gather physiological and other forms of event data to represent mental states and series of them that reflect processes. in an attempt to provoke more thought about this kind of research, i suggest paradigmatic issues relating to data, analyses of them and interpretations of results. i advocate we not label these data as “objective.” instead, we share a subjective interpretation of them. i argue propositions about validity need more nuance. bounds on generalization related to so-called ecological validity are rarely empirically justified. when researchers transform raw data before analysis and when analytic methods partition variance, interpretations of results omit key qualifications. i posit emotion and motivation be positioned in theory as moderators rather than mediators because agentic, self-regulating learners make and revise knowledge by choosing forms of cognitive engagement in a context where they interpret arousal. i note that researchers anchor interpretations of process data in learners’ accounts. this creates a tautology that troubles usual notions of reliability. finally, i recommend research involving process data turn more toward helping learners identify conditions of learning that spark arousal so learners can regulate motivation and emotion. this leads to a surprise: treating learners as individuals and helping them identify triggers of arousal may recommend learning science cast emotions and motivation as epiphenomena.
keywords: validity; trace data; agency; motivation; emotion; self-regulated learning, learning science paradigm info. corresponding author: philip h. 
winne, faculty of education, simon fraser university, burnaby, british columbia v3h 4r2, canada doi: https://doi.org/10.14786/flr.v6i3.551
paradigmatic issues in state-of-the-art research using process data
a great deal of recent research has investigated relations between learners’ affective states, brain states and other physiologically-related variables to traditional measures of achievement, motivation and upcoming indicators of cognitive processing called traces. articles in this special issue represent a broad and high-quality sample of these efforts. they boldly explore newer approaches to gathering data, tackle challenges in analyzing data with unconventional properties, and suggest new views of frameworks to account for learning processes that create outcomes. it is almost certain that no research study is conceptually faultless or methodologically perfect. interpretations and implications researchers draw about their results arise in that context and, thus, are debatable. in this article, i do not make conventional critiques about whether this or that instrument or analytic approach is faulty or whether another is likely more appropriate. instead, i describe from my perspective paradigmatic issues about these kinds of research. my aim is to provoke thinking not about any particular study but about fundamental characteristics of this up-and-coming line of research.
1. process and trace data are a step forward but not the truth
one description often applied to process data is they are objective. depending on what one means by “objective” this is a valid interpretation or it is wrong. it is fine to label process data as objective in the sense that “reasonable” observers can agree whether an event occurred. i prefer to think of this as shared subjectivity rather than objectivity. a second sense of the concept of objectivity is wrong. from this perspective, data are conceptualized as incapable of having any other value. i elaborate. 
data such as a gaze duration or a button click are verifiable – a learner’s gaze settled on a particular area of interest for k or more milliseconds. a button was clicked. the metric is nominal and boundaries are definite. for data like these the question is: what are properties of the counting metric? if each gaze period or each click is identical, each event may count as “1” and a total period of gazing at something in particular or the sum of clicks can be identified by adding each instance. in this context where data are labeled “objective,” theoretical constructs must be considered. first, i take as axiomatic – believed without proof – learners are agents. they are in control of what they think about. i fend off an immediate counterclaim about mental activities that might be considered a “cognitive reflex.” some mental activities are genuine reflexes. the startle response is an example. genuine reflexes like these are very infrequent in everyday learning situations, so i treat them as rare and genuine anomalies. other apparent cognitive reflexes are learned cognitive routines that have been automated through extensive experience, particularly practice with feedback. learners are supposed to develop and automate such routines and use them as they study and collaborate. understanding others’ speech at everyday rates of utterance is an example. other examples include number facts such as 4 × 8 = 32, raising a hand to be recognized in discussion, and subvocalizing roy g biv to name colors in the visible spectrum in order of their wavelengths from shorter to longer. as well, education encourages students to disassemble other cognitive routines that are disciplinary or social misconceptions. examples are: denominators must be identical to multiply fractions, females are not adept at math and maintenance rehearsal is the best tactic to promote recall. 
because learners are agents, observers need more information than gaze duration or clickstream events to validly interpret an observation. suppose a learner gazing at a particular region in a diagram of an electric circuit could use software to identify that region (e.g., enclose it in an ellipse) and tag it “confusing.” these extra data – the region enclosed plus the tag – signal what the learner was thinking that caused gaze to linger. the learner judged (metacognitively monitored) she was challenged to understand something about that part of the circuit. she was motivated to make a permanent record of that state of mind, so she drew an ellipse. quite likely, she plans to search later – ask the teacher or a peer, comb the internet – to locate information about this and other content tagged “confusing” to resolve these confusions. suppose a button in a software tool was labeled “see more ….” a click of that button signals the learner is seeking additional or elaborative information beyond what is presented in the current display. clicking the button represents interest, an expectation useful material will be found at the resource linked to the button, and an efficacy expectation understanding can be enhanced by accessing that new information. these examples illustrate trace data (winne, 1982). the best instances of trace data couple a verifiable event – gaze lingering for a measured time on a particular bit of information, a button clicked, a tag applied – with a convincing theoretical claim about what that event “means.” when learners generate trace data without having to do much more than they normally would do as they study – when the data are ambient in the sense that the means used to generate data are integrally involved in ways learners normally engage with information (see winne, teng, chang, lin, marzouk, nesbit, patzak, raković, samadi, & vytasek, 2019) – observers have the additional information needed to construct well founded inferences. 
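the coupling of a verifiable event with a learner-supplied tag can be sketched as a small data structure. this is a minimal illustration under stated assumptions: the `TraceEvent` class, its field names, the helper functions and the wording of the returned inferences are all hypothetical and are not taken from any software discussed in this article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceEvent:
    """One verifiable event. All field names are hypothetical."""
    kind: str                  # e.g. "gaze" or "click"
    target: str                # region of the display or button label
    duration_ms: int = 0       # measured gaze time; 0 for clicks
    tag: Optional[str] = None  # learner-applied tag, if any


def total_gaze_ms(events: List[TraceEvent], target: str) -> int:
    """Add up gaze time on one target, counting every millisecond as identical."""
    return sum(e.duration_ms for e in events
               if e.kind == "gaze" and e.target == target)


def interpret(event: TraceEvent) -> Optional[str]:
    """Return an inference only when the event carries information about intent.

    The mapping from event to inference is a theoretical claim supplied by
    the researcher (an assumption here); it is not contained in the data.
    """
    if event.tag == "confusing":
        return "learner monitored a difficulty understanding " + event.target
    if event.kind == "click" and event.target == "see more":
        return "learner is seeking elaborative information"
    return None  # a bare gaze does not say what the learner was thinking


# Invented example events for illustration only.
events = [
    TraceEvent("gaze", "resistor R2", duration_ms=1800),
    TraceEvent("gaze", "resistor R2", duration_ms=2400, tag="confusing"),
    TraceEvent("click", "see more"),
]

for event in events:
    print(event.kind, event.target, "->", interpret(event))
print("total gaze on resistor R2 (ms):", total_gaze_ms(events, "resistor R2"))
```

in this sketch the untagged gaze yields no inference, while the tagged gaze and the labeled button click do, which mirrors the distinction drawn above between a bare event and a trace.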
note, however, such inferences are nonetheless grounded in a theoretical framework. an event datum or string of event data creates an opportunity to ask, why did that event occur? why is the string of events shaped as it is? learning science is keen to notice events and characterize strings of them. but event data are not objective. people notice anomalous events because they are unexpected and interesting with respect to subjective schemas that describe the world. when researchers use instrumentation to notice an event, theories underlie the mechanisms that allow the instrument to record the event. in short, process data and especially trace data are inherently and inescapably subjective. what we mean by the label “objective” is really that we share subjectivity.
2. claims about validity often overreach
validity is a concept often misinterpreted. as messick (1989) argued, validity is not a property of an instrument, a protocol or a setting. settings, instruments and protocols do not have validity. validity is more nuanced. it is a property of an inference or an interpretation. people construct inferences and interpretations. i consider three important cases.
2.1 to what does validity apply?
statements of relationship or causation – the findings reported about research in learning science – have a degree of validity. that degree is proportional to the extent constructs named in stating a finding correspond to operational definitions that describe how data were generated. one form of this relationship was just discussed in regard to trace data. other less obvious cases need to be addressed.
transformations of data also transform constructs
suppose a researcher transforms raw data. two common kinds of transformations populate our research literature. one is transformations of scale, such as a log transformation of continuous scores or an arcsine transformation of proportions.
these are often used to reshape a distribution of data so it more closely matches a gaussian (normal) distribution that is better suited to many inferential statistical methods. a second widely used transformation of data is statistically partitioning and removing variance from a variable. examples are standardized partial weighting (e.g., regression) coefficients in a linear modeling analysis. both scale and variance partitioning transformations of data change raw scores into new scores. these new transformed scores are then analyzed by a statistical or machine learning method. results of these analytical methods are then framed using words identical to the untransformed data. this is careless phrasing. the fault is not with the numerical work. it has been rendered appropriate by the transformation. the fault lies in failing to recognize transformed scores introduce an additional operational definition, the transformation. unlike careful attention to reporting the span of a scale for reporting responses to questionnaire items, the number of options offered in multiple-choice items or sampling rates for physiological processes that vary continuously across time, changes to data wrought by numerical transformations are almost never taken into account. why is this important? numerically transformed data submitted to analytical methods represent a different construct than the construct represented by raw data. but, this difference is not represented in descriptions and interpretations of analytical results. researchers don’t write, “the base 10 log version of our control variable … .” the nomological network changes when data are transformed (see winne, 1983). correlations of raw scores with transformed scores always are less, sometimes much less, than 1.00. correlations of raw scores with other anchor variables differ, sometimes considerably, from correlations of transformed scores with those anchor variables. 
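the point about raw and transformed scores can be illustrated with a small simulation. the sketch below is only an illustration under stated assumptions: the skewed “raw” scores, the hypothetical anchor variable and the sample size are all invented for the example and do not come from any study discussed here. it computes pearson correlations between raw scores, their base 10 log transform, and the anchor.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Assumption: positively skewed raw scores (e.g., time on task), simulated.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

# Assumption: a hypothetical anchor variable related to the raw scores.
anchor = [math.log(r) + rng.gauss(0.0, 1.0) for r in raw]

# The scale transformation: base 10 log of each raw score.
logged = [math.log10(r) for r in raw]

r_raw_logged = pearson(raw, logged)
r_raw_anchor = pearson(raw, anchor)
r_logged_anchor = pearson(logged, anchor)

print(f"raw vs. log10(raw):    {r_raw_logged:.3f}")
print(f"raw vs. anchor:        {r_raw_anchor:.3f}")
print(f"log10(raw) vs. anchor: {r_logged_anchor:.3f}")
```

because the log transformation is nonlinear, the first correlation falls below 1.00, and the two correlations with the anchor differ from each other. in that limited sense the transformed variable sits in a different nomological network than the raw variable.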
learning science is misled whenever the full operational definition of data is not recognized.
2.2 concerns for ecological validity are rarely empirically justified
a third concern about validity relates to the concept of ecological validity. variations of this claim are common in the literature of learning science: “because learners carry out tasks in settings where they learn every day, a study’s findings have ecological validity.” it follows: “when conditions drift away from those characterizing an everyday setting, e.g., from a regular classroom to a lab, findings lose ecological validity in proportion to the drift.” the error of this claim about lessened generalizability is twofold. first, worries that findings disintegrate in a new setting very rarely have grounding in any or sufficient research demonstrating that factors differentiating the settings actually affect findings. it is too casually presumed without empirical backing that such and such a factor differentiating settings is a cause or moderator of an effect. two examples are our literature’s practice of reporting where a sample originates (e.g., a western canadian university) and the proportions of females and males in a sample. to my knowledge, no study has demonstrated geographic locations of canadian universities moderate variance in findings other than the residential addresses of participants in studies. or, if one study’s participants are 60% female and another study’s participants are 48% female, where are studies demonstrating sex of participant is a proven moderator of cognitive processing or emotion? in general, our research traditions have a very sparse catalog of factors that influence the values of population parameters as a function of factors such as location, proportion of females and males, language first spoken, ethnic heritage and the like (see winne, 2017). in fact, claims about “ecological validity” are more guesswork. there are straightforward repairs for this fault.
researchers can analyze their data to investigate the extent to which ecological factors moderate findings in their study. to do this places two burdens on research. first, such factors need to be considered at the outset of a study and measured. second, sample sizes need to be larger. another solution is to scour the literature for findings that demonstrate an ecological factor does moderate what was investigated in a current study. lastly, researchers could refrain from speculating about the extent to which the generalizability of findings may be limited by ecological factors lacking empirical backing.
3. arousal and emotion moderate learning
emotion has become a “hot” (pun intended) topic in learning science. how might emotions relate to learning? i argue they are moderators, not mediators. knowledge is fashioned by cognitively operating on information, e.g., generating an explanation for causal influences improves comprehension and memory (bisra, liu, nesbit, salimi, & winne, 2018). affect or emotion can moderate a process like self explanation and other cognitive operations applied to information in two ways. first, an affective state or emotional thought may be one factor a learner judges when deciding whether to apply a cognitive operation to particular information. this influences what a learner learns not because the learner is in a particular emotional state but because a cognitive process is or is not applied to particular information when the learner experiences a particular emotion. second, the contents of long-term memory are multidimensional. learners can judge the accuracy, thoroughness, and reliability of their knowledge (although they sometimes err in the accuracy of their judgments). knowledge in long-term memory is associated with a variety of other contents, for example, contextual descriptions about when a proposition was added to memory, what other knowledge it relates to and an affective stance regarding that proposition.
elaborations like these form a network, and characteristics of the cognitive network correlate with what learners perceive, learn and can recall. an affective experience or emotional thought is elaborative content. these experiences augment results of “cold” cognitive operations and are added into the network of long-term memory. cognitive operations are the causes of learning. affect and emotional experiences are content that may moderate cognitive operations. several articles in this special issue set a stage for an intriguing deduction. a learner may not be aware a particular emotion is aroused while learning but, nonetheless, that state of arousal may influence cognition or its manifestation in interpersonal interactions. thus, what learners learn may vary without the learner’s awareness. what research has yet to explore is whether it is helpful to alert a learner that “now” is the time to regulate arousal. a complication to such a study is learners’ capacities to change one affective state into another, that is, to regulate affect. alerts about tacit emotions will be effective in proportion to this aptitude. future research might probe how well learners can regulate affect and whether learners can learn to regulate affect to benefit learning.
4. puzzles about proxy measures of emotion
physiological signals and facial displays are proxies for learners’ experiences of emotions and other motivational constructs. measures generated by instruments that represent these experiences are fundamentally different. measures developed using physiological sensors used to detect arousal, for example, blood flow or eda, rely on comparisons to a baseline. researchers declare deviations from a subjectively set threshold to mark arousal or recovery. deviations span time. in contrast, facial displays are measured by configurations of points on a face at a point in time. these measures are absolute; a configuration is matched or it is not. time is irrelevant to the measurement.
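the contrast between the two kinds of proxy measures can be made concrete with a minimal sketch. everything in it is a hypothetical illustration, not a description of any instrument used in this special issue: the function names, the rule “baseline mean plus k standard deviations,” the tolerance for matching a facial configuration and the sample values are all assumptions. the first function marks arousal as episodes that span time relative to a researcher-chosen threshold; the second returns a yes or no answer for a single moment.

```python
from statistics import mean, stdev


def arousal_episodes(signal, baseline_n, k=2.0):
    """Mark spans where a sensor signal exceeds a baseline-derived threshold.

    Assumption: the first `baseline_n` samples are a resting baseline and
    the threshold is baseline mean + k * baseline SD. The value of k is the
    researcher's subjective choice.
    """
    baseline = signal[:baseline_n]
    threshold = mean(baseline) + k * stdev(baseline)
    episodes = []
    start = None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i  # arousal begins
        elif value <= threshold and start is not None:
            episodes.append((start, i - 1))  # recovery: close the span
            start = None
    if start is not None:
        episodes.append((start, len(signal) - 1))
    return threshold, episodes


def matches_display(points, template, tol=0.05):
    """Return True if a facial configuration matches a template at one moment.

    Assumption: `points` and `template` are equal-length lists of normalized
    landmark coordinates. No history of earlier moments is consulted.
    """
    return all(abs(p - t) <= tol for p, t in zip(points, template))


# Hypothetical sensor samples; the first five are treated as baseline.
eda = [1.0, 1.1, 0.9, 1.0, 1.0, 2.0, 2.5, 2.2, 1.0, 1.1, 3.0]
threshold, episodes = arousal_episodes(eda, baseline_n=5)
print(threshold, episodes)

# Hypothetical landmark coordinates compared with a hypothetical template.
print(matches_display([0.31, 0.52, 0.70], [0.30, 0.50, 0.72]))
```

changing `k` or `baseline_n` changes which spans count as arousal, which is one way to see that the threshold is set subjectively, whereas the display match has no temporal parameter at all.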
this contrast invites a question: what roles does time or rate of change play in a learner’s perception of affect? do learners perceive affect as variance in arousal over time or do they perceive affect as a step function, on or off? what implications, if any, might this have for theorizing about affect and for learners’ approach to regulating affect?
4.1 a vexing tautology
both kinds of proxies for affect are validated by asking learners to label their arousal. this creates a tautology. a researcher observes a measurement and asks the learner, “what is your emotion now?” thereafter, that measurement is taken as a signal of the named emotion. how can the researcher be sure? the learner said so. why is this tautology an issue? a benefit of instrumentation used to gather proxy data is its unobtrusiveness. the learner does not have to be interrupted to report states of arousal or variation in affect. however, beyond instrumenting arousal, researchers are interested in identifying valence. today’s state-of-the-art instruments can’t do that without confronting the tautology just described.
5. is reliability relevant?
when a learner’s reports and proxy measurements do not correlate highly, two cases need sorting. first, one or the other – the learner or the instrument – may be biased. learners may be reluctant to report some affects. social demand does not disappear in the lab, work groups or classrooms. also, instruments may be miscalibrated. detecting and correcting bias in learners’ reports requires ground truth. the previously mentioned tautology undermines faith there is ground truth. a second case to consider when learners’ reports do not highly correlate with proxy measurements is when the learner, the instrument or both are unreliable. data generated by traditional instruments – responses to survey questions, answers to achievement test items, and the like – almost always prompt researchers to investigate the reliability of those data.
following cronbach, gleser, nanda and rajaratnam (1972), reliability is a matter of identifying sources of variance in scores, identifying which of those factors (or facets) cannot be explained or controlled, and quantifying the contribution those “nuisance” sources make to variance in scores. that is, unreliability arises because nuisance factors introduce erratic variance in learners’ reports or an instrument’s measurements. some instruments generating proxy data about arousal, for example, sensors registering eda, can be examined for reliability using other mechanical systems with known reliability. this helps to disassociate erratic variability in measurements researchers use to gauge learners’ arousal. any residual variance attributed to extraneous factors is then ascribed to the learner. but, if the learner is not biased (i.e., reliably misreports about arousal), can the learner be wrong in declaring their experience of affect? again, the tautology occludes interpretation. which source of data is the reliable source?
6. final thoughts: applying findings and the evolution of learning science
learners’ interpretations of states of arousal create their emotions and motivations. they will report emotions and motivations when asked. researchers gather and interpret proxy data about these states. both learners’ reports and researchers’ proxies correlate with what learners learn. learners’ emotions and motivations do not create knowledge or change knowledge. knowledge is created and amended when learners apply cognitive operations to information. emotions and motivations are factors, mixed with other information, learners weigh in choosing which information to operate on and which cognitive operations to apply to selected information. according to this logic, emotions and motivations are moderators but not causes of learning. if this logic is correct, what are implications for helping learners become better learners?
i encourage research that aims to identify conditions learners perceive that arouse them. then, identify which of those conditions correlate positively and which correlate negatively with what learners learn. with a more precise map of relations between conditions learners perceive about their instructional context and learning results, instructional designs can more dependably be engineered to offer learners experiences in which learners can more productively self-regulate learning. this account of learning leads to a potential surprise. suppose conditions of instruction that arouse a particular learner are known. suppose further conditions positively related to this learner’s achievements can be incorporated into the instructional context and those that negatively correlate with learning can be removed. can learning science now ignore a learner’s motivations and emotions? is the ultimate goal of learning science finding dependable principles for engineering features of instruction that are devoid of constructs about inner states like emotions and motivations? should learning science strive to relegate theoretical constructs such as emotions and motivations to the category of epiphenomena? this line of thinking explicitly and emphatically acknowledges students are individuals. averaging out individual differences to describe effects at the level of a randomly formed group pervades research in learning science. i argue this is a mistake in attempts to model learners as self-regulating agents (winne, 2017). learners are agents (winne, 2018). they choose how to learn. they are in control. what matters is their perception of instructional conditions. each learner’s perception emerges from a history of individual experiences about which conditions merit attention, what those conditions represent and predict about the present context, and what is a path to goals in that context.
the surprise i reveal about learning science and guidance it can generate for designing instruction rests on the axiom that learners are agents. instructional design therefore must be sensitive to, responsive to and supportive of learners as individuals. in this quest, instrumentation and methods like those described in this special issue are a boon. with energetic attention to methodological and interpretive issues affecting how data are interpreted, learning science carried out using methods and with instrumentation described in this special issue offers great opportunity to elevate understandings about each learner as an individual, about relationships between each learner’s perceptions of their learning environment, and what learners learn in those contexts.
key points
process data are better described in terms of shared subjectivity rather than as objective.
transformations of data change constructs and too often are not recognized in reporting effects.
concerns about ecological validity often lack empirical grounding.
emotion and motivations are moderators, not causes of learning.
grounding interpretations of process data on learners’ reports is tautological, undermining traditional concerns about reliability.
surprisingly, progressive research relating process data to achievement may render emotions and motivations to the category of epiphenomena.
acknowledgements
foundations for this article were developed with financial support provided over many years by the social sciences and humanities council of canada and simon fraser university.
references
bisra, k., liu, q., nesbit, j. c., salimi, f., & winne, p. h. (2018). inducing self-explanation: a meta-analysis. educational psychology review, 30, 703-725.
cronbach, l. j., gleser, g. c., nanda, h., & rajaratnam, n. (1972). the dependability of behavioral measurements. new york: wiley.
messick, s. (1989). validity. in r. l. linn (ed.), educational measurement (3rd ed., pp. 13-103). new york: macmillan.
winne, p. h. (1982). minimizing the black box problem to enhance the validity of theories about instructional effects. instructional science, 11, 13-28.
winne, p. h. (1983). distortions of construct validity in multiple regression analysis. canadian journal of behavioural science, 15, 187-202.
winne, p. h. (2017). leveraging big data to help each learner upgrade learning and accelerate learning science. teachers college record, 119(3), 1-24.
winne, p. h. (2018). cognition and metacognition within self-regulated learning. in d. schunk & j. greene (eds.), handbook of self-regulation of learning and performance (2nd ed., pp. 36-48). new york, ny: routledge.
winne, p. h., teng, k., chang, d., lin, m. p-c., marzouk, z., nesbit, j. c., patzak, a., raković, m., samadi, d., & vytasek, j. (2019). nstudy: software for learning analytics about processes for self-regulated learning. journal of learning analytics, 6, 95-106.
frontline learning research vol.4 no. 4 special issue (2016) 39-47 issn 2295-3159
corresponding author: crina i. damşa, department of education, university of oslo, po box 1092 blindern, 0317 oslo, norway. email: crina.damsa@ils.uio.no. doi: http://dx.doi.org/10.14786/flr.v4i4.208
revisiting learning in higher education—framing notions redefined through an ecological perspective
crina damsa, alfredo jornet
university of oslo, norway
article received 15 september / revised 8 january / accepted 22 february / available online 10 january
abstract
this article employs an ecological perspective as a means of revisiting the notion of learning, with a particular focus on learning in higher education. learning is reconceptualised as a process entailing mutually constitutive, epistemic, social and affective relations in which knowledge, identity and agency become collective achievements of whole ecosystems.
this conceptualisation implies that learning involves a trans-contextual and multimodal process, in which both learners and their social and material environments change. this article examines the implications of an ecological perspective on framing notions central to learning and current educational research, namely (a) knowledge co-construction and epistemic agency, (b) the role of (material) knowledge resources in the learning process and (c) the trans-contextuality that characterises learning in today’s knowledge society. the discussion concludes by identifying prospects that an ecological perspective offers to education and research on learning in higher education. the insights emerging from this reconceptualisation imply changes in the ways we can enhance and analytically account for the transformative potential of education. they also indicate the necessity for further advancing our understanding of learners’ ways of assembling the epistemic spaces necessary to engage in meaningful learning, their agency in this process and their relationship with the (social and material) environment.
keywords: higher education, ecological perspectives on learning, co-construction of knowledge, agency, materiality, trans-contextuality, transformative nature of learning
1. introduction
this article revisits the notion of learning, particularly in the context of higher education, by discussing key ideas derived from an ecological perspective. such a perspective is timely, considering today’s complex epistemological, social and institutional context, in which interdependent links between human subjectivities, collective human cultures and their environments are becoming more visible.
learning is no longer viewed as the mastering of a given subject; it involves being knowledgeable across a variety of contexts, with the ability to connect to remote knowledge resources, communities and (work) sites no longer bound to one particular physical context (carvalho & goodyear, 2015; säljö, 2010). an ecological perspective is of particular interest in higher education, in which changing societal contexts and knowledge dynamics are creating new, open-ended and often unexplored opportunities and challenges in helping learners create professional futures. we are particularly interested in the broadening connections emerging between learning settings in higher education and in other contexts, in which curricular crossovers between scholarly knowledge and professional practices or cross-boundary learning arrangements (such as internships) are frequent. more knowledge is needed about the kind of learning opportunities that emerge as students, educators, professionals and other actors enter into new social and material configurations that are essentially uncertain and open-ended (markauskaite & goodyear, 2014; richter et al., 2015). the aims of this article are to elaborate on the ecological premises that underlie existing sociocultural, situative and sociomaterial approaches and to discuss the implications for learning research and practice. thus, in this article, we do not develop a distinctly new approach, but rather make visible and elaborate on essential premises that are common to these other frameworks but which often remain tacit or underdeveloped. most importantly, an ecological perspective conceives of learning as an irreducible, mutually constitutive set of relationships between individuals and their social and material environments. thoroughly following this conception leads to new insights about how learners and social contexts develop together, and it also challenges the remaining dualisms present in the literature.
we begin this article by laying out the basic premises of an ecological perspective. drawing on an empirical study of project-based, collaborative learning in higher education, we then revisit notions important in current educational research, namely knowledge co-construction and agency, knowledge resources and materials and trans-contextuality. in each case, we discuss how a consideration of the ecological premises allows us to reconceptualise these notions, highlighting aspects of learning that are not so visible when these premises are implicit or simply ignored.
2. learning from an ecological perspective
ecology, the study of the relationships of organisms among one another and to their environment, is a central subject in biology. it has also been important for some of the most influential theories on learning and education. these include vygotsky’s (2012) cultural-historical theory, dewey and bentley’s (1949/1999) views of knowing as entailing transactional (i.e., mutually transforming) relations between organisms and the environment, gibson’s (1979) ecological psychology and bateson’s (1972) theory of learning and communication. common to these otherwise disparate scholars are two interlocked postulates that contrast with the individualistic and constructivist theories still present in current research on learning in general and in higher education in particular: (a) learning is not a private, internal process, but involves transactions between people and their socio-material environment, in which both people and environments are transformed; and (b) learning involves not only intellectual dimensions, but also practical and affective ones. in learning, the entire person-in-setting is transformed. we elaborate on this viewpoint by highlighting arguments that propound an expansion of the sociocultural theory, emphasising its underlying relational ontology and transformative stance.
accordingly, human subjectivity, intersubjective (i.e., social) exchange, and material practice and production are irreducible aspects of a three-fold dialectical system (stetsenko, 2008). the individual actively relates to the environment and other individuals, and those relations then come to form part of how a person relates to herself and how she comes to know and develop (vygotsky, 1987). however, at the same time, the individual engages in the production of new material conditions, and thus acts upon and changes the world so that ‘the individual could no longer be understood without his/her cultural means; and the society could no longer be understood without the agency of individuals who use and produce artefacts’ (engeström, 2001, p. 134). from an ecological perspective, learning involves not only epistemology (how we come to know things) but also, and most fundamentally, ontology (packer & goicoechea, 2000). that is, knowledge, knowing and knowledgeable action are not ontologically separated from human development, but are inherently related to it. viewed from this perspective, learning is not a process whereby stable, unchanging things become known by unchanging individuals. rather, learning comprises changes in the conditions of human life and activity, in which both individuals and environments change. dewey (1938/1997) captured this mutually transforming relation in the principle of the continuity of experience, in which, through experience of the world (and precisely because experience involves material and bodily engagement), the world changes, thus changing the conditions under which new experiences are had. this is a change that involves not only the intellect but the whole person and how one relates to oneself and to others. experience changes not only the way we intellectually know the world but also the way we affectively and perceptually relate to it (roth & jornet, 2014).
although these ecological principles, which imply the primacy of the social ecosystem over the individual, might not be new to readers familiar with sociocultural and situative approaches, their implications are still under-developed in the context of educational research and practice (roth, 2015).
3. redefining key framing concepts from an ecological perspective
to better understand how focus on the ecological premises described here contributes to reconceptualising learning, we revisit three notions that are important in current educational research and particularly important for higher education: (a) knowledge co-construction and agency, (b) knowledge resources and (c) trans-contextuality. we draw from a case involving groups of computer engineering students enrolled in an undergraduate introductory course in web design and development. the course included bi-weekly lectures in web development (e.g., html5, java programming languages), lab sessions and a four-week collaborative web design and development project as the main course assignment. the student groups in the course were to receive guidance from the teachers and had various knowledge resources at their disposal. the setting is particularly interesting because it illustrates ways in which higher education programmes are attempting to prepare students to enter professional domains and societal contexts. we focus on one specific group mainly because their active and sustained engagement in the collaborative project illustrates both opportunities and challenges associated with the learning processes. our description of the case is based on video-recordings of actual group interactions and group interviews. the focus group consisted of four male students with a genuine interest in software development. the group chose to design and develop a website for an external customer.
their learning process was characterised both by opportunities and challenges, related to both the learning of new content (i.e., the programming language and its application) and to ways of thinking and working in the field of web development. on the one hand, the students organised themselves effectively and employed varied and unexpected resources, most of them beyond the formal course curriculum (textbook). they organised the project work by dividing tasks and then holding long face-to-face meetings, during which they discussed strategy, searched for resources, integrated individually programmed codes and fixed bugs. they worked iteratively on their software product by developing and refining (paper and digital) mock-ups, following the methods of experienced web developers, which they explored online or by talking to experts. the group used and engaged with some resources provided in the course and with external resources from the web development community (crowd-source online programming platforms). the feedback on the developing product and on project management was mainly provided by the customer. on the other hand, the students experienced difficulties in understanding the complexity of the task’s requirements. this often led to crashing prototypes and put pressure on the group’s interaction. the students indicated that they found the project very interesting, but that the requirements were not specific enough and discussion often went on in circles, without a productive loop. these emerging issues were solved through group discussions, trial and error and by using clues found online and in customer feedback. this unplanned feedback compensated for the relatively little guidance received by the group. although the group’s assessment of the assignment was positive, they received a lower grade than they expected. 
the students critiqued the complexity of the task, which combined technical and project management challenges, neither of which were made completely clear in the assignment guidelines. in addition, the students considered that identifying and addressing errors before grading could have been facilitated by more sustained guidance during the development process.
3.1 redefining knowledge co-construction and agency
the notion of knowledge construction is widely used in educational research (e.g., schellens & valcke, 2006), most often to denote individuals’ formations of mental models and representations. in an attempt to overcome focus on the individual, higher education research has more recently used the notion of co-construction to denote processes that focus on collective participation in learning activities and on transforming the environment (damşa, ludvigsen, & andriessen, 2013; richter et al., 2015). the case above clearly offers an example of such a co-construction process. the students worked jointly to create a product, and in this context, learning was the result not of an individual but of a social process, joint efforts and the resources involved. however, the notion of co-construction is often associated with a focus on how participants ‘negotiate’ meanings about given practices and topics (e.g., heo, kim, & kim, 2010). here, a triad formed by subject, object and meaning is maintained, in which each of the three elements remains self-contained and therefore ontologically primary. co-construction thus turns the focus away from individual minds and toward joint group cooperation. however, in doing so, it still retains the ontological primacy of subjects and objects over the social, transformative process. latour (2013) eloquently depicts this limitation: ‘every use of the word construction … opens up an enigma as to the author of the construction’ (p. 158).
in the case of the student group described above, every step in the development process opened up new paths of inquiry and development, but it also forced the group to face new problems and choices. by engaging with these problems and activities, they generated new ideas and conceptual artefacts (the programming code), which in turn represented new departure points in their endeavour (damşa & nerland, 2016). an ecological perspective gives primacy to the social, shifting the focus away from subjects and objects and pointing to ‘a better appreciation of the material flows and currents of sensory awareness within which both ideas and things reciprocally take shape’ (ingold, 2011, p. 10). in the described case, the students develop a product together. yet, the product itself is in constant transformation, taking different forms (sketches, drawings and prototypes). the students themselves do not have a clear idea of what they are in the process of ‘co-constructing’, and much of what happens is not planned, but emerges. there is not just jointly knowing, but there is also being uncertain, a condition that nonetheless does not impede the students’ engagement in professional practices for which they have not yet developed expertise. as has been thematised in recent research on transfer taking and the ecological perspective, this participation is possible not because the individual students carry with them already formed understandings; rather, it is because there is an emergent constitutive order that cannot be attributed to the individual mind, but to an unfolding field of action (damşa et al., 2010; jornet, roth, & krange, 2016). here, the learner’s receptivity, affectivity and competence to engage in social relations with others is primary over their individual (intellectual) intentions.
in analysing so-called co-construction events, the focus cannot be on either the constructing agents or the constructed objects, because both are constantly changing. this has implications for the notion of agency, which has not received adequate attention in higher education research (ashwin, 2008). some studies have begun to reconceptualise learning-related agency in terms of shared epistemic agency (damşa et al., 2010). these studies have considered the social–relational aspects of learning and how knowledge-generating processes become more than an individual endeavour. transformative conceptions of agency are also being examined (e.g., kumpulainen, 2013; engeström, sannino, & virkkunen, 2014). attention to the ecological premises creates the need to consider how collaborations involve affective and perceptual changes, in which learners are not only intellectual agents (how could they otherwise engage in practices for which they do not yet have the required knowledge?) but also subject to the performative and affective relations in which they engage (roth & jornet, 2014). in the case presented here, the students not only (co)construct but draw from and appropriate cultural resources that are not their own. an ecological approach should be able to account for the role of these resources in ways that do not reify individual (agent, subject)–tool (object, world) dualism.

3.2 redefining knowledge resources and materials

traditionally, domain-related knowledge has been ‘translated’ into classroom curricula that emphasise conceptual knowledge and understanding, in which teaching materials are often seen as involving knowledge representations or tools (säljö, 2010). with the growth of ubiquitous information and communication technologies (icts), the range of knowledge resources available for teaching and learning has dramatically expanded.
this is visible in the case above, in which the students did not turn to the textbook, but most often relied on online professional programming and/or social platforms. classical literature has it that learning involves a process of interpreting and decoding representations to solve an already given problem. because the students could not know the domain and its practices in advance (learning these is the goal of the course), their engagement with resources and materials was problem- and world-forming. yet, how materials partake in the formation of students’ worlds (i.e., perceptions, knowledge or identity), that is, in processes of ontogenesis, is rarely discussed in connection to learning and research in higher education. an ecological perspective is in line with recent sociocultural conceptualisations that view materials as meaning or sense-making resources (säljö, 2010). according to this view, materials come to form an integral part of thinking and doing through processes of sign formation (vygotsky, 1987). it is not, as often is implied, that materials and technology ‘mediate’ between learners and the world, which would maintain cartesian dualism (stetsenko, 2005). rather, materials become entangled with people’s lives and form new organs, which cannot be reduced to either learner (subject) or material (object, tool) (vygotsky, 1989). learners orient towards materials, which organise the participants’ perceptions and actions. at the same time, these actions transform the very materials that shaped them in the first place. accordingly, the ‘things’ of learning, that is, ‘teachers, learning activities and spaces, knowledge representations such as texts, pedagogy, curriculum content, and so forth’ (fenwick et al., 2012, p. 2), cannot be taken for granted, but are seen as ‘themselves effects of heterogeneous relations’ (p. 2).
in the case described above, the learning materials are diverse, but they certainly come to form an ecology that both organises and depends upon the organisation of the students’ joint work. by coming into contact with knowledge resources related to professional practice (through delivering to customers, searching for information in professional and crowd-sourced fora and using already existing codes), the participants are not so much being mediated to access knowledge as they are developing habits, dispositions and forms of orienting towards the epistemic, digital and physical world of which they already form an integral part (damşa & nerland, 2016).

3.3 trans-contextuality

an ecological perspective is relevant for explaining learning as a social and continuous process occurring across contexts and occasions. individualist approaches rely on notions of transfer of knowledge to explain how learners move knowledgeably across contexts (e.g., reed, 2012). boundary crossing has been formulated as an alternative notion to account for how moving across settings involves social and material (rather than only mental and abstract) processes (akkerman & bakker, 2011). taking the perspective of the students in our case, however, no explicit boundaries were apparent between their university setting and the professional world, in which they were already participating in several ways (through meetings, online or in contact with customers). indeed, the students’ concerns emerged with respect to both the formulation of the task (an aspect of schooling practice with which they are familiar) and aspects of the professional programming practices that may be said to be beyond the university’s boundary. an ecological perspective challenges the notion of boundary, demanding instead an account that adequately describes the lines of becoming (intellectual, social and relational) that learners and materials together constitute and undergo.
by crossing contexts, learners assemble an epistemic space (markauskaite & goodyear, 2014), in which individual and collective goals, needs and epistemological orientations develop, capitalising on teaching and guidance, resources and infrastructure. there are practical and methodological challenges associated with learning trajectories traversing time and space through (digital) technology (carvalho & goodyear, 2014; erstad, 2013). learners have more access to information from a multitude of sources; it is a ‘polyphonic’ (säljö, 2010), multi-contextual world. although it is generally considered beneficial for learning, capitalising on widely available and distributed knowledge, resources and tools is not a straightforward process (orlikowski, 2007). in our exemplary case, challenges and tensions emerged in relation to various aspects of the learning situation: the affordances offered by the state-of-the-art knowledge, practices and technologies, the students’ positioning in relation to the tasks and the domain, and the teaching and assessment practices within the institutional setting. however, from a perspective that takes the ecological premises laid out here seriously, these challenges and tensions cannot be the result of self-contained learners who inter-act with self-contained resources and self-contained (i.e., contained within boundaries) practices. as higher education continues to develop outreach practices in which schooling goes into professional practice, and vice versa, we no longer have a crossing of boundaries, but a new line of development within which different materials and subjectivities unfold.

4. concluding remarks: ecology and learning in higher education

this contribution elaborates on considerations of learning as a set of mutually constitutive relationships among individual, institutional and societal contexts.
the empirical material illustrates how, as a group of students engaged in actual relations in and across knowledge domains (e.g., the school, the professional field of software programming), there is not just co-construction of knowledge but also reconfiguration of their affective and relational orientations. while learning software development and programming, the students generated knowledge and re-enacted practices and objects (see stetsenko, 2005) that fed into and transformed their knowledge landscape and their personal horizons. in line with the transformational ontology posited by the ecological perspective, a reconsideration of the role of higher education involves preparing people to not only learn and adapt to existing knowledge, practices and environments but also to actively transform them (stetsenko, 2008). such a perspective has the potential to guide learners, education and society towards a notion of learning that accounts for the fluid elements of the epistemic, social and material-digital environments we are surrounded by and partake in (ingold, 2011; säljö, 2010). within this augmented learning context, the role of formal educational settings, such as higher education, entails more than simply organising learning and helping learners go through authoritative obligatory passage points (callon, 1984). it needs to offer resources to allow students to become critical and productive participants, helping them manage their own learning and development trajectories. acknowledging that learning is an achievement of whole (eco-) systems, and not primarily of individuals alone, educational settings should orient not towards individuals but towards transformational potentials. if learning is not about acquiring knowledge but about changing the world, then providing tools and opportunities for that change should become primary.
an ecological perspective thus addresses the need to view learning not from a normative perspective (i.e., in terms of the competences we want learners to achieve) but in terms of the life world of the learner, for whom the structures in the world (and the boundaries thereof) are not the same as those of the educator or researcher. the insights emerging from this reconceptualisation indicate the necessity for a more sophisticated and versatile account of the transformative potential of learning and for advancing our understanding of learners’ authoritative positioning and agency in learning (kumpulainen, 2013), their relationship with the (epistemic, social and material) environment and the way they assemble the epistemic space necessary to engage in meaningful and transformative learning.

keypoints

this article provides an ecological perspective for revisiting premises and notions fundamental to learning and development, in relation to higher education contexts, expands on current sociocultural and sociomaterial theories by proposing learning as a transformative process whereby both the learner and the environment change, and which entails development that is not only intellectual but also social and affective, builds on empirical material from a study of a higher education course aimed at bridging educational and professional contexts and challenges higher education to reconsider the premises for defining learning and to provide the appropriate framework for transformative learning to take place and for remaining dualistic viewpoints to be overcome.

references

akkerman, s. f., & bakker, a. (2011). boundary crossing and boundary objects. review of educational research, 81(2), 132–169. ashwin, p. (2008). accounting for structure and agency in ‘close-up’ research on teaching, learning and assessment in higher education. international journal of educational research, 47, 151–158. bateson, g. (1972).
steps to an ecology of mind: collected essays in anthropology, psychiatry, evolution, and epistemology. chicago, il: university of chicago press. callon, m. (1984). some elements of a sociology of translation: domestication of the scallops and the fishermen of st brieuc bay. the sociological review, 32, 196–233. carvalho, l., & goodyear, p. (2014). the architecture of productive learning networks. new york, ny: routledge. damşa, c. i., kirschner, p. a., andriessen, j. e. b., erkens, g., & sins, p. h. m. (2010). shared epistemic agency: an empirical study of an emergent construct. journal of the learning sciences, 19(2), 143–186. doi:10.1080/10508401003708381 damşa, c., ludvigsen, s., & andriessen, j. (2013). knowledge co-construction – epistemic consensus or relational assent? in m. baker, j. andriessen, & s. järvelä (eds.), affective learning together: social and emotional dimensions of collaborative learning (pp. 97–119). london, england: routledge academic publishers & taylor and francis group. damşa, c. i., & nerland, m. (2016). student learning through participation in inquiry activities: two cases from teaching and computer engineering education. vocations and learning. doi:10.1007/s12186-016-9152-9 dewey, j. (1997). experience and education. new york, ny: touchstone. (original work published 1938) dewey, j., & bentley, a. f. (1999). knowing and the known. in r. handy & e. e. hardwood (eds.), useful procedures of inquiry (pp. 97–209). great barrington, ma: behavioral research council. (original work published 1949) engeström, y. (2010). expansive learning at work: toward an activity theoretical reconceptualization. journal of education and work, 14(1), 133–156. engeström, y., sannino, a., & virkkunen, j. (2014). on the methodological demands of formative interventions. mind, culture and activity, 21(2), 118–128. doi:10.1080/10749039.2014.891868 erstad, o. (2013). digital learning lives: trajectories, literacies, and schooling.
new york, bern, berlin, bruxelles, frankfurt am main, oxford, wien, peter lang publishing group. fenwick, t., edwards, r., & sawchuk, p. r. (2012). emerging approaches to educational research: tracing the socio-material. london: routledge. gibson, j. j. (1979). the ecological approach to visual perception. boston, ma: houghton mifflin. heo, h., lim, k. y., & kim, y. (2010). exploratory study on the patterns of online interaction and knowledge co-construction in project-based learning. computers & education, 55, 1383–1392. ingold, t. (2011). redrawing anthropology: materials, movements, lines. aldershot, england: ashgate. ingold, t. (2015). the life of lines. london, england: routledge. jornet, a., roth, w.-m., & krange, i. (2016). a transactional approach to transfer episodes. journal of the learning sciences. doi:10.1080/10508406.2016.1147449 jornet, a., & steier, r. (2015). the matter of space: bodily performances and the emergence of boundary objects during multidisciplinary design meetings. mind, culture, and activity, 22, 129–151. kumpulainen, k. (2013). the legacy of productive disciplinary engagement. international journal of educational research, 64, 215–220. doi:http://dx.doi.org/10.1016/j.ijer.2013.07.006 latour, b. (2013). an inquiry into modes of existence: an anthropology of the moderns. cambridge, ma: harvard university press. markauskaite, l., & goodyear, p. (2014). professional work and knowledge. in s. billett, c. harteis, & h. gruber (eds.), international handbook of research in professional and practice-based learning (pp. 79– 106). dordrecht, netherlands: springer. orlikowski, w. (2007). sociomaterial practices: exploring technology at work. organization studies, 28, 1435–1448. doi:10.1177/0170840607081138 packer, j. m., & goicoechea, j. (2000). sociocultural and constructivist theories of learning: ontology, not just epistemology. educational psychologist, 35(4), 227–241. doi:10.1207/s15326985ep3504_02 reed, s. k. (2012). 
learning by mapping across situations. journal of the learning sciences, 21, 353–398. richter, c., allert, h., albrecht, j., & ruhl, e. (2015). grappling with the not-yet-known. in o. lindwall, p. häkkinen, t. koschmann, p. tchounikine, & s. ludvigsen (eds.), exploring the material conditions of learning: the computer supported collaborative learning (cscl) conference 2015, volume 1 (pp. 284–291). gothenburg, sweden: the international society of the learning sciences. roth, w.-m. (2015). the primacy of the social and sociogenesis. integrative psychological and behavioral science. doi:10.1007/s12124-015-9331-5 roth, w.-m., & jornet, a. (2014). toward a theory of experience. science education, 98, 106–126. schellens, t., & valcke, m. (2006). fostering knowledge construction in university students through asynchronous discussion groups. computers & education, 46, 349–370. stetsenko, a. (2005). activity as object-related: resolving the dichotomy of individual and collective planes of activity. mind, culture, and activity, 12(1), 70–88. doi:10.1207/s15327884mca1201_6 stetsenko, a. (2008). from relational ontology to transformative activist stance on development and learning: expanding vygotsky’s (chat) project. cultural studies of science education, 3, 471–491. säljö, r. (2010). digital tools and challenges to institutional traditions of learning: technologies, social memory and the performative nature of learning. journal of computer assisted learning, 26, 53–64. doi:10.1111/j.1365-2729.2009.00341.x vygotsky, l. s. (1987). the collected works of l. s. vygotsky: vol. 1. problems of general psychology. new york, ny: plenum. vygotsky, l. s. (1989). concrete human psychology. soviet psychology, 27, 53–77. vygotsky, l. s. (2012). thought and language (rev. ed.). cambridge, ma: mit press. (original work published 1986)

frontline learning research vol.3 no.
3 special issue (2015) 39–54 issn 2295-3159

corresponding author: montserrat castelló, facultat de psicologia, ciències de l’educació i l’esport. blanquerna. universitat ramon llull, císter 34. 08022. barcelona. phone: +34932533000, fax: +34932533031, email: montserratcb@blanquerna.url.edu doi: http://dx.doi.org/10.14786/flr.v3i3.149

researcher identity in transition: signals to identify and manage spheres of activity in a risk-career

montserrat castelló a, sofie kobayashi b, michelle k. mcginn c, hans pechar d, jenna vekkaila e, & gina wisker f

a universitat ramon llull, spain; b university of copenhagen, denmark; c brock university, canada; d alpen-adria-universität klagenfurt, austria; e university of helsinki, finland; f university of brighton, uk

article received 31 january 2015 / revised 14 june 2015 / accepted 9 july 2015 / available online 28 september 2015

abstract

within the current higher education context, early career researchers (ecrs) face a ‘risk-career’ in which predictable, stable academic careers have become increasingly rare. traditional milestones to signal progress toward a sustainable research career are disappearing or subject to reinterpretation, and ecrs need to attend to new or reimagined signals in their efforts to develop a researcher identity in this current context. in this article, we present a comprehensive framework for researcher identity in relation to the ways ecrs recognise and respond to divergent signals across spheres of activity. we illustrate this framework through eight identity stories drawn from our earlier research projects. each identity story highlights the congruence (or lack of congruence) between signals across spheres of activity and emphasises the different ways ecrs respond to these signals. the proposed comprehensive framework allows for the analysis of researcher identity development through the complex and intertwined activities in which ecrs are involved.
we advance this approach as a foundation for a sustained research agenda to understand how ecrs identify and respond to relevant signals, and, consequently, to unravel the complex interplay between signals and spheres of activity evident in struggles to become researchers in a risk-career environment.

keywords: researcher identity; identity development; signals; spheres of activity; risk-career

1. introduction

the position of early career researchers (ecrs) has always been challenging and involves many difficulties that must be conquered in order to secure personally and intellectually satisfying positions and a strong sense of self as a researcher. however, the situation has become particularly acute over the past few decades as higher education systems have been confronted with changing worldwide circumstances due to the requirements of the knowledge society and various economic and political constraints (cantwell, 2011; winter, 2009). changes are especially dramatic with respect to the nature of researcher education and identity development for ecrs who struggle with the demands of global mobility, the lack of stable or permanent positions, and the need to consider alternative careers (introduction, this issue). ecrs are now embarked upon what we define as a ‘risk-career’ (weber, 1947), rather than, as previously, a relatively more predictable academic career. in this changing context, traditional milestones that enabled ecrs to build their identities are disappearing or subject to reinterpretation. ecrs need to identify or reinterpret signals (yorke, 2009) from institutions and academic communities. signals related to expectations, constraints, and opportunities may cue performance and progress toward professional skill development and potential career directions.
although studies focusing on identity development or identity trajectories have grown exponentially in recent years, research in the field has not yet resulted in a comprehensive framework that integrates identity and signals or offers a comprehensive way to analyse researcher identity as it unfolds across the different systems or spheres of activity in which ecrs participate. the specific aim of this article is to explore researcher identity in relation to the signals ecrs perceive across different spheres of activity as they attempt to manage a risk-career. our overarching purpose is to offer a comprehensive framework useful for analysing how signals can be identified and used to build a researcher identity in a risk-career, one where career trajectories are less certain than they were. consistent with the position presented in the first article of this special issue (introduction), we assume the definition of early career researchers (ecrs) presented by the earli special interest group researcher education and careers to include individuals with up to 10 years of research experience, which means doctoral students, postdoctoral researchers, newly-hired lecturers, as well as professionals in universities and other employment. globally, stable academic careers have dwindled and a range of alternative academic positions has emerged: contract teaching, contract postdoctoral research, teaching-only lecturer positions, and administrative positions related to research, teaching, or student services. in the non-academic context, emerging types of and contexts for employment include business, government, non-governmental organisations, banking, industry, and previously unknown entities (e.g., start-up companies). the existing research literature base provides little information about the experiences of individuals facing uncertain employment or alternative academic positions. 
prior studies shed light on quite narrow aspects, such as international postdoctoral employment in enterprise modes of academic production (cantwell, 2011; porfilio, gorlewski, & pineo-jensen, 2013), or critical interactions that shape careers for early academics and new teaching staff (hemmings, hill, & sharp, 2013). little is known about required competencies, employment satisfaction, the range of skills required, and, most specifically, ways to formulate a researcher identity in this changing environment. it is a paradox that on the one hand research and advanced education are of ever-growing importance for knowledge-based economies, while on the other hand the attractiveness of academic working conditions is decreasing and it is becoming more difficult for ecrs to embark on stable careers. ecrs are exposed to contradictory signals about expectations, constraints, and opportunities in relation to their careers. the knowledge-based economy boosts an expansion in training positions for researchers (cyranoski, gilbert, ledford, nayar, & yahia, 2011), which signals to potential students that it is worthwhile to start doctoral training. however, once they become students, these individuals may learn that the increase in training positions is not matched by an increase in stable jobs for researchers. they realise, sometimes too late, that they have chosen a risk-career in which they might face the danger of precarious positions where they may or may not feel they can contribute as researchers. in a risk-career, traditional mechanisms are fading for individuals to identify as ‘members’ of a collective, and for others to attribute or acknowledge such membership (castelló & iñesta, 2012). ecrs are positioned differently to those already established within their fields and may hold various competing interests and identity constructions (archer, 2008).
ecrs are not only ‘becoming’ but also ‘unbecoming’ (archer, 2008), meaning that they are not always recognised by others in terms of the dominant structures and practices. ecrs may also unbecome by their own choice as a possible form of resisting the dominant practices (archer, 2008; danaher, 2015; pyhältö & keskinen, 2012). although there is a vast body of literature from the field of higher education about professional identity, academic identity, authorial or writing identity, emotional identity, and other related concepts, many studies lack a clear definition of what identity means, how the notion is operationalised or analysed, and the underlying theoretical and methodological assumptions (trede, macklin, & bridges, 2012). moreover, it is common that studies do not focus on identity as a whole, but rather tend to conceptualise it as a multidimensional construct that can be applied to different activities and systems in which particular experiences are developed. there is a need for an integrative and comprehensive framework to identify and analyse signals and changes in identity. in order to understand such mechanisms, we first present a comprehensive framework of the notion of researcher identity, produced by analysing spheres of activity related to researcher and career development to account for theoretical assumptions about researcher identity in a risk-career; and second we illustrate this framework through eight identity stories drawn from our earlier research projects. 2. a comprehensive framework for the study of researcher identity in a changing environment we conceptualise researcher identity to be a dynamic and social process that develops through participation in different disciplinary and academic communities. 
this conceptualisation also implies that researcher identity is relational and discursively constructed through a recursive and iterative process of subject positioning, which involves a process of self or subject constructions that influence the ways people interpret the present and learn for the future (harré, moghaddam, cairnie, rothbart, & sabat, 2009; holland, lachicotte, skinner, & cain, 2001; lave & wenger, 1991; sutherland & taylor, 2011). therefore, researcher identity should not be considered a static product but a continuous process of identification, which can be described in terms of development (baker & lattuca, 2010) or an ‘identity-trajectory’ (mcalpine, amundsen, & turner, 2014) that accounts for both the continuity of stable personhood over time and a sense of ongoing change. this conceptualisation presents identity development as a route by which a newcomer becomes part of a community (golde, 1998; lave & wenger, 1991; sweitzer 2009). however, socialisation could also be considered a two-way process (mcdaniels, 2010) in which an individual actively explores possibilities for differentiation and negotiation with a community to find balance between institutional and structural positioning (archer, 2008), and to create space for personal autonomous actions in a changing environment (clegg, 2008). according to this broad sociocultural conceptualisation of researcher identity, it is important to account for the particular activities and interactions that characterise the different communities in which ecrs participate. ecrs interact with and engage in multiple communities, and these different communities shape the activities and positions that ecrs adopt. 
in the current complex higher education work conditions, crossing boundaries is one of the requirements of researchers and this includes personal, disciplinary, national, and professional positions related to research, teaching, administration, and leadership (boden, borrego, & newswander, 2011; holley, 2010; mcalpine & amundsen, 2009; sweitzer, 2009).

we propose the notion of spheres of activity as a helpful construct to characterise and explain the prototypical activities of different communities in which ecrs tend to be engaged. a particular sphere, as it works as a system, is shaped by rules, artefacts, and specific divisions of labour (engeström & sannino, 2010) and by the actions that individuals and communities develop to achieve outputs. actions, although performed by individuals, are also socially organised within communities, which accounts for recurrent actions shared by a group of individuals. at the same time, each of the spheres in which an individual participates can be shaped by different communities. therefore, notions of spheres of activity and communities are not synonymous. spheres can be considered domains or fields of participation in life or in human activity. communities are defined by the types of social actions that are developed by different groups of individuals within each sphere of activity. for instance, the learning sphere includes several communities (e.g., a community of peers participating in regular doctoral courses or seminars, or a community of phd students in a research team working with, and learning from, more senior researchers). in the case of ecrs, we distinguish at least three related spheres of activity that affect identity development (camps & castelló, 2013), as illustrated in figure 1.
some representative activities of a particular sphere are emphasised or have more relevance at the beginning of the process of be(com)ing a researcher (e.g., completing set requirements for a doctoral program), whereas other activities (e.g., publishing or securing research funding) may be more common throughout or at advanced stages of researcher development.

figure 1. spheres of activity for ecrs.

the learning activity sphere is characterised by those more or less formal situations in which ecrs are situated as students in learning environments of different communities. these situations include seminars and doctoral courses, some aspects of supervisor relationships, and the increasing variety and number of development activities assessed for doctoral students and probationary or apprentice research staff (mcalpine, jazvac-martek, & hopwood, 2009). displaying these activities has to do with redefining the student identity developed in previous stages, since roles, outputs, and artefacts differ as students advance in their doctoral journeys toward the status of and possible employment as researchers. moreover, ecrs should also learn, usually implicitly, ways of acting, values, and practices that are prototypical of relevant disciplinary communities. institutional expectations of an increase in interdisciplinary work make this learning of disciplinary activity even more complex for ecrs faced with contradictory and fuzzy signals regarding appropriate actions and expected outputs.

the professional activity sphere is shaped by prototypical activities defining the professional communities to which ecrs belong or aim to belong when they finish their journeys. at times, these activities overlap with others from the learning sphere, which is common during doctoral study, particularly for those ecrs who aim to develop academic careers.
in these cases, participating in scientific events, applying for grants and funding, presenting research, or teaching courses could simultaneously serve as learning and professional activities toward acquiring a university position. for ecrs advancing research careers in professional settings outside academia, the scenario is still more complex, since they must understand and participate in two—or more—distinct professional communities. doctoral students who are already employed in professional roles within or outside the academy may experience particular challenges deciding when and how to prioritise the learning activity or professional activity sphere. a third sphere of activity accounts for personal, family, and social activities that are variably related to learning and professional activities, especially in terms of values and aims, and the need to develop a researcher identity aligned with one’s personal intentions. there is a complex, dynamic interplay between ecrs and the spheres in which they are involved (pyhältö, nummenmaa, soini, stubb, & lonka, 2012; vekkaila, pyhältö, & lonka, 2013a) and hence, between the signals perceived across these spheres. one way to understand the dynamics contributing to researcher identity for ecrs is to explore them in terms of congruence (fit) (see edwards, 2007) or lack of congruence (misfit) between individuals and their environment (castelló, iñesta, & corcelles, 2013; pyhältö et al., 2012; vekkaila et al., 2013a). ideally, a constructive congruence is formed among the signals from the overlapping spheres. for instance, the personal or the professional sphere outside academia may function as a source of support for ecrs’ activities and goals in the learning sphere (such as earning a doctorate and aspiring toward a researcher career) (vekkaila, pyhältö, & lonka, 2013a, 2013b). 
on the other hand, activities in the personal and professional spheres may compete with the academic learning activities and often-distant goals (e.g., publishing articles and books, securing a permanent position at the university) by providing rival interests and prioritising short-term goals (vekkaila et al., 2013a, 2013b). moreover, depending on the context and domain, there are likely to be tensions or contradictions among the signals across these spheres. for instance, in much doctoral education, professional and learning spheres are highly intertwined. pursuing a doctorate entails conducting—and learning to conduct—research work and increasingly writing and publishing—and learning to write and publish—articles (castelló et al., 2013; pyhältö et al., 2012; vekkaila, pyhältö, hakkarainen, keskinen, & lonka, 2012). through numerous interactions across the spheres, individuals’ positioning, the environments, and the relations among them are constantly evolving and being re-negotiated (camps & castelló, 2013; lave & wenger, 1991). intersections across spheres are multiple and unavoidable, which may illuminate synergies or contradictions for ecrs who are striving to make sense of the signals and transitions they encounter as they formulate a researcher identity. discovering and sharing the changes in rules and recommended actions in each sphere could enhance ecrs’ awareness of new signals crucial for researcher identity development in the 21st century. it might also be useful to explain transitions between communities and what these transitions imply for the processes involved in researcher identity construction (castelló et al., 2013; giddens, 1991; goffman, 1967; strandler, johansson, wisker, & claesson, 2014; wisker & robinson, 2012). 3. 
identity stories earlier work conducted individually by team members on various projects concerned issues related to ecrs’ identity, engagement, sense of belonging, writing, metacognition, wellbeing, and resilience, among other topics. the risk-career, signals during researcher career development, researcher identity, and the need to manage contradictions and tensions emerged as main themes during our discussion of these previous research projects. in order to illustrate the proposed comprehensive framework, we consulted our existing datasets to build indicative identity stories exemplifying ecrs’ experiences and trajectories in the light of their recognition of and response to signals affecting the development of researcher identity in the context of a risk-career. data were drawn from projects about doctoral education in finland (pyhältö et al., 2012; pyhältö, stubb, & lonka, 2009; vekkaila et al., 2012, 2013a) and in the united kingdom (wisker et al., 2010), and studies about being academics and researchers in canada (mcginn, 2012a, 2012b). this analysis is based upon interview transcripts from a total of 83 ecrs across three countries: finland (35), the united kingdom (30), and canada (18). all interviews were gathered according to the research ethics clearance procedures in the respective jurisdictions with care to protect the rights of the (potentially vulnerable) ecrs. to reduce risks of harm and ensure compliance with accepted procedures, individual team members worked directly with the original interview transcripts from their respective projects and did not share raw data with others. all the interviews included information about how the ecrs perceived themselves. the finnish participants were asked to discuss significant positive and negative turning points during their doctoral journeys and how they perceived themselves at these points (vekkaila et al., 2012, 2013a). 
the interviews conducted in the united kingdom aimed to draw out the participants’ experiences and identify transitions, turning points, and key learning moments within their doctoral journeys (wisker et al., 2010), whereas the participants in canada were asked explicitly to describe their perceptions of themselves as academic researchers (mcginn, 2012a) and more generally in academe (mcginn, 2012b). we selected interviews that presented (a) the strongest and clearest expressions of researcher identity, that is, participants’ perceptions of themselves as researchers; and (b) evidence of ecrs’ recognition of and response to signals from different spheres of activity while managing a risk-career. team members re-analysed their respective interviews using our jointly constructed framework of identity, spheres, and signals. based on these new analyses, we prepared drafts of identity stories from original interview transcripts, which were then reviewed by the whole research team. initially we started with a large number of identity stories; however, after several careful readings, we collectively selected a final set of eight identity stories on the basis of their potential capacity for illustrating the ways signals are emerging and interpreted by ecrs in the current higher education context of risk-careers. in our selections, we specifically sought diversity in terms of countries of origin and location, fields of research, and researcher career systems. although this analysis emphasises this limited set of just eight identity stories, the issues addressed were prevalent across our international datasets and not limited to these eight ecrs. 4. 
discussion of the identity stories identity stories from mari, jaakko, dan, wang, siiri, elaine, aatu, and kenneth (these are pseudonyms) provide a wide range of examples of development of researcher identity, spheres, and signals and represent empirical examples of researcher identity and tensions during early-stage identity construction. in each story, we highlight relevant spheres, the congruence (or lack of congruence) between signals coming from one sphere and another sphere (or among signals coming from multiple spheres), and the ways in which ecrs respond to (or, equally important, miss) these signals. we also specify the nature of different types of signals, ranging from those perceived as implicit to more explicit ones, as well as characteristics of responses in terms of agency and identity construction. for ease of reference, we provide an overview of the eight identity stories in table 1.

table 1. overview of identity stories
pseudonym | gender | country | english as first (l1) or second (l2) language | interviewed in first (l1) or second (l2) language | position at time of interview
mari | f | finland | l2 | l1 | doctoral student
jaakko | m | finland | l2 | l1 | doctoral student
dan | m | uk (originally israel) | l2 | l2 | lecturer
wang | m | canada (originally china) | l2 | l2 | assistant professor
siiri | f | finland | l2 | l1 | doctoral student
elaine | f | canada | l1 | l1 | assistant professor
aatu | m | finland | l2 | l1 | doctoral student
kenneth | m | canada | l1 | l1 | educational developer (doctoral studies on hold)

5. signals of congruence
congruence between individuals’ perceptions of and interpretations of signals in the learning activity sphere and those in the professional activity sphere can reinforce ecrs’ identity as researchers. such congruence was evident for two finnish doctoral students from the natural sciences. 
participating in international conferences and networking across universities is an increasingly common requirement for establishing a successful researcher career, and therefore these professional academic activities are also activities and goals in the learning activity sphere. mari reported, “the most significant turning point in the final phase of doctoral studies was the meeting of the international researcher whose research had inspired me from the beginning of my studies…. we also talked that i could visit her group and conduct my post-doc project there.” her success in networking internationally and establishing connections with an international researcher were signals that strengthened her identity as a scientist and prompted her to make active decisions in terms of her further research and future in academia. a similar fit was evident in jaakko’s story. he also strengthened his identity as a researcher by reading signals from the international arena: “i presented my results, got encouragement from others, and i learnt what they did.” moreover, jaakko moved beyond the learning activity sphere, where dependence on a supervisor is common, into the professional activity sphere through co-authoring an article with other international researchers: “in this article the main responsibility of writing was shared between me and another, more senior scientist…. if my supervisor would have been involved in this i would not have such an independent role in writing the article and collaborating with others.” such signals enabled jaakko to develop an identity consistent with moving toward the post-doctoral phase of his academic career: “and they are also indicators in my cv showing that i have the competence to work with others outside my own group.” figure 2. interpretation of signals from spheres of activity for mari and jaakko. 
dan’s identity story illustrates the importance of congruence between the perceived signals from the professional and the personal activity spheres. brought up in an orphanage in israel, dan had neither family ties nor money. he developed an internal sense of values and determination to succeed, becoming a physical education instructor for young men with difficult histories involving crime, poverty, and lack of education. his research was on physical education training. his achievement during a phd in the united kingdom was intellectual and personal; it affected his sense of himself as a professional success and a role model: “my wife fell in love with me more, she appreciates me more, especially my father in law. my students too—they appreciate me and they were my catalyst for my research. i dedicated this research to my students while other people usually dedicate their phd to their family…. the close society—family and colleagues— appreciate this, including my students who are proud of their lecturer who is perceived as a role model. a role model in the practical area and in the cognitive area—a doctor and a professional. usually, when i publish or when i participate in conferences, professional development courses or workshops then the title is meaningful.” the fit between the personal and the professional activity spheres for dan applied to his personal values and family as well as his research and teaching. figure 3. dan’s interpretation of signals from spheres of activity. 6. solving tensions and incongruences such congruence between the perceived signals coming from the personal and the professional activity spheres was also important for wang, but his identity story is more complex with tensions within the professional sphere between two competing communities with different values and rules. wang completed his doctoral degree, was hired at a canadian university, and had recently transferred to a second canadian university. 
both positions were as a tenure-track assistant professor in education. prior to his doctoral studies, he was employed as a professor in his home country, china, where he received awards as a researcher and was selected for an international scholarship. his first appointment in canada was at a research-intensive university where there were extensive pressures to secure research funding: “this is a bombard, this unspoken language.” he was happy to have transitioned to a less competitive environment where his national research grant and strong publication record allowed him to stand out rather than trail behind colleagues. perceived improvements in his general health and wellbeing indicated the importance for him of experiencing greater congruence between his personal and professional activity spheres. figure 4. wang’s interpretation of signals from spheres of activity. another identity story revealed perceived signals in the personal activity sphere that represented a misfit with one community of the professional activity sphere and a fit with another community of this same professional activity sphere. siiri, a finnish doctoral student in behavioural sciences, was involved in doctoral studies part-time while working full-time outside academia. initially, her doctoral studies and professional work outside the university were strongly interconnected and her employer encouraged her to conduct a thesis; however, this congruence diminished over time: “there was this organisational change and my time to conduct the thesis disappeared…. 
then, i was not able to write the articles, i have dozens of conference posters but those cannot be included in the doctoral thesis.… if the situation would have stayed the same i probably would have a stronger identity as a researcher.” within the professional activity sphere, a tension developed between her academic and her non-academic activities, and in time the signals from her non-academic professional life increasingly diminished the importance of earning the doctorate: “in this field the academic degrees are of course one way of gaining expertise but the other, as appreciated, way is through conducting the work.… in the beginning the expectations motivated me to pursue the doctorate but now when i am an acknowledged expert in my field without the phd it has decreased my motivation to pursue it.” siiri was dedicated to her professional career outside academia and had developed her identity as an expert by relying on signals coming from her non-academic professional life. therefore, she had gradually become distanced and alienated from her identity as an academic researcher. instead, she valued her identity as an expert, and in that sense, she experienced congruence between her personal values and her professional life outside academia. figure 5. siiri’s interpretation of signals from spheres of activity. elaine’s story also illustrates a misfit between perceived signals coming from the personal activity sphere and those from the different communities of the professional activity sphere, in which she participated, particularly regarding personal values. for elaine, completing a phd and securing a position as a tenure-track professor in education in canada actually diminished her sense of self-esteem. prior to entering academe as a mature student, she had research experience through her work outside academia, including publishing and evaluating research funding bids. 
during those early experiences, her personal sense of identity as a researcher was reinforced in the ways that others treated and referenced her: “it wasn’t only my own identity but being recognised by others as being a researcher in the field … certainly the recognition by others and which i think had started off … by having my first study being published in [an academic journal].” these early, positive expressions led her to doctoral studies and an academic position, but she felt the signals of success in academia had shifted to particular kinds of dissemination (peer-reviewed journal articles) rather than the public policy work she had done. this shift undermined the pleasures she once associated with research and her confidence as a researcher. her identity story is a clear example of tensions within the professional activity sphere and particularly of the ways contradictions between communities within and outside academia can interfere with identity development. elaine’s expectation that her professional experience would contribute to and enhance her academic activity had not been upheld, which undermined her researcher identity. figure 6. elaine’s interpretations of signals from spheres of activity. aatu’s story illustrates a misfit in the perceived signals coming from all three spheres. at the beginning of his doctoral studies, aatu, a finnish doctoral student in behavioural sciences, was eager and inspired to pursue his doctoral research: “i thought that now i will pursue my thesis, i planned my
and the beginning was excellent, i wrote three conference papers and then i presented my results in conferences… and i thought that i had reached a new level as a researcher.” however, the signals aatu interpreted from the journal peer-review process were different: “this same paper that included the same information and structure as the paper that got accepted in the previous conference and got positive review comments got now crushing review comments from the journal.… i thought that there was no logic in this system.” aatu’s story focused on new expectations in the academic professional activity sphere and the learning activity sphere: increasingly doctoral students are required to publish during their doctoral studies, but they are still in the process of learning what is involved in writing and publishing papers. the learning experiences involved in the publication process were not in congruence with aatu’s initial expectations, which resulted in misfits and tensions between his personal intentions and the signals he received from the professional activity sphere, causing him to struggle with his identity as a researcher: “i wondered if i could get any permanent position from the university with my publication list.… maybe i was not capable to play this ‘science game.’ i think that this is not worth it at all…. do i even want to play this game anymore?” the sense he made of the signals from the learning activity sphere and the professional activity sphere made him increasingly alienated from research and cynical towards science. aatu identified some signals and requirements defining a research career but they were not things he considered meaningful and worth striving toward; they were inconsistent with priorities in his personal activity sphere. figure 7. aatu’s interpretation of signals from spheres of activity. 
the final identity story illustrates both fits and misfits among the perceived signals from the professional, learning, and personal activity spheres. kenneth had placed his doctoral studies in the humanities on hold temporarily while employed in a full-time, non-tenure-track position at a canadian university. within his doctoral program, he had perceived himself on the margins both with regards to his theoretical interests and his evolving interests in pedagogy: “i was loving my teaching, loving it. so that in one of my comprehensive exams i added … some things on pedagogy… and at that time maybe it should have clicked that i was, you know, interested in maybe writing and exploring that further.” he had been feeling like a “loser” because his phd was unfinished. this feeling of powerlessness lifted, however, when he accepted a full-time teaching development position. he felt he could effect change in the teaching profession, and this in turn left him feeling positive about himself and more confident he would eventually finish his degree: “i feel like i am beginning to be able to effect some of that change and it’s a cool feeling and so i know now that i’ll finish my phd.” he felt a strong sense of belonging within the higher education pedagogical community and even began to see teaching and teaching development as a prospective career choice. he felt respected and included by other academics in his teaching role, and saw this as an affirming space for himself. he did not feel similarly encouraged by others to do research, which undermined his researcher identity. figure 8. kenneth’s interpretation of signals from spheres of activity. 7. conclusion higher education now offers increasingly precarious career prospects for ecrs. 
in this contribution, we first offered a framework to account for the notion of researcher identity, which provides a new comprehensive way to analyse researcher identity development within the complex and intertwined spheres of activity in which ecrs are involved. second, in combining across and re-scrutinising data from a range of previous research projects, we explored the usefulness and potential of this framework by means of illustrating ways in which ecrs were aware of, and responded to, signals about their career trajectories, which, in turn, were connected to researcher identity (castelló et al., 2013; pyhältö et al., 2012; vekkaila et al., 2013a). recognition of and response to these differing signals is an important aspect of an ecr’s identity-trajectory (mcalpine et al., 2014) within the context of a risk-career. in the current higher education landscape, ecrs are faced with ever-increasing and possibly conflicting demands to advance toward research careers they fear may not materialise. rather than anticipating stable research careers in academic institutions, ecrs are now pressured to consider how they might contribute and find satisfaction through alternative academic or perhaps non-academic careers. consciously attending to the signals present across spheres of activity may provide ecrs with a sense of agency within the uncertainty of a risk-career environment. we were heartened by the extent to which this new framework applied across the diverse researcher education and researcher career systems in our various international contexts and the different disciplinary fields and professional settings for ecrs involved in our interviews, but we also acknowledge that this conceptualisation requires further testing and analysis with new data to assess its wider transferability along the various career trajectories ecrs face globally. 
moreover, since the data we used came from our previous work, the situations and signals we have been able to identify might not be fully representative of the emerging tensions and pressures that ecrs are facing within the current context of a risk-career. assessing and refining the provided comprehensive framework, identifying signals emerging in and across different spheres of activity, and helping ecrs to identify and respond to these signals are important issues that deserve recognition and focused attention in the efforts of the researcher education and careers sig to advance a shared research agenda for exploring ecrs’ identity development in the 21st century. we propose the following future research emphases as ones that have the greatest potential for consolidating the comprehensive conceptual framework introduced here and facilitating ecrs’ harmonious identity development:
• original data specific to ecrs’ spheres of activity, perceived signals, and tensions associated with a risk-career are needed in order to discuss and further illuminate the complex considerations discussed in this paper and advance knowledge about the nature and development of researcher identity.
• cross-cultural analyses of ecrs’ experiences across contexts could lead to better understandings of the ways ecrs identify and respond to relevant signals in their various contexts, and, consequently, could unravel the complex interplay between signals and spheres of activity when dealing with tensions and struggling to become researchers in a risk-career environment. in the current globalised context, such cross-cultural analyses could extend to include situations where ecrs pursue opportunities in other cultures and countries. 
• in the changing scenarios facing higher education systems worldwide, the study of the ways ecrs deal with the perceived continuities, and especially discontinuities, among spheres of activity could help to identify and theorise the conflicting signals that systems are producing and to provide ecrs with tools to better interpret and respond to these signals.
• longitudinal analyses of changes ecrs face as they progress from admission to graduation and into initial appointments within and beyond academe will be particularly useful to understand transitions, trajectories, and the varying signals between and among spheres of activity.
• more generally, we encourage researchers who focus on the ways specific activities (e.g., writing, supervisory interactions, teaching or publishing, among others) contribute to ecrs’ identity development to situate their conceptual and methodological assumptions in relation to a comprehensive framework of identity development, such as the one provided in this article, in order to make the integration of diverse research data possible.
keypoints
in the current higher education context, early career researchers (ecrs) face a ‘risk-career,’ in which they must identify and interpret new or emergent ‘signals’ in their efforts to develop a researcher identity.
the proposed comprehensive framework for researcher identity emphasises ecrs’ recognition of and response to signals across spheres of activity.
identity stories drawn from prior studies illustrate the congruence (or lack of congruence) between and among signals across different spheres of activity, and the varied ways ecrs respond to (or miss) these signals.
the framework and identity stories are intended to offer exemplars to assist ecrs, supervisors, and university managers to identify issues and manage risk-careers. 
acknowledgements data for this analysis and some of the theoretical underpinnings were drawn from earlier projects funded by the finnish cultural foundation, the academy of finland, the higher education academy of uk, the higher education funding council for england, the social sciences and humanities research council of canada, the human resources and skills development canada job creation program, the spanish ministerio de economía y competitividad (dgicyt, cso2013-41108-r), and our respective institutions. this paper arose from a working group established for the inaugural meeting of the earli special interest group researcher education and careers in barcelona, spain, october 2014. montserrat castelló served as coordinator for the group and primary author for this paper. all other authors’ names are listed in alphabetical order. references archer, l. (2008). younger academics’ constructions of ‘authenticity’, ‘success’ and professional identity. studies in higher education, 33, 385–403. doi:10.1080/03075070802211729 baker, v. l., & lattuca, l. r. (2010). developmental networks and learning: toward an interdisciplinary perspective on identity development during doctoral study. studies in higher education, 35, 807–827. doi:10.1080/03075070903501887 boden, d., borrego, m., & newswander, l. k. (2011). student socialization in interdisciplinary doctoral education. higher education, 62, 741–755. doi:10.1007/s10734-011-9415-1 camps, a., & castelló, m. (2013). la escritura académica en la universidad [academic writing at university]. redu. revista de docencia universitaria, 11(1), 17–36. retrieved from http://www.redu.net cantwell, b. (2011). academic in-sourcing: international postdoctoral employment and new modes of academic production. journal of higher education policy and management, 33, 101–114. doi:10.1080/1360080x.2011.550032 castelló, m., & iñesta, a. (2012). 
texts as artifacts-in-activity: developing authorial identity and academic voice in writing academic research papers. in m. castelló & c. donahue (eds.), university writing: selves and texts in academic societies (pp. 179–200). bingley, uk: emerald. castelló, m., iñesta, a., & corcelles, m. (2013). learning to write a research article: ph.d. students’ transitions toward disciplinary writing regulation. research in the teaching of english, 47, 442–477. clegg, s. (2008). academic identities under threat? british educational research journal, 34, 329–345. doi:10.1080/01411920701532269 cyranoski, d., gilbert, n., ledford, h., nayar, a., & yahia, m. (2011). the phd factory. nature, 472, 276–279. doi:10.1038/472276a danaher, p. (2015). forms of capital and transition pedagogies: researching to learn among postgraduate students and early career academics at an australian university. to appear in c. guerin, c. nygaard, & p. bartholomew (eds.), learning to research—researching to learn. oxfordshire, uk: libri. edwards, j. r. (2007). the relationship between person–environment fit and outcomes: an integrative theoretical framework. in c. ostroff & t. a. judge (eds.), perspectives on organizational fit (pp. 209–258). san francisco, ca: jossey-bass. engeström, y., & sannino, a. (2010). studies of expansive learning: foundations, findings and future challenges. educational research review, 5, 1–24. doi:10.1016/j.edurev.2009.12.002 giddens, a. (1991). modernity and self-identity: self and society in the late modern age. cambridge, uk: polity. goffman, e. (1967). interaction ritual: essays on the face-to-face behaviour. london, uk: allen lane. golde, c. m. (1998). beginning graduate school: explaining first‐year doctoral attrition. new directions for higher education, 1998(101), 55–64. doi:10.1002/he.10105 harré, r., moghaddam, f. m., cairnie, t. p., rothbart, d., & sabat, s. r. (2009). recent advances in positioning theory. 
theory & psychology, 19, 5–31. doi:10.1177/0959354308101417 hemmings, b., hill, d., & sharp, j. g. (2013). critical interactions shaping early academic career development in two higher education institutions. issues in educational research, 23, 35–51. retrieved from http://www.iier.org.au/ holland, d. c., lachicotte, w., jr., skinner, d., & cain, c. (2001). identity and agency in cultural worlds. cambridge, ma: harvard university press. holley, k. (2010). doctoral student socialization in interdisciplinary fields. in s. k. gardner & p. mendoza (eds.), on becoming a scholar: socialization and development in doctoral education (pp. 97–112). sterling, va: stylus. lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. new york, ny: cambridge university press. mcalpine, l., & amundsen, c. (2009). identity and agency: pleasures and collegiality among the challenges of the doctoral journey. studies in continuing education, 31, 109–125. doi:10.1080/01580370902927378 mcalpine, l., amundsen, c., & turner, g. (2014). identity-trajectory: reframing early career academic experience. british educational research journal, 40, 952–969. doi:10.1002/berj.3123 mcalpine, l., jazvac-martek, m., & hopwood, n. (2009). doctoral student experience in education: activities and difficulties influencing identity development. international journal for researcher development, 1, 97–109. doi:10.1108/1759751x201100007 mcdaniels, m. (2010). doctoral student socialization for teaching roles. in s. k. gardner & p. mendoza (eds.), on becoming a scholar: socialization and development in doctoral education (pp. 29–44). sterling, va: stylus. mcginn, m. k. (2012a). being academic researchers: navigating pleasures and pains in the current canadian context. workplace: a journal for academic labor, 21, 14–24. retrieved from http://ojs.library.ubc.ca/index.php/workplace/ mcginn, m. k. (guest ed.). (2012b). 
belonging and non-belonging: costs and consequences in academic lives [special issue]. workplace: a journal for academic labor, 19. retrieved from http://ojs.library.ubc.ca/index.php/workplace/ porfilio, b. j., gorlewski, j. a., & pineo-jensen, s. (guest eds.). (2013). the new academic labor market and graduate students [special issue]. workplace: a journal for academic labor, 22. retrieved from http://ojs.library.ubc.ca/index.php/workplace/ pyhältö, k., & keskinen, j. (2012). doctoral students’ sense of relational agency in their scholarly communities. international journal of higher education, 1(2), 136–149. doi:10.5430/ijhe.v1n2p136 pyhältö, k., nummenmaa a. r., soini, t., stubb, j., & lonka, k. (2012). research on scholarly communities and development of scholarly identity in finnish doctoral education. in t. s. ahola & d. m. hoffman (eds.), higher education research in finland: emerging structures and contemporary issues (pp. 337–357). jyväskylä, finland: jyväskylä university press. pyhältö, k., stubb, j., & lonka, k. (2009). developing scholarly communities as learning environments for doctoral students. international journal for academic development, 14, 221–232. doi:10.1080/13601440903106551 strandler, o., johansson, t., wisker, g., & claesson, s. (2014). supervisor or counsellor?—emotional boundary work in supervision. international journal for researcher development, 5, 70–82. doi:10.1108/ijrd-03-2014-0002 sutherland, k., & taylor, l. (2011). the development of identity, agency and community in the early stages of the academic career. international journal for academic development, 16, 183–186. doi:10.1080/1360144x.2011.596698 castelló et al | f l r 54 sweitzer, v. (2009). towards a theory of doctoral student professional identity development: a developmental networks approach. the journal of higher education, 80, 1–33. doi:10.1353/jhe.0.0034 trede, f., macklin, r. & bridges, d. (2012). 
professional identity development: a review of the higher education literature. studies in higher education, 37, 365–384. doi:10.1080/03075079.2010.521237 vekkaila, j., pyhältö, k., hakkarainen, k., keskinen, j., & lonka, k. (2012). doctoral students’ key learning experiences in the natural sciences. international journal for researcher development, 3, 154–183. doi:10.1108/17597511311316991 vekkaila, j., pyhältö, k., & lonka, k. (2013a). experiences of disengagement—a study of doctoral students in the behavioral sciences. international journal of doctoral studies, 8, 61–81. retrieved from http://www.informingscience.org/journals/ijds/ vekkaila, j., pyhältö, k., & lonka, k. (2013b). focusing on doctoral students’ experiences of engagement in thesis work. frontline learning research, 1(2), 10–32. doi:10.14786/flr.v1i2.43 weber, m. (1947). science as a profession. in h. h. gerth & c. w. mills (eds.), from max weber: essays in sociology (pp. 129–156). london, uk: kegan. winter, r. (2009). academic manager or managed academic? academic identity schisms in higher education. journal of higher education policy and management, 31, 121–131. doi:10.1080/13600800902825835 wisker, g., morris, c., cheng, m., masika, r., warnes, m., trafford, v., robinson, g., & lilly, j. (2010). doctoral learning journeys: final report. retrieved from http://about.brighton.ac.uk/clt/research/cltresearch/doctoral-learning-journeys/ wisker, g., & robinson, g. (2012). picking up the pieces: supervisors and doctoral “orphans.” international journal for researcher development, 3, 139–153. doi:10.1108/17597511311316982 yorke, m. (2009). faulty signals? inadequacies of grading systems and a possible response. in g. joughin (ed.), assessment, learning and judgement in higher education (pp. 65–84). new york, ny: springer science+business media. microsoft word goldberg & schwarz_publication.docx frontline learning research vol.4 no. 
4 special issue (2016) 7-19 issn 2295-3159

harnessing emotions to deliberative argumentation in classroom discussions on historical issues in multi-cultural contexts

tsafrir goldberg a, baruch b. schwarz b
a university of haifa, israel
b the hebrew university of jerusalem, israel

article received 18 september / revised 25 january / accepted 16 march / available online 11 may

abstract

this theoretical paper is about the role of emotions in historical reasoning in the context of classroom discussions. peer deliberations around texts have become important practices in history education according to progressive pedagogies. however, in the context of issues involving emotions, such approaches may become an obstacle to historical clairvoyance. the expression of strong emotions may bias the use of sources, compromise historical reasoning, and impede argumentative dialogue. coping with emotions in the history classroom is a new challenge in history education. in this paper, we suggest that rather than attempting to foster positive emotions only or to avoid emotions altogether, we should look at ways of engaging with emotion in history teaching. we present examples of peer deliberations on charged historical topics according to three pedagogical approaches that address emotions in different ways. the protocols we present open numerous questions: (a) whether facilitating engagement with one's own and the other's emotions may lead to better processing of information and better deliberation of a historical question; (b) whether promoting national pride boosts reliance on collective narratives; and (c) whether adopting a critical teaching approach eliminates emotions and biases.
based on these examples and on findings in social psychology, we bring forward working hypotheses suggesting that, instead of dodging emotional issues, teachers should harness emotions, not only positive but also negative ones, to critical and productive engagement in classroom activities.

keywords: teacher identity; knowledge building; inquiry learning; problem-centered pedagogy; technology

tsafrir goldberg, dept. of learning, instruction & supervision, university of haifa, 199 abba khoushi rd., haifa, 31905, israel. tgoldberg@edu.haifa.ac.il. doi: http://dx.doi.org/10.14786/flr.v4i4.211

goldberg & schwarz | f l r 8

1. introduction: positions towards the role of emotions in history learning

emotions are responses to internal or external events with particular significance for the organism. they include verbal, physiological, behavioral, and neural mechanisms (fox, 2008). emotional experience involves the coordination and synchronization of bodily symptoms, action tendencies, and feelings, driven by appraisal processes (scherer, 2005). although the inclusion of cognitive appraisal (the evaluation of events and objects) in emotions is controversial, cognition is considered either as part of the emotional experience or as interacting with emotion. in his celebrated book, descartes' error: emotion, reason, and the human brain, neurologist antónio damásio (1994) explains that emotions guide (or bias) behavior and posits that rationality requires emotional input. he argues that rené descartes' "error" was the dualist separation of mind and body, rationality and emotion. researchers from other domains express comparable claims: for biologists maturana and varela (1987), cognition, language and mood or emotion are inextricable. for semiotician radford (2015), all emotions and motivations are inherently social and culturally constructed, and do not necessarily obstruct thinking.
the question for the educationalist is then how to handle emotions when aiming to foster rational reasoning. this issue is particularly challenging in historical reasoning: since psychologists have shown that strong emotions and loyalties hinder rational thinking in general (bless & fiedler, 2006), some researchers in history education suggest preventing strong and negative emotions from holding sway over cognition (foster, 2013). moreover, politicians and decision makers are often adamant about avoiding negative emotions in history classes for ideological reasons (evans, avery, & pederson, 1999). in other words, although research has shown the intricate relations between rationality and emotion, many opt to lessen bursts of strong emotion in history classes. descartes comes in again through the back door.

in the present paper, we claim that handling and capitalizing on emotions as resources for learning is a major goal in history education in the 21st century. we focus on face-to-face deliberative argumentation about highly loaded historical issues, a context that exacerbates emotions in the case of history (schwarz & goldberg, 2013) and fosters reasoning and learning in general (schwarz & asterhan, 2010). we provide examples of teaching approaches designed to engage emotions in different ways and exemplify how these emotions affect deliberative discussions of historical topics in productive or counter-productive directions. we rely on these examples to articulate hypotheses on the beneficial and detrimental roles of emotions in deliberative historical argumentation.

1.1 "don't get emotional now"

historical research has long striven for impartiality, and even after presumptuous aspirations for objectivity were abandoned, emotionality and partisanship are still considered hindrances (haskell, 1990).
school history teaching, while attuned to various other goals besides promoting norms of academic historical practice, is also increasingly shifting toward emphasizing rational disciplinary thinking and discourse (barton, 2009; national center for history in the schools, 2010). in a survey of leading history education experts, only one of the ten intellectually challenging core practices of history teaching they recommend refers (tangentially) to learner emotions ("connect to personal/cultural experience"). even then, the emphasis is on helping learners properly distance themselves from personal reactions and views (fogo, 2014). some educationalists are even more explicit: foster's (2013) review of teaching controversial issues advises eschewing highly emotive topics in favor of more distant events, in order to allow learners to better focus on disciplinary practices. accordingly, the majority of teachers tend to avoid charged and emotive issues in history teaching (levstik, 2000). issues arousing strong (negative) emotions are often considered "taboo" and are formally or covertly sanctioned (evans et al., 1999). evasion is more frequent with topics that shed unpleasant light on learners' in-group and may elicit collective shame or guilt, which are aversive emotions (helmsing, 2014; wohl, branscombe, & klar, 2006). indeed, historical issues bearing on identity are especially emotion intensive, and the emotions they may raise are not solely positive (mccully, 2006). zembylas and kambani (2012) claim the emotional risk and complexity of teaching controversial historical issues are especially threatening in societies divided by intergroup conflicts. such a risk arises due to history's role in learners' identity formation, constructing a meta-narrative in which learners position themselves (goldberg, porat, & schwarz, 2006).
this may be the reason curriculum policy makers tend to restrict learners' encounter with out-group historical narratives (bar-tal, 2007; goldberg & gerwin, 2013; hilton & liu, 2008). even educational initiatives designed to engage students with their nation's multicultural history, such as euroclio's initiatives for the new post-soviet democracies, have been criticized for a tendency towards harmonization and towards the evasion of conflictual episodes (maier, 2011). history education experts claim learners should check personal inclinations and emotions lest they be prone to bias and "presentism" (davis, yeager, & foster, 2001; wineburg, mosborg, & porat, 2001). in the context of emotionally charged topics that are more salient in collective memory, learners' evidence evaluation tended to be more biased (goldberg, schwarz, & porat, 2008). when confronted with accounts of their nation's history that posed a threat to their national pride, patriots demonstrated biased processing of historical information (miron, branscombe, & biernat, 2010). thus, emotions and issues arousing strong emotions seem to threaten "good" rational and disciplinary oriented learning.

however, can identity and emotions be sidetracked without losing essential aspects of history teaching? some educationalists harness emotions to history learning. for example, zembylas and kambani (2012) call for a focus on the emotional side of teaching (contested) history, to purposefully engage learners with discomforting emotions in reference to sensitive historical topics. for them, empathizing is a strategy to be practiced (zembylas, 2004, 2013). britzman (2000) refers to the importance of engaging with learners' emotions when teaching about historical collective trauma. none of the above researchers is interested in history teaching as an end in itself, though. their educational aim concerns reconciliation between adversaries.
in the learning sciences, there is now a general recognition of the importance of relating curriculum to learners' identity and community history and of engaging them in advocacy and social critique to boost motivation for learning (thompson, 2014; varelas, 2012). barton and mccully (2010) stress that encountering conflicting historical perspectives through critical disciplinary inquiry may help students engage in an internally persuasive dialog about the past and achieve a tolerant and receptive identity. instead of dodging the role of emotions and of identity, gottlieb, wineburg, and zakai (2005) claim one should acknowledge identity influences on historical understanding, and accept the fact that individuals apply different critical standards to historical sources that are central and those that are peripheral to group identity. muller mirza and colleagues (2014) point to the importance of "secondarisation" of emotion, the process in which individuals reflect on their emotions and generalize them into more abstract concepts. such a process is essential for handling contentious intercultural topics and for academic achievement. bar-on and adwan (2006) advocate structuring history teaching to help learners acknowledge their own and the other's emotions and collective identity. they assume this would help learners deliberate contentious issues of the past in a more productive and reasoned way and motivate engagement with it. however, the effects of affirmation of sentiments relating to national identification and collective narrative have not been explored so far.

we present here a study enabling a comparison between three approaches to history learning: disciplinary critical inquiry, mutual narrative acknowledgement and patriotic apologetic teaching. through this comparison we inquire into the relation between emotion and learning in history.
to do so, we focus on inter-group deliberative discussions (or deliberative argumentation) on a "hot" historical topic: deliberative argumentation is a propitious context for learning (schwarz & asterhan, 2010); hot historical topics are lieux de mémoire where identity and national identification (or national pride) are likely to emerge. our working hypothesis is that emotion and identity do not constitute obstacles to deliberative discussions or disciplinary practice by themselves. to check our hypothesis, we compare examples of peer deliberations with different structuring of engagement with emotions. in the last part of the paper, we rely on these examples to articulate hypotheses on the role of emotions in deliberative argumentation.

1.2 feeling and discussing the past: learners' deliberative discussions

learners' emotions were addressed through three approaches to history teaching (for full details see goldberg and ron's (2014) description of procedure). the first is an authoritative single-narrative approach, which aligns with declared national history teaching goals such as acquiring factual knowledge of main events and enhancing students' commitment to the state and their collective identity (israeli ministry of education, 2015). instruction according to this approach was based on a textbook chapter written under direct governmental supervision to produce a clear account stressing the righteousness of israel (domke, urbach, & goldberg, 2009; yaron, 2009). teaching was an "initiation-recitation-evaluation" session with a powerpoint presentation, directed at getting the "right answer" in a short quiz and instilling pride in one's nation. this conventional, lower order thinking type of teaching appears to be very common in social studies classrooms (saye & social studies inquiry research collaborative (ssirc), 2013) and aligns with helmsing's (2014) examples of reasoning that does not challenge national pride.
the second approach is the empathetic dual-narrative approach. it aims at arousing feelings of empathy for the other and mutual affirmation of collective sentiments. instruction was based on excerpts from a dual narrative history textbook created by jewish and palestinian teachers (adwan & bar-on, 2004; bar-on & adwan, 2006). learners were guided toward empathetic attention to the emotions and values of adversary narrators and toward reflection on the reactions they arouse. this practice aligns to some degree with the process of secondarisation, in which learners reflect on emotions and produce a more generalized understanding of their social aspect (muller mirza, grossen, de diesbach-dolder, & nicollin, 2014).

finally, the critical inquiry approach aims at modelling disciplinary practice and developing critical thinking skills (reisman, 2012). instruction was based on the use of conflicting sources accompanied by information allowing inferences as to the context, goal and bias of authors. teachers coached students in sourcing and corroboration (wineburg, 2001) and explicitly attempted to instil impartiality in the encounter with evidence (instructing them to read like a "swedish" [i.e. neutral] historian). table 1 succinctly summarizes the emotional emphases and practices of the three approaches.

jewish and arab israeli students from diverse schools were randomly allocated to study the topic of the 1948 war ("war of independence") in one of the three approaches. two weeks later, participants were paired by teaching approach into jewish-arab dyads; they engaged in deliberative discussion of the war and the causes of the palestinian refugee problem, a topic bound to raise emotional reactions. all discussions were audiotaped and transcribed. we marked all discussion episodes that included direct references to history or identity, or use of historical disciplinary practices (sourcing, contextualization, perspective taking, causal explanation) (lee & ashby, 2000; wineburg, 2001).
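the episode coding described here was checked by having two researchers code the same subset of discussions and comparing their codes. the following is a minimal sketch of how such an agreement check could be computed, not the authors' actual procedure. the code labels and the two lists of codes are hypothetical and invented for illustration; the study reports only percent agreement, and cohen's kappa is included here merely as a common chance-corrected complement.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of episodes to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of episodes")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportion per code.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for eight episodes (illustrative data, not from the study).
coder_a = ["sourcing", "identity", "perspective", "causal",
           "identity", "sourcing", "perspective", "identity"]
coder_b = ["sourcing", "identity", "perspective", "identity",
           "identity", "sourcing", "causal", "identity"]

print(f"percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"cohen's kappa: {cohens_kappa(coder_a, coder_b):.2f}")
```

in this invented example the coders agree on six of eight episodes. because some agreement would occur by chance alone, the kappa value is lower than the raw percentage, which is why the two figures are often reported together.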
two researchers coded ten of the sixty discussions and arrived at 75% agreement. differences were discussed and resolved. the rest of the discussions were analyzed separately. we bring forth four episodes we deem representative of potential effects of teaching approaches on the relations of emotion and learning in history.

table 1 teaching approaches, emotional emphases and practices
approach | emotional emphasis | practices
conventional authoritative | instilling pride in one's nation; getting the "right answer" | presentation; textbook reading and summary; exam
empathetic dual-narrative | mutual affirmation; conflict resolution; empathy | nonjudgmental listening to collective narratives; identifying with emotions and values
critical disciplinary inquiry | induction into disciplinary practice; instilling impartiality | critical analysis of conflicting sources; synthesis of sources

figure 1 shows a first protocol in the critical-inquiry condition. at the beginning of the discussion (not presented here), the jewish participant devoted almost four times as many words to analysis and evaluation of the sources as his arab peer (416 vs. 111 words). could this stress on the analytical disciplinary approach come at the price of feeling attached to the collective and its history? the protocol shows that the jewish discussant, who attempts an implicit distancing from jewish identity ("israeli is good enough for me"), also expresses a disinterest in jewish history ("it doesn't arouse interest in me…i did it for the exam"; "i don't care what history i would have studied"). by contrast, his peer declares herself a palestinian arab, and stresses her glorifying view of her people ("the palestinians especially … are very very very smart"). she also emphatically endorses learning the history of "my arab country", through which she learns "very wise things", and which she "would love the others to study" too.
the contrast between the two learners suggests that some degree of identification and perhaps even glorification seems to motivate learning one's own national history. furthermore, it seems that the arab student, who is more enthusiastic to learn about her own group's history, also views learning the others' history more positively ("even when i study jewish history…it's good, nice that i learn more"). could the positive effect that feelings of pride and identification have on motivation to learn in-group history be generalized to the out-group? is there an essential relation between disidentification and emphasis on disciplinary practice? these questions are complex and do not seem to have general answers.

figure 1. an excerpt of a discussion in a critical-inquiry condition that suggests that national identification boosts the motivation to study own and others' national history.
a: we in school study history, jewish history, the jewish undergrounds and…but you don't study arab history…why?
j: because we live in one state…state defined with one nation, and which has one curriculum…
a: i live in this country and define myself as palestinian arab, ok? you define yourself as jewish, ok?
j: israeli
a: jewish israeli
j: whatever….israeli is good enough for me
a: for you it is enough to study jewish history, i define myself as arab i will learn only the arab…
j: call me narrow… but the fact i study something connected to me, supposedly connected to me, the truth is it doesn't, it doesn't arouse interest in me. i don't care which history. i did it for the exam.
a: not me. i, even when i study jewish history…it's good, nice that i learn more and more. but i study jewish history and i would love the others to study mine…
j: i agree with you
a: you know the arabs, the palestinians especially … are very very very smart…so, i learn about my arab country, i very very very study very wise things.
j: first, i agree with you…i wouldn't mind studying another history…but personally, if you ask me, i don't care what history i would have studied.

figure 2 shows an excerpt of a discussion between two discussants in the narrative empathetic condition. they took the perspective of the other, and frequently expressed feelings.

figure 2. two discussants in the narrative empathetic condition take the perspective of the other and frequently express feelings.
a: i have a question; if you were one of the great men in the country or in israel, what do you think you would have done?
j: got it. i'd try to compromise on one decision
a: which is? come on…
j: seems to me like the un said to split the land into two parts so there would be peace and they wouldn't fight…that's it…what do you think? what would you have done?
a: me too, i think may be we could have reached a peace agreement…
…
j: just a question right out of my head: how do you think the palestinians felt after they were deported?
a: fear. no mother, no land, no one to turn to, that's how i imagine myself if i was in their place at the time…nothing to do, no power, no army, no leaders to supervise them or lead them, nothing but themselves going where the arab states or un tells them to go.
j: no one to lead them.
a: and that's scary, right.

it is noteworthy that both attune themselves to the suffering of the palestinians, an orientation that apparently also facilitated a more collaborative atmosphere. perspective taking is used for laying foundations for mutual trust, for discussing the possible (although to some degree counterfactual) decisions of leaders and for agreeing they would not fight as their predecessors had. the second part of figure 2 shows another phase in which the jewish discussant invites his palestinian peer to relate to her and her people's feelings, with which he appears to empathize. both cases of perspective taking show awareness of the difference and distance between the historical agents and the learners (as evinced by the use of the third person "them"), and of the action of perspective taking ("that's how i imagine myself if i was in their place at the time"). emotion here is used as an avenue into the disciplinary practice of perspective taking. it is a cognitive tool, directed at reconstructing the historical agents' consciousness. this approach also aligns to some degree with the movement between unicity and genericity, which drives forward the process of "secondarisation" in dealing with tense social phenomena (muller mirza et al., 2014).

it is worth comparing this discussion to a conversation in the conventional-authoritative condition. figure 3 shows an arab discussant responding in a highly contentious and emotional manner to his jewish peer's reconciliatory counterfactual speculation. neither discussant demarcates himself from historical figures; both identify and "merge" with them by using the first and second person plural, in what seems like a clear expression of collective memory. the discussants do not refer to the text they studied, nor do they critically analyse the evidence it contained. they rely on religious or mythical backings rather than on historical ones, impeding further deliberation of the question. thus, it is in the context of teaching aimed at conveying a clear undisputed narrative that learning is emotionally disrupted and the past is disputed with no reliance on the discipline of history. the jewish discussant, who initially adopted a more rational and more collaborative perspective, feels forced to gradually adopt a confrontational model.

figure 3. two discussants in the conventional-authoritative condition entertain a contentious interaction.
j: that's what i think should have happened; we could get along, you know what, even two separate states, one state for two people and joint leadership and everything. no need to deport, you know…at most we could have asked for some more territory so it would be enough for two states.
a: but you didn't! you deported and murdered
j: and i'm saying, in my view there shouldn't have been deportation
a: and you killed us and murdered us and stole and all this
j: i didn't do it and i think
a: you didn't do it but they did
j: i think it was a mistake and it shouldn't have been done in no way…there were mistakes on both sides in the same way we fought over our state earlier, and you fought for your state. we both wanted something, basically the same. each in his direction.
a: ok. but you said you needed a state for the jews, but why in palestine?
j: because i said it, this place is sacred for me too, it is, like my granddad's granddad's granddad he had here, they found graves here, it's a place my ancestors lived in same as your ancestors…this place is important to us, also sacred for us too…i'm not a religious person and wouldn't go by the edicts of judaism in most cases but in the same way a palestinian thinks this place is sacred for the palestinian people, it's a sacred place for the jewish people.

let us now exemplify another discussion in the critical-disciplinary condition. figure 4 shows an arab discussant who demonstrates his feeling of loyalty to his community as mediated through the family. in spite of his declared impartiality ("you know, i don't distinguish between the texts") he points quite clearly to his being an arab as the reason for his preference for the arab historian's excerpt. this preference is accompanied by what seems like a confirmation bias, the tendency to view evidence consistent with prior opinions as more reliable. the jewish participant, on the other hand, shows a far lower preference for in-group members' sources and grounds his criticism of sources in the practices he studied in the preparatory session, sourcing and contextualization. we can see here how national identity can bias historical practices such as evidence evaluation. however, the critical inquiry approach encourages participants to reflect on their evaluation of evidence and expose their bias. it also seems that, at least for a member of the dominant ethnic group, the critical inquiry approach promotes a more balanced and impartial disciplinary practice.

figure 4. two discussants in the critical-inquiry condition demonstrating identity motivated and disciplinary practice.
a: our opinion, i, because i'm an arab, you know, i don't distinguish between the texts, that of the jew or the arab, but because my parents lived through it, they were in the time of 1948, i give more credibility to the text of written by the arab. i read there many things i already knew.
j: your parents told you similar things?
a: yeah…but what i read in the jewish author's text was new, so that's why…
j: ok. personally, i too really believe the arab author more.
a: what?
j: i too really believe the arab author more because the israeli author writes… from a role of propaganda kind of, and from within the ministry of foreign affairs and attempts to explain israel to the world, so it's in his interest to be in favor of the israeli side.

2. discussion

this paper initiates a reflection on the effects of emotions on oral argumentation on highly charged historical topics. the three pedagogical approaches exemplified in the protocols modelled different ways in which students handled emotions. the conventional authoritative single-narrative approach instils pride in one's nation and appears to delegitimize doubt and perspective taking. the empathetic dual-narrative approach facilitates mutual affirmation and increases the use of historical perspective taking, though not the use of critical thinking. the critical inquiry approach draws learners to reflect on and expose the relation of their identity and emotions to disciplinary practices, and to some degree helps overcome it. our claims cannot count as conclusions but as working hypotheses for further research. moreover, this paper outlined the implications of various theoretical and empirical perspectives on the role of emotions in historical reasoning, fleshing them out with actual discussion excerpts. we set forth the working hypotheses that these implications suggest.

firstly, history teaching that legitimizes the complex emotions arising from encounter with outgroup perspectives by promoting strategic empathy and reflection on emotion (mccully, 2006; zembylas, 2013) appears to promote productive deliberative discussions. this is perhaps because it affords more chances for mutual gestures that help maintain dialogue, or because discomforting emotions help participants take the perspective of historical agents in troubled times (zembylas & mcglynn, 2012). engagement with emotion according to this approach seems to foster a nuanced and conscious use of perspective taking, though not necessarily better handling of evidence. secondly, a direct attempt to expose the relation of identity-related emotions to historical practices helps develop learners' internally persuasive dialogue and reflection about evidence (barton & mccully, 2010). this may promote critical thinking practices such as evidence evaluation. however, it does not ensure the curbing of emotionally driven biases, since evidence is used (perhaps with more restraint and awareness) in relation to identity needs and emotions (goldberg, 2013; gottlieb et al., 2005). both these approaches promote a productive merging of emotion and cognition or reasoning (mingers, 1991; radford, 2015). by contrast, the conventional teaching approach neither challenges nor acknowledges the role of collective emotion in learning (barton, 2009). however, it appears to enhance or unleash its (negative) effect, both on learning and on deliberative discussion (bless & fiedler, 2006; hilton & liu, 2008). this may hint that relating to emotions holds a promise for better processing of information or deliberation of a historical question.

in summary, the examples presented suggest that emotion and identity do not necessarily constitute obstacles to deliberative discussions or disciplinary practice by themselves, in line with baker et al.'s (2013) ideas. however, students in each approach engaged their emotions and learning differently. it appears that instructional practices moderate and influence the relations of emotion and reasoning. facilitating empathetic listening, nurturing national glorification, or attempting to hold emotion at bay may each lead to a different way of arguing about the past. we are currently undertaking an experimental study that involves the systematic comparison of discussions as well as of learning outcomes in the three approaches. the analyses we undertake will hopefully confirm and sharpen our working hypotheses.

meanwhile, we believe that it is possible to rely on the above examples to draw some tentative conclusions on the teaching of history. history was introduced as a core discipline in schools in the 19th century in order to bring students to believe that they belong to a nation and to foster national pride (ferro, 2004). we alluded to the experts' emotions-free list of core teaching practices and skills that reflect a substantial shift to the adoption of the norms of history as a critical rational discipline (fogo, 2014).
this shift does not pay attention, though, to the emotions history nurtures, arouses and is motivated by. the cognitive practices of history are nested within, colored by and interact with emotion (maturana & varela, 1987). as collaborative learning and disciplinary oriented practices take the lead in history teaching, the role of emotions becomes ever more important. the protocols we presented suggest that while simply fostering the glorification of the nation impedes historical deliberation, concurring teaching approaches that bring to the fore alternative narratives and engage with strong emotions may actually help handle such influences. therefore, educators should not dodge emotions in their teaching but, on the contrary, should capitalize on them to boost historical reasoning.

the first place for bringing forward these strong emotions is of course the discussion. this setting risks bringing to the surface strong emotions (like anger), leading to breakdowns. however, the preparatory practices of engagement with both in-group and out-group perspectives, acknowledging and evaluating emotional overtones, seem to tone down contentious reactions when engaging in argumentation across groups. with appropriate framing and instruction, students develop their capacity to handle emotions in discussions (muller mirza et al., 2014). they speak with each other and deliberate together on the historical roots of their conflict, even if their discourse is sometimes biased. we suggest, then, that the core practices of history teaching should also include addressing common opinions held by the different stakeholders on the issue, relating to the emotional states of the different historical actors and to the emotional reactions of the learners (bar-on & adwan, 2006), and facilitating small group discussions across groups by helping the handling of emotions.
these additional practices may promote the role of history in helping learners become citizens engaged in productive deliberation of their contentious past and their shared present.

furthermore, we believe that research on historical understanding and reasoning should change both its prescriptive and its analytic stance to the role of emotions. first, if we wish to address the complexity of goals and needs history education addresses in reality (and not in an idealized rational expert model), it would serve researchers well to give emotion a more central role and treat it without disdain (barton, 2009). second, instead of relating to emotions and loyalties anecdotally, as indications of diverse identities, biased cognition or novice practice, there should be much to gain in exploring them proactively. methodologically this means tracking and documenting emotion, whether through self-report or other implicit and observational methods now accessible through emotions research (see baker et al., 2013). analytically, we should start looking at emotions as promoters, factors and even as desirable outcomes of learning in history (goldberg, 2013). this does not mean, of course, that we should ignore the inhibiting role of emotions in some cases, or that we should eliminate from class activity a detached critical-disciplinary approach. our message is that emotions are precious resources for history education, but that teachers should learn when and how to capitalize on them.

references

adwan, s., & bar-on, d. (2004). shared history project: a prime example of peace building under fire. international journal of politics, culture, and society, 17(3), 513-521.
baker, m. j., andriessen, j. e., & järvelä, s. (eds.). (2013). affective learning together. london: routledge.
bar-on, d., & adwan, s. (2006). the psychology of better dialogue between two separate but interdependent narratives.
israeli and palestinian narratives of conflict: history’s double helix, 205-224.
bar-tal, d. (2007). socio-psychological foundations of intractable conflicts. american behavioral scientist, 50(11), 1430-1453.
barton, k. c., & mccully, a. w. (2010). "you can form your own point of view": internally persuasive discourse in northern ireland students' encounters with history. teachers college record, 112(1), 142-181.
barton, k. c. (2009). the denial of desire: how to make history education meaningless. in l. symcox, & a. wilschut (eds.), national history standards: the problem of the canon and the future of teaching history (pp. 265-282). charlotte, nc: information age.
bless, h., & fiedler, k. (2006). mood and the regulation of information processing and behavior. in j. p. forgas (ed.), affect in social thinking and behavior (pp. 65-84). new york: psychology press.
britzman, d. p. (2000). if the story cannot end: deferred action, ambivalence, and difficult knowledge. in r. simon, s. rosenberg & c. eppert (eds.), between hope and despair: pedagogy and the remembrance of historical trauma (pp. 27-55). lanham, md: rowman & littlefield.
davis, o. l., yeager, e. a., & foster, s. j. (2001). historical empathy and perspective taking in the social studies. lanham, md: rowman & littlefield.
domke, e., urbach, c., & goldberg, t. (2009). building a state in the middle east [bonim medina bamizrach hatichon]. jerusalem, israel: zalman shazar center.
eid, n. (2010). the inner conflict: how palestinian students in israel react to the dual narrative approach concerning the events of 1948. journal of educational media, memory, and society, 2(1), 55-77.
evans, r. w., avery, p. g., & pederson, p. v. (1999). taboo topics: cultural restraint on teaching social issues. the social studies, 90(5), 218-224.
ferro, m. (2004). the use and abuse of history: or how the past is taught to children. london: routledge.
fogo, b. (2014).
core practices for teaching history: the results of a delphi panel survey. theory & research in social education, 42(2), 151-196.
fox, e. (2008). emotion science: cognitive and neuroscientific approaches to understanding human emotions. hampshire, uk: palgrave macmillan.
goldberg, t., & gerwin, d. (2013). israeli history curriculum and the conservative liberal pendulum. international journal of historical teaching, learning and research, 11(2), 111-124.
goldberg, t. (2013). “it's in my veins”: identity and disciplinary practice in students' discussions of a historical issue. theory & research in social education, 41(1), 33-64.
goldberg, t., & ron, y. (2014). ‘look, each side says something different’: the impact of competing history teaching approaches on jewish and arab adolescents’ discussions of the jewish–arab conflict. journal of peace education, 11(1), 1-29.
goldberg, t., porat, d., & schwarz, b. b. (2006). “here started the rift we see today”: student and textbook narratives between official and counter memory. narrative inquiry, 16(2), 319–347.
goldberg, t., schwarz, b. b., & porat, d. (2008). living and dormant collective memories as contexts of history learning. learning and instruction, 18(3), 223-237. doi:http://dx.doi.org/10.1016/j.learninstruc.2007.04.005
gottlieb, e., wineburg, s., & zakai, s. (2005). when history matters: epistemic switching in the interpretation of culturally charged texts. eleventh biennial meeting of the european association of learning and instruction,
haskell, t. l. (1990). objectivity is not neutrality: rhetoric vs. practice in peter novick's that noble dream. history and theory, 29(2), 129-157.
helmsing, m. (2014). virtuous subjects: a critical analysis of the affective substance of social studies education. theory and research in social education, 42(1), 127-140.
hilton, d. j., & liu, j. h. (2008). culture and intergroup relations: the role of social representations of history. in r. m.
sorrentino, & y. susumu (eds.), handbook of motivation and cognition across cultures (pp. 343-368). london, uk: academic press.
israeli ministry of education (2015). history curriculum for the jewish secular public schools. retrieved march 2016 from http://cms.education.gov.il/educationcms/units/mazkirut_pedagogit/history/tochnitlimudimvt/taltashah.htm
lee, p., & ashby, r. (2000). progression in historical understanding among students ages 7-14. in p. n. stearns, p. seixas & s. wineburg (eds.), knowing, teaching, and learning history: national and international perspectives (pp. 199-222). new york, ny: new york university press.
levstik, l. s. (2000). articulating the silences: teachers' and adolescents' conceptions of historical significance. in p. n. stearns, p. seixas & s. wineburg (eds.), knowing, teaching, and learning history: national and international perspectives (pp. 284-305). new york, ny: new york university press.
maier, r. (2011). how we lived together in georgia in the 20th century: external evaluation euroclio/matra project (2008-2011). retrieved from http://www.euroclio.eu/new/index.php/component/docman/doc_download/1080-how-we-livedtogether-in-georgia-in-the-20th-century-external-review
maturana, h. r., & varela, f. j. (1987). the tree of knowledge: the biological roots of human understanding. boston: new science library/shambhala publications.
mccully, a. (2006). practitioner perceptions of their role in facilitating the handling of controversial issues in contested societies: a northern irish experience. educational review, 58(1), 51-65.
mingers, j. (1991). the cognitive theories of maturana and varela. systems practice, 4(4), 319-338.
miron, a. m., branscombe, n. r., & biernat, m. (2010). motivated shifting of justice standards. personality & social psychology bulletin, 36(6), 768-779. doi:10.1177/0146167210370031
muller mirza, n., grossen, m., de diesbach-dolder, s., & nicollin, l. (2014).
transforming personal experience and emotions through secondarisation in education for cultural diversity: an interplay between unicity and genericity. learning, culture and social interaction, 3(4), 263-273.
national center for history in the schools. (2010). historical issues. retrieved from http://www.nchs.ucla.edu/history-standards/historical-thinking-standards/5.-historical-issues
radford, l. (2015). of love, frustration, and mathematics: a cultural-historical approach to emotions in mathematics teaching and learning. in b. pepin, & b. roesken-winter (eds.), from beliefs to dynamic affect systems in mathematics education (pp. 25-49). new york: springer.
reisman, a. (2012). reading like a historian: a document-based history curriculum intervention in urban high schools. cognition and instruction, 30(1), 86-112.
saye, j., & social studies inquiry research collaborative (ssirc). (2013). authentic pedagogy: its presence in social studies classrooms and relationship to student performance on state-mandated tests. theory & research in social education, 41(1), 89-132.
schwarz, b. b., & asterhan, c. s. (2010). argumentation and reasoning. in k. littleton, c. wood & j. kleine staarman (eds.), international handbook of psychology in education (pp. 137-176). london: emerald group publishing.
schwarz, b. b., & goldberg, t. (2013). “look who’s talking”: identity and emotions as resources to historical peer reasoning. in m. j. baker, j. e. andriessen & s. järvelä (eds.), affective learning together (pp. 272-292). london: routledge.
sorek, t. (2011). the quest for victory: collective memory and national identification among the arab-palestinian citizens of israel. sociology, 45(3), 464-479.
thompson, j. (2014). engaging girls’ sociohistorical identities in science. journal of the learning sciences, 23(3), 392-446.
varelas, m. (2012). identity construction and science education research: learning, teaching, and being in multiple contexts.
springer science & business media.
wineburg, s. s. (2001). historical thinking and other unnatural acts: charting the future of teaching the past. temple university press.
wineburg, s. s., mosborg, s., & porat, d. (2001). what can forrest gump tell us about students' historical understanding? social education, 65(1), 55-58.
wohl, m. j., branscombe, n. r., & klar, y. (2006). collective guilt: emotional reactions when one's group has done wrong or been wronged. european review of social psychology, 17(1), 1-37. doi:10.1080/10463280600574815
yaron, m. (2009). history superintendent's requested corrections to the textbook: building a state in the middle east. [personal correspondence to author]
zembylas, m. (2004). emotion metaphors and emotional labor in science teaching. science education, 88(3), 301-324.
zembylas, m. (2013). critical pedagogy and emotion: working through ‘troubled knowledge’ in posttraumatic contexts. critical studies in education, 54(2), 176-189.
zembylas, m., & kambani, f. (2012). the teaching of controversial issues during elementary-level history instruction: greek-cypriot teachers' perceptions and emotions. theory & research in social education, 40(2), 107-133.
zembylas, m., & mcglynn, c. (2012). discomforting pedagogies: emotional tensions, ethical dilemmas and transformative possibilities. british educational research journal, 38(1), 41-59.

frontline learning research vol. 9 no. 3 (2021) 1-12 issn 2295-3159
corresponding author: joni lämsä, p.o. box 35, fi-40014, university of jyväskylä, finland, joni.lamsa@jyu.fi doi: https://doi.org/10.14786/flr.v9i3.645

staying at the front line of literature: how can topic modelling help researchers follow recent studies?
joni lämsä¹, catalina espinoza², ari tuhkala³, & raija hämäläinen¹
¹ department of education, university of jyväskylä, finland
² center for advanced research in education, university of chile, chile
³ finnish institute for educational research, university of jyväskylä, finland

article received 23 june 2020 / article revised 20 december / accepted 26 march 2021 / available online 14 april

abstract

staying at the front line in learning research is challenging because many fields are rapidly developing. one such field is research on the temporal aspects of computer-supported collaborative learning (cscl). to obtain an overview of these fields, systematic literature reviews can capture patterns of existing research. however, conducting systematic literature reviews is time-consuming and does not reveal future developments in the field. this study proposes a machine learning method based on topic modelling that takes articles from a systematic literature review on the temporal aspects of cscl (49 original articles published before 2019) as a starting point to describe the most recent development in this field (52 new articles published between 2019 and 2020). we aimed to explore how to identify new relevant articles in this field and relate the original articles to the new articles. first, we trained the topic model with the results, discussion, and conclusion sections of the original articles, enabling us to correctly identify 74% (n = 17) of new and relevant articles. second, clusterisation of the original and new articles indicated that the field has advanced in its new and relevant articles because the topics concerning the regulation of learning and collaborative knowledge construction related 26 original articles to 10 new articles. new irrelevant studies typically emerged in clusters that did not include any specific topic with a high topic occurrence.
our method may provide researchers with resources to follow the patterns in their fields instead of conducting repetitive systematic literature reviews.

keywords: automatic content analysis; computer-supported collaborative learning; literature review; temporal analysis; topic model

lämsä et al 2 | f l r

1. introduction

research in learning sciences has become more interdisciplinary because increasingly complex datasets and methods may require the expertise of computer scientists and signal processors. this interdisciplinary collaboration opens up the possibility of new publication forums in the learning sciences. however, this could also make thorough systematic or thematic literature reviews (see gruber et al., 2020) even more arduous. thus, it would be useful if the vast amount of work done by scholars when conducting systematic literature reviews could be exploited when monitoring how a specific line of research would proceed. if relevant future studies can be automatically identified and related to previous research, this would decrease the need to perform recurring systematic literature reviews on similar topics, thus affording researchers more working hours to advance in their fields.

to address these aspirations, we present a machine learning–based method that takes articles from a manual systematic literature review as a starting point to describe the recent developments in the field. we illustrate the potential of our innovative method in the context of research focusing on the temporal analysis of computer-supported collaborative learning (cscl). this field of research forms a particularly promising basis for studying its progress because the studies focusing on the temporal aspects of cscl are increasingly being published and involve interdisciplinary collaboration (e.g., hadwin, 2021; lämsä et al., 2021).
in this study, we define the temporal analysis of cscl as analysing the characteristics of events or the interrelations between these events over time. the events may relate to learner interaction, thoughts and ideas developed during the interaction and the use of technological resources to mediate the interaction (see lämsä et al., 2021). a temporal analysis of cscl may benefit both practitioners and researchers by revealing how (not only what) learning occurs in cscl settings (lämsä, 2020), particularly now when covid-19 highlights the need for effective cscl more than ever (järvelä & rosé, 2020). when we manually reviewed the literature focusing on the temporal aspects of cscl (see section 2 and lämsä et al., 2021), we found that the interdisciplinary collaboration in this field has caused challenges regarding the commensurability and comparability of the studies. particularly, the studies seemed to be fragmented in terms of their theoretical frameworks (cf. hew et al., 2019), methodologies, and results and implications. this finding implies that both practitioners and researchers may struggle with staying at the front line concerning the big picture of cscl and its research because of this fragmentation. practitioners may benefit from our method if it can filter applicable research to support them in the design and implementation of research-based cscl innovations. similarly, our method can benefit researchers because it can illustrate whether and how the recent research has contributed to prior studies. we investigate the added value of our method for practitioners and researchers by addressing the following research questions: rq1: how and to what extent can a machine learning–based method be used to identify new relevant articles in the field of manual systematic literature review? rq2: how and to what extent can the machine learning–based method be used to relate new and original articles to each other? 2. 
methodology

when manually reviewing the literature on the temporal aspects of cscl in february 2019 (see lämsä et al., 2021), we carefully selected the search terms concerning temporality, collaborative learning, and computer-supported learning. we used the education resources information center (eric), scopus, and web of science databases and identified 436 articles, which we manually screened and assessed for eligibility. in this study, we included 49 peer-reviewed journal articles that focused on the temporal analysis of cscl for further analysis (original articles). to find new articles, we repeated the literature searches in february 2020 with the same search terms and databases as for the original articles. the searches found 88 articles that had been published between february 2019 and 2020. from these 88 articles, we excluded 36 articles, of which 31 were duplicates, three had no full text available, one was a conference proceeding article, and one was already included in the set of the original articles. in the following analyses, we refer to these included 52 peer-reviewed journal articles as a set of new articles.

the utilised machine learning–based method was grounded on a natural language processing technique known as topic modelling, which is based on statistical algorithms that find topics in a collection of documents (boyd-graber et al., 2017). these topics are ranked lists of words, where each word has a probability of belonging to a topic (see table 1), or more formally, topics are probability distributions over vocabularies. in the following sections, we describe how the original articles were exploited to build the topic models that, in turn, were used to identify the new relevant articles (rq1) and relate them to the original articles (rq2). figure 1 summarises our procedure.
figure 1: procedure for describing original and new articles to address the research questions (rqs)

2.1 extracting and preprocessing text

first, we extracted raw text from the original and new articles and removed tables, figures, formulas, bullet points, footnotes, and page numbers. second, we separated the different sections of the articles under the following headings: introduction, theoretical framework, methodology, results, discussion, and conclusion. however, because not all the articles had all of these sections (e.g., an article may have a combined results and discussion section), we decided to combine the sections into three wider sections: (1) introduction and theoretical framework, (2) methodology, and (3) results, discussion, and conclusion. then, we carried out text preprocessing, including common text cleaning, such as transforming text to lowercase and removing symbols and infrequent words. finally, we utilised the natural language toolkit (bird et al., 2009) to perform word stemming (reducing words to their root form) and common english stop word removal (e.g., the, at, is).

2.2 training topic models

we used latent dirichlet allocation (lda) (blei et al., 2003) and the gensim library (rehurek & sohjka, 2010) to train topic models for each of sections 1–3 of the original articles. the output of training topic models includes both a list of topics and the trained topic model itself. the trained topic model can process new text and measure the presence of the listed topics. we performed a sensitivity analysis based on topic coherence values (provided by the gensim library) to find an appropriate number of topics for each section. as an outcome, we had trained three topic models, one for each section, that all included 17 topics.

2.3 labelling topics from topic models

the trained topic models contained a list of topics found in each section.
we labelled the topics by analysing the most representative words and utilising expert knowledge from the manual systematic review of the literature. if possible, we labelled the topics based on the theoretical framework to which the most representative words refer. we demonstrate this idea in table 1 using the topic model for section 3 as an example, presenting the labels and the 10 most representative words. for example, topic 1 (temporal aspects of cscl) is a generic topic that illustrates a stage in the temporal analysis procedure. namely, researchers code messages of groups of students, after which they analyse the typical sequences of messages. this kind of sequential analysis reveals what kind of messages follow each other in a short temporal context whose duration may be a few messages (the words in italics are among the 10 most representative words of topic 1).

2.4 obtaining topic occurrence in original articles

in lda, articles are represented as lists of topic probabilities; the goal is to find the topic probabilities of a document that are best suited to rebuild the document by randomly selecting words. for example, if an article has a higher topic probability for topic 16 compared with other topics (see table 1), most of the words in the article can be selected from the top of topic 16. we refer to topic probabilities in an article as a topic occurrence to distinguish them from words’ probabilities inside a topic. when we used topic models for sections 1–3, we obtained 51 topic occurrences for each original article (17 topic occurrences for each topic model).

2.5 obtaining topic occurrence in new articles

the process used for the new articles was very similar to the one applied to the original articles (figure 1). the only difference was that we directly applied the trained topic models for sections 1–3 to obtain the topic occurrences of the new articles.
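the notion of topic occurrence in section 2.4 can be written compactly in standard lda notation. the symbols below are our own shorthand and do not appear in the article: \theta_d is the topic occurrence of article d, \phi_k is the word distribution of topic k, and K = 17 is the number of topics per topic model.

```latex
% topic occurrence of article d: a probability distribution over K = 17 topics
\theta_d = (\theta_{d,1}, \dots, \theta_{d,K}), \qquad
\theta_{d,k} \ge 0, \qquad
\sum_{k=1}^{K} \theta_{d,k} = 1

% probability of drawing word w when "rebuilding" article d from its topics
p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\, \phi_{k,w}

% three topic models (sections 1-3) with 17 topics each
3 \times 17 = 51 \ \text{topic occurrences per article}
```

read this way, an article dominated by topic 16 has a large \theta_{d,16}, so most of its words are drawn from the high-probability words of \phi_{16}.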
we illustrate the topic occurrences of original, new relevant, and new irrelevant articles using the topic model for section 3 in figure 2. for each article, some topics have a higher probability than the rest (e.g., topic 16 is more relevant to an original article than to a new irrelevant article; see (a) and (c) in figure 2). therefore, we expect to find semantic similarity between topic occurrences that have shorter distances.

table 1 five topics and the assigned topic labels, including the 10 most representative words from the topic model for section 3 (results, discussion, and conclusion).

topic | label | 10 most representative words
topic 1 | temporal aspects of computer-supported collaborative learning | number, student, signific, code, group, analysi, show, sequenc, knowledg, messag
topic 7 | regulation of learning and learning performance | model, group, perform, focus, student, ssrl¹, collabor, challeng, differ, individu
topic 8 | regulation of learning | regul, learn, collabor, task, social, share, student, group, result, discuss
topic 11 | socially shared metacognitive regulation (ssmr) | ssmr, process, phase, thread, studi, research, differ, inquiri, note, data
topic 16 | collaborative knowledge construction | discuss, group, student, knowledg, behaviour, construct, learn, result, process, pattern

¹ ssrl: socially shared regulation of learning

figure 2: the topic occurrence of (a) an original article, (b) a new relevant article, and (c) a new irrelevant article obtained using the topic model for section 3. the distance between (a) and (b) was 0.33, between (a) and (c) was 0.65, and between (b) and (c) was 0.49.

to answer rq1, the first and second authors screened and labelled the new 52 articles manually as relevant or irrelevant regarding the analysis of the temporal aspects of cscl. in the first phase, we screened the journal and title of the articles and labelled the studies that did not have a learning or
in the second phase, we also screened the abstract of the articles and labelled the studies that did not focus on cscl and analyse its temporal aspects as irrelevant (n = 2, e.g., a study that focused merely on learning performance). we solved the disagreements between the first and second authors in the common meetings among all the authors. altogether, 23 new articles focused on the analysis of the temporal aspects of cscl (relevant), while 29 articles did not (irrelevant). next, for each topic model, we measured the distance between the corresponding topic occurrences of the new articles and original articles. the shorter the distance between two articles, the more similar the topic occurrences (figure 2). for each new article, we kept the distance to the closest original article (i.e., the most similar because a new relevant article might not be related to every article in the manual systematic literature review). finally, we compared the distances between the relevant and irrelevant articles. we selected the most suitable topic model so that the topic occurrences of the new relevant articles were similar to the ones from the original articles. to answer rq2, we used the articles’ topic occurrences from the previously selected topic model (rq1). we measured the similarity between topic occurrences using the euclidean distance, and we applied hierarchical clustering to find groups of similar articles. we performed the clustering in three levels: the root, two subgroups, and the leaves. the root of the clustering contains all the articles: 52 new articles and 49 original articles. the root was then divided into two subgroups, denoting the greatest distance between the articles belonging to different subgroups. the leaves are groups of articles of varying sizes. we interpreted the clusters by examining the topic occurrences (figure 2) and previously assigned topic labels (table 1). 3. 
results

3.1 the topic model trained with the results, discussion, and conclusion sections identified new relevant articles most accurately.

we identified new relevant articles relating to the temporal aspects of cscl by measuring the distance between a new article and the closest original article. the results showed that for the three topic models, the relevant new articles were closer to the original articles than the irrelevant articles (figure 3). particularly, the topic model for section 3, which we trained with results, discussion, and conclusion sections, gave the best results because the distance between the new relevant articles and the closest original article overlapped the least with the distance between new irrelevant articles and the closest original article [figure 3 (c)]. when we used the topic model for section 3 and the distance of 0.27 as a threshold, 71% of the new articles, which were closer than the threshold, were relevant. those relevant articles represent 74% of the total relevant articles, which minimised the number of irrelevant articles. when we used topic models for sections 1 and 2, the distances between the original articles and new relevant articles overlapped more with new irrelevant articles [figure 3 (a) and (b)]. table 2 summarises our results when a distance of 0.27 is used as the threshold.

figure 3: boxplots of the distances between relevant and irrelevant new articles and the closest original article separately for (a) topic model for section 1, (b) topic model for section 2, and (c) topic model for section 3.
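the thresholding step described above can be illustrated with a minimal sketch. it assumes that each article is already represented by its topic-occurrence vector and that distances are euclidean; the text names euclidean distance only for the clustering step, so its use here is an assumption, as is the strict "closer than" comparison. the function names and the three-topic toy vectors are hypothetical, while the counts passed to `precision_recall` are the section 3 values reported in table 2.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two topic-occurrence vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_to_closest(new_vec, original_vecs):
    """Distance from one new article to its most similar original article."""
    return min(euclidean(new_vec, orig) for orig in original_vecs)

def flag_relevant(new_vecs, original_vecs, threshold=0.27):
    """Flag a new article as relevant when it lies closer than the threshold."""
    return [distance_to_closest(v, original_vecs) < threshold for v in new_vecs]

def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical 3-topic vectors (the article uses 17 topics per topic model).
originals = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
new_articles = [[0.7, 0.2, 0.1], [0.34, 0.33, 0.33]]
flags = flag_relevant(new_articles, originals)

# Section 3 counts from table 2: 17 true positives, 7 false positives, 6 false negatives.
precision, recall = precision_recall(tp=17, fp=7, fn=6)
print(flags, round(precision, 2), round(recall, 2))
```

in the toy data, the second new article spreads its probability almost evenly over the topics and therefore stays far from both originals, which mirrors the scattered topic occurrence reported for irrelevant articles.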
table 2: the numbers of relevant and irrelevant articles identified and missed when using three topic models

| | topic model (section 1): introduction and theoretical framework | topic model (section 2): methodology | topic model (section 3): results, discussion, and conclusion |
| --- | --- | --- | --- |
| relevant identified (true positives) | 10 | 17 | 17 |
| relevant missed (false negatives) | 13 | 6 | 6 |
| irrelevant identified as relevant (false positives) | 3 | 14 | 7 |
| irrelevant identified as irrelevant (true negatives) | 26 | 15 | 22 |
| precision (proportion of the true positives to the sum of true and false positives) | 0.77 | 0.55 | 0.71 |
| recall (proportion of the true positives to the sum of the true positives and false negatives) | 0.43 | 0.74 | 0.74 |

3.2 a few topics with high topic occurrence relate new relevant to original articles

figure 4 shows the outcome of the hierarchical clustering. when interpreting figure 4, based on the cscl theoretical frameworks, a few topics concerning collaborative knowledge construction and regulation of learning relate new relevant articles to original articles. topic 16 (see table 1) relates five new relevant articles to 17 original articles (the leaves with double borders), and these articles mostly belong to a smaller subgroup. topics 7, 8, and 11 (see table 1) relate five new relevant articles to nine original articles (the leaves with bold borders), and these articles belong to a larger subgroup. most of the new irrelevant articles (n = 28) were clustered into three different leaves (figure 4). from this set, 17 articles appeared in the leaves with different topics. moreover, 11 articles appeared in the leaf that included only one new relevant article and one original article. most of the original articles (n = 31) had a topic with a value higher than 0.45. in contrast, the clusters formed by various topics contained articles in which the topic occurrence of the most important topic was less than 0.2, meaning that there were no predominant topics.
because topic occurrence is a probability distribution (must sum up to one), topic occurrence is more scattered if no particular topic is more significant [see figure 2 (c)]; this feature clusters together most of the irrelevant articles, but it also mixes irrelevant articles with relevant articles that have several important topics. in our case, 12 new relevant articles emerged in the leaves with different topics.

figure 4: new and original articles’ clustered and associated topics. each leaf has a grey box with the number of new and original articles. the text in the leaves corresponds to the number of the main topics and their labels.

4. discussion and conclusion

when considering some of the most-cited journals in the educational research field (review of educational research and educational research review), systematic literature reviews may ‘shape the future of research and practice’ (murphy et al., 2017, p. 2; alexander, 2020). we showed how a machine learning–based method can be used to identify new relevant articles in the field of the manual systematic literature review (rq1) and how it can relate new and original articles to each other (rq2). this novel method may help to follow the evolution of the ‘big picture’ of the research fields based on multidisciplinary collaboration, such as studies focusing on the temporal analysis of cscl. because these studies are published in various forums and involve different theoretical frameworks, methodological approaches, and results and implications, our methodological innovation may reveal how literature reviews can ‘shape the future of research and practice’. even though our method has potential, there are several limitations and critical issues to consider because the current study was an initial attempt to investigate the potential of topic modelling in the context of staying in the front line of literature.
first, instead of an ‘automatic’ method, our method could be called ‘semiautomatic’ (cf. tuhkala et al., 2018). for instance, we extracted the texts of different article sections manually. even though this extraction process could be automatised, we decided to focus on automating more complex phases of our procedure (see figure 1). in the future, we aim to automatise a pipeline in which all the articles that arise from certain search terms can be processed, filtered according to their relevance, and related to original articles. second, because the number of original articles was relatively small, we could apply a heuristic to automatically identify the relevant and irrelevant new articles (rq1; see table 2). our heuristic was based on identifying new relevant articles without including too many irrelevant ones (high precision value) or filtering relevant ones (high recall value; see table 2). in the future, more complex methods can be tested, particularly if there are more articles. third, we trained the topic model only with original articles (figure 1), so all the topics can relate to the temporal aspects of cscl and the analysis of these aspects. thus, there were no topics that could have properly described new irrelevant articles. we will consider training the topic models by using both original and new articles and using the articles of related systematic literature reviews to capture a broader picture of the field. despite these limitations, our innovative method may open up new avenues to follow patterns in the different research fields based on the content of articles, instead of, for example, mere bibliographic information (chen et al., 2020). a recently published editorial of educational research review (gruber et al., 2020, p. 1) highlighted that systematic literature reviews should ‘extend beyond reporting or summarising what has been done in a particular field’. 
here, we see our method as more complementary than contradictory to researchers’ manual work when using the review approach to address their research questions. namely, topic modelling (figure 1) is an unsupervised method, so it may reveal patterns (or topics; see an example in table 1) from the existing literature to which researchers may not pay attention. moreover, our method may assist researchers in some time-consuming tasks when they conduct systematic literature reviews. taking the preferred reporting items for systematic reviews and meta-analyses (prisma) statement (moher et al., 2009) as an example, machine learning–based methods may help researchers in identifying the relevant articles (rq1) and in screening and assessing the eligibility of the articles based on their relatedness to the research problems of interest (rq2). this kind of assistance would allow for investing more resources in a critical review of the included articles and, thus, scientifically valuable contributions. at the same time, it is as crucial in machine learning–based methods as it is in manual systematic literature reviews that researchers report their decisions transparently throughout the process (cf. our procedure in figure 1 and sections 2.1–2.5). in our research context, the topic model for section 3, which we trained with the results, discussion, and conclusion sections, had the best performance in identifying new relevant articles (rq1); this may be related to the theoretical fragmentation of studies in the educational technology field (hew et al., 2019) because original papers had been published in the journals of both computer sciences and learning sciences. thus, the topic model for section 1 could not properly separate new relevant and irrelevant articles.
moreover, the methods used to analyse the temporal aspects of cscl (e.g., sequential analysis) have been used in many disciplines, so the predictive power of the topic model for section 2 might be affected by this issue. in addition to identifying new relevant articles with moderate accuracy by using the topic model for section 3, we could relate new and old articles based on the few topics present in this topic model (rq2). for example, we found leaves of eight articles (four new relevant and four original articles) and seven articles (five new relevant and two original articles) that seemed to concern the temporal aspects of cscl in the context of the regulation of learning and collaborative knowledge construction, respectively (figure 4). these findings may inform both practitioners and researchers by showing widely used theoretical frameworks and providing a ‘state of the research’ (murphy et al., 2017, p. 5; see figure 4). in the future, when the number of new articles increases, clearer clusters and leaves of original and new articles may emerge. researchers can follow the fluctuation of the rising research topics by monitoring the size of the leaves (figure 4). the increasing number of articles would allow for more focused machine learning–based literature reviews so that the procedure for describing new articles (figure 1) would focus on, for example, a certain theoretical framework through which the temporal aspects of cscl can be analysed. our method could also be applied in completely different research fields if there is an existing systematic literature review from that field, and it is possible to train the topic models based on the included articles in the review (section 2.1). 
as research fields differ from each other and similar fields may have fundamentally different research traditions, further studies could, for example, investigate how to obtain a topic model (section 2.2) and its essential topics (section 2.3) whose topic occurrences (sections 2.4–2.5) could separate studies with fundamentally different epistemological stances. obtaining topic models and interpreting their essential topics, which researchers can do to address their research aims, require thorough expertise on the research field, in addition to the knowledge and skills to apply machine learning–based methods.

key points

many research fields on learning sciences are developing rapidly, which makes conducting systematic literature reviews a time-consuming task.
we propose an innovative method that uses an existing systematic literature review to describe the recent developments in the field being reviewed.
we illustrate the potential of our method using the literature on the temporal analysis of computer-supported collaborative learning.
our machine learning–based method identified new relevant articles and related them to the previous literature with moderate accuracy.
our method may decrease the need to do recurring systematic literature reviews, giving researchers more working hours to advance their fields.

acknowledgements

this research was funded by the academy of finland [grant numbers 292466 and 318095, the multidisciplinary research on learning and teaching profiles i and ii of university of jyväskylä].

references

alexander, p. a. (2020). methodological guidance paper: the art and science of quality systematic reviews. review of educational research, 90(1), 6–23. https://doi.org/10.3102/0034654319854352
bird, s., loper, e., & klein, e. (2009). natural language processing with python. o’reilly media inc.
blei, d. m., ng, a. y., & jordan, m. i. (2003). latent dirichlet allocation. journal of machine learning research, 3, 993–1022.
boyd-graber, j. l., hu, y., & mimno, d. (2017). applications of topic models (vol. 11). now publishers incorporated.
chen, x., zou, d., & xie, h. (2020). fifty years of british journal of educational technology: a topic modeling based bibliometric perspective. british journal of educational technology, 51(3), 692–708. https://doi.org/10.1111/bjet.12907
gruber, h., hämäläinen, r. h., hickey, d. t., pang, m. f., & pedaste, m. (2020). mission and scope of the journal educational research review. educational research review, 30, 100328. https://doi.org/10.1016/j.edurev.2020.100328
hadwin, a. f. (2021). commentary and future directions: what can multi-modal data reveal about temporal and adaptive processes in self-regulated learning? learning and instruction, 72, 101287. https://doi.org/10.1016/j.learninstruc.2019.101287
hew, k. f., lan, m., tang, y., jia, c., & lo, c. k. (2019). where is the ‘theory’ within the field of educational technology research? british journal of educational technology, 50(3), 956–971. https://doi.org/10.1111/bjet.12770
järvelä, s., & rosé, c. p. (2020). advocating for group interaction in the age of covid-19. international journal of computer-supported collaborative learning, 15(2), 143–147. https://doi.org/10.1007/s11412-020-09324-4
lämsä, j. (2020). developing the temporal analysis for computer-supported collaborative learning in the context of scaffolded inquiry [doctoral dissertation, university of jyväskylä]. jyu dissertations, 245. http://urn.fi/urn:isbn:978-951-39-8248-5
lämsä, j., hämäläinen, r., koskinen, p., viiri, j., & lampi, e. (2021). what do we do when we analyse the temporal aspects of computer-supported collaborative learning? a systematic literature review. educational research review, 33, 100387. https://doi.org/10.1016/j.edurev.2021.100387
moher, d., liberati, a., tetzlaff, j., altman, d. g., & the prisma group. (2009). preferred reporting items for systematic reviews and meta-analyses: the prisma statement. plos med, 6(7), 1–6. https://doi.org/10.1136/bmj.b2535
murphy, p. k., knight, s. l., & dowd, a. c. (2017). familiar paths and new directions: inaugural call for manuscripts. review of educational research, 87(1), 3–6. https://doi.org/10.3102/0034654317691764
rehurek, r., & sojka, p. (2010). software framework for topic modelling with large corpora. proceedings of the lrec 2010 workshop on new challenges for nlp frameworks (pp. 45–50). elra. https://doi.org/10.13140/2.1.2393.1847
tuhkala, a., kärkkäinen, t., & nieminen, p. (2018). semi-automatic literature mapping of participatory design studies 2006–2016. in proceedings of the 15th participatory design conference (pp. 1–5). association for computing machinery. https://doi.org/10.1145/3210604.3210621

frontline learning research vol. 4 no. 3 (2016) 92-109 issn 2295-3159

*corresponding author at: department of psychology, p.o. box 35, university of jyväskylä, fin-40014, finland, email address: laura.a.pesu@jyu.fi doi: http://dx.doi.org/10.14786/flr.v4i3.249

the development of adolescents’ self-concept of ability through grades 7-9 and the role of parental beliefs

laura pesu*, kaisa aunola, jaana viljaranta, & jari-erik nurmi

university of jyväskylä, finland

article received 7 april / revised 4 july / accepted 8 july / available online 20 july

abstract

this study examined the development of adolescents’ self-concept of ability in mathematics and literacy during secondary school, and the role that mothers’ and fathers’ beliefs concerning their child’s abilities play in this development. also examined was whether the role of mothers’ and fathers’ beliefs about their adolescent child’s ability in mathematics and literacy differs according to the adolescent’s gender and level of performance. a total of 231 adolescents and their mothers and fathers were followed up across secondary school.
the results showed, first, that adolescents’ self-concept of ability declined slightly from grade 7 to grade 9 in both mathematics and literacy. second, mothers’ and fathers’ beliefs about their adolescent child’s abilities in grade 7 predicted the child’s subsequent self-concept in grade 9, but only in mathematics. third, the role of mothers’ beliefs in their child’s self-concept of mathematics ability was found to be stronger among high-performing than low-performing adolescents.

keywords: self-concept of ability; secondary school; mother’s beliefs; father’s beliefs

pesu et al | f l r 93

1. introduction

students’ self-concept of ability in different academic domains, that is, the knowledge and perceptions individuals have of themselves in a particular subject area (bong & skaalvik, 2003; brunner, keller, hornung, reichert, & martin, 2009), influences their academic performance and the academic career-related choices they make (eccles et al., 1983; marsh, trautwein, lüdtke, köller, & baumert, 2005; valentine, dubois, & cooper, 2004; wigfield, eccles, schiefele, roeser, & davis-kean, 2006). since these self-conceptions guide students’ actual performance at school and hence their future education and related decisions, it is important to identify the factors that support the development of self-concept, particularly during the critical period of adolescence when self-concept of ability typically declines (nagy et al., 2010; wigfield et al., 1997). because the development of self-concept of ability has been suggested to be linked to interaction with other people (dermitzaki & efklides, 2000), such as parents, the present study examined the development of self-concept of ability in literacy and mathematics among 231 finnish adolescents from grade 7 to grade 9, and the role that mothers’ and fathers’ beliefs about their children’s abilities play in this development.
also investigated was whether children’s gender and level of performance influence the possible associations between parental beliefs and their child’s self-concept of ability. 1.1 self-concept of ability recent research has led to an understanding that self-concept is multidimensional and hierarchical in nature and is formed in social comparison and in communication with significant others (bong & skaalvik, 2003). thus, academic self-concept may be different for the domains of mathematics and verbal skills, for example (arens, yeung, craven, & hasselhorn, 2011). previous research has shown that mathematics and verbal self-concepts are almost uncorrelated although achievement in mathematics and verbal subjects substantially correlate (marsh, 1990; marsh, byrne, & shavelson, 1988). the internal/external frame of reference (i/e) model focuses on explaining why this is. according to the i/e model academic self-concept in a specific school subject is formed in relation to two comparison processes that are called “frames of reference” (marsh & yeung, 2001). in the external (normative/social comparison) frame of reference a student compares his/her own performance in a particular domain (e.g. mathematics) with her/his perception of other students’ performance in this domain. in the internal (ipsative-like) reference a student compares his/her own performance in a particular domain (e.g. mathematics) with his/her performance in other school subjects (e.g. literacy). the actual self-concept in a particular school domain is formed in these simultaneous comparison processes. thus, if a student is poor in mathematics compared to other students in his/her class (external comparison), but in comparison to his/her performance in other school subjects is doing better in mathematics than in other subjects, his/her mathematics self-concept can be good. 
based on the internal/external frame of reference (i/e) model, as well as previous empirical studies showing that mathematics and verbal self-concept domains are distinct (arens et al., 2011), in the present study self-concept is approached subject-specifically. the expectancy-value theory by eccles et al. (1983) provides a theoretical framework for self-concept in the academic setting. according to the expectancy-value theory (eccles et al., 1983; eccles & wigfield, 1995; wigfield & eccles, 2000) individuals’ performance in school and their academic choices are explained not only by the extent to which they value the activity in question, but also by the expectancies they have for success in that activity (wigfield & eccles, 2000). according to the theory, students’ self-concept of ability, that is, the individual’s perception of his or her competence in a certain academic domain, influences the expectancies students have and, through these expectancies, different academic outcomes, such as performance (wigfield & eccles, 2000). theoretically, self-concept of ability is distinct from expectancy of success: self-concept of ability focuses on present ability while expectancies focus on the future. however, empirically these two concepts have not been found to be separate (eccles et al., 1983; wigfield & eccles, 2000). previous research has shown that students’ self-concept of ability plays an important role in academic environments by directing behavior and effort in learning situations (e.g., atkinson, 1964; bandura, 1986; eccles et al., 1983; wigfield et al., 2006). students who believe in their abilities and expect that they can and will do well in a task are much more likely to perform better and to engage in an adaptive manner in such academic tasks than students who do not believe in their abilities and expect to fail in a certain task (chapman, tunmer, & prochnow, 2000; eccles et al., 1983; pintrich & schunk, 2008).
similar results have been found among both younger school-aged children (chapman et al., 2000) and adolescents (caprara, vecchione, alessandri, gerbino, & barbaranelli, 2011; eccles et al., 1983), and in different academic domains, such as math (chiu & klassen, 2010; eccles et al., 1983) and literacy (chapman et al., 2000; chiu & klassen, 2009). among adolescents, self-concept of ability has further been found to predict career choices. it has been shown, for example, that students who have greater confidence in their math abilities are more likely to aspire to math-related careers than students whose confidence in their math abilities is lower (eccles, 2007). several studies have shown that the development of self-concept of abilities is a continuous process that starts at the very beginning of the school career. young students typically have very positive, and even unrealistic, perceptions of their abilities during the first years of primary school (aunola, leskinen, onatsu-arvilommi, & nurmi, 2002), but as they grow older, their perceptions of their abilities become more realistic and more negative (jacobs, lanza, osgood, eccles, & wigfield, 2002). one important phase for the development of self-concept of ability is early adolescence (preckel, niepel, schneider, & brunner, 2013). during this time, many physical changes and changes in a person’s environment and social context take place. at the same time, an educational transition usually takes place: the transition to secondary school. this transition means changes in adolescents’ everyday social contexts, in the ways adolescents get feedback in school and in their frames of reference (see wigfield et al., 2006). the levels of self-concept of ability in mathematics and literacy have been shown to decline during elementary and secondary school (e.g. wigfield, eccles, maciver, reuman, & midgley, 1991).
because the earlier studies on the topic have mainly been carried out in the us (eccles et al., 1983; nagy et al., 2010), australia (nagy et al., 2010; watt, 2004), or germany (nagy et al., 2010; preckel et al., 2013), it is not known, however, whether the results on the tendency of self-concept of ability to decline during the transition to secondary school apply to other cultural and educational settings. consequently, the first aim of the present study was to examine the developmental changes in self-concept of mathematics and literacy abilities during secondary school in finland. the characteristics of the finnish school system differ from school systems in some other countries. in finland, children start their education by attending pre-school in the year they turn 6. in the year of their 7th birthday children start compulsory comprehensive school, which is divided into a lower level (i.e., elementary school; grades 1-6) and an upper level (i.e., secondary school; grades 7-9). in finnish secondary schools all students are taught at the same academic level and students do not need to make decisions whether to take higher or lower level courses. this characteristic of the finnish school system is different from, for example, the system in germany where students need to decide which achievement-based secondary school track they take (gniewosz & noack, 2012). because in finland the compulsory courses are at the same level for everyone, both high- and low-performing students study in the same classrooms. moreover, in finnish comprehensive school education extra attention is paid to supporting particularly those students who have difficulties in their learning. the fact that the finnish school system includes well-developed support services for students suffering, for example, from learning difficulties has been suggested to partly explain finnish students’ academic success in worldwide pisa-studies (välijärvi et al., 2007).
overall, the fact that in finland all students are taught at the same academic level independent of their level of performance or motivation and that extra attention is paid to support students with learning difficulties may positively impact the students’ self-concept development, particularly among students showing lower performance.

1.2 the role of parents

previous studies have shown that ability-related self-concepts develop in interaction with one’s environment, and are affected by evaluations of and feedback from parents (bong & skaalvik, 2003; eccles et al., 1983; gniewosz, eccles, & noack, 2014; shavelson, hubner, & stanton, 1976). according to the expectancy-value model proposed by eccles and colleagues (1983), parental beliefs about their children’s abilities may affect children’s self-concept of ability development through at least two mechanisms (see e.g. eccles, 1993). first, parents may directly tell their children what they think the child is good at (jacobs & eccles, 2000). second, parents can also provide different learning opportunities for their children based on their beliefs about their children’s abilities (jacobs & eccles, 2000). children then interpret this information from their parents and incorporate it into their self-concept of ability (jacobs & eccles, 2000). there is also strong empirical evidence for the assumption that parents’ beliefs about their children’s academic performance affect children’s subject-specific self-concept of ability (eccles parsons, adler, & kaczala, 1982; frome & eccles, 1998; gniewosz, eccles, & noack, 2012; jacobs, 1991; mcgrath & repetti, 2000; phillips, 1987). for example, parents’ beliefs in their child’s success in the literacy domain have been found to be positively related to sixth-grade children’s self-concept of their literacy ability (frome & eccles, 1998). similar results have been found in the domain of mathematics (eccles parsons et al., 1982; gniewosz et al., 2012).
although the importance of parental beliefs in the formation of children’s self-concept of mathematics and literacy ability is widely acknowledged, there is some evidence that the role of parental beliefs in the development of students’ self-concept may vary with age (e.g., gniewosz et al., 2012). for example, pesu, viljaranta and aunola (2016) found that teachers’ beliefs played a bigger role than parents’ beliefs in first-grade students’ self-concept of mathematics and literacy ability development. gniewosz et al. (2012), in turn, found that the effects of maternal child-related competence beliefs on students’ mathematics self-concept increased during the secondary school transition, whereas the effect of grades decreased. interestingly, after the school transition the impact of maternal competence beliefs decreased and the impact of grades increased. when interpreting the previous results on the topic it should be noted that although longitudinal procedures were applied when predicting children’s self-concept of ability by parental beliefs, children’s self-concept of ability may also play a role in parental beliefs. the studies focusing on the role of parental beliefs in students’ self-concept of abilities have also found some gender differences. for example, it has been shown that parents typically think that boys are better at mathematics than girls (eccles parsons et al., 1982; eccles & jacobs, 1987; gunderson, ramirez, levine, & beilock, 2012), independently of children’s actual performance in mathematics (eccles, 1993; eccles parsons et al., 1982). this has been shown to impact girls’ self-perceptions in mathematics (jacobs, 1991). conversely, parents tend to think that girls do better in literacy (gniewosz et al., 2014). 
although there are studies focusing on these mean-level differences in parental beliefs concerning boys and girls, less is known, however, about whether the relations between parental beliefs and their children’s self-concept of ability vary between boys and girls. according to simpkins, fredricks, and eccles (2012) it is important to study whether the associations among the indicators vary as a function of gender because socialization and cognitive theories suggest that the associations are not similar for boys and girls. according to these theories adolescents most likely act in a similar way as people who are most similar to themselves (maccoby, 1998). this suggests that mothers may have a stronger impact on their daughters than on their sons and fathers on their sons than on their daughters (maccoby, 1998). furthermore, testing the moderating effect of gender is important because it may have important implications for interventions (simpkins et al., 2012): if parental beliefs have a different impact on boys’ and girls’ self-concept of ability, the interventions should take this into consideration when considering the best ways to support girls and boys. whether the effect of parental beliefs about their children’s abilities on children’s self-concept development is affected by the child’s gender is thus far, however, underexplored. alongside gender it has been recently suggested that the child’s level of performance may also impact the association between adults’ beliefs and students’ self-concept of ability (pesu et al., 2016). in the study by pesu et al. (2016), the impact of teachers’ beliefs on first-grade students’ self-concept of
mathematics and reading ability was different depending on the level of student’s performance: among high-performing students, teachers’ beliefs had a positive impact on students’ self-concept of mathematics and reading ability, whereas among low-performing students, teachers’ beliefs did not have this positive impact. pesu et al. (2016) suggested that one explanation for this differential impact of teacher beliefs is that high-performing children are more prone to be affected by adults’ beliefs than low-performing children as (owing to their cognitive abilities) they are able to make more accurate interpretations of adults’ feedback and their own performance. also, bohlmann and weinstein (2013) argued that children’s cognitive reasoning skills affect the way they perceive, interpret, and attribute meaning to teachers’ actions. thus, it can be that students who have better cognitive skills are better able to interpret adults’ feedback overall. however, the differential role that parental beliefs have on student self-concept, depending on the student’s level of performance, has not to our knowledge been investigated among older children such as secondary school students. identifying differences in the associations between parental beliefs and students’ self-concept of ability depending on students’ level of performance might have important implications for interventions. one further limitation of earlier research is that the majority of studies on the role of parental beliefs have focused on the role of mothers (for exceptions, see frome & eccles, 1998; gniewosz & noack, 2012; pesu et al., 2016), to the relative neglect of the role of fathers’ beliefs. however, it might be that mothers and fathers play a different role in their children’s self-concept development (frome & eccles, 1998; mcgrath & repetti, 2000; maccoby, 1998).
consequently, the second aim of the present study was to investigate the role of mothers’ and fathers’ beliefs about their adolescent children’s abilities in mathematics and literacy in the development of adolescents’ self-concept of ability during secondary school. further, possible differences in these associations depending on the adolescent’s gender, on the one hand, and level of performance, on the other, were investigated. the research questions were:

a) to what extent does finnish adolescents’ self-concept of mathematics and literacy ability change during secondary school? based on earlier literature, we hypothesized that self-concept of mathematics and literacy ability declines during grades 7-9 (nagy et al., 2010; wigfield et al., 1991).

b) do parental beliefs concerning adolescents’ abilities predict the development of adolescents’ self-concept of literacy and mathematics ability during grades 7-9? we hypothesized that mothers’ and fathers’ beliefs positively predict adolescents’ subsequent self-concept of literacy and mathematics ability (e.g. frome & eccles, 1998; gniewosz et al., 2012).

c) are there differences in the associations between parental beliefs and adolescents’ self-concept of abilities depending on adolescents’ gender or level of performance? we set two alternative hypotheses concerning gender differences in the associations. as the first hypothesis, we hypothesized that the associations of mothers’ beliefs with adolescents’ self-concept of ability are stronger among girls than among boys, whereas the associations of fathers’ beliefs with adolescents’ self-concept of ability are stronger among boys than among girls, as suggested by the socialization model (maccoby, 1998). as the second hypothesis, we hypothesized that gender does not play a role in the connections between mothers’/fathers’ beliefs and self-concept of ability, because previous studies have not found gender differences of this kind (pesu et al., 2016; simpkins et al., 2012).
based on previous results by pesu et al. (2016), we also hypothesized that the role of mothers’/fathers’ beliefs in self-concept of ability is stronger among high- than low-performing students.

2. method

2.1 participants

the present study is part of a longitudinal study, the jyväskylä entrance into primary school (jeps) study (nurmi & aunola, 1999–2009), which focuses on students’ academic and motivational development from the beginning of the school career until the end of comprehensive school. the sample comprised students from two medium-sized districts (urban or semi-urban areas) in central finland. the present study focuses on the data obtained from the adolescents and their parents when the former were in the 7th and 9th grades. the participants were 231 students in grade 7 and 221 in grade 9 (in grade 7: 114 girls and 117 boys; in grade 9: 107 girls and 114 boys) and their mothers (n = 221) and fathers (n = 191). the adolescents filled in questionnaires on their self-concept of ability in the spring of the 7th grade and again in the spring of the 9th grade. performance in mathematics and literacy was assessed by tests in the spring term of the 7th grade. all questionnaires and tests were administered by trained investigators in classroom group situations during regular school hours. mothers and fathers were asked to fill in mailed questionnaires concerning their beliefs about their child’s performance in mathematics and literacy in the spring of grade 7. the response rate was 96% for mothers and 83% for fathers. the families participating in the study were somewhat more educated than the finnish population overall (statistics finland, 2010): 11.5% of mothers and 12.1% of fathers had no vocational education, 26.6% of mothers and 38.4% of fathers had a vocational education, and 61.9% of mothers and 49.6% of fathers had a degree from an institution of higher learning (e.g., polytechnic) or university.
at the beginning of the 7th grade, 68.3% of the children were living in a nuclear family, 13.5% were living in a blended family, and 9.1% were living in a single-parent household.

2.2 measures

2.2.1 self-concept of ability in literacy and mathematics

students’ self-concept of ability in mathematics and literacy was measured with a questionnaire based on the ideas presented by eccles and wigfield (1995). students were asked to answer three questions on a 5-point likert scale, separately for mathematics and literacy: how good are you at mathematics / literacy? how good do you think you are at mathematics / literacy compared to the other students in your class? how hard are assignments related to mathematics / literacy for you (reversed)? self-concept of ability in mathematics and in literacy were scored separately by calculating the mean of the three items in each case. the cronbach’s alpha reliabilities for self-concept in mathematics in grade 7 and grade 9 were .87 and .89, respectively, and for self-concept in literacy .81 and .81, respectively.

2.2.2 adolescents’ performance in mathematics

adolescents’ performance in mathematics was assessed with the group-administered ktlt test (räsänen & leino, 2005), which is a standardized math test for grades 7-9 (13-16 years). the test consists of 40 mathematical tasks (basic calculation and equation tasks, word problems, geometry tasks, measurement tasks), to be done individually. one point was given for each correct answer. the test was administered with a 45-minute time limit. the internal reliability of the test in the present data was .86. the internal reliability of the test in the normative data (n = 1,157) has been shown to be 0.88 (räsänen & leino, 2005). the test has also been shown to correlate with other measures of mathematical skills (r = 0.61–0.78, p < 0.001; räsänen & leino, 2005).
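the cronbach’s alpha reliabilities reported for the self-concept scales can be computed from item-level data. the sketch below is a minimal illustration of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). the function name `cronbach_alpha` and the example scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item columns (one list of scores per item),
           all answered by the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items must have the same number of respondents")
    # Total score per respondent (sum across items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of six students to three 5-point items
# (illustrative numbers only, not study data).
example_items = [
    [4, 3, 5, 2, 4, 3],
    [4, 2, 5, 2, 3, 3],
    [5, 3, 4, 1, 4, 2],
]
alpha = cronbach_alpha(example_items)
```

a negatively worded item would need to be reverse-scored before it is passed to the function; how the study handled this step is not described in detail, so it is left out of the sketch.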
2.2.3 adolescents’ performance in literacy

adolescents’ performance in literacy was measured by three subtests taken from the test of word reading, spelling and reading comprehension (holopainen, kairaluoma, nevala, ahonen, & aro, 2004):

a) in the first task, the spelling error task, participants were asked to mark with a vertical line, on 100 words typed on a sheet of paper, as many spelling errors (an extra, missing, or wrong letter in a word) as they could identify in 3.5 minutes. the score was the number of correctly detected errors. the test-retest reliability for the subtest has been shown to be 0.83 (holopainen et al., 2004).

b) in the second task, the word chain test, the participants were asked to separate understandable words in a word chain by drawing a line between the words. a total of 100 words were presented in chains of four words with no spaces between them. the adolescents were allowed 3.5 minutes to find the end of one word and the beginning of a new word in each chain and to mark it with a vertical line. the test was scored as the number of correctly found words. the test-retest reliability of the subtest has been shown to be 0.84 (holopainen et al., 2004).

c) in the reading comprehension test, the participants were asked to read a four-page story (the hounds of the village, written by finnish author veikko huovinen), in which 52 words had been changed so that they did not fit in with the story (i.e., they were in contradiction with the meaning of the sentence, paragraph or larger text context). the participants were asked to underline all the inappropriate words they could find. a point was given for each correctly underlined word. the time limit for the subtest was 45 minutes.

the sum score of the standardized three subtest scores was taken as the measure of literacy performance. the cronbach’s alpha reliability of the sum score was .81.
2.2.4 mothers’ and fathers’ beliefs about their child’s performance in literacy/mathematics

mothers’ and fathers’ beliefs were measured at the end of the 7th grade with 2 items (e.g. how well do you think your child is doing in literacy/mathematics at the moment? how well do you think your child will do in literacy/mathematics in the future?) using a 4-point likert scale. the cronbach’s alpha reliabilities of the scale were .92 (literacy) and .93 (mathematics) among mothers and .92 (literacy) and .93 (mathematics) among fathers.

2.2.5 analysis strategy

the analyses were carried out in the following steps. first, the developmental changes in adolescents’ self-concepts of mathematics and literacy abilities from grade 7 to grade 9, and possible gender differences in these changes, were investigated by repeated measures anova. second, hierarchical regression analyses were carried out to examine whether parents’ beliefs about their adolescent children’s abilities in mathematics and literacy play a role in the development of adolescents’ self-concept of mathematics and literacy ability during secondary school and whether the role of parental beliefs differs according to the adolescent’s gender or level of performance. in these analyses, adolescents’ self-concept of ability in a specific school subject in the spring of the ninth grade (time 2) was predicted by their self-concept of ability in that subject in the spring of the seventh grade (time 1), academic performance in that subject in the seventh grade (time 1), gender, and mothers’ or fathers’ beliefs about their child’s abilities in the spring of the seventh grade (time 1). each variable was entered stepwise in the analysis. the effects of mothers’ and fathers’ beliefs were tested in separate analyses.
in order to determine whether any connection between mothers’/fathers’ beliefs and the adolescents’ subsequent level of self-concept of ability was influenced by the adolescents’ gender or level of performance, the related interaction terms (gender x belief or academic performance x belief) were added to the analysis in the last step. each interaction term was tested in a separate analysis. the analysis was carried out separately for self-concept of mathematics ability and self-concept of literacy ability. in order to be able to examine the effects of the interaction terms, all the predictor variables were standardized before being added to the regression models and before calculating any interaction terms. missing data were handled pairwise.

3. results

the means (m), standard deviations (sd), and pearson product-moment correlations of the study variables are shown in table 1. the results of repeated measures anova showed that adolescents’ self-concept of mathematics ability slightly declined from grade 7 (m = 3.41, sd = 0.82) to grade 9 (m = 3.30, sd = 0.94; f(1, 202) = 5.87, p < .05). their self-concept of literacy also slightly declined during this period (time 1: m = 3.54, sd = 0.70; time 2: m = 3.44, sd = 0.72; f(1, 203) = 3.86, p = .05). self-concept of literacy ability was higher among girls than boys across the measurement points (f(1, 202) = 21.14, p < .001), whereas self-concept of math ability was higher among boys than girls (f(1, 202) = 6.23, p < .05). no gender differences in the change in self-concepts from grade 7 to grade 9 were, however, evident.

table 1
intercorrelations, means, and standard deviations for the study variables

variable                        1.     2.     3.     4.     5.     6.     7.     8.     9.     10.    11.
1. self-concept literacy t1
2. self-concept literacy t2     .48c
3. self-concept math t1         .12    .20b
4. self-concept math t2         .08    .29c   .71c
5. performance literacy t1      .38c   .45c   .23b   .23b
6. performance math t1          .26c   .27c   .59c   .51c   .60c
7. gender                       -.21b  -.32c  .17b   .14a   -.36c  .00
8. mother belief literacy t1    .46c   .40c   .29c   .23b   .57c   .46c   -.30c
9. mother belief math t1        .09    .16a   .64c   .62c   .39c   .64c   -.01   .53c
10. father belief literacy t1   .35c   .35c   .25b   .24b   .51c   .48c   -.28c  .56c   .38c
11. father belief math t1       .01    .06    .54c   .55c   .32c   .56c   .02    .40c   .67c   .58c
m                               3.56   3.47   3.36   3.30   46.34  21.52  -      2.90   2.76   2.87   2.84
sd                              .69    .74    .84    .94    14.43  6.33   -      .79    .80    .69    .76

note. a = p < .05. b = p < .01. c = p < .001. t1 = time 1, t2 = time 2.

3.1 math-related self-concept

the results of the hierarchical regression analyses for mathematics-related self-concept (see table 2) showed, first, that individual differences in self-concept of mathematics ability were relatively stable from grade 7 to grade 9. second, mothers’ (β = .28, p < .001) and fathers’ (β = .22, p < .001) beliefs about their child’s abilities predicted adolescents’ subsequent self-concept of mathematics ability at the end of grade 9, after controlling for the previous levels of self-concept of mathematics ability and mathematics performance: the higher the beliefs parents had about their child’s mathematics ability in grade 7, the better the adolescents’ self-concept of mathematics ability was in grade 9. finally, the connection between adolescents’ self-concept of mathematics ability and mothers’ beliefs concerning mathematics was found to differ depending on the adolescent’s level of mathematics performance (β = .13, p < .01). to examine this interaction effect further, aiken and west’s (1991) procedure was used. in this procedure, simple slopes for the mothers’ belief variable in the prediction of adolescents’ mathematics self-concept were calculated and presented using standardized scores separately for adolescents who showed either low (–1 sd) or high (+1 sd) levels of mathematics performance. the results are shown in figure 1.
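the moderation analysis described above (standardizing the predictors, forming a product term, and probing the interaction at -1 sd and +1 sd of the moderator, following aiken and west, 1991) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors’ analysis script: the helper names (`standardize`, `ols`, `moderated_regression`, `simple_slopes`) are hypothetical, the model is fitted by ordinary least squares via the normal equations, gender is left out for brevity, the outcome is kept in its raw metric, and significance tests are omitted.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores (using the sample SD) for a list of numbers."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols(predictors, y):
    """Ordinary least squares via the normal equations.

    predictors: list of predictor columns.
    Returns [intercept, b1, b2, ...] in the order of the columns.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * z for x, z in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def moderated_regression(self_concept_t1, performance, belief, self_concept_t2):
    """Regress time-2 self-concept on standardized predictors plus
    a performance x belief product term (built from the z-scores)."""
    zs, zp, zb = (standardize(v) for v in (self_concept_t1, performance, belief))
    interaction = [p * b for p, b in zip(zp, zb)]
    names = ["intercept", "self_concept_t1", "performance",
             "belief", "performance_x_belief"]
    return dict(zip(names, ols([zs, zp, zb, interaction], self_concept_t2)))


def simple_slopes(b_belief, b_interaction, moderator_levels=(-1.0, 1.0)):
    """Slope of the belief predictor at chosen moderator values (in SD units)."""
    return {level: b_belief + b_interaction * level for level in moderator_levels}
```

for example, with a belief coefficient of .30 and an interaction coefficient of .13, `simple_slopes(0.30, 0.13)` gives slopes of about .17 at -1 sd and about .43 at +1 sd of performance, which is the pattern of a stronger belief effect among high performers. these two inputs are used here purely for illustration.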
the results showed that among high-performing adolescents, mothers’ beliefs positively predicted subsequent self-concept of mathematics ability, whereas among low-performing adolescents this positive effect of mothers’ beliefs was weaker. the impact of parental beliefs was similar for boys and girls.

table 2
the results of hierarchical regression analyses for mathematics related self-concept at time 2 (standardized betas)

predictor                       step 1 β   step 2 β   step 3 β   step 4 β   step 5 β
a. self-concept (time 1)        .71***     .71***     .62***     .50***     .47***
b. gender                                  .02        .04        .06        .07
c. performance (time 1)                               .14*       .03        .06
d. beliefs
d1. beliefs mother (time 1)                                      .28***     .30***
d2. beliefs father (time 1)                                      .22**      .23**
e. interaction terms
b x d1                                                                      -.16
b x d2                                                                      .05
c x d1                                                                      .13**
c x d2                                                                      .04
r2                              .51        .51        .52        .55-.56¹   .55-.57¹

note 1. *** p < .001, ** p < .01, * p < .05. the effects of mothers’ and fathers’ beliefs were each tested in separate analyses. similarly, all interaction terms were tested in separate analyses.
¹ r2 varies depending on which variables are included in the model as predictor variables.

figure 1. the impact of mothers’ beliefs on students’ mathematics related self-concept among low, medium and high-performing students

3.2 literacy-related self-concept

the results of hierarchical regression analyses (see table 3) showed, first, that individual differences in self-concept of literacy ability were relatively stable through grades 7-9. the results showed further that, after controlling for the previous level of self-concept and literacy performance, neither mothers’ nor fathers’ beliefs predicted adolescents’ self-concept of literacy ability. no parental belief x gender or parental belief x performance interaction effects were found either.
table 3
the results of hierarchical regression analyses for literacy related self-concept at time 2 (standardized betas)

predictor                       step 1 β   step 2 β   step 3 β   step 4 β   step 5 β
a. self-concept (time 1)        .48***     .43***     .34***     .32***     .32***
b. gender                                  -.23***    -.15*      -.15*      -.15*
c. performance (time 1)                               .27***     .24**      .24**
d. beliefs
d1. beliefs mother (time 1)                                      .08        .08
d2. beliefs father (time 1)                                      .08        .09
e. interaction terms
b x d1                                                                      .21
b x d2                                                                      .03
c x d1                                                                      .02
c x d2                                                                      .08
r2                              .23        .28        .33        .34        .34

note 1. *** p < .001, ** p < .01, * p < .05. the effects of mothers’ and fathers’ beliefs were each tested in separate analyses. similarly, all interaction terms were tested in separate analyses.

4. discussion

the present study aimed to contribute to the literature on students’ self-concept of ability by examining, first, to what extent developmental changes in self-concept of mathematics and literacy abilities occur among finnish students across secondary school and, second, what role mothers’ and fathers’ beliefs play in the development of adolescents’ self-concept of mathematics and literacy ability during this period. furthermore, whether the possible associations of parental beliefs with adolescents’ self-concepts of abilities are influenced by adolescents’ gender or level of performance was investigated. the results showed first that self-concept of both mathematics and literacy ability slightly declined during secondary school. second, mothers’ and fathers’ beliefs about their child’s abilities predicted changes in the adolescents’ self-concept of ability, but only in mathematics: the higher the beliefs parents had about their child’s mathematics ability in grade 7, the better the adolescents’ subsequent self-concept of mathematics ability was in grade 9.
furthermore, the role of mothers’ beliefs in adolescents’ self-concept of mathematics ability was found to be particularly strong among those adolescents who showed a high level of mathematics performance. finally, gender did not have an effect on the connections between parental beliefs and the development of adolescents’ self-concept of ability in mathematics or literacy.

4.1. the development of self-concept of ability

the results of this study showed first that adolescents’ self-concept slightly declined during secondary school among both girls and boys. this result is consistent with previous results reported among us (nagy et al., 2010; wigfield et al., 1991), german (nagy et al., 2010) and australian students (nagy et al., 2010) and suggests that in the finnish context, too, the secondary school years are an important time for the development of self-concept. the period of the transition to secondary school brings many changes in adolescents’ lives. their everyday social contexts change, the ways they get feedback at school change, and their frames of reference change (see wigfield et al., 2006). it is noteworthy, however, that in the present study the decline in self-concept was only minor. one explanation for there being only a slight decline in self-concept can be found in the finnish national curriculum guidelines, according to which teachers should focus on motivating both boys and girls equally to learn and on helping them build a positive self-concept. thus, it is possible that since finnish teachers are aware of the importance of supporting self-concept construction, students receive much support from their school in this area, and thus show less of a decline in self-concept during adolescence.

4.2. the role of mothers’ and fathers’ beliefs in self-concept of ability development

the results of the present study showed further that mothers’ and fathers’ beliefs predicted the development of students’ self-concept of mathematics ability across secondary school: the higher the parental beliefs at the beginning of secondary school, the higher the adolescent’s self-concept in mathematics at the end of secondary school. the results are in line with eccles et al.’s expectancy-value theory, which suggests that parental beliefs affect their children’s self-concept of ability (eccles parsons et al., 1982; frome & eccles, 1998; lau & pun, 1999; mcgrath & repetti, 2000). previous empirical research on the role of parents in students’ self-concept of ability development, however, has mainly focused on the role of mothers’ beliefs, whereas that of fathers’ beliefs has received less attention. nevertheless, there is some evidence that mothers and fathers both play a role in their children’s self-concept of ability development in both mathematics and literacy (frome & eccles, 1998; gniewosz et al., 2014), at least among sixth-grade students (frome & eccles, 1998) and fifth- to seventh-graders (gniewosz et al., 2014). the results of the present study also indicate that in secondary school both mothers’ and fathers’ beliefs have an impact on adolescents’ self-concept of ability in the domain of mathematics. this result adds to the literature since previous studies on the role of parents have not focused on this particular age group.

however, the present results are inconsistent with those of previous research insofar as mothers’ and fathers’ beliefs did not play a role in their adolescents’ self-concept development in the domain of literacy. there are several possible explanations for this result. first, it is possible that achievement feedback is less clear in literacy than in mathematics, which would help explain why parents had a more evident role in adolescents’ self-concept development in mathematics.
another possibility is that because mathematics is typically considered a more difficult school subject than literacy, and because there is a clearer declining trend in the self-concept of mathematics ability, the self-concept of mathematics ability is particularly susceptible to external feedback. third, previous studies showing connections between parental beliefs and their children’s self-concept of ability development have been conducted in cultural settings other than finland. research has revealed that finnish children attain fluency in native-language reading and writing by the end of the first school year (seymour, aro, & erskine, 2003), earlier than, for example, english-speaking children, whose rate of literacy skills development is more than twice as slow (seymour et al., 2003). slower literacy skills development has been attributed to fundamental linguistic differences in syllabic complexity and orthographic depth (seymour et al., 2003). for this reason, finnish parents might involve themselves less in their children’s literacy-related studies than in their mathematics studies, also later on. thus, parents might have less information about their children’s success in literacy than in mathematics and hence less influence on their children’s self-concept in literacy than in mathematics.

the results of the present study showed, finally, that the role of mothers’ beliefs about their adolescent child’s mathematics ability was dependent on the level of the adolescent’s performance: mothers’ beliefs were positively related to their children’s self-concept of mathematics ability among high-performing adolescents but less so among low-performing adolescents. these results are in line with the results of pesu et al. (2016), who found that the role of teachers’ beliefs in first graders’ self-concept of mathematics and reading ability differed depending on the level of the student’s performance: teachers’ beliefs had a positive
impact on students’ self-concept of mathematics and reading ability only among high-performing students, not among low-performing students. there are several possible explanations for the finding that mothers’ beliefs play a role particularly among high-performing children. first, it might be that mothers communicate their beliefs, even where they are equally positive, differently to children whose levels of performance are different. thus, the effect of mothers’ beliefs would be different for children who perform differently at school. second, it could be that students interpret mothers’ cues about their beliefs differently depending on their level of performance. bohlmann and weinstein (2013) argued that children’s cognitive abilities influence their perceptions and interpretations of teachers’ actions. it is possible that children’s cognitive abilities influence their perceptions of external feedback overall. thus, it could be that high-performing adolescents are cognitively better able to perceive and interpret mothers’ beliefs accurately (see also pesu et al., 2016).

the present study showed that gender had no effect on the development of self-concept of ability in either mathematics or literacy. this result is consistent with previous studies showing similar patterns in the development of self-concept in boys and girls (e.g. nagy et al., 2010). the results of the present study showed further that gender did not influence the relationship between mothers’ and fathers’ beliefs and adolescents’ self-concept of ability development. since finland can be considered an egalitarian culture (chiu & klassen, 2009; chiu & klassen, 2010), there might be fewer gender differences overall. in an egalitarian culture, individuals are taught to view, value, and act towards one another as equals based on their common humanity (chiu & klassen, 2009; chiu & klassen, 2010).
people learn these practices and values through formal and informal socialization, including through schooling (chiu & klassen, 2009; chiu & klassen, 2010). finnish culture is also considered to have few characteristics of a masculine culture (chiu & klassen, 2009; chiu & klassen, 2010). in masculine cultures males are typically favored in higher status roles, and women have lower income (cheung & chan, 2007). because gender roles are rigid in masculine cultures, this may, for example, lead girls to value mathematics learning less, devote less time to studying mathematics and have lower mathematics self-concept than boys (hofstede, 2003; wigfield, tonks, & eccles, 2004). as finland is considered an egalitarian and less masculine culture than many other cultures (chiu & klassen, 2009; chiu & klassen, 2010), finnish children grow up in a society where boys and girls are treated more equally than in cultures that are less egalitarian. this may explain why the present study did not show any gender differences in the development of girls’ and boys’ self-concept of abilities and why the impact of parental beliefs was similar for boys and girls.

4.3. limitations

this study has its limitations. first, the study was carried out in just one educational setting, finland. as it is possible that parental beliefs play a different role in students’ self-concept of abilities in different educational settings and cultures, further cross-cultural research on the topic is needed. second, even though a longitudinal design was used in the present study, it might be that some third factor not controlled for explains the predictions found. one should, therefore, be cautious about drawing causal conclusions from the results. third, the measure of mothers’ and fathers’ beliefs included only two questions. future research should use measures with more items to assess parental beliefs in order to replicate the results found here.
overall, the results of this study suggest that during secondary school finnish adolescents’ self-concepts of mathematics and literacy ability undergo a slight decline, and that in the domain of mathematics both mothers’ and fathers’ beliefs about their children’s abilities play a role in the development of adolescents’ self-concept of ability. it is important that both mothers and fathers know what role they play in the formation of their children’s self-concepts of ability. because parents receive information about their children’s success at school indirectly, i.e. via grades and feedback from teachers, they might tend to think they do not have much of a role in their children’s academic life. it is important that schools, and teachers in particular, inform parents about the crucial role they can play in their children’s self-concept development in different academic domains. teachers and school personnel should inform parents about the ways in which they, both mothers and fathers, could support their children and their children’s developing self-concepts.

keypoints

mothers’ and fathers’ child-specific ability beliefs predicted adolescents’ self-concept of mathematics ability.
mothers’ and fathers’ beliefs did not predict adolescents’ self-concept of literacy ability.
the relations between mothers’ beliefs and adolescents’ self-concept of mathematics ability varied according to adolescents’ performance: mothers’ beliefs were positively related to their children’s self-concept of mathematics ability among high-performing adolescents but less so among low-performing adolescents.

references

aiken, l. s., & west, s. g. (1991). multiple regression: testing and interpreting interactions. newbury park, ca: sage.
arens, a. k., yeung, a. s., craven, r. g., & hasselhorn, m. (2011). the twofold multidimensionality of academic self-concept: domain specificity and separation between competence and affect components.
journal of educational psychology, 103, 970-981. doi: 10.1037/a0025047
atkinson, j. w. (1964). an introduction to motivation. princeton, nj: van nostrand.
aunola, k., leskinen, e., onatsu-arvilommi, t., & nurmi, j-e. (2002). three methods for studying developmental change: a case of reading skills and self-concept. british journal of educational psychology, 72, 343-364. doi: 10.1348/000709902320634447
bandura, a. (1986). social foundations of thought and action: a social cognitive theory. englewood cliffs, nj: prentice-hall.
bohlmann, n., & weinstein, r. (2013). classroom context, teacher expectations, and cognitive level: predicting children’s math ability judgments. journal of applied developmental psychology, 34, 288-298. doi: 10.1016/j.appdev.2013.06.003
bong, m., & skaalvik, e. m. (2003). academic self-concept and self-efficacy: how different are they really? educational psychology review, 15, 1–40. doi: 10.1023/a:1021302408382
brunner, m., keller, u., hornung, c., reichert, m., & martin, r. (2009). the cross-cultural generalizability of a new structural model of academic self-concepts. learning and individual differences, 19, 387-403. doi: 10.1016/j.lindif.2008.11.008
caprara, g. v., vecchione, m., alessandri, g., gerbino, m., & barbaranelli, c. (2011). the contribution of personality traits and self-efficacy beliefs to academic achievement: a longitudinal study. british journal of educational psychology, 81, 78-96. doi: 10.1348/2044-8279.002004
chapman, j. w., tunmer, e. t., & prochnow, j. e. (2000). early reading-related skills and performance, reading self-concept, and the development of academic self-concept: a longitudinal study. journal of educational psychology, 92, 703-708. doi: 10.1037/0022-0663.92.4.703
cheung, h. y., & chan, a. w. h. (2007). how culture affects female inequality across countries. journal of studies in international education, 11, 157−179. doi: 10.1177/1028315306291538
chiu, m. m., & klassen, r. m. (2009).
calibration of reading self-concept and reading achievement among 15-year-olds: cultural differences in 34 countries. learning and individual differences, 19, 372-386. doi: 10.1016/j.lindif.2008.10.004
chiu, m. m., & klassen, r. m. (2010). relations of mathematics self-concept and its calibration with mathematics achievement: cultural differences among fifteen-year-olds in 34 countries. learning and instruction, 20, 2-17. doi: 10.1016/j.learninstruc.2008.11.002
dermitzaki, i., & efklides, a. (2000). aspects of self-concept and their relationship to language performance and verbal reasoning ability. the american journal of psychology, 113, 621–637. doi: 10.2307/1423475
eccles, j. s. (1993). school and family effects on the ontogeny of children’s interests, self-perceptions, and activity choices. in j. e. jacobs & r. dienstbier (eds.), developmental perspectives on motivation (pp. 145-208). university of nebraska press.
eccles, j. s. (2007). where are all the women? gender differences in participation in physical science and engineering. in s. j. ceci & w. m. williams (eds.), why aren’t more women in science? top researchers debate the evidence (pp. 199–210). washington, dc: american psychological association. doi: 10.1037/11546-016
eccles, j. s., adler, t. f., futterman, r., goff, s. b., kaczala, c. m., meece, j. l., & midgley, c. (1983). expectancies, values, and academic behaviors. in j. t. spence (ed.), achievement and achievement motivation (pp. 75–146). san francisco: w. h. freeman.
eccles parsons, j., adler, t. f., & kaczala, c. m. (1982). socialization of achievement attitudes and beliefs: parental influences. child development, 53, 310–321. doi: 10.2307/1128973
eccles, j. s., & jacobs, j. e. (1987). social forces shape math attitudes and performance. in m. r. walsh (ed.), the psychology of women: ongoing debates (pp. 341-354). new haven, us: yale university press.
eccles, j. s., & wigfield, a. (1995).
in the mind of the actor: the structure of adolescents’ achievement task values and expectancy-related beliefs. personality and social psychology bulletin, 3, 215-225. doi: 10.1177/0146167295213003 frome, p. m., & eccles, j. s. (1998). parents’ influence on children’s achievement-related perceptions. journal of personality and social psychology, 74, 435–452. doi: 10.1037/0022-3514.74.2.435 gniewosz, b., eccles, j. s., & noack, p. (2012). secondary school transition and the use of different sources of information for the construction of the academic self-concept. social development, 21, 537-557. doi: 10.1111/j.1467-9507.2011.00635.x gniewosz, b., eccles, j. s., & noack, p. (2014). early adolescents’ development of academic self-concept and intrinsic task value: the role of contextual feedback. journal of research on adolescence, 25, 1-15. doi: 10.1111/jora.12140. gniewosz, b., & noack, p. (2012). mamakind or papakind? [mom’s child or dad’s child]: early adolescents’ parental preferences in intergenerational academic value transmission. learning and individual differences, 22, 544-548. doi:10.1016/j.lindif.2012.03.003 gunderson, e. a., ramirez, g., levine, s. c., & beilock, s. i. (2012). the role of parents and teachers in the development of gender-related math attitudes. sex roles, 66, 156-166. doi: 10.1007/s11199-011-99962 hofstede, g. (2003). culture’s consequences. thousand oaks, ca: sage. holopainen, l., kairaluoma, l., nevala, j., ahonen, t., & aro, m. (2004). lukivaikeuksien seulontatesti nuorille ja aikuisille. [dyslexia screening test for youth and adults]. jyväskylä: jyväskylän yliopistopaino. jacobs, j. e. (1991). influence of gender stereotypes on parent and child mathematics attitudes, journal of educational psychology, 83, 518–527. doi: 10.1037/0022-0663.83.4.518 jacobs, j. e., & eccles, j. s. (2000). parents, task values, and real-life achievement-related choices. in c. sansone & j. m. 
harackiewicz (eds.), intrinsic and extrinsic motivation: the search for optimal motivation and performance (pp. 405–439). san diego, ca: academic press, inc. jacobs, j. e., lanza, s., osgood, d. w., eccles, j. s., & wigfield, a. (2002). changes in children’s selfcompetence and values: gender and domain differences across grades one through twelve. child development, 73, 509-527. doi: 10.1111/1467-8624.00421 lau, s., & pun, k-t. (1999). parental evaluations and their agreement: relationship with children’s selfconcepts. social behavior and personality: an international journal, 27, 639–650. doi: 10.2224/sbp.1999.27.6.639 maccoby, e. e. (1998). the two sexes: growing up apart, coming together. cambridge, ma: belknap press. pesu%et%al% % % | f l r ! ! 108! marsh, h. w. (1990). a multidimensional, hierarchical self-concept: theoretical and empirical justification. educational psychology review, 2, 77-172. doi: 10.1007/bf01322177 marsh, h. w., byrne, b. m., & shavelson, r. (1988). a multifaceted academic self-concept: its hierarchical structure and its relation to academic achievement. journal of educational psychology, 80, 366-380. doi: 10.1037/0022-0663.80.3.366 marsh, h. w., trautwein, u., lüdtke, o., köller, o. & baumert, j. (2005). academic self-concept, interest, grades, and standardized test scores: reciprocal effects models of causal ordering. child development, 76, 397-416. doi: 10.1111/j.1467-8624.2005.00853.x marsh, h. w., & yeung, a. s. (2001). an extension of the internal/external frame of reference model: a response to bong (1998). multivariate behavioral research, 36, 389-420. doi: 10.1207/s15327906389420 mcgrath, e. p., & repetti, r. l. (2000). mothers’ and fathers’ attitudes toward their children’s academic performance and children’s perceptions of their academic competence. journal of youth and adolescence, 29, 713–723. doi: 10.1023/a:1026460007421 nagy, g., watt, h. m. g., eccles, j., trautwein, u., lüdtke, o., & baumert, j. (2010). 
the development of students’ mathematics self-concept in relation to gender: different countries, different trajectories? journal of research on adolescence, 20, 482-506. doi: 10.1111/j.1532-7795.2010.00644.x pesu, l., viljaranta, j., & aunola, k. (2016). the role of parents’ and teachers’ beliefs in children’s selfconcept development. journal of applied developmental psychology, 44, 63-71. doi: 10.1016/j.appdev.2016.03.001 phillips, d. a. (1987). socialization of perceived academic competence among highly competent children. child development, 58, 1308–1320. doi: 10.2307/1130623 pintrich, p. r. & schunk, d. h. (2008). motivation in education. theory, research and applications (3rd ed.). new jersey: pearson education. preckel, f., niepel, c., schneider, m., & brunner, m. (2013). self-concept in adolescence: a longitudinal study on reciprocal effects of self-perceptions in academic and social domains. journal of adolescence, 36, 1165-1175. doi: 10.1016/j.adolescence.2013.09.001 räsänen, p., & leino, l. (2005). ktlt. laskutaidon testi. opas yksilö-tai ryhmämuotoista arviointia varten. seymour, p. h., aro, m., & erskine, j. m. (2003). foundation literacy acquisition in european orthographies. british journal of psychology, 94, 143-174. doi: 10.1348/000712603321661859 shavelson. r. j., hubner, j. j., & stanton, g. c. (1976). self-concept: validation of construct interpretations. review of educational research, 46, 407-441. doi: 10.2307/1170010 simpkins, s. d., fredricks, j. a., & eccles, j. s. (2012). charting the eccles’ expectancy-value model from mothers’ beliefs in childhood to youths’ activities in adolescence. developmental psychology, 48, 1019-1032. doi: 10.1037/a0027468 statistics finland (2010). differences between municipalities in educational level of population were still considerable in 2009. helsinki: statistics finland. retrieved march 20, 2016, from http:// www.stat.fi/til/vkour/2009/vkour_2009_2010-12-03_tie_001_en.html valentine, j. c., dubois, d. 
l., & cooper, h. (2004). the relation between self-beliefs and academic achievement: a meta-analytic review. educational psychologist, 39, 111-133. doi: 10.1207/s15326985ep3902_3 välijärvi, j., kupari, p., linnakylä, p., reinikainen, p., sulkunen, s., törnroos, j., & arffman, i. (2007). the finnish success in pisa-and some reasons behind it: pisa 2003. 2. jyväskylän yliopisto, koulutuksen tutkimuslaitos. watt, h. m. g. (2004). development of adolescents’ self-perceptions, values, and task-perceptions according to gender and domain in 7ththrough 11th-grade australian students. child development, 75, 15561574. doi: 10.1111/j.1467-8624.2004.00757.x wigfield, a., & eccles, j. s. (2000). expectancy-value theory of achievement motivation. contemporary educational psychology, 25, 68–81. doi: 10.1006/ceps.1999.1015 wigfield, a., eccles, j. s., maciver, d., reuman, d. a., & midgley, c. (1991). transitions during early adolescence: changes in children’s domain-specific self-perceptions and general self-esteem across the pesu%et%al% % % | f l r ! ! 109! transition to junior high school. developmental psychology, 27, 552-565. doi: 10.1037/00121649.27.4.552 wigfield, a., eccles, j. s., schiefele, u, roeser, r. w., & davis-kean, p. (2006). development of achievement motivation. in n. eisenberg, w. damon & r. m. lerner (eds.), handbook of child psychology: vol. 3, social, emotional, and personality development (6th ed.) (pp. 933-1002). hoboken, nj, us: john wiley & sons inc. wigfield, a., eccles, j. s., yoon, k. s., harold, r. d., arbreton, a. j. a., freeman-doan, c., & blumenfeld, p. c. (1997). change in children’s competence beliefs and subjective task values across the elementary school years: a 3-year study. journal of educational psychology, 89, 451-469. doi: 10.1037/0022-0663.89.3.451 wigfield, a., tonks, s., & eccles, j. s. (2004). expectancy value theory in cross-cultural perspective. in d. m. mcinerney & s. van etten (eds.), big theories revisited (pp. 165-198). 
charlotte, nc:iap. microsoft word malmberg et al_publication.docx frontline learning research vol.4 no. 5 (2016) 62 -‐ 82 issn 2295-‐3159 contact information: lars-erik malmberg, department of education, university of oxford, 15, norham gardens, ox2 6py, oxford, uk. email: lars-erik.malmberg@education.ox.ac.uk. doi: http://dx.doi.org/10.14786/flr.v4i5.227 within-students variability in learning experiences, and teachers' perceptions of students' task-focus lars-erik malmberga, wee h. t. lima, asko tolvanenb and jari-erik nurmib a university of oxford, united kingdom b university of jyväskylä, finland article received 16 november / revised 7 july / accepted 7 september / available online 11 january abstract in order to advance our understanding of educational processes, we present a tutorial of intraindividual variability. an adaptive educational process is characterised by stable (less variability), and a maladaptive process is characterised by instable (more variability) learning experiences from one learning situation to the next. we outline step by step how we specify a multilevel structural equation model of state, trait and individual differences in intraindividual variability constructs, which can be appropriately fitted to intraindividual data (e.g., time-points nested in persons, intensive longitudinal data). in total 285 primary school students’ (years 5 and 6) completed the learning experience questionnaire using handheld computers, on average 13.6 learning episodes during one week (sd = 4.6; range = 5-29; nepisodes = 3,433). we defined mean squared successive differences (mssd) for each manifest indicator of task difficulty, competence evaluation and intrinsic motivation. we also demonstrate how to specify multivariate models for investigating convergent validity of the variability constructs. 
overall, our study provides support for intraindividual variability as a construct in its own right, which has the potential to provide novel insight into students’ learning processes.

keywords: intraindividual variability; multilevel structural equation model (msem); learning experience; ecological momentary assessment

1. introduction

there is a growing interest in the study of students’ learning processes using diary and real-time data (schmitz, 2006). these micro-longitudinal studies expand our knowledge about learning processes beyond what we can learn from single time-point cross-sectional studies, in at least three ways. first, there is considerable variation in students’ learning experiences, e.g., their engagement, beliefs, motivation, emotions, and performance, from one situation to another (i.e., intraindividual variation), more so than there is variation between students (i.e., interpersonal variation; schmitz & skinner, 1993). second, situation-specific learning experiences vary as a function of contextual features, such as perceived autonomy support (tsai, kunter, lüdtke, & trautwein, 2008), and extrinsic motivation (malmberg, pakarinen, vasalampi, & nurmi, 2015). this means that situation-specific opportunities and constraints, such as provision of support and levels of expectation, form an integral part of students’ learning experiences. third, students’ individual characteristics can moderate the relationship between experiences. compared with relatively lower achievers, higher achievers had more stable control beliefs and perceived task ease from one situation to the next (musher-eizenman, nesselroade, & schmitz, 2002), and exerted more effort when confronted with difficult tasks (malmberg, walls, martin, little, & lim, 2013).

what we know less about is the intraindividual variability in students’ learning experiences from one situation to the next.
whilst intraindividual variation captures the differences between individuals’ experiences above or below their own average experience (i.e., an “individual standard deviation” of own “ups” and “downs”), intraindividual variability, inconsistency, or instability refers to the magnitude of short-term fluctuations in the order of the ups and downs from one time-point to the next (e.g., jahng, wood, & trull, 2008; kernis, grannemann, & barclay, 1989). this magnitude of intraindividual variability is larger when the shifts between highs and lows are more abrupt, occur more often, and the swings go from one extreme to the other.

in the present study we go beyond previous real-time studies of students’ learning experiences in two ways. first, we propose a methodology for specifying a within-person variability construct alongside state and trait constructs, using state-of-the-art multilevel structural equation models (msem). the msem allows us to model latent constructs net of measurement error at two (or more) levels of data. second, we include teacher-perceived student task-focus as an indicator of convergent validity of students’ intraindividual variability. accumulated research shows that students who, in the eyes of their teacher, are generally task-focused are intrinsically motivated, deploy task-focused rather than task-avoidant behavioural engagement, exert effort, seek help when they need it, and persist when they encounter difficulties (eccles, wigfield, & schiefele, 1998; nurmi, hirvonen & aunola, 2008; zimmerman, 2000). it would be important to know whether students whom teachers regard as task-focused are also more stable in their learning experiences, i.e., show less variability in their perceptions of task difficulty, competence beliefs and intrinsic motivation from one learning situation to the next.
to this end we provide a brief overview of intraindividual research in education, task-focused learning, a didactical example of the mean squared successive difference (mssd) index of intraindividual variability, and an msem specification.

1.1. intraindividual research in education

there appears to be a surge in intraindividual research in education. since the seminal diary studies by schmitz and skinner (1993) and musher-eizenman et al. (2002), there has been an upswing in the number of publications, for example schmitz and wiese (2006) and tsai et al. (2008). recent studies have used experience sampling of students’ academic emotions (goetz, frenzel, stoeger & hall, 2010), coping with boredom (nett, goetz & hall, 2011) and metacognitive strategies (nett, goetz, hall & frenzel, 2012); ecological momentary assessment studies of effort exertion, competence beliefs and task difficulty (malmberg, walls et al., 2013); and contextual activity sampling of university students’ challenge, competence and emotions (inkinen et al., 2013). data in these studies were collected at multiple time-points in participants’ natural settings, as close in time as possible to events, thus reducing retrospection bias (wilhelm, perrez, & pawlik, 2012).

the importance of the intraindividual perspective on learning experiences is threefold. these studies pave the way for understanding, first, learning processes as they occur in real time; second, individual differences in such learning processes; and third, how teachers might differentially support individual students. taken together, an intraindividual approach to learning can help us understand both learning processes and the ways in which teachers can support these (schmitz, 2006).

1.2. intraindividual variation and variability

in the research fields of personality and psychiatry, affect instability is characteristic of personality disorders (jahng et al., 2008; trull et al., 2008), with particular focus on negative mood (eid & langeheine, 2003), affect (eid & diener, 1999), mood and job satisfaction (ilies & judge, 2002), affect and mood instability (jahng et al., 2008), short-term fluctuations in self-esteem (kernis et al., 1989), and mood variability (mcconville & cooper, 1997). expanding into other fields, recent studies of intraindividual variability include secure attachment (la guardia, ryan, couchman, & deci, 2000), temperament (hooker, nesselroade, nesselroade, & lerner, 1987), perceived control (eizenman, nesselroade, featherman, & rowe, 1997), and coping (roesch et al., 2010).

a range of techniques have been suggested for aggregating measures of within-person variability (for a review, see jahng et al., 2008): the intraindividual standard deviation (or variance), first-order autocorrelation coefficients r, and the mean square successive difference (mssd; von neumann, kent, bellinson, & hart, 1941). while the intraindividual standard deviation is intuitively appealing, it does not capture the frequency of change (larsen, 1987). the mssd calculates an aggregate that takes the sequential order of the events into account (equation 1).

$$\mathrm{mssd} = \frac{1}{n-1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2 \qquad (1)$$

where $x_{i+1}$ is the lagged value of $x_i$. the squared difference between $x_{i+1}$ and $x_i$ ensures that the magnitude of the successive differences is captured. for a series of n observations there are n-1 successive differences in the dataset (see appendix 1).

in a didactic simulation shown in figure 1 we exemplify the behaviour of the mean (m), standard deviation (sd), the mean square successive difference (mssd), and the autocorrelation (r), in three scenarios (panels a, b and c). for a similar simulation see jahng et al. (2008).
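the three scenarios can be reproduced in a few lines of code. the following is a minimal python sketch, not the simulation behind figure 1: the series `panel_a` is a hypothetical set of ratings on a 1-4 scale, the function names are illustrative, and the lag-1 autocorrelation is assumed to be computed around the overall series mean, which may differ from the estimator used for the figure. the second series doubles every value and the third sorts the same values in descending order.

```python
from statistics import mean, stdev


def mssd(x):
    """Mean squared successive difference: sum of (x[i+1] - x[i])**2 divided by n - 1."""
    squared_diffs = [(nxt - cur) ** 2 for cur, nxt in zip(x, x[1:])]
    return sum(squared_diffs) / len(squared_diffs)


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation around the overall mean (an assumed, common estimator)."""
    m = mean(x)
    numerator = sum((cur - m) * (nxt - m) for cur, nxt in zip(x, x[1:]))
    denominator = sum((value - m) ** 2 for value in x)
    return numerator / denominator


# Hypothetical ratings (1 = low, 4 = high); not the study's data.
panel_a = [2, 4, 1, 3, 4, 1, 2, 3, 1, 4]
panel_b = [2 * value for value in panel_a]   # same order, doubled magnitude
panel_c = sorted(panel_a, reverse=True)      # same values, descending order

for label, series in [("a", panel_a), ("b", panel_b), ("c", panel_c)]:
    print(label,
          round(mean(series), 2),
          round(stdev(series), 2),
          round(mssd(series), 2),
          round(lag1_autocorrelation(series), 2))
```

in this sketch the mean and sd are identical for the first and third series, the autocorrelation is identical for the first and second, and only the mssd differs across all three; doubling the values multiplies the mssd by four because the differences are squared.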
when we observe the raw data in panel a (figure 1) we find that the m and sd are the same as in panel c, in which the data have been rank-ordered in descending order. the m and sd in panel b, in which each data-point has been multiplied by two, equal the m and sd in panel a multiplied by two. while the sd indeed captures variation, it is not sufficient for capturing the magnitude of change from one time-point to the next, as it ignores the order of the observations. the stability over time captured by the autocorrelation r remains the same in panels a and b, demonstrating that r does not capture the magnitude of change either. the autocorrelation coefficient r is different in panel c, demonstrating that the order of events matters. finally, mssd differs in all three panels, demonstrating that it is sensitive to both magnitude (panel b) and order of change (panel c). in the present study we use mssd for investigating intraindividual variability.

in previous studies, a range of models for investigating lagged associations have been specified, including time-series and spectral analysis (larsen, 1987; ram et al., 2005), the mixed-effects location scale model (li & hedeker, 2012), generalized multilevel model (jahng et al., 2008), and mixture distribution models (eid & langeheine, 2003). however, these models do not correct for measurement error in constructs. to do so, we calculated mssd for each indicator of our latent constructs and modelled these using multilevel structural equation models (msem). although time-series analysis typically requires longer stretches of time-points, the mssd method has been suggested to be robust for shorter time-series as well, e.g., a number of time-points during each day (ebner-priemer, eid, kleindienst, stabenow, & trull, 2009).

figure 1. three example time-series and indices of intraindividual variability (cf. jahng et al., 2008).
note: panel a represents one sample student for whom 29 situation reports were observed for intrinsic motivation (1 = low motivation, 4 = high motivation). panel b represents each numerical value in panel a multiplied by 2, so the scale now spans 2 to 8. panel c represents the raw data from panel a but now rank-ordered in descending order. m = mean, sd = standard deviation, mssd = the mean square successive difference, and r = autocorrelation.

1.3. research questions and hypotheses

a) what is the structural validity of the state, trait and intraindividual variability constructs?
b) what is the association between trait and intraindividual variability constructs?
c) how do trait and intraindividual variability constructs of students’ learning experiences converge with teacher-reported task-focus?

hypothesis 1: we expected convergence between teacher-reports of students' task-focus (nurmi et al., 2008) and students’ learning experiences, with higher levels of task-focus positively and moderately associated with trait-levels of each construct, and negatively associated with variability constructs (i.e., higher task-focus less variability, lower task-focus more variability).

2. method

2.1. sample and procedure

in total, 353 students in 16 classrooms in 11 schools participated in the learning every lesson (lel) study (for details see malmberg, woolgar, & martin, 2013; malmberg, walls et al., 2013; malmberg et al., 2015), with informed parental or guardian consent. the study was carried out in two quite diverse areas in southeast england, uk. students were asked to complete the electronic learning experience questionnaire (leq) on a personal digital assistant (pda) at the end of each learning episode or at least once per lesson. teachers or teaching assistants were asked to complete a brief one-page report of each student they taught. teaching arrangements differed across the classes.
in half of the classrooms, one teacher reported on all his or her students; in four classrooms two teachers reported on the students; in two classrooms, there was a mix of students with one or two teacher reports; and in another two classrooms, two or three teachers reported. in order to investigate the correspondence between students’ and teachers’ views of the students, in the final study sample we included all observations for which both teacher and student reports for any given student were available. the intraclass correlation for teacher-reported task-focus was $r_{icc}$ = .08 between classrooms and $r_{icc}$ = .08 between teachers (malmberg et al., 2015). in order to not burden the models with additional hierarchical levels, teacher reports were aggregated for each student, weighted for the number of experiences with each teacher. however, for the purpose of aggregating mssd-indices of the lagged relationships between the time-points, we carried out analyses for those students who had at least five time-points of data available (roughly the possible number of reports per day). there were 285 students who reported on 3,433 learning episodes: on average 13.6 learning episodes (sd = 4.6; range = 5-29) combined with 434 teacher reports (139 students had one teacher report, 143 had two reports and 3 had three reports). of these there were 126 boys (44.2%) and 159 girls (55.8%), 104 were in year 5 (36.5%) and 181 in year 6 (63.5%). they were 10.5 years old on average (sd = 0.64).

2.2. student-reported measures

students’ learning experiences were measured using the validated leq (reliability, structural and external validity), covering sources of motivation, learning behaviour, competence evaluation and affect (malmberg, woolgar, & martin, 2013).

2.2.1. task difficulty

students completed a single item measuring task difficulty: “the learning task i was doing was”, on a four-point scale (1 = very easy, 4 = very hard).

2.2.2. competence evaluation

students responded to two items indicating competence evaluation (mean α = .70; sd of α = .18): “how well were you doing at this task” on a five-point scale (1 = poorly, 5 = very well), and “how much did you understand” on a four-point scale (1 = all of it, 4 = none of it; reverse-coded).

2.2.3. intrinsic motivation

students were asked “why were you doing this task?” and responded to three items measuring intrinsic motivation: “i enjoyed it”, “i chose to do it”, and “i was interested in it”. when we split the data by day and learning experience, the average internal consistency was mean α = .85 (sd of α = .09).

2.3. teacher-reported measures

teachers reported on each student’s task-focused characteristics and behaviour.

2.3.1. task-focus

we used teacher-reports of each student’s task-focus in school in general. task-focus was measured with six items modified from the observer-rating scale of achievement strategies (osas; nurmi & aunola, 1998), and the behavioural strategy rating scale ii (bsr-ii; aunola, nurmi, parrila, & onatsu-arvilommi, 2000; zhang, nurmi, kiuru, lerkkanen, & aunola, 2011). teachers were asked to think about each student’s behaviour and work habits in class, and respond on five-point scales (0 = not at all, 1 = rarely, 2 = sometimes, 3 = often, 4 = very often), to what extent each of the six statements characterises the way each student typically behaves in learning situations. half of the items were positively worded (indicating task-focus): “actively attempts to solve even difficult tasks”, “demonstrates initiative and persistence in activities and tasks”, and “tries hard to finish even difficult tasks”. the three negatively worded items (indicating task-avoidance) were: “has a tendency to find something else to do, instead of focusing on the task at hand”, “gives up easily”, and “loses focus if a task or activity is not going well” (α = .88).
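because three of the six items are worded towards task-avoidance, they have to be reverse-coded before all items point in the same direction. the following is a minimal python sketch of that step, assuming a simple mean score on the 0-4 scale; the item keys and the function name are hypothetical, and a manifest mean score is only an illustration, since the study itself models task-focus as a latent construct.

```python
# Hypothetical item keys; ratings use the 0-4 response scale described in the text.
POSITIVE_ITEMS = ("attempts", "initiative", "tries_hard")
NEGATIVE_ITEMS = ("finds_something_else", "gives_up", "loses_focus")
SCALE_MAX = 4


def task_focus_score(ratings):
    """Mean of the six items after reverse-coding the task-avoidance items.

    Reverse-coding on a 0-4 scale maps 0 <-> 4 and 1 <-> 3, and leaves 2 unchanged.
    """
    values = [ratings[key] for key in POSITIVE_ITEMS]
    values += [SCALE_MAX - ratings[key] for key in NEGATIVE_ITEMS]
    return sum(values) / len(values)


# Illustrative teacher report for one student (made-up values).
example_report = {
    "attempts": 4, "initiative": 3, "tries_hard": 4,
    "finds_something_else": 1, "gives_up": 0, "loses_focus": 2,
}
score = task_focus_score(example_report)
```

with this coding a higher score always means more focus on tasks, whichever way an item was originally worded.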
we specified the construct so that higher values indicated more focus on tasks. task-focus was strongly and positively related to academic performance (malmberg et al., 2015).

2.4. analytic procedures

we specified multilevel structural equation models (msem) in mplus (muthén & muthén, 2012). at the within level we specified a latent state construct $\xi_{w1}$ using $x_1$ to $x_3$ as indicators (see figure 2). at the between level, we specified a corresponding between-level trait construct $\xi_{b1}$, equating factor loadings across the levels for metric invariance between the state and trait constructs (morin, marsh, nagengast, & scalas, 2014). we then specified a second between-level construct, which captures interindividual differences in intraindividual variability, $\xi_{b2}$, using k indicators.

figure 2. msem of state, trait and variability constructs.

note: indicators are raw data of time-points (t) nested in students (i). circles above (at the between level, e.g., $x_{1b}$) and below (at the within level, e.g., $x_{1w}$) the indicators depict latent constructs of decomposed between- and within-level indicators respectively. there is one within-level latent construct ($\xi_{w1}$) and two between-level constructs ($\xi_{b1}$ and $\xi_{b2}$), with factor loadings (λ, one-headed arrows) linking constructs to level-specific indicators. variances of latent constructs are indicated by double-headed arrows (ψ). residuals of indicators (ε; measurement error at the within level) are also depicted with double-headed arrows. the mean-structure (triangle with 1 inside) is estimated at the between level (i.e., cluster-intercepts, τ).

in the dataset we created lagged variables ($x_{k,t+1}$) of each indicator ($x_{kt}$) for each student. this gave 285 additional lines of data, one for each participant in our data-matrix, giving a total of $n_{ti}$ = 3,718 lines of data (see appendix 1). we then, in mplus, defined intraindividual squared successive differences $(x_{k,t+1} - x_{kt})^2$, which we used as indicators (see appendix 2).
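the lagging step can be illustrated outside mplus. the following is a minimal python sketch under stated assumptions, not the code in the appendices: the long-format records, the identifiers `s1` and `s2`, and the function name are hypothetical. the point it shows is that each observation is paired with the next observation of the same student only, so that no difference spans two students.

```python
from collections import defaultdict

# Hypothetical long-format records: (student_id, time_point, rating).
records = [
    ("s1", 1, 2), ("s1", 2, 4), ("s1", 3, 3),
    ("s2", 1, 1), ("s2", 2, 1), ("s2", 3, 4), ("s2", 4, 2),
]


def squared_successive_differences(rows):
    """Return {student_id: [(x[t+1] - x[t])**2, ...]}, computed within each student."""
    ratings_by_student = defaultdict(list)
    # Sort by student and time-point so that successive entries are adjacent in time.
    for student, _time_point, rating in sorted(rows, key=lambda row: (row[0], row[1])):
        ratings_by_student[student].append(rating)
    return {
        student: [(nxt - cur) ** 2 for cur, nxt in zip(ratings, ratings[1:])]
        for student, ratings in ratings_by_student.items()
    }


sq_diffs = squared_successive_differences(records)
# Averaging the n - 1 squared differences per student gives that student's mssd.
mssd_by_student = {student: sum(d) / len(d) for student, d in sq_diffs.items()}
```

a student with n time-points contributes n-1 squared differences, and the per-student average of these values is the mssd of equation 1.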
the scalar of the mssd equation, $\frac{1}{n-1}$, did not need to be applied explicitly, as there are n-1 successive differences for each participant: averaging the squared successive differences amounts to dividing their sum by n-1. we specified msems, presented in figures 3-5, for each construct using one (difficulty), two (competence), and three indicators (intrinsic motivation) for each latent construct separately (models 1-3). we then illustrated how to specify three multivariate models, presented in figure 6, for investigating convergence between trait and variability constructs, and between variability and task-focus (models 4-6). we inspected indices of convergence (association of higher magnitude where expected) and divergence (lack of association where expected; campbell & fiske, 1959). model fit was assessed by inspecting cut-offs for goodness of fit indices: ≤.06 for good model fit using the root mean square error of approximation (rmsea) and the standardized root mean square residual for the within ($\mathrm{srmr}_{w}$) and the between level ($\mathrm{srmr}_{b}$), and ≥.90 for acceptable and ≥.95 for good model fit for the comparative fit index (cfi; browne & cudeck, 1993). assuming data were missing at random (mar), we treated missing data (4.8% of the data-points in the dataset with the non-lagged variables) using the default full information maximum likelihood (fiml) algorithm in mplus (muthén & muthén, 2012). we used the robust maximum likelihood estimator (mlr), which corrects standard errors for non-normality.

3. results

in order to test structural validity of the state, trait, and variability constructs of each learning experience, we present univariate msems specified with one manifest indicator (task difficulty), two indicators (competence evaluation), and three indicators (intrinsic motivation). to investigate the association between trait and variability constructs we report on the correlation between these latent constructs.

3.1. univariate models

as shown in fig 3, we illustrate how to specify our proposed model using a single item indicator. to identify this model we fixed a number of parameters: all factor loadings (at 1), and residuals (at 0). the pooled within-level variance was $\psi_{w1}$ = 0.88 and the between-level variance $\psi_{b1}$ = 0.26, showing that 22.5% of the variance of task difficulty resided at the between level. we note that the variance of the variability construct, $\psi_{b2}$ = 2.05, was larger than the variance of the trait construct. the association between trait task difficulty and variability in task difficulty was ρ = 0.46, that is, the more difficult tasks appeared on average during the week, the more variability in task difficulty (i.e., larger ups and downs in task difficulty during the week).

figure 3. multilevel structural equation model of latent state, trait and intraindividual variability of task difficulty.

note: manifest indicators are diff = task-difficulty.

as shown in fig 4, we illustrate how to specify our proposed model using two indicators. the pooled within-level variance was $\psi_{w1}$ = 0.32 and the between-level variance $\psi_{b1}$ = 0.13, showing that 29.5% of the variance of competence beliefs resided at the between level. we note that the variance of the variability construct, $\psi_{b2}$ = 0.66, was larger than the variance of the trait construct. the association between trait competence belief and variability in competence belief was ρ = -0.72, that is, the more competent students thought they were on average during the week, the less variable they thought their competences were during the week (i.e., smaller ups and downs in competence belief during the week).

figure 4. multilevel structural equation model of latent state, trait and intraindividual variability of competence belief.

note: manifest indicators are well = how well?, und = understanding.

as shown in fig 5, we illustrate how to specify our proposed model using three indicators.
the pooled within-level variance was $\psi_{w1}$ = 0.51 and the between-level variance $\psi_{b1}$ = 0.32, showing that 38.7% of the variance of intrinsic motivation resided at the between level. we note that the variance of the variability construct, $\psi_{b2}$ = 1.26, was larger than the variance of the trait construct. the association between trait intrinsic motivation and variability in intrinsic motivation was ρ = -0.35, that is, the more intrinsically motivated students thought they were on average during the week, the less their motivation fluctuated during the week (i.e., smaller ups and downs in intrinsic motivation during the week).

figure 5. multilevel structural equation model of latent state, trait and intraindividual variability of intrinsic motivation.

note: manifest indicators are enj = enjoyment, int = interest, and cho = choice.

3.2. multivariate models

in models 4 to 6 we present three possible models for investigating convergent validity of the variability constructs (see fig 6). in model 4 we specified three state constructs at the within level, with three corresponding trait constructs at the between level, and three variability constructs. the structural parameters of interest are the associations between the trait construct and variability construct of task difficulty (ρ = 0.46), competence belief (ρ = -0.64) and intrinsic motivation (ρ = -0.35) respectively. these associations were similar to the ones found in the separate univariate models.

in model 5 we specified associations between the three trait constructs and teacher-reported student task-focus, and between variability constructs and task-focus. more task-focused students, on average during the week, found tasks easier (ρ = -0.19), felt more successful (ρ = 0.38), and were more intrinsically motivated (ρ = 0.25).
more task-focused students found tasks of more equal difficulty (ρ = -0.26) and fluctuated less in their competence beliefs (ρ = -0.35), but were not significantly less variable in their intrinsic motivation.

in model 6 we specified a higher-order construct of variability. this means that the factor loadings of the higher-order construct explain the associations between the latent constructs. students who were more variable overall (i.e., had a higher value on the higher-order variability construct) found, on average during the week, tasks more difficult (ρ = 0.50), rated their competence lower (ρ = -0.62) and were less intrinsically motivated (ρ = -0.32).

figure 6. multivariate models of state, trait and variability constructs. note: only structural parts of the models are shown for clarity. estimates (standardized correlations) are from mplus 7.4 (muthén & muthén, 2012).

4. discussion

in order to advance our understanding of students’ learning processes in real time, we investigated intraindividual variability in students’ learning experiences, and convergence between intraindividual variability and teacher-reported task-focus. inclusion of such intraindividual variability construct(s) in process models of learning experiences would expand current modelling practices in the field. up to now, models include: (1) decomposition of learning experiences into within (time-points) and between (students) components, (2) random (moderator) effects of perceptions of the context on learning experiences, and (3) fixed and moderation effects of personal characteristics on learning experiences. as the mssd captures both magnitude and order of events (von neumann et al., 1941; jahng et al., 2008), we created a dataset with lagged variables and specified aggregated variables to use in msems. we specified three latent constructs: a state-construct at the within-level, and a trait and an intraindividual variability construct at the between-level.
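as a minimal illustrative sketch (not the authors' code, which is the mplus syntax in appendix 2), the lagging and aggregation steps described above can be written in python. the series is the single-student example listed in appendix 1; the function names and the (student_id, value) record layout are assumptions introduced here for illustration only.

```python
from statistics import stdev

# intrinsic-motivation series for one student: the 28 values listed in appendix 1
INTR = [4.0, 3.5, 2.0, 2.5, 4.0, 2.0, 2.0, 2.0, 2.0, 2.5, 1.0, 2.0, 2.5, 2.5,
        2.5, 2.0, 1.0, 2.5, 2.0, 1.5, 1.5, 2.5, 1.5, 1.0, 2.0, 1.0, 1.0, 1.0]


def lagged_pairs(series):
    """pair each observation at time t+1 with its predecessor at time t."""
    return list(zip(series[1:], series[:-1]))


def mssd(series):
    """mean square successive difference: the sum of squared lag-1
    differences divided by the number of successive pairs (n - 1)."""
    pairs = lagged_pairs(series)
    if not pairs:
        raise ValueError("need at least two observations")
    return sum((later - earlier) ** 2 for later, earlier in pairs) / len(pairs)


def mssd_by_student(records):
    """aggregate to one mssd value per student (a between-level indicator).

    `records` is a hypothetical iterable of (student_id, value) tuples,
    already sorted in time order within each student. students with a
    single observation are skipped because no lagged pair exists."""
    by_student = {}
    for student_id, value in records:
        by_student.setdefault(student_id, []).append(value)
    return {sid: mssd(vals) for sid, vals in by_student.items() if len(vals) > 1}


if __name__ == "__main__":
    print("mssd:", round(mssd(INTR), 2))
    reordered = sorted(INTR)  # same values, different order
    print("sd original / sorted:", round(stdev(INTR), 2), round(stdev(reordered), 2))
    print("mssd sorted:", round(mssd(reordered), 2))
```

for the appendix 1 series the function returns 21/27 ≈ 0.78, in line with the hand-calculation. sorting the same values leaves the standard deviation unchanged but lowers the mssd, which illustrates why the index reflects the order as well as the magnitude of fluctuations.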
in response to our first research question regarding the structural validity of state-trait and variability dimensions of learning experiences, we found support for the specificity of the intraindividual variability dimension. importantly, this suggests that intraindividual variability in learning experiences such as motivation, adaptive behaviours, and competence evaluations captures an important dimension of students’ experiences of learning, in addition to variability of constructs in other fields of research, e.g., affect instability (trull et al., 2008). in response to our second research question, we confirmed the hypothesis that trait and variability dimensions of learning experiences converged with teacher-reported task-focus. importantly, teacher reports of higher task-focus were related to more adaptive learning experiences on average during the week (i.e., less difficulty, feeling more competent, experiencing higher intrinsic motivation), and to less variability in these same learning experiences (i.e., a smaller magnitude in momentary fluctuations from one learning episode to the next).

4.1. state, trait and individual differences in intraindividual variability

we found three sources of support for the distinction between state, trait and variability dimensions of the same construct. first, msem of each learning experience construct in turn (models 1-3) suggested that state, trait and intraindividual variability are separable constructs. importantly, this expands existing two-level models in which states and traits have been modeled at the within and between levels respectively (e.g., roesch et al., 2010). second, inspection of associations between trait and intraindividual variability constructs at the between level suggested convergence; that is, the trait and variability dimensions of each construct were moderately to strongly associated (|ρ| = .35 to .64).
third, model 6 suggested that intraindividual variability could be specified as a higher-order construct. taken together, our msem using aggregates of mssds of each indicator was deemed successful for portraying the variability dimension. going beyond previous studies (malmberg et al., 2013; schmitz & skinner, 1993) which have shown that there is more variance within (i.e., intraindividual) than between students (i.e., interindividual difference), we suggest it is possible to retrieve systematic variance of intraindividual experiences by specifying intraindividual variability dimensions of constructs. importantly, this demonstrates that there are systematic individual differences in how students vary within themselves.

there are at least two research contexts in which it could be useful to implement the specification of such an intraindividual variability construct in its own right. first, it could be possible to design intervention studies with bursts of collections of intensive longitudinal data (schmitz, 2015; walls, barta, stawski, collyer, & hofer, 2011). if measures of intraindividual variability could be obtained at both pre- and post-tests, it would be possible to investigate treatment effects geared towards decreasing such intraindividual variability. second, if reports of student-teacher interaction could be collected alongside students’ self-reported learning experiences, it would be possible to specify models in which teacher sensitivity to disengagement or off-task behaviour might alleviate intraindividual variability.

with regard to questionnaire design, we suggest it could be possible to create at least two types of psychometric measures, for researchers who do not aspire to measure processes by collecting intensive longitudinal data.
first, while previous self-report measures of emotional self-concept (i.e., emotional stability, meaning the perception of feeling calm, emotionally stable and worried; e.g., marsh, 1989) and stability of self-esteem (rosenberg, 1965) have indeed focused on intraindividual variability as a trait, the findings from our present study suggest that it could be possible to create a wider range of “variability-as-a-trait” constructs. second, while present observation instruments of students’ engagement and task-focus are designed to capture trait aspects (zhang et al., 2011), future instruments could focus on variability of such observations.

4.2. intraindividual variability and teacher perceptions of task-focus

task-focus, as a psychometric construct, is operationalized as a trait-level of students’ adaptive work habits in classrooms: the extent to which each individual student attempts difficult tasks, persists and stays focused on these (nurmi & aunola, 1998; aunola et al., 2000; zhang et al., 2011). model 5 showed that teacher-reported student task-focus was associated with less difficulty, a stronger sense of competence and more intrinsic motivation, in line with hypothesis 1. in a previous study, lower achievers were found to withdraw effort when confronted with a difficult task, while higher achievers exerted more effort (malmberg, walls, et al., 2013). consistent with the idea that students who experience learning as inherently interesting pay attention and focus on the task at hand in that learning situation (nurmi et al., 2008), teacher-rated task-focus was also related to higher levels of intrinsic motivation. this does not mean that teachers can "see" students' motivation as such (lee & reeve, 2012), but rather that intrinsic motivation is manifested as energized behaviour (sheldon & elliot, 1998). a higher level of task-focus was also related to students’ trait competence evaluation. competence belief also varied from one learning situation to another.
future studies should investigate to what extent this is linked with experiences of particular school subjects or particular teachers. taken together, the current and the previous study hint at the importance of further investigating stability and variability in the ways teachers support and place demands on students that are optimal for students’ learning over time.

there were three important associations between task-focus and variability. a higher level of task-focus was related, first, to less variability in difficulty and competence beliefs, and, second, to a higher level of the higher-order variability-construct. importantly, it appears that variability could be construed as a trait dimension in itself. the variability-construct might play an important role in models of self-regulation (boekaerts & corno, 2005), in which “top-down” self-regulation is typical of students who steer their learning processes by setting goals for enhancing their knowledge by sustaining motivation, rather than being obstructed by situational demands and setbacks typical of “bottom-up” self-regulation. an important future research task would be to investigate intraindividual variability in relation to self-set learning goals (or the lack of such goals).

there are two important implications of students’ variability in learning experiences for the different ways in which teachers can support different students. first, the variability in itself demonstrates that all students have ups and downs, less task-focused students more so than more task-focused students. it would be important for teachers to capture the “ups”, particularly of less task-focused students. teachers want to capitalize on students’ “ups” as these would be teachable moments (hamre & pianta, 2005; pianta & hamre, 2009). second, the variability in students’ learning experiences is inherently linked to experiences of the learning context.
from the teachers’ point of view there is a danger in classifying students as “engaged” or “disengaged”, as all students have their ups and downs. this means that “engaged” students have their disengaged moments. these are moments when teachers can redirect students. it also means that “disengaged” students have their engaged moments. these moments are the ones to capitalize on for learning; the others for redirection. it would be important also for teacher educators to focus on the meaning of intraindividual variability, allowing prospective teachers to focus on changes in student behaviours and actions in real time.

in future studies, it would be important to combine studies of students’ intraindividual variability and measures of teacher support. teachers in classrooms can promote students’ learning processes by supporting their autonomy (reeve, jang, carrell, jeon, & barch, 2004), being involved with students and structuring the learning contents (skinner & belmont, 1993), providing task-contingent praise (deci, koestner, & ryan, 1999), providing feedback directed at reducing discrepancies between current understanding and learning goals (hattie & timperley, 2007), and tailoring goals to individual learners (hattie, biggs, & purdie, 1996; butler & winne, 1995; pianta, belsky, vandergrift, hours, & morrison, 2008), in order to enhance motivation, effort, and selection of optimally difficult tasks.

4.3. limitations

there were three limitations of the present study. first, the lagged variables we created spanned from 5 to 29 observations, which is shorter than the typical time-series model. however, for the purpose of calculating the mssd the number of observations might be sufficient (ebner-priemer et al., 2009). future studies of students’ learning processes should aim at collecting more repeated measures for the purpose of creating longer time-series.
such a demand needs to be carefully weighed against the risk of response fatigue among participants. second, the model we applied assumes equivalent duration between each time-lag (e.g., jahng et al., 2008). thus our models do not account for unequal numbers of responses per day, unequal lags between subsequent reports, and downtime when not at school. the current findings would need to be replicated with models more suitable for unequally spaced lagged data. third, the empirical data stem from a particular age-group in a particular sociocultural context, england, so replications in other contexts and age-groups would be valuable.

4.4. conclusions

variability in intraindividual learning experiences captures the abruptness, frequency, order, and magnitude of students’ “ups and downs” in their engagement during a week at school. teacher-perceived student task-focus converged with both trait-levels and variability in task difficulty, competence beliefs and intrinsic motivation. intraindividual variability formed a higher-order construct. overall, our study provides support for intraindividual variability as a construct in its own right, which has the potential to provide novel insight into students’ learning processes.

keypoints

we investigated intraindividual variability of primary school students' experience of learning.
we used the mean square successive differences (mssd) as an index of magnitude of variability.
we specified latent state, trait, and intraindividual variability constructs using multilevel structural equation models (msem).
higher teacher-reported task-focus was related to less variability in learning experiences.
intraindividual variability is an important educational construct in its own right.

acknowledgments

the learning every lesson (lel) study was funded by the john fell foundation, and carried out during the first author’s research councils uk (rcuk) fellowship 2007-12.
references

aunola, k., nurmi, j.-e., parrila, r., & onatsu-arvilommi, t. (2000). behavioral strategy relating scale ii. unpublished measurement instrument. jyväskylä: university of jyväskylä, finland.
boekaerts, m., & corno, l. (2005). self-regulation in the classroom: a perspective on assessment and intervention. applied psychology: an international review, 54, 199-231. doi:10.1111/j.1464-0597.2005.00205.x
browne, m. w., & cudeck, r. (1993). alternative ways of assessing model fit. in k. a. bollen, & j. s. long (eds.), testing structural equation models (pp. 136-162). beverly hills, ca: sage.
butler, d. l., & winne, p. h. (1995). feedback and self-regulated learning: a theoretical synthesis. review of educational research, 65, 245-281. doi:10.3102/00346543065003245
campbell, d. t., & fiske, d. w. (1959). convergent and discriminant validation by the multitrait-multimethod matrix. psychological bulletin, 56, 81-105. doi:10.1037/h0046016
deci, e. l., koestner, r., & ryan, r. m. (1999). a meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. psychological bulletin, 125, 627–668. doi:10.1037/0033-2909.125.6.627
ebner-priemer, u. w., eid, m., kleindienst, n., stabenow, s., & trull, t. j. (2009). analytic strategies for understanding affective (in)stability and other dynamic processes in psychopathology. journal of abnormal psychology, 118, 195–202. doi:10.1037/a0014868
eccles, j. s., midgley, c., wigfield, a., buchanan, c. m., reuman, d., flanagan, c., & maciver, d. (1993). development during adolescence: the impact of stage-environment fit on young adolescents’ experiences in schools and in families. american psychologist, 48, 90-101. doi:10.1037/0003-066x.48.2.90
eccles, j. s., wigfield, a., & schiefele, u. (1998). motivation to succeed. in w. damon & n. eisenberg (eds.), handbook of child psychology, 5th ed.: vol 3. social, emotional, and personality development (pp. 1017-1095). hoboken, nj: john wiley & sons.
eid, m., & diener, e. (1999). intraindividual variability in affect: reliability, validity, and personality correlates. journal of personality and social psychology, 76, 662-676. doi:10.1037/0022-3514.76.4.662
eid, m., & langeheine, r. (2003). separating stable from variable individuals in longitudinal studies by mixture distribution models. measurement: interdisciplinary research and perspectives, 1, 179-206. doi:10.1207/s15366359mea0103_01
eizenman, d. r., nesselroade, j. r., featherman, d. l., & rowe, j. w. (1997). intraindividual variability in perceived control in an older sample: the macarthur successful aging studies. psychology and aging, 12, 489-502. doi:10.1037/0882-7974.12.3.489
georgiou, g., manolitsis, g., nurmi, j.-e., & parrila, r. (2010). does task-focused versus task-avoidance behavior matter for literacy development in an orthographically consistent language? contemporary educational psychology, 35, 1–10. doi:10.1016/j.cedpsych.2009.07.001
gill, p., & remedios, r. (2013). how should researchers in education operationalise on-task behaviours? cambridge journal of education, 43, 199-222. doi:10.1080/0305764x.2013.767878
goetz, t., frenzel, a. c., stoeger, h., & hall, n. c. (2010). antecedents of everyday positive emotions: an experience sampling analysis. motivation and emotion, 34, 49-62. doi:10.1007/s11031-009-9152-2
hamre, b. k., & pianta, r. c. (2005). can instructional and emotional support in the first-grade classroom make a difference for children at risk of school failure? child development, 76, 949-967. doi:10.1111/j.1467-8624.2005.00889.x
hattie, j., biggs, j., & purdie, n. (1996). effects of learning skills interventions on student learning: a meta-analysis. review of educational research, 66, 99-136. doi:10.3102/00346543066002099
hattie, j., & timperley, h. (2007). the power of feedback. review of educational research, 77, 81-112. doi:10.3102/003465430298487
hirvonen, r., tolvanen, a., aunola, k., & nurmi, j.-e. (2012). the developmental dynamics of task-avoidant behavior and math performance in kindergarten and elementary school. learning and individual differences, 22, 715-723. doi:10.1016/j.lindif.2012.05.014
hooker, k., nesselroade, d. w., nesselroade, j. r., & lerner, r. m. (1987). the structure of intraindividual temperament in the context of mother-child dyads: p-technique factor analyses of short-term change. developmental psychology, 23, 332-346. doi:10.1037/0012-1649.23.3.332
ilies, r., & judge, t. a. (2002). understanding the dynamic relationships among personality, mood, and job satisfaction: a field experience sampling study. organizational behavior and human decision processes, 89, 1119–1139. doi:10.1016/s0749-5978(02)00018-3
inkinen, m., lonka, k., hakkarainen, k., muukkonen, h., litmanen, t., & salmela-aro, k. (2013). the interface between core affects and the challenge-skill relationship. journal of happiness studies, 15, 891-913. doi:10.1007/s10902-013-9455-6
jahng, s., wood, p. k., & trull, t. j. (2008). analysis of affective instability in ecological momentary assessment: indices using successive difference and group comparison via multilevel modeling. psychological methods, 13, 354–375. doi:10.1037/a0014173
kernis, m. h., grannemann, b. d., & barclay, l. c. (1989). stability and level of self-esteem as predictors of anger arousal and hostility. journal of personality and social psychology, 56, 1013-1022. doi:10.1037/0022-3514.56.6.1013
la guardia, j. g., ryan, r. m., couchman, c. e., & deci, e. l. (2000). within-person variation in security of attachment: a self-determination theory perspective on attachment, need fulfillment, and well-being. journal of personality and social psychology, 79, 367-384. doi:10.1037/0022-3514.79.3.367
larsen, r. j. (1987). the stability of mood variability: a spectral analytic approach to daily mood assessments. journal of personality and social psychology, 52, 1195-1204. doi:10.1037/0022-3514.52.6.1195
lee, w., & reeve, j. (2012). teachers’ estimates of their students’ motivation and engagement: being in synch with students. educational psychology, 32, 727–747. doi:10.1080/01443410.2012.732385
li, x., & hedeker, d. (2012). a three-level mixed-effects location scale model with an application to ecological momentary assessment data. statistics in medicine, 31, 3192-3210. doi:10.1002/sim.5393
malmberg, l.-e., walls, t., martin, a. j., little, t. d., & lim, w. h. t. (2013). primary school students' learning experiences of, and self-beliefs about competence, effort, and difficulty: random effects models. learning and individual differences, 28, 54–65. doi:10.1016/j.lindif.2013.09.007
malmberg, l.-e., woolgar, c., & martin, a. (2013). quality of measurement of the learning experience questionnaire for personal digital assistants. international journal of quantitative research in education, 1, 275-296. doi:10.1504/ijqre.2013.057689
malmberg, l.-e., pakarinen, e., vasalampi, k., & nurmi, j.-e. (2015). students’ school performance, task-focus, and situation-specific motivation. learning and instruction, 39, 158-167. doi:10.1016/j.learninstruc.2015.05.005
marsh, h. w. (1989). age and sex effects in multiple dimensions of self-concept: preadolescence to early adulthood. journal of educational psychology, 81, 417-430. doi:10.1037/0022-0663.81.3.417
mcconville, c., & cooper, c. (1997). the temporal stability of mood variability. personality and individual differences, 23, 161-164. doi:10.1016/s0191-8869(97)00013-5
morin, a. j. s., marsh, h. w., nagengast, b., & scalas, l. f. (2014). doubly latent multilevel analyses of classroom climate: an illustration. the journal of experimental education, 82, 143-167. doi:10.1080/00220973.2013.769412
musher-eizenman, d. r., nesselroade, j. r., & schmitz, b. (2002). perceived control and academic performance: a comparison of high- and low-performing children on within-person change patterns. international journal of behavioral development, 26, 540–547. doi:10.1080/01650250143000517
muthén, l. k., & muthén, b. o. (2012). mplus statistical analysis with latent variables: user’s guide (version 7). los angeles, ca: muthén & muthén.
nett, u. e., goetz, t., & hall, n. c. (2011). coping with boredom in school: an experience sampling perspective. contemporary educational psychology, 36, 49–59. doi:10.1016/j.cedpsych.2010.10.003
nett, u. e., goetz, t., hall, n. c., & frenzel, a. c. (2012). metacognitive strategies and test performance: an experience sampling analysis of students’ learning behavior. education research international. article id 958319, 16 pages. [online] http://kops.ub.uni-konstanz.de/handle/urn:nbn:de:bsz:352-206102 (accessed 12 april 2013).
nurmi, j.-e., & aunola, k. (1998). observer rating scale of achievement strategies (osas). unpublished measurement instrument. jyväskylä: university of jyväskylä, finland.
nurmi, j.-e., hirvonen, r., & aunola, k. (2008). motivation and achievement beliefs in elementary school: a holistic approach using longitudinal data. unterrichtswissenschaft, 36(3), 237–254. doi:10.3262/uw0803237
pianta, r. c., & hamre, b. k. (2009). conceptualization, measurement, and improvement of classroom processes: standardized observation can leverage capacity. educational researcher, 38, 109-119. doi:10.3102/0013189x09332374
pianta, r. c., belsky, j., vandergrift, n., hours, r., & morrison, f. (2008). classroom effects on children’s achievement trajectories in elementary school. american educational research journal, 45, 365-397. doi:10.3102/0002831207308230
ram, n., chow, s.-m., bowles, r. p., wang, l., grimm, k., fujita, f., & nesselroade, j. r. (2005). examining interindividual differences in cyclicity of pleasant and unpleasant affect using spectral analysis and item response modeling. psychometrika, 70, 773-790. doi:10.1007/s11336-001-1270-5
reeve, j., jang, h., carrell, d., jeon, s., & barch, j. (2004). enhancing students’ engagement by increasing teachers’ autonomy support. motivation and emotion, 28, 147-169. doi:10.1023/b:moem.0000032312.95499.6f
roesch, s. c., aldridge, a. a., stocking, s. n., villodas, f., leung, q., bartley, c. e., & black, l. j. (2010). multilevel factor analysis and structural equation modeling of daily diary coping data: modeling trait and state variation. multivariate behavioral research, 45, 767-789. doi:10.1080/00273171.2010.519276
rosenberg, m. (1965). society and the adolescent self-image. princeton, nj: princeton university press.
schmitz, b. (2006). advantages of studying processes in educational research. learning and instruction, 16, 433-449. doi:10.1016/j.learninstruc.2006.09.004
schmitz, b. (2015). the study of learning processes using time-series analyses [video file]. available from http://www.education.ox.ac.uk/network-on-intrapersonal-research-in-education-nire/seminar1/bernhard-schmitz/
schmitz, b., & skinner, e. (1993). perceived control, effort, and academic performance: interindividual, intraindividual, and multivariate time-series analyses. journal of personality and social psychology, 64, 1010-1028. doi:10.1037/0022-3514.64.6.1010
schmitz, b., & wiese, b. s. (2006). new perspectives for the evaluation of training sessions in self-regulated learning: time-series analyses of diary data. contemporary educational psychology, 31, 64–96. doi:10.1016/j.cedpsych.2005.02.002
sheldon, k. m., & elliot, a. j. (1998). not all personal goals are personal: comparing autonomous and controlled reasons for goals as predictors of effort and attainment. personality and social psychology bulletin, 24, 546-557. doi:10.1177/0146167298245010
skinner, e. a., & belmont, m. j. (1993). motivation in the classroom: reciprocal effects of teacher behavior and student engagement across the school year. journal of educational psychology, 85, 571-581. doi:10.1037/0022-0663.85.4.571
trull, t. j., solhan, m. b., tragesser, s. l., jahng, s., wood, p. k., piasecki, t. m., & watson, d. (2008). affective instability: measuring a core feature of borderline personality disorder with ecological momentary assessment. journal of abnormal psychology, 117, 647-661. doi:10.1037/a0012532
tsai, y.-m., kunter, m., lüdtke, o., & trautwein, u. (2008). day-to-day variation in competence beliefs: how autonomy support predicts young adolescents’ felt competence. in h. w. marsh, r. g. craven, & d. m. mcinerney (eds.), self-processes, learning, and enabling human potential: dynamic new approaches (pp. 119-143). charlotte, nc: information age publishing.
von neumann, j., kent, r. h., bellinson, h. r., & hart, b. i. (1941). the mean square successive difference. the annals of mathematical statistics, 12, 153–162. doi:10.1214/aoms/1177731746
walls, t. a., barta, w. d., stawski, r. s., collyer, c., & hofer, s. m. (2011). time-scale dependent longitudinal designs. in b. laursen, t. d. little, & n. card (eds.), handbook of developmental research methods (pp. 46-64). new york: guilford press.
wilhelm, p., perrez, m., & pawlik, k. (2012). conducting research in daily life. in m. r. mehl & t. s. conner (eds.), handbook of research methods for studying daily life (pp. 62-68). new york: guilford press.
zhang, x., nurmi, j.-e., kiuru, n., lerkkanen, m.-k., & aunola, k. (2011). a teacher-report measure of children's task-avoidant behavior: a validation study of the behavioral strategy rating scale. learning and individual differences, 21, 690–698. doi:10.1016/j.lindif.2011.09.007
zimmerman, b. j. (2000). self-efficacy: an essential motive to learn. contemporary educational psychology, 25, 82–91. doi:10.1006/ceps.1999.1016

appendix 1.
hand-calculation of the mean square successive difference (mssd)

this hand-calculation presents the values of panel a in figure 1. the columns below represent:
time-point = 28 time-points for one individual;
intr(t+1) = intrinsic motivation at time-point t+1;
intr(t) = intrinsic motivation at time-point t. this variable was created by lagging the intr(t+1) variable by one time-point, so that time-point t becomes a predictor of time-point t+1. note that the numeric values of intr(t+1) are replicated in the intr(t) column, one row below each corresponding intr(t+1) value. this gives a 29th time-point;
δ = difference between intr(t+1) and intr(t);
δ² = squared difference between intr(t+1) and intr(t);
l = missing data created by the lag.

time-point   intr(t+1)   intr(t)   δ       δ²
1            4.0         l         l       l
2            3.5         4.0       -0.5    0.25
3            2.0         3.5       -1.5    2.25
4            2.5         2.0       0.5     0.25
5            4.0         2.5       1.5     2.25
6            2.0         4.0       -2.0    4.00
7            2.0         2.0       0.0     0.00
8            2.0         2.0       0.0     0.00
9            2.0         2.0       0.0     0.00
10           2.5         2.0       0.5     0.25
11           1.0         2.5       -1.5    2.25
12           2.0         1.0       1.0     1.00
13           2.5         2.0       0.5     0.25
14           2.5         2.5       0.0     0.00
15           2.5         2.5       0.0     0.00
16           2.0         2.5       -0.5    0.25
17           1.0         2.0       -1.0    1.00
18           2.5         1.0       1.5     2.25
19           2.0         2.5       -0.5    0.25
20           1.5         2.0       -0.5    0.25
21           1.5         1.5       0.0     0.00
22           2.5         1.5       1.0     1.00
23           1.5         2.5       -1.0    1.00
24           1.0         1.5       -0.5    0.25
25           2.0         1.0       1.0     1.00
26           1.0         2.0       -1.0    1.00
27           1.0         1.0       0.0     0.00
28           1.0         1.0       0.0     0.00
(29)         l           1.0       l       l
m            2.05        2.05      -0.11   0.78
sd           0.83        0.83      0.89    1.01
r(t+1,t)     0.36

the mssd is the average of δ² using n-1 (28-1=27) as denominator. r(t+1,t) is the autocorrelation between intr(t+1) and intr(t).

appendix 2.
mplus code for single construct model (intrinsic motivation)

title: mssd 28 june 2016 ;
data: file is "c:\variability.txt" ;
variable:
names are studid sequence lag_n
  diff_t1 diff_t0 well_t1 well_t0 und_t1 und_t0
  enj_t1 enj_t0 int_t1 int_t0 cho_t1 cho_t0
  focus1 focus2 focus3 avoid1 avoid2 avoid3 ;
!enj=enjoyment, int=interest, cho=choice, t1 = time t+1, t0 = time t
usevar = enj_t1 int_t1 cho_t1 enj_var int_var cho_var ;
!include three observed and three defined variables
missing are all (-9) ;
between = enj_var int_var cho_var ; !defined variables are at level 2
cluster = studid ; !clustering by student
define:
enj_va = (enj_t1 - enj_t0)**2 ; !squared difference of enjoyment
int_va = (int_t1 - int_t0)**2 ; !squared difference of interest
cho_va = (cho_t1 - cho_t0)**2 ; !squared difference of choice
enj_var = cluster_mean (enj_va) ; !average squared difference of enjoyment
int_var = cluster_mean (int_va) ; !average squared difference of interest
cho_var = cluster_mean (cho_va) ; !average squared difference of choice
center (grandmean) enj_t1 int_t1 cho_t1 enj_var int_var cho_var ;
!grand mean centre level 2 indicators
analysis: type = twolevel ;
model:
%within%
w_intr by enj_t1 (a)
  int_t1 (b)
  cho_t1 (c) ;
! w_ = state-construct (within-level)
! factor loadings of within and between indicators are equated between levels
w_intr (var_w) ;
!estimate variance, and use for calculating new parameter
%between%
b_intr by enj_t1 (a)
  int_t1 (b)
  cho_t1 (c) ;
! b_intr = trait-construct
intr_var by enj_var int_var cho_var ;
! variability construct
b_intr with intr_var ;
b_intr (var_b) ;
!estimate variance, and use for calculating new parameter
int_t1*.05 (br1) ;
! estimate error variance
model constraint:
br1 > 0 ;
!br2 > 0 ;
new(var_comp) ;
var_comp = var_b / (var_b + var_w) ;
! calculate intraclass correlation of latent constructs
output: stand sampstat tech1 tech2 ;

frontline learning research vol.3 no. 2 (2015) 47-62 issn 2295-3159

academic procrastinators, strategic delayers and something betwixt and between: an interview study

sari lindblom-ylänne a,1, emmi saariaho a, mikko inkinen a, anne haarala-muhonen b, telle hailikari a
a faculty of behavioural sciences, university of helsinki, finland
b faculty of law, university of helsinki, finland

article received 1 march 2015 / revised 24 april 2015 / accepted 25 may 2015 / available online 12 june 2015

abstract

the study explored university undergraduates’ dilatory behaviour, more precisely, procrastination and strategic delaying. using qualitative interview data, we applied a theory-driven and person-oriented approach to test the theoretical model of klingsieck (2013). the sample consisted of 28 bachelor students whose study pace had been slow during their first university year. three student profiles emerged. the first concerned strategic delay and was represented by motivated students with strong self-efficacy beliefs who had intentionally postponed their studying. the second consisted of students whose delaying was unnecessary in nature; these students had minor self-regulation problems but were still motivated to study. the third profile consisted of procrastinating students who lacked self-regulation skills and had weaker self-efficacy beliefs. the results indicate that dilatory behaviour can vary from strategic delay to dysfunctional procrastination, and that different factors are related to these various types of dilatory behaviour. this study adds to our theoretical understanding of academic procrastination by empirically testing a new theoretical model of procrastination. in addition, the study shows the value of using a qualitative approach in understanding the phenomenon of dilatory behaviour.

keywords: academic procrastination; strategic delay; dilatory behaviour; university student

1 corresponding author: sari lindblom-ylänne, institute of behavioural sciences, university of helsinki, p.o.
box 9, 00014 university of helsinki, finland, email: sari.lindblom@helsinki.fi
doi: http://dx.doi.org/10.14786/flr.v3i2.154
lindblom-ylänne et al | f l r 48
1. introduction
research has shown that academic procrastination is very common among university students: almost all occasionally procrastinate in one or another domain of their studies, and approximately every second student regularly procrastinates (rothblum, solomon & murakami, 1986; steel, 2007). however, research in this area often lacks precision in the definition of procrastination, with the concept being used to describe different types of delay varying from functional to dysfunctional (e.g., klingsieck, 2013; schraw, wadkins & olafson, 2007; steel, 2007). an example of functional procrastination is a situation in which a student studies effectively and attains favourable results under pressure of an approaching deadline (e.g., choi & moran, 2009; chu & choi, 2005; schraw et al., 2007). examples of dysfunctional procrastination are delaying the decided time for beginning study processes, moving scheduled study periods to the future and engaging in study-irrelevant behaviour (e.g., schouwenburg, 1995). the different and even contrasting definitions of procrastination have made it difficult to understand the phenomenon and to follow the research. further, the definitions’ lack of precision also influences the way researchers operationalise these constructs and the analyses they perform. to address this pervasive problem, klingsieck (2013) recently provided an excellent meta-analysis of the different definitions of procrastination and of the trends in procrastination research. she suggests that a clear distinction should be made between procrastination and strategic delay, in other words, between dysfunctional and functional forms of delay.
klingsieck (2013) proposed that the following seven parameters be used for this purpose: 1) the delay of an overt or covert act; 2) the act is intended to be started and/or completed; 3) the act is necessary or of personal importance; 4) the delay is voluntary; 5) the delay is unnecessary or irrational; 6) the act is delayed despite being aware of the potential negative consequences; and 7) the delay is accompanied by subjective discomfort or other negative consequences. according to klingsieck (2013), parameters 1 and 2 characterise any form of delay, and 3 and 4 both procrastination and strategic delay. what differentiates procrastination from strategic delay is the nature of the delay itself. the delay in procrastination is unnecessary, irrational and even harmful (5). in strategic delay, a student is confident that the positive consequences will eventually outweigh the potential negative ones, whereas in procrastination the act is delayed despite awareness of the potential negative consequences, and the delay is accompanied by subjective discomfort or other negative consequences (6 and 7). to summarise, “there is no functional form of procrastination, but there is a functional form of delay” (klingsieck 2013, 26), in other words, strategic delay. in the light of klingsieck’s distinction, research that has emphasised the adaptive forms of procrastination (e.g., choi & moran, 2009; chu & choi, 2005; ferrari, johnson & mccown, 1995; schraw et al., 2007) could be considered as research on strategic delay. choi and colleagues (choi & moran, 2009; chu & choi, 2005) have used the terms ‘passive’ and ‘active’ procrastination. by passive procrastination they refer to postponing tasks “until the last minute because of an inability to make the decision to act in a timely manner” (choi & moran, 2009, 196).
their definition of active procrastination fits klingsieck's (2013) definition of strategic delay where students are highly motivated by time pressure, and are able to complete tasks before deadlines and achieve satisfactory outcomes. thus, typical of active procrastinators is a preference for working under pressure (chu & choi, 2005). corkin, lu and lindt (2011) have argued that active procrastination is distinct from procrastination in general, and should be referred to as ‘active delay.’ their definition of active delay is very close to klingsieck's (2013) strategic delay. corkin et al. (2011) also showed active delay to be associated with adaptive self-regulatory processes and academic achievement, and procrastination to be associated with mastery-avoidance goals and a lack of metacognitive strategy. finally, grunschel, patrzek and fries (2013a) used the term ‘purposeful delay,’ which can be considered synonymous with strategic delay. on the basis of this distinction between procrastination and strategic delay, procrastination can be defined as “the voluntary delay of an intended and necessary and/or [personally] important activity, despite expecting potential negative consequences that outweigh the positive consequences of the delay” (klingsieck, 2013, 26). klingsieck’s definition extends steel’s (2007, 65) definition of procrastination as “a prevalent and pernicious form of self-regulatory failure.” strategic delay may be defined here as the voluntary delay of an intended and necessary and/or [personally] important activity in which positive consequences are believed to outweigh negative consequences in the long run. from the point of view of individual students, procrastination and strategic delay can be closely intertwined when examining individuals in different study contexts, in different study situations and at different times.
therefore, to understand individual dilatory behaviour more deeply, it is important, as klingsieck (2013) suggests, to expand the investigation of procrastination from one specific context and cover longer periods of time and different contexts. furthermore, when exploring the nature of procrastination it is important to take into account the whole procrastination process, from its reasons and contexts to actual procrastination behaviour and its consequences, as in grunschel, patrzek and fries (2013b). the present study aims to empirically test the theoretical model of klingsieck (2013), in which strategic delay is separated from procrastination, by using a qualitative theory-driven and person-oriented approach. as a starting point we use the above-mentioned definitions of procrastination and strategic delay. in addition, the study aims to explore motivational, volitional and situational factors related to procrastination and strategic delay.
1.1 motivational, volitional and situational factors that promote procrastination
theoretical approaches in procrastination research vary (klingsieck, 2013). the present study focuses on the motivational-volitional and situational dimensions of procrastination. researchers have identified many motivational-volitional and situational factors that accompany procrastination. low intrinsic study motivation, problems in self-regulation, poor time-management and/or organising skills and weak self-efficacy beliefs have been shown to be key factors in leading to procrastination (e.g., grunschel et al., 2013a; lee, 2005; pychyl, morin & salmon, 2000; rebetez, rochat & van der linden, 2015; steel, 2007; strunk, cho, steele & bridges, 2013; tice & baumeister, 1997; wolters, 2003). many studies have shown a link between procrastination and both extrinsic motivation and a lack of motivation (grunschel et al., 2013b; lee, 2005; pychyl et al., 2000; rebetez et al., 2015; tice & baumeister, 1997).
lack of self-regulation has also been shown to increase procrastination (e.g., steel, 2007). self-regulation refers to a student’s own active role in his or her learning process (e.g., pintrich, 1995; vermunt & van rijswijk, 1988; vermunt & verloop, 1999; zimmerman, 1994). characteristic of self-regulation is monitoring one’s actions, using metacognition, and regulating motivational and emotional states (e.g., pintrich, 1995; zimmerman, 1994). if these important skills are missing or if they are poorly developed, it is more difficult for a student to control his or her cognition, motivation, actions and emotions (pintrich, 1995), and this lack of control can promote procrastination. students who lack self-regulation skills often struggle alone, whereas students with good self-regulation skills are able to seek help for their study-related problems (newman, 1994; pintrich, 2004). klassen, krawchuk and rajani (2008) interestingly showed that low self-efficacy for self-regulation was a stronger predictor of the tendency to procrastinate than poor self-regulation skills or weak self-efficacy beliefs alone, or other motivation variables. according to them, self-efficacy for self-regulation “reflects an individual’s beliefs in his or her capabilities to use a variety of learning strategies, resist distractions, complete schoolwork, and participate in class learning” (p. 918). they showed that “self-efficacy to structure the learning environment […] leads to timely task completion and successful academic achievement” (klassen et al., 2008, 922). furthermore, problems in self-regulation, together with underachievement and the avoidance of tasks seen as demanding, are characteristic of a self-handicapping strategy (e.g., eerde, 2003; garcia & pintrich, 1994; howell & watson, 2007), which in turn has been shown to be related to procrastination (ferrari & tice, 2000; solomon & rothblum, 1984; steel, 2007).
self-handicapping is a cognitive strategy that involves avoiding effort and thereby preventing potential failure from lowering self-esteem. for persons applying a self-handicapping strategy, avoiding effort is a way to make good performance less likely and to protect their sense of self-competence (jones & berglas, 1978). furthermore, typical of such a strategy is maladaptive task-irrelevant behaviour and a preference for external regulation in which responsibility for the learning process is shifted to the teacher (heikkilä & lonka, 2006). in addition, achievement goals are related to procrastination, and can be divided into mastery and performance goals. mastery goals focus on developing new skills, whereas performance goals focus on demonstrating ability and skills (e.g., ames & archer, 1988; elliot & harackiewicz, 1994). strong achievement goals have been shown to reduce procrastination, whereas having weak or no achievement goals increases procrastination (howell & buro, 2009). further, the effect of achievement goals on study process and study success seems to be mediated by students’ self-efficacy beliefs. the higher the students’ perceived self-efficacy is, the higher they set their goals, which in turn leads to better academic achievement (e.g., cheng & chiou, 2010). solomon and rothblum (1984) found that both fear of failure and feeling the task at hand to be disagreeable caused procrastination. in the case of fearing failure, procrastination has been explained by traits such as perfectionism, anxiety and low self-efficacy beliefs regarding one’s skills in organizing and regulating oneself in order to succeed at specific tasks (e.g., bandura, 1997). experiencing a task as disagreeable has been explained by problems in time management.
fear of failure, low self-efficacy beliefs, task aversiveness and laziness have all been repeatedly mentioned as factors leading to procrastination (e.g., blunt & pychyl, 2000; eerde, 2003; ferrari & tice, 2000; howell & watson, 2007; rothblum et al., 1986; wolters, 2003).
1.2 aims of the study
the present study has two objectives. firstly, we aim to empirically test the theoretical model of klingsieck (2013), in which strategic delay is separated from procrastination, by using a qualitative theory-driven and person-oriented approach. secondly, we aim to clarify how motivational, volitional and situational factors are related to procrastination and strategic delay. the purpose is to explore long-term dilatory behaviour, i.e., procrastination and strategic delay, among university students during the first study year. because research on academic procrastination has mainly focused on short-term, task-related procrastination, knowledge about long-term dilatory behaviour is scarce. there are, however, a few interesting exceptions. a good example of a long-term research design is wäschle, allgaier, lachner, fink and nückles (2014): long-term procrastination and its relation to self-efficacy among university students were investigated using self-monitoring protocols during an academic term. the present study extends the investigation of long-term dilatory behaviour to the whole academic year.
2. methodology
2.1 participants
the participants comprised bachelor-level humanities and law students (n=28) from the university of helsinki, whose study pace had been slow during their first academic year. these individuals lacked at least a quarter of the credits students at the university are expected to earn each year. students earning fewer than 46 credits during their first year must submit a report to the university regarding their slow study pace, and create a detailed plan for future studies.
the 46-credit limit is also the minimum requirement for receiving government-financed study grants, which are an important student benefit. we applied purposive sampling and invited these students to participate in interviews after their first year of study. two sample groups were assembled. the first consisted of humanities students from the faculty of arts, where graduation times are the longest (more than four years), and the second consisted of students from the faculty of law, where the average graduation times are the shortest (approx. three and a half years). in addition, the two bachelor curricula are different in nature: the law curriculum is professional, comprised mainly of law studies with few optional courses, whereas humanities students can freely choose their minors and there are fewer compulsory elements. a total of 154 humanities students were enrolled in three bachelor of arts undergraduate programs, of whom 27 (17.5%) had earned fewer than 46 credits. of these 27 students, 17 (63%) volunteered to be interviewed. their mean age was 24 years, ranging from 20 to 36. altogether 27% of these participants were male and 73% female. male students were slightly over-represented in the sample, their proportion of the whole cohort being 23%. a total of 247 law students were enrolled in the bachelor of law undergraduate program, of whom 36 students (15%) had earned fewer than 46 credits. of these 36 students, 11 (31%) volunteered to be interviewed. their mean age was 22.5 years, ranging from 20 to 26. altogether 55% of these participants were male and 45% female. similarly to the humanities sample, male students were over-represented, their proportion of the whole cohort being 43%. in the results section, the humanities students are referred to as h, and law students as l.
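the sample proportions reported for the two groups can be recomputed from the stated counts. the following python sketch is only an arithmetic check under the assumption that the percentages are simple ratios of the counts given in the text; the function and variable names are illustrative and not part of the study's materials.

```python
# Arithmetic check of the sample proportions reported in section 2.1.
# All counts are taken from the text; the names are illustrative only.

def pct(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# faculty -> (enrolled, earned fewer than 46 credits, interviewed)
samples = {
    "humanities": (154, 27, 17),
    "law": (247, 36, 11),
}

for name, (enrolled, slow, interviewed) in samples.items():
    print(
        f"{name}: {pct(slow, enrolled):.1f}% below the credit limit, "
        f"{pct(interviewed, slow):.1f}% of those interviewed"
    )

# Total number of interviewees across both faculties.
total_interviewed = sum(group[2] for group in samples.values())
print("total interviewed:", total_interviewed)
```

after rounding, the recomputed values agree with the reported 17.5%, 63%, 15% and 31%, and the two groups add up to the 28 participants.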
2.2 materials
research on procrastination has mainly applied quantitative approaches in which the data have been collected through various self-report instruments. to our knowledge, only four studies have applied a qualitative approach in order to explore procrastination or delay in academic studying (grunschel et al., 2013b; klingsieck, grund, schmid & fries, 2013; patrzek, grunschel & fries, 2012; schraw et al., 2007). these studies vary in their definitions of procrastination and in their qualitative methodological designs. furthermore, schraw et al. (2007) questioned whether self-report instruments could capture all of the possible dimensions of procrastination and therefore chose a qualitative approach. we also opted for a qualitative approach, but applied it differently compared to the four above-mentioned qualitative studies. our study’s sample comprised students whose study pace had been slow during their first university year, but we did not particularly ask the students whether, when or how they procrastinated. instead, we used in-depth interviews to explore the study processes as well as the expectations, experiences, interest and motivation of students who had studied slowly during their first year. further, we applied a qualitative person-oriented approach (e.g., vanthournout, 2011), meaning that we used individual students as units of analysis. interviewing university students after their first study year made it possible to expand the scope of our investigation of procrastination and strategic delay from individual assignments or courses to a whole academic year. the participants volunteered to be interviewed after their first study year. the data collection was approved by the faculties. the students were informed that the results of the study would be used to enhance the program design and the development of the teaching-learning environments.
the students gave their informed consent to participate in the study and were told that they could withdraw from it at any time. the interviews concentrated on three broad themes, which were based on previous research on motivational, volitional and situational factors related to dilatory behaviour. the first two explored the motivational and volitional dimensions of dilatory behaviour, whereas the third focused on the situational dimensions, as follows: a) students’ evaluations of themselves as university students, their study aims and future goals, motivation to study, and study success, b) descriptions of the students’ study processes and practices, and c) experiences of the teaching-learning environment. contrary to klingsieck et al. (2013) and grunschel et al. (2013b), we did not specifically ask for the students’ views, explanations or definitions of procrastination. instead, the interviews focused on their aims, study processes, evaluations and experiences of their first study year, and thus ‘circled around’ the phenomenon. by doing so we wanted to ensure that the students’ spontaneous and personal views were heard, and that our questions did not steer the students to explore their first study year specifically from the point of view of procrastination. consequently, our interviews were longer and less structured than those of schraw et al. (2007), klingsieck et al. (2013) and grunschel et al. (2013b). the fourth and fifth authors acted as interviewers; the fourth author interviewed the law students and the fifth the humanities students. the length of the interviews varied from approximately forty minutes to an hour, and the interviews were transcribed verbatim. for each profile, the most typical and representative extracts were selected. the selected extracts were translated into english. due to this translation process, the extracts do not represent authentic spoken english.
to ensure the anonymity of the interviewees, the age and gender of the participants are not revealed. all students are referred to as ‘she’.
2.3 procedure
we applied a theory-driven approach in which we used klingsieck’s model (2013) as a theoretical basis for the study. we developed the analysis process by using the model of deductive content analysis by elo and kyngäs (2008) as the starting point. all five authors were involved in the analysis process, which consisted of four phases. during phase 1 the data were prepared for the analysis and the unit of analysis was defined. as we applied a person-oriented approach, we used individual students or whole interviews as units of analysis. we also developed a categorisation matrix in which criteria were created for the seven parameters (table 1). klingsieck’s descriptions of the seven parameters of her model were quite short, which presented several obstacles with respect to our theory-driven approach. therefore it was important to create more explicit criteria. most challenging was to devise criteria for parameters 4 and 5, more precisely to define the difference between voluntary and unnecessary delay. defining this difference was particularly important, because according to klingsieck, parameters 3 and 4 represent strategic delay and parameters 5 to 7 procrastination. in addition, ‘unnecessary’ and ‘irrational’ in the description of parameter 5 seemed very different from each other, which made this parameter quite broad in nature. finally, it was important to define what we meant by ‘act’. we defined it as study activities, processes, assignments and tasks during the first study year, in other words, not as a specific study task at a course level.
table 1. the categorisation matrix. criteria for the seven parameters for the theory-driven data analysis.
klingsieck’s seven parameters of delay (2013) | criteria
1) an overt or covert act is delayed. | evidence of delay, e.g., a low number of credits, unfinished courses, or not following timetables and not meeting deadlines.
2) the start or the completion of the act is intended. | evidence of studies having been started and no evidence of dropping out from the program.
3) the act is necessary or of personal importance. | at least one of the following elements: intention to graduate from university, commitment to studying or graduating, interest in studying or motivation to study.
4) the delay is voluntary and not imposed on oneself by external matters. | no evidence of external reasons for the delay, such as sickness or family crisis.
5) the delay is unnecessary or irrational. | evidence of possibilities to act or choose another way to proceed in studying, such as working fewer hours, spending less time on hobbies, or using more time for studying. no evidence of clear reasons for the delay.
6) the delay is achieved despite being aware of its potential negative consequences. | evidence of the awareness of possible negative consequences. awareness is explicitly stated.
7) the delay is accompanied by subjective discomfort or other negative consequences. | evidence of subjective discomfort or of negative consequences. discomfort is not necessarily expressed by using only negative words. therefore, jokes and laughing are scrutinized within their contexts.
note. additional criteria for phase 3 are presented in italics.
in phase 2 the categorisation matrix was used to review all data parameter by parameter. the first and fifth authors independently analysed the interview transcripts of all 28 students by checking each of the seven parameters of klingsieck’s model one by one, moving from the first parameter to the seventh. each interview transcript was therefore analysed independently by these authors in a cycle of seven rounds. in each round the data were coded for correspondence with criteria for each parameter. for example, for parameter 1 ‘an overt or covert act is delayed’ we coded all data segments in the interviews that showed evidence of delay of an act, i.e., delay of study activities, processes, assignments or tasks during the first study year. the pieces of evidence were, for example, unfinished courses, not following timetables or not meeting deadlines. the two authors then compared their findings, and were unanimous about all participants having fulfilled parameters 1 to 3: all students had delayed an overt or covert act (1), and intended to start and/or complete the act (2). it was also clear that the act had been necessary or of personal importance to all students (3). in addition, the two authors were unanimous regarding the voluntary versus involuntary nature of students’ dilatory behaviour (4). despite difficulties in creating criteria for ‘unnecessary’ or ‘irrational’ dilatory behaviour (5), the comparisons concerning this showed no differences between the two authors. however, the authors differed slightly regarding parameters 6 and 7. in some cases, it had been difficult to evaluate whether a student had been aware of the potential negative consequences (6). for phase 3 we clarified the criteria for this parameter so that a student’s awareness needed to be explicitly stated, and thus not inferred by the authors. furthermore, the two authors discussed criteria for ‘subjective discomfort and other negative consequences’ (7). in most of the cases, parameter 7 was easy to evaluate, although a number of problematic instances were found in which students had laughed and/or joked about their dilatory behaviour. thus, subjective discomfort seemed to be sometimes disguised by joking. the criteria for parameter 7 were further specified so that joking or laughing about dilatory behaviour needed to be scrutinized within its context, and that individual jokes would not automatically fulfil this parameter.
at the end of this phase, all authors agreed on the adjusted criteria. phase 3 concentrated only on parameters 4 to 7, because the previous phase showed that each participant undoubtedly met parameters 1 to 3. the 17 interview transcripts of the humanities students were independently analysed by the second author, and the 11 interview transcripts of the law students were analysed by the fourth. the analysis results were then compared and discussed between all authors. in phase 4 the student profiles were created, with all authors being involved. this phase concentrated particularly on analysing the individual profiles of students who met the criteria for strategic delay, but not all of the criteria for procrastination.
3. results
3.1 the three dilatory profiles
the first aim of the study was to empirically test klingsieck’s model (2013), in which strategic delay and procrastination differ from any kind of delay in that the delay must be voluntary. the analysis revealed that one humanities student did not meet this criterion, because the slow progress of her studies was due to external factors, more precisely, to unexpected family crises. altogether ten participants (37%), comprising six humanities and four law students, fitted klingsieck’s characterization of strategic delay perfectly (i.e., parameters 1 to 4). this profile was named strategic delayers. the following extract was very typical of the students in this profile:
well, it [slow study pace] was mainly because of my own choices, i mean how to use your time…whether to study or go to work and so on, hobbies as well. they are just my own choices. of course the other school [completing studies at another university] affected, and partly also the fact that some courses took place simultaneously. (student h5)
four students met all seven parameters of klingsieck’s (2013) model. these students were aware of the negative consequences of delay.
further, their delay was unnecessary or irrational:
i don’t know how the others…maybe they just are able to study and make it work, i can’t. in one course i should have completed two essays, but i just couldn’t do the other one at all. i finished one essay and got a poor grade, but that other…it was just, i just could not start. i even talked to the teacher and got good advice on how to start and what points to include, but somehow everything just vanished from my head. now i doubt whether i can ever succeed in writing essays, but i just have to try. (student h13)
in addition, all four students expressed subjective discomfort, as the following extract shows:
i’m independent and ambitious, but i easily get nervous and lose self-confidence. then i start feeling that i cannot do this, and that i’d just like to leave everything…or is it wise to spend all my time on this and just go crazy. so i dropped out from many courses and felt like a loser. (student h8)
interestingly, 13 students’ profiles (48%) could not be determined as representing either strategic delayers or procrastinators. of these, six students met the first six parameters, meaning that these students were aware of the potential negative consequences of their dilatory behaviour, but had not experienced subjective discomfort or other negative outcomes. this was the only aspect separating these students from the four who met all seven parameters. we therefore created a procrastinators profile consisting of two subgroups: procrastinators not expressing subjective discomfort (n=6; 22%) and procrastinators experiencing subjective discomfort (n=4; 15%). this can be seen as following klingsieck’s model, because she mentions that procrastination often entails subjective discomfort or negative consequences, but not always.
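the way the profiles in this section follow from the seven parameters can be written down as a simple rule. the python sketch below is an illustration of that rule and not the authors' procedure, which consisted of manual coding of interview transcripts. it assumes that the parameters are met cumulatively from 1 upwards; the function name is hypothetical, and the profile labels and group sizes follow those used in this section and in figure 1.

```python
# Illustrative mapping from Klingsieck's (2013) seven parameters to the
# dilatory profiles reported in this section. Hypothetical helper under the
# assumption of cumulative parameters; not the authors' analysis code.

def dilatory_profile(met: set) -> str:
    """Classify a student from the set of parameter numbers (1-7) met."""
    # Parameters 1 and 2 characterise any form of delay.
    if not {1, 2} <= met:
        return "no delay"
    # Parameters 3 and 4 (important act, voluntary delay) are shared by
    # strategic delay and procrastination.
    if not {3, 4} <= met:
        return "no strategic delay or procrastination"
    if 5 not in met:  # delay is neither unnecessary nor irrational
        return "strategic delayer"
    if 6 not in met:  # no stated awareness of negative consequences
        return "unnecessarily delaying student"
    if 7 not in met:  # aware, but no subjective discomfort expressed
        return "procrastinator not expressing subjective discomfort"
    return "procrastinator experiencing subjective discomfort"


# Group sizes as reported in the results and in figure 1.
counts = {
    "strategic delayer": 10,
    "unnecessarily delaying student": 7,
    "procrastinator not expressing subjective discomfort": 6,
    "procrastinator experiencing subjective discomfort": 4,
    "no strategic delay or procrastination": 1,
}

total = sum(counts.values())
# Students whose delay was voluntary, i.e. all except the single excluded case.
voluntary = total - counts["no strategic delay or procrastination"]

print(dilatory_profile({1, 2, 3, 4, 5}))
print("total:", total, "voluntary delayers:", voluntary)
print("strategic delayers:", round(100 * counts["strategic delayer"] / voluntary), "%")
```

as a side observation, the reported percentages (37%, 26%, 22% and 15%) are reproduced when the group sizes are divided by the 27 students whose delay was voluntary rather than by all 28 participants; this is an inference from the numbers and is not stated in the text.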
the subgroup procrastinators not expressing subjective discomfort consisted of one law and five humanities students, and the subgroup procrastinators experiencing subjective discomfort consisted of three humanities and one law student. despite the negative consequences of their unnecessary delay, procrastinators not expressing subjective discomfort did not exhibit anxiety or stress:
i’m really bad in doing anything independently. maybe i do some assignments, but i don’t read much of the literature, which i should be able to do in this field. i’m also too lazy to reserve time for that. for example, i feel now so tired that there is no way i’m going to the library even though i should. instead, i go home and do anything else but study. maybe at some point i do some studying – a little [laughs]. (student h6)
finally, the remaining seven students met parameters 1 to 5: their dilatory behaviour had been unnecessary or irrational, which was not characteristic of the strategic delayers’ profile. these students had the possibility to choose another way to proceed in their studying, such as working fewer hours, spending less time on hobbies, or using more time for studying. however, there was no evidence of subjective discomfort or of being aware of potential negative consequences, which were typical of the procrastinators’ profile. these seven students seemed to fall between the strategic delayers and procrastinators, forming another profile we termed unnecessarily delaying students (n=7, 26%). two students represented humanities and five law. the following extract was very typical of them:
i’m doing ok and have liked it here [at the faculty of law], but it was quite a surprise how much one should read and study for exams. i’ve had many other things to do, work and hobbies. i kept up my study pace quite nicely during the fall semester, but in the spring i started to slip. i realised too late that i didn’t start early enough and hadn’t read enough for the exams.
(student l1)
figure 1 summarises how klingsieck’s seven parameters (2013) were met in the humanities and law students’ dilatory profiles. parameters 1 to 3 were met in all profiles. strategic delayers met parameters 1 to 4 and unnecessarily delaying students parameters 1 to 5. the first subgroup of procrastinators, i.e., procrastinators not expressing subjective discomfort, met parameters 1 to 6, and the second subgroup of procrastinators, i.e., procrastinators experiencing subjective discomfort, met all seven parameters.
figure 1. the dilatory profiles created on the basis of klingsieck’s (2013) seven parameters (n=28).
3.2 motivational, volitional and situational factors related to strategic delay and procrastination
the second aim of the study was to explore how motivational, volitional and situational factors are related to procrastination and strategic delay. next, we explore each dilatory profile in more detail from the point of view of these factors.
3.2.1 strategic delayers
the strategic delayers’ evaluations of themselves as students, their study experiences, and their experiences of the teaching-learning environment were positive. their strong volition was expressed by good self-regulation and time management, as shown in the following extract:
i think i have good time-management skills, because i have been able to do a lot of sports and work while studying. i think there is a nice balance now and i’m quite happy about what i’m able to do in one week. i try to plan my schedule about a month ahead. (student h2)
some students had encountered minor difficulties in self-regulation or time management, but had already sought help or changed their study practices after reflecting upon their life situation and study plans, which also reflects volition:
at the beginning of the fall semester i lost my study rhythm, but now i have found a good one.
i realised that i couldn’t be successful in studying while both working part-time and doing a lot of sports, so i had to lessen my exercising hours. (student l9) the students in this profile were able to describe their study processes and practices in greater detail than those in the other profiles. furthermore, strategic delayers seemed to be more successful at combining their studies with family and/or working life. all strategic delayers also showed personal interest in studying as well as intrinsic motivation. the following extract is very representative of this profile: i chose this field as a result of my own personal interests. the subject has interested me through my whole school history. it was one of my favourite subjects at school. in addition, i’m devoted to a hobby, which even increases my interest in studying, because it gives me a personal perspective on this field. (student h9)
[figure 1 (bar chart): number of parameters (0 to 7) met by each dilatory profile. profiles: procrastinators experiencing subjective discomfort (n=4); procrastinators not expressing subjective discomfort (n=6); unnecessarily delaying students (n=7); strategic delayers (n=10); no strategic delay or procrastination (n=1). parameters: 1 act is delayed; 2 act is intended to be started / completed; 3 act is necessary; 4 act is voluntary; 5 act is unnecessary; 6 delay despite being aware of consequences; 7 subjective discomfort.]
3.2.2 unnecessarily delaying students unnecessarily delaying students shared the same positive study experiences and high interest and motivation with strategic delayers, but seemed to show weaker volition, as the following typical extract shows: there seems to be all kinds of disrupting factors…i’m sometimes not in the right mood to study effectively. it’s difficult to describe this feeling, it’s a kind of lack of being able to concentrate…i would say that my biggest problem is to begin reading. after i have started, then it becomes easier, but then something interrupts my studying again.
(student l11) typical of unnecessarily delaying students was also a discrepancy between their own study objectives and the actual study practices, in other words an intention-action gap: i’m very good at making plans, but quite poor at executing them. i try to allow myself free time as well, but studying seems to steal it. i try to use my time effectively, but because i work part-time, it’s often difficult to really have enough time for everything. (student h16) while strategic delayers had succeeded through their time management and self-regulation in combining studies, family life and work, unnecessarily delaying students had more difficulty doing so, which resulted in slower study progress. 3.2.3 procrastinators selecting one’s own field of study had not been a clear or easy task for the students in both subgroups of the procrastinators profile. these students had selected their major subjects either on the basis of success in this subject in high school, or because they could not decide upon a better option. they might also have seriously considered other disciplines before finally deciding, as the following extracts indicate: when i realised that i don’t have the skills and ambition [to reach my dream profession], i created a backup plan. i thought of different options, but this field has been my choice for a couple of years now. (student h6, subgroup procrastinators not expressing subjective discomfort) furthermore, most students did not have clear plans for the future, as the following extract shows: well, it's not very convincing [future employment]. everyone asks me what i’m going to become when i graduate, and i cannot answer. maybe something. maybe a rare type of expert, but who could hire someone like that? (student h13, subgroup procrastinators experiencing subjective discomfort) time management was also difficult for all students in this profile: just thinking about a calendar gives me the creeps. 
so i didn’t have a calendar, because i felt it would control my life. however, i finally gave in very reluctantly, and started to use a mobile calendar. that, however, was actually a good thing, because when it beeps it reminds me of things. maybe i just feel that others have more free time than i have. maybe others are just better at organising. because i’m less focused as a person, my use of time is not efficient, and i realise that. (student h12, subgroup procrastinators not expressing subjective discomfort) procrastinators had also missed lectures and stopped attending some courses. further, their study experiences seemed less positive than those of the first two dilatory profiles. however, the subgroups differed from each other in terms of their study experiences in that procrastinators experiencing subjective discomfort expressed more negative emotions and had less positive experiences than procrastinators not expressing subjective discomfort. clear differences were noted between the two subgroups in self-efficacy beliefs, experienced stress and study motivation. procrastinators experiencing subjective discomfort were less motivated to study than procrastinators not expressing subjective discomfort: the most important reason for my lowered interest was a feeling of strain or burden. now when i think about it logically, there were no clear reasons for that feeling, but what can you do about your feelings? (student h3, subgroup procrastinators experiencing subjective discomfort) furthermore, procrastinators experiencing subjective discomfort seemed to lack the ability to evaluate their own knowledge and skills, whereas procrastinators not expressing subjective discomfort had a more realistic view of themselves as students, describing themselves honestly and in a manner that showed no anxiety or stress: lazy. aimless. floating. so i’m not the kind of person who has clear long-term objectives. i could be more active.
in high school i liked that someone was watching and monitoring me. at university i should also have someone with authority to push me forward. you know, it’s too easy here to think that i will do this the next year, so delaying is not that harmful. i know that some students succeed at being efficient and organized, but i’m not the only one like this. (student h12, subgroup procrastinators not expressing subjective discomfort) the first study year of procrastinators experiencing subjective discomfort had been much more difficult than they had anticipated, and they were puzzled by the problems that had arisen. the vicious circle of procrastination resulted from a combination of lack of regulation skills, high workloads, lack of interest, low self-efficacy beliefs, and exhaustion, as the following typical extract indicates: from primary to upper secondary school i was a really good student, but now i feel that i can’t learn anything about any topic. this depresses me. i don’t have enough time to really learn something, and that feels bad. maybe i’m aiming too high, and when i can’t reach my aims, i get depressed. (student l4, subgroup procrastinators experiencing subjective discomfort) the students were unevenly distributed in the profiles in terms of their discipline, except for the profile strategic delayers, to which about 37% of both humanities and law students belonged. the humanities students were over-represented in both procrastinators subgroups. almost half of the law students belonged to the profile unnecessarily delaying students, whereas only 13% of the humanities students belonged to this profile. 4. discussion the present study empirically tested klingsieck’s (2013) theoretical model, and provided support for the suggestion to differentiate procrastination from strategic delay. interestingly, our results showed that dilatory behaviour is even more complex than klingsieck suggests.
our in-depth qualitative analyses revealed forms of dilatory behaviour lying somewhere between procrastination and strategic delay, which do not meet the criteria for either strategic delay or procrastination: unnecessarily delaying students showed no awareness of potential negative consequences and did not exhibit subjective discomfort, and procrastinators not expressing subjective discomfort did not meet klingsieck’s (2013) seventh criterion of subjective discomfort or other negative consequences. in addition, the results interestingly showed only a small percentage of students meeting all seven criteria of procrastination, making this form of dilatory behaviour the least common in our data. the strategic delayers had good self-regulation and time-management skills as well as self-efficacy for self-regulation, which was shown by klassen et al. (2008) to impede procrastination. these students also exhibited strong achievement goals as well as interest and intrinsic motivation with respect to studying, all of which have been shown by howell and buro (2009) to reduce procrastination. furthermore, they exhibited good metacognitive and reflective skills in evaluating and developing their study processes and practices. these students’ dilatory behaviour was due to their life situations and how they had prioritised their study tasks. unnecessarily delaying students were also motivated to study and showed an interest in their majors, as was the case with strategic delayers. many had quite clear study plans as well. however, typical of these students was an intention-action gap: they often could not execute their study plans, which indicates problems in self-regulation and time management. this is in line with studies showing a relation between self-regulation problems and dilatory behaviour (e.g., corkin et al., 2011; wolters, 2003).
both subgroups of the procrastinators profile lacked self-regulation and time-management skills, had low self-efficacy for self-regulation, and showed lower interest in their major subject than the two other profiles. they also lacked intrinsic motivation and clear goals. however, procrastinators not expressing subjective discomfort seemed more motivated and interested than procrastinators experiencing subjective discomfort. these findings are in line with research showing that low intrinsic study motivation, problems in self-regulation, poor time-management skills and weak self-efficacy beliefs are key factors in promoting procrastination (e.g., grunschel et al., 2013a; lee, 2005; pychyl et al., 2000; steel, 2007; strunk et al., 2013; tice & baumeister, 1997; wolters, 2003). procrastinating students often mentioned personal characteristics when explaining their slow study pace, which could indicate trait procrastination (schouwenburg, 1995). this is consistent with research that has emphasized the maladaptive nature of procrastination (e.g., pychyl et al., 2000; schouwenburg, 1995; solomon & rothblum, 1984; steel, 2007). schraw et al. (2007) have reported that some students postpone their studying because of fatigue and burnout. wäschle et al. (2014) showed that low goal achievement decreased perceived self-efficacy and increased procrastination. according to them, this might be due to both an expectation of repeated failure and negative emotions, which was also evident in our ‘procrastinating’ students. wäschle et al. (2014, p. 112) further showed that “instead of increasing their learning and raising cognitive strategy use, these students tended to irrationally postpone their studying.” the procrastinating students in our study also seemed to ‘freeze’ instead of act when confronting study problems, thus indicating a self-handicapping strategy (eerde, 2003; garcia & pintrich, 1994; howell & watson, 2007).
other interesting differences were found between the humanities and law students. as mentioned earlier, the average graduation time in humanities is longer than in law. our results were in line with this, because the percentage of humanities students was higher in the procrastinators profile, whereas that of the law students was higher in the unnecessarily delaying students profile. the difference is probably due to situational factors: the law curriculum consists largely of mandatory legal courses and leaves little freedom of choice, whereas humanities students must make their own decisions concerning their minors. thus the results suggest that having the freedom to choose between numerous possibilities may promote procrastination, particularly for students with low self-regulation skills. however, because of the small sample size, we cannot make any generalisations about the relation between study context and the nature of dilatory behaviour. our study has several limitations. the number of participants was quite low and the students represented a narrow range of academic disciplines, i.e., humanities and law. in addition, even though our qualitative approach and means of data collection were chosen deliberately, it is possible that memory distortion may have affected our data. utilising the experience-sampling method developed by csikszentmihalyi, larson and prescott (1977) combined with interviews might have improved the participants’ memories of their first study year. pychyl et al. (2000) have successfully applied this method in procrastination research. another option would have been to complement interviews with other kinds of self-report methods, such as self-monitoring protocols (wäschle et al., 2014) or learning diaries. however, as all previously mentioned data-collection methods are based on self-reports, it would be important in the future to complement self-report data with evaluations by tutors or study counsellors.
it is also possible that errors were made in assigning students to dilatory profiles. however, we tried to minimise the errors by having all five authors independently assign students to the dilatory profiles. much of the research to date on procrastination and dilatory behaviour has applied mainly quantitative approaches and has focused on short-term procrastination. our theory-driven, person-oriented approach and our focus on long-term procrastination yielded a profound understanding of factors related to slow study progress among university students. in our view, future research into dilatory behaviour should endeavour either to be qualitative or to use mixed-method or multi-method designs. in this way it is possible to better capture the richness and variation in dilatory behaviour. the results of the study can be applied to support the individual study paths of university students. the results imply that students representing different dilatory profiles need different kinds of support during their studies. while strategic delayers might do well in a study environment offering ample alternatives and freedom of choice, for unnecessarily delaying students this kind of environment can be more harmful. unnecessarily delaying students can benefit from support for developing their self-regulation and time-management skills. further, the results indicate that procrastinators would have needed help from the beginning of their university studies, and during the spring term at the latest, because the harmful effects of procrastination were clearly visible after the first study year. these students’ self-efficacy beliefs had already weakened and they had started to doubt their skills, motivation and interest in their study fields. therefore, it is important to develop study-counselling practices that can help diagnose and solve students’ study problems at an early stage.
keypoints the study empirically tested klingsieck’s (2013) theoretical model, and provided support for the suggestion to differentiate procrastination from strategic delay. a qualitative approach can better capture the richness of dilatory behaviour. the theory-driven, person-oriented approach and the focus on long-term procrastination yielded a profound understanding of factors related to slow study progress among university students. references ames, c., & archer, j. (1988). achievement goals in the classroom: students’ learning strategies and motivation processes. journal of educational psychology, 80(3), 260-267. bandura, a. (1997). self-efficacy. the exercise of control. new york: w. h. freeman and company. blunt, a., & pychyl, t. (2000). task aversiveness and procrastination: a multi-dimensional approach to task aversiveness across stages of personal projects. personality and individual differences, 28, 153-167. cheng, p.y., & chiou, w.-b. (2010). achievement, attributions, self-efficacy, and goal setting by accounting undergraduates. psychological reports, 106(1), 1-11. choi, j.n., & moran, s. v. (2009). why not procrastinate? development and validation of a new active procrastination scale. the journal of social psychology, 149(2), 195–211. chu, a.h.c., & choi, j. n. (2005). rethinking procrastination: positive effects of “active” procrastination behavior on attitudes and performance. the journal of social psychology, 145(3), 245–264. corkin, d., yu, s., & lindt, s. (2011). comparing active delay and procrastination from a self-regulated learning perspective. learning and individual differences, 21, 602–606. doi: 10.1016/j.lindif.2011.07.005 csikszentmihalyi, m., larson, r., & prescott, s. (1977). the ecology of adolescent activity and experience. journal of youth and adolescence, 6, 281-294. eerde, w. (2003). a meta-analytically derived nomological network of procrastination.
personality and individual differences, 35, 1401-1418. doi: 10.1016/s0191-8869(02)00358-6 elliott, a., & harackiewicz, j. (1994). goal setting, achievement orientation, and intrinsic motivation: a mediational analysis. journal of personality and social psychology, 66(5), 968-980. elo, s., & kyngäs, h. (2008). the qualitative content analysis process. journal of advanced nursing, 62(1), 107-115. doi: 10.1111/j.1365-2648.2007.04569.x. ferrari, j. r., johnson, j. l., & mccown, w. g. (1995). an overview of procrastination. in j. r. ferrari, j. l. johnson, & w. g. mccown (eds.), procrastination and task avoidance. theory, research and treatment (pp. 1-20). new york: plenum press. ferrari, j. r., & tice, d. m. (2000). procrastination as a self-handicap for men and women: a task-avoidance strategy in a laboratory setting. journal of research in personality, 34, 73–83. doi: 10.1006/jrpe.1999.2261 garcia, t., & pintrich, p. r. (1994). regulating motivation and cognition in the classroom: the role of self schemas and self-regulatory strategies. in d. h. schunk & b. j. zimmerman (eds.), self-regulation of learning and performance. issues and educational applications (pp. 127–153). new jersey: lawrence erlbaum associates, inc., publishers. grunschel, c., patrzek, j., & fries, s. (2013a). exploring reasons and consequences of academic procrastination: an interview study. european journal of psychology of education, 28(3), 841-861. doi: 10.1007/s10212-012-0143-4 grunschel, c., patrzek, j., & fries, s. (2013b). exploring different types of academic delayers: a latent profile analysis. learning and individual differences, 23, 225-233. doi: 10.1016/j.lindif.2012.09.014. heikkilä, a., & lonka, k. (2006). studying in higher education: students’ approaches to learning, self regulation, and cognitive strategies. studies in higher education, 31(1), 99–117. doi: 10.1080/03075070500392433. howell, a., & buro, k. (2009).
implicit beliefs, achievement goals, and procrastination: a mediational analysis. learning and individual differences, 19, 151-154. doi: 10.1016/j.lindif.2008.08.006 howell, a., & watson, d. (2007). procrastination: associations with achievement goal orientation and learning strategies. personality and individual differences, 43, 167-178. doi: 10.1016/j.paid.2006.11.017. jones, e., & berglas, s. (1978). control of attributions about the self through self-handicapping strategies: the appeal of alcohol and the role of underachievement. personality and social psychology bulletin, 4(2), 200-206. klassen, m., krawchuk, l., & rajani, s. (2008). academic procrastination of undergraduates: low self efficacy to self-regulate predicts higher levels of procrastination. contemporary educational psychology, 33, 915-931. doi: 10.1016/j.cedpsych.2007.07.001. klingsieck, k. (2013). procrastination. when good things don’t come to those who wait. european psychologist, 18(1), 24–34. doi: 10.1027/1016-9040/a000138. klingsieck, k., grund, a., schmid, s., & fries, s. (2013). why students procrastinate: a qualitative approach. journal of college student development, 54(4), 397-412. doi: 10.1353/csd.2013.0060 lee, e. (2005). the relationship of motivation and flow experience to academic procrastination in university students. the journal of genetic psychology, 166, 5-14. doi: 10.3200/gntp.166.1.5-15. newman, r. s. (1994). adaptive help seeking: a strategy of self-regulated learning. in d. h. schunk & b. j. zimmerman (eds.), self-regulation of learning and performance. issues and educational applications (pp. 283–301). new jersey: lawrence erlbaum associates, inc., publishers. patrzek, j., grunschel, c., & fries, s. (2012). academic procrastination: the perspective of university counsellors. international journal for the advancement of counselling, 34(3), 185-201. doi: 10.1007/s10447-012-9150-z. pintrich, p. r.
(1995). understanding self-regulated learning. in p. r. pintrich (ed.), understanding self-regulated learning (pp. 3-12). san francisco: jossey-bass publishers. pintrich, p. r. (2004). a conceptual framework for assessing motivation and self-regulated learning in college students. educational psychology review, 4, 385-408. pychyl, t. a., morin, r. w., & salmon, b. r. (2000). procrastination and the planning fallacy: an examination of the study habits of university students. journal of social behavior and personality, 15, 135–150. rebetez, m., rochat, l., & van der linden, m. (2015). cognitive, emotional, and motivational factors related to procrastination: a cluster analytic approach. personality and individual differences, 76, 1–6. http://dx.doi.org/10.1016/j.paid.2014.11.044. rothblum, e. d., solomon, l. j., & murakami, j. (1986). affective, cognitive, and behavioral differences between high and low procrastinators. journal of counseling psychology, 33(4), 387-394. schouwenburg, h. c. (1995). academic procrastination: theoretical notions, measurement, and research. in j. r. ferrari, j. l. johnson, & w. g. mccown (eds.), procrastination and task avoidance. theory, research and treatment (pp. 71–96). new york: plenum press. schraw, g., wadkins, t., & olafson, l. (2007). doing the things we do: a grounded theory of academic procrastination. journal of educational psychology, 99, 12–25. doi: 10.1037/0022-0663.99.1.12. solomon, l. j., & rothblum, e. d. (1984). academic procrastination: frequency and cognitive-behavioral correlates. journal of counseling psychology, 31, 503–509. steel, p. (2007). the nature of procrastination: a meta-analytic and theoretical review of quintessential self-regulatory failure. psychological bulletin, 133, 65–94. doi: 10.1037/0033-2909.133.1.65 strunk, k., cho, j., steele, m., & bridges, s. (2013). developing and validation of a 2 x 2 model of time-related academic behaviour: procrastination and timely engagement.
learning and individual differences, 25, 35-44. doi: 10.1016/j.lindif.2013.02.007. tice, d. m., & baumeister, r. f. (1997). longitudinal study of procrastination, performance, stress and health: the costs and benefits of dawdling. psychological science, 8(6), 454–458. vanthournout, g. (2011). patterns in student learning. exploring a person-oriented and a longitudinal research perspective. antwerpen-apeldoorn: garant. doctoral thesis. university of antwerpen, belgium. vermunt, j. d. h. m., & van rijswijk, f. a. w. m. (1988). analysis and development of students’ skill in self-regulated learning. higher education, 17, 647–682. vermunt, j. d., & verloop, n. (1999). congruence and friction between learning and teaching. learning and instruction, 9, 257–280. wolters, c. a. (2003). understanding procrastination from a self-regulated learning perspective. journal of educational psychology, 95(1), 179–187. doi: 10.1037/0022-0663.95.1.179. wäschle, k., allgaier, a., lachner, a., fink, s., & nückles, m. (2014). procrastination and self-efficacy: tracing vicious and virtuous circles in self-regulated learning. learning and instruction, 29, 103–114. http://dx.doi.org/10.1016/j.learninstruc.2013.09.005. zimmerman, b. j. (1994). dimensions of academic self-regulation: a conceptual framework for education. in d. h. schunk & b. j. zimmerman (eds.), self-regulation of learning and performance. issues and educational applications (pp. 3–21). new jersey: lawrence erlbaum associates, inc., publishers.
frontline learning research vol. 4 no. 3 (2016) 75–91 issn 2295-3159 challenges and learning outcomes of educational design research for phd students dr. larike h. bronkhorst1 & dr. renske a.m. de kleijn utrecht university, the netherlands article received 10 october / revised 2 march / accepted 11 april / available online 17 may abstract educational design research (edr) is described as a complex research approach.
the challenges resulting from this complexity are typically described as procedural, whereas edr might also be challenging for different reasons, specifically for early career researchers. yet, challenging experiences may be noteworthy in the process of learning to do research and becoming a researcher. to explore this issue further, we engaged in a collaborative self-study, and conducted a narrative cross-case analysis of two phd candidates’ experiences of engaging in edr, focusing on challenges and learning outcomes. we find indications that the challenges of edr might be related to edr’s relatively new and minority position in educational sciences and the role a (early career) researcher needs to assume in edr. retrospectively, the challenges appear closely related to learning outcomes, which are described in terms of a more profound understanding of research (quality) and of oneself as a researcher. as such, insights gained by self-study of research practices provide a complementary perspective to existing literature on edr and becoming a researcher. keywords: educational design research; phd learning; doctoral education; self-study 1 corresponding author: heidelberglaan 1, 3584 cs utrecht, utrecht, the netherlands, email: l.h.bronkhorst@uu.nl doi: http://dx.doi.org/10.14786/flr.v4i3.198 bronkhorst & de kleijn | f l r 76 1. introduction educational design research is described as a challenging research approach (e.g. collins, joseph, & bielaczyc, 2004). different aspects of this complexity are discussed in literature, typically tracing the origin of this complexity back to the multiple aims of educational design research, namely contributing to the general understanding of teaching and learning and creating a viable contextualized design to solve a local problem (anderson & shattuck, 2012). 
some stress how educational design research might be especially challenging for early career researchers (herrington, mckenney, reeves, & oliver, 2007), defined as researchers with up to ten years of experience (andres, bengtsen, castaño, crossouard, keefer, & pyhältö, 2015), including phd students. the challenges of educational design research for early career researchers are described as procedural, whereas case studies (e.g. akkerman, bronkhorst, & zitter, 2013) and experience suggest that educational design research might also be challenging for different reasons. while some studies suggest that such challenges can lead to phd students experiencing dissonance (e.g. wisker, robinson, trafford, creighton, & warns, 2003), others suggest that while challenges can be burdensome for phd students, they can also be experienced as empowering (stubb, pyhältö, & lonka, 2011), benefitting the learning process involved in becoming a researcher (hall & burns, 2009). being early career researchers with experience in educational design research, we conducted a self-study exploring phd candidates’ experiences with educational design research in terms of the challenges as well as the learning outcomes. self-study is an unconventional and relatively unknown method, gaining popularity in research on teacher education (zwart, smit, & admiraal, 2015) as a powerful way of providing insights complementary to those gained from other research methods (bullough & pinnegar, 2001; loughran, 2007). as such, this article can be appreciated as a potentially thought-provoking example of using self-study methodology for studying researcher practices, illustrating the methodology’s potential and pitfalls to critically analyse early career researchers’ developing research practices. 1.1 educational design research (edr) the origin of educational design research (edr) is often traced back to the works of brown (1992).
different authors use different terms, including design research, design-based research (or the abbreviation dbr) and design experiments; also, the specific methods used in edr studies differ (engeström, 2011a; reeves, herrington, & oliver, 2005). barab and squire (2004, p.2) typify edr as “a series of approaches, with the intent of producing new theories, artifacts, and practices that account for and potentially impact learning and teaching in naturalistic settings”. in their review of the last decade of edr research, anderson and shattuck (2012) characterize edr as research situated in a real educational context, concentrating on testing a significant intervention, in collaboration with practitioner(s), informed by theories and an assessment of the local context as well as practices in other contexts. accordingly, edr uses mixed methods, involves multiple iterations to perfect the design, and requires a collaborative partnership between researcher(s) and practitioner(s), as it focuses on theory development and overcoming a problem in local practice, typically culminating in design principles. edr is generally acclaimed as it is assumed to have the potential to enhance theoretical knowledge development while also having public educational value (van den akker, 1999) and therefore resonates with wider calls for increasing the relevance and impact of educational research (anderson & shattuck, 2012). baptista, frick, holley, remmik, tesch, and åkerlind (2015) describe how these calls for relevance are also being voiced in relation to research conducted as part of phd dissertations, where the usefulness of the knowledge gained by the research is increasingly considered.
increasing attention to and application of edr is demonstrated by the special issues devoted to this topic by leading educational research journals, including journal of the learning sciences (barab & squire, 2004), educational researcher (kelly, 2003) and educational psychologist (sandoval & bell, 2004). bronkhorst & de kleijn | f l r 77 more and more, edr is not only applauded, but also critically assessed (svihla, 2014). several authors have pointed at potential weaknesses in edr methodology (dede, 2004; shavelson, phillips, towne, & feuer, 2003), questioning edr’s potential to draw causal claims in natural settings and edr’s tendency to generalize small scale studies (kelly, 2004). akkerman, bronkhorst, and zitter (2013) maintain that pursuing concurrent goals in edr requires immediate and sometimes intuitive actions and decisions, as accepted systems of quality assurance are lacking. barab and squire (2004) highlight how edr requires researchers to carefully balance insider and outsider perspectives: “if a researcher is intimately involved in the conceptualization, design, development, implementation, and re-searching of a pedagogical approach, then ensuring that researchers can make credible and trustworthy assertions is a challenge” (p.10). additionally, several scholars have addressed the challenges of analysis in edr, given the large amounts of data that are usually collected (collins, joseph, & bielaczyc, 2004; kelly, 2004). herrington, mckenney, reeves, and oliver (2007) have argued that specifically for beginning researchers, edr might be too challenging. for one, the longitudinal nature of most design studies might extend the four-year period in which most phd students are expected to complete their studies (evans, 2010). but even if the data collection itself can be completed, the richness of the data collected might extend the time and especially the skill needed for analysis. 
these challenges are presented and interpreted as procedural and thereby manageable, as can be deduced from anderson and shattuck’s (2012) solution in terms of multi-year multi-actor research agendas. such solutions help advance edr, but fail to appreciate the experience of engaging in edr and its challenges for early career researchers. for instance, a case study of edr conducted by a phd candidate stresses how a seemingly procedural challenge, namely assuring quality in edr studies, can result in feelings of insecurity and misfit (akkerman et al., 2013). recently, castelló, kobayashi, mcginn, pechar, vekkaila, and wisker (2015) have called attention to feelings of misfit as a phd student, as they impact present and future professional aspirations and can lead students to abandon the field of learning and instruction. others have also cautioned against the consequences of feelings of dissonance as a phd student (wisker et al., 2003).

1.2 learning to do research

in contrast, lee and roth (2003) argue that the learning potential of research actually lies in working through the challenges that engaging in research is likely to generate. others have also hinted at the learning potential of challenging circumstances for learning to do research, especially for early career researchers (e.g. haigh, 2012; hopwood, 2010). in such cases, learning is conceptualized as participation (gonzález-ocampo et al., 2015) and learning outcomes are studied in terms of becoming a researcher, emphasizing not only the technical, but also the personal nature of learning to do research (barnacle, 2005). “transitioning […] to the role of researcher is not as simple as acquiring a new set of skills and expanding one’s knowledge of scholarship” (hall & burns, 2009, p.53).
studies departing from this conceptualization usually examine the biographical or narrative process of learning, as well as the open-ended and potentially transformative outcomes for researcher identity of engaging in research (e.g. lee & roth, 2003). such studies are in line with widespread calls for more in-depth, longitudinal research to explore what it means to become a researcher (hall & burns, 2009; stubb et al., 2011). in terms of specific learning outcomes, herrington and colleagues (2007) also report potential learning outcomes of conducting edr specifically for phd students. most importantly, phd students can learn to see practitioners as partners in research instead of beneficiaries of the outcomes of their research, thereby potentially increasing the impact of educational research on educational practice. moreover, phd students might benefit from learning about the ways in which edr differs from other research methods. also stressing the importance of awareness of the implications of methodological decisions made during research, newbury (2002, p.156, emphasis added) states that “[p]erhaps most important is research students’ exposure to alternative approaches”. similarly, pallas (2001) emphasizes how preparing doctoral students for an essentially unpredictable future entails acquainting them with epistemological diversity as consumers (i.e. reading research grounded in diverse epistemologies) as well as producers (i.e. engaging in research with diverse epistemologies). in summary, although the challenges and learning potential of engaging in edr for early career researchers are acknowledged, there is still a great deal to explore about the actual experience of engaging in edr.
in this study, we explore what challenges and learning outcomes phd students experience when engaging in edr, as such insight could not only support early career researchers and their supervisors in making their edr studies successful, but also provide an in-depth insider perspective, informing a wider audience about what it means to become an (educational design) researcher.

2. methods

2.1 context of the study

cognizant of the differences in early career researcher education across countries (andres, bengtsen, castaño, crossouard, keefer, & pyhältö, 2015), we detail the specific characteristics of phd trajectories in educational and learning sciences in the netherlands, where this study was conducted. first of all, phd trajectories are jobs; phd candidates do not pay tuition, but are paid for doing research. as such, the phrase ‘phd student’ is not used in dutch, as candidates are not seen as students, although they are supervised by at least one full professor and one so-called ‘daily supervisor’ – usually an assistant or associate professor. additionally, the phd dissertation consists of four semi-independent articles, written in english and preferably published internationally. hence, although coursework is included in this trajectory, it usually does not extend throughout the trajectory, as the focus of the four-year trajectory is the research project. consequently, a completed (research) master’s degree is a prerequisite to enter a four-year phd trajectory.

2.2 self-study

given the exploratory and interpretative research aim, and conceptualizing phd learning as participation, we chose to conduct a collaborative self-study, also referred to as auto-ethnography (e.g. holt, 2003). self-study is typified as a methodology for studying professional practices that stems from a desire “to be more fully informed about the nature of a knowledge of practice” (loughran, 2007, p.14).
in the field of learning and instruction, self-study is an approach mainly used in research on teacher education and teaching (zwart, smit, & admiraal, 2015). self-study has become recognized as a powerful methodology to promote critical reflective attitudes, understand the relationship between theory and practice more profoundly, and develop knowledge from an insider perspective (loughran, 2007; petrarca & bullock, 2014; williams & ritter, 2010). the increasing use of self-study, reflected in the creation of the self-study journal studying teacher education, can be understood in the light of current debates on methodology, more specifically debates on how to take into account the contextualization of human thinking and acting, and the importance of both outsider and insider understanding in research on teacher education (e.g., hamilton, smith, & worthington, 2008) and in educational research in general (e.g., maxwell, 2004a).

2.3 participants

the authors of this paper are the two participants in this self-study. for clarity, we use pseudonyms and refer to them in the third person, and only use ‘we’ to refer to ourselves as authors of this article. at the time of the study, both participants were in the penultimate year of their phd trajectory. they had started their phd trajectories at the same university department in 2008, having completed the same research master’s degree in two consecutive cohorts. the full professors who supervised their phd research differed, but the phd candidates had the same daily supervisor. mary was 25 years old at the time of data collection. in her edr study, she originally aimed to design a tool to support the goal-relatedness of master’s thesis supervision by conducting group discussion meetings and individual interviews with five supervisors with a good local reputation (see also de kleijn, meijer, brekelmans, & pilot, 2015). erica was 28 years old.
she studied how student teachers’ meaning-oriented learning and deliberate practice could be fostered by collaboratively re-designing two year-long courses in the teacher education program with two pairs of teacher educators, based on design principles developed in prior research (see footnote 2; see also bronkhorst, meijer, koster, & vermunt, 2011; bronkhorst, meijer, koster, akkerman, & vermunt, 2013). mary and erica informally discussed their progress in their edr studies and it seemed that they had quite different experiences, which they found striking given their similar background. this triggered the desire for a more systematic exploration of their experiences in edr by means of a collaborative self-study.

2.4 interviews

we assumed that a probing interview might help in the explication of challenges and learning outcomes, as self-narratives have been shown to be a powerful methodology for self-studies (haigh, 2012). such interviews require well-established interview skills and can benefit from the interviewer and interviewee being acquainted (lichtman, 2006). therefore, the interviews were conducted by their daily supervisor, christine, as she knew both mary and erica and had extensive experience in conducting qualitative research in general and open interviews specifically. christine was asked to conduct individual in-depth interviews with the phd candidates revolving around broadly defined topics: (1) the edr research that they had conducted; (2) the challenges they had experienced in their research and how they had dealt with these; and (3) the learning outcomes in the process of becoming a researcher that they attributed to engaging in edr. these themes were to be addressed longitudinally (i.e. in terms of their development over time). as a potential learning outcome of engaging in edr concerns a different relationship with participants (herrington et al., 2007), mary and erica had interviewed the participants in their edr studies upon completing their studies.
selected fragments concerning the edr participants’ perception of the phd candidates from these interviews were provided to christine, to inform her about the participants’ perspectives. christine designed an open interview structure (see table 1) adhering to this input and these guiding principles. she explicitly used her knowledge of the phd candidates to have them explicate more. both interviews lasted about an hour and a half and were fully transcribed.

footnote 2: although departing from design principles, the approach to the collaborative design in erica’s study can be characterized as a formative intervention (see also bronkhorst et al., 2013; penuel, 2014).

table 1. interview themes and example questions

interview theme: example questions
engaging in edr: can you explain your reasoning for designing the study the way you did? would you (still) call it (design) research and why?
challenges experienced: i can recall that this was challenging, at times. can you tell me some more about that? how did you deal with this challenge? was this a conscious choice?
becoming a researcher: how would you describe yourself as a researcher? what did you learn by engaging in edr?
additional probes used for all themes: would you do/have done it differently in the future/past? can you give an example?

2.5 analysis

a cross-case analysis of experiences of engaging in edr was performed. this was preceded by a within-case, narrative, connective analysis (maxwell, 2004b) of each phd candidate individually. first, drawing on critical incident technique (meijer, de graaf, & meirink, 2010), we selected fragments in both interviews where challenges and/or (learning) outcomes were substantially discussed. we verified our selections by scrutinizing the transcripts for words that indicated emotions (e.g. ‘doubt’), struggles (e.g. ‘difficult’) and/or words that indicated changes (e.g. ‘different’) or time differences (e.g. ‘now’).
secondly, we traced each of these key experiences backwards and forwards: in the transcripts, we identified the processes by which they came about and how they subsequently developed, as well as factors or individuals that had influenced their origin, development or outcome. in order to triangulate the findings from these interviews with the phd candidates, the interviews with the participants of both phd candidates’ edr studies were also scrutinized for confirming and disconfirming evidence. thirdly, we compared and discussed our individual findings from the previous steps until a consensus on the relevant themes and their relationships was reached. the quality of this last step was enhanced by a ‘peer-debriefing’ (guba, 1981), in which a colleague, unfamiliar with the study, read the data and analysis and critically questioned the initial findings. based on these steps, we created two descriptions which are presented chronologically in the results section in order to increase legibility and understanding for readers. these descriptions are based on and contain illustrative quotes from the interview transcripts of the interviews with the phd candidates and of the data from their participants. we used these descriptions for the cross-case analysis. the cross-case analysis was sensitized by our theoretical framework, focusing on developing views on edr methods and quality, the role of a researcher in edr, and the process of becoming a researcher.

3. results

3.1 mary

although an edr study had been included in mary’s research plan from the start, she kept postponing it. her hesitation mainly resulted from the lack of guidelines, protocols or general conventions mary found in the literature on edr. she herself considered reliability, in terms of reproducibility, a key quality criterion for scientific research, for which clear and shared conventions were necessary.
she generally preferred to be in control of what she was doing: “this is what makes me insecure and creates chaos in my head. because there are no guidelines to hold on to and ‘everything goes’. and that i find very difficult. i obviously need boundaries and limitations.” additionally, at that time she thought research was about answering questions and proving or demonstrating theories. yet, when an edr study was no longer necessary for completing her thesis, she made the conscious decision to engage in edr. she knew that edr was well outside her comfort zone and thus would be fairly challenging for her, but she wanted to be(come) an all-round researcher and, being a phd student, she counted on her supervisors’ support. for her edr study, mary invited expert thesis supervisors to three collaborative design meetings. she prepared these meetings extensively, but she reasoned that she could not completely control (“board up”) the research, as she sought her participants’ expertise on the topic of thesis supervision, for which she needed their ownership, expertise and creativity. this conscious lack of control over how the process unfolded was a recurring challenge for her, before and between these meetings. she also experienced an ethical dilemma in relation to the time investment she asked of her participants: “how will i ever tell them that they invested three times two hours, almost an entire working day? how will i ever tell them that i am not able to write [an article] about it? that it eventually does not lead to something that will be part of my dissertation. 
that was the biggest stress [factor], so to say.” during the collaborative design meetings, she was immediately confronted with things that did not go as she had imagined, but she surprised herself by being able to adapt to unexpected circumstances successfully: “i thought: ‘i want to understand what is happening here.’ […] at that time i did not worry at all, as in: ‘gosh, where is this going?’ it was more in looking back and when i started preparing the next meeting, that i panicked. because that brought me back in the research mind set. but during the meetings i was mainly curious.” the fact that she saw how she could relate her participants’ contributions to the literature greatly supported this process, as did their general enthusiasm: “they were also really captivated by the issue and also appreciated discussing it so much that my fear of ‘they are wasting their time here’ lessened a little. and i also really observed during those meetings that [engaging in the study] brought about things for them as well.” the planning, time-management and enthusiasm were also recognized by one of her participants, who indicated in the final interview: “well, i really liked it, the way that you handled it. i mean, you are enthusiastic but also focused and flexible in the way you handled things. i found that very pleasant.” (edr participant) these meetings, the data collection, became a collaborative exploration and mary’s role as a researcher changed in this process; she was seen more as an expert on the literature of supervision, which contrasted with her earlier experiences with questionnaires and interviews, where she felt like a “nitwit”. yet, it also meant that sometimes others took on the leadership role and determined the course of action in the meetings. moreover, mary did not only ask questions, but was also asked questions in return.
she saw this as a sign of ownership on the part of her participants, necessary for the research, and welcomed it as such. looking back, engaging in edr brought about a number of changes for mary. first, her view on research and research quality has shifted. she would no longer claim reproducibility is a key criterion for scientific research. instead, transparency has become crucial and the dialogue with theory is now her first and foremost connotation with scientific research. consequently, she would design future studies differently, leaving room to deviate from her original plan. similarly, she would now say that research is about understanding and asking questions, rather than answering them. by engaging in edr, she now has a clearer view on it, but she still perceives confusion between different perspectives on edr: “i kind of have the impression that we have yet to agree on what [educational design research] actually is. or at least that there are a lot of different perspectives on it. so it has become clearer for me what i would consider good design research.” as her edr was actually a quest for understanding and did not result in a design, she doubts whether she has actually engaged in edr, an idea with which she now feels comfortable: “i don’t mind that at all. that was not my first priority, or goal. i consider such a design a means. what this resulted in—what i had not imagined in advance—is the [increased] understanding.” now, she would question the time investment asked of participants in completely controlled research: “can you ask people to…what does it entail to ask people to participate in such a highly structured interview, in which there is hardly any room for their own input? in such a way that you as a researcher are only ‘taking’?” moreover, she now sees the dialogue with theory, present already during data collection, as a key characteristic for research quality in general. 
these shifting perspectives also concern how she sees herself as a researcher. she no longer believes in the need to choose between the paradigms or ‘teams’ she perceived before, but feels comfortable as a multi-faceted and most of all curious researcher. she prefers certain types of research, including a preference for control and statistics, without considering these to be better types of research. “i immediately get the jibbers when i am assigned to a team, whether that is qualitative or quantitative. i like to think that the [research] question is leading, so to speak. the research design is then a means to answer the question, to put it like that.” she now knows she is capable of doing different types of research to satisfy her curiosity.

3.2 erica

erica’s research goal was to study an intervention in educational practice, by having the educators (i.e. the participants in her study) experiment and explore the effects of that experimentation. the two main guiding principles for her research were that, first, educators should have agency in shaping educational design, as she believed that researcher control is neither desirable nor possible in general, and, second, that the agency and expertise of the educators would actually make the design better. “so in that sense, i hoped that they would try out things for me, but i also hoped that the things they would then try out would be enriched with what they knew. so not just [based on] my theoretical knowledge.” based on the literature she had read about edr, she thought her research would be a somewhat structured research endeavour. she started her project combining ideas resulting from her master’s with her intuitive ideas about how “the world works”. initially, these ideas were almost contradictory. “if you look at my research design, it really falls between two stools.
the one is how i was educated, with large scale and more objective instruments on which i would have no further influence. and [on the other hand] apparently my intuitive ideas about how such a thing works and which data you need to collect for that.” in her research design, she mainly attributed the valuable expertise required in this research to the educators, as she felt she had very little knowledge of the educational context, also calling herself a “nitwit”. upon engaging in the research, erica noticed how the process took its own turn, which she, contrary to her expectations, could not really characterize as either structured or cyclical, because every decision was grounded in a prior decision. moreover, she noticed how she herself felt reluctant to exert agency. at some point, her participants asked her to share more of her knowledge and expertise, as they indicated in their interview: “at some point we said: […] ‘we want to hear your opinion. […] especially [as it is] a different perspective. that is the added value.’” (edr participant) this made her realize that her research would benefit from combining different expertise, including her own. she became more proactive and felt more at ease with steering the collaborative design according to her own agenda. she noticed how she recognized possibilities in which she could influence the educators’ engagement, relying on communicative techniques she normally did not associate with research (e.g. purposeful small talk) and which she had acquired elsewhere. “i had not considered […] that i would apply those things. and that i would consider that they are part of research, [which] i had not really imagined. i thought it would be […] much less interaction, or actually, much less personal.” this led to ethical dilemmas as she questioned if it “was allowed” to act in this way deliberately in research.
“but because i feel as though the values of education and research are different, and people always think that research is objective, and spotless and ethical and responsible, that makes it feel worse when you apply these [communicative techniques] as a researcher.” all in all, this meant that her research design relied more and more on her intuition and less on what she assumed other educational researchers would prefer. intuitively she thought her research “could not be done differently”, but rationally she feared the response of the educational and learning sciences community at large. especially at conferences, erica often realized how her research differed from research presented: “actually at each research conference i attended, where a different type of research was presented, each time i thought: ‘i do that completely differently…why do i do that so differently?’ also because i had trouble indicating what it was exactly that i was doing, and why i thought it was important.” this in turn led to doubts about herself as a researcher, which she hid from her supervisors, along with specific details about what she was really doing in her research. “during the year i had my doubts [about] if i was a researcher after all, or someone who participated. or someone who was very meaningful for the educators, but who doesn’t amount to anything in the research context.” her participants recognized these doubts, as they indicated during their interview: “in that sense, i did not envy her [erica], the past year. i was aware that she put herself in a difficult position, by choosing this [research] approach. […] because it means that she has a lot to legitimize in the research domain.” (edr participant) in dealing with these challenges, she relied on two strategies. first, she made an effort to explicate her intuition(s), which were rather implicit at the start of her study.
putting her rationale for the research in writing enabled her to have faith in the study’s value for research alongside its value for educational practice. secondly, she purposefully shared this vision, first at conferences and later also with her supervisors. the feedback was positive, which made her trust her intuition more. “in that sense the aera was also important. [...it was where] the keynote of engeström [2011b] took place and then i felt supported in: okay, it might not be common what i assert, but there are people who’d like to hear about it.” as such, she feels that perhaps her vision on research has not changed much, but she now understands how more collaborative research designs, and the active role assigned to participants in these, benefit research as well as educational practice, which was a crucial outcome for her. in terms of research and research quality, she assigns increased importance to (ecological) validity: measuring what one intends to measure in context. she describes how she might not feel comfortable doing educational research in decontextualized settings any more. in general, she would now argue that researcher control over a situation, often employed to yield reliability, might produce unnatural behaviour. she prefers to rely more on transparency and the available knowledge on human behaviour in designing her studies: “in that sense i started thinking less about how a research intervention ought to be and more about, ‘what do we know about how people learn?’” looking back, erica now thinks that her perspective on edr or intervention research in general may not be shared by all, but is supported by some whom she holds in high regard. consequently, she would now call herself a researcher, albeit one with a specific perspective, namely: ‘we have as much to learn from practice as they do from us’. this in turn has increased her confidence towards the research community and her participants.

3.3 cross-case comparison

3.3.1 challenges of edr

mary and erica both describe how engaging in edr was challenging, especially with respect to two archetypal characteristics of edr: the cyclical process, including multiple iterations, and the relationship with participants. both describe how their experience with edr differs from the simplified and structured way in which edr is presented in the literature. mary had expected this in advance, which was one of the main reasons for postponing her edr study, whereas erica was somewhat surprised, despite the fact that the complexity of edr is described in the literature. apparently, reading about edr’s complexity – including the necessity of ongoing decisions and the challenges of balancing the multiple stakes and stakeholders involved – neither fully accounts for nor prepares one for the experience of edr. only when actually engaging in the research did it become clear to both phd students what a cyclical research process entailed, indicating that learning by doing seems necessary for learning to conduct edr. despite having taken rather different approaches, erica and mary both experienced ethical dilemmas with respect to their role in relation to the role of the participants. the nature of their dilemmas differed. erica struggled with the fact that she purposely invested in social interaction with her participants in order for them to be committed to her study, which she found ethically questionable in conducting research. the participants in her edr study do not mention having noted such behaviour when describing the working relationship, which they felt was appropriate. mary, on the other hand, struggled with the fact that her participants invested time in her study while she herself was not even sure about whether the study would result in a chapter in her dissertation. in contrast, the participants in her edr study only mentioned how inspiring their participation in the edr study had been for them.
3.3.2 becoming a researcher

edr necessitated dealing with challenges and the resulting feeling of misfit. mary had anticipated that she personally would not be a good fit with edr; she questioned whether she would be able to cope with scant guidelines about how to carry out the study, mainly in light of her prior experience with more controlled research and personal preference for structured activities. erica experienced a lack of fit between her edr approach and what she considered to be the “general research community” as she questioned whether her collaborative approach to research, initially mainly informed by previous experiences outside academia, would be accepted and understood by the general educational research community, where other standards appeared to exist. conducting their edr studies involved working through these challenges and dealing with their insecurities, thereby fuelling reflections on becoming and being a researcher. ironically in light of the insecurities, the most salient outcome is that in the end both phd students describe a development from seeing themselves as ‘nitwits’ or novice researchers to being seen as experts in their specific fields of study. in both cases, the participants in their edr studies played an important part in the transition, as they explicitly asked the phd candidates to take an expert role and to share their knowledge, even before the candidates themselves felt comfortable doing so.

3.3.3 quality of educational research

next to reflections on their own expertise, the challenges experienced in engaging in edr triggered contemplations on the quality of educational research. both phd candidates described that after having engaged in edr, they regarded transparency as one of the most important criteria for educational research, more than replicability.
this appeared to result from the fact that replicability was impossible to achieve in edr, which would imply that their edr was not scientific. yet, in hindsight, the phd candidates do not question their research, as the ongoing dialogue with theory and the general quest for understanding made it scientific, in their opinion (and the acceptance of their papers in international journals supports that assertion). both phd candidates do, however, question the feasibility of replicability when it is operationalized in terms of complete researcher control.

4. conclusion and discussion

this study departed from contrasting findings of previous studies concerning the challenges of phd students when engaging in educational design research (edr). for one, the challenges of edr are described in the literature as procedural, but also as fundamental. moreover, the consequences of dealing with such challenges are evaluated as burdensome, but also as empowering, benefitting the process of becoming a researcher. to explore these contrasting findings, this explorative study examined the experiences of phd candidates engaging in edr, focusing on challenges and learning outcomes. our findings show that the challenges experienced concerned two typical aspects of edr, namely the cyclical nature of an edr process and the role of participants (e.g., collins et al., 2004). more specifically, our analysis shows how the cyclical nature of edr required ongoing decisions within limited time. yet, it was not the procedural aspect of making these decisions, but the candidates’ awareness of the significant implications of these decisions for the research (cf. newbury, 2002) that was experienced as challenging. similarly, our findings reaffirm the potential of edr to support phd students in learning to see practitioners as partners in research, rather than beneficiaries of the outcomes of their study (herrington et al., 2007).
yet, for the phd candidates, it was not interacting with participants, but balancing the multiple stakes and stakeholders involved – including the research community – that was troubling and caused a lot of doubt and insecurity. it follows that while the iterative nature and the role of the participants are manageable for phd candidates engaging in edr, the (perceived) conflicts with accepted quality standards of research can make them disruptive (as suggested by akkerman et al., 2013). the phd candidates’ experiences in edr contained both ambiguity and novelty, a combination that often proves to be quite challenging (cf. wisker et al., 2003). however, the analysis illustrates how working through the challenges actually made the engagement in edr educative, as this necessitated learning about alternative perspectives on and approaches to research and research quality. this resulted in a more elaborated perspective on edr and research quality in general, and a more pronounced understanding of what kind of researcher the phd candidates hoped to become. finding that the learning outcomes are intertwined with the challenges extends previous findings on phd learning, wherein phd experiences were categorized as burdensome or empowering (stubb et al., 2011), by illustrating that burdensome experiences can become empowering over time. more specifically, our findings show that in the end both phd candidates describe having developed a more refined perspective on research quality, alongside a new understanding of what it means to be a researcher. the latter finding is in line with a conceptualization of phd learning as a process of participation (gonzález-ocampo et al., 2015) and learning outcomes in terms of becoming (hall & burns, 2009). this resonates with pallas (2001) who, among others, stresses the importance of learning about and from epistemological diversity in research preparation.
hall and burns (2009, p.61) draw attention to how learning about multiple and even contrasting epistemological approaches can inform phd students about the researcher they want to become, as “becoming a professional researcher requires students to negotiate new identities and reconceptualize themselves both as people and professionals in addition to learning specific skills” (p.49). generally speaking, the results of this study indicate that describing the challenges of edr for early career researchers in terms of the time and technical skills needed for data collection and analysis is insufficient. alternatively, we would ascribe at least some of edr’s complexity and learning potential for phd students to two other aspects: edr’s relatively new and minority position in educational sciences, and the role a researcher needs to assume to make edr a success. first, edr was not a mainstream research design in the netherlands at the time of study and was only marginally discussed in doctorate curricula (cf. wilhelm, craig, glover, allen, & huffman, 2000, for a similar discussion of qualitative research). the phd candidates therefore did not have experience in designing, conducting or even reading about edr. moreover, some edr characteristics differ from what the phd candidates learned about research, as edr aims to generate hypotheses rather than to prove them; is interactive rather than objective or distant; and relies on emergent instead of completely controlled designs (collins et al., 2004). secondly, the relationship with the participants in edr also differs in many respects from other types of research, notably from those in which these phd candidates were educated. the relationship with participants is a cornerstone of edr and required mary and erica to take on a specific role as a researcher. our findings indicate that two aspects of this role can be quite challenging. first, in edr the participants and researchers are assumed to collaborate.
it follows that part of the control over the research process is handed over to the participants (edwards, sebba, & rickinson, 2007). sharing the control over the research process is precisely what differentiates interventions that seek to build on participant agency and those that solely recognize researcher expertise, according to engeström (2011a). as the implications of shared research control have only recently become a topic of debate, the phd candidates had few examples or guidelines to build on. secondly, the participants acknowledged and addressed the phd researchers as knowledgeable or even experts on their research topic. this expert role not only disagreed with the phd candidates’ perception of themselves as novices and their role as junior in the research domain, it also implied that they as researchers might have had influence over the research content – something they have learned to avoid at all cost. these differences in perspectives highlight the merit of checking participant perspectives on the research, also as a way of quality assurance (see also bronkhorst et al., 2013). additionally, our analysis suggests that not only the causes, but also the experience of edr’s complexity, especially for early career researchers, is not fully represented in the literature. the edr literature describes how edr is and should be a cyclical process and a collaborative exploration for both researcher and participants, but the specifics of such processes and collaborations are typically not discussed extensively. akkerman and colleagues (2013) even suggest that there might be a difference between how edr studies are reified as structured while “a lot of design researchers in praxis act differently” (p.422). for those learning to do research, being aware of potential differences between edr practice and its reification might prove valuable.
4.1 limitations
in light of calls for more in-depth research (stubb et al., 2011), attending to experiences of engaging in edr from a contextualized insider perspective was our main argument for choosing to conduct a self-study. while our findings highlight self-study’s possibilities for increased understanding of research practices, there are some pitfalls that deserve attention. for one, seeing as the content of the interviews was partly determined by the phd candidates, ways in which edr was not experienced as challenging or did not carry learning potential have received limited attention. in terms of interpretation of findings, distinguishing personally relevant findings from findings relevant for the field requires an outsider perspective in self-study. therefore, we asked a colleague to be involved in verifying our analysis, supporting us in being less biased. seeing as the anonymous reviewers also played a vital role in interpreting our findings from a wider perspective, we underscore the importance of debating self-study methods and findings publicly, in line with calls from the self-study community of teacher education (see, for instance, loughran, 2007).

4.2 implications
the findings illustrate how the experience of engaging in research in general, and edr in particular, deserves more attention and support, which holds significant implications for designing and supervising early career researchers. in both cases studied, the edr experience evolves from challenging to carrying learning potential, but not without effort. therefore, for those debating whether or not edr is (too) challenging for phd students, we would like to stress how our study shows that engaging in edr can challenge phd students to develop and extend their methodological competence, to adapt and develop methodologies and perceive methodology as a field of study in itself. few would not consider these to be valuable outcomes.
evans (2010), for instance, holds that they align with an extended understanding of professionality of early career researchers and newbury (2002) claims that they lie at the core of methodological reflexivity, which he considers key to researcher preparation. however, for these challenges to be educative, our findings suggest there needs to be space, not only in terms of time and resources, but also conceptually and methodologically, and support for phd students to work through such challenges. unfortunately, such space is not always available in our age of efficiency and accountability (biesta, 2010). in terms of support, our findings also draw attention to how engaging in edr plays out differently in light of phd students’ participation in other communities, echoing the results of castelló and colleagues (2015). for supervisors, it is important to realize that conducting edr as an early career researcher brings about a range of insecurities and challenges that can differ depending on researcher personalities, previous experiences with research, and participation in other communities. given the important role supervisors play in facilitating the successful completion of a phd trajectory (gonzález-ocampo et al., 2015), we suggest that supervisors need to adapt their supervision to the specific challenges that phd students experience, referred to as “adaptivity” in the context of master’s thesis supervision (de kleijn, bronkhorst, meijer, pilot, & brekelmans, 2014). yet, phd students and supervisors should not avoid insecurities altogether, as they can carry learning potential. a final implication concerns the methodology used in our study. typically, the professional practices under scrutiny in self-study concern teaching, with the exception of studies focusing on self-study as a method. there are very few examples in which research practices are the object of study (for an example, see bronkhorst, van rijswijk, meijer, koster, & vermunt, 2013).
in research on teacher education, self-study has become recognized as a powerful methodology for teachers and teacher educators to promote local professional (knowledge) development, as well as develop relevant knowledge for the field from an insider perspective (petrarca & bullock, 2014; williams & ritter, 2010). our experiences in this study suggest that invoking self-study to understand research practices can enrich our understanding in more than one way. not only by means of our findings, foregrounding aspects of engaging in edr undisclosed in the literature, but also by critically analysing and openly discussing the challenges involved in conducting research and becoming an educational researcher.

keypoints
studies phd students’ experiences of engaging in educational design research.
finds specific edr challenges and learning outcomes pertaining to early career researchers.
relates these challenges to edr’s position in educational sciences.
advances self-study as a way for phd students to learn about research practices.

acknowledgements
the authors would like to thank prof. dr. paulien meijer for her role in data collection and dr. harmen schaap for his valuable help with the data analysis.

references
akkerman, s. f., bronkhorst, l. h., & zitter, i. (2013). the complexity of educational design research. quality & quantity, 47(1), 421-439. doi: 10.1007/s11135-011-9527-9
anderson, t., & shattuck, j. (2012). design-based research: a decade of progress in education research? educational researcher, 41(1), 16-25. doi: 10.3102/0013189x11428813
andres, l., bengtsen, s. s., castaño, l. g., crossouard, b., keefer, j. m., & pyhältö, k. (2015). drivers and interpretations of doctoral education today: national comparisons. frontline learning research, 3(3), 1-18. doi: 10.14786/flr.v3i3.177
barab, s., & squire, k. (2004). design-based research: putting a stake in the ground. the journal of the learning sciences, 13(1), 1-14.
doi: 10.1207/s15327809jls1301_1
barnacle, r. (2005). research education ontologies: exploring doctoral becoming. higher education research & development, 24(2), 179-188. doi: 10.1080/07294360500062995
baptista, a., frick, l., holley, k., remmik, m., tesch, j., & åkerlind, g. (2015). the doctorate as an original contribution to knowledge: considering relationships between originality, creativity, and innovation. frontline learning research, 3(3), 51-63. doi: 10.14786/flr.v3i3.147
biesta, g. j. (2010). why ‘what works’ still won’t work: from evidence-based education to value-based education. studies in philosophy and education, 29(5), 491-503. doi: 10.1007/s11217-010-9191-x
bronkhorst, l. h., meijer, p. c., koster, b., & vermunt, j. d. (2011). fostering meaning oriented learning and deliberate practice in teacher education. teaching and teacher education, 27, 1120-1130. doi: 10.1016/j.tate.2011.05.008
bronkhorst, l. h., meijer, p. c., koster, b., akkerman, s. f., & vermunt, j. d. (2013). consequential research designs in research on teacher education. teaching and teacher education, 33, 90-99. doi: 10.1016/j.tate.2013.02.007
bronkhorst, l. h., van rijswijk, m. m., meijer, p. c., koster, b., & vermunt, j. d. (2013). university teachers’ collateral transitions: continuity and discontinuity between research and teaching. infancia y aprendizaje, 36, 293-308. doi: 10.1174/021037013807532972
brown, a. l. (1992). design experiments: theoretical and methodological challenges in creating complex interventions in classroom settings. journal of the learning sciences, 2(2), 141-178. doi: 10.1207/s15327809jls0202_2
bullough, r. v., & pinnegar, s. (2001). guidelines for quality in autobiographical forms of self-study research. educational researcher, 30(3), 13-21. doi: 10.3102/0013189x030003013
castelló, m., kobayashi, s., mcginn, m., pechar, h., vekkaila, j., & wisker, g. (2015).
researcher identity in transition: signals to identify and manage spheres of activity in a risk-career. frontline learning research, 3(3), 35-50. doi: 10.14786/flr.v3i3.149
collins, a., joseph, d., & bielaczyc, k. (2004). design research: theoretical and methodological issues. journal of the learning sciences, 13(1), 15-42. doi: 10.1207/s15327809jls1301_2
dede, c. (2004). if design-based research is the answer, what is the question? a commentary on collins, joseph, and bielaczyc; disessa and cobb; and fishman, marx, blumenthal, krajcik, and soloway in the jls special issue on design-based research. journal of the learning sciences, 13(1), 105-114. doi: 10.1207/s15327809jls1301_5
edwards, a., sebba, j., & rickinson, m. (2007). working with users: some implications for educational research. british educational research journal, 33(5), 647-661. doi: 10.1080/01411920701582199
engeström, y. (2011a). from design experiments to formative interventions. theory & psychology, 21, 598-628. doi: 10.1177/0959354311419252
engeström, y. (2011b). intervening to shape the future. keynote given at the annual aera conference, new orleans, la.
evans, l. (2010). developing the european researcher: ‘extended’ professionality within the bologna process. professional development in education, 36, 663-677. doi: 10.1080/19415251003633573
gonzález-ocampo, g., kiley, m., lopes, a., malcolm, j., menezes, i., morais, r., & virtanen, v. (2015). the curriculum question in doctoral education. frontline learning research, 3(3), 19-34. doi: 10.14786/flr.v3i3.191
guba, e. g. (1981). criteria for assessing the trustworthiness of naturalistic inquiries. educational technology research and development, 29(2), 75-91. doi: 10.1007/bf02766777
haigh, n. (2012). historical research and research in higher education: reflections and recommendations from a self-study. higher education research & development, 31(5), 689-702. doi: 10.1080/07294360.2012.689955
hall, l., & burns, l. (2009).
identity development and mentoring in doctoral education. harvard educational review, 79(1), 49-70. doi: 10.17763/haer.79.1.wr25486891279345
hamilton, m. l., smith, l., & worthington, k. (2008). fitting the methodology with the research: an exploration of narrative, self-study and auto-ethnography. studying teacher education, 4(1), 17-28. doi: 10.1080/17425960801976321
herrington, j., mckenney, s., reeves, t., & oliver, r. (2007). design-based research and doctoral students: guidelines for preparing a dissertation proposal. in c. montgomerie, & j. seale (eds.), proceedings of edmedia 2007: world conference on education multimedia, hypermedia & telecommunications (pp. 4089-4097). chesapeake, va: aace.
holt, n. l. (2003). representation, legitimation, and autoethnography: an autoethnographic writing story. international journal of qualitative methods, 2(1), 18-28. doi: 10.1177/160940690300200102
hopwood, n. (2010). doctoral students as journal editors: non-formal learning through academic work. higher education research & development, 29(3), 319-331. doi: 10.1080/07294360903532032
kelly, a. e. (2003). theme issue: the role of design in educational research. educational researcher, 32(1), 3-4. doi: 10.3102/0013189x032001003
kelly, a. e. (2004). design research in education: yes, but is it methodological? journal of the learning sciences, 13(1), 115-128. doi: 10.1207/s15327809jls1301_6
kleijn, r. a. m. de, bronkhorst, l. h., meijer, p. c., pilot, a., & brekelmans, m. (2014). understanding the up, back, and forward-component in master's thesis supervision with adaptivity. studies in higher education, 1-17. doi: 10.1080/03075079.2014.980399
kleijn, r. a. m. de, meijer, p. c., brekelmans, m., & pilot, a. (2015). adaptive research supervision: exploring expert thesis supervisors' practical knowledge. higher education research & development, 34(1), 117-130. doi: 10.1080/07294360.2014.934331
lee, s., & roth, w. (2003).
becoming and belonging: learning qualitative research through legitimate peripheral participation. forum: qualitative social research, 4(2), available online at: http://www.qualitative-research.net/index.php/fqs/article/view/708 (accessed october 26, 2012).
lichtman, m. (2006). qualitative research in education. a user's guide. thousand oaks, london, new delhi: sage publications, inc.
loughran, j. (2007). researching teacher education practices: responding to the challenges, demands, and expectations of self-study. journal of teacher education, 58(1), 12-20. doi: 10.1177/0022487106296217
maxwell, j. a. (2004a). causal explanation, qualitative research, and scientific inquiry in education. educational researcher, 33(2), 3-11. doi: 10.3102/0013189x033002003
maxwell, j. a. (2004b). using qualitative methods for causal explanation. field methods, 16(3), 243-264. doi: 10.1177/1525822x04266831
meijer, p. c., de graaf, g., & meirink, j. (2011). key experiences in student teachers’ development. teachers and teaching: theory and practice, 17(1), 115-129. doi: 10.1080/13540602.2011.538502
newbury, d. (2002). doctoral education in design, the process of research degree study, and the ‘trained researcher’. art, design and communication in higher education, 1(3), 149-159. doi: 10.1386/adch.1.3.149
pallas, a. m. (2001). preparing education doctoral students for epistemological diversity. educational researcher, 6-11. doi: jstor.org/stable/3594455
penuel, w. r. (2014). emerging forms of formative intervention research in education. mind, culture, and activity, 21(2), 97-117. doi: 10.1080/10749039.2014.884137
petrarca, d., & bullock, s. m. (2014). tensions between theory and practice: interrogating our pedagogy through collaborative self-study. professional development in education, 5, 265-281. doi: 10.1080/19415257.2013.801876
reeves, t. c., herrington, j., & oliver, r. (2005). design research: a socially responsible approach to instructional technology research in higher education.
journal of computing in higher education, 16(2), 96-115. doi: 10.1007/bf02961476
shavelson, r. j., phillips, d. c., towne, l., & feuer, m. j. (2003). on the science of education design studies. educational researcher, 32(1), 25-28. doi: 10.3102/0013189x032001025
sandoval, w. a., & bell, p. (2004). design-based research methods for studying learning in context: introduction. educational psychologist, 39(4), 199-201. doi: 10.1207/s15326985ep3904_1
svihla, v. (2014). advances in design-based research. frontline learning research, 2(4), 35-45. doi: 10.14786/flr.v2i4.114
stubb, j., pyhältö, k., & lonka, k. (2011). balancing between inspiration and exhaustion: phd students' experienced socio-psychological well-being. studies in continuing education, 33(1), 33-50. doi: 10.1080/0158037x.2010.515572
van den akker, j. (1999). principles and methods of development research. in j. van den akker, n. nieveen, r. m. branch, k. l. gustafson & t. plomp (eds.), design methodology and developmental research in education and training (pp. 1-14). the netherlands: kluwer academic publishers.
wilhelm, r. w., craig, m. t., glover, r. j., allen, d. d., & huffman, j. b. (2000). becoming qualitative researchers: a collaborative approach to faculty development. innovative higher education, 24(4), 265-278. doi: 10.1023/b:ihie.0000047414.56668.5b
williams, j., & ritter, j. k. (2010). constructing new professional identities through self-study: from teacher to teacher educator. professional development in education, 36, 77-92. doi: 10.1080/19415250903454833
wisker, g., robinson, g., trafford, v., creighton, e., & warnes, m. (2003). recognising and overcoming dissonance in postgraduate student research. studies in higher education, 28(1), 91-105. doi: 10.1080/03075070309304
zwart, r. c., smit, b., & admiraal, w. f. (2015). a closer look at teacher research: a review study into the nature and value of research conducted by teachers.
pedagogische studien, 92(2), 131-149.

frontline learning research 5 special issue ‘learning through networks’ (2014) 4-14
issn 2295-3159
corresponding author: nino pataraia, 58 port dundas road, g4 0hg, glasgow, uk, nino.pataraia@gcu.ac.uk
doi: http://dx.doi.org/10.14786/flr.v2i2.89

‘who do you talk to about your teaching?’: networking activities among university teachers
nino pataraia a, isobel falconer a, anoush margaryan a, allison littlejohn a, sally fincher b
a caledonian academy, glasgow caledonian university, glasgow, scotland, uk
b university of kent, kent, england, uk
article received 15 february 2014 / revised 26 april 2014 / accepted 28 june 2014 / available online 15 july 2014

abstract
as the higher education environment changes, there are calls for university teachers to change and enhance their teaching practices to match. networking practices are known to be deeply implicated in studies of change and diffusion of innovation, yet academics’ networking activities in relation to teaching have been little studied. this paper extends the current limited understanding, building on roxå and mårtensson’s work (2009) and extending it from sweden to the uk and usa. it is based on two separate studies, one from the share project led by the university of kent, and one from glasgow caledonian university, exploring the composition of personal networks, and the characteristics of interactions in order to understand the networking practices which may support change of teaching practice. we conclude that academics’ personal teaching networks are mainly discipline-specific and strongly localised. this contrasts with the research networks found by becher and trowler (2001) and may reduce innovation, although about half the respondents also had external contacts that might support creativity.
keywords: networks; interactions; conversational partners; higher education; academics

1 background
as the higher education environment changes, there are calls for university teachers to change and enhance their teaching practices to match (e.g. european commission, 2009). if in the past learning, adult education and professional development were largely associated with formal education and training (kyndt, dochy, & nijs, 2009; tynjälä, 2008), nowadays it is becoming recognised that learning is lifewide and can take place at work or elsewhere (skule, 2004). furthermore, education scholars argue that teaching knowledge is frequently experientially acquired, and change in teaching occurs through adoption and adaptation of new practices learnt about informally (eraut, 1994, 2004; knight, 2006). thomson (2013) argues that “academics are able to learn about teaching through informal conversation, and for some issues, and even individuals, it may be a more appropriate means for learning about teaching than formal academic development” (p. 205). despite the fact that the significance of informal aspects of academics’ learning about teaching is becoming recognised, there is still little insight into how and when academics engage in informal learning for enhancing their practice (thomson, 2013). given that a network represents a locus for informal interactions, offering a medium for the exchange of resources and experience, capacity building and collaborative development of knowledge (koper, rusman, & sloep, 2005; powell, koput, & smith-doerr, 1996; tynjälä & nikkanen, 2009), academics’ interactions about teaching are grounded and discussed in the context of networks. a network comprises a set of actors (“nodes”) and a set of relations (“ties” or “edges”) between the nodes (wasserman & faust, 1994). common objectives for interaction bring network participants together (paavola, lipponen, & hakkarainen, 2002).
network members may be connected either directly or indirectly, and their connections can be either informal (trust-based), or formalized through contracts. the ties may comprise flows of various types, such as flows of information, materials, financial resources, services, and social support (monge & contractor, 2003). granovetter (1973) differentiated between strong and weak ties, describing strong ties in terms of the time and emotions invested in the relationship. examples of strong ties include friendship and familial relationships, which facilitate the transfer of tacit, sensitive and complex knowledge (burt, 1992; reagans & mcevily, 2003). weak ties, by contrast, encompass a more restrained investment of time and intimacy. granovetter suggested that weak ties serve as bridges between otherwise disconnected social groups and are more important in disseminating new, non-redundant information and resources than strong ties. in order to understand different properties of networks, it is useful to draw on a range of network theories. homophily and proximity theories are particularly important for scrutinising and interpreting the likelihood of establishing and/or dissolving network ties. according to proximity theory (monge & contractor, 2003, p.303), “people communicate most frequently with those to whom they are physically closest and proximity increases the opportunities for individuals to observe and learn more about one another, thereby creating conditions favourable for the development of communication ties”. rogers (2003) asserted that communication is usually most effective between individuals who are similar, or homophilous, in some respect. proximity theory implies that those who are physically close and communicate frequently are more likely to become homophilous, thus leading to the development of rogers’s (2003) conditions for effective communication.
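the node-and-tie vocabulary above can be made concrete with a small sketch of one academic's egocentric network. this is an illustrative sketch only, not the analysis used in either study: the `Tie` record, the partner labels, the location categories, and the use of monthly contact frequency as a stand-in for granovetter's time-and-emotion notion of tie strength (with an arbitrary threshold of four contacts) are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tie:
    """One tie from the ego (the academic at the centre) to a conversational partner."""
    alter: str               # hypothetical label for the conversational partner
    location: str            # hypothetical category: "department", "institution", "external"
    contacts_per_month: int  # assumed proxy for the time invested in the relationship


def tie_strength(tie: Tie, threshold: int = 4) -> str:
    """Label a tie 'strong' or 'weak' by contact frequency (a crude, assumed proxy)."""
    return "strong" if tie.contacts_per_month >= threshold else "weak"


def composition(ties: list[Tie]) -> dict[str, float]:
    """Return the share of the ego's ties found in each location."""
    counts = Counter(t.location for t in ties)
    return {location: n / len(ties) for location, n in counts.items()}


# Made-up example data for a single ego; not drawn from the studies.
ego_network = [
    Tie("colleague_a", "department", 8),
    Tie("colleague_b", "department", 6),
    Tie("colleague_c", "institution", 2),
    Tie("contact_d", "external", 1),
]

shares = composition(ego_network)
strengths = Counter(tie_strength(t) for t in ego_network)

print(shares)
print(dict(strengths))
```

with these made-up ties, half of the partners sit in the ego's own department and the two infrequent ties are the ones reaching beyond it, which is the pattern that proximity theory and granovetter's bridging argument would lead one to expect.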
nevertheless, recent technological developments have greatly affected the spatial and social structure of groups, communities and other entities, offering easy access to new information/knowledge/resources and sustainment of communication ties (wellman, 2001). the advent of ubiquitous virtual networking raises the question of whether the concept of proximity is still relevant. while homophily and proximity theories are useful for understanding the formation of network ties, social capital theory helps to evaluate the value of social networks. social capital theory explicates that individuals invest in forming social relationships in order to acquire access to rich resources, namely emotional and professional support, expertise, valuable new connections, and different types of capital (knowledge, human, social and learning) (wenger, trayner, & de laat, 2011). previous research has emphasised the importance of networking, along with other forms of social exchange, for both individual and organisational learning (katz, earl, & jaffar, 2009; trinkle, 2009; tynjälä, 2008). scholars have concluded that networks facilitate dissemination of good teaching practices (coburn & russell, 2008). engagement in networks offers new ways of thinking about educational quality and enhances teachers’ knowledge, potentially altering their thinking and classroom practice (hargreaves, 2003). furthermore, networks have been recognised as a key instrument for sustained teacher learning and professional development (katz et al., 2009). through networking, individuals form and maintain useful relationships with others who can, potentially, provide work-related support (forret & dougherty, 2004). in addition, networks equip teachers with a sense of empowerment, provide emotional support, and encourage engagement in teaching (baker-doyle, 2011).
nevertheless, it is worth highlighting that these arguments have been largely derived from research in school teaching contexts. pioneering investigations of educational networks have primarily focused either on teachers’ learning in the context of secondary education (mccormick et al., 2011) or on academics’ research and departmental networks (becher & trowler, 2001; pifer, 2010). for example, mccormick et al. (2011) examined the role of networks in school teachers’ learning, suggesting that application of network theories would lead to a better understanding of educational networks. several studies have documented that informal interactions contribute to enhancement of teaching practice (schuck, aubusson, & buchanan, 2008; thomson, 2013). however, many of these studies have examined centrally-organised, formal networks stressing network coordinators’ viewpoints on the overall value of networks for teachers’ professional development (kerr et al., 2003). therefore, there is still little insight into what role personal networks play in supporting the professional development of teachers (baker-doyle, 2011). it is worth emphasising that there is even less understanding of personal networks at the higher education teacher level. hence, this paper aims to extend the limited understanding of academics’ teaching networks by focusing on personal, egocentric networks where “the network is perceived by the individual at its centre” (wellman, 1998, p.19). we explore the composition of academics’ personal networks, and also the nature, frequency, venue and characteristics of interactions in order to understand how academics’ networks may support learning and change of teaching practice. furthermore, coburn & russell (2008) have emphasised that previous studies have ignored the content of teachers’ interactions. this research responds to this call by investigating themes of participating academics’ interactions. a number of previous studies examined academics’ self-initiated networks.
most notably, pifer (2010) explored the networking behaviour of academics in us universities. she found that academics relied on their departmental colleagues for instruction, mentoring, professional opportunities, support with writing grant applications and publications, and general support and friendship. pifer’s study showed that “departmental characteristics, such as proximity, disciplinary influence, and the culture of the department, appeared to influence the interactions of academics” and academics tended to “cultivate relationships and exchange resources with colleagues they perceived to be like them, and less likely to interact with colleagues they perceived to be different from them” (ibid, p. 227–230). however, pifer’s work focused solely on networks within single departments. she identified the need for further research into other types of academic networks. this paper addresses this gap by examining relationships both within and beyond the department. similarly, roxå and mårtensson (2009) investigated academics’ networks in a swedish university. drawing on a socio-cultural perspective, they explored the conversations that teachers have with their colleagues. they presumed that some of these conversations could influence teachers to develop new understanding of teaching or even significantly alter their conceptions of teaching. to test the reliability of their assumption, they asked 106 faculty members in sweden from a range of disciplines to reflect on their conversations about teaching. they discovered that “academics relied on a network of a few significant others as they constructed, maintained, or changed their understanding of the teaching and learning reality” (2009, p. 214). on average, participants reported ten conversational partners, which accords with becher and trowler’s observations of the smaller research network (2001).
furthermore, their research revealed that although the participants found their conversational partners anywhere (in the same or other departments, disciplines, or institutions, or outside academia), the proportion of conversational partners was higher within the department than in other locations. this study extends the work of roxå and mårtensson (2009) by examining a wider range of aspects of conversations about teaching within networks. the research is guided by the following research questions:

1. who do academics talk to about their teaching?
2. what are the main themes (content) of academics' conversations?
3. with what frequency and where do academics' conversations take place?
4. what factors motivate academics to network and what value do they perceive in their interactions?

the data presented in this paper were derived from two interrelated studies from the pilot phase: the share project longitudinal study (footnote 1) and the “academics' networking practices” (anp) project (footnote 2) at glasgow caledonian university. the share project, at the university of kent, comprised a number of separate studies, which broadly aimed to investigate with whom academics discuss their teaching practice. more precisely, the share project longitudinal study was concerned with the exploration of the setting, nature and value of academics' interactions related to teaching. examination of these topics informed the anp project in terms of its methodological approach and research objectives. the overarching aim of the anp project was to uncover further how social interactions and the structure of personal networks influence academics' learning, affecting their behaviour and supporting change in teaching practice.

2 methodology

we applied the analytical method of social network analysis (sna). this method is specifically designed to examine the patterns, causes and consequences of established relationships between different individuals (scott & carrington, 2011).
however, sna falls short of revealing the motivation behind individuals' actions within their networks. since several authors have recommended application of different forms of data collection for breadth and depth of understanding and also for corroboration of network processes (kilduff & tsai, 2007; mehra, kilduff, & brass, 1998), this study integrated both quantitative and qualitative approaches.

2.1 study 1: share project longitudinal study

as part of a more extensive questionnaire within a longitudinal study of 18 academics in computing, mathematics and technology subjects, 14 provided a free-text written response regarding their teaching-related interactions. study 1 drew on convenience sampling. the response rate was 83%. two explicit inclusion criteria were used: 1. potential study participants had to be teaching in the mathematics/computing/technology area, and 2. participants had to be eager to participate in two interventions a year over a period of three years. within the mathematics/computing/technology constraint, they were chosen to represent a variety of institutional contexts, experience and reputation for innovation. this study examined the composition of academics' teaching networks along with the frequency, nature and content of interactions.

2.2 study 2: semi-structured interviews within the anp project

to probe findings from the share project further, eleven academics representing three institutions and five disciplines, namely engineering (2/11), life sciences (4/11), education (2/11), social sciences (1/11) and humanities (2/11), were interviewed. interviewees for study 2 were drawn using convenience sampling. the response rate was 100%. the main criterion for the selection was that the potential interviewee had to be an innovative/excellent teacher. the interviews lasted one to one and a half hours and were audio recorded and transcribed.
interview protocol and interview questions can be accessed at: https://drive.google.com/file/d/0b2to0roh4ibxxzrsawz4ckj5nuk/edit?usp=sharing.

footnote 1: http://www.sharingpractice.ac.uk/homepage.html
footnote 2: http://www.gcu.ac.uk/networkedinnovation/

2.3 data analysis

2.3.1 studies 1 and 2

the same techniques of analysis were applied to the 14 written responses from study 1 and 11 interview transcripts from study 2. descriptive statistics, using spss software, focused on describing the characteristics of the sample along with the number of contact types/categories, the frequency and themes of interaction about teaching. given that variables of interest were categorical (qualitative) by nature, frequencies were utilised to obtain descriptive statistics (pallant, 2010). the research questions were used to define initial coding classes for written data and further classes were created as themes emerged. emergent classes were developed by two independent researchers, then compared and contrasted. checks for consistency and reliability were carried out and the final list of codes was refined. overall, eight classes were created: 1. contact category (table 1); 2. level of experience; 3. disciplinary affiliation; 4. frequency of interaction; 5. venue of interaction; 6. nature of interaction; 7. preferred method of interaction; 8. content of interaction. the purpose of these thematic categories was to organise data into meaningful units of analysis. table 1 shows the different categories of contacts enumerated by academics. table 2 outlines the categories within the classes frequency, nature and themes of conversations.
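the frequency-based description of coded categorical variables mentioned above can be illustrated with a short sketch. the code labels follow the frequency and nature classes of table 2; the three coded interactions are made-up illustrative records (not study data) and the function name `tally` is hypothetical. the original analysis was carried out in spss, so this is only an equivalent illustration in python, not the authors' procedure.

```python
from collections import Counter

# Code labels taken from the frequency and nature classes in table 2.
FREQUENCY_CODES = {
    0: "not specified",
    1: "once a term-yearly",
    2: "fortnightly-several times per term",
    3: "1/2 weekly-fortnightly",
    4: "daily",
}
NATURE_CODES = {0: "not specified", 1: "formal", 2: "informal"}

# Made-up example records (NOT study data): one dict per coded interaction.
example_interactions = [
    {"category": "in department", "frequency": 3, "nature": 2},
    {"category": "in department", "frequency": 2, "nature": 2},
    {"category": "elsewhere", "frequency": 1, "nature": 1},
]


def tally(interactions, field, labels=None):
    """Count how often each value of a categorical field occurs."""
    counts = Counter(record[field] for record in interactions)
    if labels is None:
        return dict(counts)
    # Replace numeric codes by their readable labels.
    return {labels[code]: n for code, n in counts.items()}


nature_counts = tally(example_interactions, "nature", NATURE_CODES)
category_counts = tally(example_interactions, "category")
print(nature_counts)
print(category_counts)
```

because every variable is categorical, a count per code is the only summary that is needed here; means or standard deviations of the numeric codes would not be meaningful.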
table 1 contact categories and types within each category

‘family’
  family member, profession not specified
  family member teaching
  family member non-teaching
‘in department’
  departmental colleague, role not specified
  colleagues teaching same or companion modules
  support staff
  students: current
‘in institution’
  academics teaching in other departments, same institution (discipline not specified)
  academics teaching in other, but related disciplines/departments
  central support staff
‘friends’
  friends, profession not specified
  friends teaching
  friends non-teaching
‘elsewhere’
  professional relationships outside the institution, role not specified
  formal relationships; collaborations (i.e., co-authors)
  non-academic relations
  former colleagues
  students: former and prospective

table 2 categories within the frequency, nature of conversation, and theme classes

frequency of interactions
  ns-not specified=0
  once a term-yearly=1
  fortnightly-several times per term=2
  1/2 weekly-fortnightly=3
  daily=4
the nature of conversations
  ns-not specified=0
  formal=1
  informal=2
themes
  unspecified=0
  learning, curriculum design; projects for students=1
  students experience/progress=2
  research and developing teaching=3
  approach to teaching=4
  feedback to students/students' assessment=5
  tips and ideas for teaching=6
  problems with students=7
  administration/management=8
  concerns with institutional environment=9
  other=10

interview data were classified, summarized and visualized using nvivo 9. initially, interview transcripts were read to uncover the key themes; subsequently, data were broken down into discrete parts, closely examined, and compared for similarities and differences. from content analysis, themes such as contact categories; nature, content, intensity and venue of interactions; motivating factors; and also the value of networking, emerged (babbie, 2007).

3 results and discussion

the presentation of results is structured around our four research questions.
3.1 research question 1: who do academics talk to about their teaching?

in order to understand the configuration and composition of academics' teaching networks, information about teaching-related interactions was gathered. each participant was free to name as many contacts as they wished, located across different settings and representing diverse categories of relationships, namely department/institutional/external colleagues, friends and/or family members. each academic could identify more than one contact type under each category (e.g. “staff directly involved in the course i teach” and “postgraduates who teach”; these two different types of contact would both appear under the “in department” category). since no boundaries were predefined and no temporal or numerical constraints were introduced for capturing academics' significant teaching-related interactions, we presume that enumerated contacts represent members of participants' personal networks rather than of their tightly-knit communities. results revealed that academics discuss their teaching with diverse types of contact. however, when asked “who do you talk to about your teaching”, participants tended to name departmental colleagues first before mentioning other types of connection. interviewee 5, specialising in life science, emphasised that “everything i do, i discuss with others, here, in the departmental level”. similarly, interviewee 8, representing life sciences, highlighted having close interactions with the departmental programme team while designing new, or amending old, courses. overall, the majority of teaching-related contact types fell under the category of department. “elsewhere” and “institution” represented the second and the third most frequently quoted categories, followed by “family” and “friends”. only two out of eleven academics from study 2 prioritised interactions outside their institution.
these two atypical cases were experienced teachers from the discipline of education. it has to be noted that some respondents identified individual contacts (e.g. “the director of teaching”), while others named only types of contact (e.g. “other instructors in my department”), giving no precise idea how many individuals within each contact type they talk to. therefore, analysis is at the level of contact type, rather than individuals. findings from this research suggest that common interests, namely joint projects, goals, problems, mutual commitment (“we actually sit on the same committee, we teach on the same course, we are on the same project”), trust and good personal relations played an essential role in cultivating and maintaining connections with others, encouraging open discussions and idea exchange with regard to teaching. since network studies normally rely on a simple name generator question, such as “who do you talk to about specific topic”, data derived from these two studies were sufficient to capture participants' contacts distributed across diverse settings, determining the configuration and the basic size of academics' teaching network. in sum, findings suggest that academics' interactions are concentrated in, but not confined to, departments, spreading more weakly across and outside academia. this observation is in line with roxå and mårtensson's finding in sweden that “academics' conversational partners could be found anywhere: within their discipline, in other universities or outside academia” and with their diagram showing a higher proportion of contacts within the department (2009, p. 551, diagram on page 552). as mentioned above, there were only two interviewees who had teaching networks that focused strongly outside their department and institution. it seems likely that their teaching and research networks were inseparable and shared the
characteristics of research networks comprised of dispersed contacts (becher & trowler, 2001). given that respondents tended to list informal interactions first and in greater numbers than formal, it can be presumed that they attach greater significance to the informal. this concurs with roxå and mårtensson's (2009) finding for teaching networks in sweden, and bears out knight (2006), thomson (2013) and eraut's (1994; 2004) claims that teachers' learning is informal. the small significant research networks observed by becher and trowler (2001) were also informal. although our data did not measure the absolute size of respondents' teaching networks, the indications are that they were small, sparse and simultaneously informal.

3.2 research question 2: what are the main themes of academics' conversations about teaching?

this research sheds light on the content of academics' interactions, examining the flow of different types of resources, advice, information and support within personal networks. data revealed that conversations about teaching varied in terms of their content across diverse types of contact. table 3 illustrates the themes discussed across the five contact categories:

table 3 distribution of themes discussed across five categories of contacts

theme | ‘family’ | ‘in department’ | ‘in institution’ | ‘friends’ | ‘elsewhere’
learning, curriculum design; projects for students | 1 | 11 | 3 | 0 | 6
students experience/progress | 2 | 6 | 3 | 0 | 3
research and developing teaching | 0 | 1 | 1 | 0 | 1
approach to teaching | 1 | 4 | 3 | 1 | 1
feedback to students/students' assessment | 0 | 7 | 0 | 1 | 4
tips and ideas for teaching | 1 | 2 | 2 | 0 | 1
problems with students | 2 | 5 | 2 | 1 | 1
administration/management | 1 | 9 | 2 | 0 | 3
concerns with institutional environment | 2 | 2 | 0 | 0 | 0
other | 3 | 5 | 4 | 1 | 3

student-related issues and concerns with the institutional environment formed a high proportion of conversations with family “...
content of modules, how things are going, irritating admin regulations, marking woes, and odd events”. in addition, family members offered emotional support: “[my wife] provides a valuable balance that helps me to deal with tough situations. it's not really directly to do with teaching, but it is certainly a huge help with part of my job”.

inside their departments, respondents discussed a wider variety of themes, as detailed in table 3. problems, concerns about students and administrative issues were discussed with administrative staff (five respondents) and people who provided teaching support (two respondents). by contrast, curriculum design, projects for students and approaches to teaching were discussed with people whose teaching participants supervised (seven respondents). students' experience, progress, feedback, assessment and problems were discussed with current students, mainly during tutorials and classes (seven respondents). one participant indicated that students' opinion was “invariably good source of feedback, insight into teaching practices”.

beyond the department, but within the institution, conversations were not discipline-specific. general pedagogical approaches, assessment tools, curriculum design and problems associated with students were discussed with academics from other departments. the conversations with institutional colleagues occurred in a formal setting, normally during seminars and training events. interactions with people from support departments addressed educational research and development of teaching, students' experience, administration/management and use of technology (five respondents).

three respondents discussed approaches to teaching, students' issues, assessment and feedback with their friends. while some stated sharing and testing new teaching ideas or seeking advice for teaching-related challenges from friends, others
specified that their conversations with friends were general and entailed sharing funny stories.

with colleagues from other institutions, academics compared and contrasted their professional and teaching environments and discussed prospects for collaborations. the external colleagues tended to be either from the same discipline or at least share similar research interests. course content, teaching approaches, learning process and students, in particular their changing expectations, progress, and issues, were the main themes of conversations. overall, the depth of conversations varied across different contacts, and conversations appeared more comprehensive with departmental colleagues than with peers from other departments or institutions.

3.3 research question 3: with what frequency and where do academics' conversations take place?

results indicated variations between participants in terms of the regularity of their interactions about teaching. some engaged in task-specific interactions, for example when struggling with a particular aspect of teaching or designing a new course, while others took part in regular, informal talks around various aspects of their practice. for interviewee 4, specialising in social sciences, networking is a natural way of working and an integral part of her everyday professional life: “my whole practice is based on this idea of collaboration and networking, because it is how i work; you know, it's a personal preference, i am not a lone scholar”. in written responses, respondents specified the frequency of their conversations either in quantitative or qualitative terms for 72 out of 105 interactions. table 4 illustrates the distribution of frequencies across different contact categories where this was specified quantitatively, and table 5 shows the distribution where frequency was specified qualitatively.
table 4 distribution of quantitatively specified frequencies (n=14)

frequency (reported in quantitative terms) | ‘family’ | ‘in department’ | ‘in institution’ | ‘friends’ | ‘elsewhere’ | total
once a term-yearly | 0 | 6 | 1 | 1 | 3 | 11
fortnightly-several times per term | 2 | 7 | 1 | 1 | 2 | 13
1/2 weekly-fortnightly | 0 | 10 | 0 | 0 | 1 | 11
daily | 0 | 1 | 0 | 0 | 0 | 1

table 5 distribution of qualitatively specified frequencies (n=14)

frequency (specified in a qualitative way) | ‘family’ | ‘in department’ | ‘in institution’ | ‘friends’ | ‘elsewhere’ | total
very occasionally | 0 | 2 | 1 | 0 | 1 | 4
sometimes | 3 | 9 | 3 | 2 | 4 | 21
frequently | 3 | 2 | 0 | 0 | 0 | 5
when change is required | 0 | 1 | 0 | 0 | 0 | 1

tables 4 and 5 suggest that participants talk about their teaching with colleagues in the department regularly, half-weekly or several times per term. the content analysis of written responses and interview transcripts revealed that interactions about teaching were ad hoc, taking place during lunch and coffee breaks, and more frequently during the teaching term (verified by five respondents). a detailed analysis of written data revealed that within the department, participants talked most frequently with colleagues teaching the same or a companion module, or whose teaching they supervised, namely teaching assistants. interactions with colleagues teaching the same or a companion module were mainly face-to-face, spontaneous, casual in nature, and took place in common rooms or corridors. some selectivity was evident in the statement of interviewee 2 (humanities) that “i've got a couple of colleagues here i often talk to about teaching... so yes, a fair amount of, probably two or three people out of 40, ... they tend to be people you can talk to or you feel are on a same sort of wave length as you are”. similarly, interviewee 10, specialising in engineering, mentioned talking with some colleagues far more frequently than with others.
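the concentration of quantitatively specified interactions inside the department can be checked by summing table 4 by row and by column. the sketch below simply transcribes the counts from table 4; the variable names are illustrative, and the "share" it prints is a descriptive proportion of these reported counts only, not a statistic reported by the study.

```python
# Counts transcribed from table 4 (frequencies reported in quantitative terms).
CATEGORIES = ["family", "in department", "in institution", "friends", "elsewhere"]
TABLE_4 = {
    "once a term-yearly": [0, 6, 1, 1, 3],
    "fortnightly-several times per term": [2, 7, 1, 1, 2],
    "1/2 weekly-fortnightly": [0, 10, 0, 0, 1],
    "daily": [0, 1, 0, 0, 0],
}

# Row totals: number of interactions reported in each frequency band.
row_totals = {band: sum(counts) for band, counts in TABLE_4.items()}

# Column totals: number of interactions reported for each contact category.
column_totals = {
    category: sum(counts[i] for counts in TABLE_4.values())
    for i, category in enumerate(CATEGORIES)
}

grand_total = sum(row_totals.values())
department_share = column_totals["in department"] / grand_total

print(row_totals)
print(column_totals)
print(f"in department: {department_share:.0%} of {grand_total} interactions")
```

the row totals reproduce the "total" column of table 4 (11, 13, 11 and 1), and the "in department" column accounts for 24 of the 36 quantitatively specified interactions, i.e. two thirds.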
overall, participants emphasized talking more with those with whom they were on friendly terms. although interactions with colleagues from other departments, subject networks, other he institutions, industry or employers were mentioned, the majority of interviewees (8/11) indicated a lower frequency of such interactions, occurring occasionally, on a once-a-term to yearly basis: “maybe a couple of times a year, depending on if there's an event”. interactions with institutional colleagues occurred at university-wide events, mainly face-to-face, but email, phone, chat and online platforms were used with physically distant colleagues. since the most frequent interactions were with departmental colleagues, these are likely to be discipline-specific. proximity theory appears useful for interpreting the greater frequency of interactions about teaching, while the emphasis on discipline points to evidence of homophily. this observation highlights that physical proximity still plays an influential role in activating and sustaining network ties, and also in developing trust and rapport with peers, despite the widespread popularisation of technologies. if, following granovetter (1973), frequency of interaction is taken as a measure of strength of tie, then the study suggests that academics tend to have strong teaching ties with people within the department, and far weaker ties with people outside their institution. it appears that respondents rely mainly on close, localised connections when dealing with teaching matters. however, since some academics maintained weak ties, such contacts could represent a source of radically novel teaching ideas, bringing complementary knowledge to personal teaching networks (granovetter, 1973). finally, results point to the fact that the strength of ties is determined not only by the temporal component of interactions, but also by the significance of a conversational partner (i.e., friendship).
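the reading of granovetter (1973) used above, in which frequency of interaction stands in for strength of tie, can be made concrete with a small sketch. it uses the frequency codes of table 2; the cut-off (codes 3 and 4 count as strong), the function name `tie_strength` and the example ego network are assumptions made purely for illustration, not rules or data from the two studies. as noted above, frequency alone does not capture the significance of a conversational partner, so such labels are at best a first approximation.

```python
# Frequency codes from table 2: 0 = not specified, 1 = once a term-yearly,
# 2 = fortnightly-several times per term, 3 = 1/2 weekly-fortnightly, 4 = daily.
STRONG_TIE_MIN_CODE = 3  # assumed cut-off, chosen only for illustration


def tie_strength(frequency_code):
    """Map a table 2 frequency code to a rough tie-strength label."""
    if frequency_code not in range(5):
        raise ValueError(f"unknown frequency code: {frequency_code}")
    if frequency_code == 0:
        return "unknown"  # frequency was not specified by the respondent
    return "strong" if frequency_code >= STRONG_TIE_MIN_CODE else "weak"


# Hypothetical ego network (NOT study data): contact type -> frequency code.
example_ties = {
    "colleague teaching a companion module": 3,
    "academic in another department": 1,
    "former colleague at another institution": 1,
    "teaching assistant": 4,
}

labels = {contact: tie_strength(code) for contact, code in example_ties.items()}
weak_ties = [contact for contact, label in labels.items() if label == "weak"]
print(labels)
print(weak_ties)
```

in this made-up network the two weak ties are the contacts outside the department, which is the pattern the paragraph above describes: such ties are infrequent, yet they are the ones most likely to carry novel ideas.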
3.4 research question 4: what factors motivate academics to network and what value do they perceive in their personal networks?

in addition to exploring the composition and the basic size of networks along with the content, frequency, venue and nature of academics' interactions, this study expands understanding of the incentives for networking and the benefits obtained through personal teaching networks. table 6 summarises results across all of the interviewees:

table 6 motivation for networking and the benefits obtained through personal networks

motivation for networking | benefits obtained through networks
access to new teaching ideas | good personal relationships
access to disciplinary knowledge | professional guidance
access to new learning opportunities | prompt feedback
access to diverse resources | solidarity and the sense of community
access to professional and emotional support | confidence

findings suggest that personal networks not only provide access to new teaching ideas, learning opportunities and diverse resources; exposure to diverse viewpoints and a wide pool of expertise within networks also enriches academics' knowledge base and challenges their conceptions of teaching. through interactions, participants keep track of others' work, sometimes triggering their motivation to adopt or experiment with new things: “i find out what other people are doing; looking at what someone else is doing and then changing my teaching is one of the things that i would do” (interviewee 9). sometimes, interviewees adopted ideas without much alteration; at others they adapted new concepts to their own context: “you can take something that someone is using to teach in a particular context and you maybe like the idea, but it doesn't fit with your students or with what you teach.
so what you could do is take that idea and you can change it until it does fit with your students.” furthermore, interviewees indicated that the network offered a sense of security, comfort and reliability. participants particularly valued availability of prompt feedback, especially when facing a specific teaching-related issue. by discussing problems with peers, academics could easily develop useful solutions. in addition, personal networks represented a locus for testing new ideas: “if you are planning some changes to your course, you'll often try it out on them first, before you go to the larger group, just to make sure you don't make a complete fool of yourself” (interviewee 2). in sum, findings showed that through personal networks academics acquire various kinds of resources (new ideas and teaching materials) and share knowledge and experience with one another, as posited by social capital theory (wenger, trayner, & de laat, 2011). the interviewees appreciated these as benefits that provided incentives for networking. overall, respondents used their personal networks for exchanging ideas, discussing teaching-related problems and obtaining professional advice. academics' teaching networks thus conform with the network functions proposed by tynjälä and nikkanen (2009), koper et al. (2005) and paavola et al. (2002), as discussed in the introduction.

4 conclusion

understanding academics' learning is important because in today's society lifelong learning is becoming the benchmark of all professional fields. given that academics are the key agents in transforming educational practices, scientific knowledge about from whom and how they learn, and also in what ways their professional development can be supported, is of key importance. this research specifically unpacks the interactions that influence and enhance academics' teaching practices, examining their networking in terms of its nature, processes and outcomes.
this study can be of interest not only to the academics themselves, but also to the wider university staff, especially those who are responsible for professional development, and national bodies interested in teaching and learning (for instance, higher education academy and seda). the small size and variation of the sample limit the generalisability of the findings. nevertheless, some tentative conclusions are drawn below. these have implications for understanding the ways in which academics develop understanding of teaching, acquire new knowledge, skills and dispositions with regard to teaching, and also how change in instruction might be supported. further testing and verification of the results through additional empirical research are highly recommended.

although personal networks relating to teaching are valued by academics, in most cases these are strongly localised. there is little evidence of personal networks extending beyond immediate (face-to-face) contacts. even if other means were utilized to contact external colleagues, the ties were weaker, the interactions less frequent, the content of conversations less comprehensive, and the contacts generally considered less significant. two interpretations are possible for these observations. first, that teaching practice is a highly contextualised activity (in contrast to research), so meaningful interactions are likely to be with those who understand the local context, namely institutional regulations/politics, departmental culture, students – such people often share the same building, have mutual commitments and/or similar interests, and face-to-face contact is easy. second, that face-to-face contact could be the most effective way for sharing teaching practice and also for acquiring prompt feedback, hence significant interactions are likely to be with those who are geographically close.
the data, though, may show some research bias: the prompt “who do you talk to about your teaching?” could have predisposed respondents to think in terms of face-to-face interactions. investigation of the ways in which academics network about teaching through other media could establish the circumstances under which face-to-face contact is significant in supporting changes in practice.

the local focus implies densely connected networks where the majority of members know each other well. tushman and anderson (1986) suggest that members of such networks are less exposed to radically new ideas and also less likely to absorb knowledge created elsewhere. nonaka and takeuchi (1995) agree, advocating being open to external resources and diverse sources of information to avert pressures for social conformity and 'not invented here' syndrome. however, ruef (2002) suggests that a diverse network may support creativity, through flow of information via weak ties, and adoption of the resultant innovation through strong ties. about half of respondents in this research had the diverse networks that might support effective innovation in teaching according to ruef's model.

the majority of significant ties, for most respondents, appear to be with others from the same discipline, whether within the department or external to the institution. this implies that disciplinary networks may be more effective in supporting change than generalised intra-institutional networks. however, the share project respondents all came from computing, mathematics or technology, and this has heavily weighted the sample. further research should test whether this conclusion applies to disciplines with less technical content where teaching approaches may transfer more easily across disciplinary boundaries.
that the teaching networking practices of those whose research discipline is education may be atypical requires further investigation, as it has implications for the applicability of any conclusions drawn from the study of such networks.

academics' connections did not appear time or context specific, since respondents maintained contact both with current colleagues and with those from previous institutions. this implies a historical or temporal component of networks, which are thus not entirely explained by proximity or discipline. moreover, there was a wide diversity in intensity of networking relations, but only within the department did interactions appear to be regular in nature. the dynamics of teaching network formation and maintenance, and the impact this has on the types of flow, warrant further investigation.

given that personal networks offered new teaching ideas, learning opportunities, diverse resources, and also shaped academics' perceptions about teaching, it can be presumed that personal networks play an influential role in academics' professional development. furthermore, since previous network literature has been dominated largely by quantitative research (filliettaz, 2011; rijt, bossche, & segers, 2012), this research project addresses the methodological gap by adding a much-needed qualitative perspective on academics' teaching-related interactions and network processes within their personal networks. by examining the depth of academics' interactions about teaching, this study addresses yet another gap concerning the content of interactions (coburn & russell, 2008).

the current studies examined a static snapshot of participants' teaching networks. therefore, future studies should consider scrutinising how academics' network composition changes over time and what factors cause changes in their network structure.
furthermore, the current research did not explore in what ways personal characteristics, namely age, gender, experience level, disciplinary domain or institutional culture, influence academics' networking behaviours. hence, future research should consider investigating the influence of these characteristics on the patterns of networking. finally, this paper made partial use of social network analysis by elaborating on the basic structure of academics' networks, along with the frequency, content and the value of teaching-specific interactions. another research paper reports the detailed sna of the project data, outlining the impact of ego, ego-alter and alter-alter characteristics on the patterns and nature of relationships formed by academics (pataraia et al., 2014).

keypoints

academics' teaching networks are localised, marked with strong ties.
personal networks offer a wide range of benefits, namely new information, ideas and support.
academics' personal connections do not appear time-bound.

acknowledgements

we are grateful to all participants in these two studies, and to the national teaching fellowship scheme, which funded the share project.

references

babbie, e. (2007). the practice of social research. belmont, ca: thomson higher education.
baker-doyle, k. j. (2011). the networked teacher: how new teachers build social networks for professional support. new york, ny: teachers college press.
becher, t., & trowler, p. (2001). academic tribes and territories: intellectual enquiry and the cultures of disciplines. buckingham, england: open university press.
burt, r. s. (1992). structural holes: the social structure of competition. cambridge, ma: harvard university press.
coburn, c. e., & russell, j. l. (2008). district policy and teachers' social networks. educational evaluation and policy analysis, 30(3), 203–235.
eraut, m. (2004). informal learning in the workplace.
studies in continuing education, 26(2), 247–273. eraut, m. (1994). developing professional knowledge and competence. london, england: routledge. european commission (2009). council conclusions of 12 may 2009 on a strategic framework for european cooperation in education and training (et 2020) [official journal c 119 of 28.5.2009]. retrieved from http://europa.eu/legislation_summaries/education_training_youth/general_framework/ef0016_en.htm. filliettaz, l. (2011). asking questions...getting answers: a sociopragmatic approach to vocational training interactions. pragmatics and society, 2(2), 234–259. fincher, s., & tenenberg, j. (2011). a commons leader„s vade mecum. university of kent press available at: http://www.cs.kent.ac.uk/people/staff/saf/share/papers/bt_111049_vadexmecum_final.pdf forret, m. l., & dougherty, t.w. (2004). networking behaviors and career outcomes: differences for men and women? journal of organizational behavior 25(3), 419–437. granovetter, m. s. (1973). the strength of weak ties. american journal of sociology, 78(6), 1360–1380. hargreaves, a. (2003). teaching in the knowledge society: education in the age of insecurity. new york, usa: teachers' college press. katz, s., earl, l. m., & jaffar, s. b. (2009). building and connecting learning communities: the power of networks for school improvement. thousand oaks, ca: corwin press. kerr, d., aiston, s., white, k., holland, m., & grayson, h. (2003). review of networked learning communities. maidenhead, uk: national foundation for educational research. kilduff, m., & tsai, w. (2007). social networks and organisations. los angeles, ca: sage. knight, p. (2006). quality enhancement and educational professional development. quality in higher education, 12(1), 29–40. koper, r., rusman, e., & sloep, p. (2005). „effective learning networks‟. article. retrieved from http://dspace.ou.nl/handle/1820/304. kyndt, e., dochy, f., & nijs, h. (2009). 
learning conditions for non-formal and informal workplace learning. journal of workplace learning, 21(5), 369–383. mccormick, r., fox, a., carmichael, p., & procter, r. (2011). researching and understanding educational networks. london, england: routledge. mehra, a., kilduff, m., & brass, d. j. (1998). at the margins: a distinctiveness approach to the social identity and social networks of underrepresented groups. academy of management journal, 41(4), 441–452. monge, p. r., & contractor, n. s. (2003). theories of communication networks. new york, ny: oxford university press. nonaka, i., & takeuchi, h. (1995). the knowledge-creating company: how japanese companies create the dynamics of innovation. new york, ny: oxford university press. paavola, s., lipponen, l., & hakkarainen, k. (2002). epistemological foundations for cscl: a comparison of three models of innovative knowledge communities. in g. stahl (ed.), proceedings of the conference on computer-supported collaborative learning: foundations for a cscl community (pp. 24–32). cscl '02. hillsdale, nj: erlbaum. international society of the learning sciences. retrieved from http://dl.acm.org/citation.cfm?id=1658616.1658621. pallant, j. (2010). spss survival manual: a step by step guide to data analysis using spss. crows nest, australia: allen & unwin. pataraia, n., margaryan, a., falconer, i., littlejohn, a., & falconer, j. (2014). discovering academics' key learning connections: an ego-centric network approach to analysing learning about teaching. journal of workplace learning, 26(1), 56–72. pifer, m. (2010). such a dirty word: networks and networking in academic departments (unpublished doctoral dissertation). pennsylvania state university, usa. powell, w.w., koput, k.w., & smith-doerr, l. (1996).
interorganizational collaboration and the locus of innovation: networks of learning in biotechnology. administrative science quarterly, 41(1), 116–145. reagans, r., & mcevily, b. (2003). network structure and knowledge transfer: the effects of cohesion and range. administrative science quarterly, 48(2), 240–267. rijt, j. van der, bossche, p. v. den, & segers, m. s. r. (2013). understanding informal feedback seeking in the workplace: the impact of the position in the organizational hierarchy. european journal of training and development, 37(1), 72–85. rogers, e. m. (2003). diffusion of innovations (5th ed.). new york, ny: the free press roxå, t., & mårtensson, k. (2009). significant conversations and significant networks: exploring the backstage of the teaching arena. studies in higher education, 34(5), 547–559. ruef, m. (2002). strong ties, weak ties and islands: structural and cultural predictors of organizational innovation. industrial and corporate change, 11(3), 427–449. schuck, s., aubusson, p., & buchanan, j. (2008). enhancing teacher education practice through professional learning conversations. european journal of teacher education, 31(2), 215–227. scott, j., & carrington, p. j. (2011). the sage handbook of social network analysis. london, uk: sage publications ltd. skule, s. (2004). learning conditions at work: a framework to understand and assess informal learning in the workplace. international journal of training and development, 8(1), 8–20. thomson, k. e. (2013). the nature of academics‟ informal conversation about teaching (unpublished doctoral dissertation). the university of sydney, australia. retrieved from http://ses.library.usyd.edu.au/bitstream/2123/9166/1/ke-thomson-2013-thesis.pdf. trinkle, c. (2009). twitter as a professional learning community. school library monthly, 26(4), 22–23. tynjälä, p. (2008). perspectives into learning at the workplace. educational research review, 3(2), 130– 154. tynjälä, p., & nikkanen, p. (2009). 
transformation of individual learning into organizational and networked learning in vocational education. in m. stenström & p. tynjälä (eds.), towards integration of work and learning: strategies for connectivity and transformation (pp. 117–135). dordrecht, the netherlands: springer. tushman, m.l., & anderson, p. (1986). technological discontinuities and organizational environments. administrative science quarterly, 31(3), 439–465. wasserman, s., & faust, k. (1994). social network analysis: methods and applications. cambridge, england: cambridge university press. wellman, b. (2001). physical place and cyberplace: the rise of personalised networking. international journal of urban and regional research, 25(2), 227–252. wellman, b. (1998). networks in the global village: life in contemporary communities. boulder, co: westview press. wenger, e., trayner, b., & de laat, m. (2011). promoting and assessing value creation in communities and networks: a conceptual framework. heerlen, the netherlands: ruud de moor centrum, open university. retrieved from http://www.knowledge-architecture.com/downloads/wenger_trayner_delaat_value_creation.pdf

frontline learning research 6 (2014) 67-81 issn 2295-3159
corresponding authors: eduardo cascallar, ku leuven, leuven, belgium, cascallar@msn.com and mariel musso, national research council (conicet), argentina and ku leuven, leuven, belgium, mariel.musso@hotmail.com
doi: http://dx.doi.org/10.14786/flr.v2i5.135

modelling for understanding and for prediction/classification: the power of neural networks in research

eduardo cascallar (a, b), mariel musso (a, c, d), eva kyndt (a) and filip dochy (a)
(a) university of leuven, belgium; (b) assessment group international, usa / belgium; (c) national research council (conicet)/ciipme, argentina; (d) universidad argentina de la empresa, argentina

article received 28 november 2014 / revised 18 january 2015 / accepted 18 january 2015 / available online 30 january 2015

abstract
two articles, edelsbrunner and schneider (2013) and nokelainen and silander (2014), comment on musso, kyndt, cascallar, and dochy (2013). several relevant issues are raised and some important clarifications are made in response to both commentaries. predictive systems based on artificial neural networks continue to be the focus of current research and several advances have improved the model building and the interpretation of the resulting neural network models. what is needed is the courage and open-mindedness to actually explore new paths and rigorously apply new methodologies which can perhaps, sometimes unexpectedly, provide new conceptualisations and tools for theoretical advancement and practical applied research. this is particularly true in the fields of educational science and social sciences, where the complexity of the problems to be solved requires the exploration of proven methods and new methods, the latter usually not among the common arsenal of tools of either practitioners or researchers in these fields.
this response will enrich the understanding of the predictive systems methodology proposed by the authors and clarify the application of the procedure, as well as give a perspective on its place among other predictive approaches.

keywords: artificial neural networks; response to commentaries; methodology; data modelling

research is the process of going up alleys to see if they are blind.
marston bates

two articles, edelsbrunner and schneider (2013) and nokelainen and silander (2014), comment on musso, kyndt, cascallar, and dochy (2013). several relevant issues are raised and some important clarifications need to be made in response to both commentaries. this response will enrich the understanding of the predictive system methodology proposed by the authors and clarify the application of the procedure, as well as give a perspective on its place among other predictive approaches.

edelsbrunner and schneider (2013), in their commentary on musso, kyndt, cascallar and dochy (2013), argue that artificial neural networks (anns) should only be used as exploratory modelling techniques, in spite of their being powerful statistical modelling tools with demonstrated ability to improve outcomes of classifications and predictions over traditional statistical methods (marquez, hill, worthley, & remus, 1991). garson (1998, pp. 11-14) cites more than thirty-five articles which have shown the ability of anns to outperform traditional techniques in specific circumstances. in addition, haykin (1994, pp. 4-5) summarizes some of the main favourable properties of anns which explain their advantages over traditional methods.
the reasons edelsbrunner and schneider (2013) give for their rather strong position centre on two main arguments: (a) that the output from anns cannot be fully translated into a meaningful set of rules because of a lack of accessibility to the input-output relationships, and (b) that there is a lack of equivalent statistical parameters in anns when compared to more traditional statistical techniques. these are the two fundamental misconceptions that will be addressed.

one of the essential requirements for development and advancement in science is the willingness and vision to explore new conceptualizations and methods. in particular, as is the case in the study by musso et al. (2013), this includes the ability to bring together data from interdisciplinary domains (e.g., decuyper, dochy, & van den bossche, 2010) and to use methodologies for analysis that are commonly applied in other disciplines such as business, finance, and the social sciences (aldeek, 2001; detienne, detienne, & joshi, 2003; laguna & marti, 2002; neal & wurst, 2001; nguyen & cripps, 2001; white & racine, 2001; and others as stated in musso et al., 2013). the literature still shows relatively few studies applying neural networks in education and in educational assessment in particular (everson, chance, & lykins, 1994; wilson & hardgrave, 1995), although anns have been shown to improve the validity and the accuracy of the predictions and/or classifications, and also improve the predictive validity of test scores (everson et al., 1994; perkins, gupta, & tamanna, 1995; weiss & kulikowski, 1991).
more recently, several studies have shown the applicability and use of this methodology in education (e.g., cascallar, boekaerts, & costigan, 2006; kyndt, musso, cascallar, & dochy, 2011; kyndt, musso, cascallar, & dochy, 2015; musso & cascallar, 2009a; musso & cascallar, 2009b; musso, kyndt, cascallar & dochy, 2012; musso et al., 2013; pinninghoff junemann, salcedo lagos, & contreras arriagada, 2007; ramaswami & bhaskaran, 2010; zambrano matamala, rojas díaz, carvajal cuello, & acuña leiva, 2011). these recent studies have used anns both for prediction/classification and for the understanding of the underlying variables involved in the educational outcomes studied. now it is important to show that recent advances in ann analysis have addressed the main concerns expressed by edelsbrunner and schneider (2013).

first, the concerns regarding the presumed “opacity” of anns in terms of their input-output relationships will be addressed. the authors undermine their own estimate of the value of anns as a “promising technique” by essentially arguing that their use is contrary to good scientific practice for theory-building, given the presumed “opaque” nature of their internal structure, which makes interpretation difficult if not impossible. the often repeated and now quite outdated argument of anns as “black boxes” (cf. benitez, castro & requena, 1997) is therefore raised once again. however, these arguments ignore the vast amount of research that has been going on in this field to overcome this initial drawback of predictive systems analyses (e.g., frey & rusch, 2013; intrator & intrator, 2001; lee, rey, mentele, & garver, 2005; tzeng & ma, 2005; yeh & cheng, 2010). considering the nature and centrality of modelling in science, as was clearly presented by frigg and hartmann (2006), models can perform two different representational functions, which are not mutually exclusive as scientific models.
first, they can be a representation of an aspect or selected part of the world, what they call the “target system”. in this case, what can be modelled are either phenomena or data. the second notion of modelling is the representation of a theory in that it represents its rules, laws and axioms. clearly, anns contribute to the construction of better representational models consisting of “models of data” (suppes, 1962). in particular, this contribution is based on ample research that has been crucial in making the link between the internal representations of anns and the outputs obtained. incidentally, it is revealing that edelsbrunner and schneider (2013) cite the paper by benitez et al. (1997), which presents an addition to the usual ann techniques that, according to its authors, provides “such an interpretation of neural networks so that they will no longer be seen as black boxes” (p. 1156). citing that article in support of the “black box” view of anns therefore contradicts its own conclusions. the approach proposed by benitez et al. (1997) is based on establishing the equivalence between multilayer perceptron anns, precisely the type of network used by musso et al. (2013), and fuzzy rule-based systems. the operator derived from this equivalence results in the transformation of fuzzy rules into a format which can be easily understood. thus, the knowledge generated by the ann after the learning process is finished can be more easily and clearly explained, “so that they can no longer be considered as black boxes” (benitez et al., 1997, p. 1156), while retaining all the advantages and power of anns as very efficient computing representations, as automated knowledge acquisition procedure models, and as universal approximators (ripley, 1996).
in fact, west, brockett, and golden (1997) state that neural networks “are a well-defined adaptive gradient search procedure for parameter fitting in a complex nonlinear model, and not a 'black box' at all” (p. 389). in addition, the efforts to develop better and more comprehensive visualisation techniques for the complex interactions in an ann, such as those suggested by tzeng and ma (2005), have contributed to opening the “black box” and help the researcher in determining underlying dependencies between inputs and outputs of a neural network. as a consequence, they not only facilitate the design of efficient anns, but also enable the use of anns for problem solving. it is true that visualisation is not explanation, but visualisations are powerful tools to guide the refinement of neural network structures for problem solving (e.g., classification tasks) using anns or other machine learning models.

another significant addition to the literature which “opens the box” in ann analyses is the concept of structured neural network (snn) techniques used for modelling (lee, rey, mentele, & garver, 2005). in this approach, the actual construction of the network is based on existing contextual and theoretical knowledge to assist in the design of the ann structure of inputs. in fact, a similar approach was followed by musso et al. (2013), by populating the inputs based solely on solid theoretical constructs derived from previous cognitive, motivational, and sociodemographic research and models, avoiding blind data mining techniques (hand, mannila & smyth, 2001), and based on the factor analysis and structural equation modelling (sem) of several variables to determine their potential weight in the problem. cause-and-effect relationships have been traditionally modelled, among others, by sem and partial least squares (pls) approaches. but these procedures have their own shortcomings.
in pls, there is no theoretical rationale for all indicators to have the same weighting (haenlein & kaplan, 2004), and the pls procedure does not take into account the fact that some indicators may be more reliable than others and should, therefore, receive higher weights (chin, marcolin, & newsted, 2003). in addition, there is the difficulty of interpreting the loadings of the independent latent variables in pls (which are based on cross-product relations with the response variables). regarding sem, several authors also point out some issues that require attention from the researcher or that are still awaiting further research (lei & qiong wu, 2007; schermelleh-engel, kerwer, & klein, 2014; weston & gore, 2006). among the issues noted with sem are possible data problems, such as missing data, non-normality of observed variables, or multicollinearity; estimation problems that could be due to data problems or identification problems in model specification; or interpretation problems due to unreasonable estimates. these potential problems have led to suggestions involving the development of “mixture pls” models (hahn, johnson, herrmann, & huber, 2002), hierarchical bayesian methods in sem models (ansari, jedidi, & jagpal, 2000) and new ways of evaluating fit in non-linear multilevel structural equation models (schermelleh-engel et al., 2014). even if nonlinear sem and pls models could handle asymmetric relationships, they still do not solve the problems associated with large data and complex interactions. the snn approach takes into account these complexities and non-linearity in data sets, while maintaining the advantages of the ann general model.
another significant addition to the battery of approaches that researchers have explored to eliminate the “black box” risk of anns is the inclusion of sensitivity analysis for each of the variables in the model (kim & ahn, 2009), in order to extract the necessary information for model validation and process optimisation from the relationships between inputs and outputs in the ann. this method, based on the relative importance (ri) parameter estimate, improves on garson's (1991) use of relative importance weights, and uses sensitivity analysis to determine the causal importance of the input variables on the outputs. the sensitivity is a measure of the increase in the error of the predicted value as each variable is excluded from the model, and demonstrates systematically the degree of influence on the network weights of each participating variable. the ri methods used in both classification and prediction models are further evidence of the fallacy of the view of neural networks as black boxes beyond human understanding. incidentally, kim and ahn (2009) also compared the results from the ann analysis with logistic regression and classification and regression trees (cart) analyses, with ann models obtaining better results in both training and testing sets of data. other authors (e.g., blackard & dean, 1999) have compared the absolute and relative accuracy of anns with predictions based on discriminant analysis (da) models, with a consistent finding that ann models outperformed the da models. a very interesting comparison of methods to accurately assess the contribution of variables in ann architectures has been reported by olden, joy, and death (2004). the authors compare nine different methods for quantifying variable importance in anns using simulated data with known properties.
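to make the exclusion-based sensitivity measure described above concrete, the following is a minimal sketch in python (numpy only). it is not the procedure of kim and ahn (2009): the data are simulated, the network is a toy one whose hidden weights are random and whose output weights are fitted by least squares (a shortcut assumed here only to keep the example short, in place of backpropagation), and “excluding” a variable is assumed to mean replacing it with its mean. all function names are hypothetical.

```python
import numpy as np

def fit_toy_network(X, y, n_hidden=20, seed=0):
    """Fit a one-hidden-layer network. Assumption: the tanh hidden layer is
    random and only the output weights are solved, by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))  # input -> hidden
    b = rng.normal(scale=0.5, size=n_hidden)                # hidden biases
    H = np.column_stack([np.tanh(X @ W + b), np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)            # hidden -> output
    return W, b, beta

def predict(model, X):
    W, b, beta = model
    H = np.column_stack([np.tanh(X @ W + b), np.ones(len(X))])
    return H @ beta

def exclusion_sensitivity(model, X, y):
    """Increase in mean squared error when each input is neutralised
    (replaced by its mean), one variable at a time."""
    base_mse = np.mean((y - predict(model, X)) ** 2)
    increases = []
    for j in range(X.shape[1]):
        X_excl = X.copy()
        X_excl[:, j] = X[:, j].mean()
        mse_j = np.mean((y - predict(model, X_excl)) ** 2)
        increases.append(mse_j - base_mse)
    return np.array(increases)

# simulated data with known structure: x0 matters most, x1 less, x2 not at all
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = fit_toy_network(X, y)
sensitivity = exclusion_sensitivity(model, X, y)
ranking = np.argsort(-sensitivity)  # most influential input first
print(sensitivity, ranking)
```

because the true contribution of each input is known in this simulation, the resulting ranking can be checked against it, which is the same logic that motivates the use of simulated data in method comparisons.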
the use of simulated data, for which the true importance of the variables is known, provides a solid base for future developments in this field; this is not possible with natural data, as used by gevrey, dimopoulos, and lek (2003). the nine methodologies studied by olden et al. (2004) included: connection weights, garson's algorithm, partial derivatives, input perturbation, sensitivity analysis, forward stepwise addition, backward stepwise elimination, improved stepwise selection 1, and improved stepwise selection 2 (see olden et al., 2004 for details on these methods). the results indicated that the connection weights approach showed the best overall performance both in terms of accuracy (degree of similarity between true and estimated variable ranks) and precision (degree of variation in accuracy), when estimating the true importance of all the variables in the ann. partial derivatives, input perturbation, sensitivity analysis and both versions of the improved stepwise selection methods showed moderate performance in the simulations. when estimating the actual ranks, the connection weights approach once again was the method which exhibited the best performance. in addition, olden and jackson (2002) reviewed a randomisation approach to better evaluate and understand the contribution of predictors in ann analysis. they conclude by stating: “thus, by coupling this new explanatory power of neural networks with its strong predictive abilities, anns promise to be a valuable quantitative tool to evaluate, understand, and predict ecological phenomena” (olden & jackson, 2002, p. 135). all of these examples demonstrate that, using the appropriate techniques, the complexity of an ann does not need to translate into “opacity”, and researchers are not limited in their ability to gain insight into the explanatory factors of the prediction and classification processes performed efficiently by anns. studies such as olden et al. (2004), gevrey et al.
(2003), and lek, belaud, baran, dimopoulos, and delacoste (1996), are but the beginnings of a vast number of applications that have “opened the box” in ann analysis. in addition, regularisation approaches have been used to enhance the interpretation of ann results (intrator & intrator, 2001), and the estimation of interaction effects in anns was used and demonstrated by donaldson and kamstra (1999). therefore, contrary to what has been claimed by edelsbrunner and schneider (2013) and quoted by golino and gomes (2014), the ann approach offers the potential to examine the complex relationships amongst its components.

an additional important advantage of ann analysis relates to the need to capture the complexity of the interaction of various factors in the understanding of similarly complex phenomena (agrawal, 2001). it is difficult to find large-n studies with a large set of variables, particularly in the social and educational sciences. so, most studies attempt to develop causal models based on a very limited set of variables, without the capacity to encompass a large number of predictors, and therefore not providing the possibility to observe their complex interactions (boekaerts & cascallar, 2006; cascallar et al., 2006). a resulting problem is that meta-analyses trying to find general statistical correlations face very serious problems as interactions between the factors analysed are not known, which in turn leads to wrong estimations of relevance. related to this problem is the fact that in all studies that knowingly or unknowingly exclude a relevant factor, the importance of all other variables shifts dramatically. this effect has been noted in very diverse fields ranging from natural resource estimation to self-regulated learning (agrawal & chhatre, 2006; boekaerts & cascallar, 2006).
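returning to the variable-importance measures compared by olden et al. (2004), the sketch below contrasts two of them, the connection weights approach and garson's algorithm, for a network with one hidden layer and one output. it is an illustration in python (numpy) under stated assumptions: the weight values are invented for the example, bias terms are ignored, and the function names are hypothetical.

```python
import numpy as np

def connection_weights(w_in_hidden, w_hidden_out):
    """Connection weights: for each input, sum over hidden neurons of
    (input->hidden weight) * (hidden->output weight), keeping the sign."""
    return w_in_hidden @ w_hidden_out

def garson(w_in_hidden, w_hidden_out):
    """Garson-style relative importance: shares of the absolute weight
    products within each hidden neuron, normalised to sum to 1."""
    contrib = np.abs(w_in_hidden * w_hidden_out)  # |w_ih * w_ho| per pair
    share = contrib / contrib.sum(axis=0)         # share within each hidden neuron
    total = share.sum(axis=1)
    return total / total.sum()

# hypothetical weights: 3 inputs (rows) x 2 hidden neurons (columns), one output
w_in_hidden = np.array([[2.0, -2.0],
                        [1.0,  1.0],
                        [0.0,  0.5]])
w_hidden_out = np.array([1.0, 1.0])

cw = connection_weights(w_in_hidden, w_hidden_out)  # signed importance per input
gi = garson(w_in_hidden, w_hidden_out)              # unsigned shares summing to 1
print(cw, gi)
```

in this invented example the first input acts through the two hidden neurons with opposite signs, so its signed connection weight is zero, whereas the measure based on absolute values ranks it first. this is one way in which the two measures can disagree about the same network.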
studies which only take into account a few variables, in rather simple designs, and do not consider very important but complex interactions with a larger number of participating factors can and do often show contradictory results. this should not be considered a trivial problem for the conceptualisation of various effects and phenomena in every scientific field (boekaerts & cascallar, 2006). frey and rusch (2013) present an interesting study in the area of social-ecological systems which uses anns with an analytic approach that produces an open architecture in which it is possible to establish the input-output relationships which edelsbrunner and schneider (2013) seem to perceive are unachievable for anns. these analyses suggested by various authors (thrush, coco & hewitt, 2008; yeh & cheng 2010) make the relationships among the various input-output variables explicit. the second main argument regarding problems associated with the ann methodology, as claimed by edelsbrunner and schneider (2013), has to do with the lack of some statistical parameters in anns. this ignores the evidence that there has also been an abundance of research to provide the ann model with equivalent information. there have been increasing efforts for some time, to embed anns in general statistical frameworks (cheng & titterington, 1994), with bridle (1992) comparing and blending anns with markov-chain models, and applying bayesian approaches and methods in the modelling of neural networks (mackay, 1992). more recently, he and li (2011) provide an interesting example of such work. they used the standard backpropagation algorithm derived in vector form, and they were successful in determining the confidence interval and prediction intervals for the ann, while also exploring which neural network structural characteristics had more of an impact on such parameters. 
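as a sketch of how such intervals can be obtained, a standard first-order (delta-method) approximation from nonlinear least squares regression is shown below. it is offered as an illustration under stated assumptions, not as the exact formulation of he and li (2011): the network is assumed to be trained by least squares on $n$ cases with $p$ weights, the errors are assumed independent with constant variance, $\mathbf{J}$ denotes the $n \times p$ jacobian of the fitted outputs with respect to the weights, and $\mathbf{g}_0$ the gradient of the network output with respect to the weights at a new input $x_0$. all symbols are introduced here for illustration.

```latex
% residual variance estimate
s^2 = \frac{1}{n-p}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2

% approximate (1-\alpha) confidence interval for the expected output at x_0
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-p}\; s\,
\sqrt{\mathbf{g}_0^{\top}\left(\mathbf{J}^{\top}\mathbf{J}\right)^{-1}\mathbf{g}_0}

% approximate (1-\alpha) prediction interval for a new observation at x_0
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-p}\; s\,
\sqrt{1+\mathbf{g}_0^{\top}\left(\mathbf{J}^{\top}\mathbf{J}\right)^{-1}\mathbf{g}_0}
```

the prediction interval is wider than the confidence interval because it also includes the variance of a new observation (the additional 1 under the square root).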
in particular, when the levenberg-marquardt backpropagation algorithm is used to train a neural network, the jacobian matrix has already been calculated to update the weights and biases of the network, so the confidence interval with the corresponding confidence level can be computed to evaluate the predictive capability of the ann. on a similar topic, zapranis and livanis (2005) argue that, because anns are consistent non-parametric estimators with powerful universal approximation properties, the development and implementation of neural network applications has to be based on established procedures for estimating confidence intervals and, especially, prediction intervals. they go on to review the main state-of-the-art approaches for the construction of confidence and prediction intervals, and evaluate their strengths and weaknesses. after comparing them in a controlled simulation, the authors suggest that a combination of bootstrap and maximum likelihood approaches is superior to analytic approaches when constructing the prediction intervals (zapranis & livanis, 2005). on the other hand, other authors propose the construction of confidence intervals for neural networks based on least squares estimations and using the linear taylor expansion of the nonlinear model output, which also detects ill-conditioning of ann candidates and can estimate their performance (rivals & personnaz, 2000).

in terms of the comparison between anns and logistic regression, in neural network analysis the purpose of the hidden layer is to map a set of patterns, which are linearly non-separable in the input space, into the so-called image-space in the hidden layer, where these patterns may become linearly separable. as in logistic regression, decision surfaces in the neural networks are hyperplanes in the input space.
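the mapping into an image space in which patterns become linearly separable can be illustrated with the classic xor patterns. the sketch below, in python (numpy), uses a hand-set hidden layer rather than a trained one, and a simple threshold activation instead of a sigmoid; both choices, and all weight values, are assumptions made only to keep the example transparent.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

# the four xor patterns: not linearly separable in the input space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# hand-set (hypothetical) hidden layer: each column is one hidden neuron,
# i.e. one hyperplane in the input space: x1 + x2 = 0.5 and x1 + x2 = 1.5
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])
H = step(X @ W_hidden + b_hidden)  # image of each pattern in the hidden layer

# in the hidden ("image") space a single hyperplane, h1 - h2 = 0.5,
# now separates the two classes
w_out = np.array([1.0, -1.0])
b_out = -0.5
y_hat = step(H @ w_out + b_out)
print(H, y_hat)
```

the two patterns of class 1 collapse onto the same point (1, 0) in the image space, while the class 0 patterns map to (0, 0) and (1, 1), so one linear boundary suffices there even though none exists in the original input space.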
the key difference, though, between neural networks and logistic regression is that each hidden neuron (other than the bias neuron) produces an output that corresponds to a distinct, discriminating hyperplane in the input space. when these are weighted, summed, and transformed at an output neuron, the resulting output corresponds very closely to a multidimensional step function. it is found that the boundaries of regions of similar probability are defined by the discriminating hyperplanes, which crisscross the input space (dreiseitl & ohno-machado, 2002).

given the vast number of practical applications already mentioned in the original article by musso et al. (2013), it is unfortunate that edelsbrunner and schneider (2013) choose to illustrate the application of anns with an unrealistic, contrived situation in which a student is eliminated from a programme based on a neural network classification. anns, like any other methodology, provide the researcher or applied scientist with information. as we have already shown from the literature cited, in the case of anns there are a number of methods to establish the necessary input-output relationships and to determine the confidence and prediction intervals provided by an ann. therefore, the contrived diagnostic example provided by edelsbrunner and schneider (2013, p. 100) shows an underestimation/misinterpretation of the potential of anns. furthermore, poor advice is always a problem, as would be the case in this example, with the unfortunately frequent practice of determining students' career paths by a single-point examination. on the other hand, a trusted result from a properly constructed and tested ann could provide valuable diagnostic, educational, and public policy information.
in fact, the research carried out by some of these authors (cascallar et al., 2006; kyndt et al., 2011, 2015; luft, gomes, priori & takase, 2013; musso & cascallar, 2009a; musso et al., 2012, 2013) provides examples of useful diagnostic models in the educational field. it is a false dichotomy to present modelling for understanding versus modelling for prediction. in reality, both are achievable and in fact they should be integrated for the advancement of the field and the success of each application. much insight has been gained by integrating understanding with predictive and classification models. as is good practice in various fields, especially in applied statistics and mathematical modelling, the various approaches constitute a toolbox that the professional has available in order to apply the best method for the problem at hand. the fact that our article (musso et al., 2013) demonstrated the use of anns in a given academic application is not meant to be exclusionary. on the contrary, the field requires the integration of mathematical modelling and statistical techniques.

the comments by nokelainen and silander (2014) on the article by musso et al. (2013) can be summarized in two main points. the first point questions whether the methodology used was rigorous in its procedures, and the second suggests comparing the neural network results with those obtained from another discriminative classifier in addition to the comparison to a generative classifier such as discriminant analysis. it is very important to clarify that the analyses reported in musso et al. (2013) rigorously followed the standards established by the message understanding conferences (muc) (grishman & sundheim, 1996). as is clearly stated in the musso et al.
(2013) article, “the training and testing samples were selected at random from the existing data and the proportions were adjusted in order to maximize the training sample while preserving the appearance of all detected patterns in the testing sample, so as to be able to appropriately test the model” (p. 60). the two samples were chosen at random, precisely to avoid the problem that nokelainen and silander (2014) raise. these authors seem to have misinterpreted the sections on analysis procedures and architecture of the neural network (musso et al., 2013, pp. 52-54), in which the process is described in detail, and they are mistaken when they state that “the paper by musso and her colleagues (2013) practically acknowledges that such a discipline was not rigorously followed” (nokelainen & silander, 2014, p. 79). the above-mentioned sections clearly state the way in which the sample was divided, the complete independence of the randomly selected training and testing subsets, and the criteria followed to determine the proportions of cases in each of the two subsets. ironically, the procedures followed coincide with those suggested by nokelainen and silander (2014, p. 79). let us state unequivocally that both subsets of cases in the training and testing samples were analyzed separately. in addition, all training of the neural network model was carried out on the training sample, as well as all parameter adjustments, until the desired level of precision was attained. then, the model was independently tested on the testing sample, capturing the generalization of the network structure and the learning parameters. none of the model building took place on the testing sample, as nokelainen and silander (2014) incorrectly assume. thus, the performance of the model with the testing subset actually provides an indication of the generalization of the model, not just “fit”, as nokelainen and silander (2014, p. 79) also incorrectly state.
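the separation between training and testing described above can be sketched as follows. this is an illustration under stated assumptions, not the procedure or the data of musso et al. (2013): the data are synthetic, the "model" is a toy threshold rather than a neural network, and the names `split_train_test`, `fit_threshold` and `accuracy` are hypothetical. the sketch also does not reproduce the adjustment of proportions to preserve all detected patterns in the testing sample; it only shows that fitting touches the training cases alone and that the held-out cases are used once, for evaluation.

```python
import random


def split_train_test(n_cases, test_fraction, seed):
    # Shuffle case indices once, then cut; the two subsets never overlap.
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_test = int(round(n_cases * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_threshold(xs, ys):
    # Toy "model": midpoint between the two class means,
    # estimated from the cases passed in (training cases only).
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, xs, ys):
    # Proportion of cases whose predicted class matches the observed class.
    hits = sum(1 for x, y in zip(xs, ys) if (x > threshold) == (y == 1))
    return hits / len(xs)


# Synthetic, made-up data: one predictor and a binary outcome.
_rng = random.Random(0)
labels = [i % 2 for i in range(200)]
scores = [_rng.gauss(2.0 if y == 1 else -2.0, 1.0) for y in labels]

train_idx, test_idx = split_train_test(len(scores), test_fraction=0.25, seed=1)

# All fitting uses the training subset only.
threshold = fit_threshold(
    [scores[i] for i in train_idx], [labels[i] for i in train_idx]
)

# The testing subset is used once, to estimate generalization.
test_accuracy = accuracy(
    threshold, [scores[i] for i in test_idx], [labels[i] for i in test_idx]
)
```

because no parameter is adjusted after the testing cases are seen, `test_accuracy` estimates how the fitted rule generalizes to new cases rather than how well it fits the cases used to build it.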
a related comment regarding the “ethical standards” of the musso et al. (2013) paper is truly surprising. do nokelainen and silander (2014) truly believe or imply that the authors could not “refrain from cheating (using the test data)” (nokelainen & silander, 2014, p. 79) in developing the model? if so, it is alarming, because they are making a serious assumption regarding the authors or, at best, implying ignorance of the basic rules of science and of this methodology in particular. their fear of “cheating” and their implication that the testing sample analysis should be carried out by different researchers because of this assumed temptation to cheat could be extended to all research in all areas and all statistical methods. it is precisely part of the scientific method to follow any scientific finding with careful replications, not simply to avoid cheating, but to truly evaluate the generalizability of scientific results. this does not mean that we cannot trust researchers, at least a priori, to carry out an ethically sound analysis. if we could not, all findings, including theirs, would be in question. certainly, the musso et al. (2013) article followed careful and rigorous methodological procedures. if their question has to do with the perfect classification obtained, it is the product both of the appropriate modelling process carried out, and of the granularity of the expected results given the available data; it should be noted that the correlation between the individual gpa scores of the students in the whole testing sample and their predicted score (with data from one year in advance), was .86 (musso et al., 2013, p. 64).
regarding the suggestion to use other discriminative classifiers, such as logistic regression, to compare with the results obtained with the neural network model, it is a good suggestion which has already been carried out in the literature (kim & ahn, 2009), and it has been found that neural networks obtained better classification results. in fact, some of the authors in musso et al. (2013) already have carried out such analyses in research currently underway, with the same results favourable to neural networks (musso, boekaerts, segers, & cascallar, in preparation). the field of machine learning research and the related predictive systems is in constant development and new advances are introduced at a rapid pace (monteith, carroll, seppi, & martinez, 2011). several methods have been suggested to improve the performance of machine learning algorithms and of neural network methods in particular, some of them using bayesian approaches which have shown excellent potential (aires, prigent, & rossow, 2004; orre, lansner, bate, & lindquist, 2000). we share the view expressed by nokelainen and silander (2014) that continued research in this field should be pursued, and ensemble methods (rokach, 2010), such as those involving bootstrap aggregating (sahu, runger, & apley, 2011), and bayesian model combination (monteith et al., 2011), together with multiple classifier systems (roli, giacinto, & vernazza, 2001) are among those that should continue to be considered in certain applications. in conclusion, we can state that as was very accurately stated by anders and korn (1996) in their work on model selection in neural networks, the process of model selection in ann can be informed by statistical procedures and methods. statistical methods can improve the model building and the interpretation of anns. 
what is needed is the courage and open-mindedness to actually explore new paths and new methodologies which can, sometimes unexpectedly, provide new conceptualisations and tools for theoretical advancement and practical applied research. this is particularly true in the fields of educational science and social sciences, where the complexity of the problems to be solved requires the exploration of both proven methods and new methods, the latter usually not among the common arsenal of tools of either practitioners or researchers in these fields.
keypoints
artificial neural networks are powerful mathematical modelling tools for classification and prediction.
advances in artificial neural network methodologies have made them more transparent and useful, avoiding the original “black box” characteristics of their early development.
a long history of research, with significant recent advances, has established strong ties between traditional statistical constructs and their equivalents in artificial neural networks.
artificial neural networks are a useful methodology that can advance our understanding of phenomena when modelling for understanding and modelling for classification/prediction are combined.
artificial neural networks are an additional important tool in the researcher's toolbox which can be particularly useful to tackle highly complex and large data sets with interactions among the variables which are not fully understood.
references
agrawal, a. (2001). common property institutions and sustainable governance of resources. world development, 29, 1649-1672. doi: 10.1016/s0305-750x(01)00063-8
agrawal, a., & chhatre, a. (2006). explaining success on the commons: community forest governance in the indian himalaya. world development, 34, 149-166. doi: 10.1016/j.worlddev.2005.07.013
aires, f., prigent, c., & rossow, w. b. (2004). neural network uncertainty assessment using bayesian statistics: a remote sensing application.
neural computation, 16, 2415-2458. doi: 10.1162/0899766041941925
al-deek, h. m. (2001). which method is better for developing freight planning models at seaports – neural networks or multiple regression? transportation research record, 1763, 90-97. doi: 10.3141/1763-14
anders, u., & korn, o. (1996). model selection in neural networks. zew discussion papers, 96-21. retrieved from http://hdl.handle.net/10419/29449
ansari, a., jedidi, k., & jagpal, h. s. (2000). a hierarchical bayesian methodology for treating heterogeneity in structural equation models. marketing science, 19, 328-347. doi: 10.1287/mksc.19.4.328.11789
benitez, j. m., castro, j. l., & requena, i. (1997). are artificial neural networks black boxes? ieee transactions on neural networks, 8, 1156-1164. doi: 10.1109/72.623216
blackard, j. a., & dean, d. j. (1999). comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. computers and electronics in agriculture, 24, 131–151. doi: 10.1016/s0168-1699(99)00046-0
boekaerts, m., & cascallar, e. c. (2006). how far have we moved toward the integration of theory and practice in self-regulation? educational psychology review, 18, 199-210. doi: 10.1007/s10648-006-9013-4
bridle, j. s. (1992). neural networks or hidden markov models for automatic speech recognition: is there a choice? in p. laface (ed.), speech recognition and understanding: recent advances, trends and application (pp. 225-236). new york: springer.
cascallar, e. c., boekaerts, m., & costigan, t. e. (2006). assessment in the evaluation of self-regulation as a process. educational psychology review, 18, 297-306.
doi: 10.1007/s10648-006-9023-2
cheng, b., & titterington, d. m. (1994). neural networks: a review from a statistical perspective. statistical science, 9(1), 2-54. doi: 10.1214/ss/1177010638
chin, w. w., marcolin, b. l., & newsted, p. r. (2003). a partial least squares latent variable modelling approach for measuring interaction effects: results from a monte carlo simulation study and an electronic-mail emotion/adoption study. information systems research, 14, 189–217. doi: 10.1287/isre.14.2.189.16018
decuyper, s., dochy, f., & van den bossche, p. (2010). grasping the dynamic complexity of team learning: an integrative model for effective team learning in organisations. educational research review, 5, 111-133. doi: 10.1016/j.edurev.2010.02.002
detienne, k. b., detienne, d. h., & joshi, s. a. (2003). neural networks as statistical tools for business researchers. organizational research methods, 6, 236-265. doi: 10.1177/1094428103251907
donaldson, r. g., & kamstra, m. (1999). neural network forecast combining with interaction effects. journal of the franklin institute, 336b, 227-236. doi: 10.1016/s0016-0032(98)00018-0
dreiseitl, s., & ohno-machado, l. (2002). logistic regression and artificial neural network classification models: a methodology review. journal of biomedical informatics, 35, 352–359. doi: 10.1016/s1532-0464(03)00034-0
edelsbrunner, p., & schneider, m. (2013). modelling for prediction vs.
modelling for understanding: commentary on musso et al. (2013). frontline learning research, 2, 99-101.
everson, h. t., chance, d., & lykins, s. (1994, april). exploring the use of artificial neural networks in educational research. paper presented at the annual meeting of the american educational research association, new orleans, louisiana.
frey, u. j., & rusch, h. (2013). using artificial neural networks for the analysis of social-ecological systems. ecology and society, 18, 40. doi: 10.5751/es-05202-180240
frigg, r., & hartmann, s. (2006). models in science. in e. n. zalta (ed.), the stanford encyclopaedia of philosophy. summer 2006 edition. stanford, ca: stanford university press.
garson, g. d. (1991). interpreting neural-network connection weights. ai expert, 6, 47-51.
garson, g. d. (1998). neural networks. an introductory guide for social scientists. london: sage publications ltd.
gevrey, m., dimopoulos, i., & lek, s. (2003). review and comparison of methods to study the contribution of variables in artificial neural network models. ecological modelling, 160, 249-264. doi: 10.1016/s0304-3800(02)00257-0
golino, h. f., & gomes, c. m. (2014). four machine learning methods to predict academic achievement of college students: a comparison study. manuscript submitted for publication.
grishman, r., & sundheim, b. (1996). message understanding conference 6: a brief history. in proceedings of the 16th international conference on computational linguistics (coling), i, copenhagen, 466–471.
haenlein, m., & kaplan, a. (2004).
a beginner's guide to partial least squares analysis. understanding statistics, 3, 283–297. doi: 10.1207/s15328031us0304_4
hahn, c., johnson, m. d., herrmann, a., & huber, f. (2002). capturing customer heterogeneity using a finite mixture pls approach. schmalenbach business review, 54, 243-269.
hand, d., mannila, h., & smyth, p. (2001). principles of data mining. cambridge, ma: mit press.
haykin, s. (1994). neural networks: a comprehensive foundation. new york: macmillan.
he, s., & li, j. (2011). confidence intervals for neural networks and applications to modeling engineering materials. in c. l. p. hui (ed.), artificial neural networks – application. shanghai, china: intech. doi: 10.5772/16097
intrator, o., & intrator, n. (2001). interpreting neural-network results: a simulation study. computational statistics and data analysis, 37, 373–393. doi: 10.1016/s0167-9473(01)00016-0
kim, j., & ahn, h. (2009). a new perspective for neural networks: application to a marketing management problem. journal of information science and engineering, 25, 1605-1616.
kyndt, e., musso, m., cascallar, e., & dochy, f. (2011, august). predicting academic performance in higher education: role of cognitive, learning and motivation. symposium conducted at the 14th earli conference, exeter, uk.
kyndt, e., musso, m., cascallar, e., & dochy, f. (2015, in press). predicting academic performance: the role of cognition, motivation and learning approaches. a neural network analysis. in v. donche & s. de maeyer (eds.), methodological challenges in research on student learning. antwerp, belgium: garant.
laguna, m., & marti, r. (2002). neural network prediction in a system for optimizing simulations. iie transactions, 34, 273-282. doi: 10.1080/07408170208928869
lee, c., rey, t., mentele, j., & garver, m. (2005). structured neural network techniques for modeling loyalty and profitability. proceedings of the thirtieth annual sas® users group international conference. cary, nc: sas institute inc.
lei, p. w., & wu, q. (2007). introduction to structural equation modelling: issues and practical considerations. items – instructional topics in educational measurement fall 2007, ncme instructional module, 33-43.
lek, s., belaud, a., baran, p., dimopoulos, i., & delacoste, m. (1996). role of some environmental variables in trout abundance models using neural networks. aquat. living resour., 9, 23-29. doi: 10.1051/alr:1996004
luft, c. d. b., gomes, j. s., priori, d., & takase, e. (2013). using online cognitive tasks to predict mathematics low school achievement. computers & education, 67, 219-228. doi: 10.1016/j.compedu.2013.04.001
mackay, d. j. c. (1992). a practical bayesian framework for backpropagation networks. neural computation, 4, 448-472. doi: 10.1162/neco.1992.4.3.448
marquez, l., hill, t., worthley, r., & remus, w. (1991). neural network models as an alternative to regression. proceedings of the ieee 24th annual hawaii international conference on systems sciences, 4, 129-135. doi: 10.1109/hicss.1991.184052
monteith, k., carroll, j., seppi, k., & martinez, t. (2011). turning bayesian model averaging into bayesian model combination. in proceedings of the international joint conference on neural networks (ijcnn) 2011, 2657–2663.
musso, m. f., & cascallar, e. c. (2009a). new approaches for improved quality in educational assessments: using automated predictive systems in reading and mathematics. journal of problems of education in the 21st century, 17, 134-151.
musso, m. f., & cascallar, e. c. (2009b). predictive systems using artificial neural networks: an introduction to concepts and applications in education and social sciences. in m. c. richaud & j. e. moreno (eds.),
research in behavioural sciences (volume i) (pp. 433-459). buenos aires, argentina: ciipme/conicet.
musso, m. f., kyndt, e., cascallar, e. c., & dochy, f. (2012). predicting mathematical performance: the effect of cognitive processes and self-regulation factors. education research international, 2012, article id 250719, 13 pages. doi: 10.1155/2012/250719
musso, m. f., kyndt, e., cascallar, e. c., & dochy, f. (2013). predicting general academic performance and identifying differential contribution of participating variables using artificial neural networks. frontline learning research, 1, 42-71. doi: 10.14786/flr.v1i1.13
musso, m. f., boekaerts, m., segers, m., & cascallar, e. c. (in preparation). a comparative analysis of the prediction of student academic performance.
neal, w., & wurst, j. (2001). advances in market segmentation. marketing research, 13, 14-18.
nguyen, n., & cripps, a. (2001). predicting housing value: a comparison of multiple regression and artificial neural networks. journal of real estate research, 22, 313-336.
nokelainen, p., & silander, t. (2014). using new models to analyse true complex regularities of the world: commentary on musso et al. (2013). frontiers in psychology, 3, 78-82. doi: 10.14786/flr.v2i1.107
olden, j. d., & jackson, d. a. (2002). illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. ecological modelling, 154, 135-150. doi: 10.1016/s0304-3800(02)00064-9
olden, j. d., joy, m. k., & death, r. g. (2004). an accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data.
ecological modelling, 178, 389-397. doi: 10.1016/j.ecolmodel.2004.03.013
orre, r., lansner, a., bate, a., & lindquist, m. (2000). bayesian neural networks with confidence estimations applied to data mining. computational statistics & data analysis, 34, 473-493. doi: 10.1016/s0167-9473(99)00114-0
perkins, k., gupta, l., & tamanna (1995). predict item difficulty in a reading comprehension test with an artificial neural network. language testing, 12, 34-53. doi: 10.1177/026553229501200103
pinninghoff junemann, m. a., salcedo lagos, p. a., & contreras arriagada, r. (2007). neural networks to predict schooling failure/success. in j. mira & j. r. alvarez (eds.), nature inspired problem-solving methods in knowledge engineering (part ii) (pp. 571–579). berlin/heidelberg: springer-verlag. doi: 10.1007/978-3-540-73055-2_59
ramaswami, m. m., & bhaskaran, r. r. (2010). a chaid based performance prediction model in educational data mining. international journal of computer science issues, 7, 10-18.
roli, f., giacinto, g., & vernazza, g. (2001). methods for designing multiple classifier systems. in j. kittler & f. roli (eds.), multiple classifier systems (pp. 78-87). berlin/heidelberg: springer-verlag. doi: 10.1007/3-540-48219-9_8
ripley, b. d. (1996). pattern recognition and neural networks. cambridge: cambridge university press. doi: 10.1017/cbo9780511812651
rivals, i., & personnaz, l. (2000). construction of confidence intervals for neural networks based on least squares estimations. neural networks, 13, 463-484. doi: 10.1016/s0893-6080(99)00080-5
rokach, l. (2010). ensemble-based classifiers. artificial intelligence review, 33, 1-39. doi: 10.1007/s10462-009-9124-7
sahu, a., runger, g., & apley, d. (2011). image denoising with a multi-phase kernel principal component approach and an ensemble version. ieee applied imagery pattern recognition workshop, 1-7.
schermelleh-engel, k., kerwer, m., & klein, a. g. (2014).
evaluation of model fit in nonlinear multilevel structural equation modelling. frontiers in psychology, 5, article 181, 1-11. doi: 10.3389/fpsyg.2014.00181
suppes, p. (1962). models of data. in e. nagel, p. suppes & a. tarski (eds.), logic, methodology and philosophy of science: proceedings of the 1960 international congress. stanford: stanford university press, 252-261.
thrush, s. f., coco, g., & hewitt, j. e. (2008). complex positive connections between functional groups are revealed by neural network analysis of ecological time series. american naturalist, 171, 669-677. doi: 10.1086/587069
tzeng, f. y., & ma, k. l. (2005). intelligent feature extraction and tracking for visualizing large-scale 4d flow simulations. in dvd proceedings of the international conference for high performance computing, networking, storage and analysis (sc '05). november, 2005.
weiss, s. m., & kulikowski, c. a. (1991). computer systems that learn. san mateo, ca: morgan kaufmann publishers.
west, p. m., brockett, p. l., & golden, l. l. (1997). a comparative analysis of neural networks and statistical methods for predicting consumer choice. marketing science, 16, 370-391. doi: 10.1287/mksc.16.4.370
weston, r., & gore, p. a. (2006). a brief guide to structural equation modeling. the counseling psychologist, 34, 719-751. doi: 10.1177/0011000006286345
white, h., & racine, j. (2001).
statistical inference, the bootstrap, and neural network modelling with application to foreign exchange rates. ieee transactions on neural networks, 12, 657-673. doi: 10.1109/72.935080
wilson, r. l., & hardgrave, b. c. (1995). predicting graduate student success in an mba program: regression versus classification. educational and psychological measurement, 55, 186-195. doi: 10.1177/0013164495055002003
yeh, i. c., & cheng, w. l. (2010). first and second order sensitivity analysis of mlp. neurocomputing, 73, 2225-2233. doi: 10.1016/j.neucom.2010.01.011
zambrano matamala, c., rojas díaz, d., carvajal cuello, k., & acuña leiva, g. (2011). análisis de rendimiento académico estudiantil usando data warehouse y redes neuronales [analysis of students' academic performance using data warehouse and neural networks]. ingeniare. revista chilena de ingeniería, 19, 369-381. doi: 10.4067/s0718-33052011000300007
zapranis, a., & livanis, e. (2005). prediction intervals for neural network models. proceedings of the 9th wseas international conference on computers (iccomp'05). world scientific and engineering academy and society (wseas). stevens point, wisconsin, usa.

frontline learning research vol. 7 no. 2 (2019) 40-56
issn 2295-3159
the effect of formal team meetings on teachers’ informal data use interactions
roos van gasse a
a university of antwerp, belgium
article received 2 january 2019 / revised 19 march / accepted 18 april / available online 8 may
abstract
in recent years, the emphasis on interaction in data use has grown because of its potential to support individual teachers.
however, in practice, teachers do not appear to interact widely in their use of data, either formally or informally. to gain knowledge of how sustainable data use interactions can be facilitated, this study investigated how formal data use in teams of teachers affects the teachers’ informal interactive data use. a survey provided insight into 72 teachers’ perceptions of data use discussion, interpretation, diagnosis and action at formal team meetings. subsequently, social network analysis of seven teacher informal data use networks revealed that teachers with more positive perceptions about formal data use become more active in their informal data use network. within the problem diagnosis phase, this tendency generalizes across the participating teams. the results of this study imply that, particularly to define problems and formulate actions based on pupil learning outcome data, it is necessary to ensure strong connections between teachers in formal groupings in order to affect their informal interactive behaviour.
keywords: data use; informal interactions; formal groupings; collaboration
info corresponding author: roos.vangasse@uantwerpen.be doi: 10.14786/flr.v7i2.443
1. introduction
teachers are increasingly encouraged to use data (e.g. student data) to learn about and improve their practice. this emphasis on data use in education originated from the belief that data use makes educational decisions more objective and contributes to more effective (changes in) instructional practices. the idea is that the systematic analysis and interpretation of different types of data can lead to school improvement (campbell & levin, 2008; carlson, borman, & robinson, 2011). teachers often experience difficulties in the process of transforming data into knowledge and action (datnow & hubbard, 2016; hubbard, datnow, & pruyn, 2014; jimerson, 2014; wayman, midgley, & stringfield, 2007).
therefore, the international literature has pointed to the important role of teacher interactions in data use. interactions have the potential to provide teachers struggling to use data appropriately with the necessary support to accomplish the complex translation of data into decisions and actions (bertrand & marsh, 2015). moreover, the belief has grown that data use interactions create an environment for teachers’ professional development (vanhoof & schildkamp, 2014). however, the current situation regarding interactive data use among teachers gives cause for pessimism, for two reasons. first, the frequency of interactions matters for teacher learning (penuel, sun, frank, & gallagher, 2012). yet, data use interactions are often limited in comparison with other forms of professional interaction, either in formally-established groups or on an informal basis (farley-ripple & buttram, 2015; hubers, moolenaar, schildkamp, daly, handelzalts, & pieters, 2017; keuning, van geel, visscher, fox, & moolenaar, 2016). second, interdependence among teachers facilitates learning. this implies that teachers interact from shared values and goals and that there is collective responsibility for pupils’ learning (horn & little, 2010; moolenaar, sleegers, & daly, 2012; stoll, bolam, mcmahon, wallace, & thomas, 2006). nevertheless, if data use interactions occur, teachers are not likely to share responsibility with colleagues in data use, with the result that brief exchanges of information take place rather than powerful learning activities (van gasse, vanlommel, vanhoof, & van petegem, 2016; 2017). the aforementioned issues mean that the value of learning outcomes from data use interactions must be put into question (van gasse et al., 2016). despite the great emphasis on teacher interactions in data use, there is a need for research to establish how sustainable and effective data use interactions are cultivated. 
an opportunity in this regard may lie in the interrelation between formal and informal interactions. after all, it is known that teachers seek stability in terms of the number of colleagues that belong to their personal networks in schools, particularly when it comes to data use (van gasse, vanlommel, vanhoof, & van petegem, 2017; farley-ripple & buttram, 2015). therefore, knowing each other and working together in formal data use settings may be an important facilitator for teachers’ informal interactions. for example, research has shown that teachers involved in formal subgroups are more likely to interact with those colleagues on an informal basis (daly, moolenaar, bolivar, & burke, 2010; meredith, van den noortgate, struyve, gielen, & kyndt, 2017). however, what has remained underexplored up to now in the context of data use is whether it is simply the involvement in formal interactions that determines teachers’ informal interactions or whether what happens within those formal interactions can also contribute to teachers’ informal interactions. in other words, the question arises whether it is being familiar with colleagues that affects teachers’ informal data use interactions or the degree to which they feel that formal occasions facilitate the proper use of data. to fill this lacuna, the following research questions will guide this paper:
1. to what extent do teachers perceive that proper data use is accomplished at formal team meetings?
2. how do teachers’ perceptions of data use at formal team meetings affect their informal interaction-seeking behaviour with colleagues from the formally-constructed team?
2. conceptual framework
this conceptual framework will first provide broader information and a theory of data use. subsequently, it will describe (the merits of) teacher interactions in the context of data use and outline the relationship between formal and informal interactions found in the literature.
2.1 data use and data
data use is a way to manage processes within the school. the aim is to map school processes, to ensure that these processes are in line with school-wide goals and to use data to improve these processes (barrezeele, 2012; schildkamp & kuyper, 2010). to this end, many types of qualitative and quantitative data can be used (hulpia, valcke & verhaeghe, 2004; schildkamp & kuyper, 2010). the term data use is a somewhat simplistic linguistic merger of ‘data’ and ‘use’. effective data use is not only about ‘data’ and about ‘use’. it is a complex and sequential process in which data are transformed into information and knowledge (coburn & turner, 2011; marsh, 2012). therefore, data users need to run through different sub-processes to interrupt teachers’ tendency to jump from data to decisions (schildkamp et al., 2016). a lot of research distinguishes the phases of data discussion, analysis, interpretation and action (gummer & mandinach, 2015; marsh, 2012; schildkamp et al., 2016). however, teachers often struggle with the translation of data to classroom interventions (gummer & mandinach, 2015; datnow & hubbard, 2016). therefore, we explicitly insert a phase of problem diagnosis in our conceptualisation of data use. in this study, the sequence of data use is one of discussion, interpretation, diagnosis and action (verhaeghe et al., 2010). this means that data first need to be read and discussed. subsequently, data must be interpreted correctly. next, potential causes and explanations are hypothesised and checked in the diagnosis phase. finally, teachers design and implement improvement actions (verhaeghe et al., 2010). the data use sequence appears straightforward in outline. nevertheless, the literature has repeatedly shown that in practice complexity arises because the sequence of activities is often interrupted or teachers return to previous phases (schildkamp et al., 2015; marsh & farrell, 2015).
moreover, activities within the different phases of data use cannot be considered identical (schildkamp et al., 2016). it is essential to approach the concept of data use with sufficient precision and to take into account that different phases can imply differences in teacher behaviour. therefore, we will explicitly distinguish between data discussion, interpretation, diagnosis and action in this study. the early research on data use indicated that it is a difficult process for teachers. the sequence (discussion, interpretation, diagnosis and action) includes numerous potential pitfalls. for example, individual teachers become stuck in the interpretation phase of data use or have difficulty ascertaining where exactly pupils’ problems are located when analysing the data (datnow & hubbard, 2016; verhaeghe et al., 2010). therefore, more recently, an increasing emphasis has been laid on teacher interactions in the context of data use. the conviction has grown that teacher interactions are beneficial because they provide a supportive environment in which individual data use struggles can be overcome (bertrand & marsh, 2015; hubers et al., 2017). moreover, interaction in the context of data use has been identified as conducive to a professional learning environment for teachers (vanhoof & schildkamp, 2014). the following sections will further delineate the relationship between formal and informal interactions. thereafter, we will describe how social network theory will be used to explore this connection. 2.2 informal teacher interactions in data use informal teacher interactions are formed based on personal goals. informal interactions can occur very systematically or ad hoc, but always on the initiative of one (or more) teachers without the central and external creation of a common mission (blankenship & ruona, 2009). as a result, informal interactions may become more formal on the initiative of teachers themselves, but may equally remain unstructured and ad hoc. 
despite the potential of informal interactions for teacher learning (jones & dexter, 2014; kyndt, gijbels, grosemans, & donche, 2016), there has been limited research on informal interactions in the area of data use. the few studies to report on such interactions have shown that they are fairly limited (farley-ripple & buttram, 2015; van gasse et al., 2017). in addition, the group of colleagues with whom teachers interact appears to remain relatively fixed. teachers consult a similar (but smaller) pool of colleagues for data use purposes as for their regular professional activities (farley-ripple & buttram, 2015). this implies that teachers do not turn to specific colleagues with regard to data use (e.g. data use experts), but rather that they maintain a stable network for their different professional activities. this is also illustrated in their approach to the different activities within the data use sequence. across the discussion, interpretation, diagnosis and action stages, teachers involve the same colleagues; however, for the more complex phases (e.g. data use action), fewer colleagues are consulted and deeper interactions are established (van gasse et al., 2017). the fact that teachers seek stability in interactions with colleagues, and that deeper interactions are not established with all colleagues, implies that informal interactions may be of significant importance for sustainable and effective data use. therefore, further insights are needed into how these interactions can be facilitated and the role of formal interactions in this regard. 2.3 formal teacher interactions in data use contrary to informal interactions, formal interactions are formed to serve a specific organizational goal rather than individual goals. these are interactions for which a common mission to use data is created externally.
generally, such interactions result in the construction of pre-structured work groups, in which team members are made responsible for the outcomes, or the division of labour is clearly explicated beforehand (blankenship & ruona, 2009). common examples of such formal interactions are data use interventions that include the creation of a team of educators working on specific school-related problems in order to learn how to use data (ciampa & gallagher, 2016; cosner, 2011; keuning et al., 2016; schildkamp et al., 2016). research has shown marked differences in how such teams work through the different phases of data use, in a sense that some teams reach deeper levels of inquiry than others (schildkamp et al., 2016). moreover, it is not the case that creating such formal group settings automatically leads to highly interactive groups; in fact, levels of interactions in formal data use groups remain limited (hubers et al., 2017; keuning et al., 2016). in addition, the distribution of knowledge from inside formal data use work groups to colleagues outside the teams appears to be scarce (hubers et al., 2017). thus, although formal groups have been widely implemented with the aim of facilitating interactions between teachers beyond those groups and facilitating school-wide data use, such formal interactions seem to fall short of this intention. given the teachers’ wish for stability in data use interactions and the limited number of colleagues with whom they undertake the most intense data use interactions (farley-ripple & buttram, 2015; van gasse et al., 2017), more knowledge is needed about the mechanisms mediating between formal and informal data use interactions. these insights are needed to better facilitate the connection between interventions using formal group settings and teachers’ informal interactive behaviour in data use. research into teacher interactions has already exposed some interconnections between formal and informal interactions. 
for example, research by meredith and colleagues (2017) showed that being connected to formal subunits can, to some extent, facilitate informal interactions. in the findings of their study, teachers who were connected to the same subunit appeared to interact more often with each other informally. similar findings were reported by penuel et al. (2010), who concluded that teachers connected in a formal structure (by grade level) were more likely to interact with each other. other researchers have argued that formal groupings provide a specific focus for interaction and shape opportunities for interactions (spillane & kim, 2012; spillane, parise, & sherer, 2011). therefore, some formal groupings are considered to have more influence on teacher interactions than personal characteristics (spillane, hopkins, & sweet, 2015). however, the aforementioned studies pay insufficient attention to the impact of what happens within formal meetings on teachers’ informal interactive behaviour. the studies look at formal structures and group compositions to explain why informal interactions take place; yet, in the context of data use, we know that large differences in group processes may occur (schildkamp et al., 2016). moreover, even among teachers who are bound to the same formal data use groupings, informal interactions can be scarce (van gasse et al., 2017). therefore, greater insight is needed into whether teachers perceive that proper discussion, interpretation, diagnosis and action of data take place in formal team meetings, as this might also explain their informal interactive behaviour within the same constellation of teachers. 3. method 3.1 research context the study was carried out in flanders, the dutch-speaking part of belgium. in flanders, schools have autonomy over how they achieve the required educational standards (penninckx, vanhoof, & van petegem, 2011). the government does not impose central exams (oecd, 2014).
instead, schools themselves are responsible for developing strategies to meet the flemish standards at the end of secondary education. therefore, the flemish government’s perspective on data use is quite improvement-oriented. this is different to countries with a longer, though more accountability-oriented, tradition in data use (e.g. the netherlands, united states, united kingdom). in practice, this implies that flemish schools and teachers often primarily rely on their own data sources (e.g. tests, assignments, observations or portfolios) for data use purposes. in this study, we will report on teachers’ use of pupil learning outcome data because this is an informative source of data for teachers to improve their practices and to evaluate whether or not pupils meet the flemish standards at the end of secondary education. pupil learning outcome data include cognitive outcomes (e.g. linguistic and arithmetic skills) as well as non-cognitive outcomes (e.g. attitudes, and artistic and physical education), and those data can be both quantitative (e.g. class tests) and qualitative (e.g. observations). this conceptualisation of ‘data’ is broader than often-used definitions which refer solely to cognitive output indicators (schildkamp et al., 2012). as such, this study contributes to an enriched conceptualisation of ‘pupil learning outcome data’. the study took place in the context of a project on the assessment of competences (d-pac.be). all ten schools involved in the project were asked to participate in this study. in each school, the target population consisted of all teachers of the pupil group that participated in an assessment of writing competences in the aforementioned project, i.e. the fifth grade of an academic track in economics and languages (16- to 17-year-olds).
in flanders, these teacher teams are temporary interdisciplinary groupings that are collectively responsible for pupils’ learning. two to three times during the school year, the teams are required to discuss the pupils’ learning outcomes in a formal team meeting. in the last team meeting of the year, team members deliberate as to whether or not pupils will successfully complete their year. this study will report both on teachers’ perceptions of data use at those formal team meetings and their informal interactive behaviour in data use within the same team. in order to answer the present research questions, quantitative data analysis has been combined with social network analysis. both types of data were collected in the same online survey. 3.2 social network theory and analysis in this study, interactions will be studied by means of social network analysis. this method draws on social network theory, which uses the position of actors within a network to determine their access to resources (e.g. colleagues’ knowledge and skills in a broad sense) (finnigan & daly, 2012). social network analysis approaches interactions with fine-grained information. it brings together information about both actors in an interaction, which creates great depth of analysis. there are three general elements present in interactions. the first is interaction-seeking behaviour. for example, in a data use interaction, teacher a may ask teacher b for advice. in this case, teacher a is sending a connection (or a tie) to teacher b. this is what in social network analysis is called a sent tie (or outdegree). in reverse, teacher b may also ask the advice of teacher a (or send a connection to a). from teacher a’s perspective, this is a received tie (or indegree). if teacher a and teacher b ask each other for advice, both teachers are sending ties to and receiving ties from each other. social network analysis calls them reciprocated ties (borgatti, everett, & johnson, 2013).
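the tie types described above can be illustrated with a minimal python sketch. the teacher labels and the ties below are invented purely for illustration and are not taken from the study's data; the function name `degree_summary` is likewise hypothetical. the sketch counts sent ties (outdegree), received ties (indegree) and reciprocated ties in a small directed network, and also reports the mean outdegree per teacher, which corresponds to a team-level 'average degree'.

```python
# Hypothetical advice ties for one small team: (sender, receiver) means
# "sender consults receiver". Labels and ties are invented for illustration.
teachers = ["A", "B", "C", "D"]
ties = {("A", "B"), ("B", "A"), ("A", "C"), ("D", "A")}


def degree_summary(teachers, ties):
    """Return outdegree, indegree, reciprocated pairs and average degree."""
    outdegree = {t: 0 for t in teachers}
    indegree = {t: 0 for t in teachers}
    for sender, receiver in ties:
        outdegree[sender] += 1   # a sent tie for the sender
        indegree[receiver] += 1  # a received tie for the receiver
    # A pair is reciprocated when ties exist in both directions.
    reciprocated = {
        frozenset((s, r)) for s, r in ties if (r, s) in ties
    }
    # Average degree: mean number of outgoing ties per teacher.
    average_degree = sum(outdegree.values()) / len(teachers)
    return outdegree, indegree, reciprocated, average_degree


outdegree, indegree, reciprocated, average_degree = degree_summary(teachers, ties)
print(outdegree)       # sent ties per teacher
print(indegree)        # received ties per teacher
print(reciprocated)    # pairs who consult each other
print(average_degree)  # mean outdegree in the team
```

in this toy network, teacher a sends two ties and receives two, and only the pair a and b is reciprocated.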
although the different characteristics of interactions are explained here by means of advice ties, social network studies have used a range of topics for interaction purposes (e.g. friendship ties, information ties, general professional ties) (e.g. van gasse et al., 2017; daly et al., 2010; moolenaar et al., 2012). social network analysis provides a means of explaining the different types of connections between teachers. for example, perceptions of data use at formal team meetings can be used to explain different types of ties. given the present research questions, this study will explain teachers’ outdegree measures. this measure reflects the number of outgoing interactions or the extent to which they take the initiative in interaction with colleagues (borgatti, mehra, brass & labianca, 2009). in other words, it will provide insight into whether teachers become more active interactors in data use when feeling more positive about formal data use with the same colleagues. in the analysis section, the term sender effects will refer to the effect of teachers’ perceptions of formal data use on their informal interactive behaviour in terms of outdegree (sweet, 2016). for example, a positive sender effect would mean that the more positive teachers are about formal data use, the more likely they are to initiate informal interactions within the same network (i.e. to have a higher outdegree measure). 3.3 participants the data of three teams were excluded from this study because their response rate was lower than the 80% required in social network analysis. the other response rates are shown in table 1. for confidentiality purposes, the team names are fictitious. response rates above 80% were reached in all remaining teams, with maximum response rates (100%) in four out of seven teams. the high response rates imply that accurate conclusions can be drawn about the relationship between formal and informal interactions in teachers’ use of pupil learning outcome data.
across the teams, 3048 data points provide sufficient statistical power to reveal some general tendencies. table 1: teams' response rates (social network analysis) apart from team mckinley (13 teachers) and team eppingswood (8 teachers), the teams consisted of eleven teachers. the teams were interdisciplinary, which means that teachers teach different subjects to the pupil group. all teachers had a master's degree; 60% were female and 40% were male. further, a small majority of the participants taught the pupil group more than three course hours per week, and the teaching experience of the participant group varied from less than five years to over thirty years. 3.4 instrument teachers’ perceptions of both their formal and informal data use interactions (i.e. the discussion, interpretation, diagnosis and action with regard to pupil learning outcome data) were measured using an online survey. first, some general information was collected, such as gender, level of educational attainment or amount of teaching time per week in the specific pupil group (fifth year track economics and languages). the second part of the survey included questions on the formal team meetings. teachers were asked to rate five statements concerning the extent to which they use pupil learning outcome data at the formal team meetings (e.g. ‘together with colleagues, i diagnose problems based on pupil learning outcome data during the formal team meetings’) on a scale from 1 (totally disagree) to 5 (totally agree). the cronbach’s alpha of 0.88 indicated good internal consistency of teachers’ perceptions regarding data use at these formal team meetings (see table 2). table 2. descriptive statistics of the formal interaction scale in order to measure teachers’ perceptions of their informal data use interactions, social network questions were included in the survey. for each of the data use phases (i.e. discuss, interpret, diagnose, take action), a social network question was included (e.g.
‘which of the following colleagues do you consult to discuss pupil learning outcome data?’). subsequently, all members of the teacher team were listed and participants indicated which of the listed colleagues they consult for data use discussion, interpretation, diagnosis and action apart from the formal team meetings of the team. 3.5 analyses to answer the first research question, we aggregated teachers’ item scores of the five items on formal data use interactions at team level using spss 22 software. subsequently, descriptive statistics (i.e. average and standard deviation) were calculated for each participating teacher team. as such, we are able to draw team-level conclusions about general perceptions regarding the use of pupil learning outcome data on formal occasions. this was necessary to build a point of reference for the informal interactive behaviour of teams that was analysed in light of the second research question. with regard to the second research question, we first calculated a scale score (i.e. average score) of the five items on formal data use interaction at teacher level. this was needed for being able to attribute relational differences to individual differences and subsequently to team differences. the relation between teachers’ scale scores on the formal interaction scale and their informal interaction seeking behaviour was tested using exponential random graph modelling (ergm). ergm enables researchers to analyse interaction patterns in social networks and to explain specific relationships. this type of analysis predicts the presence of particular relations in the network and, thus, it can be used to assess the predictive value of teachers’ perceptions of interactions on formal data use occasions for their informal behaviour in networks. 
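the idea of a sender effect and the aic comparison with a baseline model can be sketched in a deliberately simplified form. the python sketch below is not the statnet model used in this study: it assumes a dyad-independent specification with only a baseline (density) term and one sender covariate, in which case the model reduces to a logistic regression over all ordered pairs of teachers. the team, the scale scores, the ties and the helper names (`solve`, `fit_logit`, `aic`) are all hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X, y, iterations=25):
    """Fit a logistic regression by Newton-Raphson; return (beta, loglik)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, target in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for i in range(k):
                grad[i] += (target - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for row, target in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
        loglik += math.log(p) if target else math.log(1.0 - p)
    return beta, loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log-likelihood."""
    return 2 * n_params - 2 * loglik


# Hypothetical team: formal data use scale score per teacher (1 to 5).
scores = {"A": 3.0, "B": 3.6, "C": 4.2, "D": 4.6, "E": 5.0}
# Hypothetical informal ties: (sender, receiver).
ties = {("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "A"),
        ("D", "C"), ("D", "E"), ("E", "C"), ("E", "D")}

mean_score = sum(scores.values()) / len(scores)
dyads = [(s, r) for s in scores for r in scores if s != r]
y = [1 if d in ties else 0 for d in dyads]

# Baseline model: one parameter for the overall tendency to form ties.
X_base = [[1.0] for _ in dyads]
# Sender model: adds the sender's (mean-centred) scale score.
X_sender = [[1.0, scores[s] - mean_score] for s, _ in dyads]

beta_base, ll_base = fit_logit(X_base, y)
beta_sender, ll_sender = fit_logit(X_sender, y)
print("baseline AIC:", aic(ll_base, 1))
print("sender model AIC:", aic(ll_sender, 2))
print("sender effect (log-odds per scale point):", beta_sender[1])
```

a positive sender coefficient means that, in this toy data set, teachers with higher scale scores are more likely to send ties. the sketch ignores structural dependence between ties (such as reciprocity), so it should be read only as an intuition for the sender term and the aic comparison.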
in doing this, ergm takes into account global network structures. as such, ergm accounts for the multilevel effect that arises when relationships (within individuals within teams) are used as the unit of analysis. statnet’s r-based ergm package was used for the analyses (handcock, hunter, butts, goodreau, & morris, 2016). ergms are specified at team level. therefore, multiple ergms were specified per team, one for each phase of the data use cycle. each of those ergms included a single sender effect for teachers’ scores on the formal interactions scale. this means that the model investigated whether higher scores on the formal interactions scale affect the probability of sending relationships (i.e. consulting colleagues). this effect was investigated in the different data use networks of each team (i.e. discussion, interpretation, diagnosis and action). for each ergm, the model with the sender effect included was compared with the baseline model by means of the akaike information criterion (aic). this method was used to evaluate whether informal teacher interactions were better explained by teachers’ perceptions of formal interactions than by chance. to evaluate overall effects in the discussion, interpretation, diagnosis and action networks, a meta-analysis was conducted across the seven teams using the ‘metafor’ package in r. 4. results the structure of this section is aligned with the research questions. therefore, we will first describe teachers’ perceptions regarding data use at formal team meetings. then we will present the descriptive statistics on teachers’ informal data use interactions to support a better understanding of the subsequent analyses. finally, we will present the findings on interrelationships between teachers’ perceptions of data use at formal team meetings and their informal data use interactions.
4.1 formal data use interactions in teacher teams table 3 provides an overview of the descriptive statistics of this study. the statistics with regard to ‘formal data use’ include the aggregated team scores of teachers’ perceptions of the use of pupil learning outcome data (i.e. discussion, interpretation, diagnosis and action) at the mandatory formal team meetings (cf. context description). with regard to teachers’ perceptions of formal data use in the team, we find positive to strongly positive perceptions overall. in all teams, teachers report that pupil learning outcome data are discussed and interpreted at formal team meetings, that problems are diagnosed and that appropriate actions are formulated to improve pupils’ learning. across the teams, averages range from moderately positive perceptions (average team northvale = 3.80) to extremely positive perceptions (average team melrose = 4.94). however, in some teams teachers’ perceptions are more disparate than in others. for example, the standard deviation of team melrose indicates that all teachers answered the questions on formal data use similarly (sd = 0.11), whereas the same measure in teams easton and northvale indicates a considerably larger variation between teachers (sd = 0.82 and sd = 1.04 respectively). this means that some teachers were a lot more positive than others about data use on formal occasions in those teams. the ergm analysis will reveal whether such different perceptions also affect teachers’ informal data use interactions. 4.2 informal data use in teacher teams in order to better understand the interrelation between formal and informal data use interactions, table 3 also shows the average number of informal interactions (i.e. interactions independent from the formal team meetings) within the discussion, interpretation, diagnosis and action phases per teacher team.
this is represented by the ‘average degree’ measure, which refers to the number of outgoing relations (or the extent to which teachers consult colleagues), aggregated at team level. we find limited informal activity within the teams across the data use sequence (discussion, interpretation, diagnosis and action). table 3 shows that the maximum average degree, for teams of 11 teachers, occurs in team riverbank’s discussion network (average degree = 4.45). this means that teachers in team riverbank are, on average, connected to 4–5 teachers (out of 10) for data discussion. team riverbank is the most active network of 11 teachers with regard to informal data use interactions. the smaller team eppingswood is interacting to a slightly greater extent compared to the other teams. for example, the average degree of 3.63 in the action network implies that teachers are interacting informally with 3–4 colleagues (out of 7). furthermore, higher average degree numbers are found in team colby, but only for the data discussion and interpretation networks; the established relations in the diagnosis and action networks of team colby are in line with all other teams. therefore, in general, informal data use interactions are scarce, with teachers connected to approximately 2 other teachers for informal data use discussion, interpretation, diagnosis and action. table 3. descriptive statistics per team 4.3 perceptions of formal data use and informal interactive behaviour table 4 shows the results of the ergm analyses of the impact of teachers’ perceptions of formal data use on their informal interaction-seeking behaviour in data use. these effects are represented in the ‘sender effects’ columns. the different ergms provide an overview of the effects of teachers’ perceptions of formal interactions across the different phases of the data use sequence. as such, fine-grained conclusions on the effects of perceived formal interactions can be drawn.
a first finding of the ergm analyses is that there are some teams in which there are no significant effects of teachers’ perceptions of formal data use on informal interaction-seeking data use behaviour. more specifically, teachers’ informal data use interactions in teams riverbank and melrose cannot be related to their perceptions concerning formal data use in the same team constellations. in these teams, the teachers’ informal data use interactions can be explained equally well by chance. it is remarkable that teams riverbank and melrose in particular show no significant effects of teachers’ perceptions of formal interactions on their interaction-seeking behaviour. as shown in table 3, these are two of the four highest-scoring teams on the formal data use scale, which implies that teachers in those teams are particularly positive about the discussion and interpretation of data at formal team meetings, and about diagnosing problems and designing improvement actions based on data. additionally, the aforementioned teams have smaller standard deviations in our sample, which indicates that the teachers gave similar answers on the formal data use scale. table 4. sender effects of formal data use per team in all other teams, the ergm analyses show at least one effect. in team colby, we find a negative sender effect of formal data use on teachers’ informal interactive behaviour in the interpretation of pupil learning outcome data. in this team, teachers reporting higher scores on the formal data use scale are less likely to seek interaction with colleagues for the interpretation of pupil learning outcome data on an informal basis. in team mckinley a significant sender effect is again found for data use interpretation. however, in this team the effect is positive.
this implies that, in team mckinley, teachers who are more positive about the discussion and interpretation of pupil learning outcome data, and about diagnosing problems and defining improvement actions, are more likely to consult their teammates for informal data interpretation. given this disparity of effects in the interpretation networks of these two teams, no significant generic effects across the teams were found in the meta-analysis. together with the absence of effects in any of the teams’ discussion networks, this implies that no generic effects can be detected in the least complex phases of data use (i.e. discussion and interpretation). in other words, teachers’ perceptions of formal data use explain informal interactions for data discussion and interpretation no better than chance. however, the ergm analyses do show some effects of teachers’ perceptions of formal data use on their informal interaction-seeking behaviour in the more complex phases of data use. moreover, in each of the networks, these effects are positive. in teams mckinley, easton, eppingswood and northvale, teachers reporting higher scores on the formal data use scale are more likely to seek data use interactions with colleagues for diagnosing problems. this means that the more strongly teachers believe that pupil learning outcome data are discussed and interpreted effectively at formal team meetings, and that subsequently diagnosis is carried out and actions are defined, the more likely they are to consult (some of those) colleagues for informal problem diagnosis. with regard to the definition of improvement actions, we find similar effects in teams mckinley, eppingswood and northvale. according to the network data, teachers with higher scores on the formal data use scale are more likely to consult colleagues on data use actions.
in other words, the teachers’ perceptions of data use at formal team meetings affect their informal interaction-seeking behaviour for the definition and implementation of improvement actions for pupils. thus, teachers’ perceptions of data use at formal team meetings seem to matter more for their informal interaction-seeking behaviour in the more complex phases of data use (i.e. diagnosis and action). the meta-analysis shows that these positive effects can be generalised across the teams for diagnosing problems from pupil learning outcome data, though not for defining and implementing improvement actions. 5. discussion and conclusion in data use, or the discussion and interpretation of data, the diagnosis of problems and the definition and implementation of improvement actions, researchers place great emphasis on teacher interactions. peer interactions can provide teachers with the support necessary to acquire the complex knowledge and skills needed to transform data into meaningful decisions and actions (hubbard et al., 2014; jimerson, 2014; wayman et al., 2007). given the context, in which changes in teaching practices sometimes require prompt action, informal data use interactions may be of particular importance for teachers. nevertheless, the literature has shown that their informal data use interactions remain limited (farley-ripple & buttram, 2015; van gasse et al., 2017). therefore, it is crucial to understand how those informal data use interactions can be influenced. an important factor may be engagement in formal data use interactions. research has shown that being active in formal structures affects informal interactions (daly et al., 2010; meredith et al., 2017). however, what remained under-researched was how teachers’ perceptions of (data use) activities within formal team meetings influence their informal interactive behaviour. the current study approached this lacuna using a combination of survey questions and social network analysis.
as such, insight was provided into teachers’ perceptions of the use of pupil learning outcome data at formal team meetings and the influence of these perceptions on their informal interaction-seeking behaviour. the descriptive statistics showed that, overall, teachers were positive about the discussion and interpretation of pupil learning outcome data at formal team meetings, as well as the diagnosis of problems and definition and implementation of improvement actions. there was notable variation in teachers’ perceptions across the teams, and some variation within them. the findings also showed limited informal data use interaction. on average, teachers were not connected to many of the colleagues with whom data use was carried out at formal team meetings. nevertheless, drawing on the ergm analyses, teachers’ perceptions of formal data use appeared to be related to their informal interaction-seeking behaviour in the more complex phases of data use (i.e. diagnosis and action). significant positive effects of teachers’ perceptions of formal data use were found on the extent to which teachers consult colleagues for diagnosing problems and defining and implementing improvement actions using pupil learning outcome data. however, these effects could only be generalised across the teams for the problem diagnosis phase. two findings were particularly remarkable in the results of the ergm analyses. the first is that, in some teams, no significant effects were found. these teams (riverbank and melrose) had high average scores on the formal interactions scale with small standard deviations. this implies that teachers in those teams took a similar, positive stance towards the use of pupil learning outcomes at formal team meetings, making differences in informal behaviour more difficult to explain by this factor.
the second remarkable finding, which may represent a step forward in the field of data use interaction research, is that teachers’ perceptions of formal data use only appeared to have an effect on the more complex phases of informal data use (i.e. diagnosis and action). apparently, a more or less positive stance towards the team’s formal data use is of less importance for teachers’ informal interaction-seeking at the data discussion and interpretation stages. yet for informal interactions at the problem-diagnosis phase, teachers’ perceptions regarding data use at formal team meetings do matter. an explanation may lie in the fact that for data discussion and interpretation, teachers place slightly less weight on the specific colleagues involved in their interactions. as prior research has shown, teachers use interactions in these phases to build a frame of reference (van gasse et al., 2016). therefore, more colleagues may be suitable to interact with in these phases, and thus perceptions of what is happening at formal team meetings do not seem to play an important role. this is different in the phase of problem diagnosis and, to some extent, the phase of action. within these phases, more specific knowledge is needed. in this regard, the use of pupil learning outcome data can to some extent serve as an example of problem diagnosis based on data. positive perceptions of data use at formal team meetings can provide teachers with evidence that interactions in these phases may add value to the process of data use. additionally, they may develop a sense of who, among their colleagues, would make interesting partners for diagnosing problems in data use. as a result, the impact of teachers’ perceptions of formal interactions on their informal interactive behaviour mainly manifests in the phases where teachers often struggle most (i.e. diagnosis and action).
this implies that, when schools and teachers aim to arrive at fruitful data use, the crux of the matter may lie in supporting teachers to engage in thorough interactions to diagnose problems and formulate actions based on pupil learning outcome data at formal team meetings. in line with previous research on data use interactions, this study showed limited teacher interactions (farley-ripple & buttram, 2015; hubers et al., 2017; keuning et al., 2016). previous research has also shown that teachers seek stability in their interactive data use behaviour. for example, teachers tend to interact with similar colleagues for data use and for their regular professional interactions (farley-ripple & buttram, 2015). furthermore, within the data use sequence (discussion, interpretation, diagnosis and action), teachers tend to select and retain a stable group of interlocutors (van gasse et al., 2017). in this regard, it is remarkable that this study reported a relatively low number of interactions between teachers. the colleagues involved in the social network checklist were all connected via formal data use occasions, so informal interactions with these colleagues would have contributed to the stability of the teachers’ personal network. furthermore, research has repeatedly shown that data use cannot be considered a straightforward process because of the different sub-phases (gummer & mandinach, 2015; schildkamp et al., 2016). each of these phases requires different knowledge and skills of teachers, and to some extent a ‘shift’ in the knowledge that is used can be found between interpreting data, on the one hand, and diagnosing problems and taking actions, on the other (gummer & mandinach, 2015). in teachers’ interactive behaviour, such a ‘shift’ has appreciable effects (van gasse et al., 2017). therefore, the finding that teachers’ informal interaction-seeking behaviour may be influenced differently according to the data use phase confirms and broadens this prior knowledge.
in addition to contributing to the field of data use research, this study also broadens understanding of teachers’ interaction-seeking behaviour in general. this study is an extension of earlier work that related formal interactions to informal interaction-seeking behaviour (e.g. meredith et al., 2017; spillane et al., 2015). in those studies, formal groupings were related to interaction-seeking behaviour. the current study, however, adds to this knowledge in two ways. first of all, it is not only the involvement in formal groupings that affects teachers’ informal interaction-seeking behaviour. this study shows the importance of what happens within these formal interactions. thus, beyond merely being connected through formal team meetings (e.g. daly et al., 2010), teachers might only interact with their team members on an informal basis when they are positive about what happens at the formal team meetings. the second thing we learn about interactions in teacher networks is that the type of task may matter for the influence of formal interactions on informal behaviour. in previous work that shed light on the interconnection between formal and informal interactions, this aspect remained somewhat under the surface (e.g. daly et al., 2010; spillane). this study makes clear that this relation might be affected by teachers’ task, because effects were only found when the complexity of the task at hand increases. the social network approach taken in this study proved valuable for generating additional insights into how formal data use in team meetings can influence teachers’ informal data use interactions. however, there are some limitations to this study. the first is that the sample size was limited. social network analysis requires intensive data collection because of the high response rates (over 80%) required. this is because missing data can have a huge impact on the results. suitable sample sizes are therefore difficult to obtain.
however, for generalisable conclusions across the teams, the specificity of each team plays a role. for example, the fact that no significant effects were found in teams riverbank and melrose affected the way the findings could be generalised across the teams. this could have been resolved with a larger sample. therefore, to generalise the findings of this study, larger-scale network research will be required. a second limitation of this study relates to the team context that was used. to define the boundaries of teams, we used the criterion of teaching a specific pupil group, which meant that the teachers in the formal grouping taught different subject areas. it is important to note that some teachers may feel closer connections to other formal groupings within schools, such as subject-related teams. therefore, the present results are strongly context-dependent, owing both to the geographical setting in flanders and to the team boundaries. consequently, choosing another formal grouping in future research may have an impact on the similarity of the findings to those in this study. the last limitation we want to emphasise is the use of a single survey instrument. as a result, we could not fully capture why we arrived at the present research results. this study has generated some important implications for future research and practice. the first relates to the history of the formal team. the type of teams in this study might change about every school year. therefore, the results of this type of analysis might be slightly different when focusing on formal teams with a longer history together (e.g. grade-level teams in the work of, inter alia, meredith et al. (2017) or spillane et al. (2015)). replication of this study in other team contexts is necessary to fully understand how bounded the present results are to the current research context.
next, further studies might investigate the reasons that teachers are more interested in interacting with some colleagues rather than others in data use diagnosis and action. if, for example, we assume that teachers learn about each other’s strengths and weaknesses through formal data use interactions, do teachers seek out specific (and similar) colleagues for data use diagnosis and action on that basis? thus, further research into whether, and why, specific teachers become more important in the complex phases of data use is essential for theory and practice. certain information on general characteristics (e.g. age, experience) might be helpful in this regard. in addition, the fact that teachers’ perceptions of formal data use affect their informal interactions in the diagnosis (and action) phase warrants further exploration. more insight is needed into what exactly happens in formal data use interactions, how schools and teacher teams differ in those formal interactions and when and how those interactions affect teachers’ informal interactive behaviour. opportunities in this regard can be found in combinations of social network analysis and qualitative data sources (e.g. observations or in-depth interviews) to further deepen the social network analyses of this study. for the field of data use, this study brings about implications regarding the sustainability of the many interventions that are used to promote data use among practitioners (e.g. ciampa & gallagher, 2016; cosner, 2011; marsh, 2012; schildkamp et al., 2016). such interventions often introduce (new and temporary) formal teams to support schools in the use of data. this study shows that ‘getting to know each other’ is not sufficient for the sustainability of the data use interactions during the interventions. such interventions might only succeed and grow into professional learning communities when sufficient attention is paid to how meaningful the data use activities are for the different team members. 
therefore, it is important to ensure strong connections between teachers in formal groupings in order to encourage teachers’ informal interactive behaviour.

keypoints
teachers are generally positive about the extent to which they discuss, interpret, diagnose and take action upon pupil learning outcome data at formal team meetings.
teachers generally interact rarely on an informal basis on pupil learning outcome data.
teachers who are more positive about their formal data use interact to a greater extent for problem diagnosis in the context of the use of pupil learning outcome data.
strong connections between teachers in formal groupings are essential to encourage teachers’ informal interactive behaviour.

references
blankenship, s. s., & ruona, w. e. a. (2009). exploring knowledge sharing in social structures: potential contributions to an overall knowledge management strategy. advances in developing human resources, 11(3), 290-306. doi: 10.1177/1523422309338578
bertrand, m., & marsh, j. a. (2015). teachers' sensemaking of data and implications for equity. american educational research journal, 52(5), 861-893. doi: 10.3102/0002831215599251
borgatti, s. p., everett, m. g., & johnson, j. c. (2013). analyzing social networks. los angeles: sage.
borgatti, s. p., mehra, a., brass, d. j., & labianca, g. (2009). network analysis in the social sciences. science, 323, 892-895.
campbell, c., & levin, b. (2008). using data to support educational improvement. educational assessment, evaluation and accountability, 21(1), 47-65. doi: 10.1007/s11092-008-9063-x
ciampa, k., & gallagher, t. l. (2016). teacher collaborative inquiry in the context of literacy education: examining the effects on teacher self-efficacy, instructional and assessment practices. teachers and teaching, 22(7), 858-878. doi: 10.1080/13540602.2016.1185821
coburn, c. e., & turner, e. o. (2011). research on data use: a framework and analysis. measurement: interdisciplinary research & perspective, 9(4), 173-206.
doi: 10.1080/15366367.2011.626729
cosner, s. (2011). supporting the initiation and early development of evidence-based grade-level collaboration in urban elementary schools: key roles and strategies of principals and literacy coordinators. urban education, 46(4), 786-827. doi: 10.1177/0042085911399932
daly, a., moolenaar, n. m., bolivar, j. m., & burke, p. (2010). relationships in reform: the role of teachers’ social networks. journal of educational administration, 48(3), 359-391. doi: 10.1108/09578231011041062
datnow, a., & hubbard, l. (2016). teacher capacity for and beliefs about data-driven decision making: a literature review of international research. journal of educational change, 17(1), 7-28. doi: 10.1007/s10833-015-9264-2
farley-ripple, e. n., & buttram, j. l. (2015). the development of capacity for data use: the role of teacher networks in an elementary school. teachers college record, 117(4), 1-34.
finnigan, k. s., & daly, a. j. (2012). mind the gap: organizational learning and improvement in an underperforming urban system. american journal of education, 119, 41-71.
gummer, e. s., & mandinach, e. b. (2015). building a conceptual framework for data literacy. teachers college record, 117(4), 1-22.
handcock, m. s., hunter, d. r., butts, c. t., goodreau, s. m., & morris, m. (2016). statnet: software tools for the statistical modeling of network data (version 2016.9).
horn, i. s., & little, j. w. (2010). attending to problems of practice: routines and resources for professional learning in teachers’ workplace interactions. american educational research journal, 47(1), 181-217. doi: 10.3102/0002831209345158
hubbard, l., datnow, a., & pruyn, l. (2014). multiple initiatives, multiple challenges: the promise and pitfalls of implementing data. studies in educational evaluation, 42, 54-62. doi: 10.1016/j.stueduc.2013.10.003
hubers, m. d., moolenaar, n. m., schildkamp, k., daly, a., handelzalts, a., & pieters, j. m. (2017).
share and succeed: the development of knowledge sharing and brokerage in data teams’ network structures. research papers in education, 1-23. doi: 10.1080/02671522.2017.1286682
jimerson, j. b. (2014). thinking about data: exploring the development of mental models for data use among teachers and school leaders. studies in educational evaluation, 42, 5-14. doi: 10.1016/j.stueduc.2013.10.010
jones, w. m., & dexter, s. (2014). how teachers learn: the roles of formal, informal, and independent learning. educational technology research and development, 62(3), 367-384. doi: 10.1007/s11423-014-9337-6
keuning, t., van geel, m., visscher, a., fox, j. p., & moolenaar, n. m. (2016). transformation of schools' social networks during a data-based decision making reform. teachers college record, 118(9), 1-33.
kyndt, e., gijbels, d., grosemans, i., & donche, v. (2016). teachers’ everyday professional development: mapping informal learning activities, antecedents, and learning outcomes. review of educational research, 86(4), 1111-1150. doi: 10.3102/0034654315627864
marsh, j. a. (2012). interventions promoting educators’ use of data: research insights and gaps. teachers college record, 114, 1-48.
marsh, j. a., & farrell, c. c. (2015). how leaders can support teachers with data-driven decision making: a framework for understanding capacity building. educational management administration & leadership, 43(2), 269-289. doi: 10.1177/1741143214537229
meredith, c., van den noortgate, w., struyve, c., gielen, s., & kyndt, e. (2017). information seeking in secondary schools: a multilevel network approach. social networks, 50, 35-45. doi: 10.1016/j.socnet.2017.03.006
moolenaar, n. m., sleegers, p. j. c., & daly, a. j. (2012). teaming up: linking collaboration networks, collective efficacy and student achievement. teaching and teacher education, 28, 251-262. doi: 10.1016/j.tate.2011.10.001
oecd (2014).
talis 2013 results: an international perspective on teaching and learning. talis, oecd publishing. http://dx.doi.org/10.1787/9789264196261-en
penninckx, m., vanhoof, j., & van petegem, p. (2011). evaluatie in het vlaamse onderwijs. beleid en praktijk van leerling tot overheid. [evaluation in flemish education. policy and practice from student to government] antwerpen-apeldoorn: garant.
penuel, w. r., riel, m., joshi, a., pearlman, l., kim, c. m., & frank, k. a. (2010). the alignment of the informal and formal organizational supports for reform: implications for improving teaching in schools. educational administration quarterly, 46(1), 57-95. doi: 10.1177/1094670509353180
penuel, w. r., sun, m., frank, k. a., & gallagher, h. a. (2012). using social network analysis to study how collegial interactions can augment teacher learning from external professional development. american journal of education, 119(1), 103-136.
schildkamp, k., & kuiper, w. (2010). data-informed curriculum reform: which data, what purposes, and promoting and hindering factors. teaching and teacher education, 26(3), 482-496. doi: 10.1016/j.tate.2009.06.007
schildkamp, k., poortman, c. l., & handelzalts, a. (2016). data teams for school improvement. school effectiveness and school improvement, 27(2), 228-254. doi: 10.1080/09243453.2015.1056192
schildkamp, k., rekers-mombarg, l. t. m., & harms, t. j. (2012). student group differences in examination results and utilization for policy and school development. school effectiveness and school improvement, 23(2), 229-255. doi: 10.1080/09243453.2011.652123
spillane, j. p., hopkins, m., & sweet, t. (2015). intra- and inter-school interactions about instruction: exploring the conditions for social capital development. american journal of education, 122(1), 71-110.
spillane, j. p., & kim, c. m. (2012). an exploratory analysis of formal school leaders’ positioning in instructional advice and information networks in elementary schools.
american journal of education, 119(1), 73-102. doi: 10.1086/667755
spillane, j. p., parise, l. m., & sherer, j. z. (2011). organizational routines as coupling mechanisms: policy, school administration, and the technical core. american educational research journal, 48(3), 586-619.
stoll, l., bolam, r., mcmahon, a., wallace, m., & thomas, s. (2006). professional learning communities: a review of the literature. journal of educational change, 7(4), 221-258. doi: 10.1007/s10833-006-0001-8
sweet, t. m. (2016). social network methods for the educational and psychological sciences. educational psychologist, 51(3-4), 381-394. doi: 10.1080/00461520.2016.1208093
vanhoof, j., & schildkamp, k. (2014). from ‘professional development for data use’ to ‘data use for professional development’. studies in educational evaluation, 42, 1-4. doi: 10.1016/j.stueduc.2014.05.001
van gasse, r., vanlommel, k., vanhoof, j., & van petegem, p. (2016). teacher collaboration on the use of pupil learning outcome data: a rich environment for professional learning? teaching and teacher education, 60, 387-397. doi: 10.1016/j.tate.2016.07.004
van gasse, r., vanlommel, k., vanhoof, j., & van petegem, p. (2017). unravelling data use in teacher teams: how network patterns and interactive learning activities change across different data use phases. teaching and teacher education, 67, 550-560. doi: 10.1016/j.tate.2017.08.002
verhaeghe, g., vanhoof, j., valcke, m., & van petegem, p. (2010). using school performance feedback: perceptions of primary school principals. school effectiveness and school improvement, 21(2), 167-188. doi: 10.1080/09243450903396005
wayman, j. c., midgley, s., & stringfield, s. (2007). leadership for data-based decision making: collaborative educator teams. in a. b. danzig, k. m. borman, b. a. w. jones & w. f. wright (eds.), learner-centered leadership: research, policy and practice (pp. 189-205). new jersey, usa: lawrence erlbaum associates.
frontline learning research vol.4 no. 2 special issue (2016) 12–26 issn 2295-3159
corresponding author: maria tulis, university of augsburg, department of psychology, universitätsstraße 10, 86159 augsburg, germany. email address: maria.tulis@phil.uni-augsburg.de doi: http://dx.doi.org/10.14786/flr.v4i2.168

learning from errors: a model of individual processes
maria tulis a, gabriele steuer a, markus dresel a
a university of augsburg, germany
article received 4 may / revised 16 october / accepted 2 march / available online 6 april

abstract
errors bear the potential to improve knowledge acquisition, provided that learners are able to deal with them in an adaptive and reflexive manner. however, learners experience a host of different (often impeding or maladaptive) emotional and motivational states in the face of academic errors. research has made few attempts to develop a theory that focuses on learning from errors (with the exceptions of the theory of impasse-driven learning and the theory of negative knowledge) and, in particular, a theoretical framework that focuses on antecedent motivational processes. by integrating theories of self-regulated learning, volition, attributions, and appraisals, we propose a model that highlights individual processes that are characteristic of this specific learning phenomenon. more precisely, our theoretical framework aims to explain how emotional, motivational and self-regulatory processes, influenced by personal and contextual conditions, interact in order to facilitate or impede adaptive dealing with errors and appropriate metacognitions and cognitive activities. our objective is to provide a framework that allows for the systematic integration of various aspects that have been targeted in previous research and to guide and stimulate future research on learning from errors. as initial evidence for validation, we summarise research findings that address specific parts of the proposed model.
keywords: learning from errors; self-regulation; motivation; emotion

1. learning from errors: a specific learning phenomenon
in order to facilitate learning (the development of knowledge, metacognitive skills and autonomy), learners should be challenged with tasks that refer to skills and knowledge just beyond their current level of mastery (vygotsky, 1978). errors are a natural by-product of attempting challenging learning tasks and they may, in particular, provide learning opportunities (van lehn, 1988). recent research findings in educational psychology and contemporary cognitive psychology (e.g. cyr & anderson, 2014; van lehn, siler, murray, yamauchi, & baggett, 2003) give reason to revisit ancient wisdoms like “mistakes are the stepping stones for learning” or “you can always learn from your mistakes”. based on empirical findings, the consistent key argument is that errors initiate explanation and reflection processes in which deficient concepts are contrasted with correct concepts in order to establish accurate mental models (see also chi, 1996; kapur, 2008; oser & spychiger, 2005; siegler, 2002). however, as van lehn et al. (2003) put it, “a learning opportunity is only an opportunity to learn”. accordingly, empirical findings consistently point to the importance of metacognitive support (e.g. keith & frese, 2005; künsting, kempf, & wirth, 2013). for example, westermann and rummel (2012) found that metacognitive support during student collaboration on difficult learning content and discussions of their wrong solutions leads to better learning outcomes. in addition to metacognitive processes, motivational processes obviously play a particularly important role for successful learning from errors. experiences of errors and impasses are accompanied by a host of different emotional and motivational states which facilitate or impede persistent learning engagement, the use of appropriate metacognitions, and cognitive activities.
it can be assumed that poor learners are characterised by the experience of deactivating emotions following errors (for more details see section 2) and an inability to regulate their motivation and the respective emotions adaptively. in other words, as with learning in general (cf. kanfer & ackerman, 1989) but particularly after making errors, learning from one’s own errors through (self-) explanation basically requires motivational forces in order to persist after setbacks, to correct the error at hand, and to reflect on the underlying misconceptions. surprisingly, educational research has paid little attention to learning from errors. a theoretical framework that addresses error-related learning processes in terms of emotional experiences, motivational changes, self-regulation, metacognitive activities, and cognitions is lacking. in order to explain why some learners show adaptive reactions and learning gains after errors while others fail to do so, such a model needs to simultaneously explain individual differences in terms of motivational self-regulatory processes (inextricably bound to emotions) as well as the learners’ prerequisites and conditions (i.e. dispositions, motivational beliefs and orientations, knowledge, abilities or skills) in interaction with characteristics of the learning environment and the context. we propose a model with perceived errors as the events that initiate self-regulation. it systematically integrates personal determinants, contextual conditions and situational processes that are specific to learners dealing with errors. within this framework, we integrated components of previous models and built on the central assumptions of established theories (for another attempt to integrate different motivational theories, but not specifically adjusted to processes following errors, see de brabander & martens, 2014).
in particular, we included models that help to explain individual processes following errors, all of which are addressed further in the next sections: the transactional stress/coping model based on primary and secondary appraisals (lazarus & folkman, 1984), aspects of volition theory (kuhl, 1985, 2000), feedback loops (carver & scheier, 1998), self-regulation models (boekaerts, 2006; winne & hadwin, 1998) and theories of impasse or error-driven learning (de leeuw & chi, 2003; kolodner, 1983, 1997; minsky, 1997; oser & spychiger, 2005; van lehn, 1988). findings from studies on error management (heimbeck, frese, sonnentag, & keith, 2003; keith & frese, 2005) and error-related beliefs or attitudes (rybowiak, garst, frese & batinic, 1999; tulis & ainley, 2011; tulis, steuer & dresel, subm.) complete our proposed model.

1.1 current state of research
within the behaviouristic paradigm, and for a long time in the field of cognitive psychology, it was assumed that errors should be avoided because they would interfere with correct information and thus hinder the recall of correct answers (e.g. ayers & reder, 1998). in contrast, contemporary research provides empirical evidence for the fundamental role of errors in learning: overcoming impasses through reflection on errors and (self-) explanation of the underlying misconceptions has been shown to be important for learning progress since these processes help to establish accurate mental models (kapur, 2008; mathan & koedinger, 2005; oser & spychiger, 2005; siegler, 2002; van lehn et al., 2003). based on a comprehensive literature review, we found different approaches that have been adopted in educational research to investigate the role of errors in learning: alongside research on classroom error management and error climate (e.g.
tulis, 2013; steuer, rosentritt-brunn & dresel, 2013), individual responses to errors have been examined from different perspectives. for instance, there is a large body of research on (error) feedback and its impact on learning and achievement (for a meta-analysis see bangert-drowns, kulik, kulik & morgan, 1991; for an overview see mory, 1996). however, most of these studies did not address learning from errors per se. going deeper into this issue, a line of research has investigated students’ error patterns from a diagnostic perspective (for mathematics: clements, 1980; fiori & zuccheri, 2005; resnick, 1984) and has elaborated on error-types and taxonomies (e.g. frese & zapf, 1994). more recent studies have focused on learning from erroneous examples (eichelmann, narciss & schnaubert, 2013; große & renkl, 2007). for example, große and renkl (2007) found that incorrect solutions lead to enhanced learning outcomes if learners have favourable prior knowledge. including errors in worked examples motivated these learners to explain what was wrong and why, and it fostered elaborations on the correct solutions. their results underpin the positive relationship between transfer performance and the generation of self-explanations when learning with incorrect solutions. other researchers have concentrated on learning from errors with (intelligent) tutors (mathan & koedinger, 2005; van lehn et al., 2003). mathan and koedinger (2005) focused on learners’ error-detection and error-correction skills and how these can be supported. the authors provide evidence that feedback which allows students to detect, correct and reflect on their own errors fosters learning at a faster rate, conceptual understanding, and (transfer) performance. similarly, but in another setting (i.e.
collaborative learning environments), research on productive failure has emphasised the benefits of delaying instruction in order to enable reflection on incorrect solution attempts by students (kapur, 2008; westermann & rummel, 2012). van lehn and colleagues (2003) investigated the conditions of successful learning episodes within their framework of impasse-driven learning. in particular, they studied tutorial dialogues between students and expert tutors. the results suggest that impasses and errors are strongly associated with learning. reaching impasses and clarifying errors turned out to have stronger effects on effective learning than when a tutor modelled the correct action. finally, some researchers have addressed learners’ attitudes towards making errors (rybowiak et al., 1999) and implemented the positive function of errors for learning in a training condition (gully, payne, koles & whiteman, 2002; kanfer & ackerman, 1996; keith & frese, 2005). in these studies, the positive function of errors was prompted to participants while practising a task and the participants were encouraged to make errors. however, error training had better effects on performance if it was combined with instructions providing metacognitive techniques supporting cognitive and emotional self-regulation (keith & frese, 2005) or if individuals were higher in ability, higher in openness to experience, or lower in conscientiousness (gully et al., 2002). in summary, there is a growing research interest in the specific phenomenon “learning from errors”, but a theoretical framework that allows an integration of these different perspectives is lacking. apart from the findings outlined above regarding the individual preconditions and their interaction with training efforts to enhance successful learning from errors, learners’ adaptive reactions to errors (their antecedents and consequences) have been considered only to a minor degree.
particularly little attention has been paid to differences in learners’ emotional and motivational responses to errors and their significance for subsequent learning processes. in this regard, we present four different theoretical perspectives on dealing with errors in learning contexts in the following sections. their theoretical assumptions build the basis for our proposed model described afterwards.

1.2 perspectives on individual dealing with errors
a first perspective to explain individual differences in learners’ reactions to errors can be derived from research on stress and coping (cf. boekaerts, 2010). lazarus and folkman (1984) proposed two cognitive appraisal processes which determine if a situation is perceived as stressful. first, in a primary appraisal process an (error) situation is interpreted along a continuum ranging from irrelevant, benign-positive, not harmful to challenging, threatening or harmful. the secondary appraisal process further evaluates the situation and determines which coping resources are available and whether the individual can apply them effectively. finally, the situation and coping strategies are monitored and evaluated, and the primary and secondary appraisals are modified if necessary. numerous studies have shown that appraisal processes, whether operating automatically or consciously and volitionally, determine emotional experiences (lazarus, 1991). altogether, appraisal theory appears to constitute a proper basis for describing emotional states, motivational changes and self-regulatory processes following errors. a second perspective that is strongly related to learners’ reactions to errors stems from research on reactions to (success and) failure.
a literature review reveals an impressive body of research that has been shown to explain differences in individuals’ (affective) reactions to failure based on different theoretical foundations (for an overview see elliot & dweck, 2005), such as achievement goal theory (for an overview see maehr & zusho, 2009), or attribution theory (weiner, 1986). for example, mastery-oriented students with a focus on skill development and individual improvement do not necessarily feel threatened by failure when faced with a difficult task, but rather perceive setbacks as an opportunity for learning and mastery (e.g. dweck & leggett, 1988). causal beliefs of the importance of effort for success were found to mediate the relationship between mastery orientation and retained positive affect after errors were made (tulis & ainley, 2011). in contrast, performance avoidance goals have been shown to be associated with increased negative affect following failure experiences and lower preference for difficult tasks (e.g. elliott & dweck, 1988). clifford’s (1984) theory of constructive failure also emphasised learners’ differences in affective experiences following errors: students who are focused on the task rather than on themselves were less likely to fear failure and to feel negative emotions. rather, they were more likely to invoke positive thoughts and further appraisals of “challenge” (cf. boekaerts, 1993). finally, volition theory (kuhl, 1985, 2000) has broached the issue of the interplay between emotion, motivation, metacognition, and cognition in the face of failure: besides cognitive control (in terms of metacognitive activities directed towards keeping attention and effort on the task), emotion and motivation control (i.e. self-regulatory processes to keep negative emotions and other intrusive thoughts at bay during task engagement) can be assumed to mediate the effectiveness of learning from errors.
in the context of learning situations, empirical studies by kanfer and ackerman (1996) provide evidence that emotion control is most critical when the task is likely to appear most daunting to the learner, a likely situation after making errors. it is important to note that, although failure and errors are interrelated constructs, they are not the same: errors are usually defined as an unintended discrepancy between a current and a desired state, or as a deviation from a given standard (e.g. frese & zapf, 1994). “failure” implies more than just this perceived discrepancy. in contrast to errors, failure experiences constitute a more global miss of a goal with a greater focus on the subsequent consequences (cf. zhao & olivera, 2006). above all, not every error is necessarily interpreted as failure. whether an error is evaluated as failure or not depends on situational aspects (e.g. social norms) and personal characteristics of the learner, such as self-concept of ability. bandura (1997), as well as eccles and her colleagues (e.g. eccles & wigfield, 2002), concluded that efficacy expectations or perceptions of self-competence are a major determinant of a person’s willingness to invest more effort if the task becomes challenging, and hence also following errors.

thirdly, regarding theories on learning from errors in a narrower sense, a perspective on dealing with errors stems from organisational psychology and technology-based learning. within the field of organisational psychology, rather economised working models developed for empirical studies in the field of workplace learning have been proposed (zhao, 2011; bauer, gartmeier, & harteis, 2012; van dyck, van hooft, de gilder, & liesveld, 2010). researchers have either primarily focused on personal characteristics that may facilitate or impede effective learning from errors at work (e.g.
components of an error-specific attitude, rybowiak et al., 1999) or they have focused exclusively on contextual features, such as the organisational error climate. as an exception, oser and colleagues (e.g. oser & spychiger, 2005) introduced the concept of “negative knowledge” (cf. minsky, 1997) in the context of academic learning. it represents knowledge about false facts and inappropriate action strategies that labels incorrect concepts as wrong and helps to prevent the repetition of errors in similar situations. similarly, kolodner (1983, 1997) emphasised that individuals use knowledge about formerly experienced errors in new situations. comparably, van lehn (1988) suggested that impasses pave the way for learning from the subsequent explanation and therefore are even necessary for learning processes. however, these theoretical explanations primarily consider cognitive processes, and they do not cover emotional, motivational, and self-regulatory processes following errors as antecedents of successful learning from errors.

in order to bridge this gap, contemporary models of self-regulated learning which propose recursive processes including emotional/motivational functioning as well as metacognitions and cognitive activities (e.g. boekaerts, 1999; pintrich, 2000; schmitz, 2001; zimmerman, 2008) appear to constitute a fourth perspective on differences in learners’ reactions to errors. more specifically, we explicate three self-regulation models in the following which provide a proper basis for describing motivational and self-regulatory processes following errors with different focal points (boekaerts & niemivirta, 2000; carver & scheier, 1998; winne & hadwin, 1998). first, the “dual processing self-regulation model” (boekaerts & niemivirta, 2000; boekaerts, 2006) provides a framework addressing the importance of affective experiences and the learners’ competences to regulate their emotions and motivation following errors.
two main goal priorities which are pursued by self-regulatory activities are distinguished: (1) the “mastery/growth pathway” and (2) the “well-being pathway”. learners who want to reach a specific subgoal in order to improve skills or gain knowledge (e.g. analyse the causes of the error at hand) initiate activities in the mastery/growth pathway because they value that goal and feel competent enough to commit energy to its pursuit. on the other hand, learners who are primarily concerned with the anticipated threat to their self-worth and the negative consequences of errors initiate activities in the well-being pathway. importantly, it is assumed that learners can switch to the mastery/growth pathway by using adaptive emotional and motivational regulation strategies (boekaerts, 2006).

another theoretical model that can be used to explain learners’ emotional as well as behavioural changes following errors was introduced by carver and scheier (1998). the authors have focused on the role of feedback control processes during self-regulation. the core construct in their model is a discrepancy-reducing feedback loop (or a discrepancy-enlarging loop in the case of an avoidance situation): if a discrepancy between a current state/situation (input function) and a goal/standard (reference value) is detected, adjustments are made in an output function in terms of behavioural changes. for example, a learner may invest more effort to identify the error causes or may seek further information in the learning material after the perception of an error. parallel to this behaviour-guiding loop, carver and scheier (1998) described the affect-creating feedback loop which operates automatically and simultaneously. it is assumed to monitor the rate of progress of behaviour discrepancy reduction over time.
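the two parallel loops can be illustrated with a minimal numerical sketch. it is not part of carver and scheier's model specification: the numeric scale, the `gain` parameter, the `expected_rate` criterion and the function name are hypothetical choices made only to show how a behaviour-guiding loop and an affect-creating loop might run side by side.

```python
from dataclasses import dataclass


@dataclass
class LoopResult:
    discrepancies: list  # |goal - current| at the start and after each step
    affect: list         # affect signal after each step (sign = valence)


def simulate_feedback_loops(current, goal, gain=0.5, expected_rate=1.0, steps=4):
    """Run a behaviour-guiding and an affect-creating loop in parallel.

    Assumptions (illustrative only): states are numbers, the behavioural
    adjustment is proportional to the discrepancy (gain), and affect is the
    observed rate of discrepancy reduction minus a hypothetical expected rate.
    """
    discrepancies = [abs(goal - current)]
    affect = []
    for _ in range(steps):
        # Behaviour loop: compare input (current) with the reference value
        # (goal) and adjust the output to reduce the discrepancy.
        current += gain * (goal - current)
        discrepancies.append(abs(goal - current))
        # Affect loop: monitor how fast the discrepancy shrinks.
        rate = discrepancies[-2] - discrepancies[-1]
        affect.append(rate - expected_rate)
    return LoopResult(discrepancies, affect)


result = simulate_feedback_loops(current=2.0, goal=10.0)
print(result.discrepancies)
print(result.affect)
```

in this toy run the discrepancy keeps shrinking, yet the affect signal moves from positive to negative as progress slows below the assumed expected rate. this separates the two loops: behaviour tracks the size of the discrepancy, whereas affect tracks the speed of its reduction.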
hence, the theoretical model of carver and scheier (1998, 2013) provides an appropriate framework for behavioural reactions as well as the origins and functions of emotions that are experienced after errors.

finally, the model suggested by winne and hadwin (1998) highlights the ongoing evaluation of potential discrepancies between products and standards of the learning process. in their model, the authors describe four basic phases (task definition, goal setting and planning, studying tactics, and adaptations to metacognition; for an overview see also perry & winne, 2006) in terms of the interaction of personal and contextual conditions, products (i.e. learning behaviour and outcomes) compared with standards (i.e. the optimal end state of each phase) and the learner’s goals through metacognitive evaluation processes. all these aspects are types of information that are used or generated during learning. a mismatch between products and standards is assumed to initiate further learning operations, the use of metacognitive strategies and/or the revision of the conditions and standards. the output, or performance, is the result of recursive processes that cascade back and forth, altering conditions, standards, operations, and products as needed. thus, the model represents a “recursive, weakly sequenced system” (winne & hadwin, 1998, p. 281) and it primarily addresses cognitive and metacognitive activities. it therefore complements the proposed theoretical framework for learning from errors presented in the next section, which considers not only emotional and motivational but also cognitive and metacognitive processes and learning activities.

in summary, most of the theories outlined above focus on self-regulatory processes in general, but a sufficiently elaborated model with respect to perceived errors as initiating points for self-regulation is lacking.
previous research was not able to adequately explain individual differences in error-specific emotional and motivational self-regulation. regarding prior research that aimed to investigate learning from errors, it is striking that self-regulatory processes, in particular motivational processes, have only been addressed sketchily, although there is a common agreement on their importance in the face of setbacks. a theoretical perspective for relating personal and contextual conditions and motivational (self-regulatory) processes following errors is needed to explain and systematically investigate error-related learning phenomena. previous empirical findings suggest that such a model must address three issues: first, affective and motivational reactions to errors, as well as cognitive and behavioural reactions specifically adjusted to the error in question, have to be included (dresel, schober, ziegler, grassinger, & steuer, 2013; tulis, grassinger, & dresel, 2011). secondly, such a theory must take into account critical characteristics of error situations and indicate how these characteristics affect the potential contribution of individual dispositions, orientations, abilities or skills and current motivational states to adaptive learning behaviour. finally, an integrated framework must consider the effects of interactions between personal determinants, contextual conditions, and situational processes. prior research and working models have tended to focus either on personal preconditions or the context, depending on the researcher’s primary concern. we attempt to overcome these existing shortcomings and to expand previous approaches by providing a framework which integrates proven theories of self-regulation, volition, motivation, emotion, and cognition.

2. individual reactions to and learning from errors: a process model

the purpose of this section is to introduce a framework that includes the theoretical perspectives presented above and to provide a model that can explain individual differences, situational influences and sequenced processes following error experiences as antecedents for successful learning from errors (see figure 1). learning from errors is an effortful activity. our understanding of learning from errors includes a detailed analysis of the error causes in order to identify and explain potential misconceptions, a self-evaluation of the underlying knowledge and its modification, as well as the correction of the error in question (e.g. dresel et al., 2013). prior to these metacognitions and cognitive activities, learners have to deal with changes in affect and motivation after the perception of an unintended discrepancy between a current state and a desired outcome or a given standard.

more specifically, the perception of an error represents the (“bottom-up”) starting point in our model (see “error feedback/detection of an error” marked with an asterisk in figure 1) which induces a sequence of processes (indicated with bold arrows in figure 1). irrespective of the type of error or its causes, this is assumed to trigger direct reactions in terms of affect based on primary appraisals of the situation (see “direct reactions towards errors” in figure 1). in line with lazarus (1991), primary appraisals are directed towards the assessment of the relevance of this unintended discrepancy/goal incongruence to the learner, and subsequent affect acts as a signal for this personally relevant deviation from an implicit or explicit standard. based on these primary appraisals, different emotions such as surprise, frustration, anger or boredom may be experienced.
for example, a self-confident, high achieving learner may first experience surprise after error feedback, whereas a low achiever may experience frustration at first sight of an error, provided that both learners value the task and aim to master it. this first emotional reaction might be vague and not easy to categorise as a specific emotion, but in any case we would expect an observable change in arousal.

we assume that primary reactions are followed by more indirect reactions towards the error at hand (see “indirect/secondary reactions towards errors” in figure 1) including secondary appraisals directed at the assessment of controllability and personal resources to deal with the error (cf. lazarus, 1991). it is further assumed that these secondary appraisal processes change or intensify the primary emotional reaction and the learners’ subsequent motivation. analogous to top-down processing (i.e. knowledge or expectations are used to guide processing), further self- and task-related appraisals, such as causal attributions (weiner, 1986), are made. these, in turn, might evoke attribution-dependent emotions other than the learner’s primary emotional states, or the learner’s primary emotional states might be intensified. it can be expected that not all types of errors lead to the same processes and subsequent learning. for example, in contrast to careless mistakes (e.g. slips, caused by attentional problems, or lapses, caused by memory failures) only knowledge- and rule-based errors might bear a potential for learning (for this taxonomy see reason, 1990). the nature of emotional and motivational changes is likely affected by the type of the error at hand. thus, at this stage of the model, we presume that the error type has an impact on the learners’ secondary appraisals, the subsequent self- and task-related motivation, and the respective learning actions.
in the next step (see “emotional and motivational regulation” in figure 1), these changes in self- and task-related motivation (and emotional states) are assumed to trigger emotional and motivational regulation processes (cf. boekaerts, 2003). depending on personal characteristics of the learner, these error-related regulation processes may become necessary to maintain learning motivation. for example, thinking over the value of the task, the use of social resources, efficacy self-talk, or cognitive reappraisal may help to reassure the learner to proceed with the task despite setbacks (e.g. wolters, 2003). some learners may be more concerned with emotion-focused coping (lazarus, 1993) to avoid a threat to self-worth and restore their well-being (cf. “well-being pathway”, boekaerts, 2006), whereas others may focus on strategies to re-direct attention and learning activities in order to master the task (boekaerts, 2006; kuhl, 2000). hence, we assume that learners actively (i.e. consciously or automatically) use emotional and motivational regulation strategies following errors to activate and sustain their cognitive, metacognitive and affective functioning (butler & winne, 1995; wolters, 2003). we presume that adaptive (and effective) emotional and motivational self-regulation (gross, 1998; schwinger, steinmayr & spinath, 2009; wolters, 1998) provides the basis for the use of appropriate metacognitive activities, cognitive strategies and learning behaviour to adequately reflect on the underlying misconception (subsumed under “learning process” in figure 1). however, the regulation strategies that learners may use can also be dysfunctional: the use of maladaptive strategies following errors, such as distraction, suppression or rumination (e.g. gross, 1998; knollmann, 2006), may impede a detailed self-explanation of errors and their respective correct counterparts.
furthermore, it can be assumed that, as with learning strategies, some regulation strategies may be appropriate for certain learning contexts whereas the same strategies might be dysfunctional in other contexts (engelschalk, steuer & dresel, 2015). in any case, inappropriate or failed regulation strategy use may result in the experience of negative deactivating emotions, such as hopelessness or boredom, which are held to be detrimental for motivation (e.g. pekrun, goetz, daniels, stupnisky, & perry, 2010). thus, emotions are not only assumed to act as a signal after the perception of a discrepancy, but they are also assumed to be an indicator of the learners’ current motivation. consequently, they guide subsequent learning behaviour (e.g. in terms of persistence, attention focus, or information seeking) and they serve as a monitoring instrument for goal pursuit (carver & scheier, 1990). hence, emotions are assumed to moderate learning processes and we regard the presence of activating (or epistemic) emotions as a necessary condition for persistent task engagement in the face of obstacles and for learning from errors in general.

figure 1. process model of individual reactions to and learning from errors.

it is important to note that individuals’ learning behaviour following errors, their emotional and motivational experiences and regulation strategies, and their subsequent metacognitions and cognitive activities are all assumed to be influenced by personal characteristics as well as contextual features which interact continuously with one another throughout the entire learning process. learners continuously appraise the learning conditions against the background of their individual dispositions, skills and abilities (e.g. prior knowledge or topic-interest), and their motivational beliefs such as self-concept of ability or goal orientation (for an overview see schunk, meece & pintrich, 2013).
previous findings indicate that the effectiveness of error encouragement training might depend on such individual differences (e.g. gully et al., 2002). contextual conditions include characteristics of the task (e.g. an enquiry-based learning task versus a routine task), the learning context (e.g. practice versus testing situation), and the interpersonal aspect of dealing with errors in social learning environments which may facilitate or impede learning from errors (i.e. error climate). although located at the starting point in our model, personal and contextual conditions impact later processes as well (indicated with dashed arrows in figure 1). their interaction is affected by previous learning experiences and outcomes which are integrated in a broader social and cultural context.

“learning from errors” (marked with an asterisk in figure 1) takes place in terms of reflection and self-explanation processes based on respective metacognitive activities, the use of appropriate cognitive strategies, and learning behaviour adapted to the new situation (see “learning process” in figure 1). finally, this should result in the modification of the underlying knowledge, improved skills and performance gains (see “learning outcome” in figure 1) which are expected to have reciprocal effects on the learners’ personal conditions, and hence on the interpretation of subsequent error situations (indicated with backwards directed arrows in figure 1).
in order to validate the proposed processes, different stages/components of the model and their relations or sequenced effects need to be analysed: in particular, (1) the impact of, and interplay between, different personal and situational conditions on individual reactions to errors and the use of error-specific adaptive regulation strategies, (2) the proposed changes in motivation and emotion and their function for further self-regulation and learning behaviour, (3) the relevance of metacognitive and cognitive activities for error-related learning processes, and (4) the necessity of affective-motivational functioning to provide a basis for such activities.

3. empirical evidence and open research questions

so far we have provided some evidence for the assumed functions of emotions at the stage pertaining to direct/primary and indirect/secondary reactions towards errors, the use of error-related regulation strategies, and the influence of selected personal conditions and contextual factors on individual responses to errors (dresel et al., 2013; steuer et al., 2013; tulis, 2013; tulis & ainley, 2011; tulis & dresel, 2013; tulis & fulmer, 2013; tulis et al., 2011; tulis et al., subm.). in the present section we summarise the findings of three studies with different foci, namely individual determinants of adaptive dealing with errors (study 1), motivational (self-regulation) processes following errors and their impact on subsequent learning behaviour (study 2), and the dimensions of error climate and their impact on students’ responses to errors (study 3).

study 1, located in the “person × situation” part of figure 1, focused on individual components that may facilitate a learner’s adaptive reaction to errors (tulis et al., subm.). previous studies (e.g. dresel et al., 2013; tulis & ainley, 2011; tulis et al., 2011) have already demonstrated a positive relationship between students’ (more stable) motivational orientations (i.e.
positive self-concept of ability, mastery goal orientation, adaptive error-related beliefs) and emotional/motivational reactions following errors. based on these results, tulis et al. (subm.) tested a tripartite classification of adaptive individual dealing with errors in terms of a cognitive, an affective, and a behavioural component. more specifically, the authors analysed the distinctiveness of 614 students’ self-reported beliefs about errors as learning opportunities from students’ affective-motivational reaction tendencies that facilitate persistence and engagement despite setbacks, and students’ behavioural reaction tendencies including metacognitive activities and error-related learning behaviour (dresel et al., 2013). the results, obtained with confirmatory factor analyses demonstrating a good fit to the data, provided evidence for three distinct factors. in addition, the authors analysed their relationship to other motivational beliefs, and whether these components of adaptive individual dealing with errors may differ between the scholastic domains of mathematics, english and german. correlational findings suggested domain-specificity for the three components. thus, error-related beliefs and habitualised affective-motivational and behavioural responses to errors might be acquired domain-specifically. further research is needed before any conclusions can be drawn, but the results point to the likelihood that students’ dealing with errors may not be differentiated along the verbal and mathematics continuum as, for instance, the academic self-concept is (e.g. marsh, walker & debus, 1991). furthermore, the findings emphasise the differentiation of personal conditions in terms of rather proximal beliefs in addition to less error-specific motivational beliefs, such as mastery goal orientation.
study 2 (tulis & dresel, 2013, august) addressed motivational changes (see “emotional and motivational regulation” in figure 1), undergraduate students’ motivational and emotional self-regulation following errors and its effects on learning behaviour (see “learning process” in figure 1) in a computer-based learning setting. data were collected during two time intervals (studies 2a and 2b). the same study design was implemented in both, with the only difference that in study 2a we additionally conducted stimulated recall interviews immediately after the learning session to examine participants’ use of various emotional and motivational regulation strategies following errors, whereas in study 2b self-reported regulation strategies were assessed on-task after error feedback. regarding the hypothesised motivational changes after making errors measured with on-task state items, we found a substantial decrease in students’ motivation. repeated-measures manovas with the three components of adaptive individual dealing with errors (high versus low levels) as between-subject factors indicated a stronger decline in task-related motivation, situational interest and enjoyment, and perceived competence for students low in action adaptivity, affective-motivational adaptivity, and adaptive beliefs about errors, respectively. interview data revealed a rich variety of emotional and motivational regulation strategies that are used following errors, ranging from “proximal goal setting” and the “use of social resources” (i.e. asking someone for help; most reported) to “self consequating” (i.e. motivating oneself by self-reinforcement for having reached a particular goal; least reported). in addition, maladaptive strategies, such as “rumination” and “suppression”, were also reported in study 2a.
however, when measured on-task (study 2b), immediately after error feedback, appraisal-based strategies, such as cognitive reappraisal (i.e., having a positive view on making errors as a natural part of learning) and mastery self-talk (i.e., thinking of the potential of errors for personal improvement) were more prominent, as was the use of maladaptive strategies. thus, according to our model, a decrease in student motivation triggered the use of emotional and motivational regulation strategies, and personal characteristics served as a buffer. regression analyses further emphasised differential associations between these strategies and adaptive learning activities after error feedback: mastery self-talk and reappraisal were found to facilitate an in-depth analysis of the error at hand, whereas distraction negatively predicted reflection on the underlying misconceptions. logistic regression results indicated positive associations between proximal goal setting and students’ actual persistence. in summary, our findings emphasised the importance of motivational self-regulation for subsequent engagement following errors, and hence the proposed function of emotional and motivational regulation strategies for subsequent learning processes and learning behaviour. furthermore, they provided first evidence for a differentiation between adaptive and maladaptive error-related strategies.

the findings of study 3 (steuer et al., 2013) are based on a questionnaire study with 1,116 students from 56 sixth and seventh grade classrooms. this study focused on contextual conditions and it is located in the “person × situation” part as well as related to the dashed arrows of figure 1 that indicate the influence of characteristics of the learning environment on individual learning behaviour. study 3 provided evidence for eight theoretically and empirically distinguishable subdimensions of error climate and their impact on students’ individual dealing with errors. steuer et al.
(2013) further demonstrated that classroom error climate has an impact on students’ affective-motivational and action adaptivity of error reactions, which, in turn, were positively associated with students’ self-reported effort. hence, according to our proposed model, the results supported the assumed association between personal conditions and characteristics of the social learning environment as well as their influences on individual learning behaviour following errors.

4. contribution to theory development and implications for future research

taken together, our findings corroborate the assumed interplay between personal and contextual conditions as well as the importance of functional emotional and motivational self-regulation for adaptive dealing with errors. supported by some preliminary empirical evidence, the proposed model provides a more complete understanding of the motivational processes following errors in interaction with personal and contextual conditions. it gives several indications of how learners’ adaptive reactions to errors (a necessary precondition for learning from errors) can be supported. however, besides motivational processes, further research is needed to address the cognitive processes specifically related to effective learning from errors (see “learning process” in figure 1). research findings on conceptual change, cognitive conflicts, impasse-driven learning and productive failure may provide the basis for further investigations using on-task measurements (e.g. eye-tracking). another important issue raised by previous findings (e.g. keith & frese, 2005) concerns metacognitive activities, which are also part of the proposed framework and need to be specified in future research.
finally, our findings raised some methodological issues for future research: retrospective measurements (even if the time interval is short) might not offer adequate insights into actual and transient task-specific regulation processes, especially strategies involving cognitive change. therefore, future studies should differentiate between strategies learners tend to use to regulate their motivation during learning (e.g. assessed with questionnaires) and the strategies learners actually use (measured on-task).

in summary, our model contributes to current research on motivation in several ways: (1) it expands current theories of self-regulated learning because it highlights perceived errors as initiating points for self-regulatory processes, (2) it provides a solid foundation for the analysis of motivational processes compatible with almost all contemporary theories of motivation, and (3) it enables the examination of personal, contextual and situational conditions and their interactions as well as their potential impact on error-related learning processes. moreover, our proposed model provides a unified framework specifically adjusted to the phenomenon of learning from errors, a growing but still barely investigated field of educational research. previous findings and future research can be easily integrated into the present framework in order to specify the antecedents and processes of effective learning from errors. the major implication for future research practice is the process-related view on learning.

keypoints
a theoretical framework specifically adjusted to the phenomenon of learning from errors is introduced.
changes in motivation trigger emotional and motivational self-regulation processes.
individual differences are explained by personal and situational conditions, emotional, motivational, metacognitive and cognitive processes.
contemporary theories of motivation are integrated in the model.
antecedents and processes of successful learning from errors are specified.

references
ayers, m. s., & reder, l. m. (1998). a theoretical review of the misinformation effect: predictions from an activation-based memory model. psychonomic bulletin and review, 5, 1–21. doi:10.3758/bf03209454.
bandura, a. (1997). self-efficacy: the exercise of control. new york: w. h. freeman.
bangert-drowns, r. l., kulik, c., kulik, j. a., & morgan, m. t. (1991). the instructional effect of feedback in test-like events. review of educational research, 61, 213–238. doi:10.3102/00346543061002213.
bauer, j., gartmeier, m., & harteis, c. (2012). human fallibility and learning from errors at work. in j. bauer & c. harteis (eds.), human fallibility: the ambiguity of errors for work and learning (pp. 155–169). dordrecht: springer. doi:10.1007/978-90-481-3941-5.
boekaerts, m. (1993). being concerned with well-being and with learning. educational psychologist, 28, 149–167. doi:10.1207/s15326985ep2802_4.
boekaerts, m. (1999). self-regulated learning: where we are today. international journal of educational research, 31, 445–457. doi:10.1016/s0883-0355(99)00014-2.
boekaerts, m. (2003). towards a model that integrates motivation, affect, and learning. british journal of educational psychology monograph, series ii (2), development and motivation: joint perspectives, 173–189.
boekaerts, m. (2006). self-regulation and effort investment. in e. sigel & k. a. renninger (eds.), handbook of child psychology, child psychology in practice, 4 (pp. 345–377). new jersey: wiley.
boekaerts, m. (2010). coping with stressful situations: an important aspect of self-regulation. in p. peterson, e. baker, & b. mcgaw (eds.), international encyclopedia of education (pp. 570–575). oxford: elsevier.
boekaerts, m., & niemivirta, m. (2000). self-regulation in learning: finding a balance between learning goals and ego-protective goals. in m. boekaerts, p. r. pintrich, & m. zeidner (eds.), handbook of self-regulation (pp. 417–450). san diego: academic press. doi:10.1016/b978-012109890-2/50042-1.
butler, d. l., & winne, p. h. (1995). feedback and self-regulated learning: a theoretical synthesis. review of educational research, 65, 245–281. doi:10.3102/00346543065003245.
carver, c. s., & scheier, m. f. (1990). origins and functions of positive and negative affect: a control process view. psychological review, 97, 19–35. doi:10.1037/0033-295x.97.1.19.
carver, c. s., & scheier, m. f. (1998). on the self-regulation of behavior. new york: cambridge university press. doi:10.1017/cbo9781139174794.
carver, c. s., & scheier, m. f. (2013). goals and emotion. in m. d. robinson, e. r. watkins, & e. harmon-jones (eds.), guilford handbook of cognition and emotion (pp. 176–194). new york: guilford press.
chi, m. t. h. (1996). constructing self-explanations and scaffolded explanations in tutoring. applied cognitive psychology, 10, 33–49. doi:10.1002/(sici)1099-0720(199611)10:7%3c33::aidacp436%3e3.3.co;2-5.
clements, m. a. (1980). analyzing children's errors on written mathematical tasks. educational studies in mathematics, 11, 1–21. doi:10.1007/bf00369157.
clifford, m. m. (1984). thoughts on a theory of constructive failure. educational psychologist, 19, 108–120. doi:10.1080/00461528409529286.
cyr, a.-a., & anderson, n. d. (2015). mistakes as stepping stones: effects of errors on episodic memory among younger and older adults. journal of experimental psychology: learning, memory, and cognition. doi:10.1037/xlm0000073.
de brabander, k., & martens, r. (2014). towards a unified theory of task-specific motivation. educational research review, 11, 27–44. doi:10.1016/j.edurev.2013.11.001.
de leeuw, n., & chi, m. t. h. (2003). self-explanation, enriching a situation model or repairing a domain model? in g. sinatra & p. r. pintrich (eds.), intentional conceptual change (pp. 55–78). mahwah, new jersey: erlbaum.
dresel, m., schober, b., ziegler, a., grassinger, r., & steuer, g. (2013). affektiv-motivational adaptive und handlungsadaptive reaktionen auf fehler im lernprozess [affective-motivational adaptive and action adaptive reactions on errors in learning processes]. zeitschrift für pädagogische psychologie, 27, 255– 271. doi:10.1024/1010-0652/a000111. dweck, c. s., & leggett, e. l. (1988). a social-cognitive approach to motivation and personality. psychological review, 95, 256–273. doi:10.1037/0033-295x.95.2.256. eccles, j. s., & wigfield, a. (2002). motivational beliefs, values, and goals. annual review of psychology, 53, 109–132. doi:10.1146/annurev.psych.53.100901.135153. eichelmann, a., narciss, s., & schnaubert, l. (2013, august). learning from errors through tasks-withtypical-errors. paper presented at the 15th biennial conference of the european association for research on learning and instruction (earli), munich, germany. elliott, e., & c. dweck. (1988). goals: an approach to motivation and achievement. journal of personality and social psychology, 54, 5–12. doi:10.1037/0022-3514.54.1.5. elliot, a. j., & dweck, c. s. (eds.). (2005). handbook of competence and motivation. new york, ny: guilford. engelschalk, t., steuer, g. & dresel, m. (2015). wie spezifisch regulieren studierende ihre motivation bei unterschiedlichen anlässen? ergebnisse einer interviewstudie [situation-specific motivation regulation: how specifically do students regulate their motivation for different situations?]. zeitschrift für entwicklungspsychologie und pädagogische psychologie, 47, 14–23. doi:10.1026/00498637/a000120. fiori, c., & zuccheri, l. (2005). an experimental research on error patterns in written subtraction. educational studies in mathematics, 60, 323–331. doi:10.1007/s10649-005-7530-6. tulis et al | f l r 24 frese, m., & zapf, d. (1994). action as the core of work psychology: a german approach. in h. c. triandis, m. d. dunette, & l. m. 
hough (eds.), handbook of industrial and organizational psychology (vol. 4, pp. 271–340). palo alto: consulting psychologists. gross, j. j. (1998). the emerging field of emotion regulation: an integrative review. review of general psychology, 2, 271–299. doi:10.1037/1089-2680.2.3.271. große, c. s., & renkl, a. (2007). finding and fixing errors in worked examples: can this foster learning outcomes? learning and instruction, 17, 612–634. doi:10.1016/j.learninstruc.2007.09.008. gully, s. m., payne, s. c., koles, k. l., & whiteman, j.-a. k. (2002). the impact of error training and individual differences on training outcomes: an attribute–treatment interaction perspective. journal of applied psychology, 87, 143–155. doi:10.1037/0021-9010.87.1.143. heimbeck, d., frese, m., sonnentag, s., & keith, n. (2003). integrating errors into the training process: the function of error management instructions and the role of goal orientation. personnel psychology, 56, 333–362. doi:10.1111/j.1744-6570.2003.tb00153.x. kanfer, r., & ackerman, p. l. (1989). motivation and cognitive abilities: an integrative/aptitude-treatment interaction approach to skill acquisition. journal of applied psychology, 74, 657–690. doi:10.1037/0021-9010.74.4.657. kanfer, r., & ackerman, p. l. (1996). a self-regulatory skills perspective to reducing cognitive interference. in i. g. sarason, b. r. sarason & g. r. pierce (eds.), cognitive interference: theories, methods, and findings (pp. 153–171). mahwah, nj: erlbaum. kapur, m. (2008). productive failure. cognition and instruction, 26, 379–424. doi:10.1080/07370000802212669. keith, n., & frese, m. (2005). self-regulation in error management training: emotion control and metacognition as mediators of performance effects. journal of applied psychology, 90, 677–691. doi:10.1037/0021-9010.90.4.677. knollmann, m. (2006). kontextspezifische emotionsregulationsstile. 
entwicklung eines fragebogens zur emotionsregulation im lernkontext mathematik [emotion regulation in learning contexts: development of a questionnaire measuring emotion regulation during math learning]. zeitschrift für pädagogische psychologie, 20, 113–123. doi:10.1024/1010-0652.20.12.113. kolodner, j. (1983). towards an understanding of the role of experience in the evolution from novice to expert. international journal of man-machine studies, 19, 497–518. doi: 10.1016/s00207373(83)80068-6. kolodner, j. (1997). educational implications of analogy: a view from case-based reasoning. american psychologist, 52, 57–66. doi:10.1037/0003-066x.52.1.57. kuhl, j. (1985). volitional mediators of cognitive-behavior-consistency; self-regulatory processes and action versus state orientation. in j. kuhl & j. beckmann (eds.), action control. from cognition to behavior (pp. 101–128). berlin: springer. kuhl, j. (2000). the volitional basis of personality systems interaction theory: applications in learning and treatment contexts. international journal of educational research, 33, 665–703. doi:10.1016/s08830355(00)00045-8. künsting, j., kempf, j., & wirth, j. (2013). enhancing scientific discovery learning through metacognitive support. contemporary educational psychology, 38, 349–360. doi:10.1016/j.cedpsych.2013.07.001. lazarus, r. s. (1991). emotion and adaptation. oxford: oxford university press. lazarus, r. s. (1993). why we should think of stress as a subset of emotion? in l. goldberger & s. breznitz (eds.), handbook of stress: theoretical and empirical aspects (2nd ed., pp. 21–39). new york: the free press. lazarus, r. s., & folkman, s. (1984). stress, appraisal, and coping. new york: springer. maehr, m. l., & zusho, a. (2009). achievement goal theory: the past, present, and future. in k. r. wentzel & a. wigfield (eds.), handbook of motivation at school (pp. 77–104). new york: routledge. marsh, h. w., walker, r., & debus, r. (1991). 
subject–specific components of academic self–concept and self–efficacy. contemporary educational psychology, 16, 311–345. doi:10.1016/0361-476x(91)90013b. tulis et al | f l r 25 mathan, s. a., & koedinger, k. r. (2005). fostering the intelligent novice: learning from errors with metacognitive tutoring. educational psychologist, 40, 257–265. doi:10.1207/s15326985ep4004_7. minsky, m. (1997). negative expertise. in p. j. feltovich, k. m. ford, & r. r. hoffman (eds.), expertise in context (pp. 515–521). menlo park: aaai press/mit press. mory, e. h. (1996). feedback research. in: d. h. jonassen (ed.), handbook of research for educational communications and technology. a project of the association for educational communications and technology, 919–956. new york: macmillan. oser, f., & spychiger, m. (2005). lernen ist schmerzhaft. zur theorie des negativen wissens und zur praxis der fehlerkultur [learning is painful. on the theory of negative knowledge and the practice of error culture]. weinheim, germany: beltz. pekrun, r., goetz, t., daniels, l. m., stupnisky, r. h., & perry, r. p. (2010). boredom in achievement settings: control-value antecedents and performance outcomes of a neglected emotion. journal of educational psychology, 102, 531–549. doi:10.1037/a0019243. perry, n. e., & winne, p. h. (2006). learning from learning kits: gstudy traces of students’ self-regulated engagements with computerized content. educational psychology review, 18, 211–228. doi:10.1007/s10648-006-9014-3. pintrich, p. r. (2000). the role of goal orientation in self-regulated learning. in m. boekaerts, p. r. pintrich, & m. zeidner (eds.), handbook of self-regulation (pp. 452–502). san diego, ca: academic press. schunk, d. h., meece, j. r., & pintrich, p. r. (2013). motivation in education: theory, research, and applications (4th ed.). upper saddle river, nj: prentice hall. reason, j. t. (1990). human error. cambridge: cambridge university press. doi:10.1017/cbo9781139062367. reason, j. t. 
(1995). understanding adverse events: human factors. quality in health care, 4, 80–89. doi:10.1136/qshc.4.2.80. resnick, l. b. (1984). beyond error analysis: the role of understanding in elementary school arithmetic. in h. n. cheek (ed.), diagnostic and prescriptive mathematics: issues, ideas, and insights (pp. 214). kent, oh: research council for diagnostic and prescriptive mathematics. rybowiak, v., garst, h., frese, m., & batinic, b. (1999). error orientation questionnaire (eoq): reliability, validity, and different language equivalence. journal of organizational behavior, 20, 527–547. doi:10.1002/(sici)1099-1379(199907)20:4%3c527::aid-job886%3e3.0.co;2-g. schmitz, b. (2001). self-monitoring zur unterstützung des transfers einer schulung in selbstregulation für studierende [self-monitoring to support the transfer of a self-regulation instruction for students]. zeitschrift für pädagogische psychologie, 15, 181–197. doi:10.1024//1010-0652.15.34.181. schwinger, m., steinmayr, r., & spinath, b. (2009). how do motivational regulation strategies affect achievement: mediated by effort management and moderated by intelligence. learning and individual differences, 19, 621–627. doi: 10.1016/j.lindif.2009.08.006. siegler, r. s. (2002). microgenetic studies of self-explanation. in n. granott & j. parziale (eds.), microdevelopment. transition processes in development and learning (pp. 31–58). cambridge: cambridge university press. doi:10.1017/cbo9780511489709.002. steuer, g., rosentritt-brunn, g., & dresel, m. (2013). dealing with errors in mathematics classrooms: structure and relevance of perceived error climate. contemporary educational psychology, 38, 196– 210. doi:10.1016/j.cedpsych.2013.03.002. tulis, m. (2013). error management behavior in classrooms: teachers’ responses to students’ mistakes. teaching and teacher education, 33, 56–68. doi:10.1016/j.tate.2013.02.003. tulis, m., & ainley, m. (2011). interest, enjoyment and pride after failure experiences? 
predictors of students' state-emotions after success and failure during learning mathematics. educational psychology, 31, 779–807. doi:10.1080/01443410.2011.608524. tulis, m. & dresel, m. (2013, august). motivational and emotional self–regulation and adaptive learning activities after errors. paper presented at the 15th biennial conference of the european association for research on learning and instruction (earli), munich, germany. tulis et al | f l r 26 tulis, m., & fulmer, s. m. (2013). students’ motivational and emotional experiences and their relationship to persistence during academic challenge in mathematics and reading. learning and individual differences, 27, 35–47. doi:10.1016/j.lindif.2013.06.003. tulis, m., grassinger, r., & dresel, m. (2011). adaptiver umgang mit fehlern als aspekt der lernmotivation und des selbstregulierten lernens von overachievern [adaptive handling of errors as an aspect of learning motivation and self-regulated learning of overachievers]. in m. dresel & l. lämmle (eds.), motivation, selbstregulation und leistungsexzellenz [motivation, self-regulation and achievement excellence] (pp. 29–51). münster, germany: lit. tulis, m., steuer, g., & dresel, m. (subm.). components of adaptive individual dealing with errors during academic learning. van dyck, c., van hooft, e. a. j., de gilder, t. c., & liesveld, l. c. (2010). proximal antecedents and correlates of adopted error approach: a self-regulatory perspective. journal of social psychology, 150, 428–451. doi:10.1080/00224540903366743. van lehn, k. (1988). toward a theory of impasse-driven learning. in h. mandl & a. lesgold (eds.), learning issues for intelligent tutoring systems, 19-41. new york: springer. van lehn, k., siler, s., murray, c., yamauchi, t., & baggett, w. (2003). why do only some events cause learning during human tutoring? cognition and instruction, 21, 209–249. doi:10.1207/s1532690xci2103_01. vygotsky, l. s. (1978). 
mind in society: the development of higher mental processes. cambridge, ma: harvard university press. weiner, b. (1986). an attributional theory of motivation and emotion. psychological review, 92, 548–573. doi:10.1007/978-1-4612-4948-1. westermann, k., & rummel, n. (2012). delaying instruction: evidence from a study in a university relearning setting. instructional science, 40, 673–689. doi:10.1007/s11251-012-9207-8. winne, p. h. & hadwin, a. f. (1998). studying as self-regulated learning. in d. j. hacker, j. dunlosky & a. c. graesser (hrsg.), metacognition in educational theory and practice (pp. 279–306). hillsdale, nj: erlbaum. wolters, c. a. (1998). self-regulated learning and college students' regulation of motivation. journal of educational psychology, 90, 224–235. doi:10.1037/0022-0663.90.2.224. wolters, c. a. (2003). regulation of motivation. evaluating an underemphasized as-pect of self-regulated learning. educational psychologist, 38, 189–205. doi:10.1207/s15326985ep3804_1. zimmerman, b. j. (2008). investigating self-regulation and motivation: historical background, methodological developments, and future prospects. american educational research journal, 45, 166– 183. doi:10.3102/0002831207312909. zhao, b. (2011). learning from errors: the role of context, emotion, and personality. journal of organizational behavior, 32, 435–463. doi:10.1002/job.696. zhao, b., & olivera, f. (2006). error reporting in organizations. academy of management review, 31, 1012– 1030. doi:10.5465/amr.2006.22528167. 
frontline learning research 5 special issue ‘learning through networks’ (2014) 56-71
issn 2295-3159

corresponding author: matthieu.vaessen@ou.nl
doi: http://dx.doi.org/10.14786/flr.v2i2.92

networked professional learning: relating the formal and the informal

matthieu vaessenᵃ, antoine van den beemtᵇ, maarten de laatᵃ

ᵃ welten institute, open university, heerlen, the netherlands
ᵇ eindhoven school of education, eindhoven university of technology, the netherlands

article received 17 february 2014 / revised 31 march 2014 / accepted 30 june 2014 / available online 15 july 2014

abstract

the increasing complexity of the workplace environment requires teachers and professionals in general to tap into their social networks, inside and outside circles of direct colleagues and collaborators, for finding appropriate knowledge and expertise. this collective process of sharing and constructing knowledge can be considered 'networked learning'. the processes involved are informal and largely invisible to the official framework of the organisation. consequently, a large amount of learning that takes place is unrecognised and the dynamics, impacts and benefits of such networked learning are often overlooked by organisations. this situation brings about tensions between formal and informal processes, which in turn raise issues concerning adequate professional development, professional autonomy and management. it also leads to questions about facilitating the creation and exchange of knowledge and expertise within the existing social networks. taking an interdisciplinary approach, we explore a number of educational and organisational studies. our key questions are: what are the formal and informal mechanisms underlying networked professional learning, related to professional development, autonomy and management? how can networked learning be positioned in the most optimal way?
currently, a clear academic understanding of how to optimally align and make use of networked learning is lacking. the goal of our exploratory review is to describe mechanisms that influence the alignment of informal and formal learning of teachers within their workplace: schools. we work towards a theoretical and practical integration of the different chosen fields by means of a framework of mechanisms related to networked learning.

keywords: networked learning; teachers; professional development

1. introduction

since the start of the 21st century, pervasive communication technologies together with increased attention to situational knowledge have resulted in an emphasis on collaboration and exchange, highlighting the importance of social networks both within organisations and across organisational borders (lieberman, 2000; price, 2013; pugh & prusak, 2013). making good use of social networks has become increasingly important in educational settings, where teachers develop relationships within and outside schools that help them to learn, solve problems, and innovate their teaching (de laat, 2012). access to networks resulting from these informal relationships has become an important aspect of continued professional development. these informal networks help teachers to deal with the increasing complexity of their work. research shows that most of what professionals learn is learnt informally (cross, 2007), which highlights the need for professional autonomy and personal creativity in problem-solving and professional development. furthermore, research shows the need to understand the role and impact of informal social networks on teacher professional development (villegas-reimers, 2003; darling-hammond et al., 2009; boud & hager, 2012; hargreaves & fullan, 2012).

1.1 the value of networked professional learning

networked learning is a perspective on social learning that describes how participants learn through communication, exchange and connections. people in a person's network can be seen as a source of knowledge (siemens, 2004). learning in networks can be informal (a chat during a break) or formal (attending a group training), and the networks themselves can be formal (a taskforce) or informal (talking to a student's parent). learning networks can often be of value when we need certain knowledge. especially the ‘weak ties’, those people we know but don’t interact with very often, can have something ‘new’ to offer (granovetter, 1973). learning in networks is nothing new: it happens wherever people interact and gain experience (eraut, 2004), connected to the work context (billet, 2004). professional learning has been shown to drive organisational learning and innovation (bessant et al., 2012). addressing complex problems is a forte of the networked learning paradigm (earl & katz, 2007; hodgson, de laat, mcdonnel & ryberg, 2014).

1.1.1 networked learning and professional development

in spite of the proven importance of informal networks, professional development of teachers is almost invariably approached in a largely formal manner (darling-hammond & wei, 2009; villegas-reimers, 2003). school organisations often think of schooling in terms of hiring an expert, in-house training, or individual training trajectories such as coaching. however, formal trajectories are seldom tailored to the challenges teachers face in daily practice. furthermore, these challenges at work induce teachers to learn informally (billet, 2004). both formal trajectories and informal learning processes are part of the learning of teachers, and of professionals in general (billet, 2002; le clus, 2011).
unfortunately, this continuous process of workplace learning, where people customarily exchange knowledge with others in their networks, is hardly ever recognised as professional development. as such, informal learning processes are often overlooked by management and as a consequence do not receive adequate attention. this suboptimal situation (billet, 2004; de laat, 2012) can be remedied by aligning formal and informal learning processes through networked learning. instead of contrasting formal with informal learning, we emphasise the need to develop a hybrid form of learning where both formal and informal learning activities are recognised and promoted (cf. mcguire & gubbins, 2010). this requires a new role for school management, one that expands a culture of learning by creating social learning spaces for professional development (de laat, 2012). a growing body of evidence shows how informal professional development can be given a place within the formal organisational context by establishing learning networks and professional learning communities, such as ‘communities of practice’ (cf. wenger, mcdermott & snyder, 2002). promoting and strengthening these informal networks builds on the already existing social structures and networks within and between schools (cf. hodkinson & hodkinson, 2005; de laat, 2012).

1.1.2 networked learning and professional autonomy

participation in learning networks is aimed at sharing knowledge and expertise as individuals personally see fit. networked learning, in our view, is aimed at promoting professional autonomy, self-directedness and independent decision-making. networked learning opens up the social environment to optimally make use of (new) possibilities to connect to other professionals (cf. de laat, 2012). for networked learning to be effectively integrated into the organisation, a balanced and integrated approach is required (agterberg, 2012).
since informal learning through networks is often bottom-up, self-governing, spontaneous and practice-driven, it is not an easy task to combine this with the formal need for control and performance: management and ‘personnel’ have different roles and outlooks (fuller & unwin, 2003; hargreaves & fullan, 2012; hodkinson & hodkinson, 2005). as soon as management gets involved too much, participants in learning networks risk losing their sense of autonomy, which can result in a loss of motivation (agterberg, 2012; kubiak & bertram, 2010). related to issues of teacher autonomy are teachers’ influence on management control and leadership (forrester, 2000) as well as teachers’ participation in planning and innovation (de laat, 2012).

1.1.3 networked learning and management

providing autonomy, which allows individuals to interact and develop expertise as they see fit, means lowering formal control (hulsbos, andersen, kessels & wassink, 2012). this brings into view issues of management and leadership, which directly influence the amount of professional autonomy that individuals have in the organisation (bass, 1991; tynjälä, 2013). with greater individual autonomy, independent thinking, learning and acting increase and people can personally take up responsibility. this requires a different style of leadership, where responsibilities are shared among the members of the organisation: distributed leadership. distributed leadership promotes the sharing of knowledge and increases motivation for work and learning (spillane, 2008). when leaders pay attention to informal factors in the organisation, such as the personal interests of individuals (i.e. ‘transformative leadership’), this increases commitment to organisation goals. this can be contrasted with purely transactional leadership, which functions according to standards, performance and rewards, and which can engender mediocrity in the organisation (bass, 1991).
to create an organisation where the day-to-day complexity is successfully dealt with and different interests are accounted for, where responsibility is shared and where people can grow and together create value and quality, management needs to shift focus from a traditional centralised role to a position that reflects a deeper insight into the dynamics of the organisation. this entails an integrated view of formal and informal dynamics. directions and strategies can be developed ‘top-down’ as well as ‘bottom-up’ (groot and homan, 2012). networked learning then involves the organisation as a whole, management as well as teachers.

1.1.4 aim of this study

we have argued for the importance of informal networked learning and illustrated how this relates to professional development, autonomy and management of informal and formal learning in organisations. however, to date, these areas of research have not been integrated in the scientific literature. theory in the field of teacher professional development is still very much under development (mccormick, 2010). findings from the private sector can advance theory and practice in the public sector (binz-scharf, lazer, & mergel, 2011). in this study we examine underpinning mechanisms, using a networked learning perspective, in order to develop a better conceptual understanding and to examine how this facilitates a better alignment of informal and formal learning in organisations. since professional development of teachers is directly related to teaching quality (darling-hammond & wei, 2009; villegas-reimers, 2003), we deem it important that this topic receives the attention it deserves.

1.2 the ‘iceberg’ metaphor as background of our study: formal and informal working and learning

the formal side of how things are officially organised, and the informal side of how in everyday life people work, learn, experience and give meaning to their work, are two faces of the organisation.
the analogy of an iceberg illustrates this point. the visible tip of the iceberg represents the formal organisation, where planned decisions are made and organisational structures are developed in order to divide the work, create order, and provide stability. under the waterline we find the huge mass of the iceberg: largely invisible, informally structured, much larger than the official organisation structures and often at least as influential, consisting of everything that is not formal (de caluwe and vermaak, 2003; de laat, 2012). ‘formal’ and ‘informal’ aspects of working and learning are both part of professional life and play a role at the level of individuals, groups, and organisations. the worlds ‘above’ and ‘under’ water mutually influence each other: by interacting in networks people create and influence both the formal and the informal organisation. within both formal and informal networks we find aspects of control, autonomy, performance, development and management. actions and procedures can be planned or spontaneous, visible or invisible, controlled or chaotic, under orders or autonomous. both formal procedures and informal influences are crucial for the organisation and its members (brown, collins & duguid, 1989; snowden, 2005). formal and informal mechanisms play a role for all individuals, groups and organisations, be it ‘above water’ or ‘under water’.

2. research question

in this paper we investigate the mechanisms underlying networked professional learning in order to increase our understanding of how to optimally align networked learning in the school organisation. our key questions are: what are the formal and informal mechanisms underlying networked professional learning, related to professional development, autonomy and management? how can networked learning be positioned in the most optimal way? the term ‘mechanism’ is used here to mean the way in which something functions.
we first address how networked learning contributes to professional development. then, because a prerequisite for networked learning is the possibility of spontaneous and autonomous action and decision-making, we outline how networked learning is related to professional autonomy. lastly, we explore how networked learning is related to issues of management and leadership. literature in these different research areas has until now not been integrated, and we work toward a framework in order to bring these different areas together. we do this by analysing formal and informal mechanisms that play a role in networked learning.

3. methodology

the studies presented in this exploratory review were identified in several systematic steps. first, searches were run on the ebscohost database. we chose this database because it is a multi-disciplinary meta-database that allows searching for articles covering studies in education and professional development, management and organisational learning. ebscohost includes, amongst others, the databases academic search elite, business source premier, e-journals, psycinfo, and eric. peer-reviewed journal articles and international peer-reviewed book chapters published between january 1st 2004 and january 1st 2014 were included in the search. the following keywords were used for a boolean search: ‘networked learning’ or ‘learning networks’ and ‘professional development’ and ‘teachers’ not ‘online’. this search resulted in 74 articles. the aim of the literature research was to recognise formal and informal mechanisms underlying networked learning, related to professional development, professional autonomy and management. for this purpose, the abstract, summary and references of all selected sources were studied first; 26 studies were then shortlisted and read in full, which resulted in a final selection of 22 sources.
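as an illustration only, the boolean search string above can be sketched as a simple text filter. the sketch below is not the query syntax actually entered in ebscohost: the grouping of the two network terms with 'or' before the 'and' and 'not' conditions is our assumption about the intended precedence, and the function name `matches_query` and the three sample records are invented for the example.

```python
def matches_query(text: str) -> bool:
    """Check a title/abstract string against the assumed reading of the search string.

    Assumed grouping (not stated in the source):
    ("networked learning" OR "learning networks")
    AND "professional development" AND "teachers" NOT "online"
    """
    t = text.lower()
    has_network_term = "networked learning" in t or "learning networks" in t
    return (
        has_network_term
        and "professional development" in t
        and "teachers" in t
        and "online" not in t
    )


# Invented sample records, for illustration only (not taken from the review).
sample_records = [
    "networked learning and professional development of teachers in schools",
    "learning networks for teachers: professional development in online courses",
    "professional development of teachers through coaching",
]

selected = [record for record in sample_records if matches_query(record)]
print(len(selected), "of", len(sample_records), "sample records match")
```

under this reading, a record is excluded as soon as the word 'online' appears anywhere in the searched text, which is stricter than excluding only studies with a single focus on online learning tools; this may be one reason why manual screening of abstracts was still needed after the automated search.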
the other 52 articles were left out of further analysis because they did not discuss networked professional development of teachers, or had a single focus on online learning tools. the snowball method of checking references in the remaining articles resulted in 22 extra references relevant to our aim. in total 44 studies (see appendix 2) were read in depth and provided the basis of our analysis. the result is an overview of formal and informal mechanisms involved in networked professional learning. this overview is then condensed into a conceptual framework.

4. findings

first we discuss our findings on how networked learning is related to professional development. after this, we look at networked learning and professional autonomy. then we consider the relation between networked learning and management. we conclude each section with an overview of formal and informal mechanisms regarding networked learning found in the literature.

4.1 networked learning and professional development

professional development comprises formal and informal activities related to intellectual, personal and social domains (de laat, schreurs & nijland, 2013), and can be seen as a “non-linear ongoing process rather than as an outcome of linear, one-off training events” (varga-atkins, o’brien, burton, campbell & qualter, 2008, p.42). furthermore, professional development can be regarded as “a flow of acquired knowledge, as well as participation in a learning community” (pahor, škerlavaj & dimovski, 2008). in networked learning communities, knowledge is constructed and developed, rather than being transferred from one person to the next (schultz, 2011). influence from colleagues can be noted as a contributing factor in learning and development, for example in changing a style of teaching (supovitz, sirinides & may, 2009).
it is argued that theory in the field of professional development still has to be developed, insights gained from networked learning could contribute to how and what teachers learn professionally (cf. appleby & hiller, 2012; mccormick, 2010). exchange between individuals happens through formal and informal networks (carmichael, fox, mccormick, procter & honour, 2006) and the flow of knowledge related to professional development occurs both between organisations and within organisations (jones, 2006; seezink, poell & kirschner, 2010) as well as cross-culturally (ryan, kang, mitchell & gaalen, 2009). professional learning activities can be formal (obtaining a diploma or a degree from an institute), or informal (sharing a drink after a conference day). studies comparing effectiveness of professional development programmes have found that collaborative approaches are more effective than individual ones (varga-atkins et al., 2010), for example when teachers together research and evaluate their own practices (bartlett & burton, 2006). baker‐ doyle and yoon, (2011) also found that while teachers personally gather information, it is within and through social networks that this information comes to life as it is shared, interpreted, developed and sustained. professional development can be seen as an ongoing process of becoming where people grow and learn in connection with each other and events in their professional life (boud & hager, 2012; poell & van der krogt, 2013). schools however, have traditionally been formally designed in a way that teachers work individually. “they have rarely been given time together to plan lessons, share instructional practices, assess students, design curriculums, or contribute to administrative or managerial decisions” (darling-hammond & wei, 2009, p.11). increasing possibilities for communication and exchange across organisational boundaries is therefore vaessen et al. 
an important aspect of networked learning initiatives, aiming to bring together people in order to exchange and create knowledge to support each other. for example, questions can be explored, new insights can be discussed, or meeting an expert can provide valuable new information. both formal and informal learning opportunities enable teachers to improve their practice (o'brien, varga-atkins, burton, campbell & qualter, 2008). making social learning processes part of a learning programme can complement or replace formal education such as seminars in situations where this formal education does not address the learning needed. for example, a project was carried out in a primary school setting where teachers, parents and other parties outside of the school studied problems together (angelides, georgiou & kyriakou, 2008). these learning networks, aimed at developing a social learning approach, were found to facilitate experimentation and reflection. the teachers felt strengthened in their profession when being able to collaborate with the outsiders (school advisors or academics) that came to the school (angelides et al., 2008). learning through networks and partnerships within and between schools sustains contextualised knowledge (baumfield & butterworth, 2005). beckett (2012) describes a situation in which school staff operated in a political context focused on targets and performance levels. the school was situated in a poor area, which required adaptation and dealing with complexities. the school staff felt that the government-imposed recommendations were not reflecting their immediate concerns, and developed a school network including researchers in order to develop understanding about the relation between poverty and children's educational experiences. professional learning networks can function as a 'learning incubation centre' (attard, 2012).
participating in a learning network can promote reflective awareness and development through collaborative analysis, for example when participants note that they “started to dig deeper into their experience” (p.199). when what happens in learning networks is of direct relevance to the participants' needs, this can increase participants' motivation to engage in the reflective process that the network entails (attard, 2012).

the main findings of this section are: professional learning is an ongoing process, rather than something occasional, which naturally happens in formal and informal social structures. furthermore, networked learning is often situated and most effective when it is directly related to the work practices. promoting collaboration through networks has proven to be effective to enhance the learning process. in table 1 we outline the formal and informal mechanisms regarding networked learning and professional development that we have found in this section.

table 1 formal and informal mechanisms in networked learning regarding professional development
mechanisms: 'informal' | 'formal'
knowledge is constructed | knowledge is transferred
invisible | visible
transcending borders | within boundaries
continuous | event-driven
demand-driven | supply-oriented
voluntary | under orders

4.2 networked learning and autonomy

if teachers are to improve their skills, they must have the possibility to influence their work and the way they learn (cf. villegas-reimers, 2003). learning networks provide individuals with the opportunity to learn about topics they personally find of interest to their practice or personal development. in addition to being able to choose what they want to learn, networks also open up the environment by providing links to others outside of the direct working environment (cf. büchel & raub, 2002).
the option to personally choose the areas to explore improves a person's performance (akkerman, petter & de laat, 2008) because the opportunity to choose brings a feeling of responsibility which increases personal motivation (varga-atkins et al., 2010). research shows that when teachers have more autonomy they are more committed and share more of their practices (hökkä & eteläpelto, 2013; imants, wubbels & vermunt, 2013). trotman (2009) warns against too much pressure to meet formal performance standards, pointing out that one should be careful to ensure that true learning is happening, where professionals are intrinsically motivated because of their own interest. for reflective processes to take place among colleagues, there must be trust, so that mistakes can be discussed openly and learned from (hargreaves et al., 2013). positive school culture and atmosphere for collaboration are thus important contributors to quality of networked professional development (varga-atkins et al., 2010). hodkinson and hodkinson (2005) refer to the notion of an 'expansive' rather than a 'restrictive' learning environment where formal learning is combined with an effective approach to informal and networked learning. through networked learning, possibilities for collaboration and personal initiative can be created (hodkinson & hodkinson, 2005). learning networks can function as open platforms where participants can meet and develop issues of their own interest. however, issues surrounding accountability can come up when learning networks are misunderstood and misused, for example when formal leaders take part, disturbing genuineness and exchange, or when financial interests are involved that create pressure (trotman, 2009). group processes of power, role ambiguity, and lack of direction can create complications.
when personal responsibility takes the form of accountability toward control from superiors or school inspection, spontaneous learning processes can be impeded (hargreaves et al., 2013). a sense of autonomy is created and sustained among members of the group; in this sense, autonomy does not mean acting alone as an isolated individual (hargreaves et al., 2013; imants et al., 2013). a flat organisational structure and a culture that fosters democracy and participation allow for easier contact between people and increase the chance that networked learning occurs. in open organisational environments where people can freely use their networks to connect to each other and learn, it is easier to find and contact the right person to learn from. hierarchy and a centralised culture can hinder possibilities to learn from more experienced people (pahor et al., 2008). trust is an important factor when it comes to developing networked learning communities (day & hadfield, 2004; trotman, 2009). penuel et al. (2009) describe how in a school there were more opportunities to learn from colleagues, because the principal and the teachers themselves encouraged sharing and communication. authority structures were more open, and teachers often used their networks to go outside the school for helpful resources. the school showed a pragmatic attitude towards teachers using these networks and resources, rather than one requiring formal approval from superiors. this led to a high level of trust in relationships and a sense of collective responsibility. more openness, generated by trust and social coherence, can lead to more success in implementing change and development (penuel et al., 2009). promoting open collaboration requires trust in order for members to open up, discuss differences, deal with uncertainty and respect individual differences (attard, 2012).
hökkä and eteläpelto (2013), studying autonomy and learning of teachers, note three aspects to consider in order to improve continuous professional learning facilitated by networks: teachers often do not identify with their role as active researchers and developers, barriers between groups can hinder collaboration between groups in different fields, and too strongly adhering to one's views can limit collaboration, cultural change and organisational learning. hanraets, hulsebosch and de laat (2011) note that networking skills need to be developed over time in order to make better use of the social environment. employing initiative, valuing others with whom you learn, sharing responsibility and building relations or actively looking for connections are not necessarily skills that people have by nature. new skills have to be developed, by getting used to the new networked way of thinking and working (day & hadfield, 2004).

in conclusion, an important aim of promoting networked learning is to provide individuals with more professional autonomy by creating an open environment in which people can connect to others to learn. we have seen that a number of mechanisms play a role here: freedom of choice, commitment, responsibility, accountability, power, control, trust, communicative openness and willingness to share and reflect are all factors that contribute to the professional autonomy of the individual, to a collaborative atmosphere in the organisation, and to the success of networked learning activities. we stipulate that aiming to integrate these informal tendencies with the necessary formal requirements (see table 2) will create a situation with most value for all involved.
table 2 formal and informal mechanisms regarding professional autonomy
mechanisms: 'informal' | 'formal'
personal choice | rules
commitment | accountability
personal interest/development | performance standards
personal reflection | directives
communicative openness | communicative barriers
trust | control

in what follows we outline how networks and networked learning are related to management, how networked learning is important, and what can be done to promote it. we identify formal and informal mechanisms that are of influence in the context of management and networked learning.

4.3 networked learning and management

schools can be seen as examples of 'open practices' (de laat, schreurs & nijland, 2014), connecting different parties and practices in an open and complex environment as they are directly related with governments, parents and families, companies and other collaborative institutions (darling-hammond & wei, 2009; villegas-reimers, 2003). the importance of networks for the organisation and the way they are embedded within organisational structures have been widely recognised (cf. carmichael et al., 2006). knowledge developed in learning networks forms a significant part of the 'social capital' of an organisation (van emmerik, jawahar, schreurs & cuyper, 2011), and learning networks build capacity for change (edwards, 2012). since networked processes comprise a large part of the learning in organisations, this raises the question of how to manage the relations and knowledge involved. by relinquishing some control, managers can provide a creative and productive network environment where organisation members take part out of their own interest, understanding the benefits of having a strong professional network (büchel & raub, 2002). leaders need to 'let it happen' while at the same time facilitating adequate room for emerging networks and embedding network activities in the organisation (kubiak & bertram, 2010).
leadership is not only embedded in formal positions, but emerges from interactions between people and activities that are performed (scribner, sawyer, watson & myers, 2007). in a more open and decentralised authority structure, leadership is less central but distributed over the members in the networks of the organisation (cf. frost, 2008). büchel and raub (2002) note the importance of multi-directionality: each member or unit can learn from all the others. responsibility for success lies with all the network members. learning networks can be designed for problem-solving and creating new knowledge, generated by input from all participants. although the motivation of the participants is crucial in attaining success, learning networks need to be supported by the management (büchel & raub, 2002; carmichael et al., 2006). promoting learning and change entails that both formal processes and informal processes are considered important and where possible brought into agreement. when the formal and the informal organisation of a school are in harmony, it increases the chance of successful collaboration (penuel et al., 2010). managing responsibilities and allocation of time and resources have been found to influence perceptions of the social space on the work floor. the “designed” and “lived” organisations are equally important and influence each other mutually (penuel et al., 2010). in addition to promoting an open culture of learning and exchange in general, organising network activities or setting up networked learning communities can be helpful to promote the exchange of knowledge (moses, skinner, hicks & o'sullivan, 2009) and to create a more distributed leadership where members of the organisation all can contribute their expertise (baumfield & butterworth, 2005). holmes (2004) describes a networked learning project where collective enquiry was the underlying mechanism that fuelled the activity in the learning networks.
in order to be successful, a learning network needs a common purpose which benefits individual needs, fruitful collaboration which promotes commitment, purposeful and relevant network activities, a good facilitator who has sound knowledge and expertise in the given area, and funding (varga-atkins et al., 2010). kubiak and bertram (2010) observe that fostering networked learning communities is most successful when participants have shared goals, such as clearly defined aims and activities, with a balance between short-term and long-term goals. in order to promote learning networks, it has been shown to be important to respect the natural bottom-up, self-governing culture of learning. since informal learning is often spontaneous and practice-driven, it is not an easy task to combine this with the need for control and performance 'above the waterline': management and employees have different roles and outlooks. as soon as the management gets involved too much, learning networks risk losing their sense of autonomy, the result of which can be loss of motivation (agterberg, 2012; kubiak & bertram, 2010). the task of a network facilitator involves working creatively with whatever emerges and taking up the role of, for example, an inspirer, guide, pr-manager or investigator. in order to work with bottom-up processes the facilitator has to develop a non-directive attitude, investigate profoundly the needs and expectations of the participants, and use this information to make suggestions for developing the network. also, coaching participants intensively in personal and communication skills and online literacy can be part of the procedures. furthermore, it can be necessary to promote networked learning as a recognised strategy for professional development in order for it to be understood and supported by supervisors and managers (hanraets et al., 2011). school principals are important agents when it comes to implementing learning networks.
they can act as gate-keepers, facilitators or as barriers (o'brien et al., 2008). the way networks are promoted and developed by leaders and co-leaders is highly influential (daly, moolenaar, bolivar & burke, 2009), while the way networks develop can vary from network to network (kubiak, 2009; kubiak & bertram, 2010). some may be more short-lived, others become more mature, and individuals and schools might opt in or out according to their individual needs. network leaders who are aware of these particularities and develop appropriate strategies can prove vital for the healthy development of learning networks (fox, haddock & smith, 2007; kubiak & bertram, 2010; schechter, 2012; varga-atkins et al., 2010). hökkä and eteläpelto (2013) conclude that because the management is crucial in creating openness and the possibility for change, leaders and managers themselves need to reflect on their own identity, since they are the ones who implement strategic decisions and then deal with the emotions of the personnel.

in conclusion, regarding networked learning and organisational leadership, we found a number of mechanisms at play. managerial acknowledgement of informal networks, promoting networked learning, organisational structure, a distributed leadership, open communication patterns, and an organisational culture in favour of collaboration and exchange, not only between direct colleagues but also between different organisational layers, all contribute to an environment that promotes a healthy learning culture that is conducive to both formal learning procedures and informal networked learning (see table 3).
table 3 formal and informal mechanisms regarding management
mechanisms: 'informal' | 'formal'
recognition of informal networks | recognition of formal authority structures
shared leadership | centralised leadership
bottom-up decision-making | top-down decision-making
open organisational structure | rigid organisational structure
open communication | closed communication

learning and working together in an inspiring environment is more likely to succeed when the work floor and the management understand each other and respect each other's decisions. networked learning facilitates understanding and collaboration in respect to the content of work practices, and also contributes to the formal and informal organisational context.

5. conclusion and discussion

in this study we examined underpinning mechanisms regarding networked learning and professional development, autonomy, and management. we used the perspective of networked learning in order to develop a better conceptual understanding and to examine how this facilitates a better alignment of informal and formal learning in organisations. our key questions were: what are formal and informal mechanisms underlying networked professional learning related to professional development, autonomy and management? how can networked learning be positioned in the most optimal way?

5.1 formal and informal mechanisms underlying networked professional learning

concerning our first question: we analysed the formal and informal mechanisms that we found in each of the sections of the results (see appendix 1) and found three main groups of mechanisms at play:

learning mechanisms: what we have seen in the literature indicates that networked learning is a natural activity through which professionals develop their expertise, in addition to participating in formal learning procedures. this form of professional development is a continuous process.
networked learning is often directly related to work practices and promoting it has proven to be effective to enhance the learning process.

mechanisms regarding autonomy can be considered to be motivational: networked learning provides individuals with the opportunity to connect to others with the same interests, in this way opening up the learning environment to learn what one deems necessary. personal learning and learning initiatives can be promoted through networked learning. issues of trust, freedom of choice, and willingness to share and connect are intrinsic motivational factors that play a role here. this can be contrasted with pressure to perform and obligations to follow rules and strict regulations, which, however necessary, create an external motivational force (cf. ryan & deci, 2000).

organisational mechanisms: if management acknowledges the value of informal networks, professionals can be encouraged to make use of their informal networks in order for the organisation to adapt to the always changing environment. through networks, organisational structures become more flexible, and open communication can be promoted. in an expansive rather than a restrictive organisational environment, leadership can be seen as a process where responsibilities are distributed and 'bottom-up' initiatives are encouraged. the management has an important role in creating a conducive and collaborative learning environment by providing opportunities for networked learning activities and structuring the formal organisation accordingly.

these three groups of mechanisms can be brought together in the following framework against the background of our 'iceberg' (figure 1).

figure 1. three groups of formal and informal mechanisms related to networked learning in school organisations

5.2 how can networked learning be positioned in the most optimal way?

our second key question was: how can networked learning be positioned in the most optimal way?
as we have argued in the introduction, formal and informal learning procedures in teacher professional development are often not integrated in a satisfactory way. the core mechanisms depicted in the formal-informal framework illustrate how networked learning can be positioned so that formal learning procedures can be augmented, complemented and informed by informal networked learning. already existing informal networks can be made visible and then strengthened by giving them a place in the organisation. for this to happen it is helpful for the networks to develop a learning agenda that is visible to the management (de laat, 2012), and to have support from the management (büchel & raub, 2002). for networks to be effective and their members motivated, autonomy, trust and efficacy are important factors (cf. van den beemt, ketelaar & diepstraten, 2014). networking skills need to be developed by both the participants in learning networks and by the management of school organisations in order for networked learning to be most effective. formal regulations and standards are a professional reality, but school leaders, in addition to judging teachers' performance through accountability practices, can strive to create an open organisational culture where responsibilities are shared, encourage participation, and promote looking for new ideas outside of the direct working environment in order to create an environment where formal study and informal learning can both have their place. recognising both parts of the 'iceberg' by understanding the mechanisms at play is helpful in order to understand how to balance and integrate both positions so that professionalism can prosper.

[figure 1 labels: formal / informal; learning: planned / spontaneous; autonomy: extrinsic motivation / intrinsic motivation; organisation: restrictive / permissive]

5.3 discussion

research in this area raises questions about how, what, why, and when teachers learn.
currently we do not know much about the way the different mechanisms that we found in this study influence each other, which, in our view, merits further investigation. developing a 'social awareness' of learning processes (boud & hager, 2012) can help to develop new metaphors for professional development (cf. de laat, schreurs & nijland, 2013) and open up new avenues of practice and research. findings from this study can be used to advance the theoretical understanding about the alignment of informal and formal professional development (cf. evers et al., 2011; mcguire & gubbins, 2010), to develop an instrument to engage school leaders and teachers in a constructive dialogue, and to collect further data. our study has its limitations. by focusing on the interplay of formal and informal processes, we have provided a far from exhaustive overview of the findings in each of the chosen fields related to the subject. however, by combining insights from different areas of research to arrive at a shared framework, our study has scientific relevance, and our findings can be further conceptualised and validated. we would like to add to this the observation that there might not be one specific 'optimal situation' for (networked) professional development to be effective; different people have different needs and views. organisations can be seen as a 'complex responsive process' with many unexpected complexities and local realities, and only one-third of change efforts to improve quality in organisations are considered successful (pieterse, caniëls & homan, 2012). we believe that this is where making use of networks can be helpful: to provide open space for communication and learning, where individual differences can exist and prosper. openness, exchange, trust, and communication are relevant to both school leaders and teachers.
promoting openness and development in the light of performance pressure, market-oriented reforms, and centrally imposed standards is no easy task. however, to be in control can sometimes mean, within limits, letting go of control. networks flourish through a healthy balance between formalities and informalities. striking this balance can be achieved by aiming both at facts and figures and at shared values and meaning.

key points

- networks and networked learning are increasingly important for the work of teachers because of the increased complexity of the work.
- professional development entails formal and informal processes. informal processes, which take up a large proportion of the learning process, are often overlooked and consequently do not receive much attention.
- networked learning can be helpful to integrate informal processes in the formal school context and align formal and informal learning procedures.
- creating a balance between personal interest and performance requirements can provide for a healthy level of professional autonomy and increase motivation for working and learning.
- adopting a perspective of networked learning can have implications for management and leadership. leadership and responsibilities can be shared in order to create a more 'open' organisation.
- striking a healthy balance between attention for formal and informal processes means paying attention to both facts and figures and shared values and meaning.

acknowledgements

this paper has been supported by skoem, the netherlands.

references

agterberg, l. c. m. (2012). walking a tightrope: the dynamics of coordinating intra-organizational networks of practice (doctoral dissertation). vu, amsterdam.
akkerman, s., petter, c., & laat, m. de. (2008). organising communities-of-practice: facilitating emergence. journal of workplace learning, 20(6), 383–399. doi: 10.1108/13665620810892067
angelides, p., georgiou, r., & kyriakou, k. (2008).
the implementation of a collaborative action research programme for developing inclusive practices: social learning in small internal networks. educational action research, 16(4), 557–568. doi: 10.1080/09650790802445742
appleby, y., & hillier, y. (2012). exploring practice–research networks for critical professional learning. studies in continuing education, 34(1), 31-43. doi: 10.1080/0158037x.2011.613374
attard, k. (2012). public reflection within learning communities: an incessant type of professional development. european journal of teacher education, 35(2), 199-211. doi: 10.1080/02619768.2011.643397
baker-doyle, k. j., & yoon, s. a. (2011). in search of practitioner-based social capital: a social network analysis tool for understanding and facilitating teacher collaboration in a us-based stem professional development program. professional development in education, 37(1), 75–93. doi: 10.1080/19415257.2010.494450
bartlett, s., & burton, d. (2006). practitioner research or descriptions of classroom practice? a discussion of teachers investigating their classrooms. educational action research, 14(3), 395–405. doi: 10.1080/09650790600847735
bass, b. m. (1991). from transactional to transformational leadership: learning to share the vision. organizational dynamics, 18(3), 19-31. doi: 10.1016/0090-2616(90)90061-s
baumfield, v., & butterworth, m. (2005). developing and sustaining professional dialogue about teaching and learning in schools. journal of in-service education, 31(2), 297-312. doi: 10.1080/13674580500200280
beckett, l. (2012). “trust the teachers, mother!”: the leading learning project in leeds. improving schools, 15(1), 10–22. doi: 10.1177/1365480211433723
berry, j. (2012). an investigation into teachers’ professional autonomy in england: implications for policy and practice. submitted in partial fulfilment of the requirements of the degree of phd. university of hertfordshire.
bessant, j., alexander, a., tsekouras, g., rush, h., & lamming, r. (2012).
developing innovation capability through learning networks. journal of economic geography, 12(5), 1087-1112. doi: 10.1093/jeg/lbs026
billett, s. (2004). workplace participatory practices: conceptualising workplaces as learning environments. journal of workplace learning, 16(6), 312-324. doi: 10.1108/13665620410550295
binz-scharf, m. c., lazer, d., & mergel, i. (2011). searching for answers: networks of practice among public administrators. the american review of public administration, 42(2), 202–225. doi: 10.1177/0275074011398956
boud, d., & hager, p. (2012). re-thinking continuing professional development through changing metaphors and location in professional practices. studies in continuing education, 34(1), 17-30. doi: 10.1080/0158037x.2011.608656
brown, j., collins, a., & duguid, p. (1989). situated cognition and the culture of learning. educational researcher, 18(1), 32-42. doi: 10.3102/0013189x018001032
büchel, b., & raub, s. (2002). building knowledge-creating value networks. european management journal, 20(6), 587-596. doi: 10.1016/s0263-2373(02)00110-x
carmichael, p., fox, a., mccormick, r., procter, r., & honour, l. (2006). teachers’ networks in and out of school. research papers in education, 21(2), 217–234. doi: 10.1080/02671520600615729
clus, le, m. (2011). informal learning in the workplace: a review of the literature. australian journal of adult learning, 51(2), 355-373.
cross, j. (2007). informal learning, rediscovering the natural pathways that inspire innovation and performance. san francisco: pfeiffer.
daly, a. j., moolenaar, n. m., bolivar, j. m., & burke, p. (2010). relationships in reform: the role of teachers’ social networks. journal of educational administration, 48(3), 359–391. doi: 10.1108/09578231011041062
day, c., & hadfield, m. (2004). learning through networks: trust, partnerships and the power of action research. educational action research, 12(4), 575–586.
doi: 10.1080/09650790400200269
darling-hammond, l., wei, r. c., andree, a., richardson, n., & orphanos, s. (2009). professional learning in the learning profession. washington, dc: national staff development council.
de caluwe, l., & vermaak, h. (2003). learning to change: a guide for organization change agents. thousand oaks: sage.
de laat, m. (2012). enabling professional development networks: how connected are you? inaugural address. heerlen, open universiteit.
de laat, m.f., schreurs, b., & nijland, f. (2014). communities of practice and value creation in networks. in r.f. poell, t. rocco, & g. roth (eds.), the routledge companion to human resource development. new york: routledge.
earl, l., & katz, s. (2007). leadership in networked learning communities: defining the terrain. school leadership & management, 27(3), 239–258. doi: 10.1080/13632430701379503
edwards, f. (2012). learning communities for curriculum change: key factors in an educational change process in new zealand. professional development in education, 38(1), 25-47. doi: 10.1080/19415257.2011.592077
emmerik, h. van, jawahar, i. m., schreurs, b., & cuyper, n. de. (2011). social capital, team efficacy and team potency: the mediating role of team learning behaviors. career development international, 16(1), 82–99. doi: 10.1108/13620431111107829
eraut, m. (2004). informal learning in the workplace. studies in continuing education, 26(2), 247-273. doi: 10.1080/158037042000225245
evers, a. t., kreijns, k., van der heijden, b. i., & gerrichhauzen, j. t. (2011). an organizational and task perspective model aimed at enhancing teachers’ professional development and occupational expertise. human resource development review, 10(2), 151-179. doi: 10.1177/1534484310397852
forrester, g. (2000). professional autonomy versus managerial control: the experience of teachers in an english primary school. international studies in sociology of education, 10(2), 133–151.
doi: 10.1080/09620210000200056 fox, a., haddock, j., & smith, t. (2007). a network biography: reflecting on a journey from birth to maturity of a networked learning community. curriculum journal, 18(3), 287–306. doi: 10.1080/09585170701589918 fuller, a., & unwin, l. (2003). learning as apprentices in the contemporary uk workplace: creating and managing expansive and restrictive participation. journal of education and work, 16(4), 407-426. doi: 10.1080/1363908032000093012 granovetter, m. (1973). the strength of weak ties. american journal of sociology, 78, 1360–1380. groot, n., & homan, t. h. (2012). strategising as a complex responsive leadership process. international journal of learning and change, 6(3/4), 156. doi: 10.1504/ijlc.2012.050858 hargreaves, e., berry, r., lai, y. c., leung, p., scott, d., & stobart, g. (2013). teachers’ experiences of autonomy in continuing professional development: teacher learning communities in london and hong kong. teacher development 17(1), 19-34 doi: 10.1080/13664530.2012.748686 hargreaves, a., & fullan, m. (2012). professional capital: transforming teaching in every school. new york: teachers college press. hodgson, v., de laat, m., mcconnell, d., ryberg, th. (eds.) (2014) the design, experience and practice of networked learning. new york: springer hodkinson, h., and hodkinson, p. (2005). improving schoolteachers' workplace learning. research papers in education, 20(2), 109-131. doi: 10.1080/02671520500077921 vaessen et al. 70 | f l r holmes, d. (2004). nuts, bolts, levers and cranks: designing enquiry-based learning in hartlepool. improving schools, 7(2),121–128.doi: 10.1177/1365480204047344 hökkä, p., & eteläpelto, a. (2013). seeking new perspectives on the development of teacher education: a study of the finnish context. journal of teacher education, 65(1), 39–52. doi: 10.1177/0022487113504220 hulsbos, f., anderson, i., kessels, j., & wassink, h. (2012). 
professionele ruimte en gespreid leiderschap [professional autonomy and distributed leadership]. look report 37, open university, the netherlands. imants, j., wubbels, t., & vermunt, j. d. (2013). teachers’ enactments of workplace conditions and their beliefs and attitudes toward reform. vocations and learning, 6(3), 323–346. doi: 10.1007/s12186013-9098-0 jones, j. (2006) leadership in small schools: supporting the power of collaboration. management in education 20(2) 24-28. doi: 10.1177/089202060602000207 kubiak, c. (2009). working the interface: brokerage and learning networks. educational management administration & leadership, 37(2), 239–256. doi: 10.1177/1741143208100300 kubiak, c., & bertram, j. (2010). facilitating the development of school-based learning networks. journal of educational administration, 48(1), 31–47. doi: 10.1108/09578231011015403 lieberman, a. (2000). networks as learning communities shaping the future of teacher development. journal of teacher education, 51(3), 221-227. doi: 10.1177/0022487100051003010 mccormick, r. (2010). the state of the nation in cpd: a literature review. curriculum journal, 21(4), 395– 412. doi: 10.1080/09585176.2010.529643 mcguire, d., & gubbins, c. (2010). the slow death of formal learning: a polemic. human resource development review, 9(3), 249-265. doi: 10.1177/1534484310371444 moses, a. s., skinner, d. h., hicks, e., & o‟sullivan, p. s. (2009). developing an educator network: the effect of a teaching scholars program in the health professions on networking and productivity. teaching and learning in medicine, 21(3), 175–9. doi: 10.1080/10401330903014095 o‟brien, m., varga-atkins, t., burton, d., campbell, a., & qualter, a. (2008). how are the perceptions of learning networks shaped among school professionals and headteachers at an early stage in their introduction? international review of education, 54(2), 211–242. doi: 10.1007/s11159-0089084-1 pahor, m., škerlavaj, m., & dimovski, v. (2008). 
evidence for the network perspective on organizational learning, journal of the american society for information science and technology, 59(12), 1985– 1994. doi: 10.1002/asi penuel, w. r., riel, m., joshi, a., pearlman, l., kim, c. m., & frank, k. a. (2010). the alignment of the informal and formal organizational supports for reform: implications for improving teaching in schools. educational administration quarterly, 46(1), 57–95. doi: 10.1177/1094670509353180 penuel, w., riel, m., krause, a., & frank, k. (2009). analyzing teachers' professional interactions in a school as social capital: a social network approach. the teachers college record, 111(1), 124-163. pieterse, j. h., caniëls, m. c., & homan, t. (2012). professional discourses and resistance to change. journal of organizational change management, 25(6), 798-818. doi: 10.1108/09534811211280573 poell, r. f. & van der krogt f.j. (2013). the role of human resource development in organizational change : professional development strategies of employees, managers and hrd practitioners, 1– 20. chapter to be published in international handbook of research in professional and practicebased learning. s. billett, c. gruber (eds.). dordrecht: springer (in press) price, d. (2013). open: how we’ll work, live and learn in the future. kindle edition. pugh, k. and prusak, l. (2013). designing effective knowledge networks. retrieved october, 24, 2013, from http://sloanreview.mit.edu/article/designing-effective-knowledge-networks/. ryan, j., kang, c., mitchell, i., & erickson, g. (2009). china's basic education reform: an account of an international collaborative research and development project. asia pacific journal of education, 29(4), 427-441. doi: 10.1080/02188790903308902 vaessen et al. 71 | f l r ryan, r. m., & deci, e. l. (2000). self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. american psychologist, 55(1), 68. doi: 10.1037/0003066x.55.1.68 scribner, j. 
p., sawyer, r. k., watson, s. t., & myers, v. l. (2007). teacher teams and distributed leadership: a study of group discourse and collaboration. educational administration quarterly, 43(1), 67–100. doi: 10.1177/0013161x06293631 schultz, k. (2011). beginning with the particular: reimagining professional development as a feminist practice. the new educator, 7(3), 287–302. doi: 10.1080/1547688x.2011.593997 seezink, a., poell, r., & kirschner, p. (2010). soap in practice: learning outcomes of a cross‐ institutional innovation project conducted by teachers, student teachers, and teacher educators. european journal of teacher education, 33(3), 229–243. doi: 10.1080/02619768.2010.490911 schechter, c. (2012). the professional learning community as perceived by israeli school superintendents, principals and teachers. international review of education, 58(6), 717-734. doi: 10.1007/s11159012-9327-z siemens g., (2004) connectivism: a learning theory for the digital age. retrieved from http://www.elearnspace.org/articles/connectivism.htm snowden, d. (2005). from atomism to networks in social systems. the learning organization, 12(6), 552– 562. doi:10.1108/09696470510626757 spillane, j. p. (2005, june). distributed leadership. in the educational forum (vol. 69, no. 2, pp. 143-150). taylor & francis group. doi: 10.1080/00131720508984678 supovitz, j., sirinides, p., & may, h. (2009). how principals and peers influence teaching and learning. educational administration quarterly, 46(1), 31–56. doi: 10.1177/1094670509353043 trotman, d. (2009). networking for educational change: concepts, impediments and opportunities for primary school professional learning communities. professional development in education, 35(3), 341–356. doi: 10.1080/13674580802596626 tynjälä, p. (2013). toward a 3-p model of workplace learning: a literature review. vocations and learning, 6(1), 11-36. doi: 10.1007/s12186-012-9091-z varga-atkins, t., o‟brien, m., burton, d., campbell, a., & qualter, a. (2009). 
the importance of interplay between school-based and networked professional development: school professionals’ experiences of inter-school collaborations in learning networks. journal of educational change, 11(3), 241–272. doi: 10.1007/s10833-009-9127-9 van den beemt, a., ketelaar, e., & diepstraten, i. (2014). reciprocity in knowledge networks. paper presented at the 2014 networked learning conference, edinburgh, uk. villegas-reimers, e. (2003). teacher professional development: an international review of the literature. paris: international institute for educational planning. wenger, e., mcdermott, r. and snyder, w. m. (2002). cultivating communities of practice. boston, mass: harvard business school press. http://psycnet.apa.org/doi/10.1037/0003-066x.55.1.68 http://psycnet.apa.org/doi/10.1037/0003-066x.55.1.68 http://dx.doi.org/10.1007%2fs12186-012-9091-z codepen hietajarvi et al publication frontline learning research vol.8 no. 1 (2020) 33 55 issn 2295-3159 are schools alienating digitally engaged students? longitudinal relations between digital engagement and school engagement lauri hietajärvia, kirsti lonkaa, kai hakkarainena, kimmo alhob & katariina salmela-aroa a faculty of educational sciences, university of helsinki b department of psychology and logopedics, faculty of medicine, university of helsinki article received 29 november 2018 / revised 15 december 2019/ accepted 2 february 2020/ available online 20 february abstract this article examined digital learning engagement as the out-of-school learning component that reflects informally emerging socio-digital participation. the gap hypothesis proposes that students who prefer learning with digital technologies outside of school are less engaged in traditional school. this hypothesis was approached from the framework of connected learning, referring to the process of connecting self-regulated and interest-driven learning across formal and informal contexts. 
we tested this hypothesis with longitudinal data. it was of interest how digital engagement, operationalized as a general digital learning preference, a wish for digital schoolwork, and their interaction, is related to traditional school engagement. this was examined both cross-sectionally at three time points and longitudinally across three years. the participants were 1,705 (43.7% female) 7th–9th graders (13-15 years old) from 27 schools in helsinki, finland. we explored the structure of correlations between latent constructs at each time point separately, and finally, to evaluate longitudinal relations between digital engagement and school engagement, we specified latent cross-lagged panel models. the results indicate that students holding a stronger general digital learning preference experienced higher schoolwork engagement, both contemporaneously and over time, indicating successful connected learning. however, the results also showed support for the gap hypothesis: students who preferred digital learning but did not have the chance to digitally engage at school experienced a decrease in school engagement over time. the article shows that there is a need to examine the reciprocal interactive processes between the learners and their social ecologies inside and outside school more closely.

keywords: connected learning; digital engagement; schoolwork engagement; gap hypothesis; longitudinal analysis

corresponding author: lauri.hietajarvi@helsinki.fi doi: 10.14786/flr.v8i1.437

1. introduction

connecting learning across in-school and out-of-school contexts has been a continuous challenge in education (e.g., malcolm, hodkinson, & colley, 2003) and the novel informal learning opportunities provided by digital media have highlighted tensions between informal and formal practices of learning (ito et al., 2013).
in general, previous research shows that the more time students spend engaging with digital media, the more skills they are able to acquire (eu kids online, 2014). when these informally cultivated digital practices are successfully connected with academic learning practices, such students are likely to flourish and extend their potential (ito et al., 2013). yet, finnish young people are not provided sufficient structured support in school for cultivating advanced digital competences, as digital technologies are seldom used in finnish schools and mostly for shallow training of basic digital skills (european parliament, 2015; european commission, 2017; hakkarainen, hietajärvi, alho, lonka, & salmela-aro, 2015). under these conditions, an increased gap or misfit between a digitally engaged learner and the learning environment may occur. the gap hypothesis proposes that students who prefer learning with digital technologies outside of school are less engaged in traditional school. this is problematic because school engagement is crucial for students’ learning, academic development, and well-being (salmela-aro & upadyaya, 2012; upadyaya & salmela-aro, 2013). thus, promoting practices that connect informal and formal learning as well as support school engagement should be the main goals of modern pedagogical practices. however, in comparison to other pisa countries, technology-enhanced pedagogies appear to be less widely adopted in finnish schools (oecd, 2015), and utilizing digital technologies successfully in education calls for transformations in the social practices of schooling, which appear to be happening very slowly (hakkarainen, 2009). such transformations are nevertheless needed, as the schooling system should prepare students for the current technology-rich innovation-driven society that calls for collaborative solving of complex non-routine problems and cultivating associated personal and social competences.
toward that end, it is important to learn creative and academic practices of using socio-digital technologies (hakkarainen et al., 2015). the conventional individualist, acquisition-oriented, and teacher-centered educational practices prevailing at school are considered to be a major hindrance to creating such a workforce (robinson, 2011). the present study addresses the conditions of continuity and discontinuity between these informal and formal contexts of learning, how these can be seen as indicators of either connected learning or the ‘gap’, and how such interconnections are reflected in learners’ school engagement. the assumption of a gap between adolescents’ digital and academic engagement is not new; it originates from prensky’s (2001) introduction of the controversial concept of digital natives (see also bennett & maton, 2010). the argument is that due to extended socialization in using socio-digital technologies, adolescents are often very comfortable with various socio-digital tools and applications and are able to fluently learn novel applications (hakkarainen et al., 2015). it is developmentally significant that young generations have been cognitively socialized into a radically different social and technological environment than the older generations (wexler, 2006). the earlier and the more intensively young people adapt to the transforming cognitive, social, and cultural environment, the stronger the impact of this environment on their intellectual, emotional, behavioural, and social engagement is likely to be (moisala et al, 2016a; 2016b; ritella & hakkarainen, 2012). the gap hypothesis stems from the idea that in schools, digital immigrants, who do not share the same ‘language’, are teaching digital natives (hakkarainen et al, 2015; prensky, 2001).
although prensky’s digital natives are the teachers of today, it is suggested that the gap between some students’ progressive use of digital media outside of the classroom and the traditional pedagogies of most schools is still growing (ito et al., 2013). however, the gap can emerge for various reasons and can reflect various psychological processes. for instance, the students’ out-of-school interests and competencies may not be socially recognized, leading to experiences of withdrawal and disengagement (rajala, kumpulainen, hilppö, paananen, & lipponen, 2015). it is also possible that the students’ out-of-school practices of working with learning and knowledge are critically different from the traditional practices of school (kumpulainen & sefton-green, 2012; mcfarlane, 2015). such a situation may cause discontinuities, for instance, between individual versus social learning, externally regulated teaching versus self-initiated inquiry learning, and working with pre-digested textbook content versus navigating through open knowledge and media spaces. despite the controversy over the original concepts of digital natives and digital immigrants, it seems that there indeed are gaps in connecting (digital) learning across in-school and out-of-school contexts.

2. digital engagement and school engagement

the early empirical findings tapping into the concept of digital natives revealed that students do not share the same experiences and competencies with digital media, ranging from students who engage in a wide range of digital activities to those who do not participate in such activities at all, with a spread of moderate participators in between (bennett & maton, 2010). a year-long ethnographic investigation by ito and colleagues (2010; see also barron, 2006) revealed diverging but partially overlapping genres of socio-digital participation. most adolescents use digital technologies for shallow, friendship-oriented hanging out with an extended network of friends.
a much smaller proportion of young people use the emerging socio-digital technologies for pursuing their interests by messing around with like-minded peers in social and digital networks. a significant but relatively small group of young people are geeking out by developing their technological and creative socio-digital competences (li, hietajärvi, palonen, salmela-aro, & hakkarainen, 2017). presumably, interest-driven socio-digital participation fosters learning and development of young people and may assist in cultivating considerable student expertise (olson & bruner, 1996) concerning digital learning and activity. by relying on bereiter and scardamalia’s (1993) notion of progressive problem solving and hatano and inagaki’s (1992) adaptive expertise, hakkarainen and colleagues (2000) constructed measures for assessing to what extent young people have developed such crucial aspects of student expertise as putting effort into using digital technologies at the edge of their competences and enjoying working with challenging problems with digital technologies. hence, a significant proportion of young people pursue their interests online, are supported by their more competent peers and have learned considerable digital competences through intensive socio-digital participation. it is also typical for young people to learn through active, although not always very deep, personal and social exploration rather than by passively consuming pre-determined information (hietajärvi, seppä, & hakkarainen, 2016; li et al, 2017). from the viewpoint of the gap hypothesis, it is claimed that active socio-digital participators, and especially those who have developed sophisticated peer learning and digital competences in informal contexts, may not get sufficient social recognition of their capabilities and, therefore, might feel alienated and experience a mismatch between their personal and school practices, indicating inadequate person-environment fit.
consequently, this has been suggested to be among the factors contributing to lower school engagement of those actively digitally engaging students, pointing to the gap between adolescents’ digital and school-related engagement (see, e.g., prensky 2001; halonen, hietajärvi, lonka, & salmela-aro, 2017; kumpulainen & sefton-green, 2012; salmela-aro, muotka, alho, hakkarainen, & lonka, 2016a; selwyn, 2006). empirically, however, there appear to be both continuities and discontinuities between young people’s digital learning engagement and school engagement. case studies have described students' informal interest-driven learning practices that can both facilitate and obstruct academic engagement (deng, connelly & lau, 2016; gurung & rutledge, 2014), and have also suggested that integrating practices of informal digital learning engagement in schoolwork can enhance student engagement (clements, 2015; esteves, 2012). larger scale quantitative studies, although scarcer, point in a similar direction: digital participation is related to both self-directed learning and student engagement (laird & kuh, 2005; rashid & asghar, 2016). previous studies supporting the gap hypothesis, in turn, have suggested that students reporting more cynicism towards school also reported that they would be more engaged in their schoolwork if they were able to use more digital technologies (halonen et al., 2017; salmela-aro et al., 2016a). hakkarainen and colleagues (2000) already reported a similar finding among a large sample of finnish primary and secondary students. yet, other studies offer both positive and negative relations between out-of-school digital learning engagement and student engagement depending on the actual activities (hietajärvi, salmela-aro, tuominen, hakkarainen, & lonka, 2019; junco, 2012a, 2012b).
in terms of adopting digital pedagogies in school, previous studies indicate that integrating digital technologies and media in education in general appears to offer mainly positive results regarding engagement and performance (chen, lambert, & guidry, 2010; junco, 2011; sung, chan, & liu, 2016; tamim, bernard, borokhovski, abrami, & schmid, 2011).

3. the conceptual framework

in the present study, the gap hypothesis is approached from the framework of connected learning (ito et al, 2013; kumpulainen & sefton-green, 2012). connected learning refers to the process of connecting adolescents’ self-regulated and interest-driven learning (barron, 2006) across formal and informal contexts, in the reciprocal interactive processes between the learners and their social ecologies (nardi & o’day, 2000). connected learning is anchored to interest rather than mere friendship-driven socio-digital participation (ito et al., 2010). connected learning emerges when young people find contexts for pursuing their interests, network with like-minded peers, and when academic institutions recognize the value of informally developed knowledge and competences and allow interest-driven learning to be relevant in school (ito et al., 2013). further, it involves extensive peer-to-peer learning and supports learning processes relevant for academic achievements, civil activity, and perhaps also for professional careers. connected learning takes into account a wide range of learning contexts, whereas in the present study we focused on digital learning engagement as the out-of-school learning component.
more precisely, we conceptualize digital learning engagement as reflecting informally emerging socio-digital participation (hakkarainen, hietajärvi, alho, lonka, & salmela-aro, 2015) including a considerable degree of self-regulated learning (boekaerts & minnaert, 1999; see also panadero & järvelä, 2015) embedded in the contexts of their peer-supported, interest-driven learning ecologies (barron, 2006). it needs to be noted that although self-regulated learning is an internal process, it is embedded in a digital and social context and environment, and it is assisted and influenced by social interaction (panadero & järvelä, 2015). as such, we consider digital learning engagement as situated within the ecologies of connected learning. specifically, we operationalized digital learning engagement with two constructs: a digital learning preference, that is, a preference for cultivation of adaptive student expertise concerning digital learning and problem-solving (hakkarainen et al, 2000), and different degrees of wish for digital schoolwork, that is, a wish for connecting this digital learning to the context of school. in addition, in this study we extend the gap hypothesis to contribute to the wider research on school engagement (fredricks, blumenfeld & paris, 2004), operationalized as schoolwork engagement (upadyaya & salmela-aro, 2013), a generally positive disposition towards school and schoolwork that is considered a key outcome and indicator of connected learning (ito et al., 2013). specifically, we utilized the framework of connected learning to provide novel empirical information on the antecedents of school engagement, combining it with the framework of the demands-resources model extended to academic well-being (salmela-aro & upadyaya, 2014).
in doing so, the present study attempts to provide insights into the psychological processes underlying successful experiences of connected learning, or, conversely, an experience of disengagement. according to this model, possible relations between digital learning engagement and school engagement can be viewed as resulting from the balance between the psychological demands of the situation and the resources available to overcome these demands, conceptualized over two processes, the energy-depleting process and the motivational process (demerouti, bakker, nachreiner & schaufeli, 2001; salmela-aro & upadyaya, 2014). digital learning engagement can function through both of these processes by increasing the demands and depleting energy (e.g., multitasking, interruptions, cognitive load) or by providing extended resources cultivating engagement (e.g., knowledge building and utilization, peer support) (barron, 2006; ito et al., 2013; salmela-aro & upadyaya, 2014). congruence between digital learning engagement and school engagement can be conceptualized as a condition of successful connected learning, in which the resources gained in out-of-school digital activities are successfully connected to in-school learning. in turn, the gap can be used to conceptualize a condition of discontinuity where these interest-driven digital learning practices collide with, for instance, strong external regulation and teacher-centered practices, and informally developed competencies and practices of learning are not utilized in school. similar discontinuity may follow if students, in their informal practices, have learned to rely on intensive peer-to-peer learning but are expected to work mostly alone at school. these discontinuities and contradictions in the possibilities to utilize the resources gained in out-of-school learning may consequently decrease students’ engagement with schoolwork.

4. research aim and hypotheses

previous research indicates that active digital learning engagement is related to both positive and negative school-related outcomes, but there is little knowledge concerning the conditions under which these positive or negative outcomes come to be. moreover, most previous studies were conducted in higher education contexts, and we are not aware of any longitudinal designs. therefore, the present study was conducted in upper comprehensive school and with a longitudinal design. more precisely, the present study empirically focuses on the question of how digital learning engagement is related to school engagement both cross-sectionally and over time, while taking into account how wishing to use more, or less, digital technologies in schoolwork moderates the longitudinal relationship. the present research aimed to examine processes of continuity and discontinuity between out-of-school digital learning engagement and school engagement, utilizing the broader framework of connected learning combined with the demands-resources model (salmela-aro & upadyaya, 2014) as tools to interpret the processes underlying the gap. towards that end, by combining the framework of connected learning with the demands-resources model (salmela-aro & upadyaya, 2014), we expected that (hypothesis 1) having a digital learning preference would be reflected as also having a positive disposition towards school, indicated by a positive relation to schoolwork engagement (both cross-sectionally and longitudinally). the positive relation was expected due to the supposedly increased psychological resources (salmela-aro & upadyaya, 2014) resulting from the informal self-regulated and connected learning happening in the process (barron, 2006; see also hietajärvi et al., 2019).
in contrast, we expected that (hypothesis 2) reporting a higher wish for digital schoolwork, that is, experiencing a discontinuity between the practices of one’s informal interest-driven learning and academic learning, would be negatively related to schoolwork engagement both cross-sectionally and longitudinally (kumpulainen & sefton-green, 2012). in particular, we hypothesized that (hypothesis 3) the negative relation between a wish for digital schoolwork and school engagement would be explained especially by the interaction between digital learning preference and the wish for digital schoolwork, resulting in a condition of discontinuity. in other words, we expected that the combination of having a digital learning preference without the possibility to connect it to academic learning would lead to lower school engagement.

5. method

5.1 participants

there was a total of 1,705 (43.7% male) participants from 27 schools in helsinki. the data were collected annually in spring following the same students across grades 7 (age ~14 yrs., n = 1272), 8 (age ~15 yrs., n = 1150) and 9 (age ~16 yrs., n = 903) over the upper comprehensive school. of all participants, 1,090 (64%) participated in the study at two or more time points over the data collection period and 530 (31%) participated in all the waves. the participants filled in a self-report questionnaire on digital engagement and academic well-being. participation in the study was voluntary and informed consent forms were collected from the students and from their parents. data collection was organized as a convenience sample, that is, all teachers in the schools that were able to organize data collection administered the questionnaires during school hours and all students that were attending at the time of data collection and were willing to take the questionnaire were included as participants.
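the participation figures above can be cross-checked with a little arithmetic. the sketch below is illustrative only: it assumes that each of the 1,705 participants responded in at least one wave, and the variable names are hypothetical. under that assumption, the numbers of students who took part in exactly one, two, or three waves follow from the reported totals, and the implied number of returned questionnaires should equal the sum of the three wave sizes.

```python
# Reported figures from the participants section.
TOTAL = 1705
WAVE_N = {"grade 7": 1272, "grade 8": 1150, "grade 9": 903}
AT_LEAST_TWO = 1090
ALL_THREE = 530


def percent(count, total=TOTAL):
    """Share of the full sample, rounded to a whole percent."""
    return round(100 * count / total)


# Assumption (not stated in the text): every participant responded
# in at least one of the three waves.
exactly_one = TOTAL - AT_LEAST_TWO
exactly_two = AT_LEAST_TWO - ALL_THREE

# Each student contributes one questionnaire per wave attended.
implied_responses = exactly_one + 2 * exactly_two + 3 * ALL_THREE

print(percent(AT_LEAST_TWO), percent(ALL_THREE))   # 64 31
print(implied_responses == sum(WAVE_N.values()))   # True
```

under this assumption the reported percentages and the three wave sizes are mutually consistent.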
because of the data collection procedure, attrition may be due to the schools’ or the teachers’ inability to incorporate the data collection into their timeframe, or to the students being absent during the data collection or unwilling to respond. despite this, the number of students that participated in at least two waves was satisfactory. the study protocol was approved by the university of helsinki ethical review board in the humanities and social and behavioural sciences.

5.2 measures

the means, standard deviations and measures of internal consistency for all constructs used in this study are shown in table 1.

5.2.1 digital learning engagement

digital learning engagement was conceptualized as, on one hand, showing an orientation towards learning with technologies in general, and on the other hand, expressing enthusiasm towards using more socio-digital technologies in formal schoolwork. these were measured with two constructs, both measured on a scale from 1 (= not at all true) to 5 (= very true).

digital learning preference (dlp; hakkarainen et al., 2000; see also halonen et al., 2017) was measured with four items that assessed having a preference towards learning and solving problems with digital technologies (e.g., “it’s fun to learn to use digital technologies, because it offers continuously new challenges”). rather than merely assessing interest in using digital technology, the items have been designed so that they trace students’ orientation toward learning with and about digital technology, and are thereby related to the cultivation of adaptive student expertise concerning digital learning and problem-solving.
wish for digital schoolwork (wds; hakkarainen et al., 2000; see also halonen et al., 2017; salmela-aro et al, 2016a) was measured with three items that directly tapped into the gap hypothesis by assessing motivation and possibilities towards using more digital technologies in school and its perceived effect on school engagement (e.g., “i’m more engaged in my schoolwork when i’m able to use digital technologies”). in other words, the scale assessed whether or not the student favoured the use of more digital technologies in schoolwork. higher scores indicated a stronger wish towards using more technologies in schoolwork.

table 1. raw means, standard deviations and measures of internal consistencies of the constructs used in the models

5.2.2 school engagement

school engagement was assessed using the schoolwork engagement inventory (i.e., eda abbreviated from energy, dedication, and absorption; salmela-aro & upadyaya, 2012) measuring a trait-like long-term study-related positive state of mind. the inventory consists of three subscales, each including three items, measuring energy (e.g., “when i study, i feel i’m bursting with energy”), dedication (e.g., “i am enthusiastic about my studies”), and absorption (e.g., “time flies when i’m studying”). however, schoolwork engagement is often specified as a unidimensional measurement model representing a generally positive study-related frame of mind (salmela-aro & upadyaya, 2012). the items were rated on a scale ranging from 1 (‘never’) to 7 (‘every day’).

5.3 analysis strategy

we followed an analysis strategy in which we first ran preliminary analyses for screening the data and ensuring that the latent constructs we used carried the same meaning across gender and time. second, we analysed gender differences in the means of latent constructs.
third, to test hypotheses 1 and 2, we examined the partial correlations between our latent constructs separately in each point of measurement and specified the longitudinal model. finally, to test hypothesis 3 we added the latent interaction to the longitudinal model. the more detailed steps in the analysis procedure are described as follows. first, as preliminary analysis, the data were screened for the number and patterns of missing values using the ibm statistical package for social sciences, version 25 (spss). the missing values were assessed longitudinally and at each time point separately. second, we specified and tested the measurement model using a confirmatory factor analysis approach (cfa). residuals of the same items were allowed to be correlated over time. the analyses were conducted using mplus 8.0 (muthén & muthén, 2018) in conjunction with r and rstudio (r core team, 2018) with the package mplusautomation (hallquist & wiley, 2018). maximum likelihood with standard errors robust for non-normality (mlr) was used as the estimator and missing data was handled with full information maximum likelihood estimation (fiml). the complex survey data option (muthén & muthén, 2018; see also asparouhov & muthén, 2006; muthén & satorra, 1995) was used in all analyses to correct for non-independence at the class level. invariance of the measurement model across the factor structure (configural), factor loadings (metric) and item intercepts (scalar) was tested to ensure that the measures held the same meaning across gender and over time. the model fits (see e.g. 
hu & bentler, 1998) were evaluated based on the chi-square value as well as the root mean square error of approximation (rmsea) with an approximate acceptable cutoff value of less than .08, the standardized root mean square residual (srmr) with an approximate cutoff of less than .08, and incremental indexes such as the comparative fit index (cfi) and the tucker-lewis index (tli) with approximate acceptable cutoff values of greater than .9. in evaluating measurement invariance, we relied on the conventional criteria of evaluating change in rmsea and cfi (chen, 2007; for more discussion on measurement invariance testing, see putnick & bornstein, 2016). finally, after confirming that the measurement model represented sufficiently the same constructs across both gender and time, we explored mean differences across gender by regressing each latent factor on gender. to test the cross-sectional parts of hypotheses 1 and 2, that is, to evaluate how digital learning engagement and school engagement are related within a time point, we explored the gender-controlled partial correlations between the latent variables. this was done by plotting the latent variables as nodes in an ebicglasso-regularized partial correlation network (epskamp & fried, 2018) using the r-package qgraph (epskamp, cramer, waldorp, schmittmann, & borsboom, 2012). because partial correlations are presented, the edges in the latent partial correlation network can be interpreted similarly to regression path coefficients, as they are controlled for gender as well as each other, but without assuming any direction of effects. this type of modelling allows for a powerful measurement error corrected modelling and visualization of contemporaneous relations between latent variables when the direction of effects cannot be inferred from the data (guyon, falissard, & kop, 2017).
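To illustrate why partial correlations can differ in sign from zero-order correlations, the following minimal sketch computes first-order partial correlations among three variables labelled like the constructs above. The correlation values are hypothetical and chosen only for illustration; they are not the study's estimates. The sketch also omits the gender covariate, the latent measurement model and the EBICglasso regularization that the actual analysis applied in R with qgraph.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations (assumed values, NOT the study's data).
r_dlp_wds = 0.60  # digital learning preference with wish for digital schoolwork
r_dlp_eda = 0.30  # digital learning preference with schoolwork engagement
r_wds_eda = 0.10  # wish for digital schoolwork with schoolwork engagement

# DLP-EDA edge, controlling for WDS.
dlp_eda_given_wds = partial_corr(r_dlp_eda, r_dlp_wds, r_wds_eda)
# WDS-EDA edge, controlling for DLP.
wds_eda_given_dlp = partial_corr(r_wds_eda, r_dlp_wds, r_dlp_eda)

print(f"DLP-EDA | WDS: {dlp_eda_given_wds:.3f}")
print(f"WDS-EDA | DLP: {wds_eda_given_dlp:.3f}")
```

With these assumed inputs, the weakly positive zero-order relation between WDS and EDA turns negative once DLP is controlled for, which is the kind of pattern a partial correlation network makes visible.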
then, to test the longitudinal parts of hypotheses 1 and 2, that is, to evaluate how digital learning engagement and school engagement are related over time, we specified latent longitudinal panel models (l-clpm; little, preacher, selig & card, 2007). the clpm is especially useful for identifying the relations between variables across time and can be applied to identify a possible causal relationship between variables measured at different time points. the clpm accounts for stability over time through the inclusion of autoregressive parameters. more precisely, the autoregressive effects describe the stability of individual differences from one measurement point to the next, whereas the cross-lagged effects describe the effect of a variable on another measured at a later occasion. taking into account autoregressive effects, cross-lagged effects in the present study can be interpreted as predicting change over time (selig & little, 2012). moreover, our models were specified using a latent measurement model, so that the variables were free of measurement error (little et al., 2007). further, in our models mean differences across gender were controlled for by regressing each latent variable on gender. the model was specified with both autoregressive and crossed paths specified between successive time points and the paths from time 1 to time 2 were constrained equal with paths from time 2 to time 3 to achieve a simpler model. the cost of acquiring a simpler model in comparison to an unconstrained model was evaluated by examining the change in fit indices rmsea and cfi as well as the bayesian information criterion (bic), which penalizes complexity (raftery, 1995). finally, to test hypothesis 3, that is, to examine the presence of a condition of discontinuity, we included latent interactions between digital learning preference and a wish for digital schoolwork as predictors of schoolwork engagement. 
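As a reading aid, the structural part of such a model can be sketched as follows. The notation is ours and is an assumption rather than the authors' specification: $\beta$ denotes autoregressive and cross-lagged paths held equal across the two lags, $\gamma$ the gender effects, $\omega$ the latent interaction effect, and $\zeta$ the residuals. Any additional paths (for example, from time 1 directly to time 3) are omitted from this sketch.

```latex
\begin{aligned}
\mathrm{EDA}_{t+1} &= \beta_{EE}\,\mathrm{EDA}_{t} + \beta_{ED}\,\mathrm{DLP}_{t} + \beta_{EW}\,\mathrm{WDS}_{t}
  + \omega\,(\mathrm{DLP}_{t} \times \mathrm{WDS}_{t}) + \gamma_{E}\,\mathrm{gender} + \zeta_{E,t+1} \\
\mathrm{DLP}_{t+1} &= \beta_{DD}\,\mathrm{DLP}_{t} + \beta_{DE}\,\mathrm{EDA}_{t} + \beta_{DW}\,\mathrm{WDS}_{t}
  + \gamma_{D}\,\mathrm{gender} + \zeta_{D,t+1} \\
\mathrm{WDS}_{t+1} &= \beta_{WW}\,\mathrm{WDS}_{t} + \beta_{WE}\,\mathrm{EDA}_{t} + \beta_{WD}\,\mathrm{DLP}_{t}
  + \gamma_{W}\,\mathrm{gender} + \zeta_{W,t+1} \\[4pt]
\frac{\partial\,\mathrm{EDA}_{t+1}}{\partial\,\mathrm{WDS}_{t}} &= \beta_{EW} + \omega\,\mathrm{DLP}_{t}
\end{aligned}
```

The last line gives the conditional (simple) slope of a wish for digital schoolwork on later schoolwork engagement: under this sketch, a negative $\omega$ means the slope becomes more negative as digital learning preference increases.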
the latent interactions were estimated using the latent moderated structural equations (lms) approach (klein & moosbrugger, 2000) implemented in mplus (muthén & muthén, 2018) as a maximum likelihood-based approach, which, in general can be viewed as recommendable (see e.g. marsh, wen & hau, 2004). 6. results 6.1 preliminary results there were less than 10% missing values overall at each time point. based on little’s mcar test, the data were not missing completely at random longitudinally (χ2 (7346) = 7702.84, p = .003). looking at each time point separately the data were missing completely at random at time 1 and time 2 (χ2 (805) = 823.04, p = .322; t2: χ2 (488) = 527.56, p = .105), whereas at time 3 the mcar assumption did not hold (χ2 (388) = 509.86, p < .001). 6.1.1 measurement model first, we specified a baseline measurement model in the first time point and continued to test for measurement invariance across gender. the baseline model (see figure 1) fitted the data acceptably (χ2(101)=611.21, χ2 scaling correction factor (cf) = 1.20, p < .001, rmsea = .063, cfi = .956, tli = .948, srmr = .032). figure 1. the baseline measurement model for time 1 with unstandardized factor loadings. we then proceeded to evaluate the measurement invariance across gender and time (model fit indices as well as the factor loadings and r2 for the scalar longitudinal measurement model are presented in the appendices). the results indicated that there were no considerable differences in the measurement model between boys and girls and that the constructs we measured did not change in their meaning over time (for details see appendix a). then, to control for mean differences across gender, we regressed each latent factor on gender. the model fit did not decline and the modification indices did not suggest any direct effects between the factor indicators and gender. 
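The fit evaluation reported for the baseline measurement model can be retraced with a few lines of code. The sketch below compares the reported indices against the approximate cutoffs stated in the analysis strategy; the function name and data layout are our own illustrative choices, and the cutoffs are rules of thumb rather than strict tests.

```python
# Approximate cutoffs as stated in the analysis strategy section.
CUTOFFS = {
    "rmsea": ("<", 0.08),
    "srmr": ("<", 0.08),
    "cfi": (">", 0.90),
    "tli": (">", 0.90),
}


def check_fit(indices):
    """Return, per index, whether the value is on the acceptable side of its cutoff."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        results[name] = value < cutoff if direction == "<" else value > cutoff
    return results


# Fit indices reported in the text for the baseline measurement model at time 1.
baseline = {"rmsea": 0.063, "cfi": 0.956, "tli": 0.948, "srmr": 0.032}
baseline_ok = check_fit(baseline)

for name, ok in baseline_ok.items():
    print(f"{name}: {baseline[name]} -> {'acceptable' if ok else 'outside cutoff'}")
```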
regarding mean differences across gender (see table 2), the model indicated that male participants scored higher than female participants in both digital learning preference and wish for digital schoolwork at each time point, whereas in schoolwork engagement there were no gender differences. table 2 mean differences across gender in the latent variables note: gender code: 0, ‘female’; 1, ‘male’. the estimate is to be interpreted as how the ‘male’ group differs from the ‘female’ group. 6.2 cross-sectional relations between digital learning engagement and school engagement to answer how digital learning preference and a wish for digital schoolwork are related to schoolwork engagement cross-sectionally, we examined the contemporaneous partial correlations between the latent variables. the correlation coefficients were extracted from the latent measurement model with scalar invariance constraints and gender as a covariate. the latent variables showed a similar pattern of relations with each other at all time points, as can be inferred from figure 2 (for zero-order correlations see appendix c). the results indicated that when controlled for each other, digital learning preference was positively related to both a wish for digital schoolwork and schoolwork engagement, supporting hypothesis 1. however, when controlled for digital learning preference, a wish for digital schoolwork was negatively related to schoolwork engagement, supporting hypothesis 2. figure 2. cross-sectional gender-controlled latent variable partial correlation networks. note: dlp, digital learning preference; wds, wish for digital schoolwork; eda, schoolwork engagement. network estimated with ebicglasso regularization (see epskamp & fried, 2018). nodes placed by fruchterman-reingold algorithm (fruchterman & reingold, 1991).
blue indicates positive correlations, red negative, and the width and colour of the edges correspond to the absolute value of the correlations: the stronger the correlation, the thicker and more saturated the edge (see epskamp et al., 2012). this pattern of partial correlations gives reason to suspect that the effect of digital learning preference on schoolwork engagement might be moderated by a wish for digital schoolwork, in which case there would be discontinuity between out-of-school digital learning and the possibilities to connect this to school. in other words, how digital learning preference is related to school engagement would depend on the participants’ motivation and possibilities in using technologies in school, or on the fit of the students’ personal digital learning practices with the pedagogical practices at school. 6.3 longitudinal relations between socio-digital participation and academic well-being to answer how digital learning preference and a wish for digital schoolwork are related to schoolwork engagement longitudinally, we specified the latent longitudinal panel model. the model fitted the data well (χ2 (1096) = 3151.59, cf = 1.11, p < .001, rmsea = .034, cfi = .944, tli = .940, srmr = .038) and the model fit did not differ considerably from the unconstrained measurement model. in addition, the bayesian information criterion (bic) favoured the model (bic = 142676.60) over the more complex unconstrained model (bic = 142703.41). we then included the hypothesized latent interaction. the interaction term was statistically significant and resulted in a slightly better fit as indicated by the log-likelihood chi-square difference (∆χ2 = 7.23, df = 1, p = .007). thus, the results presented are from the model with stationarity assumed and the latent interaction included. all unstandardized structural model parameters are presented in table 3 and statistically significant structural parameters are illustrated in figure 3.
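The two model comparisons reported above can be checked with simple arithmetic. The sketch below recomputes the p-value of the reported chi-square difference and the BIC difference between the constrained and unconstrained models. It takes the reported ∆χ2 as given and does not reproduce any scaling correction that robust estimation may involve. The reading of the BIC difference follows the commonly cited grading attributed to Raftery (1995) and should be treated as a rule of thumb.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Reported chi-square difference for adding the latent interaction (df = 1).
p_interaction = chi2_sf_df1(7.23)

# Reported BIC values: unconstrained model minus constrained (stationary) model.
delta_bic = 142703.41 - 142676.60

print(f"p for delta chi-square = 7.23, df = 1: {p_interaction:.4f}")
print(f"BIC difference in favour of the constrained model: {delta_bic:.2f}")
```

The recomputed p-value rounds to the reported .007, and the BIC difference of about 26.8 points is well above the value of 10 that is conventionally read as very strong support for the lower-BIC model.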
table 3 autoregressive and cross-lagged parameters and latent interactions of the longitudinal panel model the model indicated that, first, the effects of the same construct on itself over time were moderately strong and somewhat carried over for two years, indicating that the constructs are quite stable over time. beyond these autoregressive effects, the model also revealed that digital learning preference predicted higher schoolwork engagement, supporting hypothesis 1. a wish for digital schoolwork had only a weak and statistically non-significant negative effect on schoolwork engagement. however, their interaction predicted schoolwork engagement negatively, in line with hypothesis 3. figure 3. structural parameters of the longitudinal panel model. note: cross-lagged paths with positive coefficients highlighted in blue, negative in red. only paths significant at p < .05 illustrated for clarity. gender effects omitted for clarity. a closer inspection of the interaction (see figure 4) indicated that a wish for digital schoolwork predicted lower future schoolwork engagement only for those students who had reported a higher digital learning preference. for those students reporting lower digital learning preference there appeared to be no relation between a wish for digital schoolwork and schoolwork engagement. figure 4. schoolwork engagement regressed on wish for digital schoolwork with differing levels (±1 sd) of digital learning preference. note: the blue line represents high digital learning preference (+1sd), the red line low digital learning preference (-1sd); dashed lines represent 95% confidence intervals. 7. discussion the present study aimed to investigate connected learning by examining the conditions of continuity and discontinuity emerging between students’ self-regulated and interest-driven informal learning in digital contexts and their school engagement.
previous research has shown that contextual factors such as parental affect, teacher support and mastery-supportive classroom atmosphere promote higher school engagement (upadyaya & salmela-aro, 2013), and the present study sheds light on how orientation toward learning through digital tools and orientation toward developing socio-digital competencies might be related to the equation. the present results support the gap hypothesis, but also reveal signs of connected learning, connecting digital learning engagement with schoolwork engagement. more precisely, we expected that having a digital learning preference would be reflected in having a positive study-related state of mind (hypothesis 1), that wishing to use more digital tools in schoolwork would be negatively related to school engagement (hypothesis 2), and that especially the combination of an individual disposition towards digital learning and a lack of contextual support would lead to lower school engagement (hypothesis 3) and would thus represent a condition of discontinuity. the present cross-sectional results give support for all our hypotheses, whereas the present longitudinal data only support hypotheses 1 and 3. cross-sectionally, we observed that digital learning preference is positively correlated with both a wish for digital schoolwork and schoolwork engagement. wishing for more digital schoolwork, in turn, was negatively correlated with schoolwork engagement. longitudinally, digital learning preference predicted higher schoolwork engagement across time and the interaction between digital learning preference and a wish for digital schoolwork predicted later schoolwork engagement negatively. the directions of the effects appeared very clear: schoolwork engagement did not predict increases or decreases in either digital learning preference or wish for digital schoolwork, indicating that these indeed work as antecedents for school engagement. the present results might be interpreted in various ways.
from the viewpoint of the connected learning framework, the positive association between digital learning engagement and school engagement might indicate successes in connecting learning across informal and formal contexts. this is understandable because the items in the digital learning preference scale trace the participants’ orientation toward cultivating their adaptive expertise of learning through digital technologies; such efforts may generalize from digital to other spheres of learning. this is also in line with previous studies showing a positive relation between digital and academic engagement (laird & kuh, 2005; rashid & ashgar, 2016), and extends the previous findings by adding a longitudinal component. thus, the present results also indicate that adopting more sophisticated digital practices and competencies might help students build novel resources for schoolwork, thus contributing to higher school engagement over time in line with the demands-resources model (hietajärvi et al., 2019; salmela-aro & upadyaya, 2014). such resources could involve students’ spontaneous use of digital technologies for academic purposes, such as seeking school-relevant knowledge from the internet or reciprocally helping one another in schoolwork (hietajärvi, seppä, & hakkarainen, 2016; li et al., 2017). further, given that especially students with high digital learning preference and low wish for digital schoolwork experienced higher later school engagement, we can argue that informal digital learning engagement, if recognized and taken into account in schools, can foster connected learning (ito et al., 2013). data from our related studies indicate, however, that digital technologies were used rather infrequently at school and mostly for fairly basic purposes at the time of collecting the present data (halonen et al., 2017).
in previous research, especially those digitally oriented students who appeared to feel inadequate and alienated at school wished for an opportunity to use more digital technologies in their schoolwork (salmela-aro et al., 2016a). the results of the present study also revealed evidence of the gap, indicating that connecting learning across contexts is challenging. the results of this study indicate that not being able to incorporate the prior experiences and practices of students into the formal learning environment creates, for some students, an experience of discontinuity contributing to feelings of disengagement (rajala et al., 2015). these were students who reported having cultivated high levels of digital expertise, and could even be described as ‘geeking out’ (ito et al., 2010). for other, digitally less engaged students, wish for digital schoolwork appeared irrelevant regarding changes in school engagement over time. taken together, both the cross-sectional and longitudinal results showed support for a more nuanced understanding of the gap hypothesis: students who, on one hand, hold a disposition for learning with and about digital tools, but on the other hand, feel unable to deploy this competence in school, experience a decline in their school engagement. conversely, it is noteworthy that holding a stronger orientation towards learning with digital tools contributed to higher school engagement, possibly due to the increased psychological resources gained in the process, provided that the students’ needs in terms of digital schoolwork are in congruence with the practices of their school. 7.1 methodological reflections and limitations the present study used a large longitudinal sample of adolescents, which strengthens the inferences that can be drawn from the analyses. however, the sample was not representative; it was a convenience sample collected from helsinki and, therefore, the results cannot be generalized across finland or beyond.
a replication study with a representative sample, possibly with students of different ages, from various parts of finland and from different academic contexts (high school, vocational school), would be needed. the data were based on self-reports, so the actual digital participation practices of the students were not assessed. further, the present investigation addressed mostly young people’s informal digital activities because out-of-school socio-digital participation was far more intensive than within-school participation. thus, we were not able to trace the actual behaviour of participants either with digital technologies or in school; nor were we able to examine the pedagogical practices or technology use in school. we cannot, for instance, say anything about how students with different levels of digital learning preference approach and behave when working on learning tasks or what kind of learning tasks they are presented with, including digital technologies or not. we acknowledge that this can make all the difference and thus future studies should include multi-level data of students’ informal and formal learning activity. moreover, future studies should better approach the qualitative differences in adolescents’ digital learning engagement, for instance, by collecting mixed methods data on their interest-driven pursuits, and on why and how they engage with digital technologies to support them (for initial efforts of this kind, see hietajärvi et al., 2016; li et al., 2017). further, school engagement is but one indicator of academic functioning, and we should look into different ways of conceptualizing the emergence of connected learning or the gap; future studies should also take into account academic achievement, as well as indicators other than school engagement regarding motivation and well-being.
the cross-lagged panel model we used to analyse the longitudinal relations allowed us to examine how digital learning engagement predicted change in school engagement over time, and as such gives us grounds to present inferences regarding the temporal ordering of these effects with a large number of participants. the methodological choice of using latent variables allowed us to model the relations without measurement error (e.g., little et al., 2007) but we did not separate between-participant and within-participant variances, which affects the inferences that we can draw, especially limiting stronger causal inferences (hamaker, kuiper, & grasman, 2015). the inclusion of the latent interaction allowed us to model the gap as conditional on the interplay between individual and perceived contextual factors, but without a multi-level setting and actually tracing the school-level practices we cannot truly establish detailed aspects of the postulated gap between the students’ informal digital practices and the schools’ digital pedagogical practices. there can be considerable differences between teachers, schools, districts and countries in how digital tools are implemented in teaching and learning. consequently, the gap is likely to be more observable in some schools than others and with some students rather than others. moreover, it appears that the interplay between students’ digital and academic engagement is complex rather than straightforward. national efforts of digitalizing practices of learning and teaching in finland are likely to have a significant impact on future manifestations of the gap, to be revealed by collecting empirical data. after we carried out the present study, the finnish matriculation examination (the only high-stakes test in finland) has been digitalized and major efforts of digitalization of school are underway so as to meet societal challenges and overcome the gap.
our research network has developed novel self-report instruments for tracing in detail the extent and focus of within-school use of digital technologies and associated pedagogic approaches together with young people’s informal socio-digital practices. further, examining the gap hypothesis only in terms of whether students use more or less digital technology, or how much they would like to use it, does not appear to be fruitful. it is crucial to collect detailed longitudinal data of students’ transforming informal and formal ecologies of socio-digital participation that are likely to change from one cohort to the next. moreover, investigating relations between digital and school engagement calls for investigating qualitatively different levels (or genres) of socio-digital participation because interest-driven creative and academic practices are more likely to foster young people’s learning and development. in addition, students come from different backgrounds and contexts, which is reflected in how they experience digital learning engagement in terms of learning and how these experiences collide or connect with the educational practices of school (howard, ma, & yang, 2016; ito et al., 2013). this can position students unequally and contribute to a digital competence gap. for instance, investigations of barron and colleagues (2009) indicate that students who have cultivated advanced levels of digital competence often come from advantaged homes and have parents who provide structured support for school as well as digital learning, and foster the development of interest-driven technical and creative capabilities. disadvantaged students, in turn, may not only have limited parental guidance in school activities but also restricted access to tools, practices, or social support relevant for building advanced digital competences, increasing the danger of educational exclusion or exposure to maladaptive patterns of digital engagement.
to counter these risks, all students should be provided with tools to cultivate their digital practices and capitalize on the connected learning possibilities. from the schools’ viewpoint, the problem to be solved is how the pedagogical solutions around digital tools are implemented so that the students’ out-of-school practices are acknowledged to best support students’ personal and collaborative learning and development within a network of connected learning. toward that end, it appears critical to engage students in learning by using digital technologies for sustained collaborative effort of building and creating knowledge and media (paavola & hakkarainen, 2014). 8. conclusion our results are among the first to directly assess the gap hypothesis with larger scale quantitative and longitudinal data. the results indicate that the gap hypothesis was supported under the following condition: students who express a disposition to solve problems and learn with digital technologies out of school, and who would prefer to use more technologies for learning in school, experience a discontinuity between out-of-school and in-school learning and report lower later school engagement. however, the results also indicate that students who hold a disposition towards digital learning but do not experience a discontinuity in connecting it to schoolwork experience higher later school engagement. the finding that digital learning preference predicted higher school engagement provides indirect evidence for some aspects of connected learning in schools instead of the gap. based on the results we emphasize that the manifestation of the gap vs. connected learning is dependent on multiple factors, both individual (the level of digital and school engagement) and contextual (the prevailing digital-pedagogic practices of schools). these need to be taken into account in future research.
multiple methodologies and richer data sets are needed to reveal more about the adolescents’ truly connected learning experiences – or the lack of them. this study was one step forward in demonstrating and understanding the complexity of this issue. there is an obvious need to examine the reciprocal interactive processes between the learners and their social ecologies inside and outside school more closely in order to support the intellectual development and school engagement of our youth. towards that end it seems essential to also enhance the educational practices in schools so that the informal learning gained in out-of-school digital engagement can be recognized and supported across all students. keypoints cross-sectional and longitudinal results showed both continuity and discontinuity in connecting out-of-school digital learning and school. digital learning preference was related to higher schoolwork engagement. wish for digital schoolwork was related to lower schoolwork engagement. students who preferred digital learning experienced increased schoolwork engagement over time, especially when connected to in-school digital schoolwork. students who preferred digital learning but did not have sufficient possibilities to connect it to schoolwork, experienced decline in schoolwork engagement. 
acknowledgements this research was funded by: the academy of finland project “mind the gap—between digital natives and educational practices,” pi kirsti lonka (grant #265528); the academy of finland project “bridging the gaps—affective, cognitive, and social consequences of digital revolution for youth development and education,” pi katariina salmela-aro (grant #308351) and pi kirsti lonka (grant #308352); the strategic research council project “growing mind: educational transformations for facilitating sustainable personal, social, and institutional renewal in the digital age,” pi kai hakkarainen (grant #312527) and team leader kimmo alho (grant #312529); and, the academy of finland project “#agents – young people’s agency in social media,” pi katariina salmela-aro (grant # 320371). references asparouhov, t., & muthen, b. (2006). comparison of estimation methods for complex survey data analysis. mplus web notes. url: https://www.statmodel.com/download/surveycomp21.pdf barron, b. (2006). interest and self-sustained learning as catalysts of development: a learning ecology perspective. human development, 49, 193–224. doi: 10.1159/000094368 barron, b., martin, c. k., takeuchi, l., & fithian, r. (2009). parents as learning partners in the development of technological fluency. international journal of learning and media, 1, 55–77. doi: 10.1162/ijlm.2009.0021 bennett, d. a. (2001). how can i deal with missing data in my study? australian and new zealand journal of public health, 25, 464–469. doi: 10.1111/j.1467-842x.2001.tb00294.x bennett, s. & maton, k. (2010). beyond the “digital native” debate: towards a more nuanced understanding of students’ technology experiences. journal of computer assisted learning, 26, 321–331. doi: 10.1111/j.1365-2729.2010.00360.x bereiter, c. & scardamalia, m. (1993). surpassing ourselves: an inquiry into the nature and implications of expertise. chicago, il: open court. boekaerts, m., & minnaert, a. (1999). 
self-regulation with respect to informal learning. international journal of educational research, 31, 533–544. doi: 10.1016/s0883-0355(99)00020-8 chen, f. f. (2007). sensitivity of goodness of fit indices to lack of measurement invariance. structural equation modeling, 14, 464–504. doi: 10.1080/10705510701301834 chen, p. s. d., lambert, a. d., & guidry, k. r. (2010). engaging online learners: the impact of web-based learning technology on college student engagement. computers & education, 54, 1222–1232. doi: 10.1016/j.compedu.2009.11.008 clements, j. c. (2015). using facebook to enhance independent student engagement: a case study of first-year undergraduates. higher education studies, 5, 131–146. doi: 10.5539/hes.v5n4p131 demerouti, e., bakker, a. b., nachreiner, f., & schaufeli, w. b. (2001). the job demands-resources model of burnout. journal of applied psychology, 86, 499–512. doi: 10.1037/0021-9010.86.3.499 deng, l., connelly, j., & lau, m. (2016). interest-driven digital practices of secondary students: cases of connected learning. learning, culture and social interaction, 9, 45–54. doi: 10.1016/j.lcsi.2016.01.004 epskamp, s., cramer, a. o., waldorp, l. j., schmittmann, v. d., & borsboom, d. (2012). qgraph: network visualizations of relationships in psychometric data. journal of statistical software, 48, 1–18. doi: 10.18637/jss.v048.i04 epskamp, s., & fried, e. i. (2018). a tutorial on regularized partial correlation networks. psychological methods, 23, 617–634. doi: 10.1037/met0000167 epskamp, s., rhemtulla, m., & borsboom, d. (2017). generalized network psychometrics: combining network and latent variable models. psychometrika, 82, 904–927. doi: 10.1007/s11336-017-9557-x esteves, k. k. (2012). exploring facebook to enhance learning and student engagement: a case from the university of philippines (up) open university. malaysian journal of distance education, 14.
eu kids online. (2014). eu kids online: findings, methods, recommendations (deliverable d1.6). eu kids online, london, united kingdom: london school of economics. european commission. (2017). the digital competence framework 2.0. retrieved from https://ec.europa.eu/jrc/en/digcomp/digital-competence-framework/ european parliament. (2015). innovative schools: teaching and learning in the digital era workshop documentation. brussels, belgium: european parliament. fredricks, j. a., blumenfeld, p. c., & paris, a. h. (2004). school engagement: potential of the concept, state of the evidence. review of educational research, 74, 59–109. doi: 10.3102/00346543074001059 fruchterman, t. m. j., & reingold, e. m. (1991). graph drawing by force-directed placement. software: practice and experience, 21, 1129–1164. doi: 10.1002/spe.4380211102 gurung, b., & rutledge, d. (2014). digital learners and the overlapping of their personal and educational digital engagement. computers & education, 77, 91–100. doi: 10.1016/j.compedu.2014.04.012 guyon, h., falissard, b., & kop, j.-l. (2017). modeling psychological attributes in discussion: network analysis vs. latent variables. frontiers in psychology, 8, 798. doi: 10.3389/fpsyg.2017.00798 hakkarainen, k. (2009). a knowledge-practice perspective on technology-mediated learning. international journal of computer-supported collaborative learning, 4, 213–231. doi: 10.1007/s11412-009-9064-x hakkarainen, k., ilomäki, l., lipponen, l., muukkonen, h., rahikainen, m., tuominen, t., et al. (2000). students’ skills and practices of using ict: results of a national assessment in finland. computers & education, 34, 103–117. doi: 10.1016/s0360-1315(00)00007-5 hallquist, m. n., & wiley, j. f. (2018). mplusautomation: an r package for facilitating large-scale latent variable analyses in mplus. structural equation modeling, 1–18. doi: 10.1080/10705511.2017.1402334 halonen, n., hietajärvi, l., lonka, k., & salmela-aro, k. (2016).
sixth graders’ use of technologies in learning, technology attitudes and school well-being. the european journal of social & behavioural sciences. 18, 2307–2324. doi: 10.15405/ejsbs.205 hamaker, e. l., kuiper, r. m., & grasman, r. p. (2015). a critique of the cross-lagged panel model. psychological methods, 20, 102. doi: 10.1037/a0038889 hatano, g. & inagaki, k. (1992). desituating cognition through the construction of conceptual knowledge. in p. light & g. butterworth (eds.), context and cognition. ways of knowing and learning (pp. 115–133). new york, new york: harvester. hietajärvi, l., salmela-aro, k., tuominen, h., hakkarainen, k., & lonka, k. (2019). beyond screen time: multidimensionality of socio-digital participation and relations to academic well-being in three educational phases. computers in human behavior, 93, 13–24. doi: 10.1016/j.chb.2018.11.049 howard, s. k., ma, j., & yang, j. (2016). student rules: exploring patterns of students’ computer-efficacy and engagement with digital technologies in learning. computers & education, 101, 29–42. doi: 10.1016/j.compedu.2016.05.008 hu, l. t., & bentler, p. m. (1998). fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. psychological methods, 3, 424. ito, m., baumer, s., bittanti, m., cody, r., herr-stephenson, b., horst, h. a., et al. (2010). hanging out, messing around, and geeking out. cambridge, massachusetts: the mit press. ito, m., gutiérrez, k., livingstone, s., penuel, b., rhodes, j., salen, k., ... & watkins, s. c. (2013). connected learning: an agenda for research and design. irvine, california: digital media and learning research hub. jenkins, h. (2009). confronting the challenges of participatory culture: media education for the 21st century. cambridge, massachusetts: mit press. junco, r. (2012a). the relationship between frequency of facebook use, participation in facebook activities, and student engagement. computers & education, 58, 162–171. 
doi: 10.1016/j.compedu.2011.08.004 junco, r. (2012b). too much face and not enough books: the relationship between multiple indices of facebook use and academic performance. computers in human behavior, 28, 187–198. doi: 10.1016/j.chb.2011.08.026 junco, r., heiberger, g., & loken, e. (2011). the effect of twitter on college student engagement and grades. journal of computer assisted learning, 27, 119–132. doi: 10.1111/j.1365-2729.2010.00387.x klein, a., & moosbrugger, h. (2000). maximum likelihood estimation of latent interaction effects with the lms method. psychometrika, 65, 457–474. doi: 10.1007/bf02296338 kumpulainen, k., & sefton-green, j. (2012). what is connected learning and how to research it? international journal of learning, 4, 7–18. doi: 10.1162/ijlm_a_00091 laird, t. f. n., & kuh, g. d. (2005). student experiences with information technology and their relationship to other aspects of student engagement. research in higher education, 46, 211–233. doi: 10.1007/s11162-004-1600-y li, s., hietajärvi, l., palonen, t., salmela-aro, k., & hakkarainen, k. (2017). adolescents’ social networks: exploring different patterns of socio-digital participation. scandinavian journal of educational research, 61, 255–274. doi: 10.1080/00313831.2015.1120236 little, t. d., preacher, k. j., selig, j. p., & card, n. a. (2007). new developments in latent variable panel analyses of longitudinal data. international journal of behavioral development, 31, 357–365. doi: 10.1177/0165025407077757 maccallum, r.c., browne, m.w., & cai, l. (2005). testing differences between nested covariance structure models: power analysis and null hypotheses. psychological methods, 11, 19–35. doi: 10.1037/1082-989x.11.1.19 malcolm, j., hodkinson, p., & colley, h. (2003). the interrelationships between informal and formal learning. journal of workplace learning, 15, 313–318. doi: 10.1108/13665620310504783 marsh, h. w., wen, z., & hau, k. t. (2004). 
structural equation models of latent interactions: evaluation of alternative estimation strategies and indicator construction. psychological methods, 9, 275. doi: 10.1037/1082-989x.9.3.275 mcfarlane, a. (2015). authentic learning for the digital generation: realising the potential of technology in the classroom. london: routledge. moisala, m., salmela, v., hietajärvi, l., carlson, s., vuontela, v., lonka, k., ... & alho, k. (2016a). gaming is related to enhanced working memory performance and task-related cortical activity. brain research , 1655, 204–215. doi: 10.1016/j.brainres.2016.10.027 moisala, m., salmela, v., hietajärvi, l., salo, e., carlson, s., salonen, o., ... & alho, k. (2016b). media multitasking is associated with distractibility and increased prefrontal activity in adolescents and young adults. neuroimage, 134, 113–121. doi: 10.1016/j.neuroimage.2016.04.011 muthén, l. k., & muthén, b. o. (2018). mplus: statistical analysis with latent variables: user's guide [version 8]. los angeles, california: muthén & muthén. muthén, b., & satorra, a. (1995). complex sample data in structural equation modeling. sociological methodology, 25, 267–316. doi: 10.2307/271070 nardi, b., & o’day, v. (2000). information ecologies: using technology with heart. cambridge, massachussets: mit. oecd. (2015). students, computers and learning: making the connection. paris, france: pisa, oecd publishing. doi: 10.1787/9789264239555-en olson, d.r. & bruner, j. s. (1996) folk psychology and folk pedagogy. in d. r. olson, d.r. & n. torrance (eds.) the handbook of education and human development. new models of learning, teaching and schooling (pp. 9–27). malden, massachusetts: blackwell publisher. orvis, k. l. (ed.). (2008). computer-supported collaborative learning: best practices and principles for instructors: best practices and principles for instructors. hershey, new york, new york: igi global. paavola s. & hakkarainen, k. (2014). trialogical approach for knowledge creation. 
in tan s-c., jo, h.-j., & yoe, j. (eds.), knowledge creation in education (pp. 53–72). singapore: springer. panadero, e. & järvelä, s. (2015). socially shared regulation of learning: a review. european psychologist, 20, 190–203. doi: 10.1027/1016-9040/a000226 prensky, m. (2001). digital natives, digital immigrants part 1. on the horizon, 9, 1–6. doi: 10.1108/10748120110424816 putnick, d. l., & bornstein, m. h. (2016). measurement invariance conventions and reporting: the state of the art and future directions for psychological research. developmental review, 41, 71–9. doi: 10.1016/j.dr.2016.06.004 r core team (2018). r: a language and environment for statistical computing. r foundation for statistical computing, vienna, austria. url https://www.r-project.org/. raftery, a. e. (1995). bayesian model selection in social research. sociological methodology, 111-163. rajala, a., kumpulainen, k., hilppö, j., paananen, m., & lipponen, l. (2015). connecting learning across school and out-of-school contexts: a review of pedagogical approaches. in o. erstad, k. kumpulainen, å. mäkitalo, k. p. pruulmann-vengerfeldt, & t. jóhannsdóttir (eds.), learning across contexts in the knowledge society. (pp. 15-35) rotterdam, the netherlands: sense publishers. rashid, t., & asghar, h. m. (2016). technology use, self-directed learning, student engagement and academic performance: examining the interrelations. computers in human behavior, 63, 604–612. doi: 10.1016/j.chb.2016.05.084 ritella, g. & hakkarainen, k (2012). instrument genesis in technology mediated learning: from double stimulation to expansive knowledge practices. international journal of computer-supported collaborative learning, 7, 239–258 doi: 10.1007/s11412-012-9144-1. robinson, k. (2011). out of our minds. learning to be creative. westford, massachusetts: capstone publishing inc. salmela-aro, k. (2017). dark and bright sides of thriving–school burnout and engagement in the finnish context. 
european journal of developmental psychology, 14, 337–349. doi: 10.1080/17405629.2016.1207517 salmela-aro, k., muotka, j., alho, k., hakkarainen, k., & lonka, k. (2016a). school burnout and engagement profiles among digital natives in finland: a person-oriented approach. european journal of developmental psychology, 13, 704–718. doi: 10.1080/17405629.2015.1107542 salmela-aro, k., & upadaya, k. (2012). the schoolwork engagement inventory. european journal of psychological assessment, 28, 60–67. doi: 10.1027/1015-5759/a000091 salmela-aro, k., & upadyaya, k. (2014). school burnout and engagement in the context of demands–resources model. british journal of educational psychology, 84, 137–151. doi: 10.1111/bjep.12018 salmela-aro, k., upadyaya, k., hakkarainen, k., lonka, k., & alho, k. (2016b). the dark side of internet use: two longitudinal studies of excessive internet use, depressive symptoms, school burnout and engagement among finnish early and late adolescents. journal of youth and adolescence, 1–15. doi: 10.1007/s10964-016-0494-2 sawyer, k. (2014). the cambridge handbook of the learning sciences. cambridge, massachusetts: cambridge university press. selig, j. p., & little, t. d. (2012). autoregressive and cross-lagged panel analysis for longitudinal data. in b. laursen, t. d. little, & n. card (eds.). handbook of developmental research methods (pp. 265–278). new york, new york: guilford press. selwyn, n. (2006). exploring the ‘digital disconnect’ between net savvy students and their schools. learning, media and technology, 31, 5–17. doi: 10.1080/17439880500515416 sung, y. t., chang, k. e., & liu, t. c. (2016). the effects of integrating mobile devices with teaching and learning on students’ learning performance: a meta-analysis and research synthesis. computers & education, 94, 252–275. doi: 10.1016/j.compedu.2015.11.008 tamim, r. m., bernard, r. m., borokhovski, e., abrami, p. c., & schmid, r. f. (2011). 
what forty years of research says about the impact of technology on learning: a second-order meta-analysis and validation study. review of educational research, 81, 4–28. doi: 10.3102/0034654310393361
upadyaya, k., & salmela-aro, k. (2013). development of school engagement in association with academic success and well-being in varying social contexts: a review of empirical research. european psychologist, 18, 136–147. doi: 10.1027/1016-9040/a000143
wexler, b. e. (2006). brain and culture. neurobiology, ideology, and social change. cambridge, massachusetts: the mit press.

appendices
appendix a. model fit indices in the measurement invariance testing
appendix b. unstandardized factor loadings and r² of the longitudinal measurement model with scalar invariance constraints
appendix c. latent variable correlations of the longitudinal measurement model with scalar invariance constraints
appendix d. materials to reproduce the results can be found here: https://osf.io/2hk3y/

frontline learning research vol. 8 no. 3 special issue (2020) 85–103 issn 2295-3159

variability in certainty of self-reported interest: implications for theory and research

amanda m. durik^a & jade s. jenkins^a
^a northern illinois university, usa

article received 15 may 2019 / revised 29 august / accepted 29 august / available online 30 march

abstract

these studies examined self-reported interest, how level of interest is related to reported certainty of interest, and whether certainty helps to clarify the relationship between interest and behavior. this research borrows from research on attitudes showing that attitude certainty helps to clarify the relationship between attitudes and behavior. a pilot study examined the relationship between self-reported interest and certainty of interest within four disciplines (math, psychology, biology, and astronomy).
these relationships were replicated in math (study 1) and psychology (study 2), and the relationships between interest and behavior were stronger for those with greater certainty. for domains in which participants had sufficient levels of experience and varied levels of interest, curvilinear relationships were found between level of interest and certainty, showing that certainty is higher among individuals who report more extreme (high or low) levels of interest. moreover, self-reported interest predicted behavior more strongly for those with more certainty in their responses. the discussion centers on the theoretical and methodological utility of considering certainty of interest alongside measures of self-reported interest.

keywords: self-report; certainty; interest
corresponding author email: adurik@niu.edu
doi: https://doi.org/10.14786/flr.v8i3.491

1. background

when scanning a classroom, it is hard not to notice variability in student motivation. the students who are attentive, active, listening, and responding can be differentiated from those who are off task and distracted. some of this differentiation can be traced to students’ varying levels of interest in the domain that is being taught.

interest is an emotional response to particular stimuli that has both cognitive and motivational features (see reviews by hidi & renninger, 2006; krapp, 2002; prenzel, 1992). it can guide students toward better self-regulation while learning because it helps students focus on content, choose to engage, persevere through challenges, and recall more. interest is conceptualized as both residing in the person over time (individual interest) as well as varying in the moment in response to stimuli in the immediate situation (situational interest). as such, interest can fluctuate in response to processes operating within the person and stimuli present in the situation.
individual interest is an enduring tendency to approach and seek learning opportunities in a given domain (ainley & ainley, 2011; deci, 1992; hidi & renninger, 2006; krapp, 2002; prenzel, 1992; renninger, hidi, & krapp, 1992; schiefele, 1991; silvia, 2006). the domain is associated with affective involvement and meaningfulness (schiefele, 1991), which are enhanced by the acquisition and use of knowledge (hidi & renninger, 2006; renninger, 2000; prenzel, 1992). a hallmark of individual interest is willingness to reengage in the domain over time (hidi & renninger, 2006). the measurement of interest has been given considerable attention. options of how to measure interest often include self-report scales and interviews (see discussions by krapp & prenzel, 2011; renninger & hidi, 2011). within those, decisions about which features of interest should be captured vary. for example, within the collection of self-report scales that are available, some include feelings associated with interacting with domain content, the meaning or value associated with the domain, and/or the perceived knowledge or actual knowledge that individuals have stored in the domain (see review by renninger & hidi, 2011). most conceptualizations of interest reflect the idea that interest can change in response to experience and exposure to content, and we argue that research on interest must inherently take a developmental perspective. this leads to challenges in measurement because any measure is a snapshot of a person’s interest at a particular moment. moreover, in the case of closed-ended self-report measures of interest, a given response is that person’s best attempt at quantifying their level of interest at that time. these issues also manifest in the capacity (or lack of, in some cases) for self-report measures of interest to predict behavioral manifestations of interest. behavioral measures of interest often assess whether and how people choose to engage with domain content. 
not surprisingly, measures of self-reported interest often positively predict behavior, such as free-choice behaviors observed in the laboratory, course taking, retrospective reports of behavior, and intentions to behave in the future (ainley et al., 2002; harackiewicz, durik, barron, linnenbrink-garcia, & tauer, 2008; renninger, 1990; simpkins et al., 2006; wijnia et al., 2014). that said, although correlations between self-reported interest and behavior are often present, they may not be as strong as one might expect. other areas of research have carefully considered why self-reported and behavioral data are not always strongly associated, and social psychologists who study attitudes began working on this issue within their own area in the 1970s. this work toward understanding the relationship between self-reported attitudes and behavior has led to greater clarity surrounding the nature of and research on attitudes. similarly, careful attention to the relationship between self-reported interest and behavior may also help clarify the construct of interest. the challenges encountered by attitude researchers in assessing self-reported attitudes are likely similar to many of the challenges encountered by interest researchers assessing self-reported interest. in both cases, participants are asked to evaluate their responses to a particular class of stimuli or ideas. although it is simple enough for an individual to provide a response on a scale, this deceivingly simple response is the outcome of much more complex processes. it should be noted, however, that we do not argue that interest and attitudes are the same. interest assumes an active process on the part of the individual that propels them toward knowledge acquisition, elaboration, or growth (deci, 1992) whereas an attitude does not necessarily trigger this process and may actually do the opposite. for example, a person who holds a positive attitude toward recycling would believe that recycling is good. 
a person who holds a negative attitude about recycling would believe that recycling is bad, possibly a waste. whereas the person with a positive attitude may be more open to learning about recycling than the person with a negative attitude, the attitude itself does not motivate learning. in contrast, a person with an interest in recycling would be expected to have learning goals related to recycling (e.g., a desire to learn about the processes related to recycling, which materials can be recycled, and why they should be recycled).

attitude researchers have addressed the issue of attitude-behavior consistency in several ways. for example, one approach recognized that other environmental variables such as norms and the opportunity and ability to engage in the behavior were also critical (ajzen, 1991). another approach recognized that attitudes predicted behavior more strongly when people focused on only one side of an attitude (e.g., for or against; glasman & albarracin, 2006). finally, another approach identified that other attitude features contributed to attitude strength, and increased the relationship between attitudes and behavior (krosnick & petty, 1995). borrowing from this last approach, the current research centers on the idea that individuals vary in the extent to which they are certain of their attitudes. certainty refers to the extent to which individuals are confident in their assessment of an attitude as clear and correct (rucker et al., 2014). certain attitudes are held more strongly and are less likely to change (krosnick & petty, 1995; pomerantz et al., 1995). moreover, participants who reported greater certainty of their attitudes, which was measured separately from the attitudes themselves, were more likely to behave in ways that were consistent with their attitudes (e.g., bizer et al., 2006; fazio & zanna, 1978; glasman & albarracin, 2006; tormala, 2016; tormala & rucker, 2007).
as an example, although participants may report varying levels of attitudes toward recycling, those who have more certain and positive attitudes toward recycling would be more likely to actually recycle. just as individuals can report less or more certainty of their attitudes, we theorized that some participants would be more certain of their self-reported interest and others less so. we reasoned that if attitude researchers were able to clarify the relationship between attitudes and behaviors by considering attitude certainty, it was worthwhile to attempt the same for individual interest. this approach, compared to some of the other approaches explored within the attitude literature, was selected because it preserved the assumption that research on interest must consider development. interest changes and people may become more certain of their interest over time. as such, not only might the inclusion of certainty clarify the relationship between interest and behavior, but it may also provide insight into how participants’ awareness of interest changes. specifically, certainty of interest may prove useful in gaining insight into the nature of interest and how individuals come to recognize their interests. drawing again from the attitude literature, the extent to which individuals become more confident or certain of their attitude is related to the extent and valence of prior experiences and the amount of careful thought put toward the object of the attitude (berger, 1992; bizer et al., 2006; fazio & zanna, 1978; glasman & albarracin, 2006; jonas et al., 1997; krishnan & smith, 1998; prislin et al., 1998). prior experience and careful thought toward the object of an attitude have been found to increase certainty, which then contributes to stability of attitudes. the current research examines variability in certainty as a starting point in determining whether similar processes may be operating for interest as they are for attitudes. 
the first aim of the current research is to examine certainty of interest and how certainty varies with levels of self-reported interest. the second aim was to test whether individuals who are more certain behave in ways that align more closely to their interests. 2. pilot study the purpose of the pilot study was to explore the patterns of association between self-reported interest and certainty of interest among different domains (math, biology, astronomy, and psychology). these domains were chosen because it was expected that participants would vary in their prior experience with each. the pilot sample was composed entirely of advanced psychology students. as such, this population was anticipated to have varying levels of exposure to math and biology (due to compulsory education), high exposure to psychology (as their program of study), and low exposure to astronomy (neither compulsory nor inherently linked to their program of study). this anticipated variability in experience with the different domains may have implications for certainty, and create meaningful comparisons across the domains. we tested whether certainty and interest would be related in a linear or curvilinear fashion, and were especially interested in a curvilinear relationship such that participants who reported more extreme levels of interest (either low or high) may also be more certain of their interest. this pattern was found in prior research on attitudes revealing a curvilinear relationship between certainty and willingness to advocate for an attitudinal position (cheatham & tormala, 2017). moreover, if the relationship was linear, the redundancy in interest and certainty may undermine the utility of considering certainty of interest as separate from interest. 2.1 method design this was a correlational study using a within-participants design in which participants answered questions about their interest and certainty of interest in four domains. 
2.1.1 participants

the participants were 21 undergraduate students at a mid-sized university in the midwestern united states. they were all in an upper-level psychology course that students typically complete in their last year of undergraduate study. they completed the questionnaire in exchange for extra credit.

2.1.2 measures and procedure

participants responded to a 4-page, paper-and-pencil survey, in which questions for each domain (math, biology, astronomy, and psychology) were presented on different pages. the order in which participants responded about the different domains was counterbalanced across participants. participants responded to items assessing interest, certainty, and the number of college courses they had completed in the domain as well as other items that are not central to the current research. interest was measured with 6 items that were adapted from those used in prior research and capture both feeling and meaning/value aspects of interest (durik & harackiewicz, 2007). the scale included “i find ___ interesting,” “___ is fascinating to me,” “i find ___ enjoyable,” “___ is a boring subject,” “___ just doesn’t appeal to me,” and “i think ___ is a meaningful discipline,” wherein the blank spaces were replaced with the domain name. participants rated each item from 1 (strongly disagree) to 7 (strongly agree). cronbach’s alphas for interest were .94 (math), .90 (biology), .77 (psychology), and .81 (astronomy). the lower reliability observed for psychology is likely due to restriction of range because all participants were highly interested in psychology, which is known to constrain estimates of reliability (nunnally & bernstein, 1994). certainty was assessed with one item, “how sure are you of your attitudes about ___?” and rated from 1 (not at all) to 5 (very much). although only a single item was used, a similar approach has been taken in prior research (gross et al., 1995).
participants were also asked, “how many college courses have you taken in ___?” with response options ranging from zero to greater than eleven. this item was included in order to examine whether certainty was related to participants’ prior exposure to the domain. 2.2 results and discussion the data for this study (and both subsequent studies) were analyzed using multiple regression analysis conducted in spss version 25.0. for each domain, certainty was designated as the criterion variable. the measure of interest in each domain was standardized and a squared term was calculated by multiplying the standardized measure by itself. these two variables, the standardized measure of interest and its square, were entered into the regression simultaneously to predict certainty of interest for that domain. for each effect, squared semi-partial correlations are provided as measures of effect size. these denote the portion of total variability in the outcome variable that is uniquely accounted for by a given predictor. 2.2.1 math the analysis predicting certainty of math interest revealed a negative average relationship of interest, t(18) = -2.48, p = .02, b = -0.42, sr2 = .23, and a positive quadratic relationship, t(18) = 2.54, p = .02, b = 0.38, sr2 = .24. the top left panel of figure 1 depicts the relationship, showing that certainty was higher for those reporting either low or high levels of interest, and lower for those reporting more moderate levels of interest. 2.2.2. biology the analysis predicting certainty of biology interest revealed no relationship of interest, t(18) = 0.68, p = .51, b = 0.12, sr2 = .02, but similar to math, yielded a positive quadratic relationship, t(18) = 3.12, p < .01, b = 0.65, sr2 = .35. the bottom left panel of figure 1 depicts the relationship. 
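the regression described in section 2.2 (standardized interest and its square entered together, with squared semi-partial correlations as effect sizes) can be sketched in plain python. this is a minimal illustration rather than the authors' spss procedure: the function names (`solve`, `ols`, `quadratic_certainty_model`, `turning_point`) and the example scores are hypothetical, and the squared semi-partial correlation is computed here as the drop in r² when one predictor is removed from the full model.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


def quadratic_certainty_model(interest, certainty):
    """Predict certainty from standardized interest and its square."""
    m, s = mean(interest), stdev(interest)
    z = [(x - m) / s for x in interest]   # standardized interest
    z2 = [v * v for v in z]               # squared term
    coef, r2_full = ols([z, z2], certainty)
    _, r2_without_linear = ols([z2], certainty)
    _, r2_without_quadratic = ols([z], certainty)
    return {
        "b_linear": coef[1],
        "b_quadratic": coef[2],
        # unique share of variance: R^2 lost when the predictor is dropped
        "sr2_linear": r2_full - r2_without_linear,
        "sr2_quadratic": r2_full - r2_without_quadratic,
    }


def turning_point(b_linear, b_quadratic):
    """Standardized interest score at which the fitted curve turns."""
    return -b_linear / (2.0 * b_quadratic)


if __name__ == "__main__":
    # hypothetical scores: certainty is highest at both extremes of interest
    interest = [1, 2, 3, 4, 5, 6, 7]
    certainty = [5, 4, 3, 2, 3, 4, 5]
    print(quadratic_certainty_model(interest, certainty))
```

as a rough check, `turning_point(-0.42, 0.38)` applied to the math coefficients reported above gives about 0.55, which would place the lowest point of the fitted curve roughly half a standard deviation above mean interest. with 21 participants this location is imprecise and is best read as descriptive only.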
similar to what was observed in math, participants who reported lower or higher levels of interest in biology also reported greater certainty, compared with those who reported more moderate levels of interest.

2.2.3 astronomy

the model used to predict certainty of astronomy interest revealed a different pattern. neither a linear relationship, t(18) = 0.68, p = .50, b = 0.10, sr2 = .02, nor a quadratic relationship, t(18) = -0.18, p = .86, b = -0.02, sr2 < .01, emerged (see top right panel of figure 1).

2.2.4 psychology

the regression predicting certainty of psychology interest yielded a positive linear relationship of interest, t(18) = 3.34, p < .01, b = 0.24, sr2 = .12, as well as a negative quadratic relationship, t(18) = -2.56, p = .02, b = -0.19, sr2 = .07. the bottom right panel of figure 1 reveals a different quadratic relationship than was observed for math and biology. among this sample of upper-level psychology students, interest appears to be positively related to certainty, with the relationship leveling off at the highest levels of interest and certainty.

figure 1. curvilinear relationships tested between interest and certainty for each domain in the pilot study. math and biology (left panels) revealed statistically significant (p < .05) positive quadratic relationships, psychology (bottom right) revealed a significant negative quadratic relationship, and astronomy revealed no relationship (top right).

overall, these analyses of the relationships between level of interest and certainty reveal several things. first, the relationship between interest and certainty varies considerably by domain, such that positive quadratic relationships were observed in this sample for math and biology, a negative quadratic relationship was observed in this sample for psychology, and no relationship was observed for astronomy.
these varied relationships are likely due both to the amount of experience these participants have with each domain and to their high interest in psychology, given that participants were sampled from an upper-level psychology class. it seems likely that the quadratic relationships observed in math and biology are representative of most domains in which individuals have sufficient exposure to the domain to report their interest, assuming that the sample includes the full range of interest. in most cases when participants are asked to report their interest, they have sufficient experience from which to draw conclusions about and be aware of their level of interest (either high or low) in the domain. in contrast, we interpreted the absence of relationships found in the astronomy domain as likely due to participants having had little prior exposure to astronomy. lack of experience likely limited both their level of certainty as well as their extremity of interest in the domain. finally, the results for psychology revealed a different quadratic pattern from the other domains and reflect this sample’s high certainty and high interest in psychology.

to examine whether certainty did covary with participants’ prior experience, a final analysis was performed in which reports of certainty and the number of college courses reported were aggregated across participants. a correlation was calculated between the average number of courses students reported for each domain and the average level of certainty for each of the four domains. the correlation of the aggregated measures revealed a strong and positive association, r(2) = .99, p < .01, suggesting that the number of reported courses among participants in this sample was strongly and positively associated with level of certainty.
one could imagine that a more thorough measure of prior experience, including courses taken in secondary school or informal learning opportunities, would add further insight into this relationship.

although the sample for this pilot study was extremely small and included only participants who were highly interested in psychology, the results were promising enough to explore further. studies 1 and 2 were designed to examine these relationships in more depth within two domains, math and psychology, among participants drawn from a more general population.

3. study 1

study 1 was designed to replicate the results observed in the pilot study in the domain of math. given that certainty of interest is likely to increase as individuals have more exposure to domains, and that students are exposed to years of compulsory math in primary and secondary school, we expected the positive quadratic relationship between interest and certainty that was observed for math in the pilot study to also emerge in study 1. the second purpose of study 1 was to test the relationship between self-reported interest and behavior for those with lower or higher certainty. we hypothesized that self-reported interest in the domain would predict behavior in the domain positively and more strongly if individuals were more versus less certain of their interest.

3.1 method

3.1.1 design

this was a correlational study that took place in a laboratory context. participants’ math interest, certainty of math interest, and math-related behaviors were assessed in a single session.

3.1.2 participants

the participants were 138 undergraduate students (54% women) completing an introductory psychology course at a mid-sized university in the midwestern united states. they participated in the study for partial course credit. the sample included participants who reported their race or ethnicity as african american (24%), hispanic (16%), asian (9%), caucasian (48%), or as another, unlisted category (3%).
3.1.3 measures and procedure participants were invited into the lab individually and completed the measures using medialab software (jarvis, 2004). first, participants reported their interest in math using five items adapted from prior research (harackiewicz et al., 2008), including “i've always been fascinated by mathematics,” “i'm really excited about learning mathematics,” “i'm really looking forward to learning more about mathematics,” “i think mathematics is an important discipline,” and “i think mathematics is important for me to know.” participants responded to each item from 1 (strongly disagree) to 7 (strongly agree). the internal consistency of the scale items was strong (cronbach’s alpha = .90). participants then responded to six items designed to assess their certainty of interest (krosnick et al., 1993). these items were “how certain are you of your feelings toward mathematics?”, “how sure are you that your opinion of mathematics is correct?”, “how firm are your opinions of mathematics?”, “how easily could your opinion of mathematics be changed?” (reversed), “how definite are your views of mathematics?”, and “how convinced are you of your views of mathematics?” each item was rated from 1 (not at all ___) to 7 (very ___), in which the blank restated the word in the question that was presented in capital letters. the reversed item was omitted because it decreased the internal consistency of the scale, which left five items (final cronbach’s alpha = .91). participants also had the opportunity to report their engagement in math-related behaviors. because behavioral indicators are often influenced by many factors in a given situation, the three behavioral indicators were standardized and then combined into a composite.
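the two computations mentioned above, internal consistency and a standardized composite, can be sketched as follows. this is a minimal illustration that assumes the conventional formula for cronbach's alpha, k/(k − 1) × (1 − Σ item variances / variance of total scores), and a composite formed by averaging z-scores; the scores and the helper names (`cronbach_alpha`, `zscores`, `composite`) are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev, variance

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores across participants."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each participant's total score
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

def zscores(values):
    """Standardize scores to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(indicators):
    """Average the z-scored indicators for each participant."""
    standardized = [zscores(ind) for ind in indicators]
    return [mean(vals) for vals in zip(*standardized)]

# HYPOTHETICAL counts for five participants on three behavioral indicators.
indicator_a = [0, 2, 3, 5, 8]
indicator_b = [1, 1, 4, 6, 9]
indicator_c = [0, 1, 1, 3, 5]

behavior = composite([indicator_a, indicator_b, indicator_c])
alpha = cronbach_alpha([zscores(x) for x in (indicator_a, indicator_b, indicator_c)])
print([round(b, 2) for b in behavior], round(alpha, 2))
```

standardizing first keeps an indicator with a wider numeric range from dominating the composite.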
one set of items asked participants to reflect on the past two years and indicate whether or not they had voluntarily chosen to engage in 12 activities related to math, including “i have surfed a website about mathematics in my spare time,” “i have voluntarily discussed topics related to mathematics with friends or family,” “i have chosen to join a club related to mathematics,” and “i have spent free time reading a magazine article about mathematics.” the number of behaviors indicated was summed for each participant. two additional behavioral measures occurred during the session. participants were given a set of 10 math-related topics (e.g., “polygons and figures,” “pi,” “statistics,” “quadratic equations”) and asked to mark any about which they would like to receive more information via email (and to provide their contact information to do so). participants were also given the opportunity to watch any of 5 short video clips about math-related topics (e.g., pi, mental math techniques, square roots). the number of topics marked and the number of videos watched were each summed and served as additional measures of behavior. the standardized scores for all three measures were averaged to obtain the behavior composite (cronbach’s alpha = .69). 3.2 results and discussion an exploratory factor analysis using oblique rotation was conducted to explore whether the measure of certainty was distinct from that of interest. two eigenvalues over 1 emerged and the pattern matrix showed two factors with fairly simple structure. each item had a loading of at least .74 on its expected factor and no loading over .07 on the unexpected factor. the next analysis focused on testing the relationship between level of interest in math and certainty of math interest.
as was done in the pilot study, this was achieved by conducting a multiple regression analysis in which certainty served as the criterion variable, and a standardized measure of interest as well as its square served as the two predictors. this analysis revealed both a positive linear relationship of interest, t(135) = 4.68, p < .01, b = 0.38, sr2 = .12, and a positive quadratic relationship, t(135) = 6.02, p < .01, b = 0.47, sr2 = .20. comparable to the pattern that emerged in the pilot study with regard to math, those with either lower or higher levels of interest in math also reported more certainty. in contrast, those who reported a moderate amount of interest in math reported lower certainty (see figure 2). figure 2. curvilinear relationship observed between interest in math and certainty in study 1. the second analysis focused on whether the relationship between math interest and behavior would be positive and stronger for participants who reported greater certainty. to this end, a second regression analysis was conducted. the criterion variable was the composite measure of behavior and the three predictors included standardized measures of math interest, certainty of math interest, and their product. the analysis yielded a strong positive relationship of interest, t(134) = 4.67, p < .01, b = 0.32, sr2 = .12, and the predicted interaction, t(134) = 2.57, p = .01, b = 0.17, sr2 = .04, indicating that the relationship between interest and behavior varied depending on level of certainty. certainty was not a significant predictor. simple slope analyses were conducted to examine the relationship between interest and behavior separately for participants reporting certainty that was one standard deviation above and below the mean.
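a brief worked sketch of the simple-slope logic, assuming the conventional moderated-regression model behavior = b0 + b1·interest + b2·certainty + b3·(interest × certainty): the slope of interest at a given certainty value c is b1 + b3·c. because certainty was standardized, one standard deviation above or below the mean corresponds to c = +1 or −1. the coefficients are those reported above for study 1 (b1 = 0.32, b3 = 0.17); the function name is illustrative.

```python
def simple_slope(b_interest, b_interaction, certainty_z):
    """Slope of interest on behavior at a given standardized certainty value."""
    return b_interest + b_interaction * certainty_z

# Coefficients reported for study 1 (interest and interest-by-certainty product).
b_interest, b_interaction = 0.32, 0.17

high = simple_slope(b_interest, b_interaction, +1.0)   # one SD above mean certainty
low = simple_slope(b_interest, b_interaction, -1.0)    # one SD below mean certainty
print(round(high, 2), round(low, 2))  # 0.49 0.15
```

under this model, a positive interaction coefficient means the interest slope grows steeper as certainty increases.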
the relationship between math interest and behavior was significant and positive for those with high certainty, t(134) = 7.12, p < .01, b = 0.49, but not significant for those with low certainty, t(134) = 1.28, p = .20, b = 0.15 (see figure 3). figure 3. interaction depicting different relationships between math interest and behavior for those reporting lower (one sd below the mean) and higher (one sd above the mean) certainty in study 1. these data replicate the pattern observed in the pilot study between math interest and certainty, and also showed that individuals who are more certain of their interest are more likely to behave in ways that are consistent with their levels of interest. in other words, those who are more certain of having lower interest are less likely to engage whereas those who are more certain of having higher interest are especially likely to engage. an interesting picture begins to emerge with regard to participants who are less certain. what is learned from study 1 is that these participants’ behavior is not as closely associated with their interest. 4. study 2 study 1 offered additional evidence that individuals’ self-reported interest and certainty vary in a curvilinear way, and that participants with higher certainty are more likely to behave in ways that are consistent with their level of reported interest. study 2 was designed to test this again but in a different domain, psychology. psychology was chosen for two reasons. first, the pilot study showed that interest in psychology and certainty showed a different relationship (a negative quadratic relationship) in contrast to the positive quadratic relationship observed in study 1 as well as the two other domains examined in the pilot study. we suspected that the observed relationship between interest and certainty for psychology found in the pilot study was due to the sample being composed entirely of highly interested psychology students nearing the completion of their degree. 
that said, it could instead be due to the domain itself. we wanted to examine the relationship between interest and certainty in psychology, but with a more general sample. second, psychology offered the possibility of testing the ideas related to interest and certainty with regard to individual interest as well as situational interest, given that the students were completing introductory psychology at the time that the data were collected. whereas individual interest is thought of as an enduring person characteristic, situational interest refers to interest that is triggered by cues in the environment (hidi & renninger, 2006; mitchell, 1993; schraw & lehman, 2001). although individual interest and situational interest are often highly correlated, we reasoned that certainty may function differently for these two types of interest. among students enrolled in an introductory psychology class, it was possible to examine overall individual interest in psychology as well as situational interest in the class, and compare how each measure of interest predicted behavior when taking into account level of certainty. we did not make hypotheses about whether one measure of interest (individual or situational) would predict behavior more strongly than the other. 4.1 method 4.1.1 design this was a correlational study in which participants’ interest in psychology, certainty of their interest, and behaviors related to the domain of psychology were assessed. 4.1.2 participants the participants included 142 undergraduate students (55% women) completing an introductory psychology course at a mid-sized university in the midwest united states. they participated in the study for partial course credit. participants in the sample reported their race or ethnicity as african american (20%), hispanic (10%), both african american and hispanic (1%), asian (3%), caucasian (65%), or as another, unlisted category (1%). one person did not respond to the question about race/ethnicity. 
4.1.3 measures and procedure there were two measures of interest, both individual and situational. to report individual interest, participants rated whether “psychology is…” “interesting,” “stimulating,” “boring” (reversed), “engaging,” “meaningful,” “worthless” (reversed), and “useful” from 1 (not at all) to 7 (very much; schiefele, 1990). situational interest was measured with eight items reflecting the students’ level of interest in their introduction to psychology course (e.g., “what we are learning in psychology class this semester is fascinating to me” and “we are learning valuable things in psychology class this semester”; linnenbrink-garcia et al., 2010). participants responded to each item on a scale from 1 (strongly disagree) to 7 (strongly agree). the cronbach’s alphas for individual and situational interest were .88 and .95, respectively. participants reported their certainty immediately following both measures of interest. the items that measured certainty were the same as in study 1 and were general in that they did not specify whether participants should report certainty of individual or situational interest. cronbach’s alpha for the certainty measure was .88. at the end of the survey, reports of behavior were measured with two types of items, and the two types were standardized and combined into a composite in the same way as in study 1. in parallel with study 1, participants were asked whether or not they had engaged in 11 psychology-related behaviors in the past two years (e.g., “i have chosen to join a club related to psychology,” “i have spent free time reading a magazine article about psychology”). participants were also given the option of indicating whether they would like to receive information about various psychology-related topics, and if so, to mark their topic choices and provide their email address so this information could be sent. 
fifteen topics were listed, designed to capture broad areas of psychology (e.g., “how the brain works,” “mental illness,” “how memories form,” and “stereotyping and prejudice”). the total behaviors indicated and the total number of topics selected were both standardized and then averaged to form the composite measure of behavior. given that there were only two types of behaviors, cronbach’s alpha for this measure was modest (.48), which may attenuate the relationships observed between interest and behavior. in contrast to study 1, it was not possible to offer participants videos to watch because study 2 used pencil-and-paper surveys. 4.2 results and discussion 4.2.1 individual interest as in study 1, an exploratory factor analysis was conducted on the certainty and interest items in order to evaluate their structure. the structure was not as clean as in study 1, due to the two interest measures (individual and situational) having items that shared variability. this is not terribly surprising given the similarity in the constructs, methods of measurement, and timing. that said, the certainty items tended to load together and separately from the interest items, again attesting to the uniqueness of the certainty measure as a complement to typical measures of interest. we proceeded with the two separate measures of interest given their conceptual distinction but also recognize that the similarity in their measurement may hinder the ability to see predictive differences across the two measures. as in study 1, the first analysis was designed to test the relationship between level of interest and certainty, which was then followed by an analysis to test the relationship between self-reported interest and behavior, with the addition of certainty as a moderator. a multiple regression model was tested using certainty of interest as the criterion variable, and interest and its square as the predictors.
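the regression of certainty on interest and its square can be sketched as below. this is a minimal, dependency-free illustration, not the authors' analysis code: the ratings are hypothetical, the certainty scores are generated without noise from known coefficients so that the fit can be checked, and `solve` and `ols` are illustrative helpers that fit ordinary least squares through the normal equations.

```python
from statistics import mean, stdev

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

# HYPOTHETICAL 1-7 interest ratings; certainty is a noise-free U-shaped function of them.
interest = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7]
mu, sd = mean(interest), stdev(interest)
z = [(v - mu) / sd for v in interest]                  # standardize before squaring
certainty = [4.5 + 0.4 * v + 0.5 * v ** 2 for v in z]  # illustrative values only

intercept = [1.0] * len(z)
b0, b_linear, b_quadratic = ols([intercept, z, [v ** 2 for v in z]], certainty)
print(round(b0, 3), round(b_linear, 3), round(b_quadratic, 3))
```

a positive coefficient on the squared term corresponds to a u-shaped curve, with certainty lowest at moderate interest and higher toward either extreme.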
as in study 1, the measures of interest were standardized prior to calculating the squared term. replicating the relationship observed in study 1 with the domain of math, individual interest in psychology had both a linear relationship, t(139) = 5.68, p < .01, b = 0.53, sr2 = .18, and a quadratic relationship with certainty, t(139) = 4.00, p < .01, b = 0.22, sr2 = .09. similar to math, and unlike the relationship observed in the pilot study, certainty was higher for those with lower or higher interest in psychology, and lower for those who reported more moderate interest. next, to test whether interest was a stronger predictor of behavior for those with more certain interest, a multiple regression model was tested in which the behavior composite was the criterion variable and the three predictors were the standardized composite measure of individual interest, the standardized measure of certainty, and their product. this analysis revealed a positive relationship between interest and behavior, t(138) = 5.09, p < .01, b = 0.34, sr2 = .10, as well as an interaction, t(138) = 1.99, p < .05, b = 0.12, sr2 = .02 (see figure 4). simple slope tests were conducted to examine the relationship between interest and behavior for those scoring one standard deviation below and above the mean of certainty. these analyses revealed that the relationship between individual interest and behavior was significant and positive for both, but stronger for those with higher certainty, t(138) = 6.04, p < .01, b = 0.46, than for those with lower certainty, t(138) = 2.17, p = .03, b = 0.22. figure 4. interaction depicting different relationships between psychology interest and behavior for those reporting lower (one sd below the mean) and higher (one sd above the mean) certainty in study 2. these relationships replicate the patterns that were observed in study 1.
furthermore, the pattern between interest in psychology and certainty that was observed in the pilot study seems to have been due to the sampling of participants for the pilot study and not to the domain. 4.2.2 situational interest the next set of analyses was parallel to those described for individual interest, but instead focused on situational interest. when situational interest in the psychology class was used to predict certainty, both a linear relationship, t(139) = 5.48, p < .01, b = 0.48, sr2 = .17, and a quadratic relationship, t(139) = 3.17, p = .02, b = 0.21, sr2 = .06, emerged. participants who reported more extreme levels of situational interest also reported greater certainty whereas those who reported more moderate levels of situational interest reported less certainty. when situational interest, certainty, and the interaction between them were used to predict the composite measure of behavior, both situational interest, t(138) = 2.71, p < .01, b = 0.19, sr2 = .05, and the interaction, t(138) = 2.29, p = .02, b = 0.15, sr2 = .03, were significant. simple slopes were tested and showed that situational interest positively predicted behavior for those with higher certainty, t(138) = 4.10, p < .01, b = 0.34, but did not predict behavior for those with lower certainty, t(138) = 0.40, p = .69, b = 0.04. the results for both individual and situational interest demonstrate that certainty may be useful in better predicting behavior with both types of interest. if anything, the interaction was slightly more pronounced for situational than for individual interest, although this difference was not tested directly. one explanation is that the salience of an ongoing situation may be a vivid motivator of behavior toward (or away from) domain content. however, this pattern is also tied to the nature of the situation assessed here.
given that the situation that defined situational interest in this study was in reference to a semester-long, introductory class, situational interest as well as the behavior were in reference to a fairly broad definition of the field in general. other situations that are more narrow (e.g., focused on a particular topic) may not predict behavior as strongly, especially if the behavioral opportunity is more broad or more narrow than the experience of situational interest. 5. general discussion this research lays out some of the complexity inherent in asking individuals to self-report their interest and recognizes that these self-reports may be more or less certain. for domains in which individuals have varied amounts of experiences and sufficient prior exposure (e.g., math, psychology, and biology), there appears to be a curvilinear relationship between self-reported interest and self-reported certainty. individuals who reported more extreme levels of interest (either high or low) tended to report being more certain of their interest. in other words, when individuals were quite interested in a domain, they were sure of the presence of their interest; when individuals were quite disinterested in a domain, they were also sure of their lack of interest. moreover, as predicted, level of interest was a stronger predictor of behavior when certainty was high than when certainty was low. it is not clear from these data whether participants who are less certain (and more moderate in their interest) have truly neutral beliefs about the domain or if they actually have both positive and negative experiences, which are best captured as neutral on this bidirectional scale. 5.1 promises and pitfalls of self-report measures certainty of interest highlights both a promise and a pitfall of using self-report measures to assess interest. this research speaks to two of the three main questions guiding the compilation of this special issue. 
first, this research relates to the complexity of interpreting self-report data. on the one hand, if a sample has high certainty of their interest, then self-report measures may be easier to interpret. the certainty with which people report their interest may support processes that strengthen associations between interest and various behavioral outcomes. in considering this promise, however, it is also critical to consider the pitfall: when people are not certain, they will still provide a response but that response may not be rooted in as much experience, which could make interpretation difficult. if participants are asked to provide reports of interest in domains in which they have little experience, their reports will be ill-informed. this difficulty adds to the challenge of inattentiveness, examined by iaconelli and wolters (2020). the interpretation of self-report data will likely be murky not only when people are responding inattentively, but also when they are attentive but do not have sufficient self-knowledge or experience with which to provide a meaningful response. although the current work centers on the certainty with which individuals can identify their interests, certainty may extend to other types of reports as well, such as certainty in metacognition and strategy use. other authors in this issue (e.g., rogiers et al., 2020; van halem et al., 2020) report how self-reported measures link to trace measures that occur during task engagement, and to subsequent behavior. these relationships may be stronger to the extent that participants are more certain of their self-reported metacognition and strategy use. second, this research also highlights constraints of self-report data that have implications for methodology. self-reported domain interest may be less valid when interest is in early phases of development (e.g., renninger & hidi, 2011), and this research suggests that certainty may capture an important element.
it may take months or years for individuals to collect information about their response to a given domain in order to have certainty in their level of interest (either high or low). presumably, the development of this certainty will emerge as individuals have experiences with a domain and then think back to them retrospectively (see dinsmore et al., this issue, for a discussion of retrospective processes). if interest must be measured among a sample with limited certainty of interest, one approach may be to provide supports for them to know how to respond (e.g., definitions of the domain, a particular experience to reflect on) in order to provide a more valid measure, or to assess interest multiple times during a task (moeller et al., 2020) rather than relying on a global assessment of domain interest. alternatively, researchers may want to consider certainty when selecting domains to study and identifying appropriate samples within those domains. it is noteworthy that the sample in the pilot study had relatively little certainty about their interest in astronomy and very strong certainty about their interest in psychology. given the results of studies 1 and 2, this might also have implications for behavior. if the purpose of a research study is to use interest to predict behavior, then it may be wise to select a domain in which participants have considerable experience (e.g., math), because interest may better predict behavior if the sample as a whole has more experience with the domain. that said, it is also important to have variability in the sample so that it represents the full range of interest with which to predict behavior. the pilot study included a sample composed entirely of students in their last year of studying psychology. although this sample reported very high certainty, restriction of range in their interest is likely to have limited any observed association between level of interest and behavior, had behavior been assessed.
5.2 implications for theory the observed relationship between interest and certainty may be related to participants’ experiences with domain content, and may capture the changing aspect of interest within a developmental trajectory. data from the pilot study showed that the number of classes students took in each domain positively predicted certainty in a linear fashion. prior experience also varied along with the different observed relationships between interest and certainty across the domains. no reliable relationship was detected in the domain of astronomy, likely because participants had such limited experiences in the domain, and the opposite quadratic relationship was detected in the domain of psychology, presumably because students had extremely high interest and certainty. although these fluctuations are consistent with research on attitudes showing that direct experience contributes to attitude certainty (see review by glasman & albarracin, 2006), the nuances of how this occurs are important to consider for understanding how individuals come to recognize and be aware of their interest. one possibility is that the extremity of the emotion prompts awareness of the experience and contributes to certainty. individuals who have relatively intense experiences in a domain (experiences that are either highly interesting or highly distasteful) may come to realize their level of interest and have greater certainty (dutta et al., 1972). in contrast, those who have more mixed or vague reactions may be less aware of their emotional reaction to the domain content, which then leaves them with less certainty of their interest. another possibility is that the clarity or vividness of individuals’ memories contributes to certainty. for example, those who have had multiple, semester-long courses in the domain may have more vivid memories of learning in that domain, which may contribute to greater certainty.
along these lines, it may be fruitful to bring research on autobiographical memory into research on interest in order to better understand how individuals recall prior experiences that may inform their report of interest. research suggests that the valence of how an event ends has a disproportionate influence on how the event is remembered (kahneman et al., 1993). research on interest may benefit by building on this foundation from the memory literature in order to better understand not only how interest develops, but whether the timing of experiences impacts how people recall the events that they come to see as foundational to their perception of interest in the domain. 5.3 implications for research certainty may also provide a handle for predicting whose interest may be more or less altered by new experiences with a domain. for example, individuals who are less certain of their interest may be more responsive to situational variables designed to affect interest. less certain individuals may be more open to collecting information through their experiences with domain content and updating their level of interest. if so, situational enhancements designed to foster interest may be more effective for low versus high certain individuals. as such, it may be beneficial to assess certainty of interest in research testing interventions designed to foster interest. the effects of a situational intervention may be positive for a subset of individuals (i.e., those with lower certainty), but this effect may be masked by others who are more certain and therefore less likely to change their level of interest. in a sense, if one thinks of situational interest as being analogous to attitudes in the face of persuasive attempts, then the attitudes literature provides some hints of this possibility as well. for example, tormala (2016) has noted how curiosity is itself a form of “interested uncertainty” which, in some situations, can aid persuasion. 
specifically, kupor and tormala (2015, study 3) reason that curiosity motivates more thorough processing of the persuasive message and increases the message’s impact. this may suggest that a similar process could explain differential impact of situational enhancements that are designed to foster interest. in general, future directions focused on underlying cognitive and affective processes are warranted and would very much help illuminate how interest may or may not change in response to new experiences. 5.4 differences between attitudes and interest although this research was initiated because of similarity between interest and attitudes, these constructs are not the same, which has implications for how they emerge and function. for example, certainty in the attitude literature has been divided into correctness and clarity (petrocelli et al., 2007). whereas correctness reflects the veraciousness of an attitude in an absolute sense (e.g., efforts to slow global warming is the correct perspective), clarity is the extent to which individuals are confident that they know their own stance (i.e., i know my attitude about efforts to slow global warming). applied to interest, clarity is more relevant than correctness if one assumes a stance of relativism, in which interest in one domain is not more valuable or correct than interest in any other domain. a difference between the attitude and interest literatures also emerges when one considers what is being evaluated when self-assessing an attitude versus self-assessing level of interest. attitude objects are typically outside the person and the question is whether the person agrees with or disagrees with the attitude object. this is in contrast to interest in which the person assesses how they interact with the domain. when people evaluate their interest, they evaluate their personal experiences with it and how they respond to the domain. 
as such, assessments of an attitude object may be more about the object and less about how the person interacts with the object. for interest, in contrast, the assessment is about one’s own response to the domain. 5.5 limitations finally, although positive relationships between interest and behavior were observed in the domains of math and psychology, the direction of causality is not clear in the present correlational studies. the theoretical model describing the relationship between interest and behavior typically places interest as the motivation (i.e., cause) of behavior; however, it is also worth pointing out that behavior may influence self-reports of interest as well as certainty. research on attitudes has addressed several of these processes (see review by olson & stone, 2005). when individuals are unsure of their attitudes, they may reflect on their past behavior as information that can be used to inform their attitude (e.g., bem, 1967). for example, when asked about global warming, individuals may scan their memories for events in which they chose to engage in pro-environment activities or not. these memories may lead them to decide on a particular self-reported attitude. a similar process may operate with regard to interest, especially when individuals have less certainty of their interest. for example, when participants in the pilot study were asked about their interest in astronomy, the domain in which they had taken the fewest classes, they may have tried to recall relevant memories. those who could generate more positive memories are likely to have rated their interest higher than those who could generate fewer memories, or negative memories. it is also worthwhile to consider how behaviors may affect self-reported interest ratings, which can also explain the observed relationships between interest and behavior found in this set of studies. 
in studies 1 and 2, participants first reported their interest, then their certainty of interest, and finally responded to behavioral opportunities. it is possible that participants felt pressure to behave in a way that was consistent with their initial reports, which may have strengthened the observed results (olson & stone, 2005). specifically, those who had just reported low or high levels of interest may have felt internal pressure to behave in ways that were consistent with their reports, and this may have been especially strong for those who reported greater certainty. the present research does not address this possibility, but opens up a line of research in which these ideas could be explored. finally, specific features of the studies reported here also warrant caution in drawing broad conclusions. the pilot study examined prior exposure to various domains by collecting information on students’ course experiences. however, experiences in high school courses and extracurricular activities may have also provided opportunities for exposure. inclusion of all these experiences could help paint a fuller picture of the relationship between domain exposure and certainty. furthermore, these studies involve a very narrow sample of individuals, namely undergraduate students at a single university who are taking a psychology course. for example, it is healthy to question whether similar patterns would be observed among a sample of older adults (e.g., who may have more experience and greater certainty) or younger children (i.e., who may have even less experience than the sample tested in the current research). moreover, these psychology students may have been especially sensitive to environmental cues or demand characteristics related to behaving in ways that are more consistent with their stated interest. this tendency would be exaggerated if participants felt that the desired response involved giving higher interest ratings and engaging in more behaviors.
these questions cannot be answered with the current data but could be tested in the future. 5.6 concluding thoughts in summary, when interest is self-reported, the report reflects a synthesis of what individuals have available to them at the moment of measurement, and these reports are more certain for some participants than for others. this variation in certainty may provide a lens for better understanding how interest develops and how it is internalized and becomes known to individuals. as with any type of measure, it is critical to interpret the data in light of the assumptions, capacities, limitations, and processes that are relevant at the time of measurement. keypoints self-reported interest varies in certainty across individuals those with greater certainty self-report more extreme interest (high or low) self-reported interest and behaviour correlate more strongly for those with greater certainty references ainley, m., & ainley, j. (2011). a cultural perspective on the structure of student interest in science. international journal of science education, 33, 51-71. https://doi.org/10.1080/09500693.2010.518640 ainley, m., hidi, s., & berndorff, d. (2002). interest, learning, and the psychological processes that mediate their relationship. journal of educational psychology, 94, 545-561. https://doi.org/10.1037//0022-0663.94.3.545 ajzen, i. (1991). the theory of planned behavior. organizational behavior and human decision processes, 50, 179-211. https://doi.org/10.1016/0749-5978(91)90020-t bem, d. j. (1967). self-perception: an alternative interpretation of cognitive dissonance phenomena. psychological review, 74, 183-200. https://doi.org/10.1037/h0024835 berger, i. e. (1999). the influence of advertising frequency on attitude-behavior consistency: a memory based analysis. journal of social behavior & personality, 14, 547–568. bizer, g. y., tormala, z. l., rucker, d. d., & petty, r. e. (2006). 
memory-based versus on-line processing: implications for attitude strength. journal of experimental social psychology, 42, 646–653. https://doi.org/10.1016/j.jesp.2005.09.002 cheatham, l. b., & tormala, z. l. (2017). the curvilinear relationship between attitude certainty and attitudinal advocacy. personality and social psychology bulletin, 43, 3-16. https://doi.org/10.1177/0146167216673349 deci, e. l. (1992). the relation of interest to the motivation of behavior: a self-determination theory perspective. in k. a. renninger, s. hidi, & a. krapp, a. (eds.),the role of interest in learning and development (pp. 43-70) . hillsdale, nj: lawrence erlbaum. dutta, s., kanungo, r. n; & freibergs, v. (1972). retention of affective material: effects of intensity of affect on retrieval. journal of personality and social psychology, 23, 64–80. https://doi.org/10.1037/h0032790 fazio, r. h., & zanna, m. p. (1978). attitudinal qualities relating to the strength of the attitude– behavior relationship. journal of experimental social psychology, 14, 398–408. https://doi.org/10.1016/0022-1031(78)90035-5 glasman, l. r., & albarracin, d. (2006). forming attitudes that predict future behavior: a meta-analysis of the attitude-behavior relation. psychological bulletin, 132, 778-822. https://doi.org/10.1037/0033-2909.132.5.778 gross, s. r., holtz, r., & miller, n. (1995). attitude certainty. in r. e. petty & j. a. krosnick (eds), attitude strength: antecedents and consequences (pp.215-245). mahwah, nj: lawrence erlbaum. harackiewicz, j. m., durik, a. m., barron, k. e., linnenbrink-garcia, l., & tauer, j. m. (2008). the role of achievement goals in the development of interest: reciprocal relations between achievement goals, interest and performance. journal of educational psychology, 100(1), 105-122. https://doi.org/10.1037/0022-0663.100.1.105 hidi, s. & renninger, k.a. (2006). the four-phase model of interest development. educational psychologist, 41, 111-127. 
https://doi.org/10.1207/s15326985ep4102_4 iaconelli, r. & wolters c.a. (2020). insufficient effort responding in surveys assessing self-regulated learning: nuisance or fatal flaw? frontline learning research. 8 (3) 104 – 125. https://doi.org/10.14786/flr.v8i3.521 jarvis, w. b. g. (2004). medialab [computer software]. new york: empirisoft. jonas, k., diehl, m., & broemer, p. (1997). effects of attitudinal ambivalence on information processing and attitude-intention consistency. journal of experimental social psychology, 33, 190–210. https://doi.org/10.1006/jesp.1996.1317 kahneman, d., fredrickson, b. l., schreiber, c. a., & redelmeier, d. a. (1993). when more pain is preferred to less: adding a better end. psychological science. 4, 401–405. https://doi.org/10.1111/j.1467-9280.1993.tb00589.x krapp, a. (2002). an educational-psychological theory of interest and its relation to sdt. in. e. l. deci & r. m. ryan (eds.), handbook of self-determination research (pp. 405-427). rochester, ny: university of rochester press. krapp, a., & prenzel, m. (2011). research on interest in science: theories, methods, and findings. international journal of science education, 33, 27-50. https://doi.org/10.1080/09500693.2010.518645 krishnan, h. s., & smith, r. e. (1998). the relative endurance of attitudes, confidence and attitude-behavior consistency: the role of information source and delay. journal of consumer psychology, 7, 273–298. https://doi.org/10.1207/s15327663jcp0703_03 krosnick, j. a., boninger, d. s., chuang, y. c., berent, m. k., & carnot, c. g. (1993). attitude strength: one construct or many related constructs? journal of personality and social psychology, 65, 1132-1151. https://doi.org/ 10.1037/0022-3514.65.6.1132 krosnick, j. a., & petty, r. e. (1995). attitude strength: an overview. in r. e. petty & j. a. krosnick (eds.), attitude strength: antecedents and consequences (pp. 1-24). mahwah, nj: lawrence erlbaum. kupor, d. m., & tormala, z. l. (2015). 
persuasion, interrupted: the effect of momentary interruptions on message processing and persuasion. journal of consumer research, 42, 300–315. https://doi.org/ 10.1093/jcr/ucv018 linnenbrink-garcia, l., durik, a. m., conley, a. m., barron, k. e., tauer, j. m., karabenick, s. a., & harackiewicz, j. m. (2010). measuring situational interest in academic domains. educational and psychological measurement, 70, 647-671. https://doi.org/10.1177/0013164409355699 mitchell, m. (1993). situational interest: its multifaceted structure in the secondary school mathematics classroom. journal of educational psychology, 85, 424–436. https://doi.org/10.1037/0022-0663.85.3.424 moeller, j., viljaranta, j., kracke, b., & dietrich, j. (2020). disentangling objective characteristics of learning situations from subjective perceptions thereof, using an experience sampling method design. frontline learning research. 8 (3) 63-84. https://doi.org/10.14786/flr.v8i3.529 nunnally, j. c., & bernstein, i. a. (1994). psychometric theory (3rd ed.). new york: mcgraw-hill. olson, j. m., & stone, j. (2005). the influence of behavior on attitudes. in d. albarracin, b. t. johnson, & m. p. zanna (eds.), the handbook of attitudes (pp. 223-271). mahwah, nj: lawrence erlbaum. petrocelli j. v., tormala, z. l., & rucker, d. d. (2007). unpacking attitude certainty: attitude clarity and attitude correctness. journal of personality and social psychology, 92, 30-41. https://doi.org/10.1037/0022-3514.92.1.30 pomerantz, e. m., chaiken, s., & tordesillas, r. s. (1995). attitude strength and resistance processes. journal of personality and social psychology, 69, 408-419. https://doi.org/10.1037/0022-3514.69.3.408 prislin, r., wood, w., & pool, g. j. (1998). structural consistency and the deduction of novel from existing attitudes. journal of experimental social psychology, 34, 66–89. https://doi.org/10.1006/jesp.1997.1343 renninger, k. a. (1990). children’s play interests, representation, and activity. in r. 
fivush & k. hudson (eds.), knowing and remembering in young children (pp. 127-165). new york: cambridge university press. renninger, k. a. (2000). individual interest and its implications for understanding intrinsic motivation. in c. sansone and j. m. harackiewicz (eds.), intrinsic and extrinsic motivation: the search for optimal motivation and performance (pp. 373-404). san diego, ca: academic press, inc. renninger, k. a., & hidi, s. (2011). revisiting the conceptualization, measurement, and generation of interest. educational psychologist, 46, 168-184. https://doi.org/10.1080/00461520.2011.587723 renninger, k. a., hidi, s., & krapp, a. (1992). the role of interest in learning and development. hillsdale, nj: lawrence erlbaum. renninger, k. a., & su, s. (2012). interest and its development. in r. m. ryan (ed.), the oxford handbook of motivation (pp. 167-187). oxford: oxford university. rogiers, a., merchie, e., & van keer h. (2020). opening the black box of students’ text-learning processes: a process mining perspective. frontline learning research. 8(3), 40–62. https://doi.org/10.14786/flr.v8i3.527 rucker, d. d., tormala, z. l., petty, r. e., & briñol, p. (2014). consumer conviction and commitment: an appraisal-based framework for attitude certainty. journal of consumer psychology, 24 (1), 119-136. https://doi.org/10.1016/j.jcps.2013.07.001 schiefele, u. (1991). interest, learning, and motivation. educational psychologist, 26, 299-323. https://doi.org/10.1080/00461520.1991.9653136 schiefele, u. (1999). interest and learning from text. scientific studies of reading, 3, 257-279. https://doi.org/10.1207/s1532799xssr0303_4 schraw, g., & lehman, s. (2001). situational interest: a review of the literature and directions for future research. educational psychology review, 13, 23-52. https://doi.org/10.1023/a:1009004801455 silvia, p. j. (2005). what is interesting? exploring the appraisal structure of interest. emotion, 5, 89–102. 
https://doi.org/10.1037/1528-3542.5.1.89 silvia, p. j. (2006). exploring the psychology of interest. new york: oxford university press. silvia, p.j. (2008). appraisal components and emotion traits: examining the appraisal basis of trait curiosity. cognition and emotion, 22, 94–113. https://doi.org/10.1080/02699930701298481 simpkins, s. d., davis-kean, p. e., & eccles, j. s. (2006). math and science motivation: a longitudinal examination of the links between choices and beliefs. developmental psychology, 42, 70-83. https://doi.org/10.1037/0012-1649.42.1.70 tormala, z. l. (2016). the role of certainty (and uncertainty) in attitudes and persuasion. current opinion in psychology, 10, 6-11. https://doi.org/10.1016/j.copsyc.2015.10.017 tormala, z. l., & rucker, d. d. (2007). attitude certainty: a review of past findings and emerging perspectives. social and personality psychology compass, 1, 469-492. https://doi.org/10.1111/j.1751-9004.2007.00025.x van halem, n., van klaveren, c., drachsler h., schmitz, m., & cornelisz, i. (2020). tracking patterns in self-regulated learning using students’ self-reports and online trace data. frontline learning research. 8 (3) 140-163. https://doi.org/10.14786/flr.v8i3.497 wijnia, l., loyens, s. m. m., derous, e., & schmidt, h. g. (2014). do students’ topic interest and tutors’ instructional style matter in problem-based learning? journal of educational psychology, 106, 919-933. https://doi.org/10.1037/a0037119 frontline learning research 1 (2013) 81 96 issn 2295-3159 corresponding author: kerry lee, national institute of education, 1 nanyang walk, singapore 637616, kerry.lee@nie.edu.sg, t +65 6219 3888, f +65 6896 9845 http://dx.doi.org/10.14786/flr.v1i1.49 81 | f l r longer bars for bigger numbers? 
children’s usage and understanding of graphical representations of algebraic problems kerry lee, kiat hui khng, swee fong ng, jeremy ng lan kong national institute of education, nanyang technological university, singapore article received 7 june 2013 / revised 19 july 2013 / accepted 16 august 2013 / available online 27 august 2013 abstract in singapore, primary school students are taught to use bar diagrams to represent known and unknown values in algebraic word problems. however, little is known about students’ understanding of these graphical representations. we investigated whether students use and think of the bar diagrams in a concrete or a more abstract fashion. we also examined whether usage and understanding varied with grade. secondary 2 (n = 68, mage = 13.9 years) and primary 5 students (n = 110, mage = 11.1 years) were administered a production task in which they drew bar diagrams of algebraic word problems with operands of varying magnitude. in the validation task, they were presented with different bar diagrams for the same word problems and were asked to ascertain, and give explanations regarding the accuracy of the diagrams. the küchemann algebra test was administered to the secondary 2 students. students from both grades drew longer bars to represent larger numbers. in contrast, findings from the validation task showed a more abstract appreciation for how the bar diagrams can be used. primary 5 students who showed more abstract appreciations in the validation task were less likely to use the bar diagrams in a concrete fashion in the production task. performance on the küchemann algebra test was unrelated to performance on the production task or the validation task. the findings are discussed in terms of a production deficit, with students exhibiting a more sophisticated understanding of bar diagrams than is demonstrated by their usage. keywords: algebra; pre-algebra; graphical representation; mathematical understanding k. lee et al. 82 | f l r 1. 
introduction singapore has performed well in recent international tests of mathematics (mullis, martin, gonzalez, & chrostowski, 2004; oecd, 2010). perhaps because of this, there has been much interest in the singapore mathematics curriculum, with some schools in other countries reportedly having adopted it (e.g., hu, 2010). the wisdom of such cross-country adoption aside, one distinctive feature of the singapore curriculum is that algebraic thinking is introduced early. unlike in many countries where algebra is introduced in the secondary or high school years, mathematical problems with an algebraic structure are taught in the senior primary years (grades 4 – 6). algebra is recognised widely as an important pillar for both academic and economic success (national council of teachers of mathematics, 2000; national mathematics advisory panel, 2008). primary school children have been shown to be capable of exhibiting algebraic thinking (e.g., carpenter & levi, 2000; carraher, schliemann, brizuela, & earnest, 2006; ng & lee, 2009; swafford & langrall, 2000; warren & cooper, 2005; warren & cooper, 2009). however, whether this understanding is similar to that of older students has not been studied widely. certainly, algebra can be difficult, even for college students. some of the documented difficulties include intrusion of arithmetic reasoning (e.g., khng & lee, 2009; ng, 2003; stacey & macgregor, 1999), difficulties translating word problems to equations (e.g., capraro & joffrion, 2006; duru, 2011; hefferman & koedinger, 1997), problems with the concept of equivalence (e.g., hunter, 2007; kieran, 1981; knuth, stephens, mcneil, & alibali, 2006; steinberg, sleeman, & ktorza, 1991) and poor understanding of the concept of variables (e.g., küchemann, 1978). in this study, we focused on an important pedagogical device for providing children with earlier access to algebraic problems.
in the latter part of primary 4 (grade 4, ~10 years old), children in singapore are introduced to algebraic or start-unknown word problems. instead of symbolic algebra, children are taught a graphical heuristic in which they draw bar diagrams to represent known and unknown quantities (ng & lee, 2005). we examined children’s usage and understanding of this heuristic.
1.1 an early start to learning algebra
this graphical heuristic, also called the model method, provides students with access to problems that would otherwise require symbolic algebra. three different types of graphical models are commonly taught: (a) part-whole, (b) comparison, and (c) multiplication and division models (ng & lee, 2009). to illustrate its operation, take, for example, a simple question: mary and john have 6 marbles altogether. john has 2 more marbles than mary. how many marbles does mary have? figure 1 shows the graphical and letter-symbolic approaches to the problem. in the graphical approach, students draw rectangular bars to represent the number of marbles carried by mary and john. the difference between the two quantities is shown by drawing one bar longer than the other. because the quantitative difference is specified in the question, john’s bar is drawn longer and the quantity represented by the difference in length, the difference unit, is labelled as 2. with this graphical representation, children typically proceed with a variety of arithmetic strategies, such as unwinding or guess-and-check (nathan & koedinger, 2000), to arrive at the solution.
(a) solution by the model method: 6 – 2 = 4; 2 units = 4; 1 unit = 2; mary has 2 marbles.
(b) solution by letter-symbolic algebra: let the number of marbles mary has be x. john has x + 2 number of marbles. x + x + 2 = 6; 2x = 4; x = 2; mary has 2 marbles.
figure 1. model method and the symbolic algebra approach to the question “mary and john have 6 marbles altogether. john has 2 more marbles than mary.
how many marbles does mary have?” the bar labelled “2” in (a) represents the difference unit. formal algebraic notations in the form of letter symbols and related expressions were not introduced till some time into grade 6. although both the model method and symbolic algebra require students to translate information in a word problem to an alternative representation, there are some fundamental differences. symbolic algebra requires students to work directly with unknown quantities. an equation comprising both known and unknown values is formed using forward operations and the unknown is solved by constructing a series of equivalent expressions. students operate on the equation in a way that maintains symmetric equivalence across the two sides of the equals sign. in contrast, the graphical approach can trace its roots to pedagogical tools used in the early primary school years. beginning in lower primary, familiar objects and pictures (e.g., pictures of bears and dolls) are used to depict known quantities as an aid to understanding arithmetic word problems. with the graphical approach, a standardized representational tool is used to represent both known and unknown quantities. of course, unknown quantities cannot be depicted exactly. instead, children are taught to draw a diagram with the unknown quantities depicted by bars of arbitrary length, which are constrained by the given quantities and their quantitative relations. computation of solution is effected using arithmetic procedures in which students work only with known quantities. in other words, the unknown is solved by the direct application of arithmetic operations on known values (e.g., backward operations such as unwinding). unlike symbolic algebra, with the graphical approach, the equals sign is generally used as a directive to calculate instead of representing equality (see khng & lee, 2009, for more details). 
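the contrast between the two routes in figure 1 can be sketched in python. this is only an illustrative sketch: the function names are hypothetical, and the code assumes the simple two-protagonist comparison problem used in the example (a known total and a known difference).

```python
from fractions import Fraction


def solve_by_model_method(total, difference):
    """Arithmetic route of the bar diagram: operate on known quantities only."""
    remainder = total - difference   # remove the difference unit: 6 - 2 = 4
    unit = Fraction(remainder, 2)    # two equal units remain: 4 / 2 = 2
    return unit                      # one unit is the smaller share (mary's)


def solve_by_symbolic_algebra(total, difference):
    """Algebraic route: form x + (x + difference) = total, then isolate x."""
    coefficient = 1 + 1              # x + x collapses to 2x
    constant = difference            # the left side is 2x + difference
    # Subtract the constant from both sides, then divide both sides by the
    # coefficient, so the two sides stay equivalent at every step.
    return Fraction(total - constant, coefficient)


mary_model = solve_by_model_method(6, 2)
mary_algebra = solve_by_symbolic_algebra(6, 2)
print(mary_model, mary_algebra)
```

both functions return the same value; the difference lies in what is manipulated. the first never names the unknown and works backwards from known values, whereas the second builds an equation containing the unknown and transforms both sides together.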
some recent work on the use of the model method was motivated by parents’ and teachers’ concerns that the model method may confuse students when the time comes for them to learn letter symbolic algebra, with some concerned that the two approaches may draw on different cognitive processes (e.g., kwokwc, 2011; lim, 2007). the model method is taught system-wide and has been part of the national curriculum in singapore for over a decade. for this reason, it is difficult to evaluate these claims using standard programme evaluation methodology. in two recent studies, lee and his colleagues used functional magnetic resonance imaging techniques to examine the cognitive underpinnings of these two approaches (lee et al., 2007; lee et al., 2010). the findings showed substantial overlap between the two approaches, but symbolic algebra more strongly activated areas associated with attention and working memory engagement. these findings suggest that symbolic algebra is more demanding of cognitive resources than the model method. from a pedagogical viewpoint, introducing the model method prior to symbolic algebra can thus be interpreted as being consistent with their respective cognitive demands.
1.2 the model method, letter symbols, and variables
regarding the cognitive factors that influence children’s success in algebra, there have been a number of recent studies on both domain-general and domain-specific correlates of algebraic performance (e.g., fuchs et al., 2012; lee, ng, bull, pe, & ho, 2011; tolar, lederberg, & fletcher, 2009; wei, yuan, chen, & zhou, 2012). on the specific influence of using diagrams, there is a large body of research that has examined whether students learn better when text is accompanied by diagrams (mayer, 1989, 2002) or when students generate the diagrams themselves (meter & garner, 2005). of particular relevance are a number of studies conducted by koedinger and his colleagues.
they investigated the use of “picture algebra”, a strategy similar to the model method. they found that by using this strategy, even students in grade 6 were successful in solving algebraic problems that are known to be challenging for older students (koedinger & terao, 2002). the strategy was also found to be effective for lower achieving pre-algebra students (booth & koedinger, 2010). although we now have some information on the efficacy of the model method and some of the contexts in which they are more likely to be efficacious, an important issue on which we know little is how students understand or perceive these graphical representations. compared to pedagogical practices in earlier grades in which familiar objects are used to depict operands in arithmetic in a one-to-one manner, the model method involves a greater degree of abstraction. instead of familiar and discrete objects (e.g., bears or dolls), bars of different lengths are used. nonetheless, given students’ earlier experiences, it is possible that they retain a concrete way of thinking about the bars, such that a bar of a certain length is deemed capable of holding, say, ten bears and ten bears alone. in teaching the model method, teachers generally ask children to use relatively longer bars for bigger numbers (ng & lee, 2009). more important than absolute length is that, within a problem, children are taught that the lengths of the bars should preserve the quantitative relations between the protagonists. this is especially when more than two protagonists are involved in a problem. how students understand these graphical representations is important because when they learn symbolic algebra, a concept that they should understand is that letter symbols (e.g., x and y) denote variables. 
although the model method gives children earlier access to algebraic questions without the use of letter symbols, the bars that are used to depict quantities play a similar role as do letter symbols in algebraic equations. both serve to depict relations between known and unknown quantities. if children regard the bar diagrams in a concrete manner, one potential drawback is that they may over-generalise and regard x and y as depicting unknown constants. a related concern was raised by dede (2004), who argued that the preponderance of questions that use letter-symbols to represent unknowns could result in students developing a restricted view of the roles and functions of variables. the concept of a variable is important in algebra, but it is also difficult, perhaps because it has different meanings or usages. usiskin (1988) argued that variables are used in different ways: (a) as an unknown or a constant, (b) a pattern generalizer, (c) an argument or parameter, or (d) an arbitrary mark on paper. when students are first introduced to letter symbols, they typically encounter them as unknowns in solve-for-x questions, where students are asked to find a solution for the letter symbols (e.g., x + y = 47, x + 13 = y. what is the value of x?). as dede (2004) argued, one concern is that given the preponderance of experiences with this usage of letter symbols, children come to take this as the norm and neglect to entertain other ways in which variables are used. indeed, when asked to simplify algebraic expressions (e.g., 3x + 5x 24), children tended to find a solution for x instead (philipp, 1992). a similar difficulty was reported by akgün and özdemir (2006). in their study, students presented with x + 2 = 2 + x attempted to solve for x when they were told to report all the values that x can assume. 
küchemann (1978) investigated students’ understanding of the use of letter symbols and found that most secondary school students treated them as concrete placeholders or “shorthand names” (macgregor & stacey, 1997) (e.g., p in 3p as pears instead of the number of pears). only a small number of students displayed an understanding of letters as representing specific unknowns. an even smaller number considered the letters to be generalized numbers.
1.3 the present study
to understand better the utility of teaching the model method, we focused on students’ understanding of these graphical representations. specifically, we investigated whether students use and think of the bar diagrams in a concrete or a more abstract fashion. we asked children to perform two tasks. in the production task, we examined how they drew model representations for algebraic word problems with operands of varying magnitude. we defined concrete usage as varying the length of bars across questions in accordance with the magnitude of operands. abstract usage was indicated by the lack of a consistent relation between the length of the bars and the magnitude of operands. we also manipulated the sequence in which increases in the magnitude of operands were presented. because a sequential increase may overly focus the children’s attention on changes in magnitude and attenuate any tendency to adopt a more abstract strategy, we also presented the increases in a random sequence. in the validation task, we assessed the same children’s understanding by asking them to judge and explain whether several presented graphical representations were drawn correctly. the representations contained drawings that fell into either the concrete or abstract pattern as defined above.
children with more sophisticated understanding were expected to know that though bars of particular lengths depict unknown constants within the context of each question, across questions, the same bars can be used to depict different quantities. in fact, it is when letter symbols are considered in this sense that they are considered variables. we tested primary 5 (grade 5) children who had not been taught symbolic algebra and secondary 2 (grade 8) children who had been taught both the model method and symbolic algebra. the primary school children would have used the model method for a year, whereas the secondary school children would have been introduced to symbolic algebra some time during primary 6. in an interview study conducted with ten primary 5 students, most of the children showed some abstract understanding of how the bar diagrams should be used (ng & lee, 2008). typical of their responses were statements suggesting that the absolute size of the bars does not matter. given their added experience with algebraic problems, we expected the secondary school children to display more abstract usage and awareness. to examine how our measures of children’s understanding were related to other tests of algebraic understanding, we administered the küchemann algebra test (brown, hart, & küchemann, 1985) to the secondary school children. this is a standardized measure that gauges children’s understanding of variables expressed in the letter symbolic format. it was given only to secondary school children as the primary school children had not been exposed to letter symbols.
2. method
2.1 participants and design
the experiment was based on a 4 (magnitude band: ones, tens, hundreds, versus thousands) × 2 (question sequence: increasing value versus randomised) × 2 (grade: primary 5 versus secondary 2) full factorial split-plot design. magnitude band served as the only within-subject variable.
a total of 68 secondary 2 students (mage = 13.9, sd = 0.45) and 110 primary 5 students (mage = 11.1, sd = 0.60) from 5 schools (2 secondary and 3 primary) of mixed ability and socioeconomic status participated in the study. all schools were government funded, located in the western region of singapore, and followed the national mathematics curriculum. all children participated with parental consent.
2.2 task and materials
we designed a web-based program comprising a production task followed by a validation task. participants’ inputs on the computer interface were logged on a server. in addition, the secondary 2 students were administered the küchemann algebra test (brown et al., 1985).
2.2.1 production task
the children were presented with algebra word problems on the computer screen and were asked to draw model diagrams for these problems using the graphical tools provided onscreen. the computerized interface provided a standardized environment for both problem administration and data collection. the children started with an online palette that contained a variety of specially designed drawing and labelling tools. the children viewed and worked on one problem at a time. they were not required to solve the problem, just to draw the bar diagrams in the same format that they would with pen and paper. there was no limit to the size of the bars that could be drawn. each child completed 12 problems. all the problems were of the same structure, but varied in the names of the protagonists and referents (e.g., mary versus jane, marbles versus cupcakes). the specific magnitude band in each problem varied across four scales: ones, tens, hundreds, and thousands. three problems were given within each scale. the size of the operand relating to the difference unit, our dependent measure of interest, in these three problems was drawn from the smaller, medium, and larger range of each magnitude band, with one question from each range.
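as a rough illustration of the item structure just described (four magnitude bands, three operand ranges per band, twelve problems per child, presented in increasing or randomised order), a python sketch follows. the leading digits 2, 5 and 7, the function names and the fixed seed are hypothetical choices made for illustration; they are not the operands or the software used in the study.

```python
import random
from itertools import product

# Scale factor for each magnitude band named in the design.
BANDS = {"ones": 1, "tens": 10, "hundreds": 100, "thousands": 1000}

# Hypothetical leading digit for each operand range within a band.
RANGE_DIGITS = {"smaller": 2, "medium": 5, "larger": 7}


def build_problem_set():
    """Cross 4 bands with 3 ranges to obtain the 12 difference-unit operands."""
    return [
        {"band": band, "range": rng, "difference_unit": digit * scale}
        for (band, scale), (rng, digit) in product(
            BANDS.items(), RANGE_DIGITS.items()
        )
    ]


def order_problems(problems, sequence, seed=0):
    """Return the problems in increasing or randomised presentation order."""
    if sequence == "increasing":
        return sorted(problems, key=lambda p: p["difference_unit"])
    if sequence == "randomised":
        shuffled = list(problems)
        random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
        return shuffled
    raise ValueError(f"unknown sequence: {sequence!r}")


problems = build_problem_set()
for item in order_problems(problems, "increasing"):
    print(item["band"], item["range"], item["difference_unit"])
```

crossing the two dictionaries keeps the within-subject factor (magnitude band) explicit in every item, so the logged bar lengths can later be grouped by band and range without re-deriving them from the operand.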
in figure 1a, for example, we depicted a question with a small difference unit (“2”) from the magnitude band of “ones”. a question with a large difference-unit operand (e.g., 7), still from the magnitude band of ones, would read something like “mary and john have 9 marbles altogether. john has 7 more marbles than mary…” a question with a large difference unit (e.g., 70) from the magnitude band of tens would have been presented as “john has 70 more marbles than mary…” to reduce stimulus-specific effects, we developed two parallel sets of problems that were identical in structure and differed only in specific quantities. in both sets, questions were administered either in randomized order or in order of the magnitude of the difference-unit operands. the children’s drawings were recorded by the computer program, which also logged the pixel length of the bar diagrams. the pixel length of the difference unit, that is, the section of the bar that represents the difference between the two protagonists (see figure 1a), was used as the dependent variable. of particular interest was whether students varied the length of the difference unit across magnitude bands. that is, did they draw bars that were much longer to represent “john has 70 more marbles” as compared to “john has 7 more marbles”? because we did not impose an upper limit on the length of the bar, there was also no upper cap on the range for the dependent variable. in table 1, we provide both the confidence intervals and the range from the observed data.
2.2.2 validation task
participants were presented with two sets of 3 questions. for the first set of validation questions, each question comprised a word problem followed by two model diagrams differing in whether proportionality in the length of the bars was maintained across questions. in one diagram, the bars were drawn with longer bars for larger numbers. in the other, proportionality was not maintained.
in other words, the diagrams differed in whether the bars were drawn in a more concrete or abstract manner. participants were asked to choose which diagram (one or the other, or both) was correct. they were also asked to explain their selection by selecting a response from four multiple choice options (see appendix a). for the second set of questions, each question contained two word problems, each accompanied by two diagrams, said to be drawn by student a and student b respectively. the two word problems in each question were structurally equivalent, differing only in the magnitude of the operands. student a’s diagrams demonstrated a more abstract usage of the bars: the size of the bars was identical for operands of different sizes. student b’s diagrams demonstrated a more concrete usage: the size of the bars differed in accordance with changes in the size of the operands. participants were asked to indicate if the two students were correct. a maximum of 9 marks could be obtained in the validation task (see appendix a for scoring criteria), with higher marks indicating a more abstract understanding of the bar diagrams.

2.2.3 küchemann algebra test

the küchemann algebra test from the chelsea diagnostic mathematics tests (brown et al., 1985) is a standardized measure of how children interpret letter symbols used in algebra. up to six different common interpretations have been identified in the literature: (i) letter numerically evaluated, (ii) letter not used or ignored, (iii) letter used as an object or abbreviation, (iv) letter used as a specific unknown, (v) letter used as a generalised number, (vi) letter used as a variable. the test places children on one of four levels of understanding based on the type and complexity of their interpretations. for example, a student who uses any of the first three interpretations and can answer only very simple questions is classified as level 1.
a student who uses the same interpretations but is able to answer more structurally complex questions is classified as level 2. understanding letter symbols as referring to specific unknowns qualifies a student for classification at level 3. this is deemed a basic level of understanding required for symbolic algebra. level 4 requires students to demonstrate an understanding of letter symbols as generalised numbers. of interest was whether secondary 2 students’ attainment on the küchemann test correlated with their performance on the production task. that is, did students with higher scores on the küchemann test show less tendency to adjust the length of the bars in accordance with the magnitude of the operands?

2.3 procedure

for the computerised tasks, participants from the same school were tested together in a single group session in their school computer laboratories. the secondary 2 students completed the küchemann algebra test in an additional session in a classroom.

3. results

to examine whether children’s performances on the production task differed across the various experimental conditions, we subjected the data to a 4 (magnitude band: ones, tens, hundreds, versus thousands) × 2 (question sequence: increasing value versus randomised) × 2 (grade: primary 5 versus secondary 2) repeated measures multivariate analysis of variance. pixel length of the difference unit drawn for questions with smaller, medium, versus larger operands within each magnitude band served as the dependent measures. in addition to the main independent variables, we entered which of the two parallel forms children were administered to take account of potential differences in performance across the two forms. descriptive statistics can be found in table 1.
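the 95% confidence intervals reported in table 1 are said to be calculated from the sample mean. as a rough check, the sketch below recomputes one interval from the rounded mean, standard deviation and group size given in the table. it assumes a normal approximation (z = 1.96); the authors do not state their exact method, and because the inputs are rounded to the nearest pixel, other cells may differ from the table by a pixel or so.

```python
import math


def approx_ci95(mean, sd, n, z=1.96):
    """Normal-approximation 95% confidence interval for a sample mean.

    The z value is an assumption; the source does not say whether a
    z or a t critical value was used.
    """
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Primary 5, increasing order, band "ones", small operand (values from table 1).
low, high = approx_ci95(mean=33, sd=19, n=55)
print(round(low), round(high))  # table 1 reports [28, 38] for this cell
```

the interval narrows with the square root of the group size, which is why the smaller secondary 2 groups tend to show wider intervals for similar standard deviations.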
table 1
mean pixel length, standard deviation, and confidence intervals for the production task

                                size of difference unit operand
order of presentation /   small                  medium                 large
magnitude band            m (sd)    95% ci       m (sd)    95% ci       m (sd)     95% ci

primary 5
increasing (n = 55)
  ones                    33 (19)   [28, 38]     51 (29)   [43, 59]     111 (56)   [95, 127]
  tens                    44 (25)   [38, 51]     57 (29)   [49, 65]     116 (59)   [100, 132]
  hundreds                50 (28)   [42, 57]     60 (28)   [52, 68]     122 (68)   [103, 140]
  thousands               54 (35)   [44, 63]     59 (27)   [52, 66]     106 (62)   [89, 123]
randomized (n = 55)
  ones                    35 (22)   [29, 41]     53 (34)   [44, 62]     74 (54)    [59, 88]
  tens                    39 (22)   [33, 46]     54 (26)   [47, 61]     104 (53)   [89, 118]
  hundreds                48 (26)   [41, 55]     54 (23)   [48, 61]     101 (77)   [80, 122]
  thousands               59 (37)   [49, 69]     65 (35)   [55, 74]     123 (63)   [106, 140]

secondary 2
increasing (n = 36)
  ones                    36 (16)   [31, 42]     59 (26)   [50, 67]     133 (57)   [114, 153]
  tens                    47 (28)   [38, 56]     61 (25)   [53, 70]     124 (51)   [106, 142]
  hundreds                53 (29)   [43, 63]     69 (30)   [58, 79]     121 (53)   [103, 138]
  thousands               59 (32)   [48, 70]     74 (29)   [64, 84]     125 (48)   [109, 141]
randomized (n = 32)
  ones                    36 (18)   [30, 43]     59 (29)   [49, 70]     111 (70)   [85, 136]
  tens                    52 (25)   [43, 61]     65 (25)   [56, 74]     138 (73)   [111, 164]
  hundreds                53 (24)   [45, 62]     72 (27)   [62, 82]     134 (84)   [103, 164]
  thousands               70 (38)   [56, 84]     81 (39)   [67, 95]     150 (71)   [125, 176]

notes.
1. m and sd refers to the means and standard deviations of the bars drawn for the difference unit (du).
2. 95% ci refers to the 95% confidence interval. it is calculated from the sample mean and is used as an indication of the precision of the estimate. typically, the narrower is the range, the more precise the estimate.
3. values for the m, sd and the 95% ci are rounded to the nearest pixel unit.
4. values for the du range from 5 to 356.

length of the difference unit drawn by the students was affected by magnitude band, f(9, 150) = 12.85, p < .001, ηp² = .44. as can be noted in both table 1 and figure 2, children generally drew longer bars for larger operands.
however, this was qualified by an interaction with question sequence, f(9, 150) = 4.13, p < .01, ηp² = .20. univariate tests showed that the interaction effect was not significant for either the smaller or medium sized operands. for these operands, there were significant and strong linear trends across the four magnitude bands, .39 > ηp² > .14, regardless of sequence of presentation. in other words, children uniformly used longer bars for small and medium sized operands, regardless of magnitude band. for large operands, a strong linear and increasing trend across the four magnitude bands was found when question sequence was randomised, f(1, 80) = 61.93, p < .01, ηp² = .44. there were no differences in the length of the bars, across the four magnitude bands, when questions with large operand sizes were presented as part of a sequence in which the size of operands was ordered (see figure 2).

figure 2. mean performance on the production task by magnitude band and order of question presentation for smaller, medium and larger numbers within each band.

although there was a significant main effect associated with grade, f(3, 156) = 4.05, p < .01, ηp² = .07, with the older children drawing longer bars, it did not enter into interaction with other variables. for secondary 2 students, we also tested the relation between their performance on the production task and the küchemann algebra test. as there were only 2 children in the küchemann level 1 category, levels 1 and 2 were combined. approximately 22% of the students attained level 1 – 2, 40% level 3, and 38% level 4. a 4 (magnitude: ones, tens, hundreds, vs. thousands) × 3 (küchemann: level 1 – 2, level 3 vs. level 4) repeated measures analysis of variance showed no significant interaction between magnitude band and level of algebraic understanding on the length of the difference unit drawn by the children.
children with more advanced understanding of the use of letters in algebra adjusted the length of bars according to magnitude in a similar manner as did students with a more basic level of algebraic understanding. the students performed well on the validation task with 62% scoring the maximum nine marks. one-way analysis of variance indicated that there were no significant age differences in performance on the validation task. amongst the secondary 2 students, performances also did not differ across levels of understanding on the küchemann algebra test. we also examined whether performance on the production task was related to performance on the validation task. performance on the production task, relative to magnitude band, was indexed by a performance coefficient. this was derived for each individual by fitting a line of best fit using the length of their bar diagrams and the four magnitude conditions as the two axes. a larger performance coefficient indicates a greater propensity to adjust the length of the bars according to magnitude band. there was a small but significant correlation between the performance coefficient and the validation task score but only for the younger age group (r = 0.26, p < .007). primary 5 students with a higher score on the validation task were less likely to adjust the length of the bars according to the magnitude of the operands.

4. discussion

findings from the production task showed that children drew longer bars when the magnitude of the operands increased. the only condition in which children did not do this was when they were presented with larger numbers, and only when questions were presented in order of operand magnitude. these findings suggest that the children used the bars in a concrete fashion, but their usage is tempered by affordances in the question set.
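the performance coefficient described in the results is the slope of a line of best fit through each child's bar lengths across the four magnitude conditions. a minimal sketch of that computation follows. it assumes the bands are coded 1 to 4 and that one bar length per band is used; the authors do not report their coding, and the pixel values below are hypothetical.

```python
def performance_coefficient(bar_lengths):
    """Least-squares slope of bar length (pixels) on magnitude band.

    Bands are coded 1, 2, 3, 4 (ones, tens, hundreds, thousands).
    This coding is an assumption, not the authors' stated procedure.
    """
    bands = list(range(1, len(bar_lengths) + 1))
    mean_x = sum(bands) / len(bands)
    mean_y = sum(bar_lengths) / len(bar_lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bands, bar_lengths))
    sxx = sum((x - mean_x) ** 2 for x in bands)
    return sxy / sxx


# Hypothetical children: one keeps bar length constant, one scales it up.
abstract_child = [50, 50, 50, 50]
concrete_child = [30, 60, 90, 120]
print(performance_coefficient(abstract_child))  # slope of 0 pixels per band
print(performance_coefficient(concrete_child))  # slope of 30 pixels per band
```

under this reading, a coefficient near zero corresponds to the abstract usage defined in the paper, and a large positive coefficient to the concrete usage.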
recall that we defined concrete usage as drawing longer bars to denote larger operands, with abstract usage being indicated by the lack of a consistent relation between the length of the bars and magnitude of the operands. children are less likely to engage in a concrete fashion when changes in the magnitude of operands across questions are more salient. findings from the validation task point to a different conclusion. more than half of the children scored full marks and demonstrated awareness that the absolute size of the bars were not essential to the accuracy of the model representations. in contrast to findings from the production task, this finding shows that the majority of children have quite sophisticated understanding of how the graphical representations should be interpreted. indeed, for the younger children, there was a significant correlation between performances on the production and validation tasks. those who showed better understanding on the validation task were more likely to produce models that conformed to our definition of abstract depiction. our data provide no definitive information on why the children’s performance on the production task was less sophisticated than their performance on the validation task. the findings point to another case of children knowing and understanding more than what they can do. this is a common phenomenon in the development of complex skills. in the development of memory strategies, for example, children tend not to deploy skills or strategies spontaneously, but are able to do so successfully when either instructed or are given explicit prompts (flavell, 1970; harnishfeger & bjorklund, 1990). here, both the younger and older children seem to be producing graphical depictions in a concrete manner despite having fairly sophisticated understanding. however, once the manipulation of problem size becomes apparent, they are able to deploy their knowledge accordingly. 
an alternative version of this explanation, which focuses on the affordances of the production task, is that the students approached the task with what is most familiar. in the production task, the children were not given specific instructions or directives on how the bar diagrams should be drawn. although all the children have had extensive practice with the use of such graphical representations, it is possible that they drew the bars in a more concrete manner because this was what they did, and had to do, with arithmetic problems. findings from the validation task suggest that many children have some appreciation of the fact that the absolute length of the bars across questions is unimportant. although this is just one aspect in understanding the role of variables in algebraic equations, from a pedagogical viewpoint, it is comforting to know that the use of the model method is not overtly associated with erroneous thinking regarding the nature of what is being represented. what is somewhat worrisome is that the secondary school children’s performance on the production task was no different from the primary school children’s. if the younger children’s performance resulted from a production deficiency, the older children, with more experience with such questions, should have been able to use the heuristic in a more abstract fashion. although speculative, one explanation for why this was not observed is related to the way in which algebra is taught. in secondary schools, students are taught symbolic algebra and use letter symbols to represent unknown values. in some schools at least, there is little discussion of differences and similarities in using bar diagrams versus letter symbols to represent known and unknown quantities (ng, lee, ang, & khng, 2006). it is perhaps this lack of explicit linkage that resulted in some lingering confusion, which is reflected in the children’s performance on the production task. 
nonetheless, given their performance on both the validation task and the küchemann test, their performance on the production task should not be viewed as a major deficit. one pedagogical approach that may further benefit students is to emphasize the different situations under which the two approaches are best suited. although not specifically focused on the model method, previous research has shown that students are more successful when they use more concrete or grounded representations to solve simple algebra questions, but are more successful with more abstract, symbolic representations with more complex problems (koedinger, alibali, & nathan, 2008; koedinger & nathan, 2004). findings from the küchemann algebra test showed that the majority of our secondary 2 students demonstrated a level of understanding that is deemed sufficient to enable them to engage in further studies in algebra. one challenging aspect of the findings is that performance on the küchemann test was not related to performances in either the production or validation tasks. the küchemann test focuses on how children interpret letter symbols in algebra. in contrast, our measures are focused on children’s usage and understanding of the bar diagrams used to represent algebraic questions. although an understanding of the use of letter symbols to represent unknowns should help children’s performances in our tasks, one interpretation of the findings is that there is a lack of transfer between understanding the notion of variables when represented as letter symbols versus when represented in the form of bars. an alternative interpretation is that the two types of tasks map onto aspects of algebraic understanding that are more disparate than we anticipated. further research on linkages between these aspects of algebra may help bridge the gap between what is taught in the primary and secondary curricula. 5.
conclusion the main aim of this study was to understand how primary and secondary school students use and understand the bar diagrams used for solving algebraic questions. findings from the production task showed that children generally drew longer bars for bigger operands. although this finding shows that the children are using the graphical representations in a more concrete fashion, findings from the validation task suggest that their understanding is more sophisticated. in the validation task, both the younger and the older children demonstrated understanding that the bars can be used in an abstract manner and the length of the bars need not be tied to the size of the operands. the mismatch between findings from the production and validation tasks is interpreted as evidence of a production deficit. although it was surprising that our primary and secondary school students performed in a similar fashion in the production task, when the totality of findings are considered, we do not think the evidence points to a major deficit. despite the production finding, for the secondary students, performances on the production and validation tasks were not correlated. furthermore, the great majority of secondary school students showed quite sophisticated understanding on both the validation task and the küchemann test. on the other hand, for the primary students, the negative correlation between the production and validation tasks suggests that examining the way these students depict problems with different operands does provide some indication of their understanding. primary school teachers may wish to use similar sets of questions, presented in a randomised order, to gain additional insight on students’ facility with algebraic concepts. it should be noted that what is important is not the length of the bars produced for each individual question, but the pattern of responses to changes in the magnitude of operands that provide insight to children’s understanding. 
discussing how bars of the same length can be used to represent operands of different magnitudes may also help teachers make explicit the conceptual connections between the bars and letter symbols used in algebra. although we were motivated by concerns regarding the role of the model method in the curriculum, this study is not an evaluation of the curriculum, nor did we evaluate whether learning the model method aids in the acquisition of letter symbolic algebra. instead, the findings provide some answers to how children use and understand the model method, which may assist policy makers and curriculum designers when they evaluate its role in the curriculum. to that end, we were encouraged by the sophistication of the children’s responses in the validation task and view these findings as being supportive of the way in which algebraic problem solving is taught.

keypoints

unlike many other countries, algebraic word problems are introduced in the primary school years in the singapore mathematics curriculum.
this study examined how children understand and use bar diagrams that are used to give them earlier access to such problems.
both grade 5 and 8 students showed an abstract understanding of the bar diagrams. however, they tended to use the diagrams in a more concrete fashion.
discrepancy between what the students produced versus what they understood is indicative of a production deficit.
discussing how bars of the same length can be used to represent operands of different magnitudes in algebraic questions may help teachers make explicit the conceptual connections between the bar diagrams and letter symbols.

acknowledgements

the work was supported by grants from the centre for research in pedagogy and practice (#crp 9/05 kl). views expressed in this article do not necessarily reflect those of the national institute of education, singapore.
we thank the students who participated in this study and the school administrators who provided access and assistance.

appendix a

here are some word problems. student a and student b drew the models for these problems. you have to pick if student a is correct, or student b is correct, or if both are correct. you can do this by checking the box next to the options.

mary has some marbles. john has 30 marbles more than mary. they have 150 marbles altogether. how many marbles has mary?

student a drew the following model.
student b drew the following model.

who is correct?
student a
student b
both students a and b

check the box next to the reason that best explains your choice.
student a is correct because the numbers are small. therefore, the rectangles are short.
student b is correct because the numbers are big. therefore, the rectangles are long.
both students a and b are correct because the size of the rectangle does not matter.
both students a and b are wrong because the models are wrongly drawn.

note. one mark was awarded if the student chose “both students a and b” and no marks were given for choosing either “student a” or “student b”. one mark was awarded for choosing the answer “both students a and b are correct because the size of the rectangle does not matter” and no marks were given for any other responses.

student a and student b drew models for the 2 questions below.
q.1 mary has some marbles. john has 30 marbles more than mary. they have 150 marbles altogether. how many marbles has mary?
q.2 mary has some marbles. john has 300 marbles more than mary. they have 1500 marbles altogether. how many marbles has mary?

student a drew:
student b drew:

is student a correct? yes no
is student b correct? yes no

note. one mark was awarded for choosing “yes” for both students and no marks were given for any other responses.

references

akgün, l., & özdemir, m. e. (2006). students' understanding of the variable as general number and unknown: a case study. the teaching of mathematics, 9(1), 45-51. booth, j. l., & koedinger, k. r. (2010). facilitating low-achieving students’ diagram use in algebraic story problems. in s. ohlsson & r. catrambone (eds.), proceedings of the 32nd annual meeting of the cognitive science society (pp. 1649-1654). austin, tx: cognitive science society. brown, m., hart, k., & kuchemann, d. (1985). chelsea diagnostic mathematics tests and teacher's guide. windsor: nfer-nelson publishing company ltd. capraro, m. m., & joffrion, h. (2006). algebraic equations: can middle-school students meaningfully translate from words to mathematical symbols? reading psychology, 27(2-3), 147-164. doi: 10.1080/02702710600642467 carpenter, t. p., & levi, l. (2000). developing conceptions of algebraic reasoning in the primary grades. (res. rep. 00-2). madison, wi: national center for improving student learning and achievement in mathematics and science. retrieved from http://ncisla.wceruw.org/publications/reports/rr-002.pdf carraher, d. w., schliemann, a., brizuela, b. m., & earnest, d. (2006). arithmetic and algebra in early mathematics education. journal for research in mathematics education, 37(2), 87-115. dede, y. (2004). the concept of variable and identification its learning difficulties. educational sciences: theory & practice, 4(1), 50. duru, a. (2011). middle school students’ reading comprehension of mathematical texts and algebraic equations. international journal of mathematical education in science and technology, 42(4), 447-468. doi: 10.1080/0020739x.2010.550938 flavell, j. h. (1970). developmental studies of mediated memory. in h. w. reese & l. p. lipsitt (eds.), advances in child development and child behavior (vol. 5). new york: academic press. fuchs, l. s., compton, d. l., fuchs, d., powell, s. r., schumacher, r. f., hamlett, c. l., et al. (2012).
contributions of domain-general cognitive resources and different forms of arithmetic development to pre-algebraic knowledge. developmental psychology, 48(5), 1315-1326. doi: 10.1037/a0027475 harnishfeger, k. k., & bjorklund, d. f. (1990). children’s strategies: a brief history. in d. f. bjorklund (eds.), children’s strategies: contemporary views of cognitive development. hillsdale, nj: lawrence erlbaum associates. hefferman, n., & koedinger, k. r. (1997). the composition effect in symbolizing: the role of symbol production versus text comprehension. in m. g. shafto & p. langley (eds.), proceedings of the nineteenth annual conference of the cognitive science society (pp. 307-312). mahwah, nj: lawrence erlbaum associates. hu, w. (2010). making math lessons as easy as 1, pause, 2, pause ... the new york times. retrieved from http://www.nytimes.com/2010/10/01/education/01math.html?_r=0 hunter, j. (2007). relational or calculational thinking: students solving open number equivalence problems. in j. watson & k. beswick (eds.), proceedings of the 30th annual conference of the mathematics education research group of australasia (vol. 2, pp. 421-429). adelaide: merga. khng, k. h., & lee, k. (2009). inhibiting interference from prior knowledge: arithmetic intrusions in algebra word problem solving. learning and individual differences, 19(2), 262-268. doi: 10.1016/j.lindif.2009.01.004 kieran, c. (1981). concepts associated with the equality symbol. educational studies in mathematics, 12(3), 317-326. doi: 10.1007/bf00311062 knuth, e. j., stephens, a. c., mcneil, n. m., & alibali, m. w. (2006). does understanding the equal sign matter? evidence from solving equations. journal for research in mathematics education, 37(4), 297-312. koedinger, k. r., alibali, m. w., & nathan, m. j. (2008). trade-offs between grounded and abstract representations: evidence from algebra problem solving. cognitive science: a multidisciplinary journal, 32(2), 366-397. 
doi: 10.1080/03640210701863933 koedinger, k. r., & nathan, m. j. (2004). the real story behind story problems: effects of representations on quantitative reasoning. journal of the learning sciences, 13(2), 129-164. koedinger, k. r., & terao, a. (2002). a cognitive task analysis of using pictures to support pre-algebraic reasoning. in c.d. schunn & w. gray (eds.), proceedings of the twenty-fourth annual conference of the cognitive science society (pp. 542-547). mahwah, nj: lawrence erlbaum associates. küchemann, d. (1978). children's understanding of numerical variables. mathematics in school, 7(4), 23-26. kwokwc. (2011). not able to use algebra in primary level. retrieved from http://www.kiasuparents.com/kiasu/forum/viewtopic.php?f=27&t=18728 lee, k., lim, z. y., yeong, s. h., ng, s. f., venkatraman, v., & chee, m. w. (2007). strategic differences in algebraic problem solving: neuroanatomical correlates. brain research, 1155 (june), 163-171. doi: 10.1016/j.brainres.2007.04.040 lee, k., ng, s. f., bull, r., pe, m. l., & ho, r. h. m. (2011). are patterns important? an investigation of the relationships between proficiencies in patterns, computation, executive functioning, and algebraic word problems. journal of educational psychology, 103(2), 269-281. doi: 10.1037/a0023068 lee, k., yeong, s. h. m., ng, s. f., venkatraman, v., graham, s., & chee, m. w. l. (2010). computing solutions to algebraic problems using a symbolic versus a schematic strategy. zdm, 42(6), 591-605. doi: 10.1007/s11858-010-0265-6 lim, b. t. (2007). can algebra be used to solve psle maths problems, the straits times. retrieved from http://www.moe.gov.sg/media/forum/2007/forum_letters/20070217.pdf macgregor, m., & stacey, k. (1997). students' understanding of algebraic notation: 11-15. educational studies in mathematics, 33(1), 1-19. doi: 10.1023/a:1002970913563 mayer, r. e. (1989). systematic thinking fostered by illustrations in scientific text.
journal of educational psychology, 81(2), 240. mayer, r. e. (2002). multimedia learning. psychology of learning and motivation, 41, 85-139. meter, p., & garner, j. (2005). the promise and practice of learner-generated drawing: literature review and synthesis. educational psychology review, 17(4), 285-325. doi: 10.1007/s10648-005-8136-3 mullis, i. v. s., martin, m. o., gonzalez, e. j., & chrostowski, s. j. (2004). timss 2003 international mathematics report: findings from iea's trends in international mathematics and science study at the fourth and eighth grades. chestnut hill, ma: boston college. nathan, m. j., & koedinger, k. r. (2000). teachers' and researchers' beliefs about the development of algebraic reasoning. journal for research in mathematics education, 31(2), 168-190. doi: 10.2307/749750 national council for teachers of mathematics. (2000). principles and standards for school mathematics. reston, va: national council for teachers of mathematics. national mathematics advisory panel. (2008). foundations for success: the final report of the national mathematics advisory panel. washington, dc: u.s. department of education. ng, s. f. (2003). how secondary two express stream students used algebra and the model method to solve problems. the mathematics educator, 7(1), 1-17. ng, s. f., & lee, k. (2005). how primary five pupils use the model method to solve word problems. the mathematics educator, 9(1), 60-84. ng, s. f., & lee, k. (2008). as long as the drawing is logical, size does not matter. the korean journal of thinking & problem solving, 18(1), 67-82. ng, s. f., & lee, k. (2009). model method: singapore children's tool for representing and solving algebra word problems. journal for research in mathematics education, 40(3), 282-313. ng, s. f., lee, k., ang, s. y., & khng, f. (2006). model method: obstacle or bridge to learning symbolic algebra. in w. bokhorst-heng, m. osborne & k. lee (eds.), redesigning pedagogies (pp. 227-242). ny: sense. oecd (2010). 
pisa 2009 results: executive summary. retrieved from http://www.oecd.org/pisa/pisaproducts/46619703.pdf philipp, r. (1992). the many uses of algebraic variables. mathematics teacher, 85, 557-561. stacey, k., & macgregor, m. (1999). learning the algebraic method of solving problems. the journal of mathematical behavior, 18(2), 149-167. steinberg, r. m., sleeman, d. h., & ktorza, d. (1991). algebra students' knowledge of equivalence of equations. journal for research in mathematics education, 22(2), 112-121. swafford, j. o., & langrall, c. w. (2000). grade 6 students' preinstructional use of equations to describe and represent problem situations. journal for research in mathematics education, 31(1), 89-112. tolar, t. d., lederberg, a. r., & fletcher, j. m. (2009). a structural model of algebra achievement: computational fluency and spatial visualisation as mediators of the effect of working memory on algebra achievement. educational psychology: an international journal of experimental educational psychology, 29(2), 239-266. usiskin, z. (1988). conceptions of school algebra and uses of variables. in a. f. coxford & a. p. schulte (eds.), the ideas of algebra (pp. 8-19). reston, va: national council of teachers of mathematics. warren, e., & cooper, t. (2005). introducing functional thinking in year 2: a case study of early algebra teaching. contemporary issues in early childhood, 6(2), 150-162. warren, e., & cooper, t. j. (2009). developing mathematics understanding and abstraction: the case of equivalence in the elementary years. mathematics education research journal, 21(2), 76-95. wei, w., yuan, h. b., chen, c. s., & zhou, x. l. (2012). cognitive correlates of performance in advanced mathematics. british journal of educational psychology, 82(1), 157-181. doi: 10.1111/j.2044-8279.2011.02049.x

frontline learning research vol. 10 no.
2 (2022) 64-85 issn 2295-3159
corresponding author: mari murtonen, assistentinkatu 5, 20500 turku, finland, mari.murtonen@utu.fi
doi: https://doi.org/10.14786/flr.v10i2.1031

university teachers’ focus on students: examining the relationships between visual attention, conceptions of teaching and pedagogical training

mari murtonen a, erkki anto a, eero laakkonen a & henna vilppu a
a university of turku, finland

article received 21 january 2022 / revised 9 december 2022 / accepted 9 december 2022 / available online 11 january 2023

abstract

teachers’ focus on their students’ learning is considered central in high-quality, student-centred university teaching. this frontline eye-movement research asks whether teachers’ focus can be observed at the intersection of the visual and conceptual levels. it introduces a novel way to study teachers’ visual attention combined with verbal interpretations, including numerical ratings of the success of teaching when they observe teaching situations. teachers’ visual attention and interpretations were further studied in connection to their prior pedagogical training and teaching experience in years. two short videos depicting teaching during a lecture, including different types of trigger events, were presented to teachers (n = 49) who were asked to think aloud while watching. the first video’s trigger was students becoming bored during a content-focused teaching situation, and the second video’s trigger was the teacher replying in an engaging way to students’ questions in a learning-focused teaching situation. the results showed that pedagogically trained teachers paid more visual attention to the students than did their non-trained colleagues, especially in content-focused teaching situations. teaching experience did not have any effect on visual attention or interpretation in this study.
the teachers who paid more visual attention to the students in the content-focused teaching situation noticed in their interpretations that the students were not active, expressed higher learning-facilitating teaching conceptions and gave lower numerical ratings for the teaching situation. in conclusion, pedagogical training seems to promote university teachers’ ability to pay visual attention to students in teaching situations and interpret these situations from the students’ perspective, i.e. focus on student learning.

keywords: visual attention, conceptions of teaching, university pedagogical training, facilitation of learning, eye tracking

murtonen et al 65 | f l r

introduction

to achieve better learning outcomes, university teachers are expected to focus on their students’ learning to be able to support it instead of merely focusing on delivering the content to them (prosser & trigwell, 2014; vilppu et al., 2019). focusing on students’ learning is often used as a synonym for good teaching that acknowledges and answers students’ needs to foster learning. learning-focused or student-centred teaching is often called for; however, it is not yet very clear what focusing on students’ learning means for teaching. focusing on students’ learning can take place at many levels, starting with the curriculum and building the environment and courses such that they support students’ learning actions (entwistle, 2005). however, what it means in a teaching situation in which a teacher and students are present remains unclear. questions such as how university teachers monitor and gain knowledge about their students during a teaching situation, e.g. a lecture, to guide their teaching actions have remained unanswered. teachers’ readiness to facilitate university students’ learning has been studied in terms of their conceptions of teaching and learning (samuelowicz & bain, 2001) as well as their approaches to teaching (prosser & trigwell, 2014).
a relationship between teachers’ and students’ approaches has been found (gibbs & coffey, 2004), indicating that teachers’ approaches do have an effect on their teaching and on their students’ learning. the studies on university teachers’ conceptions and approaches use the methods of self-report questionnaires and interviews, which reveal only some aspects of teaching, such as their intentions (trigwell & prosser, 2004) or their underlying orientations and beliefs (samuelowicz & bain, 2001). more knowledge is needed about how teachers perceive, interpret and make decisions in certain teaching situations (blömeke et al., 2015). in secondary school settings, eye-movement studies have revealed interesting information about expert teachers’ perceptions compared to novices. for example, experts tend to look longer at students (mcintyre et al., 2017) and focus their attention on areas where relevant information is available (wolff et al., 2016). in general, previous studies have claimed that novice teachers are not able to focus on students’ learning as deeply as more experienced teachers (e.g. levin et al., 2009), and they may use only bottom-up visual noticing instead of knowledge-based top-down processes that allow shifting of attention from attention-capturing events to pedagogically meaningful events (e.g. theeuwes, 2000). we lack this type of information concerning university teaching, that is, how more experienced or better educated teachers differ from novices and on what levels, besides conceptions and approaches, differences may occur. the teaching situations in higher education are different from those in secondary school classrooms; thus, research on higher education settings is needed. this study aimed to discover whether focusing on students can be observed at the visual level and whether visual attention is related to different teaching conceptions. 
by using eye-tracking measurements and retrospective think-aloud, we investigated how university teachers perceive and interpret different kinds of teaching situations. in addition, the effects of prior pedagogical training and teaching experience were studied. we begin the paper by describing university teachers’ expertise requirements and the teaching context and move on to consider what is meant by learning-focused teaching at the university level. we then proceed to the question of focus on the visual level, i.e. visual attention and finally propose video cases as a method to gain deeper insight into university teaching.

teachers’ pedagogical expertise in the university context

teachers’ professional learning of pedagogy, i.e. the development of their pedagogical expertise, can be understood as a complex process whereby changes in knowledge, orientation and skills pertain to one’s conception of teaching and actions as a teacher (garner & kaplan, 2019). these changes often require a change in the teacher’s identity as well. the teacher’s core identity has traditionally been defined as a subject expert who transmits subject knowledge to students, while contemporary views of teaching highlight the role of the teacher as a learning process expert who fosters active, self-regulated and collaborative learning in students (vermunt et al., 2017). university teachers are typically highly educated experts in their own subject domain, but they often lack pedagogical qualifications, unlike their colleagues in primary and secondary schools. thus, although university teachers excel in the content knowledge of their own discipline, they may lack pedagogical knowledge. in addition, they may lack pedagogical content knowledge, which refers to how pedagogical knowledge can be implemented in their own disciplinary areas (shulman, 1987).
it is problematic that university teachers have no pedagogical training, since, according to expertise research, excelling in one’s own disciplinary subject does not necessarily make one an expert in teaching the subject (e.g. ericsson, 2008). knight (2002) argued that without pedagogical training, it is typical for university teachers to adopt their own teachers’ teaching style, even though they know it might not be the best way to promote student learning. persuading teachers who have been teaching at university for a long time to take part in pedagogical training can be difficult. in addition, teachers with extensive teaching experience can be reluctant to change their teaching conceptions and practices (postareff & nevgi, 2015). in contrast, pedagogical training at the beginning of a university teaching career can be very effective (e.g. vilppu et al., 2019). thus, teachers’ pedagogical expertise levels may vary greatly in the university context. the university teaching environment is unique and different from those of primary and secondary schools. according to doyle (2006), a school classroom situation includes features such as a large quantity of events and tasks taking place multidimensionally and simultaneously (e.g. interruptions and other unpredictable situations that require immediate attention). it also includes a common set of experiences that form a history for the class and have an impact on future events. compared to this, a traditional university lecture could be described as a more unidirectional situation, with the teacher lecturing and usually no surprises occurring; in addition, it likely includes certain norms and traditions concerning the university culture for students about how to behave in a lecture. at the university, students and the place where teaching takes place may be different in every lecture, and the teacher often has neither a common history with the students nor personal contact with them. 
this creates a unique environment in which the research results from other educational levels cannot be directly applied.

focusing on students at the level of conceptions and approaches

university teachers’ pedagogical expertise has mostly been studied from the perspective of their conceptions of and approaches to teaching. university teachers’ conceptions of teaching have been found to vary between teaching as facilitating learning and teaching as transmitting knowledge (kember & kwan, 2000; samuelowicz & bain, 2001). the former conception includes the idea that the most important task in teaching is to support students’ learning processes and to create learning environments that ‘scaffold’ learning, whereas the transmission conception indicates that the most important task in teaching is to deliver information to students. teachers’ approaches to their teaching, i.e. the strategies they adopt, have been categorised into learning-focused and content-focused approaches (postareff & lindblom-ylänne, 2008). the learning-focused approach refers to teaching strategies in which the teacher’s aim is to foster students’ deep learning processes by activating their knowledge construction. in contrast, with the content-focused approach, the teacher’s intention is to transmit knowledge to students without attempting to activate them. a rather high correspondence between conceptions and approaches seems to exist. teachers who consider teaching as transmitting knowledge tend to adopt a content-focused approach to teaching, whereas teachers who view teaching as supporting students in building their own understanding more often adopt a learning-focused approach to teaching (kember & kwan, 2000).
teachers’ approaches to teaching have been shown to relate to their students’ approaches to learning (gibbs & coffey, 2004; prosser & trigwell, 2014; uiboleht et al., 2018), indicating that a learning-focused approach to teaching encourages the use of a deep approach to learning. students’ adoption of a deep approach to learning seems to indicate that they will achieve higher-quality learning outcomes (uiboleht et al., 2018). thus, teachers’ actions and conceptions seem to have important effects on the success of student learning. focusing on student learning and being able to support their learning process also requires skills other than understanding how learning happens, along with an intention to support it. for example, engaging students in lectures has been shown to be an important medium for focusing on students and fostering their active learning (lonka & ketonen, 2012). many university teachers still use the traditional lecturing approach, with the dominant view of teaching as the transmission of knowledge, probably because they teach the way they were taught (knight, 2002). this unidirectional method of lecturing does not give the lecturer much information about students’ learning. teaching methods that engage students are seen as central in giving teachers information about students’ prior knowledge, goals and motivation in studying and motivating students to learn in more depth (lonka & ketonen, 2012). to apply engaging teaching methods, a teacher needs to be sensitive to students’ nonverbal messages in a teaching situation. to be able to monitor, react to different situations and support students’ learning processes, teachers need pedagogical knowledge and skills that guide their own teaching. according to expertise studies, a skill needs to be deliberately practiced to develop (ericsson, 2008).
thus, practicing teaching without deliberate training probably does not help university teachers develop; they need pedagogical training to acquire pedagogical expertise. current pedagogical training for university teachers aims to facilitate conceptions of teaching that enhance learning and learning-focused approaches to teaching, and there is evidence that this training has a positive effect on teachers’ conceptions and approaches to teaching (e.g. postareff et al., 2007; stes & van petegem, 2011).

focusing on students at the visual level and noticing important events

in addition to university teachers’ focusing on their students’ prior knowledge, intentions, goals and study progress, teachers need visual information about their students during a teaching session to be sensitive to their nonverbal messages concerning their learning. focusing on students at the visual level means paying visual attention to students. eye-movement studies offer information about where viewers focus their attention and how they process classroom situations when observing teaching (wolff et al., 2016). eye movements as such are not sufficient to provide information about teachers’ thinking, since there is only a hypothesis about the connection between eye and mind (e.g. just & carpenter, 1980), but when combined with teachers’ verbal interpretations, they can help us to understand teachers’ thoughts. while focusing means deliberately paying attention to the overall situation, noticing means that one actually perceives an important event when it happens. noticing refers to the ability to focus attention on events that are pertinent to teaching and learning (grub et al., 2020), and knowledge-based reasoning implies the ability to apply knowledge about teaching and learning to interpret these events as well as the ability to draw relevant conclusions. lachner et al.
(2016) argued that the skills in both noticing and interpreting are knowledge-based in that teachers’ knowledge guides their attention and interpretation of crucial events. compared to novices, expert teachers possess more extensive, elaborate and coherently organised knowledge structures. through teaching experience, teachers integrate formal professional knowledge with their personal and practical knowledge, thus strengthening their ability to perform effectively (wolff et al., 2021). the differences between experienced and novice teachers’ visual processing of classroom information have mainly been studied at the primary (e.g. pouta et al., 2020) and secondary school levels (e.g. mcintyre et al., 2017; stahnke & blömeke, 2021; wolff et al., 2016). the level of expertise has been shown to influence the noticing and interpretation of classroom events. for example, expert teachers’ noticing is efficient (mcintyre et al., 2017) and knowledge-based, covers wide areas (wolff et al., 2016) and is focused on students (e.g. van den bogert et al., 2014; stahnke & blömeke, 2021). novices, on the other hand, tend to engage in a more time-consuming and rather indiscriminate search for information (e.g. wolff et al., 2016). furthermore, with regard to knowledge-based reasoning, novices tend just to describe classroom events, while experts explain and integrate the meaning behind what they see (e.g. wolff et al., 2017). a novice may notice only so-called bottom-up events that capture visual attention, while experts use knowledge-based top-down processes that allow them to shift their attention from attention-capturing events to pedagogically meaningful events (e.g. theeuwes, 2000).

utilising videos in studying university teachers’ focus on students

during the last decade, video-based assessment has become frequent in both teacher training and teacher training research (dunekacke et al., 2015; gaudin & chaliés, 2015), and many studies focusing on visual processes utilise classroom videos. there are many advantages to using video assessment: it provides a standardised measurement, it is close to the complex reality of pedagogical situations, and, due to this perceived authenticity, it is usually considered motivating and highly accepted by the participants. further, compared to written or still picture cases, videos can integrate both verbal and nonverbal information, such as facial expressions, gestures, movements, postures and even emotional states. scripted videos also enable the inclusion of trigger events (könig et al., 2014), i.e. pedagogically meaningful events, which the teacher should notice in order to be successful in learning-focused teaching. as a research method, video assessments may also avoid problems commonly related to self-report measures, such as those inherent in likert scale questionnaires and interviews, which rely on self-perception and are thus prone to credibility issues (see vilppu et al., 2019). higher education teachers’ focus on students has usually been studied through their conceptions and approaches with fairly traditional self-report measures, such as questionnaires and interviews (e.g. postareff & lindblom-ylänne, 2008; trigwell & prosser, 2004), which do not necessarily measure actual teaching practices, but aims and beliefs concerning them. thus, new methodological perspectives and tools would allow for new knowledge of teachers’ pedagogical expertise.
considering the varied and constantly changing university teaching situations and differing backgrounds of university teachers and students, analysing practical teaching situations to obtain general knowledge of how teachers’ teaching actions correspond to their conceptions would be challenging. therefore, viewing and interpreting videotaped teaching situations could offer a methodology for approaching this question. recently, many studies have integrated eye-tracking methodology into video viewing (e.g. wolff et al., 2016; wyss et al., 2020), and thus, have also focused on visual processes. the viewer’s attention is central to how classroom situations are visually processed, and viewers’ eye movements offer insight into this processing (wolff et al., 2016). because of the link between where the eyes are gazing and what the mind is engaged with (the eye–mind hypothesis; see just & carpenter, 1980), viewers’ eye fixation patterns can be used to investigate their ongoing mental processes during viewing. however, eye tracking has its limitations, and as such, it does not necessarily reveal anything about how the viewer comprehends the scene. thus, eye-movement data require many inferences about the underlying cognitive processes, since they do not explain why a viewer was looking at certain representations (van gog et al., 2009). to reduce the number of researchers’ inferences, complementary methods, such as concurrent or retrospective reporting, are utilised alongside eye movements.

present study

this study aimed to extend knowledge about university teachers’ focus on their students by examining whether the focus can be observed at the visual level in addition to the conceptual level. we studied teachers’ conceptions of teaching with regard to their visual attention, noticing the trigger event and rating the success of the observed teaching.
furthermore, the effects of pedagogical training and teaching experience on visual attention and teaching conceptions were examined. comparisons were made between pedagogically trained vs. untrained and novice vs. more experienced teachers. the research questions of the study were as follows:

1) to what extent do university teachers pay visual attention to students in comparison with the other two central elements of a lecture, the teacher and the slides, when watching videotaped teaching situations with inbuilt trigger events?
2) do teachers notice inbuilt trigger events in videos by paying attention to students at the intersection of the visual level and teaching conceptions? are these in line with their numerical ratings of the success of teaching situations?
3) how are teachers’ prior pedagogical training and the length of their teaching experience connected with their visual attention to students and their conceptions of teaching?

since pedagogical training aims to foster a learning-facilitating conception of teaching, we assumed that pedagogically trained teachers’ interpretations of the videos would reflect a stronger learning-facilitation conception of teaching than those of their untrained colleagues, due to their more sophisticated knowledge base (lachner et al., 2016). the main hypothesis for this study was that pedagogically trained teachers would pay more visual attention to students than their untrained colleagues would, especially in a situation where top-down processes would be needed to shift teachers’ attention to important phenomena (theeuwes, 2000). furthermore, we used numerical ratings as evaluations given by teachers on teaching situations to confirm that our analysis of their interpretation was correct. thus, the ratings needed to be in alignment with the interpretations. the role of teaching experience might be more ambiguous among university teachers than among primary and secondary school teachers, who are all pedagogically trained.
work experience alone does not help people develop their expertise, but deliberate practices such as training are needed to gain high-level skills (ericsson, 2008). from this standpoint, we assumed that the length of previous teaching experience would not strongly differentiate teachers in their visual attention and conceptions of teaching but that previous pedagogical training would promote paying more attention to students’ learning both visually and verbally (see figure 1).

figure 1. hypothesised model of the connections between visual attention and conceptions of teaching among university teachers

methods

participants

the target group of the study comprised university teachers and doctoral students who either had or did not yet have teaching tasks at the university. they had all applied for voluntary university pedagogy training (n = 51). the measurements took place in the beginning of the training. watching the video vignettes was part of the training, but the trainees could choose whether they wanted to take part in the study. thus, participation in the study was voluntary, and informed consent was obtained from the participants. ethical approval for the study was granted by the ethics committee for human sciences of the target university. as two members of the target group declined to participate, the response rate was 94%. thus, the number of participants was 49. twenty (42%) participants had earlier pedagogical training, which varied from a university pedagogy course bearing 1 study credit (according to the european credit transfer and accumulation system, ects) to a subject teacher degree bearing 60 ects, whereas the rest had no previous pedagogical training. the participants represented seven faculties.
their teaching experience varied, but most of them had been teaching at the university for less than 10 years: nine (19%) had no teaching experience, 14 (29%) had a maximum of 2 years’ teaching experience, 13 (27%) had been teaching for 2 to 5 years, 10 (21%) for 5 to 10 years, and two (4%) had been teaching for over 10 years in at least one course per academic year. information on one participant’s faculty and teaching experience was missing from the data. those doctoral students who had no teaching experience at the university were considered prospective teachers who might be given teaching duties in the near future. due to the consistency of the sample, teachers were divided into novices (n = 23, with no teaching experience or a maximum of 2 years) and more experienced teachers (n = 25, with more than 2 years of teaching experience) for further analysis.

apparatus and materials

a tobii tx300 eye tracker (tobii technology, inc., falls church, va, usa) was used to collect the participants’ eye movements. the eye-tracking component was integrated into a 23-inch high-resolution monitor, with a maximum resolution of 1920 × 1080 pixels. the eye-tracking camera sampled data binocularly at a rate of 300 hz, with a reported gaze accuracy of 0.4°. to ensure that participants were as comfortable as possible while watching the video vignettes, no supporting chinrest was used, since the eye tracker allowed even large head movements. two custom-made videos were used in the study, with actors as teachers and students. the videos were designed and filmed by the researchers of this paper, who were familiar with the local lecturing culture. both videos shared the same simple layout, and they were filmed from the same angle and from an outsider’s perspective of the classroom. in both videos, there was a scene in which students were sitting on the left, the teacher was standing in the middle and the screen was on the right (see figure 2).
the first video was 1 minute 33 seconds in duration, and the second was 1 minute 36 seconds; both depicted a situation in the middle of a lecture. to focus on the targeted constructs, the videos were scripted (könig et al., 2014) by a group of experienced university pedagogy educators and researchers. they aimed to represent typical and realistic university teaching–learning situations, since the perceived authenticity of the video material was important (seidel et al., 2011). research methodology was chosen as the topic of teaching for both videos, since it was considered to be quite domain-general, neutral and equally understandable for teachers from different disciplines. the videos were filmed from the side to allow the three important elements of the setting (teacher, student and slides) to be clearly visible. only a few students were shown on the video to reduce the number of spontaneous movements that could draw observers’ attention. both videos incorporated a pedagogically interesting situation, a so-called trigger event (see also vilppu et al., 2019). we expected these built-in pedagogical events to trigger certain reactions and interpretations in teachers depending on their conceptions of teaching. the videos were scripted according to the relevant literature (e.g. postareff & lindblom-ylänne, 2008) to describe a content-focused teaching situation (cfts) and a learning-focused teaching situation (lfts). the first video presented a cfts, in which a teacher is lecturing about devising interview questions. she is very focused on transmitting her topic and is not paying any attention to the audience. the students are sitting and looking bored. one is yawning, another is tapping her phone and some are conversing with each other. the trigger in the first video was the teacher totally ignoring the students and their off-task behaviour.
the situation is reminiscent of a typical situation requiring classroom management (wolff et al., 2021) and thus noticing that students are not attending to the lecture. the second video presented an lfts, in which a teacher is lecturing about observation as a research method when a student interrupts her with a question concerning the ethics of observation. the teacher thinks a while and then prompts the students to have discussions in pairs for a few minutes. the trigger here was the teacher’s positive reaction to the student’s question, followed by engaging the students instead of directly answering the question by herself. thus, there is space and flexibility for changes in her teaching plan; the teacher sees students as active participants and relies on their ability to find the answer and process the knowledge themselves (postareff & lindblom-ylänne, 2008). additionally, the teacher’s positive reaction to the interruption implies a good, safe atmosphere in the seminar room. the cfts video was shown first because of the assumed priming effect of the lfts video: after seeing the teacher’s behaviour of engaging the students, the participants would be more likely to notice the missing engagement in the cfts video.

procedure

the data collection procedure began with an orientation (see table 1). for each participant, the eye tracker was calibrated using 9-point calibration at the start of the data collection. to maximise calibration accuracy, participants were requested to take as comfortable a position as possible to prevent changes in position during the recording. participants sat approximately 60 cm from the screen on a manually adjustable chair. after calibration, they were given instructions regarding the session. the participants were told that they would be watching different lecturing situations and rating them from the viewpoint of teaching and learning.
on the second watching of each video, they would be asked to think aloud about their interpretation of the situation. after the instructions, the participants watched a rehearsal video and practiced the think-aloud procedure.

table 1. the study’s procedure

orientation: calibration + instructions + rehearsal video
video viewing, cfts video (content-focused teaching situation): first watch + rating + second watch and simultaneous think-aloud + rating and explanation
video viewing, lfts video (learning-focused teaching situation): first watch + rating + second watch and simultaneous think-aloud + rating and explanation
questionnaire: background questions

after the orientation, the actual data collection started. the participants watched both videos twice in the same order. after the first viewing, they rated the situation from the viewpoints of teaching and learning on a scale from 1 to 5 (1 = very poor, 2 = poor, 3 = moderate, 4 = good and 5 = very good). the second viewing took place immediately after the rating. they received the following prompt: “now, you are asked to watch the previous situation again and simultaneously think aloud about your interpretation of it. explain what is going on from the viewpoint of teaching and learning.” they also had a chance to correct their rating. such an open approach was considered advantageous, since it is in no way preconditioned by the researchers and thus purely elicits the viewer’s perspective (kaiser et al., 2015). during the second viewing, the video’s sound was muted so that it would not interfere with the think-aloud process. if there were prolonged silences at the beginning of the viewing, the participants were prompted to verbalise what they were thinking about in the situation. they were allowed to continue their verbalisations even after the video vignette ended.
since the think-aloud took place during the second watch, it can be considered retrospective; however, it was conducted without a gaze overlay of the first watch as a cue. the participants’ eye movements were recorded each time they viewed the videos. however, only the eye movements of the first viewing were used in the analyses, since these were considered to represent so-called “pure” viewing; as such, they are comparable to an authentic teaching situation as a one-time event without the possibility of reviewing the situation. after finishing the video viewing, the participants answered a short background questionnaire. due to calibration problems and common problems with eye-tracking data quality, such as data loss (holmqvist et al., 2011), the data of only 41 participants in the cfts video and 40 participants in the lfts video were available for the analyses out of the total of 49 participants. the percentage of gaze samples with at least one eye detected was 82.92 for the cfts video and 82.07 for the lfts video.

analysis

analysis of visual attention when watching teaching situations

the participants’ viewings of the videos were analysed using tobii studio version 3.4.5 (tobii ab, danderyd, sweden). additionally, the numerical data from tobii studio were transferred to the ibm statistical package for the social sciences (spss), version 25 (ibm corp., armonk, ny, usa), which was used for further analyses. the videos were divided into areas of interest (aois), that is, the regions in the stimulus from which the authors were interested in gathering data (holmqvist et al., 2011). since both videos depicted the same scene, the same aois were used on the students, the teacher and the slides (see figure 2).

figure 2. the common aois used in both videos with an example scan path

fixations and saccades as eye-tracking parameters are thought to reflect voluntary, overt visual attention (e.g. duchowski, 2007).
the intake of visual information from the environment is assumed to happen largely during fixations (kok & jarodzka, 2016), which usually reflect the desire to focus attention on a certain object of interest. thus, fixations were considered useful for identifying where teachers focused their attention. the sum of fixation durations on each aoi was chosen to analyse the visual attention of the participants, i.e. for how long they had watched each aoi. as we were interested in the division of fixation time for each participant between different aois, the sum of fixation durations on each aoi was used to calculate the percentage share of fixation time for each aoi. this gave a viewing profile of each participant (i.e. what percentage of their fixation time fell on the students, the teacher and the slides). we expected more fixations on the relevant regions to indicate deeper cognitive processing or the importance of a region (e.g. grub et al., 2020). the so-called white space, i.e. the visual attention on areas other than aois, was not considered when calculating the proportion of fixations. this decision was based on descriptive statistics showing that the number of white space fixations was minimal. the eye-tracking data were normally distributed, thus enabling the use of independent samples t-tests.

analysis of video interpretations

the think-aloud protocols were transcribed verbatim and analysed qualitatively using nvivo 12 software (alfasoft ab, göteborg, sweden). the analyses were performed by the second and last authors, who are pedagogically qualified teachers and researchers in the field. theory-based content analysis was used to analyse the interpretations of the triggers. the structure of the coding scheme continuum was derived from the theory of teaching conceptions (e.g. kember & kwan, 2000; samuelowicz & bain, 2001).
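the viewing profile described in the analysis of visual attention (the percentage share of summed fixation durations per aoi, with white space left out) can be illustrated with a minimal python sketch. the function name, the aoi labels and the example durations are hypothetical and chosen only for illustration; the study itself exported its data from tobii studio to spss, and this is not the authors' procedure.

```python
from collections import defaultdict

# The three AOIs named in the text; any other label counts as white space.
AOIS = ("students", "teacher", "slides")


def viewing_profile(fixations):
    """Return the percentage share of fixation time per AOI.

    `fixations` is an iterable of (aoi_label, duration_ms) pairs
    (a hypothetical input format). Fixations outside the three AOIs
    are ignored, mirroring the exclusion of white space.
    """
    totals = defaultdict(float)
    for aoi, duration_ms in fixations:
        if aoi in AOIS:
            totals[aoi] += duration_ms

    aoi_time = sum(totals.values())
    if aoi_time == 0:
        # No fixation landed on any AOI: report an all-zero profile.
        return {aoi: 0.0 for aoi in AOIS}
    return {aoi: 100.0 * totals[aoi] / aoi_time for aoi in AOIS}


# Made-up fixation list for one participant (durations in milliseconds).
example = [
    ("students", 300),
    ("teacher", 500),
    ("white_space", 120),  # ignored
    ("slides", 200),
]
profile = viewing_profile(example)
print(profile)
```

with these made-up durations, 1000 ms fall on the three aois, so the shares are 30% for the students, 50% for the teacher and 20% for the slides. group comparisons of such shares (e.g. trained vs. untrained teachers) would then be a separate step, such as the independent samples t-tests mentioned in the text.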
in the continuum from 1 to 5, 1 represented a strong knowledge-transmission conception and 5 represented a strong learning-facilitation conception of teaching. in the analysis of the lfts video, the scale was skewed towards the knowledge-transmission end of the continuum, since critical or knowledge-transmission-reflecting comments on that video were scarce. the descriptions of each category were based on what emerged from the think-aloud protocols. both the think-aloud during the viewing and the summing up of the ratings after the video viewing were analysed when deciding to which category each participant’s answer belonged. multiple rounds of open coding were conducted to reach the current coding scheme (see table 2).

table 2. coding scheme of the video interpretations: teachers’ reactions to the trigger event from the perspective of knowledge-transmission and learning-facilitation conceptions.

| category | cfts video (content-focused teaching), trigger: students not attending | lfts video (learning-focused teaching), trigger: teacher engaging students |
| 1 = reflecting strong knowledge-transmission conception | does not notice the trigger. praising the teaching or focusing on the presentation. | interpretation of the trigger from the knowledge-transmission perspective. the teacher performs poorly, since the structure of the teaching suffers from a student’s interruption or the teacher should give a clear answer to the student’s question. |
| 2 = reflecting knowledge-transmission conception | notices the trigger but does not suggest that the teacher should react to it. if suggestions for improvement are given, they are related to the presentation (e.g. there should be more pictures in the slides). | mere description of the situation without taking a positive stand on teacher’s learning-focused performance or neutral interpretation. |
| 3 = reflecting characteristics of both conceptions | notices the trigger and suggests that the teacher should do something (e.g. have a break or somehow get students’ attention), but no clear mentioning of supporting student learning. | a superficially positive view of the situation/the teacher’s performance is good (no arguments given). no specific mention of the trigger. |
| 4 = reflecting learning-facilitation conception | notices the trigger. suggestions for improvement are related to facilitating students’ learning (e.g. engaging/motivating students, increasing interaction). | the teacher’s performance is considered good because she reacts positively to the student’s question (the trigger); mentions facilitation of learning. |
| 5 = reflecting strong learning-facilitation conception | strongly notices the trigger. teaching is considered very poor since the students are not learning. suggestions for improvement are related to students’ learning (e.g. engaging students, fostering their own thinking). | the interpretation is given clearly from the viewpoint of learning. praises teacher’s reaction to the trigger. mentions students’ knowledge building or the pedagogy behind not answering the student directly. viewpoint of deep learning (noticing that the teacher is changing their original plan to answer students’ needs and interests). |

in the following, citation examples are presented to illustrate the classes in the coding scheme. participants are referred to as p and the identification code, such as p1. the next example of an interpretation of the cfts video was classified in category 2, reflecting a knowledge-transmission conception, since the participant noticed the trigger of the students not focusing but did not suggest that the teacher should do anything about it.

i see that the students are looking quite bored and concentrating on their own business. … i don’t know why. i don’t think it is the style of teaching, but maybe the topic.
… i don’t see anything special to criticise about the teacher’s actions; this is very typical university teaching. (p50)

in another example of the cfts video, the participant interpreted the trigger from the viewpoint of learning. this interpretation reflected a strong learning-facilitation conception of teaching (category 5):

from the viewpoint of teaching, it clearly seems that, in this situation, the conditions for learning new things are not very good. … only a few students are following the situations and the teacher’s teaching style seems to be one in which her message doesn’t reach the students very well. (p47)

the next excerpt from the interpretation of the lfts video was classified as reflecting a knowledge-transmission conception (category 2). the participant just described the situation, but neither indicated whether the teacher performed well nor interpreted the trigger from the viewpoint of teaching and learning:

in this scenario, the lecturer was still giving this lecture, but this time, she considered the students’ interest in the topic and made them discuss it as group work. (p8)

a second example from the lfts video was classified as reflecting a strong learning-facilitation conception (category 5). in this excerpt, the teacher’s teaching actions, i.e. the trigger, were considered good, since she changed her original lecture plan according to what the students showed interest in.

… the teacher was able to compromise her original lecturing plan, and when there was a question, instead of directly answering it, she made the students ponder it and this way they would have a more concrete learning experience. (p21)

interrater reliability was calculated for the interpretations of the triggers for 25% of the data using cohen’s weighted kappa. substantial agreement was reached for both videos (cfts video: 66.67% agreement, weighted kappa = 0.739; lfts video: 58.33% agreement, weighted kappa = 0.639).
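the agreement statistics above can be illustrated with a minimal sketch of cohen’s weighted kappa. the source does not state which weighting scheme was used, so linear weights are assumed here (quadratic weights are available as an option). the two coders’ ratings are hypothetical values on the 1–5 scale, chosen only to show the calculation, and the function names are illustrative rather than taken from the study.

```python
from collections import Counter

SCALE = [1, 2, 3, 4, 5]  # coding categories of the continuum

# Hypothetical ratings from two coders (NOT the study data).
coder_1 = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 2]
coder_2 = [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1]


def percent_agreement(rater_a, rater_b):
    """Share of cases in which both raters chose the same category."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa with linear or quadratic disagreement weights."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}
    power = 1 if weights == "linear" else 2

    def disagreement(i, j):
        # 0 for identical categories, 1 for the two ends of the scale.
        return (abs(i - j) / (k - 1)) ** power

    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    margin_a = Counter(index[a] for a in rater_a)
    margin_b = Counter(index[b] for b in rater_b)

    observed_dis = sum(disagreement(i, j) * c / n for (i, j), c in observed.items())
    expected_dis = sum(
        disagreement(i, j) * margin_a[i] * margin_b[j] / n ** 2
        for i in range(k)
        for j in range(k)
    )
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return 1.0 - observed_dis / expected_dis


if __name__ == "__main__":
    print(f"agreement: {percent_agreement(coder_1, coder_2):.2%}")
    print(f"weighted kappa (linear): {weighted_kappa(coder_1, coder_2, SCALE):.3f}")
```

because near-misses on an ordinal scale (e.g. 3 versus 4) are penalised less than distant disagreements, weighted kappa can be noticeably higher than the raw percentage of exact agreement would suggest.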
analysis of the connections between targeted concepts

spearman’s correlations were utilised to examine the relations between targeted concepts. the rating scale of the cfts video was reversed so that it would be comparable to the ratings of the lfts video. a path analysis was conducted using the mplus software (version 8.4, muthén & muthén, 2019) to portray the possible causal linkages between the target variables to better understand the processes and mechanisms behind the phenomenon. path analysis was chosen because it allows for inferring and testing a sequence of causal links between variables of interest and examining the relationship between multiple predictor and criterion variables simultaneously (barbeau et al., 2019). missing values were handled by employing full information maximum likelihood (fiml) in the model estimations. fiml can handle data that are missing at random (mar) in an optimal way (muthén & muthén, 2017).

results

teachers’ visual attention in teaching situations

first, the teachers’ visual attention on both videos was examined. on the cfts video, the pedagogically educated teachers looked statistically significantly more at the students (cohen’s d = .77) and almost statistically significantly less at the teacher (cohen’s d = .64) than their untrained colleagues (table 3). the effect sizes were moderate (cohen, 1988). on the lfts video, the differences pointed in the same direction as on the cfts video but were not statistically significant. there were no statistically significant differences between the teaching experience groups in either video (see table 4).

table 3. percentage share of fixation time for each aoi in teaching situations in the cfts and lfts videos between pedagogically untrained and trained university teachers

| video | aoi | pedagogical training: no (n = 21–22), m, sd | pedagogical training: yes (n = 17–19), m, sd | t(37 or 38) | p |
| cfts video (content-focused teaching situation) | aoi teacher (%) | 33.70, 10.45 | 26.07, 13.35 | 2.02 | .05 |
| cfts video (content-focused teaching situation) | aoi students (%) | 42.35, 17.95 | 56.16, 17.82 | -2.44 | .02* |
| cfts video (content-focused teaching situation) | aoi slides (%) | 23.95, 14.57 | 17.77, 10.18 | 1.54 | .13 |
| lfts video (learning-focused teaching situation) | aoi teacher (%) | 49.79, 9.49 | 45.50, 12.27 | 1.23 | .23 |
| lfts video (learning-focused teaching situation) | aoi students (%) | 35.66, 11.94 | 42.65, 15.49 | -1.59 | .12 |
| lfts video (learning-focused teaching situation) | aoi slides (%) | 14.55, 7.19 | 11.85, 7.44 | 1.14 | .26 |

note. the number of teachers varied in the videos due to missing data (cfts video: 21 untrained and 19 trained teachers; lfts video: 22 untrained and 17 trained teachers). *p < .05

table 4. percentage share of fixation time for each aoi in teaching situations in the cfts and lfts videos between novice and more experienced university teachers

| video | aoi | teaching experience: 0–2 years (n = 17–18), m, sd | teaching experience: >2 years (n = 21–23), m, sd | t(36–38) | p |
| cfts video (content-focused teaching situation) | aoi teacher (%) | 30.74, 11.64 | 29.26, 13.23 | .37 | .72 |
| cfts video (content-focused teaching situation) | aoi students (%) | 49.10, 17.28 | 49.79, 20.83 | -.11 | .91 |
| cfts video (content-focused teaching situation) | aoi slides (%) | 20.16, 9.12 | 20.94, 15.37 | -.20 | .84 |
| lfts video (learning-focused teaching situation) | aoi teacher (%) | 48.68, 10.29 | 47.17, 11.57 | .43 | .67 |
| lfts video (learning-focused teaching situation) | aoi students (%) | 39.51, 12.58 | 38.35, 15.31 | .26 | .80 |
| lfts video (learning-focused teaching situation) | aoi slides (%) | 11.81, 5.37 | 14.48, 8.64 | -1.17 | .25 |

note. the number of teachers varied in the videos due to missing data (cfts video: 17 novice and 23 more experienced teachers; lfts video: 18 novice and 21 more experienced teachers).

teachers’ verbal interpretations of teaching situations and triggers

overall, the participants’ interpretations of the lfts video included more notions about learning facilitation (m = 4.08, sd = 1.00) than their interpretations of the cfts video (m = 3.16, sd = 1.11) (figure 3).
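as a plausibility check on the group comparisons in table 3, the t statistic and cohen’s d can be re-derived from the reported means, standard deviations and group sizes. the sketch below assumes a pooled-variance (student’s) t-test and a pooled-sd cohen’s d, which the source does not spell out; the function name is illustrative. with the cfts values for the students aoi (untrained: m = 42.35, sd = 17.95, n = 21; trained: m = 56.16, sd = 17.82, n = 19) it gives t(38) ≈ -2.44 and d ≈ 0.77, in line with the reported values.

```python
import math


def t_and_d_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t, degrees of freedom and Cohen's d from summary statistics.

    Assumes equal variances (pooled SD); this is an assumption, not a detail
    confirmed by the source.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (m1 - m2) / standard_error
    d = abs(m1 - m2) / pooled_sd
    return t, df, d


if __name__ == "__main__":
    # Values copied from table 3, cfts video (untrained group first).
    rows = {
        "aoi students": (42.35, 17.95, 21, 56.16, 17.82, 19),
        "aoi teacher": (33.70, 10.45, 21, 26.07, 13.35, 19),
    }
    for label, values in rows.items():
        t, df, d = t_and_d_from_summary(*values)
        print(f"{label}: t({df}) = {t:.2f}, d = {d:.2f}")
```

the same function can be applied to the remaining rows of tables 3 and 4, since only means, standard deviations and group sizes are needed.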
no significant differences in the interpretations were found between trained and untrained teachers (cfts video: t(46) =.09, p = .93; lfts video: t(46) = -.67, p = .50) or in relation to teaching experience (cfts video: t(46) = 1.67, p = .87; lfts video: t(46) = -.70, p = .49). we assume that the lfts video was easier for the participants to interpret, since more was happening in the video, such as the teacher and the students being actively engaged in collaborative learning processes. thus, the lfts video trigger was able to capture watchers’ attention (bottom-up) and no shifting of attention elsewhere was needed (theeuwes, 2000). the cfts video, in which the lecturer was lecturing unidirectionally and the trigger was the students’ passivity, resulted in more variation in teachers’ interpretations.

figure 3. the division of classified video interpretations of cfts (content-focused teaching situation) and lfts (learning-focused teaching situation) videos

teachers’ numerical ratings of the teaching situations

teachers’ ratings of the success of the teaching situation were higher overall concerning the lfts video (m = 4.33, sd = .56) than the cfts video (m = 3.43, sd = .74), showing that the teachers considered the lfts video to illustrate a better situation in terms of teaching and learning. no significant differences in ratings were found between untrained and trained teachers (cfts video: t(46) = .29, p = .77; lfts video: t(46) = .35, p = .73) or novice and more experienced teachers (cfts video: t(46) = -.23, p = .82).

a path model of university teachers’ visual attention and interpretations in teaching situations

finally, path analysis was conducted to portray the causal linkages between the target constructs.
the cfts video was selected for the path analysis because it resulted in more variation in participants’ eye movements as well as their interpretations and ratings; thus, its explanatory power was expected to be stronger. the aoi of students was used as the basis of the model, since noticing students’ passivity was central in the cfts video. the correlations among the studied variables are shown in table 5.

table 5. means, standard deviations and correlations (rs) among study variables

| variable | 1 | 2 | 3 | 4 | 5 |
| 1. visual attention to students | 1 | .43** | .36* | .36* | .04 |
| 2. interpretation | | 1 | .47** | -.01 | -.04 |
| 3. rating | | | 1 | .00 | .03 |
| 4. pedagogical training (1 = no, 2 = yes) | | | | 1 | .17 |
| 5. teaching experience | | | | | 1 |
| m | 49.34 | 3.16 | 3.43 | | |
| sd | 18.95 | 1.11 | 0.75 | | |

*p < .05, **p < .01

the path analysis is depicted in figure 4. the fit indices indicate that the model fits the data well: χ2(3) = 2.30, p = 0.512, cfi = 1.00, tli = 1.00, srmr = .04, rmsea = .00, 90% ci = [0.00, 0.223].

figure 4. the final structural model with standardised path coefficients (n = 47)
note. ***p < .001, **p < .01, *p < .05, ns = not significant

in this model, two paths were examined: 1) the effects of pedagogical training on the rating given to the teaching situation and 2) the effects of pedagogical training on video interpretation. the first path did not result in any significant indirect effects. however, in the second path, the indirect effect of visual attention on students as a mediator between pedagogical training and video interpretation was significant: β = 0.231 (p = .019; 95% ci = [.051, .463]). in addition, an almost significant direct negative effect of pedagogical training on video interpretation was found: β = -0.267 (p = .050; 95% ci: [-.505, -.016]), meaning that if the teacher was not able to visually focus on students, she would not produce an accurate verbal interpretation.
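the logic of an indirect (mediated) effect like the one reported above can be sketched in a few lines. this is not the authors’ mplus model: it is a minimal illustration that estimates the unstandardised indirect effect as the product of two regression slopes (training → attention, and attention → interpretation controlling for training) and attaches a percentile bootstrap confidence interval. the source does not state how its confidence intervals were obtained, so the bootstrap is an assumption, and the data are simulated with arbitrary, hypothetical parameters.

```python
import random


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b of x on y through m."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    det = smm * sxx - sxm ** 2
    if sxx == 0 or det == 0:
        raise ValueError("predictors are constant or perfectly collinear")
    a = sxm / sxx                      # slope of m on x
    b = (smy * sxx - sxy * sxm) / det  # slope of y on m, controlling for x
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(
                indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
            )
        except ValueError:
            continue  # skip degenerate resamples (e.g. only one group drawn)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def simulate(n=47, seed=0):
    """Simulated data with hypothetical parameters (NOT the study data)."""
    rng = random.Random(seed)
    training = [i % 2 for i in range(n)]                                # 0 = no, 1 = yes
    attention = [45 + 14 * t + rng.gauss(0, 18) for t in training]     # % fixation on students
    interpretation = [
        1.5 + 0.03 * a - 0.5 * t + rng.gauss(0, 1) for t, a in zip(training, attention)
    ]
    return training, attention, interpretation


if __name__ == "__main__":
    x, m, y = simulate()
    low, high = bootstrap_ci(x, m, y)
    print(f"indirect effect = {indirect_effect(x, m, y):.3f}, 95% ci = [{low:.3f}, {high:.3f}]")
```

in this setup a positive indirect effect can coexist with a negative direct effect, which is the pattern the path model describes: training raises attention to students, attention raises the interpretation score, and the remaining direct path points the other way.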
the relationships proposed in the model explain 16.2% of the variance in visual attention on students, 28.9% of the variance in video interpretation and 24.7% of the variance in the rating of the teaching situation. thus, pedagogical training seems to affect teachers’ visual attention to students, which is further associated with video interpretations and video ratings. in other words, pedagogically trained teachers gaze more at the students, and gazing at them further engenders more learning-focused interpretations and aligned ratings of the teaching situations. in addition, there was a small negative direct effect of pedagogical training on verbal interpretation, indicating that pedagogically educated teachers used fewer learning-focused explanations of the situation if they did not pay visual attention to the students. thus, visual attention seems to be central to interpreting students’ learning situations.

discussion

focusing on students’ learning is a central element in high-quality university teaching (e.g. prosser & trigwell, 2014). previous studies have shown that teachers who express a learning-facilitation conception of teaching, i.e. who consider teaching as supporting students’ learning, more frequently report a learning-focused approach to teaching in practice, while those who consider teaching as transmitting knowledge tend to adopt a content-focused approach in their teaching practices (kember & kwan, 2000). pedagogical training, even a short one, has been shown to enhance teachers’ learning-facilitating conception (vilppu et al., 2019).

while prior studies on university teaching have mainly used self-report questionnaires and interviews, we broadened the methodological scope to include eye tracking. we investigated whether teachers’ focus on students could be found at the intersection of their teaching conceptions and visual attention.
based on previous studies, we hypothesised that pedagogically trained teachers would express more learning-focused views; thus, we expected a connection between previous training and focus on students, on both the visual and the interpretation levels. the analyses of verbal interpretations of the videos revealed both learning-facilitation and knowledge-transmission conceptions in teachers. when analysing only the verbal interpretations of the videos, we found no differences between the pedagogically trained and untrained or novice and more experienced teachers. our results concerning teachers’ visual attention to students showed that pedagogically trained teachers fixated more on the students than their untrained colleagues did. the difference was statistically significant in the content-focused teaching situation (cfts) video, where the students were passive and bored, but not in the learning-focused teaching situation (lfts) video, where the teacher actively engaged students in learning. pedagogically, the situation of the cfts video, in which students were passive, would need teacher attention and intervening action. only a few studies have addressed the recognition of possible situations that need teacher’s action (grub et al., 2020), of which our trigger event in cfts video is an example. we claim that the trained teachers, due to their more elaborated knowledge base (lachner et al., 2016), were more competent in noticing, i.e. paying visual attention to the problematic situation and interpreting it adequately. the cognitive theory of the top-down and bottom-up control of visual attention supports our finding: after the teacher’s teaching actions, i.e. lecturing on the cfts video, had captured watchers’ attention (bottom-up), the trained teachers used their attentive processes to shift their attention elsewhere (top-down) to focus on important things (e.g. theeuwes, 2000). 
in our case, the learning-facilitating conception helped pedagogically trained teachers shift their attention from the lecturing teacher to students when an action-needing event, i.e. boredom and passivity, occurred. analyses of teachers’ visual attention to the other elements of the lecture, the teachers and the slides, showed no statistically significant differences. however, teachers’ visual attention to the lecturing teacher when viewing the cfts video was statistically almost significant and in the direction of our hypothesis, showing that the untrained teachers paid more attention to the teacher than the pedagogically trained teachers did. previous studies have shown that experts tend to fixate more often and for a longer duration on relevant areas, whereas novices look more frequently at irrelevant areas (grub et al., 2020). similarly, our pedagogically trained teachers paid more visual attention to the students, noticed the trigger event and evaluated the situation in terms of learning-facilitation conception. the untrained teachers probably did not notice students’ boredom as a relevant phenomenon since they did not pay enough visual attention to the students. this is probably because, according to their knowledge-transmitting teaching conception, what the teacher does is most important. in our study, teaching experience measured in teaching years was not connected to visual attention and interpretations. this result contrasts with studies on the lower levels of education (e.g. stahnke & blömeke, 2021), in which experienced teachers differ from novices. there are many reasons for this contrasting result. at lower educational levels, teachers are usually pedagogically trained, unlike in universities, where university teachers may totally lack pedagogical education. thus, to compare the settings, we would need a study in which we have pedagogically trained novice and expert university teachers. 
our sample comprised mainly novice teachers, so more extensive studies including teachers with long experience will be needed in the future. the university teaching environment also differs significantly from that of other educational levels; for example, a teacher may not always teach the same students and the place where teaching takes place may always be different. thus, we argue that the results of other educational levels’ eye-tracking studies cannot be directly applied to higher education. on the other hand, our study is in line with expertise research results that found training to be more important than experience (e.g. ericsson, 2008). thus, it may be that in the university environment, having at least some pedagogical training is more important than having long teaching experience without pedagogical education.

the other possible concerns of this study included the rather small sample size for quantitative modelling; this may affect the reliability of the statistical analyses, although it is comparable with other eye-tracking studies (see beach & mcconnel, 2019). on the other hand, in small samples, the effects are often undetected; this might indicate that we have discovered an interesting phenomenon that needs to be confirmed in future studies. the division of teachers into two groups with either less or more than two years of experience can be considered a problematic solution. however, the small sample size and the fact that most of the teachers were novices did not allow many other solutions. another concern was that the order in which the videos were watched was not randomised for the participants. however, we think that the order in which the videos were shown (first the content-focused scenario, then the more appropriate learning-focused scenario) was justified to tap into participants’ conceptions of teaching.
showing the more favourable teaching video first could have affected their interpretations in the second video. the videos seemed to differ in their discriminatory power, which proved better for the cfts video. we assume this was because the trigger event was the passivity of the students, which required visual attention to the aoi of students. furthermore, it was subtler than the trigger event in the lfts video, requiring the participants to look at areas other than the most obvious, the teacher, who was talking all the time. in addition, in the eye-tracking data analyses, the visual attention on areas other than aois was not considered, since the number of the so-called white space fixations was minimal. however, since the aois were of different sizes, the participants might have looked at some of the aois accidentally more than others. in future studies, this should be considered in the analyses. in this study, the teachers watched teaching situations on a video, which is different from being in a real teaching situation and looking at their own students. university teachers’ gaze at their own teaching situations needs to be studied in the future, which raises its own methodological questions (cortina et al., 2015). however, using this simple eye-tracking design, we were able to conduct operationalisation and analysis of the data and obtain support for our hypotheses, which will lay the groundwork for more complex studies. university teachers are an interesting group to study, since many of them lack pedagogical training, as opposed to primary and secondary school teachers, who are usually pedagogically qualified. this novel study showed that pedagogical training is important for university teachers to develop their ability to notice important events in lecturing situations. the type of video viewing used in this study appeared to be a suitable instrument for measuring university teachers’ visual attention and related conceptions of teaching. 
we suggest that video interpretations combined with visual attention reflect teachers’ conceptions of teaching and offer new insights into an area of research that has traditionally been studied almost entirely using self-reporting instruments (see also vilppu et al., 2019). our findings imply that when a trained teacher notices on a visual level that the students need engaging, they may be able to engage them in active learning, which is considered central in high-quality teaching (cf. lonka & ketonen, 2012). in contrast, if a university teacher has no pedagogical training, they may not be competent in noticing situations where students need engaging. this finding suggests that visual attention plays a central role in teachers’ ability to focus on students.

key points

visual attention combined with verbal interpretations offers a frontline method for studying university teachers’ pedagogical expertise.
pedagogically trained teachers paid more visual attention to the students in a situation where the students were bored and did not attend to the lecture.
teachers who paid visual attention to important events during teaching were also able to formulate a more accurate verbal interpretation, reflecting a learning-facilitating conception of teaching.
previous pedagogical training explained differences in visual attention and verbal interpretations.
teaching experience as measured by the number of years teaching was not connected to visual attention and verbal interpretations.

acknowledgments

we are thankful to the graduate students and researchers who participated in making the videos and to all the teachers who took part in the study.

references

barbeau, k., boileau, k., sarr, f., & smith, k. (2019). path analysis in mplus: a tutorial using conceptual model of psychological and behavioural antecedents of bulimic symptoms in young adults. the quantitative methods for psychology, 15(1), 38–53.
https://doi.org/10.20982/tqmp.15.1.p038 beach, p. & mcconnel, j. (2019). eye tracking methodology for studying teacher learning: a review of the research. international journal of research & method in education, 42(5), 485–501. https://doi.org/10.1080/1743727x.2018.1496415 blömeke, s., gustafsson, j. e., & shavelson, r. (2015). beyond dichotomies: competence viewed as a continuum. zeitschrift für psychologie, 223, 3–13. https://doi.org/10.1027/2151-2604/a000194 cohen, j. (1988). statistical power analysis for the behavioral sciences (2nd ed.). lawrence erlbaum associates. https://doi.org/10.4324/9780203771587 cortina, k. s., miller, k. f., mckenzie, r., & epstein, a. (2015). where low and high inference data converge: validation of class assessment of mathematics instruction using mobile eye tracking with expert and novice teachers. international journal of science and mathematics education, 13, 389–403. https://doi.org/10.1007/s10763-014-9610-5 doyle, w. (2006). ecological approaches to classroom management. in c. evertson & c. weinstein (eds.), handbook of classroom management: research, practice and contemporary issues (pp. 97–125). lawrence erlbaum associates. duchowski, a. t. (2007). eye tracking methodology. theory and practice. (2nd ed.). springer. https://doi.org/10.1007/978-1-4471-3750-4 dunekacke, s., jenßen, l., & blömeke, s. (2015). effects of mathematics content knowledge on pre-school teachers’ performance: a video-based assessment of perception and planning abilities in informal learning situations. international journal of science and mathematics education, 13, 267–286. https://doi.org/10.1007/s10763-014-9596-z entwistle, n. (2005).
learning outcomes and ways of thinking across contrasting disciplines and settings in higher education. the curriculum journal, 16(1), 67–82. https://doi.org/10.1080/0958517042000336818 ericsson, k. a. (2008). deliberate practice and acquisition of expert performance: a general overview. academic emergency medicine, 15, 988–994. https://doi.org/10.1111/j.1553-2712.2008.00227.x garner, j. k., & kaplan, a. (2019). a complex dynamic systems perspective on teacher learning and identity formation: an instrumental case. teachers and teaching: theory and practice, 25(1), 7–33. https://doi.org/10.1080/13540602.2018.1533811 gaudin, c., & chaliès, s. (2015). video viewing in teacher education and professional development: a literature review. educational research review, 16, 41–67. https://doi.org/10.1016/j.edurev.2015.06.001 gibbs, g., & coffey, m. (2004). the impact of training of university teachers on their teaching skills, their approach to teaching and the approach to learning of their students. active learning in higher education, 5, 87–100. https://doi.org/10.1177/1469787404040463 grub, a-s., biermann, a., & brünken, r. (2020). process-based measurement of professional vision of (prospective) teachers in the field of classroom management. a systematic review. journal for educational research online, 12, 75–102. https://doi.org/10.25656/01:21187 holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2011). eye tracking: a comprehensive guide to methods and measures. oxford university press. just, m. a. & carpenter, p. a. (1980). a theory of reading: from eye fixations to comprehension. psychological review, 87(4), 329–354. https://doi.org/10.1037/0033-295x.87.4.329 kaiser, g., busse, a., hoth, j., könig, j., & blömeke, s. (2015). about the complexities of video-based assessments: theoretical and methodological approaches to overcoming shortcomings of research on teachers’ competence. 
international journal of science and mathematics education, 13(2), 369–387. https://doi.org/10.1007/s10763-015-9616-7 kember, d., & kwan, k. (2000). lecturer’s approaches to teaching and their relationship to conceptions of good teaching. instructional science, 28, 469–490. https://doi.org/10.1023/a:1026569608656 knight, p. (2002). being a teacher in higher education. society for research into higher education & open university press. kok, e. m., & jarodzka, h. (2016). before your very eyes: the value and limitations of eye tracking in medical education. medical education, 51(1), 114–122. https://doi.org/10.1111/medu.13066 könig, j., blömeke, s., klein, p., suhl, u., busse, a., & kaiser, g. (2014). is teachers’ general pedagogical knowledge a premise for noticing and interpreting classroom situations? a video-based assessment approach. teaching and teacher education, 38, 76–88. https://doi.org/10.1016/j.tate.2013.11.004 lachner, a., jarodzka, h., & nückles, m. (2016). what makes an expert teacher? investigating teachers’ professional vision and discourse abilities. instructional science 44(3), 197–203. https://doi.org/10.1007/s11251-016-9376-y levin, d. m., hammer, d., & coffey, j. e. (2009). novice teachers’ attention to student thinking. journal of teacher education, 60(2), 142–154. https://doi.org/10.1177/0022487108330245 lonka, k., & ketonen, e. (2012). how to make a lecture course an engaging learning experience? studies for the learning society, 2-3, 63–74.
https://doi.org/10.2478/v10240-012-0006-1 mcintyre, n. a., mainhard, m. t., & klassen, r. m. (2017). are you looking to teach? cultural, temporal and dynamic insights into expert teacher gaze. learning and instruction, 49, 41–53. https://doi.org/10.1016/j.learninstruc.2016.12.005 muthén, l. k., & muthén, b. o. (2017). mplus user’s guide (8 th ed.). muthén & muthén. postareff, l., & lindblom-ylänne, s. (2008). variation in teachers’ descriptions of teaching: broadening the understanding of teaching in higher education. learning and instruction, 18, 109–120. https://doi.org/10.1016/j.learninstruc.2007.01.008 postareff, l., & nevgi, a. (2015). development paths of university teachers during a pedagogical development course. educar, 51(1), 37–52. https://doi.org/10.5565/rev/educar.647 postareff, l., nevgi, a., & lindblom-ylänne, s. (2007). the effect of pedagogical training on teaching in higher education. teaching and teacher education, 23, 557–571. https://doi.org/10.1016/j.tate.2006.11.013 pouta, m., lehtinen, e., & palonen, t. (2020). student teacher’ and experienced teachers’ professional vision of students’ understanding of the rational number concept. educational psychology review 33, 109–128. https://doi.org/10.1007/s10648-020-09536-y prosser, m., & trigwell, k. (2014). qualitative variation in approaches to university teaching and learning in large first-year classes. higher education, 67, 783–795. https://doi.org/10.1007/s10734-013-9690-0 samuelowicz, k., & bain, j. d. (2001). revisiting academics’ beliefs about teaching and learning. higher education, 41, 299–325. https://doi.org/10.1023/a:1004130031247 seidel, t., stürmer, k., blomberg, g., kobarg, m., & schwindt, k. (2011). teacher learning from analysis of videotaped classroom situations: does it make a difference whether teachers observe their own teaching or that of others? teaching and teacher education, 27, 259–267. 
https://doi.org/10.1016/j.tate.2010.08.009 https://doi.org/10.1111/medu.13066 https://doi.org/10.1016/j.tate.2013.11.004 https://doi.org/10.1007/s11251-016-9376-y https://doi.org/10.1177/0022487108330245 https://doi.org/10.2478/v10240-012-0006-1 https://doi.org/10.1016/j.learninstruc.2016.12.005 https://doi.org/10.1016/j.learninstruc.2007.01.008 https://doi.org/10.5565/rev/educar.647 https://doi.org/10.1016/j.tate.2006.11.013 https://doi.org/10.1007/s10648-020-09536-y https://doi.org/10.1007/s10734-013-9690-0 https://doi.org/10.1023/a:1004130031247 https://doi.org/10.1016/j.tate.2010.08.009 murtonen et al 84 | f l r shulman, l. s. (1987). knowledge and teaching: foundations of the new reform. harvard educational review, 57, 1–22. https://doi.org/10.17763/haer.57.1.j463w79r56455411 stahnke, r., & blömeke, s. (2021). novice and expert teachers’ noticing of classroom management in whole-group and partner work activities: evidence from teachers' gaze and identification of events. learning and instruction, 74, 101464. https://doi.org/10.1016/j.learninstruc.2021.101464 stes, a., & van petegem, p. (2011). instructional development for early career academics: an overview of impact. educational research, 53, 459–474. https://doi.org/10.1080/00131881.2011.625156 theeuwes, j., atchley, p., & kramer, a. f. (2000). on the time course of top-down and bottom-up control of visual attention. in s. monsell & j. driver (eds.), control of cognitive processes: attention and performance xviii (pp. 105–124). mit press. trigwell, k., & prosser, m. (2004). development and use of the approaches to teaching inventory. educational psychology review, 16, 409–424. https://doi.org/10.1007/s10648-004-0007-9 uiboleht, k., karm, m., & postareff, l. (2018). the interplay between teachers’ approaches to teaching, students’ approaches to learning and learning outcomes: a qualitative multi-case study. learning environments research, 21, 321–347. 
https://doi.org/10.1007/s10984-018-9257-1 van den bogert, n., van bruggen, j., kostons, d., & jochems, w. (2014). first steps into understanding teachers’ visual perception of classroom events. teaching and teacher education, 37, 208–216. https://doi.org/10.1016/j.tate.2013.09.001 van gog, t., kester, l., nievelstein, f., giesbers, b., & paas, f. (2009). uncovering cognitive processes: different techniques that can contribute to cognitive load research and instruction. computers in human behavior, 25, 325–331. https://doi.org/10.1016/j.chb.2008.12.021 vermunt, j. d., vrikki, m., warwick, p., & mercer, n. (2017). connecting teacher identity formation to patterns in teacher learning. in d. j. clandinin & j. husu (eds.), the sage handbook of research on teacher education (pp. 143–159). sage publications ltd. vilppu, h., södervik, i., postareff, l., & murtonen, m. (2019). the effect of short online pedagogical training on university teachers’ interpretations of teaching–learning situations. instructional science, 47(6), 679–709. https://doi.org/10.1007/s11251-019-09496-z wolff, c. e., jarodzka, h., & boshuizen, h. p. a. (2017). see and tell: differences between expert and novice teachers’ interpretations of problematic classroom management events. teaching and teacher education, 66, 295–308. https://doi.org/10.1016/j.tate.2017.04.015 wolff, c. e., jarodzka, h., & boshuizen, h. p. a. (2021). classroom management scripts: a theoretical model contrasting expert and novice teachers’ knowledge and awareness of classroom events. educational psychology review, 33, 131–148. 
article received 21 january 2022 / revised 9 december 2022 / accepted 9 december 2022 / available online 11 january 2023
frontline learning research vol. 7 no 3 (2019) 64–90 issn 2295-3159

it’s better to enjoy learning than playing: motivational effects of an educational live action role-playing game

cyril brom a, viktor dobrovolný a, b, filip děchtěrenko b, c, tereza stárková a, b, edita bromová a, d

a faculty of mathematics and physics, charles university, czech republic
b faculty of arts, charles university, czech republic
c institute of psychology, the czech academy of sciences, czech republic
d film and tv school, academy of performing arts in prague, czech republic

article received 17 february 2019 / revised 20 june / accepted 19 july / available online 16 august

abstract

game-based learning is supposed to motivate learners. however, to what degree does motivation driven by interest in playing an instructional game affect learning outcomes compared to motivation driven by interest in the very learning process? this is not known. in this study with a unique design and intervention, young adults (n = 128; a heterogeneous sample) learned how to control an electro-mechanical device in a 40-minute-long learning session integrated into a 2-hour-long educational live action role-playing game (edu-larp). edu-larps are supposedly engaging games where players take part in team role-playing by physically enacting characters in a fictional universe. in our edu-larp, players had to understand how the to-be-learned device worked in order to win the game. departing from typical game-based learning research, learning- and playing-related variables were assessed for each learner separately (i.e., a within-subject design). affective-motivational factors related to playing (rather than learning) predicted learning outcomes in a positive, but considerably weaker, way compared to learning-related, affective-motivational factors.
developed interest in larp-like games was primarily related to enjoying the game rather than to better learning outcomes, whereas developed interest in the instructional domain was primarily related to enjoyment of learning and better learning outcomes. overall, autonomous motivation to play was connected to higher learning outcomes, but this connection was weak.

keywords: game-based learning; autonomous motivation; learning-outcomes; developed interest; live action role-playing games; edu-larps

corresponding author: brom@ksvi.mff.cuni.cz
doi: 10.14786/flr.v7i3.459

1. introduction

game-based learning experiences are widely supposed to boost the autonomous motivation of learners. autonomous motivation refers to doing an activity for its own sake or for its perceived personal importance (vansteenkiste et al., 2009). higher autonomous motivation is supposed to increase cognitive engagement and thereby facilitate learning (e.g., moreno, 2005; see also cordova & lepper, 1996; grolnick & ryan, 1987). however, to what degree does motivation driven by interest in playing an educational game impact learning outcomes? is the influence of this motivation on learning outcomes higher than, lower than, or the same as the influence of motivation driven by interest in the instructional topic? this research question is addressed in this study.

participants played a 2-hour-long educational live action role-playing game (edu-larp) with a sci-fi plot that included a 40-minute-long learning session organically integrated into the middle of game play. edu-larps are supposedly engaging game-based learning experiences that emphasize team role-play and an element of physical enactment (bowman, 2014; bowman & standiford, 2015; hyltoft, 2010; montola, 2008; vanek & peterson, 2016). to win the game, players had to learn (in the integrated learning session) how to control an electro-mechanical device.
the device was fictitious, having meaning only in the game context (to minimize the influence of prior knowledge). from an educational perspective, learning how the device worked represented general mental models acquisition in science, technology, and engineering contexts. participants included young adults having varying degrees of developed interest in sci-fi larp-like games and developed interest in electro-mechanics/ict. developed interest refers to a relatively enduring predisposition to re-engage with particular types of content over time (hidi & renninger, 2006). the edu-larp was designed to appeal primarily to those with the former developed interest. the learning session and the device were designed to engage primarily those with the latter developed interest. the reason for this was to create two distinct drives of autonomous motivation: one being interest in playing and the other interest in the instructional domain. enjoyable instructional experiences may not only increase autonomous motivation; they can also cause greater cognitive load (e.g., mayer, 2014; rey, 2012; um et al., 2012). cognitive load refers to the amount of mental activity imposed by the educational experience on working memory (sweller, ayres, & kalyuga, 2011). high cognitive load, especially if triggered by too complex or distractive elements in the learning environment, has opposite effects on learning processes than does motivation. it can overwhelm limited working memory resources and thereby hamper learning (e.g., sweller, ayres, & kalyuga, 2011; see also um et al., 2012). in this study, we measured autonomous motivation as well as cognitive load variables. we measured them twice during the game: with respect to game play (i.e., before the learning session) and with respect to the learning (i.e., after the learning session). 
we contrasted how motivation to play versus motivation to learn, and game-engendered versus learning-engendered cognitive load, influenced learning outcomes (i.e., within-subject design). learning outcomes were measured both after the edu-larp and a month later. we also examined whether developed interest in the instructional domain, and in the game, affected the two autonomous motivations, two cognitive loads, and learning outcomes. finally, we explored whether the (possible) relationship between the two developed interests and learning outcomes was mediated by affective-motivational or cognitive load variables.

2. study background

2.1 games and learning

there are many definitions of games. in this paper, we understand game in the terms of juul’s definition (2003): as a rule-based system with variable, quantifiable outcomes, in which actors (i.e., players) have the possibility to “attach” themselves intrinsically to different outcomes of the playing process. at the same time, they can influence the game state while working toward the goal (with or without real-life consequences). players must make an effort to influence the game state. one of the key drivers for exerting such effort is autonomous motivation to play the game.

various types of learning exist; for instance, skill training, learning of facts, or mental models acquisition. in this study, we focus on mental models acquisition, as understood in constructivist frameworks (e.g., mayer, 2009). mental models are knowledge structures that represent processes and/or systems and that enable drawing inferences about the processes/systems. these structures are built in learners’ minds within the context of knowledge structures previously acquired. the process is not automatic: learners have to exert mental effort to build them.

one of the key premises of game-based learning is that motivation to play the game can positively influence learning processes and thereby enhance learning outcomes.
how exactly motivation to play should do this (and whether it can actually do this, and if so, how strongly) has rarely been examined in game-based learning literature. this is discussed in detail in the next section. for example, when can effort invested into playing (driven by autonomous motivation to play) be transferred to effort invested into learning? this is not known. the present study examines one possible mechanism through which motivation derived from playing can enhance learning, as outlined in section 2.4.

2.2 game-based learning and edu-larps

the game-based learning field has been dominated by digital games. however, other approaches, such as edu-larps, are also becoming popular (see bowman, 2014). as noted above, the motivational potential of game-based learning has been generally assumed, but evidence substantiating it is limited. for example, digital games slightly enhance learning compared to traditional instructional approaches (e.g., meta-analyzed in clark, tanner-smith, & killingsworth, 2016; wouters, van nimwegen, van oostendorp, & van der spek, 2013), but the extent to which they do so through affective-motivational factors is unclear. many studies included in the meta-analyses have not researched affective-motivational factors (see sitzmann, 2011; wouters et al., 2013). the affective-motivational dimension has thus been omitted from most meta-analyses. wouters and colleagues, who did examine the effects of games in this dimension, reported that instructional games are more motivational compared to traditional types of education, but the difference only approached significance.¹ narrative reviews of game-based learning literature (e.g., bowman, 2014; boyle et al., 2016; jabbar & felicia, 2015) also did not provide information about how much game-derived motivation influences learning outcomes.
claims about the possible influence of game-derived motivation on learning outcomes are substantiated by experimental studies that have examined the effects of individual game design elements (see, e.g., clark et al., 2016; wouters & oostendorp, 2017). some of these elements have been shown to elevate affective-motivational factors as well as enhance learning outcomes. these include personalization and choice (cordova & lepper, 1996), intrinsic integration of the learning content with game mechanics (habgood & ainsworth, 2011), and team role-playing activities (brom, šisler, slussareff, selmbacherová, & hlávka, 2016). however, correlational studies examining the strengths of associations between game-derived affective-motivational factors and learning outcomes have offered a mixed picture (e.g., a negative influence: iten & petko, 2014; a positive influence: sabourin & lester, 2014). the reason behind this ambiguity could be that the motivational–learning correlations have been confounded by contextual factors, most notably, by different levels of cognitive load caused by different game designs. even in comparative studies, experimental and control conditions could differ in the levels of cognitive load (often uncontrolled) imposed upon learners. to take the next step in answering the question on the strength of game-derived motivation’s learning impact, it would be useful if motivational effects triggered by an educational game were contrasted with effects triggered by an unquestioned, robust, “baseline” motivational factor while all participants were undergoing the same intervention. this would make levels of cognitive load dependent on differences between individuals rather than between contextual factors. what should this robust “baseline” be? 
high interest in a learning domain, and/or in an instructional topic, is straightforwardly implied by theories of motivation and interest in enhancing learning processes (e.g., see eccles & wigfield, 2002; hidi & renninger, 2006; keller, 2010). the positive effects of this interest on motivation to learn and learning outcomes have been repeatedly demonstrated (e.g., brom et al., 2017; fulmer, d’mello, strain & graesser, 2015; schiefele & krapp, 1996; schiefele, 1999). therefore, we used motivation driven by interest in the learning domain as our “baseline”. we contrasted effects of the “baseline” motivation on learning outcomes with the effects of motivation to play the game. the latter motivation was triggered using one of the approaches having been previously shown to be instrumental in doing this: team role-play activities in an edu-larp. edu-larps derive mostly from leisure time social role-playing games (bowman, 2014). social role-playing games emphasize players’ interaction, which distinguishes them from single-player role-playing games. social role-playing games are organized around a modeled scenario/narrative. in edu-larps, players enact this scenario physically, unlike in table-top and digital role-playing games. edu-larps present an old educational approach (kot, 2012; vanek & peterson, 2016), which recently saw a surge in interest (see bowman, 2014). they have been utilized for a range of curricular objectives and they are supposed to have various instructional benefits (bowman, 2014; hyltoft, 2008). for instance, they seem to be useful when a model scenario is to be re-enacted: such as for skills-training (e.g., hayden, smiley, alexander, kardong-edgren, & jeffries, 2014) or understanding complex, socio-historical relationships (e.g., brom et al., 2016; mochocki, 2014). motivational potential is yet another possible advantage of edu-larps (e.g., bowman & standiford, 2015; vanek & peterson, 2016). 
as in the case of digital game-based learning, researchers are still developing empirical evidence on edu-larps’ effectiveness (bowman, 2014). the key evidence substantiating the claim about the motivational potential of edu-larps comes from the experimental study with a large sample by brom and colleagues (2016), who demonstrated that learning outcomes and affective-motivational variables were enhanced when high school students learned through an edu-larp compared to a discussion-based control without the role-playing game element. the affective-motivational variables partially mediated the game’s positive effect on learning outcomes. supplementary evidence for the motivational potential of edu-larps comes from anecdotal reports (see bowman, 2014; bowman & standiford, 2015). 2.3 individual differences in developed interest different learners are interested in different things. learners’ interests may influence how much they will be motivated to study a particular topic or learn using a game-based approach. for example, not all learners are equally motivated by edu-larps. some learners do not want to get involved in role-play (vanek & peterson, 2016); especially those with low prior experience with this game format (mochocki, 2014). learners with applied study backgrounds found an edu-larp approach more appealing than learners with social sciences backgrounds (brummel et al., 2010). whereas some learners enjoyed an edu-larp experience, others were stressed by the necessity to interact socially during the game (brom et al., 2014). on a theoretical level, these ideas can be embraced using the notion of well-developed individual interest (called developed interest here for brevity). according to the four phase model of interest development (hidi & renninger, 2006), this is the most enduring form of interest: a relatively stable predisposition to re-engage with particular types of content. two developed interests are important here. 
the first one is developed interest in sci-fi larps and similar games and game-like experiences (hereafter also called gamer scores). the second one is developed interest in the instructional domain/topic, i.e., ict/electro-physics (hereafter also called techie scores).

2.4 autonomous motivation and self-determination theory

how can gamer scores and techie scores be theoretically connected to motivations to play and/or to learn? how can these motivations be theoretically linked to learning outcomes? (we note that we do not focus here on leveraging general school motivation, but on motivation and learning outcomes related to a particular learning experience.)

a useful way of organizing different forms of motivation is provided by a framework that differentiates between autonomous and controlled motivations (ryan et al., 2006; vansteenkiste et al., 2009). this framework is based on self-determination theory (deci & ryan, 1985). autonomous motivation, which is of present interest, is characterized by an internal perceived locus of causality (decharms, 1968; ryan & deci, 2000): learners perceive this motivation as originating within themselves. (controlled motivation is characterized by an external perceived locus of causality.) autonomous motivation is the desired type of motivation (deci & ryan, 2008), because it is linked to several advantages, including better learning outcomes (e.g., cordova & lepper, 1996; grolnick & ryan, 1987; vansteenkiste et al., 2005; see also schiefele, 1999; vansteenkiste et al., 2009). we share this assumption here (general prediction).

within self-determination theory, autonomous motivation has two subcomponents (vansteenkiste et al., 2009; see also ryan & deci, 2000): intrinsic motivation, which refers to doing an activity for its own sake (because it is inherently enjoyable), and identified regulation, a form of extrinsic motivation that refers to doing an activity because of its perceived importance.
self-determination theory maintains that autonomous motivation is fostered when learning environments facilitate learner satisfaction of needs for autonomy, competence, and relatedness (deci & ryan, 1985; vansteenkiste et al., 2009). as concerns intrinsic motivation to learn, the need for competence would be satisfied more often among persons with a high developed interest in the learning domain (i.e., techies) than for those with a low developed interest therein (i.e., non-techies). the reason is that techies would typically feel more competent in solving the learning task than non-techies. therefore, self-determination theory predicts that techie scores will positively relate to intrinsic motivation for learning how an electro-mechanical device works (prediction sdt1a; sdt = self-determination theory). consequently, the techie scores will positively relate to learning outcomes (prediction sdt1b). based on self-determination theory, intrinsic motivation to play would be hampered when a game undermines one of the above needs: it will be lower for those who find it hard to play an edu-larp (competence) or who feel uncomfortable during role-playing/social interaction (relatedness). intrinsic motivation to play will thus positively relate to the gamer score (prediction sdt2a). as concerns identified regulation to learn, the following applies: high autonomous motivation to play, presumably more prevalent among gamers, can be transferred to identified regulation to learn, because players motivated to play may invest more into learning (they feel it is important as part of the game). consequently, the gamer score should relate positively to learning gains (prediction sdt2b). however, the gamer scores → learning outcomes link may be weaker than the techie scores → learning outcomes link, because it is not guaranteed that motivation to play would project to identified regulation to learn for all participants (prediction sdt3). 
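the predictions derived in this section can be written compactly. the notation below is an assumption introduced here for illustration and does not appear in the original text: T and G stand for techie and gamer scores, M_learn and M_play for intrinsic motivation to learn and to play, L for learning outcomes, and ρ for the association (e.g., a correlation) between two variables.

```latex
\begin{align*}
\rho(T, M_{\text{learn}}) &> 0 && \text{(SDT1a)}\\
\rho(T, L) &> 0 && \text{(SDT1b)}\\
\rho(G, M_{\text{play}}) &> 0 && \text{(SDT2a)}\\
\rho(G, L) &> 0 && \text{(SDT2b)}\\
\rho(G, L) &< \rho(T, L) && \text{(SDT3, stated only as a possibility)}
\end{align*}
```

read this way, sdt3 is a comparison of two positive associations with the same outcome variable, not a claim that the gamer link is absent.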
2.5 distraction and cognitive load theory cognitive load theory (sweller, ayres, & kalyuga, 2011) is a theoretical framework based on a model of human cognitive architecture, which enables educational designers to construct instructionally efficient learning environments. its key assumptions are that working memory has limited capacity and duration; whereas, long-term memory serves for permanent storage with unlimited capacity and duration. during learning, incoming information is first represented in working memory and eventually integrated with pre-existing knowledge structures in long-term memory. these knowledge structures also organize temporary representations in working memory: more advanced structures enable the representation of incoming information using fewer information elements. following its recent adjustment (kalyuga, 2011), cognitive load theory posits two types of working memory load: intrinsic and extraneous. learners must allocate cognitive resources to deal with these loads. if they fail to do so, or if total load overwhelms working memory resources, learning is hampered. intrinsic load is imposed on learners by the complexity of the learning task. intrinsic load is essential for comprehending the learning message: dealing with it results in learning. in determining the task’s complexity, one has to consider learner’s prior knowledge (i.e., knowledge structures in long-term memory prior to learning). what is complex for a novice may not be complex for an expert. in this study, learners learn to operate a fictitious electro-mechanical device. prior knowledge of the device is null for everyone, but techies will possess high-quality knowledge structures concerning certain general electro-mechanical concepts; e.g., “electrical signal”. these structures will enable them to represent information in their working memory more efficiently. 
within cognitive load theory, this means techies will have lower intrinsic load (prediction clt1a; clt = cognitive load theory). therefore, techies’ working memory is less likely to be overloaded than that of non-techies. consequently, in agreement with prediction sdt1b, techie scores will be positively related to learning outcomes (prediction clt1b).

extraneous load is caused by the processing of sub-optimally designed features of learning environments. it should be minimized, because accommodating it depletes cognitive resources that could otherwise aid in dealing with intrinsic load. extraneous load can arise from two sources within a game-based learning environment: a) from the sub-optimal design of the instructional content embedded in the game and b) from game play as such. those with high techie scores will likely cope better with possible sub-optimal design, which adds weight to prediction clt1b. as concerns the game-related source of extraneous load, a portion of players’ cognitive resources will be devoted to thinking about playing the game. these thoughts will deflect learners’ attention away from learning. game-related, but learning-irrelevant, thoughts are likely to be amplified for non-gamers, who do not yet have well-developed game-related schemata/skills. therefore, game-engendered extraneous load will be negatively related to gamer scores (prediction clt2a). because this load may cause cognitive overload and hamper learning, and because it should be lower for gamers, gamer scores will relate positively to learning outcomes (prediction clt2b).

3. this study – overview and hypotheses

this study examines how much autonomous motivation driven by developed interest in an educational game influences learning outcomes compared to motivation driven by developed interest in the instructional domain (i.e., within-subject design).
autonomous motivation is referred to hereafter as motivation for the sake of brevity; the first motivation is referred to as motivation to play and the second one as motivation to learn. the two developed interests are called gamer and techie scores, respectively. the study also investigates how these two scores affect motivation to play, motivation to learn, overall game enjoyment, difficulty of game play (as a proxy variable to game-engendered extraneous cognitive load), cognitive loads engendered by the learning experience, and learning outcomes.

directional hypotheses:

h1: the techie scores will relate
• (h1a) positively to motivation to learn (based on prediction sdt1a);
• (h1b) negatively to cognitive loads engendered during learning (prediction clt1a);
• (h1c) positively to learning outcomes (predictions sdt1b and clt1b).

h2: the gamer scores will relate
• (h2a) positively to motivation to play and also overall game enjoyment (prediction sdt2a);
• (h2b) negatively to difficulty in playing the game (prediction clt2a);
• (h2c) positively to learning outcomes (predictions sdt2b and clt2b).

the directional hypotheses concerning influences of the techie and gamer scores are summarized in table 1. the relationships for the remaining pairs of variables (i.e., in columns and rows from table 1) will be explored (exploratory goals e1, e2).

with respect to the influences of both motivations on learning outcomes, the following hypotheses will be examined (table 2):

h3:
• (h3a) motivation to play will be positively related to learning outcomes (prediction sdt2b and general prediction);
• (h3b) motivation to learn will be positively related to learning outcomes (general prediction).

the link between cognitive loads and learning outcomes will be explored (exploratory goal e3).
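a directional hypothesis such as h1c or h3b amounts to a one-sided test of a correlation. the sketch below only illustrates that idea under stated assumptions and is not the analysis used in this study: the function names, the toy data, and the choice of a fisher z normal approximation for the p-value are all introduced here for illustration.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length (n >= 4)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def one_sided_p(r, n, direction="+"):
    """Approximate one-sided p-value for H0: rho = 0 (Fisher z approximation).

    direction "+" tests for a positive relationship, "-" for a negative one.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    upper_tail = 1.0 - NormalDist().cdf(z)
    return upper_tail if direction == "+" else 1.0 - upper_tail


# Hypothetical toy data, NOT taken from the study.
techie = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [2, 1, 4, 3, 6, 5, 8, 7]

r = pearson_r(techie, outcome)
p_positive = one_sided_p(r, len(techie), "+")  # test of a "+" hypothesis
p_negative = one_sided_p(r, len(techie), "-")  # test of a "-" hypothesis
```

with a real data set, an exact t-based test from a statistics package would usually be preferred over this approximation, especially for small samples.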
we put forward these mediation hypotheses (table 3):

h4: the relationship between techie scores and learning outcomes will be mediated
• (h4a) positively by motivation to learn (predictions sdt1a and sdt1b);
• (h4b) negatively by cognitive loads evoked during learning (predictions clt1a and clt1b).

h5: the relationship between gamer scores and learning outcomes will be mediated
• (h5a) positively by motivation to play and also overall game enjoyment (predictions sdt2a and sdt2b);
• (h5b) negatively by difficulty in playing the game (predictions clt2a and clt2b).

finally, we will explore (e4) whether the relationships between (a) gamer scores and learning outcomes and (b) motivation to play and learning outcomes are weaker than the complementary relationships between (a) techie scores and learning outcomes and (b) motivation to learn and learning outcomes (cf. prediction sdt3).

table 1. hypotheses/exploratory goals related to techie and gamer scores.
a indexed by proxy variables. note: + positive relationship expected; – negative relationship expected; ? no expectation.

table 2. hypotheses/exploratory goals related to how variables assessed in situ predict learning outcomes.
a indexed by proxy variables. note: + positive relationship expected; ? no expectation.

table 3. hypotheses/exploratory goals related to mediation.
a indexed by proxy variables. note: expected relationship: + positive; – negative.

4. methods

4.1 participants

participants were recruited from university pools, via the facebook pages of larp/sci-fi communities, and through short-term job advertisement servers. we emphasized that we were seeking both novice and seasoned larp players. participants received financial compensation (400 czk, ~15 eur). prospective participants completed an online questionnaire, which provided demographic data (mean age = 24.7; sd = 3.72) and data on prior larp-related experience.
we invited selected participants for one of 11 game runs, such that people with both low and high prior larp-related experience and with diverse study/employment backgrounds (see suppl. mat. a) participated in every run (10–13 participants per run). ultimately, 128 participants were included in the analysis. two additional participants were excluded due to health issues. seventeen participants were excluded from the analyses of delayed learning outcomes data because they did not attend the delayed testing session. the sample was heterogeneous with respect to both techie and gamer scores (figure 1). these two variables also did not correlate (r = .06); i.e., we recruited techie gamers, techie non-gamers, non-techie gamers, and non-techie non-gamers.

figure 1. participants’ techie and gamer scores.

4.2 materials

edu-larp. we designed the larp as a 2-hour-long game that can teach a scientific topic by means of an embedded learning session. the larp was a sci-fi space opera with a plot created by a seasoned larp script writer. the story started in the midst of a journey on a generation spaceship. the players took on roles of technical school students therein. certain events triggered a mutiny, during which the fighting parties damaged input cables to a device for controlling correction thrusters (i.e., sideways-pointed motors used to make small corrective movements when a spaceship is already in space). this left the ship on a collision course with an asteroid. only one access route to the device remained passable: it led from the technical school via an escape corridor. the players had to learn how the device works (figure 2, 3), locate it, set it to manual control, and tweak its cables to avoid hitting the asteroid and save the ship’s population. the complexity of roles was determined during pilot experiments. the roles were less complex than in a typical larp for seasoned players, but still relatively complex for novice players (i.e., a compromise).

figure 2. a: model of the device available during the learning session. b: rewired cabling. c: the actual device. d: schematic drawing of device.

figure 3. a: players discussing game options. b: players interacting during the learning session. c: players rewiring cabling on a model.

in the middle of the game, the teacher (game master) initiated a 40-minute-long learning session, during which he taught the students how to operate the device (figure 3b, c). the plot was designed so that all players were motivated to learn this (see suppl. mat. b for details). the players worked with functional models of the device. they did not have access to these models outside the session, and the teacher never commented on the device’s functioning, except during this session. a clear beginning and end point for the session also enabled the teacher to administer questionnaires during the game at well-defined moments. toward the end of the game, the players gained access to the actual device. they had 9 minutes to tweak it in order to correct the ship’s course.

teaching session. this was a 40-minute-long frontal lecture with slides interspersed with teacher-guided hands-on-practice segments. during the lecture, players stayed in their game roles the whole time. each player was given the device’s model and schematic drawing (figure 2a, 2d). this was the first time they saw the model/device. to solve the final game task and win the game, players had to understand how the device works, i.e., acquire its mental model (mere superficial memorization was insufficient).

the model and the learning task. the key part of the model/device, the so-called control calculator, (a) checked whether the ship’s course correction could be made given the current status of the correction thrusters and (b) converted the course correction given in angular units to the force of the thrusters.
input wiring to the calculator relayed a signal from the deck carrying the correction command and status reports from the thrusters. output wiring carried signals to the thrusters with force data and a command (either to perform the maneuver or to test whether the maneuver could be carried out). the device featured a manual control which could partially override commands from the deck. with a sufficient level of understanding, one could re-wire the cabling (figure 2b, 3c), bypassing certain calculations and/or changing the partial override mode to full override mode (see suppl. mat. c for further details). these steps were necessary for solving the final task. the actual device and the final task. the device was similar to the model (figure 2c), but larger. it had a similar layout, but different graphics. it contained a timer that counted down the time to the ship’s final maneuver. during this time (9 minutes), the players, as a group, had to tweak the cabling and set the ship’s new course. this was a near transfer task with respect to the tasks assigned in the learning session. 4.3 measurements we faced a challenge because instruments assessing relevant constructs in the context of edu-larps were lacking. also, the instruments needed to be short, especially those to be used during the game. therefore, we adjusted several instruments from neighboring research fields, established face validity, and fine-tuned the questions during pilot experiments. all questions and scales are detailed in suppl. mat. d. 4.3.1 participant variables demographic data. when expressing interest in participating in this research, participants reported online their gender, age, prior larp-related experience, study background, and possible employment. prior larp-related experience was measured using six questions we developed, which assessed participants’ experience with various types of larp-like games. techie scores.
this variable should reflect participants’ developed domain interest. such interest was, in the present case, related to developed interests in ict and electro-physics. it was also related to participants’ study types, as study type can generally be assumed to reflect the person’s interests. therefore, the techie score was computed as a weighted sum of these two interests and a score assigned based on the participant’s study type (see suppl. mat. e for the exact equation). developed interest in electro-physics (4 items; α = .87) and in ict (4 items; α = .89) was assessed based on work done by renninger and schofield (2014). a score for the participant’s study type was assigned based on a rubric detailed in suppl. mat. e. ict and electro-physics developed interests were strongly inter-correlated (r = .68) and moderately-to-strongly correlated with the study type score (r = .42, .44). gamer scores. this variable should reflect participants’ developed interest in the type of games exemplified by our intervention; i.e., a sci-fi larp. it was computed as a weighted sum of participants’ prior larp-related experience (as this reflects voluntary experience with larp-like games) and developed sci-fi interest (α = .73; see suppl. mat. e for details). prior larp-related experience was reported online when participants expressed interest in our research (see above). sci-fi developed interest was assessed similarly to the ict/electro-physics developed interest (i.e., based on renninger & schofield, 2014) (5 items; α = .95). 4.3.2 dependent variables autonomous motivation: positive affect, flow, and learning enjoyment. motivations to play and to learn were examined separately. because it would be difficult for learners to distinguish between these two motivations after the game ended, we assessed them in situ: at appropriate moments during the game play but without disrupting the play (i.e., using “gamified” questionnaires). 
we measured motivation to play through two proxy variables: flow and generalized positive affect (referred to hereafter as positive affect). we measured motivation to learn through three proxy variables: flow, positive affect, and learning enjoyment. positive affect is related to various positively-valenced, activating feelings (e.g., excitation, activity). we measured it using the positive and negative affect schedule (i.e., panas; watson et al., 1988) (α = .82 – .93; see footnote 2). flow refers to pleasant absorption in an activity that one takes part in (csikszentmihalyi, 1975). we measured it using three items from the flow short scale (rheinberg et al., 2003) (α = .85 – .89). learning enjoyment was assessed using two questions from the interest/enjoyment subscale of the intrinsic motivation inventory (mcauley et al., 1989) (r = .88). only subsets of questions were used for brevity. learning-engendered cognitive load. in addition to learning-related motivation, we measured learning-engendered intrinsic load (2 items; r = .82) and extraneous load (3 items; α = .78) using questions adapted from the questionnaires by leppink and colleagues (2014) and naismith and colleagues (2015). overall game difficulty, overall game enjoyment. as a proxy variable for game-engendered extraneous load, overall game difficulty was measured using three items we created (α = .84). overall game enjoyment was assessed with 10 questions we created (α = .89), based on motivation/enjoyment items from other questionnaires (e.g., mcauley et al., 1989; schraw et al., 1995). items were tailored for the specifics of larps. retention test. the test had one question: “draw a diagram of the device for controlling the ship’s correction thrusters showing all elements of the device”. a point was awarded for correctly drawing an element, for correctly positioning it with respect to other elements, for correctly naming it, and for correctly drawing a cable crossing (scale: 0 – 80).
two independent raters scored the answers with a nearly perfect agreement (immediate: weighted cohen’s κ = .995; delayed: κ = .999; cohen, 1986). transfer test. we developed two complementary versions of the transfer test (for immediate versus delayed testing; counterbalanced across participants). one version had four and the other five open-ended questions, e.g., “imagine that the spaceship does not have three correction thrusters, but four instead. what changes would you have to make to the device controlling the correction thrusters in order for the device to function with four thrusters?”. participants were awarded 1 point for each correct solution or 0.25 or 0.5 points for a partially-correct solution (scales: 0 – 26 and 0 – 27, respectively). two raters scored the answers with a substantial agreement (immediate: κ = .977; delayed: κ = .968). raters’ scores were averaged for the subsequent analysis. prior to averaging, transfer test scores were z-transformed for each version of the test to obtain comparable values. 4.4 procedure participants received general larp rules, a description of the setting, and brief descriptions of all roles in advance. they selected online which roles they would prefer to play. upon arrival, participants filled in the initial questionnaire (see figure 4 for the experimental schedule). afterwards, a warm-up period started: the larp rules were recapitulated and participants could decorate the rooms with supplied thematic set pieces. participants were then assigned roles (based on the preferences they had previously expressed), read their descriptions, and introduced their roles to fellow players. figure 4. experimental schedule. afterwards, the game started. the teaching session started about an hour into the game. before it began, the teacher distributed the game-related flow and panas questionnaires (with a cover story about the ship’s supreme inspectorate evaluating the quality of his teaching). 
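the inter-rater agreement reported for the retention and transfer tests can be illustrated with a minimal sketch of a weighted cohen's κ. the text does not state which weighting scheme was used, so the sketch assumes disagreement weights equal to the absolute difference between the two raters' scores (linear weights); the function name and the example scores are hypothetical.

```python
from collections import Counter


def weighted_kappa(rater1, rater2):
    """Weighted Cohen's kappa with linear disagreement weights |i - j|.

    Assumption: scores are numeric and the weight of a disagreement is the
    absolute distance between the two scores (not confirmed by the source).
    """
    n = len(rater1)
    categories = sorted(set(rater1) | set(rater2))
    observed = Counter(zip(rater1, rater2))   # joint counts of score pairs
    marg1, marg2 = Counter(rater1), Counter(rater2)

    # Observed weighted disagreement (proportion scale).
    obs = sum(abs(a - b) * c / n for (a, b), c in observed.items())
    # Weighted disagreement expected by chance from the marginals.
    exp = sum(abs(a - b) * marg1[a] * marg2[b] / n ** 2
              for a in categories for b in categories)
    return 1 - obs / exp


# Hypothetical scores from two raters who differ by one point on one answer.
print(weighted_kappa([10, 20, 30, 40, 50], [10, 20, 30, 40, 49]))
```

values close to 1 indicate that the raters almost never disagree, and that when they do, the scores differ only slightly.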
after the lecture, the teacher distributed the questionnaire that yielded post-learning flow, positive affect, enjoyment, and intrinsic/extraneous cognitive load data. after roughly 20 minutes, players eventually found access to the escape corridor and located the device, wherein they solved the final game task. participants then filled in the final questionnaire, which primarily yielded overall game enjoyment and difficulty assessments. afterwards, retention and transfer tests were distributed. finally, a game debriefing and an interview were organized. about three weeks after the larp, participants arrived for the delayed testing session. they were given retention and transfer tests and developed interest questionnaires. 4.5 data treatment post-learning positive affect and flow tapped at-the-moment experienced affective-motivational states. these states could be influenced not only by learning, but also by game play that preceded the learning session (this was not the case of enjoyment/cognitive load scales, because the respective questions referred to the learning session as such, see suppl. mat. d). we wanted to use, as proxies to motivation to learn, positive affect/flow-derived variables that satisfied two requirements: they were a) related to pre-/post-learning change in positive affect/flow and b) independent of pre-learning positive affect/flow. therefore, learning-related positive affect/flow were computed as pre-/post-learning positive affect/flow residual differences (by regressing the post-learning positive affect/flow on the pre-learning positive affect/flow). 5. results the means and averages of the dependent variables are included in suppl. mat. a. the correlation matrix is also provided therein. 5.1 techie scores we used multiple linear regressions with two independent factors: gamer and techie score. 
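the residual-difference scores described in section 4.5 can be sketched as follows: post-learning values are regressed on pre-learning values, and the residuals serve as the learning-related measure. this is a minimal illustration with a hypothetical function name and made-up numbers, not the authors' analysis script.

```python
def residualized_change(pre, post):
    """Residuals from a simple linear regression of post on pre.

    The residuals reflect the pre-/post-learning change and are, by
    construction, uncorrelated with the pre-learning values.
    """
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical pre-/post-learning flow scores for five participants.
pre_flow = [1.0, 2.0, 3.0, 4.0, 5.0]
post_flow = [2.0, 2.5, 4.5, 4.0, 6.5]
print(residualized_change(pre_flow, post_flow))
```

a positive residual means that a participant's post-learning value was higher than would be predicted from their pre-learning value alone.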
when entered together with gamer scores into the models, techie scores modestly predicted motivation to learn (table 4), so hypothesis h1a was supported. techie scores strongly predicted cognitive load induced by learning (in the negative direction) and learning outcomes, so hypotheses h1b and h1c were also supported. as concerns exploratory goals e1a and e1b, techie scores were unrelated to game-induced flow and positive affect. however, they were modestly related to game enjoyment. this can be explained by the fact that game enjoyment was measured after the larp ended, so it was arguably also influenced by liking the learning session (unlike game-induced flow/positive affect measured before the learning session started). techie scores were unrelated to perceived game difficulty. we conclude that techies liked the learning session more (compared to non-techies), it was easier for them to learn, and they learned better. however, playing a larp was not more motivating for them. 5.2 gamer scores gamer scores strongly predicted motivation to play, overall game enjoyment, and (in the negative direction) game difficulty (table 4). hypotheses h2a and h2b were thus supported. gamer scores were modestly related to learning gains (except for immediate retention). hypothesis h2c was thus partially supported. as concerns exploratory goals e2a and e2b, gamer scores were unrelated to motivation to learn and learning-engendered cognitive loads. we conclude that participants with higher gamer scores were relatively more motivated to play the larp and playing was easier for them. they also learned slightly better. however, gamer scores were not shown to be connected to motivation to learn or cognitive load. table 4 standardized beta coefficients for a multiple regression model (predictors: techie/gamer scores) note: pa = positive affect. hypotheses: ✓ supported; [✓] partially supported. relationships found: ○ no; + positive; – negative. a pre-post residuum.
*p < .05 **p < .01 ***p < .001 5.3 affective-motivational–learning relationship which in situ measured variables predicted learning outcomes? the effects of the following variables were investigated: game-induced positive affect/flow, learning-induced positive affect/flow, learning enjoyment, intrinsic/extraneous load. to facilitate interpretation, we reduced the number of variables using exploratory factor analysis. the kaiser-meyer-olkin measure of sampling adequacy was .65 (above the recommended value of .6) and bartlett’s test of sphericity was significant (p < .001). factors were extracted using the ordinary least squares method with varimax rotation. both scree plot analysis and parallel analysis suggested the presence of three factors. these factors corresponded to our three umbrella constructs (motivation to play, motivation to learn, and cognitive load) and were labeled accordingly. in our main analysis, we computed regression models with the newly-created factors (all three factors were entered together into each model; see footnote 3). both motivation factors significantly predicted transfer; however, only motivation to learn also predicted retention (table 5). hypothesis h3b was thus supported and h3a was supported only as concerns transfer. table 5 hypotheses, predictions, and exploratory goals related to the motivation–learning link note: hypothesis: ✓ supported; ( ✓ ) partially supported; x not supported. relationship found: – negative. a the factor to which game-induced positive affect/flow primarily load. b the factor to which learning enjoyment/positive affect/flow primarily load. c the factor to which intrinsic/extraneous load primarily load. as concerns exploratory goal e3, cognitive load predicted all learning outcome variables well, in the negative direction. as concerns exploratory goal e4, gamer scores as well as motivation to play were weaker predictors of learning outcomes compared to techie scores and motivation to learn (and cognitive load).
however, they still played certain roles (tables 4, 5). we conclude that participants with higher motivation to learn and/or with lower cognitive load learned better than those with lower motivation and/or higher cognitive load. also, those motivated to play the game performed somewhat better than those less motivated to play, but only on transfer test tasks. 5.4 mediation analysis we used the r package mediation (tingley et al., 2014) for causal mediation analysis. we computed estimates of the indirect effect using quasi-bayesian monte carlo simulations (n = 10,000) (preacher & hayes, 2004). the relationship between techie scores and learning outcomes was mediated both by motivation to learn and cognitive load engendered during learning (table 6). therefore, hypotheses h4a and h4b were supported. the relationship between gamer scores and transfer was mediated by overall game enjoyment and, for immediate transfer, marginally mediated by overall game difficulty. no other variable was confirmed as a mediator (p > .106). hypothesis h5a was thus supported only with respect to transfer and overall game enjoyment, and hypothesis h5b with respect to immediate transfer. table 6 mediation analysis note: ci = confidence interval. hypothesis: ✓ supported; [✓] partially supported; x not supported. †p < .10 *p < .05 **p < .01 a the fa factors 5.5 interview data we inspected negative evaluations of the larp and whether motivation to play transformed into identified regulation to learn, as gauged by participants. to summarize these results, we split participants into four groups based on median split of transfer (average value of immediate and delayed transfer test scores; i.e., high vs. low transfer) and motivation to play (average value of z-scores from game-induced flow and positive affect; i.e., high vs. low motivation). sample statements from the subgroups’ participants are shown in table 7.
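to make the logic of the simulation-based mediation estimates in section 5.4 more concrete, the following is a simplified sketch of a monte carlo confidence interval for an indirect effect in a linear single-mediator model. it is not the api of the mediation package and not the authors' script: it assumes linear models without interaction, draws the two path coefficients independently from normal sampling distributions, and uses hypothetical function names.

```python
import math
import random


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _slope_and_se(x, y, n_params):
    """OLS slope of centered y on centered x, with its standard error."""
    sxx = sum(a * a for a in x)
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    sse = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(sse / (len(x) - n_params) / sxx)
    return slope, se


def _residualize(v, x):
    """Residuals of centered v after regressing it on centered x."""
    slope = sum(a * b for a, b in zip(x, v)) / sum(a * a for a in x)
    return [b - slope * a for a, b in zip(x, v)]


def indirect_effect_mc(x, m, y, n_sims=10_000, seed=0):
    """Point estimate and 95% Monte Carlo interval for the indirect effect a*b."""
    x, m, y = _center(x), _center(m), _center(y)
    # Path a: mediator regressed on predictor (intercept + slope = 2 parameters).
    a_hat, a_se = _slope_and_se(x, m, n_params=2)
    # Path b: outcome regressed on mediator while controlling for the predictor.
    # Partialling x out of both m and y yields the same slope as the
    # two-predictor model (Frisch-Waugh-Lovell); 3 parameters are estimated.
    b_hat, b_se = _slope_and_se(_residualize(m, x), _residualize(y, x), n_params=3)
    rng = random.Random(seed)
    draws = sorted(rng.gauss(a_hat, a_se) * rng.gauss(b_hat, b_se)
                   for _ in range(n_sims))
    lower = draws[int(0.025 * n_sims)]
    upper = draws[int(0.975 * n_sims) - 1]
    return a_hat * b_hat, lower, upper
```

here x would stand for, e.g., techie scores, m for a candidate mediator such as motivation to learn, and y for a learning outcome; mediation is indicated when the interval excludes zero.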
generally, qualitative data showed: i) higher learning outcomes were not necessarily connected to excitation from the game; ii) some learners (around 20% of the sample) did not like the approach; iii) some participants (around 15%) claimed the game had a substantial positive motivational effect on their learning. table 7 sample statements during interviews and participant interests a actual range. b twenty-seven participants could not be classified as hard vs. social science students or had partly missing data. 5.6 supplementary results for control purposes, we also conducted a supplementary study, in which we tested how much participants would learn from the same 40-minute-long learning session as embedded in the larp, but outside the game. we recruited 48 participants to match a selected sub-sample from the main study (i.e., a quasi-experimental comparison without randomization). participants were matched based on their study backgrounds (see suppl. mat. g for further details). at the beginning of the learning session, these non-game participants were told that they should imagine they are students on a generation ship (to contextualize the device). the same learning outcomes and autonomous motivation measures were used. positive affect of non-larp learners measured immediately after the narrative introduction, i.e., immediately before the learning session, was significantly lower compared to positive affect of the matched larp participants measured in the game, before the learning session (d = 0.45). this indicates a motivational advantage of the edu-larp with a medium effect size (see suppl. mat. g for descriptive data and analyses). motivation to learn (i.e., measured immediately after the learning session ended) was comparable for both groups of learners. non-larp learners exhibited a faster decline of transfer learning outcomes (immediate – after three weeks) compared to the matched larp participants (d = 0.67).
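the between-group effect sizes reported in this section are cohen's d values. as a reminder of how such a value is obtained, here is a minimal sketch; it assumes the common pooled-standard-deviation variant (the text does not specify which variant was used), and the function name and scores are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b)
                          / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical positive-affect scores of larp vs. non-larp learners.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

by convention, values around 0.5 are read as medium effects, which matches how d = 0.45 is interpreted above.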
no other significant between-group difference as concerns learning outcomes was found (d = –0.16 – 0.20) (suppl. mat. g). this means that larp players forgot less in terms of conceptual knowledge (moderate-to-strong effect). 6. discussion we investigated how much motivation driven by interest in playing an educational game impacts learning outcomes compared to motivation driven by interest in the instructional domain (while controlling for levels of cognitive load). motivation to play was shown to be related to learning outcomes, but its influence was dwarfed by the effects of the natural motivation to learn the given topic exhibited primarily by participants with developed domain interest. specifically, developed domain interest was clearly related to autonomous motivation to learn and learning-evoked cognitive load, which were clearly related to learning outcomes. however, autonomous motivation to play, exhibited primarily by participants with developed interest in sci-fi larp-like games, was only slightly related to transfer learning outcomes (and unrelated to retention). also, motivation to play did not noticeably mediate the relationship between interest in sci-fi larp-like games and learning outcomes. this pattern of results – that of a relatively strong effect of domain interest on learning outcomes and weaker, sometimes even negative, effect of interestingness of framing the educational message on learning outcomes – is generally consistent with findings from the fields of multimedia learning (e.g., rey, 2012) and hypermedia learning (e.g., moos & marroquin, 2010). this study has demonstrated this pattern in the context of game-based learning, for which large motivational benefits have been envisioned: it is better to enjoy learning than playing. 
6.1 contributions 6.1.1 theoretical contributions from the self-determination theory perspective, it is noteworthy that we distinguished between autonomous motivation to learn and autonomous motivation to play, which is rare in game-based learning literature. in agreement with self-determination theory, our results demonstrated that motivation to learn was a better predictor of learning outcomes compared to motivation to play; yet the study also provided provisional evidence suggesting that intrinsic motivation to play can transform into identified regulation to learn. future studies should explore in more detail how motivation derived from playing transfers to identified regulation to learn, as this is one of the key ways in which motivation to play can influence learning processes within game-based learning. other ways in which motivation to play can impact learning processes should also be considered and examined (e.g., through enhanced self-efficacy). from the cognitive load theory perspective, it was equally important that we distinguished between game-evoked extraneous load and learning-evoked loads. in agreement with cognitive load theory, the results confirmed learning-evoked loads as mediators of the effect of developed domain interest on learning outcomes. complementary mediational analysis showed that perceived game difficulty (a proxy variable for game-engendered extraneous load) only tended to mediate the effect of developed interest in sci-fi larp-like games on immediate transfer. either the distraction from the game did not substantially burden participants, or our measurement did not assess the game-engendered cognitive load well. validated methods for measuring this construct are needed, because the extent to which different types of games or game attributes evoke extraneous cognitive load is a pressing issue.
with respect to the four-phase model of interest development (hidi & renninger, 2006), our results corroborated the idea that developed interest in a learning domain is connected to enhanced learning in (at least) two different ways: cognitive and affective-motivational ones. this interest was linked to both lower cognitive load (presumably due to pre-existing, task-fitting schemata in the long-term memory) and higher values of motivation variables. both cognitive load and motivation also mediated the effect of domain interest on learning outcomes. 6.1.2 practical contributions our results showed that edu-larps may enhance learning through positive affective-motivational factors for some learners, but this educational method is not unanimously liked. this study and some prior research (brummel et al., 2010; mochocki, 2014; vanek & peterson, 2016) thus indicate that edu-larps are accepted differently by different learner-types. acceptance-/suitability-related problems should be addressed in future research as well as in applying this educational method in practice (cf. mochocki, 2014; vanek & peterson, 2016). practitioners should also keep in mind that larps take a long time to prepare and complete. on a more positive note, a single edu-larp can have multiple educational objectives at the same time (bowman, 2014). 6.1.3 methodological contributions this work capitalized on the fact that game-related versus learning-related developed interest, affective-motivational, and cognitive load variables were assessed separately. we suggest considering this approach in future game-based learning research, as this can help elucidate complex roles different components of interest, motivation, and cognitive load play in learning processes. 6.2 limitations no work is without limitations. the key thorny issue is the nearly-absolute lack of validated measurements for edu-larp contexts (and beyond, as detailed below). 
first, developed instruments, such as egameflow (fu et al., 2009), are very long. a construct needs to be assessed with a few questions within a game, and, should learning tests be administered, also after the game (to avoid fatigue). second, there is an on-going discussion about how to measure cognitive load and distinguish between different types of load (e.g., leppink et al., 2014; naismith et al., 2015). cognitive load instruments that would be validated in the same way as, for example, panas are lacking. finally, there is no agreed-upon method for measuring developed interests (renninger & pozos-brewer, 2015). we were more satisfied with the gamer score variable, because it was more internally consistent compared to the techie score variable (even though all three subcomponents of the latter variable were theoretically related to developed domain interest). we believe that the pattern of our data is clear enough and consistent with underlying theories to warrant our interpretations of the findings. nevertheless, valid instruments would be useful in future. in an ideal world, this study would have had a control, non-larp, condition and participants would have been randomly assigned to the larp and non-larp conditions. this would enable the contrasting of learning within the game to learning outside the game, whilst the content and the method (i.e., frontal lecture with hands-on practice) would be the same. unfortunately, true randomization is rarely possible in research using edu-larps due to practical and ethical reasons (further detailed in suppl. mat. g). we have focused here on the within-subject comparison part (done within what would be an experimental condition) and have drawn conclusions from this part. we also recruited a quasi-experimental control group, but data comparing the performance of larp learners to non-larp learners should be treated cautiously because of the lack of randomization. 7.
conclusions this study contributes to game-based learning literature, but also to the research base on productive enhancement of interest and motivation in academic contexts. its key message is that learners’ developed domain interest and motivation to learn a particular topic contribute more toward enhancing learning outcomes than supposedly appealing, game-based augmentations of the educational message. our results can probably be directly generalized only to games that have a similar level of complexity as our larp. however, there is growing, parallel evidence suggesting that the message above is quite general and concerns many different types of learning environments and materials, such as textbooks or hypermedia. supplementary materials supplementary materials include: supplementary data and analyses (suppl. mat. a, f), detailed description of the edu-larp and procedure (suppl. mat. b), description of the function of the experimental device (suppl. mat. c), questionnaire items (suppl. mat. d), description of developed interest variables (suppl. mat. e), the supplementary study (suppl. mat. g). keypoints educational live action role-playing games (edu-larps) are supposed to enhance learning outcomes by motivating learners. in this study, learners played a 2-hour-long edu-larp with an integrated learning session. we asked to what degree motivation driven by interest in playing the edu-larp affects learning outcomes compared to learning-driven motivation. learning-driven motivation (rather than playing-driven motivation) predicted learning outcomes; the effects of the latter were positive, but small. developed topic interest (rather than interest in larps) predicted learning outcomes; the effects of the latter were still positive, but small. acknowledgments we thank research assistants who helped to conduct the experiments, most notably: t. zoulová, n. frollová, k. koppová, n. střádalová, and p. šustová. we thank suzanne hidi and k.
ann renninger for discussing methods for measuring interest with us. we also thank sarah lynne bowman for discussing this project with us and for commenting on early versions of this manuscript. we thank lucie filipenská for making the videos and all the actors: anna kratochvílová, anežka rusevová, pavol smolárik, and luboš veselý, including actors in pilot videos: břetislav dufek, tereza “tess” kovanicová, and jan kovanic. we also thank david obdržálek, jan hrach, tomáš “jethro” pokorný, and jana stárková for making the devices, and brmlab, a community-run hackerspace in prague, for assistance. this study was primarily funded by czech grant science foundation (ga čr), project nr. 15-14715s. work of f. d. was supported by rvo 68081740 by the czech academy of sciences. table of footnotes 1 the p-value, unreported in the paper, is .076 (pieter wouters; email dating from 16 dec 2013). 2 some questions administered in this study were not analyzed and reported here. examples include negative affect (from panas), which was out of present scope (see suppl. mat. a for descriptive data), and various manipulation check questions (e.g., on initial interest). 3 see suppl. mat. a for factor loadings and supplementary analyses. references bowman, s. l. (2014). educational live action role-playing games: a secondary literature review. in: the wyrd con companion book (pp. 112-131): wyrd con. bowman, s. l., & standiford, a. (2015). educational larp in the middle school classroom: a mixed method case study. international journal of role-playing, 5, 4-25. boyle, e. a., hainey, t., connolly, t. m., gray, g., earp, j., ott, m., ... & pereira, j. (2016). an update to the systematic literature review of empirical evidence of the impacts and outcomes of computer games and serious games. computers & education, 94, 178-192. doi: 10.1016/j.compedu.2015.11.003 brom, c., buchtová, m., šisler, v., děchtěrenko, f., palme, r., & glenk, l. m. (2014).
flow, social interaction anxiety and salivary cortisol responses in serious games: a quasi-experimental study. computers & education, 79, 69-100. doi: 10.1016/j.compedu.2014.07.001 brom, c., šisler, v., slussareff, m., selmbacherová, t., & hlávka, z. (2016). you like it, you learn it: affectivity and learning in competitive social role play gaming. international journal of computer-supported collaborative learning, 11 (3), 313-348. doi: 10.1007/s11412-016-9237-3 brom, c., děchtěrenko, f., frollová, n., stárková, t., bromová, e., & d’mello, s. k. (2017). enjoyment or involvement? affective-motivational mediation during learning from a complex computerized simulation. computers & education, 114, 236-254. doi: 10.1016/j.compedu.2017.07.001 brummel, b. j., gunsalus, c., anderson, k. l., & loui, m. c. (2010). development of role-play scenarios for teaching responsible conduct of research. science and engineering ethics, 16(3), 573-589. doi: 10.1007/s11948-010-9221-7 clark, d. b., tanner-smith, e. e., & killingsworth, s. s. (2016). digital games, design, and learning a systematic review and meta-analysis. review of educational research, 86(1), 79-122. doi: 10.3102/0034654315582065 cordova, d. i., & lepper, m. r. (1996). intrinsic motivation and the process of learning: beneficial effects of contextualization, personalization, and choice. journal of educational psychology, 88 (4), 715-730. doi: 10.1037/0022-0663.88.4.715 csikszentmihalyi, m. (1975). beyond boredom and anxiety: jossey–bass, san francisco, ca. deci, e. l., & ryan, r. m. (1985). intrinsic motivation and self-determination in human behavior. new york: plenum. deci, e. l., & ryan, r. m. (2008). facilitating optimal motivation and psychological well-being across life's domains. canadian psychology, 49(1), 14-23. doi: 10.1037/0708-5591.49.1.14 decharms, r. (1968). personal causation: the internal affective determinants of behavior . new york: academic press. eccles, j. s., & wigfield, a. (2002). 
motivational beliefs, values, and goals. annual review of psychology, 53(1), 109-132. doi: 10.1146/annurev.psych.53.100901.135153 fu, f.-l., su, r.-c., & yu, s.-c. (2009). egameflow: a scale to measure learners’ enjoyment of e-learning games. computers & education, 52(1), 101-112. doi: 10.1016/j.compedu.2008.07.004 fulmer, s. m., d'mello, s. k., strain, a., & graesser, a. c. (2015). interest-based text preference moderates the effect of text difficulty on engagement and learning. contemporary educational psychology, 41, 98-110. doi: 10.1016/j.cedpsych.2014.12.005 grolnick, w. s., & ryan, r. m. (1987). autonomy in children's learning: an experimental and individual difference investigation. journal of personality and social psychology, 52(5), 890-898. doi: 10.1037/0022-3514.52.5.890 habgood, m. j., & ainsworth, s. e. (2011). motivating children to learn effectively: exploring the value of intrinsic integration in educational games. journal of learning sciences, 20(2), 169-206. doi: 10.1080/10508406.2010.508029 hayden, j. k., smiley, r. a., alexander, m., kardong-edgren, s., & jeffries, p. r. (2014). the ncsbn national simulation study: a longitudinal, randomized, controlled study replacing clinical hours with simulation in prelicensure nursing education. journal of nursing regulation, supplement 5(2), s1-s64. doi: 10.1016/j.ecns.2012.07.070 hidi, s., & renninger, k. a. (2006). the four-phase model of interest development. educational psychologist, 41(2), 111-127. doi: 10.1207/s15326985ep4102_4 hyltoft, m. (2008). the role-players’ school: østerskov efterskole playground worlds: creating and evaluating experiences of role-playing games (pp. 12-25): ropecon ry. hyltoft, m. (2010). four reasons why edu-larp works larp: einblicke (pp. 43-57): zauberfeder verlag. iten, n., & petko, d. (2014). learning with serious games: is fun playing the game a predictor of learning success? british journal of educational technology, 47(1), 151-163. doi: 10.1111/bjet.12226 jabbar, a. 
i., & felicia, p. (2015). gameplay engagement and learning in game-based learning: a systematic review. review of educational research, 85(4), 740-779. doi: 10.3102/0034654315577210 juul, j. (2003). the game, the player, the world: looking for a heart of gameness. plurais-revista multidisciplinar, 1(2), 248-270. kapp, k. m. (2014). do not use games for “stealth learning”. retrieved from http://karlkapp.com/do-not-use-games-for-stealth-learning/ (acessed 15-06-2019) kalyuga, s. (2011). cognitive load theory: how many types of load does it really need? educational psychology review, 23(1), 1-19. doi: 10.1007/s10648-010-9150-7 keller, j. m. (2010). motivational design for learning and performance: the arcs model approach: new york: springer. kot, y. i. (2012). educational larp: topics for consideration. in: wyrd con companion book (pp. 118-127): wyrd con. leppink, j., paas, f., van gog, t., van der vleuten, c. p., & van merrienboer, j. j. (2014). effects of pairs of problems and examples on task performance and different types of cognitive load. learning and instruction, 30, 32-42. doi: 10.1016/j.learninstruc.2013.12.001 mayer, r. e. (2014). incorporating motivation into multimedia learning. learning and instruction, 29, 171-173. doi: 10.1016/j.learninstruc.2013.04.003 mcauley, e., duncan, t., & tammen, v. v. (1989). psychometric properties of the intrinsic motivation inventory in a competitive sport setting: a confirmatory factor analysis. research quarterly for exercise and sport, 60(1), 48-58. doi: 10.1080/02701367.1989.10607413 mochocki, m. (2014). larping the past: research report on high-school edu-larp. in s. l. bowman (ed.), the wyrd con companion book (pp. 132-149): wyrd con. montola, m. (2008). the invisible rules of role-playing: the social framework of role-playing process. international journal of role-playing, 1(1), 22-36. moos, d. c., & marroquin, e. (2010). multimedia, hypermedia, and hypertext: motivation considered and reconsidered. 
computers in human behavior, 26(3), 265-276. doi: 10.1016/j.chb.2009.11.004 naismith, l. m., cheung, j. j., ringsted, c., & cavalcanti, r. b. (2015). limitations of subjective cognitive load measures in simulation-based procedural training. medical education, 49(8), 805-814. doi: 10.1111/medu.12732 pekrun, r. (2006). the control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice. educational psychology review, 18(4), 315-341. doi: 10.1007/s10648-006-9029-9 renninger, k. a., & pozos-brewer, r. k. (2015). interest, psychology of. in: international encyclopedia of the social & behavioral sciences, 2nd ed. (pp. 378-385). oxford: elsevier. renninger, k. a., & schofield, l. s. (2014). assessing stem interest as a developmental motivational variable. poster presented as part of a structured poster session. in: current approaches to interest measurement. philadelphia, pa: american educational research association. rey, g. d. (2012). a review of research and a meta-analysis of the seductive detail effect. educational research review, 7(3), 216-237. doi: 10.1016/j.edurev.2012.05.003 rheinberg, f., vollmeyer, r., & burns, b. d. (2001). fam: ein fragebogen zur erfassung aktueller motivation in lern-und leistungssituationen [in german]. diagnostica, 47, 57-66. rheinberg, f., vollmeyer, r., & engeser, s. (2003). die erfassung des flow-erlebens [in german]. in: diagnostik von motivation und selbstkonzept (pp. 261-279): hogrefe. ryan, r. m., & deci, e. l. (2000). intrinsic and extrinsic motivations: classic definitions and new directions. contemporary educational psychology, 25(1), 54-67. doi: 10.1006/ceps.1999.1020 ryan, r. m., rigby, c. s., & przybylski, a. (2006). the motivational pull of video games: a self-determination theory approach. motivation and emotion, 30(4), 344-360. doi: 10.1007/s11031-006-9051-8 sabourin, j. l., & lester, j. c. (2014). affect and engagement in game-based learning environments. 
ieee transactions on affective computing, 5(1), 45-56. doi: 10.1109/t-affc.2013.27 schiefele, u. (1999). interest and learning from text. scientific studies of reading, 3(3), 257-279. doi: 10.1207/s1532799xssr0303_4 schiefele, u., & krapp, a. (1996). topic interest and free recall of expository text. learning and individual differences, 8(2), 141-160. doi: 10.1016/s1041-6080(96)90030-8 schraw, g., bruning, r., & svoboda, c. (1995). sources of situational interest. journal of literacy research, 27(1), 1-17. doi: 10.1080/10862969509547866 sharp, l. a. (2012). stealth learning: unexpected learning opportunities through games. journal of instructional research, 1, 42-48. doi: 10.9743/jir.2013.6 sitzmann, t. (2011). a meta‐analytic examination of the instructional effectiveness of computer‐based simulation games. personnel psychology, 64(2), 489-528. doi: 10.1111/j.1744-6570.2011.01190.x sweller, j., ayres, p., & kalyuga, s. (2011). cognitive load theory. new york: springer. tingley, d., yamamoto, t., hirose, k., keele, l., & imai, k. (2014). mediation: r package for causal mediation analysis. journal of statistical software, 59(5). doi: 10.18637/jss.v059.i05 um, e. r., plass, j. l., hayward, e. o., & homer, b. d. (2012). emotional design in multimedia learning. journal of educational psychology, 104(2), 485-498. doi: 10.1037/a0026609 vanek, a., & peterson, a. (2016). live action role-playing (larp): insight into an underutilized educational tool. in: learning, education and games. volume two: bringing games into educational contexts (pp. 219-240): etc press. vansteenkiste, m., sierens, e., soenens, b., luyckx, k., & lens, w. (2009). motivational profiles from a self-determination perspective: the quality of motivation matters. journal of educational psychology, 101(3), 671-688. doi: 10.1037/a0015083 watson, d., clark, l. a., & tellegen, a. (1988). development and validation of brief measures of positive and negative affect: the panas scales. 
journal of personality and social psychology, 54(6), 1063-1070. doi: 10.1037/022-3514.54.6.1063 wouters, p., van nimwegen, c., van oostendorp, h., & van der spek, e. d. (2013). a meta-analysis of the cognitive and motivational effects of serious games. journal of educational psychology, 105(2), 249-265. doi: 10.1037/a0031311 wouters, p., & van oostendorp, h. (2017). overview of instructional techniques to facilitate learning and motivation of serious games. in instructional techniques to facilitate learning and motivation of serious games (pp. 1-16): springer. frontline learning research 2 (2013) 70-85 issn 2295-3159 corresponding author: stellan ohlsson, university of illinois, chicago, stellan@uic.edu http://dx.doi.org/10.14786/flr.v1i2.58 70 | f l r beyond evidence-based belief formation: how normative ideas have constrained conceptual change research stellan ohlsson a a university of illinois at chicago, chicago, united states article received 4 september 2013 / revised 13 december 2013 / accepted 13 december 2013/ available online 20 december 2013 abstract the cognitive sciences, including psychology and education, have their roots in antiquity. in the historically early disciplines like logic and philosophy, the purpose of inquiry was normative. logic sought to formalize valid inferences, and the various branches of philosophy sought to identify true and certain knowledge. normative principles are irrelevant for descriptive, empirical sciences like psychology. normative concepts have nevertheless strongly influenced cognitive research in general and conceptual change research in particular. studies of conceptual change often ask why students do not abandon their misconceptions when presented with falsifying evidence. but there is little reason to believe that people evolved to conform to normative principles of belief management and conceptual change. 
when we put the normative traditions aside, we can consider a broader range of hypotheses about conceptual change. as an illustration, the pragmatist focus on action and habits is articulated into a psychological theory that claims that cognitive utility, not the probability of truth, is the key variable that determines belief revision and conceptual change.

keywords: belief formation; belief revision; cognitive utility; conceptual change; descriptive vs. normative inquiry; pragmatism

cognitive scientists pride themselves on their interdisciplinary approach, drawing upon anthropology, artificial intelligence, evolutionary biology, linguistics, logic, neuroscience, philosophy, psychology, and yet other disciplines in their efforts to understand human cognition. interdisciplinary research strategies have paid off in the natural sciences. for example, in the middle of the 20th century, research on the border between biology and chemistry resulted in spectacular advances, including the determination of the structure of dna (watson & crick, 1953). it is plausible that an interdisciplinary approach will pay off in the study of cognition as well. but the cognitive sciences exhibit principled differences that might get in the way of interdisciplinary efforts.

inquiry into cognition was originally rooted in the desire for human betterment. the first cognitive disciplines, including logic, epistemology, and linguistics, were normative disciplines. logicians wanted to systematize valid inferences, as opposed to whatever inferences, including fallacious ones, that people make; philosophers sought to identify criteria for certain knowledge, as opposed to describing all knowledge [1]; and early linguists were more concerned with codifying correct grammar than with cataloguing grammatical errors.
historically, these and related disciplines mixed normative and descriptive elements in a way that is quite foreign to the contemporary conception of a natural or social science. in this respect, they resembled aesthetics, ethics, and legal scholarship more than biology, chemistry, and physics as practiced since the scientific revolution (butterfield, 1957; osler, 2000). because the normative disciplines were historically prior, the concepts, practices, and tools of normative inquiry became part of the intellectual infrastructure of the self-consciously descriptive sciences like neuroscience and experimental psychology that became established in the latter half of the 19th century. concepts like abstraction, association, and imagery are obvious examples of such imports. this intellectual inheritance helped the new sciences get started, in part by suggesting questions and problems (how and when are associations formed?). but normative and descriptive disciplines are different enough in their goals and methods that it is reasonable to ask whether that inheritance has had a negative impact as well.

in this article, i argue that certain normative ideas have led the study of cognition in general and cognitive change in particular down an unproductive path. the types of cognitive changes i have in mind are those that psychologists call conceptual change, belief revision, and theory change (carey, 2009; duit & treagust, 2003; nersessian, 2001; thagard, 1992; vosniadou, baltas & vamvakoussi, 2007). for purposes of this article, i use these terms as near-synonyms. when a collective label is needed, i call them nonmonotonic change processes (ohlsson, 2011).

the argument proceeds through seven steps. in section 1, i elaborate on the distinction between normative and descriptive inquiry. section 2 highlights the role of normative concepts in what i call the ideal-deviation paradigm, a particular style of research that is common in cognitive psychology.
in section 3 i show that this research paradigm is also present, albeit implicitly, in conceptual change research. from a normative perspective, people ought to base their concepts and beliefs on evidence and revise them when they are contradicted by new evidence. the assumption (sometimes implicit, sometimes explicit) that people's cognitive systems are designed to operate in this way has focused researchers' attention on the deviations of human behaviour from normatively correct belief management (using the latter term as a convenient shorthand for “belief formation and belief revision”). but the normative perspective is irrelevant to the scientific study of cognitive change; hence, so are the deviations between the norms and actual cognitive processing. but if the deviations are irrelevant, so are our explanations for them. to go beyond the current state of conceptual change research, we need to make explicit the influence of the normative perspective, identify the constraints it has imposed on theory development, and relax those constraints. when we cultivate a resolutely descriptive stance, the space of possible theories of conceptual change expands. as a first step towards a new theory, section 4 argues that the notion that people form concepts and beliefs on the basis of evidence might be fundamentally incorrect. the problem of why people do not revise their misconceptions when confronted with contradictory evidence then dissolves, and other questions move to the foreground. in section 5, i outline an approach to conceptual change that is inspired by the pragmatist notion that concepts and beliefs are tools for successful action. according to this perspective, the key variable that drives conceptual change is not the strength of the relevant evidence or the probability of truth, but cognitive utility. section 6 answers two plausible objections to this view, and section 7 outlines some of its implications. section 8 recapitulates the argument. although the concept of utility is not itself new, the critique of conceptual change research as mired in normative ideas is stated here for the first time, and the conjecture that utility can replace probability of truth as the key theoretical variable in conceptual change has not been previously proposed.

[1] indeed, many philosophers insist that unless knowledge is true and certain, it does not qualify as knowledge.

1. normative versus descriptive inquiry

descriptive sciences aim to provide accurate theories about the way the world is. as i use it, the word “descriptive” does not stand in contrast to “theoretical” or “explanatory” but encompasses all the empirical and theoretical practices of the natural and social sciences as we now conceive them; the term “empirical” could have been used instead, but is sometimes understood as standing in contrast to “theoretical.” the goal of descriptive science is to provide an account of reality that is intersubjectively valid and reflects the world as it is, independent of human judgments or wishes. descriptive sciences are essentially concerned with adapting theories and concepts to data. the descriptive sciences constitute what we today call “science.”

normative disciplines, in contrast, investigate how things ought to be. they are essentially concerned with conformity to standards of goodness. a large portion of what we have in our heads consists of more or less explicit normative knowledge. aesthetics, epistemology, ethics, etiquette, law, literary criticism, logic, rhetoric, and several other disciplines ask what the appropriate standards are, or should be, in some area of human endeavour; how one decides whether some instance does or does not conform to the relevant standards; and why particular instances conform, or fail to conform.
the state of theory in these disciplines varies widely, from formalized theories of valid inferences in logic to the obviously culture-dependent rules of good manners, and the highly controversial theories of literary criticism.

as these nutshell definitions are meant to illustrate, descriptive and normative disciplines are so different from each other that the distinction seems impossible to overlook. but the separation of normative and descriptive inquiry was in fact long in coming; for example, psychology was included among “the moral sciences” well into the 19th century. the distinction was not fully articulated and accepted in western thought until the 20th century, supported by, among other influences, the logical positivists' emphasis on the distinction between fact and value. (for criticisms of the distinction, see, e.g., köhler, 1938/1966, putnam, 2002, and others.) it is nevertheless anachronistic to think of earlier generations of scholars as having confused descriptive and normative inquiry. the situation is better described by saying that they had not yet distinguished them.

astronomy provides an example of research in an era when the distinction was not yet fully articulated (kuhn, 1957; margolis, 1987, 1993). some ancient astronomers adopted the normative idea that planets ought to move in perfectly circular orbits, because the heavenly bodies were perfect beings, perfect beings ought to move in perfect orbits, and the circle is the most perfect geometric figure. astronomers then spent two millennia explaining the deviations of the observed planetary orbits from the normatively specified orbits using the ptolemaic construct of epicycles, instead of exploring other hypotheses about the geometric shape of the orbits (frank, 1952). research in the natural sciences is no longer constrained by normative ideas in this way. section 2 shows that psychology, in contrast, has not yet outgrown its normative inheritance.

2. the ideal-deviation paradigm

normative principles have generated a psychological research paradigm that i refer to as the ideal-deviation paradigm. although only a subset of psychological research conforms to this paradigm, the paradigm has had a strong and largely negative impact on research in cognitive psychology in general and research on conceptual change in particular. a line of research that follows this paradigm proceeds through the following general steps (examples to follow):
(a) choose a normative theory. how ought the mind carry out such-and-such a process, or perform such-and-such a task?
(b) construct or identify a situation or task environment in which that theory applies, and derive its implications for normatively correct behaviour.
(c) recruit human subjects and observe their behaviour in the relevant situation.
(d) describe the deviations of the observed behaviour from the normatively correct behaviour.
(e) hypothesize an explanation for the observed deviations.
(f) test the explanation in further empirical experiments.

readers who are familiar with cognitive psychology will have no difficulty in thinking of instances of the ideal-deviation paradigm. the prototypical example is research on logical inference (evans, 2007). in this area, researchers originally used logic as developed by logicians, primarily the logics of syllogistic and propositional inferences, as the relevant normative theory. the reasoning problems presented to human subjects include wason's famous 4-card task (a.k.a. the selection task; wason & johnson-laird, 1972). propositional logic prescribes a particular pattern of responses to this task, and the deviations of human responses from the prescribed pattern have been replicated in dozens, perhaps hundreds, of experimental studies. researchers have proposed and debated a wide range of explanations for the observed deviations (johnson-laird, 2006; klauer, stahl, & erdfelder, 2007).
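
as a minimal illustration of steps (a), (b) and (d) for the 4-card task, the python sketch below derives the response that propositional logic prescribes for a rule of the form “if p then q” and then lists how a given response departs from it. the card faces, the vowel/even-number wording of the rule, the function names, and the sample response are all illustrative assumptions; they are not materials or data from the studies cited above.

```python
# illustrative sketch only: card faces and rule wording are assumptions.
LETTERS = ("a", "k")   # hypothetical letter faces (a vowel and a consonant)
NUMBERS = ("4", "7")   # hypothetical number faces (an even and an odd number)


def violates_rule(letter: str, number: str) -> bool:
    """the rule 'if vowel then even' is falsified only by a vowel paired with an odd number."""
    return letter in "aeiou" and int(number) % 2 != 0


def must_turn(visible: str) -> bool:
    """a card needs checking if some possible hidden face would falsify the rule."""
    if visible.isalpha():
        return any(violates_rule(visible, n) for n in NUMBERS)
    return any(violates_rule(l, visible) for l in LETTERS)


def normative_selection(cards):
    """step (b): the set of cards the normative theory says should be turned."""
    return {c for c in cards if must_turn(c)}


def deviation(observed, normative):
    """step (d): cards wrongly left unturned, and cards turned unnecessarily."""
    return normative - observed, observed - normative


cards = ["a", "k", "4", "7"]
normative = normative_selection(cards)
# the response below is made up for illustration; it is not experimental data.
omitted, extra = deviation({"a", "4"}, normative)
print(sorted(normative), sorted(omitted), sorted(extra))
```

under these assumptions the prescribed selection is the vowel card and the odd-number card, so a response that picks the vowel and the even number omits one required card and adds one unnecessary card. the sketch covers only the normative side of the paradigm; it says nothing about the processes that produce the observed response, which is the point the surrounding argument presses.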
research on decision making is a second instance of the ideal-deviation paradigm. the subjective expected utility (seu) theory and the mathematics of probability provide a normative theory for how to choose among competing options. when people are confronted with choices that involve probabilistic outcomes in laboratory settings, they deviate from the normatively correct behaviour in a variety of ways. errors like availability and representativeness are examples. in the former, human judgments about the probability of an event (e.g., an airline crash) are influenced by the ease with which the person can retrieve an example of such an event from memory. in the latter case, human judgments are influenced by the similarity of a sample to the population from which the sample was drawn. in this field, too, researchers have proposed, debated, and experimentally investigated multiple explanations for these and other observed deviations (kahneman, 2011).

the key point for present purposes is that the ideal-deviation paradigm mixes normative and descriptive elements in a way that is foreign to the way we now think of scientific research. to highlight this point, imagine biochemists in the 1950s deciding that a particular protein molecule ought to fold itself into such-and-such a three-dimensional structure, for, say, aesthetic reasons. imagine also that they observe that the actual shape of the molecule deviates from this normatively specified structure, and then spend their time and theoretical energy explaining why the protein deviates in such-and-such a way from the supposedly correct structure, instead of explaining why it folds together the way it actually does. no such investigation could survive peer review for a contemporary chemistry journal. in short, given our current conception of scientific research, there is no justification for the normative element in the ideal-deviation paradigm.
a descriptive theory of how people think or learn must be based on accounts of the actual processes occurring in people's heads when they draw inferences, make decisions, and revise their knowledge, regardless of whether those processes are similar to, or different from, normatively correct processes. comparing empirical observations to a normative theory contributes nothing to that enterprise. section 3 argues that normative conceptions are nevertheless at the centre of contemporary research in conceptual change.

3. ideal-deviation in conceptual change

the ideal-deviation paradigm has strongly impacted psychological research on belief revision, conceptual change, theory change, and related processes. the impact is not immediately obvious, because the relevant normative theory is less precise and less explicit than the normative theories that underpin studies of logical reasoning and decision making. the normative theory of belief management can be summarized in four principles:

principle 1: grounding. beliefs and concepts ought to be based on evidence. in this context, “based on” means “derived from.” the derivation is typically understood to be some form of induction across qualitative observations and/or aggregation of quantitative data. to adopt a belief for which one has no evidence is deplorable, even irresponsible, and a belief that is not grounded in evidence is dismissed as a guess, prejudice, or mere speculation.

principle 2: graded conviction. beliefs ought to be held with a conviction that is proportional to the strength of the relevant evidence. for example, hearsay provides weaker evidence than direct observation; an anecdote provides weaker evidence than a study based on a representative sample; and a correlational study provides weaker evidence for a causal relation than an experimental study. the strength of one's convictions ought to reflect such differences in the nature and extent of the relevant evidence.
for purposes of quantitative comparisons, the conviction with which a belief is held can be conceptualized as an estimate of its probability of being true.

principle 3: belief-belief conflicts. when two beliefs or informal theories contradict each other, the person ought to choose to believe the one that is backed by the stronger evidence. the theory with the strongest support ought to have priority in the control of behaviour, including both discourse and action. to hold contradictory beliefs (p & not-p) is to be inconsistent and hence irrational.

principle 4: belief-evidence conflicts. when beliefs are contradicted by new evidence, they ought to be revised so as to be consistent with both the old and the new evidence. failure to do so makes a person “closed minded”, “irrational”, “rigid minded”, or a victim of “robust misconceptions.”

these four principles are mere common sense; this is how a rational agent ought to manage his or her beliefs. there seems to be little gain in giving such vacuous verities the status of principles. but my purpose is to make explicit what is normally too embedded in our conceptual infrastructure to be visible.

elements of the normative theory of belief management, masquerading as descriptive statements, can be found throughout the cognitive sciences. for example, allport (1958/1979) proposed the contact theory of racial prejudice. the key idea was that negative racial stereotypes would be diminished if a person with such a stereotype were subjected to frequent contacts with members of the relevant ethnic group. the hypothesis was that the contacts would provide evidence against the negative stereotypes and pave the way for other, more positive opinions. in the philosophy of science, kuhn (1970) described theory change as a consequence of the accumulation of anomalies. in educational psychology, posner et al.
(1982) hypothesized that students have to be dissatisfied with their current beliefs about scientific phenomena before they are prepared to revise them, and that being confronted with evidence to the contrary is the key source of dissatisfaction. in developmental psychology, gopnik and meltzoff (1997) embraced principle 4, designating belief-evidence conflicts as the main drivers of cognitive change: “theories may turn out to be inconsistent with the evidence, and because of this theories change.” (p. 39) although other processes are involved as well, the processing of counterevidence is the most important: “theories change as a result of a number of different epistemological processes. one particularly critical factor is the accumulation of counterevidence to the theory.” (p. 39)

paradoxically, cognitive scientists confidently assert these variations of principle 4, while they simultaneously and in parallel assert that people deviate from principle 4. in discipline after discipline, researchers have observed that people do not always and necessarily revise their beliefs when confronted with contradictory evidence. the predictions of the contact theory of racial prejudice were not verified and the theory had to be reformulated (pettigrew, 1998). likewise, strike and posner (1992) found that students retain their misconceptions even after instruction that is directly aimed at confronting those misconceptions with contradicting evidence. “one of the most important findings of the misconception literature…is that misconceptions are highly resistant to change” (strike & posner, 1992, p. 153). in a review paper, limón (2001) wrote that “…the most outstanding result of the studies using the cognitive conflict strategy is the lack of efficacy for students to achieve a strong restructuring and, consequently, a deep understanding of the new information.” (p.
364) indeed, the deviation of student behaviour from principle 4 is the very phenomenon that created conceptual change as a distinct field of research, at least within educational research. consistent with the ideal-deviation paradigm, researchers have responded to the finding that people do not (necessarily) adapt their beliefs to contradictory evidence by proposing various explanations for this deviation. for example, rokeach (1960, 1970) proposed that belief systems have a hierarchical structure, and that change becomes more and more difficult as one moves from the periphery to the centre. as a result, most changes are peripheral and central principles are hardly ever affected by evidence. political and religious principles are cases in point. the philosopher imre lakatos has proposed a similar theory to explain theory change in science (lakatos, 1980). festinger (1957/1962) launched a long-lasting line of research in social psychology that centred on a set of mechanisms for reducing what he called cognitive dissonance. cognitive mechanisms for dissonance reduction process contradictory evidence without any fundamental revision of the relevant beliefs. more recently, cognitive psychologists have added yet other explanations. the category shift theory of chi (2005, 2008) and co-workers explains the robustness of misconceptions as a consequence of the inheritance of characteristics from the (frequently inappropriate) ontological category to which a phenomenon has been assimilated. vosniadou and brewer (1992) and vosniadou and skopeliti (2013) explain the deviations as a consequence of the synthesis of prior (and frequently inaccurate) mental models into more comprehensive (but sometimes equally inaccurate) mental models. sinatra and co-workers have added motivational and emotional variables as additional sources of explanation (broughton, sinatra, & nussbaum, 2013; sinatra & pintrich, 2003). 
yet other perspectives on conceptual change have been proposed (see, e.g., rakison & poulin-dubois, 2001; shipstone, 1984). ohlsson (2011, chap. 9) provides a more extensive comparative analysis of these and related types of explanations.

in short, although the normative theory of belief formation is less explicit than the normative theories that underpin studies of logical reasoning and decision making, research on conceptual change follows closely the ideal-deviation paradigm. the basic structure of conceptual change research is that (a) students ought to revise their misconceptions when confronted with contradictory evidence, (b) the empirical evidence indicates that they do not in fact do so, and therefore (c) we need to explain why they do not do so. but this research enterprise is only meaningful if one accepts principles 1-4 as relevant for the study of conceptual change. section 4 prepares for a new approach to conceptual change by arguing that people do not base their concepts and beliefs on evidence. if so, principles 1-4 are irrelevant for understanding conceptual change.

4. the irrelevance of evidence

at first glance, the normative theory of belief management seems highly relevant for understanding human cognition. if people do not adapt their beliefs to reality, how do they get through their day? surely the deviations from rational belief management uncovered in various areas of cognitive research are relatively minor slips of a fundamentally rational cognitive system for building and maintaining a veridical belief base? such slips might be due, for example, to cognitive capacity limitations or emotional biases.

this view is plausible but difficult to evaluate. we know very little about how people form and revise beliefs in natural settings, because there are few relevant empirical studies. what follows are some informal observations and examples.
in conjunction, they suggest a radical conclusion: the principle that people base their beliefs on evidence might be fundamentally incorrect rather than an optimistic idealization or a partial truth. an adult person has a large belief base in memory, at least if the term “belief” is applied broadly enough to include not only the deep principles that tend to be the object of analysis, but also local, concrete facts. for example, i have multiple beliefs about the public transportation system in the city where i live: that there are buses and subway trains; that there are multiple subway lines; where they go; how long a trip is likely to take; how much it costs; the location of stations; and so on. this small domain of experience is likely to encompass several hundreds, perhaps even thousands, of beliefs, most of which are likely to be accurate. the view that a belief ought to be derived from, or based on, observational evidence works well with respect to such concrete, particular matters. for example, the belief that there is a subway station at the corner of x and y streets might very well be acquired by no more complicated a process than walking down x street and encountering that very station at the crossing with y street. such routine belief formation events can plausibly be attributed to direct observation and in that sense conform to principles 1-4. the direct observation account of belief formation quickly runs into difficulties when the belief is general. for example, most adults have a variety of beliefs about economic, political, and social affairs. informal observations indicate that a significant proportion of such beliefs are not based on any evidence whatsoever. will austerity economics stimulate the economy or depress the markets by robbing consumers of their ability to consume?
quite a few adults are prepared to offer a point of view about this issue, and, just as obviously, very few of them have access to relevant quantitative data or other observational evidence. this is not an isolated instance. consider the range of controversial socio-political and economic issues in the public discourse: gun control, surveillance by intelligence organizations, same-sex marriage, drone strikes on foreign soil, the benefits of universal health care – a large proportion of adults have beliefs regarding many of such issues, but almost none of those beliefs are based on evidence. that is, a person who holds a belief on such an issue did not, as a rule, induce it from multiple historical examples or derive it from statistical data or other types of observational evidence. most people cannot give any coherent or detailed account of why, how, or even when they adopted any particular belief. if people operated with principle 1, they ought to answer almost every question about socio-political and economic issues by saying, “i don't know; i don't have an opinion on that; i don't have enough information.” if general beliefs are not formed by induction from observations, how, by what processes, are they formed instead? informal reflection on everyday life suggests that we form general beliefs by accepting what someone else tells us, either in face-to-face conversation or via media. the notion of evidence does not enter into this belief formation process in any prominent way, because we do not normally and as a rule question or doubt what we are being told. it is enough to hear someone say it for us to encode it as veridical. gabbay and woods (2001) call this the ad ignorantiam rule: “human agents tend to accept without challenge the utterances and arguments of others except where they know or think they know or suspect that something is amiss.” (p. 
150) the reason for this rule is probably that we tend to communicate with people we trust, and access sources that we have already judged as reliable. but this does not support the normative principle, because it is not obvious that we base our judgments about the trustworthiness of a source on anything that would qualify as evidence. another hypothesis is that belief formation is internal to the cognitive system. many of our beliefs appear to arrive in the belief base as consequences of already adopted beliefs. for example, i believe that public education is an essential social institution. i also believe that nations that invest in education will fare better than those that do not. it would be an exaggeration to say that i have evidence for the second belief. after all, what counts as evidence as to what will happen in the future? it seems more accurate to say that i have adopted the second belief because it follows from the first. if public education is essential, nations underfund it at their peril. intra-mental derivations of this sort can hardly be characterized as evidence-based. first, the question of evidence is merely pushed one step backwards, because the derived belief cannot be said to be evidence-based unless the beliefs it is derived from are themselves evidence-based. second, the internal derivations are influenced by factors that are themselves unrelated to truth, such as a desire for consistency, instrumental gain, and various types of biases. consider next principle 2, that beliefs ought to be held with graded convictions that reflect the strength of the evidence. are people sensitive to the relative strength of the evidence? that is, do people in general and as a rule hold their beliefs more strongly when they are supported by more evidence and less strongly when the support is weaker? a thorough answer to this question would require extensive data collection and some way of measuring the strength of the relevant evidence.
however, it is noteworthy that the beliefs that people hold with the greatest conviction tend to be their religious beliefs, and church leaders and followers alike insist that religious beliefs are, and should be, based on faith, not evidence. the fact that faith-based beliefs are held more strongly than other classes of beliefs is inconsistent with the idea that our brains are programmed, in some deep and fundamental way, to base beliefs on evidence. other examples of conviction levels that do not seem to reflect the strength of the available evidence include those that pertain to beliefs regarding climate change and the value of vaccinations. at one time, it was rational to be sceptical regarding the reality of climate change; now, the evidence is overwhelming (oreskes, 2004). nevertheless, some people continue to believe that the climate is not changing. the controversy over vaccinations exhibits a similar pattern. although caution was once rational, there are now multiple, large-scale studies that show conclusively that there is nothing wrong with the common vaccines that are given to children, or with the way they are administered. people do not get sick from vaccines; they get sick from germs. however, anti-vaccine activists continue to claim that vaccines are harmful and they have many followers (offit, 2011). finally, consider principle 3, namely that theory-theory conflicts are to be resolved with reference to the relative strengths of the evidence for the competing theories. do people consistently side with the view that has the strongest evidence? consider the issue whether human beings are fundamentally evil and require discipline in order to behave themselves, or fundamentally good, so all they need is an opportunity to blossom in a natural way. 
every news story about yet another serial killer is evidence for the former view; every heartwarming news story about someone who goes out of their way to make a difference for people around them is evidence for the latter view. every war produces novel atrocities, but every natural catastrophe – forest fire, hurricane, tsunami – generates a fresh batch of stories about individual heroism and self-sacrifice. anybody who attends to the news has as much evidence for one view as for the other. given that there is much evidence for either view, those of us who hold a strong opinion on the issue of human nature must have resolved the conflict between these two theories at least partially on the basis of something other than the evidence. to summarize, informal observations suggest that people do not, in general, induce their beliefs from observational evidence. although we often base concrete beliefs about particular objects and events on direct observation, we appear to form general beliefs through ubiquitous encoding of communications by trusted sources and by deriving them from other, already adopted beliefs. these hypothetical but plausible belief formation processes are not inductive in nature, and the contribution of what we normally call evidence to each is weak. in addition, people show few signs of holding their beliefs with a conviction that is proportional to the strength of the supporting evidence, of resolving conflicts among competing beliefs by comparing the relative strength of the supporting evidence, or of revising their beliefs when they encounter contradictory evidence. these observations suggest that the normative theory of belief management, taken as a descriptive theory, is fundamentally wrong rather than merely an optimistic idealization. but if so, why do the observed deviations of human behaviour from principles 1-4 deserve our attention?
the consequence of abandoning principles 1-4 is that conceptual change researchers no longer need to explain why misconceptions are robust in the face of contradictory evidence. if there is no reason to expect students to revise their beliefs when confronted with new evidence, then the absence of such revisions is not puzzling. explanations for why misconceptions are robust become obsolete, not in the sense of being falsified, but in the sense of being answers to a question we do not need to ask. the problem of why people do not revise their beliefs is not so much solved as dissolved. however, the task of formulating a scientific theory of belief formation and belief revision that can support effective pedagogical practices remains. in section 5, i propose that some of the ideas put forward by the american pragmatist philosophers can serve as a starting point for a new approach to conceptual change.

5. a pragmatist approach

at the end of the 19th century and the beginning of the 20th, american scholars, led by william james, charles sanders peirce, and john dewey, tried to reformulate the classical philosophical problems about knowledge, meaning, and truth in terms of action instead of observation. they claimed that the meaning of a concept or belief resides in the set of actions or “habits” to which it gives rise. “the essence of belief is the establishment of a habit, and different beliefs are distinguished by the different modes of action to which they give rise.” (peirce, 1878, pp. 129-130) the truth of a belief is tied to the outcomes of executing those habits. they stopped short of claiming that, “what works is what is true”, but some of their contemporaries stated their ideas even more boldly than they did themselves (schiller, 1905). pragmatism did not flourish as a philosophical, i.e., normative, theory. its impact faded after the demise of its most charismatic leaders.
although it is once again receiving serious attention from philosophers (stich, 1983), my purpose is not to revive philosophical pragmatism. instead, i intend to mine this strand of thought for an approach to cognitive change that does not begin with the assumption that people decide what to believe by estimating the probability of truth. the question is what they estimate instead.

5.1 cognitive utility as the basis for cognition

the pragmatist emphasis on action fits well with psychological theories of cognition. there is broad consensus on certain general features of what cognitive psychologists have come to call the cognitive architecture, i.e., the information processing machinery that underpins the higher cognitive processes (polk & seifert, 2002). at the centre of the cognitive architecture there is a limited-capacity working memory, connected to separate long-term memory stores for declarative and practical (skill) knowledge. the working memory receives input from sensory systems, and holds information that is being processed in reasoning and decision making. the purpose of the cognitive system is to generate behaviour that satisfies the person's current goal. in the process, the system makes endless, lightning-quick choices: which goal to pursue next (planning); which part of the environment to attend to next (attention allocation); which interpretation of perceptual input to prefer (perception); which memory structure to activate next (retrieval); which inference to carry out next (reasoning); and which change, if any, to make in the system's knowledge base at any given time (learning). the pragmatist stance invites the hypothesis that the variable that guides the never-ending choices is the cognitive utility of the relevant knowledge structures. to articulate this idea, imagine that each knowledge structure (concept or belief) in memory is associated with a numerical value that measures its past usefulness.
when there is a choice to be made among knowledge structures, the one with the higher utility is preferred and gets to control discourse and action. if a knowledge structure is instrumental in generating a particular action, and if that action is successful, then the utility of that knowledge structure is adjusted upwards; if the action is unsuccessful, it is adjusted downwards. each application of a knowledge structure is an opportunity for that structure to accrue utility (or to lose some of it, in the case of unsuccessful action). over time, the value of the cognitive utility associated with a knowledge structure will become stabilized at some asymptotic value that estimates its usefulness in general. the distribution of utility values over the belief base represents the person's experience of the world, as filtered through action rather than perception. the cognitive utility hypothesis is not novel. a construct of this sort has been incorporated into the act-r model of the cognitive architecture proposed by john r. anderson and co-workers (anderson, 2007). in act-r, cognitive skills are encoded in sets of goal-situation-action rules (skill elements) that specify an action to be considered when certain conditions are satisfied by the current situation. in each cycle of operation, the architecture retrieves all the rules that have their conditions satisfied. it then selects one of those rules to be executed; that is, its action is taken. the action usually changes the current situation, and the cycle starts over with a renewed evaluation of which rules have their conditions satisfied in the changed situation. in act-r, the utility u of rule i determines its probability of being selected for execution. in simplified form, that probability is given by eq. (1):

$$\mathrm{prob}(i) = \frac{u(i)}{\sum_{k=1}^{j} u(k)} \tag{1}$$

where the denominator is the sum of the utilities of the rules 1, 2, …, i, …, j for which the conditions are satisfied by the current situation.
the probability that a particular rule i will be selected is thus proportional to how much of the total utility represented by all the currently satisfied rules it accounts for. the probability of being chosen for execution is thus a dynamic quantity that depends on context and that changes from moment to moment as cognitive processing unfolds. if rule i is selected and executed on operational cycle n, its utility is adjusted upwards or downwards, depending on the outcome. the adjustment is given by eq. (2):

$$u(i, n) = u(i, n-1) + \alpha\,[\,r(i, n) - u(i, n-1)\,] \tag{2}$$

in which i is the relevant rule, n is the operational cycle, and r is the reward or feedback from the environment about the success of the executed action (reinforcement in the behaviourist sense). the magnitude r(i, n) - u(i, n-1) is the reward the rule realized in cycle n, r(i, n), over and above the utility it already possessed in the previous cycle of operation, u(i, n-1). the rate parameter α controls the proportion of that reward increment that is to be added to the current utility of the rule, u(i, n-1), to compute the utility of the rule in the following cycle, u(i, n). the reader is referred to the original source for further technical details (anderson, 2007, pp. 159-164). in the act-r theory, utility values are associated with skill elements (rules), and there is a separate system of theoretical quantities that pertain to the learning and application of declarative knowledge elements. to make the utility construct relevant for belief formation we have to hypothesize that utility values are associated with declarative knowledge structures (beliefs, concepts, informal theories) instead of (or in addition to) skill elements. furthermore, the relation between cognitive utility and belief (subjective truth) has to be specified.
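the selection and update rules in eqs. (1) and (2) can be illustrated with a minimal python sketch. it follows the simplified form given in the text, not the full act-r implementation; the function names, the example rule names, and the values chosen for the rewards and the rate parameter are assumptions introduced here for illustration only.

```python
import random


def selection_probabilities(utilities):
    """Eq. (1): a matching rule's probability is its share of the total utility.

    `utilities` maps each rule whose conditions are currently satisfied to its
    utility. The simplified formula only makes sense for a positive total.
    """
    total = sum(utilities.values())
    if total <= 0:
        raise ValueError("the simplified eq. (1) needs a positive utility sum")
    return {rule: u / total for rule, u in utilities.items()}


def select_rule(utilities, rng=random):
    """Draw one rule at random, weighted by the probabilities from eq. (1)."""
    probs = selection_probabilities(utilities)
    rules = list(probs)
    return rng.choices(rules, weights=[probs[r] for r in rules], k=1)[0]


def update_utility(u_prev, reward, alpha):
    """Eq. (2): u(i, n) = u(i, n-1) + alpha * (r(i, n) - u(i, n-1))."""
    return u_prev + alpha * (reward - u_prev)


if __name__ == "__main__":
    # Hypothetical rules and utilities, used only to exercise the functions.
    matching = {"rule_a": 3.0, "rule_b": 1.0}
    print(selection_probabilities(matching))
    chosen = select_rule(matching)
    matching[chosen] = update_utility(matching[chosen], reward=5.0, alpha=0.2)
    print(chosen, matching)
```

with a constant reward r, each application of eq. (2) moves the utility a fixed fraction α of the remaining distance toward r, which is one way to read the claim that the utility of a knowledge structure stabilizes at an asymptotic value.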
one possible hypothesis is that there is a threshold such that, when the utility of a particular belief rises above that threshold, the person feels that the belief is true. a cognitive system that operates in this way would be significantly different from act-r and other cognitive systems described in the cognitive literature (polk & seifert, 2002). a key question is what degree of utility new information will be assigned when it is first encoded into memory. at the outset, the new knowledge structure has no track record of supporting successful action, so one might decide that its initial utility is zero. this causes a paradox: if it is zero, it will always have lower utility than any competitor with even a modest track record, so it will never be activated or chosen, and therefore never have an opportunity to accrue utility. to an outside observer, it will appear as if the learner did not encode the new information, because his or her discourse and action continue to be guided by other knowledge structures. there are multiple solutions to this theoretical problem. in anderson's act-r theory, the initial value is indeed set to zero, but a knowledge structure (rule) can be created multiple times, and each time the utility value is increased. other solutions are possible. the initial value can be hypothesized to be random, or equal to the mean of the utility values of all knowledge structures in memory. there might be situations in which competing older knowledge structures do not apply, but the new one does, and those situations afford the newer knowledge with opportunities to accrue utility. many scenarios that seem like straightforward instances of truth-based processing are equally or better understood in terms of utility. for example, suppose that my eyes itch. i might have dry eyes, or i might suffer from an allergy attack. i decide to take an antihistamine pill. the itch disappears.
in a logic-inspired analysis, the belief that i am suffering from an allergy outbreak is a hypothesis the truth of which is unknown. the connection between the belief that i have an allergy and the prediction that the itch will disappear is a step-by-step chain of inferences. the disappearance of the itch is an observation that verifies the hypothesis, and my estimate of the probability that i have an allergy increases as specified by, for example, bayesian principles. this account has weaknesses. one weakness is that i am not aware of any lengthy reasoning process to arrive at a testable prediction. the process that connects the belief “i have an allergy attack” with the fact that my itch stopped is a process of problem solving and planning (what should i do about my itchy eyes?), not a process of propositional inference. another weakness is that the envisioned process is an instance of a logical fallacy: “if p, then q” in conjunction with “q” does not imply “p”. this is popper's classical critique of verificationism. but if the truth-based account has a logical fallacy at its core, how can people function? the utility-based account avoids this problem by postulating a direct link between the action outcome and the relevant belief: the action of taking the antihistamine worked, so my disposition to act on the allergy belief in the future is increased. as the example illustrates, the difference between an account in terms of evidence, inference, and truth, on the one hand, and an account in terms of utility and action, on the other, can be subtle. how does that difference affect how we view conceptual change? the pragmatist stance focuses attention on action, the output side of the cognitive system, instead of perception, the input side. what the learner does matters more than what he or she hears or sees. passive reception of information will not in and of itself have any cognitive consequences.
unless the learner retrieves a knowledge structure and uses it to decide what to do next, that knowledge structure cannot accrue utility and hence might remain dormant, even though the new information has been encoded accurately. in the pragmatist perspective, new information does not replace the old. in a logic-based theory, two different beliefs can be mutually incompatible, which implies that a person cannot embrace both. the earth is either round or flat; it is impossible to believe both assertions at once. however, the fact that knowledge structure i has utility u(i) is not incompatible with the fact that knowledge structure j has utility u(j). the belief that the earth is flat might be useful for mapmaking purposes, while the belief that the earth is round might be more useful for the purpose of circumnavigation. many tasks in real life admit of multiple solutions, varying with respect to goal satisfaction, efficiency, and range of applicability. evaluating beliefs with respect to their cognitive utility is thus very different from evaluating them with respect to their truth. falsification by contradictory evidence is, in principle, a one-shot affair. a single application of modus tollens is logically sufficient to bring down a belief and even an entire theory. but utility-based belief revision is necessarily a gradual matter. once the utility rises to the point where a new knowledge structure is chosen to be the basis for action on at least some occasions, belief change is contingent on the outcomes of the resulting actions. the utility of structure i might be steadily rising with each application, while the utility of some competing knowledge structure j is gradually dropping. eventually, the utility of the newer knowledge structure will surpass that of the older, competing structures, and rise above the threshold of belief. if changes in utility values are incremental, then this process is necessarily gradual.
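the gradual, success-driven change described above can be sketched as a small simulation. two knowledge structures compete for control of action; on each cycle one is chosen with a probability proportional to its utility, and only the chosen one is updated with the rule in eq. (2). all numerical values (initial utilities, rewards, rate parameter, belief threshold) and the function names are hypothetical, chosen only to make the dynamics visible; the threshold stands in for the hypothesis that a belief feels true once its utility rises above some level.

```python
import random


def simulate(old_u=5.0, new_u=1.0, old_reward=4.0, new_reward=8.0,
             alpha=0.1, cycles=500, seed=0):
    """Return the utilities of both structures after every cycle."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    utilities = {"old": old_u, "new": new_u}
    rewards = {"old": old_reward, "new": new_reward}
    history = []
    for _ in range(cycles):
        # Proportional selection between the two competitors (cf. eq. 1).
        p_new = utilities["new"] / (utilities["old"] + utilities["new"])
        chosen = "new" if rng.random() < p_new else "old"
        # Incremental update of the chosen structure only (cf. eq. 2).
        utilities[chosen] += alpha * (rewards[chosen] - utilities[chosen])
        history.append(dict(utilities))
    return history


def first_cycle_above(history, name, threshold):
    """Index of the first cycle on which `name` exceeds `threshold`, or None."""
    for n, snapshot in enumerate(history):
        if snapshot[name] > threshold:
            return n
    return None


if __name__ == "__main__":
    BELIEF_THRESHOLD = 6.0  # hypothetical value, not taken from the text
    run = simulate()
    print("final utilities:", run[-1])
    print("new structure first exceeds the threshold on cycle",
          first_cycle_above(run, "new", BELIEF_THRESHOLD))
    # A new structure that starts at zero utility is never selected.
    print("zero start:", simulate(new_u=0.0)[-1])
```

with these assumed values the newer structure needs a dozen successful applications before it can cross the threshold, so the change cannot happen in one step. setting its initial utility to zero reproduces the paradox of initial utility: under proportional selection it is never chosen, so its utility never changes.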
the most radical difference between a truth-based and a utility-based account of cognition pertains to the trigger of conceptual change. in the truth-based account, it is the failure of the older knowledge that drives belief revision. change happens because already acquired concepts and beliefs have been found to be false, triggering dissatisfaction and a search for more veridical concepts and beliefs to replace them. if there is no failure, there is no push for change. in the utility-based account, on the other hand, new information need not wait for falsification or dissatisfaction with prior beliefs. it is the success of the newer concepts and beliefs that drives the change. no dissatisfaction with the old belief is required, only a recognition that the newer belief is an even more useful basis for action. change is driven by success, not failure (ohlsson, 2009, 2011). however, before the utility-based perspective can be adopted, some plausible objections must be dealt with; this is the task of section 6.

6. two objections

the purpose of this section is to address two objections that must have occurred to the reader. the first is that evolution through natural selection ought to have pushed human cognition in the direction of the normative theory of belief management (principles 1-4), and the second is that the behaviour of scientists appears to conform to the normative theory.

6.1 natural selection for truthfulness?

one might argue that the shift from estimates of the probability of truth to estimates of cognitive utility is unimportant. after all, how can a belief be useful unless it is, in fact, true? if only true beliefs are useful, then the selective pressures that drove the evolution of human cognition must have pushed the belief management processes in the learner's head to conform at least approximately to principles 1-4. how could our hunter-gatherer ancestors have survived unless their beliefs corresponded to reality?
the instrumental value of veridicality in the struggle for survival implies that the human cognitive architecture is designed to derive beliefs from evidence. but natural selection cannot have operated directly on the truthfulness of beliefs. the probability of surviving long enough to mate and to raise the resulting offspring to reproductive age is a function of how the individual behaves, not of how he or she thinks. what mattered during human evolution cannot have been the truth of beliefs per se, but the effectiveness of human behaviour. consistent selection in the direction of effective action would create a utility-based rather than truth-based system. the distinction between truth and utility would be of minor importance, if the two were perfectly correlated. however, false beliefs can lead to successful action. for example, it does not matter what belief one has about the causes of severe weather, as long as that belief implies that when storm clouds gather, it is time to seek shelter. the belief that lightning is a sign of the anger of the gods and the belief that it is an electrical discharge are equally good reasons to get out of the way. the belief that a certain medical condition is caused by an evil spirit and that the spirit can be exorcised by ingesting a certain herb can be as successful as an account of the disease in terms of bacteria, white blood cells, etc., if the relevant herb contains traces of, for example, an antibiotic substance. an even stronger example is provided by the 14th-century physicist buridan's impetus theory of mechanical motion (claggett, 1959; robin & ohlsson, 1989). a central principle in this theory says that to keep an object in motion requires the continuous application of force, the opposite of the principle of inertia that is at the centre of newtonian mechanics. however, the impetus principle holds on the surface of the earth due to the universal presence of friction.
if the goal is to keep an object moving, or to make it move further or faster, the impetus concept is as useful a guide to action as the theory that physicists teach (apply more force). in short, truth and utility are only partially correlated, and evolution has no way of selecting for the truth of beliefs directly, but only for the success of an individual's struggle for survival. evolutionary considerations thus support rather than contradict the hypothesis that utility is the key variable in belief formation.

6.2 the behaviour of scientists

the reader would be excused for thinking that the present author is engaged in a self-defeating enterprise: to use evidence and arguments to make the reader believe that people do not use evidence and arguments when deciding what to believe. this article is itself an attempt to base belief in this matter on evidence. more generally, scientists do base their theories on evidence and scientists are people, so it seems unreasonable to claim that this is not a common cognitive capability. gopnik and meltzoff (1997) have emphasized this connection between the procedures of scientific knowledge creation and individual belief formation: “the central idea of [our] theory is that the processes of cognitive development in children are similar to, indeed perhaps even identical with [sic], the processes of cognitive development in scientists.” (p. 3) indeed, they have stated their hypothesis quite clearly: “…the most central parts of the scientific enterprise, the basic apparatus of explanation, prediction, causal attribution, theory formation and testing, and so forth, is not a relatively late cultural invention but is instead a basic part of our evolutionary endowment.” (pp. 20-21) the consequence is that cognition can be explained with normatively correct processes such as bayesian inference (gopnik et al., 2004).
the utility-based view explored in this article does not deny that people can acquire the higher-order cognitive skills needed to engage in the methods and procedures of science. it does claim that those methods and procedures are acquired. scientists are professional theorizers; they engage in belief formation (a.k.a. hypothesis testing) deliberately and on purpose, with a high degree of awareness. to be able to do this, they undergo a multi-year training process called graduate school. they are supported by a wide variety of tools such as special-purpose statistical software that embody the principles of the normative view of belief management. furthermore, scientific research takes place within a social context, the scientific discipline, that enforces adherence to the normative theory. for example, a scientist who revises his or her theory to improve its fit to empirical data (principle 4) is more admired than someone who continues to advocate a favourite theory in the face of counterevidence. the behaviour of scientists shows that people can acquire the high-level skills needed to function at least approximately as prescribed by the normative theory. but this does not imply that the basic processes of the cognitive architecture conform to the normative theory. to cast the procedures of science as a description of conceptual change in the individual is to confuse two levels of description: the level of the basic processes of the cognitive architecture (the “basic part of our evolutionary endowment”), on the one hand, and the level of acquired higher-order strategies and skills, on the other. the arguments put forth in this paper concern the basic processes. 
i know of no reason to believe that “the most central parts of the scientific enterprise, the basic apparatus of explanation, prediction, causal attribution, theory formation and testing” is part of our “evolutionary endowment.” the late arrival of science in human history, its invention by one culture at one time, and the extensive training individuals need to conduct scientific research make it highly implausible that anything like the “basic apparatus” of science is among our “evolutionary endowment.” instead, the cognitive apparatus of science is precisely “a relatively late cultural invention.” the relation between the basic processes of cognitive change and the procedures of science is the opposite of the one claimed by gopnik and meltzoff (1997). rather than scientific practices explaining how cognitive change happens in children and lay adults, the relationship should be construed the other way around: a theory of the basic cognitive processes should explain how it is possible to acquire the higher-order strategies for belief management that approximate the normative theory in principles 1-4. the utility-based perspective has other implications as well, three of which are discussed in section 7.

7. implications

if we adopt the utility-based perspective, what follows? from the point of view of basic research on conceptual change, it implies a re-evaluation of existing theoretical constructs, methodologies, and applications. traditionally, research on conceptual change and belief formation has been perception-centric: the focus has been on what the learner sees and hears, and how he or she processes the perceived information. the utility-based perspective, in contrast, implies a need to focus on what the learner does, when and where he or she succeeds or fails, and on what information is activated, retrieved, and used to guide action.
a learning trajectory is primarily to be defined in terms of tasks undertaken, and only secondarily in terms of information encountered. as a side effect of such a re-focusing, the traditional concepts, tools, and puzzles regarding truth inherited from philosophy and logic will become comparatively less important. the perception-centric bias of cognitive research in general and cognitive studies for education in particular is driven, in part, by the practicalities of psychological experimentation. the experimenter controls the subject’s task environment, so he or she can create complex but well-specified conditions and contrasting situations by varying the stimulus. such variations can easily be described in research reports. the subjects’ behaviours, on the other hand, are only easy to report and interpret in an intersubjectively valid way if they consist of simple, easy-to-record events, like pushing a button or placing a mark on a rating scale. the pragmatist perspective implies that this style of empirical inquiry runs the risk of eliminating from the researchers’ consideration the central subject matter of cognition, namely complex, temporally extended, hierarchically structured, and dynamically coordinated sequences of actions in the service of human goals and objectives. the pragmatist perspective implies a need for a period of methodological innovation in which researchers develop new techniques to record and interpret complex behaviours. from the point of view of instructional application, the utility-based account poses multiple challenges: how to stimulate students to encode knowledge that they have no reason to believe, and that is only tangentially relevant for their own action? how to design situations in which the new knowledge presented in the course of instruction, but not their prior knowledge, applies, so that the new knowledge can accrue utility?
how to provide learners with multiple opportunities to apply new knowledge without resorting to mind-numbing drill and practice? these questions are quite different from the questions of why misconceptions are robust, what evidence will convince a student that his or her misconceptions are in fact inaccurate, or how to train students to pay attention to evidence, so pursuing them will likely lead educational researchers in novel directions.

8. conclusion

throughout the history of science, interdisciplinary work has often been innovative and path-breaking. at the beginning of conceptual change research, there was every reason to believe that drawing upon a variety of disciplines was a productive way to proceed. however, researchers (including the present author) overlooked the distinction between normative and descriptive disciplines, fell into the ideal-deviation paradigm, and spent their theoretical energies explaining the main observed deviation from the normative theory, namely that students do not revise their prior conceptions when confronted with counterevidence. but the normative idea that people ought to base their beliefs on evidence is irrelevant for the empirical study of cognition. there is little or no evidence that people base any of their beliefs on evidence, and considerable evidence that they do not. if they do not, then it is no surprise that science courses fail to impact students’ beliefs about scientific phenomena, and efforts to explain this supposed phenomenon are unnecessary. to make progress in understanding conceptual change, researchers need to adopt a resolutely naturalistic approach that makes no normatively inspired assumptions about belief formation and belief revision. the pragmatist view that cognition evolved to support successful action and that beliefs are evaluated on the basis of their cognitive utility instead of their probability of being true is an alternative starting point for conceptual change research.
the utility-based perspective implies that action is necessary for conceptual change, that old and new beliefs are not mutually incompatible, that conceptual change is necessarily gradual, and that change is not driven by the failures of misconceptions but by the successes of better ideas. a research program to articulate this perspective would replace the traditional perception-centric bias of psychological research with an action-centric approach that foregrounds the cognitive consequences of complex actions.

references

allport, g. w. (1958/1979). the nature of prejudice (2nd ed.). reading, ma: addison-wesley.
anderson, j. r. (2007). how can the human mind occur in the physical universe? (pp. 159-165). oxford university press.
broughton, s. h., sinatra, g. m., & nussbaum, e. m. (2013). “pluto has been a planet my whole life!” emotions, attitudes, and conceptual change in elementary students’ learning about pluto’s reclassification. research in science education, 43, 529-550.
butterfield, h. (1957). the origins of modern science 1300-1800 (revised ed.). indianapolis, in: hackett.
carey, s. (2009). the origin of concepts. new york: oxford university press.
chi, m. t. h. (2005). commonsense conceptions of emergent processes: why some misconceptions are robust. the journal of the learning sciences, 14, 161-199.
chi, m. t. h. (2008). three types of conceptual change: belief revision, mental model transformation, and categorical shift. in s. vosniadou (ed.), handbook of research on conceptual change (pp. 61-82). hillsdale, nj: erlbaum.
claggett, m. (1959). the science of mechanics in the middle ages. madison, wi: university of wisconsin press.
duit, r., & treagust, d. f. (2003). conceptual change: a powerful framework for improving science teaching and learning. international journal of science education, 25(6), 671-688.
evans, j. st. b. t. (2007). hypothetical thinking: dual processes in reasoning and judgment. new york: psychology press.
festinger, l. (1957/1962). a theory of cognitive dissonance. stanford, ca: stanford university press.
frank, p. (1952). the origin of the separation between science and philosophy. proceedings of the american academy of arts and sciences, 80(2), 115-139.
gabbay, d., & woods, j. (2001). the new logic. logic journal of the interest group in pure and applied logics, 9, 141-174.
gopnik, a., & meltzoff, a. n. (1997). words, thoughts, and theories. cambridge, ma: mit press.
gopnik, a., glymour, c., sobel, d. m., schulz, l. e., & kushnir, t. (2004). a theory of causal learning in children: causal maps and bayes nets. psychological review, 111, 3-32.
johnson-laird, p. n. (2006). how we reason. new york: oxford university press.
kahneman, d. (2011). thinking, fast and slow. new york: farrar, straus, and giroux.
klauer, k. c., stahl, c., & erdfelder, e. (2007). the abstract selection task: new data and an almost comprehensive model. journal of experimental psychology: learning, memory, and cognition, 33, 680-703.
kuhn, t. s. (1957). the copernican revolution: planetary astronomy in the development of western thought. new york: random house.
kuhn, t. s. (1970). the structure of scientific revolutions (2nd ed.). chicago, il: university of chicago press.
köhler, w. (1938/1966). the place of value in a world of facts. new york: liveright.
lakatos, i. (1980). philosophical papers (vol. 1): the methodology of scientific research programmes. cambridge, uk: cambridge university press.
limón, m. (2001). on the cognitive conflict as an instructional strategy for conceptual change: a critical appraisal. learning and instruction, 11, 357-380.
margolis, h. (1987). patterns, thinking, and cognition: a theory of judgment. chicago, il: university of chicago press.
margolis, h. (1993). paradigms and barriers: how habits of mind govern scientific beliefs. chicago, il: university of chicago press.
nersessian, n. j. (2008). creating scientific concepts. cambridge, ma: mit press.
offit, p. a. (2011). deadly choices: how the anti-vaccine movement threatens us all. new york: basic books.
ohlsson, s. (2009). resubsumption: a possible mechanism for conceptual change and belief revision. educational psychologist, 44, 20-40.
ohlsson, s. (2011). deep learning: how the mind overrides experience. new york: cambridge university press.
osler, m. j. (ed.). (2000). rethinking the scientific revolution. cambridge, ny: cambridge university press.
oreskes, n. (2004). beyond the ivory tower: the scientific consensus on climate change. science, 306, 1686.
peirce, c. s. (1878). how to make our ideas clear. popular science monthly, 12, 286-302. [reprinted in n. houser and c. kloesel (eds.), the essential peirce: selected philosophical writings (vol. 1, pp. 124-141). bloomington, in: indiana university press.]
pettigrew, t. f. (1998). intergroup contact theory. annual review of psychology, 49, 65-85.
polk, t. a., & seifert, c. m. (eds.). (2002). cognitive modeling. cambridge, ma: mit press.
posner, g. j., strike, k. a., hewson, p. w., & gertzog, w. a. (1982). accommodation of a scientific conception: toward a theory of conceptual change. science education, 66, 211-227.
putnam, h. (2002). the collapse of the fact/value dichotomy; and other essays. cambridge, ma: harvard university press.
robin, n., & ohlsson, s. (1989). impetus then and now: a detailed comparison between jean buridan and a single contemporary subject. in d. e. herget (ed.), the history and philosophy of science in science teaching. proceedings of the first international conference (pp. 292-305). tallahassee: florida state university, science education & dept. of philosophy.
rakison, d. h., & poulin-dubois, d. (2001). developmental origin of the animate-inanimate distinction. psychological bulletin, 127(2), 209-228.
rokeach, m. (1960). the open and closed mind. new york: basic books.
rokeach, m. (1970). beliefs, attitudes, and values: a theory of organization and change. san francisco, ca: jossey-bass.
schiller, f. c. s. (1905). the definition of ‘pragmatism’ and ‘humanism’. mind, 14, 235-240.
shipstone, d. m. (1984). a study of children’s understanding of electricity in simple dc circuits. european journal of science education, 6, 185-198.
sinatra, g. m., & pintrich, p. r. (eds.). (2003). intentional conceptual change. mahwah, nj: lawrence erlbaum.
stitch, s. p. (1983). from folk psychology to cognitive science: the case against belief. cambridge, ma: mit press.
strike, k. a., & posner, g. j. (1992). a revisionist theory of conceptual change. in r. a. duschl and r. j. hamilton (eds.), philosophy of science, cognitive psychology, and educational theory and practice (pp. 147-176). new york: state university of new york press.
thagard, p. (1992). conceptual revolutions. princeton, nj: princeton university press.
vosniadou, s., baltas, a., & vamvakoussi, x. (eds.). (2007). reframing the conceptual change approach to learning and instruction. amsterdam, the netherlands: elsevier science.
vosniadou, s., & brewer, w. f. (1992). mental models of the earth: a study of conceptual change in childhood. cognitive psychology, 24, 535-585.
vosniadou, s., & skopeliti, i. (2013). conceptual change from the framework theory side of the fence. science & education, doi: 10.1007/s11191-013-9640-3.
wason, p. c., & johnson-laird, p. n. (1972). psychology of reasoning: structure and content. london, uk: b. t. batsford.
watson, j. d., & crick, f. h. c. (1953). a structure for deoxyribose nucleic acid. nature, 171, 737-738.
frontline learning research 5 (2014) 28-45 issn 2295-3159
corresponding author: http://dx.doi.org/10.14786/flr.v2i3.96

scientific reasoning and argumentation: advancing an interdisciplinary research agenda in education

frank fischer (a), ingo kollar (a), stefan ufer (b), beate sodian (a), heinrich hussmann (c), reinhard pekrun (a), birgit neuhaus (d), birgit dorner (e), sabine pankofer (e), martin fischer (f), jan-willem strijbos (a), moritz heene (a) & julia eberle (a, d)

(a) ludwig maximilians university of munich, department of psychology, germany
(b) ludwig maximilians university of munich, department of mathematics, germany
(c) ludwig maximilians university of munich, department of informatics, germany
(d) ludwig maximilians university of munich, department of biology, germany
(e) katholische stiftungsfachhochschule münchen university of applied sciences, germany
(f) ludwig maximilians university of munich, university hospital, institute for medical education, germany
(a-f) munich center of the learning sciences, germany

article received 24 february 2014 / revised 1 april 2014 / accepted 19 may 2014 / available online 16 june 2014

abstract

scientific reasoning and scientific argumentation are highly valued outcomes of k-12 and higher education. in this article, we first review main topics and key findings of three different strands of research, namely research on the development of scientific reasoning, research on scientific argumentation, and research on approaches to support scientific reasoning and argumentation. building on these findings, we outline current research deficits and address five aspects that exemplify where and how research on scientific reasoning and argumentation needs to be expanded.
in particular, we suggest grounding future research in a conceptual framework with three epistemic modes (advancing theory building about natural and social phenomena, artefact-centred scientific reasoning, and science-based reasoning in practice) and eight epistemic activities (problem identification, questioning, hypothesis generation, construction and redesign of artefacts, evidence generation, evidence evaluation, drawing conclusions as well as communicating and scrutinizing scientific reasoning and its results). we further propose addressing the domain specificities and domain generalities of scientific reasoning and argumentation as well as approaches for facilitation. finally, we argue for investigating the role of epistemic emotions, the role of the social context, and the influence of digital technologies on scientific reasoning and argumentation.

keywords: scientific reasoning; argumentation; epistemic emotions; collaboration; technology

1. problem

participating in the knowledge society and benefiting from the unprecedented open access to a vast volume of scientific knowledge require a broad set of skills and abilities that have lately been labelled as 21st century skills (e.g., trilling & fadel, 2009). these include skills and abilities to use scientific concepts and methods to understand how scientific knowledge is generated in different scientific disciplines, to evaluate the validity of science-related claims, to assess the relevance of new scientific concepts, methods, and findings, and to generate new knowledge using these concepts and methods. the acquisition of these complex competencies is considered a main goal and outcome of k-12 and higher education. however, contemporary knowledge about what constitutes these competencies and how they can be facilitated is scattered over different research disciplines.
in order to develop a better understanding of these competencies, we propose to build on three existing strands of research. the first is research on the development of scientific reasoning (e.g., koslowski, 2012). the second is research on the processes and products of scientific argumentation (e.g., chinn & clark, 2013) from the fields of educational psychology, education, as well as science education and other subject education disciplines. the third comprises a broad range of approaches to support and facilitate scientific reasoning and argumentation (sra) in educational contexts (e.g., furtak, seidel, iverson, & briggs, 2012). in this article, we will first provide an overview of the main topics and key findings of these three strands of research. building on these findings, we outline the deficits of existing research and address five aspects that exemplify where and how research on sra needs to be expanded.

2. key findings of previous research

2.1 development of scientific reasoning

research on scientific reasoning amongst laypeople has its roots in developmental psychology. inhelder and piaget (1958) assumed that scientific rationality was a model of ideal human reasoning, that is, of a person who reflects on theories, builds hypothetical models of reality, critically and exhaustively tests for all possible main and interaction effects between variables, and objectively and systematically evaluates evidence with respect to a claim. in a series of studies they showed that the scientific reasoning of preadolescent children was severely deficient, whereas significant improvement took place in adolescence. these findings led them to claim the stage of “formal operational thought” as the highest stage of cognitive development. this view has since been heavily criticised, as it adequately captures neither adult reasoning nor its development (kuhn & franklin, 2006).
neither the lay adult nor professional scientists conform to a model of domain-general, ideal scientific rationality. rather, adult reasoning abilities are heavily dependent on domain-specific knowledge and context (e.g., kruglanski & gigerenzer, 2011). this is found for laypersons, but professional scientists are equally influenced by their prior knowledge and theoretical biases (dunbar, 1995). similarly, children’s scientific reasoning is context and task dependent and does not differ fundamentally from adult scientific reasoning (koslowski, 1996, 2012; see zimmerman, 2000, 2007). the “layperson as scientist” metaphor, which focuses on processes of intentional knowledge seeking to test theories and hypotheses and to evaluate evidence with respect to a hypothesis or theory (kuhn & franklin, 2006), has proved to be a productive framework for research into scientific reasoning. however, broad models of scientific reasoning that incorporate early competencies are only now emerging (kuhn & franklin, 2006; sodian & bullock, 2008). for example, kuhn (1991) showed that differentiation of theory and evidence poses a major problem for many lay adults in complex, real-world argumentation. however, even young elementary school children can differentiate hypothetical beliefs from evidence and identify a conclusive research design to test a hypothesis (sodian, zaitchik, & carey, 1991). third graders distinguish controlled from confounded experiments (bullock & ziegler, 1999). even pre-schoolers possess basic data evaluation competencies (koerber, sodian, thoermer, & nett, 2005; koerber & sodian, 2009). thus, neither children nor adults appear to lack a basic understanding of the relationship between hypothetical beliefs and empirical evidence.
rather, in complex theory evaluation tasks, both children and adults appear to lack an understanding of mechanisms, as well as methodological knowledge to provide and judge evidence-based arguments (e.g., koslowski, 2012). a meta-conceptual understanding of the nature of scientific knowledge has been identified as a major source of developmental progress. understanding progresses from an undifferentiated level 1 (science as activities and effects) through an intermediate level 2 (science as providing explanations via testable claims) to a level 3 understanding (science as a cyclical and cumulative process of theory, testing, and revision), with children rarely displaying level 2 and even adults rarely articulating a coherent level 3 understanding (e.g., carey & smith, 1993). however, even the nature of elementary school students’ science understanding can be improved through instructional support (e.g., sodian, jonen, thoermer, & kircher, 2006). moreover, an advanced meta-conceptual understanding of science in childhood has been found to predict strategy acquisition in adolescence (bullock, sodian, & koerber, 2009). recent attempts in developmental research with elementary school students support a model of scientific reasoning as a complex set of interrelated abilities, consisting of four major components: “understanding the nature of science”, “understanding theories”, “designing experiments”, and “interpreting data” (e.g., koerber, sodian, kropf, mayer, & schwippert, 2011). apart from general cognitive abilities, students’ problem-solving skills and spatial abilities have been shown to have a major impact on these scientific reasoning competencies. moreover, scientific reasoning has been shown to be a separate construct from measures of intelligence and reading skills in elementary school students (mayer, sodian, koerber, & schwippert, 2014).
2.2 scientific argumentation

while developmental research is mainly interested in the developmental trajectories of an individual’s scientific reasoning, educational and science education research on scientific argumentation has focused on the externalised processes and products of scientific reasoning within social contexts (e.g., the science classroom; osborne, 2010). the interest in scientific argumentation is sparked by the view that argumentation relates to the learning of core content and acquisition of general argumentation skills (chinn & clark, 2013). previous research pursued two main goals: (a) identification of students’ deficits during their engagement in scientific argumentation in social contexts, and (b) design and development of effective scaffolding approaches to improve students’ argumentation. with respect to students’ deficits in scientific argumentation, some studies focused on the structural quality of student-generated arguments, for example on the use of evidence (e.g., mcneill, 2011), qualifiers (stegmann, wecker, weinberger, & fischer, 2012) or warrants (kollar, fischer, & slotta, 2007). a recurring finding has been that students tend to make claims without justifications. in socio-scientific debates, they typically do not spontaneously refer to scientific concepts and information (sadler, 2004). other studies have shown that students often have problems producing arguments of high content quality (e.g., kelly & takao, 2002). a third set of studies revealed that students often exhibit a poor dialogic or social quality of argumentation as reflected in the social exchange and co-construction of arguments. for example, students have been found to refrain from challenging others’ arguments (weinberger, stegmann, & fischer, 2010).
this might be related to the recurring finding that students have difficulties recognising contrasting argumentative positions (sadler, 2004) and are often not successful in integrating different perspectives of different learners within a group or community (noroozi, weinberger, biermans, mulder, & chizari, 2012).

2.3 intervention studies

how students can effectively be supported in their acquisition of sra-related skills has been the subject of a large body of intervention-based research, including long-term and short-term interventions, technology-based and teacher-based scaffolding, laboratory as well as field studies, and studies at the school and university levels (e.g., kollar et al., 2007; mcneill, lizotte, krajcik, & marx, 2006). overall, this research shows that sra can be substantially advanced by making it an explicit topic of instruction (see osborne, 2010). this applies to both increasing students’ abilities to engage in activities of scientific knowledge generation (or epistemic activities) and helping them develop a more sophisticated understanding of the nature of science. current research on instructional approaches focuses on immersing learners in scientific practices (see cavagnetto, 2010), which typically involves student engagement in research-related activities and debates. three prototypical instructional approaches are inquiry learning, problem-based learning, and design-based learning. inquiry learning engages students in more or less authentic activities of hypothesis formulation, generation of evidence, and drawing conclusions (chinn & malhotra, 2002). inquiry learning proved to be an effective instructional approach to advance science learning, especially when combined with teacher-led activities (e.g., furtak et al., 2012).
similarly, in problem-based learning, students are confronted with complex problems and expected to find explanations and solutions that are based on scientific concepts and methods (e.g., dochy, segers, van den bossche, & gijbels, 2003). design-based learning (e.g., kolodner, 2007) engages students in inter-linked cycles of research and design with the goal of arriving at an optimal design of a concrete product, such as a miniature car that can go from one end of the classroom to the other. in all of these approaches that aim to immerse students in authentic sra processes, it has been found crucial to provide students with structural support. this scaffolding may be directed at individual learners, small groups, or whole classrooms. for individual learning, hints, prompts, sentence starters, and guiding questions that help students focus their attention on the critical aspects of sra have been found to be effective (see quintana, reiser, davis, krajcik, fretz, duncan, & soloway, 2005). a hypothesis scratchpad, for example, helped students formulate better hypotheses than students whose hypothesis formation was unscaffolded (van joolingen & de jong, 1993). for small-group collaboration, several studies showed that the quality of sra can be raised substantially through collaboration scripts (see fischer, kollar, stegmann, & wecker, 2013), which assign roles to learners and sequence their epistemic activities. for instance, a social-discursive peer review script has been shown to enhance student argumentation. detailed process analyses revealed that social-discursive argumentation during the peer review processes mediated the effects of scaffolding by the script on the improvement of (individual) argumentation skills (stegmann et al., 2012). a related form of structuring collaboration is peer assessment (e.g., cho, schunn, & wilson, 2006; strijbos & sluijsmans, 2010), which is also a crucial aspect of the contemporary scientific process.
peer assessment can be used to help collaborators uncover incongruence in their respective sra processes when scrutinising scientific claims and evidence. the incongruence can subsequently foster refinement of target processes through critical reflection (nicol, thomson, & breslin, 2014). finally, studies demonstrated that teachers can be successfully empowered to help students gain scientific argumentation skills (e.g., erduran, simon, & osborne, 2004). for instance, research on classroom scripts has shown that epistemic activities can be facilitated if teachers combine scaffolding at different social levels in the classroom (plenary, group, individual; e.g., mäkitalo-siegl, kohnle, & fischer, 2011). moving even beyond the boundaries of the classroom, knowledge building communities have been successfully implemented in schools around the globe to engage students in argumentative processes to jointly construct knowledge in the classroom (scardamalia & bereiter, 2006).

3. deficits of prior research and directions for advancing studies on sra

research on the development of scientific reasoning, as well as research on scientific argumentation, has substantially progressed over the last two decades (see nussbaum, 2011; zimmerman, 2007). however, there are still important research gaps, which lead us to argue for more systematic and interdisciplinary research on sra. we propose that future research should (a) expand the range of epistemic modes and epistemic activities, (b) investigate domain-specific aspects of sra more systematically, (c) examine the role of emotions in sra, (d) consider the social context of sra in a more systematic way, and (e) explore the influence of digital technologies on sra. each of these suggestions is elaborated in the following.

3.1 expanding the range of epistemic modes and epistemic activities

3.1.1 epistemic modes

people engage in sra with different motivations.
for example, a researcher may strive to contribute to theory building in a domain while practitioners try to find solutions for problems in their professional practice by applying scientific concepts or methods. we argue that these different motivations have not yet been systematically reflected in research on sra in educational contexts. stokes (1997) suggested a widely accepted classification according to which approaches to scientific reasoning vary in their primary goals along two orthogonal dimensions: understanding and use. pure basic research is characterised by its primary goal of advancing scientific understanding of natural and social phenomena, regardless of its usefulness in practice. stokes used niels bohr’s scientific approach – with no emphasis on the use and societal uptake of his theoretical advances – to characterise this type of research. in contrast, pure applied research emphasises the use of scientific knowledge without the aim of advancing theory building and understanding. stokes exemplified this kind of research with the work of thomas a. edison, who brought electricity to a whole country by using scientific knowledge and methods, but without being concerned about generalisation and theory building beyond this practical challenge. a third class that stokes (1997) identified is the scientific approach that combines the goals of understanding and use, which he termed “use-inspired basic research” and exemplified with louis pasteur’s work. pasteur started from problems in practice (e.g., how to make food last longer), conducted systematic research to solve them, but simultaneously strived for a generalised theoretical explanation. we suggest that stokes’ classification of research approaches can be used to inform the differentiation of three distinct modes of sra. in a first mode, (1) sra can be used to advance theory building about natural and social phenomena.
when learners apply this mode, they aim to generate and test hypotheses to develop and improve scientific theories and explanations about social and natural phenomena. that way, this epistemic mode will help support student learning of the scientific knowledge of a domain, how it is created, and how students themselves can contribute to knowledge creation by engaging in scientific research. a second sra mode may be labelled (2) science-based reasoning and argumentation in practice. in this mode, learners aim at developing solutions for contextualised problems using scientific concepts, theories, and methods. based on information about the problem and the state-of-research as they know it, learners generate one or more solution approaches and evaluate them in light of scientific knowledge and methods, but also based on standards of the practice under consideration. that way, learners take over the role of scientifically knowledgeable practitioners rather than that of basic researchers. for example, teacher education students may develop a concept to help 4th graders improve their reading abilities, based on both practice-based observations of the possibly poor reading abilities of their students and on prior scientific theories and empirical studies on how to effectively support students with reading difficulties (e.g., reciprocal teaching; palincsar & brown, 1984). another example is the application of mathematics to solve practical problems (e.g., predicting the development of sprint world records by describing historical data with an appropriate mathematical function), typically referred to as “mathematical modelling” (galbraith, henn, & niss, 2007). the difference between science-based reasoning in practice and problem solving is that the result is not only the solution of a problem, but also an argument based on scientific theory. the third sra mode we would like to introduce is called (3) artefact-centred sra.
this mode is realized when students engage in circular processes which involve the concurrent development of an artefact and a scientific theory or explanation for why the artefact works or does not work (i.e., why a given problem can or cannot be solved by the use of the artefact), through repeated cycles of prototype design, testing, and analysis of test results. for example, kolodner (2007) reports on a science curriculum unit during which students are supposed to build miniature cars from a given set of materials. based on concepts from physics (e.g., friction and force), the students’ task is to design a car that would travel from one end of the classroom to the other. that way, the students’ reasoning and argumentation resembles that of researchers in engineering and technology. this mode of scientific reasoning differs from “science-based reasoning in practice” with respect to the thrust towards generalisation and theory building. nevertheless, in educational contexts, both modes have the potential to address student competence of understanding and engagement in scientific knowledge creation activities, as well as their competence to address practical problems through application of scientific concepts and methods. 3.1.2 epistemic activities the three epistemic modes imply an extended notion of sra that also calls for considering a comprehensive set of scientific activities. students in educational contexts need to learn how these activities work and how to engage in them. we suggest distinguishing eight epistemic activities that all may be fulfilled in sra in all of the three epistemic modes. yet, both the weight that is attributed to each activity in each of the three modes and the way these activities are performed within each of the three modes may differ. in the following, we describe each of these activities along with one example of how the activity may be performed in one of the three epistemic modes. (1) problem identification. 
many scientific reasoning processes are driven by concrete problems. according to the three epistemic modes, such problems might be practical real-world problems (see kolodner, 2007), but also scientific problems that cannot be solved with the available theoretical concepts and methods. becoming aware that available explanations do not appropriately explain phenomena is a starting point for both the advancement of science as an abstract set of knowledge, and for the individual learner advancing his or her understanding of the world. thus, to engage in sra, one first needs to perceive a mismatch or shortcoming concerning the available explanation of a particular problem. during this epistemic activity, a problem representation is built from an analysis of the situation. a medical student may for example be confronted with a patient who reports a diverse set of illness symptoms (exemplifying the epistemic mode science-based sra in practice). based on medical knowledge, which in medical experts typically is encapsulated in so-called “illness scripts” (charlin, boshuizen, custers, & feltovich, 2007), the student will try to identify which parts of the patient’s descriptions are relevant for the diagnostic process and which are not. that way, the actual biomedical problem is gradually concretized and then determines further action. (2) questioning. based on the representation developed during problem identification, one or more initial questions are identified for the subsequent reasoning process (see white & frederiksen, 1998). later on, this question might be refined to allow for a systematic search of evidence. 
to exemplify how a math student may be confronted with questioning in the epistemic mode of advancing theory building about natural and social phenomena we refer to the following famous problem formulated by euler in 1741 (seven bridges of königsberg; solution proved by hierholzer & wiener, 1873): in a given arrangement of points and lines between these points (e.g., a set of crossings and streets in a city), how can we determine if an “euler-walk” along adjacent lines is possible, which passes each line exactly once (e.g., a sightseeing walk through the city)? the problem here is a classification problem (how to describe objects with a given property). (3) hypothesis generation. during hypothesis generation, students derive possible answers to the question from plausible models, available theoretical frameworks or empirical evidence they are aware of (klahr & dunbar, 1988). if the student’s prior knowledge does not allow for predictions, the question might be refined or – alternatively – an exploratory approach of evidence generation may be adopted to derive a hypothesis based on patterns in this evidence. this process involves formulating the hypothesis according to scientific standards. in biology, a learner may for example aim at developing an answer to the question how the memory of honey bees develops. based on prior research, the learner may hypothesize that glutamate plays a role in this process, since glutamate has been shown to be important for human memory development. to substantiate this hypothesis, further search for corresponding literature may be necessary, e.g. concerning the question whether glutamate has also been found in other insects. (4) construction and redesign of artefacts. scientific reasoning often includes the construction of some kind of artefact, be it the development of a prototype object by an engineer or an axiomatic system describing a new mathematical structure. 
typically, this construction will be based on current theoretical knowledge. following its construction, the artefact is submitted to a test in an authentic environment (see kolodner, 2007). for example, teacher students may have the task to develop a computer-based collaborative learning environment that would effectively scaffold the interaction of small groups of learners in order to raise the individuals’ learning outcomes (exemplifying the epistemic mode of artefact-centred sra). for that purpose, a prototype of the learning environment (e.g., based on the collaboration script approach; fischer et al., 2013) may be built that – based on theoretical reasoning and prior empirical evidence – seems promising to achieve this goal. (5) evidence generation. evidence generation includes various approaches. one approach is to conduct hypothetico-deductive experimental studies that refer to the systematic, theory-driven variation of one or more variables by the learner in consecutive trials, while repeatedly observing the same outcome variables. evidence generation may also follow an inductive approach of observing, comparing and describing phenomena to draw conclusions about structures and functions, for example in evolutionary biology or sociology. another approach is observing the synchronous or sequential co-occurrence of phenomena, which is frequently applied in the natural sciences (e.g., when studying climate models), but also in the social sciences (e.g., in longitudinal studies). finally, most natural and social sciences use deductive reasoning – within more or less elaborate theories – to generate evidence in favour of or against a claim. in the mathematical example by euler described above, a first approach to gather (exploratory, in the mathematical sense preliminary) evidence would be to study single examples of point-line configurations and test if they admit an euler-walk. 
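this exploratory step can be illustrated with a minimal python sketch. it is an illustration added here under stated assumptions, not a procedure taken from euler or from the studies cited: the function names `has_euler_walk` and `odd_points` and the small example configurations are hypothetical choices. the sketch tests by brute force whether a configuration admits an euler-walk and counts the points at which an odd number of lines meet, so that the two observations can be compared across examples.

```python
from collections import Counter


def has_euler_walk(lines):
    """Brute-force test: is there a walk that uses every line exactly once?"""

    def extend(point, unused):
        # All lines used: the walk is complete.
        if not unused:
            return True
        # Try every unused line that touches the current point.
        for i, (a, b) in enumerate(unused):
            if point in (a, b):
                next_point = b if point == a else a
                if extend(next_point, unused[:i] + unused[i + 1:]):
                    return True
        return False

    points = {p for line in lines for p in line}
    return any(extend(start, list(lines)) for start in points)


def odd_points(lines):
    """Count the points at which an odd number of lines meet."""
    degree = Counter(p for line in lines for p in line)
    return sum(1 for d in degree.values() if d % 2 == 1)


# Hypothetical example configurations (illustrative only).
examples = {
    "triangle": [(1, 2), (2, 3), (3, 1)],
    "path": [(1, 2), (2, 3)],
    "star": [(0, 1), (0, 2), (0, 3)],
    # Seven bridges of Koenigsberg: four land masses a-d, seven bridges.
    "koenigsberg": [("a", "b"), ("a", "b"), ("a", "c"), ("a", "c"),
                    ("a", "d"), ("b", "d"), ("c", "d")],
}

for name, lines in examples.items():
    print(name, has_euler_walk(lines), odd_points(lines))
```

the brute-force search is only practical for small configurations, which is all that is needed for gathering preliminary evidence of this kind.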
comparing configurations that admit such a walk with some that do not might lead to a first hypothesis about the characteristic difference between the two (hypothesis generation). studying more, and perhaps extreme, examples will add further (still preliminary) evidence to support the hypothesis, maybe leading to its revision or refinement. finally, starting from a set of basic assumptions on such line configurations (described by the axioms of mathematical graph theory), a deductive chain of arguments can be constructed that shows that configurations admitting an euler-walk have the hypothesized property, and vice versa. constructing such a line of deductive arguments, which derive that a conjecture follows from the axioms of a mathematical theory, is actually the main mode of evidence generation in mathematics. nevertheless, other kinds of evidence also play a major role in mathematical reasoning, such as counter-examples that disprove a general conjecture (e.g., zazkis & chernoff, 2008). (6) evidence evaluation. the aim of evidence evaluation is to assess the degree to which a certain piece of evidence supports a claim or theory. what counts as evidence will differ both with respect to the epistemic mode in which sra is realised and with respect to the domain under study. observational studies (shafto, kemp, bonawitz, coley, & tenenbaum, 2008), for example, might be considered the best available evidence in one discipline (e.g., astronomy) but less valuable than experimental studies in another (e.g., psychology, engineering; kolodner, 2007). deductions from a theoretical framework constitute the crucial acceptance criterion in mathematics, whereas in psychology or in natural sciences they serve an auxiliary role as predictions about the outcomes of an experiment from theoretical assumptions. 
even though an “experimentum crucis” is not viable in most disciplines, cumulated evidence from several experimental or observational studies is necessary to sustain a claim. an example from medical education in the epistemic mode of science-based sra in practice would be a medical student aiming to find the right diagnosis for a patient’s health problem in a case-based simulation environment. evidence evaluation in this example may refer to the accumulating evidence from the patient’s history, physical examinations and additional lab and technical tests. optimally, this evidence is interpreted in light of candidate diagnoses that have already been set up during hypothesis generation. here the development of encapsulated, experiential knowledge in the form of illness scripts (charlin et al., 2007) has been identified as crucial in order to arrive at a sound evaluation of the collected evidence. (7) drawing conclusions. since different kinds of evidence can be generated within the scientific reasoning process, drawing conclusions is not restricted to reconsidering an initial claim in light of experimental results. different pieces of evidence must often be integrated by weighing each single piece according to the method by which it was generated and by the rules and criteria of the discipline. in the case of a teacher student developing a scaffolded computer-supported learning environment, drawing conclusions means to critically analyse data and observations from an experiment or a field trial in which the environment was used and to derive consequences for whether the environment (or specific features of it) needs to be re-designed or may be used as originally planned in further trials. to arrive at such a conclusion, typically a multitude of data sources needs to be considered (e.g., individual knowledge tests, verbal protocols, data on students’ motivation). (8) communicating and scrutinising. 
individual scientific reasoning processes and their results are typically shared with and scrutinised by others (shavelson & towne, 2002). persons involved in scientific reasoning are more or less constantly involved in conversations and discussions in work groups or peer groups. these interactions might influence scientific reasoning from problem identification to knowledge-based interventions in practice situations. thus, social-discursive and dialogic argumentation is an integral component of many scientific reasoning processes and should be included when analysing and facilitating sra in educational contexts (e.g., clark, sampson, weinberger & erkens, 2007; sampson & clark, 2009). in the biology example on the memory of honey bees, communicating and scrutinising may play a double role. on the one hand, if groups of learners work on the honey bee problem, communication within the team is necessary to ensure that the research process is carried out in a rigorous way, including arriving at a sound explanation for the phenomenon under investigation. on the other hand, the research process and outcomes are typically shared with the broader community, e.g. in the form of plenary presentations. 3.2 domain-specific aspects of sra need to be investigated more systematically while research on sra has focused on commonalities across domains, investigations of the differences in sra between disciplines have been rare (e.g., herrenkohl & cornelius, 2013). in addition, the set of domains under consideration has so far been small and seemingly arbitrary. one crucial question is what role domain-specific conceptual knowledge plays for successful sra (e.g., chinnappan, ekanayake & brown, 2011; schunn & anderson, 1999). domain-specific conceptual knowledge is, for example, necessary to build a mental representation of the problem situation and to identify aspects of the situation that offer scientifically accessible questions. 
moreover, the process of scientific reasoning is different across domains, with respect to both nature and weight of the epistemic activities to be displayed. for example, engineers enact the epistemic activity of “problem identification” by starting their design process with a clear problem for which the initial stage, the solution stage, and the constraints are all well-defined. natural scientists and social scientists do not necessarily have such well-defined initial and solution stages – for them, thus, the epistemic activities “questioning” and “hypothesis generation” play a major role. regarding the epistemic activity of “evidence evaluation”, scientific disciplines vary considerably in what is regarded as acceptable evidence to support a scientific claim. while many natural sciences rely upon hypothetico-deductive methods, many social sciences accept inductive comparisons as methods of evidence evaluation. in (pure) mathematics the only acceptable evidence is a chain of deductive arguments within a theory. all other kinds of evidence are regarded as informal. thus, transferring criteria for evidence evaluation from one discipline to another appears problematic. moreover, it is unclear whether exposure to one domain-specific approach of scientific reasoning influences the nature of evidence evaluation skills in other domains (given that k-12 education, as well as teacher education, immerses students in various domains). although the nature of epistemic activities varies across disciplines, approaches to foster students’ scientific reasoning have typically focused on single domains and developed in different directions. 
while research from developmental psychology and science education has predominantly focused on hypothesis and evidence generation and evaluation processes, research from mathematics education focused on metacognitive aspects to improve students’ self-regulated problem-solving (for example when searching for mathematical proofs, chinnappan & lawson, 1996). despite the fact that the existence of domain-dependent differences concerning sra can hardly be doubted, we contend that the three epistemic modes and the eight epistemic activities are of relevance to a broad range of disciplines. in other words, there may also be skill aspects of sra that are similar across domains (such as skills for structuring a problem situation, experimentation or deductive reasoning). however, since disciplines might differ substantially in the relative weights of the modes and activities and thus in the specific knowledge, skills and attitudes that students are supposed to develop when learning sra, a more representative selection of disciplines seems key for investigating their particularities in future research. finally, existing approaches to facilitation have typically proven effective for only one specific domain, in the context of one epistemic mode, in referring to only some specific epistemic activities, and in focusing on only some specific learning prerequisites. the extent to which the approaches to facilitation are domain-specific is an important question, but the extent to which they can be generalised across epistemic modes, domains, epistemic activities, and different learners is an important question as well (see klahr, zimmerman & jirout, 2011). future research should thus invest effort in identifying domain-specific and domain-general aspects of sra and their facilitation. 3.3 the role of emotions in sra requires investigation cognition is intricately interwoven with emotions. 
emotions are defined as systems of interrelated component processes, including subjective, physiological, and behavioral components (e.g., uneasy and nervous feelings, physiological activation, and anxious facial expression in anxiety; shuman & scherer, 2014). cognitive appraisals of situational demands and one’s competencies are known to shape human emotion. emotions, in turn, are prime drivers of motivation to solve problems and can profoundly impact the quality and outcomes of cognitive processes (e.g., moors, ellsworth, scherer & frijda, 2013; pekrun, 2006). it seems likely that this is also true for sra. without emotions such as surprise, curiosity triggered by contradictory findings, joy about solving scientific problems, or pride in one’s accomplishments, scientists would likely not be motivated to engage in scientific discovery, and students would lack motivation to learn science (pekrun, hall, goetz & perry, in press). furthermore, these emotions are known to regulate attention, memory processes, and different modes of cognitive problem solving, such as analytical versus holistic ways to approach problems, which are critically important for sra (fiedler & beier, 2014). systematic research examining the links between emotions and scientific reasoning, however, is largely lacking as yet (see sinatra, broughton & lombardi, 2014). we propose that five groups of emotions that seem to be relevant for scientific reasoning should be investigated. (1) epistemic emotions. as noted, epistemic activities such as generating hypotheses are at the core of scientific reasoning in a broad range of domains. typically, these activities are accompanied by emotions triggered by the epistemic quality of problem-related information and mental activity. a prototypical case is cognitive incongruity triggering surprise, awe, curiosity, confusion, or joy when the incongruity is resolved. 
as proposed by philosophers (brun, doğuoğlu, & kuenzle, 2008; morton, 2010), these emotions can be called epistemic emotions (pekrun & stevens, 2011). (2) achievement emotions. achievement emotions are emotions that relate to activities or outcomes that are judged according to competence-related standards of quality (pekrun, 2006). in many learning situations, scientific reasoning activities and the outcomes of these activities are judged for their achievement quality. depending on the perceived importance of success and failure, scientific reasoning can induce strong achievement emotions, such as hope and pride or anxiety, shame, and hopelessness. (3) topic emotions. during scientific reasoning, emotions can be triggered by the contents of the problem to be solved. an example is the anxiety experienced when dealing with issues of climate change or genetically modified food. in contrast to epistemic emotions, topic emotions do not directly pertain to the process of scientific reasoning; however, they can strongly influence engagement in reasoning (ainley, 2006). (4) social emotions. scientific reasoning is often situated in social contexts. by implication, scientific reasoning can induce a multitude of social emotions related to other people. these emotions include both social achievement emotions, such as admiration, envy, contempt, or empathy related to the success and failure of others, as well as non-achievement emotions, such as love or hate in relationships with collaborators in the reasoning process (weiner, 2007). (5) incidental emotions and moods. when engaging in scientific reasoning, a person can continue experiencing emotions that relate to external events, such as current stress, or problems in their family. 
these emotions do not relate to the reasoning process itself, but have the potential, nonetheless, to strongly influence the quality of reasoning and learning to reason, such as a student’s worries about their parents’ divorce being brought into the science classroom. all five classes of emotion can play a role in all epistemic activities. however, it seems likely that different emotions are more typical for some of these activities than for others. for example, epistemic emotions are likely to be triggered by mental activities that can involve impasses and cognitive incongruity, such as “problem identification” or “evidence evaluation”, whereas social emotions are of primary importance in collaborative reasoning processes and for the communication of the results of scientific reasoning. furthermore, emotions of all five classes can profoundly influence the scientific reasoning process and its outcomes. the impact of these emotions on reasoning can be mediated by various cognitive and motivational processes, e.g. intrinsic and extrinsic motivation to engage, or deep versus shallow information processing strategies (e.g., clore & huntsinger, 2009). as a consequence, positive activating emotions in reasoning may typically support high-quality reasoning, whereas some negative emotions may be detrimental. however, for many emotions and task conditions, the effects on reasoning performance are likely to be more complex. thus we argue that studying the role of emotions in and during sra is an important task for future research. 3.4 the social context of sra should be considered more systematically scientific reasoning and argumentation are typically situated in a social context (dunbar, 1995). some epistemic activities are collaborative in nature, such as discussing the results of scientific reasoning with peers or communicating them to the broader public. 
other epistemic activities are not collaborative in nature but may benefit from collaboration (chi, 2009; duschl, 2008). we propose two strands for future research that bear the potential to improve our understanding of the social aspects of sra. (1) collaborative knowledge construction. extensive research has been carried out on the cognitive and social mechanisms of knowledge construction in groups and collectives. research on knowledge construction in pairs and small groups has often been conducted in a joint problem-solving paradigm. this line of research focuses on how pairs and groups, in contrast to individuals, work on complex science-related problems (e.g., okada & simon, 1997) and on how groups develop joint strategies and norms for sra beyond just learning the domain content associated with the task (e.g., roschelle & teasley, 1997). research on dialogic education (wegerif, 2007) and argumentative classroom discourse (osborne, 2010) focuses on the structure and content of discussions in groups and collectives, and on the conditions for evolving (scientific) quality of the argumentation in these discussions. in contrast to the perspectives on joint problem solving and dialogic argumentation that analyse the micro-mechanisms of knowledge construction, research on communities of practice emphasises processes of knowledge creation, participation and identity in collectives of people sharing goals or interests (lave & wenger, 1991). in knowledge community approaches, domain knowledge acquisition by individuals is rather seen as a by-product of qualitative changes in the participation pattern, from legitimate peripheral to more “core” participation. research is needed on which forms of participation in epistemic activities of certain scientific communities effectively advance students’ sra skills. (2) distributed, shared and collective cognition. 
approaches to distributed and shared cognition share the assumption that reasoning in real world tasks cannot be understood by just focussing on isolated individuals. in real world tasks, individuals collaborate with others on solving problems and making decisions, but they also use tools that allow them to act much more intelligently than they would be able to without. the distributed cognition perspective suggests a systemic perspective for the analysis of complex social and socio-technical tasks (e.g. salomon & perkins, 1998). research on transactive memory systems (wegner, 1987) addresses the cognitive interdependence that develops when group members collaborate for some time and specialise in specific areas of which the other members are aware. a transactive memory system is thus characterised by the collaborative division of labour for learning, remembering, and communication of knowledge (e.g., hollingshead, gupta, yoon, & brandon, 2011), which seems crucial for most epistemic activities. the shared mental models perspective (e.g. mohammed & dumville, 2001; wu & keysar, 2007) addresses the question which kind of knowledge (e.g., knowledge on task vs. knowledge on team) is needed and the extent to which group members need overlapping (shared) information as opposed to unique (or unshared) information to perform well as a team. research on cognitive convergence (teasley et al., 2008) or knowledge convergence (fischer & mandl, 2005) focuses on the similarity and dissimilarity of cognitive representations in collaborative situations, as well as their changes through collaboration. in the context of sra it is an interesting open question to which extent divergent vs. convergent cognitive representations of different individuals in a group are supportive for different epistemic activities. 
it seems plausible to hypothesise that divergent knowledge in a group is specifically supportive in epistemic activities such as “evaluating evidence” and “scrutinising arguments”. furthermore, a recurring result from prior research is that the knowledge learners acquire through collaboration is surprisingly dissimilar (miyake, 1986). this might be especially relevant for educational settings where students engage in collaborative learning to develop sra skills. research on expert-layperson communication (bromme, jucks & runde, 2005) has shown that large differences in domain expertise may have detrimental effects on communication and understanding. measures to support expert-layperson communication have shown positive effects (e.g., nückles & stürz, 2006). in the context of sra these knowledge differences exist, e.g., between scholars acting as teachers and students in their early years but also in the context of communicating scientific outcomes to wider audiences. an open question is how different disciplines try to overcome detrimental effects and make optimal use of large knowledge differences between scholars and students. 3.5 the influence of digital technologies on sra needs further research studies show that digital technologies affect reasoning and learning contingent on the way that they are used. for instance, a study by sparrow, liu and wegner (2011) revealed that digital technologies increasingly become “external memories” integral to people’s reasoning. their findings show that the availability of externally stored information changes cognitive processing dramatically, depending on the person’s assumptions on later accessibility. it is plausible to assume that the availability of digital technology affects sra in a similar way. moreover, this should be true for all three epistemic modes, but in different ways. 
when advancing theory about natural and social phenomena, technology is typically used for data collection and visualisation (e.g., computer simulations; gijlers & de jong, 2009) as well as for analysis, including not only statistical analysis but also analysis based on language and logic (e.g., rosé, wang, arguello, stegmann, weinberger & fischer, 2008). when applying science-based reasoning in practice, technology is often used to provide access to the knowledge base and theories in the respective domain (e.g., sparrow, et al., 2011). in the epistemic mode of artefact-centred scientific reasoning, technology often acts as the core enabler for prototypes or simulating features of a design artefact (e.g., wiethoff, schneider, rohs, butz & greenberg, 2012). across the three modes, research has generated evidence that communication and collaboration can be substantially enhanced by digital technologies (see stegmann et al., 2012). furthermore, awareness tools, i.e. tools capable of capturing and mirroring the quality of group processes via external representations, have shown strong potential to support scientific argumentation (janssen & bodemer, 2013; streng, stegmann, boring, böhm, fischer & hussmann, 2010). we propose that future research should investigate how technologies shape sra. firstly, research should more systematically address how available and easily accessible technologies influence scientific reasoning in the different epistemic modes and activities. a co-evolutionary perspective on the mutual influence of technology development and scientific reasoning seems promising, for example, how access to scientific information through the internet affects the sra of practitioners. secondly, research should investigate the effects of technological tools specifically designed to facilitate certain epistemic activities in sra. 
prior research on computer simulations and computer-supported collaboration will be informative for the formulation of design principles for the development of technology-based scaffolds. 4. conclusions sra is considered one of the core competences in knowledge societies (trilling & fadel, 2009). knowledge of the structure and generality of these competences, their emotional, social and technological conditions, and how they can be facilitated appears key for a promising re-design of curricula and interventions in schools, higher education and vocational practice to foster the development of sra. as a starting point for the necessary interdisciplinary research we suggest the following broad definition of sra: scientific reasoning and argumentation include the knowledge and skills involved in different epistemic activities (problem identification, questioning, hypothesis generation, construction of artefacts, evidence generation, evidence evaluation, drawing conclusions as well as communicating and scrutinising scientific reasoning and its results) in the context of three different epistemic modes (advancing theory building about natural and social phenomena, science-based reasoning in practice, and artefact-centred scientific reasoning). scientific reasoning and argumentation are assumed to consist of domain-specific as well as domain-general components, and depend on emotional, social, instructional/facilitative, and technological conditions. we proposed a research agenda on the analysis and facilitation of sra in educational contexts, which significantly broadens our perspective beyond basic experimental research. based on stokes’ (1997) model of scientific knowledge production, we suggested three epistemic modes of sra: (1) advancing theory building about natural and social phenomena, (2) science-based reasoning in practice, and (3) artefact-centred scientific reasoning. in a broad range of domains, all three epistemic modes play a role. 
students thus need to learn to understand how scientific knowledge is developed in their domains of study, and how it can be applied to address practical problems. to an extent differing vastly between domains and study programmes, students are also expected to learn to participate in processes of scientific research (trilling & fadel, 2009). we further identified eight epistemic activities, of which some have only received marginal or narrowly focused consideration in research on sra, mainly in the experimental paradigm: (1) problem identification, (2) questioning, (3) hypothesis generation, (4) construction and redesign of artefacts, (5) evidence generation, (6) evidence evaluation, (7) drawing conclusions and (8) communicating and scrutinising. we do not claim that this process typology is exhaustive, and do not intend to conceal that others have developed alternative typologies (e.g., van joolingen, de jong, lazonder, savelsbergh & manlove, 2005; white & frederiksen, 1998). instead it is proposed as a starting point for an interdisciplinary research agenda, to be modified in further theoretical discussion and based on findings of empirical studies. based on this framework, we suggest five further areas in research on sra that require more systematic investigation. first, research should investigate the differences between disciplines regarding how epistemic modes and activities are employed and to what extent knowledge generated within them is considered as evidence for or against theories. we suggest that it is crucial to advance our understanding of sra by determining which aspects are domain-general and which aspects are specific for a single domain or group of domains (see schunn & anderson, 1999). second, commonalities and differences between disciplines are also likely to exist with respect to measures of intervention and facilitation. 
on the one hand, some of the interventions developed for a specific domain and context might prove generalizable to some extent to other contexts and domains. on the other hand, domain-independent instructional approaches might well be differentially effective in different domains (see klahr et al., 2011). in addition, we suggest building a coherent conceptual framework for integrating the diverse research findings from intervention research across domains. chi’s (2009) icap model might be a promising starting point in this respect to integrate the available evidence and guide future research on sra interventions. icap classifies learning activities based on their underlying cognitive processes into interactive, constructive, active and passive. the model predicts the best learning outcomes for interactive learning activities, followed by constructive, active and passive activities.

third, research on sra displays a strong cognitive bias. however, it seems likely that most scientific reasoning processes are triggered, modulated, or followed by emotions (see shuman & scherer, 2014). thus far there is no systematic research on emotions in the context of sra, which is striking because, for example, curiosity is widely regarded as a major driving force for any scientific endeavour (pekrun & stephens, 2011).

fourth, scientific reasoning is increasingly recognised as a social epistemic practice rather than a purely individual activity (dunbar, 1995). however, prior research on sra has examined the social context in which sra appears in a rather unsystematic way. therefore, we suggest considering constructs of research fields that are advanced in this perspective, such as peer assessment (cho et al., 2006; strijbos & sluijsmans, 2010) or research on collaboration scripts (fischer et al., 2013), as starting points for addressing the social aspects of scientific reasoning.
fifth, recent years have seen an expansion of digital technology in nearly every sector of society, including research and related fields of practice. we argue that the effects of digital technologies on sra practices need to be examined more systematically. important questions include how digital technologies are used to support scientific reasoning and how technologies can be designed to support students in sra (see for example gijlers & de jong, 2009).

given the amount of research in the fields of scientific reasoning and scientific argumentation described at the outset of this article, the field might benefit from an integrative view that combines the so far largely separated strands of research. with concerted and interdisciplinary research efforts, we strongly believe that we may achieve a better understanding of what sra skills are, how they develop and how their development can be supported effectively. the outcomes of this research may subsequently inform educational practice to help educate citizens who are able to participate in science-related societal debates and make more systematic use of scientific knowledge and skills.

keypoints

we review research in three areas – developmental research on scientific reasoning, research on argumentation, and research on interventions on scientific reasoning and argumentation.
the article proposes a framework for scientific reasoning and argumentation meant as a starting point for interdisciplinary research.
the framework includes three epistemic modes: advancing theory building about phenomena, artefact-centred scientific reasoning, and science-based reasoning in practice.
we distinguish eight epistemic activities (e.g., generating evidence) relevant in all three epistemic modes.
we argue that differences between disciplines as well as the roles of emotions, the social context, and digital technologies have been neglected and are promising foci for interdisciplinary research on scientific reasoning and argumentation.

acknowledgements

this work was funded by the elite network of bavaria.

references

ainley, m. (2006). connecting with learning: motivation, affect and cognition in interest processes. educational psychology review, 18(4), 391-405. doi: 10.1007/s10648-006-9033-0
bromme, r., jucks, r., & runde, a. (2005). barriers and biases in computer-mediated expert-layperson communication. in r. bromme, f. w. hesse, & h. spada (eds.), barriers and biases in computer-mediated knowledge communication (pp. 89-118). new york: springer.
brun, g., doğuoğlu, u., & kuenzle, d. (eds.). (2008). epistemology and emotions. aldershot, uk: ashgate.
bullock, m., sodian, b., & koerber, s. (2009). doing experiments and understanding science. development of scientific reasoning from childhood to adulthood. in w. schneider, & m. bullock (eds.), human development from early childhood to early adulthood: findings from a 20 year longitudinal study (pp. 173-198). new york, ny: psychology press.
bullock, m., & ziegler, a. (1999). scientific reasoning: developmental and individual differences. in f. e. weinert, & w. schneider (eds.), individual development from 3 to 12: findings from the munich longitudinal study (pp. 38-54). cambridge: cambridge university press.
carey, s., & smith, c. (1993). on understanding the nature of scientific knowledge. educational psychologist, 28(3), 235-251. doi: 10.1207/s15326985ep2803_4
cavagnetto, a. r. (2010). argument to foster scientific literacy. a review of argument interventions in k-12 science contexts. review of educational research, 80(3), 336-371. doi: 10.3102/0034654310376953
charlin, b., boshuizen, h. p., custers, e. j., & feltovich, p. j. (2007). scripts and clinical reasoning.
medical education, 41(12), 1178-1184. doi: 10.1111/j.1365-2923.2007.02924.x
chi, m. t. (2009). active, constructive, interactive: a conceptual framework for differentiating learning activities. topics in cognitive science, 1(1), 73-105. doi: 10.1111/j.1756-8765.2008.01005.x
chinn, c., & clark, d. b. (2013). learning through collaborative argumentation. in c. e. hmelo-silver, c. a. chinn, c. k. k. chan, & a. m. o'donnell (eds.), international handbook of collaborative learning (pp. 314-332). new york: routledge.
chinn, c. a., & malhotra, b. a. (2002). epistemologically authentic inquiry in schools: a theoretical framework for evaluating inquiry tasks. science education, 86(2), 175-218. doi: 10.1002/sce.10001
chinnappan, m., ekanayake, m. b., & brown, c. (2011). specific and general knowledge in geometric proof development. saarc journal of educational research, 8, 1-28.
chinnappan, m., & lawson, m. j. (1996). the effects of training in the use of executive strategies in geometry problem solving. learning and instruction, 6(1), 1-17. doi: 10.1016/s0959-4752(96)80001-6
cho, k., schunn, c. d., & wilson, r. w. (2006). validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives. journal of educational psychology, 98(4), 891-901. doi: 10.1037/0022-0663.98.4.891
clark, d. b., sampson, v., weinberger, a., & erkens, g. (2007). analytic frameworks for assessing dialogic argumentation in online learning environments. educational psychology review, 19(3), 343-374. doi: 10.1007/s10648-007-9050-7
clore, g. l., & huntsinger, j. r. (2009). how the object of affect guides its impact. emotion review, 1, 39-54. doi: 10.1177/1754073908097185
dochy, f., segers, m., van den bossche, o., & gijbels, d. (2003). effects of problem-based learning: a meta-analysis. learning and instruction, 13(5), 533-568.
doi: 10.3102/00346543075001027
dunbar, k. (1995). how scientists really reason: scientific reasoning in real-world laboratories. in r. j. sternberg, & j. davidson (eds.), mechanisms of insight (pp. 365-395). cambridge, ma: mit press.
duschl, r. (2008). science education in three-part harmony: balancing conceptual, epistemic, and social learning goals. review of research in education, 32(1), 268-291. doi: 10.3102/0091732x07309371
erduran, s., simon, s., & osborne, j. (2004). tapping into argumentation: developments in the application of toulmin's argument pattern for studying science discourse. science education, 88(6), 915-933. doi: 10.1002/sce.20012
euler, l. (1741). solutio problematis ad geometriam situs pertinentis. commentarii academiae scientiarum petropolitanae, 8, 128-140.
fiedler, k., & beier, s. (2014). affect and cognitive processes. in r. pekrun, & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 36-55). new york: taylor & francis.
fischer, f., kollar, i., stegmann, k., & wecker, c. (2013). toward a script theory of guidance in computer-supported collaborative learning. educational psychologist, 48(1), 56-66. doi: 10.1080/00461520.2012.748005
fischer, f., & mandl, h. (2005). knowledge convergence in computer-supported collaborative learning: the role of external representation tools. journal of the learning sciences, 14(3), 405-441. doi: 10.1207/s15327809jls1403_3
furtak, e. m., seidel, t., iverson, h., & briggs, d. c. (2012). experimental and quasi-experimental studies of inquiry-based science teaching. a meta-analysis. review of educational research, 82(3), 300-329. doi: 10.3102/0034654312457206
galbraith, p. l., henn, h.-w., & niss, n. (2007). modelling and applications in mathematics education. new york, ny: springer.
gijlers, h., & de jong, t. (2009). sharing and confronting propositions in collaborative inquiry learning. cognition and instruction, 27(3), 239-268. doi: 10.1080/07370000903014352
herrenkohl, l. r., & cornelius, l. (2013). investigating elementary students' scientific and historical argumentation. journal of the learning sciences, 22(3), 413-461. doi: 10.1080/10508406.2013.799475
hierholzer, c., & wiener, c. (1873). ueber die möglichkeit, einen linienzug ohne wiederholung und ohne unterbrechung zu umfahren. mathematische annalen, 6(1), 30-32. doi: 10.1007/bf01442866
hollingshead, a. b., gupta, n., yoon, k., & brandon, d. p. (2011). transactive memory theory and teams: past, present, and future. in e. salas, s. m. fiore, & m. p. letzky (eds.), theories of team cognition: cross-disciplinary perspectives (pp. 421-455). new york: routledge.
inhelder, b., & piaget, j. (1958). the growth of logical thinking from childhood to adolescence: an essay on the construction of formal operational structure. london: routledge & kegan paul.
janssen, j., & bodemer, d. (2013). coordinated computer-supported collaborative learning: awareness and awareness tools. educational psychologist, 48(1), 40-55. doi: 10.1080/00461520.2012.749153
kelly, g. j., & takao, a. (2002). epistemic levels in argument: an analysis of university oceanography students' use of evidence in writing. science education, 86(3), 314-342. doi: 10.1002/sce.10024
klahr, d., & dunbar, k. (1988). dual space search during scientific reasoning. cognitive science, 12(1), 1-48. doi: 10.1207/s15516709cog1201_1
klahr, d., zimmerman, c., & jirout, j. (2011). educational interventions to advance children’s scientific thinking. science, 333(6045), 971-975. doi: 10.1126/science.1204528
koerber, s., & sodian, b. (2009). reasoning from graphs in young children. preschoolers’ ability to interpret and evaluate covariation data from graphs. journal of psychology of science & technology, 2(2), 73-86.
doi: 10.1891/1939-7054.2.2.73
koerber, s., sodian, b., kropf, n., mayer, d., & schwippert, k. (2011). die entwicklung des wissenschaftlichen denkens im grundschulalter. theorieverständnis, experimentierstrategien, dateninterpretation. zeitschrift für entwicklungspsychologie und pädagogische psychologie, 43(1), 16-21. doi: 10.1026/0049-8637/a000027
koerber, s., sodian, b., thoermer, c., & nett, u. (2005). scientific reasoning in young children: preschoolers' ability to evaluate covariation evidence. swiss journal of psychology, 64(3), 141-152. doi: 10.1024/1421-0185.64.3.141
kollar, i., fischer, f., & slotta, j. d. (2007). internal and external scripts in computer-supported collaborative inquiry learning. learning & instruction, 17(6), 708-721. doi: 10.1016/j.learninstruc.2007.09.021
kolodner, j. l. (2007). the roles of scripts in promoting collaborative discourse in learning by design. in f. fischer, i. kollar, h. mandl, & j. m. haake (eds.), scripting computer-supported collaborative learning: cognitive, computational and educational approaches (pp. 237-262). new york: springer.
koslowski, b. (1996). theory and evidence: the development of scientific reasoning. cambridge, ma: mit press/bradford books.
koslowski, b. (2012). scientific reasoning: explanation, confirmation bias, and scientific practice. in g. j. feist, & m. e. gorman (eds.), handbook of the psychology of science (pp. 151-192). new york, ny: springer.
kruglanski, a. w., & gigerenzer, g. (2011). intuitive and deliberate judgments are based on common principles. psychological review, 118(1), 97-109. doi: 10.1037/a0020762
kuhn, d. (1991). the skills of argument. new york: cambridge university press.
kuhn, d., & franklin, s. (2006). the second decade: what develops (and how)? in d. kuhn, & r. siegler (eds.), handbook of child psychology: vol. 2. cognition, perception, and language (pp. 517-550). hoboken, nj: wiley.
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge, uk: cambridge university press.
mäkitalo-siegl, k., kohnle, c., & fischer, f. (2011). computer-supported collaborative inquiry learning and classroom scripts: effects on help-seeking processes and learning outcomes. learning and instruction, 21(2), 257-266. doi: 10.1016/j.learninstruc.2010.07.001
mayer, d., sodian, b., koerber, s., & schwippert, k. (2014). scientific reasoning in elementary school children: assessment and relations with cognitive abilities. learning and instruction, 29, 43-55. doi: 10.1016/j.learninstruc.2013.07.005
mcneill, k. l. (2011). elementary students' views of explanation, argumentation, and evidence, and their abilities to construct arguments over the school year. journal of research in science teaching, 48(7), 793-823. doi: 10.1002/tea.20430
mcneill, k. l., lizotte, d. j., krajcik, j., & marx, r. w. (2006). supporting students' construction of scientific explanations by fading scaffolds in instructional materials. journal of the learning sciences, 15(2), 153-191. doi: 10.1207/s15327809jls1502_1
miyake, n. (1986). constructive interaction and the iterative process of understanding. cognitive science, 10, 151-177. doi: 10.1207/s15516709cog1002_2
mohammed, s., & dumville, b. c. (2001). team mental models in a team knowledge framework: expanding theory and measurement across disciplinary boundaries. journal of organizational behavior, 22(2), 89-106. doi: 10.1002/job.86
moors, a., ellsworth, p. c., scherer, k. r., & frijda, n. h. (2013). appraisal theories of emotion: state of the art and future development. emotion review, 5(2), 119-124. doi: 10.1177/1754073912468165
morton, a. (2010). epistemic emotions. in p. goldie (ed.), the oxford handbook of philosophy of emotion (pp. 385-399). oxford, united kingdom: oxford university press.
nicol, d., thomson, a., & breslin, c. (2014).
rethinking feedback practices in higher education: a peer review perspective. assessment & evaluation in higher education, 39(1), 102-122. doi: 10.1080/02602938.2013.795518
noroozi, o., weinberger, a., biemans, h. j., mulder, m., & chizari, m. (2012). argumentation-based computer supported collaborative learning (abcscl): a synthesis of 15 years of research. educational research review, 7(2), 79-106. doi: 10.1016/j.edurev.2011.11.006
nückles, m., & stürz, a. (2006). the assessment tool: a method to support asynchronous communication between computer experts and laypersons. computers in human behavior, 22(5), 917-940. doi: 10.1016/j.chb.2004.03.021
nussbaum, m. (2011). argumentation, dialogue theory, and probability modeling: alternative frameworks for argumentation research in education. educational psychologist, 46(2), 84-106. doi: 10.1080/00461520.2011.558816
okada, t., & simon, h. a. (1997). collaborative discovery in a scientific domain. cognitive science, 21(2), 109-146. doi: 10.1207/s15516709cog2102_1
osborne, j. (2010). arguing to learn in science: the role of collaborative, critical discourse. science, 328(5977), 463-466. doi: 10.1126/science.1183944
palincsar, a. s., & brown, a. l. (1984). reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. cognition and instruction, 1(2), 117-175. doi: 10.1207/s1532690xci0102_1
pekrun, r. (2006). the control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice. educational psychology review, 18, 315-341. doi: 10.1007/s10648-006-9029-9
pekrun, r., hall, n. c., goetz, t., & perry, r. p. (in press).
boredom and academic achievement: testing a model of reciprocal causation. journal of educational psychology.
pekrun, r., & stephens, e. j. (2011). academic emotions. in k. r. harris, s. graham, t. urdan, s. graham, j. m. royer, & m. zeidner (eds.), apa educational psychology handbook (vol. 2, pp. 3-31). washington, dc: american psychological association.
quintana, c., reiser, b. j., davis, e. a., krajcik, j., fretz, e., duncan, r. g., & soloway, e. (2004). a scaffolding design framework for software to support science inquiry. journal of the learning sciences, 13(3), 337-386. doi: 10.1207/s15327809jls1303_4
roschelle, j., & teasley, s. d. (1997). the construction of shared knowledge in collaborative problem solving. in c. o'malley (ed.), computer supported collaborative learning (vol. 128, pp. 69-97). berlin: springer.
rosé, c. p., wang, y. c., arguello, j., stegmann, k., weinberger, a., & fischer, f. (2008). analyzing collaborative learning processes automatically: exploiting the advances of computational linguistics in computer-supported collaborative learning. international journal of computer-supported collaborative learning, 3, 237-271. doi: 10.1007/s11412-007-9034-0
sadler, t. d. (2004). informal reasoning regarding socio-scientific issues: a critical review of research. journal of research in science teaching, 41(5), 513-536. doi: 10.1002/tea.20009
salomon, g., & perkins, d. n. (1998). individual and social aspects of learning. review of research in education, 23, 1-24. doi: 10.3102/0091732x023001001
sampson, v., & clark, d. (2009). the impact of collaboration on the outcomes of scientific argumentation. science education, 93(3), 448-484. doi: 10.1002/sce.20306
scardamalia, m., & bereiter, c. (2006). knowledge building: theory, pedagogy, and technology. in k. sawyer (ed.), cambridge handbook of the learning sciences (pp. 97-119). new york: university press.
schunn, c. d., & anderson, j. r. (1999).
the generality/specificity of expertise in scientific reasoning. cognitive science, 23(3), 337-370. doi: 10.1016/s0364-0213(99)00006-3
shafto, p., kemp, c., bonawitz, e. b., coley, j. d., & tenenbaum, j. b. (2008). inductive reasoning about causally transmitted properties. cognition, 109(2), 175-192. doi: 10.1016/j.cognition.2008.07.006
shavelson, r. j., & towne, l. (eds.). (2002). scientific research in education. washington, dc: national academy press.
shuman, v., & scherer, k. r. (2014). concepts and structures of emotions. in r. pekrun, & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 13-35). new york: taylor & francis.
sinatra, g. m., broughton, s. h., & lombardi, d. (2014). emotions in science education. in r. pekrun, & l. linnenbrink-garcia (eds.), international handbook of emotions in education (pp. 415-436). new york: taylor & francis.
sodian, b., & bullock, m. (2008). scientific reasoning – where are we now? cognitive development, 23(4), 431-434. doi: 10.1016/j.cogdev.2008.09.003
sodian, b., jonen, a., thoermer, c., & kircher, e. (2006). die natur der naturwissenschaften verstehen: implementierung wissenschaftstheoretischen unterrichts in der grundschule. in m. prenzel, & l. allolio-näcke (eds.), untersuchungen zur bildungsqualität von schule. abschlussbericht des dfg-schwerpunktprogramms (pp. 147-160). münster: waxmann.
sodian, b., zaitchik, d., & carey, s. (1991). young children's differentiation of hypothetical beliefs from evidence. child development, 62, 753-766. doi: 10.1111/j.1467-8624.1991.tb01567.x
sparrow, b., liu, j., & wegner, d. (2011). google effects on memory: cognitive consequences of having information at our fingertips. science, 333(6043), 776-778.
doi: 10.1126/science.1207745
stegmann, k., wecker, c., weinberger, a., & fischer, f. (2012). collaborative argumentation and cognitive elaboration in a computer-supported collaborative learning environment. instructional science, 40(2), 297-323. doi: 10.1007/s11251-011-9174-5
stokes, d. e. (1997). pasteur’s quadrant: basic science and technological innovation. washington, dc: brookings institution press.
streng, s., stegmann, k., boring, s., böhm, s., fischer, f., & hussmann, h. (2010). measuring effects of private and shared displays in small-group knowledge sharing processes. in e. hvannberg, m. k. lárusdóttir, a. blandford, & j. gulliksen (eds.), proceedings of the 6th nordic conference on human-computer interaction (nordichi 2010) (pp. 789-792). new york, ny: acm.
strijbos, j. w., & sluijsmans, d. (2010). unravelling peer assessment: methodological, functional, and conceptual developments. learning and instruction, 20(4), 265-269. doi: 10.1016/j.learninstruc.2009.08.002
teasley, s., fischer, f., dillenbourg, p., kapur, m., chi, m., weinberger, a., & stegmann, k. (2008). cognitive convergence in collaborative learning. in proceedings of icls 2008 (vol. 3, pp. 360-367). international society of the learning sciences.
trilling, b., & fadel, c. (2009). twenty-first century skills. learning for life in our times. san francisco: jossey-bass.
van joolingen, w. r., & de jong, t. (1993). exploring a domain through a computer simulation: traversing variable and relation space with the help of a hypothesis scratchpad. in d. towne, t. de jong, & h. spada (eds.), simulation-based experiential learning (pp. 191-206). berlin: springer.
van joolingen, w. r., de jong, t., lazonder, a. w., savelsbergh, e. r., & manlove, s. (2005). co-lab: research and development of an online learning environment for collaborative scientific discovery learning. computers in human behavior, 21, 671-688. doi: 10.1016/j.chb.2004.10.039
wegerif, r. (2007).
dialogic education and technology: expanding the space of learning. new york: springer.
wegner, d. m. (1987). transactive memory: a contemporary analysis of the group mind. in b. mullen & g. r. goethals (eds.), theories of group behavior (pp. 185-208). new york: springer.
weinberger, a., stegmann, k., & fischer, f. (2010). learning to argue online: scripted groups surpass individuals (unscripted groups do not). computers in human behavior, 26(4), 506-515. doi: 10.1016/j.chb.2009.08.007
weiner, b. (2007). examining emotional diversity in the classroom: an attribution theorist considers the moral emotions. in p. a. schutz, & r. pekrun (eds.), emotion in education (pp. 75-88). san diego, ca: academic press.
white, b. y., & frederiksen, j. r. (1998). inquiry, modelling, and metacognition: making science accessible to all students. cognition and instruction, 16(1), 3-118. doi: 10.1207/s1532690xci1601_2
wiethoff, a., schneider, h., rohs, m., butz, a., & greenberg, s. (2012). sketch-a-tui: low cost prototyping of tangible interactions using cardboard and conductive ink. in s. n. spencer (ed.), proceedings of the sixth international conference on tangible, embedded and embodied interaction (pp. 309-312). new york: acm.
wu, s., & keysar, b. (2007). the effect of information overlap on communication effectiveness. cognitive science, 31(1), 169-181. doi: 10.1080/03640210709336989
zazkis, r., & chernoff, e. j. (2008). what makes a counterexample exemplary? educational studies in mathematics, 68, 195-208. doi: 10.1007/s10649-007-9110-4
zimmerman, c. (2000). the development of scientific reasoning skills. developmental review, 20, 99-149. doi: 10.1006/drev.1999.0497
zimmerman, c. (2007). the development of scientific thinking skills in elementary and middle school. developmental review, 27, 172-223.
doi: 10.1016/j.dr.2006.12.001

frontline learning research 3 (2014) 31-49
issn 2295-3159
corresponding author: liisa postareff, p.o.box 9, 00014 (centre for research and development of higher education) university of helsinki, finland, liisa.postareff@helsinki.fi
http://dx.doi.org/10.14786/flr.v2i1.63
31 | f l r

explaining university students’ strong commitment to understand through individual and contextual elements

liisa postareff a, sari lindblom-ylänne a & anna parpala a
a university of helsinki, finland
article received 20th september 2013 / revised 14th february 2014 / accepted 20th february 2014 / available online 25th april 2014

abstract

since the late 1970s numerous studies have explored students’ approaches to learning (referred to as the ‘sal’ tradition). these studies have provided valuable evidence of students’ study strategies and intentions at the university. since extensive research already exists on students’ approaches to learning, there is a need to move forward and analyse student learning from new perspectives. in the present in-depth qualitative study, we analyse interviews of 34 students who scored extremely highly on the deep approach scale in a pre-test in our previous quantitative study (lindblom-ylänne, parpala & postareff, 2013) and thus are likely to have a strong commitment to understand, and a ‘disposition to understand for oneself’, which is a recently introduced, yet unexplored phenomenon (see entwistle & mccune, 2009; mccune & entwistle, 2011). we identified several individual and contextual elements which provided explanations for the students’ high scores on the deep approach, as well as for the increase, decrease or stability in their deep approach during one course.
the results showed that most students showed a strong commitment to understand, but those whose deep approach sharply decreased during the course showed less commitment and their descriptions revealed problems with, for example, study skills, time management and regulation of learning. however, contextual elements such as the students’ experiences of the course teaching and their interest in the course content did not clearly provide explanations for the changes in the deep approach. elements of a ‘disposition to understand for oneself’ clearly emerged among students whose deep approach did not decrease, or decreased only slightly.

keywords: approaches to learning, disposition to understand for oneself, commitment to understand, higher education

1. introduction

numerous studies which have been conducted within the sal (students’ approaches to learning; see lonka, olkinuora, & mäkinen, 2004) tradition have identified three qualitatively different approaches to learning: the deep and surface approaches (e.g., biggs, 1987; entwistle & ramsden, 1983; marton & säljö, 1976, 1997) and the strategic approach or organised studying (e.g., biggs, 1987; entwistle & ramsden, 1983). a deep approach to learning has been shown to be related to high-quality learning outcomes (diseth, 2003; watters & watters, 2007), and therefore university students are encouraged to aim at constructing meaning and developing deep understandings of the study content. however, research has shown that students vary to a great extent with regard to their approaches to learning in that some are more likely to adopt deep approaches while others will rely more on surface learning. moreover, most students’ approaches seem to vary depending on the context (e.g., vermunt, 1998; nieminen et al., 2004; lindblom-ylänne et al., 2012).
recent research has further suggested that some students continually aim at a deep understanding and thus have a disposition to understand for oneself (entwistle & mccune, 2009; mccune & entwistle, 2011). the disposition to understand is a more consistent and stronger form of the ‘intention to understand’ found in a deep approach to learning. students with such a disposition aim at reaching a full and satisfying understanding of what they study. entwistle and mccune suggest that a disposition to understand is an important characteristic for students to develop at university in order to cope with the uncertainty and complexity of society in the future. the disposition to understand contains three central elements: 1) a well-developed use of learning strategies which concentrate on relating ideas, the critical use of evidence and attention to detail; 2) a willingness to devote the necessary time, effort, and concentration to apply the learning strategies effectively; and 3) an alertness to the learning context (entwistle & mccune, 2009; mccune & entwistle, 2011). these three elements are similar to the ones perkins and tishman (2001) identified when exploring ‘thinking dispositions’. thinking dispositions are stable ways of reacting to situations and thinking critically, and comprise three components needed in carrying out intellectual tasks: willingness to apply effort, ability to perform the task effectively, and alertness to situations in which thinking is required. entwistle and mccune (2009) took this avenue when analysing students’ approaches to learning and disposition to understand. the disposition to understand is a broader concept than the deep approach to learning. while a deep approach describes the student’s intention to understand the content of study and the use of effective learning strategies, such as relating ideas and using evidence (e.g.
entwistle & ramsden, 1983; marton & säljö, 1997), the disposition to understand focuses more broadly on a specific discipline as a whole. a student with a strong disposition to understand for oneself shows an emotional commitment to continuously strive towards understanding and to monitor the development of understanding the contents of study (entwistle & mccune, 2009; mccune & entwistle, 2011). empirical studies on the disposition to understand for oneself are rare. still, in a recent study entwistle and mccune (in press) analysed nearly 2000 students from undergraduate courses and aimed to identify students who showed high and consistent scores for a deep approach to learning, as well as for organised effort and monitoring studying. they found one cluster of students whose scores were high on all these scales and remained stable over time, thus showing characteristics of a disposition to understand. the other clusters also scored high on the deep approach, but in addition showed surface elements, or lower scores on organised effort or monitoring studying. to our knowledge, this quantitative study is the only empirical one conducted on students’ disposition to understand. what especially differentiates a disposition to understand from a deep approach is that the former is characterised as more stable (entwistle & mccune, 2009; mccune & entwistle, 2011) while the latter has been characterised as more contextual and changeable (e.g., vermunt, 1998; nieminen et al., 2004; lindblom-ylänne et al., 2013). however, opposite views have also been presented, as the deep approach has been claimed to be a relatively stable construct (e.g., lietz & matthews, 2010; zeegers, 2001), but the mainstream conception seems to rely on the original view of the contextual and dynamic nature of the approaches presented by marton and säljö as early as 1976.
a deep approach to learning seems to be difficult to induce since a number of quantitative studies have shown a decrease in the deep approach after varying study periods. some studies have shown a decrease in the deep approach after a three-year study period (biggs, 1987; watkins & hattie, 1985) and others have shown a decrease after shorter course units (e.g., lindblom-ylänne et al., 2013). evidence shows that inducing a deep approach to learning might be difficult even in student-centred learning environments (e.g. gijbels, segers, & struyf, 2008; struyven, dochy, janssens, & gielen, 2006). however, a number of quantitative studies have shown that both satisfaction with the quality of a course and a student-centred approach taken by teachers enhance the application of a deep approach (trigwell, prosser, & waterhouse, 1999; see also baeten, kyndt, struyven, & dochy, 2010). in addition, individual factors such as intrinsic motivation, self-confidence, strong self-efficacy beliefs and openness to experience have been shown to enhance the adoption of a deep approach (baeten et al., 2010; kyndt, dochy, cascallar, & struyven, 2011). the changes in a deep approach and the individual and contextual factors affecting its adoption have thus been explored in a number of quantitative studies, but to our knowledge no qualitative research combining these two elements has been carried out. this study presents an in-depth qualitative analysis of the factors explaining the stability or changes in a deep approach to learning among students who scored extremely highly on a deep approach scale in the pre-test. the existence of the disposition to understand for oneself is also explored. thus the study provides both methodologically and theoretically a fresh perspective on research on student learning in higher education.
1.2 aims of the study the study aims to more deeply understand the results from previous research of ours which concentrated on analysing with quantitative methods group-level changes in one‟s deep approach during different courses (lindblom-ylänne et al., 2013). the pre-test/ post-test results showed large individual variation in all disciplinary contexts in terms of the amount and direction of change in students‟ deep approach to learning. the present study concentrates on analysing interviews of students who scored the highest on a deep approach scale in the pre-test. the aim is to identify the individual and contextual elements which are related to a strong commitment to understand as shown in the pre-test, and also to the stability or decrease in one‟s deep approach during the course as shown in the post-test. in addition, the study provides a new perspective on analysing approaches to learning as it focuses on exploring the existence of the disposition to understand for oneself. thus the second aim is to analyse how a disposition to understand for oneself emerged from the interviews. previous work on the disposition to understand for oneself has not been based on coherent evidence, and has not been empirically conducted with diverse samples of students (see entwistle & mccune, 2009). thus the present study aims to provide empirical evidence of the disposition to understand through an in-depth qualitative analysis of student interviews. 2. materials and methods the study adopts a mixed-methods approach as it has its roots in the quantitative results, which are then explained through analysing student interviews. mixed methods research is a type of research in which elements of qualitative and quantitative research approaches are combined in order to gain a deep and broad understanding of the phenomenon under study (see e.g. johnson, onwuegbuzie & turner, 2007). 
creswell (2009) emphasises that the use of a mixed methods approach has its roots in pragmatism, in which the most important thing is to focus attention on the research problem itself and use pluralistic approaches to derive knowledge about the problem. an advantage of using mixed methods is that the results from one method can help develop or inform other methods. in the present study we adopted a sequential procedure through which we attempted to elaborate on and expand the quantitative findings with the qualitative ones (see creswell, 2013). in addition, the study adopts, along with a traditional variable-oriented technique, a person-oriented approach, which assumes that human behavior is affected by several factors and the interplay between these several factors forms unique profiles of individuals (see vanthournout et al., 2013). the person-oriented approach is sometimes considered the opposite of variable-oriented techniques, but the two should rather be viewed as complementary techniques (vanthournout et al. 2013). in the present study both techniques are used to provide complementary information (see section 2.3).

2.1 participants and contexts of the study

the participants of the study were selected from among 277 bachelor students from the university of helsinki, who took part in our previous quantitative study (lindblom-ylänne et al., 2013). the 34 students were selected on the basis of having scored extremely highly on a scale measuring their deep approach to learning. the students were categorised into three groups on the basis of how their deep approach to learning changed between two measurements (at the beginning and at the end of a course). the present study focuses on analysing interviews of the students in these three change groups.
in the following section, we provide some insight into our previous quantitative study and into the procedure of creating the change groups in order to create an understanding of the inventory results and of the differences between the three groups of students. a more detailed description of the quantitative analyses is provided in our previous study (lindblom-ylänne et al., 2013).

2.1.1 background for selecting the participants

for the previous studies, the data were collected from 10 bachelor-level courses in five different disciplines. the students of the courses completed the approaches to studying and learning inventory (alsi; see section 2.2.) at the beginning and at the end of a lecture course. at the beginning of the course the students were asked to consider how they had studied in their major up until then, and at the end of the course they were asked to focus on how they had studied in that particular course. their scores at the beginning and at the end of the course were used to explore changes in their approaches to learning between the two measurements. at the beginning of the courses the mean of the deep approach scale in each discipline varied from 3.42 to 3.66 on a scale of 1 to 5, and at the end of the courses the range was from 3.07 to 3.24. in each discipline, the decrease in the deep approach between the two measurements was statistically significant. a paired-samples t-test showed that the t-values ranged from 2.25 to 4.86 and the p-values varied between 0.001 and 0.028. among bioscience students the decline was the smallest, at -0.15, while among mathematics students the decline was the largest, at -0.35. after exploring the differences between the two measurements at the group level, we wanted to explore the changes at a more individual level. we computed the deep approach change variables by subtracting the students' scale scores at the beginning of the course from their scale scores at the end of the course, so that a decrease appears as a negative value.
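the group-level comparison and the individual change variable described above can be sketched roughly as follows. this is only an illustration and not the authors' analysis script: the scores are invented, the function names are hypothetical, and the change is assumed to be the end-of-course score minus the beginning-of-course score, so that a decline is negative as in the reported values. only the t statistic and its degrees of freedom are computed; the p-value would additionally require the t distribution.

```python
import math
import statistics


def change_scores(pre, post):
    """Per-student change, assumed here to be end-of-course minus start-of-course."""
    return [round(after - before, 2) for before, after in zip(pre, post)]


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom for pre vs. post scores."""
    diffs = [before - after for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    return t_value, n - 1


# Invented deep approach scale scores (scale of 1 to 5) for six students.
pre = [4.00, 3.50, 3.75, 3.25, 4.25, 3.00]
post = [3.50, 3.25, 3.75, 3.00, 3.50, 2.75]

changes = change_scores(pre, post)
t_value, df = paired_t(pre, post)
print(changes)
print(round(t_value, 2), df)
```

a positive t statistic here corresponds to a drop from the first to the second measurement, because the paired differences are taken as beginning minus end.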
the magnitude and the direction of the change served to create five groups of change (see table 1): strong increase, increase, no change, decrease and strong decrease. the distributions of the change variables were explored in detail in order to decide upon the best cutting points for the change groups. our decision was to create the change groups on the basis of likert-scale point changes, which is a procedure followed previously by lindblom-ylänne, trigwell, nevgi & ashwin (2006). the benefit of using change variables is that they mirror the absolute changes instead of relational changes. our purpose was to focus on the changes of individual students regardless of other students' values. we used a quarter of a likert-scale point as the cutting point between the change groups in order to separate different types of changes in as much detail as possible. half a likert-scale point would have been too coarse, since a decrease of 0.15 in the deep approach was already statistically significant. we also considered categorisation on the basis of standard deviation or median split, but these are based on relational values and they would have rendered a comparison of the changes between the three approaches impossible (which was the focus in our previous study, see lindblom-ylänne et al., 2013). for the present study, the 34 students were divided into three subgroups on the basis of the change variable categories: 1) deep approach remains high (includes the 'no change' and the 'increase' groups), 2) slight decrease in one's deep approach and 3) sharp decrease in one's deep approach.
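as a rough illustration of the cut-offs listed in table 1 and of the three subgroups used in the present study, the mapping from a change score to a group could be written as below. the function names are hypothetical, the change score is assumed to be the end-of-course score minus the beginning-of-course score, and it is assumed that both the slight and the sharp increase categories fall under 'deep approach remains high'.

```python
def change_category(change):
    """Map a change score (assumed post minus pre) to a table 1 category."""
    if change >= 0.50:
        return "sharp increase"
    if change >= 0.25:
        return "slight increase"
    if change > -0.25:
        return "no change"
    if change > -0.50:
        return "slight decrease"
    return "sharp decrease"


def study_subgroup(change):
    """Collapse the five categories into the three subgroups of the present study.

    Assumption: every increase category counts as 'deep approach remains high'.
    """
    category = change_category(change)
    if category in ("sharp increase", "slight increase", "no change"):
        return "deep approach remains high"
    return category + " in deep approach"


# Invented change scores in quarter-point steps.
for value in (0.50, 0.00, -0.25, -3.25):
    print(value, "->", change_category(value), "/", study_subgroup(value))
```

fixed quarter-point cut-offs of this kind classify each student without reference to the rest of the sample, which is the property the text contrasts with a median split or a categorisation based on standard deviations.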
table 1
change variable categories

the direction of change | differences in scores
sharp increase in the approach to learning | 0.50 or higher
slight increase in the approach to learning | from 0.25 to 0.49
no change | from -0.24 to 0.24
slight decrease in the approach to learning | from -0.25 to -0.49
sharp decrease in the approach to learning | -0.50 or lower

after creating the change groups, the students were divided into four ranked percentile groups based on their deep approach scores at the beginning of the course. the 34 students who were selected for the present study were categorised into the highest-ranked percentile group at the beginning of the course. their deep approach score at the beginning of the course was between 4.00 and 5.00, which is markedly higher than the mean score of the deep approach in any of the disciplines. thus the selected students were among the highest scoring students in the deep approach scale. while the students' deep approach scores at the beginning of the course varied between 4.00 and 5.00, the variation at the end of the course was between 1.5 and 5.00. nine students were categorised into the 'deep approach remains high' group. of them, four students showed an increase in the deep approach, while five students' scores remained exactly the same. seven students were categorised into the 'slight decrease in deep approach' group. these students' deep approach scores at the beginning of the course varied between 4.00 and 5.00, and all of them scored 0.25 lower on the deep approach scale at the end of the course. the 'sharp decrease in the deep approach' group consisted of 18 students, whose deep approach scores at the beginning of the course varied between 4.00 and 5.00, and at the end of the course between 1.5 and 4.50. the decrease in their deep approach was between -0.5 and -3.25. however, in only five cases did the decrease exceed 1.00.

2.1.2 participants and contexts

of the 34 students, 21 were female and 13 male.
they ranged in age from 19 to 43 years, the mean age being 27. in finland, students' mean age is higher than in most european countries because students graduate from upper secondary school later and thus enter university at an older age than in most european countries. however, the mean age of 27 years is higher than the average among bachelor students. two of the students were minoring in the courses while the rest were major students. the 34 students were attending a compulsory bachelor-level course in their own discipline (except for the two students who were minor students), and each was interviewed after completing the course. the students participated in one of 10 courses representing five disciplines: three courses in bioscience (n=7), two in educational sciences (n=12), one in mathematics (n=2), three in theology (n=10) and one in veterinary medicine (n=3). the courses are presented in table 2. the courses were designed for second year students, except for courses on educational sciences which were designed for first year students. all ten courses were lecture courses that included both lecturing and activating assignments for the students, with the nature of the assignments varying. the courses lasted from 6 to 13 weeks and were worth between 3 and 10 credits. eight courses included a written exam at their conclusion, while one course included a learning diary and an oral exam, and one included a drama-type exam. an effort was made to select courses that were as similar as possible from the point of view of the students' role in order to minimise the effect of the course itself on the results. however, one course was based on group activities, and the exam was a group exam. in another course, the exam was a group exam in drama form. thus two courses differed from the others because they were based more on group activities, while the rest were based on individual tasks and exams. in all courses the students attended lectures and completed some activating tasks.
table 2
the discipline and course of the participants

discipline | course | students (n) | course type and assessment
bioscience | course 1 | 3 | lectures, written exam at the end
bioscience | course 2 | 3 | lectures, written exam at the end
bioscience | course 3 | 1 | lectures, written exam at the end
educational sciences | course 1 | 7 | peer group working, short lectures, oral exam and learning diary
educational sciences | course 2 | 5 | lectures, written exam at the end, essay
mathematics | course 1 | 2 | lectures, calculations, written exam at the end
theology | course 1 | 3 | lectures (including much discussion), written exam at the end
theology | course 2 | 6 | lectures, written exam at the end
theology | course 3 | 1 | lectures (including much discussion), drama-type exam at the end
veterinary medicine | course 1 | 3 | lectures, written exam at the end

2.2 materials

the students were interviewed on a voluntary basis once the courses had concluded. they were told beforehand about the content and purpose of the interview, and at the beginning of the interview they were allowed to ask questions about the research or the interview. the interviewees were assured of the confidentiality of the interviews and told that they could not be identified at any point. the interviews were held in autumn 2009, autumn 2010, or spring 2011. conducted by the first author and two research assistants, the interviews lasted from 35 to 75 minutes and were transcribed verbatim. the interviews focused on the students' descriptions of their intentions and goals related to studying and learning at the university, their learning processes and practices in general as well as during the course they had just completed, and their experiences of studying and learning in the specific course they had recently attended. the interviews were deep and open in nature, with each of them covering the above-mentioned themes.
for our previous study (lindblom-ylänne et al., 2013), which formed the basis for selecting the students for the present one, the students filled in a revised version of the alsi (entwistle & mccune, 2004; parpala & lindblom-ylänne, 2012) which contains scales measuring students' approaches to learning. the deep approach scale consists of four items, which are presented in table 3. items 1 and 2 measure students' learning strategies while items 3 and 4 focus on their intentions. in the interviews, the focus was similarly on students' study strategies and intentions.

table 3
items on the deep approach scale

item 1 | ideas i've come across in my academic reading set me off on long chains of thought.
item 2 | i look carefully at evidence to reach my own conclusion about what i'm studying.
item 3 | i try to relate new material, as i am reading it, to what i already know on the topic.
item 4 | i try to relate what i learn in one course to what i have learned in other courses.

2.3 analyses

qualitative content analysis was selected as the analysis method for the interview data. in the first phase, we used inductive content analysis, in which themes are allowed to emerge from the data without any theoretical assumptions (see elo & kyngäs, 2007; schilling, 2006). inductive content analysis was used to analyse how the students described their studying and learning (both generally in their university studies and in the specific course), as well as their study experiences during the specific course. three steps typical of inductive content analysis were carried out: data reduction, grouping and conceptualisation (see patton, 1990; flick, 2002). these three steps represented the variable-oriented technique (see vanthournout et al., 2013) as the aim was to identify all factors related to students' learning and study experiences regardless of the individuals.
the first step was data reduction, in which all descriptions related to these issues were identified from the interview transcripts. this was done by the first author independently. the second step was to group similar descriptions under the same categories (e.g., all descriptions related to students' motivation were placed under the same category). this was done by the first and second author independently, and the identified categories were compared and discussed. after an in-depth discussion, the identified descriptions were placed under seven categories. the third step, conceptualisation, included finding a concept for each of the seven categories that describes the content and nature of the category. for example, the category including descriptions related to emotions and attachment was conceptualised as 'emotional commitment'. this was done in collaboration with all three authors. in the second phase of the analysis a person-oriented approach was adopted (see vanthournout et al., 2013). each of the seven categories was investigated in more depth within the following three groups of students: 'deep approach remains high', 'slight decrease in deep approach', and 'sharp decrease in deep approach' in order to identify similarities and differences among students in the same group and between students in different groups. for example, we explored how students showing sharp decrease in their deep approach described their motivation, emotional commitment, and the remaining five categories, and compared their descriptions to other students' descriptions. this phase was conducted by the first author, but the final results were obtained through a thorough discussion with all three authors. the third phase of the analysis was deductive content analysis, in which existing theories are utilised in analysing the data (see elo & kyngäs, 2007; schilling, 2006).
in this phase we analysed, through adopting a person-oriented approach, how the three central elements of the disposition to understand emerge in each student's interview: 1) a well-developed use of learning strategies which concentrate on relating ideas, the critical use of evidence and attention to detail; 2) a willingness to devote the necessary time, effort, and concentration to apply the learning strategies effectively; and 3) an alertness to the learning context (see entwistle & mccune 2009; mccune & entwistle, 2011). this phase was conducted independently by the first two authors. the findings of both authors were compared and discussed together. the inter-rater agreement was high, although several discussions were needed to obtain the final results. to give an example, the 'emotional commitment' category was discussed in depth to determine which elements would be included in it.

3. results and discussion

in each of the three groups, elements related to the high level of a deep approach at the beginning of the course were analysed. in addition, students' descriptions of studying and learning in the specific courses were analysed separately in each group. in each of the groups the elements related to stability or changes in one's deep approach could be categorised under seven different themes: 1) students' motives, intentions and study strategies, 2) organised studying and regulation of learning, 3) emotional commitment, 4) experiences of challenge, 5) interest in the course content, 6) devoting time and effort to studying during the course and 7) experiences of the course teaching. these were identified in each of the three groups. finally, elements of a 'disposition to understand for oneself' were identified. in what follows, the seven themes identified in the student interviews are described and discussed within each of the following categories of change: 'remains high', 'slight decrease' and 'sharp decrease'.
finally, the results concerning a disposition to understand are presented and discussed. 3.1. individual and contextual elements related to one’s deep approach to learning both individual and contextual elements related to the stability or changes in one‟s deep approach to learning were identified. however, clearly distinguishing between individual and contextual elements was challenging because, for example, the category „devoting time and effort to studying during the specific course‟ combined the individual‟s effort and the context. the range of elements is presented from individual to more contextual ones below. 3.1.1. students‟ motives, intentions and study strategies when describing their studying and learning in general, the nine students whose deep approach scores remained high described having a strong intrinsic motivation to study at the university. they said that it is not enough for them to just pass courses, but that they aim at a deep understanding of the subject matter and developing themselves as persons. marton, dall‟alba and beaty (1993) found a similar category, „changing as a person‟, and van rossum, deijkers and hamer (1985) labelled another similar category as „self realisation‟. the nine students‟ descriptions revealed that their conceptions of learning were sophisticated, including, for example, conceptions of learning being about relating ideas and combining new information with their previous knowledge. some of the students also stated that when they are able to explain the subject to someone else in their own words, they feel they have learned well. these students‟ descriptions revealed that they all had developed good and functional study skills. they all concentrated on the big picture instead of details, and explained that they had formed a larger picture of the learned material for themselves. the students‟ descriptions revealed that they go through deep thinking processes while studying. 
these students' descriptions of their studying were therefore in line with the inventory results in that they clearly reflected an adoption of a deep approach to learning: their intention was to learn deeply and to form a coherent whole from the subject matter, and they used strategies which enabled deep-level learning (see e.g. entwistle, 2009). the seven students showing a slight decrease in their deep approach to learning described their studying and learning at the university very similarly to the students showing no deep approach decrease. in addition, all seven students in this group seemed to have a strong intrinsic motivation towards their university studies. they emphasised that learning is about broadening one's understanding and observing things from new perspectives. integrating new information with previous knowledge was also emphasised. all six students' descriptions revealed that they had good study skills as did the students showing no deep approach decrease. the students stated, for example, that they analyse the subject matter from diverse perspectives and explain the central concepts or content in their own words. most mentioned that they search for extra material by themselves and concentrate on what they find challenging. in addition, most also mentioned using a variety of learning strategies (e.g. mind maps, notes, explaining things in their own words). none of the students described difficulties in their learning. two of the seven mentioned preparing for lectures beforehand through familiarising themselves with the content. thus the students showing a slight deep approach decrease were aiming at a deep understanding, and used effective strategies to accomplish this, being in this sense very similar to the students showing no decrease. the 18 students showing a sharp deep approach decrease differed more from the two other student groups.
firstly, six of them described more extrinsic motivators (such as earning a degree), although they also mentioned that they like studying at the university and that on most occasions they are also interested in course content. secondly, not all students in this group described applying a deep approach to learning as strongly as those in the two other groups. for example, one characterised her study process as mainly memorising things. although most students' descriptions reflected elements of deep learning, only three described themselves learning as deeply as students in the two other groups. moreover, three students mentioned uncertainty regarding their own way of learning as well as feelings of incompetence. two of these students described their learning in a very theoretical manner, rather than in their own words; thus their awareness of the elements of deep learning as students in educational sciences might have contributed to their high score on the deep approach scale at the beginning of the course. the third student stated that she would like to 'learn how to learn'. these three students seemed to have problems with their study skills. however, the other students' interviews did not reflect such problems. these results imply that the more varying motives, intentions and study strategies among some of the students in this group might be related to a decrease in their deep approach to learning. while students showing a slight or no decrease had a commitment to learn, the interviews of students showing a sharp decrease did not reflect such a clear commitment. there were no clear differences between students participating in different courses with regard to their intentions, motives and study strategies, except that two students in an educational sciences course described their learning in a way which implied that they were aware of theories of learning when describing their own studying.
however, this difference was related more to the students' discipline than to the course itself.

3.1.2. regulation of learning

the descriptions of the nine students whose deep approach scores remained high revealed that they all had good self-regulation skills. self-regulation refers to processes in which students plan, monitor, control and regulate their own learning (vermunt, 1998). self-regulation resembles organised studying (entwistle & mccune, 2004), making the two concepts partly overlap. for example, time-management skills can be related to organised studying or self-regulated learning. all nine students described setting goals for their own learning, and studying regularly instead of only before deadlines or exams. they wanted to learn the course content deeply, and they read additional material or consulted their teachers or peers when they had difficulties in understanding the content. they all attended lectures regularly, although attendance was voluntary in all courses. thus they clearly assumed responsibility for their own learning. scheduling studies beforehand was also emphasised by some of the students. they all described having good time-management skills although one student mentioned sometimes having difficulties in getting started. some of these students emphasised that they concentrate on the most relevant content and study effectively in order to avoid an overload of work. the interview results support the results of previous studies showing that self-regulation is related to a deep approach to learning (e.g. lonka & lindblom-ylänne, 1996; heikkilä & lonka, 2006; heikkilä et al., 2011; vermunt & van rijswijk, 1988). self-regulation skills are also related to students' study pace, study success and well-being (e.g. heikkilä et al., 2011; rytkönen et al., 2012). the seven students showing slight decrease in the deep approach also seemed to have good self-regulation skills.
their descriptions implied that they were aware of what they were supposed to learn and were able to focus their attention on the relevant content. they all studied on a regular basis, but a very organised way of scheduling her own studies was mentioned by only one student, which clearly differentiated her from the three students showing no decrease in the deep approach. three of these students mentioned that they do not attend lectures regularly, but instead devote their time to reading the course material. however, none of these students' interviews reflected clear problems with self-regulation skills. the 18 students whose deep approach decreased sharply clearly differed from the two other groups with regard to self-regulation skills. only five students' descriptions reflected good self-regulation skills. the remaining 13 students' descriptions did not reflect severe problems in self-regulation, but most of these students had not, for example, set goals for their own learning, and four of them expected concrete guidance and support from the teacher. thus a lack of regulation or external regulation characterised these students more than self-regulated learning. externally regulated students rely on teachers, other students or study material for guidance, while lack of regulation relates to difficulties in self-regulation. students who lack self-regulation skills are unsure about how they should study or may find it difficult to assess whether they have sufficiently learned the subject matter (vermunt, 1998; vermunt & verloop, 1999). furthermore, the 13 students in this group did not describe organising their studies systematically, and their time management skills were not as good as those of the students whose deep approach did not decrease. two students had more severe problems concerning time-management and organising their studies effectively.
for example, one described trying to find a rhythm and routine in her studies and avoid doing things right before the deadlines. these problems in self-regulation skills and time management are likely to explain the decrease in these students‟ deep approach, and may also reflect a combination of unorganised studying and applying a deep approach in an overly sense (see parpala et al., 2010). there were no differences between students participating in the different courses with regard to their self-regulation skills. 3.1.3. emotional commitment all nine students whose deep approach remained high described being committed to their studies. however, six of those students described stronger emotions related to studying their major subject, i.e. having a strong attachment to and respect for the subject they were studying. they also mentioned having a desire to learn more about the discipline and enjoying the university experience as well as pride in studying at the university. a recent quantitative study similarly showed a relationship between university students‟ positive emotions, such as pride, and a deep approach to learning (trigwell, ellis & han, 2011). of the seven students showing a slight deep approach decrease, only one mentioned strong emotions and an attachment to studying her discipline, or a desire to do so. enjoyment of learning was mentioned by four students in this group, but none of the seven described their studying in a negative light. the results show that these students did not have such a strong emotional commitment towards studying as did their peers whose deep approach did not decrease. all 18 students showing a sharp deep approach decrease described enjoying their studies in general, but a majority of these students mentioned it being very context-dependent. in some courses they are very committed, but in others they simply want to do the minimum in order to pass the course. 
only one of the 18 students described having a strong attachment or desire with respect to her studies. she stated that it is important for her to be part of the scientific community. to conclude, enjoyment of learning was mentioned by all 34 students, but clear differences were noted between the students' emotional commitment with regard to the changes in their deep approach to learning. again, there were no differences between students participating in the different courses with regard to their emotional commitment.

3.1.4. devoting time and effort to studying during the specific course

eight of the nine students whose deep approach remained high during the courses devoted considerable time to studying the course content, and they described studying actively in all the courses they take. most of them said that they did not have to invest much time in preparing for the exam because they had studied thoroughly throughout the entire course. some mentioned that the course content was challenging, or that the lecturer proceeded too quickly, which compelled them to devote even more time and effort than normal. only one student mentioned that she did not invest time in studying the course content until a few days before the exam. to conclude, active and regular participation in courses was common to all but one of the students whose deep approach remained at a high level.

of the seven students whose deep approach decreased slightly, five described investing time and effort in studying during the course. they actively participated in the lectures and spent a significant amount of time reading the study material. however, two of the students described being less active during the course. one of them stated that he did not have enough time to study on a regular basis, and the other student stated that she did not have to invest that much time or effort in her studying because the course did not provide much new information.
only two of the 18 students showing a sharp decrease mentioned studying regularly and actively during the course. of the remaining 16 students, nine clearly stated that they invested less time and effort in the course than they normally would. seven of them felt that the course they participated in was not very demanding and that they did not have to invest much time or effort in studying, and two students described that their own activity decreased because the course was based on group activities. the other seven students reported that they more generally tend to study actively only when exams approach. one stated that her weak study skills prevented her from studying more effectively during the course. one student worked during the evenings, which left her little time for studying.

the results are clear in that investing time and effort in studying during the course was related to the stability of one's deep approach to learning. the less active students described being during the course, the more their deep approach to learning declined. however, the reasons for investing less time varied, being related to particular study habits, weak study skills, working alongside studying, or to the challenges the course presented. most of the students who felt that the course was not very demanding, and therefore invested little time and effort, were students from the same theology course. otherwise no differences were noted between students participating in the different courses as to how much time or effort they devoted to studying.

3.1.5. experiences of challenge

the nine students whose deep approach remained high during the course reported that the courses challenged them positively in one way or another. for four students the course content was challenging, which made them invest more time and effort in studying.
five students did not describe their course as very challenging, but they wanted to thoroughly learn the content, and challenged themselves by reading extra material or analysing the content from different points of view. thus a challenging learning environment, or a student's own desire to thoroughly learn the content, seemed to maintain these students' deep approach at a high level.

the students showing a slight deep approach decrease experienced the demands of the courses in different ways. three of the seven students said that the course did not offer them enough of a challenge. for instance, one described the course as being more about pondering the course material from different perspectives, because the content was already so familiar. she would have hoped to learn more new information during the course. on the other hand, four students stated that the courses were in some ways too challenging. for example, one mentioned that the course books were too difficult and that she had to read them many times before gaining some kind of understanding. another student stated that the course covered too much information, and that he did not always know what he was supposed to study, although he felt that he was able to follow the course well. thus the students showing a slight deep approach decrease experienced both too many and too few challenges, but only to a slight degree. despite facing too many or too few challenges, most of them actively participated in the lectures and put effort into studying during the course.

of the 18 students showing a sharp deep approach decrease, the descriptions of eight indicated that the courses were not challenging enough. some considered that the content was easy, and some thought that the course provided little new information. conversely, four students mentioned that the courses were too challenging.
two of them had difficulties understanding the course content, while two mentioned that they had difficulties in forming their own view of the content because they worked in groups, which was quite challenging for them. the remaining six students mentioned neither too many nor too few challenges. students from one theology course stated more often than students from the other courses that the course was not challenging enough, but in the other courses students varied more with respect to how they experienced the challenges the course presented.

these results suggest that the course 'fit' is important for maintaining one's deep approach at a high level. either too many or too few challenges seem to lower the level of a deep approach among most students. kyndt, dochy, struyven and cascallar (2011) similarly showed that task complexity might hinder the application of a deep approach to learning. similarly, mccune and entwistle (2011) have emphasised that students need to experience challenging teaching-learning environments that systematically encourage students to focus on personal understanding. however, the results of the present study imply that some students seemed to be able to maintain their deep approach at a relatively high level although the course did not offer many challenges. this accords with the results of lindblom-ylänne and lonka (1999), who identified a cluster of students who were meaning-oriented and who independently found their own way, being immune to the effects of the teaching-learning environment.

3.1.6. interest in course content

four of the nine students whose deep approach remained high mentioned that the course content was not very interesting. however, two stated that they became more interested in the content during the course.
one of the others mentioned that the way the course was taught promoted her interest, and another expressed that discovering links between the course content and his previous experiences from working life increased his interest. more generally, this student stated that doing things properly is important to him, whether or not he is interested, and that he wanted to succeed and be proud of himself. the remaining five students found the courses more interesting than the two other students, although three of them did not express a strong interest in the course. despite the level of interest not being high among all of them, all these students showed a commitment to learn and wanted to succeed in their studies. kyndt et al. (2011) showed that the more a student is motivated to study for autonomous reasons, such as finding a course pleasant, the more they will be inclined to use a deep approach to learning. so although not all students whose deep approach remained at a high level throughout the course described being interested in the course content, they seemed to have a more general motivation to learn for autonomous reasons. previous research similarly suggests that even though university students may find some content initially uninteresting and their studying may be based on extrinsic motivation, some are able, through self-regulation processes, to generate their own thoughts, feelings and actions to meet uninteresting study demands (ryan & deci, 2000; hidi & ainley, 2008).

students whose deep approach to learning slightly declined also varied with regard to how interesting they found the course content. six of the seven students described being interested in the content, with only one of them expressing a strong interest. one student said that the course did not interest her much and that her goal was simply to complete it.
she also expressed being more interested in the course content at the beginning of the course, but said that her level of interest decreased as the course progressed.

ten of the 18 students showing a sharp deep approach decrease stated that they found the content of the course interesting. however, most of them were not interested in the content beforehand, but became so during the course. on the other hand, eight students found the course content less interesting, but only two of these students mentioned that their goal was only to pass the course because the content was not interesting or useful to them.

students' interest in the course content did not explain the changes in their deep approach, since both the students whose deep approach decreased and those whose deep approach did not described different levels of interest in the course content. however, the students whose deep approach remained high described more often than the others that they create links between different courses, which implies that they try to find the meaning of the courses even if the content does not particularly interest them. furthermore, these students would invest time and effort in studying even though a course was not of great interest to them, because they would want to understand the content deeply. interestingly, there were no differences between students participating in the different courses with regard to their interest in the course content.

3.1.7. experiences of the course teaching

the nine students whose deep approach remained high described the teaching of the course in a positive manner. however, only two of them expressed that they were extremely satisfied with the teaching and thought that the teaching enhanced their learning. most students stated that the teaching was fine and that the teacher was pleasant, but they did not mention that the teaching would have considerably enhanced their learning.
one student stated that no matter what the teaching is like, he always studies the same way.

the students whose deep approach slightly decreased described their experiences of the teaching in different ways. two of the seven students described the course teaching in a slightly negative manner. one of them mentioned that the lectures were frustrating, but that she valued the discussions with other students outside the lectures. the other student expressed that the lecturer was agreeable but that the lectures concentrated too much on discussions, with little new information being presented. three students reported more positive experiences. for example, they mentioned that the teacher had structured the lectures well, was genuinely interested in his students' learning and that there was supportive interaction during the lectures. however, none of the three said that the teaching significantly supported their learning. two students' experiences of the teaching were rather neutral. they said that the lectures were traditional and not so useful, although they mentioned that the teacher of the course was good.

ten of the 18 students showing a sharp deep approach decrease were rather satisfied with the teaching of the course. however, only three of these said that the teaching was exceptionally good, and the rest expressed milder positive responses. three students described the teacher of the course as very pleasant, but still considered that the teaching did not significantly enhance their learning. five students had more negative experiences of the teaching: two stated that they did not understand what the teacher had said and that the teaching was boring, and three were not completely satisfied with the teaching method because it was new to them and they did not feel comfortable with it. these three students were from a course in educational sciences, where the students studied in groups throughout the whole course.
most of their peers enjoyed the group method, but these three students had some difficulties with it. otherwise, no clear differences could be detected between students participating in different courses in their experiences of the course teaching.

interestingly, these results imply that the experiences of the quality of teaching were not related to deep approach stability or decrease. most students whose deep approach decreased sharply were satisfied with the teaching, and some whose deep approach remained high throughout the course described some negative experiences related to the teaching. therefore the results support the existence of students who are 'immune' to the teaching-learning environment, as suggested previously by lindblom-ylänne & lonka (1999), who showed that the study practices of some meaning-oriented students remain unaffected by the learning environment. the results of baeten et al. (2010), however, showed that students who are satisfied with the quality of a course are more likely to employ a deep approach than students who are less satisfied. a larger sample of interviewees would be needed to explore in more depth the relationship between students' experiences of teaching and their approaches to learning. students with lower deep approach scores might be more sensitive to the quality of teaching and the effects of the teaching-learning environment. this could not be confirmed in the present study since all students scored highly at the beginning of the course and thus the sample was highly selected.

another limitation is that the students were categorised into the three change groups according to their scores within only one course. however, during the first measurement the students were asked to consider how they have studied so far, and the interviews were used to improve the reliability of the questionnaire data.
thus the use of a mixed-methods approach enabled a deep and more reliable investigation of the elements affecting students' studying and learning and their disposition to understand. a further limitation concerns the use of different cohorts of students: the data was collected during three different semesters and from both first and second year students, which might affect the results since students representing different cohorts might have diverse experiences of their learning environment. we are able to address some of these limitations in our other studies, since the data collected for our large research project includes a rich variety of qualitative data, e.g. stimulated recall data on assessment of student learning and students' exam papers, observation and video data from the courses, as well as extensive interview data from both students and teachers.

3.2. elements of a 'disposition to understand for oneself'

the disposition to understand for oneself, as defined by entwistle and mccune (entwistle & mccune, 2009; mccune & entwistle, 2011), could not be analysed in detail from the interviews because these focused broadly on studying and learning at the university and on the specific course rather than explicitly on the disposition to understand. however, the strength of our interviews was that they were open-ended and deep in nature, and the students were able to thoroughly describe the elements of their studying and learning they considered to be important. some elements of having a disposition to understand clearly emerged from the data although the students were not specifically asked about them.

a central element of the disposition to understand is a well-developed use of learning strategies which concentrate on relating ideas, critical use of evidence and attention to detail.
these types of learning strategies were identified among most of the interviewed students, although some students who showed a sharp decrease in the deep approach described a narrower use of learning strategies. one component of well-developed learning strategies is a broader focus on the discipline as a whole instead of individual courses or blocks of content. this type of broader focus could be clearly identified among five of the nine students whose deep approach remained high and among three of the seven students whose deep approach slightly decreased. most of the students whose deep approach sharply decreased described a well-developed use of learning strategies, but aiming at gaining a broader view of the discipline as a whole was not emphasised. one student, whose deep approach remained high, described this type of broader focus on the discipline as follows:

“… i enjoy studying and learning hugely. i am very attached to my own major subject, but i would also like to explore what else i can learn here outside my major subject because i want to learn things deeply and broadly. i want to gain thinking and writing skills as well and absorb information from all possible sources.” (male student, educational sciences)

secondly, willingness to devote the necessary time, effort and concentration to apply the learning strategies effectively is an essential element of a disposition to understand. again, the students whose deep approach remained high and most of those showing a slight decrease put a considerable amount of time and effort into studying during the course. however, only a few of the students whose deep approach sharply decreased devoted a good deal of time and effort to studying the content. in the following quotation a student whose deep approach did not decrease describes how he devoted time and effort during the course:

“i attended lectures regularly and took notes.
then, at home i looked at the materials we studied in many different books and i compared it… i tried to combine the new information with the old. i did this during the whole course, which was good, because i was able to keep on track all the time. i didn't read only for the examination, but evenly throughout the course.” (male student, biosciences)

furthermore, some of the students showed an 'alertness to the learning context', which is the third important element in a disposition to understand. it is defined as “alertness that monitors the learning processes and strategies in relation to the demands of the task, along with alertness to opportunities provided by the teaching, and indeed the whole learning environment, to further one's understanding”. elements of this type of alertness could be found in the interviews of five students whose deep approach did not decrease, of two students whose deep approach slightly decreased and of one student whose deep approach sharply decreased. however, the data was somewhat thin with regard to analysing alertness to the learning context, which prevented us from further investigating this element. the following quotation offers an example of how alertness to the learning context emerged in our data:

“this bioscience course was very basic, but one had to learn the content deeply. therefore i did a lot of work and studied the content very well… once you do that, it helps you later on to understand things. in some courses i have to invest less effort and i don't necessarily understand everything so deeply, but the material in this course needed to be studied well… i also study mathematics, physics, chemistry and biochemistry, and it's awesome to notice that some physics and chemistry matters can be combined with biology, and that mathematics is needed in all of them.
i want to understand all of these fields.” (male student, biosciences)

an important finding of our study was that the level of interest towards the content of the courses was not always related to deep approach changes. a low level of interest also characterised students whose deep approach remained high or only decreased slightly. these results suggest that such students have a will to learn even though they might find the course content less interesting. entwistle and mccune (entwistle & mccune, 2009; mccune & entwistle, 2011) suggest that students with a disposition to understand for oneself have a continuing desire to adopt effortful, deep approaches across a wide range of contexts, and to reach the most satisfying understanding possible. the continuing desire to learn is illustrated in the following quotation:

“i have this constant hunger for information, i always find new things which i want to explore in more depth and i feel i need to find out more about this and that, and sometimes i end up on a number of different paths.” (female student, educational sciences)

these students also expressed wanting to understand things deeply for their own purposes. entwistle and mccune (entwistle & mccune, 2009; mccune & entwistle, 2011) note that students showing a disposition to understand feel strongly that they need to understand for themselves, and that they want to demonstrate the depth of their understanding, for example in examination answers. our results revealed that some of the students showing a slight or no change in their deep approach described that they study as long as it takes to understand the content deeply, and that they explained the content to themselves in their own words, as one student put it:

“i want to understand as deeply as possible, my head can't take pure memorisation.
i do mind maps and then i explain the things in my own words to the walls.”

an attempt to understand for oneself becomes evident in the following quotation from another student:

“i read the course books in a very self-oriented way and i concentrate on the things that are important to me.”

mccune and entwistle suggest that a student's strong commitment to understand may indicate that it has become part of that student's sense of identity as a learner, and so represents a much more stable characteristic than a deep approach. the strong feelings students express suggest that it has become a part of the students' sense of identity as learners (entwistle & mccune, 2009). the interview quotations presented in this article contain a substantial number of words implying strong feelings (such as 'hugely', 'hunger for information') which indicate that these students have a strong commitment and disposition to understand. in our data, four of the students showing no deep approach decrease, and one student showing a slight decrease, described this type of strong commitment. the following quotation likewise demonstrates strong feelings ('really', 'overly') when a student describes his studying, which also indicates a solid commitment to understand:

“in every course i take i really want to learn, and not just pass the course. i feel that i really want to understand and use the information. sometimes there are courses which at first don't seem to be very important, but then i find myself being overly enthusiastic once i get involved with the content….” (male student, theology)

a challenge related to developing a disposition to understand is that it is, according to mccune and entwistle (entwistle & mccune, 2009; mccune & entwistle, 2011), a more stable characteristic and less changeable through specific experiences.
this was supported by the results of the present study showing that particularly the students whose deep approach did not decrease during the course were unaffected by the teaching-learning environment. mccune and entwistle emphasise that students need to experience challenging teaching-learning environments that systematically encourage a focus on personal understanding. our results support this view by showing that having too few or too many challenges was mostly related to a decrease in one's deep approach, while positive challenges were related to the stability of, or even an increase in, one's deep approach. a student whose deep approach decreased sharply describes her perceived lack of challenges in the following way:

“i could have invested more in studying. but somehow i felt that i would remember these things well enough … i felt that i wouldn't have to write them down in my learning diary immediately after the lectures. i only started the learning diary two days before the deadline.” (female student, theology)

thus the central elements of a disposition to understand for oneself clearly emerged from the interviews, but our results imply that only about half of the students whose deep approach remained high, and a few whose deep approach slightly decreased, showed a disposition to understand for oneself. however, a more thorough analysis would require different types of interview questions, which would focus more thoroughly on the disposition to understand. nevertheless, the deep and open nature of the interviews made it possible to analyse elements of the disposition to understand in the current study as well. in the interviews the students broadly described their studying and learning, and these descriptions revealed elements related to the disposition to understand. the mean age of the three students whose deep approach did not decrease was 30 years.
this implies that a stronger commitment to understand might be related to student age as well. previous studies have also shown that older students are more likely to adopt a deep approach to learning than their younger peers (e.g. gow & kember, 1990).

4. conclusions

in the present study we were able to identify the individual and contextual elements which were related to the stability of or changes in one's deep approach to learning. we identified that the students whose deep approach to learning decreased sharply during the course described problems in their studying, and it seemed that at least some of them had exaggerated their deep approach level at the beginning of the course when answering the questionnaire. thus they did not show as strong a commitment to understand as the students whose deep approach did not decrease or decreased only slightly. the students whose deep approach remained high or decreased only slightly described their studying and learning very similarly, and both individual and contextual elements were identified that logically explained the questionnaire results.

the students whose deep approach decreased sharply clearly differed from those showing only slight or no changes in their deep approach with respect to the individual elements. for example, they described more problems in their self-regulation skills, time-management skills and study strategies. however, these students did not differ from the others in terms of their experiences of the teaching or their interest in the course. it therefore seems that the individual elements explained sharp decreases more than the contextual elements did. however, some interviews clearly showed that a lack of challenges or too many challenges decreased the deep approach level. in general, students showing a lack of interest in course content might be inclined to exhibit a disposition to understand for oneself when courses are challenging them in a positive way.
adjusting the level of the course appropriately, then, seems to be a key element in course design. these elements should be further examined among students scoring lower on the deep approach scale.

the results of the study confirm the strength of using in-depth qualitative analysis when examining students' approaches to learning and why they may vary or change. in addition, the study provided new information on the relationship between the deep approach to learning and a disposition to understand for oneself. our future research will focus on analysing the interviews of students scoring lower on the deep approach to learning scale.

key points

elements explaining stability or change in the deep approach to learning were explored.
individual elements explained the stability or change more than the contextual elements did.
not all students showed a strong commitment to understand despite their high score on the deep approach scale.
elements of a 'disposition to understand for oneself' were identified among some students.

references

baeten, m., kyndt, e., struyven, k., & dochy, f. (2010). using student-centred learning environments to stimulate deep approaches to learning: factors encouraging or discouraging their effectiveness. educational research review, 5, 243-260. doi: 10.1016/j.edurev.2010.06.001
biggs, j. (1987). student approaches to learning and studying. camberwell, vic: australian council for educational research.
creswell, j. (2009). research design: qualitative, quantitative and mixed methods approaches (3rd ed.). london: sage publications.
diseth, a. (2003). personality and approaches to learning as predictors of academic achievement. european journal of personality, 17, 143-155. doi: 10.1002/per.469
elo, s., & kyngäs, h. (2007). the qualitative content analysis process. journal of advanced nursing, 62(1), 107-115. doi: 10.1111/j.1365-2648.2007.04569.x
entwistle, n. (2009).
teaching for understanding at university: deep approaches to learning and distinctive ways of thinking. basingstoke, hampshire: palgrave macmillan.
entwistle, n. j., & mccune, v. (in press). the disposition to understand for oneself at university: integrating learning processes with motivation and cognition. british journal of educational psychology.
entwistle, n. j., & mccune, v. (2009). the disposition to understand for oneself at university and beyond: learning processes, the will to learn and sensitivity to context. in l-f. zhang & r. j. sternberg (eds.), perspectives on the nature of intellectual styles (pp. 29-62). new york: springer.
entwistle, n., & mccune, v. (2004). the conceptual bases of study strategies inventories in higher education. educational psychology review, 16(4), 325-345. doi: 10.1007/s10648-004-0003-0
entwistle, n., & ramsden, p. (1983). understanding student learning. london: croom helm.
flick, u. (2002). an introduction to qualitative research (2nd ed.). london: sage publications.
gijbels, d., segers, m., & struyf, e. (2008). constructivist learning environments and the (im)possibility to change students' perceptions of assessment demands and approaches to learning. instructional science, 36, 431-443. doi: 10.1007/s11251-008-9064-7
gow, l., & kember, d. (1990). does higher education promote independent learning? higher education, 19, 307-322. doi: 10.1007/bf00133895
hailikari, t., postareff, l., tuononen, t., räisänen, m., & lindblom-ylänne, s. (in press). students' and teachers' perceptions of fairness in assessment. in c. kreber, c. anderson, n. entwistle, & j. mcarthur (eds.), advances and innovations in university assessment and feedback. the edinburgh university press.
heikkilä, a., & lonka, k. (2006). studying in higher education: students' approaches to learning, self-regulation, and cognitive strategies. studies in higher education, 31, 99-117. doi: 10.1080/03075070500392433
heikkilä, a., niemivirta, m., nieminen, j., & lonka, k. (2011).
interrelations among university students‟ approaches to learning, regulations of learning, and cognitive and attributional strategies: a person oriented approach. higher education, 61, 513-529. doi: 10.1007/s10734-010-9346-2 hidi, s. & ainley, m. (2008). interest and self-regulation: relationships between variables that influence learning. in d. schunk & b. j. zimmerman (eds.), motivation and self-regulated learning (pp. 77110). theory, research, and applications. new york: taylor & francis. l. postareff et al. 48 | f l r johnson, r.b, onwuegbuzie, a.j., & turner, l.a. (2007). toward a definition of mixed methods research, journal of mixed methods research, 1(2), 112-133. doi: 10.1177/1558689806298224 kyndt, e., dochy, f., cascallar, e., & struyven, k. (2011). the direct and indirect effect of motivation for learning on students‟ approaches to learning, through perceptions of workload and task complexity. higher education research & development, 30, 135-150. doi: 10.1080/07294360.2010.501329 kyndt, e., dochy, f., struyven, k., & cascallar, e. (2011). the perception of workload and task complexity and its influence on students‟ approaches to learning. european journal of psychology of education, 26, 393-415. doi: 10.1007/s10212-010-0053-2 lietz, p., & matthews, b. (2010). the effects of college students‟ personal values on changes in learning approaches. research in higher education, 51, 65–87. doi: 10.1007/s11162-009-9147-6 lindblom-ylänne, s., & lonka, k (1999). individual ways of interacting with the learning environment are they related to study success? learning and instruction, 9, 1-18. doi: 10.1016/s09594752(98)00025-5 lindblom-ylänne, s., parpala, a., & postareff, l. (2013). challenges in analysing change in students‟ approaches to learning. in v. donche, j. richardson, j. vermunt, & d. gijbels (eds), learning patterns in higher education. routledge. lonka, k., & lindblom-ylänne, s. (1996). 
epistemologies, conceptions of learning, and study practices in medicine and psychology. higher education, 31, 5-24. doi: 10.1007/bf00129105 lonka, k., olkinuora, e., & mäkinen, j. (2004). aspects and prospects of measuring studying and learning in higher education. educational psychology review, 16 (4), 301-323. doi: 10.1007/s10648-0040002-1 marton, f. & säljö, r. (1976). on qualitative differences in learning: i. outcome and process. british journal of educational psychology, 46, 4-11. doi: 10.1111/j.2044-8279.1976.tb02980.x marton, f. & säljö, r. (1997). approaches to learning. in f. marton, d. hounsell & n. entwistle, (eds.) the experience of learning (2 nd ed., pp. 39-58). edinburgh, uk: scottish academic press. marton, f., dall‟alba, g., & beaty, e. (1993). conceptions of learning. international journal of educational research, 19, 277-300. mccune, v., & entwistle, n. (2011). cultivating the disposition to understand in 21 st century university education. learning and individual differences, 21, 303-310. doi: 10.1016/j.lindif.2010.11.017 nieminen, j., lindblom-ylänne, s., & lonka, k. (2004). the development of study orientations and study success in students of pharmacy. instructional science, 32, 387–417. doi: 10.1023/b:truc.0000044642.35553.e5 parpala, a., & lindblom-ylänne, s. (2012). using a research instrument for developing quality at the university. quality in higher education, 18, 313-328. doi: 10.1080/13538322.2012.733493 parpala, a., lindblom-ylänne, s., komulainen, e., litmanen, t., & hirsto, l. (2010). students‟ approaches to learning and their experiences of the teaching-learning environment in different disciplines. british journal of educational psychology, 80, 269-282. doi: 10.1348/000709909x476946 patton, m.q. (1990). qualitative evaluation and research methods (2nd ed.). london: sage publications. perkins, d. n., & tishman, s. (2001). dispositional aspects of intelligence. in j. m. collis & s. 
messick (eds.), intelligence and personality (pp. 233−258). mahwah, nj: lawrence erlbaum. ryan, r. m. & deci, e. l. (2000). intrinsic and extrinsic motivations: classic definitions and new directions. contemporary educational psychology, 25, 54-67. doi: 10.1006/ceps.1999.1020 rytkönen, h., parpala, a., lindblom-ylänne, s., virtanen, v., & postareff, l. (2012). factors affecting bioscience students‟ academic achievement. instructional science, 40, 241-256. doi: 10.1007/s11251-011-9176-3 schilling, j. (2006). on the pragmatics of qualitative assessment: designing the process for content analysis. european journal of psychological assessment, 22 (1), 28-37. doi: 10.1027/1015-5759.22.1.28 struyven, k., dochy, f., janssens, s., & gielen, s. (2006). on the dynamics of students‟ approaches to learning: the effects of the teaching/learning environment. learning and instruction, 16, 279–294. doi: 10.1016/j.learninstruc.2006.07.001 trigwell, k., ellis, r. a., & han, f. (2012). relations between students‟ approaches to learning, experienced emotions and outcomes of learning. studies in higher education, 37, 811-824. doi: 10.1080/03075079.2010.549220 l. postareff et al. 49 | f l r trigwell, k., prosser, m., & waterhouse, f. (1999). relations between teachers‟ approaches to teaching and students‟ approaches to learning. higher education, 37, 57-70. doi: 10.1023/a:1003548313194 van rossum, e. j., deijkers, r., & hamer, r. (1985). students‟ learning conceptions and their interpretation of significant educational concepts. higher education, 14, 617–641. doi: 10.1007/bf00136501 vanthournout, g., donche, v., gijbels, d. & van petegem, p. (2013). (dis)similarities in research on learning approaches and learning patterns. in d. gijbels, v. donche, j.t.e. richardson & j.d. vermunt (eds.), learning patterns in higher education. dimensions and research perspectives (pp. 11-32). routledge. watters, d., & watters, j. (2007). 
approaches to learning by students in the biological sciences: implications for teaching. international journal of science education, 29, 19–43. doi: 10.1080/09500690600621282 vermunt, j. d. (1998). the regulation of constructive learning processes. british journal of educational psychology, 68, 149-171.doi: 10.1111/j.2044-8279.1998.tb01281.x vermunt, j. d.., & van rijswijk, f.a.w.m. (1988). analysis and development of students‟ skill in selfregulated learning. higher education, 170, 647-682. vermunt, j. d., & verloop, n. (1999). congruence and friction between learning and teaching. learning and instruction, 9, 257-280. doi: 10.1016/s0959-4752(98)00028-0 watkins, d. a., & hattie, j. (1985). a longitudinal study of the approach to learning of australian tertiary students. human learning, 4, 127—142. zeegers, p. (2001). approaches to learning in science: a longitudinal study. british journal of educational psychology, 66, 59-71. doi: 10.1348/000709901158424 andressen et a l publication frontline learning research vol. 7 no 3 (2019) 1 – 26 issn 2295-3159 processing and learning from multiple sources: a comparative case study of students with dyslexia working in a multiple source multimedia context anette andresena, øistein anmarkrud a ladislao salmerónb, ivar bråtena auniversity of oslo, norway buniversity of valencia, spain article received 22 january / revised 2 may/ accepted 2 june / available online 16 july abstract this study investigated how four 10th-grade students with dyslexia processed and integrated information across web pages and representations when learning in a multiple source multimedia context. eye movement data showed that participants’ processing of the materials varied with respect to their initial exploration of the web pages, their overall processing time, and the linearity of their processing patterns, with post-learning interviews indicating the deliberate, strategic considerations underlying each participant’s processing pattern. 
eye movement data in terms of fixation duration and percentage of regressions also corroborated the findings of formal, diagnostic assessments. finally, it was found that participants differed with respect to how much factual information they learned from working with the materials and how well they were able to integrate information across the web pages and representations, with results suggesting particular problems with learning factual information and, at the same time, constructing a coherent mental representation of the issue, as well as with drawing on textual information in the integration process. this study brings together two research areas that essentially have been kept apart in theory and research, that is, dyslexia and multimedia learning, and it provides unique information about the role of individual differences in multiple source multimedia contexts.

keywords: multiple source use; dyslexia; eye-tracking; strategic processing; multimedia learning

corresponding author: oistein.anmarkrud@isp.uio.no

doi: 10.14786/flr.v7i3.451

1. introduction

due to technological developments, human learning is becoming increasingly multi-representational (ainsworth, 2018). accordingly, learning in school has become much more than reading and understanding textbooks. one reading context where students encounter multimedia information on a regular basis is the internet, making it possible for students to benefit from texts, pictures, animations, films, and interactive graphs. hence, the internet has become an invaluable learning tool for students, providing them with a vast amount of multimedia information that they can use for academic purposes (e.g., kammerer, meier, & stahl, 2016; kingsley & tancock, 2013; mason, junyent, & tornatora, 2014; van strien, brand-gruwel, & boshuizen, 2014). however, although the abundant information available just a finger swipe or mouse click away has brought new affordances for learning, it also comes with some caveats.
successful learning from the internet requires learners to integrate task-relevant and reliable information across different representations (e.g., pictures, videos, and texts), web pages, and perspectives, as well as with their own prior knowledge (e.g., bråten, braasch, & salmerón, in press; cho, woodward, & li, 2017; deschryver, 2015; rouet & britt, 2014). presumably, learning from multimedia materials, that is, the construction of a coherent mental representation based on information from different types of media, requires considerable working memory resources (e.g., irrazabal, saux, & burin, 2016; schüler, scheiter, & van genuchten, 2011; sweller, ayres, & kalyuga, 2011). in explaining the relationship between multimedia learning and working memory, mayer’s (2003, 2014a) influential cognitive theory of multimedia learning draws on limited-capacity (baddeley, 1995, 2000; just & carpenter, 1992) and dual-channel theories (baddeley, 1995; clark & paivio, 1991; paivio, 1971, 1986). the limited-capacity theory assumes that human working memory is limited in capacity and, thus, can process only a certain amount of information at a time. if processing demands exceed this limited capacity, the likely result is cognitive overload, which reduces learning. the dual-channel theory assumes that visual and auditory information is processed in separate channels in working memory and that these channels operate independently from each other, each with its own capacity. hence, when these two channels are combined, such as when a multimedia learning context involves both a text (visual channel) and a narration (auditive channel), the learner will be able to process more information simultaneously than when the learning context involves a combination of text and pictures, since both text and pictures have to be processed in the visual channel. 
however, current web pages, often containing text, pictures, animations with narration, audio files, and so forth, have the potential to overload both the visual and the auditive channel in working memory (knoop-van campen, segers, & verhoven, 2018; schüler et al., 2011). when students use the internet for educational purposes, they often visit several web pages that may present overlapping, complementary, and conflicting information (cho, afflerbach, & han, 2018; cho et al., 2017; salmerón, strømsø, kammerer, stadtler, & van den broek, 2018), with successful learning demanding integration across different representations (e.g., text, video, and picture) and web pages. although this can represent a great challenge for students regardless of their reading skills, poor readers may be particularly vulnerable in such learning contexts. still, surprisingly little is known about how students with dyslexia handle such multi-representational information (anmarkrud, brante, & andresen, 2018; knoop-van campen et al., 2018; mccarthy & swierenga, 2010), compared to the knowledge that exists about typically developing readers’ multimedia learning (mayer, 2014c). this study is frontline because it uniquely contributes to both multimedia learning and dyslexia, integrating and broadening the research agenda in both areas and providing new insights into what it means to be a struggling reader in the 21st century. one pertinent question is how the combined processing demands imposed by the multimedia context along with the processing demands of reading could affect comprehension and learning from multimodal materials for readers with dyslexia. 
hence, the present study aimed to provide a detailed description of some of the potential challenges that students with dyslexia may experience when trying to integrate information across representations (i.e., texts, pictures, and videos) and web pages by means of a comparative case study of four adolescents with dyslexia working on a socio-scientific issue in a digital environment. as such, this study draws on theoretical and empirical work on information processing in multimedia learning and attempts to extend that work to the challenges faced by readers with dyslexia in a multimedia learning context.

1.1 developmental dyslexia

dyslexia is a specific learning disability characterized by difficulties with accurate and/or fluent word recognition, poor decoding skills, and spelling difficulties (lyon, shaywitz, & shaywitz, 2003). the associated reading and writing difficulties are typically a result of a deficit in the phonological component of language (e.g., harm & seidenberg, 1999; ramus et al., 2003). dyslexia is found to affect between 3 and 7% of the population (hulme & snowling, 2009) and is assumed to be of neurobiological origin (e.g., shaywitz & shaywitz, 2008). although there is no “cure” for dyslexia, several studies have shown that many children with dyslexia can develop reading skills comparable to typically developing readers when provided necessary support and high-quality remedial reading instruction (e.g., hulme & snowling, 2009; torgersen, 2001; torgersen et al., 2001). however, there are substantial differences in reading skills among students with dyslexia, and those towards the severe end of the spectrum may struggle with reading long into adolescence and adulthood.
further, individuals with dyslexia have often been found to have working memory problems (e.g., avons & hanna, 1995; barbosa, miranda, santos, & bueno, 2009; melby-lervåg, lyster, & hulme, 2012), not only with the processing of information in a phonological code but also with central executive domains and the processing of visual information (e.g., fischbach, könen, rietz, & hasselhorn, 2014; menghini, finzi, carlesimo, & vicari, 2011; smith-spark & fisk, 2007). it is assumed that the working memory deficits often seen among individuals with dyslexia may contribute to difficulties constructing a coherent representation of a text during reading, independent of their difficulties with phonological coding (e.g., berninger, raskind, richards, abbot, & stock, 2008; borella, carretti, & pelegrina, 2010; follmer, 2018; smith-spark & fisk, 2007). however, several studies have found substantial within-group heterogeneity in working memory capacity among students with dyslexia (e.g., gathercole, alloway, willis, & adams, 2006; jeffries & everatt, 2004; smith-spark & fisk, 2007).

1.2 learning from multiple representations in a digital context

up to the early 2000s, research examining reading comprehension and text-based learning typically involved a reader encountering a single text, usually on paper. with the development of new and user-friendly information technologies, such as the internet, the conception of what constitutes a typical reading situation has changed considerably (bråten et al., in press; cho et al., 2018; fox & alexander, 2017; leu, kiili, & forzani, 2016; salmerón et al., in press). whereas conventional printed textbooks typically include text and various forms of illustrations and as such can be labelled multimedia materials, the digital learning contexts of today provide a variety of representations (e.g., animations, videos, audio files, simulations) in addition to text and pictures.
research on multimedia learning originally focused on learning from a combination of text and pictures in an offline context (butcher, 2014; mayer, 2014b). however, given the technological developments in recent decades, multimedia learning has come to refer to the combination of any type of words and visual displays, regardless of whether the learning occurs in a non-digital or digital context. several cognitive models have been developed to explain multimedia learning, with mayer’s (2001, 2014b) cognitive theory of multimedia learning being the most influential. the basic assumption of this model is that multimedia learning rests on a cognitive system with multiple memory stores, with a working memory system of limited capacity considered an essential processing component. also, the model posits that good multimedia learning requires the integration of information from various representations and that comprehension and learning can be hampered by the constraints of the human cognitive system, particularly working memory (e.g., chan & unsworth, 2011; mayer & moreno, 2010; schüler et al., 2011). a significant body of research indicates that using multiple representations in academic learning contexts may be beneficial (e.g., butcher, 2014; cuevas, fiore, & oser, 2002; rieber, tzeng, & tribble, 2004). however, poorly designed multimedia learning environments can increase working memory load, leading to reduced learning. one example is when an additional representation (e.g., a picture added to a text) does not contain new information. in such a case, learners will have to waste additional processing capacity on the redundant information from the added representation without gaining any new knowledge. this is often referred to as the redundancy effect and has been found to interfere with learning (e.g., gerjets, scheiter, opfermann, hesse, & eysink, 2009; pociask & morrison, 2008; torcasio & sweller, 2010). 
another example is that presenting information by means of more than two representations in and of itself can increase processing demands, particularly when the representations are physically or temporally disparate (ayres & sweller, 2014). in such a situation, learners would have to split their attention between the different representations and fill the “gaps” between representations by drawing inferences before integrating information across the representations. this is referred to as the split-attention effect, and it can potentially reduce learning, especially among learners with reduced working memory capacity (fenesi, kramer, & kim, 2016). in brief, the use of multimedia has the potential to increase learning when multimedia environments are designed according to the limitations of the human information processing system. however, ill-structured multimedia environments have the potential to reduce learning, compared to single-media environments, due to increased load on working memory.

1.3 dyslexia and learning from multiple representations

thus far, few studies have been conducted on multimedia learning among students with dyslexia. in a recent study, knoop-van campen et al. (2018) examined how multiple representations affected learning and study time among 11-year-old students with dyslexia compared to typically developing readers. the participants worked with three user-paced multimedia lessons on the topics of balance in nature, motion, and global warming, and participants were divided into three conditions: 1) pictures + text, 2) pictures + audio, and 3) pictures + text + audio. the participants studied every lesson once, and learning was assessed at two time points: an immediate posttest and a delayed posttest one week later. the results showed that in the pictures + text condition, the dyslexic participants spent statistically significantly longer time working with the materials than did the non-dyslexic participants.
there were no statistically significant differences in study time between the two groups in the two other conditions, and there was no statistically significant correlation between study time and learning in any of the conditions. further, there were no main effects of condition, group, or working memory capacity on learning measured at the immediate and delayed posttests, although participants with dyslexia had statistically significantly lower scores on the working memory measure compared to the participants without dyslexia. a study by maccullagh, bosanquet, and badcocks (2017) highlights the challenges students with dyslexia may have integrating information across representations. interviews were conducted with 13 university students with dyslexia to investigate how they used an available multimedia tool consisting of video-recorded lectures in combination with other representations such as animations and text boxes to compensate for their reading difficulties. several participants reported that recorded lectures that they viewed with this tool were challenging to follow when all the different representations were combined. the participants reported that they often had to go through the online lectures several times to be able to benefit from them (e.g., just listen to the lecturer the first time, pay attention to the animation and text boxes the second time, and take notes the third time). in two studies, alty and colleagues (alty, al-sharrah, & beacham, 2006; beacham & alty, 2006) examined the effects that different combinations of representations, such as textual and visual materials (e.g., diagrams) as well as audio files (voice over), had on the learning of statistics among students with dyslexia. in the first study, alty et al. 
(2006) compared students with and without dyslexia across three conditions: one group received the learning materials as text only, one group received the materials as diagrams + voice over, and one group received the materials as text + diagrams. the results showed, contrary to expectations, that the students with dyslexia in the text-only condition significantly outperformed the students with dyslexia in the two other conditions with regard to learning, whereas the students without dyslexia in the diagrams + voice-over condition performed better than the students without dyslexia in the two other conditions. given the somewhat surprising finding concerning the participants with dyslexia, beacham and alty (2006) conducted the same experiment with a larger sample of students with dyslexia, this time without a non-dyslexic control group. the results in this second study corroborated the findings from the original study; again, the participants in the text-only condition outperformed the students in the two other conditions regarding learning, despite reporting that this was the least preferred version of the learning materials.

1.4 integration as strategic activity

according to cho and afflerbach (2017), integration of information across representations and web pages requires strategic activity in the service of creating meaning. more generally, comprehension strategies may be defined as intentional attempts to control and modify meaning construction during learning (cf., afflerbach, pearson, & paris, 2008). if the semantic overlap between representations or sources is high, integration may rely on automatic processing (myers & o'brien, 1998). if not, integration will have to rely on deliberate, strategic activity (kurby, britt, & magliano, 2005), with the execution and monitoring of strategies drawing on working memory resources.
presumably, multimedia learning that requires strategic processing will be particularly challenging for learners with dyslexia, who also must spend considerable working memory resources on more basic reading processes. of note is also that multimedia materials, such as online sources, are not necessarily designed according to multimedia principles (e.g., mayer, 2014b), that is, to reduce the load on working memory. several studies clearly have indicated that strategic behavior, such as coordinating representations and actively searching for meaning in different sources, can facilitate the integration of information into a coherent and rich mental model when working with multimedia materials (e.g., azevedo & cromley, 2004; greene, moos, azevedo, & winters, 2008; moreno & mayer, 2000).

1.5 the present study

although it has been argued that learning in multimedia contexts, such as the internet, could be beneficial for struggling readers because text is supplemented with other representations (castek et al., 2011; henry et al., 2012), learning in such contexts also may represent particular challenges for struggling readers. this is because a combination of processing demands associated with word reading and processing demands associated with integrating information across web pages and representations may lead to cognitive overload (chan & unsworth, 2011). in this study, we extended previous research by exploring variations in the processing patterns of adolescent readers with dyslexia who worked with conflicting web pages containing multiple representations. additionally, we set out to explore how these processing patterns were related to cognitive differences among participants and to their performance on post-reading learning and integration tasks. to be able to address these issues in depth, we opted for a comparative case study design (yin, 2009) combining quantitative and qualitative data.
specifically, following the logic of a comparative case approach (campbell, 2012; yin, 2009), we selected four cases that were analysed independently before they were compared and contrasted. the study of these cases was guided by two research questions.

first, to what extent do different processing patterns displayed by students with dyslexia when reading multimodal information represent deliberate, strategic activity? we expected that processing time would be related to the severity of the participants’ reading difficulties, with more severe reading difficulties associated with longer processing time due to the time spent on reading the texts on the web pages. moreover, prior research with typically developing readers in multimedia learning contexts has shown that the order in which representations are processed, and the transitions between representations, can influence learning outcomes (e.g., mason, pluchino, & tornatora, 2016; mason, scheiter, & tornatora, 2017). two different approaches to the integration of textual and pictorial information in multimedia contexts have been described in the literature. the first approach, picture-to-text processing, involves a brief inspection of pictures before processing textual material, with a quick examination of a picture providing a global spatial representation of the topic that, in turn, can scaffold comprehension of the text material. recent research has found a positive effect of the picture-to-text approach (eitel, scheiter, & schüler, 2013; eitel, scheiter, schüler, nyström, & holmqvist, 2013; mason et al., 2017). the second approach, text-to-picture processing, involves processing textual material first, which may help readers focus on the essential elements of a picture subsequently. this approach also has been found to facilitate learning in multimedia contexts (hegarty & just, 1993).
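as an illustration only, the distinction between the two approaches can be sketched in code. the sketch below assumes that eye movement data have already been reduced to an ordered list of area-of-interest labels; the labels "picture" and "text", the function names, and the example sequences are hypothetical and are not taken from the study's materials or analysis procedure.

```python
from typing import Sequence


def classify_processing_order(aoi_sequence: Sequence[str]) -> str:
    """Label a fixation sequence by which representation was inspected first.

    aoi_sequence is a hypothetical, ordered list of area-of-interest labels.
    A sequence that never visits both a picture and a text is left undetermined.
    """
    labels = list(aoi_sequence)
    if "picture" not in labels or "text" not in labels:
        return "undetermined"
    if labels.index("picture") < labels.index("text"):
        return "picture-to-text"
    return "text-to-picture"


def count_transitions(aoi_sequence: Sequence[str]) -> int:
    """Count switches between text and picture, ignoring any other labels."""
    relevant = [label for label in aoi_sequence if label in ("text", "picture")]
    return sum(1 for a, b in zip(relevant, relevant[1:]) if a != b)


if __name__ == "__main__":
    # Hypothetical sequence: a brief look at the picture, then the text.
    example = ["picture", "text", "text", "picture", "menu", "text"]
    print(classify_processing_order(example))
    print(count_transitions(example))
```

a classification of this kind captures only the order of first visits; it says nothing about how long each representation was inspected, which would also matter for deciding whether a picture inspection was "brief".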
which of these approaches is the most efficient is probably a matter of the complexity of the information conveyed by the different representations, with the representation containing the least complex information preferably processed first (eitel & scheiter, 2015; mason et al., 2017). considering that our participants were students with dyslexia, we expected that participants with a deliberate, strategic processing pattern would use the picture-to-text approach to try to compensate for their word reading problems. second, how are processing patterns related to individual differences among participants and to their learning from and integration of multimodal information presented on different web pages? several studies have indicated that students with dyslexia who have developed sufficient word decoding skills through adequate remedial reading instruction may display reading comprehension almost on a par with students without dyslexia (e.g., bishop & snowling, 2004; de olivera, da silva, dias, sebra, & macedo, 2014; torgersen, 2001; torgersen et al., 2001). although there are very few studies examining comprehension of graphics (both static and motion) among individuals with dyslexia, available studies do not indicate that they have particular problems extracting information from pictorial representations (abtahi, 2012; roca, tejero, & insa, 2018; taylor, duffy, & hughes, 2007). hence, we expected that the participants would be able to gain factual knowledge from all representations and web pages used in this study. however, because integration of information from multiple representations imposes considerable processing demands on working memory, and because students with dyslexia have been found to display working memory deficits, we expected that integrating information across representations and web pages would be a profound challenge for our participants. 
further, we expected that the participants would be inclined to draw more on information from pictures and videos than on information conveyed by textual material when trying to construct a coherent mental representation of the learning materials. this research is based on a sample of 22 tenth-graders with dyslexia who participated in a study investigating differences in multiple source use between students with and without dyslexia (andresen, anmarkrud, & bråten, 2019). that study indicated that the group of students with dyslexia was heterogeneous with respect to working memory capacity and reading skill. hence, to examine in depth how differences in these two key competencies might influence the processing of and learning from multiple multimedia sources in a digital environment, we selected participants with dyslexia who varied with respect to working memory capacity and reading skill for this comparative case study.

2. method

2.1 participants

the participants were four norwegian adolescents, ranging in age from 15 years 9 months to 15 years 11 months. in norway, if there is concern about a student’s reading proficiency, the student will be assessed by an educational-psychological service (eps). if appropriate, the student will be diagnosed with dyslexia based on test results, classroom observations, and interviews with parents and teachers. students who receive remedial reading instruction according to the special needs education act are usually reassessed every other year. thus, the four participants in this study were diagnosed by experts at the eps within the last two years. their diagnoses were based on criteria included in the definition of dyslexia proposed by lyon et al. (2003). this means that the four participants displayed difficulties in word recognition, phonological processing, and spelling (lyon et al., 2003).
specifically, all participants were assessed with standardized diagnostic test batteries called logos (høien, 2014) or stas (klinkenberg & skaar, 2003), which are frequently used in norway and other scandinavian countries to diagnose dyslexia. on these test batteries, all participants with dyslexia scored below the 15th percentile on subtests measuring reading fluency, word identification, phonological processing, and spelling and were simultaneously within the normal range on subtests measuring listening comprehension. none of the participants had comorbid conditions such as attention deficit disorder, language impairment, or more general learning disabilities. all participants had normal or corrected-to-normal vision. see table 1 for relevant background information about each of the four participants.

table 1. background information about the four participants
note. 1 measured with logos (høien, 2014); 2 measured with stas (klinkenberg & skaar, 2003); 3 measured with a norwegian adaptation of swanson and trahan’s (1992) working memory span task; 4 compared to the mean scores of a sample of 528 norwegian 10th and 11th graders (anmarkrud & ferguson, 2011).

participant 1 was diagnosed with dyslexia in 4th grade. the latest eps assessment showed that this participant still had substantial reading difficulties, with very low scores on subtests measuring phonological word reading (nonword reading; 1.2 percentile), orthographic reading (0.1 percentile), and reading fluency (0.2 percentile). based on the thorough reading assessment at the eps, participant 1 was by far the weakest reader among the four participants. as seen in table 1, participant 1 also displayed a very limited working memory capacity.

participant 2 was diagnosed with dyslexia in 6th grade. on the latest eps assessment, participant 2 scored approximately two grades below the current grade level on subtests measuring phonological reading (nonword reading), orthographic reading, and reading fluency.
however, presumably due to very good language comprehension skills, participant 2 seemed to be able to compensate for the word reading problems and had grade-appropriate reading comprehension scores. hence, participant 2 could be characterized as a poor word decoder with relatively good comprehension skills.

participant 3 was not diagnosed with dyslexia until 8th grade. in the report from the eps, this participant was described as a very motivated and academically sound student with excellent learning abilities (e.g., a relatively high working memory capacity). due to these strengths, participant 3 had been able to conceal and compensate for the word reading problems for many years, and it was not until 8th grade, when the reading materials in school became increasingly complex, that the reading difficulties became clearly visible. the eps assessment showed that participant 3’s scores on tests measuring phonological reading (nonword reading), orthographic reading, and reading fluency were equivalent to what is typically found among students two years younger.

participant 4 was diagnosed with dyslexia in 5th grade. the results on the latest eps assessment showed that this participant mastered a phonological word reading task (i.e., could read the nonwords correctly) but was very slow on this task compared to typically developing peers. participant 4 demonstrated substantial difficulties on subtests measuring phonological awareness, orthographic reading, and reading fluency, with scores well below the 15th percentile on these subtests.

table 2. content of the three web pages

2.2 learning materials

participants were given access to a researcher-generated internet site presented in an offline mode that was titled “sunbathing and health”. the site contained three different web pages about the controversial issue of sun exposure and health. these web pages presented two main perspectives: sun exposure is beneficial, and sun exposure is harmful.
each web page contained a title and a lead paragraph explaining the overall content of that page and then presented a video, a short text, and a picture, in that order. the first page contained information about the nature of ultraviolet radiation, different wavelength bands, how ultraviolet radiation is measured, and how different types of ultraviolet radiation affect the skin. the second page presented research arguing that sun exposure is healthy because it increases the production of vitamin d, which can protect against cancer, particularly in inner organs. the third page focused on the harmful effects of sun exposure due to increased risk of skin cancer, particularly basal cell carcinoma and melanoma, and explained that sun exposure cannot be considered a safe source of vitamin d. the main idea units of the different representations (i.e., the texts, the videos, and the pictures) were unique, which made it possible to trace each idea unit in participants’ post-reading answers back to a particular representation on a particular web page. of note is that the learning materials were designed in accordance with design principles for multimedia learning (mayer, 2014a). thus, we took the spatial contiguity principle (e.g., austin, 2009; johnson & mayer, 2012) into consideration by presenting text, videos, and pictures near each other on the web pages, and we omitted redundant information across the different representations in accordance with the redundancy principle (e.g., mayer, heiser, & lonn, 2001; moreno & mayer, 2002). table 2 provides an overview of the content of the three web pages. the texts that were included on the web pages (one on each page) contained 83, 92, and 90 words, and ranged in readability from 37 to 40 (see table 3). these readability scores were based on björnsson’s (1968) formula, taking word length and sentence length into consideration. 
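as a rough illustration of how a readability score of this kind can be computed, the sketch below assumes that björnsson’s formula is the standard lix index (mean sentence length plus the percentage of words longer than six letters). the function name, the sentence splitting on ".", "!" and "?", and the tokenization are illustrative assumptions, not the procedure used by the authors.

```python
import re


def lix(text: str, long_word_min_len: int = 7) -> float:
    """Approximate LIX readability score for a text.

    Assumed definition: (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    """
    # Assumption: sentences end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters (Unicode-aware, so æ, ø, å count).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = sum(1 for w in words if len(w) >= long_word_min_len)
    return len(words) / len(sentences) + 100 * long_words / len(words)


if __name__ == "__main__":
    sample = "The sun is hot. Ultraviolet radiation damages skin."
    print(lix(sample))
```

for the two-sentence sample, the eight words include three long words, so the score is 8/2 + 100 × 3/8.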
this formula yields readability scores ranging from approximately 20 (very easy text) to approximately 60 (very difficult text). vinje (1982) reported that textbooks used in norwegian upper-secondary school had a readability score of approximately 42 and that public information texts from the norwegian government had a readability of 45.

table 3. descriptive information about the text on each of the three web pages

2.3 measures

2.3.1 topic knowledge measure. to assess students’ knowledge about the topic of sun exposure and health, both before and after working on the learning materials described above, we used a 12-item multiple-choice test. the items referred to concepts and information central to the issue of sun exposure and health that were discussed on the three web pages. because the same measure was administered both before and after participants worked on the learning materials, learning gain could be calculated by subtracting the number of correct responses out of 12 on the first occasion from the number of correct responses out of 12 on the second occasion. a preliminary version of the topic knowledge measure was reviewed by a professor of medical biochemistry at the university of oslo who was not part of the project, which resulted in only minor modifications to the response alternatives of a few items. sample items from the topic knowledge measure are displayed in appendix a. in the larger sample of students with dyslexia from which the four participants were selected, the internal consistency reliability (kuder-richardson 20) for scores on this measure was .62.

2.3.2 working memory measure. working memory was measured using a norwegian adaptation of swanson and trahan’s (1992) working memory span task (braasch, bråten, strømsø, & anmarkrud, 2014). this measure is derived from daneman and carpenter’s (1980) original reading span test. twelve sets of unrelated sentences were read aloud with a 2-second interval between each sentence.
the sets gradually increased from two to five sentences. participants were asked to simultaneously (a) answer a comprehension question about an unknown sentence after the final sentence was read and (b) remember the final words from each of the sentences. for each of the 12 trials, participants were awarded one point if they correctly answered the comprehension question and one additional point for each of the final words they recalled. if participants failed to answer the comprehension question correctly, they did not receive any points for that set regardless of how many final words they recalled. internal consistency reliability (cronbach’s α) for scores on this measure in the larger sample of students with dyslexia from which the four participants were selected was .73.

2.3.3 multiple source integration task. multiple source integration was assessed by asking the four participants to respond orally to two open-ended questions modelled on the integrative short essay tasks used by rukavina and daneman (1996) to measure students’ understanding of a controversial scientific issue. of note is that this approach has also been used effectively in several previous studies of multiple source integration (e.g., barzilai & ka’adan, 2017; bråten, anmarkrud, brandmo, & strømsø, 2014; ferguson & bråten, 2013). the first question was, “could you explain the relationship between sun exposure, health, and illness?” the second question was, “could more than one view on the relationship between sun exposure, health, and illness be correct? yes or no? if yes, why? if no, why not?” following rukavina and daneman (1996), we considered our first question to indirectly require participants to integrate different perspectives across web pages and representations, or, at least, to consider each perspective’s claims and explanations.
our second question was considered to directly require participants to pit perspectives against each other, measuring how well they could reason about the issue in terms of the claims and explanations presented across web pages and representations. the oral responses were audio-taped and transcribed before they were scored. following andresen et al. (2019), the responses were scored in three steps. in the first step, we coded responses to both questions based on the extent to which participants integrated the two main perspectives represented in the materials (i.e., sun exposure is healthy vs. sun exposure is harmful), regardless of the web pages and representations they drew upon in their responses. on the indirect integrative question, participants could obtain scores between 0 and 5. a score of 0 was given for no response or for irrelevant information. a score of 5 was given for mentioning the two main perspectives and providing elaborate explanations or reasons for both perspectives as well as relating the two perspectives to each other by comparing and/or contrasting them and trying to reconcile them. inter-rater reliability was established in the larger sample from which the four participants were selected. in this process, the first and second authors independently scored a random selection of 50% of participant responses to the first question, initially agreeing on 80% and resolving all disagreements through discussion. on the direct integrative question, we first coded whether participants recognized that the main perspectives were not mutually exclusive and might be reconciled (i.e., whether participants answered “yes” or “no” to the question). second, we coded to what extent participants could explain and reconcile the two perspectives (i.e., when they answered “yes”) and to what extent they could select one of the perspectives and provide explanation or reason for that perspective (i.e., when they answered “no”). again, scores could range from 0 to 5. 
a score of 0 was given when participants answered “no” to the question without providing any further justification for their answer. a score of 5 was given when participants answered “yes” to the question, mentioned the two perspectives, provided elaborate explanations or reasons for both, and related the two perspectives to each other by explaining how they may be reconciled. again, the reliability of the coding was established in the larger sample from which the participants were selected. the first and second authors independently scored a random selection of 50% of the answers to the second question, initially agreeing on 83% and resolving all disagreements through discussion. participants’ scores on the two integrative questions were collapsed, which means that their scores after this step could range from 0 to 10. table 4 presents the entire coding system used for scoring the oral responses in the first step.

table 4. coding system used in the first step of the scoring of the oral responses

in the second step, we assessed the extent to which participants drew on information from the three different web pages and the different types of representations on each web page (i.e., text, video, and picture) when constructing their oral responses to the two questions. because the main idea units of each representation were unique, we could trace an idea unit included in an oral response back to a particular representation on a particular web page. inter-rater reliability of this coding also was established in the larger sample by the first and second authors who independently coded a random selection of 50% of the responses to both questions and initially agreed on the origin of 85% of the idea units. all disagreements were resolved through discussion. in the second step, participants could obtain scores between 1 and 2.98.
in addition to a constant of 1, participants were awarded a score of 0.33 for each web page and a score of 0.11 for each representation (i.e., text, video, or picture) that they used in their responses. for example, a participant who included idea units from the text and the video on the first web page would obtain a score of 0.33 for the web page and a score of 0.11 each for the text and the video, so this participant’s score would be 1.55 (including the constant). if this participant additionally drew on the video on the second web page, he or she would obtain a score of 0.33 for that web page and 0.11 for that video, resulting in a score of 1.99 (including the constant). we awarded a score of 0.33 for each web page and a score of 0.11 for each representation because we considered the entire web site to consist of three parts (i.e., web pages), which were again divided into three representations.

in the third step, we computed each participant’s total multiple source integration score by multiplying the participant’s scores from the first and second steps. in this way, we considered both the integration of the main perspectives (step one) and the coverage of the learning materials (step two) when assessing multiple source integration. thus, on the multiple source integration task, participants could obtain a maximum score of 29.8 (i.e., 10 × 2.98). the reason we added a constant of 1 to each participant’s score in the second step was to avoid any participant obtaining a total multiple source integration score that was lower than the score obtained in step one.

2.3.4 apparatus and analysis of eye-tracking data. while participants worked with the learning materials, gaze data were collected by means of a tobii x2-60 eye-tracking device. the tobii x2-60 is a screen-based eye tracker that records gaze data at a sampling rate of 60 hz. data collection took place in a quiet room, and direct sunlight to the screen and the student’s head was restricted to avoid distorting reflections.
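returning briefly to the scoring in section 2.3.3, the second and third steps can be sketched in a few lines of code. this is a minimal illustration of the arithmetic as described, not the authors’ scoring tool; the function names and the representation of a response as a set of (web page, representation) pairs are assumptions made for the example.

```python
PAGE_WEIGHT = 0.33            # awarded once per web page drawn upon
REPRESENTATION_WEIGHT = 0.11  # awarded per text, video, or picture drawn upon
CONSTANT = 1.0                # keeps the total from falling below the step-one score


def coverage_score(used):
    """Step two: coverage of the learning materials.

    `used` is a set of (page, representation) pairs, e.g. {(1, "text")}.
    """
    pages = {page for page, _ in used}
    return CONSTANT + PAGE_WEIGHT * len(pages) + REPRESENTATION_WEIGHT * len(used)


def integration_score(step_one, used):
    """Step three: step-one score (0 to 10) multiplied by the step-two score."""
    if not 0 <= step_one <= 10:
        raise ValueError("step-one score must lie between 0 and 10")
    return step_one * coverage_score(used)


if __name__ == "__main__":
    # Worked example from the text: text and video on the first web page.
    first = {(1, "text"), (1, "video")}
    print(round(coverage_score(first), 2))                   # 1.55
    # Adding the video on the second web page.
    print(round(coverage_score(first | {(2, "video")}), 2))  # 1.99
    # Maximum: all nine representations and a step-one score of 10.
    everything = {(p, r) for p in (1, 2, 3) for r in ("text", "video", "picture")}
    print(round(integration_score(10, everything), 2))       # 29.8
```

because the coverage score is never below 1, the product can never be lower than the step-one score, which is the stated purpose of the constant.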
students sat approximately 60 cm from the screen, which was a t540p lenovo laptop with a 15.6” monitor and 1920 × 1080 resolution. a nine-point calibration task was used to ensure reliable eye-tracking. the process was repeated until the average deviation dropped below 0.5°. eye-tracking data were analyzed with two different approaches. first, participants’ patterns of processing were examined with regard to sequence of processing (linear vs. nonlinear processing patterns) and time spent on the various web pages and representations. second, we performed finer-grained analyses of eye movements during the reading of the texts on the three web pages. eye-tracking data were analyzed by means of tobii studio software. specifically, we established text paragraphs as the main areas of interest (aoi). within these, we examined the number of fixations, average duration of fixations (measured in milliseconds), and percentage of regressions (saccadic movements to the left, excluding carriage returns). the first time participants read a paragraph was defined as first-pass reading, while any subsequent rereading was considered second-pass reading. in the reading literature, higher average fixation duration and a high percentage of regressions are considered indicators of comprehension difficulties (rayner, chace, slattery, & ashby, 2006; schotter, tran, & rayner, 2014).

2.3.5 follow-up interview. finally, when participants had finished working with the learning materials and responded to the post-reading measures, we replayed the recordings of their eye movements and used these recordings as stimuli to gain insight into whether there were any deliberate reasons for their processing patterns when working with the learning materials.
the main topic of these follow-up interviews was the order in which the different web pages and representations were processed and whether the observed processing pattern was representative of how they would typically approach a web page in an academic setting. the follow-up interviews were transcribed, and the various utterances were given a time stamp, making it possible to connect an utterance to an incident in the recordings of the eye movements.

2.4 procedure

data collection took place in two different sessions, approximately two weeks apart. the first author collected all data in participants’ home schools either during the school day or directly afterwards. all instructions, questionnaire items, and questions were read aloud to the participants while they had their own printed copies in front of them. this procedure was followed to reduce the effects of participants’ reading difficulties on the various measures. in the first session, the topic knowledge and working memory measures were individually administered in that order. this session lasted approximately 40 minutes. in the second session, participants individually studied the three web pages on a laptop (see above) with the following instruction read aloud: “sun exposure and health is a topic of current interest. imagine that you are supposed to hold an oral presentation on this topic for your fellow students. here are three web pages you can use to prepare the presentation. you may have 30 minutes studying these three pages; please do not take any notes. you can move between the pages as much as you would like and read them in the order that you choose”. when finished working on the three web pages, the participants responded to the topic knowledge measure for a second time before answering the oral multiple source integration task. finally, the follow-up interview was conducted.
participants were given a gift certificate of 300 nok (approximately 35 usd) and were offered an individual course in strategic internet reading as a reward for their participation. this study was carried out in accordance with the recommendations of the norwegian national research ethics committees. the protocol was approved by the norwegian centre for research data. in norway, the norwegian centre for research data functions as a national ethics committee approving all studies within the social sciences. all subjects as well as their parents gave written informed consent for participation in the study.

3. results

to address the first research question regarding possible variations in the processing patterns, we used eye tracking to examine participants’ processing of the web pages and representations. we also performed a more detailed analysis of eye movements during text reading, focusing on differences among participants regarding number of fixations, fixation duration, and regressions. figure 1 displays the processing pattern of the four participants when working with the learning materials in the second session. one difference that could be observed concerned how they initially explored the web pages. participant 3 started the session by spending 20 seconds quickly clicking through all three pages, before returning to the first page and going through the three web pages more thoroughly. participant 4, on the other hand, went straight to page one, going through this page before moving to page two and page three. on each web page, both participant 1 and participant 2 started with the text before moving on to the video and the picture, in that order. a second difference was overall processing time, with participant 1 spending a total of 23 minutes and the other three spending between 6 minutes and 30 seconds and 7 minutes and 45 seconds on the three web pages.
interestingly, the time participant 1 spent on reading the texts on the three web pages (15 minutes) constituted the difference in total processing time between participant 1 and the other participants. a third observable difference between the participants was the sequence of processing the web pages, specifically the degree of linearity. the processing patterns of participant 3 and participant 4 can be categorized as linear in the sense that they both processed the various representations in the order in which they appeared on the web pages (i.e., video, text, and picture). although the processing patterns of these two students may look identical on a surface level, there were important differences on a more detailed level. thus, participant 3 always started out by reading the lead paragraph explaining the content of the page before going through each web page in a linear pattern. participant 4 skipped the lead paragraph on all pages and went directly to the video before examining the text and picture in a linear pattern on every page. in the follow-up interview, participant 3 said that this quick examination of the three web pages was a deliberate strategy used to get an overview of the content of the web pages. when asked about the processing order of the representations in the follow-up interview, both participant 3 and participant 4 explained that they believed that those who make web pages probably have a reason for the design of a page; therefore, they just processed the representations in the order in which they appeared on each web page.

figure 1. processing patterns of the four participants on the three web pages

the processing patterns of participant 1 and participant 2 can be described as nonlinear because they did not process the representations in the order in which they appeared on the web pages.
a noteworthy difference between participant 1 and participant 2 was that the former systematically went back and reread the text and re-examined the pictures on each web page after the initial processing of the site. although, as seen from the processing patterns, they both prioritized the text (i.e., read the text first), their reasons for this were different. participant 1 explained in the follow-up interview that reading the text first was done deliberately to get some knowledge of the content before watching the video, a strategic activity used to maximize the comprehension of the video. participant 1 applied this strategy on all three web pages. participant 2, on the other hand, expressed that the text was usually the representation that contained the key information on a web page, and, therefore, participant 2 would always start with the text when entering a web page. this participant also gave another strategic reason for such a processing pattern during the follow-up interview. due to the reading difficulties, participant 2 sometimes experienced that “things were unclear” after the reading of a text on a web page and hoped to clarify misunderstandings by watching a video or looking at pictures if such representations were available on a web page.

regarding the second research question, we examined whether processing patterns were related to differences among participants with respect to working memory capacity, reading skills, and topic knowledge, as well as to their post-reading performances on the topic knowledge measure and the multiple source integration task.

table 5. number of fixations, percentage of regressions, and fixation duration during reading
note. 1 fixation durations are measured in milliseconds. outlier fixations (defined as participants’ mean fixation + 2 sd) are replaced by participants’ median fixation.
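the outlier rule in the note to table 5 can be illustrated with a short sketch. several details are assumptions made for the example rather than facts from the study: the use of the sample standard deviation, the computation of the mean, sd, and median over all of a participant’s fixations before replacement, and the function name.

```python
from statistics import mean, median, stdev


def replace_outlier_fixations(durations_ms, n_sd=2.0):
    """Replace fixation durations above mean + n_sd * SD with the median.

    Assumptions: sample SD; mean, SD, and median are computed once over
    the participant's original durations (no iterative trimming).
    """
    if len(durations_ms) < 2:
        # SD is undefined for fewer than two values; return data unchanged.
        return list(durations_ms)
    threshold = mean(durations_ms) + n_sd * stdev(durations_ms)
    replacement = median(durations_ms)
    return [replacement if d > threshold else d for d in durations_ms]


if __name__ == "__main__":
    # Hypothetical durations: nine ordinary fixations and one very long one.
    fixations = [200] * 9 + [900]
    print(replace_outlier_fixations(fixations))
```

in the hypothetical data, the mean is 270 ms and the sample sd is about 221 ms, so the 900 ms fixation exceeds the threshold of roughly 713 ms and is replaced by the median of 200 ms.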
the results of the eye-tracking analyses are displayed in table 5, with results corroborating the findings of the eps assessments regarding the reading levels of the participants. each participant’s average fixation duration (see “average forward fixation duration” in table 5) and percentage of regressions across the three web pages were compared with the reading of students with normative development (rayner, ardoin, & binder, 2013) and with a sample of students with dyslexia (prado, dubois, & valdois, 2007). participant 1 read texts at a substantially slower rate (average fixation duration 476 ms) but did not display more regressions (31.59% regressions) than what has been reported in children with dyslexia (325 ms, 31% regressions). participant 1 was also the only participant who reread all three texts. as can be seen in table 5, both the first and the second pass were characterized by long fixations and many regressions. participant 4’s reading behavior (average fixation duration 310 ms, 39.11% regressions) was similar to what has been reported for children with dyslexia. finally, participant 2 and participant 3 read the texts substantially faster (with average fixation durations of 200 and 190 ms, and with 26.06% and 16.20% regressions, respectively), which closely resembles the behaviors typically reported for children without dyslexia of a similar age (average fixation duration 230-250 ms, 22% regressions) (rayner et al., 2013). however, previous research has found that the transparency of the orthography can influence fixation time during reading; the deeper the orthography, the longer the fixations (bahnmueller, huber, nuerk, göbel, & moeller, 2016; rau, moll, snowling, & landerl, 2015; van roy & pretorius, 2013).
norwegian is a transparent orthography compared to english or french, and this should be considered when interpreting the reading rates of participant 2 and participant 3 in relation to the norms, which are established with english- or french-speaking children.

3.1 learning gain and multiple source integration

as displayed in table 6, the participants started out with a similar amount of topic knowledge, with participant 1 and participant 3 receiving a score of 4, which is equivalent to a z-score of -1.32 compared to a norm sample of 528 norwegian 10th and 11th graders without dyslexia (anmarkrud & ferguson, 2011). please note that all participants in the norm sample responded to the same topic knowledge measure both before and after studying the same information about sun exposure and health as did the four participants in the current study, and that the pre-reading topic knowledge, post-reading topic knowledge, and learning gain of the participants in the current study were compared to the pre-reading topic knowledge, post-reading topic knowledge, and learning gain obtained by the students in the norm sample. participant 2 and participant 4 scored 5, which is equivalent to a z-score of -.93 when compared to the norm sample. however, there were differences between the participants with respect to learning gain, indicating the amount of factual knowledge they were able to gain from working with the learning materials. participant 1 read at a much slower rate than what is often seen among students with dyslexia but reread all texts thoroughly. this participant had a learning gain of 6, ending up with a post-reading topic knowledge score of 10 (z-score -.14). compared to the norm sample, this is a substantial learning gain, equivalent to a z-score of 1.45.
participant 3 and participant 4 both ended up with a post-reading topic knowledge score of 9 (z-score -.72), with their learning gains of 5 and 4 equalling z-scores of .99 and .53, respectively, when compared to the norm sample. hence, the results indicated that these three participants were all able to extract factual knowledge from the representations and web pages included in the learning materials. however, participant 2, who read relatively fast and also displayed relatively low working memory capacity, did not seem to gain much factual knowledge from the learning materials, increasing the topic knowledge score by only one point and ending up with a post-reading topic knowledge score of 6 (z-score -2.48), indicating a relatively small learning gain compared to the norm sample (z-score -.86).

on the multiple source integration task, higher scores required that readers integrated information across web pages and representations (i.e., texts, videos, and pictures) into a coherent mental representation of the issue of sun exposure and health. given a potential maximum score of 29.8, participant 1 clearly struggled with this task and obtained a score of only 3.76, which is equivalent to a z-score of -1.76 when compared to a sample of norwegian 10th graders previously responding to this task (andresen et al., 2019). participant 3 and participant 4 obtained scores of 5.64 and 9.28, equalling z-scores of -1.46 and -.88, respectively. in contrast, participant 2, who gained little factual knowledge, received a multiple source integration score of 16.80 (z-score .35), thus performing above the average of the norm sample. hence, the results suggested that none of the four participants was able to learn factual knowledge from the three web pages (indicated by the post-reading topic knowledge scores) and at the same time construct a coherent mental representation of the issue in question (indicated by the scores on the multiple source integration task).
thus, spending cognitive processing capacity on integrating information across different web pages and representations seemed to have left little capacity for learning factual information from the same web pages and representations, and vice versa.

table 6. participant scores on the learning and integration measures
note. 1 the z-scores are based on comparison with the pre-reading topic knowledge, post-reading topic knowledge, and learning gain scores of a sample of 528 norwegian 10th and 11th graders (anmarkrud & ferguson, 2011). 2 the z-scores are based on comparison with a sample of 44 norwegian 10th graders (andresen et al., 2019).

table 7. representations drawn on in the integration task

finally, even though the eye-tracking data indicated that the four participants processed all the representations on all the web pages, there was a clear pattern regarding which representations the participants drew on in the multiple source integration task. as displayed in table 7, none of the participants drew on the text material when answering the multiple source integration task; they all based their oral responses on videos and, to some degree, information from the pictures. hence, since all the participants processed the texts, the inability to draw on information from the texts in the integration task could not be ascribed to a lack of processing of the texts, but rather to an inability to integrate the information extracted from the texts with information from other representations across the three web pages.

4. discussion

the present study examined how four adolescents with dyslexia processed information from multiple representations in a digital context and how they learned from and integrated information across different representations and web pages. our first research question concerned differences in processing patterns and whether differences in processing patterns represented differences with respect to deliberate strategic activity.
while all the participants provided reasons for the strategic approaches they chose when working with the learning materials, there seemed to be differences regarding the sophistication of these strategic approaches. two of the participants, participant 3 and participant 4, processed the various representations in the same order as they appeared on the web pages, with the rationale for this linear processing approach being that those who made the web pages probably had good reasons for the order in which the representations appeared. both participant 1 and participant 2, on the other hand, started with the text before processing the graphic representations. previous research has revealed that students with dyslexia may be aware of their reading problems but seem to have a limited strategic repertoire to compensate for these difficulties (furnes & norman, 2015). moreover, studies conducted with traditional paper-based reading have consistently found differences between students with and without reading difficulties regarding knowledge about which strategies to use and when to use them (e.g., baker & beall, 2009; furnes & norman, 2015; roeschl-heils, schneider, & van kraayenoord, 2003). in their review of the literature, anderson and armbruster (1984) reported that, compared to students without reading difficulties, students with reading difficulties were less likely to plan their reading ahead, integrate information, and reread text when they noticed comprehension problems. hence, it is interesting that only one of the participants (participant 2) explained a strategic approach by referring to the reading difficulties. this participant thus read the text first, and then examined pictures and videos to clarify misunderstandings that might have arisen while reading due to the reading difficulties.
although participant 1 did not verbalize any particular strategic reason for the meticulous reading, and rereading, of the texts on the three web pages, it is conceivable that this approach reflected the experience as a struggling reader and reasoning about what one could do to compensate for the difficulties. hence, inconsistent with what we expected, a strategic approach to the learning materials was taken by the two participants who started with text before moving towards the graphic representations, in accordance with a text-to-picture processing approach (hegarty & just, 1993; mason et al., 2017). the prioritization of text, regarding both the time spent with text and the decision to process text first, can also reflect the text superiority effect (e.g., corriveau, einav, robinson, & harris, 2014; einav, robinson, & fox, 2012; eyden, robinson, einav, & jaswal, 2013), which implies that children tend to put more trust in and emphasis on written information than on other types of information, especially in an academic context. the approaches of participant 3 and participant 4 can be described as less sophisticated and more passive. simply following the order of the web pages in a linear approach can reflect a type of “outsourcing” of the decisions regarding how to work with the learning materials to those who made the web pages. participants 2 and 3 read through the texts on the web pages quickly, as compared to what has been established as standards for students with dyslexia (prado et al., 2007; rayner et al., 2013). there are several reasons why one should be careful in interpreting this relatively fast reading pace as reflective of good word reading skills. first, the oft-cited standards are based on readers with dyslexia who are younger than those who participated in our study. second, previous research has found that the transparency of the orthography can influence fixation duration during reading.
the fact that norwegian has a relatively transparent orthography compared to the orthographies within which the standards have been established could be a reason for the slight mismatch between our reading data and the established standards. there are currently no standards for average fixation duration, number of fixations, or number of regressions based on reading in norwegian. third, previous research indicates that reading on a screen in and of itself can lead to a faster reading pace than reading on paper (e.g., trakhman, alexander, & berkowitz, in press; van de vijver & harsveld, 1994). our second research question concerned the relationship between processing patterns, individual differences in reading abilities and working memory, and learning and integration in a digital multimedia context. the results showed that three of the participants (1, 3, and 4) had substantial learning gains, also when compared to the learning gains of a norm sample of students without dyslexia. hence, these three participants were able to learn factual knowledge from representations and web pages, and this knowledge was sufficient to answer post-reading questions in a multiple-choice format. however, these participants were to a very limited degree able to integrate information across representations and web pages into a coherent mental representation that reconciled the opposing perspectives covered in the learning materials. participant 2, on the other hand, received a relatively good score on the integration task, also when compared to a norm sample previously responding to this task; however, this participant ended up with a small learning gain, both compared to the other participants in this case study and to the norm sample.
as previously described, learning from multimedia materials can place a high demand on the capacity-limited working memory system, and multimedia learning therefore is expected to be a challenge for students with dyslexia because they often have working memory problems. one plausible interpretation of our results is that none of the participants were able to learn information on a propositional level, necessary to answer the multiple-choice questions, and at the same time aggregate the main ideas from the representations and web pages into an integrated understanding of the issue. thus, it seemed that for these participants, the interaction with the multimedia materials resulted in an “either – or” learning outcome; either the details or the bigger picture, but not both. presumably, this result is due to the combination of reduced working memory capacity and word reading problems, with reading putting too heavy a load on an already limited working memory and leaving too little working memory capacity to both remember detailed information on a propositional level and integrate information (anmarkrud et al., 2018; hulme & snowling, 2009; melby-lervåg et al., 2012). accordingly, in previous research on paper-based reading, it has been found that both word recognition and comprehension processes compete for working memory capacity (e.g., stanovich, 1986). however, this study does not allow us to draw conclusions about whether it is dyslexia or limited working memory capacity that is causing the integration difficulties. in contrast to our results, as well as what has been found in previous research (e.g., alty et al., 2006; beacham & alty, 2006; maccullagh et al., 2017), knoop-van campen and colleagues (2018) did not find that readers with dyslexia had difficulties integrating information across different representations compared to peers without dyslexia.
one likely explanation for inconsistent results in research on multimedia learning among students with dyslexia is the age of participants. the participants in knoop-van campen and colleagues’ (2018) study were 11-year-old children, while the participants in this and other studies (alty et al., 2006; beacham & alty, 2006; maccullagh et al., 2017) have been university students or adolescents in upper-secondary school. among 11-year-old children, including children with dyslexia and their typically developing peers, working memory is not yet fully developed (schneider, 2011). hence, similarity in working memory capacity in the two groups (i.e., students with and without dyslexia) could have resulted in a lack of difference in multimedia learning. the large learning gain of participant 1 also requires some explanation. despite a very limited working memory capacity and severe reading difficulties, participant 1 ended up with the largest learning gain among the four participants, a learning gain that was also well above the mean learning gain of the norm sample consisting of students without dyslexia. knoop-van campen and colleagues (2018) found that self-paced work with learning materials allowed the participants with dyslexia in their study to spend the time necessary to learn the materials properly. thus, those authors found that the participants with dyslexia used significantly more time with the learning materials in the text condition compared to the control group, but this extra time washed out the expected differences in learning. in line with this, participant 1’s slow and meticulous processing and reprocessing of the materials seem to have compensated for the shortcomings regarding working memory capacity and reading abilities compared to the other three participants. 
finally, although eye-tracking data revealed that all the participants processed all representations on all pages, it is intriguing that none of the participants drew on the text material in their answers on the integration task. they were, however, able to draw on information from the text when answering the multiple-choice questions. this suggests that keeping information from the texts in working memory, and at the same time integrating this information with information from the other representations, is a major challenge for readers with dyslexia. it should be noted that we used a portable eye-tracker with a sampling rate of 60 hz and no chin rest, which is a somewhat lower sampling rate than the current standard in reading research. although this is a limitation of the present study, the relatively large areas of interest (i.e., paragraphs) used in the analyses of eye movements make it unlikely that the sampling rate had a large impact on the quality of the data.

5. conclusion

our comparative case study indicates that students with dyslexia may approach conflicting web pages containing various representations in different ways. additionally, while they may process information strategically to improve their learning, they may fail to integrate information across different web pages and representations. to the best of our knowledge, this is the first study of dyslexic readers’ integration of information across representations and web pages. the few studies that exist on the online reading of students with dyslexia have focused more on the learning outcomes of internet reading (castek et al., 2011), internet reading as a community of practice (henry et al., 2012), dyslexia-friendly interfaces (mccarthy & swierenga, 2010), and the difference between reading printed and digital text (schneps, thomson, chen, sonnert, & pomplun, 2013). hence, knowledge about how readers with dyslexia process information in a digital multimedia context is essentially lacking.
it should be noted that this lack of research makes it difficult to draw firm conclusions regarding readers with dyslexia based on previous work, since this group of readers is almost invisible in research on information processing in multimedia contexts. however, this study brings together two research fields that have traditionally been kept separate in research and theory, that is, the study of dyslexia and the study of learning in a multimedia environment. in the present information society, where the vast majority of adolescents, including those with dyslexia, use multimedia materials such as the internet as an important information source in their school work, this study represents a timely integration of important areas of research that, hopefully, will inspire much further work.

keypoints
students with dyslexia worked with multiple web pages and representations
eye-tracking and interviews showed differences in processing and strategy use
processing was related to individual differences and learning and integration
learning and integrating information at the same time were problematic

acknowledgments
thanks are due to shane colvin and arild moland for help in creating the learning materials.

references
abtahi, m.s. (2012). interactive multimedia learning object (imlo) for dyslexic children. procedia – social and behavioral sciences, 47, 1206-1210. doi: 10.1016/j.sbspro.2012.06.801
afflerbach, p., pearson, p.d., & paris, s.g. (2008). clarifying differences between reading skills and reading strategies. the reading teacher, 61, 364-373. doi: 10.1598/rt.61.5.1
ainsworth, s. (2018). multiple representations and multimedia learning. in f. fischer, c.e. hmelo-silver, s.r. goldman, & p. reimann (eds.), international handbook of the learning sciences (pp. 96-105). new york: routledge.
alty, j.l., al-sharrah, a., & beacham, n. (2006). when humans form media and media form humans: an experimental study examining the effects different digital media have on the learning outcomes of students who have different learning styles. interacting with computers, 18, 891-909. doi: 10.1016/j.intcom.2006.04.002
anderson, t.h., & armbruster, b.b. (1984). studying. in p.d. pearson, m. kamil, r. barr, & p. mosenthal (eds.), handbook of reading research (1st ed., pp. 657-679). white plains, ny: longman.
andresen, a., anmarkrud, ø., & bråten, i. (2019). investigating multiple source use among students with and without dyslexia. reading and writing, 32, 1149-1174. https://doi.org/10.1007/s11145-018-9904-z
anmarkrud, ø., brante, e.w., & andresen, a. (2018). potential processing challenges of internet use among readers with dyslexia. in j.l.g. braasch, i. bråten, & m.t. mccrudden (eds.), handbook of multiple source use (pp. 117-132). new york: routledge.
anmarkrud, ø., & ferguson, l.e. (2011). working memory and topic knowledge of norwegian 10th and 11th graders. unpublished data set. oslo: faculty of educational sciences, university of oslo.
austin, k.a. (2009). multimedia learning: cognitive individual differences and display design techniques predict transfer learning with multimedia learning modules. computers & education, 53, 1339-1354. doi: 10.1016/j.compedu.2009.06.017
avons, s.e., & hanna, c. (1995). the memory-span deficit in children with specific reading-disability – is speech rate responsible? british journal of developmental psychology, 13, 303-311. doi: 10.1111/j.2044-835x.1995.tb00681.x
ayres, p., & sweller, j. (2014). the split-attention principle in multimedia learning. in r.e. mayer (ed.), the cambridge handbook of multimedia learning (pp. 206-226). new york: cambridge university press.
azevedo, r., & cromley, j.g. (2004). does training on self-regulated learning facilitate students’ learning with hypermedia? journal of educational psychology, 96, 523-535. doi: 10.1037/0022-0663.96.3.523
baddeley, a. (1995). working memory. oxford: clarendon press.
baddeley, a.d. (2000). the episodic buffer: a new component of working memory? trends in cognitive sciences, 4, 417-423. doi: 10.1016/s1364-6613(00)01538-2
bahnmueller, j., huber, s., nuerk, h.c., göbel, s.m., & moeller, k. (2016). processing multi-digit numbers: a translingual eye-tracking study. psychological research, 80, 422-433. doi: 10.1007/s00426-015-0729-y
baker, l., & beall, l.c. (2009). metacognitive processes and reading comprehension. in s.e. israel & g.g. duffy (eds.), handbook of research on reading comprehension (pp. 373-388). new york: routledge.
barbosa, t., miranda, m.c., santos, r.f., & bueno, o.f.a. (2009). phonological working memory, phonological awareness, and language in literacy difficulties in brazilian children. reading and writing, 22, 201-218. doi: 10.1007/s11145-007-9109-3
barzilai, s., & ka’adan, i. (2017). learning to integrate divergent information sources: the interplay of epistemic cognition and epistemic metacognition. metacognition and learning, 12, 193-232. doi: 10.1007/s11409-016-9165-7
beacham, n.a., & alty, j.l. (2006). an investigation into the effects that digital media can have on the learning outcomes of individuals who have dyslexia. computers & education, 47, 74-93. doi: 10.1016/j.compedu.2004.10.006
berninger, v.w., raskind, w., richards, t., abbott, r., & stock, p. (2008). a multidisciplinary approach to understanding developmental dyslexia within working-memory architecture: genotypes, phenotypes, brain, and instruction. developmental neuropsychology, 33, 707-744. doi: 10.1080/87565640802418662
bishop, d.v.m., & snowling, m.j. (2004). developmental dyslexia and specific language impairment: same or different? psychological bulletin, 130, 858-886. doi: 10.1037/0033-2909.130.6.858
björnsson, c.h. (1968). läsbarhet [readability]. stockholm: liber.
borella, e., carretti, b., & pelegrina, s. (2010). the specific role of inhibition in reading comprehension in good and poor comprehenders. journal of learning disabilities, 43, 541-552. doi: 10.1177/0022219410371676
braasch, j.l.g., bråten, i., strømsø, h.i., & anmarkrud, ø. (2014). incremental theories of intelligence predict multiple document comprehension. learning and individual differences, 31, 11-20. doi: 10.1016/j.lindif.2013.12.012
bråten, i., anmarkrud, ø., brandmo, c., & strømsø, h.i. (2014). developing and testing a model of direct and indirect relationships between individual differences, processing, and multiple-text comprehension. learning and instruction, 30, 9-24. doi: 10.1016/j.learninstruc.2013.11.002
bråten, i., braasch, j.l.g., & salmerón, l. (in press). reading multiple and non-traditional texts: new opportunities and new challenges. in e.b. moje, p. afflerbach, p. enciso, & n.k. lesaux (eds.), handbook of reading research (vol. v). new york: routledge.
butcher, k.r. (2014). the multimedia principle. in r.e. mayer (ed.), the cambridge handbook of multimedia learning (2nd ed., pp. 174-205). new york: cambridge university press.
campbell, s. (2012). comparative case study. in a.j. mills, g. durepos, & e. wiebe (eds.), encyclopedia of case study research (pp. 175-176). thousand oaks, ca: sage.
castek, j., zawilinski, l., mcverry, j.g., o'byrne, w.i., & leu, d.j. (2011). the new literacies of online reading comprehension: new opportunities and challenges for students with learning difficulties. in c. wyatt-smith, j. elkins, & s. gunn (eds.), multiple perspectives on difficulties in learning literacy and numeracy (pp. 91-110). new york: springer.
chan, e., & unsworth, l. (2011). image-language interaction in online reading environments: challenges for students' reading comprehension. australian educational researcher, 38, 181-202. doi: 10.1007/s13384-011-0023-y
cho, b.-y., & afflerbach, p. (2017). an evolving perspective of constructively responsive reading comprehension strategies in multilayered digital text environments. in s.e. israel (ed.), handbook of research on reading comprehension (2nd ed., pp. 109-134). new york: guilford.
cho, b.-y., afflerbach, p., & han, h. (2018). strategic processing in accessing, comprehending, and using multiple sources online. in j.l.g. braasch, i. bråten, & m.t. mccrudden (eds.), handbook of multiple source use (pp. 133-150). new york: routledge.
cho, b.-y., woodward, l., & li, d. (2017). examining adolescents' strategic processing during online reading with a question generating task. american educational research journal, 54, 691-724. doi: 10.3102/0002831217701694
clark, j.m., & paivio, a. (1991). dual coding theory and education. educational psychology review, 3, 149-210. doi: 10.1007/bf01320076
corriveau, k.h., einav, s., robinson, e.j., & harris, p.l. (2014). to the letter: early readers trust print-based over oral instructions to guide their actions. british journal of developmental psychology, 32, 345-358. doi: 10.1111/bjdp.12046
cuevas, h.m., fiore, s.m., & oser, r.l. (2002). scaffolding cognitive and metacognitive processes in low verbal ability learners: use of diagrams in computer based training environments. instructional science, 30, 433-464. doi: 10.1023/a:1020516301541
daneman, m., & carpenter, p.a. (1980). individual differences in working memory and reading. journal of verbal learning and verbal behavior, 19, 450-466. doi: 10.1016/s0022-5371(80)90312-6
de oliveira, d.g., da silva, p.b., dias, n.m., seabra, a.g., & macedo, e.c. (2014). reading component skills in dyslexia: word recognition, comprehension, and processing speed. frontiers in psychology, 5, 1339. doi: 10.3389/fpsyg.2014.01339
deschryver, m. (2015). higher order thinking in an online world: toward a theory of web-mediated knowledge synthesis. teachers college record, 116, 1-44. http://www.tcrecord.org id number: 17692
einav, s., robinson, e.j., & fox, a. (2012). take it as read: origins of trust in knowledge gained from print. journal of experimental child psychology, 114, 262-274. doi: 10.1016/j.jecp.2012.09.016
eitel, a., & scheiter, k. (2015). picture or text first? explaining sequence effects when learning with pictures and text. educational psychology review, 27, 153-180. doi: 10.1007/s10648-014-9264-4
eitel, a., scheiter, k., & schüler, a. (2013). how inspecting a picture affects processing of text in multimedia learning. applied cognitive psychology, 27, 451-461. doi: 10.1002/acp.2922
eitel, a., scheiter, k., schüler, a., nyström, m., & holmqvist, k. (2013). how a picture facilitates the process of learning from text: evidence for scaffolding. learning and instruction, 28, 48-63. doi: 10.1016/j.learninstruc.2013.05.002
eyden, j., robinson, e.j., einav, s., & jaswal, v.k. (2013). the power of print: children’s trust in unexpected printed suggestions. journal of experimental child psychology, 116, 593-608. doi: 10.1016/j.jecp.2013.06.012
fenesi, b., kramer, e., & kim, j.a. (2016). split-attention and coherence principles in multimedia instruction can rescue performance for learners with lower working memory capacity. applied cognitive psychology, 30, 691-699. doi: 10.1002/acp.3244
ferguson, l.e., & bråten, i. (2013). student profiles of knowledge and epistemic beliefs: changes and relations to multiple-text comprehension. learning and instruction, 25, 49-61. doi: 10.1016/j.learninstruc.2012.11.003
fischbach, a., könen, t., rietz, c.s., & hasselhorn, m. (2014). what is not working in working memory of children with literacy disorders? evidence from a three-year-longitudinal study. reading and writing, 27, 267-286. doi: 10.1007/s11145-013-9444-5
follmer, d.j. (2018). executive function and reading comprehension: a meta-analytic review. educational psychologist, 53, 42-60. doi: 10.1080/00461520.2017.1309295
fox, e., & alexander, p.a. (2017). text and comprehension. in s.e. israel (ed.), handbook of research on reading comprehension (2nd ed., pp. 335-352). new york: guilford.
furnes, b., & norman, e. (2015). metacognition and reading: comparing three forms of metacognition in normally developing readers and readers with dyslexia. dyslexia, 21, 273-284. doi: 10.1002/dys.1501
gathercole, s.e., alloway, t.p., willis, c., & adams, a.-m. (2006). working memory in children with reading disabilities. journal of experimental child psychology, 93, 265-281. doi: 10.1016/j.jecp.2005.08.003
gerjets, p., scheiter, k., opfermann, m., hesse, f.w., & eysink, t.h. (2009). learning with hypermedia: the influence of representational formats and different levels of learner control on performance and learning behavior. computers in human behavior, 360-370. doi: 10.1016/j.chb.2008.12.015
greene, j.a., moos, d.c., azevedo, r., & winters, f.i. (2008). exploring differences between gifted and grade-level students’ use of self-regulatory learning processes with hypermedia. computers & education, 50, 1069-1083. doi: 10.1016/j.compedu.2006.10.004
harm, m.v., & seidenberg, m.s. (1999). phonology, reading acquisition, and dyslexia: insights from connectionist models. psychological review, 106, 491-528. doi: 10.1037/0033-295x.106.3.491
hegarty, m., & just, m.a. (1993). constructing mental models of machines from text and diagrams. journal of memory and language, 32, 717-742. doi: 10.1006/jmla.1993.1036
henry, l.a., castek, j., o'byrne, w.i., & zawilinski, l. (2012). using peer collaboration to support online reading, writing, and communication: an empowerment model for struggling readers. reading & writing quarterly, 28, 279-306. doi: 10.1080/10573569.2012.676431
høien, t. (2014). logos teoribasert diagnostisering av lesevansker [logos theory based assessment of reading difficulties]. bryne, norway: logometrica.
hulme, c., & snowling, m.j. (2009). developmental disorders of language learning and cognition. chichester: wiley-blackwell.
irrazabal, n., saux, g., & burin, d. (2016). procedural multimedia presentations: the effects of working memory and task complexity on instruction time and assembly accuracy. applied cognitive psychology, 30, 1052-1060. doi: 10.1002/acp.3299
jeffries, s., & everatt, j. (2004). working memory: its role in dyslexia and other specific learning disabilities. dyslexia, 10, 196-214. doi: 10.1002/dys.278
johnson, c.i., & mayer, r.e. (2012). an eye movement analysis of the spatial contiguity effect in multimedia learning. journal of experimental psychology: applied, 18, 178-179. doi: 10.1037/a0026923
just, m.a., & carpenter, p.a. (1992). a capacity theory of comprehension: individual differences in working memory. psychological review, 99, 122-149. doi: 10.1037/0033-295x.99.1.122
kammerer, y., meier, n., & stahl, e. (2016). fostering secondary-school students' intertext model formation when reading a set of websites: the effectiveness of source prompts. computers & education, 102, 52-64. doi: 10.1016/j.compedu.2016.07.001
kingsley, t., & tancock, s. (2013). internet inquiry: fundamental competencies for online comprehension. the reading teacher, 67, 389-399. doi: 10.1002/trtr.1223
klinkenberg, j.e., & skaar, e. (2003). stas: standardisert test i avkoding og staving [stas: standardized test of decoding and spelling]. hønefoss, norway: ringerike ppt.
knoop-van campen, c.a.n., segers, e., & verhoeven, l. (2018). the modality and redundancy effects in multimedia learning in children with dyslexia. dyslexia, 24, 140-155. doi: 10.1002/dys.1585
kurby, c.a., britt, m.a., & magliano, j.p. (2005). the role of top-down and bottom-up processes in between-text integration. reading psychology, 26, 335-362. doi: 10.1080/02702710500285870
leu, d.j., kiili, c., & forzani, e. (2016). individual differences in the new literacies of online research and comprehension. in p. afflerbach (ed.), handbook of individual differences in reading (pp. 259-272). new york: routledge.
lyon, g.r., shaywitz, s.e., & shaywitz, b.a. (2003). a definition of dyslexia. annals of dyslexia, 53, 1-14. doi: 10.1007/s11881-003-0001-9
maccullagh, l., bosanquet, a., & badcock, n. (2017). university students with dyslexia: a qualitative exploratory study of learning practices, challenges, and strategies. dyslexia, 23, 3-23. doi: 10.1002/dys.1544
mason, l., junyent, a.a., & tornatora, m.c. (2014). epistemic evaluation and comprehension of web-source information on controversial science-related topics: effects of a short-term instructional intervention. computers & education, 76, 143-157. doi: 10.1016/j.compedu.2014.03.016
mason, l., pluchino, p., & tornatora, m.c. (2016). using eye-tracking technology as an indirect instruction tool to improve text and picture processing and learning. british journal of educational technology, 47, 1083-1095. doi: 10.1111/bjet.12271
mason, l., scheiter, k., & tornatora, m.c. (2017). using eye-movements to model the sequence of text-picture processing for multimedia comprehension. journal of computer assisted learning, 33, 443-460. doi: 10.1111/jcal.12191
mayer, r.e. (2001). multimedia learning. new york: cambridge university press.
mayer, r.e. (2003). the promise of multimedia learning: using the same instructional design methods across different media. learning and instruction, 13, 125-139. doi: 10.1016/s0959-4752(02)00016-6
mayer, r.e. (2014a). introduction to multimedia learning. in r.e. mayer (ed.), the cambridge handbook of multimedia learning (pp. 1-24). new york: cambridge university press.
mayer, r.e. (2014b). cognitive theory of multimedia learning. in r.e. mayer (ed.), the cambridge handbook of multimedia learning (pp. 43-71). cambridge: cambridge university press.
mayer, r.e. (ed.) (2014c). the cambridge handbook of multimedia learning. new york: cambridge university press.
mayer, r.e., heiser, h., & lonn, s. (2001). cognitive constraints on multimedia learning: when presenting more material results in less understanding. journal of educational psychology, 93, 187-198. doi: 10.1037//0022-0663.93.1.187
mayer, r.e., & moreno, r. (2010). techniques that reduce extraneous cognitive load and manage intrinsic cognitive load during multimedia learning. in j.l. plass, r. moreno, & r. brünken (eds.), cognitive load theory (pp. 131-152). new york: cambridge university press.
mccarthy, j.e., & swierenga, s.j. (2010). what we know about dyslexia and web accessibility: a research review. universal access in the information society, 9, 147-152. doi: 10.1007/s10209-009-0160-5
melby-lervåg, m., lyster, s.a.h., & hulme, c. (2012). phonological skills and their role in learning to read: a meta-analytic review. psychological bulletin, 138, 322-352. doi: 10.1037/a0026744
menghini, d., finzi, a., carlesimo, g.a., & vicari, s. (2011). working memory impairment in children with developmental dyslexia: is it just a phonological deficit? developmental neuropsychology, 36, 199-213. doi: 10.1080/87565641.2010.549868
moreno, r., & mayer, r.e. (2000). engaging students in active learning: the case for personalized multimedia messages. journal of educational psychology, 92, 724-733. doi: 10.1037//0022-0663.92.4.724
moreno, r., & mayer, r.e. (2002). verbal redundancy in multimedia learning: when reading helps listening. journal of educational psychology, 94, 156-163. doi: 10.1037//0022-0663.94.1.156
myers, j.l., & o'brien, e.j. (1998). accessing the discourse during reading. discourse processes, 26, 131-157. doi: 10.1080/01638539809545042
paivio, a. (1971). imagery and verbal processes. new york: oxford university press.
paivio, a. (1986). mental representations: a dual-coding approach. new york: oxford university press.
pociask, f.d., & morrison, g.r. (2008). controlling split attention and redundancy in physical therapy instruction. educational technology research and development, 56, 379-399. doi: 10.1007/s11423-007-9062-5
prado, c., dubois, m., & valdois, s. (2007). the eye movements of dyslexic children during reading and visual search: impact of the visual attention span. vision research, 47, 2521-2530. doi: 10.1016/j.visres.2007.06.001
ramus, f., rosen, s., dakin, s.c., day, b.l., castellote, j.m., white, s., & frith, u. (2003). theories of developmental dyslexia: insights from a multiple case study of dyslexic adults. brain, 126, 841-865. doi: 10.1093/brain/awg076
rau, a.k., moll, k., snowling, m.j., & landerl, k. (2015). effects of orthographic consistency on eye movement behavior: german and english children and adults process the same words differently. journal of experimental child psychology, 130, 92-105. doi: 10.1016/j.jecp.2014.09.012
rayner, k., ardoin, s.p., & binder, k.s. (2013). children's eye movements in reading: a commentary. school psychology review, 42, 223-233.
rayner, k., chace, k.h., slattery, t.j., & ashby, j. (2006). eye movements as reflections of comprehension processes in reading. scientific studies of reading, 10, 241-255. doi: 10.1207/s1532799xssr1003_3
rieber, l.p., tzeng, s.-c., & tribble, k. (2004). discovery learning, representation, and explanation within a computer-based simulation: finding the right mix. learning and instruction, 14, 307-323. doi: 10.1016/j.learninstruc.2004.06.008
roca, j., tejero, p., & insa, b. (2018). accident ahead? difficulties of drivers with and without reading impairment recognizing words and pictograms in variable message signs. applied ergonomics, 67, 83-90. doi: 10.1016/j.apergo.2017.09.013
roeschl-heils, a., schneider, w., & van kraayenoord, c.e. (2003). reading, metacognition, and motivation: a follow-up study of german students in grades 7 and 8. european journal of psychology of education, 18, 75-86. doi: 10.1007/bf03173605
rouet, j.-f., & britt, m.a. (2014). multimedia learning from multiple documents. in r.e. mayer (ed.), the cambridge handbook of multimedia learning (2nd ed., pp. 813-841). new york: cambridge university press.
rukavina, i., & daneman, m. (1996). integration and its effect on acquiring knowledge about competing scientific theories from text. journal of educational psychology, 88, 272-287. doi: 10.1037/0022-0663.88.2.272
salmerón, l., strømsø, h.i., kammerer, y., stadtler, m., & van den broek, p. (2018). comprehension processes in digital reading. in m. barzillai, j. thomson, s. schroeder, & p. van den broek (eds.), learning to read in a digital world (pp. 91-120). amsterdam: john benjamins.
schneider, w. (2011). memory development in childhood. in u. goswami (ed.), the wiley-blackwell handbook of childhood cognitive development (2nd ed., pp. 347-376). malden, ma: wiley-blackwell.
schneps, m.h., thomson, j.m., chen, c., sonnert, g., & pomplun, m. (2013). e-readers are more effective than paper for some with dyslexia. plos one, 8, e75634. doi: 10.1371/journal.pone.0075634
schotter, e.r., tran, r., & rayner, k. (2014). don’t believe what you read (only once): comprehension is supported by regressions during reading. psychological science, 25, 1218-1226. doi: 10.1177/0956797614531148
schüler, a., scheiter, k., & van genuchten, e. (2011). the role of working memory in multimedia instruction: is working memory working during learning from text and pictures? educational psychology review, 23, 389-411. doi: 10.1007/s10648-011-9168-5
shaywitz, s.e., & shaywitz, b.a. (2008). paying attention to reading: the neurobiology of reading and dyslexia. development and psychopathology, 20, 1329-1349. doi: 10.1017/s0954579408000631
smith-spark, j.h., & fisk, j.e. (2007). working memory functioning in developmental dyslexia. memory, 15, 34-56. doi: 10.1080/09658210601043384
stanovich, k.e. (1986). matthew effects in reading: some consequences of individual differences in the acquisition of literacy. reading research quarterly, 21, 360-407. doi: 10.1598/rrq.21.4.1
swanson, h.l., & trahan, m.f. (1992). learning disabled readers' comprehension of computer mediated text: the influence of working memory, metacognition, and attribution. learning disabilities research and practice, 7, 74-86.
sweller, j., ayres, p., & kalyuga, s. (2011). cognitive load theory. new york: springer.
taylor, m., duffy, s., & hughes, g. (2007). the use of animation in higher education teaching to support students with dyslexia. education + training, 49, 25-35. doi: 10.1108/00400910710729857
torcasio, s., & sweller, j. (2010). the use of illustrations when learning to read: a cognitive load theory approach. applied cognitive psychology, 24, 659-672. doi: 10.1002/acp.1577
torgesen, j.k. (2001). the theory and practice of intervention: comparing outcomes from prevention and remediation studies. in a. fawcett & r. nicolson (eds.), dyslexia: theory and good practice (pp. 185-201). london: fulton.
torgesen, j.k., alexander, a.w., wagner, r.k., rashotte, c.a., voeller, k., conway, t., et al. (2001). intensive remedial instruction with severe reading disabilities: immediate and long-term outcomes from two instructional approaches. journal of learning disabilities, 34, 33-58. doi: 10.1177/002221940103400104
trakhman, l.m.s., alexander, p.a., & berkowitz, l.e. (in press). effects of processing time on comprehension and calibration in print and digital mediums. the journal of experimental education. advance online publication. doi: 10.1080/00220973.2017.1411877
van de vijver, f.j., & harsveld, m. (1994). the incomplete equivalence of the paper-and-pencil and computerized versions of the general aptitude test battery. journal of applied psychology, 79, 852-859. doi: 10.1037/0021-9010.79.6.852
van rooy, b., & pretorius, e.j. (2013). is reading in an agglutinating language different from an analytic language? an analysis of isizulu and english reading based on eye movements. southern african linguistics and applied language studies, 31, 281-297.
frontline learning research 5 (2014) 115-139
issn 2295-3159

corresponding author: kerstin helker, institute for education, rwth aachen university, eilfschornsteinstraße 7, 52056 aachen, germany, email: kerstin.helker@rwth-aachen.de
doi: http://dx.doi.org/10.14786/flr.v2i3.99

responsibility in the school context – development and validation of a heuristic framework

kerstin helker a, marold wosnitza a,b

a rwth aachen university, germany
b murdoch university, perth, australia

article received 27 march 2014 / revised 25 may 2014 / accepted 23 june 2014 / available online 1 july 2014

abstract

existing research has identified feelings of responsibility as having major motivational implications for a person’s actions. a person identifying as being responsible for a certain task will perceive themselves as self-determined and thus invest considerable effort in the task. despite being conceptualised as an individual’s sense of internal obligation, responsibility in everyday contexts is often attributed by and to other people. different perspectives on responsibility may, however, not always overlap, especially in the school context where tasks and liabilities often remain ill-defined. this paper thus presents a framework of responsibility in the school context which assumes teachers, students and parents to share a certain number of microsystems which may (indirectly) influence one another.
in order to test the usefulness of the proposed framework, a series of studies were conducted collecting data on teachers’, students’ and parents’ views of their own and one another’s responsibility in the school context. a total of 4339 statements were assigned to categories representing different parts of the framework and revealed its usefulness for describing the complexity of responsibility attributions and their influences in the school context. findings show that the framework will be helpful to embrace existing research and develop questions for further research that address central educational issues such as student and teacher motivation, teacher burnout as well as prerequisites for students’ high or low achievement.

keywords: teacher responsibility; student responsibility; parent responsibility; school context

in the last decade, the extension of demands on schools has led to an extensive discussion on the particular competencies and responsibilities of stakeholders in the school context. the challenge is that most specific responsibilities, due to their complexity and the fact that tasks and liabilities in schools are often ill-defined (fischman, dibara, & gardner, 2006), often cannot clearly be attributed to one specific person or group of people. to further complicate this, the absence of an agreed-upon definition of the term responsibility can lead to conflicts between stakeholders perceiving their own and others’ responsibility differently (lauermann & karabenick, 2011). conflicts are especially likely to occur between teachers, students and parents, all emphasizing different goals and judging their own and others’ responsibility against the background of their own sphere of experience. a review of the empirical work on responsibility in the school context underlines the complexity of the concept responsibility (author/s).
it shows that when teachers, parents and students talk about students’ learning and achievement they assign the same responsibility differently to each other. it furthermore shows that the context or the cultural setting in which this responsibility attribution takes place plays a significant role. the interplay between the stakeholders’ attributions of responsibility is still underresearched. thus, this paper aims to examine how teachers, students and parents attribute responsibility to themselves and one another, to disentangle these often implicit and confused responsibility attributions, and represent them in a heuristic framework.

1. responsibility

despite being used in a multitude of contexts and sometimes being considered a “core concept of social life” (hamilton, 1978, p. 326), the term responsibility remains unclear (del schalock, 1998; fischman et al., 2006; maulbetsch, 2010). the multitude of perspectives from which responsibility has generally been studied indicates the fluid nature of the concept (lauermann & karabenick, 2011), with perspectives ranging from conceptualising responsibility as a relatively stable disposition of a person (bierhoff, 2000) to the interrelation between personal sense of responsibility and locus of control (guskey, 1981, 1982; rose & medway, 1981a, 1981b). due to this diversity of theoretical perspectives, responsibility in the literature is often conceptualised as a multirelational construct of at least three components, which in each context are engaged differently: somebody is responsible for something under supervision or judgment of some kind of sanctioning instance (auhagen, 1999; auhagen & bierhoff, 2001; bayertz, 1995; grotlüschen, 2008; höffe, 2008; schleißheimer, 1984). this judging instance can take many forms, ranging from a court to the internal conscience.
one of the most elaborated constructs of responsibility is lenk’s (lenk, 1992; lenk & maring, 1993) six-component model asking: who (subject of responsibility) is attributed responsibility for what (object of responsibility), in view of whom (addressee) by whom (judging instance) in relation to what (normative) criteria and in what realm (of responsibility or action)? this construct of responsibility was taken up by lauermann and karabenick (2011) who studied the components and theoretical status of teacher responsibility in order to tease out the complexity of its different meanings. one basic aspect of their work was the distinction of responsibility from accountability, with the latter being an explicit, formal attribution of tasks. responsibility was defined as “a sense of internal obligation and commitment to produce or prevent designated outcomes or that these outcomes should have been produced or prevented” (lauermann & karabenick, 2011, p. 135). this definition accommodates the two perspectives implied in responsibility attributions: the retrospective (that something should have happened), which is often linked to questions of fault or guilt (weiner, 1995), and the prospective view (that something should happen), denoting a subject’s obligations for certain people, things or states (werner, 2006). these prospective responsibilities can furthermore emerge in two different ways. although some researchers assume feelings of responsibility to be the result only of a personal disposition (bierhoff, 1995, 2000) or of social attributions (bayertz, 1995), it is generally assumed that responsibility can result either from attributions by other people or from a person’s own sense of obligation (auhagen, 1999; bacon, 1991; kammerl, 2008; kaufmann, 1995). these two perspectives very often overlap, especially when it comes to rather ill-defined tasks like the teaching profession (fischman et al., 2006).
the extensive discussion of teachers’ professional behaviour and ethics shows this lack of conventional means for defining teacher responsibility. often, teachers only face a broad description of the field of activity and are (sometimes even contractually) attributed the paramount responsibility to define their specific tasks and what they feel responsible for (werner, 2006). based on self-determination theory, this could be considered positive, as it can be assumed that an internal sense of responsibility evokes more positive motivational responses: a person who perceives themselves as self-determined shows enhanced engagement (berkowitz & daniels, 1963; ryan & deci, 2000) and work satisfaction (müller, 2009), whereas people who are only attributed responsibility by external instances would have to be controlled for compliance (lauermann & karabenick, 2011). assuming much responsibility in response to broad or non-existing guidelines for action has, however, been hypothesised to cause burnout (fischman et al., 2006).

2. responsibility in school

the above indicates the relevance of discussing responsibility in relation to teaching and learning in schools today. by acting as a teacher, a person is, as in any other job, attributed a specific task responsibility whose nature is determined by the specific role this person incorporates (leithwood, edge, & jantzi, 1999). due to teachers’ tasks and liabilities being rather ill-defined (feiks, 1992; fischman et al., 2006; pätzold, 2008; tenorth, 2004), teachers are left to define what they assume themselves, and others respectively, to be responsible for. students and their parents, in return, can be assumed to also go through the process of defining their own and others’ responsibilities, which again influences teachers (fischman et al., 2006), who according to feiks (1992) are expected to do more than just fulfil their explicitly set duties.
empirical research up to this point, however, seems to have been guided by the role of teachers as the only bearer of responsibility in the classroom (bastian, 1995; del schalock, 1998; eikenbusch, 2009) and has strongly focused its attention on teacher responsibility, linking this research field to aspects like sources of teacher responsibility, contextual influences on perceptions of responsibility (responsibility as a social, situational phenomenon) and limitations of responsible actions. students’ and parents’ responsibility mostly served as confinements of teacher responsibility rather than being studied for their own sake. regarding teacher responsibility, some studies focused on general objects of teacher responsibility (bourke, 1990), which they found to be centred around preparation and structuring learning materials, while others specifically studied teachers’ sense of personal responsibility for their students’ educational outcomes (bracci, 2009; halvorsen, lee, & andrade, 2009; matteucci & gosling, 2004; potvin & papillon, 1992). results showed that teachers were more ready to assume responsibility for their students’ success than failure, with responsible teachers being better prepared and attending more advanced training units while also experiencing more support and encouragement by school administrators. the direct school setting, its socioeconomic background (diamond, randolph, & spillane, 2004), size (lee & loeb, 2000) and perceived family influences (thrupp, mansell, hawksworth, & harold, 2003) but also more remote factors like the cultural influences on teachers’ perception of their professional identity (barrett, 2005; karakaya, 2004) were found to affect teachers’ perceptions and perceived limitations of their responsibility. fischman, dibara and gardner (2006), however, found that teachers, compared to other professions, were more ready to take responsibility.
they generally counteract the lack of clear instructions and norms of behaviour by steadily focusing their sense of responsibility on their students, whom they perceive to be the primary addressees of their responsibility. teachers report meeting the higher academic, social, emotional and developmental demands of students that result from problematic environments by expanding their sphere of action and sense of responsibility. this behaviour could be assumed to deprive students and parents of their responsibilities, which to some might be considered a positive development. research has shown that student responsibility is widely understood to be limited to cooperative and social behaviour in the classroom (lewis, 2001), meeting expectations and learning goals (bryan & mclaughlin, 2005) and basically “doing the work” and “obeying the rules” (bacon, 1993). the students in bacon’s study, being asked about what they thought they were responsible for, indicated that they mostly felt held responsible rather than having feelings of personal responsibility, which, based on ryan and deci (2000), can be assumed to deter students from developing feelings of self-determination. in contrast to these findings, zimmerman and kitsantas (2005) found students in self-regulated learning settings to generally rate their abilities higher and attribute more responsibility to learners than teachers. these students did not limit their responsibility to classroom learning, as was done in other studies, but also felt responsible for contextual factors outside the classroom that might indirectly influence their learning. one major factor in this respect, acknowledged by all three groups (teachers, students and parents), is how central a student’s parents are in supporting their child’s school work (ballard & bates, 2008).
despite emphasising the importance of parent involvement, only few teachers report feeling responsible for establishing connections with parents; rather, they hold parents responsible for getting engaged in school matters (ramirez, 1999). such views have, however, been found to also be context-specific, as in china (katyal & evers, 2007) as well as turkey (korkmaz, 2007) parents are expected to provide a loving and supportive home in which the child is well cared for and thereby equipped for school, but leave educational matters to professional educators such as teachers. thus, parent involvement in school is neither supported nor expected. to date, to our knowledge, only one study, conducted in new zealand, has examined all three perspectives, i.e. teachers’, students’ and parents’ perceptions of their own and others’ (retrospective) responsibility. peterson, rubie-davies, elley-brown, widdowson, dixon and irving (2011) found students to view themselves as most responsible for their learning outcomes. students, however, indicate influences by (for example more or less sympathetic) teachers and their parents’ responsibility for supporting them and providing a stimulating learning environment. while interviewed parents shared this view, teachers emphasised students’ responsibility for their motivation and success as well as contextual matters of school facilities and resources that might influence learning – and enable teachers to deny their responsibilities and attribute them to others. in sum, a review of the empirical literature regarding teacher, student and parent responsibility showed that these three central agents in schools assume or are attributed specific prospective responsibilities, some of which only become apparent in retrospect (helker & wosnitza, 2014).
this entails potential for conflict as it is the nature of things that people can only be made accountable for issues they knew beforehand they were responsible for – what you do not know, you cannot take or be attributed responsibility for (lenk, 1992, p. 10). prior research revealed that most of these three major agents in schools direct their behaviour to those things they personally feel responsible for. own responsibilities are outlined by attributing all remaining tasks to other agents. the perception of one’s own and others’ prospective and retrospective responsibility is a highly individual matter (gärtner, 2010) which is influenced by the subjective perception of the importance of specific tasks and of situational factors. as these views are seldom openly addressed and negotiated, different perspectives are likely to not always overlap, which might generate conflicts when conflicting goals are emphasized (lauermann & karabenick, 2011). furthermore, empirical research suggests the importance of the role of context and interactions between agents, as findings show perceptions of own and others’ responsibility to strongly vary with the (national, economic, social etc.) setting (e.g., katyal & evers, 2007; korkmaz, 2007). the following section will present existing models of context which have in the past been applied to (the analysis of) schools’ working and learning processes and appear relevant for describing responsibility in school.

3. relevance of context for describing responsibility in school

a number of models of context have been put forward in existing literature focusing on the multiple aspects of context (see wosnitza & beltman, 2012 for an overview). wosnitza and beltman (2012) developed a model for analysing the context of specific situations with regard to the level of interaction, perspective (subjective/objective) and content (social/physical/formal).
the aspect of level of interaction in this model is, as in most other work relating to context, conceptualised closely along the lines of bronfenbrenner’s model of the ecological environment. despite covering the aspect of differing perspectives people may hold on various levels of context, the model focuses on explaining the context of a specific situation rather than describing interrelations between different agents. gurtner, monnard and genoud (2001) applied a model of the school context to explore its impact on students’ motivation. drawing on the model’s notion of indirect as well as bidirectional influence between the person and their environment, these authors highlighted that “two students placed in an apparently identical situation may react to it differently since the context in which each one will embed that situation might be quite different.” (p. 191). representing the nature of partnerships and relationships between schools, families and communities, joyce l. epstein’s (2011) framework of overlapping spheres of influence has become widely acknowledged and applied especially to discussions of questions regarding parental involvement in schools (e.g., galindo & sheldon, 2012; katyal & evers, 2007; lawson, 2003). based on prior research showing that parents can influence student educational outcomes and achievement (e.g., leichter, 1974; lightfoot, 1978; marjoribanks, 1979), epstein (2011) developed a model of school, family and community partnerships which she applied to the development of research questions as well as strategies for action in improving those partnerships. in this model, school, family and community are represented as three spheres of context which overlap to a degree that is determined by external forces and internal actions (e.g., time, backgrounds, actions taken in families and schools).
these spheres can have unique and also combined influences on children through the interactions of parents, teachers, students and community partners. taking action for bringing together the different partners and thus enlarging the overlap between these spheres is considered to help identify shared responsibilities of home, school, and community and to increase positive influences on children. also, the degree of overlap obscures boundaries between school and family, so that the influence of one sphere can still be at work while the student is involved in the other (epstein, 2011). in proposing this framework, epstein called for the recognition of shared goals and responsibilities for the socialisation and education of the child (epstein, 2011, p. 26) and for researchers to recognise schools’, homes’ and communities’ simultaneous and cumulative effects on student development and learning (epstein & sheldon, 2006). while epstein strongly focused on what strategies could be applied by schools and educators to establish functioning and reliable partnerships with their students’ families and communities, other researchers have emphasised the influences between these spheres. christenson (2004) pointed out that different antecedents may result in the same outcome (equifinality) while similar initial conditions may still lead to dissimilar results (multifinality). applying bronfenbrenner’s model of the ecological environment (1979), she denied the possibility of developing “uniform prescriptions” for involving parents to improve students’ school performance, as interfaces of home and school may be variably overlapping (p. 87). to sum up, existing models can partially account for attributions of responsibility and how responsible or irresponsible actions of teachers, students and parents influence what happens in the specific or related contexts.
nevertheless, to the best of our knowledge, no work has been published which developed a theoretical background for research presenting the different agents in the school context and as context for schools and one another. up to now, the different research perspectives appear isolated and incommensurate, thereby impeding a broad understanding of the phenomenon of responsibility in the school context. thus, in the following, a heuristic framework shall be presented which draws on the models of context already presented in order to comprehensively describe and structure responsibility in the school context. in order to validate this framework, empirical data will be applied to support its different components.

4. towards a framework of responsibility in the school context

based on the above, we propose a heuristic framework for representing the origins and impacts of responsibility attributions in the school context. teachers, students and parents attribute responsibility to themselves and the other agents in the school context to prevent or produce certain outcomes (prospective) based on their perception of the respective (professional) roles, context and individual spheres of action. responsibility attributed from external sources does not automatically imply an internal sense of responsibility, because often no comparisons are made between different perspectives, which can evoke differences in (the possibility of) retrospective attributions of responsibility. furthermore, attributed responsibilities may not overlap due to different perspectives on the context and the settings in which a specific person is involved. regarding the individual spheres of action, applying bronfenbrenner’s (1979) model of the ecological environment to responsibility attributions in the school context, we propose that teachers, students and parents engage in several subcontexts, i.e. microsystems, which sum up to constitute this person’s mesosystem.
when it comes to their school-related activities, teachers, students and parents share a specific number of the microsystems in which they are involved, i.e. interfaces of their mesosystems (general sphere of action). a mathematics teacher, for example, shares the microsystem ‘math lesson’ with their students, while he or she might not be involved in the microsystem ‘english lesson’, which the students share with somebody else. while teachers’ and parents’ mesosystems overlap on parents’ days, parents, just like any other of the named agents, are involved in many other microsystems that are not school-related and that none of the other agents is part of, for example home, work or free-time activities. actions that are located not in one of the shared microsystems these interfaces comprise but in exosystems (e.g., events outside school) might, however, have an indirect, yet considerable effect on what happens there. a parent-teacher talk might be a microsystem that the teacher and a student’s parents share, in which the student is not involved but by which he or she will certainly be affected. it thus represents an exosystem for the student. while this example is quite obvious, it could be assumed that many other exosystems influence school microsystems without the agents sharing these microsystems being aware of them. all of the above described levels of context are embedded in the macrocontext, which could be the cultural setting or the socioeconomic background of the school or family. in sum, the above assumptions allow for the following conclusions. the school context is understood as consisting of a multitude of microsystems which are determined by the agents involved. due to their bidirectional influence with the environment, in the school context, multiple actors function as the context for one another.
thus, certain educational outcomes like students’ success and failure in school cannot be traced back to one specific incident but result from the interplay of the many microsystems a student is involved in as well as the indirect influences of different exosystems. in conclusion, we assume that when it comes to their responsibility, teachers, students and parents can be and are often attributed not only specific responsibilities but also a general responsibility for certain outcomes which is not directly related to what happens in one specific microsystem but rather to all of the microsystems, i.e. the mesosystem, this person, representing this specific role, is involved in (e.g., for a student home, school, meetings with friends etc.). the macrosystem in which all the other systems are embedded may influence the nature of the mesosystem as well as the attributed responsibilities (e.g., karakaya, 2004). parents from a different cultural background may thus assume teachers to be involved in microsystems, or responsible for objects, that teachers in this culture are not involved in or responsible for and would thus deny. the nature of the individual’s mesosystem and thereby the microsystems and interfaces a person is involved in (i.e. who else is involved in this setting, physical and material nature of the setting) determines the objects this person feels or is held responsible for. drawing on lenk’s (1992) argument that a person can only be held responsible for such things they are aware of, and following duff (1998), we furthermore suggest the view that a person can only be attributed responsibilities which could be fulfilled in microsystems they are part of (e.g., teachers cannot be held responsible for what happens in the student’s home as they are not involved in this microsystem and do not have control over events). although teachers may indirectly influence these events by their actions at school, they do not have direct control over the events in a student’s home.
thus, the attribution of prospective responsibility requires a profound understanding of what microsystems an individual’s sphere of action (mesosystem) comprises and which issues they have the capacity to act on. in addition, retrospective judgments of whether and how attributed responsibilities have been fulfilled are only valid if the respective person knew about and at that time was able to act on them.

4.1 the heuristic framework and its structural elements

the proposed heuristic framework of responsibility in the school context brings together the above considerations. as illustrated in figure 1, the framework most importantly comprises teachers, students and parents as the three central subjects, i.e. bearers, of responsibility in the school context who share specific microsystems which determine to what degree their mesosystems overlap.

figure 1. heuristic framework for structuring responsibility in the school context.

on a conceptual level, the following subjects, i.e. bearers, of responsibility can be identified: the teacher (t), the particular student (s), his/her parent(s) (p) and his/her classmates (c). the mesosystem of a teacher, illustrated by the circle at the top, comprises a multitude of microsystems in which they take responsibility, some of them shared with other people, others not shared with anyone. furthermore, this mesosystem (i.e. all microsystems in which the person is involved) is partly embedded in the macrocontext. besides this, the teacher also shares a number of microsystems with people outside school (indicated by the dotted line) which are part of the wider community in which he or she lives, and thus takes responsibility in these contexts. these microsystems can also serve as exosystems to what happens in school microsystems, with the influence and thus indirect responsibility being mediated by the teacher.
interface st (student-teacher) comprises all the microsystems in which a specific student interacts with his or her teacher and both may feel or be held responsible. within the macrocontext of school, however, these are not the only microsystems the student and his/her teacher share. as the model focuses its representation on one specific student, another conceptual component of the model is the classmates of the respective student, who as a collective can also be attributed certain responsibilities which have to be differentiated from those attributed to the respective student. thus, in this model, microsystems which the student and teacher share with the student’s classmates (e.g., a lesson) are located in interface stc (students-teacher-classmates). the microsystems located here are not only classroom settings but also any incident in which these three actors are involved and for which they can thus be held responsible. correspondingly, microsystems located in interface st may be set in the classroom, because teachers might interact with only one student although being in the classroom with the whole group. as these groups of students can exist without the respective student being part of them, and the student thus has to be conceptually differentiated from them, interface sc (student-classmates) comprises those microsystems shared by the student in focus and his or her classmates. this interface does not necessarily have to be embedded in the school context, as students are often friends and meet outside school. also, just like in the teacher’s case, the student’s mesosystem involves microsystems he or she does not share with anyone, or shares with people outside school as in a sports club, which can also be assumed to have an impact on what happens in other microsystems this student is involved in. thus, he or she might feel or be held responsible for objects located there.
classmates were not explicitly attributed responsibilities, as they represent the students as a group. besides teachers and students, parents were identified as the third agent in the school context. parents can be assumed to be involved in the context of school mainly when they share microsystems with either their child or their child’s teacher. thus, interface sp (student-parents) subsumes those microsystems in which the student interacts with his or her parents, e.g. if parents help their child with their homework or ask about school over dinner. there are also microsystems in which parents, their child and their child’s teacher interact (interface stp, student-teacher-parents) and thus feel or are held responsible, this area being most strongly addressed by current research on parent involvement. furthermore, there are also microsystems which parents only share with their child’s teacher (e.g., parents’ evenings), which are located in interface tp (teacher-parents). of course, one could argue that parents might also participate in microsystems which their child, his/her teacher and the classmates participate in, and that an interface is thus missing from this model. we propose, however, that if parents participate in their child’s classroom (and also as attendants on field trips etc.) there is a change of roles by which parents assume the role of a teacher. these microsystems should thus also be located in the interfaces st and stc. this consideration indicates that responsibilities attributed to teachers, students and parents in the school context can be assumed to result from the specific roles a person takes on, which invite the attribution of certain tasks and liabilities. furthermore, we hypothesise that the perception of what microsystems a person is involved or not involved in strongly affects what responsibilities are attributed to him or her.

5. validation of the framework suggested for structuring responsibility in the school context

empirical studies into the issue of responsibility, as presented above, have produced results that can be linked to further explicate some aspects of this framework of responsibility in the school context. no studies, however, have, to our knowledge, yet addressed the issue in its full complexity. therefore, the proposed framework is examined using data from several empirical studies in order to establish whether the model can account for questions regarding responsibility attributions between teachers, students and parents. the aim of this paper is thus to examine whether the proposed framework is useful and adequate for structuring responsibility attributions in the school context. furthermore, the study will look at the nature of the interfaces of teachers’, students’ and parents’ mesosystems and examine what responsibilities these agents attribute to themselves and each other.

5.1 method

the data presented in the following to support the framework for structuring responsibility in the school context result from a series of studies, each exploring the matter of teacher, student and parent responsibility in german secondary education. included studies and specific foci:

(1) online survey with students about teacher responsibility, including lauermann & karabenick’s (2013) teacher responsibility scale
(2) online survey with students about student responsibility, including a newly-developed student responsibility scale along the lines of lauermann & karabenick (2013)
(3) online survey with parents about teacher responsibility, including lauermann & karabenick’s (2013) teacher responsibility scale
(4) pen and paper survey with parents of one local school (highest educational track) on shared and individual responsibility of teachers, students and parents
(5) pen and paper survey of teachers of the highest educational track on shared and individual responsibility of teachers, students and parents
(6) pen and paper survey of teachers of all educational tracks on shared and individual responsibility of teachers, students and parents
(7) pen and paper survey of students of the highest educational track on shared and individual responsibility of teachers, students and parents.

although each of these studies had a different focus, they all contained at least one open-ended question asking participants about their understanding of a responsible teacher’s, student’s or parent’s behavior (e.g., “what behavior characterizes a responsible teacher?”). some of the questionnaires contained three questions covering all three agents; some of the studies focused only on student responsibility and thus only asked participants to characterize responsible students’ behavior. the phrasing of the question aimed at capturing broad descriptions of perceived teachers’, students’ and parents’ responsibility. participants could name as many responsibilities as they liked for each agent. each of these mentioned responsibilities was later coded and counted separately. statements from all studies were organized into nine groups according to the perspective (e.g., if respondents were teachers) and focus (e.g., statement about student responsibility) of the statement.
table 1 provides an overview of the characteristics of these combined groups, characteristics of the sample and the number of statements in the perspective indicated (e.g., first cell: 68 teachers, of whom 58.8% were female and 42.4% were 40 years old or younger, provided 177 statements about teacher responsibility). also, total numbers of statements per respondent group and subject of responsibility are presented.

table 1 overview of samples regarding perspective

| focus | teachers’ view of… | students’ view of… | parents’ view of… | total # of statements |
| --- | --- | --- | --- | --- |
| teacher responsibility | n=68; ♀ 58.8%; age: 42.4% ≤ 40 years; statements: 177 | n=610; ♀ 60.3%; age: m=14.6, sd=2.4; statements: 1475 | n=162; ♀ 88.9%; child: ♀ 54.9%; age: m=12.8, sd=2.1; statements: 535 | 2187 |
| student responsibility | n=68; ♀ 58.8%; age: 42.4% ≤ 40 years; statements: 161 | n=279; ♀ 59.3%; age: m=15.1, sd=2.6; statements: 763 | n=106; ♀ 85.8%; child: ♀ 57.5%; age: m=12.6, sd=0.8; statements: 364 | 1288 |
| parent responsibility | n=68; ♀ 58.8%; age: 42.4% ≤ 40 years; statements: 143 | n=164; ♀ 58.9%; age: m=15.1, sd=3.0; statements: 405 | n=106; ♀ 85.8%; child: ♀ 57.5%; age: m=12.6, sd=0.8; statements: 316 | 864 |
| total # of statements | 481 | 2643 | 1215 | 4339 |

all 4339 statements regarding teachers’, students’ and parents’ responsibility were analysed using nvivo10 software for qualitative data analysis. all data were coded by a second coder and intercoder agreement was 74.1%. data were coded into categories representing the six interfaces (see fig. 1: st, stc, sc, sp, stp, tp) of these agents’ mesosystems (i.e. what these people feel or are held responsible for in these specific microsystems). during the coding process it became obvious that teachers, students and parents are often attributed general responsibilities that they are expected to fulfill in all microsystems they are involved in (e.g. for being honest and trustworthy) and that could not be coded into one specific interface.
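the marginal totals in table 1 can be recomputed from its nine cell counts. the following python sketch is only an illustrative consistency check and not part of the original analysis; the names `counts` and `marginal_totals` are assumptions introduced for this example.

```python
# Statement counts from table 1, keyed by (respondent group, subject of responsibility).
counts = {
    ("teachers", "teacher"): 177, ("students", "teacher"): 1475, ("parents", "teacher"): 535,
    ("teachers", "student"): 161, ("students", "student"): 763, ("parents", "student"): 364,
    ("teachers", "parent"): 143, ("students", "parent"): 405, ("parents", "parent"): 316,
}


def marginal_totals(cells):
    """Sum the cell counts per respondent group, per subject, and overall."""
    by_group, by_subject = {}, {}
    for (group, subject), n in cells.items():
        by_group[group] = by_group.get(group, 0) + n
        by_subject[subject] = by_subject.get(subject, 0) + n
    return by_group, by_subject, sum(cells.values())


by_group, by_subject, grand_total = marginal_totals(counts)
print(by_group)     # statements per respondent group (column totals)
print(by_subject)   # statements per subject of responsibility (row totals)
print(grand_total)  # overall number of statements
```

run on the counts above, the sums can be compared directly with the totals row and totals column of table 1.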
while these statements could have been coded into each of the above categories, as the specific person is stated to always be responsible for this object, the coders decided to code them separately in order to adequately test the framework. thus, general responsibilities attributed to a person to be fulfilled in all the microsystems in which they are involved were categorized into three groups representing teachers’, students’ and parents’ mesosystems (i.e., the sum of their microsystems). these main categories also included data on the respective person’s responsibility for interactions with people not included in the framework for reasons of complexity (like colleagues, other parents, friends outside school etc.). in order to learn more about the responsibilities in each of the nine main categories, the data in these categories were, in a second step, further categorized into sub-categories representing the different objects of responsibility, so as to describe the main categories empirically. in some of these main categories, a further sub-division of statements was not necessary, as the data varied in their level of differentiation and depth. statements regarding the influences between different microsystems and also clear-cut distinctions between different areas of involvement were double-coded in an additional category for further analyses.

6. results

of the overall 4339 statements about teacher, student and parent responsibility, 3993 statements could clearly be attributed to one of the nine main categories suggested by the proposed framework. the remaining 346 could not be coded for reasons of ambiguity, incomprehensibility or lack of relevance regarding the research topic. the three categories covering students’, teachers’ and parents’ general responsibility include those statements concerning what the specific agent is responsible for in all the microsystems they are involved in.
furthermore, six main categories represent responsibilities resulting from the agents’ shared microsystems, i.e. the interfaces of their mesosystems (see fig. 1). results showed that no person was attributed responsibility in contexts in which they were not involved (e.g., parents were not attributed responsibilities that could be categorised in interface stc). conversely, with few exceptions, all people involved in a context were attributed responsibility in it. these exceptions concern interfaces stp and tp, in which students do not attribute any responsibility whatsoever to teachers. comparing the number of statements across the nine main categories (the mesosystems and their interfaces) and the emphasis within each group’s statements, the most prominent theme in the students’ statements was interface stc, the interaction of the teacher with their students, with 35.2% of statements by students. for the statements by teachers, the main emphasis lies on four categories, namely students’ (23.1% of teachers’ statements) and teachers’ (19.5%) general responsibilities, interface stc (18.2%) and interface sp (19.7%). statements by parents focused on students’ general responsibility (20.1%) as well as interfaces stc (25.0%) and sp (19.3%). focusing on the interfaces alone, stc and sp prove to be the most frequently mentioned. in a second step, data attributed to these categories were analysed further for emergent themes. this section will start out by describing teachers’, students’ and parents’ general responsibilities, i.e. those things these three agents feel or are held responsible for in all the school-related microsystems they are involved in. furthermore, microsystems which these agents do not share with others, or share with people beyond the focus of this approach, will be indicated.
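the involvement rule reported above (no agent is attributed responsibility in an interface they are not part of) can be made explicit by treating each interface of figure 1 as a set of agents. the sketch below is a hypothetical illustration in python and not the coding procedure used in the study; `INTERFACES`, `is_involved` and `interfaces_of` are names assumed for this example only.

```python
# Agents in the framework: t = teacher, s = student, c = classmates, p = parents.
# Each interface of figure 1 is modelled as the set of agents who share it.
INTERFACES = {
    "st": {"s", "t"},
    "stc": {"s", "t", "c"},
    "sc": {"s", "c"},
    "sp": {"s", "p"},
    "stp": {"s", "t", "p"},
    "tp": {"t", "p"},
}


def is_involved(agent, interface):
    """Return True if the agent takes part in the given interface."""
    return agent in INTERFACES[interface]


def interfaces_of(agent):
    """List, in alphabetical order, all interfaces the agent is part of."""
    return sorted(name for name, members in INTERFACES.items() if agent in members)


# Example from the text: parents are not part of interface stc,
# so a responsibility coded there should not be attributed to them.
print(is_involved("p", "stc"))
print(interfaces_of("p"))
```

under this reading, a coded statement is consistent with the framework when the agent it names is a member of the interface it was assigned to.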
describing these three main categories will provide an overview of teachers’, students’ and parents’ responsibilities which do not result from their being involved in a specific microsystem they share with one of the other agents but from their generally being involved in the school context. these results will be followed by the presentation of the objects of responsibility arising from the shared microsystems of teachers, students and parents. the six categories representing the interfaces of teachers’, students’ and parents’ mesosystems all contain responsibilities. the people involved are attributed these responsibilities because they interact with the other person(s) in this context.

6.1 teacher responsibility

regarding the general responsibility teachers feel or are expected by others to fulfill in all microsystems they are involved in, 16 categories were identified in the teachers’, students’ and parents’ data. table 2 provides an overview of the categories as well as total numbers of categorized statements and percentages for each group of respondents (teachers, students and parents).

table 2 categories, frequencies and percentages of teachers’ general responsibility.

| teachers are generally responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …being attentive, empathic, caring and compassionate. | 116 (19.66%) | 32.94% | 15.78% | 22.14% |
| …appearing nice and friendly in their interactions with others. | 109 (18.47%) | 2.35% | 25.67% | 8.40% |
| …being honest, reliable and trustworthy. | 69 (11.69%) | 12.94% | 10.70% | 13.74% |
| …having pedagogical, methodological and content knowledge (showing teaching competence). | 58 (9.83%) | 11.76% | 8.56% | 12.21% |
| …being ready, motivated, willing and trying hard to teach their students. | 57 (9.66%) | 11.76% | 7.75% | 13.74% |
| …preparing lessons. | 36 (6.10%) | 10.59% | 5.35% | 5.34% |
| …their relations with students, parents and other teachers in general. | 32 (5.42%) | 4.71% | 5.35% | 6.11% |
| …being helpful. | 31 (5.25%) | | 8.02% | 0.76% |
| …their self-reflection. | 18 (3.05%) | 3.53% | 2.41% | 4.58% |
| …being patient. | 17 (2.88%) | 1.18% | 2.94% | 3.82% |
| …doing their job and fulfilling their duties. | 11 (1.86%) | 3.53% | 1.34% | 2.29% |
| …being well-organised. | 10 (1.69%) | | 2.41% | 0.76% |
| …being open. | 8 (1.36%) | 3.35% | 0.53% | 2.29% |
| …sticking to the given rules. | 8 (1.36%) | | 2.14% | |
| …their cooperation and relations with other teachers. | 5 (0.85%) | | 0.80% | 1.53% |
| …getting advanced training. | 5 (0.85%) | 1.18% | 0.27% | 2.29% |
| total | 590 | 85 (100%) | 374 (100%) | 131 (100%) |

on this general level, teachers were attributed responsibilities connected to fulfilling their job and its requirements, responsibility for their relations with other agents (students, parents, colleagues) in the school context and, in part, responsibility for personal qualities. thus, teachers were attributed the responsibility for being attentive, empathic, caring and compassionate (e.g., “showing interest in every student and getting this across to the students” (t#13tr, an abbreviation meaning “teacher no. 13 about teacher responsibility”)), which was the main category in teachers’ and parents’ numbers of statements regarding general teacher responsibility. in the students’ statements, the teacher responsibility for appearing nice and friendly in their interaction with others was identified as the main theme.
further responsibilities regarding their social interactions were teachers’ responsibility for being honest, reliable and trustworthy, for being helpful and for being patient in their interactions with others (e.g., “being patient with every student (even if it’s hard)” (s#141tr)). regarding responsibility for objects more closely related to their job, teachers were attributed the responsibility for doing their job and fulfilling their duties, having distinctive pedagogical, methodological and content knowledge (e.g., “high social and pedagogical competence” (p#114tr); “ability to teach us a lot” (s#2tr)), sticking to the given rules (e.g., “a responsible teacher has to also meet the rules set for the students (no mobile phone, chewing gum, etc.)” (s#113tr)) and being well-organized (e.g., “not being sloppy/ forgetful” (s#514tr)). furthermore, teachers’ responsibility for being ready, motivated, willing and trying hard to teach their students (e.g., “love children and his job” (p#111tr)) was identified in the data. independently of others, teachers are also involved in microcontexts they do not share with the named agents, but in which they are attributed the responsibility for preparing lessons and getting advanced training (e.g., “be ready and motivated to get advanced training regarding contents and pedagogical matters” (t#36tr)). the data suggested another microsystem, i.e. teachers’ cooperation and relations with other teachers (e.g., “cooperate with colleagues” (s#105tr); “not insulting other teachers” (s#512tr)).

6.2 student responsibility

altogether, 17 general responsibilities of students were identified. these are represented in table 3, including the numbers of statements overall as well as per group of respondents. these general responsibilities, i.e., objects of student responsibility in all microsystems they are involved in, could be subdivided into 17 sub-categories.
table 3 categories, frequencies and percentages of students’ general responsibility.

| students are generally responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …being ready, motivated, willing and trying their best to learn and be successful. | 157 (22.14%) | 21.51% | 19.60% | 27.06% |
| …monitoring and adapting their own learning progress and study. | 89 (12.55%) | 9.68% | 15.58% | 8.26% |
| …having positive relations with other people. | 86 (12.13%) | 8.60% | 11.06% | 15.60% |
| …doing their homework. | 74 (10.44%) | 8.60% | 13.07% | 6.42% |
| …doing their work and fulfilling their duties. | 62 (8.74%) | 11.83% | 8.04% | 8.72% |
| …being honest and reliable. | 35 (4.94%) | 7.53% | 3.52% | 6.42% |
| …school matters in general. | 30 (4.23%) | 3.23% | 2.76% | 7.34% |
| …being self-reliant and take responsibility. | 29 (4.09%) | 5.38% | 4.27% | 3.21% |
| …turning to others when needing help and accepting help given. | 26 (3.67%) | 4.30% | 2.76% | 5.05% |
| …sticking to the rules. | 22 (3.10%) | 4.30% | 3.27% | 2.29% |
| …their self-reflection. | 21 (2.96%) | 4.30% | 2.51% | 3.21% |
| …preparing exams. | 19 (2.68%) | 1.08% | 4.27% | 0.46% |
| …accuracy and order. | 16 (2.26%) | 2.15% | 2.26% | 2.29% |
| …preparing lessons and revise contents taught. | 16 (2.26%) | 1.08% | 3.52% | 0.46% |
| …getting good grades. | 11 (1.55%) | 1.08% | 2.51% | 0.00% |
| …being open. | 10 (1.41%) | 2.15% | 0.25% | 3.21% |
| …students are responsible to engage in sports or social clubs outside school. | 6 (0.85%) | 3.23% | 0.75% | 0.00% |
| total | 709 | 93 (100%) | 398 (100%) | 218 (100%) |

all three respondent groups, teachers, students and parents, most often mentioned students’ responsibility for being ready, motivated, willing and trying their best to learn and be successful (e.g., “being ambitious to achieve the best possible result” (p#69sr)). students also frequently mentioned their own responsibility to monitor and adapt their learning progress and study (e.g., “revise contents” (s#206sr)) as well as doing their homework.
students were attributed the responsibility for doing their work and fulfilling their duties (e.g., “fulfilling even displeasing tasks” (t#1sr)), sticking to the rules, being self-reliant (e.g., “being self-reliant” (s#109sr)) and school matters in general (e.g., “attending school on a regular basis” (p#43sr)). furthermore, students are held responsible for having positive relations with other people (e.g., “having a pronounced social behavior” (t#43sr)) and being honest and reliable (e.g., “keeping agreed dates (e.g. for projects)” (s#152sr)). regarding students’ school work specifically, the responsibility for getting good grades but also for turning to others when needing help and accepting help given appear in the data (e.g., “notifying parents and teachers about problems and accepting help” (s#128sr)). independently of others, students alone are responsible for working and studying at home, which includes completing tasks and preparing for exams, but also monitoring their own learning success and studying accordingly (e.g., “recognizing when more has to be done for school” (t#7sr)). apart from school matters, some data suggested students’ responsibility to further engage in sports or social clubs outside school (e.g., “[a responsible student] shows social commitment in his free time (youth fire fighters, sports club)” (t#44sr)).

6.3 parent responsibility

as with all other microsystems parents are involved in, the number of statements attributed to this category was comparatively low, with 5.3% of teachers’, 0.7% of students’ and 14.5% of parents’ statements. however, five categories of parents’ responsibility could be identified, as well as differences regarding how strongly these categories were emphasized by each respondent group.

table 4 categories, frequencies and percentages of parents’ general responsibility.
| parents are generally responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …getting involved and engage in school matters. | 31 (44.29%) | 70.83% | 23.53% | 34.48% |
| …being interested. | 23 (32.86%) | 16.67% | 64.71% | 27.59% |
| …being objective, diplomatic and able to conciliate. | 10 (14.29%) | 12.50% | 11.76% | 17.24% |
| …having a positive attitude towards school. | 3 (4.29%) | 0.00% | 0.00% | 10.34% |
| …their interactions with other parents. | 3 (4.29%) | 0.00% | 0.00% | 10.34% |
| total | 70 | 24 (100%) | 17 (100%) | 29 (100%) |

the data suggested that parents’ general responsibilities were for getting involved and engaging in school (e.g., “cooperation with the school” (t#27pr); “participating in school life” (t#53pr)). this responsibility for getting involved in school was the most dominant one in teachers’ statements (70.83%) and was also strongly emphasized by parents themselves (34.48%), who also stressed parents’ being responsible for being interested (27.59%), a responsibility particularly stressed in the students’ statements (64.71%). furthermore, parents were stated to have the general responsibility for having a positive attitude towards school (e.g., “have a positive attitude towards school and lessons” (p#10pr)) and for being objective and diplomatic regarding school matters (e.g., “being objective towards teachers” (p#50pr)). a specific microsystem that emerged from the data was parents’ interactions with other parents, for which they are also stated to be responsible (e.g., “cooperation and communication with other parents” (p#62pr)).

6.4 interface st – student-teacher

to this main category, all statements were assigned which concerned the respective student’s interactions with their teacher independent of other people. while every teacher has a large number of students, this category represents those responsibilities attributed to teachers in every context in which they interact with an individual student.
this interface is different from interface stc in that it only includes responsibilities that derive from a specific student interacting with a specific teacher. these dyadic interactions may, in fact, also be set in the classroom but exist irrespective of co-students being involved (as is the case for responsibilities coded in interface stc). thus, one could hypothetically assume these responsibilities to also derive from microsystems that students share with their teachers and classmates (stc), because teachers might interact with only one student although being in the classroom with the whole group. the two categories are, however, conceptually different and thus treated separately here. table 5 provides an overview of teachers’ responsibilities in their interactions with their students.

table 5 categories, frequencies and percentages of teachers’ responsibility in their interactions with their students.

| teachers are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …developing a positive, caring, interested personal relationship with every student. | 136 (32.69%) | 16.67% | 32.25% | 37.07% |
| …listening to, helping, counselling students when they have problems in or outside school. | 117 (28.13%) | 8.33% | 30.80% | 25.86% |
| …enhancing every single student’s learning. | 87 (20.91%) | 33.33% | 19.20% | 22.41% |
| …treating students with respect. | 42 (10.10%) | 12.50% | 9.78% | 10.34% |
| …being a role model. | 22 (5.29%) | 16.67% | 5.43% | 2.59% |
| …developing student’s personality. | 8 (1.92%) | 12.50% | 1.09% | 1.72% |
| …the individual student’s grades. | 4 (0.96%) | 0.00% | 1.45% | 0.00% |
| total | 416 | 24 (100%) | 276 (100%) | 116 (100%) |

teachers are described in students’ and parents’ statements as most responsible for developing a positive, caring, interested, personal relationship with every student.
students furthermore emphasize teachers’ responsibility to listen to, help and counsel their student whenever he/she turns to them with problems in or outside school (e.g., “to be responsive to students’ problems in and outside school.” (t#61tr)). regarding school outcomes, teachers’ responsibility to enhance every single student’s learning (e.g., “adapt to different types of learners and try for all of them to have the same chances.” (s#114tr)) is at the center of teachers’ statements about their own responsibility. teachers are furthermore described as responsible for being a role model for this student (e.g., “a responsible teacher exemplifies positive behavior (meet deadlines, being organized, on time, fair…)” (t#26tr)) and treating the student with respect. other responsibilities attributed to the main category of teachers’ general responsibilities were the ones for the student’s grades (only mentioned by students) and the development of their personality (e.g., “co-education, e.g. what’s good or not for a student at the moment and in the future” (p#39tr)). of the responsibilities mentioned, the personal relationship is more central to students’ statements than educational matters are.

regarding students’ responsibility in their interaction with the teacher, four categories could be identified in the data. table 6 provides an overview.

table 6 categories, frequencies and percentages of students’ responsibility in their interactions with their teacher.

| students are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …treating teacher with respect, accept their authority. | 30 (73.17%) | 100.00% | 65.22% | 81.25% |
| …contacting teacher with problems and questions. | 6 (14.63%) | | 17.39% | 12.50% |
| …not provoking the teacher. | 4 (9.76%) | | 17.39% | |
| …trusting the teacher. | 1 (2.44%) | | | 6.25% |
| total | 41 | 2 (100%) | 23 (100%) | 16 (100%) |

most statements in all three groups of respondents concern students’ being responsible for treating their teacher with respect and accepting his/her authority (e.g., “respect the teacher, even if they haven’t earned it” (s#105sr)). this may include the sub-category of not provoking teachers, which was only mentioned by students. furthermore, students were held responsible for contacting the teacher with their problems and questions (e.g., “if you have a problem, dare to ask the teachers” (s#47sr)).

6.5 interface stc – teacher-all students

interface stc, comprising all those microsystems in which the teacher interacts with all students, holds all those statements regarding teaching the specific lesson. within the data categorized into this main category, 14 teacher (table 7) and 7 student responsibilities (table 8) were identified.

table 7 categories, frequencies and percentages of teachers’ responsibility in their interactions with all their students.

| teachers are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …teaching good, interesting lessons adapted to their students. | 262 (27.29%) | 21.57% | 28.02% | 26.29% |
| …showing authority and be strict – consequent classroom management. | 200 (20.83%) | 9.80% | 22.70% | 17.37% |
| …being fair. | 129 (13.44%) | 15.69% | 11.78% | 18.31% |
| …contributing to a positive learning atmosphere. | 61 (6.35%) | 13.73% | 6.47% | 4.23% |
| …coming to class on time and prepared. | 52 (5.42%) | 7.84% | 5.89% | 3.29% |
| …keeping calm and continue lessons. | 50 (5.21%) | 1.96% | 5.60% | 4.69% |
| …intervening in students’ conflicts and work for group’s team spirit. | 43 (4.48%) | 0.00% | 4.02% | 7.04% |
| …motivating their students. | 42 (4.38%) | 11.76% | 2.01% | 10.33% |
| …(objective) assessment. | 31 (3.23%) | 3.92% | 3.88% | 0.94% |
| …involving all students in lessons. | 26 (2.71%) | 3.92% | 2.30% | 3.76% |
| …supervising their students. | 24 (2.50%) | 3.92% | 2.87% | 0.94% |
| …caring for the group and being their students’ advocate. | 17 (1.77%) | 1.96% | 2.16% | 0.47% |
| …teaching the contents required, follow the curriculum. | 12 (1.25%) | 3.92% | 0.86% | 1.88% |
| …appropriately using homework. | 11 (1.15%) | 0.00% | 1.44% | 0.47% |
| total | 960 | 51 (100%) | 696 (100%) | 213 (100%) |

table 8 categories, frequencies and percentages of students’ responsibility in their interactions with their teacher and other students.

| students are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements |
| --- | --- | --- | --- | --- |
| …participating in lessons and pay attention. | 143 (48.81%) | 35.48% | 52.13% | 45.95% |
| …coming to class on time and prepared. | 47 (16.04%) | 22.58% | 15.96% | 13.51% |
| …contributing to a positive working atmosphere. | 38 (12.97%) | 0.00% | 14.36% | 14.86% |
| …respectful relations with classmates and teachers in class. | 36 (12.29%) | 29.03% | 7.45% | 17.57% |
| …being interested in the contents taught. | 18 (6.14%) | 3.23% | 5.85% | 8.11% |
| …trying to understand and learn the contents taught. | 8 (2.73%) | 9.68% | 2.66% | 0.00% |
| …trying to meet the expectations the teacher sets for the class. | 3 (1.02%) | 0.00% | 1.60% | 0.00% |
| total | 293 | 31 (100%) | 188 (100%) | 74 (100%) |

while both teachers and students are attributed the responsibility in this area to come to class on time and prepared (e.g., “being well-prepared” (t#38tr)) and to create/contribute to a positive working atmosphere in class (e.g., “to try not to disturb the lessons in order not to ruin others learning success” (s#250sr)), other responsibilities differ. regarding teacher responsibility, the category to which most statements of all three respondent groups could be assigned was, however, the responsibility for teaching good and interesting lessons that are adapted to the students (e.g., “adapting lessons to students” (t#7tr)).
also, especially by their students, teachers are attributed responsibilities that can be linked to issues of classroom management (showing authority and being strict (e.g., “clear instructions and strict implementation” (t#57tr))). further objects of teacher responsibility were supervising the students, intervening in students’ conflicts and establishing the group’s team spirit, and keeping calm and continuing lessons no matter what happens (e.g., “sometimes just turning a blind eye” (t#43tr)). further responsibilities concern personal relations between the teacher and their class, like caring for the group and being their students’ advocate (e.g., “a responsible teacher backs their students” (s#378tr)) as well as being fair. the latter is linked by some statements to teachers’ responsibility to include all students in lessons (e.g., “a responsible teacher tries to get quiet students out of their shell” (s#145tr)) and to (objective) assessment (e.g., “clear, transparent and explained grading” (s#113tr)). regarding these teaching aspects, teachers were also attributed the responsibility for teaching the contents required (following the curriculum), appropriately setting homework and motivating the students (e.g., “get children excited about learning” (p#103tr)). responsibilities attributed to students in this respect somewhat complemented the above, with the students’ responsibility for participating and paying attention in lessons (e.g., “show interest in the lesson and actively participate” (p#16sr)) being the most emphasized responsibility in the statements of all three groups of respondents. furthermore, students were stated to be responsible for trying to be interested in the contents (e.g., “interest in the contents and other students’ answers” (t#54sr)), understanding and learning the contents (e.g., “truly trying to understand the contents rather than simply marking lesson time” (t#33sr)) and meeting expectations.
statements regarding students’ responsibility to treat their classmates and teachers in lessons with respect were also included in this category.

6.6 interface sc – student-classmates
in contrast to the above, this category includes all statements regarding students’ responsibility in their interaction with their classmates. this category was conceptually different from stc because, although it can be assumed to address classroom settings in which the teacher is also present, these are responsibilities that derive from students’ interactions with their co-students, irrespective of whether the teacher is around. only one category was evident, namely students’ responsibility to have positive, helpful and peaceful relations with their classmates (e.g., “not bully or even hurt anyone” (p#59sr); “save others from mean people” (s#46sr)).

table 9 categories, frequencies and percentages of students’ responsibility in their interactions with their classmates.
categories: students are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements
…their interactions with their classmates. | 134 (100.00%) | 100.00% | 100.00% | 100.00%
total | 134 | 17 (100%) | 88 (100%) | 29 (100%)

6.7 interface sp – student-parents
this main category subsumes all statements regarding responsibilities resulting from students’ interactions with their parents. no statements focused on students’ responsibilities in this area. table 10 provides an overview.

table 10 categories, frequencies and percentages of parents’ responsibility in their interactions with their child.
categories: parents are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements
…supporting and helping their child. | 177 (26.50%) | 22.47% | 27.93% | 25.79%
…having a positive and caring relationship with their child. | 124 (18.56%) | 19.10% | 17.04% | 20.81%
…keeping tabs on their child’s learning and school life. | 119 (17.81%) | 24.72% | 15.08% | 19.46%
…learning with their child. | 57 (8.53%) | 4.49% | 11.73% | 4.98%
…educating their child (besides school matters). | 52 (7.78%) | 14.61% | 5.59% | 8.60%
…motivating their child. | 48 (7.19%) | 3.37% | 6.98% | 9.05%
…providing the prerequisites for student learning. | 38 (5.69%) | 5.62% | 5.31% | 6.33%
…giving their child space and freedom. | 31 (4.64%) | 3.37% | 6.42% | 2.26%
…physical and structural care for their child. | 22 (3.29%) | 2.25% | 3.91% | 2.71%
total | 668 | 89 (100%) | 358 (100%) | 221 (100%)

parents in this category were attributed broader responsibilities, such as generally supporting and helping their child (e.g., “support and encourage” (t#1pr)), a category to which about a quarter of each group of respondents’ statements were assigned. a quarter of the teachers’ statements in this overall area also mentioned parents’ responsibility for keeping tabs on their child’s learning and school life (e.g., “say when and how much i have to learn or do my homework” (s#50pr)), which was not emphasized equally strongly by students and parents. further general responsibilities assigned to parents were having a positive and caring relationship with their child (e.g., “take time for their child” (p#3pr); “take notice of me” (s#15pr)) and giving their child space and freedom (e.g., “no pressuring expectations” (s#101pr)). further responsibilities mentioned could be subdivided into responsibilities concerning the child’s care and education (educating the child (e.g., “parents have educated their children at home and taught them values and norms.” (t#25pr)); providing physical and structural care (e.g., “send children well-prepared (fed, well-rested, low pressure) to school” (p#71pr))) as well as responsibilities related to the child’s school work.
thus, parents were stated to be responsible for providing the prerequisites for student learning (materials, working atmosphere at home (e.g., “parents are responsible for creating the ideal conditions so that the child can show the best performance in school.” (p#77pr))), motivating their child and learning with their child (e.g., “learn with the child” (p#88pr)). while 668 statements could clearly be categorized as regarding parents’ responsibility in their interaction with their child, no statement whatsoever could be identified as describing students’ responsibility in these microsystems they share with their parents.

6.8 interface stp – student-teacher-parents
just as in the above interface, there are also no responsibilities attributed to students in the microcontexts they share with their teachers and parents.

table 11 categories, frequencies and percentages of teachers’ responsibility in their interactions with the students and their parents.
categories: teachers are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements
…their interactions with students and their parents. 5 (100.00%) 100.00% 100.00%
total 5 1 (100%) 4 (100%)

table 12 categories, frequencies and percentages of parents’ responsibility in their interactions with the students and their teachers.
categories: parents are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements
…their interactions with their children and their teachers. 14 (100.00%) 100.00% 100.00%
total 14 3 (100%) 11 (100%)

at the stp interface, exchange between these participants is found in the data to mostly concern students’ learning difficulties or underachievement. students are not seen to have much responsibility here, whereas teachers are stated to be responsible for being “ready to communicate about problems with student and parents” (p#99tr), just as parents are (e.g., “in case there is a problem (e.g., underachievement, bullying) search for a solution together with the child and the teacher” (p#81pr)).

6.9 interface tp – teacher-parents
this interface comprises all statements regarding responsibilities resulting from teachers’ and parents’ interaction. tables 13 and 14 provide an overview.

table 13 categories, frequencies and percentages of teachers’ responsibility in their interactions with the students’ parents.
categories: teachers are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements
…their interactions with their students’ parents. 27 (100.00%) 100.00% 100.00%
total 27 2 (100%) 25 (100%)

table 14 categories, frequencies and percentages of parents’ responsibility in their interactions with their child’s teachers and respective numbers of statements.
categories: parents are responsible for… | total # of statements | teachers’ statements | students’ statements | parents’ statements
…their interactions with their children’s teachers. | 66 (100.00%) | 100.00% | 100.00% | 100.00%
total | 66 | 18 (100%) | 14 (100%) | 34 (100%)

regarding the interaction of parents with their child’s teacher, both teachers and parents are attributed the responsibility to communicate and cooperate with the other, contacting one another when there is a problem that needs solving (e.g., “involve parents” (p#5tr); “accessible for parents” (p#95tr); “inform parents at an early stage” (p#62tr); “work with, not against the teachers” (t#5pr); “use parents’ nights” (t#37pr)).

6.10 exosystems
the framework suggested that a microsystem an agent is involved in may be influenced by exosystems, i.e. microsystems this person is not part of. the data provided some examples of these indirect influences between microsystems.
especially parents fulfilling their responsibility at home (i.e., interface sp) were often stated to influence other microsystems: “improve the relations between the teacher and the child by encouraging the child.” (p#12pr); “if their child complains about a teacher or a subject, taking that seriously but not taking the same line but trying to find solutions” (p#70pr). students were stated to be responsible for sometimes acting as their classmates’ advocate (“standing up for other students’ needs and issues in the face of teachers and classmates” (p#2sr)). regarding teachers’ responsibility, one parent claimed that a teacher was responsible “to not, if they have not managed to get through with their lesson plans, shift the contents as homework to the parents.” (p#159tr). furthermore, one student described how much their teacher’s interaction with other teachers influenced the microsystems in interface b: “he is responsible for not going into a cover lesson and have other teachers say bad things about the class beforehand but he/she is responsible for judging the group by themselves.” (s#604tr).

7. discussion
the aim of this paper was to study to what extent the framework developed here is useful for structuring responsibility attributions in the school context. to address this issue, data from a series of studies were coded into the postulated structure of the model in order to learn about the usefulness of the model and the nature of its elements. this analysis has provided the necessary starting point for studies of the complexity of responsibility in the school context based on the framework proposed here. overall, 84 objects of responsibility in the school context were identified in this study. all of these responsibilities could be allocated to specific spheres of interaction of teachers, students and parents.
the results showed that the proposed framework of responsibility in the school context is adequate for structuring responsibilities that teachers, students and parents attribute to themselves and one another in the different microsystems they share in the school context. in accordance with the theoretical consideration that a person’s responsibility is determined by the specific role this person incorporates (leithwood et al., 1999), the qualitative data showed that all three agents were only attributed responsibility in those contexts in which they are actively involved. accordingly, as suggested in the literature (e.g., duff, 1998), no responsibilities for objects in specific microsystems were attributed to a person who is not involved in this microsystem, i.e., no person was assigned responsibility for objects beyond their reach. thus, no parent responsibilities were mentioned in the data for goings-on in specific lessons or classroom settings, just as teachers were not attributed responsibilities for students’ learning at home. this finding may, however, vary between different cultural contexts, school systems and, naturally, different stages of schooling (e.g., primary vs. high school) and should thus be focused on in further research including more heterogeneous groups in this respect. the data revealed that in some areas the agents involved are not attributed any responsibility, as for example in interface sp, which students share with their parents. with students mentioning significantly fewer responsibilities that could be categorized into this area than teachers and parents did, it can be hypothesized that students do not perceive this area as one in which they are responsible agents. the fact that neither teachers nor parents mentioned any student responsibilities here either supports this area’s minor role in student responsibility. it does, however, constitute the category into which most of all three agents’ statements on parents’ responsibility were coded.
regarding parents’ responsibilities in interactions with teachers (interfaces stp and tp), the data support ramirez’ (1999) findings that teachers attribute responsibility for parent-teacher interactions more to parents than to themselves. also, the main focus of the statements categorized into one of the shared microsystems sometimes differed between the three groups of respondents. this was most obvious with regard to interface st, which comprises teachers’ and students’ responsibilities resulting from their interaction. while a third of teachers’ statements focus on their responsibility to enhance every single student’s learning, students and parents emphasise teachers’ responsibility to develop a positive, caring and interested personal relationship with the students in about a third of their statements. these examples show how tensions may occur between various actors emphasizing different responsibilities or goals (lauermann & karabenick, 2011). the limits of each of the described interfaces were found to be rather clear-cut, which becomes specifically apparent between interfaces sc and sp. thus, a teacher is attributed the responsibility for settling conflicts between their students when he or she realizes a problem (“keeping an eye on their students, not letting everything go just using the excuse that students should and could ‘sort it out by themselves’” (p#129tr)). students, however, believe that teachers “should not always intervene. some things have to be left for the students to settle.” (s#262tr). thus, whether teachers fulfill their responsibility for settling conflicts among students depends on whether they think students are able to do it themselves, i.e. whether or not this matter is part of the microsystem only the students share. this example also shows that responsibilities are not attributed once and for all.
this interplay between different microsystems is further supported by the close alignment of teacher and student responsibilities in interfaces st and stc, with teachers being attributed the responsibility for enhancing every single student’s learning in interface st, which affects their responsibility for involving all students in lessons and teaching lessons that are adapted to the level of all of their students. the data also reveal that the fulfillment of responsibilities in one microsystem depends on whether responsibilities in others are met. this, for example, holds true for teachers’ and students’ responsibility for coming to class prepared, which naturally subsumes these agents’ responsibility to work and prepare classes at home, i.e. in another microcontext. in this specific context, but also generally, the responsibilities identified in this study proved to be far more wide-ranging than was assumed in previous research (e.g., bourke, 1990; lewis, 2001). the finding that the data reveal influences between different microsystems is closely connected to these considerations, as the possibility that a person’s actions (in-)directly influence other microsystems, despite this person not being involved in them, raises their responsibility in those microsystems they actively engage in. despite these influences having been discussed in a lot of literature on, in the broadest sense, the (social) contexts of school (e.g. epstein, 2011), the question of shared responsibility has not yet been posed. the data clearly reveal that the general responsibility for advancing student learning, which all three agents share, is in fact made up of many different issues that different agents are responsible for taking care of in the microsystems they are involved in. everyone is responsible for doing their share in their sphere of action, i.e. their mesosystem.
this also holds true if the different microsystems a person is involved in may influence one another by means of this person’s involvement in them, i.e. serve as exosystems to one another. thus, the data revealed a teacher being held responsible for acting in a specific way (e.g., not letting other teachers influence their opinion of the students, so that such prejudices do not influence their interactions with these students), although object and addressee of responsibility are not located in the same microsystem in this case. in regard to student responsibility, parallels between bacon’s (1993) study of student responsibility and findings in this study can be drawn. students in bacon’s study mentioned they felt responsible for “doing the job” and “obeying the rules”, i.e. being held responsible rather than feeling personally responsible. being a student thus comes with certain responsibilities which resemble those of an employee: punctuality, having their working material with them, discipline, trying to meet expectations, working diligently and in an orderly manner, etc. interestingly, most of these responsibilities are alike for teachers and students. besides this employee’s role, students were also found to have certain responsibilities as a learner, which go beyond the school context and are not closely controlled. as a learner, a student is responsible for showing interest and motivation, trying to understand and learn what is taught in lessons and studying the materials at home. despite these two roles’ potential overlap, they may also become quite contradictory for students who are controlled by their parents and teachers for fulfilling their responsibilities. in this respect, it seems obvious that parents’ responsibility to keep tabs on students’ learning conflicts with their responsibility for giving their child some space and freedom.
however, this conflict should also be noted as a limitation of this study, as the study only looked at data from teachers, students and parents involved in secondary schooling. responsibility attributions can be assumed to differ in primary school settings, with parents sharing a larger part of their children’s school lives. microsystems parents share with teachers may thus become more important and also more extended regarding responsibility attributions. comparable influences on responsibility attributions can be expected for the social, cultural and economic context of the school, as has been suggested by prior research. after generally studying the usefulness of the framework for structuring responsibility in the school context in this study, these aspects should definitely be addressed in further research in order to identify influences, especially because the analyses presented here only included data from german teachers, students and parents. this study has provided an overview of what responsibilities are attributed to teachers, students and parents on different levels of the school context and how these can be structured with a theoretical framework. this framework will also allow a closer analysis of responsibility attributions within the school context and the influences these might have on other contexts, i.e. microsystems. the results of the preliminary study of teacher, student and parent responsibility, especially the multitude of named shared objects of each agent’s responsibility, strongly suggest the need for further, also quantitative, research. this research might provide insights into the extent to which teachers, students and parents attribute responsibility for specific objects to themselves and one another. focusing on the family-school mesosystem, christenson (2004) claims that the interface between home and school may be “strong for some families, weak for others, and non-existent for others.” (p.
87). thus, further research will have to focus on different types of teachers, students and parents showing specific levels and constellations of responsibility for student motivation, learning and achievement. also, the relevance of role-taking has been shown in these preliminary analyses and should thus be extended with regard to individual roles in a context of other agents, as for example suggested by goffman (1959) or other sociological concepts of interactions (e.g., weber, 2005). furthermore, existing literature has suggested studying the role the school setting and other structural factors play with regard to responsibility perception (e.g., lee & loeb, 2000; thrupp et al., 2003). consequently, future research will have to focus on the interplay of teachers’, students’ and parents’ attributions of responsibility and, by specifically linking these three agents, explore patterns of responsibility attributions and their influence on student achievement and motivation.

keypoints
a heuristic framework for structuring responsibility in the school context is developed.
the framework is validated by coding teachers’, students’ and parents’ statements on their own and others’ responsibility into the suggested model.
results support the usefulness of the framework and reveal the objects of teachers’, students’ and parents’ responsibility in the school context and influences.

acknowledgements
the authors would like to thank judith fränken for the help with coding the data and sue beltman for her many invaluable comments on the manuscript.

references
auhagen, a. e. (1999). die realität der verantwortung [the reality of responsibility]. göttingen: hogrefe.
auhagen, a. e., & bierhoff, h.-w. (eds.). (2001). responsibility: the many faces of a social phenomenon. london: routledge.
bacon, c. s. (1991). being held responsible versus being responsible. the clearing house, 64, 395-398.
bacon, c. s. (1993). student responsibility for learning. adolescence, 28(109), 199–212.
ballard, k., & bates, a. (2008). making a connection between student achievement, teacher accountability, and quality classroom instruction. qualitative report, 13(4), 560-580.
barrett, a. m. (2005). teacher accountability in context: tanzanian primary school teachers' perceptions of local community and education administration. compare, 35(1), 43-61. doi: 10.1080/03057920500033530
bastian, j. (1995). verantwortung: pädagogik zwischen freiheit und verbindlichkeit [responsibility: pedagogy between freedom and obligation]. pädagogik, 47(7-8), 6–10.
bayertz, k. (1995). eine kurze geschichte der herkunft der verantwortung [a short history of the origin of responsibility]. in k. bayertz (ed.), verantwortung (pp. 3–71). darmstadt: wiss. buchges.
berkowitz, l., & daniels, l. r. (1963). responsibility and dependency. journal of abnormal and social psychology, 66(5), 429–436. doi: 10.1037/h0049250
bierhoff, h. w. (1995). verantwortungsbereitschaft, verantwortungsabwehr und verantwortungszuschreibung [disposition for responsibility, denial of responsibility and attribution of responsibility]. in k. bayertz (ed.), verantwortung (pp. 217-240). darmstadt: wiss. buchges.
bierhoff, h. w. (2000). skala der sozialen verantwortung nach berkowitz und daniels: entwicklung und validierung [the social responsibility scale by berkowitz and daniels: development and validation]. diagnostica, 46(1), 18–28. doi: 10.1026//0012-1924.46.1.18
bourke, s. (1990). responsibility for teaching: some international comparisons of teacher perceptions. international review of education, 36(3), 315-327. doi: 10.1007/bf01876000
bracci, e. (2009). autonomy, responsibility and accountability in the italian school system. critical perspectives on accounting, 20(3), 293-312. doi: 10.1016/j.cpa.2008.09.001
bronfenbrenner, u. (1979). the ecology of human development: experiments by nature and design. cambridge, ma: harvard university press.
doi: 10.1525/aa.1981.83.3.02a00220
bryan, l. a., & mclaughlin, h. j. (2005). teaching and learning in rural mexico: a portrait of student responsibility in everyday school life. teaching and teacher education, 21(1), 33-48. doi: 10.1016/j.tate.2004.11.004
christenson, s. l. (2004). the family-school partnership: an opportunity to promote the learning competence of all students. school psychology review, 33(1), 83-104. doi: 10.1521/scpq.18.4.454.26995
del schalock, h. (1998). student progress in learning: teacher responsibility, accountability, and reality. journal of personnel evaluation in education, 12(3), 237–246. doi: 10.1023/a:1008063126448
diamond, j. b., randolph, a., & spillane, j. p. (2004). teachers' expectations and sense of responsibility for student learning: the importance of race, class, and organizational habitus. anthropology & education quarterly, 35(1), 75–98. doi: 10.1525/aeq.2004.35.1.75
duff, r. a. (1998). responsibility. in routledge encyclopedia of philosophy (vol. 8, pp. 290–294).
eikenbusch, g. (2009). classroom management für lehrer und für schüler: wege zur gemeinsamen verantwortung für den unterricht [classroom management – for teachers and students: ways to a shared responsibility for teaching]. pädagogik, 61(2), 6–10.
epstein, j. l. (2011). school, family, and community partnerships: preparing educators and improving schools (2nd ed.). boulder, co: westview press.
epstein, j. l., & sheldon, s. b. (2006). moving forward: ideas for research on school, family, and community partnerships. in c. f. conrad & r. serlin (eds.), the sage handbook for research in education: engaging ideas and enriching inquiry (pp. 117-137). london: sage.
feiks, d. (1992). zur pädagogischen verantwortung des lehrers [about the pedagogical responsibility of teachers]. lehren und lernen, 18(4), 1–20.
fischman, w., dibara, j. a., & gardner, h. (2006). creating good education against the odds. cambridge journal of education, 36(3), 383–398.
doi: 10.1080/03057640600866007
galindo, c., & sheldon, s. b. (2012). school and home connections and children's kindergarten achievement gains: the mediating role of family involvement. early childhood research quarterly, 27, 90-103. doi: 10.1016/j.ecresq.2011.05.004
gärtner, h. (2010). wie schülerinnen und schüler ihre lernumwelt wahrnehmen: ein vergleich verschiedener maße zur übereinstimmung von schülerwahrnehmungen [how students perceive their learning environment: a comparison of four indices of interrater agreement]. zeitschrift für pädagogische psychologie, 24(2), 111–222. doi: 10.1024/1010-0652/a000009
goffman, e. (1959). the presentation of self in everyday life. new york: doubleday.
grotlüschen, a. (2008). verantwortung und verantwortungsabwehr bei der zusammenarbeit mit bildungsfernen schichten [responsibility and denial of responsibility in working with educationally disadvantaged social strata]. in h. pätzold (ed.), verantwortungsdidaktik: zum didaktischen ort der verantwortung in erwachsenenbildung und weiterbildung (pp. 95–112). baltmannsweiler: schneiderverlag hohengehren.
gurtner, j.-l., monnard, i., & genoud, p. (2001). towards a multilayer model of context and its impact on motivation. in s. volet & s. järvelä (eds.), motivation in learning contexts: theoretical advances and methodological implications (pp. 189-208). amsterdam: pergamon.
guskey, t. r. (1981). measurement of the responsibility teachers assume for academic successes and failures in the classroom. journal of teacher education, 32(3), 44–51. doi: 10.1177/002248718103200310
guskey, t. r. (1982). differences in teachers' perceptions of personal control of positive versus negative student learning outcomes. contemporary educational psychology, 7(1), 70–80. doi: 10.1016/0361-476x(82)90009-1
halvorsen, a.-l., lee, v. e., & andrade, f. h. (2009). a mixed-method study of teachers' attitudes about teaching in urban and low-income schools.
urban education, 44(2), 181-224. doi: 10.1177/0042085908318696
hamilton, l. (1978). who is responsible? toward a social psychology of responsibility attribution. social psychology, 41(4), 316-328. doi: 10.2307/3033584
helker, k., & wosnitza, m. (2014). verantwortung im schulkontext – ein systematisches review des empirischen forschungsstandes [responsibility in the school context – a systematic literature review of the state of empirical research]. unterrichtswissenschaft, 42(3), 261–279.
höffe, o. (2008). verantwortung [responsibility]. in o. höffe (ed.), lexikon der ethik (pp. 326–327). münchen: beck.
kammerl, r. (2008). divergente verantwortungszuschreibungen als problemfeld beruflicher aus- und weiterbildung [diverging responsibility attributions as a problem area of vocational education and advanced training]. in h. pätzold (ed.), verantwortungsdidaktik: zum didaktischen ort der verantwortung in erwachsenenbildung und weiterbildung (pp. 31–48). baltmannsweiler: schneiderverlag hohengehren.
karakaya, s. (2004). a comparative study: english and turkish teachers' conceptions of their professional responsibility. educational studies, 30(3), 195-216. doi: 10.1080/0305569042000224170
katyal, k. r., & evers, c. w. (2007). parents partners or clients? a reconceptualization of home-school interactions. teaching education, 18(1), 61-76. doi: 10.1080/10476210601151573
kaufmann, f.-x. (1995). risiko, verantwortung und gesellschaftliche komplexität [risk, responsibility and social complexity]. in k. bayertz (ed.), verantwortung (pp. 72–97). darmstadt: wiss. buchges.
korkmaz, i. (2007). teachers' opinions about the responsibilities of parents, schools, and teachers in enhancing student learning. education, 127(3), 389-399.
lauermann, f., & karabenick, s. (2011). taking teacher responsibility into account(ability): explicating its multiple components and theoretical status. educational psychologist, 46(2), 122-140.
doi: 10.1080/00461520.2011.558818
lauermann, f., & karabenick, s. (2013). the meaning and measure of teachers' sense of responsibility for educational outcomes. teaching and teacher education, 30, 13-26. doi: 10.1016/j.tate.2012.10.001
lawson, m. a. (2003). school-family relations in context: parent and teacher perceptions of parent involvement. urban education, 38(1), 77-133. doi: 10.1177/0042085902238687
lee, v. e., & loeb, s. (2000). school size in chicago elementary schools: effects on teachers' attitudes and students' achievement. american educational research journal, 37(1), 3-31. doi: 10.3102/00028312037001003
leichter, h. j. (ed.). (1974). the family as educator. new york: teachers college press.
leithwood, k., edge, k., & jantzi, d. (1999). educational accountability: the state of the art. gütersloh: bertelsmann.
lenk, h. (1992). zwischen wissenschaft und ethik [between science and ethics]. frankfurt am main: suhrkamp.
lenk, h., & maring, m. (1993). verantwortung – normatives interpretationskonstrukt und empirische beschreibung [responsibility – normative construct of interpretation and empirical description]. in l. h. eckensberger (ed.), ethische norm und empirische hypothese (pp. 222–243). frankfurt am main: suhrkamp.
lewis, r. (2001). classroom discipline and student responsibility: the students' view. teaching and teacher education, 17(3), 307-319. doi: 10.1016/s0742-051x(00)00059-7
lightfoot, s. l. (1978). worlds apart: relationships between families and schools. new york: basic books.
marjoribanks, k. (1979). families and their learning environments: an empirical analysis. london: routledge.
matteucci, m. c., & gosling, p. (2004). italian and french teachers faced with pupil's academic failure: the "norm of effort". european journal of psychology of education, 19(2), 147-166.
maulbetsch, c. (2010).
person und verantwortung: zur grundlegung einer pädagogischen handlungstheorie unter dem aspekt der erziehung zur verantwortung im kontext schule [person and responsibility – about laying the basis for a pedagogical theory of action regarding the aspect of educating for responsibility in the school context]. münster: waxmann.
müller, f. (2009). verantwortung für sich selbst übernehmen: arbeitsmotivation und spielräume im berufsalltag [taking responsibility for oneself: work motivation and scopes in the work routine]. pädagogik, 61(10), 32–33.
pätzold, h. (2008). vom professionellen umgang mit verantwortung [about the professional handling of responsibility]. in t. rihm (ed.), teilhaben an schule (pp. 253–264). wiesbaden: vs verl. für sozialwiss.
peterson, e. r., rubie-davies, c. m., elley-brown, m. j., widdowson, d. a., dixon, r. s., & irving, e. s. (2011). who is to blame? students, teachers and parents views on who is responsible for student achievement. research in education, 86(1), 1-12.
potvin, p., & papillon, s. (1992). teacher's sense of responsibility towards student achievement and their attitude. canadian journal of special education, 8(1), 33-42.
ramirez, a. y. f. (1999). survey on teachers' attitudes regarding parents and parental involvement. school community journal, 9(2), 21-39.
rose, j., & medway, f. (1981a). measurement of teachers' beliefs in their control over student outcome. journal of educational research, 74(3), 185–189.
rose, j., & medway, f. (1981b). teacher locus of control, teacher behaviour and student behaviour as determinants of student achievement. journal of educational research, 74(6), 375–381.
ryan, r., & deci, e. (2000). self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. american psychologist, 55(1), 68–78. doi: 10.1037/0003-066x.55.1.68
schleißheimer, b. (1984).
die verantwortung des erziehers: vorüberlegungen zu einer ethik pädagogischen handelns [the responsibility of the educator: pre-considerations about ethics of pedagogical actions]. vierteljahresschrift für wissenschaftliche pädagogik, 60(1), 1–17.
tenorth, h.-e. (2004). lehrerarbeit: strukturprobleme und wandel der anforderungen [teacher work: structural problems and change in demands]. in u. beckmann, h. brandt & h. wagner (eds.), ein neues bild vom lehrerberuf? (pp. 14–25). weinheim: beltz.
thrupp, m., mansell, h., hawksworth, l., & harold, b. (2003). "schools can make a difference" but do teachers, heads and governors really agree? oxford review of education, 29(4), 471-484. doi: 10.1080/0305498032000153034
weber, m. (2005). wirtschaft und gesellschaft [economy and society]. frankfurt: zweitausendeins.
weiner, b. (1995). judgments of responsibility: a foundation for a theory of social conduct. new york: guilford press.
werner, m. (2006). verantwortung [responsibility]. in m. düwell (ed.), handbuch ethik (pp. 541–548). stuttgart, weimar: metzler.
wosnitza, m., & beltman, s. (2012). learning and motivation in multiple contexts: the development of a heuristic framework. european journal of psychology of education, 26(2), 177-193. doi: 10.1007/s10212-011-0081-6
zimmerman, b. j., & kitsantas, a. (2005). homework practices and academic achievement: the mediating role of self-efficacy and perceived responsibility beliefs. contemporary educational psychology, 30(4), 397-417. doi: 10.1016/j.cedpsych.2005.05.003

frontline learning research vol. 4 no. 1 (2016) 17-39 issn 2295-3159
corresponding author: kirsi sjöblom, research group of educational psychology, department of teacher education, faculty of behavioural sciences, p.b. 9 (siltavuorenpenger 5), 00014 university of helsinki, finland.
e-mail: kirsi.sjoblom@helsinki.fi
doi: http://dx.doi.org/10.14786/flr.v4i1.217

does physical environment contribute to basic psychological needs? a self-determination theory perspective on learning in the chemistry laboratory

kirsi sjöblom, kaisu mälkki, niclas sandström, kirsti lonka
university of helsinki, finland

article received 1 october / revised 28 december / accepted 8 january / available online 10 february

abstract

the role of motivation and emotions in learning has been extensively studied in recent years; however, research on the role of the physical environment still remains scarce. this study examined the role of the physical environment in the learning process from the perspective of basic psychological needs. although self-determination theory stresses the role of the social and cultural environment, as yet the role of the physical environment has been unexplored. the study focused on beginning chemistry university students’ (n=21) experiences in a chemistry laboratory. the data consisted of focus-group interviews and self-report questionnaires. the results indicate that the physical environment can support or thwart the fulfillment of the basic psychological needs. the usability and functionality of spaces and tools contributed not just to the fluency of the intellectual activity but also to the related emotional experience of oneself acting in a particular environment. the physical environment was a source of procedural facilitation: it complemented and challenged the students’ existing skills, contributing to their experiences of autonomy and competence. the everyday successes or struggles in the laboratory built on the students’ developing professional identity as well as their sense of belonging to the professional community. this study demonstrates that the design and functionality of the physical environment have a significant role in users’ intellectual and emotional functioning.
it is essential to utilize psychological and pedagogical knowledge when designing or renovating work and learning environments in order to fully make use of the potential of physical environments as part of human performance.

keywords: self-determination theory; basic psychological needs; physical environment; learning environment; indoor environment; usability

sjöblom et al | f l r 18

1. introduction

in recent years, the broadening field of research on the role of motivation and emotions in learning has produced important new information on how to optimally arrange the study environment (see e.g. csíkszentmihályi, 2014; dweck, 2006; heikkilä & lonka, 2006; heikkilä, lonka, nieminen & niemivirta, 2012; hidi & renninger, 2006; job, walton, bernecker & dweck, 2015; lindblom-ylänne & lonka, 2000; mälkki, 2010; ryan & deci, 2009; seligman, ernst, gillham, reivich & linkins, 2009; tuominen-soini, salmela-aro & niemivirta, 2008). strikingly, even though knowledge on the study environment, and especially its social attributes, is vast, knowledge on how the physical environment is related to psychological and pedagogical phenomena as yet remains scarce (sandström, sjöblom, mälkki & lonka, 2013; beard 2009, 2012; lansdale, parkin, austin & baguley, 2011; lonka, 2012; woolner, hall, higgins, mccaughey & wall, 2007). intellectual and emotional functioning is always nested in the physical environment, even when working in virtual learning environments. however, most of the research on physical environment has traditionally focused on minimizing its negative effects on health or determining how individuals interact with the environment on a perceptual level (see e.g. alfonsi, capolongo & buffoli, 2014; evans, bullinger & hygge, 1998; parsons & hartig, 2000; ulrich, 1981), rather than on unveiling the role of the physical environment with regard to cognitive and emotional functioning.
this study examines the role of the physical environment in supporting learning and basic psychological needs. previous research has indicated that the physical environment is far from irrelevant with regard to intellectual functioning: the design and functionality of the physical environment contribute to physically distributed intelligence (norman, 1993), stress over safety issues and the cognitive capacity available for higher intellectual functioning such as learning (sandström, sjöblom, mälkki & lonka, 2013). being organized in a given way, the physical space also conveys assumptions and ideologies (beard, 2012; beard & price, 2010) e.g. on the activity taking place and, as such, tunes the users into different mental modes and roles (mälkki, sjöblom & lonka, 2014). thus, similarly to the social environment, the physical environment can be seen as either facilitating learning and well-being or posing a challenge to them. moreover, of particular interest is the emotional experience related to the activity taking place in a given physical space. this experience may likely bear meaning in the process of forming a relation to the place and, more broadly, of developing one’s identity as a professional in a given field. in modern-day society people spend most of their time in indoor environments, and new multidisciplinary information is needed on how to design these spaces to best support the activity expected to take place in them. both human resources and physical spaces are valuable and costly resources: typically around 90% of business operating costs consist of direct or indirect staff costs (alker et al., 2015), and as to physical spaces, expensive indoor environments need to be used efficiently. at the same time, the industry policy of most western societies prioritizes innovation. we need to acquire further knowledge on how to facilitate the thriving of the human potential by creating fruitful grounds for it. 
when designing physical learning spaces, it is essential to not only take into account the most fundamental needs of the students, but also to gain understanding on the relations between the physical surroundings and the more refined psychological processes. these issues are focal in both learning environments and environments dedicated to other purposes, such as work or recreation. finally, it is not quite enough to focus on the design and functionality of physical space and tools as such. the use of available premises and equipment is essentially determined by the social practices applied in them; for instance, technology advances learning only through transformed social practices (hakkarainen, 2009; paavola, lipponen & hakkarainen, 2004). thus, although in this article we examine the role of the physical environment in the fulfillment of basic psychological needs, we do not assume that it is only a matter of a relation between the individual and the physical environment. rather, we approach the theme from the perspective that the users’ experience of the physical environment is mediated by social practices and culturally shared meanings. in a broader sense, we are approaching the intriguing interplay between the human and the material, as well as the intellectual and the emotional.

2. theoretical framework

2.1 basic psychological needs

in this study we approach questions of learning and well-being with regard to the physical learning environment from the perspective of basic psychological needs as laid out by the self-determination theory developed by deci and ryan (1985, 2000, 2008; ryan & deci 2000, 2009). this is a macro-theory of human motivation, personality development and well-being that focuses especially on volitional behavior and the surrounding conditions that support it (ryan, 2009).
the theory views all human beings as inherently self-determined, actively evolving organisms, with a natural aspiration for continuous psychological development and growth. however, in order for these propensities to be actualized, the satisfaction of basic psychological needs must be sufficiently supported. according to the theory, the social and cultural environment can support the satisfaction of basic psychological needs and self-determined behavior to varying degrees. thus, the process of growth is essentially seen to take place in relation to the surrounding conditions that, for their part, contribute to the individuals’ possibilities to embrace their full, natural potential. aligned with this emphasis, it is also relevant to study in more detail how the physical environment may, for its part, contribute to the interaction between the individual and the environment and the fulfillment of basic psychological needs (e. deci, personal communication with the first author, october 28, 2014). self-determination theory is currently one of the most prevalent and utilized theories on motivation. in the decades following the formal introduction of the theory in the 1980s, research on the theory has dramatically increased. consequently, the theory has been subject to criticisms and suggestions for further development as well. a common criticism concerns the theory's cultural applicability: it is argued that the core features of the theory, such as the need for autonomy, are mainly descriptive of a western individual, rather than of people raised in and surrounded by more collectivist cultures (e.g. iyengar & devoe, 2003; markus & kitayama, 1991). however, further research has verified that psychological needs are equally imperative with regard to psychological well-being in both individualistic and collectivist cultures (e.g. chirkov, ryan, kim & kaplan, 2003; ryan & deci, 2006). the formal framework of self-determination theory consists of five mini-theories (ryan, 2009).
this study focuses on the mini-theory of basic psychological needs. the theory states that all people, universally and regardless of their age or gender, share the same basic psychological needs, namely the needs for autonomy, competence and relatedness. these needs are seen to be central prerequisites with regard to healthy human functioning. autonomy refers to perceiving oneself as the origin or source for one’s own behavior (deci & ryan, 1985; ryan & connell, 1989; ryan & deci, 2002, 2006), competence refers to a felt sense of confidence and effectance in one’s own actions (ryan & deci, 2002), and relatedness refers to feeling connected and having a sense of belonging with regard to both other individuals and with one’s community (baumeister & leary, 1995; ryan, 1995; ryan & deci, 2002). in order to function effectively and to be psychologically healthy, these needs must be sufficiently satisfied (deci & ryan, 2008). more specifically, the satisfaction of basic psychological needs is relative to the activity and functioning pursued; needs may be seen to specify necessary nutriments with regard to healthy development and vitality as well as constructive and creative outputs (deci & ryan, 2002). thus, rather than being a goal in itself, the satisfaction of basic psychological needs is seen to facilitate intrinsic motivation, learning and well-being (niemiec & ryan, 2009; ryan & deci, 2009) as well as eudaimonic happiness (ryan, huta & deci, 2008). the theory of basic psychological needs is widely studied empirically, including in the context of learning in higher education (see e.g. black & deci, 2000). in particular, the need for autonomy and the possibilities to support it have acquired much-needed attention in the context of learning and instruction (see e.g. jang, reeve & deci, 2010; niemiec & ryan, 2009; soenens, sierens, vansteenkiste, goossens & dochy, 2012; vansteenkiste et al., 2012).
however, the research has predominantly focused on the social aspects of the learning environment, such as the interaction between the students and the teacher, while the research on the psychological needs of an individual with regard to the physical environment has been extremely scarce (see e.g. gay, 2008; gay, saunders, & dowda, 2011; rutten, boen & seghers, 2012).

2.2 the learning environment

similarly to the research on the basic psychological needs, research on learning environments has mainly focused on the social learning environment while the physical learning environment has for the most part been ignored. for example, lave and wenger’s idea of legitimate peripheral participation (1991) places high importance on social engagements that provide the proper context for learning to take place. by participating in the activities of an expert community, a novice is gradually able to assimilate the professional practices and become part of the community. these kinds of views stress the role of the social learning environment in the development of professional abilities, yet neglect the physical environments in which the social activity takes place. empirical research on physical environments, on the other hand, has traditionally focused on factors related to physical health or discomfort (e.g. küller & lindsten, 1992; winterbottom & wilkins, 2009). knowledge on how the physical environment, i.e. physical spaces, tools and equipment, is related to psychological and pedagogical phenomena is still rare (lansdale, parkin, austin & baguley, 2011; lonka, 2012; woolner, hall, higgins, mccaughey & wall, 2007). while the importance of individual characteristics and the social environment should not be underestimated (e.g. perry, turner & meyer, 2006), the role of the physical environment in the learning process calls for more rigorous attention in the field of learning research.
more knowledge is needed on how the physical environment can support learning, wellbeing, engagement and commitment. research on learning environments has shown that the physical environment conveys assumptions (beard, 2012; beard & price, 2010) and activates students’ previous assumptions regarding similar environments (mälkki, sjöblom & lonka, 2014). the assumptions conveyed by the physical environment may involve underlying conceptions on the learning process and the roles of the participants: an auditorium implies a different positioning and division of roles than a classroom where the desks are organized in groups and the teacher has no central position but is instead moving around the classroom on a chair. this demonstrates how the physical space itself tunes the students into different mental modes and roles. the arrangement of physical space in ways that the participants are not used to may as such turn into a disorienting dilemma, challenging existing conceptions and ways of thinking and possibly triggering reflection (mälkki, sjöblom & lonka, 2014). thus, the space or equipment cannot be seen as a separate entity, detached from the present culture. rather, social practices are embedded in the physical arrangements (hakkarainen, 2009) and also have an impact on how the physical environment is perceived and experienced by the users. along with the idea of socially and physically distributed cognition (hakkarainen, palonen, paavola & lehtinen, 2004; hutchins, 2000, 2006), physical environments also vary with regard to the degree they facilitate the activity that is expected to take place in them. for example, the space may be equipped with modern technology and devices that assist the learning process, which makes the learning process markedly different from one that is carried out without any needed assistance, such as calculators, to begin with. 
the very fact that learners are able to choose a suitable environment for different learning tasks is helpful with regard to completing the tasks. a concrete example of the opposite situation might be having to work on a group assignment in a silent library hall or endeavoring to understand new theoretical material in a noisy hallway. in fact, the physical environment consists of affordances that may, at best, facilitate the development of new skills, help people overcome the limitations of their own capabilities and make them feel like active agents; or in contrast, the lack of needed affordances may pose a significant challenge to carrying out the expected activities, handicapping the cognitive functioning in the space and making people feel incapable of performing the expected tasks (sandström, sjöblom, mälkki & lonka, 2013; norman, 1993; sandström, eriksson, lonka & nenonen, 2015). thus, the physical environment for its part offers a varying degree of procedural facilitation (bereiter & scardamalia, 1987) of the aspired activity. if for instance students lack enough space for their work or constantly have to worry about unclear safety issues, these issues inevitably take a toll on the cognitive resources available for learning (sandström, sjöblom, mälkki & lonka, 2013; see also sandström, ketonen & lonka, 2014). thus, a dysfunctional environment may be handicapping with regard to intellectual activity at the most basic level. consequently, we postulate that the design and the functionality of the physical environment play a role in the students’ experiences related to the basic psychological needs.

2.3 the context of the study: exploring the basic psychological needs in a chemistry laboratory learning environment

in our study we focus on beginning university chemistry students’ learning, in particular on their experiences during laboratory work, in order to unveil the dynamics between physical environment and basic psychological needs.
chemistry as a study context offers an intriguing and relevant terrain for researching this interplay. namely, the physical laboratory environment, which includes not only desks and chairs but also the diverse and complex laboratory instrumentation, is especially focal in learning chemistry. focusing on the experiences of first-year students is fruitful from the perspective of their emerging sense of relatedness to the professional field. furthermore, sense of autonomy and competence are expected to develop in a study context, which, similarly to a working environment, represents a performance-oriented environment. in addition, the aforementioned topics may be particularly present in the students’ experiences in the beginning stage of their studies. as argued earlier in the text, current research on basic psychological needs in study contexts has focused especially on students’ sense of autonomy. this is a central question as a study context has traditionally been an environment where the action to a large extent is guided by the teacher, while at the same time, the students have the need to develop their sense of autonomy and competence in the field. this need for a constructive friction between the students’ existing capabilities and an appropriate amount of guidance provided by the teacher has also been addressed by vermunt and verloop (1999). in our view, it is important to look more closely at the emerging sense of relatedness with regard to the study community and the physical premises, and more generally, to the professional field. in the chemistry context this may have particular importance: for example, in finland many students discontinue their chemistry studies after a year or two. for some of these students, this may be due to a transfer to pursue studies in the faculty of medicine, where the chemistry studies serve as a platform to develop the abilities needed to be accepted into that faculty. 
however, this is not the case for all of the students who drop out of their chemistry studies. in order to increase understanding on student experiences in the chemistry learning context, it is particularly interesting to examine the role of the physical learning environment from the viewpoint of psychological needs and the support the physical environment could offer for learning. what are the most central characteristics of the physical environment that contribute to emerging experiences of competence, autonomy and relatedness? if we are able to consider the psychological needs of the students when designing learning spaces, we can create a fruitful ground for thriving, productive students and, at best, further current understanding on how to design leading university campuses (lonka, 2012; nenonen, kärnä, junnonen, tähtinen & sandström, 2015).

3. the aims of the study

this study explored the role of the physical environment with regard to learning from the perspective of basic psychological needs. the research questions were as follows:

1. what is the role of the physical environment in the experience of the basic psychological needs?
2. what is the role of the physical environment in the learning process from the perspective of the basic psychological needs?

aligned with the theory of basic psychological needs, we postulated that the satisfaction of these needs is not a goal as such, but rather a facilitator with regard to productivity and well-being. consequently, it was relevant to study the experience of the basic psychological needs in relation to the activity pursued, that is, learning. we hypothesized that if the physical environment contributes to the experience of basic psychological needs, this may have a mediating effect on the process of learning; by supporting the fulfillment of basic psychological needs, the physical environment may facilitate learning and study engagement.
in addition, we aimed at furthering the interactional perspective of the theory of basic psychological needs by considering the role of the physical environment in facilitating or posing a challenge to the fulfillment of the core needs. rather than focusing on individual experiences regarding the core needs, our emphasis was on exploring the dynamics of the phenomenon on a more theoretical level. in order to capture the diversity and depth of the students’ experiences regarding this fairly new research topic, the study approached the relations between the physical environment and basic psychological needs with qualitative methodology. while much of the research on motivation is based on self-report questionnaires in order to measure individuals’ views and beliefs, classroom observations and interviews can provide a richer depiction of situated motivation (wigfield, cambria & eccles, 2012).

4. method

4.1 participants

the participants of the study were beginning-stage chemistry students (n=21, representing both genders) from a finnish university. the participants were selected based on their willingness to participate as well as the appropriate timing of their current laboratory project; in other words, participation in the interview and selection for a particular focus group also depended on whether they were able to leave their laboratory work for an hour to complete the interview.

4.2 materials

the data consists of focus group interviews and questionnaires that were completed by each participant individually before entering the interview. the questionnaire served as an orientation to the interview, whereas the qualitative analysis is based on the material from the focus group interviews. the questionnaire included both open-ended and multiple choice questions.
the themes of the questionnaire focused on helpful and challenging aspects of the physical environment with regard to learning as well as typical study-related use of physical spaces, equipment and technological devices:

a) sources of interest and engagement in the laboratory work (open-ended),
b) sources of challenge and difficulty in the laboratory work (open-ended),
c) typical study-related use of technological tools in learning (multiple choice questions assessing the frequency of the use on a scale 1-6; e.g. smartphone, laptop),
d) typical study-related use of spaces in learning (multiple choice questions assessing the frequency of the use on a scale 1-6; e.g. library, hallways, cafeterias, home),
e) concrete tools, equipment or other aspects of the laboratory work that are experienced as particularly well-functioning or engaging (open-ended),
f) concrete tools, equipment or other aspects of the laboratory work that are experienced as particularly cumbersome or counterproductive with regard to learning (open-ended),
g) other comments and suggestions with regard to the physical learning environment (open-ended).

the interview elaborated on the same questions with the group.

4.3 procedures

4.3.1 interviews

semi-structured focus-group interviews in groups of three to four students were collaboratively carried out by two of the authors. the interviews were conducted contextually in the middle of a laboratory work session. the students completed the questionnaires in the actual laboratory space, an organic chemistry laboratory, and the interviews were carried out in an adjoining room in order to ensure privacy and focused environment. by having the students complete the questionnaire individually before entering the interview, we aimed at giving the students the space to reflect on the topics based on their own experience and perspective first, and the views could then be elaborated further in the group.
the interviews followed an interpretivist approach (scott & usher, 1999; williams, 2000), aiming at "making sense of actor's actions and language within their 'natural' setting" (williams, 2000). the method of the interview was designed to leave space for the participants to freely discuss themes that they experienced as important. as the topic of the research is rather new, the structure of the questionnaire and the interview had to be open enough not to restrict the participants but to genuinely leave space for unexpected material and directions, regardless of the preconceptions or hypotheses of the researchers. moreover, the phenomena and the related experiences are such that a clearly articulated view from the students can hardly be expected; rather, the data had to be approached in a holistic way to seek understanding on the phenomena. thus, the questions were kept rather open so that the interview and the discussion in the groups could develop the topics further. as a result, the interview data brought about a rich milieu of aspects of the students’ experiences, beyond the expected themes and hypotheses.

4.3.2 analysis

the interviews were transcribed verbatim, and the transcriptions were then analyzed by the authors iteratively with the help of the atlas.ti program. repeated stages of individual and collaborative analysis were conducted to find central categories and patterns in the reported experiences. the initial stage of the analysis and classification was data driven in order to capture unforeseen observations and patterns in the data. when the researchers gathered to discuss the initial results of the first round of analysis that each had conducted individually, it was noted that despite the differences in the conceptualizations and terminologies of the classifications among the researchers, many of the central themes and categories fell into the dimensions of basic psychological needs.
the results of the first round of analysis supported the theory of basic psychological needs as a relevant theoretical approach through which to frame the findings and acquire further understanding on the role of the physical environment in the learning process. the following rounds of analysis focused on elaborating specifically on this approach with continued iterative individual and collaborative rounds.

indeed, the theory of basic psychological needs was not yet our framework when collecting data. we aimed at more generally unveiling the role of the physical environment in the learning process. along with the data-driven analyses and initial findings, we started seeing the relevance of further rounds of analysis from the perspective of basic psychological needs. consequently, the analysis is by no means exhaustive with regard to the relation between basic psychological needs and the physical environment but rather an opening for research on the topic.

this study was a deepening reanalysis of previously analyzed data (sandström, sjöblom, mälkki & lonka, 2013) on the chemistry laboratory as a physical learning environment. the previous study shed light on the role of the physical space in the learning process: the physical space may contain guidance implemented in it, and the physical space and its usability contribute to the students' sense of safety, which in turn is crucial when students are expected to engage in demanding cognitive activities. however, early in the initial phases of analysis, it seemed that in addition to the aforementioned findings, the data also offered intriguing perspectives on the dynamics between the physical environment, learning and experiences of oneself as a learner in that given environment, which deserved a deepening reanalysis.

the authors represented different fields of expertise, namely educational psychology, clinical psychology, adult education and linguistics.
the analysis aimed at utilizing and building on the diversity of the scholarly backgrounds of the researchers to explore different approaches to the phenomena as well as reach understanding on the core features presented in the data. as with the participants of the study, we aimed at both capturing the individual approaches and views as well as elaborating them further by combining the views and abilities of the whole group collaboratively. most of the work on the study was carried out collaboratively, with the team of researchers working on the material and writing the text in the same physical space, which added value to the depth of the analysis, as opposed to each researcher separately adding their own share of expertise to the study (hakkarainen, palonen, paavola & lehtinen, 2004). moreover, the researchers altered and modified the physical spaces in which they were working during the research process. choosing a suitable physical space with the required technological tools to accommodate a given work assignment, for example, a collaborative writing session, brought further understanding on the role of the physical space in the work process itself. the approach to the current study was abductive by nature; by utilizing the theory in the analysis of the data, we aimed at a deeper understanding of the phenomenon as well as at furthering the theory. our main focus was on the dynamics between the physical environment and the experiences of the learner rather than on a purely deductive approach driven by an emphasis on testing the theory. from a methodological point of view, our aim was not to cover all possible variations of the interplay between the physical environment and the psychological needs in the context of chemistry studies. rather, our study was aimed at serving as an opening for research on previously unmapped ground. 
even though the sample size can be seen as a limitation of the study and a broader sample could have been advantageous, from a theoretical point of view (see mälkki, 2012) the data were rich and offered relevant material for an exploratory analysis on the dynamics of the topic.

5. results

in the following sections we will focus on how the three core needs, autonomy, competence and relatedness (ryan & deci, 2002), manifest in relation to the physical environment and the learning context. as our approach stretches the theory of psychological needs out of its usual sphere of application, we employed an abductive approach to be open to dynamics of the phenomenon that are not readily conceptualized in self-determination theory. for analytical clarity, we will in the following first examine each dimension individually, and secondly we will discuss how these dimensions are intertwined in the data.

5.1 autonomy

5.1.1 physically mediated guidance and the use of modern technological devices in supporting students’ sense of autonomy

within the context of learning and instruction, the issue of autonomy is often regarded to predominantly concern the balance between the control over one’s work and the received guidance, which is usually seen as socially mediated. students need sufficient guidance and should not be “abandoned,” but the teacher should not regulate or perform on behalf of the students the tasks and challenges that they already master, thus disturbing the sense of autonomy experienced by the students. as for the laboratory as a physical entity, guidance may be seen not merely as socially mediated but also as physically mediated (sandström, sjöblom, mälkki & lonka, 2013; hutchins, 2006); information may be embedded in the physical space itself. for instance, different tags and signs can be seen as affordances (norman, 1993) that assist individual information processing. 
they help people overcome the boundaries of their intellectual capacities. thus, the workspace itself can be seen as cognitively structuring, also with regard to the clarity of close surroundings such as desks. architecturally, the space itself may also communicate information, which is the case, for instance, when signs in a hallway are not needed to locate the corridor to the restrooms. on the other hand, a lack of needed information or tools provided by the physical environment can reduce one’s prerequisites for performing various tasks, either practical or intellectual, in the space. not only does this happen factually, but this may also challenge the experiences of one’s own ability and autonomy, at worst bringing about a sense of inability due to a dysfunctional environment. in this sense, properties of the physical surroundings become incorporated as capabilities of the individual. the information embedded in the physical environment may also reduce the need to seek instructions for tasks on the very basic level of functioning, such as finding the appropriate equipment to perform a given task. in contrast, if a student is not capable of navigating independently in the space without constantly asking for information on the most basic level, this can be harmful not only for the process of learning but also for the sense of autonomy experienced by the student. in fact, the guidance provided by the physical space itself may be seen as more supportive of the autonomy of the students as they take on a more active role when searching for the needed information from the physical environment, as opposed to being socially given the information that the teacher assumes that they need. they are “the origin or source for one’s own behavior” (deci & ryan, 1985; ryan & deci, 2002), and the more they can autonomously direct their study-related behavior in meaningful ways, the more they themselves are in control of the learning process. 
for these purposes, the physical environment may provide not only information and guidance, but also tools for searching for the requisite information. for example, the students described their frequent use of modern technology, such as smartphones, tablets and laptops, in searching for relevant information. the use of modern technologies was experienced as handy and quick in comparison to searching for the information from the library. it also seemed that the use of modern technology was at times more supportive of the students’ sense of autonomy as it reduced the need to lean on the teacher as a source of information within the laboratory space. however, some students found that the physical space did not accommodate the use of modern tools as well as they would have hoped. the students reported that workspaces crowded with chemical equipment often did not leave room for laptops even though they would have been an important part of the study process. while independent search for information requires self-directedness, it also changes some of the social aspects of having to ask for additional information. instead of presenting his or her imperfections, the student can independently approach the question and, optimally, succeed in solving it. at best, this may foster the student’s sense of autonomy. in addition, providing information in excess through various physical modalities is hardly a risk, whereas with socially mediated guidance this can often be a challenge:

at times the assistant may come and do the thing for you, and it would be nicer to get to do it yourself, just to take the instructions and try to get something out of it. sometimes when you’ve wanted help, verbally or such, then the assistant has come and put together that instrument there and taken care of it.

the independent work to me too is great, really... 
at least for me, even though group work is okay and nice but if the other person gets things faster and better, then i’m just like, the other person says well go find this and i do and i’m getting nothing about anything, --so then you have to take responsibility for your own work and understanding too.

indeed, in light of this data, the role of socially mediated guidance in learning was as underlined as it was dilemmatic. by socially mediated guidance we refer to the support that the student receives in the learning process either from teachers or from fellow students. while the students appreciated the space and freedom to process things themselves and be independently responsible for their progress in the chemistry tasks, they felt a strong need for reassurance that they are progressing in the right direction. many students emphasized the importance of receiving social confirmation and affirmation for their assumptions either from their peers or from the teacher.

5.1.2 the volitional nature of the study activities

finally, when asked about the meaning of the physical environment in their studies, throughout the data many of the students mentioned how being able to practice in the actual laboratory brought a sense of meaning and purpose to their studies. the activities performed in the laboratory demonstrated why they were there in the first place, what they would be doing in the future, and why they should proceed and advance in their studies:

here the students are doing their work and the assistants are only there to see that nothing particular is happening. in the future if you’re working in the laboratory, there will probably be no one telling you to “do this, do this”. instead, you have to use your own head when you’re working there, and here you get to practice that. 
for me too, with this instrumentation that i’ve never got to use before, it is a fine feeling of ‘hey this is how it works’; there are levers and tubes and glass and all kinds of things gathered there. it is awfully great to get to use things that you never have before. and overall, the engagement of the laboratory work, that feeling when you’ve actually succeeded, you have that aspirin weighed, measured, everything checked – that feeling: yes i’ve accomplished something today! even though it’s nothing bigger than some ten grams of aspirin, still.

as was clearly manifested in the students’ reports, the physical spaces and tools enabled study processes that were highly valued by the students and appeared to strengthen not only the sense of autonomy but also competence and relatedness to their professional community.

5.2 competence

5.2.1 the importance of practical conditions on intellectual and emotional functioning: ergonomics, usability and the fluency of the activity in the physical environment

in a study context the need to be able to perform and accomplish tasks is accentuated. a predominant feature of a chemistry laboratory as a study context is that it involves concrete activities with physical equipment and tools. when asked about helpful and challenging aspects of the physical learning environment, the students brought up the importance of ergonomics in the laboratory settings. they mentioned how their work can be significantly disturbed by challenging external conditions, for example, when they have to work in unergonomic positions. this was evident in the experience of a student who reported having at times to do his laboratory work “in a highly confined space in a fetus-like position.” with that example in mind, one may recognize how the physical environment may have a fundamental effect in hindering or disturbing the student in applying his competence to the task at hand. 
the questions of usability were equally important regarding modern tools such as technological devices and software. if the prerequisites for accomplishing a task are not taken care of and the environment does not provide the needed procedural facilitation, the student cannot experience him- or herself as competent in the given physical environment. in consequence, these kinds of external factors may lower the internal sense of competence; the functionality of the physical environment may not only have an enabling role with regard to the concrete activity, but there is an essentially emotional component to this as well. equipment and tools, traditional or modern, a chair or a smartphone, may hinder one’s experienced competence but also elevate it and take it to the next level.

5.2.2 the physical environment and tools: tangible indications of competence and sources of engagement

on the other hand, the students frequently brought up that proper and well-functioning practical tools offered them a concrete indication of competence and accomplishment as well as a source of engagement in the learning process:

i do like it that with the kind of proper practical tools one can practice making real things, that it’s not just all on the pages of the books, that it motivates and in my opinion grows that confidence, hey i could do this, hey this resulted in such a good yield. for me, that inspires me to go forward in the studies.

it appeared that at best, the equipment offered stimulus for a positive, reinforcing cycle when the student was able to master, put together and utilize equipment initially experienced as strange and intimidating due to its complexity and sophistication:

to me, successful reactions or syntheses help me greatly [to engage in learning]. and special and new equipment too, that you get to familiarize yourself a little with, you wonder what to do with them and they look completely strange, and you have absolutely no clue what to do with them. 
and then someone clears that up for you and you’re like “aah okay!” it is so nice! … especially when putting together the distillation apparatus for the first time, it was absolutely horrible, and such an awful chaos! but now that you’ve done a fair amount of that, you’re, well… it is wonderful to notice that it doesn’t take 15 minutes of agonizing anymore, you take the right instruments almost automatically.

in these cases the elements of the physical environment that used to communicate strangeness became familiar and meaningful. instead of communicating difficulty and incapability, they offered a sense of mastery as well as an indication of progress in the learning process. more broadly, the mere observation that with time and practice the student could navigate and function in an environment that in the beginning had been fairly demanding may also be seen as a positive indicator of the learning process and the development of competence in the context. this kind of feedback on one’s abilities, stemming from the mundane concrete doings in the laboratory and involving both the cognitive and the emotional dimension, may be seen to be functional in nature as it is not given by someone else but emerges through the experience of success in a practical task.

5.2.3 the challenges of competent functioning in the complex physical environment: providing cognitive structuring and procedural facilitation in the space itself

in order to successfully function in the laboratory environment, the students need to not only have the appropriate theoretical grounding and understanding of the phenomena, but they also need to familiarize themselves with the social practices of applying the information in practice in a given field. 
this is not straightforward as the shared practices are often in the form of an expert’s silent information, which can be best assimilated by participating in the actual procedures and operations, or by becoming part of the professional community. this, however, may be challenging as the time spent in the actual laboratory setting is limited. many students reported feeling that they were expected to be more competent in laboratory work than their actual level of competence was. the laboratory environment was highly complex and demanding for them to begin with. for example, the students mentioned that watching a security video once does not necessarily mean that they have assimilated the information and would be able to take the crucial points into account when working in the laboratory setting. this, for many students, resulted in recurrent uncertainty and pondering over safety issues: in theory you do know these things since you’ve studied the course on safe work in the laboratory, but then when you come to strange circumstances like these, it may happen that that part of your brain is not working, and you’re like, there’s all the rest of the hustle and bustle and the poisons there. in terms of the theory of flow (csíkszentmihályi, 1988; see also inkinen et al., 2013), if the challenges of the task are considerably higher than the students’ abilities to respond to them, the students are at risk of experiencing predominantly worry and anxiety, which does not facilitate their learning or wellbeing. if students are frequently experiencing failure and inability with regard to the expectations rather than meeting the expectations and noticing progress in their learning, the students’ sense of competence in the given physical environment may be hindered. the more complex the activity and the environment, the more cognitive structuring is needed. as mentioned earlier in the text, by physical means this scaffolding can be provided e.g. 
by adding tags, signs and information boards as well as paying attention to the overall clarity of the physical environment.

5.3 relatedness

within research on learning, relatedness has mainly been studied in relation to a given social community, such as a professional community, instructors or peer students. although feelings of relatedness may not be connected to the mere physical surroundings, we considered it important to study the role of the physical environment from the viewpoint of an emerging sense of relatedness to a professional community. more specifically, based on the analysis, it appeared that the students referred to the role of the physical environment as part of their experiences of belonging to given physical premises or the lack of belonging. therefore, in the following we will also use the notion of belonging when approaching questions of relatedness with regard to the physical environment. while the dimensions of autonomy and competence were particularly central in the interview data, experiences of relatedness were less prevalent, which seems to be an important observation since within the field of chemistry there seems to be a challenge with regard to students' commitment to their studies and to the professional field. as the students described their relation to the physical space, two central themes became relevant. firstly, the students perceived different kinds of study activities to belong to different physical surroundings and associated a certain value with them. secondly, as elaborated earlier, how the physical environment accommodates the activities expected to be performed in it has importance with regard to the emerging sense of competence. consequently, the study activity, be it fluent or laborious, contributes to how easy or difficult it is for the students to proceed and succeed in their study tasks and influences how the students view themselves when working in that particular environment. 
in a broader sense, this bears relevance to their developing sense of relatedness to the professional field. in the field of chemistry, the laboratory surroundings are an especially central if not inseparable feature of the work itself, and therefore chemistry offers an intriguing terrain for studying the role of the physical environment with regard to a broader formation of relatedness. in the following we will discuss each of these points in more detail.

5.3.1 from hallways to lecture rooms: spaces of status, ownership and functionality

with regard to physical space and the sense of relatedness, for the students it was important to have certain physical spaces as anchors for their activities so that they could repeatedly utilize certain spaces instead of floating around without a “home” for their activities. when asked about their preferred study environments, the students seemed to experience most ownership and belonging with regard to spaces where the activity is not instructed but rather informal, such as the tables and chairs in the hallways, libraries, the student union room and, obviously, home, that is, spaces which the students were able to enter and use on their own and where the role of teacher was not as predominant. in fact, the spaces in which the students seemed to experience belonging often were also such that supported the students’ sense of autonomy, both with regard to being able to choose and enter the space rather freely and self-directedly, as well as to the nature of the activity taking place in the space. indeed, just the very fact of being able to choose between differentiable and flexible spaces in order to best accommodate the given study activity may be seen as supporting students’ sense of autonomy and their active role in guiding their own learning process. 
while most of the spaces utilized by the students were not officially designated for any specific task, the students nevertheless seemed to have a clear vision regarding which spaces they would use for which study activity. for informal tasks, such as group work, the students reported choosing mainly informal environments, such as university hallways, cafeterias or public transportation. the faculty library or classrooms, instead, were perceived as natural venues for pursuing more serious and ambitious studying. further, the students associated a certain value with certain study environments. some students seemed to regard as “proper learning” those study activities that were situated in formal learning environments. for pursuing “serious study activities,” students reported choosing formal study environments, such as the faculty library. the laboratory environment, clearly being a formal learning environment, was regarded as an environment where serious study activity and “proper learning” takes place. in contrast, the activities conducted in informal environments, such as group assignments, were not described as worthy and official, albeit that these study activities may be highly essential in the process of learning. in fact, based on the students’ reports, collaborative study activities were not recognized as learning as clearly as individual work, either when instructed by a teacher or accomplished alone. specifically with regard to the laboratory environment and the related sense of belonging, some of the students indicated that in their experiences the laboratory space is not a space that belongs to them in the first place. rather, many students perceived themselves as visitors in this space that is occupied by others, such as teachers, more advanced students and the researchers who are its main users. 
5.3.2 welcoming, functional and dysfunctional spaces: allowing users to be human

the issue of belonging may also be seen as related to how the space communicates with work and tasks. thus questions of usability become relevant: the dysfunctionality or impracticality of the environment does not support experiences of one or one’s work belonging in the given space. the space can be seen as inviting or welcoming in relation to the individual’s own functioning; for instance, how the space is designed to meet the ergonomic needs of users builds experiences of fluency vs. laboriousness:

maybe the most important thing in interior design would be functionality, as you have people of different sizes, the adjustability of the surroundings, so that the work would be ergonomic. if you have to reach something from high above, that you would have some tool or a strategy, whatever it is, so that you can reach things from above safely. at times when you are taking those poisons from somewhere terribly high, me too, a small person, it is a bit like, will it come down and will my hand slip…

if the environment is predominantly uncomfortable and performing tasks in it is cumbersome, this does not enhance the experience of being capable or, more broadly, belonging to function in that space:

sometimes you kind of know what you’re doing or what you’d like to do, but somehow you can’t as the instrument…or the practicalities don’t always work. there is no space or there are too many flies in the ointment to be able to do a simple thing.

if you’re working in a fume cupboard with acid solution, you need ph paper, if i do that i first pass three chairs, three buddies, i only get to the hallway there. then i walk past devices where there are possibly people working so i have to dodge them too, and then i get to the assistants’ room where there are three other people asking them something. i stretch there and i take the ph paper... 
at worst there are so many things in the way, to sum it up, there are many switchbacks there. how the space communicates with the student’s needs or expectations may also stem from the way the student is able and allowed to individually customize the space and the facilities according to his or her own preferences, thus bringing about a personal touch with regard to the given physical surroundings. for instance, the student should be able to adjust the equipment to meet his or her ergonomic needs or to customize the environment to adapt to personal work habits as opposed to being forced to work in a space occupied by another person who has completely opposite habits. for example, the students had varying preferences regarding the need for clarity versus stimuli from the proximal surroundings and differed as to what point they started to feel the need to clear the space or wash the glassware. whereas some students wished to have all their equipment immediately available and within reach, other students experienced this kind of abundance as overwhelming and chaotic, disturbing both their cognitive processing and conduction of practical tasks. specifically related to chemistry laboratory work, an important issue is also how the environment allows the students to be human. that is to say, at the beginning of studies it is natural to make mistakes and break glassware or other equipment by accident. the students described the importance of the policy in the faculty regulations on whether the students need to pay none of the expenses, part of them or all of them, as this influences their confidence to practice the work that they do not yet master. in a broader sense, these kinds of background factors may also have an impact on the students’ perception of how effortless it is to be working in the space and whether it is meant for their work and incompleteness in the first place. 
5.3.3 the challenges of forming a relationship with a space and place: esthetics and uninviting spaces

in addition to the various subtler indications of the students’ experienced relatedness either to the physical space that they inhabit, their peers or the field in general, the data also included indications of spaces experienced as actually uninviting. some students described experiences of unpleasantness or repulsiveness, such as a space being esthetically so unsightly that it may actually have an alienating influence on the user: one student described coming to the study premises as a freshman full of enthusiasm, but then considering changing the major because of the highly uninviting physical surroundings. in this case the student was never in close enough proximity to form a personal relationship with the physical study environment, as this was actually prevented by the strong initial sensation of the facilities as non-welcoming and uninviting. thus the comfort, coziness and even the materials of the physical space are not irrelevant in the process of forming a relationship to the space and place. as another example, many students mentioned the relevance of the colors in the physical environment. they were hoping for fresh, calming colors, as opposed to mirthless or exceedingly bright colors that were felt to be jarring and almost obtrusive in the study environment. while the possible lack of esthetic beauty or the experience of distaste may not, as such, prevent the experience of belonging to the physical surroundings within the study environment, it certainly does not improve the situation. the aforementioned aspects of pleasantness may be seen to point to matters that might, for their part, create beneficial circumstances for the experience of belonging to emerge. 
5.4 conclusions on the intertwinedness of the basic psychological needs within the context of chemistry studies: the physical environment as a gateway to a professional community and practices

above we have considered the needs for autonomy, competence and relatedness as separate dimensions. these dimensions, however, are not detached from each other; we have held to this division for analytic purposes. rather, as is implicit in the analysis above, the dimensions of autonomy, relatedness and competence are essentially intertwined. in the following, we will specifically explicate this intertwinedness in the studied chemistry context. in light of the basic psychological needs, what at first came across in the students’ reports was their relation to autonomy. namely, they appeared to emphasize the need for self-directedness already in their first year of studies. this may derive from the fact that the laboratory as a space offered them a direct connection to their possible future job in the laboratory, and thus they were constantly mirroring their everyday laboratory chores to the expectations of the profession: an independent role in a laboratory, possibly working alone or as the only chemist on the premises. with this vision in their minds, they desired to form a similarly self-driven and independent work ethos already at the beginning of their studies. as the profession of a chemist can be seen not only as an academic profession but also as handicraftmanship, the relation between the future profession and the novice stage courses is much closer than in many other academic fields in which the first years of studies are often mainly filled with theoretical courses. the laboratory environment represents a physical professional environment that the student is able to enter at an early stage of studies and, with practice, to increasingly master. 
indeed, the students often seemed to experience that the work in the laboratory bridged the gap between the rookie and professional stages: by accomplishing their concrete study tasks in the laboratory, they were doing similar tasks as professionals, which served as a gateway to the professional practices of chemists. this advance in study practices can also be seen as progress in terms of legitimate peripheral participation (lave & wenger, 1991); as the students are admitted to participate in procedures in a given professional context, they become involved in the professional community and culture and its shared social practices and are able to proceed from the fringe areas of professional abilities towards more internalized and well-established professional practices and expertise. from the viewpoint of basic psychological needs, an environment that supports feelings of efficacy as well as a connection with those who convey it is most likely to promote internal motivation (ryan, 2009). as the students experience the laboratory environment as closely representing their future workplace and mirror their actions to their future role as a professional, it is particularly important to pay attention to how the initial experiences of working as a chemist in a laboratory setting are built. here the design of a functional and pedagogically purposeful environment becomes central. to conclude, based on the results, we suggest that the experience of a given physical space builds through the activities performed in that space. the functionality and usability of the space and tools are highly important as they contribute to the fluency of the activity taking place, which builds the students’ view of themselves acting in that given environment. 
when a student experiences the space as a place that involves equipment and functions that he or she can master and perceives himor herself as someone successfully functioning in that environment, he or she is more likely to experience belonging to that environment and context. thus, how the physical environment manages to accommodate the most mundane everyday activities may, for the users, build on a broader experience of relatedness. this may be of importance when building a professional identity and creating a sense of belonging to a professional community. thus, by providing sufficient or even optimal premises for study activities, the physical environment may facilitate this process to varying degrees. sjöblom et al | f l r 32 conclusion examples from the data practical implications physically mediated guidance and the use of modern technological devices may support students’ sense of autonomy and competence. students were hoping for clear, well-structured spaces, where the basic-level information may be implemented in the space, or the students can acquire it with the help of technological devices, in order to enable them to navigate and function in the space in a selfdirected manner. socially mediated guidance was regarded as important in confirming one’s assumptions, in a facilitating rather that instructing manner. it is important to distinguish between physically and socially mediated guidance and their purposeful roles. physically mediated guidance should be more widely acknowledged and utilized in communicating information on a basic level, such as where to find needed equipment or dispose of substances, whereas social guidance is needed in the more complex cognitive processing. the physical environment may complement the students’ existing competence and offer procedural facilitation for their learning processes. 
the chemistry laboratory as a new and complex working environment seemed to be highly challenging, if not intimidating for the students at first. however, if the students were able to successfully enter and learn to master the equipment and the space, it offered them fruitful and highly engaging learning experiences. students should be provided with suitable spaces and tools as well as sufficient guidance in using them in order to ensure the scaffolding of the learning processes by both physical and social means. the more complex the activity and the environment, the more cognitive structuring is needed. being able to utilize diverse learning environments in a self-directed manner may support students’ sense of autonomy in directing and regulating their own learning process. the students associated certain study activities as well as a certain value, status and ownership to different learning environments. formal learning environments, such as lecture halls, libraries and laboratories, as well as the formal and focused learning activities occurring in them, were often perceived as more “proper” than the informal and collaborative learning environments and activities, even though the latter were experienced as crucial in the learning process. flexible, diverse and freely accessible spaces should be available for students in order to accommodate the variety of study activities as well as support students’ sense of autonomy and relatedness. informal environments may promote more sense of belonging and ownership in novice students; the possibility to act in a professional work environment may bridge the gap between the rookie and professional stages and also bring a sense of meaning and purpose to the studies. the functionality of the physical environment contributes to the cognitive processes of the users as well as to the related emotional experience of oneself acting in the given environment.
consequently, the physical environment may be instrumental in the development of the students’ sense of relatedness to the professional community. for the students the laboratory strongly represented their future work environment as chemists, and the experiences occurring in it were frequently mirrored to their future professional identity. the functionality of the physical environment and the fluency of the activity appeared to contribute to students’ sense of belonging to the professional context. special attention should be paid to the functionality of the physical environment as well as the fluency of short periods of practical work, as the experience of a physical environment builds through the activity performed in the environment.
table 1. summary of results
6. discussion
in this study we analyzed the role of the physical environment in learning and well-being from the viewpoint of self-determination theory and basic psychological needs. the physical environment may support not only learning and well-being, but also autonomy, competence and relatedness with regard to the learning environment and the professional field. in the following we will elaborate on the broader theoretical and practical implications of the results. the physical space and tools can be seen as facilitating or posing a challenge to study activities and cognitive functioning by various means. the physical environment not only influences the cognitive learning process but inevitably gives rise to an emotional experience, as well. for instance, if the physical environment poses a challenge to study activities, and because of this the students constantly feel incompetent in the learning context, this experience builds on their views of themselves acting in that particular environment, and consequently, they may be less likely to frequently and willingly approach the same environment in the future.
moreover, in order to reduce unnecessary anxiety over tough challenges with regard to their existing abilities, as well as to provide optimal grounds for learning to occur, it would be important to complement the students’ existing competence by offering procedural facilitation and support in both the physical and social environment. the emotional experience resulting from the concrete activities taking place in the physical space can support committing to that particular working environment, as well as the broader context related to it, such as the professional community of chemists. recent pedagogical research has emphasized the emotional components of the learning process, such as interest and engagement (see e.g. csíkszentmihályi, 2014; heikkilä, niemivirta, nieminen & lonka, 2011; hidi & renninger, 2006; inkinen et al., 2013; lonka 2012; lonka & ketonen 2012), as opposed to more traditional views concerning merely cognitive aspects of learning. furthermore, engagement in learning has been approached through conceptualizing cyclical stages in the learning process and defining optimal practices. we want to shed light on the role of the physical environment in the learning process: how the physical environment may support or hinder learning practices, and how that, in turn, contributes to the emotional experience and sense of commitment or the lack of it. this broadened viewpoint involving the role of the physical environment in learning may be utilized in envisioning a more holistic approach to engaging learning. the functionality and usability of the space and the equipment, the guidance implemented in the space as well as other support available (peers, teacher) all play key roles in the learning process. from the viewpoint of self-determination theory, physical environment represents a novel context for the application of the theory. 
based on this study, similarly to social and cultural environment, physical environment can also support or thwart the fulfillment of the basic psychological needs. furthermore, this study raises theoretical questions concerning the role of the three basic psychological needs as well as their interrelations in different contexts. while the fulfillment of all three needs is essential, within the light of these data it strongly appeared that in a study context, perhaps similar to other contexts that are highly demanding in relation to existing abilities, the dimension of competence seemed to be very central, if not a prerequisite, for experiences of autonomy or belonging to emerge. for example, it is challenging for students to develop a sense of belonging to the professional community if they mostly feel incapable of performing basic tasks and thus find themselves incompetent in the field in general. while the developers of the theory strongly emphasize the importance of all three needs as well as the synergy between them, depending on the nature of the activity, relatedness, for instance, may at times be less central to intrinsic motivation than autonomy and competence (deci & ryan, 2000). on the other hand, in other occasions, such as with children or adolescents who are at risk of dropping out of school, it may be most crucial to support the experience of relatedness (e. deci, personal communication with the first author, october 28, 2014). furthermore, it has been acknowledged that the dimensions of competence, autonomy and relatedness are strongly interrelated, and for instance, an autonomy-supporting atmosphere will assist in promoting relatedness and competence as well (deci & ryan, 1987; wolters & gonzalez, 2008). 
acknowledging these previously researched viewpoints, we wish to both emphasize the importance of promoting the fulfillment of all three basic needs in the learning context as well as further examine their interrelations and prerequisites. in our view the three dimensions may not in all contexts be equally interrelated and in identical interaction with each other. instead, they may be interdependent or sequential depending on the context. this context-driven analysis of the underlying dynamics may be an intriguing terrain for further research on the theory. what is of particular interest in the field of higher education is how the basic psychological needs interact with vital study-related phenomena such as the commitment to studies and the development of professional identity, and how to best take this into account when designing learning processes. this study demonstrates the importance of the physical environment for intellectual as well as emotional functioning. the intellectual functioning of an individual is always nested in a given physical environment, even when the work is carried out in a virtual environment. in fact, it may be that the impact of the physical environment on psychological functioning is often highly underestimated. with regard to future research it would be intriguing to untangle the effects that different physical space solutions have on human functioning. it is likely to make a difference whether one is working in a familiar workspace or in increasingly common open-plan multispace offices, not only with regard to ergonomics but also with regard to experiences of belonging or recovery. for instance, experiences of ownership and relatedness or beneficial, uplifting and inspiring mental modes can be supported by various means in both stable and mobile offices.
in addition to the focal social aspects such as the shared culture of the community, some physically mediated options might include customizing the physical space with personal items but also utilizing modern and mobile technological means, such as customized technological tools or screen savers. moreover, the bodily dimensions of office environments beyond ergonomics offer an intriguing aspect to the psychophysical experience. for instance, the possibilities that the spaces or furniture offer for varied bodily postures and physical movement all contribute not just to physical health but also to the psychological experience and functioning. to conclude, it is essential to utilize psychological and pedagogical knowledge when designing work and learning environments. by considering the interplay between the material world and human functioning, we can create fruitful ground for thriving users and develop novel design for leading university campuses and other indoor environments.
key points
similarly to social and cultural environment, physical environment can also support or thwart the fulfillment of the basic psychological needs. learning and wellbeing can be facilitated by developing physical environments that support the basic psychological needs. the physical environment contributes to the cognitive functioning of the users as well as to the related emotional experience of oneself acting in the given environment. for example, a well-structured physical environment may offer physically mediated guidance, cognitive structuring and procedural facilitation for the students’ learning processes. it may complement the students’ existing competence and scaffold the students’ sense of control in situations where the challenge of the task is experienced as high. physical spaces and tools should be utilized in offering students functional feedback, engaging learning experiences and gateways to practicing their future profession.
in order to support the basic psychological needs as well as help the students to regulate their own learning process, the students should be provided with suitable spaces and tools as well as sufficient guidance and autonomy in using them. special attention should be paid to the functionality of the physical environment, as the experience of a physical environment builds through the activity performed in the environment. the results provide both theoretical and practical value in understanding the role of the physical environment as part of human functioning and serve as an opening to a previously unexplored ground. by bringing together the theoretical approaches of socially and physically distributed intelligence and research on motivation, this study demonstrates the importance of the physical environment for intellectual as well as emotional functioning. the intellectual functioning of an individual is always nested in a given physical environment, even when the work is carried out in a virtual environment. utilizing psychological and pedagogical knowledge is essential when designing or renovating work and learning environments in order to fully make use of the potential of physical environments as part of human performance.
acknowledgements
this study was funded by the tekes (the finnish funding agency for technology and innovation) rym indoor environment project (project number 462054), the academy of finland project mind the gap (project number 1265528) as well as personal grants from finnish cultural foundation (1st and 3rd author) and alfred kordelin foundation (2nd author).
references
alfonsi, e., capolongo, s. & buffoli, m. (2014). evidence based design and healthcare: an unconventional approach to hospital design. annali di igiene: medicina preventiva e di comunità, 26, 137–143. doi:10.7416/ai.2014.1968 alker, j., malanca, m., pottage, c., o’brien, r., akhras, d., ambrose, b., …wong, j. (2015).
health, wellbeing and productivity in offices. world green building council, http://www.worldgbc.org/activities/health-wellbeing-productivity-offices/research. baumeister, r. f. & leary, m. r. (2000). the need to belong: desire for interpersonal attachments as a fundamental human motivation. in higgins, e. t. & kruglanski, a. w. (eds.), motivational science: social and personality perspectives (pp. 24–49). new york, ny: psychology press. beard, c. (2009). space to learn: the development and evolution of new learning environments in higher education. in buswell, j. & becket, n. (eds.), enhancing student centred learning in business and management, hospitality, leisure, sport, and tourism. oxford, uk: threshold press. beard, c. (2012). spatial ecologies: learning and working environments that change people and organisations. in alexandra, k. & price, i. (eds.), managing organisational ecologies (pp.69–80). new york, ny: routledge. beard, c. & price, i. (2010). space, conversations and place: lessons and questions from organisational development. international journal of facility management, 1. bereiter, c. & scardamalia, m. (1987). the psychology of written composition. hillsdale, nj: lawrence erlbaum associates inc. black, a. e. & deci, e. l. (2000). the effects of instructors’ autonomy support and students’ autonomous motivation on learning organic chemistry: a self-determination theory perspective. science education, 84, 740–756. doi:10.1002/1098-237x(200011)84:6<740::aid-sce4>3.0.co;2-3 chirkov, v., ryan, r. m., kim, y., & kaplan, u. (2003). differentiating autonomy from individualism and independence: a self-determination theory perspective on internalization of cultural orientations and well-being. journal of personality and social psychology, 84, 97–110. doi:10.1037/0022-3514.84.1.97 csíkszentmihályi, m. (1988). optimal experience: psychological studies of flow in consciousness. new york, ny: cambridge university press. csíkszentmihályi, m. (2014). 
applications of flow in human development and education: the collected works of mihály csíkszentmihályi. new york, ny: springer science + business media. deci, e. l. & ryan, r. m. (1985). intrinsic motivation and self-determination in human behavior. new york, ny: plenum. deci, e. l. & ryan, r. m. (1987). the support of autonomy and the control of behavior. journal of personality and social psychology, 53, 1024–1037. deci, e. l. & ryan, r. m. (2000). the “what” and “why” of goal pursuits: human needs and the self-determination of behavior. psychological inquiry, 11, 227–268. doi:10.1207/s15327965pli1104_01 deci, e. l. & ryan, r. m. (2002). self-determination research: reflections and future directions. in deci, e. l. & ryan, r. m. (eds.), handbook of self-determination research (pp. 431–442). rochester, ny: the university of rochester press. deci, e. l. & ryan, r. m. (2008). self-determination theory: a macrotheory of human motivation, development, and health. canadian psychology, 49, 182–185. doi:10.1037/a0012801 dweck, c. s. (2006). mindset: the new psychology of success. new york, ny: random house. evans, g. w., bullinger, m. & hygge, s. (1998). chronic noise exposure and physiological response: a prospective study of children living under environmental stress. psychological science, 9, 75–77. doi:10.1111/1467-9280.00014 gay, j. l. (2008). testing self-determination theory and the roles of the social and physical environments in an adult beginning exerciser population. columbia, sc: university of south carolina. gay, j. l., saunders, r. p. & dowda, m. (2011). the relationship of physical activity and the built environment within the context of self-determination theory. annals of behavioral medicine, 42, 188–196. doi:10.1007/s12160-011-9292-y hakkarainen, k. (2009). a knowledge-practice perspective on technology-mediated learning. computer-supported collaborative learning, 4, 213–231.
doi:10.1007/s11412-009-9064-x hakkarainen, k., palonen, t., paavola, s. & lehtinen, e. (2004). communities of networked expertise: professional and educational perspectives. bingley, uk: emerald group publishing limited. heikkilä, a., & lonka, k. (2006). studying in higher education: students’ approaches to learning, selfregulation, and cognitive strategies. studies in higher education, 31, 99–117. doi:10.1080/03075070500392433 heikkilä, a., lonka, k., nieminen, j. & niemivirta, m. (2012). relations between teacher students' approaches to learning, cognitive and attributional strategies, well-being, and study success. higher education, 64, 455–471. doi:10.1007/s10734-012-9504-9 heikkilä, a., niemivirta, m., nieminen, j. & lonka, k. (2011). interrelations among university students’ approaches to learning, regulation of learning, and cognitive and attributional strategies: a person oriented approach. higher education, 61, 513–529. doi:10.1007/s10734-010-9346-2 hidi, s., & renninger, k.a. (2006). the four-phase model of interest development. educational psychologist, 41, 111–127. doi:10.1207/s15326985ep4102_4 hutchins, e. (2000). distributed cognition. international encyclopedia of the social and behavioral sciences, 2068–2072. hutchins, e. (2006). the distributed cognition perspective on human interaction. in enfield, n. j. & levinson, s. c. (eds.), roots of human sociality: culture, cognition and interaction (pp. 375–398). inkinen, m., lonka, k., hakkarainen, k., muukkonen, h., litmanen, t. & salmela-aro, k. (2013). the interface between core affects and the challenge-skill relationship. journal of happiness studies, 15, 891–913. doi:10.1007/s10902-013-9455-6 iyengar, s. s., & devoe, s. e. (2003). rethinking the value of choice: considering cultural mediators of intrinsic motivation. in murphy-berman, v. & berman, j. j. (eds.), cross-cultural differences in perspectives on self (pp. 146–191). lincoln, ne: university of nebraska press. jang, h., reeve, j. & deci, e. 
l. (2010). engaging students in learning activities: it is not autonomy support or structure but autonomy support and structure. journal of educational psychology, 102, 588–600. doi:10.1037/a0019682 job, v., walton, g. m., bernecker, k. & dweck, c. s. (2015). implicit theories about willpower predict self-regulation and grades in everyday life. journal of personality and social psychology, 108, 637–647. doi:10.1037/pspp0000014 küller, r. & lindsten, c. (1992). health and behavior of children in classrooms with and without windows. journal of environmental psychology, 12, 305–317. doi:10.1016/s0272-4944(05)80079-9 lansdale, m., parkin, j., austin, s. & baguley, t. (2011). designing for interaction in research environments: a case study. journal of environmental psychology, 31, 407–420. doi:10.1016/j.jenvp.2011.05.006 lave, j. & wenger, e. (1991). situated learning: legitimate peripheral participation. new york, ny: cambridge university press. lindblom-ylänne, s. & lonka, k. (2000). interaction between learning environment and expert learning. lifelong learning in europe, 5, 90–97. lonka, k. (2012). engaging learning environments for the future: the 2012 elizabeth w. stone lecture. in gwyer, r., stubbings, r. & walton, g. (eds.), the road to information literacy: librarians as facilitators of learning (pp. 15–30). berlin, germany: de gruyter. lonka, k. & ketonen, e. (2012). how to make a lecture course an engaging learning experience? studies for the learning society, 2, 63–74. doi:10.2478/v10240-012-0006-1 markus, h. r., & kitayama, s. (1991). culture and the self: implications for cognition, emotion, and motivation. psychological review, 98, 224–253. doi:10.1037/0033-295x.98.2.224 mälkki, k. (2010). building on mezirow’s theory of transformative learning: theorizing the challenges to reflection. journal of transformative education, 8, 42–62. doi:10.1177/1541344611403315 mälkki, k. (2012).
rethinking disorienting dilemmas within real-life crises: the role of reflection in negotiating emotionally chaotic experiences. adult education quarterly, 62, 207–229. doi:10.1177/0741713611402047 mälkki, k., sjöblom, k. & lonka, k. (2014). transformation of the physical space and transformation of the subject. in nicolaides, a. & holt, d. (eds.), spaces of transformation and transformation of space: proceedings of the xi international transformative learning conference (pp. 550–556). new york, ny: teachers college, columbia university. nenonen, s., kärnä, s., junnonen, j.-m., tähtinen, s. & sandström, n. (eds.) (2015). how to co-create campus. tampere, finland: suomen yliopistokiinteistöt oy, tampere juvenes print. (in finnish) niemiec, c. p. & ryan, r. m. (2009). autonomy, competence, and relatedness in the classroom applying self-determination theory to educational practice. theory and research in education, 7, 133–144. doi:10.1177/1477878509104318 norman, d. a. (1993). things that make us smart: defending human attributes in the age of the machine. cambridge, ma: perseus books. paavola, s., lipponen, l. & hakkarainen, k. (2004). models of innovative knowledge communities and three metaphors of learning. review of educational research, 74, 557–576. doi:10.3102/00346543074004557 parsons, r. & hartig, t. (2000). environmental psychophysiology. in cacioppo, j. t., tassinary, l. g. & berntson, g. (eds.), handbook of psychophysiology (pp. 815–846). new york, ny: cambridge university press. perry, n. e., turner, j. c. & meyer, d. k. (2006). classrooms as contexts for motivating learning. in alexander, p. a. & winne, p. h. (eds.), handbook of educational psychology. mahwah, nj: lawrence erlbaum associates publishers. rutten, c., boen, f. & seghers, j. (2012). how school social and physical environments relate to autonomous motivation in physical education: the mediating role of need satisfaction. journal of teaching in physical education, 31: 216–230. ryan, r. m. 
(1995). psychological needs and the facilitation of integrative processes. journal of personality, 63, 397–427. ryan, r. m. (2009). self-determination theory and wellbeing. wellbeing in developing countries research review, 1. ryan, r. m. & connell, j. p. (1989). perceived locus of causality and internalization: examining reasons for acting in two domains. journal of personality and social psychology, 57, 749–761. doi:10.1037/0022-3514.57.5.749 ryan, r. m. & deci, e. l. (2000). self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. american psychologist, 55, 68–78. doi:10.1037/0003-066x.55.1.68 ryan, r. m. & deci, e. l. (2002). overview of self-determination theory: an organismic dialectical perspective. in e. l. deci & r. m. ryan (eds.), handbook of self-determination research (pp. 3–33). rochester, ny: university of rochester press. ryan, r. m. & deci, e. l. (2006). self-regulation and the problem of human autonomy: does psychology need choice, self-determination, and will? journal of personality, 74, 1557–1586. doi:10.1111/j.1467-6494.2006.00420.x ryan, r. m. & deci, e. l. (2009). promoting self-determined school engagement: motivation, learning and well-being. in wentzel, k. r. & wigfield, a. (eds.), handbook of motivation at school (pp. 171–195). new york, ny: routledge. ryan, r. m., huta, v. & deci, e. l. (2008). living well: a self-determination theory perspective on eudaimonia. journal of happiness studies, 9, 139–170. doi:10.1007/s10902-006-9023-4 sandström, n., eriksson, r., lonka, k. & nenonen, s. (accepted for publication 2015). usability and affordances for inquiry-based learning in a blended learning environment. facilities. sandström, n., ketonen, e. & lonka, k. (2014). the experience of laboratory learning – how do chemistry students perceive their learning environment? european journal of social and behavioural sciences, 11, 1612–1625.
doi:10.15405/ejsbs.144 sandström, n., sjöblom, k., mälkki, k. & lonka, k. (2013) the role of physical, social and mental space in chemistry students’ learning. european journal of social and behavioural sciences, 6, 1134–1139. doi:10.15405/ejsbs.90 scott, d. & usher, r. (1999). researching education: data, methods and theory in educational enquiry. london, uk: continuum. seligman, m., ernst, r., gillham, j., reivich, k. & linkins, m. (2009). positive education: positive psychology and classroom interventions. oxford review of education, 35, 293–311. doi:10.1080/03054980902934563 soenens, b., sierens, e., vansteenkiste, m., goossens, l., & dochy, f. (2012). psychologically controlling teaching: examining outcomes, antecedents, and mediators. journal of educational psychology, 104, 108–120. doi:10.1037/a0025742 tuominen-soini, h., salmela-aro, k. & niemivirta, m. (2008). achievement goal orientations and subjective well-being: a person-centred analysis. learning and instruction, 18, 251–266. doi:10.1016/j.learninstruc.2007.05.003 ulrich, r. s. (1981). natural versus urban scenes: some psychophysiological effects. environment and behavior, 13, 523–556. doi:10.1177/0013916581135001 vansteenkiste, m., sierens, e., goossens, l., soenens, b., dochy, f., mouratidis, a., aelterman, n., haerens, l. & beyers, w. (2012). identifying configurations of perceived teacher autonomy support and structure: associations with self-regulated learning, motivation and problem behavior. learning and instruction, 22, 431–439. doi:10.1016/j.learninstruc.2012.04.002 vermunt, j. d. & verloop, n. (1999). congruence and friction between learning and teaching. learning and instruction, 9, 257–280. doi:10.1016/s0959-4752(98)00028-0 wigfield, a., cambria, j. & eccles, j. s. (2012). motivation in education. in ryan, r. m. (ed.), the oxford handbook of human motivation. new york, ny: oxford university press. williams, m. (2000). interpretivism and generalisation. sociology, 34, 209–224. 
doi:10.1177/s0038038500000146 winterbottom, m. & wilkins, a. (2009). lighting and discomfort in the classroom. journal of environmental psychology, 29, 63–75. doi:10.1016/j.jenvp.2008.11.007 wolters, c. a. & gonzalez, a-l. (2008). classroom climate and motivation: a step toward integration. in maehr, m. l., karabenick, s. a. & urdan, t. c. (eds.), advances in motivation and achievement social psychological perspectives. bingley, uk: emerald group publishing limited. woolner, p., hall, e., higgins, s., mccaughey, c., & wall, k. (2007). a sound foundation? what we know about the impact of environments on learning and the implications for building schools for the future. oxford review of education, 33, 47–70. doi:10.1080/03054980601094693
frontline learning research vol. 11 no. 1 2023 40–56 issn 2295-3159
corresponding author: susanne schmidt, faculty 03, chair of business and economics education, johannes gutenberg-university mainz, jakob-welder-weg 9, 55128 mainz, germany. susanne.schmidt@uni-mainz.de doi:https://doi.org/10.14786/flr.v11i1.885
modeling and measuring domain-specific quantitative reasoning in higher education business and economics
susanne schmidt¹, olga zlatkin-troitschanskaia¹ & richard j. shavelson²
¹ johannes gutenberg-university mainz, germany
² stanford university, usa
article received 8 june 2022 / revised 30 january 2023 / accepted 1 february 2023 / available online 22 march 2023
abstract
quantitative reasoning is considered a crucial prerequisite for acquiring domain-specific expertise in higher education. to ascertain whether students are developing quantitative reasoning, validly assessing its development over the course of their studies is required. however, when measuring quantitative reasoning in an academic study program, it is often confounded with other skills.
following a situated approach, we focus on quantitative reasoning in the domain of business and economics and define domain-specific quantitative reasoning primarily as a skill and capacity that allows for reasoned thinking regarding numbers, arithmetic operations, graph analyses, and patterns in real-world business and economics tasks, leading to problem solving. as many studies demonstrate, well-established instruments for assessing business and economics knowledge like the test of understanding college economics (tuce) and the examen general para el egreso de la licenciatura (egel) contain items that require domain-specific quantitative reasoning skills. in this study, we follow a new approach and assume that assessing business and economics knowledge offers the opportunity to extract domain-specific quantitative reasoning as the skill for handling quantitative data in domain-specific tasks. we present an approach where quantitative reasoning – embedded in existing measurements from tuce and egel tasks – will be empirically extracted. hereby, we reveal that items tapping domain-specific quantitative reasoning constitute an empirically separable factor within a confirmatory factor analysis and that this factor (domain-specific quantitative reasoning) can be validly and reliably measured using existing knowledge assessments. this novel methodological approach, which is based on obtaining information on students’ quantitative reasoning skills using existing domain-specific tests, offers a practical alternative to broad test batteries for assessing students’ learning outcomes in higher education.
keywords: quantitative reasoning; confirmatory factor analysis; domain-specific learning; higher education; business and economics.
1. introduction
within many study domains, quantitative reasoning is a required skill for scientific reasoning and arguing.
furthermore, in study domains or subjects with a strong quantitative focus, quantitative reasoning is not only a required generic skill in terms of developing and understanding scientific arguments, but it is also necessary to understand domain-specific concepts, such as the supply-demand-function in economics or amortization plans in business. though quantitative reasoning is not necessarily explicitly taught in higher education classes, it is nonetheless part of learning and applying quantitative operations within business and economics tasks. while there are many studies and assessments that measure general quantitative reasoning (for an overview, roohr et al., 2014), there is a lack of research addressing the development of quantitative reasoning in specific domains, including the domain of business and economics. while business and economics outcome measures do not explicitly measure quantitative reasoning, they contain numerous items that demand reasoning quantitatively. the question, then, is: can quantitative reasoning be isolated from test items found on business and economics outcome measures? if it can, a separate measure of quantitative reasoning would not be necessary to track students’ development of quantitative reasoning. especially in the context of the so-called bologna reform in europe with its increasing modularization, the number of examinations and assessments in higher education has increased significantly. therefore, the research question arises as to what information on students’ quantitative reasoning ability can be obtained using the existing domain-specific tests to avoid the practically less suitable use of broad test batteries in higher education. to this end, we introduce a novel approach by isolating an assessment to measure quantitative reasoning from existing tests instead of developing a new test for this purpose. 
using confirmatory factor analysis (cfa), we show that a combination of questions from existing business and economics assessments provides a valid and reliable measure of quantitative reasoning. in the domain of business and economics, there are several validated and internationally established tests used to assess knowledge-related competences. for instance, the test of understanding college economics (tuce) (walstad et al., 2007) and the examen general para el egreso de la licenciatura (egel) (ceneval, 2011) are standardized instruments adapted and validated in different language versions so that they can be used for assessing higher education students in different countries. the tuce, originally developed in the us, has been adapted and validated for german, japanese, korean and many more languages and higher education contexts (walstad et al., 2007; yamaoka et al., 2010). for germany, there is also a valid adaptation of egel that has been used to assess knowledge in business administration (zlatkin-troitschanskaia et al., 2014). because these instruments were validated for measuring knowledge in business and economics, it remains unclear whether they can also provide a reliable and valid tool for the assessment of quantitative reasoning as one subfacet of overall business and economics competence within this domain.
to define the assessment design for quantitative reasoning out of the existing business and economics tests, we follow mislevy & haertel’s (2006) evidence-centered design and focus on the following four steps: (1) define the construct to be assessed (quantitative reasoning), here within the domain of business and economics; (2) provide theoretical and empirical evidence as to whether quantitative reasoning-related test items from egel and tuce fit the construct definition of quantitative reasoning (see section 3.1); (3) collect and analyze data to investigate whether test items align empirically with the construct definition; and finally, (4) draw a conclusion from (1) through (3).

the aim of this study, then, is to explore whether we can conceptually isolate quantitative reasoning items on the tuce and egel and bring item-response data to bear on the claim that this subset of items actually measures quantitative reasoning. in the following, we briefly review current conceptual and empirical research on quantitative reasoning and its measurement (section 2). in section 3, we develop five hypotheses (a to e) related to the overarching research question driving this study: whether a subset of the egel and tuce items can be used to reliably and validly assess the underlying and implicitly measured construct of quantitative reasoning in business and economics (in accordance with aera et al., 2014). in sections 4 and 5, we conduct conceptual and empirical analyses of the items to explore the claim that a subset of them measures quantitative reasoning. we conclude that we can empirically identify a reliable and valid subset of items that conceptually measure quantitative reasoning (and verbal reasoning) in existing business and economics knowledge tests (section 6).

2.
conceptual and assessment background 2.1 quantitative reasoning as a key student learning outcome there is a growing consensus that effective learning and citizenship in the 21st century requires college graduates to be ‘quantitatively literate’, that is, to be able to think and reason quantitatively when the situation demands it (shavelson, 2008; ball, 2003; madison, 2009; nrc, 2012). universities are beginning to recognize the need for such quantitative competencies and consider them essential student learning outcomes (slos) (lusardi & wallace, 2013). for instance, in a study among the member institutions of the american association of colleges and universities (aac&u), 71% of the colleges and universities identified the acquisition of quantitative reasoning as a central aim of learning in higher education (hart research associates, 2009). similarly, the national leadership council for liberal education and america’s promise (leap) named quantitative reasoning as one of the essential slos of the new global century (aac&u, 2008). quantitative reasoning is a component of tertiary education as it is one of four key slos (the others being: writing, critical thinking, and information and technological literacy) (davidson & mckinney, 2001). quantitative reasoning is considered more than the ability to perform rough calculations. it is an essential competence and crucial prerequisite for acquiring domain-specific and generic knowledge and skills in higher education. in comparison to mathematics as a particular discipline, quantitative reasoning can be considered a generic skill and a way of thinking that requires dealing with complex, real-world, everyday challenges involving quantities and their different kinds of representations in different disciplines (davidson & mckinney, 2001). following a situated theoretical approach (shavelson, 2008), we assume that this generic skill can be manifested differently in varying domain-specific contexts. 
in this study, we focus on quantitative reasoning in the domain of business & economics (b&e) and define domain-specific quantitative reasoning (dsqr) primarily as a skill and capacity that allows for reasoned thinking regarding numbers, arithmetic operations, graph analyses, and patterns in real-world business and economics tasks, leading to problem-solving. quantitative reasoning is embedded in the hierarchical construct of cognitive outcomes. the hierarchy has five levels, which represent skills with a higher domain-specificity on lower levels and more generic skills on higher levels (shavelson & huang, 2003; figure 1).

figure 1. framework for cognitive outcomes (shavelson & huang, 2003, p. 14).

2.2 assessments of quantitative reasoning as a generic skill

a number of instruments have been used to assess quantitative reasoning as a generic skill, such as the quantitative reasoning for college science (quarcs) test (follette et al., 2017), the cla+ with the scientific and quantitative reasoning test (sqr) (zahner, 2013), the quantitative reasoning questions from the graduate record examination (gre) and the heighten quantitative reasoning test (ets, 2016) (for an overview, roohr et al., 2014). currently, however, teaching in higher education does not focus on developing quantitative reasoning in an explicit way, or on assessing it (rocconi et al., 2013). this might be the reason why tests for quantitative reasoning are rarely used in higher education research and practice. rather, the focus is more on the assessment of domain-specific competences. that said, domain-specific quantitative reasoning is seldom assessed although it is claimed to be an important outcome of a program of study. if assessing it requires a test separate from those usually administered, quantitative reasoning is unlikely to be assessed.
however, the separate assessment of quantitative reasoning might not be necessary to measure the level and development of quantitative reasoning throughout undergraduate or graduate studies. if domain-specific knowledge tests can also provide a source for reliable and valid measurement of quantitative reasoning within a domain (o’neill & flynn, 2013), this could offer a practicable approach for higher education.

2.3 assessments of quantitative reasoning in the business and economics domain

students’ knowledge and skills are commonly assessed with domain-specific competence tests in various fields of study (physics, engineering, psychology, business and economics; for an overview of research on domain-specific competences in different domains, see the piaac study, oecd, 2013, or the kokohs program, zlatkin-troitschanskaia et al., 2017). following elrod (2014), we suspect that quantitative reasoning is a component embedded in domain-specific tests. therefore, separating quantitative reasoning-loaded items within business and economics tests seems both reasonable and feasible. furthermore, by measuring quantitative reasoning within domain-specific competence tests, teachers and students receive direct feedback on how quantitative reasoning contributes to solving domain-specific problems. indeed, in the domain of business and economics, we found no assessments of quantitative reasoning. the assessment of quantitative reasoning for indexing individual student skills or the effectiveness of curricula remains primarily a local practice (gaze et al., 2014). elrod (2014) assumes that one concern regarding quantitative reasoning assessment is the perception of quantitative reasoning as another outcome to assess aside from all the other regular tests and assessments and for which teachers or researchers may need to create a completely new assessment strategy.
therefore, providing a valid measure of quantitative reasoning to be assessed within a domain-specific knowledge test would open up new opportunities for researchers and practitioners. in this way, quantitative reasoning can be considered a ‘sub-dimension’ in existing assessment instruments. in particular, the assessment of domain-specific business and economics knowledge and understanding offers the opportunity to extract quantitative reasoning as the skill to handle quantitative data and numbers in existing test items. when assessing knowledge in business and economics, the tuce (walstad et al., 2007) and egel (ceneval, 2011) contain items where quantitative reasoning skills are necessary, as brückner and colleagues (2015b) demonstrate. although these tests are standardized assessments in a multiple-choice (mc) format, students must complete items with domain-specific tasks involving real-life economic questions or problems. this study is based on an assessment of content knowledge in the domain of business and economics with items from the tuce and egel (section 4). we claim that items from the tuce and egel can be classified based on whether quantitative reasoning or non-quantitative verbal reasoning was required to answer a question correctly (see brückner et al., 2015a, for separating the tuce items into quantitative reasoning and verbal reasoning). we define verbal reasoning as an ability that allows for reasoned thinking without numbers, arithmetic operations and patterns in business and economics tasks. further, we can assume that spatial reasoning may link the two domains of quantitative reasoning and verbal reasoning. however, since there were very few tasks featuring graphs and diagrams in the two assessments considered here, it is not possible to model a third dimension with spatial reasoning. therefore, the assumption that spatial reasoning presents an empirically separable construct from quantitative reasoning cannot be verified in this study.
by differentiating quantitative reasoning and verbal reasoning, we assume that the way students deal with numerical or verbal content within a task influences the nature and difficulty of the solution. we suspected that some items would demand a preponderance of quantitative reasoning and some items verbal reasoning. by doing so, we focus on convergent and discriminant validity, including the relationship of quantitative reasoning to other variables. consequently, we analyzed multiple-choice items from the tuce and egel to identify quantitative and verbal content demands. the sorting process is described in section 4.

3. research questions and hypotheses

this study evaluates whether (1) it is possible to (a) identify subsets of items that conceptually measure quantitative reasoning in business and economics content knowledge tests, and (b) whether this conceptual distinction can be empirically supported and distinguished from other achievement items of a verbal nature (verbal reasoning) (research question 1: internal construct validity). further, it evaluates (2) whether the resulting scores for quantitative reasoning provide valid measures regarding external criteria for the underlying construct (research question 2: convergent and discriminant validity). we follow aera et al.’s (2014) validation criteria, with particular focus on the criterion of internal structure: the extent to which the empirical structure of the test supports the conceptual structure. more specifically, we assume that quantitative reasoning and verbal reasoning subtest scores are highly correlated but empirically separable, each with high internal consistency (hypothesis a).

in addition, if evidence supports a business and economics quantitative reasoning interpretation, we need to support this claim by showing that the quantitative reasoning score correlates, as expected, with a specific domain and with additional external variables.
only a few such studies have been conducted and they show a relationship between quantitative reasoning and socio-demographic factors (brückner et al., 2015b; tiffin et al., 2014). in particular, male test takers have been found to perform better on numeracy tasks than female test takers (owen, 2012; williams et al., 1992). this finding indicates that gender effects might differ between assessments of different components of the business and economics achievement construct, that is, between quantitative reasoning and verbal reasoning (yamaoka et al., 2010). in economics, higher levels of economics-related quantitative reasoning have been reported for male students, while higher levels of verbal reasoning have been reported for female students. in an introductory course in economics at a us university, ballard and johnson (2004) found that male students had higher numeracy scores than female students. moreover, brückner et al. (2015a) found that both quantitative reasoning and verbal reasoning were higher for male than for female students. the same is evident in general for business and economics achievement scores: male students generally perform better than female students (brückner et al., 2015a; happ et al., 2018). however, the gender-specific differences were larger, on average, for quantitative reasoning scores than for verbal reasoning scores. according to these findings, we expect male students to outperform female students on a business and economics quantitative reasoning subtest (hypothesis b). moreover, migration background has been shown to impact generic skills, and quantitative reasoning in particular. in studies in europe, students with a migration background score lower, on average, on tests in numerically oriented subdomains of business and economics such as economics (zlatkin-troitschanskaia et al., 2015) and finance (förster et al., 2015). similar results have been found in the u.s. regarding ethnicity and race.
for instance, bleske-rechek and browne (2014) have shown a gap between ethnic groups on both the gre verbal reasoning and quantitative reasoning average scores. furthermore, white examinees’ verbal reasoning scores fall, on average, a full standard deviation above black minority examinees’ scores, and a half standard deviation above those of examinees from other underrepresented groups. consequently, we expect students with a recent migration background, that is, students with at least one parent not of german origin, to have lower test results, on average, in tasks with quantitative reasoning demands than students without a migration background (hypothesis c). the educational background prior to higher education also influences generic skills such as quantitative reasoning. in particular, the school leaving grade (gpa) is considered a valid indicator of a person’s general academic ability (schuler et al., 1990) as well as a significant predictor of students’ academic performance in a domain. for instance, kuncel and colleagues (2010) showed in their meta-analysis that graduates’ gpa positively correlates with their performance in quantitative reasoning tasks (r=0.23) and verbal reasoning tasks (r=0.29) on the gre test. findings based on tests that assess students’ content knowledge in subdomains of business and economics such as accounting (byrne & flood, 2008; fritsch et al., 2015), finance (förster et al., 2015), and macroeconomics (zlatkin-troitschanskaia et al., 2015) indicate that a correlation of this kind between the gpa and domain-specific test results also exists in the business and economics domain. consequently, we expect there to be a positive relationship between school leaving grades (gpa) and tests that demand quantitative reasoning in business and economics (hypothesis d).
furthermore, students’ previous subject-related knowledge acquired through learning processes prior to university is highly important in the acquisition of domain-specific knowledge at university (alexander & jetton, 2003; anderson, 2005; happ et al., 2018). for the domain of business and economics, subject-related knowledge can be acquired in different ways prior to university. in germany, many students acquire their higher education entrance qualification at vocational schools that offer advanced courses in business and economics, indicating that students have prior content knowledge in business and economics when they enter university. passing advanced courses in business and economics at vocational schools or completing commercial vocational training is associated with a higher level of content knowledge in business and economics subdomains (for economics, brückner et al., 2015b; for accounting, fritsch et al., 2015; for finance, förster et al., 2015). however, this prior knowledge should only have a strong effect when acquiring (or predicting) domain-specific knowledge. as quantitative reasoning and verbal reasoning are generic skills, there should be little correlation between prior knowledge of business and economics and performance in these dimensions if the test is an appropriate measure for quantitative reasoning and verbal reasoning. it is assumed that students who have pursued advanced courses at vocational schools or who have completed commercial vocational training do not perform significantly better than students who do not have any prior education in business and economics in tasks that explicitly demand quantitative reasoning (hypothesis e).

4. methods and study design

4.1 quantitative reasoning and verbal reasoning items in business and economics tests

students’ business and economics content knowledge and understanding were measured in the project wiwikom 2 (zlatkin-troitschanskaia et al., 2014).
in this project, the tuce and egel were adapted to the german language and higher education context and comprehensively validated (aera et al., 2014; itc, 2005). the following analyses refer to the subtests in the areas of accounting and finance (16 items each from egel) and microeconomics (30 items from tuce), where we use all test items from these areas. each item consists of an item stem and four response options with one correct answer. using our definitions of quantitative reasoning and verbal reasoning, we sorted the 30 tuce items and 32 egel items into these two categories (for quantitative reasoning and verbal reasoning example items from the microeconomics part, see figure 2). based on the differentiation of whether a task contains numerical properties that can indicate students’ quantitative reasoning skills as described in shavelson et al. (2019), the following analyses assume a dichotomous differentiation between items with numerical content (quantitative reasoning) and items without numerical content (verbal reasoning).

following and expanding upon brückner et al. (2015a), who already classified tuce items into quantitative reasoning and verbal reasoning items, we noted whether numerical operations were contained in the case descriptions for both instruments, tuce and egel. items that contained numerical content and thus required students to apply mainly their mathematical abilities were classified as quantitative reasoning (for the microeconomics part of tuce, following brückner et al., 2015a), while items dealing with purely verbally described definitions, concepts or conceptual systems were classified as verbal reasoning (table 1).

figure 2. two tuce items from the dimension microeconomics (walstad et al., 2007).

quantitative reasoning item
in sunshine city, one local ice cream company operates in a competitive labor market and product market. it can hire workers for $45 a day and sell ice cream cones for $1.00 each. the table below shows the relationship between the number of workers hired and the number of ice cream cones produced and sold.

number of workers hired    number of ice cream cones sold
4                          340
5                          400
6                          450
7                          490
8                          520

as long as the company stays in business, how many workers will it hire to maximize profits or minimize losses?
a. 5
b. 6
c. 7
d. 8

verbal reasoning item
many u.s. interstate highways are crowded with traffic, but tolls are not collected even when the highways are crowded. which of the following is true about this no-toll policy?
a. it is efficient because interstates are needed to transport goods.
b. it is efficient because there is no cost of using the interstate once it is built.
c. it is inefficient because each person’s use of the interstate adds to the congestion.
d. it is inefficient because collecting tolls would increase government revenues, allowing other taxes to be decreased.

table 1
distribution of quantitative reasoning (qr) and verbal reasoning (vr) items from the three content-domains of the tuce and egel

content-domain    qr    vr    total
microeconomics     4    26    30
accounting        13     3    16
finance           11     4    15
total             28    33    61

like brückner and colleagues (2015a), we achieved full congruence among the four different raters in the classification of the items into quantitative reasoning and verbal reasoning subsets in the study presented here (an interrater agreement of cohen’s kappa=1.0; p<.001). therefore, in response to research question 1a, we conclude that it is conceptually feasible to classify test items in a business and economics knowledge test into the categories quantitative reasoning and verbal reasoning. based on this conclusion, we next examined whether the conceptual distinction between quantitative reasoning and verbal reasoning items can be empirically supported (rq1b).
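the interrater agreement reported above can be illustrated with a short python sketch. it computes cohen's kappa for every pair of raters; the codings are hypothetical placeholders (the raw classifications are not given in the text), and applying the statistic pairwise across the four raters is an assumption, since the source only reports kappa=1.0.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(a, b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of six items by four raters (illustration only,
# not the study's data): "qr" = quantitative, "vr" = verbal reasoning.
ratings = {
    "rater_1": ["qr", "vr", "vr", "qr", "vr", "qr"],
    "rater_2": ["qr", "vr", "vr", "qr", "vr", "qr"],
    "rater_3": ["qr", "vr", "vr", "qr", "vr", "qr"],
    "rater_4": ["qr", "vr", "vr", "qr", "vr", "qr"],
}

pairwise = {
    (r1, r2): cohens_kappa(ratings[r1], ratings[r2])
    for r1, r2 in combinations(ratings, 2)
}

# A partially disagreeing pair, to show that kappa drops below 1.
partial = cohens_kappa(["qr", "qr", "vr", "vr"], ["qr", "vr", "vr", "vr"])

for pair, kappa in pairwise.items():
    print(pair, round(kappa, 3))
print("partial agreement example:", round(partial, 3))
```

with identical codings every pairwise kappa equals 1, which corresponds to the full congruence described in the text; any single disagreement lowers the value.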
4.2 sample data were collected in the summer semester 2015 using the abovementioned subtests from tuce and egel. the test was administered as a paper-pencil test in a booklet design (frey et al., 2009) using different sets of items from tuce and egel within the booklets. the test booklets were randomly distributed among the participants. the sample included 1,492 students from 27 universities and 13 universities of applied science throughout germany. the institutions involved are a representative sample. at these universities, all beginning students enrolled in master’s degrees in business and economics were invited to participate in this study in the context of introductory courses that all beginning students in these degree courses have to attend; i.e., at each university, all students enrolled in a business and economics master’s degree were assessed at once in the context of a compulsory introductory lecture. the survey was carried out on site by trained test leaders. to encourage the students to participate in this study, every participant received €5 as well as individual feedback on the test results. since participation was voluntary, the possibility of skewed representativeness at the student level cannot be excluded. however, the distribution of descriptive characteristics such as gender and age does not indicate any significant biases compared to overall student population in germany. the composition of the sample in terms of the predictor variables we used is presented in table 2. 
table 2
distribution of the sample according to the predictors of domain-specific quantitative reasoning (sample n=1,492; frequency (%))

predictor                                  yes          no
gender, male                               783 (52.5)   708 (47.5)
migration background                       400 (26.8)   1,087 (72.9)
advanced courses in business & economics   422 (28.3)   1,059 (71.0)
vocational training                        299 (20.0)   1,190 (79.8)

the two indicators – ‘attended an advanced course in business and economics at commercial upper secondary school’ and ‘completed vocational training’ – were used as a proxy for prior knowledge in business and economics and operationalized as two dummy-coded variables in the following analysis (section 5). the high school leaving grade (gpa) was used as an indicator of students’ general academic performance (mean=2.241, s.d.=0.081 on a 5-point scale with 1 being the highest performance and 5 the lowest).

5. analyses and results

this study evaluates whether (1) it is possible to (a) identify subsets of items that conceptually measure quantitative reasoning in business and economics content knowledge tests, and (b) whether this conceptual distinction can be empirically supported and distinguished from other achievement items of a verbal nature (verbal reasoning). further, it evaluates (2) whether the resulting scores for quantitative reasoning provide valid measures regarding external criteria for the underlying construct (research question 2: convergent and discriminant validity). as rq 1a was answered in the affirmative above, attention now turns to questions 1b and 2 and the corresponding hypotheses.

5.1 quantitative reasoning and verbal reasoning subtest scores are highly correlated but empirically separable, each with high internal consistency (hypothesis a)

we address this hypothesis by testing the fit of data to alternative models using confirmatory factor analysis (cfa) with the statistical package mplus 7.3 (muthén & muthén, 2012).
all requirements for the calculation of the structural equation model were checked and confirmed (bagozzi & yi, 1988). model-1 posits two factors corresponding to quantitative reasoning and verbal reasoning. model-2 posits a general reasoning factor combining quantitative reasoning and verbal reasoning. we used a maximum likelihood estimator with robust standard errors (labeled mlr in mplus) to take into account that item responses were dichotomous (0,1). although a weighted least squares estimator (wlsmv) is usually used for categorical variables in mplus, the combination of booklet design and dichotomous items limited the selection to maximum likelihood estimators. furthermore, as the booklet design inevitably causes missing data patterns that must be taken into account, the modelling options are limited to cfa using mplus’ options. we chose the mlr estimator, as this estimator enabled us to conduct a robust chi-square difference test. to examine the difference between the two cfa models, the χ² difference test was performed. if the χ² difference is significant, the more restrictive model fits the data significantly worse than the less restrictive model (bagozzi & yi, 1988). the results are presented in table 3.

table 3
model fit of the calculated cfa models

model                    χ² (df)           p       correction factor   χ²/df   rmsea   aic         bic
1 two-factor cfa model   2114.602 (1648)   <.001   1.0130              1.28    0.014   51319.703   52312.150
2 one-factor cfa model   2167.981 (1649)   <.001   1.0135              1.31    0.015   51371.791   52358.931

overall, both models showed a good fit to the data. the disattenuated correlation between quantitative reasoning and verbal reasoning in model-1 was 0.76 (p<.001), suggesting that, while highly correlated, quantitative reasoning and verbal reasoning could be interpreted separately. to decide which model fit the data better, we calculated a χ² difference test for the empirical comparison of both models, including the models’ correction factors in the comparison test formula.
as a calculation of the difference test with mplus 7.3 was not possible due to the applied ‘maximum likelihood’ estimator, we had to calculate the value manually. to this end, we applied the satorra-bentler scaled chi-square test statistic (satorra & bentler, 2010). for this purpose, the one-factor model-2 is defined as the constrained model and the two-factor model-1 as the freely estimated model. the difference between the χ² values of the two models (column 2 in table 3) is then tested to conclude whether the reduction of the χ² value in the freely estimated model is significant, that is, whether it fits the data better than the constrained model. conducting the χ² difference test for the mlr estimated models resulted in a scaled χ² difference of 280.5 with one degree of freedom (Δχ²=280.5; Δdf=1; p<.001). model-1 therefore had a significantly better fit than model-2. aic and bic values can be used as additional indicators in model comparisons. a smaller value signifies a better data fit (schreiber et al., 2006). the model comparison shows that model-1 has lower aic and bic values and is therefore preferable to model-2. thus, we were able to isolate a quantitative reasoning score consistent with brückner et al. (2015a). moreover, we examined the reliabilities (factor reliability, bagozzi & yi, 1988) of the general reasoning (one-factor) score and of the quantitative reasoning and verbal reasoning scores; those of the latter two are acceptable, with 0.70 for quantitative reasoning and 0.75 for verbal reasoning. hence, quantitative reasoning and verbal reasoning can be interpreted separately. this means that it is possible to measure business and economics quantitative reasoning from scores on a general knowledge test. thus, hypothesis a is supported.
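the manual model comparison described above can be sketched in python. the function below uses the commonly documented form of the satorra-bentler scaled difference, with scaling correction cd = (d0·c0 − d1·c1)/(d0 − d1); this choice is an assumption, because the text cites satorra & bentler (2010) without printing the formula that was applied. the inputs are the rounded values from table 3, and since cd depends on small differences between the correction factors, the result is sensitive to rounding and need not reproduce the reported value of 280.5.

```python
import math


def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference (assumed standard form).

    t0, c0, d0: scaled chi-square, scaling correction factor and degrees of
                freedom of the constrained (nested) model.
    t1, c1, d1: the same quantities for the freely estimated model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    df_diff = d0 - d1
    if df_diff <= 0:
        raise ValueError("the constrained model must have more df")
    cd = (d0 * c0 - d1 * c1) / df_diff
    if cd <= 0:
        raise ValueError("non-positive scaling correction for the difference")
    # t * c recovers the unscaled maximum likelihood chi-square of each model.
    return (t0 * c0 - t1 * c1) / cd, df_diff


def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Values as printed in table 3 (rounded in the source).
two_factor = {"chi2": 2114.602, "df": 1648, "c": 1.0130,
              "aic": 51319.703, "bic": 52312.150}
one_factor = {"chi2": 2167.981, "df": 1649, "c": 1.0135,
              "aic": 51371.791, "bic": 52358.931}

trd, df_diff = sb_scaled_difference(
    one_factor["chi2"], one_factor["c"], one_factor["df"],
    two_factor["chi2"], two_factor["c"], two_factor["df"],
)
p_value = chi2_sf_df1(trd)  # valid here because df_diff is 1

# Normed chi-square and information criteria; smaller values are better.
normed = {name: m["chi2"] / m["df"]
          for name, m in (("two_factor", two_factor), ("one_factor", one_factor))}
delta_aic = one_factor["aic"] - two_factor["aic"]
delta_bic = one_factor["bic"] - two_factor["bic"]

print(f"scaled difference = {trd:.3f}, df = {df_diff}, p = {p_value:.2e}")
print(f"normed chi-square: {normed}")
print(f"delta aic = {delta_aic:.3f}, delta bic = {delta_bic:.3f}")
```

positive values of delta aic and delta bic favor the two-factor model, in line with the comparison made in the text.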
5.2 the resulting scores for quantitative reasoning provide valid measures regarding external criteria for the underlying construct (hypotheses b-e)

we then addressed research question 2 (convergent and discriminant validity) by testing hypotheses b to e. we focused on the correlation of individual quantitative reasoning scores and mean comparisons with other variables in a nomological network (section 3). we analyzed whether the pattern of correlations of other variables with quantitative reasoning is what would be expected based on previous research (reviewed above) and whether it supports our hypotheses. the following variables were used in this correlational analysis:
(1) gender (0=female, 1=male; males expected to score higher than females on quantitative reasoning and vice versa on verbal reasoning),
(2) migration background (0=no migration background, 1=migration background; students without a migration background expected to perform better on both, especially verbal reasoning),
(3) school leaving grade (1=excellent, 2=good, 3=sufficient, 4=acceptable; lower numbers expected to be associated with higher performance in both quantitative reasoning and verbal reasoning),
(4) advanced courses attended (0=advanced course in business and economics, 1=no advanced course in business and economics; expected to have no significant influence on either quantitative reasoning or verbal reasoning), and
(5) completion of a commercial vocational training (0=commercial vocational training, 1=no commercial vocational training; expected to have no significant influence on either quantitative reasoning or verbal reasoning).
to test whether these convergent or discriminant external variables supported our expectations, the quantitative reasoning and verbal reasoning measures were regressed on them. to this end, model-1 from section 5.1 was extended by adding the 5 variables as predictors to the model.
because of the booklet design, this has the advantage that student abilities do not have to be estimated explicitly as, for instance, sum scores, but are estimated within the multiple indicators multiple causes (mimic) regression model. this reduces biases because difficulty and discrimination parameters are taken into account. all calculations were again performed with mplus 7.3 (muthén & muthén, 2012). the results of the latent regression model are presented in table 4.

table 4
regression of individual variables on quantitative reasoning (qr) and verbal reasoning (vr) in the chosen business & economics sub-domains (n=1,445)

variable                                     qr          vr
constant                                     0.788***    0.377***
gender (male students)                       0.091***    0.080***
migration background                         -0.066***   -0.064***
school leaving grade (gpa)                   -0.053***   -0.057***
no advanced course in business & economics   -0.013      -0.011
no commercial vocational training            -0.031**    -0.006
r²                                           0.182       0.171

note. *=p-value≤.1; **=p-value≤.05; ***=p-value≤.01

regarding hypothesis b, that male students perform better than female students (which is based on all our previous studies in the domain of business and economics, brückner et al., 2015a), the results meet our expectations for quantitative reasoning but not for verbal reasoning. thus, hypothesis b is only partially supported. hypothesis c, that students with a migration background perform worse than students without a migration background, is supported, since students with a migration background have lower scores in both quantitative reasoning and verbal reasoning. similarly, students with better school leaving grades (with 1 high and 5 low) have a higher score in quantitative reasoning and verbal reasoning, so hypothesis d on the relationship between school leaving grades and test performance in quantitative reasoning items is supported.
in contrast to our expectations regarding hypothesis e, that prior knowledge in business and economics acquired in advanced courses at vocational schools or in commercial training does not necessarily lead to better performance in tasks that demand quantitative reasoning, we identified a significant effect of commercial vocational training on quantitative reasoning. however, vocational training has no significant effect on verbal reasoning, and attending advanced courses in business and economics has no significant effect on either verbal reasoning or quantitative reasoning. moreover, the relationship between quantitative reasoning and vocational training is weaker than the relationships with the other significant predictors, indicating that prior knowledge could be considered a discriminating criterion for quantitative reasoning. thus, hypothesis e is only partially supported.

6. discussion and conclusion

quantitative reasoning is considered an essential outcome of higher education. while it is often not directly measured in college, existing tests in business and economics, for example, contain enough quantitative reasoning items to estimate this capacity. this study empirically identifies subsets of items in business and economics knowledge tests that conceptually measure quantitative reasoning (see research question 1). the analysis confirms that the subset of quantitative reasoning items can be empirically distinguished from verbal reasoning items (as suggested in hypothesis a). overall, the internal construct validity of quantitative reasoning was further supported in this study, in line with previous research reported by brückner et al. (2016). in terms of convergent and discriminant validity, the analyses indicate that the resulting quantitative reasoning scores correlate, as expected, with the external criteria focused on in this paper (research question 2). more specifically, male students perform better than female students on a business and economics quantitative reasoning subtest (as suggested in hypothesis b).
however, male students also outperform female students on verbal reasoning tasks. further analyses are therefore necessary to determine the underlying reasons, e.g., whether these differences become manifest due to the quantitative nature of these tasks, or whether a more general domain-specific effect in the tasks is decisive here. students with a migration background have lower scores on a business and economics quantitative reasoning subtest than students without a migration background (as suggested in hypothesis c). however, this difference became evident on the verbal reasoning subtest as well, and will therefore require further investigation in future research. furthermore, we have found a relationship between school leaving grades (gpa) and scores on the business and economics quantitative reasoning and verbal reasoning subtests (as suggested in hypothesis d). this finding is in line with other studies, which show correlations between generic skills, like quantitative reasoning and verbal reasoning, and scores in domain-specific tasks. finally, our results indicate that students who have pursued advanced courses in business and economics do not perform significantly better on the quantitative reasoning (or verbal reasoning) subtest than students who have no prior education in business and economics (as suggested in hypothesis e). however, students who have completed commercial vocational training outperform students without vocational training on quantitative reasoning but not on verbal reasoning. although this effect is small, this finding contradicts our assumption. however, numerous studies show similarly weak correlations between generic skills like quantitative reasoning and domain-specific knowledge, and such relations are also plausible in the context of the development of cognitive outcomes (see figure 1).
to summarize, the present analyses indicate that it is possible to use a knowledge test in the domain of business and economics and identify a reliable and valid subset of items that conceptually measure quantitative reasoning. in terms of the construct validity of quantitative reasoning as an indirect measure derived from a domain-specific knowledge test, the findings provide evidence that the resulting scores are a valid measure of quantitative reasoning in business and economics. while it is possible to obtain a valid measure of quantitative reasoning with a domain-specific knowledge test if the domain deals with numeric properties or quantitative features in its contents, the existing test items were not equally distributed between quantitative reasoning and verbal reasoning. furthermore, when it comes to explaining differences between students’ test performances in terms of the two factors quantitative reasoning and verbal reasoning, there might be other effects in play, e.g., more general domain-specific or task-related effects, which were not controlled or discovered here. for instance, in another study on the tuce, the linguistic properties of the 60 test items used can explain up to 25% of performance without considering any other attributes such as gender or prior knowledge (mehler et al., 2018). neither these nor other features of the tuce tasks, particularly when comparing quantitative reasoning and verbal reasoning tasks, have been investigated so far in research on quantitative reasoning in a specific domain. in future studies, therefore, we should examine whether and to what extent quantitative reasoning and verbal reasoning tasks differ in, for instance, linguistic features. the empirical differentiability and the significance of spatial reasoning (sr) in relation to verbal reasoning and quantitative reasoning should also be researched using suitable test instruments.
here, performance assessments in simulating more complex realistic scenarios show particularly interesting potential (shavelson et al., 2019). the correlation between quantitative reasoning and thinking and understanding in business and economics needs to be examined in a much more detailed and differentiated manner. future studies should assess the role of quantitative reasoning using separate quantitative reasoning tests as external criteria of quantitative reasoning to validate the factor-based quantitative reasoning test scores, their relationship to domain-specific content knowledge (e.g., final grades in a bachelor’s degree program), and their incremental predictive validity. in this context, another important step would be to examine in more detail to what extent these skills are generic or to what extent they also encompass domain-specific components, which is still a fundamental underresearched question. despite these limitations, this study supports the crucial role of quantitative reasoning in solving business and economics tasks. in terms of implications for educational practice, this skill needs more curricular and instructional attention in developing (domain-specific) expertise in higher education. teaching quantitative reasoning should be anchored more deeply in economic education to reduce the substantial deficits in corresponding skills among students as shown in other studies (e.g., brückner et al., 2016). in this context, an objective, reliable, and valid assessment of students’ quantitative reasoning development provides a necessary basis for various diagnostic and instructional purposes in and outside of higher education. understanding how students learn and implement quantitative reasoning to solve domain-specific tasks, and how these skills develop throughout their academic studies can help educational practitioners to develop (more) effective tailored instruction to promote the development of quantitative reasoning among students. 
this may improve students’ learning outcomes and domain-specific performance. the newly developed methodological approach presented in this paper, which is based on gaining information on students’ quantitative reasoning using existing domain-specific tests, offers a practical alternative to broad, time- and resource-intensive test batteries for validly measuring students’ learning outcomes in higher education.

acknowledgments

we would like to thank the two reviewers who provided constructive feedback and helpful guidance in the revision of this manuscript.

notes

1 although there is extensive research regarding general cognitive abilities and their correlates (carroll, 1993), this and related work does not take into account domain-specificity in thought processes when it comes to quantitative reasoning.

2 anonymized is the acronym for the title ‘modeling and measuring competencies in business and economics among students and graduates by adapting and further developing existing american and mexican measuring instruments (tuce/egel)’. for further information, see https://www.blogs.uni-mainz.de/fb03-wiwi-competence-1/.

3 to further evaluate the validity of our interpretation of quantitative reasoning test scores, we examined and reported on test content and student response processes in previous studies (following aera et al., 2014). the validity criterion ‘test content’ was important in adapting tuce and egel items to the german context. content analyses in the form of curricular analyses, expert interviews, and online ratings (zlatkin-troitschanskaia et al., 2014) provide support for the claim that the test also measures skills demanding quantitative reasoning. evidence from ‘think aloud’ or ‘cognitive interviews’ with students supports ‘response processes’ claims. the findings of the quantitative analyses presented here were also confirmed in think aloud interviews with the test takers (brückner & pellegrino, 2016).
key points

this research demonstrates how students’ quantitative reasoning (qr), considered a fundamental facet of 21st century skills, can be validly measured using existing domain-specific tests to avoid the practically less suitable use of broad test batteries in education.
two well-established standardized knowledge tests from the domain of business and economics (b&e) are used to conceptually isolate quantitative reasoning embedded in domain-specific test tasks.
item-response data and confirmatory factor analysis are used to see if these tasks actually measure quantitative reasoning in a valid and reliable way.

references

aera (american education research association), apa (american psychological association), & ncme (national council on measurement in education). (2014). standards for educational and psychological testing. aera.
alexander, p. a., & jetton, t. l. (2003). learning from traditional and alternative texts: new conceptualization for an information age. in a. c. graesser, m. a. gernsbacher & s. r. goldman (eds.), handbook of discourse processes (pp. 199–241). lawrence erlbaum associates.
anderson, j. r. (2005). cognitive psychology and its implications (6th ed.). worth.
association of american colleges and universities (aac&u). (2008). college learning for the new global century. https://secure.aacu.org/aacu/pdf/globalcentury_execsum_3.pdf
bagozzi, r. p., & yi, y. (1988). on the evaluation of structural equation models. journal of the academy of marketing science, 16(1), 74–94. https://doi.org/10.1007/bf02723327
ball, d. l. (2003). mathematical proficiency for all students: toward a strategic research and development program in mathematics education. rand mathematics study panel.
ballard, c. l., & johnson, m. f. (2004). basic math skills and performance in an introductory economics class.
the journal of economic education, 35(1), 3–23. https://doi.org/10.3200/jece.35.1.3-23
bleske-rechek, a., & browne, k. (2014). trends in gre scores and graduate enrollments by gender and ethnicity. intelligence, 46, 25–34. https://doi.org/10.1016/j.intell.2014.05.005
brückner, s., förster, m., zlatkin-troitschanskaia, o., happ, r., walstad, w. b., yamaoka, m., & asano, t. (2015a). gender effects in assessment of economic knowledge and understanding: differences among undergraduate business and economics students in germany, japan, and the united states. peabody journal of education, 90(4), 503–518. https://doi.org/10.1080/0161956x.2015.1068079
brückner, s., förster, m., zlatkin-troitschanskaia, o., & walstad, w. b. (2015b). effects of prior economic education, native language, and gender on economic knowledge of first-year students in higher education. a comparative study between germany and the usa. studies in higher education, 40(3), 437–453. https://doi.org/10.1080/03075079.2015.1004235
brückner, s., & pellegrino, j. w. (2016). integrating the analysis of mental operations into multilevel models to validate an assessment of higher education students’ competency in business and economics. journal of educational measurement, 53(3), 293–312. https://doi.org/10.1111/jedm.12113
byrne, m., & flood, b. (2008). examining the relationship among background variables and academic performance of first year accounting students at an irish university. journal of accounting education, 26(4), 202–212. https://doi.org/10.1016/j.jaccedu.2009.02.001
carroll, j. b. (1993). human cognitive abilities: a survey of factor-analytic studies. cambridge university press. https://doi.org/10.1017/cbo9780511571312
ceneval (centro nacional de evaluación para la educación superior). (2011). examen general para el egreso de la licenciatura en administración. egel-admon. ceneval.
davidson, m., & mckinney, g. g. r. (2001). quantitative reasoning: an overview. dialogue, 8, 1–5.
ets (educational testing service). (2016). gre: guide to the use of scores 2016-17. https://www.ets.org/s/gre/pdf/gre_guide.pdf
elrod, s. (2014). quantitative reasoning: the next "across the curriculum" movement. peer review, 16(3), 4–8. https://search.proquest.com/docview/1698319962?accountid=14632
follette, k., buxner, s., dokter, e., mccarthy, d., vezino, b., brock, l., & prather, e. (2017). the quantitative reasoning for college science (quarcs) assessment 2: demographic, academic and attitudinal variables as predictors of quantitative ability. numeracy, 10(1), 1–33. https://doi.org/10.5038/1936-4660.10.1.5
förster, m., brückner, s., & zlatkin-troitschanskaia, o. (2015). assessing the financial knowledge of university students in germany. empirical research in vocational education and training, 7(6), 1–20. https://doi.org/10.1186/s40461-015-0017-5
frey, a., hartig, j., & rupp, a. a. (2009). an ncme instructional module on booklet designs in large-scale assessments of student achievement: theory and practice. educational measurement: issues and practice, 28(3), 39–53. https://doi.org/10.1111/j.1745-3992.2009.00154.x
fritsch, s., berger, s., seifried, j., bouley, f., wuttke, e., schnick-vollmer, k., & schmitz, b. (2015). the impact of university teacher training on prospective teachers’ ck and pck – a comparison between austria and germany. empirical research in vocational education and training, 7(4), 1–20. https://doi.org/10.1186/s40461-015-0014-8
gaze, e. c., montgomery, a., kilic-bahi, s., leoni, d., misener, l., & taylor, c. (2014). towards developing a quantitative literacy/reasoning assessment instrument. numeracy, 7(2), 4. https://doi.org/10.5038/1936-4660.7.2.4
happ, r., zlatkin-troitschanskaia, o., & förster, m. (2018). how prior economic education influences beginning university students’ knowledge of economics. empirical research in vocational education and training, 10(5), 1–20.
https://doi.org/10.1186/s40461-018-0066-7
hart research associates. (2009). learning and assessment: trends in undergraduate education. a survey among members of the association of american colleges and universities. http://www.aacu.org/membership/documents/2009membersurvey_part1.pdf
itc (international test commission). (2005). itc guidelines for translating and adapting tests. https://www.intestcom.org/files/guideline_test_adaptation.pdf
kuncel, n. r., wee, s., serafin, l., & hezlett, s. a. (2010). the validity of the graduate record examination for master’s and doctoral programs: a meta-analytic investigation. educational and psychological measurement, 70(2), 340–352. https://doi.org/10.1177/0013164409344508
lusardi, a., & wallace, d. (2013). financial literacy and quantitative reasoning in the high school and college classroom. numeracy, 6(2), article 1. https://doi.org/10.5038/1936-4660.6.2.1
madison, b. l. (2009). all the more reason for qr across the curriculum. numeracy, 2(1), article 1. https://doi.org/10.5038/1936-4660.2.1.1
mehler, a., zlatkin-troitschanskaia, o., hemati, w., molerov, d., lücking, a., & schmidt, s. (2018). integrating computational linguistic analysis of multilingual learning data and educational measurement approaches to explore student learning in higher education. in o. zlatkin-troitschanskaia, g. wittum & a. dengel (eds.), positive learning in the age of information (pp. 145–193). springer vs. https://doi.org/10.1007/978-3-658-19567-0_10
mislevy, r. j., & haertel, g. d. (2006). implications of evidence-centered design for educational testing. educational measurement: issues and practice, 25(4), 6–20. https://doi.org/10.1111/j.1745-3992.2006.00075.x
muthén, l. k., & muthén, b. o. (2012). mplus user’s guide (7th ed.). muthén & muthén.
nrc (national research council). (2012). education for life and work: developing transferable knowledge and skills in the 21st century.
https://nap.nationalacademies.org/catalog/13398/education-for-life-and-work-developing-transferable-knowledge-and-skills
oecd (organisation for economic co-operation and development). (2013). the survey of adult skills: reader’s companion. https://www.oecd.org/skills/piaac/skills%20(vol%202)reader%20companion--v7%20ebook%20(press%20quality)-29%20oct%200213.pdf
o’neill, p. b., & flynn, d. t. (2013). another curriculum requirement? quantitative reasoning in economics: some first steps. american journal of business education, 6(3), 339–346. https://doi.org/10.19030/ajbe.v6i3.7814
owen, a. l. (2012). student characteristics, behavior, and performance in economics classes. in g. m. hoyt & k. m. mcgoldrick (eds.), international handbook on teaching and learning economics (pp. 341–350). edward elgar.
rocconi, l. m., lambert, a. d., mccormick, a. c., & sarraf, s. a. (2013). making college count: an examination of quantitative reasoning activities in higher education. numeracy, 6(2), 1–20. https://doi.org/10.5038/1936-4660.6.2.10
roohr, k. c., graf, e. a., & liu, o. l. (2014). assessing quantitative literacy in higher education: an overview of existing research and assessments with recommendations for next-generation assessment. ets research report series, 2014(2), 1–26. https://doi.org/10.1002/ets2.12024
satorra, a., & bentler, p. m. (2010). ensuring positiveness of the scaled difference chi-square test statistic. psychometrika, 75(2), 243–248. https://doi.org/10.1007/s11336-009-9135-y
schreiber, j. b., nora, a., stage, f. c., barlow, e. a., & king, j. (2006). reporting structural equation modeling and confirmatory factor analysis results. a review. the journal of educational research, 99(6), 323–338. https://doi.org/10.3200/joer.99.6.323-338
schuler, h., funke, u., & baron-boldt, j. (1990). predictive validity of high-school grades: a meta-analysis. applied psychology: an international review, 39(1), 89–103.
https://doi.org/10.1111/j.1464-0597.1990.tb01039.x
shavelson, r. j. (2008). reflections on quantitative reasoning: an assessment perspective. in b. l. madison & l. a. steen (eds.), calculation vs. context. quantitative literacy and its implications for teacher education (pp. 27–44). mathematical association of america.
shavelson, r. j., & huang, l. (2003). responding responsibly to the frenzy to assess learning in higher education. change. the magazine of higher learning, 35(1), 10–19. https://doi.org/10.1080/00091380309604739
shavelson, r. j., marino, j. p., zlatkin-troitschanskaia, o., & schmidt, s. (2019). reflections on the assessment of quantitative reasoning. in b. l. madison & l. a. steen (eds.), calculation vs. context: quantitative literacy and its implications for teacher education. mathematical association of america.
tiffin, p. a., mclachlan, j. c., webster, l., & nicholson, s. (2014). comparison of the sensitivity of the ukcat and a levels to sociodemographic characteristics: a national study. bmc medical education, 14(7), 1–12. https://doi.org/10.1186/1472-6920-14-7
walstad, w. b., watts, m. w., & rebeck, k. (2007). test of understanding in college economics: examiner’s manual (4th ed.). national council on economic education.
williams, m. l., waldauer, c., & duggal, v. g. (1992). gender differences in economic knowledge: an extension of the analysis. the journal of economic education, 23(3), 219–231. https://doi.org/10.1080/00220485.1992.10844756
yamaoka, m., walstad, w. b., watts, m. w., asano, t., & abe, s. (eds.). (2010). comparative studies on economic education in asia-pacific region. shumpusha.
zahner, d. (2013). reliability and validity of cla+. council for aid to education.
zlatkin-troitschanskaia, o., förster, m., brückner, s., & happ, r. (2014). insights from a german assessment of business and economics competence. in h. coates (ed.), higher education learning outcomes assessment (pp. 175–200). peter lang.
zlatkin-troitschanskaia, o., förster, m., schmidt, s., brückner, s., & beck, k. (2015). erwerb wirtschaftswissenschaftlicher fachkompetenz im studium. eine mehrebenenanalytische betrachtung von hochschulischen und individuellen einflussfaktoren [acquisition of economic competence over the course of studies. a multilevel consideration of academic and individual determinants]. in s. blömeke & o. zlatkin-troitschanskaia (eds.), kompetenzen von studierenden (pp. 116–134). beltz juventa. https://doi.org/10.25656/01:15506
zlatkin-troitschanskaia, o., shavelson, r. j., & pant, h. a. (2017). assessment of learning outcomes in higher education – international comparisons and perspectives. in c. secolsky & b. denison (eds.), handbook on measurement, assessment and evaluation in higher education (2nd ed.). routledge.

frontline learning research vol. 3 no. 1 (2015) 18–35
issn 2295-3159

individual differences in students’ knowing and learning about fractions: evidence from an in-depth qualitative study

maria bempeni (a,*), xenia vamvakoussi (b)
a university of ioannina, greece
b university of ioannina, greece

article received 22 november 2014 / revised 2 february 2015 / accepted 11 february 2015 / available online 1 april 2015

abstract

we present the results of an in-depth qualitative study that examined ninth graders’ conceptual and procedural knowledge of fractions as well as their approach to mathematics learning, in particular fraction learning. we traced individual differences, even extreme ones, in the way that students combine the two kinds of knowledge. we also provide preliminary evidence indicating that students with strong conceptual fraction knowledge adopt a deep approach to mathematics learning (associated with the intention to understand), whereas students with poor conceptual fraction knowledge adopt a superficial approach (associated with the intention to reproduce). these findings suggest that students differ in systematic ways in how they reason and learn about fractions, and they could be used to inform future quantitative studies.

keywords: fractions; conceptual/procedural knowledge; individual differences; learning approach

* corresponding author. current address: trempessinas 32, athens, 12136, greece. e-mail address: mbempeni@gmail.com
doi http://dx.doi.org/10.14786/flr.v3i1.132

m. bempeni & x. vamvakoussi | f l r 19

1. theoretical background

the distinction between procedural and conceptual knowledge has elicited considerable research and discussion among researchers in the fields of cognitive-developmental psychology and mathematics education.
procedural knowledge is defined as the ability to execute action sequences to solve problems and is usually tied to specific problem types, whereas conceptual knowledge is defined as knowledge of concepts pertaining to a domain and related principles (rittle-johnson and schneider, in press). the relation between the two types of knowledge, particularly with respect to their order of acquisition, has elicited considerable discussion, and there is evidence in favour of contradictory views: in the words of rittle-johnson, siegler, and alibali (2001), “concepts-first” and “procedures-first” theories. according to concepts-first theories, children develop (or are born with) conceptual knowledge in a domain and then use this knowledge to select procedures for solving problems. according to procedures-first theories, children learn procedures for solving problems in a domain and later extract domain concepts from repeated experience in solving problems. in the area of mathematics education research, the two types of knowledge (sometimes referred to by other terms) are deemed practically inseparable (gilmore & papadatou-pastou, 2009; hiebert & wearne, 1996). nevertheless, it is assumed that procedural knowledge plays an important role in the development of conceptual understanding (dubinsky, 1991; gray & tall, 1994; sfard, 1991). more specifically, it is suggested that mathematical concepts develop out of related mathematical processes. such accounts share two common background assumptions, namely that there is a single developmental path and that this path is independent of the particular domain considered. rittle-johnson and siegler (1998) challenged the latter, providing evidence that the order of acquisition may vary, depending on the domain considered. in any case, the two types of knowledge appear closely related. thus, rittle-johnson et al.
(2001) argued for an iterative model, according to which the two types of knowledge develop in a hand-over-hand process and gains in one type of knowledge lead to improvements in the other. this model is supported by empirical evidence and seems to provide an adequate description of the relation between conceptual and procedural knowledge (rittle-johnson & schneider, in press). nevertheless, there is evidence that sometimes the development of one type of knowledge does not necessarily lead to the development of the other. indeed, in the area of fraction learning it has been shown that some students have the ability to perform fraction procedures without exhibiting comparable conceptual understanding or without being able to explain why they are using these procedures (kerslake, 1986; peck & jencks, 1981). on the other hand, resnick (1982) presented evidence showing that some children may exhibit conceptual understanding of principles underlying subtraction without showing procedural fluency. recently, a different explanation for the contradictory findings has been proposed, namely that not enough attention has been paid to the individual differences in the way that students combine the two types of knowledge (gilmore & bryant, 2008; gilmore & papadatou-pastou, 2009; hallett, bryant, & nunes, 2010; hallett, nunes, bryant, & thorpe, 2012). hallett and colleagues examined the procedural and conceptual fraction knowledge of students in grades 4 and 5 (2010) as well as in grades 6 and 8 (2012). they identified groups of students who had strong (or weak) procedural as well as conceptual knowledge. however, they also consistently traced two substantial groups of students who demonstrated relative strength with one form of knowledge and weakness with the other, with differences between the two types of knowledge becoming less salient with age.
these findings challenge the assumption that all children follow a uniform sequence in gaining the two types of knowledge (see also canobi, reeve, & pattison, 2003). in their attempts to explain how such individual differences arise, some researchers appealed to differences in students’ prior knowledge in the domain in question (schneider, rittle-johnson, & star, 2011), differences in students’ cognitive profiles (gilmore & bryant, 2008; hallett et al., 2012), and differences in students’ educational experiences (canobi, 2004; gilmore & bryant, 2008; hallett et al., 2012). however, empirical evidence in support of these assumptions is so far lacking. for example, schneider et al. (2011) found no evidence supporting the hypothesis that the correlation between the two kinds of knowledge might vary with different levels of prior knowledge in the area of equation solving. hallett et al. (2012) investigated whether individual differences in procedural and conceptual knowledge of fractions can be explained by differences in students’ general procedural and conceptual ability (measured by standardized tests); they found no such evidence. in addition, hallett et al. (2012) examined the role of school experience, which they measured as school attendance, that is, they investigated whether attending different schools could explain the individual differences in question; they found no such relation. further research, possibly with different measures, is necessary to clarify the role of the above factors in individual differences in procedural and conceptual knowledge, in particular of fractions. we argue that a factor also worth investigating is the individual student’s learning approach to mathematics. in the literature there is an overarching distinction between the deep approach to learning, associated with the individual’s intention to understand, and the surface approach, associated with the individual’s intention to reproduce.
there are several ways of characterizing each learning approach, mainly adapted to tertiary education (entwistle & mccune, 2004). stathopoulou and vosniadou (2007) proposed a model, which was tested with secondary students. they included three categories for each learning approach, namely goals, (study) strategy use, and awareness of understanding. a deep approach to learning involves goals of personal making of meaning, deep study strategy use (e.g., integration of ideas), and high awareness of understanding. a superficial approach involves performance goals, superficial strategy use (e.g., rote learning), and low awareness of understanding. using these categories, stathopoulou and vosniadou showed that students with strong conceptual understanding of science concepts adopted a deep approach to science learning, whereas students with poor conceptual understanding adopted a superficial approach. a similar association might be present in the case of mathematics as well. indeed, a student who follows a deep learning approach to mathematics is more likely to pay attention to the concepts and principles in the domain in question, to be aware of conceptual difficulties, and to invest the effort necessary to overcome them. in contrast, a student with a superficial approach is more likely to focus on memorizing procedures, especially if procedures are emphasized in instruction, as is often the case (moss, 2005). before we formulate our hypotheses, we turn to a methodological issue, namely the difficulty of measuring the two types of knowledge validly and independently of each other (e.g., gilmore & bryant, 2006; hiebert & wearne, 1996; rittle-johnson & schneider, in press; schneider & stern, 2010; silver, 1986). the development of a procedural test that would be free of conceptual demands (and vice versa) is a challenging task, since such tests may be person, content and context sensitive (haapasalo & kadijevich, 2000; schneider et al., 2011).
moreover, for tasks administered in paper-and-pencil tests, it is often impossible to decide how the student actually solved the task. for such reasons, hiebert and wearne (1996) suggested that attention should also be paid to students’ solution strategies (see also faulkenberry, 2013). a distinction between procedural and conceptual strategies (alsawaie, 2011; clarke & roche, 2009; yang, reys, & reys, 2007) is relevant at this point: procedural strategies are related to rules and exact computation algorithms learnt from instruction. conceptual strategies, on the other hand, are diverse and tailored to the specific problem at hand; they are mostly invented by (some) students themselves, who use them flexibly in order to avoid lengthy computations as well as to deal with unfamiliar problems (see also smith, 1995).

in this study, we examined ninth graders’ conceptual and procedural fraction knowledge. taking into account the methodological issue mentioned above, we designed a qualitative study in order to also monitor students’ strategies. similarly to hallett et al. (2010, 2012), we hypothesized that there are individual differences in the way students combine the two kinds of knowledge. we were particularly interested in extreme cases, namely students with strong conceptual knowledge and weak procedural knowledge, and vice versa. such cases are theoretically interesting, since they are not compatible with the iterative model (rittle-johnson et al., 2001). moreover, tracing extreme cases at grade 9 would indicate that individual differences may persist, although the general tendency is for them to become less salient with age (hallett et al., 2012). in addition, we examined students’ approach to mathematics learning, particularly fraction learning. following stathopoulou and vosniadou (2007), we explored whether students with strong conceptual knowledge adopt a deep learning approach to mathematics, whereas students with weak conceptual knowledge adopt a superficial approach.

2. methodology

2.1 participants

the participants were seven greek students at grade nine (three girls), from seven different schools in the area of athens. the selection of the participants was not random. first, based on their school grades, all participants could be characterized as medium level students in mathematics. second, they all had the same mathematics tutor, starting from the last grades of elementary school. their tutor provided information about their mathematical behaviour. based on this information, we had reasons to expect some variation in their conceptual and procedural knowledge of fractions. we note that by grade seven greek students are taught all the material related to fractions as well as decimals, and are introduced to the term “rational numbers”. we stress that at the time this study took place the mathematics curriculum, as well as the mathematics textbooks, were “traditional”, in the sense that they emphasized general, computation-intensive procedures for dealing with fraction tasks (smith, 1995). consider, for example, that mental calculations and estimation strategies were not among the curricular goals. based on information provided by our participants’ tutor, who had extensive knowledge about their homework assignments as well as their assessment tests on a long-term basis, we had good reasons to believe that instruction relied heavily on the textbooks, at least with respect to what students were expected to do.

2.2 materials

we used thirty fraction tasks grouped in four categories (see appendix a). category a included five procedural tasks, that is, tasks for which a standard procedure was taught at school: four tasks that examined operations with fractions (q.1.1-q.1.4); and one task that required conversion to an equivalent fraction (q.1.5).
category b, consisting of eight tasks, targeted conceptual knowledge. four tasks involved fraction representations (q.1.6-q.1.9); one task required recognizing a fraction as a ratio (q.1.10); one item focused on the role of the unit of reference (q.1.11); and two tasks targeted the understanding of the effect of multiplication and division with fractions (q.1.12, q.1.13). there were no tasks similar to q.1.10-q.1.13 in the textbooks, either at the elementary or at the secondary level. on the other hand, the area model for the representation of fractions was salient in the elementary school textbooks, but unlike q.1.8, the shape was typically given, already equally partitioned; examples of improper fraction representations were scarce (q.1.9), and there was no task similar to q.1.7.

category c consisted of seven comparison tasks (q.1.14-q.1.17, q.1.20-q.1.22) and two ordering tasks (q.1.18, q.1.19). although these tasks could be solved by standard methods taught at school, they could also be solved by a variety of conceptual strategies.

finally, the tasks of category d required deep conceptual understanding or the combination of conceptual understanding and procedural fluency. more specifically, there were two tasks regarding locating fractions on the number line (q.1.23, q.1.24); one problem that involved an intensive quantity and required the comparison of ratios (q.1.25); one task regarding estimation of a fraction sum (q.1.26); one task that required substituting variables with non-natural numbers (q.1.27); one task that tested the use of the inverse relationship between addition and subtraction, as well as between multiplication and division with fractions (q.1.28); and two tasks targeting the dense ordering of rational numbers (q.1.29, q.1.30). there were no tasks similar to q.1.27-q.1.30 in the mathematics textbooks.
locating fractions on the number line was presented at the secondary level (grade 7), albeit not particularly emphasized.

the selection and categorization of the tasks was based on relevant literature (e.g., clarke & roche, 2009; hallett et al., 2010, 2012; mcintosh, reys, & reys, 1992; moss & case, 1999; smith, 1995). we note that we included items targeting students’ awareness of the differences between natural and rational numbers (e.g., q.1.12, q.1.13, q.1.29, q.1.30), which is considered an important aspect of conceptual knowledge (vamvakoussi & vosniadou, 2010; mcmullen, laakkonen, hannula-sormunen, & lehtinen, 2014). we also used a considerable number of tasks related to fraction magnitude (e.g., category d tasks, q.1.23, q.1.24) (for the importance of accessing fraction magnitude in students’ developing knowledge, see siegler & pyke, 2013). we stress, however, that this categorization was tentative, since we also looked into students’ strategies. this consideration is particularly important for category c tasks, but relevant for all tasks.

in addition, we developed twelve items so as to explore students’ learning approach (deep/superficial) to fraction and, more generally, to mathematics learning (see appendix b). the items were presented as scenarios describing a situation that the student had to react to.

2.3 procedure

in the first phase of the study each student was asked to solve the fraction tasks, thinking aloud and explaining their answers. no time limit was imposed. in the second phase three participants were selected to participate in an in-depth, semi-structured individual interview about their learning approach to mathematics.
because this was a first attempt to explore a potential relation between individual differences in conceptual and procedural fraction knowledge and the individual’s learning approach, we selected one student with strong procedural but weak conceptual knowledge; one student with strong conceptual but weak procedural knowledge; and one student who combined both procedural and conceptual knowledge. these students were additionally asked to comment on their responses to the first questionnaire (certainty about the solution, awareness of their performance in the tasks). the second interview took place about one week later. each interview lasted about one hour. all interviews were recorded and transcribed.

2.4 data analysis

first, we assessed the accuracy of students’ responses in all tasks. second, we examined the strategies used. we categorized a strategy as procedural if it was based on instructed rules and procedures related to our research tasks. based on mathematics textbooks, as well as information provided by our participants’ mathematics tutor, we categorized as procedural strategies the standard algorithms for fraction operations; and transformation strategies (smith, 1995), namely converting to equivalent fractions, similar fractions, decimals, or mixed numbers. transformation strategies are relevant to operations as well as comparison, and they were over-emphasized in the textbooks. we also categorized as procedural the instructed method for q.1.25, namely the construction of a 2x2 table placing the like quantities one below the other, and forming and comparing the ratios. regarding the placement of fractions on the number line, the instructed method involved segmenting the unit in the appropriate number of parts. finally, given the salience of the area model for the representation of fractions, particularly in the elementary grades, we reasoned that it had the status of definition for fractions.
we thus did not consider that students used a strategy, either conceptual or procedural, in the related tasks (q.1.6–q.1.9).

we categorized as conceptual the strategies that were not based on instructed procedures. for comparison tasks, such strategies involved, for example, the use of reference numbers, such as the unit and one half; and also residual thinking, that is, comparing the complementary fractions (alsawaie, 2011; clarke & roche, 2009; smith, 2005; yang et al., 2007). in a more general fashion, we categorized as conceptual strategies the ones that relied on estimation of fraction magnitudes, on spontaneous use of representations, and on spotting and employing the multiplicative relations present in the task at hand (e.g., in q.1.25).

we categorized a strategy as conceptual/procedural if it involved conceptual and procedural features, such as adjusting a procedural strategy to deal with a novel task. a prominent example was the use of a transformation strategy, namely converting to equivalent fractions, as a first step to deal with q.1.30, combined with the idea that this process can be repeated infinitely many times.

we also note that in certain cases students provided immediate responses that were not based on a specific strategy; rather, they relied on a holistic understanding of the situation at hand. this was the case mainly for tasks targeting the differences between natural numbers and fractions (q.1.12, q.1.13, q.1.29, q.1.30). for example, some students answered immediately that there is no other number between 2/5 and 3/5, directly drawing on their natural number knowledge. we categorized the strategy of relying on natural number knowledge as conceptual.

for the second phase of the study, the categories (i.e., goals, strategy use, and awareness of understanding) and the related indicators used by stathopoulou and vosniadou (2007) were our starting point for the analysis.
we reviewed all transcripts and coded them when possible. we selected sentences as the unit of analysis, but in some cases we used paragraphs so as to obtain a sense of the whole. we looked for utterances that included keywords pertaining to the indicators of each category (e.g., remember, memorize, memory and similar expressions for the indicator “rote learning” as a superficial strategy use). we placed the sentences in the coding categories according to the initial indicators and developed new indicators when needed. after coding, data that could not be coded were identified and analyzed to determine if they represented a new category. one new category emerged, namely engagement factors, consisting of two subcategories: preferred tasks/strategies (conceptual/procedural), and motivation (intellectual challenge/coping). in addition, we replaced the category awareness of understanding with the more general category awareness, with indicators pertaining to awareness of understanding (high/low) as well as to awareness of the effectiveness of one’s personal study strategies (high/low). the categories are presented in table 5.

3. results of the 1st phase of the study

tables 1-4 present how students performed in the tasks of categories a-d, respectively, and the type of strategy (conceptual, procedural, or a combination of both) they used in each task. as shown in tables 1-4, students 1, 2, and 3 were rather successful across all task categories. students 4, 5, and 6 were successful in categories a and c, but not in categories b and d. student 7 failed in category a, but was rather successful in categories b, c, and d. we placed the students in three profiles: a) conceptual-procedural (students 1, 2, and 3); b) procedural (students 4, 5, and 6); and c) conceptual (student 7). in the following we present these profiles in more detail.
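the profile assignment described above can be made concrete with a small sketch. the success counts are tallied by hand from tables 1-4; the 50% cut-off and the rule that combines categories b and d are assumptions introduced here for illustration only, since the placement reported in the paper rests on a qualitative reading of performance and strategies, not on a numerical threshold.

```python
# successes per task category, tallied by hand from tables 1-4
# (category sizes: a = 5, b = 8, c = 9, d = 8 tasks, 30 in total)
CATEGORY_SIZES = {"a": 5, "b": 8, "c": 9, "d": 8}

SUCCESSES = {
    1: {"a": 5, "b": 8, "c": 8, "d": 4},
    2: {"a": 5, "b": 7, "c": 8, "d": 7},
    3: {"a": 5, "b": 8, "c": 9, "d": 8},  # kosmas
    4: {"a": 4, "b": 1, "c": 9, "d": 0},
    5: {"a": 5, "b": 1, "c": 9, "d": 0},
    6: {"a": 5, "b": 0, "c": 9, "d": 0},  # stella
    7: {"a": 1, "b": 8, "c": 9, "d": 6},  # filio
}

# assumption for illustration only: a category counts as "successful"
# when at least half of its tasks were answered correctly
THRESHOLD = 0.5


def success_rate(student: int, category: str) -> float:
    """share of correctly answered tasks in one category."""
    return SUCCESSES[student][category] / CATEGORY_SIZES[category]


def profile(student: int) -> str:
    """hypothetical rule: category a stands for procedural knowledge,
    categories b and d together stand for conceptual knowledge."""
    procedural = success_rate(student, "a") >= THRESHOLD
    conceptual = all(success_rate(student, c) >= THRESHOLD for c in ("b", "d"))
    if procedural and conceptual:
        return "conceptual-procedural"
    if procedural:
        return "procedural"
    if conceptual:
        return "conceptual"
    return "unclassified"


PROFILES = {student: profile(student) for student in SUCCESSES}
```

category c is left out of the rule because every student was successful there; what distinguishes the profiles in that category is the type of strategy used, which a success count cannot capture.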
3.1 conceptual-procedural profile

the conceptual-procedural students succeeded in all tasks of category a using procedural strategies, that is, standard algorithms (table 1). student 1 and student 3 (hereafter, kosmas) also succeeded in all tasks of category b (table 2). all three students relied heavily on conceptual strategies (reference numbers, residual thinking) to deal with the tasks of category c (table 3). all three performed well in the tasks of this category, with kosmas responding correctly to all tasks.

table 1
students’ performance (s = success, f = failure) and type of strategy used (c = conceptual, p = procedural, c/p = conceptual-procedural) in the tasks of category a

student     q.1.1  q.1.2  q.1.3  q.1.4  q.1.5  profile
1           s, p   s, p   s, p   s, p   s, p   conceptual/procedural
2           s, p   s, p   s, p   s, p   s, p   conceptual/procedural
3 (kosmas)  s, p   s, p   s, p   s, p   s, p   conceptual/procedural
4           s, p   s, p   s, p   s, p   f, p   procedural
5           s, p   s, p   s, p   s, p   s, p   procedural
6 (stella)  s, p   s, p   s, p   s, p   s, p   procedural
7 (filio)   f, p   f, p   f, p   f, p   s, p   conceptual

kosmas was the only student who responded correctly to all tasks of category d. in general, however, all three students performed well in category d tasks, showing a rather strong conceptual understanding, combined with procedural fluency. a good indicator of their conceptual understanding is their responses to the density tasks (q.1.29, q.1.30), in particular to the first, which is the most challenging. student 2 and kosmas provided an impressively sophisticated answer, stating explicitly that there is no such number and explaining that, given any number, no matter how small, one can always find a smaller one. student 1, on the other hand, assumed that such a number exists, so formally his answer is incorrect; however, he stated that this number cannot be found, not even by a computer, and described it as “zero point zero, followed by infinitely many zeroes, and one unit in the end”.
these students’ tendency to prefer conceptual over procedural strategies manifested itself in the tasks of category d as well. none of them applied the instructed method to solve q.1.25; instead, they focused on the relations between the quantities involved. in the words of student 2: “stella’s milk tastes sweeter, because george dissolved the double quantity of chocolate in the triple quantity of milk”.

the data presented in tables 1-4 show that kosmas was the only one who succeeded in all tasks. moreover, kosmas’s responses were more elaborated than his peers’ in terms of completeness as well as of the explanations he provided. consider, for example, q.1.26, which asked for an estimate of the sum of 7/15 and 5/12. all three students noticed that each addend was smaller than 1/2 and concluded that the sum was smaller than the unit. kosmas, however, went further and noticed that “this sum equals the unit minus 0.5/15+1/12. the missing part is close to 0.1; more precisely, a bit bigger than 0.1”. he reached this close estimate of the missing part mainly via mental calculations, writing down some of the intermediate results.
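kosmas’s estimate for q.1.26 can be verified with exact arithmetic. the sketch below is only a check of the reasoning quoted above, written with python’s standard fractions module; the variable names are ours and are not part of the study materials.

```python
from fractions import Fraction

a = Fraction(7, 15)
b = Fraction(5, 12)
half = Fraction(1, 2)

# step shared by all three students: each addend is below one half,
# so the sum must be below the unit
both_below_half = a < half and b < half

# kosmas's refinement: measure how far each addend falls short of one half
gap_a = half - a          # 7.5/15 - 7/15 = 0.5/15 = 1/30
gap_b = half - b          # 6/12 - 5/12 = 1/12
missing = gap_a + gap_b   # the part of the unit that the sum lacks: 7/60

total = a + b             # 53/60, i.e. the unit minus the missing part

print(missing, float(missing))  # 7/60, roughly 0.117
```

the missing part, 7/60, is indeed slightly larger than 0.1, which is the estimate kosmas reached mentally.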
table 2
students’ performance (s = success, f = failure) and type of strategy used (c = conceptual, p = procedural, c/p = conceptual-procedural) in the tasks of category b

student     q.1.6  q.1.7  q.1.8  q.1.9  q.1.10  q.1.11  q.1.12  q.1.13  profile
1           s      s      s      s      s       s       s, c    s, c    conceptual/procedural
2           s      s      s      s      s       f       s, c    s, c    conceptual/procedural
3 (kosmas)  s      s      s      s      s       s       s, c    s, c    conceptual/procedural
4           s      f      f      f      f       f       f, c    f, c    procedural
5           s      f      f      f      f       f       f, c    f, c    procedural
6 (stella)  f      f      f      f      f       f       f, c    f, c    procedural
7 (filio)   s      s      s      s      s       s       s, c    s, c    conceptual

table 3
students’ performance (s = success, f = failure) and type of strategy used (c = conceptual, p = procedural, c/p = conceptual-procedural) in the tasks of category c

student     q.1.14  q.1.15  q.1.16  q.1.17  q.1.18  q.1.19  q.1.20  q.1.21  q.1.22  profile
1           s, c    s, c    s, c    s, c    s, c    s, c    f, c    s, c    s, c    conceptual/procedural
2           s, c    s, c    s, c    s, c    s, c    f, c/p  s, c    s, c    s, c    conceptual/procedural
3 (kosmas)  s, c    s, c    s, c    s, c    s, c    s, c    s, c    s, c    s, c    conceptual/procedural
4           s, p    s, p    s, p    s, p    s, p    s, p    s, p    s, p    s, p    procedural
5           s, p    s, p    s, p    s, p    s, p    s, p    s, p    s, p    s, p    procedural
6 (stella)  s, p    s, p    s, p    s, p    s, p    s, p    s, p    s, p    s, p    procedural
7 (filio)   s, c    s, c    s, c    s, c    s, c    s, c    s, c    s, c    s, c    conceptual

table 4
students’ performance (s = success, f = failure) and type of strategy used (c = conceptual, p = procedural, c/p = conceptual-procedural) in the tasks of category d

student     q.1.23  q.1.24  q.1.25  q.1.26  q.1.27  q.1.28  q.1.29  q.1.30  profile
1           f, c    f, c    s, c/p  f, c    s, c    s, c/p  f, c    s, c/p  conceptual/procedural
2           s, c/p  s, c/p  s, c    s, c    s, c    f, c    s, c    s, c/p  conceptual/procedural
3 (kosmas)  s, c/p  s, c/p  s, c/p  s, c    s, c/p  s, c/p  s, c    s, c/p  conceptual/procedural
4           f, p    f, p    f, p    f, c    f, p    f, p    f, c    f, c    procedural
5           f, p    f, p    f, p    f, c    f, p    f, p    f, c    f, c    procedural
6 (stella)  f, p    f, p    f, c    f, c    f, p    f, p    f, c    f, c    procedural
7 (filio)   s, c    s, c    s, c/p  s, c    s, c    f, c/p  f, c    s, c    conceptual

3.2 procedural profile

as shown in table 1, the students of this profile performed very well in the tasks of category a. in contrast, their performance was very low in the tasks of category b (table 2).
in particular, student 6 (hereafter, stella) failed in all the tasks of this category. she stated that “the numerator shows how many pieces to take” to justify her answer in q.1.6, and she drew a circle and partitioned it in three unequal parts in q.1.8 (figure 1). none of these students exhibited any understanding of the fundamental principle that the fractional parts of the unit should be equal, as also evidenced by their performance in q.1.7 (table 2).

figure 1. stella’s response to q.1.6, q.1.8: representations for the fractions 1/4 and 2/3, respectively.

all three students failed to represent the improper fraction 5/3 (q.1.9). figure 2 presents student 5’s and stella’s attempts to deal with this task. student 4 gave no answer to the problem.

figure 2. procedural profile: student 5’s and stella’s attempts to represent the fraction 5/3.

in addition, all three students failed in q.1.10, explaining that the denominator shows how many pieces the pizza had, and the numerator how many pieces were eaten. they also failed in q.1.11, since they did not consider that the units of reference might be different. moreover, they all insisted on executing the calculations in q.1.12 and q.1.13. when they were explicitly instructed not to do it, they came up with the rule “multiplication makes bigger, whereas division makes smaller”.

all students of this profile were flawless in the tasks of category c, using only procedural strategies. they were, however, very reluctant to try without paper and pencil when they were asked to. when they did try, their responses reflected a severe lack of understanding. for example, stella claimed that 123/220 is greater than 6/5 because the numbers 123 and 220 are greater than 6 and 5, respectively.

the students of this profile failed in all tasks of category d (table 4). again, they relied heavily on procedural strategies, in particular transformation strategies.
for example, they all converted fractions into decimals in q.1.23 and q.1.24. they also attempted to use this strategy or to perform the calculation in the estimation task q.1.26, although they were specifically asked not to. stella, in particular, explicitly stated that it is impossible to solve the task without converting to similar fractions or to decimals first. students 4 and 5 applied the instructed method in q.1.25. however, they were not able to interpret the result. consider, for example, the answer and the explanation provided by student 5: “george’s milk tastes sweeter, because his proportion 600/100=6 is better than stella’s 200/50=4”. on the other hand, stella’s answer indicated that she neglected the multiplicative relations defining the relative quantities that are involved in the situation: “the girl’s quantities are rather small compared to the boy’s. so i believe that her milk tasted sweeter”.

these students’ responses to the tasks on dense ordering (q.1.29, q.1.30) were immediate and reflected the idea that fractions (or decimals, in case they had converted them) are discrete, like the natural numbers. stella stated that “there are no other numbers between 2/5 and 3/5, because 3 comes right after 2”. according to stella, one was the smallest positive number, while students 4 and 5 proposed 0.1.

3.3 conceptual profile

as mentioned above, there was only one student placed in this profile, namely filio. as shown in table 1, filio failed in all tasks of category a, except for q.1.5, since she was quite competent with equivalent fractions (see also her solution in q.1.25 below). in contrast, she succeeded in all tasks of category b (table 2). she was able to explain her responses adequately. for example, to explain her disagreement with maria in q.1.10, filio said that “i don’t know how many pieces this pizza had. kostas could have eaten 3 pieces, only if the pizza was cut in four”. similarly, in q.1.11, she exclaimed: “where are the pizzas?
i need to see the pizzas. are they the same or not?” while dealing with q.1.12 and q.1.13, she explicitly stated that the outcome is not necessarily bigger, just because there is multiplication involved. she tried with several numbers, and eventually came up with a generalization: “when we multiply a number a by a fraction smaller than the unit, the product is smaller than the number a”.

filio succeeded in all tasks of category c (table 3), consistently using only conceptual strategies. interestingly, she also succeeded in most of the tasks of category d (table 4). her responses in q.1.23 and q.1.24 were based on estimation of the fraction magnitudes and a rough approximation of their location on the number line. unlike the students of the conceptual-procedural profile, she didn’t attempt to find the exact locations by partitioning the line segments. quite similar to these students, however, she focused on the relations between the quantities in q.1.25, employed a transformation strategy, and came up with a solution that is not taught at school: “the 50gr of chocolate powder that stella put in 200gr milk is half the quantity that george put in 600gr. so i double the quantities 50/200 and i get 100/400. then, 100 in 400 means more chocolate powder in the milk than 100 in 600! so, stella’s milk tastes sweeter.”

similarly to kosmas, filio explicitly stated that there are infinitely many pairs whose product is 3 (q.1.27). moreover, she also stated that there are infinitely many numbers between 2/5 and 3/5 (q.1.30). unlike all other participants, she justified her answer by spontaneously using a rather sophisticated representation: “if we locate them on the number-line, there is definitely a gap in between. in this gap, there are infinitely many numbers”. we note that filio explicitly expressed her discomfort with tasks in which she could not avoid using procedures (e.g., category a tasks, q.1.28).
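the two lines of reasoning reported for q.1.25 can be compared side by side. the sketch below uses only the quantities quoted in the students’ responses (50 g of chocolate in 200 g of milk for stella, 100 g in 600 g for george); the variable names are ours, and the code is an illustration of the arithmetic, not a reconstruction of the instructed method.

```python
from fractions import Fraction

# quantities quoted in the responses to q.1.25 (grams)
STELLA = {"chocolate": 50, "milk": 200}
GEORGE = {"chocolate": 100, "milk": 600}

# filio's transformation: scale stella's mixture until both mixtures
# contain the same amount of chocolate (50/200 becomes 100/400)
factor = Fraction(GEORGE["chocolate"], STELLA["chocolate"])
stella_scaled_milk = STELLA["milk"] * factor

# the same chocolate in less milk means a sweeter drink
stella_is_sweeter = stella_scaled_milk < GEORGE["milk"]

# student 5's ratios express milk per unit of chocolate
george_dilution = Fraction(GEORGE["milk"], GEORGE["chocolate"])  # 600/100
stella_dilution = Fraction(STELLA["milk"], STELLA["chocolate"])  # 200/50

# a larger milk-per-chocolate ratio means a more diluted, less sweet drink,
# so the larger value (george's) points to the less sweet milk
george_is_more_diluted = george_dilution > stella_dilution
```

both computations lead to the same conclusion once the ratio is interpreted correctly: student 5 computed the right numbers (6 and 4) but read the larger one as “better”, which is the interpretation error described in section 3.2.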
we also note that filio was monitoring her performance during the solution process. she explicitly expressed doubt about responses that were actually incorrect; she also revised certain answers herself. for example, when solving q.1.18, she initially answered that the fractions 3/4 and 6/7 are equal, because for both one fractional unit is needed to complete the unit. she revised this answer after locating the two fractions on the number line.

3.4 conclusions

the first phase of the study revealed three different student profiles. the conceptual-procedural profile consisted of three students with quite strong conceptual knowledge of fractions, combined with procedural fluency. these students appeared to prefer conceptual strategies over procedural strategies, when this was possible. one of these students, namely kosmas (student 3), was exceptionally strong: not only did he succeed in all tasks, but he also gave the most complete and elaborated answers.

the procedural profile consisted of three students who were capable of applying instructed procedures. this capability allowed them to deal very successfully with the tasks that could actually be solved by an instructed procedure. however, these students failed in most tasks that required conceptual knowledge, exhibiting lack of understanding of even the most fundamental fraction ideas. stella, in particular, failed in the simplest conceptual tasks. these students relied heavily on procedural strategies and consistently avoided trying otherwise. when they did try, they typically failed.

finally, the conceptual profile consisted of one student, namely filio (student 7). filio consistently avoided applying procedures throughout the interview, and she failed when she had to do so. she nevertheless exhibited a firm understanding of fundamental fraction ideas, and thus she managed to deal quite successfully with many tasks by consistently applying conceptual strategies.
thus, in line with recent discussions regarding the relation between conceptual and procedural knowledge of fractions (e.g., hallett et al., 2010, 2012), we found individual differences in the way that students combine the two kinds of knowledge. moreover, we showed that these differences can be extreme; consider, for example, stella and filio.

4. results of the 2nd phase of the study

table 5 presents the categories that describe the deep learning approach and the superficial learning approach to mathematics, and their indicators. in the following we present excerpts from transcribed interviews of kosmas, filio, and stella, in order to highlight the similarities and the differences in their learning approaches to mathematics, along these categories.

4.1 goals

kosmas and filio repeatedly referred to the importance of learning with understanding in mathematics, which they both juxtaposed with rote learning. for them, learning with understanding meant personal making of meaning. this point is illustrated in the following excerpts, in which kosmas and filio explain how they would help a hypothetical younger student who is challenged by the comparison of fractions:

“perhaps i could try to explain fractions as i understand them. he has to find a personal way of thinking though. he could study the rules. in fact, there are two ways: in the case of fractions, the first one is to memorize the rules and apply them. for example, between two fractions with the same numerator 3/5 and 3/7 the bigger is the one with the smaller denominator. alternatively, he would compare the two fractions to the unit, that is, notice that 3/7 is closer to 1 than 3/5. there is a difference: in the second case you have understood exactly what happens with fractions. the first is rote learning. you can reach a conclusion regarding which of two fractions is bigger but you don’t understand why.
personally, if i saw these two fractions, i would compare the fractions to the unit so as to check the validity of the rule.” (kosmas, q.2.11)

“i would help him understand the concept of fraction. but, you know, everyone has their own way of thinking. mathematics is not rote learning, you have to put your mind to the work. […] i could explain to him how to compare fractions based on the rules, but if he wants to be really able to compare fractions, i think that he should understand the concept of fraction. he must understand what fractions are and then he will do well in fractions.” (filio, q.2.11)

consider also the following excerpts:

“the most important thing is to understand. knowing the rules will also help you, there is no doubt about it. but understanding is the most important thing.” (kosmas, q.2.11)

“if i understand the meaning of what i do, then i can solve the exercises.” (filio, q.2.2)

table 5
deep vs. superficial learning approach to mathematics: categories and indicators

category            sub-category                            indicator: deep approach                    indicator: superficial approach
goals                                                       understanding / personal making of meaning  focus on what is required/assessed at school
study strategies                                            combining theory and practice               memorizing and rehearsing
study strategies                                            systematic, long-term time investment       more rehearsing
awareness           of understanding                        high                                        low
awareness           effectiveness of own study strategies   high                                        low
engagement factors  task/strategy preferences               conceptual                                  procedural
engagement factors  motivation                              intellectual challenge                      coping

in contrast, stella repeatedly referred, explicitly or implicitly, to the importance of complying with what is assessed at school and appeared to focus exclusively on the material taught at school. this is summarized nicely in the following excerpt:

“what i would advise a younger student is to look at the exercises solved at school, to focus on what is likely to be asked in the exams, and to pay attention to what the teacher has emphasized.” (stella, q.2.1)

4.2 study strategies

kosmas and filio both stressed that in order to study mathematics efficiently one needs to combine studying theory in depth and extensive practice with exercises. they also expressed their conviction that solving unfamiliar problems is important as a study strategy as well as an indicator of understanding.

“you have to know the theory very well so as to understand mathematics. if you only solve exercises, your competence is very limited. one has to understand the theory in depth before trying to solve exercises.” (kosmas, q.2.2)

“if you give me any problem and i can solve it, then it means i have understood well.” (kosmas, q.2.9)

“one should understand the theory very well and practice a lot as well; and solve exercises beyond the ones in the textbook.” (filio, q.2.2)

in contrast, stella’s study strategies were limited to memorizing and rehearsing:

“studying what is needed for solving the exercises is pretty much sufficient.” (stella, q.2.2)

“studying the theory is good, because you have to know some theory to be able to solve the exercises. but i think that it is better to focus on exercises. personally, i look at what we have done at school, so as to remember how the exercises are solved. i solve them again and again, and then i check if they are correct.” (stella, q.2.3)

in addition, unlike stella, kosmas and filio appeared to value the hypothetical student’s study strategies in q.2.3, although they both admitted that they don’t study like this.

“there is no doubt that this is the appropriate way of studying the theory. […] this is how i should study but, unfortunately, i don’t. that’s why i am not strong in mathematics.” (kosmas, q.2.3)

“what she does is just fine. i don’t study like this, but i wish i did.” (filio, q.2.3)

moreover, kosmas and filio referred to the importance of investing time in studying mathematics.
they distinguished between merely spending time on studying, and studying systematically and in depth.

“mathematics is a course that has to do a lot with understanding, so you have to study a lot. you have to start systematically in mathematics from the beginning. gaps are difficult to cover, one needs to dedicate lots of time for both theory and exercises.” (kosmas, q.2.2)

“i was preparing for a mathematics test and i spent lots of time, but only during the last two days before the exam. i believe that studying in depth results in success. if you study superficially, you are not prepared appropriately. when we talk about mathematics, you can’t prepare at the last minute. if you do it, you will fail. it is impossible to learn mathematics two days before the exams.” (kosmas, q.2.4)

“it’s not only the time spent on studying, it’s also the way you study. […] you may feel well-prepared for a test because you have spent lots of time on solving exercises and fail in the end. for example, what has happened to me is to face unfamiliar problems in a test and fail. in that test, our teacher tested whether we can think for ourselves, so he examined us in different tasks than the ones we had solved in the classroom. […] in order to succeed, you must have understood the concepts and have practiced a lot.” (filio, q.2.4)

stella also mentioned time as an important factor of success in mathematics. for stella, however, spending more time on studying meant more rehearsing:

“[one of my classmates] is very good at math. i believe that i am good too, but not exactly at the same level. […] i think he spends more hours studying than i do. […] perhaps he solves the exercises more times than i do.” (stella, q.2.5)

4.3 awareness

4.3.1 awareness of understanding

kosmas felt confident that he was able to assess his performance in mathematical tasks in general. in fact, he was very accurate in assessing his performance regarding the fraction tasks.
as already mentioned, filio was monitoring her performance in the fraction tasks and corrected several mistakes herself in the process. she also detected practically all the tasks that she had answered incorrectly. in addition, she was aware that she lacked procedural fluency:

“i don’t remember rules and procedures regarding fractions. however, if someone reminded me of them, i could apply them.” (filio, q.2.9)

filio acknowledged that fractions require “a lot of thinking” and recalled that she was challenged by fractions at elementary school. interestingly, she mentioned that she managed to grasp the meaning of fractions by connecting the “school fractions” with the fractions she met in her music courses (filio, q.2.6).

stella, on the other hand, was confident that she had answered pretty much all fraction tasks correctly. she appeared to detect her mistake in q.1.9, and she revised her answer. however, her second attempt was again incorrect, since it was based on the assumption that 5/3 is “a bit bigger than 0.5”. nevertheless, stella believed that she had a firm understanding of fractions in general:

“i believe that i understand everything about fractions. i never had any difficulty with fractions. i found them very easy at the elementary school, too. in general, i have never had any problems with mathematics, as far as i can remember.” (stella, q.2.8)

4.3.2 awareness of the effectiveness of own study strategies

as mentioned before, kosmas and filio both admitted that they did not follow effective study strategies in mathematics, although they recognized and appreciated them. in addition, they both attributed the fact that they didn’t excel in mathematics to their own way of studying.

“[one of my classmates] is really strong in mathematics. i am at a considerably lower level. this is because i don’t invest enough time to study seriously in mathematics. [...] often i only solve the exercises that i have as homework and stop there. [...]
i could be as strong as my fellow student, provided that i would be determined to study seriously.” (kosmas, q.2.5)

“i could be as good as him [my fellow student]. how? the old-fashioned way: putting time and effort in studying as i should.” (filio, q.2.5)

in contrast, not once did stella question her study strategies:

“every time something went wrong, this happened because i was not so careful. […] or i thought i knew the material and that there was no need to look at it again, but in fact i did not remember it well. but in cases that i had studied as i should, i believe that stress was responsible for my failure.” (stella, q.2.4)

4.4 engagement factors

4.4.1 task/strategy preferences

as already mentioned, during the first phase of the study it was more than obvious that filio resented the tasks that she perceived as procedural. for instance, she grew impatient with q.1.28 and quit trying, exclaiming “i’ve had enough! i spent too much time on this already. i can’t do it, i won’t do it!”. kosmas, on the other hand, never expressed any discomfort when he had to, or chose to, apply procedural strategies. in spite of this important difference, these students both expressed their preference for conceptual over procedural tasks when they were explicitly asked to choose:

“this is an easy choice! i would choose the second one, because i do not like using methods. i do know, however, that the first one is easier. at any moment you can open your book and remember how it is solved.” (kosmas, q.2.10)

“not the first one, for sure. it’s better to think something new, instead of constantly doing the same. i find no meaning in the application of rules and procedures. it is not interesting.
it is like rote learning, you know, a method to solve exercises.” (filio, q.2.10)

unlike kosmas and filio, stella showed a clear preference for procedural strategies during the first phase of the study, and she explicitly stated that she would prefer the standard, procedural task in q.2.10.

4.4.2 motivation

as may be evident from their responses to q.2.10, kosmas and filio were motivated by novel and challenging tasks. for kosmas, there were clear indications of this already in the first phase of the study. for instance, when he first saw q.1.29, his immediate reaction was the following: “the smallest positive number! this is a nice question, isn’t it?” in fact, kosmas was the only participant who chose to deal with the most demanding and unfamiliar tasks first. when asked why, he replied: “i like challenging tasks much more. i find no interest in solving exercises similar to the ones i have met before. the point is to think of something new.” similarly, filio explained her choice of the unfamiliar task in q.2.10 as follows:

“when you try to solve an exercise and you finally discover that something that you thought for yourself is correct, you get a very nice feeling.” (filio, q.2.10)

in contrast, stella’s main concern was to stay on the safe side. as may be evident from her responses presented above, she was mainly interested in good school performance. when she explained why she would prefer the “standard”, procedural task in q.2.10, she indicated that it was concern about possible failure that guided her choice: “i would choose the first one because it involves operations, which i already know. so i would be sure that i can respond correctly. the second one may involve something i don’t know or never met before.”

finally, we note that for kosmas and filio learning with understanding, besides being an important goal in mathematics learning, also had a motivational aspect.
consider, for example, the following excerpts:

“if you are to study mathematics, you should understand what you’re doing. you should find meaning in what you do.” (filio, q.2.2)

“[my classmate who excels in mathematics] has a special interest in math, he loves it. he finds meaning in what he does. that’s why he dedicates so many hours to studying.” (filio, q.2.5)

both students mentioned that they felt they understood mathematics at the elementary level, but not so at the secondary level. this was due to the fact that procedures are over-emphasized at the secondary level and this appeared to be demotivating for them.

“instruction on fractions is based on algorithms and students do not understand the concept of fraction. for example, in the addition of fractions we learn a priori that fractions must have the same denominator without understanding why. something similar happens to mathematics teaching in general. we should understand mathematics deeper and i think that teachers must help us. how? i don’t know.” (kosmas, q.2.7)

4.5 conclusions

as evidenced by their interview excerpts, kosmas and filio exhibited similar features along the categories goals, (study) strategy use, awareness, and engagement factors. specifically, they both appeared to value understanding and personal making of meaning in mathematics learning; they were convinced that the study of mathematics requires combining deep understanding of theory as well as extensive practice; systematic and long-term time investment was a key issue for them, as they appeared aware that merely spending time on mathematics studying is not enough to succeed in mathematics. kosmas and filio showed high awareness of understanding in the domain of fractions; they were also highly aware of their limitations as students in mathematics.
finally, they showed a clear preference for tasks that require conceptual understanding and present an intellectual challenge, which appeared to be motivating for them.

in contrast, stella differed across all categories. specifically, stella’s goal was to cope successfully with what was required at school; her study strategies were limited to memorizing rules and procedures as well as solving similar or even the same exercises repeatedly; she preferred procedural tasks because she was confident that she would succeed. finally, she showed practically no awareness of her (extremely limited) conceptual understanding of fractions, and no awareness of the limitations of her study strategies.

5. discussion

our results support the hypothesis that there are individual differences in the way that students develop conceptual and procedural knowledge of fractions. similarly to hallett et al. (2010, 2012), we identified students who were strong with respect to one type of knowledge, but weak with respect to the other. although the findings of hallett et al. (2012) indicate that such individual differences become less salient with age, we showed that for some students they remain extreme, even at grade 9. consider, for example, stella and filio: it appears that for these students conceptual and procedural knowledge of fractions have not developed in a hand-over-hand process, as predicted by the iterative model (rittle-johnson et al., 2001).

in addition, our study provides preliminary evidence indicating that the individual student’s learning approach to mathematics is worth investigating in relation to individual differences in conceptual and procedural knowledge. similarly to stathopoulou and vosniadou (2007), we found that kosmas and filio, who exhibited strong conceptual knowledge of fractions, both valued a deep approach to mathematics learning; whereas stella, who exhibited poor conceptual knowledge of fractions, appeared to follow a superficial approach.
this finding cannot, of course, be generalized, given that it comes from a qualitative study with a small sample. moreover, it is based on “extreme” cases of individuals. nevertheless, this qualitative evidence can inform the hypotheses and the design of future quantitative studies.

investigating individual differences in conceptual and procedural knowledge is important for understanding mathematical development (canobi, 2004; hallett et al., 2010, 2012). from an educational perspective, however, encouraging the symmetrical development of the two kinds of knowledge is an important goal, since they are both considered essential for students’ mathematical competence (rittle-johnson & schneider, in press). to this end, probably the first step would be to foster learning environments in which both conceptual and procedural knowledge are valued – and also assessed.

keypoints

there are individual differences, even extreme, in the way students combine conceptual and procedural knowledge of fractions.

the individual student’s learning approach to mathematics is a factor worth investigating with respect to individual differences in conceptual and procedural fraction knowledge.

references

alsawaie, o. (2011). number sense-based strategies used by high-achieving sixth grade students who experienced reform textbooks. international journal of science and mathematics education, 10(5), 1071-1097. doi: 10.1007/s10763-011-9315-y

canobi, k. h. (2004). individual differences in children’s addition and subtraction knowledge. cognitive development, 19, 81-93. doi: 10.1016/j.cogdev.2003.10.001

canobi, k. h., reeve, r. a., & pattison, p. e. (2003). patterns of knowledge in children’s addition. developmental psychology, 39, 521-534. doi: 10.1037/0012-1649.39.3.521

clarke, d. m., & roche, a. (2009). students’ fraction comparison strategies as a window into robust understanding and possible pointers for instruction.
educational studies in mathematics, 72(1), 127-138. doi: 10.1007/s10649-009-9198-9

dubinsky, e. (1991). reflective abstraction in advanced mathematical thinking. in d. o. tall (ed.), advanced mathematical thinking (pp. 95-123). kluwer: dordrecht. doi: 10.1007/0-306-47203-1_7

entwistle, n., & mccune, v. (2004). the conceptual bases of study strategy inventories. educational psychology review, 16, 325-345. doi: 10.1007/s10648-004-0003-0

faulkenberry, t. j. (2013). the conceptual/procedural distinction belongs to strategies, not tasks: a comment on gabriel et al. (2013). frontiers in psychology, 4, 1-2. doi: 10.3389/fpsyg.2013.00820

gilmore, c. k., & bryant, p. (2006). individual differences in children’s understanding of inversion and arithmetical skill. british journal of educational psychology, 76, 309–331. doi: 10.1348/000709905x39125

gilmore, c. k., & bryant, p. (2008). can children construct inverse relations in arithmetic? evidence for individual differences in the development of conceptual understanding and computational skill. british journal of developmental psychology, 26, 301–316. doi: 10.1348/026151007x236007

gilmore, c. k., & papadatou-pastou, m. (2009). patterns of individual differences in conceptual understanding and arithmetical skill: a meta-analysis. mathematical thinking and learning, 11, 25-40. doi: 10.1080/10986060802583923

gray, e., & tall, d. (1994). duality, ambiguity, and flexibility: a proceptual view of simple arithmetic. journal for research in mathematics education, 25(2), 407-428.

hallett, d., nunes, t., & bryant, p. (2010). individual differences in conceptual and procedural knowledge when learning fractions. journal of educational psychology, 102, 395–406. doi: 10.1037/a0017486

haapasalo, l., & kadijevich, d. (2000). two types of mathematical knowledge and their relation. jmd - journal for mathematic-didaktik, 21, 139-157. doi: 10.1007/bf03338914

hallett, d., nunes, t., bryant, p., & thorpe, c. m. (2012).
individual differences in conceptual and procedural fraction understanding: the role of abilities and school experience. journal of experimental child psychology, 113, 469-486. doi: 10.1016/j.jecp.2012.07.009

hiebert, j., & wearne, d. (1996). instruction, understanding, and skill in multidigit addition and subtraction. cognition & instruction, 14(3), 251-283. doi: 10.1207/s1532690xci1403_1

kerslake, d. (1986). fractions: children’s strategies and errors: a report of the strategies and errors in secondary mathematics project. windsor, uk: nfer–nelson.

mcintosh, a., reys, b. j., & reys, r. e. (1992). a proposed framework for examining basic number sense. for the learning of mathematics, 12, 2-8.

mcmullen, j., laakkonen, e., hannula-sormunen, m., & lehtinen, e. (in press). modeling developmental trajectories of rational number. learning and instruction. doi: 10.1016/j.learninstruc.2013.12.004

moss, j., & case, r. (1999). developing children’s understanding of the rational numbers: a new model and an experimental curriculum. journal for research in mathematics education, 30, 122-147.

national council of teachers of mathematics. (1989). curriculum and evaluation standards for school mathematics. reston, va: author.

peck, d. m., & jencks, s. m. (1981). conceptual issues in the teaching and learning of fractions. journal for research in mathematics education, 12(5), 339-348.

resnick, l. b. (1982). syntax and semantics in learning to subtract. in t. p. carpenter, j. m. moser, & t. a. romberg (eds.), addition and subtraction: a cognitive perspective (pp. 136–155). hillsdale, nj: erlbaum.

rittle-johnson, b., siegler, r. s., & alibali, m. w. (2001). developing conceptual understanding and procedural skill in mathematics: an iterative process. journal of educational psychology, 93, 346-362. doi: 10.1037/0022-0663.93.2.346

rittle-johnson, b., & siegler, r. s. (1998).
the relations between conceptual and procedural knowledge in learning mathematics: a review. in c. donlan (ed.), the development of mathematical skills (pp. 75-110). east sussex, uk: psychology press.

rittle-johnson, b., & schneider, m. (in press). developing conceptual and procedural knowledge of mathematics. in r. kadosh & a. dowker (eds.), oxford handbook of numerical cognition. oxford press.

schneider, m., & stern, e. (2010). the developmental relations between conceptual and procedural knowledge: a multimethod approach. developmental psychology, 46, 178-192. doi: 10.1037/a0016701

schneider, m., rittle-johnson, b., & star, j. (2011). relations among conceptual knowledge, procedural knowledge, and procedural flexibility in two samples differing in prior knowledge. developmental psychology, 47, 1525-1538. doi: 10.1037/a0024997

siegler, r., & pyke, a. (2013). developmental and individual differences in understanding of fractions. developmental psychology, 49, 1994–2004. doi: 10.1037/a0031200

silver, e. a. (1986). using conceptual and procedural knowledge: a focus on relationships. in j. hiebert and p. lefevre (eds.), conceptual and procedural knowledge: the case of mathematics. new jersey: erlbaum associates.

smith, j. (1995). competent reasoning with rational numbers. cognition and instruction, 13(1), 3-50. doi: 10.1207/s1532690xci1301_1

sfard, a. (1991). on the dual nature of mathematical conceptions: reflections on processes and objects as different sides of the same coin. educational studies in mathematics, 22, 1-36. doi: 10.1007/bf00302715

stathopoulou, c., & vosniadou, s. (2007). conceptual change in physics and physics-related epistemological beliefs: a relationship under scrutiny. in s. vosniadou, a. baltas, & x. vamvakoussi (eds.), reframing the conceptual change approach in learning and instruction (pp. 145-164). oxford, uk: elsevier. doi: 10.1016/j.cedpsych.2005.12.002

vamvakoussi, x., & vosniadou, s. (2010).
how many decimals are there between two fractions? aspects of secondary school students’ reasoning about rational numbers and their notation. cognition & instruction, 28(2), 181-209. doi: 10.1080/07370001003676603

yang, d. c., reys, r., & reys, b. (2007). number sense strategies used by pre-service teachers in taiwan. international journal of science and mathematics education, 7(2), 383-403. doi: 10.1007/s10763-007-9124-5

frontline learning research 1 (2013) 42-71
issn 2295-3159

corresponding author: mariel f. musso, katholieke universiteit leuven / universidad argentina de la empresa, mariel.musso@hotmail.com
http://dx.doi.org/10.14786/flr.v1i1.13

42 | f l r

predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks

mariel f. musso ab, eva kyndt ac, eduardo c. cascallar ad, filip dochy a

a katholieke universiteit leuven, belgium
b universidad argentina de la empresa, argentina
c university of antwerp, belgium
d assessment group international, usa / belgium

article received 8 march 2013 / revised 2 july 2013 / accepted 16 july 2013 / available online 27 august 2013

abstract

many studies have explored the contribution of different factors from diverse theoretical perspectives to the explanation of academic performance. these factors have been identified as having important implications not only for the study of learning processes, but also as tools for improving curriculum designs, tutorial systems, and students’ outcomes. some authors have suggested that traditional statistical methods do not always yield accurate predictions and/or classifications (everson, 1995; garson, 1998). this paper explores a methodological approach that is relatively new to the field of learning and education, but widely used in other areas, such as computational sciences, engineering and economics.
this study uses cognitive and non-cognitive measures of students, together with background information, in order to design predictive models of student performance using artificial neural networks (ann). these predictions of performance constitute a true predictive classification of academic performance over time, a year in advance of the actual observed measure of academic performance. a total sample of 864 university students of both genders, ages ranging between 18 and 25 was used. three neural network models were developed. two of the models (identifying the top 33% and the lowest 33% groups, respectively) were able to reach 100% correct identification of all students in each of the two groups. the third model (identifying low, mid and high performance levels) reached precisions from 87% to 100% for the three groups. analyses also explored the predicted outcomes at an individual level, and their correlations with the observed results, as a continuous variable for the whole group of students. results demonstrate the greater accuracy of the ann compared to traditional methods such as discriminant analyses. in addition, the ann provided information on those predictors that best explained the different levels of expected performance. thus, results have allowed the identification of the specific influence of each pattern of variables on different levels of academic performance, providing a better understanding of the variables with the greatest impact on individual learning processes, and of those factors that best explain these processes for different academic levels. keywords: predictive systems; academic performance; artificial neural networks m. f. musso et al. 43 | f l r 1. introduction many studies have explored the contribution to the explanation of academic performance with the use of various different variables and from diverse theoretical perspectives (e. g. 
bekele & mcpherson, 2011; fenollar, roman, & cuestas, 2007; kuncel, hezlett, & ones, 2004; miñano, gilar, & castejón, 2008). many factors have been identified as having important implications not only for the study of learning processes, but also as tools for improving curriculum designs, tutorial systems, and students’ academic results (miñano et al., 2008; musso & cascallar, 2009a; zeegers, 2004).

from this previous body of research, it has become apparent that the accurate prediction of student performance could have many useful applications for positive outcomes of the learning process and lead to advances in learning theory. for example, it could be helpful to identify students at risk of low academic achievement (musso & cascallar, 2009a; ramaswami & bhaskaran, 2010). this prediction could serve as an early warning of future low academic performance and guide interventions that could prove beneficial for such students. similarly, understanding the role of the different intervening variables that influence performance, both overall and within each performance level, would contribute significantly to improving teaching approaches and to a better understanding of learning processes.

many previous studies have focused on the prediction of academic performance (e.g., hailikari, nevgi, & komulainen, 2008; krumm, ziegler, & buehner, 2008; turner, chandler, & heffer, 2009). many of the studies about academic performance have considered grade point average (gpa) as the best summary of student learning, because it strongly predicts not only performance at other levels of education (e.g., kuncel et al., 2004, 2005), but also other life outcomes such as salary (roth & clarke, 1998) and job performance (roth, be vier, switzer, & schippman, 1996). the prediction of academic performance has been carried out with different methodological approaches.
the first and most common approach found in the educational literature uses traditional statistical methods, such as discriminant analysis and multiple linear regression (braten & stromso, 2006; vandamme, meskens & superby, 2007). a second approach can be found in various studies which have used structural equation modelling (sem) to compare theoretical models to data sets and/or to test different models of academic performance (fenollar et al., 2007; miñano et al., 2008; ruban & mccoach, 2005). these traditional approaches, which are widely used to predict gpa and to guide selection, placement, and/or classification in the academic process, have failed to consistently produce predictions or classifications as accurate as those of artificial intelligence computing methods (everson, chance, & lykins, 1994; kyndt, musso, cascallar, & dochy, 2012, submitted; lykins & chance, 1992; maucieri, 2003; weiss & kulikowski, 1991).

therefore, a third approach to the “prediction of academic performance” that we can find in recent literature involves machine learning techniques, such as methods using artificial neural networks (ann). this method has been used and proven useful in several other fields, such as business, engineering, meteorology, and economics. it is considered an important method to classify potential outcomes and is well regarded as an excellent pattern-recognizer (detienne, detienne, & joshi, 2003; neal & wurst, 2001; white & racine, 2001). recent work in the field of computer sciences has started to apply this methodology to large data banks of nation-wide educational outcomes (abu naser, 2012; croy, barnes, & stamper, 2008; fong, si, & biuk-aghai, 2009; kanakana & olanrewaju, 2011; maucieri, 2003; mukta & usha, 2009; pinninghoff junemann, salcedo lagos, & contreras arriagada, 2007; ramaswami & bhaskaran, 2010; zambrano matamala, rojas díaz, carvajal cuello, & acuña leiva, 2011; walczak, 1994).
this methodology has also recently been used with various applications in educational measurement, in conjunction with other theoretical models of different constructs such as self-regulation of learning (cascallar, boekaerts & costigan, 2006; everson et al., 1994; gorr, 1994; hardgrave, wilson, & walstrom, 1994), reading readiness (musso & cascallar, 2009a), and performance in mathematics (musso & cascallar, 2009b; musso, kyndt, cascallar, & dochy, 2012). the application of predictive systems, with the emergence of new methodologies and technologies, has made it possible to assess a wide range of data and student performances in order to evaluate their current and future performance without the need for traditional testing (boekaerts & cascallar, 2006; cascallar et al., 2006). this methodological approach using ann can lead to the possible implementation of continuous assessment in the context of intelligent classrooms (birenbaum et al., 2006). existing databases together with the constant monitoring of student performance could provide a continuous evaluation in real time of the students’ progress.

the interrelationships between many of the variables participating in the complex and multi-faceted problem of academic performance are not clearly understood, and these variables are often related in nonlinear ways. ann have proven to be a very effective approach to situations with these characteristics, classifying and predicting outcomes under those conditions with a high level of accuracy, especially when large data sets are available. this approach also allows the researcher to consider a large number of variables simultaneously and make use of their interrelationships without the usual parametric constraints.
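as a minimal illustration of the nonlinear relations mentioned above, the sketch below hand-sets the weights of a tiny network with one hidden layer so that the outcome is high only when exactly one of two binary predictors is high, a pattern that no single weighted sum of the two predictors can reproduce. the function names, weights and thresholds are illustrative assumptions and are not taken from the models developed in this study.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def tiny_network(x1, x2):
    """Two inputs -> two hidden units -> one output, with hand-set weights.

    The weights are hypothetical and chosen by hand for illustration;
    nothing here is trained or estimated from data.
    """
    h_any = step(x1 + x2 - 0.5)    # fires if at least one predictor is high
    h_both = step(x1 + x2 - 1.5)   # fires only if both predictors are high
    # Output is high when exactly one predictor is high (an interaction effect).
    return step(h_any - h_both - 0.5)


# Evaluate the network on all four combinations of the two binary predictors.
outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, outcome in sorted(outputs.items()):
    print(inputs, "->", outcome)
```

in practice the weights of an ann are learned from data rather than set by hand; the point of the sketch is only that the hidden layer lets the model express an interaction between predictors.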
these advantages would allow researchers in the learning sciences to better understand the complex patterns of interactions between the variables at different levels of academic performance, not just for the prediction of performance but also to understand the participating factors that could be related to these outcomes.

several previous studies using ann have addressed the classification of outcomes into different levels of performance, for different academic purposes: a) diagnostic purposes in order to identify those students most in need of support at the beginning of their primary school, regarding their readiness for learning to read (musso & cascallar, 2009a), and b) identifying students with low expected writing performance at the vocational secondary school level in order to provide support prior to their first year, and thus avoiding possible failure (boekaerts & cascallar, 2011). in these and other possible applications, the early detection of future low performance, and more targeted interventions, would decrease the negative experience of failure, and it would provide an important diagnostic tool for effective interventions. this approach would improve the chances of achieving successful outcomes, particularly for students identified as being “at-risk”. detecting and understanding the variables that best indicate future low performers would be an important tool for managing school resources and planning remediation programs at all levels of an educational system. similarly, knowing the best indicators of future high performers would, first of all, help explain many of the factors leading to these positive outcomes. it would also allow an accurate selection of those students who could be assigned to advanced programs, fellowships and/or be the object of talent searches.
the accurate placement of students in different courses or programs according to how they are expected to perform would prevent possible failure, as well as providing the opportunity to offer challenging tasks for students expected to be among the high performers. in addition, a better understanding of the interrelationships between the variables leading to different levels of performance would allow the fine-tuning of instructional approaches to individual and/or group needs using the information provided by an ann approach.

some authors have shown that traditional statistical methods do not always yield accurate predictions and/or classifications (bansal, kauffman & weitz, 1993; everson, 1995; duliba, 1991). preliminary research using ann for prediction, selection, and classification purposes suggests that this method may improve the validity and accuracy of the classifications, as well as increase the predictive validity of educational outcomes (everson et al., 1994; hardgrave et al., 1994; perkins, gupta, & tammana, 1995; weiss & kulikowski, 1991).

this paper explores this new methodological approach using a large amount of data collected from the students (including both cognitive and non-cognitive measures) in order to design predictive models using artificial neural networks (ann). the ann models in this research study can identify those predictors that could best explain different levels of academic performance in three different performance groups which cover the whole range of performances, as well as making accurate classifications of the expected level of performance for each subject. data about individual differences in basic cognitive variables were collected, since they are strongly related to the student’s achievement (colom, escorial, chin shih, & privado, 2007; grimley & banner, 2008).
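to make the notion of per-group classification precision concrete, the following sketch splits a set of gpa values into low, mid and high thirds and computes, for each group, the share of predicted members that were actually observed in that group. the gpa values, the simulated predictions and the helper names are hypothetical; the exact precision computation used in the study is not specified here, so the standard definition (correct predictions divided by all predictions of that group) is assumed.

```python
from collections import Counter


def tertile_labels(gpas):
    """Label each GPA as 'low', 'mid' or 'high' by rank-based thirds."""
    n = len(gpas)
    # Indices of the students ordered from lowest to highest GPA.
    order = sorted(range(n), key=lambda i: gpas[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n // 3:
            labels[i] = "low"
        elif rank < 2 * n // 3:
            labels[i] = "mid"
        else:
            labels[i] = "high"
    return labels


def precision_by_group(observed, predicted):
    """Per-group precision: correct predictions / all predictions of that group."""
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for o, p in zip(observed, predicted) if o == p)
    return {g: correct_counts[g] / predicted_counts[g] for g in predicted_counts}


# Hypothetical GPAs for nine students (not data from the study).
gpas = [5.1, 8.9, 6.4, 7.2, 4.3, 9.5, 6.8, 7.9, 5.6]
observed = tertile_labels(gpas)

# Hypothetical model output: one mid-level student is wrongly predicted as high.
predicted = list(observed)
predicted[3] = "high"

precisions = precision_by_group(observed, predicted)
print(precisions)
```

with these made-up values, the single misclassified student lowers the precision of the high group only, which is the kind of group-specific figure reported for a three-level model.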
although it has been argued that considering students’ cognitive ability can lead to a relatively strong prediction of academic performance (colom et al., 2007), this prediction could be strengthened by including background and non-cognitive predictors. as chamorro-premuzic & arteche (2008) discuss, combining both cognitive ability and non-cognitive measures can provide a broader understanding of an individual’s likelihood to succeed in academic settings, with models that predict such performance at least one academic year in advance of the actual measure being obtained (grade-point average, gpa). in addition, discriminant analysis (da) was used to analyse the same data in order to compare the predictive classificatory power of both methodologies.

to better understand the rationale for this research, it is useful to review some of the main constructs included as predictors in this study, and to explain the quite novel methodology introduced from the family of predictive systems, that is, the machine learning modelling technique of artificial neural networks (ann).

2. theoretical considerations

2.1 working memory and academic performance

intelligence and the g-factor are the most frequently studied factors in relation to academic achievement and the prediction of performance (miñano et al., 2012). there is a large body of research that shows a strong positive correlation between g and educational success (e.g., kuncel, hezlett, & ones, 2001; linn & hastings, 1984). the g-factor is defined, in part, as an ability to acquire new knowledge (e.g., cattell, 1971; schmidt, 2002; snyderman & rothman, 1987). although the g-factor is not the same construct as working memory (wm), several studies have demonstrated a high correlation between these measures (heitz et al., 2006; unsworth, heitz, schrock, & engle, 2005).
following the early study of daneman and carpenter (1980) on individual differences in working memory capacity (wmc) and reading comprehension, further research has shown the importance of wmc as a domain-general construct (conway, cowan, bunting, therriault, & minkoff, 2002; conway & engle, 1996; engle & kane, 2004; feldman barrett, tugade, & engle, 2004; kane et al., 2004), including the prediction of average scores over several academic areas (colom et al., 2007). similarly, a large body of literature shows the importance of wmc in a wide range of complex cognitive behaviours such as comprehension (e.g., daneman & carpenter, 1980), reasoning (e.g., kyllonen & christal, 1990), problem solving (welsh, satterlee-cartmell, & stine, 1999) and complex learning (kyllonen & stephens, 1990; kyndt, cascallar, & dochy, 2012; st clair-thompson & gathercole, 2006). wmc is an important predictive variable of intellectual ability and academic performance, consistent over time (e.g., engle, 2002; musso & cascallar, 2009a; passolunghi & pazzaglia, 2004; pickering, 2006). working memory is a paradigmatic form of cognitive control that explains how this control occurs, and it involves the active maintenance and executive processing of information available to the cognitive system, combining the ability to both maintain and effectively process information with minimal loss (jarrold & towse, 2006). it is crucial for the processing of information within the cognitive system, has a limited capacity, and differs between individuals (conway et al., 2005). the literature seems to indicate two fundamental approaches to the interpretation of working memory and executive control. traditional perspectives represent working memory and executive control as separate modules (e.g., baddeley, 1986).
the perspective taken in this research coincides with another view that understands working memory and executive control as constituting two sides of the same phenomenon, an emergent property of the neuro-cognitive architecture (anderson, 1983, 1993, 2002, 2007; anderson et al., 2004; hazy, frank, & o'reilly, 2006).
2.2 attention and academic performance
attention as a cognitive construct has been studied from different theoretical and methodological approaches (e.g., posner & rothbart, 1998; redick & engle, 2006; rueda, posner, & rothbart, 2004). it is evident that our cognitive system is constantly receiving a variety of inputs from the environment. all these inputs compete for the limited resources of the cognitive system and require our “attention”. however, because human cognitive capacities are limited in their ability to process information simultaneously (gazzaniga, ivry, & mangun, 2002), it is the shifting of the processing capacity and the selection of stimuli to attend to that constitute the basic aspects of our attentional system (redick & engle, 2006). this shifting and selection of incoming information is the function of the attentional system, which allows us to redirect our attention to the aspects of the environmental information that are relevant for the task or goals at hand. this study adopts the framework of posner and petersen (1990), who described three different and semi-independent attentional networks: orientation, alertness and executive attention. the orienting network allows the selection of information from sensory input, the alerting network refers to a system that achieves and maintains an alert state, and executive attention or executive control is responsible for resolving conflict among responses (fan, mccandliss, sommer, raz, & posner, 2002). the efficiency of these three attentional networks can be quantified by reaction time measures (fan et al., 2002). redick and engle (2006) and unsworth et al.
(2005) have found that individual differences in working memory capacity are related to those in attentional control, thus establishing that the executive control mechanism is closely related to working memory capacity. several studies have shown the importance of attention as a predictor of general academic performance (gsanger, homack, siekierski, & riccio, 2002; kyndt et al., 2012, submitted; riccio, lee, romine, cash, & davis, 2002), reading (landerl, 2010; lovett, 1979), mathematical performance (fernandez-castillo & gutiérrez-rojas, 2009; fletcher, 2005; musso et al., 2012), and written expression (reid, 2006). research on learning disorders has found that attentional problems are negatively associated with academic achievement (jimmerson, dubrow, adam, gunnar, & bozoky, 2006).
2.3 learning strategies and academic performance
the estimated contribution of basic cognitive processes to academic achievement has shown considerable variation, ranging from a moderate to a medium-high effect (castejón & navas, 1992; navas, sampascual, & santed, 2003). consequently, studies focusing on the prediction of academic performance have increasingly included the so-called non-cognitive variables such as motivation, attributions, self-concept, effort, goal orientation, etc. (e.g., fenollar et al., 2007; pintrich, 2000). learning strategies (ls) have been defined as students' actual behaviours, in a specific context, to engage in a task (biggs, 1987). other researchers describe ls as any thoughts or behaviours that help students to acquire new information and integrate this new information with their existing knowledge (weinstein & mayer, 1986; weinstein, palmer, & schulte, 1987; weinstein, schulte, & cascallar, 1982). ls also help students retrieve stored information. examples of ls include summarizing, paraphrasing, imaging, creating analogies, note-taking, and outlining (weinstein et al., 1987).
previous research has provided support for the mediating role of learning strategies (dupeyrat & marine, 2005; fenollar et al., 2007; simons, dewitte, & lens, 2004). fenollar et al. (2007) compared a theoretical model, where achievement goals and self-efficacy were hypothesised to have direct effects on academic performance, to a mediating model where such effects were mediated through study strategies. results from the study showed that achievement goals and self-efficacy have no direct effects on performance, and they suggest that the mediating model provides a better fit to the data (fenollar et al., 2007).
2.4 artificial neural networks and performance
conceptually, a neural network is a computational structure consisting of several highly interconnected computational elements, known as neurons, perceptrons, or nodes. each “neuron” or unit carries out a very simple operation on its inputs and transfers the output to a subsequent node or nodes in the network topology (specht, 1991). neural networks exhibit polymorphism in structure and parallelism in computation (mavrovouniotis & chang, 1992), and can be represented as a highly interconnected structure of processing elements with parallel computation capabilities (grossberg, 1980, 1982; rumelhart, hinton, & williams, 1986; rumelhart, mcclelland, & the pdp research group, 1986). in general, an ann consists of an input layer (which can be considered the independent variables), one or more hidden layers, and an output layer that is comparable to a categorical dependent variable (cascallar et al., 2006; garson, 1998). all anns process data through multiple processing entities which learn and adapt according to patterns of inputs presented to them, by constructing a unique mathematical relationship for a given pattern of input data sets on the basis of the match of the explanatory variables to the outcomes for each case (marshall & english, 2000).
thus, neural networks construct a mathematical relationship by “learning” the patterns of all inputs from each of the individual cases used in training the network, while more traditional approaches assume a particular form of relationship between explanatory and outcome variables and then use a variety of fitting procedures to adjust the values of the parameters in the model. during the training phase, anns generate a predicted outcome for each case, and when this prediction is incorrect the network makes adjustments to the weights of the mathematical relationships among the predictors and with the expected outcome, weights that are represented in the hidden layers of the network. the predicted output is a continuous variable with a specific value for each case (or subject) which includes information on the probability of belonging to each of the categorical classifications requested by the developer of the ann. according to this architecture, the ann finally recognizes patterns and classifies the cases presented into the requested outcome categories, depending on the target question, and given the individual probability values for each case. this information is generated by the network through many iterations, gradually changing and adjusting the weights for all the interrelationships between the units after each incorrect prediction. during this training process, the network becomes increasingly accurate in replicating the known outcomes from the test cases. the neural network continues to improve its predictions until one or more of the pre-determined stopping criteria have been met. these stopping criteria can be, for example, a minimum level of accuracy, learning rate, persistency, number of iterations, amount of time, etc. 
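the training cycle described above can be illustrated with a minimal sketch. it is not the software or the configuration used in this study: the function name `train_network`, the single hidden layer of three units, the logistic output unit, and the default parameter values are all assumptions made for illustration. the sketch also adjusts the weights by gradient descent after every case rather than only after incorrect predictions, which is a common simplification. two of the stopping criteria mentioned above (a minimum level of accuracy and a maximum number of iterations) are implemented.

```python
import math
import random


def train_network(cases, targets, n_hidden=3, learning_rate=0.5,
                  min_accuracy=0.95, max_iterations=2000, seed=0):
    """Train a tiny one-hidden-layer network with two stopping criteria.

    Returns (iterations_used, final_accuracy, stop_reason).
    All names and defaults are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    n_inputs = len(cases[0])
    # One weight row per hidden unit; the last entry of each row is the bias.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        # zip() stops at the shorter sequence, so the bias is added separately.
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in w_hidden]
        z = sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1]
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def accuracy():
        hits = sum((forward(x)[1] >= 0.5) == (t == 1)
                   for x, t in zip(cases, targets))
        return hits / len(cases)

    acc = 0.0
    for iteration in range(1, max_iterations + 1):
        for x, t in zip(cases, targets):
            hidden, p = forward(x)
            error = p - t  # gradient of the cross-entropy loss at the output
            for j, h in enumerate(hidden):
                # Backpropagate through tanh before w_out[j] is changed.
                delta = error * w_out[j] * (1.0 - h * h)
                w_out[j] -= learning_rate * error * h
                for i, v in enumerate(x):
                    w_hidden[j][i] -= learning_rate * delta * v
                w_hidden[j][-1] -= learning_rate * delta
            w_out[-1] -= learning_rate * error
        acc = accuracy()
        if acc >= min_accuracy:  # stopping criterion: minimum accuracy
            return iteration, acc, "min_accuracy"
    return max_iterations, acc, "max_iterations"  # criterion: iterations
```

calling `train_network` with a list of predictor vectors and a list of 0/1 targets returns the number of passes used, the accuracy reached on the training cases, and which criterion ended training.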
once trained, the network is tested with the remaining cases in the dataset, which is considered a form of validation of the network (testing phase), by observing how the weights in the model, now fixed to those obtained in the training phase, predict classes of outcomes in a new set of data of which outcomes are known to the experimenter but not to the ann system. afterwards it can also be applied to predict future cases where the outcome is still unknown (cascallar et al., 2006). in addition, with complementary techniques in predictive stream analysis, the neural network approach allows us to determine the predictive power of each of the variables involved in the study, providing information about the importance of each input variable (cascallar et al., 2006; garson, 1998). predictive stream analyses (cascallar & musso, 2008), based in this case on neural network (ann) models, have several strengths: (a) because these are machine learning algorithms, the assumptions required for traditional statistical predictive models (e.g., ordinary least squares regression) are not necessary. as such, this technique is able to model nonlinear and complex relationships among variables. anns aim to maximize classification accuracy and work through the data in an iterative process until maximum accuracy is achieved, automatically modelling all interactions among variables; (b) anns are robust, general function estimators. they usually perform prediction tasks at least as well as other techniques and most often perform significantly better (marquez, hill, worthley, & remus, 1991); (c) anns can handle data of all levels of measurement, continuous or categorical, as inputs and outputs. because of the speed of microprocessors in even basic computers, anns are more accessible today than when they were originally developed.
current research has shown that neural network analysis substantially improves the validity of the classifications and increases the accuracy and predictive validity of the models, in education and other fields (kyndt et al., 2012, submitted; musso & cascallar, 2009b; perkins et al., 1995). the ann learns by examining individual training cases (subjects/students), then generating a prediction for each student, and making adjustments to the weights whenever it makes an incorrect prediction. information is passed back through the network in iterations, gradually changing the weights. as training progresses, the network becomes increasingly accurate in replicating the known outcomes. this process is repeated many times, and the network continues to improve its predictions until one or more of the stopping criteria have been met. a minimum level of accuracy can be set as the stopping criterion, although additional stopping criteria may be used as well (e.g., number of iterations, amount of processing time). once trained, the network can be applied, with its structure and parameters, to future cases (validation or holdout sample) for further validation studies and programme implementation (lippman, 1987). as long as the basic assumptions about the population of persons or events that the ann used for training remain constant, or vary only slightly and/or gradually, the network can adapt and improve its pattern recognition algorithms the more data it is exposed to in the implementations. the class of ann models used in this research can be compared with the more traditional discriminant analysis approach. both of these methods derive classification rules from samples of classified objects based on known predictors. this general approach is called 'supervised learning' since the outcomes are known and relationships are modelled or 'supervised' according to these outcomes (kohavi & provost, 1998).
however, there are significant differences in the algorithms and procedures for both analyses, such as the fact that while discriminant analysis assumes linear relationships, neural network analysis does not. in terms of comparisons with another common statistical method used in educational research, linear regression, it is important to note that although neural networks can address some of the same research issues as regression, they are inherently a different mathematical approach (detienne et al., 2003). there is another family of predictive systems which are “unsupervised” (e.g., kohonen networks), in which the patterns presented to the network are not associated with specific outcomes; it is the neural network itself that derives the commonalities between the predictors, grouping cases into classes on the basis of these similarities. thus, these analyses can be used to explore the data from a different perspective and learn the grouping of cases based on these predictor commonalities instead of being focused on predictions or individual outcomes (cascallar et al., 2006; kyndt et al., 2012, submitted). neural networks excel in the classification and prediction of outcomes, especially when large data sets are available that are related in nonlinear ways, and where the intercorrelation between variables is not clearly understood. these properties make anns particularly suitable for social science data, where they can simultaneously consider all variables in a study (garson, 1998). moreover, the assumptions of normality, linearity and completeness that are made by methods such as multiple linear regression (kent, 2009), and that are often very difficult to establish for social science data, are not made in neural network analysis. neural networks can work with noisy, incomplete, overlapping, highly nonlinear and noncontinuous data because the processing is spread over a large number of processing entities (garson, 1998; kent, 2009).
in this regard it can be said that neural networks are robust and have wide non-parametric application. there is also evidence that neural models are robust in the statistical sense, and also robust when faced with a small number of data points (garson, 1998). very few studies within the educational literature have used neural network analysis or any other type of predictive system (e.g., cascallar et al., 2006; cascallar & musso, 2008; musso & cascallar, 2009a; pinninghoff junemann et al., 2007; wilson & hardgrave, 1995).
2.5 ann processing and measures to evaluate the neural network system performance
in order to evaluate the performance of the neural network system, a number of measures are used which provide a means of determining the quality of the solutions offered by the various network models tried. the traditional measures include the determination of actual numbers and rates for true positive (tp), true negative (tn), false positive (fp), and false negative (fn) outcomes, as products of the ann analysis. in addition, certain summative evaluative measures have been developed in this field of work to assess the overall quality of the predictive system. these overall measures are: recall, which represents the proportion of correctly identified targets out of all targets presented in the set, and is expressed as recall = tp/(tp + fn); and precision, which represents the proportion of correctly identified targets out of all targets identified by the system, and is expressed as precision = tp/(tp + fp). two other measures, derived from signal-detection theory (roc analysis), have also been used to report the characteristics of the detection sensitivity of the system. one of them is sensitivity (equivalent to recall: the proportion of correctly identified targets out of all targets presented in the set), which is expressed as sensitivity = tp/(tp + fn).
the other is specificity, defined as the proportion of correctly rejected cases out of all the cases that should have been rejected by the system, and which is expressed as specificity = tn/(tn + fp). all the traditional measures are typically represented in what is called a “confusion matrix” representing all four outcomes. in addition, the evaluation of ann performance is also carried out with another summative measure, which is used to account for the somewhat complementary relationship between precision and recall. this measure, f1, is defined as f1 = (2 * precision * recall)/(precision + recall). such a definitional expression of f1 assumes equal weights for precision and recall. this assumption can be modified to favour either precision or recall, according to the utility and cost/benefit ratio of outcomes favouring either precision or recall for any given predictive circumstance.
2.6 objectives and research questions
the objective of this study is to identify patterns of variables that will allow a correct predictive classification of three levels of general academic performance (gap), namely low, middle and high gap, measured by the grade-point average (gpa). this was achieved by taking into consideration basic cognitive processes (working memory capacity; alerting, orienting and executive attention), learning strategies, and family-social background factors. the idea behind this paper is to explore new approaches to obtain predictive classifications of learning outcomes, without the use of one specific test, using a large number of variables (cognitive and non-cognitive) that could better capture the true complex composite of influences participating in the actual observed outcomes from individual students.
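as a minimal sketch, the evaluation measures defined in section 2.5 can be computed directly from the four cells of a confusion matrix. the function name, the `beta` parameter and the example counts are illustrative assumptions and do not come from the study. accuracy is taken here as the proportion of all cases classified correctly, and the f-beta form is one standard way of weighting precision and recall unequally; the text does not say which weighting scheme the authors had in mind. with `beta = 1` the expression reduces to the f1 formula given above.

```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Summary measures from confusion-matrix counts (illustrative sketch)."""
    recall = tp / (tp + fn)            # identical to sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # beta > 1 favours recall, beta < 1 favours precision (assumed scheme).
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return {
        "recall": recall,
        "sensitivity": recall,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }


# Hypothetical counts, not results from the study.
example = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

for the hypothetical counts above, recall is 40/50 = .80, specificity is 45/50 = .90, and f1 equals 2tp/(2tp + fp + fn) = 80/95.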
a further objective of the research is to explore the differences in the patterns predicting each level of performance (low, middle and high performance), in order to inform future research into the causal factors generating and participating in those sets of identified variables, which could explain different levels of performance using artificial neural networks. of course, previous academic performance could have been taken into account to facilitate the predictive classification, but this was purposely avoided for two reasons: as a proof-of-concept that other variables are sufficient to predict academic performance, and to highlight more clearly the weight that each of these other variables has in the determination of a student's academic performance. in order to explore the differences in the patterns predicting each level of performance, three artificial neural network (ann) models were developed. two of them were developed to predict the students who would be in each of the extreme performance levels (low 33% and high 33% of gpa) in order to analyse the differences between the patterns of variables having the most predictive weight for each group, thus providing information on the potentially different processes involved in those low and high performance outcomes. a third ann was developed, capable of accurately producing a predictive classification for the three levels of performance simultaneously (low 33%, middle 33%, and high 33%). this final ann model was capable of finding the common patterns that could simultaneously predict all performance groups. the relative importance of the predictors for each network was also analysed. the predictive capability of each ann was systematically improved by modifying the parameters that determine the rate of learning, the persistence, momentum, and stopping criteria, and the type of functions used for weight adjustments. precision, sensitivity, specificity and accuracy of the three networks were obtained.
in addition, the correlation between the individual prediction for each student and the actual observed gpa was established, and proved to be very high. the main research questions of this study are: (1) how accurately can different levels of academic performance in higher education be predicted by working memory capacity, attentional networks, learning strategies and background variables when used as inputs in a neural network model? (2) what is the relative importance of the predictor variables and the observed differences for each performance level category?
3. method
3.1 participants
the total sample included 864 university students, of both genders (male 45.4%; female 54.6%), ages between 18 and 25 (mean age = 20.38, sd = 3.78), recently enrolled in the first year in several different disciplines (psychology, engineering, medicine, law, social communication, business and marketing), in three private universities in argentina, during the 2009-2011 academic years. in all, 67.8% of the sample was 17 to 20 years old, 24.7% was 21-25 years old, and 7.5% was older than 25 years. the students in the sample came from private religious secondary schools (48.5%), private non-religious schools (19%), private bilingual schools (15.4%), public secondary schools (15%), and international community schools (2.1%). all student data (predictors) were collected at the beginning of the corresponding academic year, and the dependent variable (gpa) was collected at the end of the same academic year. an 80% math accuracy criterion was imposed for all participants in the automated operation span (unsworth et al., 2005). therefore, they were encouraged to keep their math accuracy at or above 80% at all times (to ensure that the interfering task was actually being performed). as a consequence of this criterion, 78 participants were excluded from the analyses. the final sample consisted of 786 students.
3.2 instruments
3.2.1 attention network test (ant) (fan et al., 2002)
this computerized task provides a measure for each of the three anatomically defined attentional networks: alerting, orienting, and executive. the ant is a combination of the cued reaction time task (posner, 1980) and the flanker test (eriksen & eriksen, 1974). the participant saw an arrow on the screen that, on some trials, was flanked by two arrows to the left and two arrows to the right. participants were asked to indicate whether the central arrow pointed left or right, using two mouse buttons (left/right). they were instructed to focus on a centrally located fixation cross throughout the task, and to respond as quickly and accurately as possible. during the practice trials, but not during the experimental trials, subjects received feedback from the computer on their speed and accuracy. the practice trials took approximately 2 minutes and each of the three experimental blocks took approximately 5 minutes. the whole experiment took about twenty minutes. the measure for (general) attention is the average response time regardless of the cues or flankers. to analyse the effect of the three attentional networks, a set of cognitive subtractions described by fan et al. (2002) was used. the efficiency of the three attentional networks is assessed by measuring how response times are influenced by alerting cues, spatial cues, and flankers (fan et al., 2002). the alerting effect was calculated by subtracting the mean response time of the double-cue conditions from the mean response time of the no-cue conditions. for the orienting effect, the mean response time of the spatial cue conditions (up and down) was subtracted from the mean response time of the center cue condition.
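the cognitive subtractions of fan et al. (2002) can be sketched as follows, including the conflict effect that is defined in the next sentence of the text. the trial layout (a list of dictionaries with `cue`, `flanker` and `rt` keys), the condition labels, and the example response times are assumptions made for illustration; they are not the data format of the ant software.

```python
from statistics import mean


def ant_effects(trials):
    """Compute ANT scores from trials given as dicts with the
    hypothetical keys 'cue', 'flanker' and 'rt' (milliseconds)."""
    def mean_rt(key, value):
        return mean(t["rt"] for t in trials if t[key] == value)

    return {
        # no-cue mean minus double-cue mean
        "alerting": mean_rt("cue", "none") - mean_rt("cue", "double"),
        # center-cue mean minus spatial-cue mean
        "orienting": mean_rt("cue", "center") - mean_rt("cue", "spatial"),
        # incongruent mean minus congruent mean, across all cue types
        "conflict": (mean_rt("flanker", "incongruent")
                     - mean_rt("flanker", "congruent")),
        # general attention: mean response time over all trials
        "general": mean(t["rt"] for t in trials),
    }


# Hypothetical response times, one trial per cue x flanker combination.
example_trials = [
    {"cue": "none", "flanker": "congruent", "rt": 600.0},
    {"cue": "none", "flanker": "incongruent", "rt": 700.0},
    {"cue": "double", "flanker": "congruent", "rt": 550.0},
    {"cue": "double", "flanker": "incongruent", "rt": 650.0},
    {"cue": "center", "flanker": "congruent", "rt": 560.0},
    {"cue": "center", "flanker": "incongruent", "rt": 680.0},
    {"cue": "spatial", "flanker": "congruent", "rt": 500.0},
    {"cue": "spatial", "flanker": "incongruent", "rt": 620.0},
]
example_effects = ant_effects(example_trials)
```

larger alerting and orienting scores indicate a greater benefit from the corresponding cue, whereas a larger conflict score indicates a greater cost of incongruent flankers.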
finally, the effect of the executive control (conflict effect) was calculated by subtracting the mean response time of all congruent flanking conditions, summed across cue types, from the mean response time of incongruent flanking conditions (fan et al., 2002). the test-retest reliability of the general response times (in this study used as a measurement of general attention), calculated by fan et al. (2002), equaled .87. the test-retest reliability of the subtractions is lower. the executive control is the most reliable (r = .77), followed by the orienting network (r = .61). the alerting network proved to be the least reliable (r = .52) (fan et al., 2002).
3.2.2 automated operation span (unsworth et al., 2005)
this is a computer-administered version of the ospan instrument (unsworth et al., 2005) that measures working memory capacity. the responses were collected via click of a mouse button. participants first receive practice and then perform the actual experiment. the practice sessions are further broken down into three sections. the first practice is a simple letter span task. participants see letters appear on the screen one at a time. in all experimental conditions, letters remain on-screen for 800 milliseconds (ms). then, participants must recall these letters in the same order they saw them from a 4 x 3 matrix of letters (f, h, j, k, l, n, p, q, r, s, t, and y) presented to them. recall consists of clicking the box next to the appropriate letters; the recall phase is untimed. after each recall, the computer provides feedback about the number of letters correctly recalled. next, participants practice the math portion of the experiment. participants first see a math operation (e.g., (1*2) + 1 = ?). once the participant knows the answer they click the mouse to advance to the next screen. participants then see a number (e.g.
“3”) and are required to indicate whether the number is the correct solution by clicking on “true” or “false.” after each operation participants are given feedback. the math practice serves to familiarize participants with the math portion of the experiment, as well as to calculate how long it takes a given person to solve the math problems, establishing an individual baseline. thus, it attempts to account for individual differences in the time it takes to solve math problems. this baseline is then used as an individualized time limit for the math portion of the experimental session. the final practice session has participants perform both the letter recall and math portions together, just as they will do in the experimental block. the participants are first presented with a math operation, and after they click the mouse button indicating that they have solved it, they see the letter to be recalled. if the participants take more time to solve the math operations than their average time plus 2.5 sd, the program automatically moves on and counts that trial as an error. this serves to prevent participants from rehearsing the letters when they should be solving the operations. participants complete three practice trials, each of set size 2. after the participant completes all of the practice sessions, the program moves them on to the real trials. the real trials consist of 3 sets of each set-size, with the set-sizes ranging from 3 to 7 letters. this makes for a total of 75 letters and 75 math problems. subjects are instructed to keep their math accuracy at or above 85% at all times. during recall, a percentage in red is presented in the upper right-hand corner. subjects are instructed to keep a careful watch on the percentage in order to keep it above 85%. this study reports the absolute ospan score (the sum of all perfectly recalled sets) that is interpreted as the measure of overall working memory capacity, and one reaction time score (operations).
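the scoring rules just described can be sketched in a few lines. the function names and the data layout are hypothetical, and the use of the sample standard deviation for the time limit is an assumption, since the text does not specify which estimator the program uses.

```python
from statistics import mean, stdev

SET_SIZES = range(3, 8)   # set sizes 3 to 7 letters
SETS_PER_SIZE = 3


def total_letters():
    """Total letters (and math problems) across the real trials."""
    return SETS_PER_SIZE * sum(SET_SIZES)


def math_time_limit(practice_times, n_sd=2.5):
    """Individual time limit: mean practice time plus 2.5 sd.
    Sample sd is assumed here."""
    return mean(practice_times) + n_sd * stdev(practice_times)


def absolute_ospan(sets):
    """Sum of set sizes for sets recalled perfectly and in order.

    `sets` is a list of (presented_letters, recalled_letters) pairs.
    """
    return sum(len(presented) for presented, recalled in sets
               if list(presented) == list(recalled))


# Hypothetical responses: the second set has two letters transposed,
# so it contributes nothing to the absolute score.
example_sets = [("FHJ", "FHJ"), ("KLNP", "KLPN"), ("QRSTY", "QRSTY")]
example_score = absolute_ospan(example_sets)
```

with three sets at each of the five set sizes, `total_letters()` reproduces the 75 letters reported above (3 x (3 + 4 + 5 + 6 + 7)).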
the task takes approximately 20–25 minutes to complete (unsworth et al., 2005). this measure of working memory capacity has a high correlation with other measures of working memory and general intelligence, such as ospan and raven progressive matrices. in addition, aospan has good test-retest reliability (r = .83) and adequate internal consistency (α = .78) (unsworth et al., 2005).
3.2.3 learning strategies questionnaire (lassi; weinstein et al., 1987; weinstein & palmer, 2002; weinstein et al., 1982)
the original version is a 77-item questionnaire with 10 scales that assesses the students' awareness about, and use of, learning and study strategies related to skill, will, and self-regulation components of strategic learning. these scales and their corresponding internal consistency coefficients, reported in the user's manual (weinstein & palmer, 2002), are as follows: attitude scale (α = .77), motivation scale (α = .84), time management scale (α = .85), anxiety scale (α = .87), concentration scale (α = .86), information processing scale (α = .84), selecting main ideas scale (α = .89), study aids scale (α = .73), self-testing scale (α = .84), and test strategies scale (α = .80). the present study used a spanish version (strucchi, 1991), which was slightly modified in some semantic and grammatical aspects for the local sample. the exploratory factor analysis determined a matrix with five factors that explained 37.52% of the variance. factor 1 related to “cognitive resources/cognitive processing” (α = .871; 13 items; r² = 18.03%); factor 2 related to “time management” (α = .807; 10 items; r² = 8.404%); factor 3 dealt with “processing of information and generalization” (α = .783; 8 items; r² = 4.567%); factor 4 related to “anxiety management” (α = .60; 5 items; r² = 3.431%); and factor 5 involved the construct of “study techniques and use of help” (α = .728; 7 items; r² = 2.685%).
students gave responses on a likert-type scale, from 1 (never) to 5 (always).
3.2.4 background information
the basic background information of each student used in the analyses was: gender, highest level of education of mother and father (not completed primary school, primary school, secondary school, graduated university, post-graduate), occupation of parents, and secondary school from which the student graduated (public, private religious school, private non-religious school, bilingual school, foreign community).
3.2.5 academic performance
academic performance was measured by the grade point average (gpa) of all courses (different subjects depending on the discipline) at the end of each of the academic years. all course grades which are used by the universities to calculate the overall gpa are obtained using university-wide criteria for the interpretation and assignment of final scores in each course, from which the gpa was calculated. the gpa information was collected from official records at the end of the first academic year for each student, at each of the participating universities, and all values are on a scale from 0 to 10 (with 10 indicating best performance).
3.3 analyses procedure
the ann model used was a backpropagation multilayer perceptron neural network, that is, a multilayer network composed of nonlinear units, each of which computes its activation level by summing all the weighted activations it receives and then transforms its activation into a response via a nonlinear transfer function, which establishes a relationship between the inputs and the weights they are assigned. during the training phase, these systems evaluate the effect of the weight patterns on the precision of their classification of outputs, and then, through backpropagation, they adjust those weights in a recursive fashion until they maximize the precision of the resulting classifications.
ann parameters and variable groupings, as well as all other network architecture parameters, were adjusted to maximize predictive precision and total accuracy. confusion matrices were computed for each ann, as well as roc analyses for the evaluation of sensitivity and specificity. parameters such as learning rate (the rate at which the ann “learns”, which controls the size of weight and bias changes during learning), momentum (which adds a fraction of the previous weight update to the current one and is used to keep the system from getting stuck in a local minimum), number of hidden layers, stopping rules (when the network should stop “learning” to avoid over-fitting the current sample), activation functions (which define the output of a node given an input or set of inputs to that node or unit), and number of nodes were specified and varied in the model construction phase in order to maximize the overall performance of the network model.

3.4 architecture of the neural networks

according to the objectives of this research, three different artificial neural networks (anns) were developed as predictive systems for the gpa of the students in this study. ann1 was developed to maximize the predictive classification of the lowest 33% of students, that is, those expected to obtain the lowest gpa at the end of the academic year. ann2 was developed to maximize the predictive classification of the highest 33% of students, those expected to obtain the highest gpa. ann3 was developed to predict the classification of students into the three levels of expected gpa at the same time. the data set was partitioned into a training set and a testing set for each ann, and for each network the training and testing samples were chosen at random by the software from the available set of cases. one suggested criterion is that the number of training inputs (cases) should be at least 10 times the number of input and middle layer neurons in the network (garson, 1998).
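as a worked check of this criterion, the sketch below applies it to the unit counts and training sample sizes reported for the three networks elsewhere in this paper (ann1: 18 input and 15 hidden units, n = 632; ann2: 18 and 9, n = 614; ann3: 19 and 20, n = 710). the function names are hypothetical, and reading "input and middle layer neurons" as the sum of input and hidden units is an assumption.

```python
def min_training_cases(n_input_units, n_hidden_units, factor=10):
    """Smallest training sample allowed by the rule of thumb: factor * (input + hidden units)."""
    return factor * (n_input_units + n_hidden_units)


def satisfies_rule(n_training_cases, n_input_units, n_hidden_units, factor=10):
    """True when the training sample is at least as large as the rule requires."""
    return n_training_cases >= min_training_cases(n_input_units, n_hidden_units, factor)


# (input units, hidden units, training cases) as reported for each network in this paper.
networks = {
    "ann1": (18, 15, 632),
    "ann2": (18, 9, 614),
    "ann3": (19, 20, 710),
}

for name, (n_in, n_hidden, n_train) in networks.items():
    required = min_training_cases(n_in, n_hidden)
    print(name, "requires", required, "cases; has", n_train,
          "->", satisfies_rule(n_train, n_in, n_hidden))
```

under this reading, all three training samples exceed the suggested minimum (330, 270 and 390 cases, respectively).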
similarly, it is suggested that about 2/3 (or 3/4) of the cases in the available data set be used for the training phase in order to include a set of cases representing most of the patterns expected to be present in the data (patterns represented by the vector for each case). the remaining 1/3 or 1/4 of the data is used for the testing phase of the network. the specific architecture of each of the three neural networks developed is as follows:

ann1 (maximizing the prediction for the low 33% performance group): all cognitive variables, learning strategies, and background variables were introduced in the analysis. they were used for the development of the vector-matrix containing all predictor variables for each student. the resulting network contained all the input predictors, with a total of 18 input units (reaction time operation, reaction time math, reaction time problem, orienting attention, alerting attention, executive control, absolute aospan, processing of information/generalization, study techniques and use of help, anxiety management, time management, cognitive resources/cognitive processing, gender, mother's occupation, father's occupation, secondary school from which the student graduated, highest level of education completed by father, and highest level of education completed by mother). the model built contained one hidden layer, with 15 units. the output layer contained a dependent variable with two units (categories corresponding to “belongs to lowest 33%” or “belongs to highest 67%”). in terms of the architecture of the network, a standardized method for the rescaling of the scale dependent variables was used. the hidden layer had a hyperbolic tangent activation function, which is a commonly used activation function for neural networks because of its numeric range (from -1 to 1) and the shape of its graph.
the output layer utilized a softmax activation function, which is used predominantly in the output layer of a classification system to convert raw values into posterior probabilities. the output layer used the cross-entropy error function, in which the error signal associated with the output layer is directly proportional to the difference between the desired and actual output values. this function accelerates the backpropagation algorithm and provides good overall network performance with relatively short stagnation periods (nasr, badr, & joun, 2002). the training was carried out with the 'online' methodology (one case per cycle), with an initial learning rate of 0.4 and momentum equal to 0.9. the optimization algorithm was gradient descent (which takes steps proportional to the negative of the approximate gradient of the function at the current point), and the minimum relative change in training error was 0.0001.

ann2 (maximizing the prediction for the high 33% performance group): all cognitive, learning strategies, and background variables were introduced in the analysis. they were used for the development of the vector-matrix containing all predictor variables for each student. the resulting network contained all the input predictors, with a total of 18 units (reaction time operation, reaction time math, reaction time problem, orienting attention, alerting attention, executive control, absolute aospan, processing of information/generalization, study techniques and use of help, anxiety management, time management, cognitive resources/cognitive processing, gender, mother's occupation, father's occupation, secondary school from which the student graduated, highest level of education completed by father, and highest level of education completed by mother). the model built contained one hidden layer, with nine units, and an output layer with two units (categories corresponding to “belongs to highest 33%” or “belongs to lowest 67%”).
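the output-layer computations described for ann1 and ann2 (softmax, cross-entropy, and gradient descent with momentum in 'online' mode) can be sketched as follows. this is a textbook formulation given under stated assumptions, not the internal spss algorithm: the function names and example values are hypothetical, and only the default learning rate (0.4) and momentum (0.9) are taken from the ann1 description.

```python
import math


def softmax(raw_outputs):
    """Convert raw output-layer values into probabilities that sum to 1."""
    shift = max(raw_outputs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, target):
    """Cross-entropy error for one case; target is a one-hot list."""
    return -sum(t * math.log(p) for t, p in zip(target, probabilities) if t > 0)


def output_error_signal(probabilities, target):
    """With softmax and cross-entropy, the output error signal is (actual - desired)."""
    return [p - t for p, t in zip(probabilities, target)]


def momentum_step(weight, gradient, previous_update, learning_rate=0.4, momentum=0.9):
    """One gradient-descent update that adds a fraction of the previous update."""
    update = -learning_rate * gradient + momentum * previous_update
    return weight + update, update


# Hypothetical single case: two raw output values, true class is the first category.
probs = softmax([1.2, -0.3])
error = cross_entropy(probs, [1, 0])
signal = output_error_signal(probs, [1, 0])
new_weight, last_update = momentum_step(weight=1.0, gradient=0.5, previous_update=0.1)
```

in online training, `momentum_step` would be applied to every weight after each case rather than once per pass through the training set.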
in terms of the architecture of the network, a standardized method for the rescaling of scale dependent variables was used. the hidden layer had a hyperbolic tangent activation function. the output layer utilized a softmax activation function. cross-entropy was chosen as the error function. the dataset was partitioned into a training set and a testing set. the training was carried out with the 'online' methodology, with an initial learning rate of 0.5 and momentum equal to 0.7. the optimization algorithm was gradient descent, and the minimum relative change in training error was 0.0001.

ann3 (maximizing the simultaneous prediction for all the performance groups: low 33%, middle 33%, high 33%): all cognitive, learning strategies and background variables were introduced in the analysis. they were used for the development of the vector-matrix containing all predictor variables for each student. the resulting network contained all the input predictors, with a total of 19 input units (reaction time operation, reaction time math, reaction time problem, orienting attention, alerting attention, executive control, absolute aospan, processing of information/generalization, study techniques and use of help, anxiety management, time management, cognitive resources/cognitive processing, gender, mother's occupation, father's occupation, secondary school, highest level of education completed by father, highest level of education completed by mother, and ln of attention total rt). the model built contained one hidden layer, with 20 units, and one output layer with three units (categories corresponding to “belongs to low 33%”, “belongs to middle 33%” or “belongs to high 33%” of the performance groups). in terms of the architecture of the network, a standardized method for the rescaling of scale dependent variables was used. the hidden layer and the output layer both had a hyperbolic tangent activation function.
a standardized method for the rescaling of covariates was used. sum of squares was chosen as the error function. the dataset was partitioned into a training set and a testing set. the training was carried out with the 'online' methodology, with an initial learning rate of 0.4 and momentum equal to 0.8. the optimization algorithm was gradient descent, and the minimum relative change in training error was 0.0001.

the software used for the development and analysis of all predictive models in this study was spss v.19 (neural network module). two development phases of the predictive system were carried out: training of the network and testing of the network developed. during the training phase several models were attempted, and several modifications of the neural network parameters were explored, such as learning persistence, learning rate, momentum, and other criteria. these tests continued until the desired levels of classification were achieved, maximizing the benefits of the model chosen. in these analyses both precision and recall, as outcome measures of the network, were given equal weight. there was no need to trim the number of predictor inputs in the three models. the validation procedure used was the leave-one-out methodology.

3.5 discriminant analyses

discriminant analyses (da) were carried out using the same data and the same categories of gpa used in the neural network analyses. da1 was performed to discriminate between the students belonging to the lowest 33% of gpa and those not in that category. da2 was focused on identifying students in the highest 33% of academic performance versus those not in that group, and da3 was calculated to discriminate the students belonging to each one of the three levels of gpa performance. in order to give every variable the opportunity to contribute significantly to the prediction, a stepwise discriminant analysis was calculated for each category including all independent variables.
in addition, we calculated three discriminant analyses, one for each category, including the independent variables of the maximised neural networks of each category.

4. results

4.1 descriptive data

the final sample included 786 university students from several disciplines (psychology, engineering, medicine, law, social communication, business and marketing), in three private universities, during the 2009-2011 academic years. descriptive statistics of the cognitive variables and learning strategies are presented in table 1 (cognitive variables) and table 2 (learning strategies).

table 1
descriptive statistics for attentional networks, general reaction time, working memory capacity (absolute aospan) and reaction time operation

variable                                            n     mean     sd      skewness   kurtosis   minimum   maximum
alerting attention                                  786   34.40    22.14   .25        1.96       -78.00    123.83
orienting attention                                 786   44.01    22.90   .24        5.01       -77.67    213.83
executive control                                   786   102.54   41.68   3.31       26.14      19.00     558.00
ln of attention total rt                            786   6.20     .11     .67        .98        5.92      6.74
absolute aospan (sum of perfectly recalled sets)    786   27.88    14.83   .25        -.510      0         68
ln rt operation                                     786   7.01     .20     .46        .45        6.50      7.75

note: ln of attention total rt: logarithm of attention total reaction time (measure of attention network test); ln rt operation: logarithm of reaction time operation (measure of aospan).

table 2
descriptive statistics for each factor of learning strategies (lassi)

factor                                       n     mean   sd     skewness   kurtosis   minimum   maximum
cognitive resources/cognitive processing     756   -.02   1.09   .24        -.16       -2.87     3.85
time management                              756   .00    1.12   .18        -.21       -2.86     3.30
processing of information/generalization     756   .01    1.11   -.37       -.07       -4.61     2.56
anxiety management                           756   .00    1.15   .35        -.41       -2.53     3.57
study techniques and use of help             756   -.01   1.14   -.67       -.03       -4.24     2.22

4.2 neural network analyses

ann1 was designed to predict the performance group corresponding to the lowest 33% of predicted gpa.
it included 82.4% of the participants (n = 632) in the training phase and 17.6% (n = 111) in the testing phase. after training, ann1 (predicting the group with the low 33% of academic performance) was able to reach 100% correct identification of the students that belong to the target group (lowest 33%) (see figure 1). the precision of ann1 equalled 1 on a maximum of 1. the sensitivity of the network equalled 1, and the specificity (defined as the proportion of correctly rejected targets from all the targets that should have been rejected by the system) was equal to 1. the area under the curve equalled .877.

                                        predicted: 33% lowest (target group)   predicted: others
observed: 33% lowest (target group)     100%                                   0%
observed: others                        0%                                     100%

figure 1. testing phase of the neural network predicting the lowest 33% of academic performance scores.

tables 3-5 show the actual predictive weights of the variables that the anns used in the prediction of future academic performance for each of the groups (low 33%, high 33% and the whole sample). the “importance” column can be interpreted as the actual predictive weight of each variable, and the “normalized importance” column represents the percent of predictive weight for each variable (in each group's analysis) with respect to the variable with the greatest predictive weight for the group in question, which is assigned 100%. table 6 summarizes the actual predictive weights of the variables, grouped by construct: background variables (i.e., parents' education, parents' occupation, type of secondary school), basic cognitive variables (i.e., working memory capacity, attentional networks), reaction time variables (i.e., operations, attentional), and learning strategies/motivation variables (i.e., study techniques, time management, anxiety management).
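the relation between the “importance” and “normalized importance” columns can be sketched as follows, using the three largest importance values reported for ann1 in table 3. the function name is hypothetical. because the published importance values are rounded to three decimals, the recomputed percentages (about 90.2% and 87.0%) differ slightly from the published ones (90.80% and 87.30%), which were presumably computed from unrounded weights.

```python
def normalized_importance(importance):
    """Express each importance value as a percentage of the largest one."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


# Three largest importance values reported for ANN1 (Table 3), rounded as published.
ann1_importance = {
    "cognitive resources/cognitive processing": 0.092,
    "ln reaction time math": 0.083,
    "time management": 0.080,
}

ann1_normalized = normalized_importance(ann1_importance)
for name, pct in ann1_normalized.items():
    print(f"{name}: {pct:.1f}%")
```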
it allows an easier comparison of the sources of predictive weights by area between the various student groups and also for the total sample. table 3 shows the actual predictive weight of each input, and the normalised importance of the different variables for the ann1 predictive classification. these results indicate that the learning strategies regarding cognitive processes, reaction time (rt), and time management were the most important predictors. all reaction times are converted to natural logarithms (ln) of the actual rt.

table 3
relative importance of the most predictive variables included in the model for the predictive classification of the lowest 33% of scores in academic performance (low 33% group)

independent variable                                  importance   normalized importance
cognitive resources/cognitive processing              0.092        100.00%
ln reaction time math                                 0.083        90.80%
time management                                       0.080        87.30%
secondary school from which the student graduated     0.066        71.50%
father's occupation                                   0.065        70.90%
executive control                                     0.062        67.60%
mother's occupation                                   0.058        63.70%
ln reaction time problem                              0.058        62.80%
absolute aospan (sum of perfectly recalled sets)      0.055        60.50%
anxiety management                                    0.051        55.40%
alerting attention                                    0.050        54.40%
ln reaction time operation                            0.048        52.40%
orienting attention                                   0.048        52.10%
study techniques and use of help                      0.046        51.70%
processing of information/generalization              0.043        46.50%
gender                                                0.040        43.70%
highest level of education completed by mother        0.030        32.60%
highest level of education completed by father        0.025        27.10%

ann2 was designed to predict the performance group corresponding to the highest 33% of predicted gpa. it included 77.9% of the students in the training phase (n = 614) and 22.1% in the testing phase (n = 136). after training, ann2 reached an accuracy of 100% (see figure 2). the precision of ann2 equalled 1 on a maximum of 1. the sensitivity of the network equalled 1, and the specificity amounted to 1.
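the precision, sensitivity and specificity values reported for the two-category networks follow directly from the cells of a 2 x 2 confusion matrix. the sketch below shows the standard definitions; the function name is hypothetical, and the counts are hypothetical values chosen only to illustrate a perfect and an imperfect classification, since the paper reports percentages rather than cell counts.

```python
def classification_metrics(tp, fn, fp, tn):
    """Precision, sensitivity and specificity from the four cells of a 2 x 2 confusion matrix."""
    nan = float("nan")
    return {
        # share of predicted targets that really are targets
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        # share of actual targets that were identified
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # share of actual non-targets that were correctly rejected
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical counts: a perfect classification (the 100% / 0% pattern of figures 1 and 2) ...
perfect = classification_metrics(tp=40, fn=0, fp=0, tn=80)
# ... and an imperfect one for contrast.
imperfect = classification_metrics(tp=35, fn=5, fp=10, tn=70)
```

with the perfect pattern all three measures equal 1, which matches the values reported for ann1 and ann2.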
the area under the curve equalled .788.

                                         predicted: 33% highest (target group)   predicted: others
observed: 33% highest (target group)     100%                                    0%
observed: others                         0%                                      100%

figure 2. testing phase of the neural network predicting the highest 33% of academic performance scores.

the most important variables for the prediction of ann2 (high 33%) were reaction time, mother's occupation, type of secondary school, father's occupation and executive control (executive attention measure) (see table 4).

table 4
relative importance of the most predictive variables included in the model for the predictive classification of the highest 33% of scores in academic performance (high 33% group)

independent variable                                  importance   normalized importance
ln of reaction time operation                         0.084        100.00%
mother's occupation                                   0.081        97.10%
secondary school from which the student graduated     0.081        96.10%
father's occupation                                   0.076        90.10%
executive control                                     0.072        86.40%
alerting attention                                    0.062        73.90%
processing of information/generalization              0.055        65.10%
orienting attention                                   0.054        64.10%
study techniques and use of help                      0.053        62.30%
highest level of education completed by father        0.051        60.70%
ln of reaction time math                              0.049        58.50%
anxiety management                                    0.047        55.60%
highest level of education completed by mother        0.044        52.80%
absolute aospan (sum of perfectly recalled sets)      0.044        52.70%
time management                                       0.044        52.20%
cognitive resources/cognitive processing              0.037        44.70%
ln of reaction time problem                           0.033        39.90%
gender                                                0.033        39.60%

both networks showed interesting differences in the pattern of relative normalized importance of those variables with the highest participation in the predictive model.
for the low performing group in terms of general gpa (those predicted to be in the lowest 33% of scores), several learning strategies related to cognitive processes, reaction time (wmc and attentional networks functioning), and time management were most important in providing predictive weights for a correct classification. on the other hand, in the predictive model for those students expected to be in the highest 33% of the general gpa scores, the predictors with the greatest participation were background variables involving mother's and father's occupation, type of secondary school, and overall reaction time of the cognitive and attentional processes.

ann3, which was designed to predict the three gpa performance groups simultaneously, used 82.8% of the students (n = 710) for the training phase, and 17.2% (n = 122) for the testing phase. after maximizing the training procedures, the accuracy in the testing phase reached 87.5% for the lowest 33%, 100% for the middle 33%, and 100% for the highest 33% (see figure 3). the precision of ann3 equalled .875 on a maximum of 1. the sensitivity of the network equalled 1, and the specificity amounted to .50. the areas under the curve were .658 for the low 33%, .583 for the middle 33%, and .637 for the high 33%.

                         predicted: 33% lowest   predicted: middle 33%   predicted: 33% highest
observed: low 33%        87.5%                   10%                     2.5%
observed: middle 33%     0%                      100%                    0%
observed: high 33%       0%                      0%                      100%

figure 3. testing phase of the neural network predicting the three levels of academic performance scores (low 33%, middle 33%, high 33%).

the most important variables for the prediction of ann3 were orienting attention, learning strategies related to the cognitive resources and information processing, time management, and executive control (executive attentional network) (see table 5).
table 5
relative importance of the most predictive variables included in the model for the predictive classification of the three levels of academic performance (all 3 gpa groups: low 33%, mid 33%, high 33%)

independent variable                                  importance   normalized importance
orienting attention                                   0.087        100.00%
cognitive resources/cognitive processing              0.076        86.86%
time management                                       0.074        84.92%
executive control                                     0.073        83.30%
father's occupation                                   0.071        81.80%
mother's occupation                                   0.070        79.91%
ln of attention total reaction time                   0.067        77.25%
alerting attention                                    0.067        76.63%
ln of reaction time math                              0.061        70.14%
processing of information/generalization              0.050        57.20%
ln of reaction time operation                         0.043        49.64%
study techniques and use of help                      0.041        46.55%
ln of reaction time problem                           0.040        46.13%
anxiety management                                    0.038        43.89%
gender                                                0.032        36.67%
highest level of education completed by father        0.031        35.73%
absolute aospan (sum of perfectly recalled sets)      0.031        35.09%
highest level of education completed by mother        0.026        29.88%
secondary school                                      0.024        27.29%

4.3 maximizing the ann models

all ann models were developed so as to maximize the accuracy of the classification. the number of units in the hidden layers was determined by optimizing the ability of the hidden nodes to store the necessary weight information, while avoiding the over-determination that would result from an excessive number of units. while a greater number of units would have given the model greater flexibility, it would have increased complexity at the cost of decreasing generalizability to the testing sample. similarly, too few units would not have produced a proper fit with the data and would have reduced the power of the model. therefore, various models were developed in order to find the proper balance and maximize the predictive power for each model.
in all models, the training and testing samples were selected at random from the existing data, and the proportions were adjusted in order to maximize the training sample while preserving the appearance of all detected patterns in the testing sample, so as to be able to test the model appropriately. other parameters that were varied in order to maximize the performance of the networks were learning rate and momentum. varying the learning rate controlled the amount of weight and bias change during the training of the network; different problem conditions find better solutions with different sizes of change in the network weights. momentum was used to prevent the network from converging too early to a local minimum and, conversely, to avoid overshooting the global minimum of the function; thus, it is important to avoid a momentum value that is too large (it can overshoot) or too low (the network can get stuck in a local minimum). balancing these parameters maximizes the quality of the solution and, if they are correctly identified, provides a stable and reliable solution such as the ones found in this study.

4.4 predictive contribution by categories of variables

besides studying the contribution of each variable individually for each neural network developed to classify the various expected performance levels (low performers, high performers, and three performance groups simultaneously), the contribution of each category or set of variables (background, basic cognitive processes, total reaction times for wmc operations and attentional networks, and learning strategies/motivation) was analysed for each ann developed, and the total predictive weight for each category of variables, as well as their average, was determined.
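the averaging step can be reproduced from the category weights reported in table 6. the sketch below recomputes the mean predictive weight of each category across the three performance levels and checks that the weights within each level sum to 100%; the variable names are hypothetical, and the percentages are those published in table 6.

```python
# Category weights (%) for the low 33%, mid 33% and high 33% levels, as reported in Table 6.
category_weights = {
    "background": (28.40, 25.40, 36.60),
    "basic cognitive": (21.50, 25.70, 23.20),
    "reaction time total": (18.90, 21.10, 16.60),
    "learning strategies/motivation": (31.20, 27.80, 23.60),
}

# Mean predictive weight of each category across the three performance levels.
mean_weight = {
    name: round(sum(values) / len(values), 2)
    for name, values in category_weights.items()
}

# Within each performance level, the four category weights should sum to 100%.
level_totals = [round(sum(level), 2) for level in zip(*category_weights.values())]

print(mean_weight)
print(level_totals)
```

the recomputed means (30.13, 23.47, 18.87 and 27.53) agree with the last column of table 6.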
table 6 and figure 4 show that, in terms of predictive weight, the most important variables when estimating the levels of predicted gpa performance for all three groups simultaneously are the background factors (e.g., socio-economic status proxy data, type of secondary school, occupation and education of parents). when comparing the two extreme predicted performance groups, however, specific patterns involving different variables are evident for low and high expected academic performance: learning strategies/motivation had a stronger predictive weight for students expected to be in the lowest 33% of gpa performance; on the other hand, for students predicted to belong to the highest 33% of gpa performance, background variables and some of the cognitive processing variables were those carrying the most predictive weight.

table 6
comparative predictive weight contribution for the three levels of academic performance by each of the categories of predictor variables

category                           low 33%   mid 33%   high 33%   mean predictive weight of each area
background                         28.40%    25.40%    36.60%     30.13%
basic cognitive                    21.50%    25.70%    23.20%     23.47%
reaction time total                18.90%    21.10%    16.60%     18.87%
learning strategies/motivation     31.20%    27.80%    23.60%     27.53%
total                              100%      100%      100%

figure 4. comparison of predictive weight levels for the three levels of academic performance by categories of predictor variables.

4.5 initial analysis of individual continuous estimates of future academic performance

while most of this study has been centered around the successful development of models to categorize expected levels of performance (which can be varied according to the problem situation), it is also important and useful to demonstrate that this machine learning approach can be used to predict individual specific outcomes (not just relatively broad performance categories).
although these performance categories can be very useful, as has been indicated for the identification and possible intervention in specific groups of high achievers or low achievers (e.g., learning disabilities, non-readiness for some specific task such as reading), and they can be used very effectively for targeted interventions in learning situations, it is also important to be able to understand the underlying phenomenon at the individual level, considering performance a continuous variable. for this reason, the predicted gpa-category (low-middle-high) probability values assigned by the network to each individual student were used to analyze their correlation with the observed gpa, as compared to the predicted value, in the context of the ann3 model, in which the whole sample of students was simultaneously classified in the three levels of expected performance. that is, the probability value for each student of belonging to a given category (all students received a certain probability of belonging to each of the outcome groups, as determined by the ann) was correlated with the gpa actually obtained by each student. results were indicative of a high degree of correlation between those measures. the three predicted groups of low, mid, and high performance had an actual observed gpa mean of 3.88 (sd = 1.21, n = 327), 5.67 (sd = .33, n = 243), and 7.28 (sd = .78, n = 294), respectively. all these gpa means were significantly different from each other (p < .001). within each one of the performance levels, the correlation of the ann individual predicted value with the actual gpa was: low 33%, r = .78; high 33%, r = .73; and for the whole sample of students, at all three levels, the correlation of the ann predicted values with the observed gpa was r = .86.
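the correlation step can be illustrated with a plain pearson coefficient. the sketch below is an assumption about the form of the analysis, not the study's actual procedure: the function name is hypothetical, the data are invented for illustration only, and the choice of the probability of belonging to the high 33% group as the predicted value is likewise an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data: network probability of belonging to the high 33% group,
# paired with the GPA (0-10 scale) actually obtained by the same six students.
predicted_high_probability = [0.05, 0.10, 0.30, 0.55, 0.80, 0.95]
observed_gpa = [3.1, 4.0, 5.2, 5.9, 7.0, 8.1]

r = pearson_r(predicted_high_probability, observed_gpa)
```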
further studies will continue to explore these individual relationships, but as they are, they confirm a high level of correlation between the actual gpa and the expected values assigned by the ann.

4.6 discriminant analyses (da)

da1 focused on the attempted predictive classification of students expected to be in the lowest 33% of gpa average, compared to the rest of the students. one of the restrictions of this analysis has to do with the assumption of equality of covariance matrices, which in this case is not violated (box's m = 5.253, f = .871, p = .515). gender, wmc and cognitive resources/learning strategies were able to discriminate between the two groups of students, but the rest of the variables included in ann1 were not. the squared canonical correlation (cr²) gives the amount of variation between the groups that is explained by the discriminating variables, which in this case was quite low (wilks' λ = .896, χ² = 84.786, df = 3, p = .001, cr² = .323).

da2 was carried out to attempt to discriminate between students expected to be in the highest 33% of gpa average, compared to the remaining 67% of the students. the same independent variables that were used in ann2 were entered in this analysis. results show that the independent variables were not able to discriminate between the two groups of students. the box's m statistic is not significant (box's m = 11.813, f = .781, p = .700), meaning that the assumption of equality of covariance matrices is not violated. in this analysis the squared canonical correlation indicated that the strength of the function is very low (wilks' λ = .926, χ² = 58.694, df = 5, p = .001, cr² = .271). only gender, highest level of education of the father, wmc, and, among the learning strategies set, cognitive resources and time management were variables that entered significantly in this model.
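for a two-group analysis with a single discriminant function, wilks' λ and the canonical correlation r_c are linked by the standard identity λ = 1 - r_c². the sketch below applies that identity to the λ values reported for da1 and da2; the function names are hypothetical. under this identity the reported cr² values (.323 and .271) lie close to the canonical correlation itself (about .32 and .27) rather than to its square (about .10 and .07). this is offered only as an observation under the stated assumption, not as a correction confirmed by the source.

```python
import math


def squared_canonical_correlation(wilks_lambda):
    """Squared canonical correlation for a single discriminant function: 1 - lambda."""
    return 1.0 - wilks_lambda


def canonical_correlation(wilks_lambda):
    """Canonical correlation for a single discriminant function."""
    return math.sqrt(squared_canonical_correlation(wilks_lambda))


# Wilks' lambda values as reported for the two-group analyses.
reported_lambda = {"da1": 0.896, "da2": 0.926}

for name, lam in reported_lambda.items():
    print(name,
          "r_c^2 =", round(squared_canonical_correlation(lam), 3),
          "r_c =", round(canonical_correlation(lam), 3))
```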
da3 was carried out with the same variables as those used to develop ann3, in order to predict the expected gpa performance level of the three groups of academic performance simultaneously. the assumption of equality of covariance matrices was not violated (box's m = 7.522, f = .623, p = .824). in this case, only gender, cognitive resources within the learning strategies set, and wmc were significant for the model and participated in the discrimination between the students in the three groups. however, the model explained a very low and non-significant proportion of the variance (wilks' λ = .998, χ² = 1.791, df = 2, p = .408, cr² = .048).

5. discussion and conclusions

the purpose of this study was to show the applicability and the effectiveness of the ann approach to the predictive classification of students in the full range of academic performance (gpa), as well as to identify and understand the importance of the variables for each level (low, middle and high) of expected gpa. this methodology, using a predictive system, was chosen because it is very effective with large amounts of very complex data, in which a large number of variables interact in various complex and not very well understood patterns. the results attained in this study have allowed the identification of the specific influence of each input set of variables on different levels of academic performance (high and low performance), on one hand, and common processes across all students, on the other hand. one important contribution of this predictive approach is the finding that the same variables have different effects in each group of students, defining specific patterns for each performance level. although the contribution of each variable in a particular pattern carries a relatively small predictive weight, it is the combined effect of the pattern of variables which explains a lower or higher academic performance model.
among the student group with the lowest 33% of academic performance, two main predictors are learning strategies components (cognitive resources/cognitive processing and time management). the importance of learning strategies as a mediating factor in a model predicting academic performance has been shown in different studies (dupeyrat & marine, 2005; fenollar, et al., 2007; simons et al., 2004; weinstein & mayer, 1986; weinstein et al., 1987; weinstein et al., 1982). however, this study added the contribution of a complex pattern of variables for a particular group of students, identifying specific learning strategies that help the classification of students in a low performance group (i.e., thoughts or behaviours that help to use imagery, verbal elaboration, organization strategies, and reasoning skills). included in this set are learning strategies that help build bridges between what they already know, and what they are trying to learn and remember (i.e., knowledge acquisition, retention, and future application). in addition, variables related to speed of processing involved in wmc functioning have an important predictive weight for the determination and modelling of the low performance group. other studies that have used ann have also found that basic cognitive processing variables such as wmc and executive attention carried the most predictive weight in the low performance group of students (kyndt et al., 2012, submitted; musso & cascallar, 2009a; musso et al., 2012). moreover, the literature has indicated the positive association between wmc and academic achievement (gathercole, pickering, knight, & stegmann, 2004; riding, grimley, dahraei, & banner, 2003). regarding the relative importance of each variable, if we compare the relative role of wmc and other cognitive resources between the low and high performance groups, wmc and cognitive resources were far more important for lower gpa students. 
the fact that their importance for the prediction is much greater for the lower performing group is largely due to the fact that all members of the high group had higher levels of wmc and cognitive resources, so that these variables did not provide the network with discriminating information. on the other hand, consistently lower values of wmc and cognitive resources were an identifying characteristic of the low performing group. remediation programmes, tutorial systems and instruction methods should consider these specific learning strategies, cognitive processing characteristics and wmc resources, in order to provide basic support to students at risk. such informed interventions would improve the possibilities of successful academic achievement for the at-risk groups, including those with particular learning difficulties.

background variables, together with reaction time measures and attentional executive control, are the most important predictors for the highest academic performance group, as indicators of both efficient processing and adequate selection of information. social background variables, such as educational level of the parents, were found to be significant in a previous ann study (pinninghoff junemann et al., 2007), and these results have been replicated in this study. the executive control mechanism is responsible for resolving conflicts among responses (fan et al., 2002). this attentional system has been closely related to working memory capacity (redick & engle, 2006), and was found to mediate and compensate for wmc deficits in certain tasks (musso et al., 2012). other attentional networks seem to be much less discriminating among students who reach certain threshold levels needed for high academic performance. these findings have significant implications for the way the learning process can be addressed for students identified as potential high achievers.
for this group, promoting learning through the use of metacognitive strategies, complex processing, and targeted teacher feedback would be an important way of maximizing their potential performance.

regarding methodological implications, these results demonstrate the greater accuracy of the ann approach compared to traditional methods such as da. other studies have also made use of multilayer perceptron artificial neural networks, with positive results for the analysis of educational data (abu naser, 2012; croy et al., 2008; fong et al., 2009; kanakana & olanrewaju, 2011; mukta & usha, 2009; ramaswami & bhaskaran, 2010; zambrano matamala et al., 2011). however, the present study has been able to maximize the precision obtained in the predictive classification of overall academic performance through the careful adjustment of network parameters and algorithms, producing highly accurate results with minimal misclassifications. similarly, an initial analysis of the correlation between the ann probabilities of performance level assigned to each individual student and the actual gpa observed, with performance treated as a continuous variable, shows a significant degree of correlation between the two measures (r = .86 for the whole sample). further studies will refine the technique to maximize these individual results.

the results of the da confirm the lack of significant linear relationships between the independent variables analysed in this study and academic performance. neural network models have an important advantage in this respect, as they are able to model nonlinear and complex relationships among variables with greater precision and accuracy. even though the assumptions required for traditional statistical predictive models (e.g. equality of covariance matrices) were not violated for the three stepwise discriminant analyses that were performed, the amount of variance explained was low in all three analyses.
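the point about linear versus nonlinear relationships can be illustrated with a deliberately simple, hypothetical example: a predictor that fully determines an outcome through a nonlinear (here quadratic) relationship can still show a linear correlation of zero. the sketch below uses only the python standard library; the data and the helper name pearson_r are assumptions made for illustration and are unrelated to the variables analysed in this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the outcome is an exact quadratic function of the predictor
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

# A purely linear measure sees no relationship at all ...
r_linear = pearson_r(x, y)

# ... while the same measure on a nonlinear transform of x recovers it fully
r_nonlinear = pearson_r([v ** 2 for v in x], y)

print(r_linear, r_nonlinear)
```

a method restricted to linear combinations of the raw predictor would report no association here, whereas any model able to represent the curvature would capture it. this is only an analogy for the argument in the text, not a reanalysis of the study data.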
none of these analyses was able to discriminate with sufficient accuracy between the different levels of expected academic performance. when we compare these results with the anns modelled in this study, it can be concluded that anns are much more robust and perform significantly better than other classical techniques, as prior studies have also indicated (everson et al., 1994; marquez et al., 1991).

this study has shown the power of this predictive approach using anns to model future overall academic performance in higher education, specifically in academic admissions and/or placement. to put the current results in perspective, consider one of the best known and most reliable tests currently in use, the sat from the college board: all sections of the sat taken together, even with the more recent addition of a writing score, have been found to predict at best 28% of the variance of first-year college gpa for the average population of students (kobrin, patterson, shaw, mattern, & barbuti, 2008). if the gpa obtained in secondary education is added to the sat results, the overall prediction reaches only 38% of the variance of first-year college gpa (kobrin et al., 2008). with the current ann models, it has been possible to correctly classify 100% of the students in the categories examined. our research continues into the development of new predictive models, with much larger data sets, to classify students in much narrower bands of expected performance, having already attained 98-99% accuracy in models for quintiles of student performance distributions. in addition, work will also continue on the prediction of specific expected gpa results for each individual student.
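because the figures quoted in this section mix correlations (r) and proportions of variance explained (r²), a small worked conversion may help when reading them side by side. the snippet is only an arithmetic sketch in python with illustrative variable names. classification accuracy (the 100% figure) is a different kind of measure and cannot be converted in this way.

```python
import math

# Values quoted in the text
r_ann_gpa = 0.86        # correlation between ANN probabilities and observed GPA
var_sat = 0.28          # share of first-year GPA variance predicted by the SAT
var_sat_hsgpa = 0.38    # share predicted by the SAT plus secondary-education GPA

# Squaring a correlation gives the proportion of shared variance
var_ann = r_ann_gpa ** 2

# Taking the square root of explained variance gives the implied correlation
r_sat = math.sqrt(var_sat)
r_sat_hsgpa = math.sqrt(var_sat_hsgpa)

print(f"{var_ann:.2f} {r_sat:.2f} {r_sat_hsgpa:.2f}")  # 0.74 0.53 0.62
```

read this way, r = .86 corresponds to roughly 74% shared variance, while 28% and 38% of variance correspond to correlations of roughly .53 and .62.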
in conclusion, the current predictive systems approach facilitates and maximizes the identification of those factors (or predictors) of the learning processes which participate in varying degrees in the modelling of different levels of performance in academic outcomes in higher education. if we can identify specific profiles of students, focusing on the most important variables, this opens major possibilities for the improvement of assessment procedures and the planning of pre-emptive interventions. given that this methodology allows for the accurate prediction of academic performance (gpa) at least one academic year before it is actually measured, it has implications for the application of these methods in educational research and for the implementation of diagnostic “early-warning” programmes in educational settings. these results also inform cognitive theory and help in the development of improved automated tutoring and learning systems. although the effects of some of the variables involved, such as educational level of the parents, are impossible to alter at the time of the assessment, they do inform policy and indicate the weight with which many social and environmental factors influence future academic performance. this methodological and conceptual approach allows us to consider a large number of variables simultaneously and to select those which are most relevant and allow a greater degree of intervention to improve student performance, including early intervention programmes for students in need of special support. the capacity to classify expected student performance very accurately, which is also what tests attempt to do, without the performance sampling issues of traditional testing, and using a much broader spectrum of the factors influencing a student's overall performance, is a major advantage of the ann methodology.
in fact, it also represents a more valid approach to educational assessment due to its overall accuracy and the breadth of the constructs considered to classify the expected performance. traditional assessments are not sufficient for more complex assessments or for assessment systems that intend to serve multiple direct and indirect purposes in complex educational situations (mislevy, 2013; mislevy, steinberg, & almond, 2003). in this respect, this new approach allows for the conceptualization and development of new modes of assessment which could facilitate breaking away from traditional forms of testing while at the same time improving the quality of the assessment process (segers, dochy & cascallar, 2003). finally, the use of ann together with other methods such as cluster analyses and kohonen networks could contribute to the study of the specific patterns of those variables which influence the learning process for each level of performance. in fact, a major observation resulting from the data in this study is that variables contribute to the prediction in relatively small proportions, and it is the joint effect of many contributing variables that could cause significant changes in performance. in other words, there is no “magic bullet”; rather, it is the accumulation of effects from all these various sources that produces significant changes in outcomes. these results provide an insight into learning questions from a different perspective, one that has important implications for educational policy and education at large.

keypoints

this approach provides a more contextualized and encompassing new mode of assessing expected performance without some of the pitfalls found in traditional forms of testing.

anns are a powerful tool to model future academic performance, specifically in academic diagnostic evaluations for placement and early-warning assessments.
this methodology demonstrates that variables impacting the outcome of the learning process are embedded in specific large-scale patterns which determine their degree of influence and direction of their effects.

a predictive systems approach is a valuable method to study the specific patterns of variables influencing the learning process at each level of expected performance, to better understand the determinants of learning outcomes and ways to improve them with early interventions.

references

abu naser, s. s. (2012). predicting learners performance using artificial neural networks in linear programming intelligent tutoring system. international journal of artificial intelligence & applications (ijaia), 3(2), 65-73.
anderson, j. r. (1983). the architecture of cognition. cambridge, ma: harvard university press.
anderson, j. r. (1993). rules of the mind. hillsdale, nj: lawrence erlbaum associates.
anderson, j. r. (2002). spanning seven orders of magnitude: a challenge for cognitive modeling. cognitive science, 26, 85–112.
anderson, j. r. (2007). how can the human mind occur in the physical universe? new york: oxford university press.
anderson, j. r., bothell, d., byrne, m. d., douglass, s., lebiere, c., & yulin, q. (2004). an integrated theory of the mind. psychological review, 111(4), 1036–1060.
baddeley, a. d. (1986). working memory. oxford: clarendon press.
bansal, a., kauffman, r. j., & weitz, r. r. (1993). comparing the modeling performance of regression and neural networks as data quality varies: a business value approach. journal of management information systems, 10(1), 11-32.
bekele, r., & mcpherson, m. (2011). a bayesian performance prediction model for mathematics education: a prototypical approach for effective group composition. british journal of educational technology, 42(3), 395–416.
biggs, j. (1987). study process questionnaire manual. melbourne, australia: australian council for educational research.
birenbaum, m., breuer, k., cascallar, e., dochy, f., dori, y., ridgway, j., wiesemes, r., & nickmans, g. (ed.). (2006). a learning integrated assessment system. educational research review, 1, 61-67.
boekaerts, m., & cascallar, e. (2006). how far have we moved toward the integration of theory and practice in self-regulation? educational psychology review, 18(3), 199-210.
boekaerts, m., & cascallar, e. c. (2011). predicting and explaining writing outcomes: neural network methodology at work. symposium: predicting academic performance with the use of predictive systems analysis. proceedings of the biennial conference of the european association for research on learning and instruction (earli). exeter, uk, 30 august – 3 september 2011.
braten, i., & stromso, h. (2006). epistemological beliefs, interest, and gender as predictors of internet-based learning activities. computers in human behavior, 22, 1027-1042.
cascallar, e. c., boekaerts, m., & costigan, t. e. (2006). assessment in the evaluation of self-regulation as a process. educational psychology review, 18(3), 297-306.
cascallar, e. c., & musso, m. f. (2008). classificatory stream analysis in the prediction of expected reading readiness: understanding student performance. international journal of psychology, proceedings of the xxix international congress of psychology icp 2008, 43(43/44), 231-231.
castejón, j. l., & navas, l. (1992). determinantes del rendimiento académico en la educación secundaria. un modelo causal. [determinants of academic achievement in secondary education. a causal model]. análisis y modificación de conducta, 18(61), 697-728.
cattell, r. b. (1971). abilities: structure, growth and action. boston: houghton mifflin.
chamorro-premuzic, t., & arteche, a. (2008). intellectual competence and academic performance: preliminary validation of a model. intelligence, 36, 564-573.
colom, r., escorial, s., chun shih, p., & privado, j.
(2007). fluid intelligence, memory span, and temperament difficulties predict academic performance of young adolescents. personality and individual differences, 42, 1503-1514.
conway, a. r. a., cowan, n., bunting, m. f., therriault, d., & minkoff, s. (2002). a latent variable analysis of working memory capacity, short term memory capacity, processing speed, and general fluid intelligence. intelligence, 30, 163-183.
conway, a. r. a., & engle, r. w. (1996). individual differences in working memory capacity: more evidence for a general capacity theory. memory, 4, 577-590.
conway, a. r. a., kane, m. j., bunting, m. f., hambrick, d. z., wilhelm, o., & engle, r. w. (2005). working memory span tasks: a methodological review and user's guide. psychonomic bulletin & review, 12(5), 769-786.
croy, m., barnes, t., & stamper, j. (2008). towards an intelligent tutoring system for propositional proof construction. in a. briggle, k. waelbers, and p. brey (eds.), computing and philosophy (pp. 145-215). amsterdam, the netherlands: ios press.
daneman, m., & carpenter, p. a. (1980). individual-differences in working memory and reading. journal of verbal learning and verbal behaviour, 19, 450-466.
detienne, k. b., detienne, d. h., & joshi, s. a. (2003). neural networks as statistical tools for business researchers. organizational research methods, 6, 236-265.
duliba, k. a. (1991). contrasting neural nets with regression in predicting performance in the transportation industry. proceedings of the twenty-fourth annual hawaii international conference on system sciences, 4.
dupeyrat, c., & marine, c. (2005). implicit theories of intelligence, goal orientation, cognitive engagement, and achievement: a test of dweck's model with returning to school adults. contemporary educational psychology, 30(1), 43-59.
engle, r. w. (2002). working memory capacity as executive attention. current directions in psychological science, 11, 19-23.
engle, r. w., & kane, m. j.
(2004). executive attention, working memory capacity, and a two-factor theory of cognitive control. in b. ross (ed.), the psychology of learning and motivation (pp. 145-199). new york, ny: elsevier.
eriksen, b. a., & eriksen, c. w. (1974). effects of noise letters upon the identification of a target letter in a non search task. perception and psychophysics, 16, 143-149.
everson, h. t. (1995). modelling the student in intelligent tutoring systems: the promise of a new psychometrics. instructional science, 23(5-6), 433-452.
everson, h. t., chance, d., & lykins, s. (1994). exploring the use of artificial neural networks in educational research. paper presented at the annual meeting of the american educational research association, new york.
fan, j., mccandliss, b. d., summer, t., raz, a., & posner, m. i. (2002). testing the efficiency and independence of attentional networks. journal of cognitive neuroscience, 14(3), 340-347.
feldman barrett, l., tugade, m. m., & engle, r. w. (2004). individual differences in working memory capacity and dual-process theories of mind. psychological bulletin, 130, 553-573.
fenollar, p., roman, s., & cuestas, p. j. (2007). university students' academic performance: an integrative conceptual framework and empirical analysis. british journal of educational psychology, 77, 873-891.
fernandez-castillo, a., & gutiérrez-rojas, m. e. (2009). selective attention, anxiety, depressive symptomatology and academic performance in adolescents. electronic journal of research in educational psychology, 7(1), 49-76.
fletcher, j. m. (2005). predicting math outcomes: reading predictors and comorbidity. journal of learning disabilities, 38(4), 308-312.
fong, s., si, y.-w., & biuk-aghai, r. p. (2009). applying a hybrid model of neural network and decision tree classifier for predicting university admission. proceedings of the 7th international conference on information, communication, and signal processing (icics2009), pp. 1-5, macau, china, ieee press.
garson, g.
d. (1998). neural networks. an introductory guide for social scientists. london: sage publications ltd.
gathercole, s. e., pickering, s. j., knight, c., & stegmann, z. (2004). working memory skills and educational attainment: evidence from national curriculum assessments at 7 and 14 years of age. applied cognitive psychology, 18, 1-16.
gazzaniga, m., ivry, r., & mangun, g. (2002). cognitive neuroscience: the biology of the mind (2nd ed.). new york, ny: w.w. norton.
grimley, m., & banner, g. (2008). working memory, cognitive style, and behavioural predictors of gcse exam success. educational psychology, 28(3), 341-351.
grossberg, s. (1980). how does the brain build a cognitive code? psychological review, 87, 1-51.
grossberg, s. (1982). studies of mind and brain: neural principles of learning, perception, development, cognition and motor control. boston: reidel press.
gsanger, k., w., homack, s., siekierski, b., & riccio, c. (2002). the relation of memory and attention to academic achievement in children. archives of clinical neuropsychology, 17(8), 790.
hailikari, t., nevgi, a., & komulainen, e. (2008). academic self-beliefs and prior knowledge as predictors of student achievement in mathematics: a structural model. educational psychology, 28(1), 59-71.
hardgrave, b. c., wilson, r. l., & walstrom, k. a. (1994). predicting graduate student success: a comparison of neural networks and traditional techniques. computer and operations research, 21(3), 249-263.
hazy, t. e., frank, m. j., & o'reilly, r. c. (2006). banishing the homunculus: making working memory work. neuroscience, 139, 105–118.
heitz, r. p., redick, t. s., hambrick, d. z., kane, m. j., conway, a. r. a., & engle, r. w. (2006). working memory, executive function, and general fluid intelligence are not the same. behavioral and brain sciences, 29, 135-136.
jarrold, c., & towse, j. n. (2006). individual differences in working memory. neuroscience, 139, 39-50.
jimmerson, s. r., dubrow, e. h., adam, e., gunnar, m., & bozoky, i. k. (2006). associations among academic achievement, attention, and adrenocortical reactivity in caribbean village children. canadian journal of school psychology, 21, 120-138.
kanakana, g., & olanrewaju, a. (2011). predicting student performance in engineering education using an artificial neural network at tshwane university of technology. proceedings of the isem, stellenbosch, south africa.
kane, m. j., hambrick, d. z., tuholski, s. w., wilhelm, o., payne, t. w., & engle, r. w. (2004). the generality of working memory capacity: a latent variable approach to verbal and visuospatial memory span and reasoning. journal of experimental psychology: general, 133, 189-217.
kent, r. (2009). rethinking data analysis – part two. some alternatives to frequentist approaches. international journal of market research, 51, 181-202.
kobrin, j. l., patterson, b. f., shaw, e. j., mattern, k. d., & barbuti, s. m. (2008). validity of the sat for predicting first-year college grade point average. college board research report 2008-5. new york: the college board. retrieved from http://research.collegeboard.org/rr2008-5.pdf.
kohavi, r., & provost, f. (1998). glossary of terms. machine learning, 30(2–3), 271–274.
kuncel, n. r., hezlett, s. a., & ones, d. s. (2001). a comprehensive meta-analysis of the predictive validity of the graduate record examinations: implications for graduate student selection and performance. psychological bulletin, 127(1), 162-181.
kuncel, n.
r., crede, m., thomas, l. l., klieger, d. m., seiler, s. n., & woo, s. e. (2004). a meta-analysis of the pharmacy college admission test (pcat) and grade predictors of pharmacy student success. annual conference of the american psychological society, chicago, il.
kuncel, n. r., hezlett, s. a., & ones, d. s. (2004). academic performance, career potential, creativity, and job performance: can one construct predict them all? journal of personality and social psychology, 86(1), 148-161.
kuncel, n. r., crede, m., thomas, l. l., klieger, d. m., seiler, s. n., & woo, s. e. (2005). a meta-analysis of the pharmacy college admission test (pcat) and grade predictors of pharmacy student success. american journal of pharmaceutical education, 69(3), 339-347.
krumm, s., ziegler, m., & buehner, m. (2008). reasoning and working memory as predictors of school grades. learning and individual differences, 18(2), 248-257.
kyllonen, p. c., & christal, r. e. (1990). reasoning ability is (little more than) working-memory capacity?! intelligence, 14, 389-433.
kyllonen, p. c., & stephens, d. l. (1990). cognitive abilities as determinants of success in acquiring logic skill. learning and individual differences, 2, 129-160.
kyndt, e., cascallar, e., & dochy, f. (2012). individual differences in working memory capacity and attention, and their relationship with students' approaches to learning. higher education, 64(3), 285-297.
kyndt, e., musso, m., cascallar, e., & dochy, f. (2012, submitted). predicting academic performance: the role of cognition, motivation and learning approaches. a neural network analysis. journal of further and higher education.
landerl, k. (2010). temporal processing, attention, and learning disorders. learning & individual differences, 20(5), 393-401.
linn, r. l., & hastings, c. n. (1984). a meta-analysis of the validity of predictors of performance in law school. journal of educational measurement, 21, 245-259.
lippman, r. (1987). an introduction to computing with neural nets. ieee assp magazine, 3(4), 4-22.
lovett, m. w. (1979). the selective encoding of sentential information in normal reading development. child development, 50(3), 897.
lykins, s., & chance, d. (1992). comparing artificial neural networks and multiple regression for predictive application. proceedings of the eighth annual conference on applied mathematics, edmond ok, 155-169.
marquez, l., hill, t., worthley, r., & remus, w. (1991). neural network models as an alternative to regression. proceedings of the ieee 24th annual hawaii international conference on systems sciences, 4, 129-135.
marshall, d. b., & english, d. j. (2000). neural network modelling of risk assessment in child protective services. psychological methods, 5(1), 102-124.
maucieri, l. p. (2003).
predicting behavior with an artificial neural network: a comparison with linear models of prediction (january 1, 2003). etd collection for fordham university, ny, usa. retrieved from http://fordham.bepress.com/dissertations/aai3098134.
mavrovouniotis, m. l., & chang, s. (1992). hierarchical neural networks. computers & chemical engineering, 16(4), 347-369.
miñano, p., gilar, r., & castejón, j. l. (2012). a structural model of cognitive-motivational variables as explanatory factors of academic achievement in spanish language and mathematics. anales de psicología, 28(1), 45-54.
mislevy, r. j. (2013). measurement is a necessary but not sufficient frame for assessment. measurement, 11, 47–49.
mislevy, r. j., steinberg, l. s., & almond, r. a. (2003). on the structure of educational assessments. measurement: interdisciplinary research and perspectives, 1, 3–67.
mukta, p., & usha, a. (2009). a study of academic performance of business school graduates using neural network and statistical techniques. expert systems with applications, 36(4), 7865-7872.
musso, m. f., & cascallar, e. c. (2009a). new approaches for improved quality in educational assessments: using automated predictive systems in reading and mathematics. journal of problems of education in the 21st century, 17, 134-151.
musso, m. f., & cascallar, e. c. (2009b). predictive systems using artificial neural networks: an introduction to concepts and applications in education and social sciences. in m. c. richaud & j. e. moreno (eds.), research in behavioural sciences (volume i), (pp. 433-459). argentina: ciipme/conicet.
musso, m. f., kyndt, e., cascallar, e. c., & dochy, f. (2012). predicting mathematical performance: the effect of cognitive processes and self-regulation factors. education research international, vol. 12.
nasr, g. e., badr, e. a., & joun, c. (2002). cross entropy error function in neural networks: forecasting gasoline demand. flairs-02 proceedings of the aaai.
retrieved from http://www.aaai.org/papers/flairs/2002/flairs02-075.pdf
navas, l., sampascual, g., & santed, m. a. (2003). predicción de las calificaciones de los estudiantes: la capacidad explicativa de la inteligencia general y de la motivación. [prediction of students' performance scores: the role of the general intelligence and motivation. journal of general and applied psychology], 56(2), 225-237.
neal, w., & wurst, j. (2001). advances in market segmentation. marketing research, 13(1), 14-18.
passolunghi, m. c., & pazzaglia, f. (2004). individual differences in memory updating in relation to arithmetic problem solving. learning and individual differences, 14(4), 219-230.
perkins, k., gupta, l. & tammana (1995). predict item difficulty in a reading comprehension test with an artificial neural network. language testing, 12(1), 34-53.
pickering, s. j. (2006). working memory and education. usa: academic press.
pinninghoff junemann, m. a., salcedo lagos, p. a., & contreras arriagada, r. (2007). neural networks to predict schooling failure/success. in j. mira & j. r. álvarez (eds.), iwinac 2007, part ii, lncs 4528 (pp. 571–579). berlin / heidelberg: springer-verlag.
pintrich, p. r. (2000). the role of goal orientation in self-regulated learning. in m. boekaerts, p. r. pintrich, & m. zeidner (eds.), handbook of self-regulation (pp. 452–502). san diego, ca: academic press.
posner, m. i. (1980). orienting of attention. quarterly journal of experimental psychology, 41a, 19-45.
posner, m. i., & petersen, s. e. (1990). the attention system of the human brain. annual review of neuroscience, 13, 25-42.
posner, m. i., & rothbart, m. k. (1998). attention, self-regulation and consciousness. philosophical transactions of the royal society of london. series b, biological sciences, 353, 1915–1927.
ramaswami, m. m., & bhaskaran, r. r. (2010). a chaid based performance prediction model in educational data mining.
international journal of computer science issues, 7(1), 10-18.
redick, t. s., & engle, r. w. (2006). working memory capacity and attention network test performance. applied cognitive psychology, 20, 713-721.
reid, r. (2006). self-regulated strategy development for written expression with students with attention deficit/hyperactivity disorder. exceptional children, 73(1), 53-67.
riccio, c. a., lee, d., romine, c., cash, d., & davis, b. (2002). relation of memory and attention to academic achievement in adults. archives of clinical neuropsychology, 18(7), 755-756.
riding, r. j., grimley, m., dahraei, h., & banner, g. (2003). cognitive style, working memory and learning behaviour and attainment in school subjects. british journal of educational psychology, 73, 749-769.
roth, p. l., be vier, c. a., switzer, f. s., & schippmann, j. s. (1996). meta-analyzing the relationship between grades and job performance. journal of applied psychology, 81, 548-556.
roth, p. l., & clarke, r. l. (1998). meta-analyzing the relation between grades and salary. journal of vocational behavior, 53, 386-400.
ruban, l. m., & mccoach, d. b. (2005). gender differences in explaining grades using structural equation modeling. the review of higher education, 28, 475-502.
rueda, m. r., posner, m. i., & rothbart, m. k. (2004). attentional control and self regulation. in r. f. baumeister & k. d. vohs (eds.), handbook of self regulation: research, theory, and applications, new york: guilford press, 14: 283-300.
rumelhart, d., hinton, g., & williams, r. (1986). learning representations by back-propagating errors. nature, 323, 533-536.
rumelhart, d. e., mcclelland, j. l., & the pdp research group. (1986). parallel distributed processing: explorations in the microstructure of cognition. volume i. cambridge, ma: mit press.
schmidt, f. l. (2002). the role of general cognitive ability and job performance: why there cannot be a debate. human performance, 15, 187–210.
segers, m., dochy, f., & cascallar, e.
(2003). optimizing new modes of assessment: in search of qualities and standards. the netherlands: kluwer academic publishers.
simons, j., dewitte, s., & lens, w. (2004). the role of different types of instrumentality in motivation, study strategies, and performance: know why you learn, so you'll know what you learn! british journal of educational psychology, 74, 343-360.
snyderman, m., & rothman, s. (1987). survey of expert opinion on intelligence and aptitude testing. american psychologist, 42(2), 137-144.
specht, d. (1991). a general regression neural network. ieee transactions on neural networks, 2(6), 568-576.
st clair-thompson, h. l., & gathercole, s. e. (2006). executive functions and achievements in school: shifting, updating, inhibition, and working memory. the quarterly journal of experimental psychology, 59(4), 745-759.
strucchi, e. (1991). inventario de estrategias de aprendizaje y de estudio. [learning strategies inventory and study]. buenos aires: psicoteca.
turner, e. a., chandler, m., & heffer, r. w. (2009). influence of parenting styles, achievement motivation, and self-efficacy on academic performance in college students. journal of college student development, 50(3), 337-346.
unsworth, n., heitz, r. p., schrock, j. c., & engle, r. w. (2005). an automated version of the operation span task. behavior research methods, 37(3), 498-505.
vandamme, j. p., meskens, n., & superby, j. f. (2007). predicting academic performance by data mining methods. education economics, 15(4), 405-41.
walczak, s. (1994). categorizing university student applicants with neural networks. ieee international conference on neural networks, 6, 3680-3685.
weinstein, c. e., & mayer, r. e. (1986). the teaching of learning strategies. in m. c. wittrock (ed.), handbook of research on teaching (3rd ed.). macmillan, new york.
weinstein, c. e., & palmer, d. r. (2002). lassi: user's manual (2nd edition). clearwater, fl: h&h publishing company, inc.
weinstein, c.
e., palmer, d. r., & schulte, a. c. (1987).learning and study strategies inventory. clearwater, fl: h & h publishing company, inc. weinstein, c. e., schulte, a. c, & cascallar, e. c. (1982). the learning and studies strategies inventory (lassi): initial design and development. technical report, us army research institute for the social and behavioural sciences, alexandria, va. weiss, s. m. & kulikowski, c. a. (1991). computer systems that learn. san mateo, ca: morgan kaufmann publishers. welsh, m.c., satterlee-cartmell, t., & stine, m. (1999). towers of hanoi and london: contribution of working memory and inhibition to performance. brain cognition, 41(2), 231-242. white, h. & racine, j. (2001): statistical inference, the bootstrap, and neural network modelling with application to foreign exchange rates. ieee transactions on neural networks: special issue on neural networks in financial engineering, 12, 657-673. wilson, r. l. & hardgrave, b. c. (1995). predicting graduate student success in a mba program: regression vs. classification. educational and psychological measurement, 55, 186-195. zambrano matamala, c., rojas díaz, d., carvajal cuello, k., & acuña leiva, g. (2011). análisis de rendimiento académico estudiantil usando data warehouse y redes neuronales. [analysis of students‟ academic performance using data warehouse and neural networks] ingeniare. revista chilena de ingeniería, 19(3), 369-381. zeegers, p. (2004). student learning in higher education: a path analysis of academic achievement in science. higher education research & development, 23(1), 35-56. microsoft word do et al_publication.docx frontline learning research vol. 11 no. 
1 (2023) 94–122
issn 2295-3159
peer cooperation during teaching in paired field placements: forms and challenges
minh-ly do¹ & tina hascher¹
¹ university of bern, switzerland
article received 11 june 2023 / article revised 22 june 2023 / accepted 23 june 2023 / available online 12 july 2023
abstract
paired field placement is an important element of teacher education where student teachers can acquire professional cooperative skills through team teaching. however, little is known about challenges that student teachers face during team teaching. also, knowledge about challenges during the team teaching process (e.g. planning, instruction, reflection) is scarce. this study focuses on pre-primary and primary student teachers’ challenges with peer cooperation during team teaching, the problems they face, and how they cope with negative experiences. data were collected from 30 student teachers through in-depth, semi-structured interviews. results reveal various forms of conflict during different phases of peer cooperation in team teaching such as lack of flexibility due to pressure to follow agreements, or unclear roles and responsibilities. instruction turns out to be the most challenging phase of team teaching, with lack of compatibility with the peer as the most frequent reason for problems. reflection is rarely used in a cooperative setting. the findings also revealed the frequent use of reactive strategies to cope with challenges, particularly the strategy of avoiding problems.
keywords: peer cooperation; paired field placements; student teachers; team teaching; challenges.
corresponding author: tina hascher, institute of educational science, department of research in school and instruction, university of bern, fabrikstrasse 8, 3012 bern, switzerland. tina.hascher@unibe.ch
doi: https://doi.org/10.14786/flr.v11i1.1305
1. introduction
cooperation among teachers has proven to positively influence teaching quality (organisation for economic co-operation and development [oecd], 2019), and students have shown higher achievement in schools where teachers cooperate (ronfeldt et al., 2015). standards for teacher education (cochran-smith, 1991; darling-hammond, 1996) have highlighted cooperation as a professional skill that needs to be developed (thousand et al., 2006). teachers express a need for training and preparation for cooperation (murawski & dieker, 2008), and the national council for the accreditation of teacher education (2010) also encourages u.s. teacher education programmes to implement cooperation models, such as team teaching in field placements where student teachers are introduced to practice through student teaching.
what is known of cooperation in field placements? studies have confirmed the power of collaborative learning in different practicum settings and teacher education contexts (for a review see cohen et al., 2013) as well as student teachers’ cooperation in pairs with a cooperating or mentor teacher (e.g. goodnough et al., 2009). although studies on team teaching in paired field placements have highlighted benefits for student teachers (baeten & simons, 2014; kamens, 2007), there is also evidence that paired field placement can be challenging and lead to negative experiences (e.g. guise et al., 2017; nokes et al., 2008). to date, few studies have paid close attention to the specific challenges of peer cooperation in field placements (baeten & simons, 2014; dang, 2013; goodnough et al., 2009). according to dang (2017, p. 327), specific attention “should be paid to the process of collaboration in paired-placements, to optimize the resolution of conflicts and the conditions that lead to teacher learning in pairs”.
this study therefore aims to identify and understand the forms of team teaching and challenges that student teachers face during team teaching in paired field placement during different phases of teaching (including the phases of planning, instruction and reflection) and, thus, phases of cooperation (e.g. dang, 2013). it aims to contribute to a better understanding of paired field placements as learning situations by analysing the challenges of cooperation and coping strategies from student teachers’ perspective. the added value of the identification of student teachers’ challenges and coping strategies is twofold: first, it can enrich our knowledge of how personal growth and professional development in paired field placements can be enhanced or impeded; second, it can inform teacher education and mentor teachers to better understand when student teacher cooperation needs support in paired field placements. thus, the findings can help to improve the quality of paired field placements.
2. cooperation in the teaching profession
the importance of the idea of cooperation for effective teaching can be aligned with different learning theories (for a review see dillenbourg et al., 1996 and hämäläinen & vähäsantanen, 2011). generally, the theoretical rationale for cooperation is rooted in cognitive-developmental and learning perspectives, grounded in the work of piaget (1926), vygotsky’s (1978) socio-cultural approach, and the notion of situated learning based on the theory by lave and wenger (1991). teacher cooperation was addressed by little (1990), who differentiated four models of cooperation: storytelling and scanning for ideas, aid and assistance, sharing and joint work. it is further discussed under the umbrella of a variety of constructs and practices, such as co-teaching, team teaching and cooperative teaching or collaboration (see baeten & simons, 2014). nissen et al. (2014, p.
473) suggested viewing cooperation and collaboration “as two different forms of interaction” that, for instance, can result when teachers work together with special education teachers. similarly, arnold et al. (2012, p. 433) stated that “cooperation allows for some independent work of group members, who take responsibility for specific subtasks to be assembled into a larger whole at the end”, whereas collaboration implies more direct interaction among individuals to create a common product and involves negotiations, discussions and accommodating others’ perspectives. thus, the term “cooperation” seems to refer to a broader concept, recognizing different roles and functions of the participants, and may include collaborative activities and co-construction. we acknowledge previous research that has shown that cooperation can take various forms (kamens, 2007) and allow this definition to include a variety of work-related interactions, such as informal exchange, sharing ideas, mutual support, co-teaching in a classroom and common reflection on instruction. more concretely, we use the term “cooperation” according to baeten and simons (2014, p. 93), with “two or more teachers in some level of collaboration in the planning, delivery and/or evaluation of a course,” as this definition stipulates that the participants enjoy a similar status and covers the whole process of teaching and cooperation. this definition shows that professional cooperation is a complex and challenging task. with regard to teacher education, it is therefore interesting to know how pre-service teachers are introduced to peer cooperation in the teaching profession and how they learn to cooperate during teacher education and, more specifically, during paired field placements.
3. student teacher peer cooperation in paired field placement
paired field placement is a form of partnered student teaching (gardiner & robinson, 2009) that allows peer cooperation.
cooperation in partnered student teaching can take a variety of forms, models and formats (e.g. baeten & simons, 2014), such as “station teaching” or the “one teach, one support” approach (e.g. friend et al., 2010) with team teaching as a prominent model.
3.1 student teacher team teaching
team teaching has been recognized as a promising model of student teaching during field experiences, such as in language classes (e.g. barahona, 2017; carless, 2006; liu, 2008). a meta-analysis conducted by baeten and simons (2014, p. 95) revealed five models of team teaching.
(1) in the observation model, the teaching is carried out by a person who is observed by a partner. the responsibility for the entire course of the lesson lies with the person teaching.
(2) in the coaching model, the lesson is conducted by one person, the partner has an advisory function in addition to the role of observer (e.g. feedback, suggestions for improvement). one teacher has the overall responsibility for the course of the lesson.
(3) in the assistant teaching model, one person has the main responsibility for teaching, while the partner assists. although one person has the overall responsibility for the lesson, the assisting partner takes over a part of the responsibility, for example, for the individual assistance of single students.
(4) in the equal status model, teaching is based on equal partnership. all persons work under common objectives and responsibility. this form of team teaching requires joint lesson preparation and includes three subforms: sequential teaching (teaching is divided into sequences), parallel teaching (groups of students are taught simultaneously by different teachers) and station teaching (teachers are responsible for specific parts during the teaching process).
(5) in the teaming model, classes are conducted under an equal partnership. all persons work under common objectives and assumption of responsibility.
this form of team teaching requires joint lesson preparation and includes three subforms: parallel, sequential and station teaching.
baeten and simons (2014) shed light on the roles and responsibilities depending on these five team teaching models. they found that the “equal status model” was the most frequently used team teaching model during paired field placements (baeten & simons, 2016). a comparison between “sequential teaching” and “parallel teaching” (both equal status models) showed that student teachers have positive feelings towards both models (simons et al., 2020).
3.2 benefits of student teacher peer cooperation
there is a growing body of evidence that shows the benefits of student teacher peer cooperation. generally, team teaching is positively appraised (anderson & speck, 1998), and student teachers appreciate having a partner to provide continuous feedback and encouragement while teaching (kamens, 2007). benefits include emotional and professional support (e.g. bullough et al., 2002; goodnough et al., 2009; stairs et al., 2009; tsybulsky & muchnik-rozanov, 2019), increased dialogue (e.g. sorensen, 2014), support for professional development (e.g. goodnough et al., 2009) and personal growth (e.g. barahona, 2017; dang, 2013; simons et al., 2020). it can also help to reduce feelings of isolation (kelchtermans, 2006). studies also highlight the importance of peer cooperation as a learning approach (e.g. johnson & johnson, 2009; topping, 2005). student teacher cooperation can be seen as an opportunity to learn from each other, for example through peer coaching and mentoring (e.g. howlett & nguyen, 2020; wynn & kromrey, 2000). specifically, peer cooperation can be a mutual stimulus during co-planning through the exchange of ideas (tsybulsky, 2019).
during paired field placements, peer cooperation may involve reflective co-generative dialogue to improve teaching quality (birrell & bullough, 2005; wassell & lavan, 2009) – for instance, peers discuss issues that impact teaching and learning in order to collaboratively develop solutions through reflective discussions (scantlebury et al., 2008). student teacher peer cooperation offers mutual guidance in the classroom by sharing responsibilities in managing student learning (darragh et al., 2011). other advantages may be related to classroom management, as two people monitor school students in the classroom (kamens, 2007; nokes et al., 2008). peer cooperation also seems to support student teachers in coping with stress (birrell & bullough, 2005; goodnough et al., 2009).
3.3 challenges of student teacher peer cooperation
despite these benefits, research has shown that student teacher peer cooperation can be challenging and even fail. working with a peer might be an unfamiliar situation that can lead to difficulties (bashan & holsblat, 2012). difficulties are related to a peer’s willingness to cooperate, as peer placements often do not reflect a realistic teaching situation in daily work (gardiner & robinson, 2011). student teacher concerns also relate to the worry that there will be fewer teaching opportunities during peer placements (kamens, 2007). lack of time and increased workload are among the most frequently mentioned challenges during team teaching (e.g. nokes et al., 2008). instructional activities must be coordinated, which means that additional time is required (simons et al., 2020; tsybulsky, 2019). additionally, cooperation partners need to be able to negotiate and discuss concerns in a way that is mutually beneficial, and discrepant perspectives may lead to conflicts (nokes et al., 2008). concerns about differences in teaching styles during student teacher cooperation are cited as disadvantages (nokes et al., 2008).
tensions may occur when student teachers provide peer feedback (shin et al., 2007). although peer feedback might be more detailed and frequent than that of teachers (gardiner & robinson, 2009), it is also criticized as being too lenient and for its lack of quality (baeten & simons, 2014). there is evidence that challenges of peer cooperation are related to the process of team teaching and that challenges might differ according to the specific teaching task. for example, a case study conducted by kamens (2007) showed that difficulties in student teacher cooperation are related to tensions arising during preparation time. student teachers face challenges when structuring cooperative activities during co-planning, and disagreement during the planning activities results in working independently (dieker, 2001). effective co-planning requires building a common understanding of shared goals (mastropieri et al., 2005) as well as daily interactions on a regular basis to foster ongoing discussions and reflections on teaching (gallo-fox & scantlebury, 2016; murawski & lochner, 2011). however, research into student teacher challenges in field placements that takes the process of team teaching (planning/preparation, instruction and reflection) into account remains scarce. furthermore, little is known about how student teachers cope with the challenges they face during peer cooperation.
3.4 coping with challenges during student teacher peer cooperation
although student teachers often view field experiences as the most valuable parts of teacher education, they also consider them as stressful experiences (admiraal, 2020; macdonald, 1993; murray-harvey et al., 2000). stress in field placements can result from a lack of strategies for managing classroom interaction (clunies-ross et al., 2008; heikonen et al., 2017). a study conducted by murray-harvey et al.
(2000) revealed that student teachers regarded mentor teachers as the most important source for coping with stress, but little is known about whether they manage challenges individually or together, or about how they cope with challenges in different phases of peer cooperation during paired field placements. individual patterns and activities for dealing with challenges can be described as coping strategies, which help “to manage specific external and/or internal demands that are appraised as taxing or exceeding the resources of the person” (lazarus & folkman, 1984, p. 141). according to lazarus and folkman (1984), coping can be divided into two categories: emotion-focused (the individual regulates emotions under distress) and problem-focused (the individual avoids or makes efforts to solve the problem causing the distress). endler and parker (1999) added a third category, task-oriented coping, which refers to strategies used to solve the problem by reconceptualizing it. there are several studies that include a variety of coping actions, categorizations and concepts (carver et al., 1989; schwarzer & schwarzer, 1996; skinner et al., 2003). coping strategies are also categorized as active (efforts to deal with problems) or inactive (avoiding the problems). in the context of the teaching profession, two additional strategies are discussed that are predominantly used by teachers in addressing misbehaviour in classrooms (clunies-ross et al., 2008; wilks, 1996). proactive coping strategies include future-oriented efforts to anticipate, influence and control events, such as setting clear rules in a classroom or altering a situation before problems escalate. they include behaviours to cope with challenges, such as acting in advance to prevent either an event or a potential future stressor (aspinwall & taylor, 1997).
reactive coping strategies are defined as “immediate and spontaneous responses focused on an event that had already occurred” (heikonen et al., 2017, p. 540). they are conceptualized as strategies to tackle and deal with situations in the classroom after they have turned into problems (clunies-ross et al., 2008). according to reupert and woodcock (2010), student teachers reported that they are more likely to use reactive strategies in the classroom, although they considered proactive strategies to be more effective. compared to student teachers who tend to use reactive coping strategies (e.g. avoidance strategies), student teachers who use proactive coping strategies (e.g. problem-solving strategies) tend to experience less stress (gustems-carnicer et al., 2019). heikonen et al. (2017) showed that student teachers used reactive behavioural strategies most frequently in challenging classroom situations in order to take control of the situation. reactive coping strategies are thus often characterized as “survival-oriented” (heikonen et al., 2017, p. 544). coping strategies depend on the situation, and results from studies of effective coping strategies for managing school student (mis)behaviour (e.g. admiraal et al., 2000; clunies-ross et al., 2008; heikonen et al., 2017) cannot simply be translated to other situations. there is evidence that cooperation with colleagues might be helpful (blase, 1989; lindqvist et al., 2017). lindqvist et al. (2020) found that student teachers use various methods of cooperation as forms of proactive coping, such as seeking help and guidance from other more experienced colleagues, when challenges arise. student teachers share adversity with a trusted ally when conflicts arise with other teachers. common collaborative coping strategies included getting allies to address issues cooperatively. there is also evidence that using avoidance is common as a reactive strategy in order to cope with stress (gustems-carnicer et al., 2019).
however, there is still a need for knowledge on how student teachers cope with challenging situations during paired field placement when they are partnered with a peer.
4. the current study
along with the benefits of team teaching during paired field placements (baeten & simons, 2014; dang, 2013), studies have revealed the difficulties experienced by student teachers when they work with peers (kamens, 2007). studies on team teaching have primarily focused on cooperation during instruction (e.g. baeten & simons, 2016). however, the teaching and cooperation process also includes preparation and reflection. this study thus aimed to identify and understand the challenges that student teachers face during the overall process of team teaching. we also tried to gain a better understanding of how student teachers manage, adapt and respond to challenges in the different phases of team teaching. a knowledge of how student teachers cope with those challenges could inform teacher education programmes regarding how to promote student teacher cooperation skills at an early stage of their training. based on earlier work by baeten and simons (2014) regarding the advantages and disadvantages of team teaching in paired field experiences, we addressed the following research questions:
rq1: which forms of cooperation do student teachers describe during different phases of team teaching (planning, instruction, reflection) in paired field placements?
rq2: what challenges of peer cooperation during different phases of team teaching (planning, instruction, reflection) in paired field placements do student teachers report?
rq3: how do student teachers cope with peer cooperation challenges during different phases of team teaching (planning, instruction, reflection) in paired field placements?
5. methodology
5.1 context and participants
the present qualitative study was part of a larger mixed-methods research project, “cooperation in field experiences” (2014–2017), that was a joint venture between the institute of educational science, department of research in school and instruction, at the university of bern and the institute of primary education of the university of teacher education in bern. the aim of the project was to examine student teacher cooperation during paired field placements to understand which forms of cooperation are realized and how cooperation skills among student teachers are developed. all participants for this substudy were enrolled in a teacher preparation programme for pre-primary and primary education at the university of teacher education in bern. this programme leads to a primary school teacher bachelor’s degree (180 ects) and includes five practica structured into three teaching practice modules (43 ects). the practica last two to six weeks and four of the five practica are organized in pairs with partners from the same semester. pairing is mainly based on organizational reasons, i.e. pairing enables field placements for all student teachers despite a shortage of capacity in schools. pairing is also expected to support the development of student cooperation skills (gardiner & robinson, 2009). however, no explicit curricular guidelines or mentor teacher preparation exist. as most of the student teachers did not know each other at the beginning of their studies, they are randomly assigned into pairs for the first practicum. in the three subsequent practica, student teachers have a say in pairing and can choose a partner. the last practicum is placed close to graduation, covers six weeks and is organized as an individual practicum (table 1).
table 1: overview of teacher preparation programme: field placements
teaching practice module 1 | teaching practice module 2 | teaching practice module 3
orientation practicum; basics of teaching | specialized internship; subject-related teaching and learning | individual practicum; multi-perspective approach; individual focus
clarifying professional aptitude through critical reflection on the career objective. student teachers dealt with the basics of teaching and gave a self-assessment of their professional aptitude. | implementation of various concepts of lesson planning and instruction. student teachers create learning settings based on school students’ needs. | student teachers improve their competence by comprehensively exploring their profession on an individual basis. the student teachers take on the role of a classroom teacher or learn administrative procedures.
practicum 1 (8 half days and 2 weeks block); practicum 2 (2 weeks) | practicum 3 (3 weeks); practicum 4 (4 weeks) | practicum 5 (5 weeks)
with partner; with partner | with partner; with partner | single

three teaching practice modules (tpm1–3) can be differentiated. tpm1 includes an orientation practicum (practicum 1) at the beginning of student teachers’ studies that consists of eight half days and two weeks, as well as a second two-week practicum (practicum 2) at the end of the second semester. both practica focus on an induction into the basics of teaching, and student teachers are expected to clarify professional suitability through a critical examination of career goals. tpm2 covers two practica (practicum 3 and 4) during the second academic year, which last three and four weeks and focus on learning and teaching. student teachers apply different approaches in planning and instruction and should learn to consider the learning of school students. tandem partners in tpm1 are usually different in tpm2.
tpm3 involves a final six-week long-term practicum in the third year (practicum 5), which aims to support student teachers’ professional growth. student teachers set individual goals for teaching and learning in the last internship. across all practica, student teachers are mentored by expert school teachers and are occasionally visited by university supervisors. further support includes a parallel module at the university of teacher education in bern, comprising a teaching practice group and a specialist support group. the teaching practice group consists of several student teachers, a university supervisor and a mentor teacher and provides the opportunity to participate in and instigate discussions about field experiences. student teachers can connect and share their experiences with peers and obtain advice from university supervisors and mentor teachers. this mandatory module, however, does not explicitly include discussions related to team teaching in paired placements. there are also no guidelines or recommendations regarding student teacher peer cooperation. student teachers are encouraged to actively take responsibility for their own professionalization. although mentor teachers are expected to carefully supervise student teachers, they are not explicitly instructed to mentor paired field experiences.
5.2 sample
a random selection of 70 out of nearly 200 students who had already participated in the framework study were invited to participate in the interviews. overall, 37 student teachers volunteered to participate. five students had to be excluded for personal or organizational reasons, and two students withdrew. the participants of the final sample (n=30) included 15 student teachers with a focus on preschool education and lower primary school (k-2) and 15 with a focus on upper primary education (3–6). student teachers were, on average, 22.6 years old (sd=2.18; range=21–30).
the distribution of female (90%) and male (10%) students corresponds to the high proportion of female students in preschool and primary school levels at the teacher education institute. all participants had given full verbal consent for a scientific use of their answers. confidentiality of data was ensured. participation was fully voluntary, and participants were given the opportunity to quit the interview at any time.
5.3 data collection
data collection was conducted during the 2016/2017 academic year at the university of bern. data were collected in november 2016 by the first author. each interview lasted approximately 60 to 90 minutes and took place on the university campus.

table 2: semi-structured interview topics, guiding questions and examples of categories
topics | interview questions | categories
forms of cooperation (little, 1990) | how did you work together during planning/instruction/reflection? | storytelling and scanning for ideas, aid and assistance, sharing, joint work
team teaching model (baeten & simons, 2014) | (see question on instruction above) | observation model, coaching model, assistant teaching model, equal status model, sequential, parallel and station teaching, teaming model
challenges of team teaching (baeten & simons, 2014) | what were the challenges with peer cooperation during planning/instruction/reflection? | personality differences, differences in personal goals and values, lack of compatibility, lack of flexibility, less individual teaching, increased workload, lack of clarity in terms of responsibilities, lack of feedback from mentor teachers, difficulty with peer feedback, challenges for school students
conflicts (brody, 2012; friedman et al., 2000; putnam & wilson, 1982) | (same question as for challenges of team teaching) | relationship, work style and communication conflicts

data collection was performed using an in-depth, semi-structured individual interview (table 2), which was conducted twice for each participant, for both tpm1 and tpm2 at the same interview meeting.
the interview guide was pretested with two students. after the pretest interviews, minor modifications were made to the interview guide (e.g. the order of two interview questions was revised and individual questions were removed or added). student teachers were asked to describe their experiences with paired field placements for each of teaching practice modules 1 and 2. at the end of the interviews, all student teachers were invited to generally comment on the challenges during both teaching practice modules. student teachers were invited to give multiple answers to each question.
5.4 data analysis
data were analysed using mayring’s (2010) approach of structuring qualitative content analysis by determining meaning units and categories referring to the research questions. the interview material was categorized using a deductive–inductive coding scheme to structure the content. the deductive coding scheme was developed for each research question by selecting the following categories derived from prior theory: (1) in order to cover the whole teaching process, we combined little’s (1990) four forms of cooperation that refer to the typical life of teachers in school with baeten and simons’ (2014) five team teaching models that explicitly focus on cooperation during instruction (table 4) for the analyses of student teacher cooperation in paired field placements; (2) the categories for the challenges of team teaching were derived from the extended categorization according to baeten and simons (2014) (table 5); (3) finally, text segments were coded according to the use of proactive or reactive behavioural strategy, as identified by putnam and wilson (1982), friedman et al. (2000) and brody (2012). we sorted the challenges into three main types: relationship, work style and communication conflicts. then, we differentiated between proactive and reactive coping.
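the deductive part of the coding scheme described above can be pictured as a small data structure. the following python sketch is only an illustration under stated assumptions: the category labels come from the text, but the dictionary layout, the names `CODING_SCHEME`, `CodedSegment`, `validate` and `tally`, and the demo segments are hypothetical and are not the authors' procedure or data. inductively added categories are not modelled.

```python
from collections import Counter
from dataclasses import dataclass

# Category labels are taken from the text; arranging them as a
# dictionary of sets is an assumption made for illustration only.
CODING_SCHEME = {
    "form_of_cooperation": {
        "storytelling and scanning for ideas",
        "aid and assistance",
        "sharing",
        "joint work",
    },
    "team_teaching_model": {
        "observation",
        "coaching",
        "assistant teaching",
        "equal status",
        "teaming",
    },
    "conflict_type": {"relationship", "work style", "communication"},
    "coping_strategy": {"proactive", "reactive"},
}
PHASES = {"planning", "instruction", "reflection"}


@dataclass(frozen=True)
class CodedSegment:
    """One coded meaning unit from an interview (hypothetical structure)."""
    participant: str
    phase: str
    dimension: str
    category: str


def validate(segment: CodedSegment) -> None:
    """Raise ValueError if the segment uses an unknown phase or category."""
    if segment.phase not in PHASES:
        raise ValueError(f"unknown phase: {segment.phase}")
    if segment.category not in CODING_SCHEME.get(segment.dimension, set()):
        raise ValueError(
            f"unknown category {segment.category!r} for {segment.dimension!r}"
        )


def tally(segments) -> Counter:
    """Count coded segments per (phase, dimension, category)."""
    counts = Counter()
    for seg in segments:
        validate(seg)
        counts[(seg.phase, seg.dimension, seg.category)] += 1
    return counts


# Invented demo segments, not data from the study.
demo = [
    CodedSegment("demo_1", "planning", "coping_strategy", "reactive"),
    CodedSegment("demo_2", "planning", "coping_strategy", "reactive"),
    CodedSegment("demo_1", "instruction", "coping_strategy", "proactive"),
]

for key, n in sorted(tally(demo).items()):
    print(key, n)
```

keeping phase, dimension and category as separate fields mirrors the way the research questions cross the three phases of team teaching with forms, challenges and coping strategies.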
proactive strategies aim to find a solution to the problem, engaging in activities for goal achievement/pursuit (example: “she could have given me a little more sophisticated feedback ... i told her ‘yeah, can you observe me a little bit better during my lesson?’” (st_28)). reactive strategies imply reduced efforts to deal with the stressor and use avoidance and distraction from active engagement to address the problem (example: “scheduling wasn’t good because it was like ‘you said you're going to take this (!) to our meeting and then it’s not available’ ... can’t fully trust her to do her part” (st_3)).

table 3: coding scheme for forms of peer cooperation according to little (1990)

| category | coding method | example |
| --- | --- | --- |
| storytelling and scanning for ideas | students gain information through quick occasional exchanges of stories and experiences. informal exchange takes place outside of active class hours | “so the train ride was kind of our time slot for exchange” (st_6) |
| aid and assistance | students give each other advice or share ideas on specific teaching situations. concrete assistance is given when explicitly requested | “if you had a problem, you asked the other for assistance” (st_1) |
| sharing | students exchange materials, ideas, opinions or methods. access to the material is granted, ideas and aspects of the work are revealed | “... we just stayed there after school and discussed ... we could help each other a lot” (st_22) |
| joint work | students engage in joint work, share responsibility and goals. students discuss different views and opinions through collective actions. | “when we practised team teaching, we planned together in detail from the very beginning” (st_8) |
| new categories: | | |
| arrangements and division into teaching lessons | students make arrangements and divide teaching lessons | “we have always arranged who is day-responsible or lesson-responsible” (st_4) |
| receiving feedback from mentor teachers | students receive feedback from the mentor teacher | “we talked about it, lesson by lesson, and the mentor teacher gave feedback” (st_5) |

as can be seen in table 3, little’s (1990) work on cooperation refers predominantly to the description of general collaborations between teachers and does not cover all cooperation activities during field experiences. accordingly, feedback from mentor teachers and division of work and teaching need to be added. moreover, we complemented the coding scheme based on little (1990) with a specific focus on various forms of independent teaching (see table 4) in the paired setting as introduced by baeten and simons (2014). baeten and simons’ (2014) differentiation into forms of team teaching helped to specify and explain how the division of lessons has been practised and experienced by the student teachers. as a final step (table 5), we applied the analysis of related challenges that were identified by baeten and simons (2014).

table 4: coding scheme for team teaching models according to baeten and simons (2014)

| category | example |
| --- | --- |
| observation model | “after the lesson observation we gave each other feedback” (st_2) |
| coaching model | “ ... we also supported each other when it came to coaching” (st_9) |
| assistant teaching model | “one person was responsible and the other assisted” (st_28) |
| equal status model | |
| sequential teaching | “she read the story to the children and then i took over” (st_25) |
| parallel teaching | “we had two classes and everyone taught their class with the same content” (st_8) |
| station teaching | “we were working on the theme ‘air’ and we had 27 different stations” (st_9) |
| teaming model | “ ... we also did team teaching together, planning a whole lesson, teaching and giving each other feedback later” (st_17) |

table 5: coding scheme for challenges related to baeten and simons (2014)

| field of challenges | main categories | subcategories | examples |
| --- | --- | --- | --- |
| relationship conflict | personality differences | disagreement | “we didn’t get along, we saw each other as rivals. i’m more of a lone wolf” (st_13) |
| | differences in personal goals and values | insists on own opinion | “ ... she wanted to do it her way” (st_1) |
| work style conflicts | lack of compatibility | different perceptions and ideas | “both have completely different ideas” (st_2) |
| | | different working styles | “she has a different work rhythm, which was difficult!” (st_12) |
| | | mismatch of individual teaching styles | “ ... different types of teaching, someone wants more group work and the other doesn’t mind” (st_11) |
| | lack of flexibility | | “ ... i had to stick to the agreement we previously made” (st_5) |
| communication conflicts | lack of clarity regarding responsibilities | | “ ... not quite clear who’s really in charge now” (st_20) |
| | lack of feedback from mentor teachers | feedback from mentor teachers more valued | “ ... feedback from the mentor teacher is just more important to me” (st_12) |
| | difficulty with peer feedback | destructive feedback | “ ... feedback she gave me was like ‘that wasn't good, you have to do it in a different way’” (st_1) |
| | | incapable of criticism | “she could not handle my feedback” (st_22) |
| increased workload | | time-consuming | “the same thing is discussed three or four times” (st_10) |
| | | organizational issues | “the practicum location was too far away to meet before classes started” (st_7) |
| less individual teaching | | | “ ... it’s rather unusual that you teach together” (st_14) |
| challenges for school students | | | “ ... it's a little bit confusing for the smaller children when they have multiple teachers in class” (st_6) |

the coding scheme was developed including definitions, anchor examples and coding rules for the main categories and subcategories (mayring, 2010). anchor examples for each category were extracted from the interviews (see table 3). additional subcategories were developed and inductively added when student teachers stated a challenge not mentioned in the previously discussed literature. the coding unit was the answer to the question, i.e. as smallest component a single word, for example, the answer “division” as response to the question of team teaching practices. the category system was applied using data analysis software maxqda 18. all interviews were coded by the first author and two independent co-raters each coded half of the interviews. intercoder reliability was tested by comparing the results of the initial rating by the first author with one of the other two raters. the corrected cohen’s kappa coefficients (brennan & prediger, 1981) indicate an interrater reliability of high agreement. kappa values range from 0.73 to 1.00 (see appendix table a1, a2 and a3). additionally, we defined an illustrative case to offer an exploratory view of how student teachers coped with the challenges during the process of team teaching.

6 results

6.1 rq1: which forms of peer cooperation do student teachers describe during different phases of team teaching (planning, instruction, reflection) in paired field placements?

the first analysis of forms of team teaching was based on little’s (1990) approach. in addition to little’s four main categories – (1) storytelling and scanning for ideas, (2) aid and assistance, (3) sharing and (4) joint work – two more categories emerged from the data: (5) agreements and division of lessons that were related to planning and instruction and (6) receiving feedback from mentor teachers that was reported for the reflection phase (table 6).
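as a rough illustration of the reliability check described in the data analysis section, the sketch below computes a chance-corrected agreement coefficient in the form usually attributed to brennan and prediger (1981): kappa_n = (p_o - 1/q) / (1 - 1/q), where p_o is the observed proportion of agreement and q is the number of available categories. the function name, the example codes and the choice of q = 4 are illustrative assumptions; they are not the study's data, and the sketch is not the authors' maxqda procedure.

```python
from typing import Sequence


def brennan_prediger_kappa(
    rater_a: Sequence[str], rater_b: Sequence[str], n_categories: int
) -> float:
    """Chance-corrected agreement with a uniform chance term of 1/q."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same, non-empty set of segments")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    # Observed agreement: share of segments that received the same code.
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    p_observed = agreements / len(rater_a)
    # Chance agreement is assumed to be uniform over the q categories.
    p_chance = 1.0 / n_categories
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical codes for eight segments (not taken from the study).
first_author = ["sharing", "joint work", "aid", "storytelling",
                "sharing", "aid", "joint work", "storytelling"]
co_rater = ["sharing", "joint work", "aid", "storytelling",
            "sharing", "aid", "sharing", "storytelling"]

kappa = brennan_prediger_kappa(first_author, co_rater, n_categories=4)
print(round(kappa, 2))  # 0.83
```

with seven of eight segments coded identically, p_o is 0.875 and the coefficient is about 0.83, which would fall inside the range of values reported above.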
table 6: overview of forms of peer cooperation (according to little, 1990)

| category | planning: codes count tpm1/tpm2 | planning: percentage tpm1/tpm2 | instruction: codes count tpm1/tpm2 | instruction: percentage tpm1/tpm2 | reflection: codes count tpm1/tpm2 | reflection: percentage tpm1/tpm2 |
| --- | --- | --- | --- | --- | --- | --- |
| storytelling and scanning for ideas | 18/6 | 21%/9% | -/- | -/- | 10/16 | 12%/30% |
| aid and assistance | 22/9 | 25%/14% | 21/24 | 39%/35% | 26/9 | 30%/17% |
| sharing | 17/19 | 20%/29% | -/- | -/- | 16/6 | 19%/11% |
| joint work | 11/10 | 12%/15% | -/- | -/- | 13/8 | 15%/15% |
| agreements and division of lessons | 19/22 | 22%/33% | 33/45 | 61%/65% | -/- | -/- |
| receiving feedback from mentor teachers | -/- | -/- | -/- | -/- | 21/15 | 24%/28% |
| total n tpm1/tpm2 | 87/66 | | 54/69 | | 86/54 | |

note: tpm1=results regarding teaching practice module 1; tpm2=results regarding teaching practice module 2. calculation of the percentage: the total number of codes was divided by the number of responses for each phase of cooperation.

6.1.1 planning

agreements and divisions of lessons were described during both tpms (tpm1: 22%; tpm2: 33%), for example: “the division was actually quite simple: we determined who would teach which part of the lessons, do the introduction or sometimes we divided whole days” (st_13). student teachers mentioned aid and assistance as being provided through giving advice or sharing ideas on specific situations (tpm1: 16%; tpm2: 2%) and concrete assistance when asked (tpm1: 9%; tpm2: 12%): “if you had a problem, you asked the other, you could get a little help” (st_6). storytelling and scanning for ideas (tpm1: 21%; tpm2: 9%) included two categories: informal exchange (tpm1: 15%; tpm2: 5%) and exchange of ideas (tpm1: 6%; tpm2: 3%): “we shared ideas on how to stimulate lessons to keep children attentive.” (st_3). the practices of cooperation that little (1990) defined as sharing (share responsibilities and goals, discussions about different views and opinions and provide feedback) were reported for both tpms (tpm1: 20%; tpm2: 29%).
sharing included two categories as well (exchange of opinions and methods; exchange of material and ideas): “we always set up a basic concept and shared our ideas with each other, like ‘did you find out something about that, too? how are we going to do this?’ ...” (st_9). joint work was rarely reported (tpm1: 12%; tpm2: 15%). in summary, the results revealed that cooperation during planning was primarily focused on division of work. as a notable difference, student teachers described agreements and division more frequently in tpm2 than in tpm1 (tpm1: 22%; tpm2: 33%).

6.1.2 instruction

a considerable number of answers aligned with the forms of peer cooperation according to little (1990) during both tpms (tpm1: 54 responses; tpm2: 69 responses); however, only two forms could be identified. the majority of codes related to the category agreements and division of lessons (tpm1: 61%; tpm2: 65%), followed by aid and assistance (tpm1: 39%; tpm2: 35%), which included two subcategories, concrete assistance when asked (tpm1: 28%; tpm2: 29%) and give advice/share ideas on specific situations (tpm1: 11%; tpm2: 6%). student teachers often described divided lessons for separate instruction, and no extensive form of peer cooperation, as defined by little (1990), could be identified. however, our data showed various forms of peer cooperation during instruction, and thus the analysis of the instruction, according to little’s forms of cooperation, left a number of answers uncoded. the specific focus on the three phases of teaching (planning, instruction, reflection) revealed additional forms of cooperation that were specifically practised during instruction. we therefore augmented the analysis with the team teaching forms of baeten and simons (2014), which explicitly focus on cooperation during instruction in field placements and help to explain more clearly how division of lessons has been implemented. a total of 54 (tpm1) and 69 (tpm2) responses were coded.
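the instruction-phase percentages reported in table 6 follow from dividing each category's code count by the number of responses in that phase (54 for tpm1, 69 for tpm2). a minimal sketch of that arithmetic, using the counts from table 6, is given below; the function and variable names are hypothetical, and rounding to whole percentages is an assumption about how the figures were reported.

```python
def code_shares(counts: dict, total: int) -> dict:
    """Return each category's share of the phase total as a whole percentage."""
    return {category: round(100 * n / total) for category, n in counts.items()}


# Code counts for the instruction phase, as listed in table 6.
instruction_tpm1 = {"aid and assistance": 21, "agreements and division of lessons": 33}
instruction_tpm2 = {"aid and assistance": 24, "agreements and division of lessons": 45}

shares_tpm1 = code_shares(instruction_tpm1, total=54)
shares_tpm2 = code_shares(instruction_tpm2, total=69)

print(shares_tpm1)  # {'aid and assistance': 39, 'agreements and division of lessons': 61}
print(shares_tpm2)  # {'aid and assistance': 35, 'agreements and division of lessons': 65}
```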
according to the multiple response format, the analysis revealed that student teachers described, on average, two different forms of team teaching that they used during instruction (table 7).

table 7: forms of team teaching during instruction (according to baeten & simons, 2014)

| category | codes count tpm1/tpm2 | percentage tpm1/tpm2 |
| --- | --- | --- |
| observation model | 3/2 | 5%/3% |
| coaching model | 1/1 | 2%/1% |
| assistant teaching model | 22/18 | 39%/25% |
| equal status model | 23/38 | 41%/53% |
| parallel teaching | 3/3 | 5%/4% |
| sequential teaching | 19/31 | 34%/43% |
| station teaching | 1/4 | 2%/6% |
| teaming model | 4/5 | 7%/7% |
| individual teaching | 3/8 | 6%/11% |
| total | 56/72 | |

note: tpm1=results regarding teaching practice module 1; tpm2=results regarding teaching practice module 2.

the most frequently reported team teaching model according to baeten and simons (2014) was the equal status model and its variations (tpm1: 41%; tpm2: 53%). sequential teaching as one of the three subforms of the equal status model (parallel, sequential and station teaching) was selected as a favourite teaching strategy for peer cooperation (tpm1: 34%; tpm2: 43%). more than two out of three student teachers reported using sequential teaching during the two tpms: “we divide subjects and classes ... for example, ‘now ms. a. does this with you and then ms. w. does this with you’. i can still remember a music lesson where we rehearsed a dance with the class. i started by demonstrating and then she demonstrated more moves” (st_3). student teachers also described cooperative practices as equal partners: “one was always in charge during their sequences but both had joint responsibility ... we agreed when we would do what” (st_2). the second most preferred team teaching model was the assistant teaching model (tpm1: 39%; tpm2: 25%): “during the circle the one who was not teaching took a seat in the circle and intervened, or helped some children depending on their behaviour” (st_20).
student teachers rarely reported using parallel teaching (tpm1: 5%; tpm2: 4%) or station teaching (tpm1: 2%; tpm2: 6%).

6.1.3 reflection

for the reflection phase, student teachers most frequently mentioned aid and assistance in tpm1 (tpm1: 30%; tpm2: 17%) and storytelling and scanning for ideas in tpm2 (tpm1: 12%; tpm2: 30%). aid and assistance was given through feedback/exchange about instruction: “we undertook reflection either at noon or in the afternoon. we just discussed in general what we had observed about teaching, what we had noticed” (st_29). receiving feedback from mentor teachers was the second most frequent form for both tpms (tpm1: 24%; tpm2: 28%). notably, student teachers reported that reflection was often guided or initiated by the mentor teachers: “we actually didn’t do a lot of reflection. more with the mentor teacher. just in a threesome combination. but usually the mentor teacher did the reflection” (st_10).

6.2 rq2: what challenges of peer cooperation during different phases of team teaching (planning, instruction, reflection) in paired field placements did student teachers report?

as has been shown, the three different phases of team teaching are associated with various forms of peer cooperation, with a majority of divided and rather independent teaching practices during instruction. accordingly, we categorized the challenges that student teachers reported according to these three phases. table 8 shows that student teachers most frequently reported challenging experiences during instruction (n=76), followed by planning (n=65) and fewer experiences during reflection (n=36).
table 8: challenges of peer cooperation (extended categorization according to baeten & simons, 2014)

| field of challenges | planning (n=65) | instruction (n=76) | reflection (n=36) |
| --- | --- | --- | --- |
| relationship conflicts (31%/11%/8%) | personality differences (10%); differences in personal goals and values (21%) | personality differences (4%); differences in personal goals and values (7%) | personality differences (5%); differences in personal goals and values (3%) |
| work style conflicts (35%/47%/-) | different perceptions and ideas (16%); different working styles (11%); lack of flexibility (8%) | mismatch of individual teaching styles (16%); lack of flexibility (26%); less individual teaching (5%) | - |
| increased workload (43%/-/42%) | time-consuming (29%); organizational issues (5%) | - | time-consuming (42%) |
| communication conflicts (-/28%/50%) | - | lack of clarity with regard to responsibilities (28%) | lack of feedback from mentor teachers (17%); destructive feedback (14%); incapable of criticism (19%) |
| challenges for school students (14%) | - | challenges for school students (14%) | - |

note: whereas the interview question regarding experiences with peer cooperation (rq1) was separated for the two practice phases (tpm1, tpm2), the interview question regarding the challenges of peer cooperation addressed challenges during the three phases of team teaching in general. the results thus include tpm1 and tpm2.

6.2.1 planning

the primary challenges were due to increased workload (43%), work style conflicts (35%) and relationship conflicts (31%). issues considered frequently were differences in personal goals and values (21%) and personality differences (10%): “she didn’t want to accept any other opinions. we couldn’t find a common solution” (st_22). lack of compatibility became evident through differences in perceptions and ideas (16%), as well as working styles (11%): “both of us have completely different ideas during the preparation, actually different preparation strategies, too” (st_2). student teachers also expressed displeasure about the lack of flexibility when forced to follow agreements (8%).

6.2.2 instruction

most of the conflicts during instruction could be categorized into two fields of challenges: work style conflicts (47%) and communication conflicts (28%). the most commonly cited challenge in cooperation was lack of clarity regarding responsibilities (28%): “when we taught together it was maybe a little bit difficult as to who was speaking now in class” (st_19). also, challenges regarding lack of flexibility (26%) were mentioned due to pressure to adhere to pre-agreements: “we don’t get along very well. we kind of agreed who’s in charge but still i would like to express my opinion. but i had to stick to the agreement we made before” (st_22). an additional 14% of the challenges were related to school students, as unclear teacher roles led to confusion among school students: “when you have different teachers, it is always difficult for the kids, like ‘what am i not allowed to do in this class again?’ ...” (st_6). a small number of student teachers indicated that their partner insisted on their own opinion during the instruction (7%).

6.2.3 reflection

along with increased workload (42%) and, in particular, time-related issues (e.g. “sometimes the same thing is discussed three or four times, which can be exhausting” (st_11)), issues during reflection most often concerned communication conflicts (50%). student teachers did not feel comfortable giving peer feedback, believing that not all peers are able to handle criticism (19%) (e.g. “cannot accept the criticism or take it positively” (st_22)) and 14% had to deal with destructive feedback. a lack of feedback from mentor teachers was seen as a challenge (17%), for example, “a mentor teacher can draw on their experience” (st_3).

in sum, most of the challenges of peer cooperation were experienced during instruction (76 codes), whereas reflection was perceived to be the least challenging (36 codes). however, challenges did already arise in the earliest stage of peer cooperation, as the second most frequent challenges (65 codes) were reported during planning. this finding points to the importance of the quality of peer cooperation during lesson preparation. based on putnam and wilson’s (1982), friedman et al.’s (2000) and brody’s (2012) fields of challenges, increased workload and work style conflicts were most frequently reported. in particular, we found that the three different phases differed not only regarding challenge frequency but also challenge categories. planning proved to be especially difficult due to increased workload (43%), work style conflicts (35%) and relationship conflicts (31%). instruction was difficult due to work style conflicts (47%) and communication conflicts (28%), whereas reflection was difficult due to communication conflicts (50%) and workload (42%). given that the predominant form of peer cooperation was division of work and rather independent teaching, such as assistant teaching and equal status, the question of how student teachers cope with these conflicts seems to necessitate closer attention.

6.3 rq3: how do student teachers cope with peer cooperation challenges during different phases of team teaching (planning, instruction, reflection) in paired field placements?
in the next step, we aligned the challenges that individual student teachers reported in the three phases of team teaching with their use of coping strategies. we focused this analysis on the three challenges that are related to peer interaction, i.e. relationship conflicts, work style conflicts and communication conflicts. as seen in table 9, in coping with the perceived challenges, student teachers reported the use of more reactive than proactive coping strategies.

table 9: overview of the frequency of challenges and coping strategies by team teaching phases by student teachers

| phase | challenges (n) | reactive coping (n) | proactive coping (n) |
| --- | --- | --- | --- |
| planning | relationship conflicts (3) | 2 | 1 |
| planning | work style conflicts (12) | 8 | 4 |
| planning | communication conflicts (5) | 5 | 0 |
| instruction | relationship conflicts (2) | 0 | 2 |
| instruction | work style conflicts (10) | 7 | 3 |
| instruction | communication conflicts (4) | 1 | 3 |
| reflection | relationship conflicts (5) | 3 | 2 |
| reflection | communication conflicts (2) | 1 | 1 |

(n) total numbers of student teachers. note: overall numbers of challenges do not add up to 30 student teachers because not all challenges were encountered by each student teacher. in particular, in case of a lack of cooperation, no challenges were mentioned.

in order to better understand how student teachers cope with peer cooperation challenges, we selected two illustrative cases that included both reactive and proactive coping strategies. tom predominantly uses reactive coping strategies (table 10), whereas luisa uses both reactive and proactive coping strategies to manage challenges (table 11). these two cases, tom and luisa, were also selected because both reported challenges that occur in all three phases of cooperation. thus, both illustrative cases serve as examples of forms of cooperation and challenges in paired field placements and how student teachers tried to solve conflicts.
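the statement that reactive strategies were reported more often than proactive ones can be checked by tallying the counts in table 9. the sketch below does this overall and per phase; the data structure and variable names are hypothetical, and the counts are simply those listed in table 9.

```python
# (phase, challenge, reactive count, proactive count) as listed in table 9.
table_9 = [
    ("planning", "relationship conflicts", 2, 1),
    ("planning", "work style conflicts", 8, 4),
    ("planning", "communication conflicts", 5, 0),
    ("instruction", "relationship conflicts", 0, 2),
    ("instruction", "work style conflicts", 7, 3),
    ("instruction", "communication conflicts", 1, 3),
    ("reflection", "relationship conflicts", 3, 2),
    ("reflection", "communication conflicts", 1, 1),
]

# Overall tallies across all phases and challenge types.
reactive_total = sum(row[2] for row in table_9)
proactive_total = sum(row[3] for row in table_9)

# Per-phase tallies as (reactive, proactive) pairs.
by_phase = {}
for phase, _challenge, reactive, proactive in table_9:
    prev_reactive, prev_proactive = by_phase.get(phase, (0, 0))
    by_phase[phase] = (prev_reactive + reactive, prev_proactive + proactive)

print(reactive_total, proactive_total)  # 27 16
print(by_phase)
```

on these counts, the surplus of reactive coping comes mainly from the planning phase, while reactive and proactive mentions are balanced during instruction.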
6.3.1 tom

tom’s case study shows how students deal with challenges when they use reactive strategies in all three phases of collaboration.

table 10: overview of case study tom

| tom’s case | form of cooperation | challenges | coping strategy |
| --- | --- | --- | --- |
| planning | division of labour | relationship conflicts, work style conflicts, communication conflicts | reactive |
| instruction | sequential teaching | relationship conflicts | reactive |
| reflection | separate reflection | relationship conflicts | reactive |

planning: work style conflict and lack of communication triggered subsequent relationship conflict

during planning, both student teachers divided the tasks. however, even the division of labour and the separate work led to conflicts due to perceived unfair division and different working styles and outcomes. first, the relationship was described as positive, but peer cooperation continually worsened. according to tom, each person had a sense of accomplishing more than their peer. unequal contributions and efforts led to tensions. tom described conflicts that arose due to different preferences about how to accomplish tasks.

i did a lot and somehow nothing came from the other person. i’m more of a person who waits first and then takes it into my own hands. i’ve prepared so many things, but she still considered the situation to be unfair. and i thought to myself once again: “i have already done everything”. that somehow created an uneasy feeling between us.

to avoid further conflicts, tom decided to do the preparation alone. he informed his partner but did not ask for her consent. however, he had doubts about the unequal division of work and the contributions. he was unsatisfied with the team outcome and expressed his frustration. his discomfort with having done the tasks alone contradicted his free decision to do it this way. disagreements about duties were not openly discussed. the biggest challenge was thus perceived and termed as “sacrifice”.

the biggest challenge was somehow the “sacrifice”.
i may exaggerate from time to time, but with her it was perhaps almost the complete opposite. i tried to make stamps with natural materials for a whole day. and she was around for just about half an hour and then said, “i have to leave. i’m stressed.” but she often feels like i am doing less. afterwards i thought to myself, can’t we just take a little time for things somehow?

task conflict and a lack of communication triggered relationship conflict. the applied strategy of separating work instead of using peer cooperation led to an unequal distribution of work and responsibilities, which in turn intensified tensions and feelings of annoyance for both student teachers. these tensions meant that student teachers stopped collaboration during planning.

instruction: harmonious instruction due to independent work but fragile relationship

one might expect the misfit between tom and his partner to continue during the process of team teaching. surprisingly, it was found that despite the challenges during the planning, the instruction proceeded harmoniously. a deeper look into the process, however, shows that this harmony was based on collective disconnectedness.

during the instruction we got along with each other. we focused on the lessons and we appreciated each other in this respect. we actually shared things quite often. so, she did her lessons and i did mine. the “lead function” was so important. we wanted to do it that way – “that’s your work somehow and you decide what’s going to be done here.” on this level, it worked.

the student teachers focused on different lessons during instruction. when each person was able to perform their tasks independently, the cooperation worked properly. tom therefore emphasized the importance of the leadership function. a clear allocation of tasks made the peer cooperation work. conflicts arose in other situations, however.
grading situations and contact with mentor teachers were identified as moments of stress and harmful for peer cooperation.

i noticed that and told her, “whenever there were moments of stress, we always had this problem”. so, especially during exams or when something was being graded or when the teachers came over for lessons.

reflection: reduced tensions when mentor teachers involved but fragile relationship

as the reflection was primarily guided by the mentor teacher, this phase turned out to be less conflictual between the student teachers.

reflection was always quite important. the person who hadn’t taught sat with the teacher at a table at the back of the classroom. once, my partner expressed her displeasure. she felt that it was not acceptable that the teacher and i had whispered during her lesson. she always felt extremely attacked because she always thought it was about her.

the clear structure of separate reflection reduced conflict between the student teachers but no peer cooperation developed. the vulnerability of the relationship became clear in situations of contact with the mentor teacher and impeded the relationship.

6.3.2 luisa

in luisa’s case study, exchange was practised among the student teachers during planning and reflection. however, this exchange revealed several conflicts, and reactive and proactive coping strategies were used to deal with these challenges.

table 11: overview of case study luisa

| luisa’s case | form of cooperation | challenges | coping strategy |
| --- | --- | --- | --- |
| planning | exchange of ideas | relationship conflicts, work style conflicts, communication conflicts | reactive |
| instruction | sequential teaching | relationship conflicts, communication conflicts | reactive |
| reflection | exchange of feedback | communication conflicts | proactive |

planning: work style conflicts led to relational and communication challenges

luisa reported work style conflicts during planning, as the student teachers had different preferences on how to accomplish tasks.
the mismatch of behaviour led to tensions.

sometimes she made suggestions which i didn't agree with. after that, it was difficult to somehow find a balance. she was very stubborn and wanted to do her own thing. when we had actually agreed on something, she wanted to change it again and came up with “we could do it another way”.

as conflicts increased, luisa stopped addressing disagreements with her partner. she felt that discussing issues would create more issues and might even escalate things. accordingly, communication between the student teachers worsened. occasionally, luisa reported these challenges to the mentor teacher, who in turn addressed the issues. although luisa was annoyed as she had to deal with her partner’s difficult work style, she valued her partner’s contribution to the planning, while emphasizing the differences between them.

she contributed to the planning, she looked for a lot of ideas and was able to come up with some new ideas as well. but she had difficulty taking my ideas into account and accepting feedback or criticism. so, we didn’t always enjoy each other’s company because we were not always able to deal with each other’s ideas.

after unsuccessful attempts to find shared solutions during planning, luisa tended to reduce peer cooperation due to different work styles. interestingly, she continued with common preparation, although the work style differences remained unresolved. this had a negative impact on her morale and cooperation behaviour.

instruction: increased relational issues and lack of communication

the conflicts due to the different work behaviour that luisa mentioned during planning continued during work in the classroom.

the few lessons we tried to teach together were a little more difficult because we didn’t want to tell each other what to do in class. we were in the forest and i had the lead. i set a limit of the field. she went out beyond that limit with a small group of school students.
it was difficult to address it in any meaningful manner. when i set boundaries, i want her to accept them, too!

luisa was unable to react properly when the school students overstepped the boundary with her partner. she did not want to embarrass her partner in front of the school students. as a consequence, she could not explain to the school students why their behaviour was inappropriate. luisa was annoyed because she expected both the school students and her partner to follow her rules.

despite problems, luisa aimed to benefit from the paired placement as she tried to learn from the mistakes of her partner. keen to improve her own teaching, she adopted her partner as a negative role model. luisa concluded that they were able to benefit from each other, but also that her partner gained more value from their teamwork than she did.

as a result, i learned what i should not do! i rather learned from her mistakes. she couldn’t handle any interruptions, she needed a lot of intervention signs – such as the “silence” sign introduced in this class to manage the classroom – which then also lowered the effect of these signs.

reflection: challenges in communication lead to poor reflection quality

luisa stated that the reflection was relatively poor due to the lack of exchange and communication. she was afraid that she might trigger negative reactions from her partner by providing feedback. her attempts to provide supportive feedback and to offer suggestions for improvement were not valued by her partner:

also, i tried to phrase negative feedback in such a way that it doesn’t sound merely bad, more like suggestions about how she might do better.

her attempt to cope with the difficulties by preventing negative feedback resulted in an uneasiness about not being honest. she regretted that neither student could adequately benefit from peer cooperation during reflection and that the lack of communication also impeded further cooperation.
the reflection was rather poor, simply because our exchange was not good. because we couldn’t talk to each other on a productive level. i wish we had had better communication and that we could both provide each other with good criticism and constructive feedback. that’s why we couldn’t collect any good ideas for our further cooperation.

peer cooperation in the reflection phase was nearly impossible due to the ongoing problems with communication. although luisa applied a proactive coping strategy by providing supportive feedback, peer cooperation remained difficult due to communication problems.

7. discussion and conclusion

this study aimed for a better understanding of the peer cooperation challenges that student teachers encounter in different phases of team teaching during paired field placements and of the coping strategies that student teachers apply to manage these challenges. first, we identified which forms of team teaching were used during different phases of team teaching (planning, instruction, reflection) in paired field placements (rq1). based on little’s (1990) framework of teacher collaboration and baeten and simons’ (2014) work on team teaching, the results revealed that agreements and division into lessons were frequently used during planning. student teachers made arrangements to divide lessons with the aim of increasing opportunities for individual teaching. the findings show that cooperative placements were spontaneously redesigned into individual forms, as the planning of sequential teaching is a method well suited to preparing and teaching separately, while other forms of team teaching would require more cooperation. reasons might involve poor instruction and support for cooperative planning, as well as a lack of role models in schools and universities (le et al., 2018). future research could investigate whether peer cooperation during planning could be fostered through extra time and specific cooperative tasks.
there is evidence that peer cooperation can benefit from mentored sessions that focus on student teacher cooperation during field placements (e.g. gardiner & robinson, 2009; goodnough et al., 2009; walsh & elmslie, 2005).
based on the team teaching models of baeten and simons (2014), the findings showed that student teachers most frequently used the equal status model and, particularly, sequential teaching, followed by the assistant teaching model during the instruction phase. the results repeat the findings from the planning phase, such that student teachers focus primarily on individual teaching by dividing lessons. these more separate models of teaching would also need high-quality peer cooperation, however. for example, the specific tasks related to the schedule and forms of support need to be discussed regarding the role of an assistant. if an assistant is supposed to take an active role, the person who is responsible for the lesson must explicitly include those phases. the frequent use of the equal status model during both teaching practice modules also reflects how student teachers perceived themselves during field placements. even if they might have differed in individual competences, they saw themselves as equal in the course of their training. this raises the question of whether peer cooperation needs closer supervision and instruction, which could be given through models of peer mentoring, such as within the content-focused peer-coaching model (becker et al., 2019; kreis, 2019).
the desire to work alone during the paired field placement was also mirrored in the reflection phase. the opportunity to seek help from each other and the feedback they provided were only marginally addressed during the reflection. given the important role that feedback has for the learning process (dee, 2012), this suggests the necessity of introducing student teachers and mentor teachers to models of professional peer feedback (wynn & kromrey, 2000).
the findings also imply that student teachers might not sufficiently consider the reflection phase as a learning opportunity for cooperative lesson preparation, although baeten and simons (2014) found that students can benefit from team teaching through increased dialogue, which implies reciprocal exchanges. second, this study sought to identify challenges of peer cooperation with a specific link to the three different phases of team teaching (rq2). the findings of this study confirmed challenges that align with disadvantages, as reported by baeten and simons (2014). baeten and simons (2014), however, did not cover all phases of cooperation in their studies. team teaching was thus only partially mapped, because cooperation can occur during planning, instruction and reflection. accordingly, we identified additional challenges, some of which were specifically linked to phases of team teaching (e.g. the pressure to follow agreements during planning and instruction). relationship conflicts occurred more frequently during lesson planning and the majority of challenges related to a lack of compatibility between the team teaching partners during planning and instruction. diverse working styles, as well as personal disagreement and misfits, caused problems. this lack of compatibility between cooperation partners has already been discussed in the literature (e.g. goodnough et al., 2009; kamens, 2007). the results confirm the finding by nokes et al. (2008), that student teachers faced moments of tension when personalities or philosophies of teaching did not match. challenges seemed to occur early, when individuals had different expectations of planning and work contributions, and failed to communicate. surprisingly, communication issues were not explicitly mentioned as a challenge by student teachers, although it affected the planning process. the results revealed that paired field placements often resulted in divided instead of cooperative lessons. 
along with a lack of compatibility, student teachers reported that the increased workload caused by peer cooperation hampered peer cooperation. similar to research that found workload to be a relevant obstacle to teacher cooperation (parsons & stephenson, 2005), student teachers seemed to avoid this extra time and effort for mutual exchange. student teachers tended to use sequential teaching, as they felt displeasure when forced to follow agreements during planning and instruction. this challenge could add new information to the existing knowledge about disadvantages of student teacher team teaching as outlined by baeten and simons (2014) and the difficulties of teacher cooperation (de jong et al., 2019). further research is necessary, however, to identify the role that this challenge could play in impeding cooperation. another challenge was related to the confusion of school students due to unclear teacher roles and responsibilities during planning and instruction, which might have had negative effects on their learning. in line with previous research (baeten & simons, 2016; goodnough et al., 2009), the results confirm confusion and unclear roles as key disadvantages of team teaching (schmulian & coetzee, 2019).
the results also showed that challenges were not only relevant within a team teaching phase, but that negative effects of phases could be intertwined. this was specifically evident for challenges in planning or instruction, which affected peer cooperation in the subsequent phases. limited time and a lack of support for cooperative planning were revealed as a burden for cooperative teaching and reflection. if the planning phase allowed student teachers to work separately, opportunities for common teaching and reflection decreased. even when they shared a teaching lesson, they preferred to work individually and waived the opportunity for cooperative reflection.
as the opportunity for reflection with the team teaching partner was, more generally, marginally used during the paired field placements, the results could be seen as demonstrating a need for better mentoring as regards the student teacher peer exchange and cooperative reflection. specific forms of mentoring, such as reflective teaching (zeichner, 1981), or models for providing professional conversations such as improvement-focused feedback (timperley, 2015) could help to support student teacher peer cooperation during reflection.
third, this study found that student teachers tend to use more reactive than proactive coping strategies in all three phases of team teaching. based on two case studies, it illustrated how student teachers coped with challenges across different phases of team teaching (rq3). both cases showed predominantly reactive coping strategies that were applied to a number of challenges that had already emerged during the planning phase. similarly to the study by heikonen et al. (2017), who found reactive coping strategies to be prevalent for student teachers coping with difficult interactions during classroom management, the challenges of peer cooperation were answered after the event had already occurred and the conflicts had developed. case 1 (tom) showed work style challenges that subsequently triggered relationship conflicts. case 2 (luisa) is an illustration of problem avoidance, which led to conflicts including work-related and interpersonal issues. both cases also offered new insights into the coping strategies used by student teachers and added information to the theories of cooperative teacher behaviour. when student teachers had negative experiences of peer cooperation during planning, they implemented an avoidance behaviour, particularly in the division of lessons. this division of lessons was directly or indirectly supported by the mentor teachers.
according to baeten and simons (2014), the division of lessons used with sequential, station, or parallel teaching is regarded as a highly cooperative form of team teaching. in our study, however, these forms of team teaching were applied in order to avoid peer cooperation. this suggests a need to consider not only the instruction phase but also the planning and reflection phases when defining and evaluating teaching behaviour as cooperative. team teaching provides an opportunity for dynamic exchange with peers, cooperatively overcoming challenges and embracing vygotsky’s (1978) notion that learning is a social activity. this might be a big challenge for student teachers. interestingly, we also found that difficulties could arise when student teachers avoided peer cooperation by preparing and teaching lessons individually. studies have shown that mentor teachers were needed by student teachers to act as mediators when problems arose. for example, mentor teacher support encouraged student teacher reflections on disagreement (e.g. nokes et al., 2008). as successful cooperation requires a positive atmosphere and a good relationship (gardiner & robinson, 2011), as well as high-quality communication (gillies, 2004), student teachers and mentor teachers should be specifically prepared for cooperation (baeten & simons, 2014). the results of this study also indicated that student teachers were uncertain about how to respond to or tackle difficulties with peer cooperation, and thus imply that student teachers need to be guided to become aware of potential proactive strategies to regulate challenges they face during peer cooperation. this finding mirrors previous studies that reported that student teachers have limited cooperative skills to cope with challenges (heikonen et al., 2017). it would be interesting to explore how to improve the quality of support seminars and reflection during paired field placements with the mentor teacher. 
future studies should also further examine how teacher education can support student teachers’ personal development to avoid conflicts based on, for example, different values or beliefs (see meijer et al., 2009).
8. practical implications for teacher education
this study confirms that, although paired field placements provide “... a structure for collaboration, the structure alone does not guarantee that successful collaboration will occur” (gardiner & robinson, 2011, p. 10). these findings have the following practical implications:
• emphasizing the learning potential of paired field placements. the pairing is an opportunity to share competencies. the learning potential of team teaching should be explicitly explained to student teachers to highlight its benefits in fostering positive attitudes towards cooperation. a positive attitude toward cooperation could encourage student teachers.
• fostering the student teacher relationship/team-building process. the results suggest a need to promote the relationship between students in the early stages of paired field placements. opportunities for first contacts and mutual exchange should be created prior to the start of the placement. moreover, communication among all partners in all stages of paired field placements should be encouraged.
• providing organizational support with time allocation/management. the administration may provide support by aligning student teachers’ schedules for cooperation before and during the practicum, resolving time-related issues by allocating time for team meetings. mentor teachers can also help by scheduling opportunities to plan and reflect on lessons after instruction. meetings should take place on a regular basis, at regular times and places, in order to routinize joint planning, reflection and feedback discussions.
• promoting student teacher social skills through specific training. student teachers should be explicitly trained with regard to their cooperation skills in courses to maintain a positive relationship through frequent reflective discussions (e.g. providing constructive feedback, establishing concrete agreements, or coordinating cooperation). student teachers also need to be guided to become aware of strategies to manage difficulties and to resolve conflicts constructively during paired field placements.
• promoting specific training for mentor teachers and university teachers. mentor teachers and university teachers who guide paired field placements play an important role, particularly in the case of disagreements and conflicts. teacher education should prepare mentor teachers and university teachers for their guiding role in promoting communication and cooperation within student teacher pairs.
9. limitations
this study has several limitations. first, it investigated only a small number of participants from one teacher education programme and the results are not representative of other teacher education contexts. as teacher education in switzerland varies across states, the results cannot be generalized to other states. second, although the study aimed for an understanding of the student teacher perspective, only a one-sided view of peer cooperation in a tandem practicum could be investigated. subsequent studies will try to mirror both sides of the peer duo in order to uncover the full picture of cooperation. third, the results must be regarded as exploratory in nature. the specific features of a teacher education programme, such as the form of preparing and mentoring paired field placements, need more detailed consideration. fourth, we identified a limited set of coping strategies. future research should investigate a greater variety of coping strategies.
fifth, although we could identify various forms of peer cooperation according to three phases of team teaching (planning, instruction, reflection), it was not possible to associate specific challenges with the various forms of team teaching due to the predominance of individual teaching by dividing lessons. it would be interesting to know whether more advanced forms of peer cooperation, such as the teaming model, differ in terms of challenges. this question calls for quasi-experimental research that aims to compare different forms of peer cooperation. also, longitudinal research with other groups of student teachers in other paired field placements would be welcome. more generally, longitudinal research could help to deepen our understanding of student teacher challenges. such insights could be of great value for teacher education, to improve cooperative field placements.
keypoints
• student teachers most frequently used division of work and sequential teaching
• results reveal various forms of conflicts during different phases of peer cooperation (planning, instruction, reflection)
• the most challenging part of peer cooperation is experienced during instruction with lack of compatibility with the peer
• student teachers tend to use reactive coping strategies in response to challenges
• the potential of paired field placements as learning opportunities for cooperation skills is undermined by division of work
appendix 1: corrected cohen’s kappa coefficients, forms of cooperation
table a1
 | little’s form of cooperation (1990) | | cooperation during instruction according to baeten and simons (2014) | | challenges of team teaching related to baeten and simons (2014) |
phase | co-rater 1 | co-rater 2 | co-rater 1 | co-rater 2 | co-rater 1 | co-rater 2
planning | 0.80 | 0.89 | | | 0.86 | 0.86
instruction | 0.79 | 0.83 | 0.82 | 0.87 | 0.86 | 0.86
reflection | 0.75 | 0.81 | | | 0.98 | 0.91
appendix 2: corrected cohen’s kappa coefficients, challenges
table a2
 | relationship conflicts | | work style conflicts | | communication conflicts |
phase | co-rater 1 | co-rater 2 | co-rater 1 | co-rater 2 | co-rater 1 | co-rater 2
planning | 0.82 | 0.88 | 0.93 | 0.91 | 0.84 | 0.82
instruction | 0.91 | 0.94 | 0.90 | 0.89 | 0.82 | 0.80
reflection | 0.79 | 0.81 | | | 0.87 | 0.86
appendix 3: corrected cohen’s kappa coefficients, coping strategy
table a3
coping strategy | reactive | reactive | proactive | proactive
 | co-rater 1 | co-rater 2 | co-rater 1 | co-rater 2
planning | | | |
relationship conflicts | 0.96 | 0.98 | 1.00 | 1.00
work style conflicts | 0.85 | 0.80 | 0.75 | 0.80
communication conflicts | 0.80 | 0.79 | 1.00 | 1.00
instruction | | | |
relationship conflicts | 1.00 | 1.00 | 0.96 | 0.98
work style conflicts | 0.83 | 0.86 | 0.80 | 0.78
communication conflicts | 0.78 | 0.73 | 0.79 | 0.80
reflection | | | |
relationship conflicts | 0.92 | 0.89 | 1.00 | 0.98
work style conflicts | | | |
communication conflicts | 0.98 | 1.00 | 1.00 | 1.00
references
admiraal, w. (2020). a typology of student-teachers' coping with stressful classroom events. european journal of education, 3(1), 6–17. https://doi.org/10.1080/03055698.2020.1729097
admiraal, w. f., korthagen, f. a. j., & wubbels, t. (2000). effects of student teachers’ coping behaviour. british journal of educational psychology, 70(1), 33–52. https://doi.org/10.1348/000709900157958
anderson, r. s., & speck, b. w. (1998). "oh what a difference a team makes": why team teaching makes a difference. teaching and teacher education, 14(7), 671–686. https://doi.org/10.1016/s0742-051x(98)00021-3
arnold, n., ducate, l., & kost, c. (2012). collaboration or cooperation? analyzing group dynamics and revision processes in wikis. calico journal, 29(3), 431–448.
aspinwall, l. g., & taylor, s. e. (1997). a stitch in time: self-regulation and proactive coping. psychological bulletin, 121(3), 417–436.
baeten, m., & simons, m. (2014). student teachers' team teaching: models, effects, and conditions for implementation. teaching and teacher education, 41, 92–110. https://doi.org/10.1016/j.tate.2014.03.010
baeten, m., & simons, m. (2016).
student teachers’ team teaching: how do learners in the classroom experience team-taught lessons by student teachers? journal of education for teaching, 42(1), 93–105. https://doi.org/10.1080/02607476.2015.1135226
barahona, m. (2017). exploring models of team teaching in initial foreign/second language teacher education: a study in situated collaboration. australian journal of teacher education, 42(12), 144–161. http://dx.doi.org/10.14221/ajte.2017v42n12.9
bashan, b., & holsblat, r. (2012). co-teaching through modeling processes: professional development of students and instructors in a teacher training program. mentoring & tutoring: partnership in learning, 20(2), 207–226. https://doi.org/10.1080/13611267.2012.678972
becker, e. s., waldis, m., & staub, f. c. (2019). advancing student teachers’ learning in the teaching practicum through content-focused coaching: a field experiment. teaching and teacher education, 83, 12–26. https://doi.org/10.1016/j.tate.2019.03.007
birrell, j. r., & bullough, r. (2005). teaching with a peer: a follow-up study of the 1st year of teaching. action in teacher education, 27(1), 72–81. https://doi.org/10.1080/01626620.2005.10463375
blase, j. j. (1989). the micropolitics of the school: the everyday political orientation of teachers toward open school principals. educational administration quarterly, 25(4), 377–407. https://doi.org/10.1177/0013161x89025004005
brennan, r. l., & prediger, d. j. (1981). coefficient kappa: some uses, misuses, and alternatives. educational and psychological measurement, 41(3), 687–699.
brody, r. g. (2012). external auditors' willingness to rely on the work of internal auditors: the influence of work style and barriers to cooperation. advances in accounting, 28(1), 11–21. https://doi.org/10.1016/j.adiac.2012.02.005
bullough, r. v., young, j., erickson, l., birrell, j. r., clark, d. c., egan, m. w., berrie, c. f., hales, v., & smith, g. (2002). rethinking field experiences: partnership teaching vs.
single-placement teaching. the journal of teacher education, 53(1), 68–80. https://doi.org/10.1177/0022487102053001007
carless, d. r. (2006). good practices in team teaching in japan, south korea and hong kong. system, 34(3), 341–351. https://doi.org/10.1016/j.system.2006.02.001
carver, c. s., scheier, m. f., & weintraub, j. k. (1989). assessing coping strategies: a theoretically based approach. journal of personality and social psychology, 56(2), 267–283.
clunies-ross, p., little, e., & kienhuis, m. (2008). self-reported and actual use of proactive and reactive classroom management strategies and their relationship with teacher stress and student behaviour. educational psychology, 28(6), 693–710. https://doi.org/10.1080/01443410802206700
cochran-smith, m. (1991). learning to teach against the grain. harvard educational review, 61(3), 279–310. https://doi.org/10.17763/haer.61.3.q671413614502746
cohen, e., hoz, r., & kaplan, h. (2013). the practicum in preservice teacher education: a review of empirical studies. teaching education, 24(4), 345–380. https://doi.org/10.1080/10476210.2012.711815
dang, t. k. a. (2013). identity in activity: examining teacher professional identity formation in the paired-placement of student teachers. teaching and teacher education, 30, 47–59. https://doi.org/10.1016/j.tate.2012.10.006
dang, t. k. a. (2017). exploring contextual factors shaping teacher collaborative learning in a paired-placement. teaching and teacher education, 67, 316–329. https://doi.org/10.1016/j.tate.2017.06.008
darling-hammond, l. (1996). the right to learn and the advancement of teaching: research, policy, and practice for democratic education. educational researcher, 25(6), 5–17. https://doi.org/10.3102/0013189x025006005
darragh, j. j., picanco, k. e., tully, d., & henning, a. s. (2011). when teachers collaborate, good things happen: teacher candidate perspectives of the co-teach model for the student teaching internship.
the journal of the association of independent liberal arts colleges of teacher education, 8(1), 83–109.
de jong, l., meirink, j., & admiraal, w. (2019). school-based teacher collaboration: different learning opportunities across various contexts. teaching and teacher education, 86, 1–12. https://doi.org/10.1016/j.tate.2019.102925
dee, a. l. (2012). collaborative clinical practice: an alternate field experience. issues in teacher education, 21(2), 147–163.
dieker, l. a. (2001). what are the characteristics of “effective” middle and high school co-taught teams? preventing school failure, 46(1), 14–25.
dillenbourg, p., baker, m., blaye, a., & o'malley, c. (1996). the evolution of research on collaborative learning. in p. reimann & h. spada (eds.), learning in humans and machines: towards an interdisciplinary learning science (pp. 189–211). elsevier.
endler, n. s., & parker, j. d. a. (1999). coping inventory for stressful situations (ciss): manual (2nd ed.). multi-health systems.
friedman, r. a., tidd, s. t., currall, s. c., & tsai, j. c. (2000). what goes around comes around: the impact of personal conflict style on work conflict and stress. international journal of conflict management, 11(1), 32–55. https://doi.org/10.1108/eb022834
friend, m., cook, l., hurley-chamberlain, d., & shamberger, c. (2010). co-teaching: an illustration of the complexity of collaboration in special education. journal of educational and psychological consultation, 20(1), 9–27. https://doi.org/10.1080/10474410903535380
gallo-fox, j., & scantlebury, k. (2016). coteaching as professional development for cooperating teachers. teaching and teacher education, 60, 191–202. https://doi.org/10.1016/j.tate.2016.08.007
gardiner, w., & robinson, k. (2009). paired field placements: a means for collaboration. the new educator, 5(1), 81–94. https://doi.org/10.1080/1547688x.2009.10399565
gardiner, w., & robinson, k. s. (2011).
peer field placements with preservice teachers: negotiating the challenges of professional collaboration. professional educator, 35(2), 1–11.
gillies, r. m. (2004). the effects of communication training on teachers’ and students’ verbal behaviours during cooperative learning. international journal of educational research, 41(3), 257–279. https://doi.org/10.1016/j.ijer.2005.07.004
goodnough, k., osmond, p., dibbon, d., glassman, m., & stevens, k. (2009). exploring a triad model of student teaching: pre-service teacher and cooperating teacher perceptions. teaching and teacher education, 25(2), 285–296. https://doi.org/10.1016/j.tate.2008.10.003
guise, m., habib, m., thiessen, k., & robbins, a. (2017). continuum of co-teaching implementation: moving from traditional student teaching to co-teaching. teaching and teacher education, 66(1), 370–382. https://doi.org/10.1016/j.tate.2017.05.002
gustems-carnicer, j., calderón, c., & calderón-garrido, d. (2019). stress, coping strategies and academic achievement in teacher education students. european journal of teacher education, 42(3), 375–390. https://doi.org/10.1080/02619768.2019.1576629
hämäläinen, r., & vähäsantanen, k. (2011). theoretical and pedagogical perspectives on orchestrating creativity and collaborative learning. educational research review, 6(3), 169–184. https://doi.org/10.1016/j.edurev.2011.08.001
heikonen, l., toom, a., pyhältö, k., pietarinen, j., & soini, t. (2017). student-teachers' strategies in classroom interaction in the context of the teaching practicum. journal of education for teaching, 43(5), 534–549. https://doi.org/10.1080/02607476.2017.1355080
howlett, k., & nguyen, h. (2020). autoethnographic reflections of an international graduate teaching assistant’s co-teaching experiences. journal of international students, 10(2), 401–419. https://doi.org/10.32674/jis.v10i2.774
johnson, d. w., & johnson, r. t. (2009).
an educational psychology success story: social interdependence theory and cooperative learning. educational researcher, 38(5), 365–379. https://doi.org/10.3102/0013189x09339057
kamens, m. w. (2007). learning about co-teaching: a collaborative student teaching experience for preservice teachers. teacher education and special education, 30(3), 155–166. https://doi.org/10.1177/088840640703000304
kelchtermans, g. (2006). teacher collaboration and collegiality as workplace conditions. a review. zeitschrift für pädagogik, 52(2), 220–237.
kreis, a. (2019). content-focused peer coaching – facilitating student learning in a collaborative way. in t. janík, i. m. dalehefte, & s. zehetmeier (eds.), supporting teachers: improving instruction. examples of research-based in-service teacher education (pp. 37–55). waxmann.
lave, j., & wenger, e. (1991). situated learning: legitimate peripheral participation. cambridge university press.
lazarus, r. s., & folkman, s. (1984). stress, appraisal, and coping. springer.
le, h., janssen, j., & wubbels, t. (2018). collaborative learning practices: teacher and student perceived obstacles to effective student collaboration. cambridge journal of education, 48(1), 103–122. https://doi.org/10.1080/0305764x.2016.1259389
lindqvist, h., weurlander, m., wernerson, a., & thornberg, r. (2017). resolving feelings of professional inadequacy: student teachers’ coping with distressful situations. teaching and teacher education, 64, 270–279. https://doi.org/10.1016/j.tate.2017.02.019
lindqvist, h., weurlander, m., wernerson, a., & thornberg, r. (2020). conflicts viewed through the micropolitical lens: beginning teachers’ coping strategies for emotionally challenging situations. research papers in education, 35(6), 746–765. https://doi.org/10.1080/02671522.2019.1633559
little, j. w. (1990). the persistence of privacy: autonomy and initiative in teachers’ professional relations. teachers college record, 91(4), 509–536.
liu, l. (2008).
co-teaching between native and non-native english teachers: an exploration of co-teaching models and strategies in the chinese primary school context. reflections on english language teaching, 7(2), 103–118.
macdonald, c. j. (1993). coping with stress during the teaching practicum: the student teacher's perspective. alberta journal of educational research, 39(4), 407–418.
mastropieri, m. a., scruggs, t. e., graetz, j. e., norland, j., gardizi, w., & mcduffie, k. (2005). case studies in co-teaching in the content areas: successes, failures and challenges. intervention in school and clinic, 40(5), 260–270. https://doi.org/10.1177/10534512050400050201
mayring, p. (2010). qualitative inhaltsanalyse. grundlagen und techniken (11th ed.). [qualitative content analysis. basics and techniques]. beltz.
meijer, p. c., korthagen, f. a. j., & vasalos, a. (2009). supporting presence in teacher education: the connection between the personal and professional aspects of teaching. teaching and teacher education, 25(2), 297–308. https://doi.org/10.1016/j.tate.2008.09.013
murawski, w., & dieker, l. (2008). 50 ways to keep your co-teacher: strategies for before, during, and after co-teaching. teaching exceptional children, 40(4), 40–48. https://doi.org/10.1177/004005990804000405
murawski, w. w., & lochner, w. w. (2011). observing co-teaching: what to ask for, look for, and listen for. intervention in school and clinic, 46(3), 174–183. https://doi.org/10.1177/1053451210378165
murray-harvey, r., slee, p. t., lawson, m. j., silins, h., banfield, g., & russell, a. (2000). under stress: the concerns and coping strategies of teacher education students. european journal of teacher education, 23(1), 19–35. https://doi.org/10.1080/713667267
national council for the accreditation of teacher education (ncate). (2010). transforming teacher education through clinical practice: a national strategy to prepare effective teachers.
report of the blue ribbon panel on clinical preparation and partnerships for improved student learning. ncate.
nissen, h. a., evald, m. r., & clarke, a. h. (2014). knowledge sharing in heterogeneous teams through collaboration and cooperation: exemplified through public-private-innovation partnerships. industrial marketing management, 43(3), 473–482.
nokes, j. d., bullough, r. v., egan, w. m., birrell, j. r., & hansen, j. m. (2008). the paired-placement of student teachers: an alternative to traditional placements in secondary schools. teaching and teacher education, 24(8), 2168–2177. https://doi.org/10.1016/j.tate.2008.05.001
organisation for economic co-operation and development (oecd). (2019). talis 2018 results (volume i): teachers and school leaders as lifelong learners, talis, oecd. https://dx.doi.org/10.1787/1d0bc92a-en
parsons, m., & stephenson, m. (2005). developing reflective practice in student teachers: collaboration and critical partnerships. teachers and teaching: theory and practice, 11(1), 95–116. https://doi.org/10.1080/1354060042000337110
piaget, j. (1926). the language and thought of the child. harcourt brace.
putnam, l. l., & wilson, c. e. (1982). communicative strategies in organizational conflicts: reliability and validity of a measurement scale. in m. burgoon & n. e. doran (eds.), communication yearbook 6 (pp. 629–652). sage.
reupert, a., & woodcock, s. (2010). success and near misses: pre-service teachers’ use, confidence and success in various classroom management strategies. teaching and teacher education, 26(6), 1261–1268. https://doi.org/10.1016/j.tate.2010.03.003
ronfeldt, m., farmer, s. o., mcqueen, k., & grissom, j. (2015). teacher collaboration in instructional teams and student achievement. american educational research journal, 52(3), 475–514. https://doi.org/10.3102/0002831215585562
scantlebury, k., gallo-fox, j., & wassell, b. (2008). coteaching as a model for pre-service secondary science teacher education.
teaching and teacher education, 24(4), 967–981. https://doi.org/10.1016/j.tate.2007.10.008
schmulian, a., & coetzee, s. a. (2019). to team or not to team: an exploration of undergraduate students' perspectives of two teachers simultaneously in class. innovative higher education, 44(4), 317–328. http://dx.doi.org/10.1007/s10755-019-9466-2
schwarzer, r., & schwarzer, c. (1996). a critical survey of coping instruments. in m. zeidner & n. s. endler (eds.), handbook of coping: theory, research, applications (pp. 107–132). john wiley and sons.
shin, e.-k., wilkins, e. a., & ainsworth, j. (2007). the nature and effectiveness of peer feedback during an early clinical experience in an elementary education program. action in teacher education, 28(4), 40–52. https://doi.org/10.1080/01626620.2007.10463428
simons, m., baeten, m., & vanhees, c. (2020). team teaching during field experiences in teacher education: investigating student teachers’ experiences with parallel and sequential teaching. journal of teacher education, 71(1), 24–40. https://doi.org/10.1177/0022487118789064
skinner, e. a., edge, k., altman, j., & sherwood, h. (2003). searching for the structure of coping: a review and critique of category systems for classifying ways of coping. psychological bulletin, 129(2), 216–269. https://doi.org/10.1037/0033-2909.129.2.216
sorensen, p. (2014). collaboration, dialogue and expansive learning: the use of paired and multiple placements in the school practicum. teaching and teacher education, 44, 128–137. https://doi.org/10.1016/j.tate.2014.08.010
stairs, a., corrieri, c., fryer, l., genovese, e., panaro, r., & sohn, c. (2009). inquiry into partnered student teaching in an urban school university partnership. school university partnerships, 3(1), 75–89.
thousand, j., villa, r., & nevin, a. (2006). the many faces of collaborative planning and teaching. theory into practice, 45(3), 239–248. https://doi.org/10.1207/s15430421tip4503_6
timperley, h. (2015).
professional conversations and improvement-focused feedback: a review of the research literature and the impact on practice and student outcomes. prepared for the australian institute for teaching and school leadership, aitsl, melbourne. topping, k. j. (2005). trends in peer learning. educational psychology, 25(6), 631–645. https://doi.org/10.1080/01443410500345172 tsybulsky, d. (2019). the team teaching experiences of pre-service science teachers implementing pbl in elementary school. journal of education for teaching: international research and pedagogy, 45(3), 244– 261. https://doi.org/10.1080/09589236.2019.1599505 tsybulsky, d., muchnik-rozanov, y. (2019). the development of student-teachers' professional identity while team-teaching science classes using a project-based learning approach: a multi-level analysis. teaching and teacher education, 79, 48–59. https://doi.org/10.1016/j.tate.2018.12.006 vygotsky, l. s. (1978). mind in society: the development of higher psychological processes. harvard university press. walsh, k., & elmslie, l. (2005). practicum pairs: an alternative for first field experience in early childhood teacher education. asia-pacific journal of teacher education, 33(1), 5–21. https://doi.org/10.1080/1359866052000341098 wassell, b., & lavan, s. k. (2009). revisiting the dialogue on the transition from coteaching to inservice teaching: new frameworks, additional benefits and emergent issues. culture studies of science education, 4, 477–484. https://doi.org/10.1007/s11422-008-9152-7 wilks, r. (1996). classroom management in primary schools: a review of the literature. behaviour change, 13(1), 20–32. https://doi.org/10.1017/s0813483900003922 wynn, m. j., & kromrey, j. (2000). paired peer placement with peer coaching to enhance prospective teachers' professional growth in early field experience. action in teacher education, 22(2), 73–83. https://doi.org/10.1080/01626620.2000.10463041 zeichner, k.m. (1981). 
reflective teaching and field-based experience in teacher education. interchange, 12(4), 1–22. frontline learning research 1 (2013) 12 issn 2295-3159 http://dx.doi.org/10.14786/flr.v1i1.57 1 | f l r editorial frontline research in an accessible and flexible way erno lehtinen an increasing number of new scientific journals have been founded in the last few years. a big part of these new publishing forums are open-access electronic-only journals. when starting a new journal it is important to carefully think about why this new journal is needed and which kind of journal it should be. the two previously founded journals of the european association for research on learning and instruction (earli) have been very successful. learning and instruction has established its role as one of the leading journals in education and educational psychology. it publishes theoretically and methodologically strong original articles. educational research review has opened new opportunities for publishing review articles, metaanalyses and theoretical papers, and the quickly increased impact factor indicates that it is also well trusted by the research community. then why the need to start the third earli journal frontline learning research (flr)? during the almost three decades of european scientific collaboration within earli the number of researchers and the quality of research in the field of learning and instruction have rapidly increased. also, the two previously founded earli journals have extended their influence far beyond the european countries and receive high-level submissions from all over the world. because of these developments there is a much larger number of excellent manuscripts out there in the earli community than the two existing journals are able to publish. we believe that many of these research papers are worth publishing. 
yet, it was not only the increase of the publication pressure that led earli to the decision to supplement the existing journals by founding frontline learning research. the main aim was to develop a journal which would explicitly support innovative theoretical and methodological thinking and increase dynamics in the field. accordingly, the emphasis of the journal will be on promoting educational and learning sciences as a multidisciplinary domain, drawing from cognitive, philosophical, sociological, psychological and pedagogical theoretical paradigms. while emphasising innovative and risk-taking approaches, this new journal will follow the successful policy of the other earli journals by making sure that all manuscripts go through a serious and rigorous review process. it will be a big challenge for the editors and reviewers to combine these principles; however, we believe that it is feasible. authors submitting manuscripts are the key players in creating a novel publication culture for the journal. innovative ideas and risk-taking studies have a stronger impact if they are presented in a rigorous way that enables careful evaluation of theoretical arguments, methodological details, and conclusions.

there are deep-going changes in scientific publication policies and it was the right time for earli to take these changes into account. although libraries of big and wealthy universities provide researchers with wide on-line access to scientific journals, the journal packages available in many universities are limited. in addition, many readers of scientific publications belong to universities or research institutions providing access to scientific journals. in order to increase the impact of scientific publishing in educational and learning sciences it is important to develop open-access forums, which are available for readers independently of the organisation in which they are working.
frontline learning research is for this reason an open-access journal. it means that anyone using the internet can read it for free. researchers are guaranteed more flexible access to the journal's articles and the open-access format also enables them to use the articles as teaching material in face-to-face and on-line courses. furthermore, it also means that the journal is easily accessible for practitioners.

frontline learning research is an electronic-only journal. in daily research work on-line versions of scientific publications are used more and more frequently and researchers seldom go to libraries to read traditional hard copies of the journals. however, we still tend to think that the existence of a traditional paper version is a prerequisite for a high-reputation scientific publication. in many traditional scientific fields (e.g. physics), however, the situation is rapidly changing and electronic-only journals can be found among the most highly ranked publishing forums.

in planning flr we emphasised that the electronic-only format does not only mean that there are no hard copies available, but also that it opens up new opportunities that go beyond the possibilities of traditional printed journals. the electronic delivery form provides authors with a large variety of options for dynamic presentations, such as videos, simulations, hyperlinks and animations. in other words, this electronic journal makes it possible to demonstrate novel data collection processes and alternative analysis methods in a flexible way. the electronic-only format also allows more freedom for using varying types of articles. in flr we welcome short, regular and extended manuscripts. this allows very quick communication about new findings, while also enabling in-depth description of complex empirical data.

slow review processes and a long publication lag are frustrating for researchers. in flr much attention is paid to the fast review process.
as an electronic journal flr is also flexible in terms of articles published in individual issues and there is no need for publication lag. a fast review and publication schedule makes it possible to have intensive scientific discussions within the journal.

the papers published in this first issue of the journal demonstrate some of the ideas we have about innovative and risk-taking research. the editorial team of frontline learning research invites earli members and researchers elsewhere to participate in this collaborative enterprise to create a new innovative publishing culture for learning research.

editor-in-chief erno lehtinen
editors sanne akkerman, filip dochy, nikos papadouris and jan vermunt
assistant editor inneke berghmans

frontline learning research vol. 5 no. 1 (2017) 76-84 issn 2295-3159
corresponding author: susanne koerber, department of psychology, freiburg university of education, kunzenweg 21, 79117 freiburg, germany. phone: +49(0)761 / 682-278, email: susanne.koerber@ph-freiburg.de. doi: http://dx.doi.org/10.14786/flr.v5i1.265

diagrams support revision of prior belief in primary-school children

susanne koerber a, christopher osterhaus a, beate sodian b
a freiburg university of education, department of psychology, germany
b ludwig-maximilians-university munich, department of psychology, germany

article received 4 july / revised 21 november / accepted 29 november / available online 28 february

abstract

the reluctance of children to revise their prior beliefs is a prominent phenomenon in the reasoning literature. one way to facilitate belief change is offering explanations, and this study examined whether highlighting (counter)evidence with diagrams leads to belief revision to the same extent.
altogether 134 preschoolers and second-graders (5- and 7-year-olds, respectively) were presented with either counterintuitive data or explanations, both refuting a strong commonly held belief concerning the relation between two variables (e.g. eating carrots improves vision). in the explanation condition, we presented children with an explanatory underlying mechanism for the unexpected causal relation (e.g. spinach and carrots contain the same amount of vitamin a, with both improving vision). in the diagram condition, children were presented with empirical data displayed in a bar graph (non-covariation), which also disconfirmed the initial belief. in both age groups and both conditions we found significant numbers of belief revision with high certainty ratings concerning the new belief. belief change was more pronounced in second-graders, who in addition showed significantly more changes in the diagram condition than in the explanation condition. these findings suggest that the perceptual saliency of (counter)evidence helps children to correctly evaluate hypotheses, which supports changes in their prior belief.

keywords: belief revision, hypothesis evaluation, primary-school children, diagram, explanation

1. introduction

the prominent role of prior knowledge in the evaluation of evidence is a well-documented phenomenon in the reasoning literature (e.g., croker & buchanan, 2011; kuhn, amsel, & o’loughlin, 1988). especially when evidence is inconsistent with their favored hypotheses or beliefs, children often refuse to update their initial beliefs; rather, they distort their interpretation of the data so that these are in accordance with their initially held belief (klaczynski, 2001; kuhn et al., 1988).
the distortion typically entails that children ignore data that are inconsistent with their initial belief, that they selectively attend to only those parts of the data that are consistent with their initial belief, or that they even misinterpret the data (amsel & brock, 1996; chinn & brewer, 1993; chinn & malhotra, 2002; kuhn et al., 1988; masnick, klahr, & knowles, 2016).

a common explanation for this phenomenon has been proposed by kuhn (e.g., kuhn, 2010). kuhn suggests that young children do not understand that theory (beliefs) and evidence (data) are two epistemologically distinct categories. specifically, she argues that young children do not understand that theories need to be fully backed up by evidence, which would result in their selective interpretation of data. in a well-known study, kuhn and her colleagues (1988) asked children to interpret data that suggested a causal relation between a set of variables (e.g., different kinds of food, being healthy or not). specifically, sixth- and ninth-graders, as well as adults, were presented with fictitious data that, so they were told, were obtained in a boarding school, in which different groups of students consumed four different kinds of food or beverage (diet coke or regular coke, baked potato or fries, oranges or apples, and special k or granola). some of the students in the boarding school felt sick after lunch, others were healthy. two variables contradicted the prior belief of the participants and two variables were in accordance with the participants’ beliefs concerning the impact of the different food items on the health of the students. when children were asked whether a specific kind of food caused the students’ sickness, kuhn and colleagues found that only the older children and the adults used evidence-based reasoning (i.e., they related their answer to the covariation data).
in contrast, most of the younger children did not attend to the evidence at all or they did not interpret it correctly and instead they kept their initial belief. although this finding seems to suggest some severe deficits in data interpretation skills and an insufficient differentiation between hypothesis and evidence in young children, more recent research shows that children as young as six-year olds possess a basic understanding of the hypothesis-evidence relation (e.g. ruffman, perner, olson & doherty, 1993). for instance, koerber, sodian, thoermer, and nett (2005) found that children as young as five-year-olds are able to successfully interpret simple patterns of covariation data, without any distortion of prior beliefs, when data are not overly complex and when the prior belief is not too strong. in their study, koerber and colleagues presented children with a hypothesis held by a story character and a set of covariation data (perfect and imperfect covariations, noncovariation) that contradicted the protagonist’s hypothesis. most five-year-olds successfully attributed a belief revision to the protagonist when the relation presented in the data was straightforward (perfect covariation), showing that they are able to successfully interpret simple patterns of data without any distortions and to incorporate this new evidence into their theories (see also piekny & maehler, 2013; van der graaf, segers, & verhoeven, 2016, for a replication of these findings). these confirmatory findings of early data interpretation skills are in line with a growing literature on early preschool and primary school scientific thinking, which shows that already young children possess a basic understanding of the distinction between hypothesis and evidence (mayer, koerber, sodian & schwippert, 2014; sandoval, sodian, koerber, & wong, 2014). 
the discovery of increasingly mature scientific thinking skills in this young age group suggests that children, in principle, should be able to use data and evidence to update and revise their initially held beliefs. but then why do studies find such weak performance in belief revision tasks in young children?

we argue that one of the main reasons why evidence evaluation often fails to promote belief revision in young children is that many studies use evidence that is too complex and/or not salient enough, especially for young children. asking children to interpret data about the effects of multiple variables in a single design, as for instance was done in kuhn et al. (1988), places heavy demands on children’s general information-processing capacities and it demands a sequential processing of information. in addition, the information typically only enters via a single perceptual route (i.e., data are presented only verbally without graphical depiction). chinn and malhotra (2002) found that children indeed find it difficult to make correct, undistorted observations of (counter)evidence when the data are not salient. data with little salience increased children’s reluctance to change their prior belief in the face of new evidence, showing that the salience of the evidence is an important facilitator of successful belief revision.

one possible way to increase salience of data lies in the presentation of evidence in simple bar graphs. bar graphs make possible a salient and meaningful representation of covariation data and even preschoolers can successfully interpret these graphs (koerber & sodian, 2008). their positive effects, like those of many well-designed visual displays, are mostly attributed to the following three characteristics (e.g. hegarty, 2011; kosslyn, 1989, 1994, 2006; larkin & simon, 1987): first, bar graphs serve as external storage for information and thus they reduce memory load.
second, the relations between two variables are spatially organized in bar graphs and they can be perceived at a glance (i.e., different pieces of information and relations between variables can be perceived simultaneously and thus bar graphs allow for a more efficient processing of information). and third, complex cognitive processes can be “offloaded” on perception in bar graphs. taken together, bar graphs thus assist information processing by offering an additional perceptual route (in addition to the verbal route) and, in addition, their two-dimensional display of a relation between two variables allows using analogies to space and spatial relations in order to make inferences on non-spatial content domains. graphs thus offer the viewer information (e.g., a linear trend) in a direct way which does not require that this information is inferred or computed from numerical or verbal data. recent research has shown that the positive effects of bar graphs even hold for children as young as six-year-olds. moreover, prior research showed that kindergarteners can successfully read off causal relations from simple bar graphs (koerber & sodian, 2008). the present study investigates whether successful belief revision can be induced in young children when evidence is presented in a salient way (i.e., in a bar graph) that requires reduced information processing. specifically, we compare this way of inducing belief revision to a highly effective means of revising initially held beliefs, which is providing children with explanations and causal mechanisms that link the two variables. this approach has been taken by koslowski (1996, 2012, see also koslowski, marasia, chelenza, & dublin, 2008), who argues that children often do not give up their initially held beliefs in favor of contrary evidence because of their strong subjective causal theories. 
these causal theories comprise not only information about the statistical association (covariation) between two variables, but they also entail beliefs about the underlying causal mechanism that connects the two variables. according to this account, covariation evidence alone cannot sufficiently challenge the initially held beliefs when no adequate, novel explanation about the underlying causal mechanism is offered simultaneously. in a study involving sixth- and ninth-graders as well as college students, koslowski (1996) presented participants with the results of a fictitious study that investigated whether two kinds of food (sweets with low or high concentration of sugar, and milk with low or high concentration of fat) influenced whether or not children could easily fall asleep. all participants initially believed that sugar but not fat had a significant influence on children’s ability to find sleep. participants were assigned to two conditions: in one condition, participants received only covariation evidence that contradicted their initially held belief (“covariation-only condition”); in the other condition, participants received not only covariation evidence, but were also given an explanation concerning the mechanism that may link fat to sleep (“covariation-and-mechanism condition”). as hypothesized, receiving an additional explanation led to significantly more belief revision than did the covariation evidence alone.

the explanation condition that was included in the present study therefore presented children, analogously to koslowski (1996), with an explanation that accounted for the occurrence of the counterevidence and a mechanism that linked the two variables. the diagram condition, in turn, presented empirical data about covariation in a salient way, depicted in bar graphs. participants were preschool and primary-school children, who are at an age when conceptual change occurs over a wide range of knowledge areas.
we hypothesized that children in this young age group would change their prior belief when presented with evidence in a salient way. in addition, we hypothesized that belief revision would, in the diagram condition, not be inferior to belief revision in the explanation condition.

2. methods

a 2 (condition: explanation vs. diagram) by 2 (age group: preschoolers vs. second-graders) between-subjects design was employed to investigate children’s belief revision in four tasks.

2.1. participants

participants were 134 children, among them 54 preschoolers (m = 5,6 years, sd = 5 months, n_explanation = 28, n_diagram = 26) and 80 second-graders (m = 7,6 years, sd = 5 months, n_explanation = 40, n_diagram = 40). the 64 girls and 70 boys were sampled from 10 middle-class preschools and schools in proximity to a large city in southern germany. preschools and schools supplied parents with information material about the study, and parents decided whether or not their children would be allowed to participate in the study. in addition to parental informed consent, child assent was obtained for all participants. participants from both age groups were semi-randomly assigned to one of the two conditions to ensure equal group sizes.

2.2. material and procedure

four tasks were used to test children’s belief revision. the contexts used in these tasks were contexts in which children typically hold strong, naïve beliefs about the covariation and causal relation between two variables. these were: (1) eating carrots (but not spinach) improves vision; (2) drinking milk (but not mineral water) increases bone density; (3) eating gummi bears (but not mustard) makes you fat; and (4) eating chocolate (but not bananas) makes you happy. for each context, the children were first asked about their initial belief concerning the relation between the variables (e.g. “does eating carrots or eating spinach improve vision, or do they both equally contribute to good vision?”).
children’s answers were visually displayed by placing small plastic cards (e.g., eyes and carrots) in front of them. this was done so that children would remember their initial belief. also, children indicated how confident they were about their initial belief (0, 1, 2). in addition, children were presented with a story protagonist (robbie or anna, matched to the gender of the child) who held the same initial belief as the child. this was done in order to account for potential differences between children’s own belief revision and the revision they ascribe to another person.

depending on children’s initial belief, the following counterevidence was used to induce belief revision in children: in the carrot/spinach context (task 1), children who believed that carrots or spinach improved vision were presented with counterevidence that showed that the effect of the two variables is equally strong (noncovariation). children who initially believed that both factors are equally associated with good vision were presented with counterevidence that showed that carrots improve vision (see table 1 for an overview of which counterevidence was presented in response to varying initial beliefs). because the prior literature revealed that the type of covariation evidence (e.g., perfect covariation or noncovariation) influences preschoolers’ interpretation of data (e.g., koerber et al., 2005; piekny & mähler, 2013), we included counterevidence in the form of noncovariation as well as in the form of perfect covariation in order to account for potential effects.
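the assignment rule described above for the carrot/spinach context can be sketched in a few lines of python. this is only an illustration under stated assumptions: the names `Belief`, `Counterevidence` and `counterevidence_for_task1` are hypothetical, and the mapping covers task 1 only, because the text spells out the rule for that context alone.

```python
from enum import Enum


class Belief(Enum):
    """Initial belief a child can state (labels are hypothetical)."""
    FACTOR_A = "a"   # e.g. only carrots improve vision
    FACTOR_B = "b"   # e.g. only spinach improves vision
    SAME = "a/b"     # both contribute equally ("doesn't matter")


class Counterevidence(Enum):
    """Kinds of counterevidence named in the text for task 1."""
    SAME_EFFECT = "noncovariation: a and b have the same effect"
    FACTOR_A = "covariation: factor a has the effect"


def counterevidence_for_task1(initial: Belief) -> Counterevidence:
    """Pick the evidence that contradicts the stated belief (task 1 only)."""
    if initial in (Belief.FACTOR_A, Belief.FACTOR_B):
        # A child who favours one food is shown that both work equally well.
        return Counterevidence.SAME_EFFECT
    # A child who says both are equal is shown that carrots improve vision.
    return Counterevidence.FACTOR_A


if __name__ == "__main__":
    for belief in Belief:
        print(belief.name, "->", counterevidence_for_task1(belief).value)
```

the point of the sketch is simply that the counterevidence is chosen after, and conditional on, the child's own answer, so every child sees data that contradict what he or she just said.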
table 1. design of the four tasks
column headers: context; child said; counterevidence (factor a); counterevidence (factor b); counterevidence (a/b same effect)
carrot/spinach (task 1): carrot (a) x; spinach (b) x; doesn’t matter (a/b) x
milk/water (task 2): milk (a) x; water (b) x; doesn’t matter (a/b) x
gummi bears/mustard (task 3): gummi bears (a) x; mustard (b) x; doesn’t matter (a/b) x
chocolate/bananas (task 4): chocolate (a) x; bananas (b) x; doesn’t matter (a/b) x

in the diagram condition, counterevidence was presented visually in bar graphs (for an example, see figure 1); in the explanation condition, experimenters presented the children with an explanation about a mechanism that supported the opposite of children’s initial belief. in the carrots/spinach context, for instance, children who initially believed that only carrots would improve vision heard an explanation that maintained that spinach and carrots contain equal amounts of vitamin a, which is the mechanism that improves vision.

after the evidence was presented, experimenters reminded children of their initial belief. subsequently, they asked the children about their present belief (including the strength of their confidence in this belief) as well as about the present belief of the story protagonist (third person). children were interviewed individually by two research assistants, who were extensively trained and who each interviewed equal numbers of children in each condition and age group (i.e., there was no interaction between experimenters and condition). to ensure that children in the diagram condition were able to interpret the bar graphs, a short introduction was given in which the experimenters explained how to read a bar graph.

figure 1. example of a bar graph (perfect covariation).

3. results

preanalyses revealed that children’s confidence in their initial belief [low (= 0), moderate (= 1) or high (= 2)] was high before treatment (m = 1.59, sd = .44 and m = 1.63, sd = .38, respectively, for the preschoolers and second-graders) with no significant difference between the age groups, t(132) = -.547, p > .05. this clearly indicates that all children held strong initial beliefs.

figure 2 shows the mean percent of belief revision across all four tasks. in line with our hypothesis, more than 60% of all children changed their initial belief in light of the salient evidence or the explanations. in the explanation condition, between 63% and 71% of all children changed their initial belief (solid lines) and they ascribed a belief revision to the story protagonist (dashed lines) in the light of the counterevidence. in the diagram condition, belief revision was more pronounced in both measures for the second-graders (84% and 86%) than for the preschoolers (59% and 67%). the difference between the diagram condition and the explanation condition was significant in second grade, where children significantly more often changed their own initial belief, t(78) = 2.68, p < .01, and significantly more often ascribed a belief revision to the story protagonist, t(78) = 3.44, p = .001, in the diagram condition than in the explanation condition. in the diagram condition, there was, in addition, a significant difference between the two age groups, with preschoolers changing their initial belief significantly less often than second-graders, t(64) = -4.67, p < .001, and also ascribing a belief revision to the story protagonist significantly less often than second-graders, t(64) = -2.63, p = .01.
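the degrees of freedom of the t-tests reported above can be cross-checked against the group sizes given in section 2.1. the following python sketch assumes pooled-variance independent-samples t-tests (df = n1 + n2 - 2); the text does not name the variant used, so this is an assumption, and the dictionary keys and the helper `df_independent_t` are hypothetical names.

```python
# Group sizes as reported in section 2.1 (keys are hypothetical labels).
group_sizes = {
    ("preschool", "explanation"): 28,
    ("preschool", "diagram"): 26,
    ("second_grade", "explanation"): 40,
    ("second_grade", "diagram"): 40,
}


def df_independent_t(n1: int, n2: int) -> int:
    """Degrees of freedom of a pooled-variance independent-samples t-test."""
    return n1 + n2 - 2


total_n = sum(group_sizes.values())
n_preschool = (group_sizes[("preschool", "explanation")]
               + group_sizes[("preschool", "diagram")])
n_second_grade = (group_sizes[("second_grade", "explanation")]
                  + group_sizes[("second_grade", "diagram")])

# Age comparison across both conditions (initial confidence).
df_age_overall = df_independent_t(n_preschool, n_second_grade)
# Diagram vs. explanation within second grade.
df_condition_second_grade = df_independent_t(
    group_sizes[("second_grade", "diagram")],
    group_sizes[("second_grade", "explanation")],
)
# Preschoolers vs. second-graders within the diagram condition.
df_age_diagram = df_independent_t(
    group_sizes[("preschool", "diagram")],
    group_sizes[("second_grade", "diagram")],
)

print(total_n, df_age_overall, df_condition_second_grade, df_age_diagram)
# prints: 134 132 78 64
```

under this assumption the computed values agree with the reported t(132), t(78) and t(64), which indicates that all 134 children entered these comparisons.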
while a 2 (age) by 2 (condition) analysis of variance of children’s mean number of belief revisions ascribed to the story protagonist revealed no significant main effects for age (f(1, 129) = 1.282, ns) or condition (f(1, 129) = 2.590, ns), it revealed a significant interaction, f(1, 129) = 5.631, p < .05, η² = .042.

figure 2. mean number of belief revisions categorized by age and condition (vertical axis: 0% to 100%; horizontal axis: explanation, diagram; series: preschoolers (self), preschoolers (other), second-graders (self), second-graders (other)).

4. discussion

can successful belief revision be induced in young children when evidence is presented in a salient way? the present study found that salient evidence (diagrams) is as effective as explanations in challenging and revising young children’s initial beliefs. in contrast to the findings of a large number of prior studies (e.g., amsel & brock, 1996; chinn & brewer, 1993), our findings suggest that even preschoolers are able to revise an initial belief when they are presented with counterevidence, be it an explanation about a mechanism or covariation evidence in a salient way. second-graders showed even more belief revision when presented with salient counterevidence (bar graph) than when receiving an explanation, suggesting that this form of evidence presentation may be especially beneficial in this age group.

in accordance with our hypothesis, bar graphs helped children to successfully interpret the data and to revise their initial beliefs. bar graphs appear to be such a helpful tool in belief revision because they depict the relation between two or more variables in a straightforward manner; in addition, they enable students to grasp complex data patterns that depict the relations between multiple variables at a single glance, and they help to reduce memory and information processing demands.
this way, bar graphs make the counterevidence more salient, and by means of this saliency they lead to substantially higher rates of belief revision than have been found in prior studies (e.g., amsel & brock, 1996).

interestingly, our second-graders performed even better in the diagram condition than in the explanation condition. this is an interesting finding because research from koslowski (1996) has shown highly beneficial effects of explanations in belief revision. a possible explanation for this finding (the high amount of belief revision in the diagram condition) is that children may have come up with self-generated explanations (legare, 2012). these self-generated explanations may have supported and strengthened the effect of the diagrams. legare showed that two- to six-year-olds produce self-generated explanations substantially more often when they are faced with counterevidence than when they interpret evidence that is consistent with their initial beliefs. importantly, our participants did not seem to distort the data in the diagram condition. this is most likely due to the salience of the counterevidence that we presented in the bar graphs, which makes it difficult to ignore or neglect aspects of the data that do not align with children’s initial beliefs. visually perceiving counterevidence thus might have triggered children to generate their own, new explanations for the unexpected relation rather than distorting the data. and it is reasonable to assume that these self-generated explanations are even more beneficial for inducing belief revision than those generated by another person. given this interpretation, it might be that the better performance of the second-graders in the diagram condition can be attributed not only to their perception of the visual information but also to their deeper processing of the information.
future studies therefore need to disentangle the impact of the visual salience of the counterevidence and the depth of processing of the information (e.g., the generation of self-explanations). to this end, “thinking aloud” protocols in which participants are explicitly asked to come up with explanations may be especially helpful. if the salient presentation of evidence in diagrams indeed provokes self-generated explanations, then it is impossible to strictly isolate the effects of the presentation of the diagram and the explanations. thus, in line with koslowski (1996, 2012), we do not regard evidence and explanation as a dichotomy, but rather we suggest that future studies should investigate the way in which the salient presentation of evidence leads to the generation of theory-based explanations. adding a third, mixed condition, in which the participants receive counterevidence and explanations, should then be equally facilitating for belief revision as the diagrams alone.

children’s own belief revision was contrasted in this study with their ascription of belief revision to another person. although we found convergent findings in these two measures, the difference between the diagram and the explanation condition was more pronounced in second grade when asking children about another person’s belief revision. the belief revision of a third person has been the dependent measure in many other studies (e.g., koerber et al., 2005; piekny & mähler, 2013), which is typically done because it ensures that the (counter)evidence is evaluated from a more abstract and distant level. the general trends in our data were, however, similar for our two measures (own belief revision and belief revision ascribed to a third person), so that it seems reasonable to assume that these two measures are closely related, leading to the same conclusions regarding the influence of saliency on evidence interpretation.
in sum, our findings suggest that the salient presentation of counterevidence leads to belief revision in children as young as five years old. although explanations are important for children’s belief revision, explanations do not need to be provided externally (e.g., by an adult); instead, our data show that presenting evidence may suffice when it is presented in a way that is salient and that may elicit children’s self-generation of explanations. if this interpretation of our findings holds and children indeed generate explanations while processing salient evidence, this finding will be of high relevance not only for theorists, but also for teachers and practitioners in schools. first, it underlines the important role diagrams can play in illustrating concepts. second, it also reveals their usefulness for eliciting and promoting conceptual change (see also koerber, 2003). therefore, diagrams should play a more important role in school in general and specifically in science and math curricula.

keypoints

preschoolers are able to revise a prior belief when presented with counterevidence
perceptual saliency of evidence (given in a diagram) helps children to evaluate hypotheses and revise beliefs
the role of evidence and explanations for supporting hypothesis evaluation should not be viewed as dichotomous.

acknowledgements

the study was conducted as part of the project “development of understanding graphs and diagrams in preschool age and elementary-school age” and it was funded by the german research council (dfg) (ko 2276/3-1). we would like to thank daniela huber and susanne mikschl for their assistance in data collection and all children, parents, and teachers who supported this study.

references

amsel, e., & brock, s. (1996). the development of evidence evaluation skills. cognitive development, 11, 523-550. doi: 10.1016/s0885-2014(96)90016-7
chinn, c. a., & brewer, w. f. (1993). the role of anomalous data in knowledge acquisition: a theoretical framework and implications for science instruction. review of educational research, 63, 1-49.
chinn, c. a., & malhotra, b. a. (2002). children's responses to anomalous scientific data: how is conceptual change impeded? journal of educational psychology, 94, 327-343. doi: 10.1037/0022-0663.94.2.327
croker, s., & buchanan, h. (2011). scientific reasoning in a real-world context: the effect of prior belief and outcome on children's hypothesis-testing strategies. british journal of developmental psychology, 29(3), 409-424. doi: 10.1348/026151010x496906
hegarty, m. (2011). the cognitive science of visual-spatial displays: implications for design. topics in cognitive science, 3(3), 446-474. doi: 10.1111/j.1756-8765.2011.01150.x
klaczynski, p. a. (2001). analytic and heuristic processing influences on adolescent reasoning and decision-making. child development, 72, 844-861. doi: 10.1111/1467-8624.00319
koerber, s. (2003). der einfluss externer repräsentationsformen auf proportionales denken im grundschulalter. [the influence of external forms of representation on proportional reasoning in elementary school age.] hamburg: verlag dr. kovac.
koerber, s., sodian, b., thoermer, c., & nett, u. (2005). scientific reasoning in young children. preschoolers’ ability to evaluate covariation evidence. swiss journal of psychology, 64, 141-152. doi: 10.1024/1421-0185.64.3.141
koerber, s., & sodian, b. (2008). preschool children’s ability to visually represent relations. developmental science, 11, 390-395. doi: 10.1111/j.1467-7687.2008.00683.x
kosslyn, s. m. (1989). understanding charts & graphs. applied cognitive psychology, 3, 185-226. doi: 10.1002/acp.2350030302
kosslyn, s. m. (1994). elements of graph design. new york: w. h. freeman.
kosslyn, s. m. (2006). graph design for the eye and mind. new york: oxford university press.
koslowski, b. (1996). theory and evidence: the development of scientific reasoning. mit press.
koslowski, b. (2012). scientific reasoning: explanation, confirmation bias and scientific practice. in g. feist & m. gorman (eds.), handbook of the psychology of science and technology (pp. 151-192). dordrecht: springer.
koslowski, b., marasia, j., chelenza, m., & dublin, r. (2008). information becomes evidence when an explanation can incorporate it into a causal framework. cognitive development, 23(4), 472-487. doi: 10.1016/j.cogdev.2008.09.007
kuhn, d. (2010). what is scientific thinking and how does it develop? in u. goswami (ed.), handbook of childhood cognitive development (2nd ed., pp. 472-534). oxford, england: blackwell.
kuhn, d., amsel, e., & o'loughlin, m. (1988). the development of scientific reasoning skills. orlando, fl: academic press.
larkin, j., & simon, h. (1987). why a diagram is (sometimes) worth ten thousand words. cognitive science, 11, 65-99. doi: 10.1111/j.1551-6708.1987.tb00863.x
legare, c. h. (2012). exploring explanation: explaining inconsistent evidence informs exploratory, hypothesis-testing behavior in young children. child development, 83, 173-185. doi: 10.1111/j.1467-8624.2011.01691.x
masnick, a. m., klahr, d., & knowles, e. r. (2016). data-driven belief revision in children and adults. journal of cognition and development, (just-accepted). doi: 10.1080/15248372.2016.1168824
mayer, d., sodian, b., koerber, s., & schwippert, k. (2014). scientific reasoning in elementary school children: assessment and relations with cognitive abilities. learning and instruction, 29, 43-55. doi: 10.1016/j.learninstruc.2013.07.005
piekny, j., & maehler, c. (2013). scientific reasoning in early and middle childhood: the development of domain-general evidence evaluation, experimentation, and hypothesis generation skills. british journal of developmental psychology, 31(2), 153-179. doi: 10.1111/j.2044-835x.2012.02082.x
ruffman, t., perner, j., olson, d. r., & doherty, m. (1993). reflecting on scientific thinking: children's understanding of the hypothesis-evidence relation. child development, 64, 1617-1636. doi: 10.1111/j.1467-8624.1993.tb04203.x
sandoval, w. a., sodian, b., koerber, s., & wong, j. (2014). developing children's early competencies to engage with science. educational psychologist, 49, 139-152. doi: 10.1080/00461520.2014.917589
van der graaf, j., segers, e., & verhoeven, l. (2016). scientific reasoning in kindergarten: cognitive factors in experimentation and evidence evaluation. learning and individual differences, 49, 190-200. doi: 10.1016/j.lindif.2016.06.006

frontline learning research vol. 5 no. 3 special issue (2017) 55-65 issn 2295-3159

corresponding author: ass. prof. adam szulewski bsc, md, frcpc, mhpe, dep. of emergency medicine, queen’s university, empire 3, kingston general hospital, 76 stuart street, kingston, ontario, k7l 2v7, canada, aszulewski@qmed.ca doi: http://dx.doi.org/10.14786/flr.v5i3.256

pupillometry as a tool to study expertise in medicine

adam szulewski a, danielle kelton b, daniel howes c

a department of emergency medicine, queen’s university, canada
b faculty of medicine, queen’s university, canada
c departments of emergency medicine and critical care medicine, queen’s university, canada

article received 2 may / revised 9 november / accepted 23 march / available online 14 july

abstract

background
pupillometry has been studied as a physiological marker for quantifying cognitive load since the early 1960s. it has been established that small changes in pupillary size can provide an index of the cognitive load of an individual as he/she performs a mental task.
the utility of pupillometry as a measure of expertise is less well established, although recent research in the fields of education, medicine and psychology indicates that differences in pupillary size during domain-specific tasks allow differentiation between experts and novices in appropriately designed experiments.

purpose
the goal of this review is to explore the existing body of evidence for the use of pupillometry as a measure of expertise and to identify its strengths and constraints within the context of expertise research in the medical sciences.

results
pupillometry is a robust metric that allows researchers to better understand cognitive load in medical practitioners with varying levels of expertise. in medical expertise research, it has been used to study surgeons, anesthetists and emergency physicians. its strengths include its ability to provide quantitative and objective outputs, to be measured unobtrusively with new technology and to be precisely computed as cognitive load changes over the course of completion of a task. constraints associated with this methodology include its potential inaccuracy with changes in ambient light and pupillary accommodation as well as the need for relatively expensive equipment.

conclusion
with recent technological advances, pupillometry has become a simple and robust method for quantifying physiological changes attributable to cognitive load and is increasingly being utilized in medical education. it can be used as a reliable marker of cognitive load and has been shown to differentiate levels of expertise in medical practitioners.

keywords: pupillometry; cognitive load; expertise; medical education

1. background

the measurement of human cognitive load has been of interest to researchers for decades. knowing how intensely a person is thinking has implications beyond knowing what that person is thinking about.
this is particularly relevant in the context of professional domains (like medicine) where critical and cognitively loading decisions often need to be made with limited time and in the context of other competing priorities. this “intensity” of thinking, which is related to cognitive load, functions within the constraints of a limited working memory. working memory is a key executive function. executive functions are a group of mental processes that are required when an individual has to pay attention, and when it would be considered inappropriate, insufficient or impossible to rely on instinct or to respond automatically (burgess & simons, 2005). in addition to working memory, executive functioning involves two other core activities: inhibition (self-control, selective attention, cognitive inhibition) and cognitive flexibility (which is closely related to creativity). these three core activities are combined in different ways to build higher order executive functions such as reasoning, problem solving and planning (diamond, 2013). working memory, which is responsible for the manipulation of stored information (or our ability to “think”), is generally thought to be limited (paas, tuovinen, tabbers, & van gerven, 2003). ongoing work in this area now suggests that with experience, experts are able to expand working memory capacity by having developed methods for storage and retrieval of domain-specific information in long-term memory – so called long-term working memory (ericsson & kintsch, 1995). this is accomplished, in part, by pattern recognition and schema development, and a resultant relative decrease in the cognitive load that a problem or situation imposes as an individual becomes more expert-like (szulewski, roth, & howes, 2015). functionally, cognitive load can be thought of as the mental capacity that is allocated to performing a task (paas et al., 2003). 
it is thought to comprise three components: intrinsic cognitive load (icl), extraneous cognitive load (ecl) and germane cognitive load (gcl) (young, van merrienboer, durning, & ten cate, 2014). icl is a function of expertise and task complexity, while ecl is related to suboptimal information presentation conditions. gcl refers to the working memory resources that are dedicated to processing icl, and thus to learning (sweller, 2010). in general, researchers measure cognitive load using psychometric scales, physiological variables and secondary task methodology (paas et al., 2003). briefly, psychometric scales gather subjective data from participant self-reports after task completion. physiological methods use task-evoked pupillary responses (teprs) or pupillometry, heart rate variability and galvanic skin response (among others) as surrogate markers of cognitive load. secondary task methodology relies on participants’ performance on a secondary task (one that requires sustained attention, like detecting an auditory signal) and uses this information to infer the level of cognitive load imposed by the primary task. each of these techniques has its own strengths and limitations; but in general, each is thought to provide data about total (or measurable) cognitive load. the contribution of intrinsic, extraneous and germane cognitive load to each of these measurement techniques remains to be elucidated (leppink, paas, van gog, van der vleuten, & van merriënboer, 2014). this review focuses on one particular physiological method of measuring cognitive load: pupillometry, the study of changes in pupil size. we will first examine the technique of pupillometry as a surrogate marker for cognitive load in non-medical domains and then we will focus the discussion on pupillometry research in medicine and how this relates to the development of expertise. finally, constraints of the technique will be discussed.

2. pupil physiology

dilation and constriction of the human pupil are necessary for day-to-day visual tasks. dilation (mydriasis) of the pupil is accomplished by the contraction of the iris dilator (radial) muscle, which is controlled by the sympathetic nervous system. constriction (miosis) of the pupil occurs when the iris sphincter (circular) muscle contracts, which is controlled by the parasympathetic nervous system. two commonly tested reflexes in clinical medicine are the light reflex and the accommodation reflex. during the light reflex, the pupil dilates in low luminance environments and constricts in high luminance environments. in the accommodation reflex, as an individual changes visual focus from a distant object to a closer object, the pupil constricts, and vice-versa (lang, 2015). in addition to these clinically measurable and commonly discussed reflexes, pupils also change in size as a result of non-visual stimuli. this was first described in detail by hess and polt (1960), who showed that pupil size varied when participants viewed particular images (for example, sexually suggestive ones). follow-up studies by this group and others further demonstrated that pupil size could be used to measure cognitive load (or mental effort). physiologically, it is thought that pupil size changes with cognitive loading as a result of pathways that originate in the locus coeruleus, which is a major norepinephrine source in the brain (laeng, sirois, & gredebäck, 2012). in fact, locus coeruleus activity has been shown to be very closely related to sympathetic activity and changes in pupil size (aston-jones & cohen, 2005). these pupillary responses are spontaneous and very difficult to control voluntarily. the voluntary dilation of a subject’s pupils is only possible indirectly if the subject imagines a situation (e.g. self-induced sexual imagery) where his/her pupils would normally dilate (whipple, ogden, & komisaruk, 1992).
this would be particularly difficult, if not impossible, to do systematically while simultaneously performing other cognitively loading tasks. this makes the technique robust. although other autonomic measurements like heart rate and skin resistance have also been found to provide similar information regarding sympathetic activity (and thus cognitive loading), pupillometry has been found to yield the most consistent and readily analysable results (kahneman, tursky, shapiro, & crider, 1969).

3. pupillometry as a measure of cognitive load

in a seminal article, hess and polt (1964) found a strong correlation between the difficulty of arithmetic problems posed to participants and the magnitude of the increase in their pupil sizes. further, they observed that after a question was asked, participants’ pupils showed a gradual increase in diameter, reached a maximum size just prior to reporting an answer, and then reverted back to their original diameter shortly thereafter. in another article, beatty and kahneman (1966) built upon these original experiments and were able to confirm two phases in the pupillary response to cognitive processing: a loading phase, with dilation corresponding to information gathering, and an unloading phase, in which the pupil constricted as answers were verbalized by the participants. based on the results from these studies as well as others, it became generally accepted that changes in pupil size reflect changes in cognitive processing load during task performance and provide information about processing resources. specifically, more difficult cognitive tasks were found to cause an increase in both the amplitude and the latency of pupillary dilation (beatty, 1982). these early experiments were carried out with relatively onerous experimental processes that involved developing large quantities of photographs taken by cameras in precisely controlled environments and then manually measuring pupil size with a ruler.
this made large-scale experiments impractical. modern technology has allowed researchers to electronically collect pupil size data with stationary as well as mobile devices, obviating the need for time-consuming manual measurement and allowing for less stringent experimental environments. some of the previously described studies that used arithmetic problems have now been replicated with the new technology, showing similar results. figure 1 is taken from one of these studies that used a mobile eye-tracker to capture participant pupil size at a rate of 30 hz during arithmetic problem solving. as was first shown in the original experiments, the new technology also demonstrated that pupil size increased with increasing problem difficulty and changed predictably with phases of information gathering and delivery of responses (szulewski, fernando, baylis, & howes, 2014).

figure 1. “difficult” questions resulted in peak dilation of 11.8% compared to baseline whereas “easy” questions resulted in peak dilation of 5.0% compared to baseline (p = 0.005). time 0 to 3 seconds serves as baseline (3 seconds prior to question presentation); time 3 to 8 seconds corresponds to the time that the question was on the screen; time 8 to 11 seconds corresponds to the 3 seconds after the question was removed and the black dot appeared. [from szulewski, a., fernando, s. m., baylis, j., & howes, d. (2014). increasing pupil size is associated with increasing cognitive processing demands: a pilot study using a mobile eye-tracking device. open journal of emergency medicine, 2014. reprinted with permission].

the ease of use and precision of the newer technology has expanded the role of pupillometry to more theoretical realms. in addition to reliably demonstrating increased cognitive load with increasing question difficulty, pupillometry data have also shown that the modality of information presentation has cognitive loading effects.
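
the baseline-relative peak dilation reported in figure 1 can be illustrated with a minimal sketch. it assumes a single trial sampled at 30 hz with the window layout given in the caption (0 to 3 s baseline, 3 to 11 s task). the function name, the use of none for missing samples (e.g. blinks) and the choice of the raw maximum as the "peak" are illustrative assumptions, not the analysis pipeline of the cited study.

```python
from statistics import mean


def peak_dilation_percent(samples, rate_hz=30.0,
                          baseline_s=(0.0, 3.0), task_s=(3.0, 11.0)):
    """Peak pupil size in the task window, as a percent change
    from the mean pupil size in the baseline window."""

    def window(start_s, end_s):
        lo, hi = int(start_s * rate_hz), int(end_s * rate_hz)
        # Assumption: missing samples (e.g. blinks) are stored as None.
        return [s for s in samples[lo:hi] if s is not None]

    baseline = window(*baseline_s)
    task = window(*task_s)
    if not baseline or not task:
        raise ValueError("no valid samples in baseline or task window")

    base = mean(baseline)
    return 100.0 * (max(task) - base) / base


# Synthetic trial: 3 s of baseline at 4.0 mm, then a rise to 4.4 mm.
trace = [4.0] * 90 + [4.2] * 100 + [4.4] * 10 + [4.1] * 130
print(round(peak_dilation_percent(trace), 1))  # 10.0
```

expressing the change relative to each trial's own baseline makes values comparable across participants with different resting pupil sizes. a real recording would likely also need smoothing before taking the maximum, since a single noisy sample can otherwise determine the peak.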
using a remote eye tracker, klingner, tversky, and hanrahan (2011) showed that cognitive load is higher for the same tasks when they are presented orally as opposed to visually. these experiments underscore the precision and expanded applications of this technique. other groups of researchers have also investigated the ability to measure cognitive load in novel environments using pupillometry. one such experiment by palinko, kun, shyrokov, and heeman (2010) measured mean pupil diameter change in drivers as they operated a simulated vehicle while they were involved in simultaneous spoken dialogues. pupil diameter changed as expected and the authors concluded that pupillometry was better at quantifying small changes in cognitive load in the simulator than other measures like lane position and steering wheel angle. results from studies like this one suggest that pupillometry can be reliably used in more true-to-life situations in addition to well-controlled laboratory settings. importantly, during the driving simulator experiment, luminance varied only ± 5% in the simulated experimental environment, which likely minimized the contribution of the light reflex to pupillary changes and allowed for a relatively clean signal. changes in luminance become more of an experimental issue in real-world environments where background luminance varies to greater degrees. on the whole, these studies seem to suggest that the construct being measured with pupillometry is cognitive load, although there is research to suggest that other factors (e.g. emotion, fatigue, age, pain and certain drugs) also contribute to change in pupil size (holmqvist et al., 2011). validity evidence for the use of pupillometry to measure cognitive load specifically was recently described by szulewski, gegenfurtner, howes, sivilotti, and van merriënboer (2016).
in this study, pupillometric measurements of cognitive load were compared to psychometric measurements of cognitive load across different question types, question difficulty and experience levels in a testing environment. based on the predictability of the results and the strong correlation of the measurement instruments, the authors concluded that there is validity evidence to use either psychometric or pupillometric measurements to measure cognitive load in traditional testing environments.

4. pupillometry research in medicine

given the promising results of pupillary analysis in experimental settings and the increasing availability of the new technology, researchers have started to expand its use into other domains, including medicine. medicine is a particularly interesting field in which to study cognitive load given its inherent characteristics where physicians regularly make high stakes decisions, often under considerable external pressures (including time and stress). these characteristics are emphasized during non-routine emergency situations. one study that investigated critical incidents in the operating room examined anaesthesiology trainees’ pupil sizes, among other physiological responses, as surrogate markers of cognitive workload (schulz et al., 2011). participants’ pupil sizes were found to increase as the severity of a critical incident increased. although this pattern held true within scenarios, the authors found that there was no difference between sessions or individuals. this was thought to be due to individual pupil variations as well as external factors like lighting. these issues raise the concern that external factors can skew pupillometric results and make it difficult to interpret the data reliably in real-world environments where luminance is not adequately controlled. all real-life physician-patient clinical encounters would, as a result, be affected.
the main issue in these scenarios involves the light reflex, which is capable of causing pupil diameter changes of up to 120% from baseline, far greater than the changes of up to 20% that can be attributed to cognitive processing demands (holmqvist et al., 2011; laeng et al., 2012). in an effort to mitigate the confounding effects of the light reflex, investigators often try to control for luminance during their experiments. zheng, jiang, and atkins (2015) did just this and were able to confirm that pupil responses behaved as expected with changing sub-task difficulty in a simulated laparoscopic surgical experiment. in a related study, the same group noted that the rate of change of pupil size was better than pupil diameter in assessing mental workload of the simulated laparoscopic task (jiang, zheng, tien, & atkins, 2013). to address some of these issues, new techniques (like the index of cognitive activity) have been designed to separate out the light reflex from pupil changes secondary to cognitive workload by measuring abrupt discontinuities in the pupil size signal (marshall, 2002). this index of cognitive activity has been utilized in the objective assessment of surgical skill where pupil size (along with other eye and pupillary metrics) was used to objectively distinguish non-expert from expert surgeons in environments that were uncontrolled for luminance including a simulator as well as a live operating room (richstone et al., 2010).

5. pupillometry and expertise

performance on tests has universally been utilized to measure the construct of ability, intelligence, competence or expertise in a domain. despite its wide use, test-taking is known to have many limitations as a surrogate marker to measure these constructs. applications of pupillometry have allowed researchers to delve deeper into this area than simply examining performance.
this is particularly interesting when considering cognitive processing as subjects answer questions correctly. based on the traditional view of assessment in test-taking, two individuals who get the same score on a test are thought to have equal domain-specific skill (or ability or expertise). the reality is more nuanced. figure 2 is taken from a study by ahern and beatty (1979) which shows the cognitive processing demands (as measured by pupillometry) of participants with both high and low intelligence as defined by scholastic aptitude test scores as they were faced with arithmetic problems (and answered these problems correctly). based on traditional assessment modalities, both groups of individuals would be assessed equally for having answered correctly. a closer analysis, however, revealed that the group with “lower intelligence” had greater increases in pupillary dilation than the “higher intelligence” group at all question difficulty levels. essentially, the group with the lower intelligence had to “think harder” to achieve the same correct response as the group with higher intelligence.

figure 2. averaged task-evoked pupillary responses for correctly solved problems at three levels of difficulty for subjects in the high and low groups of psychometrically measured intelligence. at all difficulty levels, larger pupillary responses are observed for subjects in the low group. [from ahern, s., & beatty, j. (1979). pupillary responses during information processing vary with scholastic aptitude test scores. science, 205(4412), 1289-1292. reprinted with permission from aaas.]

moving away from intelligence defined by standardized testing, a study by szulewski et al. (2015) found similar results in novices and trained physicians as they answered clinically-based multiple choice questions. the participant groups in this study were divided not by intelligence, but by clinical experience.
those with more clinical experience (the trained physician group) had smaller changes in pupil diameter as they answered the questions compared to the more novice group when both groups answered correctly (see figure 3). in another study, tien et al. (2015) found that junior surgeons had greater pupil sizes than expert surgeons during open inguinal hernia repair. both of these studies (which divided physician participant groups based on experience level) emphasize that those with less experience expend more cognitive load than those with more experience when they perform domain-specific tasks, even when the measured outcome is the same. although it is reasonable to assume that these observed differences are due to different experience levels, one might argue that there may be other confounding factors between the groups that could skew the results. this is a potential issue for any cross-sectional study. a study by richstone et al. (2010) suggests that it is in fact experience/expertise that is responsible for the pupillometric changes between groups, as opposed to another confounder. as part of their study, they examined one non-expert surgeon three times over the course of 18 months both in simulated and live surgical environments. during this longitudinal analysis, they found that it became increasingly difficult to differentiate this non-expert from the expert surgeon group as his pupil metrics became more expert-like over time with increased training and experience. this finding suggests that the differences between groups of participants are in fact due to skill or expertise, as opposed to another confounding factor. overall, these studies suggest that there is empirical evidence that those with more domain-specific experience exhibit a certain cognitive efficiency as they perform tasks associated with their training and experience that their novice counterparts have not yet developed.

figure 3. results of an analysis of correctly answered clinical multiple-choice questions. the increase in the pupil diameter of novices was significantly greater than that of trained physicians (p < 0.001). [from szulewski, a., roth, n., & howes, d. (2015). the use of task-evoked pupillary response as an objective measure of cognitive load in novices and trained physicians: a new tool for the assessment of expertise. academic medicine, 90(7), 981-987. reprinted with permission from aamc.]

it is debatable whether differences in cognitive efficiency are relevant during a test where a student is asked a sequence of questions and he/she generally focuses all of his/her working memory onto the question at hand before moving onto the next one. moreover, it is debatable whether an assessor would even want to know this information. arguably, however, this cognitive efficiency in the “more intelligent” or “more skilled” or “more experienced” or “more expert-like” group becomes relevant in complex situations with competing priorities, as the less cognitively strained individual will have a greater proportion of his/her working memory available for other cognitively demanding executive functions. one such area where competing priorities often coexist and where cognitive efficiency might be beneficial is clinical medicine. during medical emergencies in particular, a physician team leader is cognitively tasked not only with making appropriate medical decisions but also with employing crisis resource management techniques (like leadership skills, situational awareness, communication skills and resource utilization) to optimize patient care (hicks, bandiera, & denny, 2008). it logically follows that cognitive efficiency in medical decision-making will more readily allow the physician leader to perform these simultaneous crisis resource management tasks to a higher level given the real constraints of human working memory.
anecdotally, cognitive efficiency seems to evolve with experience. the “anatomy” of working memory is thought to change with the development of expertise and it is likely that certain clinical tasks cognitively load experts and novices in different ways (szulewski et al., 2015). this evolution of the thinking process is tied to expertise development.

6. typical experimental conditions for pupillometry studies

as outlined in this review, researchers have successfully used pupillometry as a cognitive load measurement tool in numerous experimental conditions. these include relatively simple experiments in which pupillometric data are gathered as participants are presented with written or verbal questions and are tasked to solve problems in fields including arithmetic and language, among others. other experiments involve the use of different stimuli including photographs or even simulated driving environments. in medical applications, pupillometry has similarly been used in various settings including test-taking as well as more high fidelity environments like simulation and actual physician-patient clinical encounters. the task instructions provided to participants are equally variable and range from solving provided problems to performing operations in live surgical environments.

7. constraints of pupillometry

though it is clear that pupillometry provides useful information about both visual as well as nonvisual stimuli, the technology has a number of constraints. until relatively recently, accurate pupillometry studies required cumbersome experimental environments and tedious data collection and analysis. although some of these issues have been addressed with new technology, the cost of this technology poses new financial barriers for certain researchers on smaller budgets. this is especially relevant for those part-time researchers who may want to incorporate pupillometry into their professional and teaching duties, like academic physicians.
this reality suggests that, for the time being, given the costs, pupillometry research is more likely to occur at a theory-building level. as a result, some of its potential benefits in adjusting task difficulty for an individual learner and individualizing and optimizing education will remain elusive until the technology becomes cheaper and more readily available for teachers. another significant constraint of the technology relates to the accommodation and light reflexes. although pupillometry provides consistent and fairly easy-to-interpret data in experimental conditions of constant ambient light and focus distance, data output in real-world conditions is suboptimal. as previously discussed in this review, the index of cognitive activity has been designed in an effort to overcome some of these obstacles. although this technique allows for extraction of valuable information from large data sets with changing ambient light, the results are coarser and provide less precise and detailed information about shorter-term cognitive changes that might be relevant in precise comparisons between groups performing shorter tasks (klingner, kumar, & hanrahan, 2008). in addition, because this metric is a commercial product and its algorithm is not made publicly available, it can be neither replicated nor adequately studied. as a result, it is of limited benefit to researchers. another consideration in pupillometry research is participant age. older individuals generally have pupils that are smaller and are more restricted in their ability to dilate compared to younger people (holmqvist et al., 2011; piquado, isaacowitz, & wingfield, 2010). since many studies compare cognitive load between novices (who are usually younger) and experts (who are generally older), this might lead to confounding, as a smaller pupil diameter change may be due to a combination of increased age and decreased cognitive load.
cognitive load researchers should be aware of this issue and either control for participant age (where possible) or correct for it. correction measures include expressing pupil size changes relative to a baseline measurement and/or age-adjusting for pupil size and reactivity based on participants’ pupil responsiveness to a range of experimental light stimuli (piquado et al., 2010). finally, the accuracy of pupillometric measurement is dependent to some degree on gaze position, with greater systematic error occurring when the eye is looking away from the eye-tracker’s camera (brisson et al., 2013). different eye-tracking devices attempt to correct for this error, but the accuracy of pupillometric measures remains variable under these conditions. this is especially relevant for researchers studying cognitive load where the participant’s gaze may move away from centre.
8. conclusion
pupillometry is a robust and reliable method for studying cognitive load. since its inception as a scientific field in the 1960s, it has evolved greatly. the development of new technology that can electronically gather pupil size data at high rates has led to the increased use of pupillometry in diverse fields. despite the inherent constraints of the technique, including interference by luminance and its cost, pupillometry remains a promising metric for researchers to utilize in the study of cognitive load. it can provide insights into the human thinking process that would otherwise be unobservable. it has a particularly promising role in the field of medicine and in the study of physician expertise development. utilizing pupillometry to better understand and optimize physician cognitive load (and overload) is clinically relevant and has the potential to directly impact medical education and ultimately patient care.
keypoints
pupillometry is a robust method of quantifying cognitive load.
otherwise unobservable insights into cognitive processes can be gleaned with the use of pupillometry.
pupillometry research in medicine is contributing to a better understanding of expertise development across medical domains.
despite its benefits, pupillometry data in real-world applications suffer in quality as a result of the light and accommodation reflexes.
references
ahern, s., & beatty, j. (1979). pupillary responses during information processing vary with scholastic aptitude test scores. science, 205(4412), 1289-1292. aston-jones, g., & cohen, j. d. (2005). an integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. annu. rev. neurosci., 28, 403-450. doi: 10.1146/annurev.neuro.28.061604.135709 beatty, j. (1982). task-evoked pupillary responses, processing load, and the structure of processing resources. psychological bulletin, 91(2), 276. beatty, j., & kahneman, d. (1966). pupillary changes in two memory tasks. psychonomic science, 5(10), 371-372. brisson, j., mainville, m., mailloux, d., beaulieu, c., serres, j., & sirois, s. (2013). pupil diameter measurement errors as a function of gaze direction in corneal reflection eyetrackers. behavior research methods, 45(4), 1322-1331. doi: 10.3758/s13428-013-0327-0 burgess, p. w., & simons, j. s. (2005). theories of frontal lobe executive function: clinical application. in p. w. halligan & d. t. wade (eds.), effectiveness of rehabilitation for cognitive deficits (pp. 211-231). new york: oxford university press. diamond, a. (2013). executive functions. annual review of psychology, 64, 135-168. doi: 10.1146/annurev-psych-113011-143750 ericsson, k. a., & kintsch, w. (1995). long-term working memory. psychological review, 102(2), 211. hess, e. h., & polt, j. m. (1960). pupil size as related to interest value of visual stimuli. science, 132(3423), 349-350. hess, e. h., & polt, j. m. (1964).
pupil size in relation to mental activity during simple problem-solving. science, 143(3611), 1190-1192. hicks, c. m., bandiera, g. w., & denny, c. j. (2008). building a simulation-based crisis resource management course for emergency medicine, phase 1: results from an interdisciplinary needs assessment survey. academic emergency medicine, 15(11), 1136-1143. doi: 10.1111/j.1553-2712.2008.00185.x holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2011). eye tracking: a comprehensive guide to methods and measures: oup oxford. jiang, x., zheng, b., tien, g., & atkins, m. (2013). pupil response to precision in surgical task execution. studies in health technology and informatics, 184, 210. kahneman, d., tursky, b., shapiro, d., & crider, a. (1969). pupillary, heart rate, and skin resistance changes during a mental task. journal of experimental psychology, 79(1p1), 164. klingner, j., kumar, r., & hanrahan, p. (2008). measuring the task-evoked pupillary response with a remote eye tracker. paper presented at the proceedings of the 2008 symposium on eye tracking research & applications. klingner, j., tversky, b., & hanrahan, p. (2011). effects of visual and verbal presentation on cognitive load in vigilance, memory, and arithmetic tasks. psychophysiology, 48(3), 323-332. doi: 10.1111/j.1469-8986.2010.01069.x laeng, b., sirois, s., & gredebäck, g. (2012). pupillometry: a window to the preconscious? perspectives on psychological science, 7(1), 18-27. doi: 10.1177/1745691611427305 lang, g. k. (2015). ophthalmology: thieme. leppink, j., paas, f., van gog, t., van der vleuten, c. p., & van merriënboer, j. j. (2014). effects of pairs of problems and examples on task performance and different types of cognitive load. learning and instruction, 30, 32-42. marshall, s. p. (2002). the index of cognitive activity: measuring cognitive workload. paper presented at the human factors and power plants, 2002.
proceedings of the 2002 ieee 7th conference on. paas, f., tuovinen, j. e., tabbers, h., & van gerven, p. w. (2003). cognitive load measurement as a means to advance cognitive load theory. educational psychologist, 38(1), 63-71. doi: 10.1207/s15326985ep3801_8 palinko, o., kun, a. l., shyrokov, a., & heeman, p. (2010). estimating cognitive load using remote eye tracking in a driving simulator. paper presented at the proceedings of the 2010 symposium on eyetracking research & applications. piquado, t., isaacowitz, d., & wingfield, a. (2010). pupillometry as a measure of cognitive effort in younger and older adults. psychophysiology, 47(3), 560-569. doi:10.1111/j.1469-8986.2009.00947.x richstone, l., schwartz, m. j., seideman, c., cadeddu, j., marshall, s., & kavoussi, l. r. (2010). eye metrics as an objective assessment of surgical skill. annals of surgery, 252(1), 177-182. doi: 10.1097/sla.0b013e3181e464fb schulz, c., schneider, e., fritz, l., vockeroth, j., hapfelmeier, a., wasmaier, m., . . . schneider, g. (2011). eye tracking for assessment of workload: a pilot study in an anaesthesia simulator environment. british journal of anaesthesia, 106(1), 44-50. doi: 10.1093/bja/aeq307 sweller, j. (2010). element interactivity and intrinsic, extraneous, and germane cognitive load. educational psychology review, 22(2), 123-138. doi: 10.1007/s10648-010-9128-5 szulewski, a., fernando, s. m., baylis, j., & howes, d. (2014). increasing pupil size is associated with increasing cognitive processing demands: a pilot study using a mobile eye-tracking device. open journal of emergency medicine, 2014. doi: 10.4236/ojem.2014.21002 szulewski, a., gegenfurtner, a., howes, d. w., sivilotti, m. l. a., & van merriënboer, j. j. g. (2016). measuring physician cognitive load: validity evidence for a physiologic and a psychometric tool. advances in health sciences education, 1-18. doi: 10.1007/s10459-016-9725-2 szulewski, a., roth, n., & howes, d. (2015). 
the use of task-evoked pupillary response as an objective measure of cognitive load in novices and trained physicians: a new tool for the assessment of expertise. academic medicine, 90(7), 981-987. doi: 10.1097/acm.0000000000000677 tien, t., pucher, p. h., sodergren, m. h., sriskandarajah, k., yang, g.-z., & darzi, a. (2015). differences in gaze behaviour of expert and junior surgeons performing open inguinal hernia repair. surgical endoscopy, 29(2), 405-413. doi: 10.1007/s00464-014-3683-7 whipple, b., ogden, g., & komisaruk, b. r. (1992). physiological correlates of imagery-induced orgasm in women. archives of sexual behavior, 21(2), 121-133. young, j. q., van merrienboer, j., durning, s., & ten cate, o. (2014). cognitive load theory: implications for medical education: amee guide no. 86. medical teacher, 36(5), 371-384. doi: 10.3109/0142159x.2014.889290 zheng, b., jiang, x., & atkins, m. s. (2015). detection of changes in surgical difficulty: evidence from pupil responses. surgical innovation, 22(6), 629-635. doi: 10.1177/1553350615573582 frontline learning research vol.5 no. 3 special issue (2017) 31 42 issn 2295-3159 elizabeth a. krupinski, phd, department of radiology & imaging sciences, emory university 1364 clifton rd ne d107 atlanta, ga30322, united states. email: ekrupin&emory.edu. doi: http://dx.doi.org/10.14786/flr.v5i3.250 receiver operating characteristic (roc) analysis elizabeth a. krupinski emory university, usa article received 13 april / revised 23 march / accepted 23 march / available online 14 july abstract visual expertise covers a broad range of types of studies and methodologies. many studies incorporate some measure(s) of observer performance or how well participants perform on a given task. 
receiver operating characteristic (roc) analysis is a method commonly used in signal detection tasks (i.e., those in which the observer must decide whether a target is present or absent, or must classify a given target as belonging to one category or another), especially those in the medical imaging literature. this frontline paper will review some of the core theoretical underpinnings of roc analysis, provide an overview of how to conduct an roc study, and discuss some of the key variants of roc analysis and their applications.
keywords: receiver operating characteristic analysis, roc, observer performance, visual expertise
1. introduction
visual expertise can be measured or assessed in a variety of ways, but in many cases it is the behavioral outcome related to visual expertise that is of ultimate concern. for example, in medical imaging (e.g., radiology, pathology, telemedicine) a physician must view an image (e.g., x-ray exam, pathology slide, photograph of a skin lesion) and render a diagnostic decision (e.g., tumor present or absent), prognosis (e.g., malignant vs benign) and/or a recommendation for a treatment plan (e.g., additional exams or surgery). in the military, radar screens need to be monitored for approaching targets (e.g., missiles) and decisions made as to whether they are friend or foe and whether to escalate the finding to the next level of action. there are many other situations where these types of visual detection and/or classification tasks take place in real life, and where investigations take place to assess how expertise impacts these decisions and what we can do to improve them. in medicine this is particularly important because decision errors have direct and significant impacts on patient care and well-being [1-2]. the problem with “real life” is that it is often very difficult to determine how well the interpreter is actually performing.
feedback is often not provided, and when it is, it is often far removed in time from the actual decision-making event, making the feedback less impactful. compounding the problem is that there is often not a single correct answer. thus, the majority of performance assessments are done in the context of research and are often focused on assessing observer performance in the context of comparing a new technique or technology to an existing one. to deal with the complex nature of decision interpretation tasks where it is important to understand and balance the consequences of both correct and incorrect decisions, receiver operating characteristic (roc) analysis is a very valuable tool. it is important throughout this review to keep in mind that performance is not a constant. one’s performance on any given task will vary as a function of a number of factors and thus the metrics and principles discussed will reflect those changes and differences. for example, when someone first learns a decision task, their criteria for rendering a decision may be based more on didactic learning that they have engaged in and the exact examples of the task they have encountered to date. as expertise grows and they encounter more and varied examples, the criteria they use in their decisions are likely to change as their knowledge and skill develop. even when one is an expert, the environment, the stimuli and the consequences of the decisions rendered may change, making it necessary to adjust one’s criteria to the new situation. for example, on a chest x-ray image cancer looks like a white spot(s) on the lung and even residents in training quickly learn to detect and diagnose lung cancer. however, in the southwest united states and other desert regions there is a condition known as valley fever (coccidioidomycosis, an infection caused by the fungus coccidioides) that appears in the lungs as a white spot(s).
radiologists who move to places like arizona where valley fever is quite common go through a period of criteria adjustment – initially calling nearly everything cancer (high false positive rate) but soon learning to distinguish valley fever from lung cancer as they see more cases and learn the discriminating features.
2. basics of decision making for roc
roc analysis was developed in the early 1950s based on principles from signal-detection theory for evaluation of radar operators in the detection of enemy aircraft and missiles [3-4], and additional contributions were thereafter made by researchers in engineering, psychology, and mathematics [5-7]. roc was introduced into medicine in the 1960s by lee lusted [8-11], with significant efforts devoted to gaining a better understanding of decision-making [12-15]. this entrée into medicine was the result of a series of studies in radiology that began soon after world war ii to determine which of four radiographic and fluoroscopic techniques (e.g., radiographic film vs slides) was better for tuberculosis (tb) screening [16-17]. the goal was to find a single imaging technique that would outperform all the others (in other words, allow the radiologists to reach the correct decision). instead what they found was that the intra-observer and inter-observer variation was so high that it was impossible to determine which one was best. this was unexpected as until then it was presumed that given the same image data all radiologists looking at the images would be seeing the same diagnostic features and information, detecting the tb if it was present, and rendering the same diagnostic decision. the idea that everyone may “see” something different in the images, perhaps as a function of experience or expertise, had never been considered.
thus, it was necessary to build systems that could generate better images so radiologists’ performance could improve (i.e., reduce observer variability), and develop methods to evaluate these new systems and assess their impact on observer performance. although there are newer methods that allow for more than two decisions in the roc task environment [18-19], roc is traditionally a binary decision task – target/signal (e.g., lesion, disease, missile) present versus target/signal absent, or in the case of classification rather than detection the target/signal belongs to class 1 (e.g., cancer, enemy) or class 2 (e.g., not cancer, friend). for roc analysis, these two conditions must be mutually exclusive. there must also be some sort of “truth” or gold standard for each option. in radiology for example, pathology is often used as the gold standard. in cases where there is no other definitive test or method for determining the truth, panels of experts are often used to establish the gold standard [20-21]. given the truth and the decisions of the observers in the study, a 2x2 table readily summarizes all four possible decisions: true positive (tp) (target present, observer reports as present), false negative (fn) (target present, observer reports as absent), false positive (fp) (target absent, observer reports as present), and true negative (tn) (target absent, observer reports as absent). the tp and tn decisions are correct while the fn and fp decisions are incorrect.
2.1 common performance metrics
suppose you have an observer who is an expert skilled at visually detecting a specific poisonous frog in the jungle versus a similar but non-poisonous frog. knowing that she can make the correct decision is important because she is your guide on an expensive eco-jungle tour and if she says a given cute little frog is not poisonous but in reality it is, you might reach out to touch it with potentially fatal consequences.
in radiology, one of the common sources of litigation is mammography – mammographers either missing potential breast cancers or overcalling benign conditions as malignant, causing undue stress and anxiety. real life examples of important decision tasks abound, all of which require careful assessment of correct and incorrect decisions. from the basic 2x2 matrix of 4 decisions described above come some key metrics often used to assess performance in visual expertise and other observer performance studies. the two most commonly used are sensitivity and specificity. sensitivity reflects the ability of the observer to correctly classify the target present stimuli (e.g., x-ray or other images) and is calculated as:
sensitivity = tp/(tp + fn) (2.1)
specificity reflects the ability of the observer to correctly classify the target absent stimuli and is calculated as:
specificity = tn/(tn + fp) (2.2)
when you combine these decisions, a measure of accuracy can be obtained as:
accuracy = (tp + tn)/(tp + fn + tn + fp) (2.3)
two other common metrics that are used in medicine are positive and negative predictive value:
positive predictive value (ppv) = tp/(tp + fp) (2.4)
negative predictive value (npv) = tn/(tn + fn) (2.5)
in general, there is a trade-off between sensitivity and specificity – as you increase one you decrease the other. in other words, if you want to detect more targets (high sensitivity), this often comes at the cost of making more false positives (decreased specificity). why would you want to use sensitivity/specificity versus ppv/npv? basically, the former are independent of the prevalence of targets in the case sample while the latter are not. an example might be useful. suppose you have an observer who is an expert skilled at visually detecting a specific poisonous frog in the jungle versus a similar but non-poisonous frog and her sensitivity is 95% and specificity 80%.
in jungle #1 there are 1000 frogs total with a prevalence of 50% poisonous (n = 500). in jungle #2 there are also 1000 frogs total but only 25% are poisonous (n = 250). based on this, our observer has the following performance levels. it can be seen that, depending on the jungle in which the observer study is conducted, the performance even of an expert observer will differ.
jungle #1: tp = 475, fn = 25, fp = 100, tn = 400
accuracy = (475 + 400)/(475 + 25 + 100 + 400) = 0.88 or 88%
ppv = 475/(475 + 100) = 0.83 or 83%
npv = 400/(400 + 25) = 0.94 or 94%
jungle #2: tp = 238, fn = 12, fp = 150, tn = 600
accuracy = (238 + 600)/(238 + 12 + 150 + 600) = 0.84 or 84%
ppv = 238/(238 + 150) = 0.61 or 61%
npv = 600/(600 + 12) = 0.98 or 98%
in many cases sensitivity and specificity are more than adequate measures of performance for visual search tasks, but it becomes complicated when the test sets contain cases with a range of difficulty levels. for example, in radiology some bone fractures are very obvious and thus easy to detect but others are quite subtle and can readily be missed. in the frog example, a bright red frog in a green jungle is likely easier to detect than a light green frog in a dark green jungle. in cases that are not obvious the decision as to whether or not the target is actually there becomes less certain and observers may not be willing to give a binary yes/no present/absent decision, but may be more willing to report their decision as a function of confidence, for example reporting a target (or lack of target) as definitely present, probably present or possibly present. in other words, the observer’s decision threshold for reporting can change as a function of many things, including but not limited to the nature of the target, target prevalence, background complexity within which the target is embedded, number and type of targets, and observer experience or expertise. this is where roc analysis becomes useful.
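the jungle comparison above can be reproduced in a few lines of code. the following python sketch is illustrative only: the function name `metrics` and the dictionary layout are assumptions made for this example and do not come from any roc software package. it simply applies equations 2.1 to 2.5 to the four counts given for each jungle.

```python
def metrics(tp, fn, fp, tn):
    """Apply equations 2.1-2.5 to the four cells of the 2x2 decision table.

    Hypothetical helper written for this example only.
    """
    return {
        "sensitivity": tp / (tp + fn),               # eq. 2.1
        "specificity": tn / (tn + fp),               # eq. 2.2
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # eq. 2.3
        "ppv": tp / (tp + fp),                       # eq. 2.4
        "npv": tn / (tn + fn),                       # eq. 2.5
    }


# Counts taken directly from the two jungle examples in the text.
jungle_1 = metrics(tp=475, fn=25, fp=100, tn=400)  # 50% prevalence
jungle_2 = metrics(tp=238, fn=12, fp=150, tn=600)  # 25% prevalence

for name, result in (("jungle #1", jungle_1), ("jungle #2", jungle_2)):
    summary = ", ".join(f"{key} = {value:.3f}" for key, value in result.items())
    print(f"{name}: {summary}")
```

run on these counts, sensitivity and specificity stay at roughly 0.95 and 0.80 in both jungles, whereas ppv and npv shift with the prevalence of poisonous frogs.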
2.2 the roc curve
even the visual expert may not perform as one would think without delving further into the nature of the task. for example, in radiology decision thresholds can change within and between observers as a function of the nature of the task and its consequences. in chest ct exam interpretation a radiologist may adopt a very conservative threshold for reporting possible abnormalities in the lungs that could be cancer nodules but are less than 5 mm in size because they know that ct is typically the best imaging exam to do (i.e., no other follow-up imaging options) and obtaining a biopsy on such a small target is very difficult (potentially leading to a pneumothorax or puncture of the lung) and unlikely to yield a specimen large enough for an accurate biopsy result. instead of reporting it, the radiologist may recommend a 6-month watch period and another ct exam. in mammography however, small lesions are easier to biopsy and there is no risk of puncturing a lung or other vital structure, so mammographers tend to be more liberal in their reporting of potential cancerous findings. this comes at the risk of more false positives, but before doing a biopsy additional x-ray images or an ultrasound is often recommended, reducing the impact of a false positive even further. roc analysis, with its resulting roc curve, is a method that captures the relationship (i.e., trade-offs) between sensitivity and specificity as well as the range of decision thresholds that every observer has no matter what their level of expertise and experience. the roc curve (figure 1) is a graphical representation of this relationship, plotting sensitivity (the tp fraction) versus 1-specificity (the fp fraction or 1 – tn/(tn + fp) = fp/(fp + tn)) for all possible decision thresholds. the axes go from 0 to 1 since sensitivity and specificity are typically represented as proportions.
the diagonal line is chance performance or guessing and the curves indicate increasingly better performance moving to the upper left corner which represents perfect performance. thus in terms of visual expertise, one would expect that those with more expertise would tend to have curves closer to the upper left corner. this is generally true, but as noted above much depends on the decision thresholds that the individual observer has for a given task. each point on a given curve represents a specific tp and fp fraction or decision threshold setting. for example, on the “good” curve, the plus sign represents a rather conservative decision maker – not reporting a lot of targets (low sensitivity) but also not making a lot of false positives (high specificity or low 1-specificity). the cross sign represents a more liberal decision maker – reporting a lot of targets (high sensitivity) but making a lot of false positives (low specificity or high 1-specificity). referring back to the example of the radiologist moving to arizona and overcalling valley fever as cancer, the figure could just as well represent their learning or adjustment period, in which the lower curve is their performance when they first move and overcall cancers and the upper curve is their performance after a few months of seeing more exemplars of valley fever, when better discrimination is possible.
figure 1. example of a typical roc curve plotting sensitivity vs 1-specificity. the diagonal line is chance performance or guessing and the curves indicate increasingly better performance moving to the upper left corner which represents perfect performance.
2.3 how to plot an roc curve
it is rare that someone actually plots an roc curve by hand as there are a number of software programs available both in commercial statistical packages and freely downloadable from research web sites.
however, it is useful to see how the confidence ratings discussed above lead to the generation of an roc curve. thus, suppose you ran a visual detection experiment and the observer was required to identify whether a given image had a target (n = 50) or not (n = 50) and then report confidence in that decision as definite, probable or possible. this yields 6 categories of responses where 1 = absent, definite and 6 = present, definite. you can then create a table (table 1) showing the distribution of responses as a function of truth (whether or not the image actually contains a target). it should be noted that continuous rating scales (e.g., 0 – 100) can be used as well and methods exist for generating operating points (decision thresholds) and plotting these as well [22].

table 1
an example of the distribution of confidence scores for a subject in an observer performance study with a 6-point confidence scale and images with a target present or absent (truth).

truth      1    2    3    4    5    6
present    2    3    2    5   20   18
absent    16   15   10    4    3    2

the sensitivity, specificity and fp fraction can then be determined at each threshold or cutoff point as in table 2.

table 2
sensitivity, specificity and fp fraction at each threshold or cutoff point.

result positive if rating ≥    sensitivity     specificity     fp fraction
2 (probably absent)            0.96 (48/50)    0.32 (16/50)    0.68
3 (possibly absent)            0.90 (45/50)    0.62 (31/50)    0.38
4 (possibly present)           0.86 (43/50)    0.82 (41/50)    0.18
5 (probably present)           0.76 (38/50)    0.90 (45/50)    0.10
6 (definitely present)         0.36 (18/50)    0.96 (48/50)    0.04

the roc plot can then be generated plotting sensitivity on the y-axis and 1-specificity (fp fraction) on the x-axis.
figure 2. roc curve generated from the data in table 2.
in terms of fitting an actual curve to the data points in the roc plot, there are a variety of methods. simply “connecting the dots” is the empirically based version but it creates a stepped or jagged plot.
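the construction of table 2 from table 1 can be sketched as follows. this python example is a minimal illustration, and the names `present`, `absent` and `operating_points` are hypothetical. it treats a response as positive when the rating is at or above each cutoff and returns the sensitivity, specificity and fp fraction at that cutoff; the (fp fraction, sensitivity) pairs are the points that would be plotted and connected to form the empirical curve.

```python
# Rating counts from table 1, for confidence categories 1 through 6.
present = [2, 3, 2, 5, 20, 18]   # images that truly contain a target
absent = [16, 15, 10, 4, 3, 2]   # images that truly contain no target


def operating_points(present, absent):
    """Return (cutoff, sensitivity, specificity, fp_fraction) per cutoff.

    A response counts as positive when the rating is >= cutoff.
    Hypothetical helper written for this example only.
    """
    n_present, n_absent = sum(present), sum(absent)
    points = []
    for index in range(1, len(present)):       # cutoffs 2, 3, ..., 6
        cutoff = index + 1
        sensitivity = sum(present[index:]) / n_present
        specificity = sum(absent[:index]) / n_absent
        fp_fraction = sum(absent[index:]) / n_absent
        points.append((cutoff, sensitivity, specificity, fp_fraction))
    return points


points = operating_points(present, absent)
for cutoff, sensitivity, specificity, fp_fraction in points:
    print(f"rating >= {cutoff}: sensitivity = {sensitivity:.2f}, "
          f"specificity = {specificity:.2f}, fp fraction = {fp_fraction:.2f}")
```

when plotting, the trivial points (0, 0) and (1, 1) are usually added so that the empirical curve spans the full roc space.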
a smooth curve reflecting the theoretical “true” curve is much more desirable. there are basically two ways to approach generating the curve – parametric and non-parametric [22-28]. the non-parametric approach makes no assumptions about the structure of the underlying data distribution and essentially smooths the histograms of the output data for the two classes. the parametric methods do rely on the validity of the underlying distribution assumptions. most researchers prefer the parametric approaches and much of the available software uses these approaches as well.
2.4 interpreting the roc curve
there are a few key metrics used to interpret the roc curve and characterize observer performance. the most common one is the area under the curve (auc or az). as noted above the diagonal line in the roc plot represents chance or guessing performance and it clearly divides the roc space into two halves, thus representing an auc of 0.5. the top left corner is perfect performance and a curve through it encompasses all of the area below it, thus auc is 1.0. any curve lying between chance and perfect performance will have a value between 0.5 and 1.0, with better performance having values closer to 1.0. as with generation of the curve itself, there are a variety of methods to calculate auc [22-29] and most programs use one of these methods. partial auc acknowledges that the more traditional auc is often not appropriate, as not all decision thresholds or operating points are equally important [30-31]. in other words, in real life observers may not actually operate at certain thresholds for one reason or another. in medicine for example, a diagnostic test with low specificity (a high false positive rate) may not be clinically acceptable.
in this case it may be useful to select a given (acceptable) fp rate, determine its associated sensitivity (tp rate), and then calculate the area under the curve only up to that operating point (i.e., capturing only a part of the total auc). partial auc is very common in the development of computer-aided detection and discrimination algorithms for medical imaging. other metrics used less often are d´, de´, m, and zk [32-33].
2.5 comparing roc curves
although a single roc curve is common, quite often studies are designed to compare performance, for example between experts and novices on a given visual task. thus there are two curves – one for experts and one for novices – and the question is whether there is a significant difference in performance (auc typically) or not. visually it is not always possible to tell if the difference is significant. this is especially true when the roc curves cross at some point (usually the upper right quadrant/corner) as in figure 3 [34-35]. therefore, statistical methods have been developed, some for comparing only two curves and some for comparing multiple curves. again, there are parametric and non-parametric options [22-23, 36-40]. one of the most common methods for comparing multiple observers and multiple cases is the multi-reader multi-case (mrmc) method developed by dorfman, berbaum and metz [39].
figure 3. example of 2 roc curves that cross.
figure 4 is an example of the output from an mrmc analysis on a study that had 6 observers viewing a series of images in 2 conditions (different computer monitors for displaying medical images). the visual task was to search for subtle fractures in bone x-ray images. readers 1, 3 and 5 were expert radiologists and 2, 4 and 6 were resident trainees. the upper portion provides the auc values for each observer in each condition, followed by the difference between the two conditions. auc values are usually reported in publications to a maximum of three decimal places.
the lower portion of figure 4 shows the results of the analysis of variance (anova) comparing the aucs in the standard anova output format. the actual output document provides more information than provided here, such as the variance components and error-covariance estimates, and different ways of treating the various variables (e.g., random readers and random cases, fixed readers and random cases, random readers and fixed cases), but this example shows how many available programs output relevant data comparing roc curves. the analyses can also take into account level of expertise by comparing the two groups of observers as described above (in this case there were differences but they did not reach statistical significance).

figure 4. an example of the output from an mrmc analysis.

3. other types of roc

the discussion of roc analysis up to this point has been about tasks that involve the detection of one target and for the most part assess fps only as they occur in target absent images (again, one per image). in real life however, scenes and other visual stimuli often contain multiple targets and fps can occur in both target present and target absent stimuli. traditional roc analysis also typically does not ask or require the observer to locate the target once it has been detected. thus there is always some question (unless the target appears in a specific location, e.g., the center of the display, every time) as to whether the observer actually detected the true target (tp) or called something else in the image (fp), thereby actually missing the true target (fn). the earliest attempt to account for location in roc tasks was the lroc (location roc). in this method, the observer provides a confidence rating that somewhere in the image there is a target, then marks the location of the most suspicious region [41-42]. lroc is an advantage over roc in that it takes location into account, but it still only allows for a single target.
to account for multiple targets, free response roc (froc) was developed, in which observers mark different locations and provide a confidence rating for each mark [43-46]. the problem with froc is that when the curve is plotted the x-axis, rather than going from 0 to 1 as in the traditional roc curve, goes from 0 to infinity (based on however many fps are reported). this makes calculating the area under the curve quite difficult and the comparison of two curves even more challenging. the alternative froc (afroc) method was developed to address this issue, creating a plot that has both axes going from 0 to 1 [47]. the jackknife afroc (jafroc) method was then developed to allow for generalization to the population of readers and cases, in the same way that mrmc roc does [44-46].

4. other considerations

in addition to deciding which type of roc analysis is best (which really depends on the hypotheses, nature of the task, types and number of images and targets), there are two other aspects that are typically important. as already discussed above, the truth or gold standard for cases must be known in advance. with simple psychophysical studies this is quite easy but for real life images (scenes, medical images, industrial images, satellite images) this can be more difficult. other considerations when selecting cases include: how subtle or obvious the targets are, how much and what type of background “noise” is in the image, where the targets are located (random or in specified locations), the size(s) of the targets and how much background image is included, how long the images will remain available for viewing, whether the images can be manipulated (e.g., zoom/pan or window/level) by the observers, and target prevalence as noted above. sample size is the other key issue with respect to setting up an roc study – how many images and observers are required to achieve adequate power once the study is completed.
as with any other power calculation, sample size will depend on a number of factors including the metric under consideration (e.g., auc or partial auc) and the design (e.g., repeated measures with the same observers viewing the same images in two or more conditions or different readers viewing the images in different conditions). there are a number of key papers describing methods to calculate sample sizes for various study designs, many of which include representative tables showing sample sizes required for different power estimates [48-51]. some of the available roc programs also include a power calculator and some will provide power in the analysis output.

5. software programs

it is not possible to list all of the available software programs for roc analysis as there are always new ones being released. there are however some reliable sites where the more commonly used programs can be found. the medical image perception laboratory web site [52] at the university of iowa and the roc software site at the university of chicago [53] contain the programs developed by that team (dorfman, berbaum, metz, hillis) including the mrmc roc, rocfit, labroc4, corroc, indroc, rocket, labmrmc, proproc, rscore, bigamma, rscore-j and sas programs to perform sample size estimates. software for froc, afroc and jafroc (chakraborty) analyses is available as well [54]. some commercial statistical software also has modules for roc analysis [55-59].

keypoints

roc analysis provides metrics of observer performance in visual detection and discrimination tasks
area under the curve (auc) is the most commonly used metric of performance
common variants of roc analysis allow for multiple targets and location accuracy
key study design issues include target characteristics and establishing a gold standard
software is available to conduct roc analyses

references

analyse-it. http://analyse-it.com/docs/220/method_evaluation/roc_curve_plot.htm last accessed april 13, 2016.
birkelo, c.c., chamberlain, w.e., phelps, p.s. (1947). tuberculosis case finding. a comparison of the effectiveness of various roentgenographic and photofluorographic methods. jama, 133 (6), 359-366. pmid: 20281873
bunch, p.c., hamilton, j.f., sanderson, g.k., simmons, a.h. (1978). a free-response approach to the measurement and characterization of radiographic-observer performance. j appl photogr eng, 4, 166-171.
chakraborty, d.p., berbaum, k.s. (2004). observer studies involving detection and localization: modeling, analysis, and validation. med phys, 31 (8), 2313-2330. doi: 10.1118/1.1769352
chakraborty, d.p. (2005). recent advances in observer performance methodology: jackknife free-response roc (jafroc). rad protect dosim, 114 (1), 26-31. doi: 10.1093/rpd/nch512
chakraborty, d.p. (2006). analysis of location specific observer performance data: validated extensions of the jackknife free-response (jafroc) method. acad radiol, 13 (10), 1187-1193. doi: 10.1016/j.acra.2006.06.016
chakraborty, d.p., winter, l.h.l. (1990). free-response methodology: alternate analysis and the new observer-performance experiment. radiol, 174 (3), 873-881. doi: 10.1148/radiology.174.3.2305073
delong, e.r., delong, d.m., clarke-pearson, d.l. (1988). comparing the areas under two or more correlated receiver operating characteristics curves: a non-parametric approach. biometrics, 44 (3), 837-845. pmid: 3203132
dev chakraborty’s froc web site. http://perception.radiology.uiowa.edu/software/receiveroperatingcharacteristicroc/tabid/120/default.aspx last accessed april 13, 2016.
dorfman, d.d., alf, e. (1968). maximum likelihood estimation of parameters of signal detection theory: a direct solution. psychometrika, 33 (1), 117-124. doi: 10.1007/bf02289677
dorfman, d.d., alf, e. (1969). maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals – rating method data. j math psychol, 6 (3), 487-496.
doi: http://dx.doi.org/10.1016/0022-2496(69)90019-4
dorfman, d.d., berbaum, k.s., metz, c.e. (1992). receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. invest radiol, 27 (9), 723-731. pmid: 1399456
dorfman, d.d., berbaum, k.s., metz, c.e., lenth, r.v., hanley, j.a., dagga, h.a. (1997). proper receiver operating characteristic analysis: the bigamma model. acad radiol, 4 (2), 138-149. pmid: 9061087
edwards, d.c. (2013). validation of monte carlo estimates of three-class ideal observer operating points for normal data. acad radiol, 20 (7), 908-914. doi: 10.1016/j.acra.2013.04.002
egan, j.p. (1975). signal detection theory and roc analysis. new york, ny: academic press.
faraggi, d., reiser, b. (2002). estimation of the area under the roc curve. stats med, 21 (20), 3093-3106. doi: 10.1002/sim.1228
garland, l.h. (1949). on the scientific evaluation of diagnostic procedures. radiol, 52 (3), 309-328. doi: http://dx.doi.org/10.1148/52.3.309
green, d.m., swets, j.a. (1974). signal detection theory and psychophysics. huntington, ny: krieger publishers.
hajian-tilaki, k.o., hanley, j.a., joseph, l., collet, j.p. (1997). a comparison of parametric and nonparametric approaches to roc analysis of quantitative diagnostic tests. med decis making, 17 (1), 94-102. doi: 10.1177/0272989x9701700111
hanley, j.a. (1988). the robustness of the “binormal” assumptions used in fitting roc curves. med decis making, 8 (3), 197-203. doi: 10.1177/0272989x8800800308
hanley, j.a., mcneil, b.j. (1983). a method for comparing the areas under receiver operating characteristic curves derived from the same cases. radiol, 148 (3), 839-843.
doi: 10.1148/radiology.148.3.6878708
hanley, j.a., mcneil, b.j. (1982). the meaning and use of the area under a receiver operating characteristic (roc) curve. radiol, 143 (1), 29-36. doi: http://dx.doi.org/10.1148/radiology.143.1.7063747
institute of medicine. (1999). to err is human: building a safer health care system. washington, dc: national academy press.
jiang, y., metz, c.e., nishikawa, r.m. (1996). a receiver operating characteristic partial area index for highly sensitive diagnostic tests. radiol, 201 (3), 745-750. doi: 10.1148/radiology.201.3.8939225
kundel, h.l., polansky, m. (1997). mixture distribution and receiver operating characteristic analysis of bedside chest imaging with screen-film and computed radiography. acad radiol, 4 (1), 1-7. pmid: 904086.
lusted, l.b. (1960). logical analysis in roentgen diagnosis. radiol, 74, 178-193. doi: http://dx.doi.org/10.1148/74.2.178
lusted, l.b. (1968). introduction to medical decision making. springfield, il: charles c. thomas publishers.
lusted, l.b. (1969). perception of the roentgen image: applications of signal detection theory. rad clin n am, 7, 435-459.
lusted, l.b. (1971). signal detectability and medical decision making. science, 171 (3977), 1217-1219. doi: 10.1126/science.171.3977.1217
mcclish, d.k.
(1989). analyzing a portion of the roc curve. med decis making, 9 (3), 190-195. doi: 10.1177/0272989x8900900307
mcneil, b.j., adelstein, s.j. (1976). determining the value of diagnostic and screening tests. j nuc med, 17 (6), 439-448. pmid: 1262961
mcneil, b.j., hanley, j.a. (1984). statistical approaches to the analysis of receiver operating characteristic (roc) curves. med dec making, 4 (2), 137-150. doi: 10.1177/0272989x8400400203
mcneil, b.j., keeler, e., adelstein, s.j. (1975). primer on certain elements of medical decision making. ne j med, 293 (5), 211-215. doi: 10.1056/nejm197507312930501
metz, c.e., herman, b.a., shen, j.h. (1998). maximum-likelihood estimation of roc curves from continuously-distributed data. stats med, 17 (9), 1033-1053. pmid: 9612889
metz, c.e., kronman, h.b. (1980). statistical significance tests for binormal roc curves. j math psych, 22 (3), 218-243. doi: http://dx.doi.org/10.1016/0022-2496(80)90020-6
metz, c.e., pan, x. (1999). “proper” binormal roc curves: theory and maximum-likelihood estimation. j math psych, 43 (1), 1-33. doi: 10.1006/jmps.1998.1218
nakas, c.t. (2014). developments in roc surface analysis and assessment of diagnostic markers in three-class classification problems. revstat – stat j, 12 (1), 43-65.
ncss statistical software. http://www.ncss.com/software/ncss/procedures/ last accessed april 13, 2016.
obuchowski, n.a. (1997). testing for equivalence of diagnostic tests. am j roentgen, 168 (1), 13-17. doi: 10.2214/ajr.168.1.8976911
obuchowski, n.a. (1994). computing sample size for receiver operating characteristic studies. invest radiol, 29 (2), 238-243. doi: 10.2214/ajr.175.3.1750603
obuchowski, n.a. (2000). sample size tables for receiver operating characteristic studies. am j roentgen, 175 (3), 603-608. doi: 10.2214/ajr.175.3.1750603
obuchowski, n.a. (2004). how many observers are needed in clinical studies of medical imaging? am j roentgen, 182 (4), 867-869.
doi: 10.2214/ajr.182.4.1820867
peterson, w.w., birdsall, t.l., fox, w.c. (1954). the theory of signal detectability. ire prof gp in theory trans pgit, 4 (4), 171-212. doi: 10.1109/tit.1954.1057460
petrick, n., gallas, b.d., samuelson, f.w., wagner, r.f., myers, k.j. (2005). influence of panel size and expert skill on truth panel performance when combining expert ratings. proc spie med imag, 5749, 596286. doi: 10.1117/12.596286
schulman, k.a., kim, j.j. (2000). medical errors: how the us government is addressing the problem. curr control trials cardiovasc med, 1 (1), 35-37. doi: 10.1186/cvm-1-1-035
spss statistics. http://www-03.ibm.com/software/products/en/spss-statistics last accessed april 13, 2016.
starr, s.j., metz, c.e., lusted, l.b., goodenough, d.j. (1975). visual detection and localization of radiographic images. radiol, 116 (3), 533-538. doi: 10.1148/116.3.533
stata data analysis and statistical software. http://www.stata.com/features/overview/receiver-operating-characteristic/ last accessed april 13, 2016.
swensson, r.g. (1996). unified measurement of observer performance in detecting and localizing target objects on images. med phys, 23 (10), 1709-1725. doi: 10.1118/1.597758
swets, j.a. (1979). roc analysis applied to the evaluation of medical imaging techniques. radiol, 14 (2), 109-121.
pmid: 478799
swets, j.a., dawes, r.m., monahan, j. (2000). psychological science can improve diagnostic decisions. psych sci public interest, 1 (1), 1-26. doi: 10.1111/1529-1006.001
swets, j.a., pickett, r.m. (1982). evaluation of diagnostic systems. methods from signal detection theory. new york, ny: academic press.
tanner, w.p., swets, j.a. (1954). a decision-making theory of visual detection. psych rev, 61 (6), 401-409. pmid: 13215690
university of iowa medical image perception roc software. http://perception.radiology.uiowa.edu/software/receiveroperatingcharacteristicroc/tabid/120/default.aspx last accessed april 13, 2016.
university of chicago roc software. http://metz-roc.uchicago.edu/ last accessed april 13, 2016.
wald, a. (1950). statistical decision functions. new york, ny: wiley, inc.
zou, k.h., hall, w.j., shapiro, d.e. (1997). smooth non-parametric receiver operating characteristic (roc) curves for continuous diagnostic tests. stats med, 16 (19), 2143-2156. pmid: 9330425
zhou, x.h., obuchowski, n.a., mcclish, d.k. (2002). statistical methods in diagnostic medicine. new york, ny: wiley.
medcalc statistical software. https://www.medcalc.org/manual/roc-curves.php last accessed april 13, 2016.

frontline learning research vol.4 no.
2 special issue (2016) 1 – 11 issn 2295-3159

corresponding author: jens möller, kiel university, institute of psychology, department of educational psychology, olshausenstraße 75, 24118 kiel, germany. email address: jmoeller@psychologie.uni-kiel.de doi: http://dx.doi.org/10.14786/flr.v4i2.169

the generalized internal/external frame of reference model: an extension to dimensional comparison theory

jens möller a, hanno müller-kalthoff a, friederike helm a, nicole nagy a, herb w. marsh b

a kiel university, germany
b australian catholic university, king saud university

article received 6 may / revised 24 july / accepted 3 november / available online 20 january

abstract

the dimensional comparison theory (dct) focuses on the effects of internal, dimensional comparisons (e.g., “how good am i in math compared to english?”) on academic self-concepts with widespread consequences for students’ self-evaluation, motivation, and behavioral choices. dct is based on the internal/external frame of reference model (i/e model) which integrates dimensional and external, social comparisons (e.g., “how good am i in math compared to my classmates?”). this article presents an extension, the generalized i/e model, which describes effects of dimensional and social comparisons in various areas. firstly, it proposes that such comparisons are carried out not only within the academic area but also within other areas. secondly, it proposes effects of social and dimensional comparison for other variables besides self-concepts, i.e. for motivational constructs, learning behaviors, or personality characteristics. the present article closes with an examination and discussion of the contributions of the dct by applying standards of good theories to it.

keywords: dimensional comparison; social comparison; self-concept; domain-specificity

1.
introduction to the dimensional comparison theory and the i/e model

this paper deals with a recently developed theory in the field of educational psychology, the dimensional comparison theory (dct; möller & marsh, 2013). first we will present the central ideas of the dct and the empirical support for its assumptions with regard to the antecedents of dimensional comparisons and the psychological processes carried out while dimensionally comparing aspects of different domains. then we will present a recent extension derived from the dct, the generalized internal/external frame of reference model (gi/e model). whereas the internal/external frame of reference model (i/e model; marsh, 1986) deals with the relations between math and verbal achievements and self-concepts and proposes positive effects from math and verbal achievements to corresponding self-concepts and negative effects on non-corresponding self-concepts, its generalization allows the application of the relations and effects described therein to other domains as well. the dct and the gi/e model both are presented with reference to their motivational implications. in the discussion, dct’s gains with regard to the criteria of good theories developed by van lange (2013) will be summarized.

1.1 the dimensional comparison theory

like social comparison theory (festinger, 1954) or the temporal comparison theory (albert, 1977), the dimensional comparison theory (möller & marsh, 2013) details the cognitive process of evaluating a certain target by comparing it to a certain standard. this cognitive process comprises four stages: the selection of a certain target for evaluation, the selection of a certain standard, the comparison of the target with the standard, and finally the evaluation of the target (biernat & eidelman, 2007; mussweiler, 2003).
whereas social comparisons use information on others as the standard (festinger, 1954), and temporal comparisons use prior information on oneself as the standard (albert, 1977), dimensional comparisons use information on other attributes of the same person as a standard (möller & köller, 2001a, b). more precisely, dimensional comparisons (like temporal comparisons) are intra-individual comparisons, for example comparing one’s own mathematical achievement with one’s own verbal achievement, which affects both mathematical and verbal self-evaluations. in the dct, dimensional comparisons are defined as taking place when people compare their achievement in one domain (the target domain) with their achievement in another domain (the standard domain). most of the research on dct is quantitative in nature and data come from field studies. however, there are some experimental studies clearly demonstrating effects of dimensional comparisons on self-concepts. in möller and köller (2001a) as well as in pohlmann and möller (2009), participants received dimensional comparison information indicating that their performance on domain a was ranked worse (better) than their performance on domain b. participants felt better about their performance in their better-off domain and worse in their worse-off domain, controlling for the presence of social comparison information (see also strickhouser & zell, 2015). research on dimensional comparison has also shown that these comparisons happen in everyday life situations. for example, in a diary study möller and husemann (2006) examined spontaneous dimensional comparisons using qualitative data. their participants were told to note and describe any dimensional comparison that came to their mind during a period of 14 days.
university students (study 1) and high school students (study 2) recorded an average of more than six dimensional comparisons during the two weeks, clearly supporting the assumption that dimensional comparisons occur in everyday life. students were asked to mark which domain served as target and which domain served as the comparison standard. results showed that academic matters were most commonly used as target domains (e.g., “we were given our school reports and i compared my grade in religion with my grade in mathematics”). personal relationships with friends, partners, and family as well as general well-being, physical appearance, and personality characteristics were also used as targets (“although i am not that thin, i am not touchy”), yet less frequently. additionally, target and standard often belonged to the same domain. it was also shown that people often carry out dimensional comparisons when they are motivated to enhance themselves or to improve their mood. particularly in situations of failure, upward dimensional comparisons with a better-off standard (möller & husemann, 2006) serve compensational needs: when i fail at math, it is more pleasurable to concentrate on my verbal abilities. if self-enhancement is the major motivation for dimensional comparison, it is beneficial to use a better-off domain as a comparison standard. such an upward comparison often leads to higher self-esteem and a more positive mood state in the better-off domain (despite some costs in self-concepts in the worse-off domain). the gains in the better-off domain following downward dimensional comparison (from this perspective) should be stronger than the losses in the worse-off domain following upward dimensional comparison (from this perspective): the net effect of dimensional comparison on self-evaluations empirically seems to be slightly positive (pohlmann & möller, 2009).
in the diary study by möller and husemann (2006), upward dimensional comparisons were more frequent than downward comparisons. in study 1, the majority of dimensional comparisons were upward (70.5%). participants reported 20.9% downward comparisons and 8.7% horizontal comparisons. in study 2, 52.3% of all comparisons were upward, 34.9% downward, and 12.8% horizontal. however, the need for self-enhancement is not the only motivation for dimensional comparisons. following möller, helm, müller-kalthoff, nagy, & marsh (2015), motivations for domain-specific self-evaluation and for self-improvement may also lead to dimensional comparisons. for example, someone trying to self-evaluate how verbally talented he or she is may choose his/her math ability as a comparison standard even if choosing math as a (worse-off) standard may not be beneficial to the actual self. dickhäuser, reuter, and hilling (2005) and nagy et al. (2006) showed that the probability of choosing a particular course in school is positively affected by high achievement in corresponding subjects and negatively affected by high achievement in non-corresponding subjects (i.e., influenced by dimensional comparisons). imagine a student who has to decide whether he/she wants to concentrate on language or on science courses. one criterion for his/her decision will be his/her achievement in these domains; he/she may ask him/herself: “am i better at science or at language arts?”. then again, self-improvement motivation may also trigger dimensional comparisons: to become better in math, a student might analyze his/her motivation and learning behavior in his/her better-off verbal subjects and transfer them to math. here, the comparison might lead to a behavioral assimilation. one might think that verbal and math achievements both are based on motivation, intelligence, and adequate learning behavior so that the more positive achievement in verbal subjects could be transferred to math.
this would lead to a more positive learning behavior in math as well. in addition, some hints on the antecedents of dimensional comparisons could be found: dimensional comparisons (like social comparisons) were shown to be triggered by motivational needs and/or by external forces (möller et al., 2015). the dct is inspired by the i/e model (marsh, 1986, figure 1), which posits the joint operation of both social comparisons and dimensional comparisons to construct domain-specific academic self-concepts. students conduct social comparisons by comparing their achievement with the achievement of their classmates (external frame of reference). for example, if a student’s verbal achievement is lower than that of his/her classmates, his/her verbal self-concept will likely also be lower. in addition, students conduct dimensional comparisons by comparing their achievement in a given subject with their achievements in another subject (internal frame of reference). for example, if a student’s verbal achievement is lower than his/her math achievement, his/her verbal self-concept will suffer and his/her math self-concept will benefit from dimensional comparisons. möller, pohlmann, köller, and marsh (2009) meta-analyzed 69 studies with n = 125,308 students on the relations between academic achievements and self-concepts (see figure 1). the average correlation between math and verbal achievements was strongly positive (r = .67), and much higher than the average correlation between math and verbal self-concepts (r = .10), indicating a strong domain-specificity of academic self-concepts. moreover, the effects of external comparisons, i.e. the effects of math achievement on math self-concept (β = .61) and of verbal achievement on verbal self-concept (β = .49), were substantial and positive (see the horizontal paths in figure 1). however, the effects of dimensional comparisons, i.e.
the effects of verbal achievement on mathematical self-concept (β = −.27) and of mathematics achievement on verbal self-concept (β = −.21), were negative (see the cross-dimensional paths in figure 1). integrating the results leads to the central assumption of the i/e model: the strong positive correlation between subject-specific achievements does not lead to strong positive correlations between subject-specific self-concepts. the reason for this is the negative effect of dimensional comparisons between subject-specific achievements. the results of the meta-analysis indicate that the effects of social and dimensional comparisons described in the classic i/e model hold for different achievement measures (grades as well as standardized achievement test scores), for different grades, gender groups, and countries (möller et al., 2009). despite the various studies supporting the i/e model, the actual psychological processes behind dimensional comparisons remain rather unexplored. a central assumption of the dct is that dimensional comparison effects are moderated by the perceived similarity of the compared school subjects. according to the dct, different school subjects form a similarity continuum (marsh, byrne, & shavelson, 1988; marsh, lüdtke et al., 2015) that explains the different outcomes of dimensional comparisons. the dct predicts that for dissimilar subjects (so-called far comparisons) like math and english, dimensional comparisons lead to contrast effects, whereas smaller contrast effects or even assimilation effects result between similar subjects (so-called near comparisons). for example, möller, streblow, pohlmann, and köller (2006) found positive path coefficients from achievements to non-corresponding self-concepts between relatively similar subjects like english and german or math and physics.
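The central assumption of the i/e model stated above can be made concrete with a small path-tracing calculation using the meta-analytic coefficients reported by Möller et al. (2009). The sketch below is a simplified illustration, not the authors' analysis: it assumes standardized variables and ignores any residual covariance between the two self-concepts, which the text does not report, so it is not expected to reproduce the observed self-concept correlation of r = .10. The function and variable names are hypothetical.

```python
def implied_self_concept_covariance(b_mm, b_mv, b_vv, b_vm, r_ach):
    """Covariance between math and verbal self-concept implied by the
    achievement paths alone (standardized variables assumed).

    b_mm: math achievement -> math self-concept
    b_mv: verbal achievement -> math self-concept
    b_vv: verbal achievement -> verbal self-concept
    b_vm: math achievement -> verbal self-concept
    r_ach: correlation between math and verbal achievement
    """
    # Paths that start from the same achievement variable (variance = 1).
    same_source = b_mm * b_vm + b_mv * b_vv
    # Paths that run through the correlation between the two achievements.
    via_correlation = (b_mm * b_vv + b_mv * b_vm) * r_ach
    return same_source + via_correlation


# Coefficients as reported in the text (Moeller et al., 2009).
with_contrast = implied_self_concept_covariance(
    b_mm=0.61, b_mv=-0.27, b_vv=0.49, b_vm=-0.21, r_ach=0.67
)

# Hypothetical counterfactual: same model without the dimensional
# (cross-domain) paths, for comparison only.
without_contrast = implied_self_concept_covariance(
    b_mm=0.61, b_mv=0.0, b_vv=0.49, b_vm=0.0, r_ach=0.67
)

print(round(with_contrast, 3), round(without_contrast, 3))
```

Under these assumptions the achievement paths alone imply a self-concept covariance close to zero once the negative cross-domain paths are included, compared with about .20 when they are set to zero, which is the pattern the i/e model describes.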
möller, streblow, and pohlmann (2006) asked students directly for their belief in a negative interdependence of math and verbal abilities, that is, whether they thought of math and verbal abilities as negatively correlated or not. stronger beliefs in a negative interdependence of math and verbal ability were accompanied by more negative path coefficients from grades in one subject to academic self-concepts in the other subject. if students considered abilities in two subjects to be positively correlated, the impact of dimensional comparisons even showed a positive assimilation effect. therefore, similarity perceptions regarding different school subjects seem to be composed to a large extent of interdependence beliefs students hold about underlying abilities. we propose that the similarity of school subjects influences dimensional comparisons in a manner that is described for social comparisons by the selective accessibility model (sam) designed by mussweiler (2003). according to sam, the comparison of a certain target to a given standard is influenced by perceptions of the general similarity of the target and the standard. we assume that when two subjects like math and english are selected as target and standard for dimensional comparison, the comparison process is driven by dissimilarity assumptions. when two more similar subjects like math and physics are dimensionally compared, the comparison process is driven by similarity assumptions instead. the similarity assumption will make commonalities between math and physics more accessible, which will result in lower contrast or even assimilation effects in self-concepts. the dissimilar perception will make differences between subjects more accessible and lead to contrast effects as described in the original i/e model.

figure 1. the i/e model: results of a meta-analytic path-analysis on the relations between math and verbal achievement and math and verbal self-concept (from möller et al., 2009).
mach = math achievement; mself = math self-concept; vach = verbal achievement; vself = verbal self-concept.

1.2 the generalized internal/external frame of reference model (gi/e model)

whereas the i/e model was originally restricted to math and verbal achievements and math and verbal self-concepts, the dct extends its logic to a variety of other variables. therefore, we introduce a generalized i/e model (see figure 2), which may serve as a guide in the search for further i/e-like relations between independent and dependent variables. in this model, a person carries out social and dimensional comparisons. for example, a student compares his/her own standing in a certain domain with someone else’s standing and, as a result, is able to form an opinion on his/her own standing in that particular domain. the student also carries out a dimensional comparison when comparing his/her perception of aspects of a particular domain a with his/her perception of aspects of a particular domain b, coming to a conclusion about his/her standing in domain a in comparison to his/her standing in domain b. both comparisons may have consequences for any kind of domain-specific thought and learning behavior. if a student perceives him/herself as being better than most of his/her classmates in a domain, the effects of social comparisons are positive for self-evaluations. however, the effects of dimensional comparisons are often negative for a particular domain. if someone perceives that he/she is better in sports than in music, he/she might neglect his/her musical activities because he/she prefers sports. extending the original i/e model, the gi/e model allows the integration of each domain-specific aspect that students tend to compare externally and internally as a predictor or independent variable. it also integrates the consequences for self-evaluation, motivation, and learning behavior as criteria or dependent variables.
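as an illustration, the sign pattern that the gi/e model expects can be written as a pair of path equations. this is only a sketch: the symbols are assumptions introduced here and are not notation from the original model. $X_A$ and $X_B$ stand for the predictors in domains a and b (mach and vach in the original i/e model), $Y_A$ and $Y_B$ for the criteria (mself and vself), and the $\beta$ terms for the path coefficients.

```latex
\begin{align}
Y_A &= \beta_{AA}\, X_A + \beta_{BA}\, X_B + \varepsilon_A \\
Y_B &= \beta_{BB}\, X_B + \beta_{AB}\, X_A + \varepsilon_B
\end{align}
% corresponding paths:      \beta_{AA} > 0, \; \beta_{BB} > 0
% non-corresponding paths:  \beta_{BA} < 0, \; \beta_{AB} < 0   (contrast, dissimilar domains)
```

under this reading, the positive corresponding paths are consistent with social comparisons and the negative non-corresponding paths with dimensional comparisons. for similar domains, the non-corresponding coefficients would be expected to lie near zero or to turn positive (assimilation).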
at the moment, our assumptions on the generalizability of the i/e model and its effects to different constructs are rather hypothetical. however, there are already some empirical studies that provide preliminary evidence of the validity of our extensions. we will present these studies according to whether they extended the i/e model on the side of the predictors (i.e., examining the effects of different independent variables on the academic self-concept), on the side of the criteria (i.e., examining the effects of academic achievements on different dependent variables) or on both (i.e., examining i/e-like effects in completely different domains).

changing predictors. most recent extensions on the side of the predictors integrate more or other school subjects than merely the native language and math (e.g. chiu, 2012; marsh, lüdtke et al., 2015; jansen, schroeders, lüdtke & marsh, 2015; möller, streblow, pohlmann, & köller, 2006; nagy, trautwein, baumert, köller, & garrett, 2006), i.e., multiple academic subjects (native and foreign language, history, biology, physics, and math). as already outlined, the application of the i/e model to two similar subjects does not typically lead to contrast effects in subject-specific self-concepts. it rather leads to no significant effect from subject-specific achievement to self-concept in the other subject or even to assimilative effects, i.e., positive effects from achievement to self-concept in the other subject. a study by marsh, lüdtke et al. (2015) offers an illustration of the distinction between between-domain comparisons and within-domain comparisons, showing significant contrast effects for so-called far comparisons (between dissimilar subjects) and significantly less contrast or even assimilation effects for so-called near comparisons (between similar subjects). jansen et al.
(2015) analyzed dimensional comparison effects for five domains and found support for the hypotheses derived from the dct. both contrast and assimilation effects can result from dimensional comparisons: mathematics, physics, and chemistry showed contrast effects to german self-concept, whereas more assimilative effects were found from achievements in the three subjects to mathematics, physics, and chemistry self-concepts. furthermore, tietjens, möller, and pohlmann (2005) successfully replicated the i/e model using sports achievement as predictors. performance in track and field negatively affected the self-concept in swimming and basketball, and swimming performance negatively influenced the self-concept in soccer (see also chanal, sarrazin, guay, & boiché, 2009). in the diary study mentioned above (möller & husemann, 2006), participants were found to compare a vast variety of different domains intra-individually, like personality characteristics and physical attractiveness.

figure 2. the generalized i/e model. an extension of the i/e model to other domains and consequences.

changing criteria. some studies in the tradition of the i/e model used math and verbal achievements as comparison target and standard, but then used different dependent variables, analyzing effects on variables other than self-concepts, i.e. self-regulated learning (miller, 2000), emotions (goetz, frenzel, hall, & pekrun, 2008), intrinsic motivation (marsh, abduljabbar, parker, morin, abdelfattah, nagengast, möller, & abu-hilal, 2015), and interest (schurtz, pfost, nagengast, & artelt, 2014). with regard to motivation, we refer to the corpus of motivation terms relevant to academic achievement that murphy and alexander (2000) discussed.
they differentiated between self-schema (including self-efficacy and attribution), interest (situational and individual), intrinsic and extrinsic motivation, and goal orientation (including learning, performance, and work avoidance goals). our general answer to the question of which constructs fit into i/e relations as criteria is based on the domain-specificity of the motivational constructs: whereas self-schema, interest, and intrinsic/extrinsic motivation are domain-specific constructs in most research studies, goal orientation is often conceptualized as more domain-general (bong, 2013). the more a motivational construct is specific to a domain, the more we will assume effects of dimensional comparisons. motivational constructs that are less specific to a domain are rarely the subject of dimensional comparisons. there is one important exception to this rule: in the möller et al. (2009) meta-analysis, the most important moderator was the type of self-schema measure. when measures of self-efficacy that included the target in the item served as self-concept indicators, analyses revealed larger relations between math and verbal self-evaluations and the i/e model did not fit the data well. the most important difference between such self-efficacy beliefs and self-concept with regard to comparison processes may be that self-efficacy beliefs, when the target is included in the item, are much more driven by former experiences with similar types of tasks. so far, there is a lack of studies that analyze the effects of dimensional comparisons on criteria other than students’ self-reported self-concepts.
although there is already evidence for i/e-like relations between students’ verbal and math grades and other-ratings of students’ ability beliefs (e.g., dickhäuser, 2005; möller, 2005), it would be interesting to analyze the effects of social and dimensional comparisons using a broader variety of domain-specific criteria (observed and self-reported) such as time spent on homework, teacher-ratings of students’ classroom behavior, or students’ perceptions of instructional quality of classes (see arens & möller, 2016). in one of the first studies directly referring to the gi/e model, arens and möller (2016) asked students for their grades in math and german and for their perceptions of the learning environment in each of their math and german classes from two perspectives: first, the students were asked to rate their relationships to their math teacher and their german teacher. second, the students were asked to judge the perceived quality of the instruction they received in math and german classes. analyses revealed positive paths from grades to corresponding student-teacher relationships and qualities of instruction. more importantly for the dct, negative paths occurred from grades to non-corresponding perceptions of the quality of instruction and positive paths occurred from grades to non-corresponding student-teacher relationships, indicating dimensional comparison processes.

changing predictors and criteria. very few studies replaced both independent and dependent variables. however, dietrich, dicke, kracke, & noack (2015) found dimensional comparison effects when analyzing cross-domain relations of teacher support and motivation: higher levels of perceived teacher support in one subject were negatively related to students’ intrinsic value and effort in another subject. möller and savyon (2003) analyzed dimensional comparison effects between intelligence and honesty. they gave their participants success or failure feedback on anagram tasks.
people in the failure condition rated themselves as being more honest than did students who received positive feedback in the anagram task, indicating that processes of dimensional comparison were carried out between intelligence and honesty. möller and marsh (2013) suggested a transfer of the i/e model to basic personality characteristics. they viewed the “big two” personality dimensions agency (competence) and communion (warmth; abele & wojciszke, 2014) as ideal candidates for an extension of the dct since both dimensions are independent of each other in self-perception, as are math and verbal self-concepts. a first re-analysis of the data of abele, rohe, and hauke (2013, merged from studies i and ii) revealed some support for a “big two i/e model”. helm et al. (under review) found typical i/e patterns between other-rated agency and communion as predictors and self-rated agency and communion as criteria, i.e. positive effects on corresponding self-ratings and negative effects on non-corresponding self-ratings. such extensions to personality variables like agency and communion widen the opportunities offered by the generalized i/e model.

to sum up, the generalized i/e model may serve as a matrix for other juxtapositions of domain characteristics with consequences for domain-specific beliefs and learning behaviors. the successful transfer of our assumptions to agency and communion may serve as an example for the research possibilities that arise from these extensions, within and outside of learning and motivation research.

1.3 discussion

the aim of the present article was to give an overview of a new theory in motivation research, the dimensional comparison theory, as well as to devise a critical extension to the core assumption of the theory.
namely, we introduced a generalized i/e model which assumes that external and, in particular, internal (dimensional) comparisons are rather general comparison processes that are not limited to the formation of academic self-concepts, but apply to evaluations of different constructs as well. the gi/e model may enable future research to go beyond the relations between verbal and math self-concepts and apply the rationale underlying external and dimensional comparisons in different fields and disciplines as well. in conclusion, we would like to evaluate, from our (subjective) perspective, how useful the dct is as a theory. according to van lange (2013), truth, abstraction, progress, and applicability as standards (tapas) may serve as ideals for theories in psychology. in the following section, we apply tapas to the dct.

truth. the ideal of truth is met when a theory allows hypothesizing testable relations between the constructs that the theory deals with. although according to popper (1959) truth remains an unreachable ideal, empirical studies allow researchers who are testing hypotheses derived from theories to evaluate which aspects of a theory are more or less adequate descriptions of the data. with regard to motivation and learning research, such a theory has to describe or explain data on motivational constructs, e.g. relations between two motivational constructs or between motivation and achievement. in the case of the dct, we have shown that there is strong evidence (a) for the occurrence of dimensional comparisons inside and outside of schools and (b) for cross-domain effects between achievements and academic self-concepts. the empirical support is weaker with regard to other motivational constructs.

abstraction. the second ideal of abstraction asks theories to go beyond single empirical studies, generalize findings, and verbalize relations between constructs.
in our opinion, the dct is abstract enough while still exposing the causal relations between aspects of domains and corresponding self-evaluations as well as non-corresponding self-evaluations of these aspects, and grounding them on psychological principles. the dct overcomes the limitations of the i/e model, which is a more descriptive approach.

progress. thirdly, ideal theories strive for progress. they should add some new insights to the prior knowledge or provide a new and intriguing perspective on a phenomenon. the dct emphasizes a comparison process rather neglected in research outside of educational psychological self-concept research. a first visible sign of progress may be that research on the dct is published in a major social psychology journal (strickhouser & zell, 2015). from our perspective and concerning the dct, it is essential to generalize the i/e model and to initiate more (and more experimental) research on the psychological phenomena associated with dimensional comparisons. derived from the dct, the gi/e model allows the application of the relations and effects described in the i/e model to other domains and person characteristics as well. this would offer a new theoretical perspective on the formation of self-concepts and on the emergence of motivational tendencies in different areas of life.

applicability. fourthly, a theory should be applicable to real-world concerns, an ideal that is particularly important in an applied field of research like motivation and learning research. the dct does not only claim to be applicable to the well-known and practically important relations between academic achievement and academic self-concept, but has the potential of explaining several real-world phenomena that may benefit from considering dimensional comparisons as an underlying psychological process. the dct is applicable in every situation that forces people to choose between alternatives.
in particular, when students have to choose between courses, select careers and academic majors, dimensional comparisons will be at work.

to sum up, the gi/e model was introduced to overcome the i/e model’s limitation to math and verbal domains. first empirical support was presented, underlining the fruitfulness of the model for initiating further research activities. the gi/e model is a consequence of the formulation of the dct and enriches the theory’s purpose. further research is needed that adds new domains as comparison targets and standards as well as new motivational and learning outcomes as consequences of external and internal comparisons.

keypoints
the internal/external frame of reference model (i/e model) describes the relations between math and verbal achievements and self-concepts.
the dimensional comparison theory (dct) focuses on the internal frame of reference and extends the i/e model.
instead of being limited to math and verbal achievements, it deals with other domains as well.
instead of being limited to math and verbal self-concepts, it deals with motivational constructs, learning behaviors, and personality characteristics.
therefore, a generalized i/e model is proposed that allows the application of previous findings and will initiate further research.
dct is discussed by applying standards of good theories to it.

references
abele, a., rohe, m., & hauke, n. (2013). does my friend see me like i do? friendship quality and self-other agreement on the fundamental dimensions of social judgment. unpublished manuscript, university of erlangen-nuremberg, germany. abele, a. e., & wojciszke, b. (2014). communal and agentic content. a dual perspective model. advances in experimental social psychology, 50, 198–255. arens, k. & möller, j. (2016). dimensional comparisons: effects on perceived learning environments. learning and instruction, 42, 22-30. doi: 10.1016/j.learninstruc.2015.11.001 albert, s. (1977). temporal comparison theory.
psychological review, 84, 485–503. doi: http://dx.doi.org/10.1037/0033-295x.84.6.485 biernat, m., & eidelman, s. (2007). standards. in a. w. kruglanski and e. t. higgins (eds.), social psychology: handbook of basic principles, volume 2 (pp. 308–333). new york: the guilford press. bong, m. (2013). self-efficacy. in j. hattie, e. m. anderman (eds.), international guide to student achievement (pp. 64-66). new york, ny, us: routledge/taylor & francis group. chanal, j. p., sarrazin, p. g., guay, f., & boiché, j. (2009). verbal, mathematics, and physical education self-concepts and achievements: an extension and test of the internal/external frame of reference model. psychology of sport and exercise, 10, 61–66. doi: http://dx.doi.org/10.1016/j.psychsport.2008.06.008 chiu, m.-s. (2012). the internal/external frame of reference model, big-fish-little-pond effect, and combined model for mathematics and science. journal of educational psychology, 104, 87–107. doi: http://dx.doi.org/10.1037/a0025734 dickhäuser, o. (2005). teachers' inferences about students' self-concepts – the role of dimensional comparison. learning and instruction, 15(3), 225-235. doi: http://dx.doi.org/10.1016/j.learninstruc.2005.04.004 dickhäuser, o., reuter, m., & hilling, c. (2005). coursework selection: a frame of reference approach using structural equation modelling. british journal of educational psychology, 75, 673–688. doi: http://dx.doi.org/10.1348/000709905x37181 dietrich, j., dicke, a. l., kracke, b., & noack, p. (2015). teacher support and its influence on students' intrinsic value and effort: dimensional comparison effects across subjects. learning and instruction, online first. doi: http://dx.doi.org/10.1016/j.learninstruc.2015.05.007 festinger, l. (1954). a theory of social comparison processes. human relations, 7, 117–140. doi: http://dx.doi.org/10.1177/001872675400700202 goetz, t., frenzel, c. a., hall, n. c., & pekrun, r. (2008).
antecedents of academic emotions: testing the internal/external frame of reference model for academic enjoyment. contemporary educational psychology, 33, 9–33. doi: http://dx.doi.org/10.1016/j.cedpsych.2006.12.002 helm, f., müller-kalthoff, h., & nagy, n., & möller, j. (under review). dimensional comparison effects on academic self-concepts: similarity of school subjects. university of kiel, germany. jansen, m., schroeders, u., lüdtke, o., & marsh, h. w. (online first, 2015). contrast and assimilation effects of dimensional comparisons in five subjects: an extension of the i/e model. journal of educational psychology. doi: http://dx.doi.org/10.1037/edu0000021 marsh, h. w. (1986). verbal and math self-concepts: an internal/external frame of reference model. american educational research journal, 23, 129–149. doi: http://dx.doi.org/10.3102/00028312023001129 marsh, h. w., abduljabbar, a. s., parker, p. d., morin, a. s., abdelfattah, f., nagengast, b., möller, j., & abuhilal, m. m. (2015). the internal/external frame of reference model of self-concept and achievement relations: age-cohort and cross-cultural differences. american educational research journal, 52(1), 168202. doi: http://dx.doi.org/10.3102/0002831214549453 marsh, h. w., byrne, b. m., & shavelson, r. j. (1988). a multifaceted academic self-concept: its hierarchical structure and its relation to academic achievement. journal of educational psychology, 80, 366–380. doi: http://dx.doi.org/10.1037/0022-0663.80.3.366 marsh, h. w., lüdtke, o., nagengast, b., trautwein, u., abduljabbar, a. s., abdelfattah, f., & jansen, m. (2015). dimensional comparison theory: paradoxical relations between self-beliefs and achievements in multiple domains. learning and instruction, 35, 16-32. doi: http://dx.doi.org/10.1016/j.learninstruc.2014.08.005 miller, j. w. (2000). exploring the source of self-regulated learning: the influence of internal and external comparisons. journal of instructional psychology, 26, 47–52. möller, j. 
(2005). paradoxical effects of praise and criticism: social, dimensional and temporal comparisons. british journal of educational psychology, 75(2), 275-295. doi: http://dx.doi.org/10.1348/000709904x24744 möller, j., helm, f., müller-kalthoff, h., nagy, n., & marsh, h. w. (2015). dimensional comparisons: theoretical assumptions and empirical results. in: j. wright (ed.), international encyclopedia of the social and behavioral sciences, 2nd ed. (pp. 430-436). elsevier: oxford, gb. möller, j., & husemann, n. (2006). internal comparisons in everyday life. journal of educational psychology, 98, 342–353. doi: http://dx.doi.org/10.1037/0022-0663.98.2.342 möller, j., & köller, o. (2001a). dimensional comparisons: an experimental approach to the internal/external frame of reference model. journal of educational psychology, 93, 826–835. doi: http://dx.doi.org/10.1037/0022-0663.93.4.826 möller, j., & köller, o. (2001b). frame of reference effects following the announcement of exam results. contemporary educational psychology, 26, 277-287. doi: http://dx.doi.org/10.1006/ceps.2000.1055 möller, j. & marsh, h. w. (2013). dimensional comparison theory. psychological review, 120, 544-560. doi: http://dx.doi.org/10.1037/a0032459 möller, j., pohlmann, b., köller, o., & marsh, h. w. (2009). a meta-analytic path analysis of the internal/external frame of reference model of academic achievement and academic self-concept. review of educational research, 79, 1129–1167. doi: http://dx.doi.org/10.3102/0034654309337522 möller, j., & savyon, k. (2003). not very smart thus moral: dimensional comparisons between academic self-concept and honesty. social psychology of education, 6, 95–106. doi: http://dx.doi.org/10.1023/a:1023247910033 möller, j., streblow, l., & pohlmann, b. (2006). the belief in a negative interdependence of math and verbal abilities as determinant of academic self-concepts. british journal of educational psychology, 76, 57–70.
doi: http://dx.doi.org/10.1348/000709905x37451 möller, j., streblow, l., pohlmann, b., & köller, o. (2006). an extension to the internal/external frame of reference model to two verbal and numerical domains. european journal of psychology of education, 21, 467–487. doi: http://dx.doi.org/10.1007/bf03173515 murphy, p. k., & alexander, p. a. (2000). a motivated exploration of motivation terminology. contemporary educational psychology, 25(1), 3-53. doi: http://dx.doi.org/10.1006/ceps.1999.1019 mussweiler, t. (2003). comparison processes in social judgment: mechanisms and consequences. psychological review, 110, 472–489. doi: http://dx.doi.org/10.1037/0033-295x.110.3.472 nagy, g., trautwein, u., baumert, j., köller, o., & garrett, j. (2006). gender and course selection in upper secondary education: effects of academic self-concept and intrinsic value. educational research and evaluation, 12, 323–345. doi: http://dx.doi.org/10.1080/13803610600765687 pohlmann, b., & möller, j. (2009). on the benefits of dimensional comparisons. journal of educational psychology, 101, 248–258. doi: http://dx.doi.org/10.1037/a0013151 popper, k. r. (1959). the logic of scientific discovery. new york, ny: harper. (original work published as logik der forschung, 1935) schurtz, i. m., pfost, m., nagengast, b., & artelt, c. (2014). impact of social and dimensional comparisons on student's mathematical and english subject-interest at the beginning of secondary school. learning and instruction, 34, 32-41. doi:http://dx.doi.org/10.1016/j.learninstruc.2014.08.001 strickhouser, j. e. & zell, e. (2015, online first). self-evaluative effects of dimensional and social comparison, journal of experimental social psychology. doi: http://dx.doi.org/10.1016/j.jesp.2015.03.001 tietjens, m., möller, j., & pohlmann, b. (2005). leistung und selbstkonzept in verschiedenen sportarten [achievement and self-concept in various sports]. zeitschrift für sportpsychologie, 12, 135–143. 
doi: http://dx.doi.org/10.1026/1612-5010.12.4.135 van lange, p. m. (2013). what we should expect from theories in social psychology: truth, abstraction, progress, and applicability as standards (tapas). personality and social psychology review, 17(1), 40-55. doi: http://dx.doi.org/10.1177/1088868312453088

frontline learning research vol. 8 no. 4 (2020) 37–51 issn 2295-3159

how teachers integrate dashboards into their feedback practices

carolien knoop-van campen a & inge molenaar a
a behavioural science institute, radboud university, the netherlands

article received 12 march 2020 / revised 20 may / accepted 18 june / available online 13 july

abstract

in technology-empowered classrooms teachers receive real-time data about students’ performance and progress on teacher dashboards. dashboards have the potential to enhance teachers’ feedback practices and complement human-prompted feedback that is initiated by teachers themselves or by students asking questions. however, such enhancement requires teachers to integrate dashboards into their professional routines. how teachers shift between dashboard- and human-prompted feedback could be indicative of this integration. we therefore examined in 65 k-12 lessons: i) differences between human- and dashboard-prompted feedback; ii) how teachers alternated between human- and dashboard-prompted feedback (distribution patterns); and iii) how these distribution patterns were associated with the given feedback type: task, process, personal, metacognitive, and social feedback. the three sources of feedback resulted in different types of feedback: teacher-prompted feedback was predominantly personal and student-prompted feedback mostly resulted in task feedback, whereas dashboard-prompted feedback was equally likely to be task, process, or personal feedback.
we found two distribution patterns of dashboard-prompted feedback within a lesson: either given in one sequence together (blocked pattern) or alternated with student- and teacher-prompted feedback (mixed pattern). the distribution pattern affected the type of dashboard-prompted feedback given. in blocked patterns, dashboard-prompted feedback was mostly personal, whereas in mixed patterns task feedback was most prevalent. hence, both the source of feedback instigation and the distribution of dashboard-prompted feedback affected the type of feedback given by teachers. moreover, when teachers advanced the integration of dashboard-prompted feedback in their professional routines, as indicated by mixed patterns, more effective types of feedback were given.

keywords: teacher dashboards; feedback; adaptive learning technologies

corresponding author: e-mail: c.knoop-vancampen@pwo.ru.nl
doi: https://doi.org/10.14786/flr.v8i4.641

1. introduction

with the growing use of educational technologies, teacher dashboards are increasingly available in classrooms. while students are practising with educational technologies, dashboards provide teachers with concurrent information about students’ performance, pace, and progress (van leeuwen, janssen, erkens, & brekelmans, 2014; molenaar & van schaik, 2016). teachers can use dashboards with additional information to improve their feedback practices with dashboard-prompted feedback. this complements human-prompted feedback, which teachers initiate themselves and/or give in response to students’ questions. so far, most studies concerning teacher dashboards have primarily focused on how teachers understand dashboard information and translate this into action (molenaar & knoop-van campen, 2019; verbert, et al., 2014). to the best of our knowledge, how the source of feedback, e.g. human- versus dashboard-prompted feedback, is related to the type of feedback given has not yet been investigated.
nor is it known how teachers alternate between human- and dashboard-prompted feedback during teaching and whether this distribution pattern is related to the feedback given. in this paper, we postulate that dashboard-prompted feedback is given alongside human-prompted feedback and that the way teachers alternate between dashboard- and human-prompted feedback could be indicative of how well they have incorporated dashboards into their professional routines. we therefore examined: i) differences between human- and dashboard-prompted feedback; ii) how teachers alternated between human- and dashboard-prompted feedback (distribution patterns); and iii) how these distribution patterns were associated with the type of feedback given. by investigating all feedback and not only dashboard-prompted feedback, this paper takes a broad perspective on how dashboards are integrated into teachers’ feedback practices.

1.1 teacher feedback practices

providing feedback in classroom situations is a complex professional task (roelofs & sanders, 2007). decisions on how to help students during learning are based on teachers’ pedagogical knowledge base, which consists of information on students’ knowledge and abilities, teachers’ perceptions of their students, and teachers’ content knowledge, combined with more general knowledge and beliefs about effective pedagogical practices (meijer, verloop, & driel, 2001; roelofs & sanders, 2007). this pedagogical knowledge base therefore combines content, pedagogy and learners’ characteristics in such a way that teachers can take effective pedagogical actions (gudmundsdottir & shulman, 1987). the practical impact of teachers’ professional knowledge base is visible in their professional routines, i.e., “patterns and routines of action, interaction, and sense-making” (ballet, & kelchtermans, 2009, p. 1153). such routines are indicative of the way teachers apply their pedagogical knowledge base and also extend to feedback provided by teachers to their students.
in order for teachers to provide appropriate feedback, it is important to identify students’ current level of performance and knowledge and to target feedback at the identified needs of the students (wood, bruner, & ross, 1976). in the classroom context, feedback can be defined as information provided by the teacher regarding aspects of the students’ performance or behaviour (hattie, & timperley, 2007). five types of feedback are often distinguished: process, metacognitive, task, social, and personal feedback (hattie, & timperley, 2007; keuvelaar-van den bergh, 2013). process feedback gives information about students’ progress towards learning goals (hattie, & timperley, 2007), while metacognitive feedback helps students to control and monitor their learning (de jager, jansen, & reezigt, 2005). these two types of feedback are the most effective for improving learning, as they increase students’ self-regulation and strategic handling (hattie, 2012). next, task feedback informs students about the state of their performance and helps them to reflect on their current understanding (butler, & winne, 1995; hattie, & timperley, 2007). as task feedback directly supports task execution, it is also supportive for learning (keuvelaar-van den bergh, 2013). social feedback helps students to collaborate adequately with other students and is found to be important in collaborative learning settings to enhance learning outcomes (keuvelaar-van den bergh, 2013). personal feedback, which supports students in improving their behaviour during learning, is considered less effective as it does not explain how student behaviour is related to task or process elements and therefore gives little direction for improving learning (shute, 2008). teachers are thus not only challenged to diagnose when students are in need of feedback, but also to select the appropriate and effective type of feedback accordingly.
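to make the coding categories concrete, the following python sketch shows one possible way to record feedback events by source and type, tally them, and label a lesson's distribution pattern. it is an illustration only and not the authors' coding procedure: the names `Source`, `FeedbackType`, `FeedbackEvent`, `tally_by_source` and `distribution_pattern` are hypothetical, and the rule "blocked = all dashboard-prompted events form one uninterrupted run" is an assumed operationalisation of the blocked and mixed patterns named in the abstract.

```python
from collections import Counter
from enum import Enum
from typing import NamedTuple


class Source(Enum):
    TEACHER = "teacher-prompted"
    STUDENT = "student-prompted"
    DASHBOARD = "dashboard-prompted"


class FeedbackType(Enum):
    TASK = "task"
    PROCESS = "process"
    PERSONAL = "personal"
    METACOGNITIVE = "metacognitive"
    SOCIAL = "social"


class FeedbackEvent(NamedTuple):
    source: Source
    kind: FeedbackType


def tally_by_source(events):
    """Count how often each feedback type occurs, separately per source."""
    counts = {source: Counter() for source in Source}
    for event in events:
        counts[event.source][event.kind] += 1
    return counts


def distribution_pattern(events):
    """Label a lesson 'blocked', 'mixed', or 'none'.

    Assumed rule (not from the paper): 'blocked' means the
    dashboard-prompted events form one uninterrupted run;
    'mixed' means other feedback occurs between them.
    """
    positions = [i for i, e in enumerate(events) if e.source is Source.DASHBOARD]
    if not positions:
        return "none"
    contiguous = positions[-1] - positions[0] + 1 == len(positions)
    return "blocked" if contiguous else "mixed"


# A made-up lesson, in chronological order.
lesson = [
    FeedbackEvent(Source.TEACHER, FeedbackType.PERSONAL),
    FeedbackEvent(Source.DASHBOARD, FeedbackType.TASK),
    FeedbackEvent(Source.STUDENT, FeedbackType.TASK),
    FeedbackEvent(Source.DASHBOARD, FeedbackType.PROCESS),
]
print(distribution_pattern(lesson))
print(tally_by_source(lesson)[Source.DASHBOARD])
```

under this assumed rule, a lesson with a single dashboard-prompted event counts as blocked; a real coding scheme would need to decide how to treat such cases.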
even though much has been written about feedback, less is known about which types of feedback teachers actually give during lessons (bennett, 2011; voerman, meijer, korthagen, & simons, 2012). we found only two studies that investigated teachers’ actual feedback practices in classrooms. in these studies, observations were performed to identify the feedback types teachers gave in class. one study observed 32 teachers in primary education and showed that they mostly gave task and process feedback, whereas metacognitive, social, and personal feedback were hardly used (keuvelaar-van den bergh, 2013). this indicated that in primary education one of the most effective types of feedback, namely metacognitive feedback, is scarce. a second study (voerman, meijer, korthagen, & simons, 2012), in which 78 teachers in secondary education were observed, found that these teachers mostly provided students with non-specific feedback, for example personal encouragement such as “well done”, which did not give students directions to improve learning. the authors also noted that only 7% of all feedback given was process feedback. this is problematic, as process feedback is helpful for students to improve their learning. the authors concluded that teachers seldom provided effective feedback during classroom activities (voerman, meijer, korthagen, & simons, 2012). combined, these two studies indicate that the type of feedback provided in classrooms is not always effective and that there are still considerable gains to be achieved with regard to teachers' feedback practices.
1.2 how teacher dashboards augment feedback practices
in addition to human-prompted feedback given by teachers on their own initiative (teacher-prompted feedback) or in response to students’ questions (student-prompted feedback), feedback can also be elicited by information teachers view on their dashboards. teacher dashboards are increasingly used in k-12 education.
these dashboards provide additional information on students’ performance, pace and progress and, as such, can be viewed as tools to augment teachers’ pedagogical knowledge base (holstein, mclaren, & aleven, 2017; molenaar & knoop-van campen, 2019). during learning, software captures real-time data on learner performance which is immediately displayed to the teachers on their laptop or computer screen (dashboards). as such, dashboards provide teachers with organized visualizations giving information about individual students or the whole class. class dashboards often provide an overview of all of the students’ correct and incorrect answers on problems. individual dashboards mostly show individual students’ progress on different learning goals and the development of their knowledge and skills, while predictive analytics are also presented. however, merely providing this information on dashboards is not enough to impact teachers’ feedback practices. even though professional routines are flexible and develop continuously (lacourse, 2011), actively and intentionally changing these professional routines by integrating new tools, such as dashboards, involves several stages of implementation. the learning analytics process model identifies four stages which teachers have to go through before dashboard data can impact their teaching practices (verbert, et al., 2014). first, teachers need to become consciously aware of data on the dashboards and build understanding as to when they can use this information during teaching (awareness stage). second, teachers have to be able to formulate questions for the data to answer (reflection stage). for example, “how can i see when a student needs help and which feedback is most appropriate?” third, teachers have to analyse the data to answer these questions (sense-making stage). for example, “lia makes a lot of mistakes; she needs additional help”. 
fourth and last, teachers need to determine which response to the data is appropriate and best fits their analysis of the situation (impact stage). for example, “lia does not seem to understand how to simplify mixed fractions, i should explain to her that she needs to extract the whole numbers first”. for dashboard data to be converted into feedback, teachers need to enact all stages of the learning analytics process model. during this process, they transform data into meaningful feedback actions to improve their teaching (molenaar & knoop-van campen, 2019). the learning analytics process model described above provides a theoretical model to understand how teachers translate data into action. however, how teachers apply this in their real-life professional routines remains unclear. there are only limited empirical insights into the actual impact of dashboards on teachers’ feedback practices in classrooms (van leeuwen, janssen, erkens, & brekelmans, 2014). some initial evidence has been found for the enactment of the learning analytics process model in practice: teacher awareness of dashboards was found to positively influence the feedback they gave to students (molenaar & knoop-van campen, 2017a). when teachers consulted dashboards more often, they activated more knowledge, and more diverse knowledge, in their pedagogical knowledge base. furthermore, teachers who activated more diverse knowledge also gave more feedback and more different types of feedback to students (molenaar & knoop-van campen, 2019). other studies have also provided evidence that dashboards changed teachers’ feedback practices. teachers tend to provide more feedback when they are supported by dashboards than in situations without dashboards (van leeuwen, janssen, erkens, & brekelmans, 2014).
in addition, regarding feedback allocation, one study showed that teachers who received real-time notifications about student performance directed their attention to low performing students more than they did without dashboards (martinez-maldonado, clayphan, yacef, & kay, 2014). this in turn led to improvements in these students’ performance. another study indicated that teachers also gave more feedback to good students who were struggling (knoop-van campen, wise, & molenaar, submitted). this study also showed that dashboard-prompted feedback given to low performing students entailed equal amounts of task and process feedback, while human-prompted feedback to this group consisted mostly of task feedback. lastly, an augmented reality dashboard that indicated which students needed additional support while practising in an intelligent tutor system caused teachers to spend more time on students who showed poor productivity in learning, which consequently had a positive impact on their learning (holstein, hong, tegene, mclaren, & aleven, 2018). thus, there is initial evidence that dashboard-prompted feedback is profoundly different from human-prompted feedback and can enhance learning outcomes. dashboards not only increased the amount of feedback given, but also elicited more effective types of feedback. in these studies, however, there was great variation in the frequency of dashboard use and the type of feedback given during lessons (e.g. martinez-maldonado, clayphan, yacef, & kay, 2014; molenaar & knoop-van campen, 2017a, 2019). given the benefits of teacher dashboards for students’ learning, it is important to examine how this variation can be explained. with the learning analytics process model and the importance of enacting all stages in mind, a possible explanation may be the level at which teachers incorporated dashboards into their professional routines.
the way teachers alternate between human- and dashboard-prompted feedback in their daily classroom activities may reflect how well they have integrated dashboards into their professional routines and this may be related to how well they enact the stages of the learning analytics process model. properly enacting the stages of the model enables teachers to optimally translate the dashboard data into practice. better integration of the dashboard into their feedback practices could therefore be followed by more effective types of feedback (i.e., more specific and more responsive feedback) after consulting the dashboard.
1.3 present study
there is initial evidence that dashboards have a positive impact on teachers’ feedback practices, but only when they effectively integrate the dashboard information into their professional routines. previous studies have not specifically addressed dashboard-prompted feedback: differences between dashboard- and human-prompted feedback, and how teachers integrate dashboards into their professional routines, have not been investigated. therefore, we examined i) differences between human- and dashboard-prompted feedback; ii) how teachers alternated between human- and dashboard-prompted feedback (distribution patterns); and iii) how these distribution patterns were associated with the type of feedback given. we expected that dashboard-prompted feedback would elicit more effective types of feedback (e.g., task and process feedback) than human-prompted feedback. regarding the integration of dashboard-prompted feedback into lessons, we expected to see two distribution patterns. first, when teachers have not (or not yet) fully integrated dashboards into their professional routines, they will likely use them only during a particular phase of the lesson. we hypothesize that this may result in a blocked pattern with dashboard-prompted feedback in one part of the lesson and human-prompted feedback in the other parts of the lesson.
in contrast, when teachers become more proficient in using dashboards and have incorporated them into their professional routines, we expect more flexible dashboard use during the lesson. this may result in patterns displaying a mix of human- and dashboard-prompted feedback (mixed patterns), in which teachers use teacher-, student- and dashboard-prompted feedback interchangeably. better integration of dashboards into teachers’ professional routines is expected to result in more effective types of feedback being given after consulting the dashboards.
2. method
2.1 participants
in total, 65 lessons were observed: 45 teachers were observed, of whom 20 were observed twice. all lessons were 50 minutes long and taught in year group 2 (8-year-old students) to year group 6 (12-year-old students). lessons were arithmetic or spelling lessons, dealing with topics on the regular school curriculum. the teachers were mostly female (75%) and between 20 and 65 years old, spread evenly across the age range, with a corresponding range of teaching experience of between 2 and 30 years. adaptive learning technology was used in these classrooms on a daily basis. while students worked on problems in the adaptive learning technology, real-time data was shown on the teacher dashboards. teachers had between 1 and 3 years of experience in working with this technology.
2.2 materials
2.2.1 adaptive learning technology
the adaptive learning technology (alt) used in this study, called ‘snappet’, ran on tablet computers and is widely used for arithmetic and spelling across schools in the netherlands (molenaar & knoop-van campen, 2016). the arithmetic problems in the alt were comparable to those done by students in regular classrooms. the alt offered both adaptive and non-adaptive problems. non-adaptive problems were pre-selected for a particular topic and all students in the class received the same non-adaptive problems in a lesson. the adaptive problems adjusted to the skills of the individual student.
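as a rough illustration of adaptive selection of this kind, the sketch below pairs a plain logistic elo-style update with a rule that picks the problem closest to a 75% expected success rate, the target reported for the alt in this section. the logistic form, the step size `k`, the rating scale and all function names are assumptions made for illustration only; the algorithm actually used in the alt is a derivative of elo whose details are not reproduced here.

```python
import math

TARGET_P = 0.75  # target probability of a correct answer, as reported for the ALT


def expected_correct(ability: float, difficulty: float) -> float:
    """Logistic (Elo-style) probability that the student answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def update(ability: float, difficulty: float, correct: bool, k: float = 0.4):
    """Return the updated (ability, difficulty) pair after one answer.

    k is a hypothetical step size, not a value taken from the study.
    """
    surprise = (1.0 if correct else 0.0) - expected_correct(ability, difficulty)
    return ability + k * surprise, difficulty - k * surprise


def target_difficulty(ability: float, p: float = TARGET_P) -> float:
    """Difficulty at which the expected success probability equals p."""
    return ability - math.log(p / (1.0 - p))


def pick_problem(ability: float, difficulties: dict) -> str:
    """Pick the problem whose difficulty is closest to the target difficulty."""
    goal = target_difficulty(ability)
    return min(difficulties, key=lambda name: abs(difficulties[name] - goal))
```

under these assumptions, a correct answer raises the student's score and lowers the problem's difficulty by the same amount, and the next problem is chosen slightly below the student's current level so that success remains likely.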
a derivative of the elo algorithm adapted problems to the current knowledge level of the student (klinkenberg, straatemeier, & van der maas, 2011). the algorithm worked with a student’s knowledge score, a representation of the student’s current level of knowledge on a particular topic. the knowledge score was calculated based on all problems that a student had worked on. every problem in the system had a difficulty level which was automatically generated and updated by the system based on all of the student’s answers (klinkenberg, straatemeier, & van der maas, 2011). based on the student’s knowledge level, the alt selected the next practice problem. the problem was selected in such a way that the student had a 75% probability of answering correctly.
2.2.2 dashboards
teachers viewed a visualization of students’ data on the dashboard (see figure 1). the software captured real-time data on learner performance, which was immediately displayed to the teachers on dashboards. the dashboard showed information on problems students had worked on. after a student’s name, it indicated how many problems the student had solved (progress) and whether the problems were answered correctly (performance). the circles indicated problems answered. green indicated a correct answer, red an incorrect response, and a circle combining green and red indicated a correct response on the second attempt. a blue open circle indicated the current problem. the first section of the dashboard showed the non-adaptive problems that were part of the lesson; the second section showed adaptive problems, with the topic the students were working on given in the heading above the circles. finally, the progress indicator (human icon in front of students’ names) showed which students were making progress (green icon), were not making progress (red icon), or whose progress was currently unknown (grey icon). the system determined these states with an algorithm and visualized them by color-coding the icons.
figure 1. teacher dashboard (anonymized)
2.2.3 the observations
the classroom observation app (molenaar & knoop-van campen, 2017b) was used to track the sources and feedback types. observations were performed by trained student assistants using the observation app. these observers were instructed to code all feedback actions during a lesson. for each feedback action, the source (teacher, student, dashboard) and the type (task, process, personal, metacognitive, and social feedback; see table 1) were coded (hattie & timperley, 2007; keuvelaar-van den bergh, 2013). teachers accessed the dashboards on their computer screen, laptop, or tablet. computers and laptops were situated in such a way that they were accessible to the teacher during lessons.
table 1. types of feedback
2.3 data analyses
to examine differences between human- and dashboard-prompted feedback, we ran a chi-square test with bonferroni column proportion comparisons. the column proportions tests were used to determine the relative ordering of categories of the columns (type of feedback) in terms of the category proportions of the rows (source of feedback). to examine how teachers alternated between human- and dashboard-prompted feedback, we plotted feedback actions within one lesson on three distinct levels (teacher, student, dashboard) on the y-axis and all feedback actions on the x-axis (see figure 4). a grounded approach was used to determine the codes on a subset of the sample. the subset of plots was inspected and this comparison led to three categories: no dashboard-prompted feedback, blocked distribution pattern, or mixed distribution pattern (see figure 4 in the results section). the remaining plots were then coded by two independent coders, with a very high cohen's kappa (κ = .93). where the two coders disagreed, codes were discussed until agreement was reached.
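for readers unfamiliar with the agreement statistic used for the two coders, the following minimal sketch computes cohen's kappa from two lists of category codes. the function name and the six example codes are hypothetical and are not data from this study.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(codes_a)
    # proportion of items on which the two coders agree
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # agreement expected by chance, from each coder's marginal frequencies
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both coders used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# hypothetical codes for six lesson plots (not data from the study)
coder_1 = ["none", "blocked", "mixed", "mixed", "blocked", "mixed"]
coder_2 = ["none", "blocked", "mixed", "blocked", "blocked", "mixed"]
kappa = cohen_kappa(coder_1, coder_2)
```

kappa corrects raw agreement for the agreement that would be expected if both coders assigned categories independently at their own base rates, which is why it is preferred over a simple percentage of matching codes.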
chi-square analysis with bonferroni column proportions was used to understand how these distribution patterns were associated with the type of feedback given.
3. results
3.1 descriptives
in 65 lessons, 3,410 feedback actions were observed. the source was recorded for 3,330 actions (80 missing) and the type was recorded for 3,237 actions (173 missing). overall, teacher-prompted feedback was most prevalent (1,869 times (56%)), followed by student-prompted feedback (867 times (26%)) and dashboard-prompted feedback (594 times (18%)) (see figure 2a). regarding all feedback actions, task feedback was given most often (997 times (31%)), closely followed by process feedback (980 times (30%)) and personal feedback (894 times (28%)). both metacognitive feedback (199 times (6%)) and social feedback (167 times (5%)) were less frequent (see figure 2b).
figure 2a. feedback source
figure 2b. type of feedback
3.2 associations between source and type of feedback
first, we examined the association between source and feedback type. there was a significant association, χ²(8, n = 3157) = 466.81, p < .001 (see figure 3). the most likely feedback types differed depending on the source of the feedback. below we further specify the results for teacher-, student- and dashboard-prompted feedback. bonferroni column proportions were used to determine how the type of feedback was associated with the source of feedback. when teachers triggered feedback themselves (teacher-prompted feedback), personal feedback (n = 692, 37%) was most frequently given, followed by process (n = 477, 26%) and task feedback (n = 410, 22%). social (n = 146, 8%) and metacognitive feedback (n = 126, 7%) were less frequently given. bonferroni column proportions indicated a significant difference between all the feedback types in how often each feedback type was given. this means that personal feedback was most likely to follow when teachers prompted feedback, followed by process, task, social and metacognitive feedback.
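the association test in this subsection can be recomputed from the cell counts reported for the three sources (teacher-prompted counts above, student- and dashboard-prompted counts in the paragraphs that follow). the sketch below is a plain pearson chi-square on that 3 × 5 table; the function and variable names are illustrative, and the bonferroni column proportion comparisons are not reproduced.

```python
# rows: source of feedback
# columns: task, process, personal, metacognitive, social (counts as reported)
observed = {
    "teacher":   [410, 477, 692, 126, 146],
    "student":   [413, 354, 56, 33, 6],
    "dashboard": [160, 124, 125, 31, 4],
}


def chi_square(table):
    """Return (statistic, degrees of freedom, n) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, count in enumerate(row):
            # expected count under independence of source and type
            expected = row_totals[r] * col_totals[c] / n
            stat += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof, n


stat, dof, n = chi_square(observed)
print(f"chi2({dof}, n = {n}) = {stat:.2f}")
```

the cell counts sum to the reported n of 3157 and the table has 8 degrees of freedom, so the statistic computed this way should land close to the reported value.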
when teachers reacted to students’ questions (student-prompted feedback), task feedback (n = 413, 48%) was most frequently given, followed by process feedback (n = 354, 41%). personal feedback (n = 56, 7%), metacognitive feedback (n = 33, 4%) and social feedback (n = 6, 1%) were less frequent. bonferroni column proportions indicated that task and process feedback were equally likely to occur after students’ questions. both task and process feedback were more likely to be given in response to students’ questions than personal, social, and metacognitive feedback. metacognitive feedback was more likely to be given than social feedback but less likely than the other types of feedback. when teachers provided feedback after dashboard consultation (dashboard-prompted feedback), task feedback (n = 160, 36%) was most frequently given, followed by personal (n = 125, 28%) and process feedback (n = 124, 28%). again, metacognitive (n = 31, 7%) and social feedback (n = 4, 1%) were less frequent. bonferroni column proportions indicated that, except for social feedback, which was less likely to be given, the other four types of feedback were equally likely.
figure 3. feedback type for teacher-, student- and dashboard-prompted feedback
3.3 patterns of dashboard-prompted feedback
second, we examined how teachers alternated between human- and dashboard-prompted feedback. thirteen lessons (20%) did not show any dashboard-prompted feedback (see figure 4a for an example: there was no feedback prompted by the dashboard). furthermore, we found two different distribution patterns: fifteen lessons (23%) showed a blocked distribution pattern (see figure 4b for an example: dashboard-prompted feedback occurred in one or more chunks during the lesson), and thirty-seven lessons (57%) showed a mixed pattern where dashboard feedback was alternated with teacher and student feedback (see figure 4c for an example: the feedback source interchanged during the lesson).
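the three categories were assigned by human coders inspecting the plots. purely as an illustration of the distinction, and not as the procedure used in this study, the sketch below labels a lesson from its sequence of feedback sources using a hypothetical rule: dashboard-prompted actions that arrive in long consecutive runs count as blocked, short interleaved runs count as mixed. the threshold `min_block` and the function name are assumptions.

```python
from itertools import groupby


def classify_lesson(sources, min_block=3.0):
    """Label a lesson as 'none', 'blocked' or 'mixed' (illustrative heuristic).

    sources: feedback sources in order of occurrence, each one of
             'teacher', 'student' or 'dashboard'.
    min_block: hypothetical mean run length at or above which
               dashboard-prompted feedback is treated as blocked.
    """
    # lengths of consecutive runs of dashboard-prompted feedback
    runs = [len(list(group)) for source, group in groupby(sources)
            if source == "dashboard"]
    if not runs:
        return "none"
    mean_run = sum(runs) / len(runs)
    return "blocked" if mean_run >= min_block else "mixed"
```

a rule like this could at most pre-sort plots for human coders; the boundary between chunks and alternation is a judgment call that a single threshold only approximates.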
each figure shows all feedback actions in that specific lesson sequentially on the x-axis, with dashboard-, student-, and teacher-prompted feedback in three rows on the y-axis.
figure 4a. no dashboard-prompted feedback
figure 4b. blocked pattern
figure 4c. mixed pattern
3.4 dashboard-prompted feedback
third, we investigated how the two distribution patterns (blocked and mixed) were associated with the type of dashboard-prompted feedback given. there was a significant association between the distribution pattern and the feedback type, χ²(4, n = 438) = 33.56, p < .001 (see figure 5). the most likely feedback types differed between the two distribution patterns. in lessons with a blocked pattern, most dashboard-prompted feedback was personal (n = 38, 36%), process (n = 26, 25%), or task feedback (n = 23, 22%). some metacognitive (n = 18, 17%) and no social feedback (n = 0, 0%) was given. bonferroni column proportions indicated that personal feedback was as likely to be given as process feedback, but more likely than task or metacognitive feedback. metacognitive feedback was less likely to be given than task and process feedback, but just as likely as social feedback. in lessons with a mixed pattern, most dashboard-prompted feedback was task (n = 135, 41%), process (n = 95, 29%), and personal feedback (n = 87, 26%). some metacognitive (n = 12, 4%) and social feedback (n = 4, 1%) was given. bonferroni column proportions indicated that all types of feedback, except for social feedback, were equally likely to be given. social feedback was less likely to be given than the other types, but just as likely as metacognitive feedback. to conclude, during blocked pattern lessons personal feedback was most likely to be given, whereas in mixed pattern lessons task feedback was most frequent. in figure 5, we can observe a reversed pattern of task, process, and personal feedback in the blocked versus the mixed patterns.
figure 5. dashboard-prompted feedback per distribution pattern
4. discussion
dashboards have the potential to enhance teachers’ feedback practices and complement human-prompted feedback that is initiated by teachers themselves or by students asking questions. however, such enhancement requires teachers to integrate dashboards into their professional routines. how teachers shift between dashboard- and human-prompted feedback could be indicative of this integration. therefore, we examined i) differences between human- and dashboard-prompted feedback; ii) how teachers alternated between human- and dashboard-prompted feedback (distribution patterns); and iii) how these distribution patterns were associated with the type of feedback given. the results contribute to a new understanding of how dashboards are integrated into teachers’ feedback practices.
4.1 how teachers integrate dashboards into their feedback practices
regarding the first research question, results indicated that, in line with our expectation, feedback source and feedback type were related. teacher-prompted feedback was predominantly personal and student-prompted feedback mostly resulted in task feedback, whereas dashboard-prompted feedback was equally likely to be task, process, or personal feedback. this indicates that dashboards induce more diverse feedback practices. moreover, as teacher-prompted feedback consists mostly of personal feedback, which is less effective, we postulate that dashboard-prompted feedback may be more effective. in addition, dashboards stimulated the less frequently used process feedback, which is known to be positively associated with learning (hattie, 2012). whereas van leeuwen, janssen, erkens, and brekelmans (2014) showed that dashboards changed the number of teachers’ feedback actions, we found that dashboards also changed the quality of feedback, by increasing the number of effective feedback practices.
secondly, we investigated how teachers alternated between human- and dashboard-prompted feedback and whether so-called distribution patterns of dashboard-prompted feedback could be observed. the two expected distribution patterns were found. there were lessons in which dashboard-prompted feedback occurred in one part of the lesson and human-prompted feedback in another part of the lesson (blocked pattern). there were also lessons in which teachers used teacher-, student- and dashboard-prompted feedback interchangeably and where dashboard-prompted feedback was mixed with human-prompted feedback throughout the whole lesson (mixed pattern). these distribution patterns indicate that there was not only variation in the extent to which teachers used the dashboard, but also in how they integrated dashboards into their professional routines. additionally, it is important to note that in a further group of lessons (20%) no dashboard-prompted feedback was given. in these lessons, teachers did not integrate the dashboards into their feedback practices. thirdly, we examined how the observed patterns of dashboard-prompted feedback were associated with feedback type. in line with our expectations, the type of dashboard-prompted feedback was found to be related to the distribution pattern it occurred in. dashboard-prompted feedback in blocked patterns was mostly personal feedback, while in mixed patterns task feedback was most prevalent. thus, more alternation of dashboard-prompted feedback with other feedback sources was associated with more task feedback. we postulate that different patterns might be indicative of differential development of teachers’ professional routines with regard to the analysis and application of dashboard information.
even though all teachers who use teacher dashboards enact the learning analytics process (verbert et al., 2014), the achieved depth and effectiveness of their awareness, reflection, sense-making, and implementation in their feedback actions differ between teachers. the patterns we observed visualized these differences in implementation and showed that, as dashboard-prompted feedback was more integrated with their other feedback, teachers also gave more effective types of feedback. teachers who gave dashboard-prompted feedback in blocked patterns mostly gave personal feedback, which resembles the main feedback type they initiated themselves. this may indicate that teachers recognized that a specific student was in need of support, but only acted on a general level by providing personal feedback. in contrast, teachers showing mixed patterns may have developed the analytic skills to integrate the dashboard information within their existing knowledge base and consequently be able to follow up with task-related support (butler & winne, 1995; hattie & timperley, 2007). hence, this informs a hypothesis that teachers go through different stages in learning how to integrate dashboards into their daily practices and that distribution patterns may be indicative of their current level of integration. future research should further explore the relation between distribution patterns of dashboard-prompted feedback in lessons and teacher proficiency in these feedback practices.
4.2 limitations
a limitation of this study is that it was a naturalistic study of teachers’ feedback practices in lessons. there was no strict study design and hence cause and effect relations could not be detected. however, the study did provide ecologically valid data on how teachers in real life use and integrate dashboard data, which cannot be simulated in a lab setting.
to understand teachers’ feedback actions in response to dashboards, it is vital to take into account their pedagogical knowledge base and thus investigate their professional routines in their class with children they know and work with on a daily basis.
4.3 future research
future research could investigate how the integration of dashboard-prompted feedback develops over time. as more integrated use of dashboards seems to relate to positive changes in feedback type, a training intervention could be envisaged in which teachers would be taught to enhance their analytical abilities. this could support understanding and interpretation of the information on the dashboard and could enhance teachers’ capacities to integrate dashboard information into their existing pedagogical knowledge base. future research in turn could investigate whether this enhances teachers’ feedback practices. it would also be useful to examine whether particular teacher characteristics, for example teachers' experience with the dashboard or their analytical skills, relate to the integration of dashboard-prompted feedback.
4.4 practical implications
practical implications of this study are fourfold. first, we now know that dashboards elicit more diverse feedback practices compared to human-prompted feedback. hence, dashboards seem to sustain rich feedback practices, especially with respect to inducing more process feedback. second, we learned that different sources of feedback are related to different types of feedback. in professional training the function of different sources of feedback could be highlighted. third, mixed patterns are more conducive to task feedback, hence teacher training could focus on how teachers integrate dashboard-prompted feedback into their overall feedback practices. fourth, even though dashboard-prompted feedback is more diverse than human-prompted feedback, both social and metacognitive feedback are underrepresented.
social feedback is less inherent to the context studied, but metacognitive feedback could be beneficial for students. especially as young learners’ metacognitive skills have not yet fully developed (veenman et al., 2006), it is important that teachers support the acquisition of these skills by providing feedback. hence, dashboard designers should consider how to include information that can elicit metacognitive feedback.
4.5 conclusions
to conclude, we showed that dashboard-prompted feedback is more diverse than feedback provided on teachers’ own initiative and in response to students’ questions. the way dashboard-prompted feedback was distributed in a lesson was associated with the feedback type teachers gave. this distribution may be indicative of teachers’ professional routines regarding dashboard usage and dashboard-prompted feedback and, more specifically, of the extent to which dashboards are integrated into educational practice. in technology empowered classrooms, dashboards thus provide the opportunity to optimize teachers’ feedback practices when integrated into their professional routines.
keypoints
feedback initiated after dashboard consultation is more diverse compared to teacher-prompted and student-prompted feedback
teachers gave dashboard-prompted feedback in one block or they mixed human- and dashboard-prompted feedback
the mixed pattern was associated with more task feedback, compared to mostly personal feedback in a blocked pattern
the advanced integration of dashboards into teachers’ professional routines positively impacts feedback
acknowledgments
this work was supported by the nro doorbraak project awarded to inge molenaar (grant number: 405-15-823). the authors wish to thank the students of the radboud university for collecting the data, all the schools and teachers who participated in this study, and dr. j.a. houwman for proofreading the manuscript.
references
ballet, k., & kelchtermans, g. (2009).
struggling with workload: primary teachers’ experience of intensification. teaching and teacher education, 25(8), 1150-1157. doi.10.1016/j.tate.2009.02.012 bennett, r. e. (2011). formative assessment: a critical review.assessment in education: principles, policy & practice, 18(1), 5-25. doi.10.1080/0969594x.2010.513678 butler, d. l., & winne, p. h. (1995). feedback and self-regulated learning: a theoretical synthesis. review of educational research, 65(3), 245-281. doi.10.3102/00346543065003245 de jager, b., jansen, m., & reezigt, g. (2005). the development of metacognition in primary school learning environments. school effectiveness and school improvement, 16(2), 179-196. doi. 10.1080/09243450500114181 gudmundsdottir, s., & shulman, l. (1987). pedagogical content knowledge in social studies. scandinavian journal of educational research, 31(2), 59-70. doi.10.1080/0031383870310201 hattie, j. & timperley, h. (2007). the power of feedback. review of educational research, 77(1), 81-112. doi.10.3102/003465430298487 hattie, j. (2012). visible learning for teachers: maximizing impact on learning. routledge. doi.10.4324/9780203181522 holstein, k., hong, g., tegene, m., mclaren, b. m., & aleven, v. (2018, march). the classroom as a dashboard: co-designing wearable cognitive augmentation for k-12 teachers. in proceedings of the 8th international conference on learning analytics and knowledge (pp. 79-88). doi.10.1145/3170358.3170377 holstein, k., mclaren, b. m., & aleven, v. (2017, march). intelligent tutors as teachers' aides: exploring teacher needs for real-time analytics in blended classrooms. in proceedings of the seventh international learning analytics & knowledge conference (pp. 257-266). acm. doi.10.1145/3027385.3027451 keuvelaar-van den bergh, l. (2013). teacher feedback during active learning: the development and evaluation of a professional development programme. klinkenberg, s., straatemeier, m., & van der maas, h. l. (2011). 
computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. computers & education, 57(2), 1813-1824. doi.10.1016/j.compedu.2011.02.003 knoop-van campen, c.a.n., wise, a.f., & molenaar, i. (subm.). the equalizing effect of teacher dashboards on feedback in a k-12 classroom. lacourse, f. (2011). an element of practical knowledge in education: professional routines. mcgill journal of education / revue des sciences de l'éducation de mcgill, 46(1), 73–90. doi.10.7202/1005670ar leeuwen, a. van, janssen, j., erkens, g., & brekelmans, m. (2014). supporting teachers in guiding collaborating students: effects of learning analytics in cscl. computers & education, 79, 28-39. doi.10.1016/j.compedu.2014.07.007 martinez-maldonado, r., clayphan, a., yacef, k., & kay, j. (2014). mtfeedback: providing notifications to enhance teacher awareness of small group work in the classroom. ieee transactions on learning technologies, 8(2), 187-200. doi.10.1109/tlt.2014.2365027 molenaar, i., & knoop-van campen, c.a.n. (2016, april). learning analytics in practice: the effects of adaptive educational technology snappet on students' arithmetic skills. in proceedings of the sixth international conference on learning analytics & knowledge (pp. 538-539). doi.10.1145/2883851.2883892 molenaar, i., & knoop-van campen, c.a.n. (2017a, september). teacher dashboards in practice: usage and impact. in european conference on technology enhanced learning (pp. 125-138). springer, cham. doi.10.1007/978-3-319-66610-5_10 molenaar, i., & knoop-van campen, c.a.n. (2017b, september). how teachers differ in using dashboards: the classroom observation app. presented at the workshop “multimodal learning analytics across (physical and digital) spaces” at european conference on technology enhanced learning. springer, cham. molenaar, i., & knoop-van campen, c.a.n. (2019). how teachers make dashboard information actionable.
ieee transactions on learning technologies, 12(3), 347-355. doi.10.1109/tlt.2018.2851585 molenaar, i., & schaik, a. van (2016). a methodology to investigate classroom usage of educational technologies on tablets. in: aufenanger, s., bastian, j. (eds.) tablets in schule und unterricht. forschungsergebnisse zum einsatz digitaler medien, pp. 87–116. springer, wiesbaden. doi.10.1007/978-3-658-13809-7_5 roelofs, e., & sanders, p. (2007). towards a framework for assessing teacher competence. european journal of vocational training, 40(1), 123-139. shute, v. j. (2008). focus on formative feedback. review of educational research, 78(1), 153-189. doi.10.3102/0034654307313795 schwartz, r. m. (2005). decisions, decisions: responding to primary students during guided reading. the reading teacher, 58(5), 436-443. doi.10.1598/rt.58.5.3 verbert, k., govaerts, s., duval, e., santos, j. l., assche, f., parra, g., & klerkx, j. (2014). learning dashboards: an overview and future research opportunities. personal and ubiquitous computing, 18(6), 1499-1514. doi.10.1007/s00779-013-0751-2 verloop, n., van driel, j., & meijer, p. (2001). teacher knowledge and the knowledge base of teaching. international journal of educational research, 35(5), 441-461. doi.10.1016/s0883-0355(02)00003-4 voerman, l., meijer, p. c., korthagen, f. a., & simons, r. j. (2012). types and frequencies of feedback interventions in classroom interaction in secondary education. teaching and teacher education, 28(8), 1107-1115. doi.10.1016/j.tate.2012.06.006 wood, d., bruner, j. s., & ross, g. (1976). the role of tutoring in problem solving. journal of child psychology and psychiatry, 17(2), 89-100. doi.10.1111/j.1469-7610.1976.tb00381.x

frontline learning research vol.3 no. 3 special issue (2015) 55-67
issn 2295-3159

the doctorate as an original contribution to knowledge: considering relationships between originality, creativity, and innovation

ana baptista (a), liezel frick (b), karri holley (c), marvi remmik (d), jakob tesch (e), gerlese åkerlind (f)

(a) queen mary university of london, uk
(b) stellenbosch university, south africa
(c) university of alabama, usa
(d) university of tartu, estonia
(e) institute for research information and quality assurance, germany
(f) australian national university and university of canberra, australia

corresponding author: ana baptista, learning development, student services directorate, mile end library, queen mary university of london, mile end road, london e1 4ns. phone: +44 (0)20 7882 2838, email a.baptista@qmul.ac.uk, doi: http://dx.doi.org/10.14786/flr.v3i3.147

article received 31 january 2015 / revised 17 july 2015 / accepted 3 august 2015 / available online 12 october 2015

abstract

this article explores the meaning of originality in doctoral studies and its relationship with creativity and innovation. doctoral theses are expected to provide an original contribution to knowledge in their field all over the world. however, originality is not well defined. using the literature on concepts of originality as a foundation, this article shows that originality is not a concept commonly understood. creativity introduces a focus on the production of knowledge, which is not just novel but also meaningful. innovation is becoming of increasing importance in doctoral theses with the societal shift to knowledge-based economies and introduces the requirement of immediate relevance for economic purposes in doctoral education. while the three elements appear to be substantial building blocks of the potential contribution doctoral work can make in the 21st century, it is unclear the extent to which doctoral theses fulfil these expectations. the article discusses this problem with a focus on implications for doctoral education.
keywords: doctoral education; originality; creativity; innovation

1. introduction

the role of the doctoral thesis as an original contribution to knowledge has traditionally signalled a high level of intellectual output within the academic discipline. while considered an essential component of doctoral education, the nature of originality is typically ill-defined. commonly associated with the production of new knowledge, originality is increasingly seen as inherent to creativity and innovation (european universities association, 2010). however, how the three concepts of originality, creativity and innovation operate within the doctoral education process, independently and collectively, is unclear. in addition, questions remain over how and whether originality, creativity, and innovation may be facilitated in doctoral programs, even though these concepts are commonly found in policy documents and literature on doctoral education. nowotny, scott and gibbons (2001) suggest the production of knowledge within the knowledge society values creativity, application and flexibility, a process that is enhanced in the doctoral environment (walsh, anders & hancock, 2013). doctoral students form a key component of such knowledge production and are therefore directly influenced by how such notions – specifically originality, creativity and innovation – are defined and influence each other. this article therefore explores the meaning of originality in doctoral studies and the relationship with innovation and creativity. the aim is to provide insights into the nature of originality in doctoral education for 21st century knowledge societies.

2. originality

the debate about the originality of doctorates dates back to the 19th century (mommsen, 1876).
while originality has been a long-held requirement of doctorates, the publication of doctoral theses introduced in the 19th century helped to reduce fraud and enabled the assessment of originality by relevant disciplinary communities. for example, since the first uk doctorate was awarded in 1917, the degree has required “an output that constitutes original research as defined by the academic community into which the candidate wishes to be admitted” (qaa, 2011, p.12). this requirement places thesis examiners in a powerful brokerage position with responsibility to enact a judgement of originality on behalf of their respective academic community, although the assessment of appropriate degrees of originality differs substantially amongst examiners (clarke & lunt, 2014; denicolo, 2003; johnston, 1997). for over a century, the quality of originality has been considered essential to the doctoral thesis (european universities association, 2007, 2010; australasian qualifications framework advisory board, 2007; hornbostel, 2009; association of american universities 1998, in lovitts, 2005; new zealand qualifications authority, 2001; uk quality assurance agency for higher education, 2011) in order to achieve what is commonly now referred to as ‘doctorateness’ (wellington, 2013). the council for doctoral education of the european universities association (eua), as one example, recommends as the first principle of doctoral education that “the core component of doctoral training is the advancement of knowledge through original research” (2010, p.2). moving beyond a surface-level assessment of originality requires attention to the development of original thought and original work (clarke & lunt, 2014). for the former, new knowledge might be generated as a result of the doctoral thesis, or existing knowledge might be applied to result in a new understanding. for the latter, developing a musical score or a painting can indicate original work. 
not only are doctoral students required to assess and categorise existing bodies of knowledge through this process, but they also draw conclusions regarding knowledge and make decisions about implementation (simpkins, 1987, cf. lovitts, 2007). originality may be evident in the study’s design, the knowledge synthesis, the implications, or the way in which the research is presented (wellington, 2010).

this assessment emphasises the nuanced ways in which the outcome of originality might be achieved. applying existing methods to new data could result in incremental additions to the knowledge base, while the application of new methods, new questions, or new ideas could generate more substantial shifts in knowledge (lovitts, 2005). this variability underscores the emphasis on significance in doctoral research. whilst significance is not inherently a component of originality (johnston, 1997), it is important to note that original research within the context of doctoral education is expected to provide knowledge of significance to the field of study (tinkler & jackson, 2004). these varying perspectives on originality show that it does not have a universal definition, nor does it manifest in the same way in all doctoral work. originality is not only related to an outcome or product, but also to the overall process of producing an outcome. a doctoral student cannot achieve a product without undergoing a process that stimulates the creation of that product. what is deemed original may vary between disciplines, programmes and even individual projects. the originality of a dissertation can be expressed in a number of ways, and the kind of originality that is recognised and appreciated has traditionally been dependent on discipline (guetzkow, lamont & mallard, 2004; lamont, 2009; lovitts, 2007). disciplinary variation influences the assessment of originality.
for example, clarke and lunt (2014) suggest that originality in science, technology, engineering and mathematics disciplines is defined by publishability, whilst in arts, humanities and social sciences it is related to intellectual originality. guetzkow and colleagues (2004) argue that natural sciences define originality “as the production of new findings and new theories”, while social sciences and humanities define it “much more broadly: as using a new approach, theory, method, or data; studying a new topic, doing research in an understudied area; or producing new findings” (p.190). disciplinary implications are evident for phd students’ perceptions and expectations about the phd as process and product, and also for the way students learn how to do research, and consequently what it means to be original. knowledge is rarely de-contextualised, and numerous factors influence the way an individual frames a question and chooses the path to answer that question. disciplines consist of old and emerging specialisms (kekäle, 2000), and how these different bodies of knowledge are defined and arranged determines the output (bailin, 1985). knowledge defined as old or emergent may intertwine to create a process or product that may be called original. delamont, atkinson and parry (2000, p.174) state that: “the originality of postgraduate research is always defined in terms of the essential tension between accepted prior knowledge and new discoveries or ideas”. disciplinary influences are evident in cultural norms including the research process (such as group projects or those led by a supervisor), the form of the thesis (such as monograph or article-based), and the long-term impact on the field (such as future publication and citation impacts). thus, a definition of originality in doctoral degrees assumes different nuances in different contexts. numerous issues should be considered in addressing originality in doctoral education:

• the interplay between old and new, i.e. that originality inevitably builds on existing knowledge and practices in some way;
• disciplinary variation in originality;
• the existence of degrees of originality, and the need for originality to be accompanied by significance;
• the need to address originality in doctoral process as well as product, with associated implications for research training.

both bennich-björkman (1997) and beghetto (2013) agree that originality can be defined as something that is new or novel, but originality does not necessarily have to be applicable or relevant. herein lies the difference between originality and creativity, as described below.

3. creativity

along with the expectation of originality, doctoral research is strongly associated with creativity, commonly as a way in which students engage in the research process. for example, the australian qualifications framework (2013) specifies that doctoral graduates are required to demonstrate “the application of knowledge and skills with initiative and creativity”. thus creativity implies that a contribution (such as a doctoral thesis) needs to be both novel (original) and relevant (according to bennich-björkman, 1997) or applicable (according to beghetto, 2013). beghetto (2013) defines creativity as anything deemed as both original and task-appropriate within a particular socio-cultural-historical context – such as an academic discipline. the genealogy of creativity can be traced back to the greek word ‘krainein’, which means to fulfil. people who fulfil their potential, who express an inherent drive or capacity, can be seen as creative (evans & deehan, 1988). pope (2005, p.11) consequently defines creativity as “the capability to make, do or become something fresh and valuable with respect to others as well as ourselves”, which involves “a grappling deep within the self and within one’s relations with others: an attempt to wrest from the complexities and contradictions we have internalised”.
this definition goes beyond creativity in the thesis production and process, to creativity of the person, i.e., the doctoral graduate themselves. this positions creativity as including the full realisation and expression of a person’s potential (lovitts, 2008; mackinnon, 1970) – thus ‘becoming doctorate’, a responsible and independent scholar (barnacle, 2005). assessing creativity requires attention to the intellectual context, including big c creativity, or that which brings about knowledge new to the human race, and pro c creativity, which occurs within a professional workspace (kaufman & beghetto, 2007). the disciplinary context adds another important variable, underscored by the key elements of motivation, independence, and intellectual challenge (jurisevic, 2011). bennich-björkman’s (1997) classification scheme (see table 1) offers further insights into the relationship between originality and creativity.

table 1
classification of research contributions (adapted from bennich-björkman, 1997, p.25)

is the contribution relevant? | is the contribution novel? yes | is the contribution novel? no
yes | creative | cumulative
no | original | replication

the relationship between originality and creativity, according to bennich-björkman, is defined through novelty and relevance. in principle, relevance may be determined at individual, societal or economic levels (sternberg & lubart, 1999), but in the case of the doctorate, it most commonly refers to the judgment of the disciplinary community in which the doctorate is produced. while creative work is expected to be relevant as well as novel, originality is expected only to be novel. by taking the focus off immediate relevance, the pure concept of originality recalls blue skies research and an emphasis on the pursuit of knowledge for its own sake. this view of originality thus seems appropriate to the time when the expectation of original research was first introduced into the doctorate, with the rise of the modern university in the late 19th century.
in addition to counterposing creativity and originality, bennich-björkman’s classification attempts to define knowledge production that is not original. cumulative research is characterised as being highly relevant, in the sense of being valuable or useful to disciplinary communities, but not novel. this focus on relevance positions cumulative research as a valuable contribution to knowledge, but neither original nor creative. replication of research is positioned as neither novel nor relevant, but is nonetheless an important aspect of knowledge development that increases the reliability of research findings and thus trust in the outcomes – small studies may be replicated on a larger scale or with another sample, for instance. disciplinary differences matter, as cumulative research and replicative studies are not uncommon in many natural science doctorates. thus, the ‘in practice’ definition of originality in doctoral theses may be made as much on pragmatic grounds as on conceptual ones. the product of a creative endeavour demonstrates an original and appropriate contribution that has purpose and can be judged by some sort of external criteria (sternberg & lubart, 1999). a process-product distinction exists between creativity and originality, with the idea of a creative process underpinning an original product or outcome. this distinction has implications for the design of doctoral education, suggesting that originality in research outcomes may best be achieved by encouraging creative processes during the candidature, such as a creative learning environment or peer collaborations. the notion of fit for purpose that our discussion has highlighted as a key aspect of creativity raises questions such as fit for whom or what? such questions open the door to innovation being one of the drivers of research in the 21st century that also needs to be considered in the contributions doctoral work is expected to make.

4.
innovation innovation has become an increasing expectation of doctoral studies as part of the global post world war ii economic shift from industrial and manufacturing based economies to technological and knowledge based economies (delanty, 2001; marginson & considine, 2000; rolfe, 2013). by definition, innovation involves the process of transforming an invention into practical application, and is most commonly associated with private industry (marsh, 2010). as the production of knowledge has come to be of increasing importance to national economies, university research is expected to better serve the needs of industry, through innovation in science and technology in particular. the term ‘innovation’ is most often found in economic discourses on production processes or products (marsh, 2010). governmental higher education policies place an emphasis on stronger links between industry and universities, and development of knowledge that can be exploited for economic benefit (delanty, 2001; henkel, 2000), bringing the concept of innovation firmly into the 21st century doctoral education. the lisbon declaration on the purpose of europe’s universities (2007) strongly links university research with innovation, emphasising the importance of universities’ “capacity for promoting cultural, social and technological innovation” (p.1) and that “to meet the challenges of the twenty-first century (...) [requires] technological and social innovation which will solve problems as they arise and ensure economic success” (p.2). thus, innovation as part of doctoral research privileges the production of knowledge that is economically useful, either in terms of technological advances or societal use. technological innovation is typically linked to marketable technologies, for example developing patents. social innovation would relate to applied research aimed at improving societal conditions or solving societal problems. 
examples are abundant in a variety of disciplines ranging from medicine (e.g., curbing mother to child transfer of hiv/aids) to education (e.g., improving literacy rates). in classical economic theory, innovators are considered creative entrepreneurs who successfully acquire monopoly positions with innovative products or production processes (schumpeter, 1912). innovation is defined as the practical application of a novel, and thus original, idea, but it must be an idea with a potential application: “innovations of any kind start with some kind of creative enterprise, and the enterprise must produce work that is not just novel, but useful. innovation is the channelling of creativity so as to produce a creative idea and/or product that people can and wish to use” (sternberg, pretz & kaufman, 2003, p.158).

the doctorate is increasingly economically positioned as an important source of skilled and innovative knowledge workers, as required by a knowledge-based economy with a strong emphasis on research and development. this position has led to an exponential growth in the number of phds awarded internationally, especially in the natural sciences and engineering (cyranoski, gilbert, ledford, nayar & yahia, 2011), and a shift in expectations of employment post-phd away from academia and towards industry, government and private enterprise (auriol, 2010; enders, 2005). innovation has claimed a prominent place in defining a key purpose of the 21st century doctorate as preparing the candidate for a future career in either academe or industry, and developing skills for employability (wellington, 2013). the extent to which these developments have changed the conditions under which knowledge is produced in doctoral theses and science in general is unclear (geiger, 2004).
the literature on thesis examiners shows hardly any expectation of innovation in doctoral theses in terms of developing applications for industry, though engineering is an exception here, where an application of existing methods to a problem from engineering practice is considered original, just as is the invention of new devices (lovitts, 2007, p.173). similarly, the conceptualisation of originality in economics, as the application of existing methods to a novel problem, is also considered original (lovitts, 2007, p.173). both disciplines consider practical problem solving as an original contribution. 5. implications for doctoral education risk is intrinsically linked to originality, creativity and innovation, and is thus an unavoidable element of doctoral education (frick, albertyn & bitzer, 2014). doctoral education is inherently risky given the requirement to produce original knowledge. the lisbon declaration (2007) argues that universities “should encourage a culture of risk-taking (...) in order to produce an institutional milieu favourable to creativity, knowledge creation and innovation” (p.3), reinforcing the idea that an original contribution requires a certain amount of risk-taking in choosing a topic and approach, due to the novelty aspect inherent to originality. students need to have “the courage and confidence to take risks, to make mistakes, to invent and reinvent knowledge, and to pursue critical and lifelong inquiries in the world, with the world, and with each other” (freire, 1970, cited in lin & cranton, 2005, p.458). mackinnon (1970) agrees that the courage to take risks is an important characteristic of creative endeavours – such as doctoral studies. however, balancing risk with originality, creativity and innovation may provide challenges for the supervisory relationship and the research process (brown, 2010; latham & braun, 2009). 
therefore, it is important not only to manage risk constructively, but also to understand how it manifests within doctoral education. byrnes, miller and schafer (1999) refer to four aspects that need consideration when defining risk that could be applied to doctoral education. firstly, risk is closely associated with goals, values and outcomes. hence, the importance of current debates about the purpose of a doctorate in a risk society full of uncertainties and changes (park, 2005, 2007), as well as the definition of supervisory and research responsibilities and roles that characterise doctoral students and supervisors. secondly, risk involves interplay between an individual’s subjective perception of risk and the perceptions of the larger community. different students and different supervisors may interpret risk differently, which may influence how they negotiate their relationship and study focus. thirdly, individual characteristics determine the extent of possible risk. for instance, a study may be less risky if the doctoral student has particular research and/or subject expertise. finally, context determines “who can take what risks and how” (hood, jones, pidgeon, turner, gibson & bevan-davies, 1992, p.136). for example, certain projects may become less risky if expert supervision and other resources are readily available. this conceptualisation of risk reflects significant forces that relate to elements in the context, relationships in the supervisory process, and individual characteristics of doctoral students. these forces are reflected in the broader literature on doctoral education, which highlights several factors that may affect the overall success of a doctorate, including: (i) characteristics of the doctoral candidate themselves; (ii) nature of the doctoral supervision experienced; and (iii) institutional, departmental, disciplinary and external cultures. each of these factors is explored in more detail below.
individual student characteristics can strongly impact on the originality of their work. for instance, doctoral education requires that students at times work independently in an uncertain environment. within this environment, healthy program cultures encourage risk-taking by students within the context of the field. the interpretation of risk is a process fraught with possible complications, particularly in terms of the expert-apprentice relationship still prevalent between the supervisor and student. however, students who have been socialised in an undergraduate academic culture or a professional environment that promotes novel ways of knowing will have a stronger foundation for originality. in addition to student characteristics, doctoral supervision is one of the most important influences on research student outcomes (latona & browne, 2001; seagram, gould & pyke, 1998). evans (2004) conceptualizes the role of the supervisor as that of risk manager and risk mitigator, acting as an intermediary between the demands of society, the discipline(s) involved, the institution and the doctoral candidate. frick, albertyn and bitzer (2014) report various strategies that supervisors use at different stages during the doctorate to support students and mitigate risk, including formulating clear expectations; determining and developing student capability, independence, analytical thinking skills, problem solving skills, integrative thinking skills, creativity, and expectations during the student selection phase; encouraging wide reading, critical debate, benchmarking, time for incubation of ideas, and challenging students during conceptualisation of the study; developing academic writing and methodological skills through incorporating expert input; supporting networking, colloquia, regular contact, communication, co-supervision and mentoring practices; and promoting peer review and writing for publication during the doctorate.
they encourage further research that explores ways of balancing rather than controlling risk, while encouraging innovation in the doctoral education process. increased awareness of risk could lead supervisors to contain risk in a responsible manner. of course, it is not only the student who assumes the risk in terms of research, but also the supervisor. institutional, departmental, disciplinary and external cultures influence how faculty and students engage with a doctoral curriculum. backhouse (2009), frick (2012) and holligan (2005) point to cultural factors (including bureaucratic institutional systems, ethics and funding policies) as determinants of the extent to which risk-taking is possible in doctoral studies. for instance, a danger of the current emphasis on doctoral throughput in the minimum allocated time is that it may lead to avoiding the risk of choosing a complex and less defined problem. not all research that may be considered original requires lengthy periods of time, but nor can all research be contained within minimum, finite time periods. ultimately, the process of doctoral education is influenced by the various cultures in which such work takes place. in particular, how such cultures define novel knowledge outcomes is highly relevant. clearly, approaches to doctoral education that might encourage originality are patchy, making it difficult to design an educational agenda for the future when there are so many uncertainties and unpredictable changes embedded in doctoral (research) education and supervision, and when concepts that characterise this challenging high-level process overlap and seem somewhat blurred. but perhaps operating in a state of uncertainty, unpredictability and blurred boundaries is what the future of higher education is all about. 6. 
conclusions: insights into the nature of originality in doctoral research

we can see from this examination of originality, creativity and innovation the extent to which all three concepts are often defined with reference to each other. clearly, these concepts share a focus on novelty in research. where the concepts differ is in the underlying purpose or intention for seeking novelty – with creativity it is disciplinary relevance or value, with innovation it is useful economic outcomes, whilst with originality it is more blue skies knowledge seeking – but all three of these concepts may influence the way in which the potential contribution of doctoral work is seen. but whilst originality may be free of instrumental connotations, a doctorate is not. doctoral theses are expected to make not just an original but also a significant contribution to the field, the implication being that there is little value in originality if it is not also significant. however, the determination of significance is context-dependent. what would be considered significant in the 19th century would likely be different to the 21st century, and in one discipline or subspecialisation different to another, for instance. it could be argued that creativity and innovation both incorporate originality, in the form of novelty in research. hence, it may be possible to have originality without creativity or innovation, but not vice versa. meanwhile, all three concepts can contribute to the development of the doctoral contribution in overlapping but different ways. conceptually, the links between these concepts can be displayed as follows:

figure 1. the relationship between originality, creativity and innovation.

in figure 1 we show that originality, creativity and innovation are related elements that can all contribute to the doctoral contribution, but that the emphasis shifts depending on the concept.
as doctorateness seems to be a multi-faceted concept itself (wellington, 2013), this fluid emphasis may be useful to allow for (trans)disciplinary, programme and individual differences in what it means to be doctoral. meanwhile, in the current economic and socio-political climate, the question of whether doctoral studies can or should be safeguarded from instrumental requirements for applied relevance must be considered. doctoral theses call not just for originality, but for originality that advances the field in a substantial way. just as the internal characteristics of the field change over a period of time, so does the external context which helps give shape to (and ultimately, contribute to a definition of) knowledge production. while this demand need not include the focus on economic benefits or relevance attached to innovation or creativity, it still places constraints on the type of originality considered appropriate for a doctoral thesis. appropriate approaches to developing originality as part of doctoral education need to be considered. although expectations of originality in doctoral work seem ubiquitous, there is little literature on the design of curricula or pedagogical processes for supporting the development of originality. as described above, the concept remains vague to examiners and supervisors (clarke & lunt, 2014; lovitts, 2007). meanwhile, a common assumption seems to exist that the process of engaging in doctoral research will in and of itself lead to originality, as if through some magical process: “the goal of doctoral education is to cultivate the research mindset, to nurture flexibility of thought, creativity and intellectual autonomy through an original, concrete research project. it is the practice of research that creates this mindset” (european universities association, 2010, p.2).
the unanswered question from this statement is how the practice of research cultivates these attributes, and in what ways doctoral education might intentionally foster these outcomes. such vague notions for ensuring the development of such a central expectation of doctoral education seem inappropriate in the context of the 21st century focus on higher education efficiency, accountability and quality assurance. considering the ways in which doctoral education can facilitate originality requires attention to the doctoral curriculum, i.e. process, as well as the thesis outcomes, i.e. product. 7. outlook in exploring the nature of originality, this article has linked different conceptualisations of novelty as applied to doctoral theses, showing that while originality appears to be the basic requirement, other expectations such as creativity and innovation, and associated criteria of usefulness and economic advancement have recently appeared on the agenda. this association suggests a new differentiation in the requirements for doctoral theses. however, the relation between these concepts is not yet fully clear. the question remains as to whether the differentiation of requirements for a doctoral thesis is just a mirror of changes affecting research and knowledge creation in general, or whether there are more nuanced issues to consider related to doctoral education specifically. as the doctorate is seen as the initial process in becoming a researcher, changing requirements for the doctorate will most likely affect the way knowledge creation operates in the future. higher education has experienced these changes before. as one example, the publication of doctoral theses is now commonplace, and many institutions offer open public access to theses produced by doctoral graduates. another example involves the development of the group dissertation for certain disciplines. 
these so-called ‘capstone projects’ not only encourage students to work collaboratively, but they often involve external stakeholders. the challenge of defining original research has implications for the nature of doctoral training, and specifically for the internal function of disciplines and for the relation between academic disciplines and society. future research should examine the extent to which these new requirements are part of institutional guidelines, supervisors’ expectations and doctoral students’ identity conceptualisations. an even more fundamental question concerns the determination or assessment of originality. a troubling reality underscores the consideration of originality in doctoral education – to what extent have doctoral theses ever been shown to fulfil the requirement of an original and significant contribution to knowledge, apart from via the subjective judgments of examiners? with theses by publication becoming more widespread, new pathways for intra-individual replicability of originality and in-depth analysis emerge, for example through the application of bibliometric tools and content analysis of citations. however, the questions of which stakeholders should be involved in this assessment and which bibliometric indicators might be utilised remain unresolved. another almost unquestioned theme in the extant literature is that originality arises out of the doctoral training process, be it an intensive supervisor-mentee relationship or more structured doctoral training conditions. this assumption is particularly noteworthy given that no valid database exists that can be used to demonstrate whether a doctoral thesis can be considered original, much less which experiences contribute to a doctoral student being able to perform such work. future research should take steps towards unpacking the relationship between doctoral training conditions and outcomes, in the sense of fulfilling the requirement of originality.
the following questions offer ideas for future research:
• what are doctoral program designers’ conceptualisations of originality?
• how do these relate to conceptualisations of originality by supervisors, examiners and students?
• which requirements can be achieved through better training, and which are dependent on individual characteristics of doctoral students, such as propensity for risk-taking?
cross-country and international comparisons could be valuable here; although the doctorate shares commonalities in the international context, the degree to which the doctorate is organised as a training process varies from country to country. this article has considered how originality builds on existing knowledge and practices by stimulating an interplay between old and new. how should doctoral curricula and the supervisory relationship explicitly develop students’ originality skills? it is incorrect to assume that all doctoral supervisors and all those who design doctoral curricula at higher education institutions possess originality skills themselves. additionally, formal structures at the contextual and institutional levels where doctoral education and supervision take place, as well as in national contexts, stimulate both the definition of originality and the attitude towards research and knowledge. to tackle these questions, the research agenda for the future should open spaces for discussions about the place of originality in the supervisory relationship, curriculum design, and the cultural environment that an institution and even a research group has to offer. disciplines should strengthen dialogues about the requirements for a doctoral thesis in their field, and research should supply these discussions with evidence-based knowledge. simultaneously, the different discourses at different levels should be critically reviewed in the light of the most relevant and up-to-date literature.
these dialogic interactions between practices, perceptions and research may be a way of improving the overall experience students and supervisors will have in doctoral programs.
keypoints
in exploring the nature of originality, this article has linked different conceptualisations of novelty as applied to doctoral theses, showing that while originality appears to be the basic requirement, other expectations such as creativity and innovation, and associated criteria of usefulness and economic advancement have recently appeared on the agenda.
the challenge of defining original research has implications for the nature of doctoral training, and specifically for the internal function of disciplines and for the relation between academic disciplines and society.
further research must be carried out in order to shed light on possibly diverse ways of determining or assessing originality.
references
australian qualifications framework. (2013). aqf specification for the doctoral degree. retrieved in october 2014, from http://www.aqf.edu.au/wp-content/uploads/2013/05/14aqf_doctoral-degree.pdf
auriol, l. (2010). careers of doctorate holders: employment and mobility patterns. sti working paper 2010/4. paris: oecd. statistical analysis of science, technology and industry.
backhouse, j. (2009). creativity within limits: does the south african phd facilitate creativity in research? journal of higher education in africa, 7(1/2), 265-288.
bailin, s. (1985). on originality. interchange, 16(1), 6-13.
barnacle, r. (2005). research education ontologies: exploring doctoral becoming. higher education research and development, 24(2), 179-188.
bennich-björkman, l. (1997). organising innovative research: the inner life of university departments. oxford: iau press, pergamon.
brown, l. (2010). balancing risk and innovation to improve social work practice. british journal of social work, 40, 1211-1228.
byrnes, j.p., miller, d.c., & schafer, w.d. (1999).
gender differences in risk taking: a meta-analysis. psychological bulletin, 125(3), 367-383.
clarke, g., & lunt, i. (2014). the concept of ‘originality’ in the ph.d.: how is it interpreted by examiners? assessment & evaluation in higher education, 39(7), 803-820.
cyranoski, d., gilbert, n., ledford, h., nayar, a., & yahia, m. (2011). the phd factory. nature, 472, 277-279.
delanty, g. (2001). challenging knowledge – the university in the knowledge society. buckingham: srhe & open university press.
delamont, s., atkinson, p., & parry, o. (2000). the doctoral experience. success and failure in graduate school. london: falmer press.
denicolo, p. (2003). assessing the phd: a constructive view of criteria. quality assurance in education, 11(2), 84-91.
enders, j. (2005). border crossings: research training, knowledge dissemination and the transformation of academic work. higher education, 49(1-2), 119-133.
european universities association. (2007). lisbon declaration: europe’s universities beyond 2010: diversity with a common purpose. brussels: eua.
european universities association. (2010). salzburg ii recommendations: european universities' achievements since 2005 in implementing the salzburg principles. brussels: eua.
evans, p., & deehan, g. (1988). the keys to creativity. london: grafton.
evans, t. (2004). risky doctorates: managing doctoral studies in australia as managing risk. paper presented at the australian association for research in education conference, melbourne, 28 november – 2 december 2004.
frick, b.l. (2012). pedagogies for creativity in science doctorates. in a. lee & s. danby (eds.), reshaping doctoral education: programs, pedagogies, curriculum (pp. 113-127). london: routledge.
frick, b.l., albertyn, r.m., & bitzer, e.m. (2014). conceptualising risk in doctoral education: navigating boundary tensions. in e.m. bitzer, r.m. albertyn, b.l. frick, b. grant & f. kelly (eds.), candidates, supervisors and institutions: pushing postgraduate boundaries.
stellenbosch: sunmedia.
geiger, r. (2004). knowledge and money: research universities and the paradox of the marketplace. stanford: stanford university press.
guetzkow, j., lamont, m., & mallard, g. (2004). what is originality in the humanities and the social sciences? american sociological review, 69(2), 190-212.
henkel, m. (2000). academic identities and policy change in higher education. london and philadelphia: jessica kingsley publishers.
holligan, c. (2005). fact or fiction? a case history of doctoral supervision. educational research, 47(3), 267-278.
hood, c., jones, d.k.c., pidgeon, n.f., turner, b.a., gibson, r., & bevan-davies, c. (1992). risk management. in the royal society (eds.), risk: analysis, perception and management. london: the royal society.
hornbostel, s. (2009). promotion im umbruch – bologna ante portas. in m. held, g. kubon-gilke & richard (eds.), bildungsökonomie in der wissensgesellschaft: band 8. jahrbuch normative und institutionelle grundfragen der ökonomik (pp. 213-240). marburg: metropolis verlag.
johnston, s. (1997). examining the examiners: an analysis of examiners' reports on doctoral theses. studies in higher education, 22(3), 333-347.
juriševič, m. (2011). postgraduate students’ perception of creativity in the research process. ceps journal, 169.
kaufman, j.c., & beghetto, r.a. (2009). beyond big and little: the four c model of creativity. review of general psychology, 13(1), 1.
kekäle, j. (2000). quality assessment in diverse disciplinary settings. higher education, 40(4), 465-488.
lamont, m. (2009). how professors think: inside the curious world of academic judgment. cambridge, ma: harvard university press.
latham, s., & braun, m. (2009). closing the loop: innovation and decline. academy of management annual meeting, chicago, il, august 7-11.
latona, k., & browne, m. (2001). factors associated with completion of research higher degrees.
canberra: higher education division, department of education, training and youth affairs.
lin, l., & cranton, p. (2005). from scholarship student to responsible scholar: a transformative process. teaching in higher education, 10(4), 447-459.
lovitts, b.e. (2005). being a good course-taker is not enough: a theoretical perspective on the transition to independent research. studies in higher education, 30(2), 137-154.
lovitts, b.e. (2007). making the implicit explicit: creating performance expectations for the dissertation. sterling, va: stylus.
lovitts, b.e. (2008). the transition to independent research: who makes it, who doesn’t, and why. the journal of higher education, 79(3), 296-325.
mackinnon, d. (1970). creativity: a multi-faceted phenomenon. in j.d. roslansky (ed.), creativity (pp. 17-32). amsterdam: north-holland.
marginson, s., & considine, m. (2000). the enterprise university: power, governance and reinvention in australia. cambridge & melbourne: cambridge university press.
marsh, i. (2010). innovation and public policy: the challenge of an emerging paradigm. canberra: australian innovation research centre.
mommsen, t. (1905 [1876]). die deutschen pseudodoktoren. in o. hirschfeld (ed.), theodor mommsen: reden und aufsätze (pp. 402-409). berlin: weidmann.
new zealand qualifications authority. (2001). national qualifications framework. wellington: new zealand qualifications authority.
nowotny, h., scott, p., & gibbons, m. (2001). re-thinking science: knowledge and the public in an age of uncertainty (p. 12). cambridge: polity press.
park, c. (2005). new variant phd: the changing nature of the doctorate in the uk. journal of higher education policy and management, 27(2), 189-207.
park, c. (2007). redefining the doctorate. york: the higher education academy.
pope, r. (2005). creativity: theory, history, practice. london: routledge.
quality assurance agency for higher education. (2011). uk quality code for higher education: doctoral degree characteristics.
gloucester: quality assurance agency for higher education.
rolfe, g. (2013). the university in dissent. london and new york: routledge.
schumpeter, j.a. (1997 [1912]). theorie der wirtschaftlichen entwicklung: eine untersuchung über unternehmergewinn, kapital, kredit, zins und den konjunkturzyklus (9th ed.). berlin: duncker und humblot.
seagram, b., gould, j., & pyke, s. (1998). an investigation of gender and other variables on time to completion of doctoral degrees. research in higher education, 39(3), 319-335.
sternberg, r.j., & lubart, t.i. (1999). the concept of creativity: prospects and paradigms. in r.j. sternberg (ed.), handbook of creativity (pp. 3-15). cambridge: cambridge university press.
sternberg, r., pretz, j., & kaufman, j. (2003). types of innovations. in l.v. shavinina (ed.), the international handbook on innovation (pp. 158-169). oxford: elsevier.
tinkler, p., & jackson, c. (2004). the doctoral examination process. maidenhead: open university press.
walsh, e., anders, k., & hancock, s. (2013). understanding, attitude and environment: the essentials for developing creativity in stem researchers. international journal for researcher development, 4(1), 19-38.
wellington, j. (2010). making supervision work for you. london: sage.
wellington, j. (2013). searching for 'doctorateness'. studies in higher education, 38(10), 1490-1503.

frontline learning research 3 (2014) 22-30 issn 2295-3159
corresponding author: associate professor gavin t l brown, school of learning, development, & professional practice, faculty of education, the university of auckland, private bag 92019, auckland, 1142, new zealand, gt.brown@auckland.ac.nz
http://dx.doi.org/10.14786/flr.v2i1.24
the future of self-assessment in classroom practice: reframing self-assessment as a core competency
gavin t. l. brown a, lois r.
harris b
a university of auckland, new zealand
b central queensland university, australia
article received 23rd october 2013 / revised 7th november 2013 / accepted 26th february 2014 / available online 25th april 2014
abstract
formative assessment policies and self-regulation theories argue that student self-assessment of their own work and processes is useful for raising academic performance and self-regulatory skills. however, research into student self-evaluation raises serious doubts about the quality of self-assessment as an assessment process and identifies conditions which must be met if students’ judgments are to be useful, valid, and reliable. this paper recommends that student self-assessment should no longer be treated as an assessment, but instead as an essential competence for self-regulation. as such, we describe a potential curriculum approach that could guide teachers to appropriate use of self-assessment tools.
keywords: student self-assessment; compulsory schooling; curriculum; research synthesis
1. introduction
student self-assessment is an evaluation of a student’s own work products and processes in classroom settings. formative assessment (a.k.a., assessment for learning) policies argue that student self-assessment is useful for raising academic performance (black & wiliam, 2006). research evidence suggests that self-assessment does contribute positively to learning outcomes, but its effects are highly variable, with many threats to its validity (brown & harris, 2013). nonetheless, student self-assessment is strongly advocated as an important classroom practice (e.g., leahy, lyon, thompson, & wiliam, 2005). this paper responds to recent and seminal reviews and position papers on self-assessment (e.g., andrade, 2010; brown & harris, 2013; boud & falchikov, 1989; butler, 2011; falchikov & boud, 1989; dochy, segers, & sluijsmans, 1999; dunning, heath, & suls, 2004; ross, 2006), all of which have raised issues to consider in relation to what is needed for the future of self-assessment. we are increasingly persuaded that self-assessment is not a robust assessment practice and that its real place in schooling is as a teachable and learnable component of self-regulated learning. however, current manifestations of self-assessment advocacy do not provide well-informed guidance to researchers or practitioners about self-assessment. hence, our goal is to first establish the need for a self-assessment curriculum and second to sketch out what that curriculum could look like.
2. self-assessment as assessment
assessment practices that contribute to decision-making need to be demonstrably valid and reliable (messick, 1989). the usefulness of self-assessment for decision-making seems to depend, in part, upon whether the student can accurately or realistically judge the qualities of their own work. however, the realism or veridicality (i.e., truthfulness) of self-assessment is difficult to ascertain, since this can only be determined through comparison to other people’s (e.g., teachers, peers, or parents) judgements or ratings or to performance on externally devised tests or examinations. as butler (2011) makes clear, there has been a stream of research around student self-assessment that has emphasised the need for realistic, veridical, or verifiably accurate self-assessment if it is to effectively contribute to achievement (e.g., brown & harris, 2013; boud & falchikov, 1989; falchikov & boud, 1989; dunning, heath, & suls, 2004; ross, 2006). in contrast, there is another stream of research that has claimed that realism or veridicality in self-assessment is moot, since the self-assessment process helps students develop greater awareness of the quality of their work and criteria by which their work can be evaluated (e.g., andrade, 2010).
butler (2011) concludes that the research in student self-assessment indicates that inaccurate, but positively biased, self-assessment leads to improved outcomes, while inaccurate, negatively biased self-assessment has a negative impact on achievement. hence, while there is empirical and theoretical evidence for contrasting positions around the realism of student self-assessments, we take the position that if self-assessment is to contribute to highly consequential decision-making (e.g., teacher decisions about grouping, curriculum planning, or retention/promotion and student decisions about pursuing or dropping further study in a topic area), then it is necessary for self-assessments to be demonstrably realistic or truthful. the research evidence is robust that the agreement between student self-assessment and other measures (e.g., test scores, teacher judgements, or peer ratings) is moderate at best (brown & harris, 2013; falchikov & boud, 1989). correlations between (a) self-ratings and teacher ratings, (b) self-estimates of performance and actual test scores, and (c) student and teacher rubric-based judgments tend to range from r ≈ .20 to .80, with few studies reporting correlations greater than r = .60 (brown & harris, 2013). greater realism and sophistication of self-assessment are more evident among more experienced and more able students. furthermore, consideration of how teachers use student self-assessment in classroom contexts suggests that there are other important factors which threaten reliability and validity. there is robust evidence that when self-assessments are disclosed (e.g., traffic light self-assessments displayed to the teacher in front of the class), there are strong psychological pressures on students that lead to dissembling and dishonesty (harris & brown, 2013; cowie, 2009). students may intentionally disguise the truth in order to protect their reputations.
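as an illustration of what the agreement statistics reported above measure, the following minimal sketch computes a pearson correlation between self-ratings and teacher ratings. it is not drawn from any of the reviewed studies: the function name `pearson_r` and the two lists of 1-5 ratings are hypothetical and chosen only to show the arithmetic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical data (an assumption for illustration): eight students'
# self-ratings and their teacher's ratings of the same work on a 1-5 scale.
self_ratings = [4, 5, 3, 4, 2, 5, 3, 4]
teacher_ratings = [3, 4, 3, 2, 2, 5, 4, 3]

r = pearson_r(self_ratings, teacher_ratings)
print(f"agreement between self and teacher ratings: r = {r:.2f}")
```

a coefficient of this kind describes only how closely the two sets of ratings rise and fall together; it says nothing about whether students systematically rate themselves higher or lower than the teacher does, which would need a separate comparison of means.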
other students will rely on construct-irrelevant and subjective criteria (e.g., “i made an effort” or “i’m good at this”), rather than intended criteria, in judging the quality of their performance, and this is associated with lower accuracy in self-evaluations (brown & harris, 2013). there are many factors in the human condition that contribute to unrealistic self-assessments, including tendencies to (a) be unrealistically optimistic about one’s own abilities, (b) believe that one is above average, (c) neglect crucial information, and (d) have deficits in required information (dunning, heath, & suls, 2004). while substantial advocacy of self-assessment has been promulgated, these studies suggest that assessment for learning policies as implemented may have overlooked important facets of how humans make judgements. further, there may be some psychological and social danger to students in current self-assessment practices. hence, awarding grades or basing educational interventions or changes on unrealistic or construct-irrelevant self-assessments is untenable. if self-assessment processes lead students to conclude wrongly that they are good or weak in some domain and they base personal decisions on such false interpretations, harm could be done, even in classroom settings (e.g., task avoidance, not enrolling in future subjects) (ramdass & zimmerman, 2008). the quality of interpretations and decisions depends on realistic input (messick, 1989) and because students are generally unrealistic in self-assessments, the use of such data in any formal way for assessment is probably unwarranted. hence, in many ways student self-assessment fails the validity and reliability requirements of an assessment robust enough to (a) justify changes to classroom practice, (b) support the calculation of grades or scores, or (c) be included in any reporting.
3.
self-assessment as self-regulation the use of self-assessment within assessment for learning policies draws on self-regulation of learning theories which identify student capabilities to set targets and evaluate progress against criteria as a basis for meta-cognitively informed improvement of learning outcomes (zimmerman, 2008). self-regulation refers to self-directive and self-generated metacognitive, motivational, and behavioural processes through which individuals transform personal abilities into control of outcomes in a variety of contexts (zimmerman, 2001). in taking action, an individual requires the ability to understand and choose a rationale for taking action (i.e., motivation), the capacity to set future-oriented objectives, plans, or projects (i.e., goals), the capability to select means or methods for obtaining his or her goals, even in the face of adversity or boredom (i.e., strategies), and the proficiency to monitor progress (i.e., self-assess), and adjust strategy implementation as appropriate (i.e., regulation) (brown, et al., 2005). there is evidence that students can improve their self-regulation skills through self-assessment (i.e., set targets, evaluate progress relative to target criteria, and improve the quality of their learning outcomes) (andrade, du, & wang, 2008; andrade, du, & mycek, 2010; brookhart, andolina, zuza, & furman, 2004). furthermore, self-assessment is associated with improved motivation, engagement, and efficacy (griffiths & davies, 1993; klenowski, 1995; munns & woodward, 2006; schunk, 1996), reducing dependence on the teacher (sadler, 1989). it is also seen as a potential way for teachers to reduce their own assessment workload, making students more responsible for tracking their progress and feedback provision (sadler & good, 2006; towler & broadfoot, 1992). thus, consistent with self-regulation theory, self-assessment contributes to greater meta-cognitive skills associated with greater achievement. 
in reviewing literature around student self-assessment practices, brown and harris (2013) found diverse enacted practices which they grouped into three major categories (i.e., self-estimation of performance, self-rating, and rubric-based judgements). these three categories contain a wide variety of procedures; for example, (a) using a model answer as a reference (hewitt, 2001), (b) integrating teacher-evaluation with self-evaluation (olina & sullivan, 2002), (c) self-correction (harward, allred, & sudweeks, 1994), (d) using a computerized prompt system (daiute & kruidenier, 1985), (e) self-selected reinforcements or rewards, especially for achieving challenging goals (barling, 1980; miller, duffy, & zane, 1993; wall, 1982), (f) contributing to the design of a scoring rubric (sadler & good, 2006), or (g) judging the accuracy of answers to standardized test items (koivula, hassmén, & hunt, 2001). in contrast, inspection of recent pedagogical texts (e.g., absolum, 2006; clarke, 2005; harlen, 2007; taylor & nolen, 2005; weeden, winter, & broadfoot, 2002; wiggins & mctighe, 1998) indicates that a relatively narrow range of self-assessment techniques is suggested (e.g., rubrics, rating scales, including traffic lights, reflections on portfolios or series of tasks). further, we find no evidence of coherent packaging or sequencing of these activities or a theoretical model underpinning authors’ recommendations of these practices. there seems to be an almost ad hoc, grab bag approach to the use of self-assessment. the templates given as examples in the texts we consulted included mainly checklists, rating scales (sometimes using smiley-face categories), and lists of possible prompting questions which focused on diverse outcomes including completion, compliance, effort, and attitude (e.g., mcmillan, 2001; stiggins, 2005), rather than necessarily upon the full range of self-regulating behaviours.
also, textbooks seemed to treat the topic generically, without specifying ages or stages of development which would be appropriate for the practices, templates, or examples that they provided; it appears to be left to the reader to judge if a particular activity or prompt is appropriate for his or her students. inspection of classroom practices of self-assessment reinforces this perception that teachers are applying self-assessment techniques with little thought as to potential threats to the validity of publicly displayed self-assessment (harris & brown, 2013; ross, rolheiser, & hogaboam-gray, 1998) and, certainly, have few concerns about the need to provide students structured support to enable them to use self-assessment realistically. this stands in stark contrast to the research literature that shows significant educational impact of self-assessment upon student learning, if students are systematically taught how to self-assess (daiute & kruidenier, 1985; glaser, kessler, palm, & brunstein, 2010; harward, allred, & sudweeks, 1994; mcdonald & boud, 2003; ramdass & zimmerman, 2008; ross, 2006; ross, hogaboam-gray, & rolheiser, 2002). hence, there is a clear need to give self-assessment techniques a semblance of order; identifying the ease or difficulty of implementation and use would provide a robust basis for developing a curriculum of self-assessment. self-assessment is an essential component of self-regulation and would appear to be a learnable competence. the advantage of situating self-assessment as a competence is that competencies usually have levels of development (e.g., ranging from novice to expert) (rychen & salganik, 2003) and, consequently, can be used as the basis for a teaching curriculum.
4.
self-assessment as a curricular competence
while the assessment for learning policy reforms have tried to move increasingly away from formal testing towards a more pedagogical understanding of ‘assessment’, and despite advocacy for the use of self-assessment as a component of self-regulation, little attention has been put into formalising a self-assessment curriculum in light of well-established research findings. insufficient attention has been given to curricular concerns, such as: (a) what self-assessment skills should be taught? (b) what is the developmental sequence for teaching self-assessment skills? (c) how should self-assessment skills be taught? (d) what are appropriate goals for teaching student self-assessment competence according to student age and ability? (e) what are useful criteria for evaluating student competence in self-assessment? (f) what are appropriate mechanisms by which student self-assessment reports could be evaluated, if required? this paper offers a first attempt at developing a curriculum for self-assessment as a component of self-regulation. we appreciate that school curricula are overloaded and do not advocate for the creation of a new curricular topic. instead, we suggest that within more general frameworks of teaching subject content and developing students as independent, life-long learners, the possession and implementation of a self-assessment curriculum is likely to be of great utility to teachers and students. as self-assessment is present in some forms already in most curricula, we are advocating for systematically organising and formalising what is, in most instances, already there. this would improve its impact and better match practices to student self-assessment abilities, allowing students opportunities to master more complex self-assessment and self-regulatory skills as they progress through school.
lower performing and younger students need input (i.e., instruction and feedback) to master this key self-regulatory process. so what might that input look like? research has shown (brown & harris, 2013; ross, 2006) that realistic self-assessments are more likely when: (1) students are involved in the process of establishing criteria for evaluating work outcomes; (2) students are taught how to apply those criteria; (3) students receive feedback from others (i.e., teachers and peers) to help move them toward more accurate evaluations; (4) students are taught how to use other assessment data (e.g., test scores or graded work) to improve their work; (5) there is psychological safety when self-evaluation is used; (6) rewards for accuracy are used; and (7) students are required to explicitly justify their self-evaluations to their peers. these insights give us a basis for developing a curriculum that could guide the implementation of student self-assessment as a necessary competence for self-regulation.

our first recommendation for a student self-assessment curriculum is to start with simple, concrete techniques before introducing complex, abstract techniques, including holistic, intuitive judgements about effort, satisfaction, or work quality. for very young students, even the act of estimating how many times they would be able to throw a bean bag into a basket was difficult (powel & gray, 1995). nonetheless, powel and gray’s (1995) technique is extremely simple and the realism of a student self-assessment can be objectively verified by the student using a tangible metric. hence, estimating how many items one might get right on a spelling list, math quiz, or vocabulary quiz is a straightforward strategy which allows easy determination of the realism of the student self-assessment (jones, trap, & cooper, 1977; wan-a-rom, 2010).
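the realism check described above, comparing a student’s predicted score with the score actually obtained, reduces to simple arithmetic. the sketch below is illustrative only: the function names, the signed-bias measure, and the tolerance of one item are assumptions made for this example and are not taken from the studies cited.

```python
from typing import Iterable, Tuple


def signed_bias(predicted: int, actual: int) -> int:
    """Positive = overestimate, negative = underestimate, zero = exact."""
    return predicted - actual


def is_realistic(predicted: int, actual: int, tolerance: int = 1) -> bool:
    # 'tolerance' (in items) is a hypothetical classroom threshold,
    # not a value reported in the literature.
    return abs(signed_bias(predicted, actual)) <= tolerance


def mean_absolute_error(pairs: Iterable[Tuple[int, int]]) -> float:
    """Average size of the gap between predicted and actual scores."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (predicted, actual) pair is required")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


if __name__ == "__main__":
    # One student's (predicted, actual) scores on three ten-item quizzes.
    history = [(8, 6), (5, 5), (3, 6)]
    for predicted, actual in history:
        print(predicted, actual, signed_bias(predicted, actual),
              is_realistic(predicted, actual))
    print(mean_absolute_error(history))
```

keeping the sign of the bias, rather than only its size, lets a teacher distinguish a student who overestimates from one who underestimates.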
linking such estimates very close in time to the instructional moment also makes the task more concrete (barnett & hixon, 1997). even asking students to estimate how well they think they will do compared to their last known performance provides a concrete and personal reference point.

at an intermediate stage, self-assessments supported by externally-sourced, yet explicit scaffolding of intended learning outcomes (e.g., models, computer-assisted prompts, teacher evaluations) as an adjunct or guide should be introduced. more realistic comparisons of the student’s work quality to that of other students in the class may be feasible here, but such normative comparisons may not be a desirable curricular goal. it seems more useful to have students focus on comparing their work to that of established standards or against their previous performance rather than on how others are doing. nonetheless, techniques that allow greater autonomy in self-assessment (e.g., self-correction or self-rating of one’s own work) should be introduced once students have demonstrated that they can assess their work realistically.

at an advanced stage, rubrics or criteria, preferably developed in conjunction with the students to ensure that they have a deep understanding of the rubric progression, should be introduced. by this point, students should be able to be reasonably realistic making use of more holistic, and possibly intuitive, judgements of their work quality using rating scales or key point checklists. the research shows that the greatest learning gains come when students engage in a deeper analysis of their own work; however, getting students to that level of analysis is unlikely to be instantaneous. gradual introduction of more sophisticated self-assessment techniques seems highly desirable.

throughout the development of self-assessment competence, the priority needs to be kept on realism in self-evaluation, regardless of the level of performance.
we must help students avoid inappropriate negative bias in their self-assessments, which will mean helping highly able students accept that their work is actually exemplary or of a high standard. in contrast, while having an overly positive self-assessment does not have as many ill-effects (butler, 2011), realism has its own internal benefits. accurate self-monitoring contributes to the possibility of entering a growth-pathway in which students identify and respond to their weaknesses, instead of pursuing an ego-protection pathway in which students seek to maximise unmerited positive feelings about their work (boekaerts & corno, 2005). hence, teachers need to implement strategies that encourage and foster honest self-reflection. that will mean, at least for a time, permitting some self-assessments to remain private from the teacher, not forcing students to display realistic but negative self-assessments in front of classmates, and encouraging students to share their self-evaluations with trusted people (e.g., a best friend or a family member). this recommendation is not new; andrade (2010) has long advocated focusing on the self-regulatory effects of self-assessment rather than its veridicality. and certainly, it means not using self-assessments for grading, reporting, or accountability purposes. nonetheless, students generally want to understand if they have judged their own work appropriately and expect teachers to provide feedback and instruction (harris & brown, 2013; gao, 2009; peterson & irving, 2008). thus, insulating student self-assessment perpetually from the teacher would be counterproductive. hence, within a context of psychological safety, if teachers gain access to student self-assessments (e.g., those recorded alongside homework activities handed in to the teacher), it seems desirable for teachers to comment on the realism of student self-evaluations as an important learning objective in its own right.
the goal is to foster realistic self-monitoring that is used to guide appropriate learning strategies (i.e., more sophisticated responses than ‘work harder’ are needed). students need environments in which realism is prioritised and protected, even if it means, at first, teachers cannot easily ascertain what students think about their own learning.

a self-assessment curriculum should also encourage students to explain the criteria they used to evaluate their own work. the intellectual sophistication required to justify an assessment is a significant factor in improving learning outcomes and metacognitive capability. in an environment of trust (e.g., a classroom with a warm supportive interpersonal climate), explaining one’s reasoning for a self-assessment to a trusted peer is associated with improved learning outcomes (dunning, heath, & suls, 2004).

it should come as no surprise that both teachers and students will need training before they can engage with self-assessment as a taught and learned competence. new professional development materials and courses are needed that go beyond the exhortation to use student self-assessment (e.g., leahy, lyon, thompson, & wiliam, 2005). these resources need to ensure teachers are aware of the theory and research base for self-assessment and provide techniques that are appropriately sequenced for the skill level students have in this competence. until teachers abandon a simple approach (e.g., using smiley-face self-rating scales for effort and satisfaction), it is unlikely self-assessment will fulfil its promise. once teachers have an appropriate understanding, they will need to train students in developing realistic self-evaluations for the explicit purpose of guiding their own learning. fortunately, the research evidence makes it abundantly clear that the quality of student self-assessment improves with training and that enhanced outcomes arise.
while this paper provides a rough outline for the scope and sequence of a self-assessment curriculum, more research is needed to identify whether there are ages or stages below which particular types of self-assessment are unrealistic for students to complete accurately. additionally, a proper curriculum would incorporate all parts of the self-regulation cycle (zimmerman, 2008) around the self-assessment practices proposed at particular levels. it is also important to invent new self-assessment practices which may better align with a complex model of self-regulation than currently suggested self-reflection and self-assessment practices, most of which are focused at the end of, rather than throughout, the learning cycle. we trust that this treatment of self-assessment as a self-regulating competence, rather than as an assessment practice, will contribute to improved classroom practice and professional development systems. we consider that a curriculum for self-assessment competence would be of great benefit to educational practice and trust that this first sketch will trigger significant developments.

keypoints

student self-assessment generally has a positive impact on academic performance, although it is not a robust assessment method in terms of validity and reliability.
student self-assessment is an important aspect of and contributor to greater self-regulation of learning.
student self-assessment needs a curricular framework to ensure it is effectively treated as a self-regulating competence.

acknowledgments

this paper was inspired by a presentation at the 2013 biennial meeting of the european association for research in learning & instruction, munich, germany.

references

absolum, m. (2006). clarity in the classroom: using formative assessment to build learning-focused relationships. auckland, nz: hachette livre new zealand. andrade, h. l. (2010). students as the definitive source of formative assessment: academic self-assessment and the self-regulation of learning.
in h. l. andrade & g. j. cizek (eds.), handbook of formative assessment (pp. 90-105). new york: routledge. andrade, h. l., du, y., & mycek, k. (2010). rubric-referenced self-assessment and middle school students' writing. assessment in education: principles, policy & practice, 17(2), 199-214. doi: 10.1080/09695941003696172 andrade, h. l., du, y., & wang, x. (2008). putting rubrics to the test: the effect of a model, criteria generation, and rubric-referenced self-assessment on elementary school students' writing. educational measurement: issues and practice, 27(2), 3-13. doi: 10.1111/j.1745-3992.2008.00118.x brown, g. t. l., & harris, l. r. (2013). student self-assessment. in j. h. mcmillan (ed.), the sage handbook of research on classroom assessment (pp. 367-393). thousand oaks, ca: sage. brown, g. t. l., reddish, p., leeson, h. v., milfont, t. l., brychkova, l., & hattie, j. a. c. (2005, june). assessment of key competencies: a proposal for inclusion in e-asttle. asttle advisory. rep. 19, university of auckland, project asttle. barling, j. (1980). a multistage multidependent variable assessment of children's self-regulation of academic performance. child behavior therapy, 2(2), 43-54. doi: 10.1300/j473v02n02_03 barnett, j. e., & hixon, j. e. (1997). effects of grade level and subject on student test score predictions. journal of educational research, 90(3), 170-174. doi: 10.1080/00220671.1997.10543773 black, p., & wiliam, d. (2006). developing a theory of formative assessment. in j. gardner (ed.), assessment and learning (pp. 81-100). london: sage. boekaerts, m., & corno, l. (2005). self-regulation in the classroom: a perspective on assessment and intervention. applied psychology: an international review, 54(2), 199-231. doi: 10.1111/j.1464-0597.2005.00205.x boud, d., & falchikov, n. (1989). quantitative studies of student self-assessment in higher education: a critical analysis of findings. higher education, 18, 529-549.
doi: 10.1007/bf00138746 brookhart, s. m., andolina, m., zuza, m., & furman, r. (2004). minute math: an action research study of student self-assessment. educational studies in mathematics, 57(2), 213-227. doi: 10.1023/b:educ.0000049293.55249.d4 butler, r. (2011). are positive illusions about academic competence always adaptive, under all circumstances: new results and future directions. international journal of educational research, 50(4), 251-256. doi: 10.1016/j.ijer.2011.08.006 clarke, s. (2005). formative assessment in the secondary classroom. abingdon, uk: hodder murray. cowie, b. (2009). my teacher and my friends helped me learn: student perceptions and experiences of classroom assessment. in d. m. mcinerney, g. t. l. brown & g. a. d. liem (eds.), student perspectives on assessment: what students can tell us about assessment for learning (pp. 85-105). charlotte, nc: information age publishing. daiute, c., & kruidenier, j. (1985). a self-questioning strategy to increase young writers' revising processes. applied psycholinguistics, 6(3), 307-318. doi: 10.1017/s0142716400006226 dochy, f., segers, m., & sluijsmans, dominique. (1999). the use of self-, peerand co-assessment in higher education: a review. studies in higher education, 24(3), 331-350. doi: 10.1080/03075079912331379935 dunning, d., heath, c., & suls, j. m. (2004). flawed self-assessment: implications for health, education, and the workplace. psychological science in the public interest, 5(3), 69-106. doi: 10.1111/j.15291006.2004.00018.x falchikov, n., & boud, d. (1989). student self-assessment in higher education: a meta-analysis. review of educational research, 59(4), 395-430. doi: 10.3102/00346543059004395 gao, m. (2009). students' voices in school-based assessment of hong kong: a case study. in d. m. mcinerney, g. t. l. brown & g. a. d. liem (eds.), student perspectives on assessment: what students can tell us about assessment for learning (pp. 107-130). charlotte, nc: information age publishing. 
glaser, c., kessler, c., palm, d., & brunstein, j. c. (2010). förderung der schreibkompetenz bei viertklässlern: spezifische und gemeinsame effekte prozess- und ergebnisbezogener prozeduren der selbstregulation auf indikatoren der schreibleistung, strategiebeherrschung und selbstbewertung [improving fourth graders' self-regulated writing skills: specialized and shared effects of process-oriented and outcome-related self-regulation procedures on students' task performance, strategy use, and self-evaluation]. zeitschrift für pädagogische psychologie / german journal of educational psychology, 24(3-4), 177-190. doi: 10.1024/1010-0652/a000015 griffiths, m., & davies, c. (1993). learning to learn: action research from an equal opportunities perspective in a junior school. british educational research journal, 19(1), 43-58. doi: 10.1080/0141192930190104 harlen, w. (2007). assessment of learning. los angeles: sage. harris, l. r., & brown, g. t. l. (2013). opportunities and obstacles to consider when using peer- and self-assessment to improve student learning: case studies into teachers' implementation. teaching and teacher education, 36, 101-111. doi: 10.1016/j.tate.2013.07.008 harward, s. v., allred, r. a., & sudweeks, r. r. (1994). the effectiveness of four self-corrected spelling test methods. reading psychology, 15(4), 245-271. doi: 10.1080/0270271940150403 hewitt, m. p. (2001). the effects of modeling, self-evaluation, and self-listening on junior high instrumentalists' music performance and practice attitude. journal of research in music education, 49(4), 307-322. doi: 10.2307/3345614 jones, j. c., trap, j., & cooper, j. o. (1977). technical report: students' self-recording of manuscript letter strokes. journal of applied behavior analysis, 10(3), 509-514. doi: 10.1901/jaba.1977.10-509 klenowski, v. (1995). student self-evaluation processes in student-centred teaching and learning contexts of australia and england.
assessment in education: principles, policy & practice, 2(2), 145-163. doi: 10.1080/0969594950020203 koivula, n., hassmén, p., & hunt, d. p. (2001). performance on the swedish scholastic aptitude test: effects of self-assessment and gender. sex roles: a journal of research, 44(11), 629-645. doi: 10.1023/a:1012203412708 leahy, s., lyon, c., thompson, m., & wiliam, d. (2005). classroom assessment minute by minute, day by day. educational leadership, 63(3), 18-24. mcdonald, b., & boud, d. (2003). the impact of self-assessment on achievement: the effects of selfassessment training on performance in external examinations. assessment in education: principles, policy & practice, 10(2), 209-220. doi: 10.1080/0969594032000121289 mcmillan, j. h. (2001). classroom assessment: principles and practice for effective instruction (2nd ed.). boston, ma: allyn & bacon. messick, s. (1989). validity. in r. l. linn (ed.), educational measurement (3rd ed., pp. 13-103). old tappan, nj: macmillan. miller, t. l., duffy, s. e., & zane, t. (1993). improving the accuracy of self-corrected mathematics homework. journal of educational research, 86(3), 184-189. doi: 10.1080/00220671.1993.9941157 munns, g., & woodward, h. (2006). student engagement and student self-assessment: the real framework. assessment in education: principles, policy and practice, 13(2), 193-213. doi: 10.1080/09695940600703969 olina, z., & sullivan, h. j. (2002). effects of classroom evaluation strategies on student achievement and attitudes. educational technology, research and development, 50(3), 61-75. doi: 10.1007/bf02505025 peterson, e. r., & irving, s. e. (2008). secondary school students‟ conceptions of assessment and feedback. learning and instruction, 18(3), 238-250. doi: 10.1016/j.learninstruc.2007.05.001 powel, w. d., & gray, r. (1995). improving performance predictions by collaboration with peers and rewarding accuracy. child study journal, 25(2), 141-154. ramdass, d., & zimmerman, b. j. (2008). 
effects of self-correction strategy training on middle school students' self-efficacy, self-evaluation, and mathematics division learning. journal of advanced academics, 20(1), 18-41. doi: 10.4219/jaa-2008-869 ross, j. a. (2006). the reliability, validity, and utility of self-assessment. practical assessment research & evaluation, 11(10), available online: http://pareonline.net/getvn.asp?v=11&n=10. ross, j. a., hogaboam-gray, a., & rolheiser, c. (2002). student self-evaluation in grade 5-6 mathematics: effects on problem-solving achievement. educational assessment, 8(1), 43-59. doi: 10.1207/s15326977ea0801_03 ross, j. a., rolheiser, c., & hogaboam-gray, a. (1998). skills training versus action research in-service: impact on student attitudes to self-evaluation. teaching and teacher education, 14(5), 463-477. doi: 10.1016/s0742-051x(97)00054-1 rychen, d. s., & salganik, l. h. (2003). a holistic model of competence. in d. s. rychen & l. h. salganik (eds.), key competencies for a successful life and a well-functioning society (pp. 41-62). cambridge, ma: hogrefe & huber. sadler, p. m., & good, e. (2006). the impact of self- and peer-grading on student learning. educational assessment, 11(1), 1-31. doi: 10.1207/s15326977ea1101_1 sadler, r. (1989). formative assessment and the design of instructional systems. instructional science, 18, 119-144. doi: 10.1007/bf00117714 schunk, d. h. (1996). goal and self-evaluative influences during children's cognitive skill learning. american educational research journal, 33(2), 359-382. doi: 10.2307/1163289 stiggins, r. j. (2005). student-involved assessment for learning (4th ed.). upper saddle river, nj: pearson education. taylor, c. s., & nolen, s. b. (2005). classroom assessment: supporting teaching and learning in real classrooms. upper saddle river, nj: pearson education. towler, l., & broadfoot, p. (1992). self-assessment in the primary school. educational review, 44(2), 137-151.
doi: 10.1080/0013191920440203 wall, s. m. (1982). effects of systematic self-monitoring and self-reinforcement in children's management of test performances. journal of psychology, 111(1), 129-136. doi: 10.1080/00223980.1982.9923524 wan-a-rom, u. (2010). self-assessment of word knowledge with graded readers: a preliminary study. reading in a foreign language, 22(2), 323-338. weeden, p., winter, j., & broadfoot, p. (2002). assessment: what's in it for schools? london: routledgefalmer. wiggins, g. p., & mctighe, j. (1998). understanding by design. alexandria, va: association for supervision and curriculum development. zimmerman, b. j. (2001). theories of self-regulated learning and academic achievement: an overview and analysis. in b. j. zimmerman & d. h. schunk (eds.), self-regulated learning and academic achievement: theoretical perspectives (2nd edn.). mahwah, nj: lea. zimmerman, b. j. (2008). investigating self-regulation and motivation: historical background, methodological developments, and future prospects. american educational research journal, 45(1), 166-183. doi: 10.3102/0002831207312909

frontline learning research vol. 5 no. 3 special issue (2017) 43-54
issn 2295-3159

corresponding author: sharon e. fox, m.d., ph.d., dep. of pathology and laboratory medicine, southeast louisiana veterans healthcare system, new orleans. email: sharon.fox4@va.gov
doi: http://dx.doi.org/10.14786/flr.v5i3.258

eye-tracking in the study of visual expertise: methodology and approaches in medicine

sharon e. fox (a, b) & beverly e. faulkner-jones (a)

(a) beth israel deaconess medical center, boston, ma, usa; harvard medical school, boston, ma, usa
(b) lsu health sciences center, new orleans, la, usa

article received 7 may / revised 23 march / accepted 24 march / available online 14 july

abstract

eye-tracking is the measurement of eye motions and point of gaze of a viewer.
advances in this technology have been essential to our understanding of many forms of visual learning, including the development of visual expertise. in recent years, these studies have been extended to the medical professions, where eye-tracking technology has helped us to understand acquired visual expertise, as well as the importance of visual training in various medical specialties. medical decision-making involves a complex interplay between knowledge and sensory information, and the study of eye-movements can reveal the mechanisms involved in acquiring the visual component of these skills. eye-tracking studies have even been extended to develop computational models of procedures for “expert” skill assessment, and to eliminate potential sources of error in image-based diagnostics. this review will examine the current eye-tracking frontier for the study of visual expertise, with specific application to medical professions. keywords: eye-tracking; visual expertise; digital imaging http://dx.doi.org/10.14786/flr.v5i3.258 fox et al | f l r 44 1. introduction eye-movements and their meaning have long been the subject of scientific study, and recent advances in technology have allowed for the study of eye gaze – or eye-tracking – during the acquisition of complex forms of visual expertise. in this review of eye-tracking methodology, we will examine the way in which the science of acquired visual expertise, as well as the development of eye-tracking technology, have allowed us to better understand visual training in medicine. training and expertise in medicine involves not only acquisition of knowledge, but also the integration of sensory information during the process of diagnosis and disease management. in many instances, that sensory information is visual, and the study of eye-movements can reveal not only the cognitive processes behind medical expertise, but also the mechanisms involved in acquiring these skills. 
furthermore, studies of expert gaze patterns can help us to understand common perceptual pitfalls, and to develop technologies that may assist in training medical professionals and eliminate sources of error.

2. history & eye-tracking mechanisms

eye movements represent one of the most frequent sensorimotor activities in humans. large scanning movements, or saccades, typically occur 3-4 times per second (holmqvist, 2011). the most frequently reported eye gaze metric, however, is the fixation: the relative “pause” between saccades. individual fixations generally last approximately 200-300 milliseconds, with the fovea of the eye remaining relatively still along a point of gaze (holmqvist, 2011). in the early 1970s, eye-tracking was advanced by video-based techniques, in which recorded features of light reflected from the eye could be systematically tracked. one option was to scan for the lack of reflectance from the pupil (“dark-pupil” tracking), although low contrast between the pupil and dark-brown irises led to suboptimal results. if, instead, the eye is lit from the front, the light reflected back out through the pupil makes it appear very bright (“bright-pupil” tracking). this bright circle can be detected more reliably than the dark pupil (merchant, morrissette, & porterfield, 1974; rayner, 1978). in addition to the ability to capture the motion of the eyes, a second and important component of eye-tracking for the study of vision is the ability to capture the visual stimulus in such a way that gaze can be determined. most early eye-tracking studies involved significant restraint of the head for this purpose. an important innovation in this area was the development of eye-tracking systems that measured multiple features of the eye in order to infer position relative to a visual stimulus (holmqvist, 2011).
optical properties of the eye, such as corneal reflection and pupil location, vary differently under conditions of head versus eye movement, and their recordings can be used to solve for the actual gaze point of the viewer (for additional discussion, see wolfe, evans, drew, aizenman, & josephs, 2015). by utilizing non-visible light, such as near-infrared light sources, such eye-trackers can be made less noticeable, and easier to use in a variety of lighting conditions. such systems are particularly useful in the study of natural learning environments, and allow results to be generalized to real world situations. the balance between obtaining a high-precision record of an observer's point-of-regard and allowing natural head- and body-movements is where much of the technological advancement in eye-tracking has arisen over the past twenty years, in addition to “scene cameras” that allow eye-tracking data to be superimposed on the naturalistic point-of-view of the participant during movement (browatzki, bulthoff, & chuang, 2014; huette, winter, matlock, & spivey, 2012; johnson, liu, thomas, & spencer, 2007). computer embedded and table-mounted remote optical eye-trackers have also been developed that allow some natural head movement while the participant sits in front of a computer screen for two-dimensional stimulus presentation. such advances have placed fewer constraints on experimental subjects, such that modern researchers are able to record the eye movements of freely moving subjects carrying out everyday tasks. these features, combined with improvement in user interface, have allowed eye-tracking to be used in the study of visual training, and the application of visual training within the field of medicine. it is also possible that future advances in this technology could allow eye-tracking devices to assist visual processes in medical education and decision-making.

3. eye-tracking applied to medical expertise

3.1 introduction

eye-tracking provides a potential means for understanding the nature and acquisition of visual expertise as it relates to medical knowledge. as a method of assessing gaze patterns during task execution, eye-tracking has been used to understand which parts of an image are important to medical decision-making, and also the process which experts adopt to analyze these images. to date, the majority of the published eye-tracking studies involving medical expertise are naturally in the fields requiring extensive visual training, such as radiology (drew, evans, vo, jacobson, & wolfe, 2013; kundel, nodine, krupinski, & mello-thoms, 2008; g. tourassi, voisin, paquit, & krupinski, 2013; g. d. tourassi, mazurowski, harrawood, & krupinski, 2010; wolfe et al., 2015). these studies often utilize static images, which are relatively easy to employ in eye-tracking experimental designs and analyses, but which do not always translate to the visual environments of other clinical specialties. in recent years, additional studies have appeared in relation to fields such as dermatology, surgery, and anatomic pathology (ahmidi, ishii, fichtinger, gallia, & hager, 2012; bombari, mora, schaefer, mast, & lehr, 2012; brunye et al., 2014; fox, law, & faulkner-jones, 2017; e. krupinski, chao, hofmann-wellenhof, morrison, & curiel-lewandrowski, 2014; e. a. krupinski, graham, & weinstein, 2013; e. a. krupinski et al., 2006; law, atkins, lomax, & wilson, 2003; tiersma, peters, mooij, & fleuren, 2003). the ability to use dynamic stimuli in eye-tracking research (ahmidi, ishii, et al., 2012; drew, vo, olwal, et al., 2013; fox et al., 2017; mallett et al., 2014), as well as the use of wearable and increasingly portable devices, has expanded the opportunities in which eye-tracking can be utilized to understand the gain of visual expertise in clinical settings.
in the study of medical education, most experimental designs involve two or three participant groups, each at a different level of medical training (ahmidi, ishii, et al., 2012; e. a. krupinski et al., 2013; mallett et al., 2014; phillips et al., 2013). analyses vary widely, however, with some reporting qualitative information, such as regions of an image with the greatest or least attention, and others using quantitative analyses of fixations in relation to defined areas of interest (aois) (ahmidi, ishii, et al., 2012; brown et al., 2014; fox et al., 2017; e. a. krupinski et al., 2013; mallett et al., 2014; phillips et al., 2013). a recent and notable study of eye-tracking in medical education used expert eye-tracking data to train observational techniques (h. jarodska et al., 2012), and will be further discussed in subsequent sections. overall, eye-tracking studies across medical specialties have suggested that more experienced physicians require fewer fixations, and spend less time on areas of interest, while performing at a higher rate of accuracy than novices. the visual patterns identified by eye-tracking experiments, however, depend in part on the type of visual expertise acquired, and the integration of medical knowledge in that process. in order to understand how eye-tracking methodology may be applied to the study of visual expertise in medicine, we must first understand the differences between visual tasks in a variety of clinical scenarios.

3.2 search-related expertise

radiology is the most extensively studied field of medicine in relation to visual expertise, and it is not surprising that a significant literature involving eye-tracking has arisen in relation to this specialty (drew, evans, et al., 2013; kundel et al., 2008; g. tourassi et al., 2013; g. d. tourassi et al., 2010; wolfe et al., 2015).
the visual expertise learned by radiologists is an example of a search-related task, in which the radiologist identifies visual “targets” in an image containing both expected and “distractor” elements (wen et al., 2016; wolfe et al., 2015). screening tests, both in radiology and pathology, are generally performed using a visual search model (stewart et al., 2007; wolfe, 1995). in this visual task, search is required because everything in the visual field cannot be identified and processed simultaneously. object recognition is limited to one, or a small number of objects at one time (wolfe, 2012b; wolfe et al., 2015). attention may appear random, but is often guided by multiple cognitive mechanisms. at the most basic level, exploratory visual gaze is directed towards items with “bottom-up” salience (braun, 1994; wolfe & horowitz, 2004). these items do not depend on the purpose of the search, and therefore are not the result of acquired visual expertise. by contrast, gaze patterns of expert radiologists during search tasks may be less affected by bottom-up salience when these elements are known distracters from a visual target (wolfe et al., 2015). bottom-up salience of a visual item is determined by basic features, such as color, size, contrast, movement and luminosity (braun, 1994; wolfe & horowitz, 2004; wolfe, horowitz, kenner, hyle, & vasan, 2004). wolfe et al. (2015) classifies these attributes as “pre-attentive,” because they do not require a specific goal or form of expert attention to bias patterns of gaze. through medical training, including knowledge acquisition and an understanding of the visual properties of targets, “top-down,” or user-guided visual search develops (draschkow, wolfe, & vo, 2014; wolfe, 1994). at this level of visual processing, the radiologist guides attention toward a mental representation of the features and potential location of a target (wen et al., 2016). 
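the interplay of bottom-up salience and top-down guidance described above can be illustrated with a toy calculation. the sketch below is not an implementation of any published search model; the grid values, the weighting parameter w_top_down, and the function names are hypothetical, chosen only to show how stronger top-down guidance can redirect the next fixation away from a salient distractor.

```python
from typing import List, Tuple

Grid = List[List[float]]


def priority_map(bottom_up: Grid, top_down: Grid, w_top_down: float) -> Grid:
    """Blend two maps cell by cell; w_top_down is a hypothetical weight in [0, 1]."""
    if not 0.0 <= w_top_down <= 1.0:
        raise ValueError("w_top_down must lie between 0 and 1")
    if len(bottom_up) != len(top_down) or any(
        len(b) != len(t) for b, t in zip(bottom_up, top_down)
    ):
        raise ValueError("both maps must have the same shape")
    return [
        [(1.0 - w_top_down) * b + w_top_down * t for b, t in zip(b_row, t_row)]
        for b_row, t_row in zip(bottom_up, top_down)
    ]


def next_fixation(priority: Grid) -> Tuple[int, int]:
    """Return the (row, column) of the highest-priority cell."""
    cells = (
        (value, (r, c))
        for r, row in enumerate(priority)
        for c, value in enumerate(row)
    )
    return max(cells, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Illustrative values: a bright distractor at (0, 0), a subtle target at (1, 1).
    bottom_up = [[0.9, 0.1], [0.2, 0.3]]
    top_down = [[0.1, 0.1], [0.2, 0.9]]
    print(next_fixation(priority_map(bottom_up, top_down, 0.2)))  # weak guidance
    print(next_fixation(priority_map(bottom_up, top_down, 0.8)))  # strong guidance
```

with weak guidance the highest-priority cell is the distractor at (0, 0); with strong guidance it is the target at (1, 1). a linear blend is only one of many possible ways to combine the two signals.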
top-down processing is also employed at the point of diagnosis, when the visual item is matched both to a mental depiction of the target and to general medical knowledge (drew, evans, et al., 2013; drew, vo, olwal, et al., 2013). wolfe (1994) summarizes this process in the “guided search model,” in which bottom-up attention to distractors is at least partially suppressed by the top-down effects of visual expertise. the expert’s knowledge provides scene guidance to relevant parts of the image, which combines with these effects to create a mental representation of the likely location of targets (wen et al., 2016). this course of attentional gaze may be modulated by systematic training algorithms for visual inspection (for example, a trained pattern of attention to each potential diagnostic target within a chest radiograph), as well as by clinical information (draschkow et al., 2014; drew, evans, et al., 2013; wolfe et al., 2015; wolfe et al., 2007). eye-tracking has been utilized to better understand the development of a “priority map” within a radiological image, and how this priority map may evolve during the diagnostic process. it is proposed that the priority map must change as the radiologist’s eyes move about the image, because the salience of items changes with their distance from the present fixation (wen et al., 2016). interestingly, the development of visual expertise with training in radiology seems to indicate a move towards efficiency, and away from repeated attention to “non-priority” regions of the diagnostic image (g. tourassi et al., 2013; wolfe et al., 2015). search for multiple visual targets held in memory is known as “hybrid search” (wolfe, 2012a). attention is not guided as effectively in this form of search, which may play a role in the training process of visual expertise in medicine (eckstein, 2011). this effect can be highlighted as it relates to simulation studies in medical education.
in a study of senior nursing students administering medications in a clinical simulation setting, 40% administered a contraindicated medication to a patient with a known allergy (amster et al., 2015). eye-tracking data in this educational study were used to determine whether students administered a contraindicated medication because of a knowledge deficit, or because the information required was not visualized. in this case, the necessary information was visualized by all students, so the error pointed to a deficit in pharmacological knowledge that could be corrected. in addition, knowledge acquired through experience in the field, as well as risk-aversion in certain types of medical cases, may lead to altered salience depending upon “value.” the learned value of visual stimuli significantly affects attentional priority (laurent, hall, anderson, & yantis, 2015; sali, anderson, & yantis, 2014). in basic eye-tracking experiments involving manipulation of low-level object properties, participants quickly learned to search for objects with a property that would produce a valuable reward (laurent et al., 2015; sali et al., 2014). thus, if a radiologist is “rewarded,” for example, for finding cancerous lesions as opposed to incidental pathology, these objects may affect salience maps.

in contrast to the study of errors due to a knowledge deficit (amster et al., 2015), eye-tracking has also been utilized to identify the errors that can occur during visual search guided by expertise. in a version of the “invisible gorilla” study, drew et al. utilized an eye-tracker to follow the eye movements of radiologists as they searched for lung nodules in a serial stack of ct images. on the last case, a gorilla was inserted into the lung, and the majority of radiologists failed to notice it (drew, vo, & wolfe, 2013).
eye-tracking was able to demonstrate that this was not because the radiologists were negligent. they did, in fact, attend to the region of the gorilla, but their expert attention to expected visual targets, acquired through medical training, led to an “inattentional blindness” to the gorilla (drew, evans, et al., 2013; drew, vo, & wolfe, 2013). this particular study illustrates the way in which eye-tracking can be used to study the visual process of “missing” an obvious abnormality as compared to the presence of disease. wolfe and van wert (2010) have also discussed the importance of prevalence effects in designing appropriate eye-tracking experiments for understanding medical expertise. screening tasks, such as screening for breast cancer on a mammogram or cervical cancer on a pap smear, represent an important class of search with a low prevalence of visual targets within the population screened (wolfe & van wert, 2010). studies of vigilance and low-prevalence search tasks have shown that rare events are missed more often than common ones (wolfe et al., 2007; wolfe & van wert, 2010). wolfe and van wert (2010) explain that when targets are rare, observers are more likely to reject an ambiguous target. by contrast, participants are much more likely to label an ambiguous item as a target when prevalence is high. this effect is attributed to a shift in an unconscious decision rule (wolfe & van wert, 2010). observers also become faster to declare themselves finished with an image under low-prevalence conditions, but forcing them to slow down does not make them less likely to reject low-prevalence targets (wolfe, 2012b). importantly, evans, birdwell, and wolfe (2013) found the same effect to be true for both experts and novices. in an experiment involving manipulation of prevalence in the setting of mammography, radiologists had a false-negative rate of 12% in the setting of high prevalence, and 30% at low prevalence (evans, birdwell, & wolfe, 2013).
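one way to make the idea of a shifting decision rule concrete is the standard signal detection calculation of sensitivity (d') and criterion (c). the sketch below is illustrative only: the hit rates follow from the 12% and 30% false-negative rates quoted above, but the false-alarm rates are hypothetical values chosen for the example and are not reported in this review.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    d' = z(H) - z(F) measures sensitivity; c = -(z(H) + z(F)) / 2 measures
    response bias, with larger c meaning a more conservative observer.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Hit rate = 1 - false-negative rate (12% and 30% in the text).
# The false-alarm rates (0.20, 0.06) are hypothetical.
high_d, high_c = sdt_measures(hit_rate=0.88, false_alarm_rate=0.20)
low_d, low_c = sdt_measures(hit_rate=0.70, false_alarm_rate=0.06)

print(f"high prevalence: d'={high_d:.2f}, c={high_c:.2f}")
print(f"low prevalence:  d'={low_d:.2f}, c={low_c:.2f}")
```

with these assumed numbers, sensitivity is nearly unchanged while the criterion becomes more conservative at low prevalence, which is the pattern a change in decision rule, rather than a loss of perceptual ability, would produce.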
similar results were found with cytologists reading cervical cancer screening slides (evans, tambouret, evered, wilbur, & wolfe, 2011). in all cases, rare targets were missed more often. this has important implications for the generalizability of eye-tracking experiments related to visual expertise, as it suggests that prevalence in the experimental setting may affect visual performance in a way that is not seen in clinical practice. in some cases, prevalence in an experimental setting may actually offer the opportunity for increased training (jarodzka et al., 2012), and the subsequent improvement of visual patterns. while some studies have demonstrated the acquisition of a pattern of “expert” gaze with focused training on gaze patterns (jarodzka et al., 2012; henneman et al., 2014), it would be highly informative to examine the effects of such training over a prolonged period of time, to ascertain whether the use of eye-tracking as an educational tool produces a permanent shift. furthermore, the patterns of vision seen in the search-related expertise employed for medical screening may differ significantly from those seen in the examination of images from which a pathologic diagnosis is expected.

3.3 “gestalt” or holistic expertise

we have discussed several studies in which eye-tracking has elucidated search-related features of visual expertise. some forms of visual expertise acquired in medicine, however, involve a visual categorization or “gestalt” assessment. dermatologists, for example, are often required to diagnose a clearly visible skin lesion or rash from clinical appearance. similarly, pathologists are frequently asked to render a diagnosis from a distinct image of a lesion or cell type. while search-related expertise is certainly employed in both of these fields, a significant proportion of visual expertise is devoted to recognition and identification, rather than to locating the target.
kundel et al. (2008) noted that visual search was not necessary for all radiologic images, and incorporated the idea of holistic visual processing in radiologic diagnosis into eye-tracking experiments. the ability to interpret complex visual information in a short period of time is common to all humans, most notably in the form of face perception (bukach, gauthier, & tarr, 2006; rossion, collins, goffaux, & curran, 2007). it is likely that this technique is employed by medical experts who have significant experience with the visual characteristics of a diagnostic entity. experts in these fields may describe recognizing an image as one does an acquaintance, which implies visual processing that may be similar to face processing, a visual task long studied with eye-tracking methodology (gauthier & nelson, 2001; pascalis et al., 2005; vanderwert et al., 2015; wagner, hirsch, vogel-farley, redcay, & nelson, 2013). the eye-tracking methods used to study holistic visual processing during medical training differ from those employed for search-related expertise. one study of holistic processing asked whether mammographers could look at a bilateral mammogram for less than a second and determine whether the woman should be called back (evans, georgian-smith, tambouret, birdwell, & wolfe, 2013). the technique of a brief exposure is often useful in identifying holistic processing, as well as the most important areas of interest for the rapid interpretation of images by experts (evans, georgian-smith, et al., 2013; e. krupinski et al., 2014; kundel et al., 2008). with increasing medical expertise, several studies have also shown specific changes in eye movement patterns (drew, vo, olwal, et al., 2013; fox et al., 2017; e. a. krupinski et al., 2013). characteristically, trainees make more eye movements when evaluating an image than do experts, and those eye movements cover more of the area of the image.
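the two quantities in the preceding sentence, how many eye movements are made and how much of the image they cover, can be computed from a list of fixations. the following is a minimal sketch under stated assumptions: fixations are `(x, y, duration_ms)` tuples in image pixel coordinates, coverage is approximated as the fraction of fixed-size grid cells containing at least one fixation, and the function name `scan_metrics`, the 100-pixel cell size, and the sample data are all hypothetical rather than drawn from the cited studies.

```python
def scan_metrics(fixations, image_width, image_height, cell=100):
    """Summarize a scanpath as fixation count, total dwell time, and coverage.

    fixations: iterable of (x, y, duration_ms) in image pixel coordinates.
    Coverage is the share of cell-by-cell grid squares that received
    at least one fixation (a coarse, assumed proxy for area viewed).
    """
    fixations = list(fixations)
    cols = -(-image_width // cell)   # ceiling division
    rows = -(-image_height // cell)
    visited = set()
    for x, y, _ in fixations:
        # Ignore fixations that fall outside the image bounds.
        if 0 <= x < image_width and 0 <= y < image_height:
            visited.add((int(y // cell), int(x // cell)))
    return {
        "fixation_count": len(fixations),
        "total_dwell_ms": sum(d for _, _, d in fixations),
        "coverage": len(visited) / (rows * cols),
    }


# Hypothetical scanpaths on a 400 x 300 pixel image.
trainee = [(50, 50, 300), (150, 50, 250), (250, 150, 400),
           (350, 250, 200), (50, 250, 350), (150, 150, 300)]
expert = [(250, 150, 500), (260, 160, 450), (150, 150, 300)]

trainee_metrics = scan_metrics(trainee, 400, 300)
expert_metrics = scan_metrics(expert, 400, 300)
print(trainee_metrics)
print(expert_metrics)
```

in practice the cell size would be chosen in relation to the eye-tracker's accuracy and the viewing distance; fewer fixations and lower coverage at equal diagnostic accuracy correspond to the visual efficiency described in this section.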
this development of visual efficiency is also seen in the development of human face processing from infancy to adulthood, suggesting another link between these two forms of holistic visual processing (fox et al., 2017; gauthier & nelson, 2001; pascalis et al., 2005). we have noted in preliminary work that, while this form of visual efficiency seems to be naturally acquired over time, more efficient patterns can develop as a result of visual tools designed to enhance the educational process (fox et al., 2017). this is similar to the finding that efficiency of gaze, as well as accuracy in the assessment of infant seizures, could be enhanced through training with image visualizations based upon expert eye-tracking data (jarodzka et al., 2012).

3.4 hand-eye and procedural expertise

in recent years, the importance of visual as well as tactile expertise in medical procedures has been recognized, and several eye-tracking studies have examined the type of visual expertise acquired in the training of medical procedural skills (ahmidi, ishii, et al., 2012; law et al., 2003). one such study compared the eye movements of expert and novice surgeons performing a laparoscopic procedure in a computer-based simulator (law et al., 2003). the results of this study showed that novices needed more visual feedback of the tool position to complete the task than did experts. in addition, the experts tended to maintain eye gaze on the surgical target while manipulating surgical instruments, whereas novices were more varied in eye-hand coordination, and often tracked the surgical tool rather than the surgical target (law et al., 2003). the development of robotic and laparoscopic surgical instruments has also allowed for the acquisition of kinematic data related to common procedures, and skill level can be evaluated with these measures in conjunction with eye-tracking data (ahmidi, ishii, et al., 2012).
data from robotic surgical systems show that hidden markov models (hmms) can facilitate the recognition of surgical skill level (ahmidi, hager, ishii, gallia, & ishii, 2012; ahmidi, ishii, et al., 2012; ahmidi et al., 2015). methods for the assessment of surgical skill utilizing eye-tracking as a quantitative measure have also been developed (ahmidi, ishii, et al., 2012). in the experiment by ahmidi, ishii, et al. (2012), sinus procedures were performed by experts and novices on cadavers with the use of an endoscope and a visualization screen. a 50 hz remote eye-tracker was utilized in this case, with alignment to video data collected through the endoscope. the results of hmms generated from both kinematic and eye-tracking data reveal that eye gaze does contain expertise-related structures, and that the addition of these data to kinematic information improves models of skill expertise by 13.2% for expert and 5.3% for novice levels (ahmidi, ishii, et al., 2012). models combining both measures can reportedly quantify a surgeon’s skill level on a specific procedure with an accuracy of 82.5% (ahmidi, ishii, et al., 2012). in the field of medical learning, several studies involving all levels of medical professionals have used individual eye-tracking data both for training and for debriefing after simulations (jarodzka et al., 2012; henneman et al., 2014). as a debriefing strategy for medical simulations, eye-tracking can offer useful information about errors that cannot be readily observed or verbalized. further, allowing medical trainees to observe expert scanpaths derived from controlled eye-tracking has been reported to be more effective than verbal didactics describing the methods of visual assessment. the scanpaths of trainees in these procedural settings can sometimes be used to assess best practices in medicine, as well as to assure competency.
such an assessment was performed as a follow-up to the study of errors in medication administration (marquard et al., 2011; amster et al., 2015), to analyze the patterns of gaze most associated with identification errors among nurses. nurses who recognized errors were more likely to focus on one piece of identification information at a time, comparing medication labels and the corresponding patient information on an id badge in sequence, as opposed to reading either the badge or the medication bottle in full before changing the point of fixation. eye-tracking has also been a useful technique for the study of real-world disruptions in the medical training environment. for example, in simulated emergency room settings, medical professionals who visually engaged an interruption during a visual task were more likely to commit an error (marquard et al., 2011). taken together, these studies of gaze patterns in procedural settings suggest a role for eye-tracking as both a training and an assessment tool in medical education, and the necessity of a realistic clinical environment for the application of trained visual patterns of gaze. with the continuing trend towards quantitative assessment of procedural skill in medical training, these early studies suggest a role for eye-tracking methodology in the study and evaluation of visual training as one component of technical proficiency.

3.5 three-dimensional and dynamic visual stimuli

several of the studies mentioned have necessitated the use of visual stimuli that require manipulation by the viewer, such as three-dimensional images and dynamic video recordings. several recent studies provide evidence that radiologists examining ct images in three dimensions developed visual patterns that involved maintaining little movement in one dimension while scanning across the plane of the other two dimensions (drew, vo, olwal, et al., 2013; wen et al., 2016).
two distinct techniques within this pattern could be clustered, but no significant superiority of one technique over the other was demonstrated. recent advances in software for analyzing dynamic scenes or areas of interest allow for greater opportunity to investigate expertise with these types of medical images in the clinical environment (holmqvist, 2011; mallett et al., 2014; phillips et al., 2013). in the field of pathology, for example, advances in digital imaging have allowed for the development of three-dimensional models (jeong et al., 2010; ward, rosen, law, rosen, & faulkner-jones, 2015), as well as whole-slide images, which more closely replicate the experience of microscopy (fallon, wilbur, & prasad, 2010; fox et al., 2017). while some educational tasks involve the use of static images, the diagnostic process of anatomic pathology almost always involves movement of the slide image, as well as adjustment of magnification. while several eye-tracking studies have focused on specific features of digital pathology images (bombari et al., 2012; fox et al., 2017; e. krupinski et al., 2014; e. a. krupinski et al., 2013; e. a. krupinski et al., 2006; tiersma et al., 2003), we now have the tools available to examine the effects of expertise upon the dynamic diagnostic process. the authors have examined pathology trainees at the early and late stages of training, and found support for the use of specific dynamic digital platforms in the acquisition of “expert” patterns of gaze (fox et al., 2017). as mentioned previously, it may be possible to use specific viewing platforms to teach a pattern of evaluation that replicates that of medical experts (fox et al., 2017; jarodzka et al., 2012).
this could potentially involve the observation of expert gaze during a procedure or visual diagnostic process, the direction of trainee gaze through focused visualization guided by these data (jarodzka et al., 2012), or the development of training viewers designed to improve gaze direction and efficiency.

4. constraints of eye-tracking experiments

while there are many important questions related to the study of visual expertise in medicine, a major constraint on these studies is always participant number and available time. medical professionals are a limited participant resource, and they often do not have the time to participate in experimental settings. furthermore, there is a limit to how far one can manipulate clinical practice for research purposes, and to the generalizability of experimental findings to real-world workflow. for this reason, most studies involving eye-tracking are designed to utilize only a small number of participants (often 5-15 per group), as well as fewer visual exemplars to allow for time efficiency. the invention of increasingly portable eye-trackers that can be integrated with a variety of cameras or image interfaces allows for greater accessibility, and perhaps for studies that can occur within real clinical settings. in a study of operating room technicians utilizing circulation machines, eye-tracking revealed that expert technicians visually fixated upon a larger number of critical sources of information during the operative procedure than did novices (tomizawa et al., 2012). with knowledge of these differences, it may be possible to incorporate eye-tracking as a component of self-assessment in early medical practice. a key feature of this form of training would be the instruction of optimal gaze, either in a didactic or a simulation environment, followed by evaluation in applicable real-world scenarios.
the collection of expert gaze patterns in the clinical environment, as well as the assessment of trainees in real applications, will allow for greater understanding of the generalizability of the results derived from research settings. furthermore, the feedback of scanpath and fixation data to individual medical trainees may prove useful not only as a research tool, but as an educational one. two separate studies compared standard verbal debriefing of medical trainees following a simulated patient encounter, debriefing involving videos of their scanpaths, and debriefing involving both verbal and scanpath information; the addition of eye-tracking data produced the greatest improvement in subsequent performance (jarodzka et al., 2012; henneman et al., 2014). it is likely that the increasing ease of use of eye-tracking tools will allow investigators to recruit a greater number of “expert” participants, and thereby provide accurate and generalizable data to the medical community.

5. conclusion

visual expertise is an important component of medical learning, and eye-tracking is one method by which we can better understand this skill and its acquisition in relation to the clinical workflow. studies to date have divided visual expertise among categories of search, image categorization, and procedural skill, with significant overlap of these tasks between fields of medicine. with the continued development of cheaper, faster, and more ergonomic eye-tracking devices, it is likely that we will have many opportunities to study the increasing number of professional tasks requiring visual expertise, and to utilize these results for improved medical training and quality of care. the use of eye-tracking in the training of visual medical expertise, as well as in self-evaluation, has the potential to impact overall competency. visualization of the eye movements of expert clinicians may provide insights into a diagnostic process, or the means to avoid medical errors.
these tools can enhance the traditional processes of learning through lectures, simulations, or observational sessions. specifically, the process of observing the scanpath or directed gaze of a medical expert during a visual task can improve trainee performance, while eye-tracking data from trainees can provide a method of feedback and self-assessment to students. finally, eye-tracking research may allow us to understand the complex patterns of gaze that underlie diagnostic reasoning, and provide further insight into additional learning methods that improve upon clinical expertise.

keypoints

eye-tracking technology has evolved from one-dimensional photographic techniques to noninvasive and increasingly portable methods that can be used in a variety of medical settings.
visual expertise in medicine is acquired in conjunction with clinical knowledge, and can be characterized as search-related, holistic, or in association with kinematic skills.
eye-tracking can assist in the assessment of expertise, as well as address human errors in visually based medical decision-making.

acknowledgments

we would like to give special thanks to dr. charles nelson and dr. jeremy wolfe for their expertise and guidance in writing this manuscript. in addition, we would like to give thanks for nih support to dr. beverly faulkner-jones (nih: nibib sbir grant 2r44eb013518-02a1) and to dr. sharon fox (nih: nibib 3r25ns070682-04s1).

references

ahmidi, n., hager, g. d., ishii, l., gallia, g. l., & ishii, m. (2012). robotic path planning for surgeon skill evaluation in minimally-invasive sinus surgery. med image comput comput assist interv, 15(pt 1), 471-478. doi:10.1007/978-3-642-33415-3_58 ahmidi, n., ishii, m., fichtinger, g., gallia, g. l., & hager, g. d. (2012). an objective and automated method for assessing surgical skill in endoscopic sinus surgery using eye-tracking and tool-motion data. int forum allergy rhinol, 2(6), 507-515.
doi:10.1002/alr.21053 ahmidi, n., poddar, p., jones, j. d., vedula, s. s., ishii, l., hager, g. d., & ishii, m. (2015). automated objective surgical skill assessment in the operating room from unstructured tool motion in septoplasty. int j comput assist radiol surg, 10(6), 981-991. doi:10.1007/s11548-015-1194-1 amster, b., marquard, j., henneman, e., & fisher, d. (2015). using an eye tracker during medication administration to identify gaps in nursing students' contextual knowledge: an observational study. nurse educ, 40, 83-86. doi:10.1097/nne.0000000000000097 bombari, d., mora, b., schaefer, s. c., mast, f. w., & lehr, h. a. (2012). what was i thinking? eye-tracking experiments underscore the bias that architecture exerts on nuclear grading in prostate cancer. plos one, 7(5), e38023. doi:10.1371/journal.pone.0038023 braun, j. (1994). visual search among items of different salience: removal of visual attention mimics a lesion in extrastriate area v4. j neurosci, 14(2), 554-567. browatzki, b., bulthoff, h. h., & chuang, l. l. (2014). a comparison of geometric- and regression-based mobile gaze-tracking. front hum neurosci, 8, 200. doi:10.3389/fnhum.2014.00200 brown, p. j., marquard, j. l., amster, b., romoser, m., friderici, j., goff, s., & fisher, d. (2014). what do physicians read (and ignore) in electronic progress notes? appl clin inform, 5(2), 430-444. doi:10.4338/aci-2014-01-ra-0003 brunye, t. t., carney, p. a., allison, k. h., shapiro, l. g., weaver, d. l., & elmore, j. g. (2014). eye movements as an index of pathologist visual expertise: a pilot study. plos one, 9(8), e103447. doi:10.1371/journal.pone.0103447 bukach, c. m., gauthier, i., & tarr, m. j. (2006). beyond faces and modularity: the power of an expertise framework. trends cogn sci, 10(4), 159-166. doi:10.1016/j.tics.2006.02.004 crane, h. d., & steele, c. m. (1985). generation-v dual-purkinje-image eyetracker. appl opt, 24(4), 527.
dodge, r., & cline, t. s. (1901). the angle velocity of eye movements. psychological review, 8(2), 145-157. doi:10.1037/h0076100 draschkow, d., wolfe, j. m., & vo, m. l. (2014). seek and you shall remember: scene semantics interact with visual search to build better memories. j vis, 14(8), 10. doi:10.1167/14.8.10 drew, t., evans, k., vo, m. l., jacobson, f. l., & wolfe, j. m. (2013). informatics in radiology: what can you see in a single glance and how might this guide visual search in medical images? radiographics, 33(1), 263-274. doi:10.1148/rg.331125023 drew, t., vo, m. l., olwal, a., jacobson, f., seltzer, s. e., & wolfe, j. m. (2013). scanners and drillers: characterizing expert visual search through volumetric images. j vis, 13(10). doi:10.1167/13.10.3 drew, t., vo, m. l., & wolfe, j. m. (2013). the invisible gorilla strikes again: sustained inattentional blindness in expert observers. psychol sci, 24(9), 1848-1853. doi:10.1177/0956797613479386 eckstein, m. p. (2011). visual search: a retrospective. j vis, 11(5). doi:10.1167/11.5.14 evans, k. k., birdwell, r. l., & wolfe, j. m. (2013). if you don't find it often, you often don't find it: why some cancers are missed in breast cancer screening. plos one, 8(5), e64366. doi:10.1371/journal.pone.0064366 evans, k. k., georgian-smith, d., tambouret, r., birdwell, r. l., & wolfe, j. m. (2013). the gist of the abnormal: above-chance medical decision making in the blink of an eye. psychon bull rev, 20(6), 1170-1175. doi:10.3758/s13423-013-0459-3 evans, k. k., tambouret, r. h., evered, a., wilbur, d. c., & wolfe, j. m. (2011). prevalence of abnormalities influences cytologists' error rates in screening for cervical cancer. arch pathol lab med, 135(12), 1557-1560. doi:10.5858/arpa.2010-0739-oa fallon, m. a., wilbur, d. c., & prasad, m. (2010).
ovarian frozen section diagnosis: use of whole-slide imaging shows excellent correlation between virtual slide and original interpretations in a large series of cases. arch pathol lab med, 134(7), 1020-1023. doi:10.1043/2009-0320-oa.1 fox, s. e., law, c. c., & faulkner-jones, b. e. (2017). quantitative gaze assessment of a dual “side by side” viewer versus a single whole slide image viewer for pathology education. manuscript submitted for publication. gauthier, i., & nelson, c. a. (2001). the development of face expertise. curr opin neurobiol, 11(2), 219-224. henneman, e. a., cunningham, h., fisher, d. l., et al. (2014). eye tracking as a debriefing mechanism in the simulated setting improves patient safety practices. dimens crit care nurs, 33, 129-135. doi:10.1097/dcc.0000000000000041 holmqvist, k. (2011). eye tracking: a comprehensive guide to methods and measures. oxford; new york: oxford university press. huette, s., winter, b., matlock, t., & spivey, m. (2012). processing motion implied in language: eye-movement differences during aspect comprehension. cogn process, 13 suppl 1, s193-197. doi:10.1007/s10339-012-0476-6 jarodzka, h., balslev, t., holmqvist, k., nyström, m., scheiter, k., gerjets, p., & eika, b. (2012). conveying clinical reasoning based on visual observation via eye-movement modelling examples. instructional science, 40(5), 813-827. doi:10.1007/s11251-012-9218-5 jeong, w. k., schneider, j., turney, s. g., faulkner-jones, b. e., meyer, d., westermann, r., . . . pfister, h. (2010). interactive histology of large-scale biomedical image stacks. ieee trans vis comput graph, 16(6), 1386-1395. doi:10.1109/tvcg.2010.168 johnson, j. s., liu, l., thomas, g., & spencer, j. p. (2007). calibration algorithm for eyetracking with unrestricted head movement. behav res methods, 39(1), 123-132. doi:10.3758/bf03192850 krupinski, e., chao, j., hofmann-wellenhof, r., morrison, l., & curiel-lewandrowski. (2014).
understanding visual search patterns of dermatologists assessing pigmented skin lesions before and after online training. j digit imaging, 27, 779-785. doi:10.1007/s10278-014-9712-1 krupinski, e. a., graham, a. r., & weinstein, r. s. (2013). characterizing the development of visual search expertise in pathology residents viewing whole slide images. hum pathol, 44(3), 357-364. doi:10.1016/j.humpath.2012.05.024 krupinski, e. a., tillack, a. a., richter, l., henderson, j. t., bhattacharyya, a. k., scott, k. m., . . . weinstein, r. s. (2006). eye-movement study and human performance using telepathology virtual slides: implications for medical education and differences with experience. hum pathol, 37(12), 1543-1556. doi:10.1016/j.humpath.2006.08.024 kundel, h. l., nodine, c. f., krupinski, e. a., & mello-thoms, c. (2008). using gaze-tracking data and mixture distribution analysis to support a holistic model for the detection of cancers on mammograms. acad radiol, 15(7), 881-886. doi:10.1016/j.acra.2008.01.023 laurent, p. a., hall, m. g., anderson, b. a., & yantis, s. (2015). valuable orientations capture attention. vis cogn, 23(1-2), 133-146. doi:10.1080/13506285.2014.965242 law, b., atkins, m. s., lomax, a. j., & wilson, j. g. (2003). eye trackers in a virtual laparoscopic training environment. stud health technol inform, 94, 184-186. mallett, s., phillips, p., fanshawe, t. r., helbren, e., boone, d., gale, a., . . . halligan, s. (2014). tracking eye gaze during interpretation of endoluminal three-dimensional ct colonography: visual perception of experienced and inexperienced readers. radiology, 273(3), 783-792. doi:10.1148/radiol.14132896 marquard, j. l., henneman, p. l., he, z., jo, j., fisher, d. l., & henneman, e. a. (2011). nurses' behaviors and visual scanning patterns may reduce patient identification errors. j exp psychol appl, 17, 247-256.
doi: http://dx.doi.org/10.1037/a0025261 merchant, j., morrissette, r., & porterfield, j. l. (1974). remote measurement of eye direction allowing subject motion over one cubic foot of space. ieee trans biomed eng, 21(4), 309-317. doi:10.1109/tbme.1974.324318 pascalis, o., scott, l. s., kelly, d. j., shannon, r. w., nicholson, e., coleman, m., & nelson, c. a. (2005). plasticity of face processing in infancy. proc natl acad sci u s a, 102(14), 5297-5300. doi:10.1073/pnas.0406627102 phillips, p., boone, d., mallett, s., taylor, s. a., altman, d. g., manning, d., . . . halligan, s. (2013). method for tracking eye gaze during interpretation of endoluminal 3d ct colonography: technical description and proposed metrics for analysis. radiology, 267(3), 924-931. doi:10.1148/radiol.12120062 rayner, k. (1978). eye movements in reading and information processing. psychol bull, 85(3), 618-660. rossion, b., collins, d., goffaux, v., & curran, t. (2007). long-term expertise with artificial objects increases visual competition with early face categorization processes. j cogn neurosci, 19(3), 543-555. doi:10.1162/jocn.2007.19.3.543 sali, a. w., anderson, b. a., & yantis, s. (2014). the role of reward prediction in the control of attention. j exp psychol hum percept perform, 40(4), 1654-1664. doi:10.1037/a0037267 stewart, j., 3rd, miyazaki, k., bevans-wilkins, k., ye, c., kurtycz, d. f., & selvaggi, s. m. (2007). virtual microscopy for cytology proficiency testing: are we there yet? cancer, 111(4), 203-209. doi:10.1002/cncr.22766 tiersma, e. s., peters, a. a., mooij, h. a., & fleuren, g. j. (2003). visualising scanning patterns of pathologists in the grading of cervical intraepithelial neoplasia. j clin pathol, 56(9), 677-680. doi: http://dx.doi.org/10.1136/jcp.56.9.677 tomizawa y., aoki h., suzuki s., matayoshi t., yozu r. (2012). eye-tracking analysis of skilled performance in clinical extracorporeal circulation. j artif organs, 15:146–157. 
doi: 10.1007/s10047012-0630-z tourassi, g., voisin, s., paquit, v., & krupinski, e. (2013). investigating the link between radiologists' gaze, diagnostic decision, and image content. j am med inform assoc, 20(6), 1067-1075. doi:10.1136/amiajnl-2012-001503 tourassi, g. d., mazurowski, m. a., harrawood, b. p., & krupinski, e. a. (2010). exploring the potential of context-sensitive cade in screening mammography. med phys, 37(11), 5728-5736. doi:10.1118/1.3501882 vanderwert, r. e., westerlund, a., montoya, l., mccormick, s. a., miguel, h. o., & nelson, c. a. (2015). looking to the eyes influences the processing of emotion on face-sensitive event-related potentials in 7month-old infants. dev neurobiol, 75(10), 1154-1163. doi:10.1002/dneu.22204 fox et al | f l r 54 wagner, j. b., hirsch, s. b., vogel-farley, v. k., redcay, e., & nelson, c. a. (2013). eye-tracking, autonomic, and electrophysiological correlates of emotional face processing in adolescents with autism spectrum disorder. j autism dev disord, 43(1), 188-199. doi:10.1007/s10803-012-1565-1 ward, a., rosen, d. m., law, c. c., rosen, s., & faulkner-jones, b. e. (2015). oxalate nephropathy: a three-dimensional view. kidney int, 88(4), 919. doi:10.1038/ki.2015.31 wen, g., aizenman, a., drew, t., wolfe, j. m., haygood, t. m., & markey, m. k. (2016). computational assessment of visual search strategies in volumetric medical images. j med imaging (bellingham), 3(1), 015501. doi:10.1117/1.jmi.3.1.015501 wolfe, j. m. (1994). guided search 2.0 a revised model of visual search. psychon bull rev, 1(2), 202-238. doi:10.3758/bf03200774 wolfe, j. m. (1995). the pertinence of research on visual search to radiologic practice. acad radiol, 2(1), 74-78. wolfe, j. m. (2012a). saved by a log: how do humans perform hybrid visual and memory search? psychol sci, 23(7), 698-703. doi:10.1177/0956797612443968 wolfe, j. m. (2012b). when do i quit? the search termination problem in visual search. nebr symp motiv, 59, 183-208. 
doi: 10.1007/978-1-4614-4794-8_8 wolfe, j. m., evans, k. k., drew, t., aizenman, a., & josephs, e. (2015). how do radiologists use the human search engine? radiat prot dosimetry. doi:10.1093/rpd/ncv501 wolfe, j. m., & horowitz, t. s. (2004). what attributes guide the deployment of visual attention and how do they do it? nat rev neurosci, 5(6), 495-501. doi:10.1038/nrn1411 wolfe, j. m., horowitz, t. s., kenner, n., hyle, m., & vasan, n. (2004). how fast can you change your mind? the speed of top-down guidance in visual search. vision res, 44(12), 1411-1426. doi:10.1016/j.visres.2003.11.024 wolfe, j. m., horowitz, t. s., van wert, m. j., kenner, n. m., place, s. s., & kibbi, n. (2007). low target prevalence is a stubborn source of errors in visual search tasks. j exp psychol gen, 136(4), 623-638. doi:10.1037/0096-3445.136.4.623 wolfe, j. m., & van wert, m. j. (2010). varying target prevalence reveals two dissociable decision criteria in visual search. curr biol, 20(2), 121-124. doi:10.1016/j.cub.2009.11.066 microsoft word kospentaris et al_publication.docx frontline learning research vol.4 no. 1 (2016) 40-‐57 issn 2295-‐3159 visual and analytic strategies in geometry george kospentarisa, stella vosniadoub, smaragda kazic, emilian thanoud anational and kapodistrian university of athens, greece bthe flinders university of south australia, australia and national and kapodistrian university of athens, greece cpanteion university of social and political sciences, greece dnational and kapodistrian university of athens, greece article received 13 november / revised 28 december / accepted 19 january / available online 3 march abstract we argue that there is an increasing reliance on analytic strategies compared to visuospatial strategies, which is related to geometry expertise and not on individual differences in cognitive style. 
a visual/analytic strategy test (vast) was developed to investigate the use of visuo-spatial and analytic strategies in geometry in 30 mathematics teachers and 134 11th grade students. students’ performance in the vast was also compared to performance in tests of visuo-spatial abilities, of abstract reasoning, and of geometrical knowledge. the results showed high performance of all the participants in the vast items that could be solved by relying on visuo-spatial strategies. however, only the math teachers showed high performance in the vast items that required the application of analytic geometrical strategies. there were significant correlations between the students’ performance in the tests of visuo-spatial and abstract reasoning abilities and the vast analytic strategies scale, but the contribution of these tests to the vast analytic performance became statistically non-significant when geometrical knowledge was used as a mediating factor. the implications of this work for the learning and assessment of geometrical knowledge are discussed.

keywords: geometry learning and instruction; visual-spatial reasoning; analytic strategies; assessment of geometry

corresponding author: george kospentaris, national and kapodistrian university of athens, dimitsanas str. 20, 11522, athens, greece, email: 25aris@math.uoa.gr
doi: http://dx.doi.org/10.14786/flr.v4i1.226

1. introduction

in recent years research has accumulated showing that spatial thinking is central to success in science, technology, engineering, and mathematics, the so-called stem disciplines. spatial thinking is thinking about the location of objects and their relations and requires both visuo-spatial ability – the ability to mentally visualize the rotation of objects – and spatial abstract reasoning – the ability to identify analogical relations amongst patterns (see newcombe, 2010; wai, lubinski, & benbow, 2009).
there is convincing evidence that there are important individual differences in spatial thinking and that spatial thinking abilities can predict success in stem disciplines (hegarty & waller, 2006; wai et al., 2009). particularly impressive are the analyses of large data sets showing that people with high scores on tests of spatial thinking in high school are more interested in science and math, are more likely to get advanced degrees in stem, and are more likely to pursue stem careers (shea, lubinski, & benbow, 2001; wai et al., 2009). this has led to an increase in training studies that aim at improving spatial thinking as a means of improving performance in stem disciplines (sanchez, 2012; uttal et al., 2013).

there is little doubt that much of the problem solving done in science, mathematics and engineering requires the use of spatial thinking (kozhevnikov, motes, & hegarty, 2007; stieff, 2007; zazkis, dubinsky, & dautermann, 1996). in euclidean geometry, where figures are the main objects of study, the role of spatial thinking is of utmost importance. visualizing the shapes and their relations is a standard prerequisite for the understanding of geometrical propositions (battista, 2007). although a great deal of this spatial thinking can be achieved using visuo-spatial strategies – i.e., strategies that allow individuals to obtain spatial information from immediate perceptual processes – spatial thinking can also be achieved using analytic strategies, where rules provide access to spatial information without recourse to visual perception and mental animation (stieff, 2007). zazkis et al. (1996) showed that the majority of the university students participating in abstract algebra courses used a combination of visuo-spatial and analytic strategies (see also schwartz & black, 1996; stieff, 2007). the use of visuo-spatial vs.
analytic strategies has been predominantly examined from an individual differences point of view, as an individual characteristic or a cognitive style (eisenberg & dreyfus, 1991; pitta-pantazi & christou, 2009). this emphasis on individual differences has obscured the fact that reliance on analytic strategies also characterizes the acquisition of expertise in many stem domains. as expertise is acquired, problem solving increasingly relies on specialized, domain-specific, rule-based, analytic approaches rather than on visual, perceptual information and mental rotation. for example, in the domain of organic chemistry, expert chemists develop a predilection for analytic strategies to solve chemistry tasks (stieff, 2007). in geometry, reliance on visuo-spatial strategies seems to coincide with level 1 of the van hiele (1986) theory of geometrical thinking. at later levels geometrical thinking increasingly requires an understanding of the logical systems that geometry represents. in geometry, shapes are represented by a set of properties and their relations, and geometrical thinking is characterized by the formal manipulation of a logical system. thus, when geometrical expertise is achieved, geometrical thinking relies increasingly on analytic formal processes based on geometrical knowledge.

the purpose of the present research is to develop a task that can differentiate visual from analytic reasoning in geometry, a visual/analytic strategy task (vast), and to validate it by comparing novices to experts in geometry. in the next section we present a summary review of the literature on geometrical thinking and define and explain our theoretical position with respect to the use of visuo-spatial and analytic strategies.
1.1 geometrical thinking

piaget and his collaborators (piaget, inhelder, & szeminska, 1948/1960; piaget & inhelder, 1948/1967) were the first to study the psychological foundations of geometrical thinking and to propose that it develops in four sequential and hierarchical stages (see footnote 1). in subsequent years, van hiele (1986) argued that there are five, qualitatively distinct, hierarchical levels of thought in geometry. in contrast to piaget, van hiele strongly emphasized the crucial role of school instruction in the acquisition of geometrical knowledge (van hiele, 1986, p. 65-66). more recently, houdement and kuzniak (2003) proposed (on theoretical grounds) that the five van hiele levels can be reduced to three kuhnian-like paradigms: geometry i (natural geometry), geometry ii (natural axiomatic geometry), and geometry iii (formalist axiomatic geometry).

[footnote 1: there is a great deal of research on spatial development in young children and different theoretical approaches have appeared after piaget’s seminal work (see newcombe & huttenlocher, 2000; spelke & kinzler, 2007) but this is not the focus of the present paper.]

empirical research so far has failed to confirm the predictions of the van hiele theory that students move through discrete levels of geometrical thought, each characterized by different internal conceptual organization (battista, 2007). it appears instead that students oscillate between different levels of geometric understanding depending on the context and the nature of the problems to be solved. for this reason some researchers have argued that although there might be different levels of geometric thinking as identified by van hiele, these do not represent distinct stages but develop in parallel and without discontinuities between them (clements & battista, 2001; lehrer, jenkins, & osana, 1998).
it follows from the above that we need a theoretical framework that can account for the considerable conceptual re-organizations that take place in the process of acquiring and using geometrical knowledge without positing the existence of hierarchical and well-defined distinct stages. for this reason, it is proposed here that it might be fruitful to examine geometrical thinking from a conceptual change point of view, and that the framework theory (ft) approach to conceptual change (vosniadou, 2013; vosniadou & skopeliti, 2014) can serve as an anchor for examining changes in geometrical knowledge after exposure to instruction. the ft belongs to a class of conceptual change approaches known as ‘theory-theory’ (carey, 2009), but also differs from them in important ways. briefly, the ft claims that (a) there are systems of core cognition that bootstrap cognitive development (carey, 2009; spelke & kinzler, 2007), without making strong nativist interpretations of early infant competencies (see footnote 2), and (b) that conceptual development consists of episodes of qualitative change, which, however, are not discontinuous or stage-like. rather, conceptual change is seen as a slow and gradual learning process greatly facilitated by sociocultural and educational inputs. according to the ft, the same constructive-type mechanisms that are involved in all learning processes are also involved in conceptual change processes, often producing fragmentation and misconceptions, but eventually having the potential to lead to qualitatively different conceptual organizations (vosniadou & skopeliti, 2014). finally, the ft claims that initial systems of thought continue to exist and influence thinking, even after instruction-induced conceptual changes have occurred (shtulman & valcarcel, 2012; vosniadou et al., 2015).
[footnote 2: various theories are attempting to explain early spatial development including connectionist interpretations and neoconstructivist approaches (see newcombe, uttal, & sauer, in press, for an extensive review).]

seen from this theoretical perspective, it is argued that geometrical knowledge is originally built on two core cognitive systems (spatial and numerical) that rely on visuo-spatial information (newcombe & frick, 2010; spelke, lee, & izard, 2010), but that it gradually develops through systematic instruction to rely on more analytic strategies based on formal geometrical knowledge. in other words, we claim that there is a growing reliance on analytic strategies in geometric thinking with the acquisition of expertise, and that the systematic use of analytic strategies is a product of conceptual changes that take place in the subject-matter area of geometry. we do not claim that visuo-spatial strategies become extinct and that experts rely on analytic strategies only. unlike stage theories, we argue that the initial, visuo-spatial, approach to geometry is not supplanted by the analytic one, but continues to exist and to be used when contextually appropriate. the ability to systematically employ analytic strategies in geometry, however, is a major intellectual achievement and not a matter of individual differences in cognitive style. it is the product of a conceptual change which takes place over many years and which requires the acquisition of new concepts and new forms of geometrical thinking.

we believe that many geometry education researchers would agree with this account of the development of geometrical knowledge. geometry is undeniably a formal system and geometric reasoning consists of using this formal system to reason about shape and space.
according to battista (2007), underlying this formal system is a ‘primitive’ system of visuo-spatial thinking allowing individuals to ‘see’, inspect, and reflect on spatial objects, images, relationships and transformations (p. 843). this ‘primitive’ system is characterized by what he calls ‘perceptual objects’ – i.e., mental entities perceived by an individual when viewing physical objects in the real world, including geometrical diagrams. in contrast, expert geometrical knowledge operates on ‘conceptual objects’ – i.e., abstract, completely idealized and general mental entities based on ‘formal’ categories, which are explicitly circumscribed according to verbally stated, property-based definitions. the difference between a geometric diagram and a figure captures this basic dichotomy: the former is a material entity, a concrete case that imperfectly represents the abstract concept, while the latter is a theoretical, ideal object without any physical properties. similarly, fischbein (1993) argues that experts in geometry form and reason with ‘figural concepts’. a figural concept is controlled by logical rules in the context of an axiomatic system but is also a mental entity, an image with a spatial-figural content, although devoid of any concrete sensorial properties (fischbein, 1993, p. 148).

battista’s (2007) and fischbein’s (1993) arguments are consistent with the proposal that there are ontological and representational shifts that take place in the development of geometrical knowledge, analogous in some respects to the ontological shifts that take place in learning science (chi, 2008; vosniadou, 2013). for example, a circle, this quite familiar shape, changes from a visual gestalt (figure 1a) to the locus of all points of the plane that are equidistant from its center (figure 1b), or, in the conceptual frame of analytic geometry, to an equation (the points of the plane satisfying $x^2 + y^2 = r^2$, figure 1c).
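the correspondence between the locus description and the analytic equation of the circle can be illustrated with a short python sketch. it is an illustration only and not part of the study: the function names, the tolerance, and the sample points are assumptions, and the circle is assumed to be centred at the origin so that the equation $x^2 + y^2 = r^2$ applies.

```python
import math


def on_circle_by_locus(point, center, radius, tol=1e-9):
    """Locus description: the point lies at distance `radius` from `center`."""
    return math.isclose(math.dist(point, center), radius, abs_tol=tol)


def on_circle_by_equation(point, radius, tol=1e-9):
    """Analytic description for a circle centred at the origin: x^2 + y^2 = r^2."""
    x, y = point
    return math.isclose(x * x + y * y, radius * radius, abs_tol=tol)


radius = 5.0
origin = (0.0, 0.0)

# Hypothetical sample points: four generated from angles (on the circle by
# construction), one integer point on the circle, and one point off the circle.
samples = [(radius * math.cos(t), radius * math.sin(t)) for t in (0.0, 0.7, 2.1, 4.0)]
samples.append((3.0, 4.0))  # 9 + 16 = 25, so this point is on the circle
off_circle = (3.0, 3.0)     # 9 + 9 = 18, so this point is not on the circle

# The two descriptions should classify every point in the same way.
descriptions_agree = all(
    on_circle_by_locus(p, origin, radius) == on_circle_by_equation(p, radius)
    for p in samples + [off_circle]
)
```

in this sketch both descriptions pick out the same points, which is the sense in which the locus and the equation are two representations of one conceptual object, whereas the visual gestalt offers no such membership test.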
in addition, the theoretical explanations in the domain also change. at the beginning, geometrical propositions are mainly inductive generalizations based on empirical observations and experimentation with perceptual objects rather than on proofs and deductive procedures based on accepted axioms and previously proven propositions.

figure 1. changes in the representation of the circle.

it could be argued that the above arguments would also be acceptable to stage theories, such as the van hiele theory. if this is the case, then what can the ft offer to our theoretical understanding of geometrical expertise? although some stage theories allow for intra-individual differences across tasks (a phenomenon known as decalage; piaget & inhelder, 1948/1967), they nevertheless assume that a) the new forms of thinking that develop in geometry gradually transform and eventually replace the ‘primitive’ visuospatial system with a more advanced system of thought based on analytical, formal knowledge, and b) that this process leads to distinct, qualitatively different stages in students’ thinking. from the perspective of the ft, however, knowledge acquisition does not proceed through hierarchical and well-defined distinct stages, but through the gradual assimilation of the new information into the initial, ‘primitive’ system, creating in the process inert knowledge, fragmentation, and misconceptions, many of which are ‘synthetic’ conceptions (see footnote 3) (vosniadou & skopeliti, 2014).

[footnote 3: synthetic conceptions are formed when learners assimilate scientific information to their incompatible prior knowledge producing in the process an alternative, erroneous conception, which however has some internal consistency and explanatory value, such as the ‘impetus misconception’ in mechanics (clement, 1982), the ‘molecules in matter’ model in the atomic-molecular theory (wiser & smith, 2013), and the ‘hollow sphere’ model in observational astronomy (vosniadou, 2013; vosniadou & brewer 1994). in geometry, the ‘figural object’ described by fischbein (1993) to be formed from the synthesis of the ‘perceptual’ and ‘conceptual’ objects described by battista (2007) is such a hybrid, synthetic conception.]

although there is an order of acquisition in this conceptual development and some learning progressions can be identified, these cannot be characterized as ‘stages’, both because they cannot be clearly identified as such, and because the ‘primitive’ visuo-spatially-based system is not eradicated but continues to co-exist with the formal, analytical modes of thought (shtulman & valcarcel, 2012). the purpose of the present research is not to test a full-blown theory of geometrical thinking, but to start in this direction by developing a valid task that can help us distinguish the use of visuo-spatial from analytic strategies in geometry. in the next section the rationale behind the development of the visual/analytic strategy task is described.

1.2 the visual/analytic strategy task (vast)

several tasks have been developed over the years to test students’ movement from the visual to the descriptive/analytic van hiele level (e.g., burger & shaughnessy, 1986; gutiérrez & jaime, 1998; lynn & lynch, 2010; usiskin, 1982). the main limitation of such tasks is that they either favour the recall or recognition of definitions of shapes and their properties over their understanding and their application in novel situations, or that they do not require thinking based on more sophisticated, relational properties (battista, 2007). in addition to the above, there are several other standardized geometry tests that assess students’ level of geometrical knowledge, such as the california standards geometry test (csgt, 2009). these standardized tests examine mainly the extent to which students can perform school procedures, e.g., to apply a known formula for some computation within a narrow formal context, which often imposes a particular solution method. thus it remains unclear whether the students who succeed in these tests would show the same level of geometry knowledge in situations where the test format is not similar to the way they have been taught.

in the present research a different method to measure visual and analytic reasoning was developed, based on the following considerations: first, we avoided setting our task in the typical geometry textbook style that could suggest deductive requirements and delimit visualization or measuring. second, we did not impose a particular solution method on the solver but rather selected problems that could be solved using either visuo-spatial or analytic strategies, so that we could investigate spontaneous strategy choice. third, we wanted to investigate not only whether students are able to use analytic strategies but also whether they are able to do so in situations that require them to inhibit visual-perceptual information processing and reason instead along formal geometric lines. thus, a task was needed in which the perceptual difficulty of comparing shapes would be high and where the use of analytic strategies would lead to conclusions sometimes conflicting with visual-perceptual information.

the above theoretical considerations led to the development of the present visual/analytic strategy task (vast). the vast is a verbal/picture verification task. the participants are presented with a geometrical configuration that includes two shapes and are asked to decide whether a verbal statement that states that these shapes are congruent (congruence domain), similar (similarity domain), or occupy the same area (area domain), is true or false. as shown in figure 2, there are four types of configuration conditions in each geometrical domain:

(a) the ‘appearance+/reality+’ condition, where the two shapes both appear to be and are indeed congruent, similar or area equivalent;
(b) the ‘appearance-/reality-’ condition, where the two shapes neither appear to be nor are congruent, similar or area equivalent;
(c) the ‘appearance+/reality-’ condition, where the two shapes appear to be but are not congruent, similar or area equivalent; and
(d) the ‘appearance-/reality+’ condition, where the two shapes do not appear to be but are congruent, similar or area equivalent.

on the top of each configuration there is a verbal statement, such as, for instance, ‘the lengths of the routes are the same’ (see figure 2, upper row). the participants are asked to decide whether this statement is true or false with respect to the geometrical configuration to which it refers. in all three geometrical domains, the conditions (a) and (b) involve items purposely designed to be solved by visual estimation alone and which are consistent with the adoption of either a visual or an analytic geometric strategy (hereafter the vast consistent subscale, or vast-cons). the conditions (c) and (d) are inconsistent with reliance on visual estimation alone and require for their correct solution reliance on geometrical knowledge and the adoption of analytic strategies (hereafter the vast inconsistent subscale, or vast-incons). the geometrical knowledge required involves either measurement and empirical confirmation or euclidean deductive argumentation.
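for route-length statements such as ‘the lengths of the routes are the same’, the contrast between visual estimation and an analytic check can be sketched in python. the coordinates below are hypothetical and are not taken from the vast items; `path_length` is an illustrative helper that sums the euclidean lengths of the segments of a route.

```python
import math


def path_length(points):
    """Total length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical case 1: a staircase route and an L-shaped route between the
# same two corners. They look different, but each horizontal or vertical step
# of the staircase matches part of the corresponding side of the L-shaped route.
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
l_shaped = [(0, 0), (3, 0), (3, 3)]

# Hypothetical case 2: a direct diagonal route and a route along the two legs
# of the right triangle. By the triangle inequality the diagonal is shorter.
diagonal = [(0, 0), (3, 4)]
legs = [(0, 0), (3, 0), (3, 4)]

same_length = math.isclose(path_length(staircase), path_length(l_shaped))
diagonal_is_shorter = path_length(diagonal) < path_length(legs)
```

the first comparison is of the ‘appearance-/reality+’ kind, where appearance suggests a difference that the computation does not confirm; the second shows a difference that follows from the pythagorean theorem rather than from inspection.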
more specifically, in the case of the geometrical configurations a1 and a2 in figure 2, the conclusions that the two routes are equal (in a1) and unequal (in a2) can be reliably reached through visual-spatial inspection. they can also be deduced on the basis of known geometrical properties: in a1, the conclusion of equality can be deduced from the congruence of the corresponding line segments, which are the opposite sides of the formed rectangles. in a2, the conclusion that nick’s path is shorter than john’s path can be deduced from the triangle inequality: the hypotenuse is always shorter than the sum of the two legs of the right triangle. in the case of the geometrical configurations a3 and a4, however, the conclusions cannot be reached by using visual strategies, but only through reliance on geometrical knowledge. in a3, in order to deduce that the line segments are unequal in length, one has to compute the hypotenuses of the formed right triangles and compare the oblique line segments with the vertical or horizontal ones. in a4, drawing horizontal and vertical lines shows that the segments forming the zigzag “john’s route” are equal to the corresponding segments forming the direct “nick’s route”, as opposite sides of rectangles. the above rationale applies to all items of the test.

figure 2. sample items from the visual/analytic shift test (vast).

1.3 questions and hypotheses of the present study

our purpose in the present study was to examine if the vast is a reliable and valid test of visual and analytic reasoning in geometry. with respect to reliability, we wanted to find out whether the kuder-richardson (k-r) reliability index was acceptable across the different sub-scales (hypothesis 1).
with respect to validity, and in view of our theoretical position that the use of visual and analytic strategies is related to geometry expertise, we wanted to find out if the vast would be able to differentiate the performance of experts in geometry from that of novices. for this reason, the vast was administered to a group of mathematics teachers with extensive experience in teaching geometry and to a group of 11th grade students who had been exposed to euclidean geometry teaching. we hypothesized that if the vast is a good test of the use of visual and analytic strategies, then the mathematics teachers should have high scores both in the vast-cons and in the vast-incons because of their expertise in geometry. in contrast, the students would obtain high scores only in the vast-cons, which can be solved with visuo-spatial strategies, and not in the vast-incons, which requires reliance on analytic strategies based on geometrical knowledge and the inhibition of visual estimation (hypothesis 2).

the above hypotheses are different from what would be expected assuming that performance in the vast is related only to individual differences in strategy use as opposed to geometry expertise. individual differences in strategy use would not predict systematic differences in the performance of the high school students. rather, some participants should do better in the visuo-spatial items and some others in the analytic items, regardless of their geometry knowledge (hypothesis 3).

hypothesis 4 concerned the relation between the vast-incons and geometrical knowledge, as measured by school grades in geometry (gg) and performance in a standardized test of geometrical knowledge (the california standards geometry test, csgt). high correlations were predicted between performance in the vast-incons, csgt and gg, because they are all alternative measures of geometrical problem solving and geometrical knowledge.
finally, we investigated the relation between the vast-incons and two cognitive abilities that comprise spatial thinking: (i) abstract reasoning ability, i.e., the ability to identify patterns, analogical relationships and logical rules, and (ii) visuo-spatial ability, i.e., the ability to mentally visualize the rotation of objects. in view of the well-documented findings in the literature that spatial thinking is strongly related to students’ performance in stem subjects, we hypothesized that performance in the vast-incons should correlate positively with performance in these two tests of spatial thinking (hypothesis 5). however, in accordance with our theoretical position, namely that it is the acquisition of geometrical knowledge that leads to the use of analytic strategies, we expected that geometry knowledge (as measured by the csgt) would significantly contribute to vast-incons performance, reducing the influence of the spatial thinking factor (hypothesis 6).

2. method

2.1 participants

the participants included 30 mathematics teachers (age range 30-55 years, 18 men) and 134 11th grade students (age range 16.4-17.5 years, 71 boys). the mathematics teachers had considerable experience teaching high school geometry. the students were of middle-class backgrounds, came from two different schools, had four different geometry teachers, and were towards the end of a five-year course in geometry (see footnote 4).

2.2 materials

the visual analytic shift test (vast) consisted of four items for each of the four conditions described earlier in each of the three geometrical domains. thus, there were a total of 48 items (4 items × 4 conditions × 3 domains), randomly ordered. the california standards geometry test (csgt) consisted of 15 items (5 for each geometrical domain) selected from the overall 96 items of the california standards geometry test sample released in 2009 (http://www.cde.ca.gov/ta/tg/sr/documents/cstrtqgeomapr15.pdf).
the selection was made on the basis of the relevance of each item to the national geometry curriculum. the purdue visualization of rotations test (rot) is a test that determines how well one can visualize the rotation of three-dimensional objects. it is among the tests of spatial thinking that are least likely to be contaminated by analytical abilities. to restrict analytical processing, a time limit of 10 minutes for the 20-item version of this test was strictly enforced (bodner & guay, 1997). the abstract reasoning test (art) is one of the tests of spatial thinking used by wai et al. (2009). it is a non-verbal measure of fluid intelligence, consisting of 15 items. school grades in geometry were collected for all students participating in the study.

2.3 procedure

the vast was administered to the mathematics teachers individually in their school office. the vast and the csgt were administered to the students as a group test during a 45-minute class session. their order of presentation was counterbalanced. the students were instructed to answer the vast and csgt items using whatever method they found suitable. formulas were provided to students individually, if they asked for them. the rot and art were administered to small groups of students in the school computer lab. completion of the electronic tests required approximately 30 minutes.

3. results

3.1 reliability indices

since all measures were binomial, the kuder-richardson (k-r) reliability test was applied. the results showed that the reliability of the two subscales was acceptable (vast-cons, k-r = .70; vast-incons, k-r = .75). the reliabilities of the remaining scales were as follows: csgt (k-r = .76, range of mean percentage performance: 13.33-100.00, mean = 63.82, std = 22.35), rot (k-r = .74, range: 1-20, mean = 8.59, std = 3.81), and art (k-r = .57, range: 5-15, mean = 9.89, std = 2.49).
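for readers unfamiliar with the index, a kuder-richardson coefficient for dichotomously scored items can be computed as in the following python sketch. it assumes the kuder-richardson formula 20 with the population variance of the total scores; the text does not state which k-r formula or variance convention was used, and the response matrix is invented for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    `responses` is a list of respondents, each a list of 0/1 scores with one
    entry per item. Assumption: the population variance of total scores is used.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Proportion of respondents answering each item correctly.
    p = [sum(person[i] for person in responses) / n_people for i in range(n_items)]
    # Sum of item variances p * (1 - p).
    item_variance_sum = sum(pi * (1 - pi) for pi in p)
    # Variance of the respondents' total scores.
    total_variance = pvariance([sum(person) for person in responses])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Invented data: five respondents, four items of increasing difficulty.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(example)
```

for this invented matrix the item variances sum to 0.80 and the total-score variance is 2, so the coefficient is (4/3) × (1 − 0.4) = 0.8.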
[footnote 4: the students had been taught the basic geometric concepts and methods based on empirical measurements and inductive generalizations in grades 7, 8, and 9. in grades 10 and 11 they were introduced to the procedures of deductive proofs characterizing euclidean geometry.]

3.2 performance of the experts vs novices

in order to examine the effect of the visual vs. analytic component on the participants’ performance, two composite scores were computed: the mean percentage performances in the vast consistent subscale (vast-cons) and in the vast inconsistent subscale (vast-incons). examination of the mean and standard deviation of the performance on the vast-cons showed that the scale almost reached a ceiling effect (mean percentage = 83.86, see table 1). thus, as was planned, this subscale consisted of easy items that could be successfully solved by the math teachers as well as by the students. given that normality assumptions did not hold for this particular subscale, no parametric tests were applied.

table 1
means and standard deviations (std) of vast performance as a function of expertise and item type (vast-cons vs. vast-incons)

participants    vast-cons           vast-incons         total
                mean      std       mean      std       mean      std
teachers        92.000    7.575     78.841    11.685    85.392    7.525
students        81.637    12.454    50.751    15.314    66.175    10.312
total           83.857    12.321    56.770    18.606    70.293    12.566

a t-test for independent samples was applied to vast-incons performance. the results showed a significant difference between the two groups [t(138) = -9.324, p < .001; mean = 50.75 for the students and mean = 78.84 for the math teachers]. the math teachers answered most of the vast-incons items correctly, whereas the 11th graders had considerable difficulty with them. in agreement with hypothesis 2, teachers’ and students’ performance was clearly differentiated in the vast-incons, where the performance of students was considerably lower than that of the teachers.
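the size of the group difference on the vast-incons can be approximated from the summary statistics in table 1 with the following python sketch. it assumes a pooled-variance (student) t-test and uses the group sizes from the participants section (134 students, 30 teachers); since the reported test has 138 degrees of freedom, fewer cases may have entered the published analysis, so the sketch is not expected to reproduce the reported value exactly. the function name is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance, from summary data.

    Returns the t statistic and its degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# vast-incons means and standard deviations from table 1.
# Assumption: all 134 students and all 30 teachers are included.
t_stat, df = pooled_t(50.751, 15.314, 134, 78.841, 11.685, 30)
```

under these assumptions the statistic is negative (students minus teachers) and of the same order as the reported one, which is consistent with the conclusion that the two groups differ markedly on the inconsistent items.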
in order to examine hypothesis 3, we plotted the students’ individual mean percentage performance in the vast-cons and the vast-incons. figure 3 shows the mean percent score of each participant on the y-axis. as can be seen, it was not the case that some participants performed well in the vast-cons and others in the vast-incons, as would have been predicted by the individual differences/cognitive style hypothesis. with very few exceptions, the items in the vast-cons that could have been solved using visuo-spatial strategies were much easier for each individual participant than the items in the vast-incons, which required recourse to analytic strategies. it can also be seen that many students performed well only in the case of the vast-cons.

figure 3. individuals’ mean performance in the vast-cons and the vast-incons. the performance of the teachers is shown above the horizontal line.

3.3 relations between the vast-cons, vast-incons, csgt, gg, rot and art

due to the violation of the normality assumption for the vast-cons, a spearman’s rho correlation analysis was performed on the mean scores of the vast-cons with the vast-incons, csgt, gg, rot and art. the results showed that the vast-cons correlated weakly, and only with rot (rho = .190, p = .047), whereas all other correlations were non-significant (with vast-incons, rho = .136; with csgt, rho = .165; with gg, rho = .054; and with art, rho = .176).

3.4 relations between the vast-incons, csgt and gg

in order to examine hypothesis 5, a correlation analysis (pearson’s r) was performed on the mean percentage scores on the vast-cons, the vast-incons, rot (cronbach alpha = .74, range: 1-20, mean = 8.59, std = 3.81), and art (cronbach alpha = .57, range: 5-15, mean = 9.89, std = 2.49). as predicted, the results showed statistically significant correlations between performance in the two vast subscales, rot and art (table 2).
3.5 relations between the vast-incons, rot and art

in order to examine hypothesis 5, a correlation analysis (pearson’s r) was performed on the mean percentage scores on the vast-incons, rot (cronbach alpha = .74, range: 1-20, mean = 8.59, std = 3.81), and art (cronbach alpha = .57, range: 5-15, mean = 9.89, std = 2.49). as predicted, the results showed statistically significant correlations between performance in the two vast subscales, rot and art (table 2).

table 2
correlation between the measures of the study

| measure | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1. vast-incons | | | | | |
| 2. csgt | .429** | | | | |
| 3. gg | .318** | .575** | | | |
| 4. rot | .313** | .357** | .249** | | |
| 5. art | .366** | .414** | .285** | .327** | |

**. correlation is significant at the 0.01 level (2-tailed)
*. correlation is significant at the 0.05 level (2-tailed)

a stepwise regression analysis was applied with the purpose of examining in greater detail the contributions of the above-mentioned measures of this study to vast-incons performance. the measures were inserted in the analysis in the following order: rot, art, and csgt. the order of insertion followed the theoretical rationale of the present study, that is, general visuo-spatial ability (rot) was inserted first, followed by general abstract reasoning ability (art), and, finally, by performance on csgt, which incorporated all the above and included geometrical knowledge. based on our theoretical analysis, performance in csgt should predict performance in the vast-incons best.

table 3
results of step-wise regression of rot, art, and csgt on vast-incons

| model | predictor | b | std. error | β | t | sig. |
|---|---|---|---|---|---|---|
| 1 | rot | 1.258 | .432 | .295 | 2.909 | .005 |
| 2 | rot | .901 | .430 | .211 | 2.096 | .039 |
| | art | 1.772 | .581 | .307 | 3.048 | .003 |
| 3 | rot | .522 | .442 | .122 | 1.179 | .242 |
| | art | 1.223 | .603 | .212 | 2.027 | .046 |
| | csgt | .196 | .076 | .282 | 2.561 | .012 |

the results (see table 3) showed that all three consecutive models had a good fit.
for the first step (rot) [f(1, 89) = 8.461, p = .005], for the second step (rot and art) [f(2, 88) = 9.269, p < .001], and for the third step (rot, art and csgt) [f(3, 87) = 8.755, p < .001]. as can be seen in table 3, when csgt was entered in the model the contribution of rot became non-significant (p = .242) and the contribution of art became marginally significant (p = .046). these results fully confirmed hypothesis 6, indicating that geometrical knowledge, and not general visuo-spatial abilities and abstract spatial reasoning, accounted for the visual/analytic strategy shift as measured by the vast-incons.

in order to further validate the above result, a mediation analysis of the patterns of relations was applied to the data (see figure 4), using amos (spss, version 21) with bootstrapping (number of bootstrap samples = 2000, bias-corrected confidence intervals = .95). first, the direct relations of rot and art with the vast-incons were computed. for rot and vast: two-tailed significance p < .045, and for art and vast: two-tailed significance p < .019. thus, the results indicated that both paths were significant. we then tested two models, one with the csgt and the other with geometry grades as mediating variables.

when the csgt was added as a mediating variable, the indirect effect (i.e., the mediating path from rot through csgt to vast-incons) was significant (p = .006), and so was the mediating path from art through csgt to vast-incons (p = .001). inspection of the direct effects showed that both relations were completely mediated by csgt (for the path between rot and vast-incons, p = .143, and for the path from art to vast-incons, p = .133). the best model that resulted (see figure 4), eliminating only the direct relation from rot to the vast-incons, had an acceptable fit (χ2(1) = 2.384, p = .123, cfi = .977, standardized rmr = .04).

figure 4. regression weights of the mediation analysis between rot, art, csgt, and vast-incons.
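the logic of the mediation analysis above can be sketched in a few lines. the study used amos with bias-corrected bootstrap intervals; the sketch below substitutes a simpler percentile bootstrap of the a*b indirect effect in plain python and runs on synthetic data, so it illustrates the procedure rather than reproduces the reported estimates. all function names, the synthetic coefficients and the sample size of 91 are assumptions made for illustration. the last helper checks the p value of a chi-square fit statistic with one degree of freedom.

```python
import math
import random


def _scp(u, v):
    """sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """a * b: a is the slope of m on x, b is the slope of m in y ~ x + m."""
    sxx, smm, sxm = _scp(x, x), _scp(m, m), _scp(x, m)
    sxy, smy = _scp(x, y), _scp(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """percentile bootstrap interval for the indirect effect (not bias-corrected)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * n_boot)]
    hi = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


def chi2_p_df1(chi2):
    """upper-tail p value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# synthetic data only: x -> m -> y with no direct x -> y path
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(91)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]
print(indirect_effect(x, m, y), bootstrap_ci(x, m, y))

# fit check for the reported model with csgt as mediator: chi-square(1) = 2.384
print(round(chi2_p_df1(2.384), 3))  # about .123, in line with the reported p
```

an indirect effect is read as significant when the bootstrap interval excludes zero; a non-significant chi-square (p above .05) is read as acceptable model fit.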
when geometry grades were treated as the mediating variable, the indirect effect from rot through geometry grades to vast-incons was significant (p = .042), and so was the mediating path from art through school grades to vast-incons (p = .025). inspection of the direct effects showed that the relation between rot and the vast-incons was completely mediated by geometry grades (for the path between rot and vast-incons, p = .086), whereas the relation between art and the vast-incons was partially mediated by grades (p = .048). the final model, after eliminating the direct relation between rot and the vast-incons, did not, however, show a good fit [χ2(1) = 3.431, p = .06, cfi = .942, standardized rmr = .05], since the value of χ2/df exceeded 3 and the model’s p value approached statistical significance.

to conclude and summarize, the results of the mediation analysis confirmed the hypothesis that performance on the vast-incons is mediated by geometrical knowledge, particularly when geometrical knowledge was measured by csgt. in addition, the model with the best fit still retained the direct relation between analytic reasoning as measured by art and the vast-incons.

4. discussion

our main purpose in this study was to develop and validate a task that could distinguish visuo-spatial from analytic reasoning in geometry. as mentioned in the introduction, there have been several attempts so far to develop tasks that capture the change from the visual to the descriptive/analytic level in geometry. these previous attempts were not very successful because they were based on the recall or recognition of definitions of shapes and their properties, did not require thinking based on relational properties, and/or did not require the application of formal geometrical thinking in novel situations.
the vast differs substantially from these previous attempts because it avoids the typical geometry-style problems that impose an analytical solution method and because it consists of tasks that can be solved using either visuo-spatial or analytic strategies. the greatest advantage of the vast, compared to previous tests, is that it focuses specifically on the antagonism between the two substantially different types of strategy in geometry and allows us to check individuals’ abilities to spontaneously choose and adequately apply the correct strategy. finally, the vast investigates the ability to use analytic strategies in situations where the visual element plays quite a central role and where visual information processing must be inhibited in favour of formal geometrical thinking. the results of the present study showed that the vast consists of items that have good internal consistency. most importantly, the results show that the vast is a valid test because it can differentiate geometry teachers from students and because it correlates highly with other tests of spatial thinking and geometrical knowledge.

4.1 are the differences in vast performance related to geometry expertise?

despite small differences in the ease or difficulty of the solution of individual items, the main pattern of results was the same: the items in the vast-cons that could be solved correctly using visuo-spatial strategies were much easier for all participants than the items in the vast-incons, which required the use of analytic strategies and the inhibition of the visual element. as predicted (hypothesis 2), the math teachers were able to solve both the vast-cons and the vast-incons. this result suggests that the math teachers have access to both visuo-spatial and analytic strategies. on the contrary, the 11th grade students could easily solve only the vast-cons, but had great difficulty with the vast-incons.
this finding indicates that the students relied predominantly on visuo-spatial strategies and had difficulty in employing analytic strategies when required by the task, further confirming hypothesis 2. the finding that there were systematic differences in the performance of the math teachers and the students (the math teachers performed well in both the vast-cons and the vast-incons, but the 11th grade students were able to perform well only in the vast-cons) supports the argument that the use of analytic strategies is related to the acquisition of geometry expertise. as explained in the introduction, if the differences in vast performance were due to individual differences in strategy use, then we should expect some mathematics teachers and some students to perform well in the vast-cons and others in the vast-incons (hypothesis 3). this was not, however, the case. the items in the vast-incons were systematically more difficult than the items in the vast-cons both for teachers and for students. moreover, the students performed well only in the vast-cons, indicating that the majority of the students were able to successfully apply only simple visuo-spatial strategies.

additional evidence in favour of the interpretation that the vast-inconsistent subscale requires the use of analytic strategies based on geometrical knowledge comes from the results of the mediation analysis, which showed that the relation between performance in the rot and the vast-inconsistent subscale was completely mediated by performance in the csgt. this means that high performance in the vast-inconsistent subscale requires not just domain-general analytic skills, but domain-specific geometrical knowledge and the ability to use analytic thinking in a geometrical context. the results showed considerable individual differences in the performance of the students in the vast-incons. these differences seem to be related to differences in geometry expertise within the student group.
this conclusion can be deduced from the high correlations that were obtained between performance in the vast-incons, performance in the csgt, and gg (hypothesis 4). students’ performance in the vast-cons and assumed ability to apply simple visuo-spatial strategies did not correlate significantly with their gg or with their performance in the csgt, confirming hypothesis 4, namely, that this performance does not necessarily require geometrical knowledge.

4.2 what is the contribution of the cognitive skills involved in spatial thinking in vast performance?

the results of the present study showed high correlations between performance in the vast-cons and the vast-incons with both rot and art, confirming the prediction that spatial thinking as measured by tests of visuo-spatial and abstract (spatial) reasoning abilities contributes to students’ performance in both scales of the vast. however, performance in the vast-incons was also highly correlated with performance in the test of geometrical knowledge (the csgt) and geometry grades. most importantly, the results of the stepwise regression confirmed that when the csgt performance was inserted in the last step of the analysis, the contributions of rot and art (which were statistically significant in the previous steps of the analysis) became non-significant or marginally significant. furthermore, when a mediation analysis was applied to the data, the previously significant direct relations between performance in the vast-incons and rot were mediated by the students’ geometrical knowledge, as measured by performance in the csgt. since the direct relations between art and vast-incons were not eliminated, however, it might be the case that analytic abilities contribute directly to performance in the vast-incons.
to sum up, the results indicate that the correct use of analytic strategies cannot be explained only on the basis of visuo-spatial abilities and abstract reasoning, i.e., the cognitive skills that comprise spatial thinking, but requires the accumulation of considerable geometrical knowledge.

4.3 implications for a theory of geometrical thinking

one of the main reasons we developed the vast was in order to show that the learning of geometry requires significant conceptual changes to take place, and that instruction-induced conceptual changes culminate in the ability to use formal, geometrical knowledge in problem-solving and in the flexible use of visual/spatial and analytic strategies appropriate in the given contexts. the role of instruction here is quite crucial. as fischbein (1993) stressed, “the development of figural concepts generally is not a natural process” (p. 161). the present findings support the hypothesis that the use of analytic strategies in geometry is not a matter of individual differences in cognitive style but a major intellectual achievement, a conceptual change, which requires the acquisition of new forms of geometrical thinking. the application of analytic, formal geometrical knowledge in problem solving does not mean that visuo-spatial geometrical reasoning disappears. the fact that the math teachers could easily solve the items in the vast-cons suggests that they still had access to visual strategies. this issue needs to be investigated further, however, in view of the fact that analytic strategies could be used in the vast-cons also. finally, a great deal more research is also required to investigate the hypotheses of the ft according to which the processes of acquisition of geometrical knowledge are slow and gradual rather than sudden or stage-like, and that these processes can give rise to fragmentation and synthetic models.
4.4 implications for learning and instruction

the low performance of the students in the vast-incons after almost five years of instruction in geometry supports the argument that the systematic use of analytic strategies is a major intellectual achievement that requires considerable conceptual changes. it should be added here that the 11th grade students were in their 5th year of geometry instruction, that they had completed a two-year course in plane euclidean geometry, which was taught in a formal manner and accompanied by many geometry problems to be solved, and that they had an additional year’s course in analytic geometry designed for students opting for university study in stem subjects. consequently, all the students had been taught the formulae and theorems required to answer correctly all the vast items, both in the consistent and inconsistent subscales. thus, we can conclude from the above that it is possible to have acquired a great deal of school-type knowledge in geometry and not yet function at an analytic level in geometric thinking. although geometry instruction focuses almost exclusively on the acquisition of formal geometrical knowledge and the application of analytic strategies, it does not seem to be very successful in transferring to situations different from the narrow school context in which it is taught and in producing the necessary conceptual changes. this situation is similar to what is happening in other stem domains, such as physics or chemistry, where school instruction often fails to produce the necessary conceptual changes (e.g., clement, 1982; disessa, 1982).
we hope that the present findings will further sensitize educators to the need to develop instruction that emphasizes not the recall and rigid application of formal definitions and rules but the constructive, dynamic activities of students that can help them understand how formal definitions fit with their visual-spatial experiences and representations of geometrical shapes (fischbein, 1993; kilpatrick, hoyles, skovsmose, & valero, 2005; lehrer, jenkins, & osana, 1998).

4.5 limitations of the present study and future research

the present work has a number of limitations and leaves several open questions to be answered by future research. first, the sample of the present study is small for validating a new measure such as the vast, and therefore the results presented in this exploratory study need to be replicated with larger samples and more age groups. in addition, reaction time studies as well as qualitative methods, such as interviews, think-aloud protocols and eye-movement tracking, could be used to further validate the vast. although performance in the vast differentiates teachers from students and is related to geometry expertise, the results do not provide information about how analytic reasoning in geometry actually develops. developmental research is needed to further examine the hypotheses of the ft that knowledge acquisition in geometry is a continuous and not a stage-like process, during which fragmentation and synthetic conceptions are formed. future research also needs to investigate the contribution of intellectual abilities and mathematics knowledge to vast performance by administering additional tests, such as a propositional test measuring verbal analytical skills, measures of visuo-spatial and phonological memory, speed of processing and executive function, as well as measures of mathematics abilities. results from these tests could be used as covariates in order to reduce as much as possible the effect of individual differences.
last but not least, the relation between individual differences in spatial thinking and the use of visual and analytic strategies in geometry needs to be investigated further, preferably using longitudinal designs. if, as we claim, geometry expertise (and possibly expertise in other domains of stem) requires the eventual development of analytic strategies, why are individuals who are good in spatial thinking more successful in stem disciplines than those who are not? a possible explanation of this finding may be that individuals who are good in spatial thinking find it easier to do well in geometry early on, before conceptual changes in this domain require the development of analytic strategies based on formal, geometrical knowledge. maybe because of these early successes, these students develop an interest in geometry (or other stem-related disciplines), spend more time studying, and thus eventually undergo the conceptual changes and develop the analytic strategies required for geometrical expertise. this conjecture is consistent with the findings of the present study that spatial thinking as measured by visuo-spatial and abstract (spatial) reasoning tests contributes to success in the vast, but is rendered insignificant for the vast-incons when geometrical knowledge is taken into account. this issue needs to be further researched.

keypoints

considerable conceptual changes are required to go from visuo-spatial reasoning to analytic strategies in geometry.
these changes are related to geometry expertise and not to individual differences.
questions arise about the role of visual-spatial reasoning in geometry expertise.

acknowledgments

the research reported in this paper is supported by a grant from the greek ministry of education, general secretariat for research and technology, molvisedu, thalis – aristotle university of thessaloniki. we would like to thank petros roussos for helpful comments.

references

battista, m. t. (2007).
the development of geometric and spatial thinking. in f. lester (ed.), second handbook of research on mathematics teaching and learning, nctm (pp. 843-908). reston, va: national council of teachers of mathematics.
bodner, g.m., & guay, r.b. (1997). the purdue visualization of rotations test. the chemical educator, 2(4), 1-17.
burger, w. f., & shaughnessy, j. m. (1986). characterizing the van hiele levels of development in geometry. journal for research in mathematics education, 17, 31-48.
csgt california standards test geometry sample (2009). http://www.cde.ca.gov/ta/tg/sr/documents/cstrtqgeomapr15.pdf
carey, s. (2009). the origin of concepts. new york, ny: oxford university press.
cheng, y.l., & mix, k.s. (2012). spatial training improves children's mathematics ability. journal of cognition and development (published online, september 19, 2012). http://www.tandfonline.com/doi/pdf/10.1080/15248372.2012.725186
chi, m. t. h. (2008). three types of conceptual change: belief revision, mental model transformation, and categorical shift. in s. vosniadou (ed.), international handbook of research on conceptual change (pp. 61-82). new york, ny: routledge.
clement, j. (1982). students' preconceptions in introductory mechanics. the american journal of physics, 50(1), 66-71.
clements, d. h., & battista, m. t. (2001). logo and geometry. journal for research in mathematics education monograph. reston, va: national council of teachers of mathematics.
disessa, a. (1982). unlearning aristotelian physics: a study of knowledge-based learning. cognitive science, 6(1), 37-75. doi: 10.1207/s15516709cog0601_2
eisenberg, t., & dreyfus, t. (1991). on the reluctance to visualize in mathematics. in w. zimmermann & s. cunningham (eds.), visualization in teaching and learning mathematics (pp. 26-37). washington, dc: maa.
evans, j. st. b.t., & stanovich, k. (2013). dual-process theories of higher cognition: advancing the debate.
perspectives on psychological science, 8(3), 223–241. doi: 10.1177/1745691612460685
fischbein, e. (1993). the theory of figural concepts. educational studies in mathematics, 24(2), 139-162.
gutiérrez, a., & jaime, a. (1998). on the assessment of the van hiele levels of reasoning. focus on learning problems in mathematics, 20(2-3), 27-46.
hegarty, m. (1992). mental animation: inferring motion from static diagrams of mechanical systems. journal of experimental psychology: learning, memory and cognition, 18, 1084–1102.
hegarty, m., montello, d. r., richardson, a. e., ishikawa, t., & lovelace, k. (2006). spatial abilities at different scales: individual differences in aptitude-test performance and spatial-layout learning. intelligence, 34, 151-176.
hegarty, m., & waller, d. (2006). individual differences in spatial abilities. in p. shah & a. miyake (eds.), handbook of visuospatial thinking. cambridge, ma: cambridge university press.
houdement, c., & kuzniak, a. (2003). elementary geometry split into different geometrical paradigms. proceedings of cerme 3, bellaria, italy. http://www.dm.unipi.it/~didattica/cerme3/proceedings/groups/tg7/tg7_houdement_cerme3.pdf
kastens, k.a., & ishikawa, t. (2006). spatial thinking in the geosciences and cognitive sciences: a cross-disciplinary look at the intersection of the two fields. in c. a. manduca & d. w. mogk (eds.), earth and mind: how geologists think and learn about the earth (pp. 53-76). boulder, co: geological society of america.
kilpatrick, j., hoyles, c., & skovsmose, o. (eds.) in collaboration with valero, p. (2005). meaning in mathematics education. usa: springer.
kozhevnikov, m., motes, m., & hegarty, m. (2007). spatial visualization in physics problem solving. cognitive science, 31, 549-579.
lehrer, r., jenkins, m., & osana, h. (1998). longitudinal study of children’s reasoning about space and geometry. in r. lehrer & d.
chazan (eds.), designing learning environments for developing understanding of geometry and space (pp. 137-167). mahwah, nj: erlbaum.
lynn, b. m., & lynch, c. m. (2010). van hiele revisited. mathematics teaching in the middle school, 16(4), 232-238.
newcombe, n. s. (2010). picture this: increasing math and science learning by improving spatial thinking. american educator, summer, 29-43.
newcombe, n. s., & frick, a. (2010). early education for spatial intelligence: why, what, and how. mind, brain and education, 4(3), 102-111.
newcombe, n. s., & huttenlocher, j. (2000). making space: the development of spatial representation and reasoning. cambridge, ma: mit press.
newcombe, n. s., uttal, d. h., & sauter, m. (in press). spatial development. in p. zelazo (ed.), oxford handbook of developmental psychology. new york, ny: oxford university press.
piaget, j., & inhelder, b. (1948/1967). the child’s conception of space. london: w.w. norton & company.
piaget, j., inhelder, b., & szeminska, a. (1948/1960). the child’s conception of geometry. london: routledge and kegan paul.
pitta-pantazi, d., & christou, c. (2009). cognitive styles, dynamic geometry and measurement performance. educational studies in mathematics, 70, 5-26.
sanchez, c.a. (2012). enhancing visuospatial performance through video game training to increase learning in visuospatial science domains. psychonomic bulletin & review, 19(1), 58–65.
schwartz, d.l., & black, j.b. (1996). shuttling between depictive models and abstract rules: induction and fallback. cognitive science, 20, 457-497.
shea, d. l., lubinski, d., & benbow, c. p. (2001). importance of assessing spatial ability in intellectually talented young adolescents: a 20-year longitudinal study. journal of educational psychology, 93(3), 604–614.
shtulman, a., & valcarcel, j. (2012). scientific knowledge suppresses but does not supplant earlier intuitions. cognition, 124, 209–215.
spelke, e. s., & kinzler, k. d. (2007). core knowledge.
developmental science, 10, 89–96.
spelke, e.s., lee, s. a., & izard, v. (2010). beyond core knowledge: natural geometry. cognitive science, 34(5), 863-884.
stieff, m. (2007). mental rotation and diagrammatic reasoning in science. learning and instruction, 17, 219-234.
usiskin, z. (1982). van hiele levels and achievement in secondary school geometry. columbus, oh: eric.
uttal, d. h., meadow, n. g., tipton, e., hand, l. l., alden, a.r., warren, c., & newcombe, n.s. (2013). the malleability of spatial skills: a meta-analysis of training studies. psychological bulletin, 139, 352-402.
van hiele, p. m. (1986). structure and insight: a theory of mathematics education. london: academic press inc.
vosniadou, s., & brewer, w. f. (1994). mental models of the day/night cycle. cognitive science, 18, 123-183.
vosniadou, s. (2013). conceptual change in learning and instruction: the framework theory approach. in s. vosniadou (ed.), international handbook of research on conceptual change, 2nd edition (pp. 11-30). new york, ny: routledge.
vosniadou, s., & skopeliti, i. (2014). conceptual change from the framework theory side of the fence. science and education, 23(7), 1427-1445.
vosniadou, s., pnevmantikos, d., makris, n., ikospentaki, k., lepenioti, d., chountala, a., & kyrianakis, g. (2015). executive functions and conceptual change in science and mathematics learning. 37th annual conference of the cognitive science society, pasadena, ca. https://mindmodeling.org/cogsci2015/papers/0434/paper0434.pdf
wai, j., lubinski, d., & benbow, c. p. (2009). spatial ability for stem domains: aligning over 50 years of cumulative psychological knowledge solidifies its importance. journal of educational psychology, 101(4), 817–835.
wiser, m., & smith, c. (2013). learning and teaching about matter in the middle school years. how can atomic-molecular theory be meaningfully introduced? in s. vosniadou (ed.), international handbook of research on conceptual change, 2nd edition (pp. 177-194).
new york, ny: routledge.
zazkis, r., dubinsky, e., & dautermann, j. (1996). coordinating visual and analytic strategies: a study of students' understanding of the group d4. journal for research in mathematics education, 27(4), 435–457.

frontline learning research vol. 3 no. 1 (2015) 1-17 issn 2295-3159

corresponding author: alexandra c. niculescu, educational research department, maastricht university school of business and economics, po box 616, 6200 md maastricht, netherlands. email address: a.niculescu@maastrichtuniversty.nl doi http://dx.doi.org/10.14786/flr.v3i1.136

exploring the antecedents of learning-related emotions and their relations with achievement outcomes

alexandra c. niculescu (a), dirk tempelaar (a), amber dailey-hebert (b), mien segers (a), wim gijselaers (a)

(a) maastricht university, netherlands
(b) park university, united states

article received 2 december 2014 / revised 6 february 2015 / accepted 9 february 2015 / available online 24 march 2015

abstract

recent work suggests that learning-related emotions (lres) play a crucial role in performance especially in the first year of university, a period of transition for most students; however, additional research is needed to show how these emotions emerge. we developed a framework which links a course-contextualized antecedent – academic control in pekrun’s (2006) control value theory of achievement emotions – with generic antecedents – adaptive and maladaptive cognitions and behaviors from martin’s (2007) motivation and engagement wheel framework – to explain a classical problem: the emergence of lres in a transition period. using a large sample (n = 3451) of first year university students, our study explores these two antecedents to better understand how four lres (enjoyment, anxiety, boredom and hopelessness) emerge in a mathematics and statistics course.
through the use of path-modelling, we found that academic control has a strong effect on all four lres, with the strongest impact observed for learning hopelessness and the second strongest for learning anxiety. academic control, in turn, builds on contributions from adaptive and maladaptive cognitions. furthermore, adaptive cognitions have an impact on learning enjoyment (positive) and on boredom (negative). surprisingly though, the maladaptive behaviors have a positive impact on learning enjoyment and a negative impact on learning anxiety. following this, we predicted performance outcomes in the course and again found academic control to be the main predictor, followed by learning hopelessness. overall, this study provides evidence that adaptive and maladaptive cognitions and behaviours act as important antecedents of academic control, the main predictor of lres and course performance outcomes.

keywords: learning-related emotions; academic control; adaptive and non-adaptive cognitions and behaviors; academic achievement; first year of university.

niculescu et al | f l r 2

1. introduction

the first year experience of university is known as a transition period (baker & syrik, 1999; tinto, 1997), when students are confronted with novel situations over which they have low control, yet still hold high expectations for success (perry, hladkyj, pekrun, clifton, & chipperfield, 2005). these conditions typically create negative emotional reactions towards learning in academic situations (stupnisky, perry, hall, & guay, 2012), which can lead to voluntary withdrawal at the course level (ruthig et al., 2007) and overall poor performance across all courses taken at the university (hall, perry, ruthig, hladkyj, & chipperfield, 2006). such emotions, known as achievement-related emotions, can have serious consequences for how students perform within a course (pekrun, goetz, frenzel, barchfeld, & perry, 2011).
this is particularly true for mathematics and statistics courses, in which students experience high levels of negative emotions, especially in learning- or homework-related situations (dettmers et al., 2011; goetz et al., 2012). within these courses, negative emotions emerge from beliefs about a low capacity to influence outcomes (frenzel, pekrun, & goetz, 2007; pekrun, 2000), referred to as appraisals of control (pekrun, goetz, titz, & perry, 2002). at the same time, students come into these courses holding generic predispositions towards learning at university, such as adaptive and maladaptive cognitions and behaviours, which will also influence their experiences within a course (martin, 2007). although we know that emotions experienced in learning- or homework-related situations are particularly important for performance (leone & richards, 1989; verma, sharma, & larson, 2002), additional research is needed in the first year of university to help us understand how these emotions emerge and how they can be influenced (putwain, sander, & larkin, 2013). such information can inform the design of educational interventions to create “emotionally sound” (astleitner, 2000) learning environments which can potentially improve academic achievement. the present study focuses on two different antecedents of achievement learning-related emotions: 1) the course-contextualized antecedent (appraisal of control), and 2) the generic antecedents towards learning at university (adaptive and maladaptive cognitions and behaviours). both antecedents need to be integrated, as they are complementary in providing information about the emergence of emotions in a course setting. the direct (course-contextualized) antecedent is necessary for explaining the emergence of distinct emotions at the course level, and the distal (generic) antecedents can explain the individual differences that arise in the emergence of these emotions. finally, relations with and implications for academic achievement are discussed.
1.1 theoretical framework
over the past twenty years we have seen a growing interest in, and increased research on, the role of achievement emotions across various educational contexts and course settings. such research investigates different functions of academic emotions within a course, such as their effects on self-regulation (artino jr. & jones ii, 2012), learning engagement (ainley & ainley, 2011), learning choices (tempelaar, niculescu, rienties, gijselaers, & giesbers, 2012) and achievement (dettmers et al., 2011; goetz, frenzel, pekrun, & hall, 2006; goetz et al., 2012). the transition required in the first year of university involves several challenges which may include perceived competition and pressure to perform, both demanding heightened self-reliance and autonomy (perry, hladkyj, pekrun, & pelletier, 2001). since students are expected to engage in more individual self-study, achievement emotions in individual learning- or homework-related situations (as compared to the classroom setting, for example) are particularly important. these emotions are referred to in the literature as achievement learning-related emotions (pekrun, 2000). at the same time, a closer investigation of students’ experiences is necessary to clarify how learning-related emotions (lres) emerge at the course level.
1.1.1 achievement emotions
achievement emotions are defined as “emotions that are directly linked to achievement activities and outcomes” (pekrun et al., 2011, p. 37). in the control-value theory of achievement emotions (cvtae; pekrun, 2006), emotional experiences have a situational context, meaning that they can be experienced in different academic situations within a course: 1) being in class, 2) taking tests and exams, and 3) studying outside of class (while learning or when preparing homework).
of particular interest are the emotions experienced in learning-related situations as students seem to experience the most unpleasant emotions when compared with other academic situations, such as learning in the classroom (leone & richards, 1989). indeed, according to the cvtae, first year university students experience a variety of learning-related emotions, whether the emotions are positive or negative. 1.1.2 learning – related emotions and their course contextualized antecedents according to the control-value theory of achievement emotions (cvtae; pekrun, 2006), discrete learning-related emotions (lres) arise from the appraisal of achievement activities and outcomes. emotions that result from such appraisals can indirectly influence achievement outcomes. there are two dimensions of appraisals: control and value. the appraisal of control refers to a student’s belief about whether he/she has control over learning activities/outcomes; the appraisal of value describes the subjective value attributed to these activities/outcomes. these appraisals are considered direct antecedents of lres and are acquired at the course level (pekrun, 2006). control appraisals describe the perceived controllability of one’s own competency towards achievement activities and outcomes; as a general rule, low and high levels of control appraisals influence emotions differently (pekrun, 2000). for instance, low control leads to an increased level in negative emotions (e.g., learning anxiety) and a more elevated level of control favours a heightened experience of positive emotions (such as learning enjoyment). empirical evidence shows that the appraisal of control longitudinally relates to emotions (perry et al., 2001; perry et al., 2005), as well as to subsequent academic achievement in the first year of university (hall et al., 2006; ruthig et al., 2008; stupnisky et al., 2012). for instance, perry et al. 
(2001) found that students who reported higher levels of primary control also felt less bored (-.48) and less anxious (-.35) towards the course, and obtained higher final grades (.27). similar relations are shown by hall et al. (2006): correlations between primary control and several emotions (anger, regret, happiness and pride) are in the range of -.27 to .24; primary control relates positively to the final course grade (.21) as well as to cumulative gpa (.25). overall, this correlational evidence suggests relations between primary control, emotions and performance which are of moderate size (cohen, 1992). there are also documented gender differences in the beliefs students hold towards their abilities to perform in mathematics (female students tend to generally believe they are not very good at mathematics), with implications on how the two genders feel about this subject (robinson & clore, 2002; frenzel et al., 2007). finally, the implications of studying course specific antecedents of lres is relevant when explaining the development of emotions over time and, indirectly, for understanding their consequences on achievement. 1.1.3 generic antecedents of learning-related emotions there are also more general expectancies and predispositions towards learning at university students already hold when entering a course, which can be considered generic antecedents of lres and achievement. students enter a new course holding background characteristics (intelligence, personality, high school gpa etc.) but also possessing a set of adaptive and impeding cognitions, and adaptive and impeding behaviors, towards learning in the new setting of university (martin, 2007). therefore, we applied the ‘motivation and engagement wheel’ framework of martin (2007, 2009) as a model for distal antecedents of learning-related emotions (lres). 
the motivation and engagement wheel breaks down all motivation and engagement concepts into four categories: adaptive cognitions, adaptive behaviors, impeding cognitions, and maladaptive behaviors. these four categories each consist of two or three sub-dimensions. for adaptive cognitions, the dimensions consist of self-belief, valuing school, and learning focus. students’ confidence to do well in university, their belief that learning will be useful and relevant, and their interest in learning new topics and developing new skills all contribute to various academic outcomes (martin, 2011). furthermore, the adaptive behavioral dimensions include persistence, planning, and task management. a study by martin and marsh (2006) shows that self-efficacy, control, planning, low anxiety, and persistence predict enjoyment and class participation. conversely, the impeding or deactivating antipodes of the cognitions (that obstruct learning rather than enhance it) include anxiety, failure avoidance and uncertain control. the maladaptive behaviors are twofold: self-handicapping and disengagement. in turn, self-handicapping (as a disruptive behaviour) can predict negative academic outcomes (martin, marsh, & debus, 2001). although the experience of the adaptive and maladaptive cognitions and behaviors can differ on average for female and male students (liem & martin, 2012), the concepts operating in this motivation and engagement wheel represent generic orientations that are relatively stable over contexts (martin, 2009). for this reason, in pekrun’s theory, such generic orientations can be integrated as distal antecedents of lres. although it may appear that some of the concepts (e.g.
self-belief/efficacy, persistency and control) from the “motivation and engagement wheel” are closely related to the appraisal of control in the cvtae, it is important to ensure clarity (distinction) between them: while the distal antecedents are more trait-type of constructs, the direct antecedent (appraisal of control) is a subject specific type of appraisal. overall, the motivation and engagement concepts play an important role in students’ cognitive appraisals, in their emotions during learning, and in achievement outcomes (martin & marsh, 2006; martin, 2011). figure 1 summarizes the conceptual model used in our study. figure 1. the conceptual framework of the study to sum-up, the added value of integrating both direct and distal antecedents into one framework is to explain: 1) the emergence of distinct emotions through direct antecedents, and 2) through distal antecedents, the individual differences that arise in learning emotions when students enroll in a course. 1.1.4 learning – related emotions and academic performance while other settings have been extensively studied, such as the exam situation, few studies have investigated situations outside the class (putwain, larkin, & sander, 2013; schutz & pekrun, 2006; trautwein et al., 2009). recent research discusses students’ emotional experiences during individual learning activities such as mathematics homework (dettmers et al., 2011; goetz et al., 2012) in which the assignments are considered “emotionally charged activities” (dettmers et al., 2011, p. 25). in the homework situation students seem to experience the most unpleasant emotions when compared with other academic situations (leone & richards, 1989; verma, sharma, & larson, 2002). furthermore, learning – related emotions (lres) are of particular interest, as they demonstrate a strong relationship with achievement outcomes. 
while it is already known that positive emotions have a positive impact on academic performance (dettmers et al., 2011; pekrun et al., 2002), by focusing on the experience of unpleasant emotions during homework, dettmers et al. (2011) demonstrate how elevated anxiety and boredom levels shape effort and disengagement in study, and thereby predict lower achievement in mathematics. considering the transition represented by the first year of university, more evidence is needed, particularly in this period, about students’ emotional experiences in learning situations. to the best of our knowledge, only a few studies (putwain, sander, et al., 2013) have addressed this issue in the first year of university context. we found only one study (tempelaar et al., 2012) which investigates how these emotions emerge and influence learning outcomes in the setting of an undergraduate introductory mathematics or statistics course. the present study builds further on the work of tempelaar et al. (2012) to examine how distinct lres emerge from course-contextualized and generic antecedents and, further, how they influence achievement outcomes in a first year university mathematics and statistics course.
1.2 research questions and hypotheses
we have asked the following research questions:
rq1. what role do distal and direct antecedents play in the development of lres?
rq2. to what extent can the direct and distal antecedents together explain student performance at the course level?
furthermore, we hypothesize:
h1. the distal antecedents will have effects on both control appraisals and lres, with differential roles for adaptive and maladaptive distal antecedents.
h2. the direct antecedents, control appraisals, will have an effect on lres. this effect will be different for positive versus negative (or neutral) lres. the control appraisals will influence enjoyment positively and anxiety, boredom and hopelessness negatively.
h3.
distal antecedents, direct antecedents and lres all explain student performance in the course.
research hypotheses are graphically depicted in figure 2, which shows the a priori structural model. to facilitate the reading of this conceptual model, all three negative emotions are taken together, as well as the two adaptive cognitions and behaviours, and the two maladaptive ones.
figure 2. the hypothesized structural model
the hypothesized structural model expresses that adaptive cognitions and behaviours, academic control, positive emotion, and performance are all hypothesized to be positively related, whereas maladaptive cognitions and behaviours and negative emotions are hypothesized to be positively related among themselves, but negatively related to the first subset of variables. not explicit in this conceptual model is that distal antecedents are represented by second order factors of the motivation and engagement instrument, although path estimates are allowed to differ from factor loadings.
2. method
2.1. sample and setting
the participants were 3451 freshmen (19 years old on average, 62.5% male) enrolled over four consecutive academic years (10/11, 11/12, 12/13, 13/14) in a business and economics program at a european university. most students had an international background: a vast majority (77.4%) held an international education diploma, and one third of the sample had been previously educated in the field of mathematics (mathematical major specialization). the setting was a compulsory introductory course in mathematics and statistics, scheduled in the first term of the academic year. it had a duration of eight weeks, of which seven weeks were scheduled for education and the last week was reserved for exams.
2.2. procedure
in week two of the course students completed an online questionnaire concerning their adaptive and maladaptive cognitions and behaviors towards learning at university in general.
in week four participants completed another online questionnaire, this time about their control appraisals and lres regarding the specific subject of the course. the timing was chosen to capture sufficient experience with the learning activities. in weeks three, five and seven of the course, voluntary mathematics and statistics quizzes were planned which, if performed successfully, added a bonus score to the final course grade. every week, students were expected to prepare homework assignments which, if solved, granted students bonus points. in week eight of the course, students participated in the written exam. all students included in this study provided informed consent for the data collected by means of online questionnaires and for use of their study results. 2.3. measures and variables we measured learning-related emotions through four scales: enjoyment, anxiety, boredom and hopelessness, of the achievement emotions questionnaire (aeq; pekrun et al., 2011). the enjoyment scale (10 items, e.g. “i enjoy accruing new knowledge”), anxiety scale (11 items, e.g. “i get tense and nervous while studying”), boredom scale (11 items, e.g. “the material bores me to death”) and hopelessness scale (11 items, e.g. “i feel hopeless when i think about studying”) were slightly re-phrased to match the specific situation of our course. for reasons of consistency in our research, all items were answered on a 7-point likert scale (1 = ‘completely disagree’ and 7 = ‘completely agree’). control appraisals were measured with the academic control scale (acs) of perry et al. (2001). academic control as described by perry et al. is a domain, course-specific measure of college students’ beliefs. the scale is composed of eight items, each answered on a 7-point scale (1 = ‘strongly disagree’ and 7 = ‘strongly agree’), e.g. “i have a great deal of control over my academic performance in this course”. 
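the scoring of likert-type scales such as those described above can be sketched as follows. this is a minimal illustration and not the authors' scoring procedure: it assumes that a scale score is the mean of a respondent's item responses and that internal consistency is summarized with cronbach's alpha; the function names and the response data are hypothetical.

```python
from statistics import mean, variance


def scale_score(item_responses):
    """Mean of one respondent's item responses (e.g. 1-7 Likert points)."""
    return mean(item_responses)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    items = list(zip(*responses))  # regroup the data item by item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses of five students to a four-item scale.
sample = [
    [6, 5, 6, 7],
    [3, 4, 3, 4],
    [5, 5, 6, 5],
    [2, 3, 2, 3],
    [7, 6, 7, 6],
]
print([scale_score(r) for r in sample])
print(round(cronbach_alpha(sample), 2))
```

alpha approaches 1 when items rank respondents in the same way, which is why it is commonly read as an index of internal consistency.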
adaptive and maladaptive cognitions and behaviors were measured with the motivation and engagement scale (mes; martin, 2007). the mes consists of four scales and eleven subscales subsumed under the four scales. the adaptive cognition scale is composed of three sub-scales: self-belief (e.g. “if i try hard, i believe i can do my university work well”), valuing school (e.g. “learning at university is important for me”) and learning focus (e.g. “i feel very pleased with myself when i really understand what i’m taught at the university”). the second scale, adaptive behavior, contains the following subscales: persistence (e.g. “if i can’t understand my university work at first, i keep going over until i do”), planning (e.g. “if i start an assignment i plan out how i am going to do it”) and study management (e.g. “when i study, i usually study in places where i can concentrate”). the third scale, maladaptive (impeding) cognition, includes the anxiety (e.g. “when exams and assignments are coming up, i worry a lot”), failure avoidance (e.g. “often the main reason i work at university is because i don’t want to disappoint others”) and uncertain control (e.g. “i am often unsure how i can avoid doing poorly at university”) sub-scales. finally, maladaptive behavior includes the self-handicapping (e.g. “sometimes i don’t study very hard before exams so i have an excuse if i don’t do as well as i hoped”) and disengagement (e.g. “i often feel like giving up at university”) sub-scales.
academic achievement was measured with a performance portfolio consisting of three separate parts: mathperformance, statsperformance and bonusperformance. first, the two performance outcomes mathperformance and statsperformance were assessed in a final written exam which covered a mathematics component and a statistics component, graded separately. second, bonusperformance represented the sum of bonus points scored in quizzes and homework.
quizzes, although optional, were available for both mathematics and statistics in an online format. some further bonus points could be achieved by doing weekly homework, containing assignments for mathematics and statistics. finally, the three separate parts were summed in qmperformance, which represented the total score for the course. we accounted for any potential influences coming from gender (female and male) and level of introductory mathematics education (distinguishing between two tracks, mathmajor and mathminor) as control variables.
2.4. statistical analyses
as a preliminary step in the analysis, the four cohorts were checked for invariance of mean levels and correlation structures. next, beyond descriptive analyses, this study applies structural equation modeling. models were estimated with lisrel (version 8.8) using maximum likelihood (ml) estimation. to prevent capitalization on chance, rather conservative model building rules were adopted: p-values of 1% or less were required as a cutoff value for significance for the adoption of any structural path; correlated traits were only allowed for variables measured by the same instrument. as measurement model for the motivation and engagement constructs, a second order confirmatory factor model was postulated, with the second order factors adaptive cognitions, adaptive behaviors, impeding cognitions, and maladaptive behaviors (see martin, 2007). we identified both second order and first order latent factors for motivation and engagement variables, and in order to derive a parsimonious model, we based the relationships with lres and control appraisal on the second order factors. however, we allowed for differentiated effects of first order factors, by testing if first order factors would add predictive power to the already included second order factors.
we report the chi-square and degrees of freedom values, the comparative fit index (cfi), the non-normed fit index (nnfi, also known as tli) and the root mean square error of approximation (rmsea) as indicators of goodness of fit. following hu and bentler (1999), cfi/tli values larger than .90 were taken to indicate a satisfactory fit, and rmsea values should not exceed .08 and should preferably be .06 or lower.
3. results
3.1. preliminary analysis
we checked the assumptions of normality using spss 22. values of skewness and kurtosis were within the expected range of chance fluctuations for all scales. to make the performance measures equivalent over cohorts, we transformed exam scores into cohort-specific z-scores. these transformed variables were used in all subsequent analyses. we provide descriptive statistics and reliabilities (table 1), as well as measures for differences between gender and prior education track. all analyses were based on a subset of students for which background characteristics, lre variables and performance data were all available (3355 of the 3451 students, 97%).
table 1.
means (m), standard deviations (sd), cronbach’s alpha (α) and test statistics for gender and prior mathematics education differences: t-value and cohen d-value

| scale | m | sd | α | gender t-value | gender d-value | math prior education t-value | math prior education d-value |
|---|---|---|---|---|---|---|---|
| adaptive cognitions: | | | | | | | |
| self-belief | 5.82 | 0.73 | 0.73 | 1.08 | 0.04 | 2.86** | 0.10 |
| valuing school | 5.84 | 0.67 | 0.67 | -5.15*** | -0.18 | 1.65 | 0.06 |
| learning focus | 5.95 | 0.73 | 0.80 | -9.65*** | -0.34 | -0.14 | 0.00 |
| adaptive behaviors: | | | | | | | |
| planning | 4.79 | 0.99 | 0.73 | -9.73*** | -0.34 | 0.15 | 0.01 |
| study management | 5.56 | 0.89 | 0.74 | -9.04*** | -0.32 | -2.66* | -0.09 |
| persistence | 5.34 | 0.85 | 0.78 | -6.79*** | -0.24 | 1.00 | 0.04 |
| impeding cognitions: | | | | | | | |
| anxiety | 4.50 | 1.27 | 0.83 | -16.12*** | -0.57 | -6.07*** | -0.21 |
| failure avoidance | 2.57 | 1.19 | 0.83 | 0.90 | 0.03 | -1.45 | -0.05 |
| uncertain control | 3.45 | 1.18 | 0.80 | -5.418*** | -0.19 | -4.58*** | -0.16 |
| maladaptive behaviors: | | | | | | | |
| self-handicapping | 2.43 | 1.08 | 0.81 | 5.68*** | 0.32 | -0.45 | -0.02 |
| disengagement | 1.97 | 0.90 | 0.74 | 7.09*** | 0.25 | 1.20 | 0.04 |
| academic control | 5.26 | 0.89 | 0.82 | 3.868*** | 0.14 | 13.68*** | 0.48 |
| learning-related emotions: | | | | | | | |
| anxiety | 3.85 | 1.11 | 0.91 | -11.41*** | -0.40 | -15.13*** | -0.53 |
| boredom | 2.94 | 1.13 | 0.93 | 7.65*** | 0.27 | -4.44*** | -0.16 |
| hopelessness | 3.01 | 1.22 | 0.94 | -7.18*** | -0.25 | -17.08*** | -0.60 |
| enjoyment | 4.11 | 0.92 | 0.85 | -0.55 | -0.02 | 10.40*** | 0.37 |
| performance outcomes: | | | | | | | |
| math performance | | | | -1.03 | -0.04 | 20.47*** | 0.72 |
| stats performance | | | | 1.68 | 0.06 | 11.87*** | .042 |
| bonus performance | | | | -6.70*** | -0.24 | 11.73*** | 0.41 |
| qm performance | | | | -1.00 | -0.04 | 18.41*** | 0.65 |

note: performance scores are normalized scores; concerning gender differences, a negative score represents female students; a positive score in the differences in previous math education represents math major.
3.2. bivariate correlations
bivariate correlations are reported in table 2. due to the large number of manifest variables, the correlation table contains scale values rather than individual item values for the survey data based on the aeq, acs and mes instruments. the four performance measures are manifest variables too.
table 2. correlations of scales of the aeq, acs, and mes instruments (1-16) and performance measures (17-20)
the signs of the bivariate correlations express the divide into adaptive and maladaptive constructs. adaptive cognitions and behaviours are positively correlated to 1) academic control, 2) the positive lre of enjoyment, and 3) performance measures. correlations with performance measures are however weak, and not fully consistent for study management. correlations of academic control and enjoyment with performance measures are stronger, and consistently positive. a reverse pattern exists for the maladaptive cognitions and behaviours: positively correlated to negative lres, negatively correlated to academic control, enjoyment and performance measures. however, within the motivation and engagement variables, anxiety is unique in that it acts as a maladaptive cognition dimension in relation to lres and performance. yet, it correlates weakly with other maladaptive mes variables, as well as with the adaptive constructs (learning focus, study management, and planning) but to a lesser degree.
3.3. structural models
separate structural equation models were estimated for each of the four performance constructs, each of them having identical relationships between the motivation and engagement latent constructs, and the latent constructs based on lres and academic control. figure 3 contains the diagram of the structural part of the structural equation model (leaving out the measurement parts of the lre, academic control and motivation and engagement constructs for reasons of readability), having only the mathematics score in the exam as performance construct.
it is relevant to mention that structural models for the other performance constructs deviate only in terms of the equation predicting the performance constructs, and these equations are provided at the end of this section, in table 3. all regression paths are expressed as standardized betas. structural models were estimated in two multi-group specifications: on the basis of gender, and on the basis of prior mathematics track in high school. both result in a rejection of invariant latent means, fully in line with the outcomes of the descriptive analyses: differences in mean scales between female and male students, and between students educated in the math major versus the math minor track, also show up as significant differences in latent means. however, at the stringent .01 significance level, no rejection of the hypothesis of invariant estimates in the variance-covariance structure was found: the structural relations appear to be the same for the subgroups. fit indices of both two-group models were nearly identical, with χ2 = 26,424 and 25,946 respectively, and identical measures for df = 9,030, cfi = .98, nnfi = .98, rmsea = .39, 95% ci rmsea = (.38, .39), for the structural models including the mathematics score as performance measure.
figure 3. path diagram of structural part with standardized estimates
3.3.1. testing hypotheses
in h1 we expected that the distal antecedents will have effects on both control appraisals and lres. in agreement with the cvtae (pekrun, 2006), academic control plays a central role in the antecedent-consequence relationship of adaptive and maladaptive cognitions and behaviours, and lres. academic control is a pure cognitive construct: it builds on contributions from adaptive and maladaptive cognitions, excluding any behavioural influence. impeding cognitions as a whole have a strong negative impact on academic control.
this is explained by the fact that impeding cognition is most strongly reflected by uncertain control (.76). at the same time, that effect is attenuated by the two paths of anxiety (.56) and failure avoidance (.68), which constitute the first order factor of impeding cognition. since behaviours, both of adaptive and maladaptive type, do not contribute to academic control, the relationships between behaviours and emotions are only direct ones. the paths originating from adaptive cognitions are fully in line with the hypotheses: positive impact on enjoyment (.13), negative impact on boredom (-.24). however, the maladaptive behaviours play a rather remarkable role. although bivariate relations are all in the hypothesised direction (positive with negative emotions, negative with the positive emotion), within the full structural model the additional impact of maladaptive behaviours on lres is positive for enjoyment (.40), whilst its impact on anxiety is negative (-.20). this results from a multiple relationship with collinearity amongst maladaptive cognitions and behaviours: for given levels of academic control and maladaptive cognitions, the additional effect of maladaptive behaviours is opposite to the bivariate effect. gender differences may also contribute to these reversed effects: male students score much higher than female students on maladaptive behaviours, but at the same time report less anxiety and hopelessness.
in h2 we assumed that control appraisals will influence enjoyment positively and anxiety, boredom and hopelessness negatively. as hypothesized and already shown in the bivariate relations analysis, academic control indeed has a strong effect on the four lres. these effects are positive for enjoyment and negative for the other three emotions. the strongest effect is observed for hopelessness (-.65).
the relations of academic control with enjoyment (.32) and with boredom (-.24) are weaker. the relation between academic control and anxiety (-.54) is rather strong and has a negative direction: the students in our sample are on average high in academic control (m=5.26), which might result in a rather lower level of anxiety (m=3.85).
in h3 we specified that the distal antecedents, direct antecedents and lres all explain student performance in the course. we notice a consistent and dominant role of academic control on performance, followed by a secondary role of hopelessness, with a crucial exception: the bonus score (which is composed of the digital homework and quizzes). this result is very plausible: for students high in hopelessness, it is rational to allocate relatively high levels of time and effort to learning in the digital tool, given its intensive scaffolding. since the share of the bonus in the overall score is much smaller than the share of the math and stats exam scores, the negative impact of hopelessness reappears in the overall score. a remarkable role is played by enjoyment: it impacts performance, as expected, positively for math; nevertheless, it impacts performance negatively in stats. again, this finding can be regarded as very plausible, due to the different nature of mathematics and statistics education. students who like mathematics a lot tend to prefer it over statistics. evidence for this claim is indirect: a t-test for independent groups indicates that students from the ‘math major’ track score differently in enjoyment, hopelessness and anxiety from students from the ‘math minor’ track. european ‘math major’ tracks focus on mathematics only, not on stats, and very often contain fewer statistics subjects than the ‘math minor’ track. since enjoyment has opposite impact on math and stats performance, it is no surprise that it drops out as explanatory variable in the total score, qm i performance.
lastly, self-handicapping enters as explanatory variable in one performance category: bonus. again, this is very plausible: it requires discipline to do all the homework, so students high in self-handicapping will underperform. since bonus has only a small share in the total score, it is not visible for qm i performance. for a more detailed overview of each variable’s contribution to each of the four performance outcomes, the relations between these variables are provided in the equations below (coefficients for each independent variable are expressed in standardized betas):
mathperformance = 0.32*academiccontrol + 0.06*enjoyment – 0.10*hopelessness
statsperformance = 0.27*academiccontrol – 0.10*enjoyment – 0.13*hopelessness
bonusperformance = 0.24*academiccontrol + 0.09*enjoyment – 0.16*selfhandicapping
qm1performance = 0.33*academiccontrol – 0.13*hopelessness
4. discussion
recent work suggests that learning-related emotions (lres) play a crucial role in performance especially in the first year of university, a period of transition for most students; however, additional research is needed to show how these emotions emerge. to explain this classical problem, we developed a framework which links two types of antecedents of lres: 1) the course-contextualized academic control in the control value theory of achievement emotions (pekrun, 2006) as a direct antecedent and 2) the generic adaptive and maladaptive cognitions and behaviors from the motivation and engagement wheel framework (martin, 2007) as distal antecedents. we used this framework to predict learning achievements in a mathematics and statistics course. the main findings of this study bring forth the emergence of four distinct lres (enjoyment, anxiety, boredom and hopelessness) and the fact that they stand largely apart from students’ individual performance. such findings are reassuring: although lres are important, they do not prevent students from performing academically.
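as a reading aid for the standardized equations reported at the end of section 3.3.1, the sketch below evaluates them for a hypothetical student. it is an illustration under simplifying assumptions, not the authors' model: the latent constructs are treated as if they were observed z-scores, residual variance is ignored, and the dictionary keys and the function name `predict_z` are hypothetical. the coefficients are the standardized betas given in the text.

```python
# Standardized betas as reported in the results section.
EQUATIONS = {
    "math": {"academic_control": 0.32, "enjoyment": 0.06, "hopelessness": -0.10},
    "stats": {"academic_control": 0.27, "enjoyment": -0.10, "hopelessness": -0.13},
    "bonus": {"academic_control": 0.24, "enjoyment": 0.09, "self_handicapping": -0.16},
    "qm_total": {"academic_control": 0.33, "hopelessness": -0.13},
}


def predict_z(outcome, z_scores):
    """Predicted standardized outcome; predictors missing from z_scores count as 0 (the mean)."""
    coefficients = EQUATIONS[outcome]
    return sum(beta * z_scores.get(name, 0.0) for name, beta in coefficients.items())


# Hypothetical student: one SD above the mean in academic control,
# one SD below the mean in hopelessness, average on everything else.
student = {"academic_control": 1.0, "hopelessness": -1.0}
for outcome in EQUATIONS:
    print(outcome, round(predict_z(outcome, student), 2))
```

read this way, a one standard deviation advantage in academic control corresponds to roughly a quarter to a third of a standard deviation in each performance outcome, which is larger than any single emotion coefficient in the equations.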
more importantly, the relations between lres and performance are rather weak when their antecedents are taken into account. in particular, in the mediational model comprising academic control, lres and performance, we see that academic control plays a central role in the development of the four lres investigated in our study as well as in the performance outcomes in the course. the direct relationship between appraisals and performance strongly dominates the indirect relationship through lres. next, academic control has a strong effect on all four lres, with the strongest impact observed for hopelessness and, secondarily, for anxiety. the model explaining the four lres is again of the mediational type. beyond the indirect effect through the control appraisal, there are direct effects from the four second-order motivation and engagement factors to the lres. in this part of the model, direct and indirect effects are fairly well balanced in size. academic control, on the one hand, builds solely on contributions from adaptive and maladaptive cognitions, where the main impact is explained by the uncertain control dimension of impeding cognitions. on the other hand, adaptive cognitions have a positive impact on enjoyment and a negative one on boredom. whereas impeding cognitions confirm the hypothesised positive relationship with the negative emotions, the maladaptive behaviours, surprisingly, impact the lres positively for enjoyment and negatively for anxiety. it seems that amongst students scoring high on maladaptive behaviour (amongst them an overrepresentation of male students), there exists a dislike of the learning activities (increased levels of boredom), but not of the learning content: high enjoyment, low anxiety. with respect to the implications for performance outcomes, the most consistent role is played by academic control; this is followed by hopelessness (with the exception of bonus, as detailed earlier).
finally, an important role is also played by enjoyment: it has opposite impacts on math (positive) and stats (negative) performance. our findings are consistent with earlier research on the central role of control appraisals in the emergence of achievement emotions (pekrun et al., 2002; perry et al., 2001) as well as in predicting performance at the course level (hall et al., 2006). this study also provides support for the positive relations between impeding cognitions and negative emotions (martin & marsh, 2006). in addition, it extends such evidence by showing that maladaptive behaviours influence enjoyment positively and anxiety negatively. we therefore extend the control value theory of achievement emotions (pekrun, 2006) by integrating the distal antecedents of emotions from the motivation and engagement wheel framework (martin, 2007). most notably, to the knowledge of the authors, the study is the first of its kind in using an integrated framework to ultimately explain achievement outcomes in the first year at university. we have provided a new approach to understanding students’ emotional experiences when they first enter university study. in this respect, the two theories are complementary: on one side, our results are an empirical validation of the cvtae; on the other side, the concepts operating in the mes could provide practical solutions for facilitating educational change in the classroom by using the influence these variables have on the experience of emotions.

4.1. additional findings

although not the main focus of this study, we find interesting gender patterns and effects of prior education. they are described separately. first, in our descriptive analysis, we find gender patterns that match earlier research (martin, 2007). females score significantly higher on all adaptive dimensions, with one exception: self-belief, where no significant difference is found.
the statistical significance of gender differences is, however, inflated by the large sample size; effect sizes are in the .2 to .4 range and are therefore small. with regard to the maladaptive dimension, we find the same pattern as described by martin (2007): maladaptivity expresses itself more strongly in the form of impeding cognitions in females, but in the form of maladaptive behaviours in males. the gender effect in anxiety is not only significant, but also medium in size, again in line with previous research (preckel, goetz, pekrun, & kleine, 2008). this divide between the cognitive and behavioural aspects of maladaptivity repeats itself in the lres. it is in boredom, the behavioural aspect of neutral emotions (see pekrun et al., 2002), that males score higher than females, and in the cognitive aspects of the negative lres, anxiety and hopelessness, that females score higher. the last gender effect refers to academic control, where male students score higher than female students, in line with outcomes of self-concept research (frenzel et al., 2007). the second effect we investigated refers to prior education: having been educated in high school in an advanced, rather than a basic, mathematics track. the impact on the generic dimensions of motivation and engagement is quite small, as is to be expected. students from the advanced track are higher in self-belief, but lower in study-management and anxiety; effect sizes are, however, very small. these findings contrast with the impact of prior education on the lres and academic control: the largest effect size, .6, is observed for hopelessness; for the rest we find medium-sized effects. these effects indicate that students from the advanced track are higher in enjoyment and academic control and lower in anxiety, hopelessness, and boredom.

4.2. limitations

using a large sample, our study proposed a framework linking control appraisals (as direct predecessor) with motivation and engagement concepts (as distal predecessors) in an attempt to better explain the emergence and consequences of lres in a first-year undergraduate mathematics and statistics course. however, we point out two limitations. first of all, our lres measures (assessed through self-reports) rely heavily on retrospective beliefs about emotions, which makes them subject to the same biases as the self-appraisals (robinson & clore, 2002). at the same time, self-reports remain the most reliable measure (zeidner, 1998) and, for that reason, the most extensively used approach, one that is able to capture students’ emotional experiences in an educational setting in a non-invasive manner. second, while in the present study we tried to answer how emotions emerge in an introductory course, an important question for future studies remains: how do students’ emotions change over different courses in the first year at university? future work should employ a longitudinal design spanning different course subjects and ideally covering an entire year of study.

4.3. recommendations for further research

some general recommendations should be outlined. first, our results showed that amongst students scoring high on maladaptive behaviour, there is a dislike of the learning activities (increased levels of boredom), but not of the learning content: high enjoyment, low anxiety. we propose that they solve this tension by designing their own learning trajectories, participating at a lower level in homework and quizzes (as evident from the role of self-handicapping in explaining the bonus performance), and preparing independently for the exam.
we mentioned earlier that the evidence gained in our study could potentially inform the design of educational interventions to improve academic achievement while, at the same time, supporting the building of emotionally sound learning environments. in this respect, a first aspect to consider would be that any educational intervention in the classroom should foster students’ sense of competency towards the specific learning activities required in a mathematics and statistics course. if such progress is achieved, then reinforcing, by means of feedback, the certainty of control over the activities and outcomes in which students engage is key. increasing students’ enjoyment and decreasing their hopelessness seems intuitive; still, these measures should be regarded in context, together with the factors from which they emerge, the maladaptive behaviours. if emotions are more difficult, and less desirable, to influence directly, addressing students’ maladaptive behaviours could be a reasonable solution.

4.4. conclusion

it can be concluded from our study that, next to the personal factors that make their contribution (especially in the development of academic control), it is the contextual experience in a course that shapes students’ emotional experiences and performance. besides all other known factors, emotions seem to play a central role in any learning process, both as an input and as a major educational outcome next to academic performance (pekrun, frenzel, goetz, & perry, 2007). therefore, learning about the factors that play a role in how these emotions develop, and how, in turn, they further influence academic outcomes, is crucial. good education should also care about how students feel and not only about how well they can perform academically.
keypoints

academic control impacts learning hopelessness most strongly
adaptive cognitions impact both learning enjoyment and boredom
maladaptive behaviours impact learning enjoyment and anxiety
achievement outcomes are mainly predicted by academic control and learning hopelessness

references

ainley, m., & ainley, j. (2011). student engagement with science in early adolescence: the contribution of enjoyment to students’ continuing interest in learning about science. contemporary educational psychology, 36(1), 4–12. doi:10.1016/j.cedpsych.2010.08.001
artino jr., a. r., & jones ii, k. d. (2012). exploring the complex relations between achievement emotions and self-regulated learning behaviors in online learning. the internet and higher education, 15(3), 170–175. doi:10.1016/j.iheduc.2012.01.006
astleitner, h. (2000). designing emotionally sound instruction: the feasp-approach. instructional science, 28(3), 169–198.
baker, r. w., & siryk, b. (1999). sacq student adaptation to college questionnaire (2nd ed.). los angeles: western psychological services.
cohen, j. (1992). a power primer. psychological bulletin, 112(1), 155–159. doi:10.1037/0033-2909.112.1.155
dettmers, s., trautwein, u., lüdtke, o., goetz, t., frenzel, a. c., & pekrun, r. (2011). students’ emotions during homework in mathematics: testing a theoretical model of antecedents and achievement outcomes. contemporary educational psychology, 36(1), 25–35. doi:10.1016/j.cedpsych.2010.10.001
frenzel, a. c., pekrun, r., & goetz, t. (2007). girls and mathematics: a “hopeless” issue? a control-value approach to gender differences in emotions towards mathematics. european journal of psychology of education, 22(4), 497–514.
goetz, t., frenzel, a. c., pekrun, r., & hall, n. c. (2006). the domain specificity of academic emotional experiences. journal of experimental education, 75(1), 5–29.
goetz, t., nett, u. e., martiny, s. e., hall, n. c., pekrun, r., dettmers, s., & trautwein, u. (2012).
students’ emotions during homework: structures, self-concept antecedents, and achievement outcomes. learning and individual differences, 22(2), 225–234. doi:10.1016/j.lindif.2011.04.006
hall, n. c., perry, r. p., ruthig, j. c., hladkyj, s., & chipperfield, j. g. (2006). primary and secondary control in achievement settings: a longitudinal field study of academic motivation, emotions, and performance. journal of applied social psychology, 36(6), 1430–1470.
hu, l., & bentler, p. m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1–55. doi:10.1080/10705519909540118
ibm corp. (2013). ibm spss statistics for windows, version 22.0. armonk, ny: ibm corp.
jöreskog, k. g., & sörbom, d. (1996). lisrel 8 user's reference guide. chicago: scientific software international.
leone, c., & richards, h. (1989). classwork and homework in early adolescence: the ecology of achievement. journal of youth and adolescence, 18(6), 531–548. doi:10.1007/bf02139072
liem, g. a. d., & martin, a. j. (2012). the motivation and engagement scale: theoretical framework, psychometric properties, and applied yields. australian psychologist, 47(1), 3–13. doi:10.1111/j.1742-9544.2011.00049.x
martin, a. j. (2007). examining a multidimensional model of student motivation and engagement using a construct validation approach. british journal of educational psychology, 77(2), 413–440. doi:10.1348/000709906x118036
martin, a. j. (2009). motivation and engagement across the academic life span: a developmental construct validity study of elementary school, high school, and university/college students. educational and psychological measurement, 69(5), 794–824. doi:10.1177/0013164409332214
martin, a. j. (2011). holding back and holding behind: grade retention and students’ non-academic and academic outcomes.
british educational research journal, 37(5), 739–763. doi:10.1080/01411926.2010.490874
martin, a. j., & marsh, h. (2006). academic resilience and its psychological and educational correlates: a construct validity approach. psychology in the schools, 43(3), 267–281.
martin, a. j., marsh, h. w., & debus, r. l. (2001). self-handicapping and defensive pessimism: exploring a model of predictors and outcomes from a self-protection perspective. journal of educational psychology, 93, 87–102.
pekrun, r. (2000). a social-cognitive, control-value theory of achievement emotions. in j. heckhausen (ed.), motivational psychology of human development: developing motivation and motivating development (pp. 143–163). new york, ny us: elsevier science.
pekrun, r. (2006). the control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice. educational psychology review, 18(4), 315–341. doi:10.1007/s10648-006-9029-9
pekrun, r., elliot, a. j., & maier, m. a. (2006). achievement goals and discrete achievement emotions: a theoretical model and prospective test. journal of educational psychology, 98(3), 583–597. doi:10.1037/0022-0663.98.3.583
pekrun, r., goetz, t., frenzel, a. c., barchfeld, p., & perry, r. p. (2011). measuring emotions in students’ learning and performance: the achievement emotions questionnaire (aeq). contemporary educational psychology, 36(1), 36–48. doi:10.1016/j.cedpsych.2010.10.002
pekrun, r., goetz, t., titz, w., & perry, r. p. (2002). academic emotions in students’ self-regulated learning and achievement: a program of qualitative and quantitative research. educational psychologist, 37(2), 91–106.
perry, r. p., hladkyj, s., pekrun, r. h., clifton, r. a., & chipperfield, j. g. (2005). perceived academic control and failure in college students: a three-year study of scholastic attainment. research in higher education, 46(5), 535–569. doi:10.1007/s11162-005-3364-4
perry, r. p., hladkyj, s., pekrun, r.
h., & pelletier, s. t. (2001). academic control and action control in the achievement of college students: a longitudinal field study. journal of educational psychology, 93(4), 776.
preckel, f., goetz, t., pekrun, r., & kleine, m. (2008). gender differences in gifted and average-ability students: comparing girls’ and boys’ achievement, self-concept, interest, and motivation in mathematics. gifted child quarterly, 52(2), 146–159.
putwain, d. w., larkin, d., & sander, p. (2013). a reciprocal model of achievement goals and learning related emotions in the first year of undergraduate study. contemporary educational psychology, 38(4), 361–374. doi:10.1016/j.cedpsych.2013.07.003
putwain, d. w., sander, p., & larkin, d. (2013). using the 2 × 2 framework of achievement goals to predict achievement emotions and academic performance. learning and individual differences, 25(0), 80–84. doi:10.1016/j.lindif.2013.01.006
robinson, m. d., & clore, g. l. (2002). belief and feeling: evidence for an accessibility model of emotional self-report. psychological bulletin, 128(6), 934–960. doi:10.1037//0033-2909.128.6.934
ruthig, j. c., perry, r. p., hladkyj, s., hall, n. c., pekrun, r., & chipperfield, j. g. (2007). perceived control and emotions: interactive effects on performance in achievement settings. social psychology of education, 11(2), 161–180. doi:10.1007/s11218-007-9040-0
schutz, p. a., & pekrun, r. (2007). introduction to emotion in education. in p. a. schutz & r. pekrun (eds.), emotion in education (pp. 3–10). san diego, ca us: elsevier academic press.
stupnisky, r. h., perry, r. p., hall, n. c., & guay, f. (2012). examining perceived control level and instability as predictors of first-year college students’ academic achievement. contemporary educational psychology, 37(2), 81–90. doi:10.1016/j.cedpsych.2012.01.001
tempelaar, d. t., niculescu, a., rienties, b., gijselaers, w. h., & giesbers, b. (2012).
how achievement emotions impact students’ decisions for online learning, and what precedes those emotions. the internet and higher education, 15(3), 161–169. doi:10.1016/j.iheduc.2011.10.003
tinto, v. (1997). colleges as communities: exploring the educational character of student persistence. journal of higher education, 68(6).
trautwein, u., schnyder, i., niggli, a., neumann, m., & lüdtke, o. (2009). chameleon effects in homework research: the homework–achievement association depends on the measures used and the level of analysis chosen. contemporary educational psychology, 34(1), 77–88. doi:10.1016/j.cedpsych.2008.09.001
verma, s., sharma, d., & larson, r. w. (2002). school stress in india: effects on time and daily emotions. international journal of behavioral development, 26(6), 500–508. doi:10.1080/01650250143000454
zeidner, m. (1998). test anxiety: the state of the art. springer.

frontline learning research vol. 8 no. 3 special issue (2020) 126–139 issn 2295-3159

experience and meaning in small-group contexts: fusing observational and self-report data to capture self and other dynamics

christine calderon vriesema a & mary mccaslin b
a university of california, santa barbara, usa
b university of arizona, usa

article received 17 may 2019 / revised 15 november / accepted 1 january / available online 30 march

abstract

self-report data have contributed to a rich understanding of learning and motivation; yet, self-report measures present challenges to researchers studying students’ experiences in small-group contexts. rather than using self-report data alone, we argue that fusing self-report and observational data can yield a broader understanding of students’ small-group dynamics. we provide evidence for this assertion by presenting mixed-methods findings in three sections: (a) self-report data alone, (b) observational data alone, and (c) the fusion of both data sources.
we rely on 101 students’ self-reported experiences as well as observational (i.e., audio) data of students working in their group (n = 24 groups). in section order, we found that (1) students’ self-reported small-group behavior predicted their end-of-study reported anxiety and emotion; (2) coded observational data captured five types of group dynamics that students can engage in; and (3) students’ initial group-level characteristics predicted their real-time group dynamics, and observed group regulation activity predicted students’ self-reported anxiety, emotion, and regulation moving forward. thus, while self-report and observational data alone can each increase our understanding of student motivation and learning processes, pursuing both in tandem more effectively captures the give-and-take among students, how these experiences evolve over time, and the personal meanings they can afford.

keywords: self-report; observation; small-group dynamics; motivation; co-regulation

corresponding author email: vriesecn@uwec.edu
doi: https://doi.org/10.14786/flr.v8i3.493

1. introduction

instruments measuring students’ motivation and learning processes have contributed to a rich understanding of students’ experiences in school. yet, the extent to which self-report measures adequately capture the learning process for all students across varying contexts remains an important concern (e.g., urdan & bruchmann, 2018). for researchers studying specific instructional contexts, self-report data can pose challenges to investigating motivation and strategy use in small groups. namely, self-report measures make it difficult to investigate how students’ reported behavior and emotion occur in real-time and in relation to other people in their immediate environments. when students work together, each person brings unique experiences and characteristics into their small groups.
how identity, disposition, motivation, and readiness to learn impact group functioning—and how individuals are impacted by their interactions with others over time—reflects a dynamic process that self-report data alone cannot capture. to better understand this complex learning environment, we pursued a longitudinal, mixed-methods study of 101 students’ small-group experiences during six math lessons (n = 24 groups from two third-grade and two fifth-grade classrooms). students completed self-report measures at pretest and at posttest. throughout the study, students also completed an instrument describing their individual small-group behaviors after each lesson. finally, after completing the study, students responded to items asking them how they would feel if their teacher asked them to get into small groups again. in addition to these self-report data, our project included real-time audio data of students working in their small groups. selected results of this study were briefly discussed as part of a larger chapter focusing on the guiding theoretical perspective (mccaslin & vriesema, 2018); we present the full study here for the first time. the present special issue aims to better understand the impact of self-report data on theory and practice (dinsmore & fryer, 2020). we contribute to this goal by specifically addressing two of the three guiding questions: how does the use of self-report constrain the analytical choices made with self-report data, and how do the interpretations of self-report data influence interpretations of findings? we situate both questions within the context of small-group research. we begin by briefly introducing the guiding theory. we then present our study’s findings in three sections depicting what we learn from self-report data alone, observational data alone, and integrating both data sources. 
1.1 theory

the co-regulation model (mccaslin, 2009) that guides this research is a motivation perspective positing that learners are social, have a basic need for participation and validation (mccaslin & burross, 2008), and differ in how and in what they participate (mccaslin et al., 2016). influenced by vygotskian tenets, this theory describes how three sources of influence function together to inform emergent identity. these sources are cultural (e.g., norms, challenges), social (e.g., relationships, opportunities), and personal (e.g., readiness to learn, disposition). students bring their personal backgrounds and characteristic adaptations to the classroom; yet, the opportunities presented to students and the relationships formed throughout their schooling experiences can shape who students become. given the dynamic processes described within this theoretical perspective, small learning groups present an opportune setting to study emergent identity. students in small groups each bring varying achievement levels, dispositions, and motivation to the task. however, the nature of the small-group instructional setting requires that students work together toward a common goal and negotiate challenges when necessary. when students work with each other across multiple occasions, small groups provide an opportunity to understand how student identity informs their work with other classmates and how these shared classroom experiences can shape student identity moving forward. some scholars have hailed small-group learning formats as the success story of educational psychology (johnson & johnson, 2009). small-group activities can enhance student thinking and learning of both formal (e.g., math) and informal (e.g., appropriate social skills, motivated student engagement) content and skills (e.g., elias & schwab, 2006; hadwin et al., 2018; webb, 2008).
however, while small-group learning has demonstrated benefits, there are also concerns that not all small-group activities are beneficial and that not all group members experience them similarly (rogat et al., 2013; webb, 2013). naturalistic observational studies examining the processes that actually occur within small groups and what students make of them are relatively scarce. extant research, however, suggests their importance (e.g., hadwin & järvelä, 2011; tan et al., 2005; webb, 2013). therefore, the dynamic processes occurring within small-group settings necessitate dynamic methodologies to study them. asking students about their experiences in small groups can yield important information regarding students’ interpretations of events, and researchers can investigate how students’ personal characteristics associate with these self-report data. however, self-report data alone cannot capture the give-and-take of small-group interactions. observational data alone can also fail to capture the full student experience: in the case of observation-only data, researchers rely on their own interpretations of events and miss students’ own self-reported experiences of those events. thus, combining self-report and observational data provides a foundation for more fully understanding how individual characteristics inform small-group dynamics, and how these dynamics inform student identity moving forward. to illustrate these points in finer detail, we present three sections that discuss (a) self-report data, (b) observational data, and (c) the fusion of both data sources in our research.

2. section 1: self-report data

this section relies on self-report data to illustrate how students’ reported small-group behavior associated with their characteristics at pretest and posttest.
first, we describe how students’ pretest characteristics, namely their teacher-ranked math readiness and their self-reported anxiety and emotional adaptation (i.e., context-dependent emotion and coping strategies; mccaslin et al., 2016), associated with their self-reported small-group behavior. second, we show how self-reported group behavior predicted students’ posttest anxiety, emotional adaptation, and reports of how they would feel if their teachers asked them to get into small groups again. to contextualize these results, we first describe the relevant method information.

2.1 procedure

students (n = 101) completed the pretest (october) and posttest (january) surveys that measured their anxiety and emotional adaptation. teachers also ranked each of their students on mathematics achievement at pretest. at the end of the study, students completed an instrument asking them how they would feel if their teacher asked them to get into groups again. throughout the study, students also completed short instruments immediately after each small-group lesson to indicate their behavior during the lesson. to enhance the clarity of the results, we present students’ average reported behavior (i.e., the average across the six lessons) below.

2.2 data sources

2.2.1 what school is like for me (wslm)

wslm is the test anxiety scale, a well-known, well-researched, and well-critiqued instrument (pekrun, 2006; zeidner & matthews, 2005) adapted from sarason et al. (1958). wslm asks students to agree or disagree with 18 sentences describing anxious thoughts and feelings. cronbach’s alpha for wslm was α = .76 at pretest and .71 at posttest.

2.2.2 school situations (ss)

school situations (ss; burggraf, 1993) is an adaptation of the test for self-conscious affect (tosca), a dispositional measure originally designed for adults and subsequently revised by tangney and colleagues to include children (e.g., tangney et al., 1995).
the ss inventory asks students to use a five-point scale to endorse sentences in response to 12 written vignettes that portray routine school challenges within three contexts: whole class, small group, or private/individual. sentences are behavioral representations of emotions (guilt, shame, or pride) and coping strategies (externalize, normalize). rather than consider the five ss scales (pride, guilt, shame, normalize, externalize) independently, as originally designed, we used five unique emotional adaptation profiles identified in previous research (mccaslin et al., 2016) for our analyses. the five profiles were: (1) distance and displace: the student attempts to withdraw from a difficult situation to care for the self and/or attempts to blame other people or things to find relief from feelings of shame; (2) regret and repair: the student attempts to repair or fix the situation and to care for the self through normalizing the event in order to find relief from feelings of guilt; (3) inadequate and exposed: the student assumes responsibility and blame for mistakes or difficulties without engaging in self-care or displacement strategies in response to negative emotion; (4) proud and modest: the student acknowledges success, but tempers feelings of pride with humility; and (5) minimize and move on: the student adopts a ‘just keep going, do not dwell, look beyond it’ escape response to mistakes and difficult situations. at pretest, cronbach’s alpha was .75, .79, .70, .72, and .64 for distance and displace, regret and repair, inadequate and exposed, proud and modest, and minimize and move on, respectively. in the same order, internal consistency reliability at posttest was .75, .87, .70, .75, and .66.

2.2.3 how i was in group today (how i was)

how i was presented 20 sentences to students and asked them to underline any that described their behavior in their group that day.
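the internal-consistency values reported for the scales above are cronbach’s alpha coefficients. as a reminder of what that statistic computes, here is a minimal sketch using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). it is not the authors' code: the function name `cronbach_alpha` and the small response matrix are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances throughout; needs at least two items and two respondents.
    """
    k = len(rows[0])                        # number of items on the scale
    items = list(zip(*rows))                # transpose: one tuple of scores per item
    item_variances = [variance(scores) for scores in items]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five students to a three-item scale (1-5 ratings).
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 2))
```

alpha approaches 1 when the items rise and fall together across respondents and drops toward 0 when they are unrelated, which is why it is read as a measure of internal consistency.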
sentences comprised three scales (mccaslin et al., 1994): (1) enhancing: sentences that represent engagement from which other group members may benefit; (2) neutral: sentences that represent participation that is neither active nor withdrawn; and (3) interfering: sentences that describe preoccupation with concerns of the self. the interfering scale consisted of items suggesting that students withdrew from or were unable to participate in small-group activity (e.g., “my stomach felt funny”; “my head hurt”) rather than engaging in behaviors that actively distracted or interfered with others in the small group. therefore, we subsequently refer to this scale as “withdrawn” to clarify this distinction.

2.2.4 how i felt

how i felt was designed to capture students’ thoughts and feelings when the teacher said it was time to get into their small group. it consisted of six items that described positive and negative emotional experiences in three relative domains: cognitive, affective, and physiological. interested (cognitive), happy (affective mood), and relaxed (physiological) comprised the “positive” scale (α = .85). confused (cognitive), sad (affective mood), and nervous (physiological) comprised the “negative” scale (α = .82). students used a 3-point scale (not at all, a little bit, a lot) to respond to each item.

2.3 results

2.3.1 pretest student characteristics and self-reported small-group behavior

students’ pretest anxiety and emotional adaptation did not associate with students’ self-reported small-group behavior. however, students with higher initial math readiness reported greater use of neutral regulation strategies, such as listening, during their small groups (r = -.27, p = .007; higher numbers indicate lower rank in math readiness).

2.3.2 self-reported small-group behavior and posttest student characteristics

we pursued a series of multiple regression analyses that controlled for students’ reported pretest anxiety and pretest emotional adaptation.
we did not control for group membership (e.g., using fixed effects models) because we believed that this might yield decontextualized results. in this paper, we focused on exploring how group processes shaped individual processes and vice versa; thus, we did not account for group membership in order to work toward this goal. however, we did attempt to cluster errors at the group level in our regression analyses in order to account for the shared variance within groups. unfortunately, we did not have a sufficient number of participants for the number of groups in our study to run this analysis effectively. as a result, we proceeded to use traditional multiple regression analyses here and subsequently in the paper. results indicated that students’ self-reported behavior in their small groups predicted students’ posttest anxiety (F(9, 72) = 4.16, p < .001; R² = .34, adjusted R² = .26), as well as two emotional adaptation profiles: regret and repair (F(9, 71) = 5.22, p < .001; R² = .40, adjusted R² = .32) and inadequate and exposed (F(9, 71) = 3.00, p = .004; R² = .28, adjusted R² = .18). specifically, reported use of enhancing regulation during small group predicted less anxiety at posttest (β = -2.32, p = .035). use of withdrawn regulation also predicted lower endorsement of regret and repair and inadequate and exposed emotional adaptation at posttest (β = -0.25, p = .01; β = -0.22, p = .051, respectively).

2.3.3 self-reported small-group behavior and posttest anticipated affect

we pursued a series of multiple regression analyses that controlled for students’ pretest anxiety and emotional adaptation to determine how students’ self-reported behavior during small group predicted their anticipated affect at posttest (i.e., when they imagined the teacher asking them to get into small groups again).
students’ self-reported behavior in small groups predicted their endorsement of both positive and negative affect (F(9, 75) = 3.00, p < .001, R² = .32, adjusted R² = .24; F(9, 75) = 3.45, p = .001, R² = .29, adjusted R² = .21, respectively). reported enhancing behavior during small group predicted greater anticipated positive affect (β = 0.50, p < .001); in contrast, reported withdrawn behavior predicted greater anticipated negative affect (β = 0.42, p < .001).

2.4 constrained analytical choices and interpretations

overall, the self-report data indicated how students’ reported small-group behavior associated with their personal characteristics and attitudes at pretest and posttest. specifically, average student-perceived enhancing behavior across the six lessons predicted lower anxiety at posttest and greater positive affect at the end of the study when students imagined getting into small groups again. in contrast, students who described themselves as withdrawn during their small groups felt more negative emotion when they imagined getting into small groups again. student-perceived withdrawn behavior also predicted less endorsement of inadequate and exposed and regret and repair emotional adaptation; thus, while withdrawing from participation might mitigate the potential for experiencing shame in small-group settings, it also prevents students from potentially developing strategies for overcoming interpersonal challenges with peers. although the interpretations of self-report data provided insight into how students’ perceived small-group behavior associated with their personal characteristics and expectations (e.g., affect), there are several important limitations. first, our analyses were constrained by individual-level data. the data allowed us to examine how students’ self-reported behavior associated with their pretest and posttest outcomes; yet, students do not participate in their small groups alone.
the constrained data sources prevented a more complete understanding of the give-and-take among students in these settings. second, our interpretations of the data relied purely on student reports. students’ individual interpretations of their classroom activities are vital to understanding their emergent identity; however, finding ways to corroborate self-report data with real-time data can enhance understanding of self- and other-awareness in small-group dynamics.

3. section 2: observational data

while section 1 illustrated associations with students’ self-reported small-group behavior, section 2 depicts students’ actual behavior during their small groups. in section 2, we describe the types of co-regulation dynamics that emerged during students’ small-group lessons and how the dynamics associated with the groups’ average achievement on the small-group tasks; the group is the unit of analysis. we present the observational results after describing the relevant procedures and coding systems.

3.1 procedure

three researchers independently analyzed, transcribed, and verified audio data of small-group interactions for three lessons (representing the beginning, middle, and end of the six lessons) for each group (n = 24 groups). the three researchers remained unaware of the larger study. two complementary observation systems were developed for analyzing the audio data. we describe the coding systems below.

3.2 data sources

3.2.1 group behavior checklist (gbc)

the first system, the gbc, is a lower-inference observation instrument that captured the range of on- and off-task behaviors that students displayed when working with others in small groups. this study used four gbc variable domains: (a) planning, (b) problem solving, (c) help-seeking, and (d) feedback. coding was completed in 30-second intervals. in total, 2,180 intervals were coded with the gbc. the percentage of exact agreement (91%) among coders was calculated on three coding pairs over three lessons.
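as an illustration only, the percentage of exact agreement reported above can be sketched in a few lines of python. the paper does not describe how agreement was aggregated across coding pairs, so this sketch assumes a simple unweighted mean over pairs; the function names, coder labels, and interval codes are all hypothetical and are not study data.

```python
from itertools import combinations


def exact_agreement(codes_a, codes_b):
    """Percentage (0-100) of intervals on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must rate the same intervals")
    if not codes_a:
        raise ValueError("no intervals to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def mean_pairwise_agreement(coders):
    """Unweighted mean of exact agreement over every pair of coders (an assumption)."""
    pairs = list(combinations(coders.values(), 2))
    return sum(exact_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical codes for eight 30-second intervals; invented for illustration.
coders = {
    "coder_1": ["plan", "solve", "solve", "help", "feedback", "solve", "plan", "help"],
    "coder_2": ["plan", "solve", "solve", "help", "feedback", "solve", "help", "help"],
    "coder_3": ["plan", "solve", "help", "help", "feedback", "solve", "plan", "help"],
}

pair_12 = exact_agreement(coders["coder_1"], coders["coder_2"])
overall = mean_pairwise_agreement(coders)
print(f"coder 1 vs coder 2: {pair_12:.1f}%")
print(f"mean over three coding pairs: {overall:.1f}%")
```

exact agreement does not correct for agreement expected by chance, which is one reason a chance-corrected index is sometimes reported alongside it.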
3.2.2 group environment summary (ges)

the second system, the ges, is a higher-inference system that captured students’ interpersonal and affective dynamics and expressed intrapersonal coping strategies. variable domains included group affective climate; giggle/laugh bursts; and types of aggressive, protective, regressive/escape, and somatic expressed coping behaviors. coding was completed in two halves: at the mid-point and end of each lesson. the percentage of exact agreement was 73% among three coding pairs over three lessons. see mccaslin and colleagues (2011) for more complete documentation of audio enhancement and transcription procedures; mccaslin and vega (2013) for coding system design, procedures, and application in the pilot study; and vega (2014) for implementation decisions for the revised system.

3.2.3 group achievement

student activity worksheets completed “by the group” as part of each lesson were scored and verified by two math educators for correctness. percentage correct represented students’ small-group achievement for the lesson material.

3.3 results

3.3.1 group dynamics

we represented the gbc and ges data as the percentage of intervals in which a behavior occurred. we then subjected the data from both observation systems to a principal components analysis using varimax rotation in order to develop an understanding of overall group dynamics from our discrete coding categories. results yielded five independent factors that collectively accounted for 61.92% of the variance in student small-group interactive behavior. factors, in order of magnitude, were: conflict and control, working together, resource drain, edgy compliance, and scuffle and confusion (see table 1 for example behaviors and percentage variance explained by factor). we consider these five distinct co-regulation dynamics that students can engage in while in small groups.

table 2. real-time co-regulation dynamics
note. only the top three positively loaded items for each factor are listed in the table. total number of items varied by factor: n = 18 for factor 1; n = 11 for factor 2; n = 7 for factor 3; n = 9 for factor 4; n = 5 for factor 5. the exploratory factor analysis yielded 8 cross-loaded items: 5 items loaded in the opposite direction, and 3 items loaded in the same direction.

the five small-group dynamics factors can be organized into relatively task-focused, other-focused, or the fusion of the two perspectives. in task-focused contexts, student dynamics primarily centered on the academic activity at hand, whereas dynamics in other-focused contexts reflected an emphasis on one’s group members. in addition, we can consider how types of coping behaviors typically associated with individual student behavior (aggressive, protective, and regressive) emerged as characteristics of group co-regulation dynamics. please see figure 1 for a visual representation of how the joint activity varied across the five small-group dynamics.

figure 1. small-group regulation foci
note. this figure was adapted from mccaslin and vriesema (2018).

the working together dynamic fused the demands of task and peers in small group learning within a protective press. group members could ask for assistance, disagree with each other, and offer suggestions and solutions without concern for personal safety. in comparison, an aggressive press encompassed both the relatively task-involved edgy compliance dynamic (in which provocative and aggressive behaviors were related to attempts to meet task demands) and the other-involved conflict and control dynamic (in which aggression and protection behaviors consumed group attention). finally, the scuffle and confusion dynamic in the disorganized pursuit of task demands and the resource drain of needy peers were each marked by regressive, or relatively immature, co-regulation dynamics.
taken together, these profiles did not represent particular groups per se; rather, they represented the types of co-regulation dynamics—that is, the types of observed behavior (e.g., communication patterns, coping strategies)—that emerged during the students’ time in small groups. please see table 2 for the means and standard deviations for the co-regulation dynamics. table 3 descriptive statistics for students’ co-regulation dynamics note. means and standard deviations reflect the percentage of time students spent engaging in each of the co-regulation dynamics. 3.3.2 group achievement scuffle and confusion negatively associated with the average percentage correct on group task activities (r = -.46, p = .024). group achievement did not associate with conflict and control (r = -.02, p = .929), working together (r = .09, p = .670), resource drain (r = .08, p = .718), or edgy compliance (r = .21, p = .321). 3.4 constrained analytical choices and interpretations we presented this section depicting observational data alone for two reasons. first, the extensive coding systems provided a framework for understanding the types of dynamics that can emerge in small groups. the researchers observed student behavior that informed behavioral co-regulation ranging from the ‘ideal’ working together dynamics to the aggressive give-and-take between students (e.g., conflict and control) to the disorganized task pursuits that embodied scuffle and confusion. these interpretations, therefore, yielded a broader understanding of students’ systematically observed behavior during their small-group lessons. second, we presented the observational data alone to illustrate that even with real-time data of students working in their small groups, we fail to understand what these dynamics can mean for students’ identity moving forward. relying on observational data alone constrained our analyses to correlations between small-group dynamics and group achievement on the small-group tasks. 
while this has the important benefit of aligning with the types of data available to teachers when they use small groups in their instruction, we argue that fusing observational and self-report data can provide a more nuanced understanding of students’ experiences in small groups as well as insight into what these classroom experiences can mean for students’ emergent identity. we provide evidence for this argument in the next section.

4. section 3: fusing self-report and observational data

rather than constraining self-report data to individual-level analyses or relying solely on group-level observational data, section 3 first illustrates how group-level characteristics associated with real-time group dynamics. specifically, we took the average of group members’ individual characteristics, such as emotional adaptation, to determine how the group’s overall approach to learning tasks associated with their group functioning during the small-group lessons. second, we describe how the give-and-take of small-group dynamics predicted students’ individual characteristics at posttest (i.e., their anxiety, emotional adaptation, and anticipated affect). third, while we have confidence in the reliability of our systematic coding procedures, researcher perceptions of small-group dynamics may not coincide with student perceptions of their small-group experiences. therefore, we also present results that depict the alignment between student- and researcher-perceived group behaviors.

4.1 results

4.1.1 pretest group characteristics and real-time group co-regulation dynamics

we created group-averaged scores for anxiety, emotional adaptation, and math readiness in order to determine how group composition associated with co-regulation dynamics.

anxiety. group-averaged anxiety did not associate with students’ group regulation dynamics.

emotional adaptation. group-averaged endorsement of inadequate and exposed positively associated with working together dynamics (r = .47, p = .021).
group-averaged endorsement of distance and displace positively associated with engagement in resource drain dynamics (r = .41, p = .048). finally, group-averaged endorsement of proud and modest negatively associated with edgy compliance group dynamics (r = -.42, p = .039).

math readiness. groups with a greater concentration of higher-ranked math students were more likely to display working together and edgy compliance dynamics (r = -.21, p = .049; r = -.23, p = .028, respectively). in contrast, groups with a greater concentration of lower-ranked math students were more likely to engage in scuffle and confusion dynamics (r = .37, p = .001).

4.1.2 real-time group co-regulation dynamics and posttest student outcomes

we pursued a series of multiple regression analyses controlling for students’ pretest characteristics to determine how real-time group dynamics predicted students’ self-reported anxiety, emotional adaptation, and anticipated affect at posttest.

anxiety. small-group dynamics predicted students’ self-reported anxiety, F(11, 66) = 3.97, p < .001; R² = .40, adjusted R² = .30. specifically, participating in groups that displayed working together and resource drain co-regulation dynamics predicted lower posttest student anxiety (β = -.27, p = .011; β = -.27, p = .012, respectively). although students may have used different strategies in the two co-regulation dynamics, receiving help from peers in both contexts may have associated with lower anxiety at posttest.

emotional adaptation. group dynamics predicted students’ endorsement of regret and repair at posttest, F(11, 65) = 4.09, p < .001; R² = .41, adjusted R² = .31. in particular, edgy compliance dynamics predicted greater regret and repair at posttest (β = .21, p = .038). group dynamics did not predict the other four emotional adaptation profiles.

anticipated affect. we explored whether small-group dynamics predicted how students would feel if their teachers asked them to get into their small groups again.
students’ observed small-group dynamics predicted their self-reported anticipated positive affect (F(11, 69) = 2.08, p < .001; R² = .25, adjusted R² = .13). specifically, participating in groups that displayed working together and edgy compliance dynamics predicted greater anticipated positive affect reported by students at the end of the study (β = .43, p = .004, and β = .30, p = .041, respectively). small-group dynamics did not predict anticipated negative affect.

4.1.3 alignment between self-report and observational group data

to examine the alignment between student and researcher perceptions, we (a) created group-averaged how i was scores for the same lessons for which we had coded data and then (b) examined the associations between self-reported group behavior and observed group behavior. group is the unit of analysis (n = 24). group-averaged reported enhancing behavior positively associated with working together and resource drain co-regulation dynamics (r = .27, p = .010; r = .31, p = .003, respectively). group-averaged reported withdrawn behavior positively associated with both resource drain (r = .24, p = .022) and conflict and control (r = .23, p = .027) dynamics. finally, group-averaged reported withdrawn behavior also negatively associated with the working together dynamic (r = -.29, p = .004).

4.2 interpretations

fusing the self-report and real-time observation data provided two main insights into students’ experiences in small groups. first, results indicated that students were aware of themselves within their groups. the self-reported small-group behavior aligned with the systematic observation data. for example, self-reported enhancing behavior positively associated with working together co-regulation, while self-reported withdrawn behavior associated with greater conflict and control and less working together co-regulation.
further underscoring the alignment between the two data sources, resource drain co-regulation (dynamics in which group members expressed needs, e.g., by asking for materials or attention) associated with both self-reported enhancing and withdrawn behavior. in this instance, groups consisted of members who provided help (enhancing) to those who needed and/or wanted it (withdrawn). thus, students as young as grade three appear to accurately self-monitor, and students as old as grade five appear willing to accurately report their small-group behavior. second, results indicated that students’ emotional adaptation and academic readiness were important features of small-group dynamics and personal learning. for example, participation in edgy compliance co-regulation dynamics, unpleasant as it may have been, predicted an increase in students’ posttest endorsement of the regret and repair emotional adaptation profile. this suggested that students were not only aware of the behavior exhibited in their small group but that they learned about themselves and others from that experience. in this instance, interpersonal dynamics informed intrapersonal endorsements that appeared to move the student away from their prior experience toward the person they wanted to become: the person who feels badly when failing to support another and works to make amends. students’ academic readiness also provided evidence for the press between intra- and interpersonal dynamics. groups with higher-ranked students, for example, were more likely to display working together and edgy compliance co-regulation dynamics. this suggests that students with higher math readiness have the potential to direct their resources in more productive (e.g., offering suggestions, asking questions) and less productive (e.g., bragging, refusing others’ participation) ways.
yet, in spite of the different real-time interaction patterns, the self-report data indicated that students learned from these experiences and that the small groups shaped students’ posttest characteristics. as noted with edgy compliance co-regulation, these dynamics predicted greater endorsement of positive regulation strategies moving forward (regret and repair); and, for the groups already working together, these dynamics predicted lower anxiety at posttest. 5. discussion in this paper, we addressed two of the three guiding questions in this special issue: how does the use of self-report constrain the analytical choices made with self-report data, and how do the interpretations of self-report data influence interpretations of findings? our goal was to illustrate the benefits and challenges of using self-report data to understand students’ experiences in and attitudes towards small groups. overall, using self-report data alone provided insight into students’ self-awareness and perceptions of their own behavior during small group. while useful for capturing the perceived student experience, relying purely on self-report data constrained our inquiry to individual-level analyses in ways that ignored the mutual give-and-take between the individual and their group members. thus, the primary challenge of using self-report data to study small groups is that we fail to capture the dynamic processes that are inherent in these social learning tasks. in other words, self-report data can illustrate how personal sources of influence (e.g., math readiness, emotional adaptation) shape students’ experiences and emergent identities; yet, we fail to also learn how students shape—and are shaped by—the mutual press between personal and social sources of influence in real time. although we emphasized the role of self-report data for understanding students’ experiences in small groups, this paper also identified strengths and limitations of observation-only data. 
for example, our real-time data corroborated research by ladd and colleagues (2014) in which students reported the (lack of) positive small-group behaviors displayed by their peers. the researchers noted that “substantial proportions of participants received average ratings that were so low…as to imply that they “seldom” or “never” exhibited such skills during collaborative classroom activities” (p. 169). our data provided insight into which behaviors and skills these peers might engage in instead. working together is great when it happens, but the pursuit of joint activity, in which disagreements are respected, questions appreciated, peer elaborations valued, and understandings deepened, does not represent the reality of the array of small group dynamics. instead, groups also display aggressive and regressive coping behaviors that result in more or less effective group functioning. yet, while observation-only data yielded a broader understanding of real-time behavior in small groups, we nevertheless failed to capture students’ perceived experiences within these dynamics. rather, by fusing self-report and observational data, we learned that small-group co-regulation dynamics were saturated with social and self-conscious emotions, and the uneven regulation of those emotions often did not proceed smoothly or turn out well. overall, students differed in their typical need to cope with learning difficulty, but coping with lack of control and uncertainty are part of what it means to be in a small group for most. to be in a small group with peers—classmates who vary in their own learning and social skills (ladd et al., 2014; rogat et al., 2013)—can exacerbate or attenuate that reality. thus, like others in this special issue (e.g., rogiers et al., 2020; van halem et al., 2020), we argue that using multiple data sources can yield broader understandings of student behavior in classroom settings. 
5.1 considerations the focus on self-report data in this paper and special issue warrants further discussion of survey data in particular. first, some critiques of self-report data question whether participants can provide accurate responses to researchers’ survey items. these concerns broadly reflect literature showing that participants sometimes fail to understand their own motives (nisbett & wilson, 1977) or provide opinions about events that did not happen (bishop et al., 1980). however, the present study provides evidence that students in grades three and five can (and are willing to) provide accurate reports of their time in small groups. students’ self-reported individual group behavior aligned with researcher-observed behaviors taking place during the small-group activities. for researchers, this suggests the utility of using self-report data in research on small-group processes, particularly when these survey measures can be corroborated with other data sources. furthermore, in line with prior recommendations (corno, 2011), teachers may also want to consider using brief surveys to better understand how small-group activities unfolded in their classrooms. of course, while this strategy may help teachers to refine these activities in their classes, future research will also need to determine whether students’ responses vary depending on whether students are reporting co-regulation dynamics to their teachers or to researchers. second, in addition to considering participants’ understanding of their own attitudes, some researchers express concerns about using self-report data due to the surveys themselves. these criticisms acknowledge that there may be aspects of any given survey that can prevent participants from responding as accurately as possible (duckworth & yeager, 2015). in the current paper, the internal consistency reliability coefficients generally provided one source of evidence for using these survey measures in our analyses. 
however, it is important to acknowledge that one of our five school situations factors, minimize and move on, fell below the recommended .70 for cronbach’s alpha (nunnally, 1978). thus, even though this factor—capturing student escape strategies—was identified in previous research using this same instrument (mccaslin et al., 2016), we encourage researchers to replicate this work to determine whether the five school situations factors emerge in their own samples, or whether some strategies do not translate across all contexts. in spite of the relatively low internal consistency for the minimize and move on factor, we are confident in our self-report measures, again due to the alignment between the survey and observational data in this study. 5.2 conclusion in sum, conceptions of small-group members in terms of their cooperative or regulatory skill set is a start that is likely to sputter without recognition of the fullness of individuals who have personal histories and concerns that make them more and less vulnerable to threat (frijda, 2008) and making threats. this calls for expanding conceptions of small-group cooperative skills and dispositions of individuals to include, for example, considerations of power and influence among group members. our observations of demanding behavior and provocative exchange suggest that students can consider power from a perspective of coercion and control rather than one of support and positive influence (keltner, 2016). students’ personal concerns and heightened perceptions of threat are part of power and influence dynamics. both are better understood within deliberate consideration of conflicts that may underlie and result from them. we do students a disservice when we fail to acknowledge the fullness of the task of working and learning with others. 
we also miss an opportunity to fully learn from the potential of small-group learning for students’ personal growth and well-being when we fail to use multiple research methods fluidly. thus, while self-report and observational data alone can each increase our understanding of student motivation and learning processes, pursuing both in tandem can yield richer understandings of students’ classroom activity, how these experiences evolve over time, and how that matters in the dynamics of being and becoming a student.

keypoints

students’ self-reported behavior during small group predicted their reported end-of-study anxiety and anticipated emotion.
real-time audio data indicated five distinct types of co-regulation dynamics that students can engage in within small groups (e.g., communication patterns, coping, etc.).
students’ initial group-level characteristics predicted their real-time co-regulation dynamics.
co-regulation dynamics during small group predicted individual students’ self-reported end-of-study anxiety, anticipated emotion, and emotional adaptation.
the real-time audio data corroborated students’ self-reported behavior during small group.

references

bishop, g. f., oldendick, r. w., tuchfarber, a. j., & bennett, s. e. (1980). pseudo-opinions on public affairs. the public opinion quarterly, 44(2), 198-209.
burggraf, s. a. (1993). school situations. unpublished manuscript. bryn mawr, pa: bryn mawr college.
corno, l. (2011). studying self-regulation habits. in d. h. schunk & b. zimmerman (eds.), handbook of self-regulation of learning and performance (pp. 361-375). new york: routledge.
duckworth, a. l., & yeager, d. s. (2015). measurement matters: assessing personal qualities other than cognitive ability for education purposes. educational researcher, 44(4), 237-251. https://doi.org/10.3102/0013189x15584327
elias, m. j., & schwab, y. (2006). from compliance to responsibility: social and emotional learning and classroom management. in c. m. evertson & c. s.
weinstein (eds.), handbook of classroom management: research, practice, and contemporary issues (pp. 309-341). mahwah, nj: lawrence erlbaum associates.
frijda, n. h. (2008). the psychologists’ point of view. in m. lewis, j. m. haviland-jones, & l. f. barrett (eds.), handbook of emotions, 3rd ed. (pp. 68-87). new york: guilford press.
fryer, l. k., & dinsmore, d. l. (2020). the promise and pitfalls of self-report: development, research design and analysis issues, and multiple methods. frontline learning research, 8(3), 1-9. http://doi.org/10.14786/flr.v8i3.623
hadwin, a. f., & järvelä, s. (2011). introduction to a special issue on social aspects of self-regulated learning: where social and self meet in the strategic regulation of learning. teachers college record, 113(2), 235-239.
hadwin, a. f., järvelä, s., & miller, m. (2018). self-regulation, co-regulation, and shared regulation in collaborative learning environments. in d. h. schunk & j. a. greene (eds.), handbook of self-regulation of learning and performance (pp. 83-106). new york, ny: routledge.
johnson, d. w., & johnson, r. t. (2009). an educational psychology success story: social interdependence theory and cooperative learning. educational researcher, 38, 365-379. https://doi.org/10.3102/0013189x09339057
keltner, d. (2016). the power of paradox: how we gain and lose influence. new york, ny: penguin press.
ladd, g. w., kochenderfer-ladd, b., visconti, k. j., ettekal, i., sechler, c. m., & cortes, k. i. (2014). grade-school children’s social collaborative skills: links with partner preference and achievement. american educational research journal, 51(1), 152-183. https://doi.org/10.3102/0002831213507327
mccaslin, m. (2009). co-regulation of student motivation and emergent identity. educational psychologist, 44(2), 137-146. https://doi.org/10.1080/00461520902832384
mccaslin, m., & burross, h. (2008). student motivational dynamics. teachers college record, 110(11), 2319-2340.
mccaslin, m., tuck, d., waird, a., brown, b., lapage, j., & pyle, j. (1994). gender composition and small-group learning in fourth-grade mathematics. elementary school journal, 94, 467-482.
mccaslin, m., & vega, r. i. (2013). peer co-regulated learning, emotion, and coping in small-group learning. in s. phillipson, k. y. l. ku, & s. n. phillipson (eds.), constructing educational achievement: a sociocultural perspective (pp. 118-135). new york, ny: routledge.
mccaslin, m., & vriesema, c. c. (2018). co-regulation: a model for classroom research in a vygotskian perspective. in d. m. mcinerney & g. a. d. liem (eds.), big theories revisited 2: research on sociocultural influences on motivation and learning. charlotte, nc: information age publishing.
mccaslin, m., vega, r. i., anderson, e. e., calderon, c. n., & labistre, a. m. (2011). tabletalk: navigating and negotiating in small-group learning. in d. mcinerney, r. walker, & g. liem (eds.), sociocultural theories of learning and motivation: looking back, looking forward (pp. 191-222). charlotte, nc: information age publishing.
mccaslin, m., vriesema, c. c., & burggraf, s. (2016). making mistakes: emotional adaptation and classroom learning. teachers college record, 118(2).
nisbett, r. e., & wilson, t. d. (1977). telling more than we can know: verbal reports on mental processes. psychological review, 84(3), 231-259. https://doi.org/10.1037/0033-295x.84.3.231
nunnally, j. c. (1978). psychometric theory (2nd edition). new york: mcgraw-hill.
pekrun, r. (2006). the control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice. educational psychology review, 18, 315-341. https://doi.org/10.1007/s10648-006-9029-9
rogat, t. k., linnenbrink-garcia, l., & didonato, n. (2013). motivation in collaborative groups. in c. e. hmelo-silver, c. a. chinn, c. k. k. chan, & a. m. o’donnell (eds.), the international handbook of collaborative learning (pp. 250-267).
new york: taylor & francis. rogiers, a.; merchie, e., & van keer, h. (2020). opening the black box of students’ text-learning processes: a process mining perspective. frontline learning research, 8(3), 40–62. http://doi.org/10.14786/flr.v8i3.527 sarason, s. b., davidson, k. s., lighthall, f. f., & waite, r. r. (1958). a test anxiety scale for children. child development, 29(1), 105-113. tan, i. g. c., sharan, s., & lee, c. k. e. (2007). group investigation effects on achievement, motivation, and perceptions of students in singapore. the journal of educational research, 100(3), 142-154. https://doi.org/10.3200/joer.100.3.142-154 tangney, j. p., burggraf, s. a., & wagner, p. a. (1995). shame-proneness, guilt-proneness, and psychological symptoms. in j. p. tangney & k. w. fischer (eds.) self-conscious emotions: the psychology of shame, guilt, embarrassment, and pride (pp. 343-367). ny: guilford press. urdan, t., & bruchmann, k. (2018). examining the academic motivation of a diverse student population: a consideration of methodology. educational psychologist, 53(2), 114-130. https://doi.org/10.1080/00461520.2018.1440234 van halem, n., van klaveren, c. p. b. j., drachsler, h., schmitz, m., & cornelisz, i. (2020). tracking patterns in self-regulated learning using students’ self-reports and online trace data. frontline learning research, 8(3), 142-164. http://doi.org/10.14786/flr.v8i3.497 vega, r. i. (2014). the role of student coping in the socially shared regulation of learning in small groups. unpublished doctoral dissertation. tucson, az: university of arizona. webb, n. m. (2008). learning in small groups. in t. l. good (ed.), 21st century education: a reference handbook (vol. 2., pp. 203–211). thousand oaks, ca: macmillan. webb, n. m. (2013). information processing approaches to collaborative learning. in c. e. hmelo-silver, c. a. chinn, c. k. k. chan, & a. m. o’donnell. (eds.), the international handbook of collaborative learning (pp. 19-40). 
frontline learning research vol. 5 no. 3 special issue (2017) 123-138 issn 2295-3159

visual expertise as embodied practice

jonas ivarsson (note 1)

university of gothenburg, sweden

article received 28 april / revised 21 november / accepted 23 march / available online 14 july

abstract

this study looks at the practice of thoracic radiology and follows a group of radiologists and radiophysicists in their efforts to find, discuss, and formulate issues or troubles ensuing from the implementation of a new radiographic imaging technology. based in the theoretical tradition of ethnomethodology, it examines the local endogenous practices pertaining to the radiologists’ expertise in the interpretation of visual representations and tries to explicate the ways in which they draw upon various resources in order to accomplish their professional tasks. as the study is addressing the topic of visual expertise, it also aims to do so in terms that acknowledge that all expertise is rooted in embodied practices. the analysis follows a case of what is called the enacted production of radiological reasoning. one of the central features of the described work is the manner in which it is carried out by way of the living present body of an expert. the experienced radiologist interweaves anatomical and technological terminology with visual representations and gestures in such a way that none of these components can be said to be superfluous to the argumentation. as a consequence, we should appreciate gestures and embodied actions as important means through which expertise becomes organised. these are parts of a repertoire of methods through which the experts learn their profession. in addition, gestures can also become enrolled in the re-negotiation of expertise in the face of new challenges.
keywords: visual expertise; radiology; ethnomethodology; body; gesture

note 1: contact information: jonas ivarsson, department of education, communication and learning, university of gothenburg, sweden. e-mail: jonas.ivarsson@gu.se

doi: http://dx.doi.org/10.14786/flr.v5i3.253

ivarsson | f l r 124

1. introduction

this study looks at practitioners in thoracic radiology. it follows a group of radiologists and radiophysicists in their efforts to find, discuss, and formulate issues or troubles ensuing from the implementation of a new radiographic imaging technology. by foregrounding this case, where professionals grapple to overcome some difficulties in interpreting new forms of radiographs, it becomes possible to examine the matter of visual expertise through a dual lens: it simultaneously presents us with a specialised area of expertise as something performed and as something talked about by the very same practitioners.

the profession of radiology has typically been described as the technical art of visually perceiving structures and pathologies by way of radiographs. it has been portrayed as a solitary practice where, for instance, the position of the radiologist’s eye in relation to the image can be examined for the ways that it will impact on the detection of pathologies (kundel, nodine, & toto, 1991). in this traditional view of radiological practice, the body plays an intriguingly subordinate role, and expertise in diagnosing x-rays is described as grounded in deep forms of cognitive processing (lesgold et al., 1988). when radiology in this way becomes prefaced by its function as visual assessments of representational objects, both the practitioners and their patients seem to figure merely as disembodied phantoms. the objective of this study, then, is to revisit this isolated focus on eyes and perceiving retinas and to bring the body back into the study of visual expertise.
the general model of perception as a process where sensation and movement are seen as intrinsically tied to visual understandings of form is itself not new (cf. myers, 2008). ideas of this kind have been advanced in the theoretical works of such scholars as maurice merleau-ponty (1962) and james gibson (1968, 1979). this particular study, though, will draw on insights generated within the tradition of ethnomethodology and studies of talk-in-interaction. as will be argued, this means that the analysis seeks to describe the work of the practitioners in its discipline-specific details. it looks at the local endogenous practices pertaining to the radiologists’ expertise in the interpretation of visual representations and the ways in which they draw upon various resources, including the body, in order to accomplish their professional tasks.

2. ethnomethodology

ethnomethodology is a form of social inquiry, “dedicated to explicating the ways in which collectivity members create and maintain a sense of order and intelligibility in social life” (have, 2004, p. 14). the tradition was founded by harold garfinkel in the 1950s and 1960s, and one of his key publications is “studies in ethnomethodology”, which was published in 1967. this book began with a densely phrased description of the enterprise that nevertheless captures much of what then came to be expounded:

ethnomethodological studies analyze everyday activities as members’ methods for making those same activities visibly-rational-and-reportable-for-all-practical-purposes, i.e., ‘accountable’, as organizations of commonplace everyday activities. (garfinkel, 1967, p. vii)

rather than offering a method of study, ethnomethodology turns an eye towards the methods used by members and makes those into its object of study. as a consequence, the analyses are not primarily aimed at generating new knowledge. rather, such studies seek to explicate what is already known and shared within a targeted group.
a possible objection to this restriction in scope could be that analyses of this kind would not amount to much. however, through the close descriptions and detailed accounts provided by the analyses, non-members can also be granted partial access to the inner workings of a practice. in this way the ethnomethodological analysis can also become a form of pedagogy (garfinkel, 2002).

one feature, central to the focus on members’ methods in garfinkel’s (1967) writing, is also the idea of accountability. the notion of accountability practices was originally borrowed from the legalities surrounding businesses, but in the hands of ethnomethodology it came to be applied to the entirety of social life. the idea here is that members of a practice design their actions, for others, so as to make those actions visible as what they are. for instance, a pedestrian aiming to cross a busy street usually makes sure that this “crossing-the-street” becomes a witnessable thing in the world for others (especially drivers) to see and relate to. it is in this sense that ethnomethodology has come to speak of social life as constituting a “witnessable order”. this view embodies a radical methodological departure from other social inquiries that work from a belief in an underlying or hidden order that can only be uncovered with the application of specific sociological methods or theoretical concepts (livingston, 2008). to ethnomethodology, social order is available for all to see and analyse; it is as available to laymen as it is to professionals.

this general approach is immensely useful when studying a vast array of social situations and actions. nevertheless, when we move into domains of specialised professional practice some methodological complications arise. first of all, professional actions such as operations on and in a physical or symbolic environment are social in a special sense.
when, for instance, a dentist is clearing out a root canal with a file, the physical actions are done as parts of a medical procedure and are primarily done to do that job. simultaneously, those same actions are also accountable actions within the medical practice of root canal treatments. as the manual domain-specific operations are executed they become witnessable by other practitioners. this means that if those actions are carried out incorrectly (according to the standards upheld by the profession) they can be called out, reprimanded or made into a case of medical malpractice. the upshot here is that some forms of professional conduct may chiefly be designed for, and thus accessible to, other professionals. this condition can make the study of expert performance and reasoning more difficult (for a further discussion see lynch, 1993).

there are different ways that this methodological difficulty has been managed. in their studies of archaeological excavations (goodwin, 1994), architectural reasoning (lymer, 2010) or gallbladder surgery (koschmann, lebaron, goodwin, & feltovich, 2011), the authors all chose to focus on educational settings, arrangements where skilled practitioners were instructing students in the professional forms of seeing and reasoning. in these settings the important things to be seen and known as professional objects became articulated and thereby also rendered available for an overhearing analyst. another possibility is instead that the analyst becomes a skilled member of the practice. in this tradition we find the highly detailed and most insightful analyses of such practices as improvised jazz (sudnow, 1978), mathematical proving (livingston, 1999) and the organisation of turn-taking in surfing (liberman, 2015).
in these latter studies, the authors can be seen adhering to what garfinkel (2002) termed the “unique adequacy requirement of method”, the policy dictating that an analyst must also be competent in the very methods that he or she is studying.

while the present study is adopting neither of these two approaches in any traditional sense, the case to be analysed has nevertheless been selected and worked on with the above-mentioned complications in mind. any analyst interested in the expertise pertaining to radiology is presented with a daunting challenge due to the obscurity of the work: as part of an ordinary day’s work, the practice of assessing radiographs is typically carried out in solitude. for this reason, the analysis presented here will focus on a specially organised meeting where a group of radiologists met with a group of radiophysicists to talk about their current skills in detecting pulmonary nodules (note 2) and some possible limitations of those skills. thus, the visual expertise within the group of radiologists was itself a topic for the discussion. whereas this situation presented more talk than would a solitary reading of radiographs, its analysis would still require some competency in the radiological matters discussed. in order to make any such analysis possible the author has collaborated with the group of radiologists and radiophysicists to the extent of doing joint analyses and co-authoring papers over several years. this form of research can be seen as an example of what garfinkel (2002) calls “hybrid studies of work”, a form of study that focuses on members’ methods in their discipline-specific details. grounded in the experience of this and other inter-disciplinary collaborations it is also suggested here that the requirement of unique adequacy could be somewhat reconsidered.

note 2: a pulmonary nodule is a small round or oval-shaped growth in the lung which is possibly malignant.
rather than standing as a requirement pertaining to each and every individual, it is perhaps better to consider the competency of the analysing collaborative or research team as a whole. the circumstances that brought about this specific collaboration and the studied situation will be addressed after some notes on the study of embodied interaction.

2.1 studying knowledge and the body

there is a growing body of ethnomethodological studies that address interaction in everyday and workplace settings and that also attend to embodied and material aspects of the communicative situations. as argued by charles goodwin (2000), traditional analytic and disciplinary boundaries have tended to isolate language from its environment. in order to avoid such separations he has aimed to “provide a systematic framework for investigating the public visibility of the body as a dynamically unfolding, interactively organized locus for the production and display of meaning and action” (2000, p. 1490). not only are bodies seen as central to the investigation of human action, they are implicated at a fundamental level in the very skills that people come to possess: “as the active body acquires skills, those skills are stored, not as representations in the mind, but as dispositions to respond to the solicitations of the situation” (streeck, 2015, p. 422). in this way knowledge has a tacit, gestural and even muscular side (griesemer, 2004), but these aspects are easily neglected once ideas become established and mastered on a personal level.

in relation to medical practice, “bodies” become implicated in a multitude of ways. stefan hirschauer (1991) has for instance addressed the link between physicians, patients’ bodies, and anatomical representations by looking at the work done during surgery. he argues that surgeons have to acquire two bodies in their education: their own trained body and the abstract body as learnt from textbooks and other representations.
when it comes to learning about anatomy, griesemer sees the origin of ideas to lie much “in the coordination of the senses, particularly sight and touch” (2004, p. 440). this interest in the relationship between hands and object in surgery has also shifted the focus of the observation “away from visual and cognitive models toward a focus on what happens at the interface of hands and instruments” (prentice, 2005, p. 840). at this interface, gestures can play different important roles (kendon, 1997). streeck (2009) has distinguished what he calls six “ecologies of gesture”, and two of these are most relevant in this context. first, gestures can select and elaborate features and significances of the world within sight and thereby orient participants to the visible environment beyond the reach of the hands. second, gestures can evoke phenomena that are not present and depict imaginary and abstract worlds.

in a study of how brain neuroscientists work with digital fmri scans, alac (2008) discusses how the ‘seeing’ of images is an embodied process that is achieved through a coordination of the visual information generated by the technical instruments with the world of meaningful actions and practical problem solving. in this work “gesture, talk and the manipulation of the digital screen function together as techniques for managing perception” (2008, p. 493). furthermore, as “the gestures participate in the interpretive act as an embodied enactment of the process of change” (p. 494), alac argues that the neuroscientists display a way of seeing images that involves the hands as well as the eyes.

a very similar argument is presented by slack, hartswood, procter, and rouncefield (2007) in their study of the diagnostic work of mammography. the authors characterize the reading of mammograms as “lived work” which is encompassed by “the arrangement of mammograms, gesturing and pointing to features on mammograms, manipulating mammograms, and annotations” (2007, p. 176).
they stress the importance of appreciating the social nature of the work and the embodied nature of reading and annotation. when the studied experts, in their practices of seeing/reading mammograms, incorporate such things as hands, pencils and gestures, one should note that “these techniques are not ad hoc workarounds but repertoires of manipulations that are an integral part of the embodied practice of realizing phenomena as what they accountably are” (2007, p. 182).

in relation to how professionals acquire skills during training, koschmann and lebaron (2002) investigated learners in various medical professions and their use of gestures in articulating their knowledge. the authors distinguish two ways that the notion of “articulating knowledge” can be conceived. first, gestures can be seen to reveal something about the learner’s current understanding. second, gesture can be treated “not only as an external manifestation of understanding but also as reflecting a constructive process of connection making” (2002, p. 252). it is with this latter view in mind that we turn to the present study and its interest in how gestures can become active means through which visual expertise is built and enacted: how gestures are performed in the service of sense making (cf. koschmann, lebaron, goodwin, zemel, & dunnington, 2007), both for self and others.

3. background to the studied setting

the analysed material stems from a collaborative research project carried out in radiology and radiation physics. on a general level, the project was addressing how advancements in imaging technologies were challenging existing forms of expertise and thereby imposing the development of new criteria and methods of interpretation. more specifically, the empirical material concerns the work following the introduction of a new radiographic technology, called tomosynthesis.
at the time of its implementation, tomosynthesis was recognized to have considerable advantages over ordinary chest radiography. in a first study, it was shown that the detection of pulmonary nodules was significantly higher for tomosynthesis than for chest radiography when used by experienced thoracic radiologists (vikgren et al., 2008). on the other hand, compared to the technology of computed tomography (ct), chest tomosynthesis has a limited depth resolution, which was considered a disadvantage for interpreting pathologies (johnsson et al., 2010). furthermore, since tomosynthesis at that time was a new technology, the knowledge of how to best analyse the resulting images was limited.

as a response to this new situation, a subsequent study was arranged. in this study, six observers analysed the same group of tomosynthesis cases (n = 89) for presence of pulmonary nodules in two reading sessions, with the purpose of measuring the difference in performance due to learning with feedback between the two sessions. the reading sessions were separated by a collective review session, at which the observers were given feedback on their analyses of an additional set of tomosynthesis cases (n = 25). the collective session also served the purpose of identifying pitfalls and formulating suggestions on how to avoid them (asplund et al., 2011; rystedt, ivarsson, asplund, johnsson, & båth, 2011). the present investigation will focus on the interaction between the participants during the collective review session.

3.1 recording and data processing

the review session lasted for almost six hours and was recorded with two cameras. a primary high definition camera was aimed at two projector screens set side-by-side displaying tomosynthesis and ct images. in order to help discriminate between the voices of the active participants at the session, a secondary standard definition camera was also installed and aimed at the group.
originally, the view provided by this camera was not intended to be included in the analysis. nevertheless, after the fact, this recording was found to display a number of interesting features and was subjected to further analysis.

in order for the radiologists to properly carry out their work of making discernments on the screens, the ambient light in the room had to be kept at a minimum. for the purposes of the recording, this resulted in dark and grainy video images that are difficult to print as stills on a page. when viewed this way, the embodied behaviours of the participants are easily lost. thus, the very phenomenon that this study seeks to explore evades a simple re-presentation. in order to overcome this analytic problem the events have been represented in a sequence of drawings. this is something that has now become common practice in many video studies (e.g., goodwin, 2007b; lindwall & ekström, 2012; melander & sahlström, 2009). rather than simply tracing the images provided by the video stills, the processing here has proceeded in a more roundabout way. after transcription (elan), episodes selected for further analysis have been digitally reenacted using 3d modelling software (poser). poser is a virtual film studio that centres on depicting the human figure in three-dimensional form. it allows for the recreation of some features of the setting, but primarily lets the user control the bodies of digital mannequins by way of their orientations, gestures and gazes. the outputted renderings have later been retouched (photoshop) and compiled together with the transcribed talk (indesign).

aside from proffering visually clear output, this procedure has had two main analytic advantages. first, there is an analogy to how the transcription of spoken interaction compels the analyst to focus on details of the talk that would not be attended to under normal conditions.
by engaging in the exact replication of body postures, the flexing of joints, the placement of limbs and the like, the analyst can get a handle on the situation that the simple tracing of outlines cannot offer. the three dimensions of the recorded bodies are momentarily recovered in the process. the second advantage works mostly to the benefit of the reader. the images selected for presentation are not necessarily tethered to the camera view. if a different angle provides a better view of an unfolding action, it can be selected effortlessly and thereby help the reader to get a better understanding of the events as they took place. whereas the original video is still understood as the primary data, (the transcripts and) the images should be seen in their capacity as analytic representations, on a par with the textual analysis itself.

4. analysis

in order to begin with the analysis, the relation between the studied session and ordinary practice has to be clarified. during an ordinary day’s work, the major task facing the radiologists typically concerns diagnostic work in clinical practice and their formulation of recommendations tailored to referring physicians. in line with this characterisation, the efforts undertaken during the review session could be understood as diagnostic work of a second order. the things to be diagnosed not only had to do with suspected nodules, but also most centrally concerned errors in the work of finding nodules. out of this exploration, recommendations informing further first order diagnostic work had to be formulated. in the materials, both of these aspects can be discerned. in some instances, the participants work towards the formulation of what things accountably are (as anatomical structures). in yet other instances there are attempts at formulating difficulties pertaining to the very process of diagnosis, difficulties thus instigated by the new technology.
as will become evident, these two processes of diagnostic reasoning are closely connected and, to a varying degree, involve interesting forms of embodied conduct.

the episode that we will examine in detail mainly follows an extended argument made by one of the senior radiologists. in order to enhance the readability of the 42-second-long sequence, the transcripts have been segmented into a number of figures (1–7). the labelling of these is merely meant to provide some clues as regards the evolving topic of the talk-in-interaction. it should also be noted that the separate images represent one continuous sequence with no omissions at the joints.

4.1 the tricky thing

just prior to the sequence, the group has been discussing some general features of the technology of tomosynthesis and the potential benefits of discovering centrally placed tumours in patients who also have pleural plaques (note 3). this part of the discussion is concluded with the notion that such a prospect will very much depend on the location of the pathology. at this point the radiologist anna opens up a somewhat different, but still related, point in relation to the specific materials that are currently displayed on the screens. as will be clear, on the topical level this stretch of talk is replete with the reported troubles of perception and understanding. in the short sequence examined here, the expression “to perceive” occurs no less than three times, and “to understand” is set up as a contrast to “believe”. the example is thus an endogenous formulation that speaks about some perceptual difficulties generated by the introduction of the new technology. the visual expertise of radiological diagnosis is thus both being demonstrated here and made into the very topic for the discussion.
however, rather than raising this as a general type of problem (which the cognitively associated terms could suggest), it is cast as a “setting’s trouble”, a form of problem that builds on, and refers to, the knowledge and practices shared by parties to that setting. still, as we will see, the articulation of this trouble is not straightforward, nor is it done by mere talk. anna commences with what the sociologist doug maynard calls an “embodied telling of a seeing” (2006, p. 107).

figure 1. establishing the referential grounds.

anna starts her contribution with the word “but”, a disjunction marker which could be heard as making a slight shift in topic in relation to previous talk. what follows is the formulation of this topic, i.e. “the tricky thing with tomosynthesis”. this initial characterization of a trouble functions as a preface to a telling (sacks, 1974) and thus as a “prospective indexical” (goodwin, 1996), an indexical expression whose referent is to be specified in ensuing talk.

note 3: pleural plaques are characterised by areas of fibrous thickenings on the lining of the lungs. although benign (not cancerous) they are the most common indication of significant exposure to asbestos.

the group sits across from two separate projector screens set side-by-side, one showing the ct and the other the tomosynthesis data. in [1-3] anna’s attention is directed at the rightmost screen showing the tomosynthesis image [4] and she also makes a brief, but fully extended, pointing gesture towards this screen [3]. next, [5] there is a shift in direction: topically, bodily and referentially. she moves her already extended arm to the left so as to point at the adjacent projector screen [6]. the words “a thing like that” also begin to specify the indexical referent more precisely. from the recording it is evident that the two objects pointed to here are both regarded as visible and publicly available for everyone present at the session.
the very finding of those objects is not the primary concern in this instance. however, it should be acknowledged that this was not always the case. the issue of discovering possible pathologies is a prerequisite for any subsequent diagnostic work and the increase in detection rates was also one of the critical features of tomosynthesis (vikgren et al., 2008). in this short passage, the two referential objects pointed to in succession (here marked by the added arrows in the tomosynthesis section image [4] as well as the ct counterpart [6]) become unified in that they are treated as denoting a single physical structure situated elsewhere. this “thing” is a previously discussed pathology: a plaque with a pleural basis (note 4).

figure 2. representing digital manipulations.

with the arm extended towards the ct screen and her hand held flat, anna makes two cutting or slicing movements. although done at a distance, this gesture builds on and gives meaning to a specific structure of the environment, namely the ct image [6]. it is an environmentally coupled gesture (goodwin, 2007a) which is done to indicate a cutting across the visible plaque [7]–[8]. furthermore, this gesture follows the anatomical plane known as the sagittal plane which roughly divides the patient’s left and right sides. her comment speaks about what-we-all-see in this image, as the object being in the centre of the lung [8]. through the response token from maria (“m”), there is confirmation that the argument is being followed thus far.

note 4: although pathological, a plaque is not a pulmonary nodule and mistaking it for one, as some of the radiologists had done, constitutes a case of a “false positive.” the ensuing reasoning exhibited here is aimed at minimizing such mistakes in the future.

having both secured the attention of the group and established the referential grounds for her further work, anna returns to the main topic of “the tricky thing”.
it is now developed into “the particular thing about depth” [9]. while keeping her forearm in place, she turns her wrist so that the back of the hand faces the screens (in terms of the anatomical planes this would be analogous to shifting from the sagittal to the coronal plane), and, as the word “depth” is produced with emphasis, she simultaneously moves the hand away from her. the significance of this gesture should be understood in relation to its material and social environment. as argued by lebaron and streeck:

it is our contention that gesture – certainly descriptive or ‘iconic’ gesture – necessarily involves indexical links to the material world, even though these links are rarely established or explicated in the communicative situation itself. rather, in conversational contexts that are detached from the talked-about world, participants must fill in encyclopaedic knowledge (ranging from universal bodily experiences to highly specific cultural practices) to see and recognize gestures. (lebaron & streeck, 2000, p. 131)

the particular movement made by anna [9] thus represents a case of such a highly specific cultural practice, an action that is most central to professional radiologists. in addition to indicating a movement in the ventral direction (toward the front of the patient) it also resembles one way of navigating in a set of section images. this is one of the central methods through which the sense of volume and location is built. in this way, the gestural action is not simply an embellishment to the talk (kendon, 1972); rather, it works to indexically tie the meaning of the word “depth” to a material and everyday radiological practice, known and shared amongst the participating radiologists.
the gesture becomes part of what koschmann and colleagues (2007) characterise as a gestural formulation, which in its design displays the speaker’s analysis of whom she is addressing; the gesture is selected and shaped because of its presumed recognisability to the members of the setting (schegloff, 1972).
figure 3. perceiving pleura.
in [10] anna starts a new construction with “when”. this clause begins an attempt to establish the setting for the exposition to come. she keeps her hands flat before her as if reading a plain x-ray or tomosynthesis image. after a couple of cut-offs and pauses she changes the sentence frame and in [11-12] again refers to the tomosynthesis image. she makes a deictic gesture toward the left-hand screen [11], then returns to regarding the flat image [12]. now the “tricky thing” is related to the act of perceiving the location of the object. the use of the adverb “pleurally” also performs classificatory work: since their search for nodules delimits their interest to objects that are located inside the lung, anything found in the pleura, the layer covering the lung, is to be disregarded in this particular task. after the clause “to perceive that that one is situated pleurally” there is another try at starting a new clause with “when”. this attempt is also abandoned, and we get a further qualification of the anatomical basis for the trouble: “since the chest vaults” [13]. again we find gesturing that is closely coupled with the practice of thoracic radiology. anna is continuously amalgamating the materials of anatomical and technological concepts, visual representations and embodied gestures alongside her otherwise vernacular talk. the gesture accompanying the entire stretch of talk [in 13] is repeated twice, and, in effect, maps this vaulting feature on to a generalized body. the frame of reference taken is one of an external observer where the object is created before her eyes.
but in the visual contrast between the flattened image and the portrayed 3-d object we get an early hint of a complication which becomes articulated next.
figure 4. subjective involvement.
next, the consequences of this rounded shape of the lung are commented on. the act of perception is again introduced into the talk, and this time it is formatted as reported speech “where in the lung am i”. simultaneously with the verbal comment about location, anna taps her chest twice [14]. at this point, her own body is enrolled as a new referential ground. she has thereby established a transition in the frame of reference, from that of an external reader of images to that of an idealized patient. in addition to the two screens, and the gesture space in front of her, she is now also designing her comments so that they make sense in relation to her upper body. in [15], the third attempt at starting with a “when”-construction is brought to its completion. together with her pointing gestures, it does the job of building a contrast between two locations [15-17]. in the two demonstrations, she is also timing her pointings with the deictic terms “he:re” [16] and “the:re” [17] (hindmarsh & heath, 2000). by shifting her gaze to where she is pointing she highlights the gesture for her interlocutors, but at the same time makes the gesture a tool for her own understanding (lebaron & koschmann, 2003). so, what is the role of her body, and the possible reasons for bringing it into the interaction here? for one thing, it should be seen as doing communicative work in relation to the other participants. it has, on the one hand, a rhetorical side, as a developing embodied argument (mirivel, 2011) which is clearly recipient designed (schegloff, 1972) for present parties. however, what anna has to do is not merely to “read” the images at hand and present that reading to her colleagues.
in the context of the self-reflective situation set up by the team, she’s struggling to express her current understanding of the relation between the unique (for-this-person) three-dimensional space constituted by the patient’s body, and how this is first mediated via the imaging technology and later represented in the radiographs. some of the work of formulating this relationship has a speculative or exploratory quality to it. and in the combination of these two aspects of communication-cum-speculation we find the specific features of the setting. the extensive gesturing carried out by anna becomes a way of organizing her reasoning so that it is made available to her peers. it thereby takes the form of a provisional radiological reasoning done of and for the setting. furthermore, it is done not so much to diagnose this patient, as to provide materials for generic formulations that speak to the renegotiation of this group’s expertise (for an analysis of this practice see lymer et al., 2014). this form of publicly oriented professional reasoning is partly done by way of a subjective involvement with the graphical materials (cf. ochs, gonzales, & jacoby, 1996). the subjective involvement is accomplished by linguistic as well as gestural means: by grammatically placing herself in the patient’s body (e.g., “where in the lung am i” [14]), and by gesturally positioning the observed structure as if it were located in her own body [16] & [17]. the interchangeability of these frames of reference further suggests close links between talk, gesture and the material environment. it is not because the professionals routinely handle the bodies of patients that a physical body is being involved in the argument here; radiologists predominantly work on representational objects. however, here and now anna’s body provides a three-dimensional structure aiding the installation of a specific contrast.
in other words, her body is made to double as a scaffold in the developing formulation of how tomosynthesis depicts volumetric information.
figure 5. perceiving depth.
immediately following the establishment of the two separate locations there is a third iteration speaking about the activity of perceiving [18] and [19]. this time, and in comparison to [9], [11] and [14], the formulation has become more succinct: “to perceive the depth in tomosynthesis”. the previous “vaulting” gesture is also reused and laminated with the described problem of perception [18], as is the “depth/ventral” gesture again overlaid with the word “depth” [19]. according to mcneill and levy (1993), when gestural forms are reused in this way they often mark the reappearance of a particular plot element, character, or narrative value. such gestural cohesion, as they call it, is thus an aspect of the process of creating and maintaining topical cohesion across turns at talk (mcneill & levy, 1993). here we find the two themes of, first, the shape of the thoracic cavity, and, second, the work of navigating in a stack of images to be gesturally reintroduced. in [20] anna further qualifies the categorization, or the distinction, that is at stake. next, she returns to, and elaborates on, the contrast.
figure 6. creating contrast.
with her right hand held vertically at the front of her chest, the first location is specified as “in the front” [21], whereupon she provides the alternate location. introducing it as something surprising, she points further back and to the side of the chest. this time she is using her left index finger while the right hand remains flat on the top of her chest [22]. through this particular configuration of talk, body and embodied action, the contrasting locations have now become publicly visible. the contrast being created here revisits the contrast described earlier [15-17] and we see again some recycled gestures.
where the problem before was characterized as one of accurately assessing depth within an imagined image, the problem is here made more vivid by illustrating how the vagaries of the tomosynthetic image could radically skew the evaluation of location in a human body.
figure 7. wrapping up.
finally, in [23-25] anna summarizes the argument and spells out the trouble in non-technical terminology. this concludes her extended contribution. another participant, lena, adds a comment about using the ribs and when they come into focus as a useful method for determining location. the comment is briefly elaborated and four of the participating radiologists then close this particular discussion on the note that “it is still difficult.”
5. discussion
the interest of this article has been to address the topic of visual expertise and to do so in terms that acknowledge that all expertise is rooted in embodied practices. to this end the episode discussed above has served as an example of what we can call the enacted production of radiological reasoning. albeit a special case, the expertise of radiological diagnosis that comes into play here is first demonstrated through the actions performed, but it is also talked about in the studied discussion. in this talk, terms such as ‘seeing’, ‘understanding’ and ‘perceiving’ figure as members’ matters. to quote slack and his colleagues on this phenomenon:
to be sure, members speak of seeing, noticing and other topics that are grist to the mentalists’ mill, but we have shown that they do so not in an isolated context (neither behind the skull nor as atomic ‘cognisers’) but in a manner where terms such as ‘seeing’, ‘noticing’ and so on are practical members’ achievements, achieved in and through natural language and embodied conduct. (slack et al., 2007, p. 192)
these ‘practical achievements’, or actions, show us a corporeal side of the expertise in interpreting visual representations.
not only are visual phenomena—in this case the existence of a pleurally based plaque—made into something observable and reportable through the deployment of a professional language. one of the central features of the above illustration is the manner in which this work is also carried out by way of the living present body of the expert. the experienced radiologist interweaves anatomical and technological terminology with visual representations and gestures in such a way that none of these components can be said to be superfluous to the argumentation. furthermore, the sequence encompasses, not gestures as a general phenomenon, but as specialized embodied conduct indexical (i.e. uniquely fitted) to projected images, practical actions, or specific locations in patient-bodies. so, by building on, and referring to, the matters and routines known and shared by the parties to the setting, these gestural actions also work to anchor the meaning of the talk in material and everyday radiological practice. in this capacity, the gestures act as aids in the bridging of interpretative gaps between the radiographic renderings and what those come to mean as professionally relevant objects. in the studied case these interpretative difficulties have been aggravated, due to the extraordinary situation of the newly introduced radiographic technology of tomosynthesis. however, this situation also allows for several fruitful observations. what is pulled into view are some transmutations, or movements from one medium to another, from the technologically mediated body of the patient into formulations of members’ understandings. and at this intersection we come very close to what is ordinarily regarded as expertise. we find exhibited production procedures through which the body of the patient is coordinated with the skilled body of the practitioner. 
without downplaying the relevance of functioning eyes and brains, the approach exemplified here can help us to appreciate gestures and embodied actions as critical means through which visual expertise becomes organised. these are parts of a repertoire of methods through which the radiologists learn their profession, and, as is evident here, they can also become enrolled in the renegotiation of expertise in the face of new challenges.
transcription legend
(0.5)  numbers in parentheses indicate silence, represented in tenths of a second.
(.)  a dot in parentheses indicates a “micropause.”
-  a hyphen after a word or part of a word indicates a cut-off or self-interruption.
::  colons are used to indicate the prolongation or stretching of the sounds just preceding them.
word  underlining is used to indicate some form of stress or emphasis.
[  separate left square brackets on two successive lines indicate the onset of overlapping or simultaneous talk.
> <  the combination of “more than” and “less than” symbols indicates that the talk between them is compressed or rushed.
acknowledgements
the study has been funded by the swedish research council (2015-03621) and has been a part of the letstudio, a strategic initiative for promoting interdisciplinary research within the learning sciences at the university of gothenburg. i would like to express my gratitude to all partners involved in this inspiring collaborative undertaking.
references
alac, morana. (2008). working with brain scans: digital images and gestural interaction in fmri laboratory. social studies of science, 38(4), 483-508. doi: 10.1177/0306312708089715 asplund, sara, johnsson, åse a, vikgren, jenny, svalkvist, angelica, boijsen, marianne, fisichella, valeria, flinck, agneta, wiksell, åsa, ivarsson, jonas, rystedt, hans, månsson, lars gunnar, kheddache, susanne, & båth, magnus. (2011).
learning aspects and guidelines regarding detection of pulmonary nodules and developing quality criteria for chest tomosynthesis. acta radiologica, 52, 503-512. doi: 10.1258/ar.2011.100378 garfinkel, harold. (1967). studies in ethnomethodology. englewood cliffs, nj: prentice-hall. garfinkel, harold. (2002). ethnomethodology's program: working out durkheim's aphorism. lanham: rowman & littlefield publishers. gibson, james j. (1968). the senses considered as perceptual systems. london. gibson, james j. (1979). the ecological approach to visual perception. boston, ma: houghton mifflin. goodwin, charles. (1994). professional vision. american anthropologist, 96(3), 606-633. goodwin, charles. (1996). transparent vision. in e. ochs, e. a. schegloff & s. thompson (eds.), grammar and interaction. cambridge, ma: cambridge university press. goodwin, charles. (2000). action and embodiment within situated human interaction. journal of pragmatics, 32, 1489-1522. goodwin, charles. (2007a). environmentally coupled gestures. in s. duncan, j. cassell & e. t. levy (eds.), gesture and the dynamic dimensions of language (pp. 195-212). philadelphia: john benjamins. goodwin, charles. (2007b). participation, stance, and affect in the organization of activities. discourse and society, 18(1), 53-73. griesemer, james. (2004). three-dimensional models in philosophical perspective. in s. de chadarevian & n. hopwood (eds.), models: the third dimension of science (pp. 433-442). stanford, ca: stanford university press. have, paul ten. (2004). understanding qualitative research and ethnomethodology. london: sage. hindmarsh, jon, & heath, christian. (2000). embodied reference: a study of deixis in workplace interaction. journal of pragmatics, 32, 1855-1878. hirschauer, stefan. (1991). the manufacture of bodies in surgery. social studies of science, 21, 279-319.
johnsson, åse a, vikgren, j, svalkvist, a, zachrisson, sara, flinck, a, boijsen, m, kheddache, s, månsson, l g, & båth, magnus. (2010). overview of two years of clinical experience of chest tomosynthesis at sahlgrenska university hospital. radiation protection dosimetry, 1-6. kendon, adam. (1972). some relationships between body motion and speech. in a. siegman & b. pope (eds.), studies in dyadic communication (pp. 177-210). new york: pergamon press. kendon, adam. (1997). gesture. annual review of anthropology, 26, 109-128. koschmann, timothy, & lebaron, curtis. (2002). learner articulation as interactional achievement: studying the conversation of gesture. cognition and instruction, 20(2), 249-282. koschmann, timothy, lebaron, curtis, goodwin, charles, & feltovich, p. (2011). 'can you see the cystic artery yet?' a simple matter of trust. journal of pragmatics, 43(2), 521-541. koschmann, timothy, lebaron, curtis, goodwin, charles, zemel, alan, & dunnington, gary. (2007). formulating the triangle of doom. gesture, 7(1), 97-118. kundel, harold l, nodine, calvin f, & toto, lawrence. (1991). searching for lung nodules. the guidance of visual scanning. investigative radiology, 26, 777-781. lebaron, curtis, & koschmann, timothy. (2003). gesture and the transparency of understanding. in p. glenn, c. lebaron & j. mandelbaum (eds.), studies in language and social interaction (pp. 102-112). mahwah, nj: lawrence erlbaum. lebaron, curtis, & streeck, jürgen. (2000). gestures, knowledge, and the world. in d. mcneill (ed.), language and gesture (pp. 118-138). cambridge: cambridge university press. lesgold, a., rubinson, h., feltovich, p., glaser, r., klopfer, d., & wang, y. (1988). expertise in a complex skill: diagnosing x-ray pictures. in m. chi, r. glaser, & m. farr (eds.), the nature of expertise (pp. 310-342). london: routledge. liberman, kenneth. (2015). turn-taking in the surfer's lineup. surfertoday.com. 
http://www.surfertoday.com/surfing/12275-turn-taking-in-the-surfers-lineup-an-academic-analysis-by-kenneth-liberman lindwall, oskar, & ekström, anna. (2012). instruction-in-interaction: the teaching and learning of a manual skill. human studies, 35, 27-49. livingston, eric. (1999). cultures of proving. social studies of science, 29(6), 867-888. livingston, eric. (2008). ethnographies of reason. hampshire: ashgate. lynch, m. (1993). scientific practice and ordinary action: ethnomethodology and social studies of science. cambridge: cambridge university press. lymer, gustav. (2010). the work of critique in architectural education. göteborg: acta universitatis gothoburgensis. lymer, gustav, ivarsson, jonas, rystedt, hans, johnsson, åse allansdotter, asplund, sara, & båth, magnus. (2014). situated abstraction. from the particular to the general in second order diagnostic work. discourse studies, 16(2), 182-212. doi: 10.1177/1461445613514674 maynard, douglas w. (2006). cognition on the ground. discourse studies, 8(1), 105-115. mcneill, david, & levy, elena t. (1993). cohesion and gesture. discourse processes, 16(4), 363-386. doi: 10.1080/01638539309544845 melander, helen, & sahlström, fritjof. (2009). in tow of the blue whale: learning as interactional changes in topical orientation. journal of pragmatics, 41(8), 1519-1537. doi: http://dx.doi.org/10.1016/j.pragma.2007.05.013 merleau-ponty, maurice. (1962). phenomenology of perception. london: routledge & kegan paul. mirivel, julien. (2011). embodied arguments: verbal claims and bodily evidence. in j. streeck, c. goodwin & c. lebaron (eds.), embodied interaction: language and body in the material world (pp. 254-263). cambridge: cambridge university press.
myers, natasha. (2008). molecular embodiments and the body-work of modeling in protein crystallography. social studies of science, 38(2), 163-199. doi: 10.1177/0306312707082969 ochs, elinor, gonzales, patrick, & jacoby, sally. (1996). "when i come down i'm in the domain state": grammar and graphic representation in the interpretive activity of physicists. in e. ochs, e. a. schegloff & s. a. thompson (eds.), interaction and grammar (pp. 328-369). cambridge, ma, england: cambridge university press. prentice, rachel. (2005). the anatomy of a surgical simulation: the mutual articulation of bodies in and through the machine. social studies of science, 35(6), 837-866. doi: 10.1177/0306312705053351 rystedt, hans, ivarsson, jonas, asplund, sara, johnsson, åse allansdotter, & båth, magnus. (2011). rediscovering radiology. new technologies and remedial action at the worksite. social studies of science, 41(6), 867-891. doi: 10.1177/0306312711423433 sacks, harvey. (1974). an analysis of the course of a joke's telling in conversation. in r. bauman & j. sherzer (eds.), explorations in the ethnography of speaking (pp. 337-353). cambridge: cambridge university press. schegloff, emanuel a. (1972). notes on a conversational practice: formulating place. in d. sudnow (ed.), studies in social interaction (pp. 75-119). new york: macmillan. slack, roger, hartswood, mark, procter, rob, & rouncefield, mark. (2007). cultures of reading: on professional vision and the lived work of mammography. in s. hester & d. francis (eds.), orders of ordinary action (pp. 175-193). aldershot: ashgate. streeck, jürgen. (2009). gesturecraft. the manufacture of meaning. amsterdam: john benjamins. streeck, jürgen.
(2015). embodiment in human communication. annual review of anthropology, 44(1), 419-438. doi: 10.1146/annurev-anthro-102214-014045 sudnow, david. (1978). ways of the hand: the organization of improvised conduct. cambridge, ma: harvard university press. vikgren, jenny, zachrisson, sara, svalkvist, angelica, johnsson, åse a, boijsen, marianne, flinck, agneta, kheddache, susanne, & båth, magnus. (2008). comparison of chest tomosynthesis and chest radiography for detection of pulmonary nodules: human observer study of clinical cases. radiology, 249, 1034-1041.
frontline learning research 2 (2013) 53-69 issn 2295-3159
corresponding author: martin salaschek, university of münster, fliednerstrasse 21, 48149 münster, germany, martin.salaschek@uni-muenster.de http://dx.doi.org/10.14786/flr.v1i2.51
web-based progress monitoring in first grade mathematics
martin salaschek a, elmar souvignier a
a university of münster, germany
article received 6 august 2013 / revised 27 november 2013 / accepted 11 december 2013 / available online 20 december 2013
abstract
the purpose of our research was to examine a web-based tool for mathematics progress monitoring in first grade. the newly developed assessment tool uses several robust indicators and curriculum-based measures forming three competences (basic precursors, advanced precursors, and computation) to determine comprehensive early numeracy skills in general education. 373 students completed a total of eight online tests every two or three weeks. results indicate that delayed alternate-form reliability was adequate (rm = .78). repeated measures analyses with post hoc comparisons were used to ascertain the sensitivity to assess learning growth. all three competences showed linear growth rates that were significant over time, but only computation and overall scores produced dependable increases from test to test.
predictive validity was determined using two standardised school achievement tests (end of first grade, end of second grade). results indicate high predictive validity of the first four online tests (rm = .67, rm = .66 for 6 months and 18 months prediction). correlations with teacher ratings of their students' skills confirmed this pattern. results from student and teacher questionnaires indicate that the students were able to conduct the tests independently and that a three-week interval was adequate for regular-education use. teachers reported using the progress monitoring results in diverse ways for classroom purposes. we conclude that the use of a web-based assessment setting with diverse measures is beneficial with respect to psychometric properties and feasibility for frequent use in general education.
keywords: early numeracy; mathematics; progress monitoring; web-based assessment
1. introduction
learning progress assessment aims at providing teachers with information about learning growth, and using diagnostic information for individualised instruction has been shown to result in higher learning gains (connor, morrison, & petrella, 2004; stecker, fuchs, & fuchs, 2005). especially in first grade, results from kim, petscher, schatschneider, and foorman (2010) show that the slope of learning is highly predictive for future achievement. however, stecker et al. note that teachers need assistance in interpreting and successfully using progress monitoring results. progress monitoring tools should therefore provide educators with reliable and comprehensive feedback about students' skills. for successful implementation in regular-education classrooms, high utility and feasibility are additionally required. this can be achieved with highly automated assessment and feedback systems.
traditional progress monitoring tools reliably and validly assess students' performance, but are time-consuming because they usually require face-to-face assessment. in addition, most tools for first grade consist of only a few different curricular tasks, making it difficult for educators to use results for adjustments in classroom work. in the present study, we examined psychometric properties and utility of a web-based progress monitoring tool for first-graders. the tool assesses early mathematics competences comprehensively and allows students to work on the tests independently without teacher aid.
1.1 early numeracy and later mathematical achievement
early numeracy plays a vital role for the development of later mathematics performance and general school achievement (aunola, leskinen, lerkkanen, & nurmi, 2004; duncan et al., 2007). thus, much research in the past decade has focused on the identification of relevant skills that children should be proficient in when entering school (berch, 2005; gersten, jordan, & flojo, 2005; jordan, kaplan, oláh, & locuniak, 2006; koponen, aunola, ahonen, & nurmi, 2007; methe, begeny, & leary, 2011; missall, mercer, martínez, & casebeer, 2012). certain number sense abilities seem to form precursors or even gateways for further mathematical achievement, but the definition of number sense remains vague (cf. berch, 2005, for an overview). unlike reading, in which well-defined precursors (such as phonological awareness) have been identified, numeracy seems to develop from a diverse set of mental processes which evolve during childhood. the triple-code model of number processing (dehaene & cohen, 1995; dehaene, 1992, 2011) describes three systems involved in different aspects of number processing (i.e., for nonverbal semantic representations; for verbal representations; and for written numerals) derived from a biological viewpoint.
these systems develop independently, and pathways are used for communication when solving mathematical problems. developmental models like the model of early mathematical development, which describes three levels of successional skills (krajewski & schneider, 2009; krajewski, 2008), take up a more growth-oriented stance. in krajewski's model, skills at the second level represent the linking of number words with quantities. these skills proved to be particularly predictive for mathematical achievement at the end of primary school (krajewski & schneider, 2009).
1.2 progress monitoring in early mathematics
students at risk of not reaching educational goals can be identified by assessing progress of essential skills, such as curricular abilities and number sense skills, which have been described as "gateway" skills for further mathematical development (clarke, baker, smolkowski, & chard, 2008, p. 48). subsequently, suitable interventions can be implemented. educators can use tools to monitor learning progress over time and thereby identify students who do not improve (at an acceptable rate). assessment tools for this purpose should reliably assess students’ performance level and its development, so that students at risk of not reaching curricular goals can be identified. furthermore, diagnostic information about curricular competences should be provided, which teachers can use for instructional changes. implementation should be efficient and as effortless as possible such that general classroom work is not hindered (förster & souvignier, 2011). one progress monitoring approach for this purpose is curriculum-based measurement (cbm; see deno, 2003, for an overview). in cbm, short tests of important curricular competences are conducted regularly. for early mathematics, the psychometric properties of several cbm tests have been discussed in the literature recently (e.g., chard et al., 2005; clarke et al., 2011; seethaler & fuchs, 2011).
much of the recent early mathematics cbm research focuses on a set of measures known as tests of early numeracy (ten). ten measures have demonstrated high levels of reliability and predictive value for later mathematics performance in a number of studies during kindergarten and first grade general education (e.g., baglici, codding, & tryon, 2010; chard et al., 2005; clarke & shinn, 2004; missall et al., 2012). ten consist of four measures: (1) oral counting, assessing the ability to count orally; (2) number identification, assessing the ability to verbally identify a written number between 0 and 20; (3) quantity discrimination, assessing the ability to identify the larger of two visually presented numbers; and (4) missing number, assessing the ability to name the missing number from a string of three numbers, with one of the three numbers missing. however, there are several issues still to be worked on if these measures are to serve as a basis for instructional changes in the classroom: first, as methe (2012, p. 68) notes, ten measures "struggle to capture more exact knowledge deficits" because they lack close relation to curricula. results are therefore hard for educators to interpret. measures that relate more closely to specific curricular goals might make it easier for educators to use the diagnostic information for classroom work or further interventions. second, reliability and predictive validity results of the four single measures vary from study to study (see missall et al., 2012, for an overview); missall et al. (2012, p. 96) note that a combination of several measures seems to result in elevated technical adequacy. as a consequence, the authors call for progress monitoring tools which assess early mathematics more comprehensively. third, with the recent exception of a study by hampton et al. (2012), most studies report results from only two or three data points and interpolate learning growth between them.
this procedure does not allow a timely evaluation of individual learning growth and also leaves open the possibility of non-linear growth patterns. this aspect is especially relevant in the light of low (interpolated) weekly growth rates that often do not exceed 0.30 points per week (foegen, jiban, & deno, 2007). low average growth rates make it more difficult to interpret stagnating scores as at-risk. finally, ten measures are time-consuming to implement because two of the measures (oral counting and number identification) require students to verbalize their answers and therefore can only be assessed in one-on-one settings. in general education, the time and effort needed are reasons why educators usually do not utilise early mathematics progress monitoring at all or regularly enough to make quick instructional adjustments possible.
1.3 aims of the study
in our study we aim to approach the aforementioned issues with a web-based progress monitoring tool for first grade mathematics which is feasible for frequent use in general education. the tool intends to assess mathematics skills comprehensively and includes both precursor and curricular competences. that way, educators are enabled to make inferences about students' strengths and weaknesses for classroom work or intervention. assessment time needs to be low and the retrieval and use of results as effortless as possible. psychometric properties of the test concept should be sufficient for dependable estimations of students' short-term and long-term curricular achievements and for the detection of learning growth. students should work on the tests in a motivated manner to obtain valid results. these aims lead to the following research questions:
(1) does the progress monitoring tool assess students' performance reliably?
(2) as measures of concurrent and predictive criterion validity, do the progress monitoring test scores correlate significantly with results from standardised achievement tests and teacher ratings of students' mathematics performance?
(3) are learning gains represented in the test scores, i.e., can increases in test scores be observed when testing frequently?
(4) do teachers and students rate the tool and its implementation as feasible for frequent use in general education?

2. method

2.1 participants and setting

two consecutive studies were conducted with a total of 373 first-grade students in 18 regular-education classrooms (see table 1 for demographics). the studies took place in rural and urban areas of germany. eight progress monitoring tests were conducted in both studies at intervals of either two weeks (study 1, november 2010 to march 2011) or three weeks (study 2, november 2011 to may 2012). figure 1 provides an overview of the time structure and main dependent variables of the two studies. in study 1, a number of additional measures were obtained: three different standardised paper-pencil tests (pp1-pp3) were conducted, each assessing the curricular competences relevant at its time point. pp1 was conducted immediately before the first progress monitoring test, pp2 immediately after the last progress monitoring test. eight of the 10 classrooms in study 1 (148 students) participated in a follow-up paper-pencil test approximately 14 months later at the end of second grade (pp3). teacher ratings of students' overall mathematical competence were obtained before each of the three school achievement tests. at the end of first grade, teachers were also surveyed about the feasibility of the web-based progress monitoring tool and their use of the results. students completed a short questionnaire about the progress monitoring test before pp2. the purpose of study 1 was to obtain detailed information about the tests' validity.
study 2 was then conducted to inspect reliability and sensitivity to learning in an extended time frame. after study 1, individual items were revised with respect to difficulty and parallelism in preparation for study 2.

because of student mobility or absence due to illness, some data were missing (progress monitoring tests: 0%-11%, m_missing = 1.8%; paper pencil tests: 0%-3.6%, m_missing = 1.7%; teacher ratings: 4.5%-23.2%, m_missing = 12.6%). we used multiple imputation with five imputed data sets to handle missing test data (newton et al., 2004). unbiased results can be expected from multiple imputation when data are missing at random (mar; see schafer & graham, 2002, for a discussion of the term) or when auxiliary variables are included in the imputation model which closely relate to the missing data (collins, schafer, & kam, 2001). given the number of strongly correlated variables in our study designs, we assumed that our inclusive multiple imputation model produced results that are not meaningfully biased. where applicable, coefficients reported in the results section were obtained by combining the imputed data sets using the formulas reported by rubin (1987, 1996).

table 1
demographics of study participants

                                        study 1      study 2
n                                       220          153
sex: girls                              51%          46%
sex: boys                               49%          54%
migration background                    22%          9%
age at first progress monitoring test   6.68 years   6.72 years

note. migration background was defined via language(s) spoken at home. students who spoke a language other than german at home were categorized as having a migration background.

figure 1. schematic overview of the time structure of study 1 and study 2. study 1 was conducted from november 2010 to june 2012, study 2 was conducted from november 2011 to may 2012. pp = paper pencil test.
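the pooling step mentioned in section 2.1 (combining results from the five imputed data sets with rubin's formulas) can be illustrated with a minimal sketch. it assumes that correlations are pooled on the fisher z scale, which is a common convention but is not stated in the text; the function name `pool_correlations` and the example values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_correlations(rs, n):
    """Pool one correlation that was estimated in each of m imputed data sets.

    rs: correlation coefficients, one per imputed data set
    n:  number of students each correlation is based on
    Returns (pooled_r, standard_error_on_z_scale).
    """
    m = len(rs)
    zs = [math.atanh(r) for r in rs]          # Fisher z transform (assumed convention)
    z_bar = mean(zs)                          # pooled point estimate on the z scale
    within = 1.0 / (n - 3)                    # sampling variance of a Fisher z value
    between = variance(zs) if m > 1 else 0.0  # variance across imputed data sets
    total = within + (1.0 + 1.0 / m) * between  # Rubin's total variance
    return math.tanh(z_bar), math.sqrt(total)


# Hypothetical correlations from five imputed data sets (illustrative only).
pooled_r, se_z = pool_correlations([0.62, 0.65, 0.63, 0.66, 0.64], n=220)
```

the between-imputation term grows when the imputed data sets disagree, so the pooled standard error reflects the uncertainty introduced by the missing values.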
2.2 progress monitoring measures

progress monitoring tests consisted of nine measures in three competences with a total of 52 problems (table 2 provides an overview of the measures used in the progress monitoring test in both studies). the tests were completely computerised, and students received detailed audio instructions before each new set of tasks via headphones to eliminate the influence of reading skills. all tasks were in multiple choice format, in which students clicked on the solution they thought to be correct. tests were untimed, and the children worked on them independently without teacher instruction. results were computed as percentage correct, and educators could access results (graphs and tables) at student and classroom level immediately after a test was completed by a student. results could be compared with class means or overall mean scores of all participating classrooms in the study, and results differing more than one standard deviation from the mean could be highlighted. during the two-week/three-week interval of each test, classrooms could choose to test all students during one class period (if computer rooms were available) or consecutively on computers in the classroom, e.g., during self-study periods. a time frame of two weeks per test was initially chosen for particularly close monitoring of learning growth. intervals were extended to three weeks in study 2 as a response to teacher feedback.

the test emphasized the gateway role of number sense by assessing two sets of precursor skills, basic precursors and advanced precursors. both competences were closely related to the triple-code model (dehaene & cohen, 1995) and krajewski and schneider's model of early mathematical development (krajewski & schneider, 2009). precursor measures were complemented by relevant curriculum-based computation skills. all measures included questions of varying difficulty to differentiate between weaker and stronger students.
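the scoring and highlighting rule described in section 2.2 (percentage correct, with results more than one standard deviation from the mean flagged) might be sketched as follows. this is an illustration only: the function names, the use of the sample standard deviation, and the example class are assumptions, not the tool's actual implementation.

```python
from statistics import mean, stdev


def percent_correct(n_correct, n_items=52):
    """Convert a raw score to percentage correct (the test has 52 problems)."""
    return 100.0 * n_correct / n_items


def flag_students(scores):
    """Flag scores more than one standard deviation from the class mean.

    scores: dict mapping student id -> percentage correct
    Returns a dict mapping flagged student ids to 'below' or 'above'.
    """
    values = list(scores.values())
    m, sd = mean(values), stdev(values)  # sample SD; an assumption
    flags = {}
    for student, score in scores.items():
        if score < m - sd:
            flags[student] = "below"
        elif score > m + sd:
            flags[student] = "above"
    return flags


# Hypothetical class of five students (raw scores out of 52).
raw = {"s1": 26, "s2": 31, "s3": 36, "s4": 49, "s5": 10}
scores = {student: percent_correct(k) for student, k in raw.items()}
flags = flag_students(scores)
```

in this hypothetical class, only the highest and lowest scorers fall outside the one-standard-deviation band; the same function could be applied with the overall study mean instead of the class mean.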
four parallel versions (a-d) of the test were created by using item-cloning algorithms for task creation and the selection of distractors (cf. clause, mullins, nee, pulakos, & schmitt, 1998): for every task, attributes that define its difficulty were identified and held constant in the parallel tests (e.g., for an addition task, the size of the second summand and whether crossing the tens boundary was necessary). throughout the school year, each of the four tests was conducted twice to obtain eight data points (sequence a-d, a-d).

basic precursors aimed at assessing fundamental skills that students should be proficient at when entering school. basic precursors contained the measures number discrimination (similar to the ten measure quantity discrimination), symbol quantity discrimination, and number identification (also similar to the corresponding ten measure).

advanced precursors aimed at more sophisticated precursor skills, which usually partly develop before school entrance and should soon be mastered during school. advanced precursors contained the measures number sequence 1/number sequence 2 (similar to the ten measure missing number and the next number task used by hampton et al., 2012) and number line, which assesses the extent to which a linear mental number line is developed (see siegler & booth, 2004, for a discussion).

computation aimed at the main curricular arithmetic goals of german first grade, i.e., handling numbers in the range of 1-20. computation contained addition and subtraction tasks as well as equation problems with dice.

table 2
description of progress monitoring measures

competence/measure              no. of items  range  example problem  distractors             task description
basic precursors                20
number discrimination           8             1-500                   64 | 38                 select the larger number
symbol quantity discrimination  6             1-10                                            select the picture with more shapes
number identification           6             1-100  audio: "28"      82 | 27 | 72 | 28 | 38  select the number that was given via audio
advanced precursors             17
number sequence 1               4             1-20   19, 18, ?        15 | 20 | 16 | 17       select the missing number (steps of 1)
number sequence 2               4             1-20   4, 6, ?          10 | 8 | 9 | 7          select the missing number (steps of 2)
number line                     9             1-20   audio: "12"                              select the number line that has a mark at the position of the number that was given via audio
computation                     15
addition                        5             1-20   6 + 5 = ?        9 | 10 | 11 | 13        select the correct solution
subtraction                     4             1-20   15 - 8 = ?       7 | 9 | 23 | 5          select the correct solution
equation                        6             1-10                    4 + 4 | 7 + 3 | 4 + 3   select the problem with the same solution as the dice problem

note. all measures contained problems of varying difficulty, e.g., lower or higher numbers. detailed task descriptions were provided via headphones in language suitable for children.

2.3 criterion measures

the three paper-pencil achievement tests in study 1 were selected for their curricular adequacy at the given time points. for example, at the beginning of grade 1, an achievement test suitable for whole classrooms cannot yet test curricular competences which are only expected to develop during the school year. for this reason, the osnabrück test of number concept development (otz; van luit, van de rijt, & hasemann, 2001) was chosen as pp1. the otz is suitable for children age 4.5 to 7.5 and assesses precursor skills such as counting, sorting, and comparing quantities. at the end of first grade, the german mathematics test for first grade (demat 1+; krajewski, küspert, & schneider, 2002) was chosen as end-of-year criterion (pp2).
the demat 1+ was developed following models of early mathematical development, but mainly assesses curricular goals from first grade, e.g., addition/subtraction in the range of 1-20 and (de)composition of numbers. at the end of second grade, the german mathematics test for second grade (demat 2+; krajewski, liehm, & schneider, 2004) was chosen for inspecting long-term predictive validity (pp3). the demat 2+ assesses the main curricular goals from second grade, e.g., basic arithmetic operations in the range of 1-100, number properties, and geometry problems. paper pencil tests were group-administered within one 45-minute period in all classrooms (see footnote 1). all paper-pencil data were collected and entered by trained university students. results were calculated automatically from raw test answers to prevent scoring errors. before each paper pencil test, teachers were asked to rate each of their students' overall mathematical competence on a 7-point likert scale.

2.4 usability and practicality

for study 1, several measures of the feasibility of the progress monitoring tests were assessed. students were surveyed about the computer tests after completion of all eight probes, asking (1) how they liked the tests, and (2) whether they would like to do more tests in the next school year. a 5-point likert scale using smiley faces was used as answer format. additionally, as a measure of direct usability, the time needed to complete each test was logged by the test system. finally, all 10 teachers from study 1 completed a survey about implementation time and their usage of test results.

3. results study 1

3.1 internal reliability

we computed the internal reliability for total scores and the three competences. mean reliability of total scores was .86 and varied within a narrow range, demonstrating good overall internal consistency.
reliabilities of the single competences were lower: while advanced precursors showed satisfactory reliability, coefficients of basic precursors and computation ranged from low to acceptable (see table 3).

footnote 1: otz tasks were slightly adjusted to allow group administration (no german standardised paper pencil test that originally allows group administration was available). for demat 1+ and demat 2+, one task was omitted that had not been introduced in any of the participating classes at the time of testing. thus, overall results are not directly comparable to the reference sample reported by the test authors.

table 3
internal consistencies of progress monitoring overall scores and competence scores

progress monitoring  overall score  basic precursors  advanced precursors  computation
time 1               .84            .65               .72                  .71
time 2               .86            .60               .78                  .71
time 3               .85            .62               .79                  .69
time 4               .85            .55               .81                  .74
time 5               .87            .65               .83                  .74
time 6               .86            .64               .80                  .76
time 7               .88            .66               .82                  .79
time 8               .88            .65               .84                  .79
m                    .86            .63               .80                  .74

3.2 concurrent and predictive validity

3.2.1 school achievement tests (see footnote 2)

as a measure of concurrent validity, correlations between the progress monitoring tests and grade 1 fall pp1 scores were moderate, with .40 ≤ r ≤ .50. to assess the progress monitoring tests' capacity to predict later mathematics performance early in the school year, correlations between the first four tests and grade 1 spring pp2 scores were calculated. coefficients were higher, with .64 ≤ r ≤ .71, indicating strong predictive validity for the end-of-year performance. correlations between the first four progress monitoring tests and pp3 scores at the end of grade 2 were only slightly lower, with .61 ≤ r ≤ .68. later progress monitoring tests related to the pp2 and pp3 scores to a somewhat lesser degree (see table 4).
footnote 2: our study design resulted in data with a hierarchical structure (students nested in classrooms), and some intra-class correlations (icc) suggested that error variances may be underestimated if this was not accounted for (the mean icc for all progress monitoring and paper pencil tests was .08). we therefore performed multi-level modelling (using mplus 7.11) in addition to single-level modelling for all correlational analyses in both studies. concerning correlations, the maximum absolute difference between the methods in study 1 and 2 was .04 and .03, respectively. the mean difference of all correlation coefficients was <.01 and .01, respectively, with multi-level mean correlations being marginally higher in study 2. furthermore, there was no meaningful difference in the mean standard error (mdiff < .01; the single maximum absolute difference was .03), and all p levels were identical. because of the relatively small number of classrooms and because single-level results are slightly more conservative, we report results from single-level analyses.

table 4
concurrent and predictive validity of progress monitoring scores

measure       1    2    3    4    5    6    7    8    9    10
1. time 1
2. time 2     .74
3. time 3     .70  .80
4. time 4     .67  .74  .76
5. time 5     .62  .69  .69  .73
6. time 6     .64  .67  .74  .77  .73
7. time 7     .59  .59  .70  .76  .74  .80
8. time 8     .54  .59  .66  .68  .68  .75  .76
9. pp1        .41  .50  .47  .44  .45  .47  .43  .40
10. pp2       .64  .66  .65  .71  .62  .58  .59  .61  .46
11. pp3 (a)   .61  .68  .65  .68  .51  .56  .57  .50  .42  .76

note. all correlation coefficients were statistically significant at an alpha level of p < .001. pp = paper pencil test. (a) n = 148.

3.2.2 teacher ratings

teachers' ratings of their students' mathematical ability were correlated with the progress monitoring test scores (see table 5). results initially revealed low to moderate correlations between the progress monitoring scores and ratings provided at the beginning of grade 1 (teacher rating 1; .29 ≤ r ≤ .42).
correlations with ratings provided at the end of grade 1 were substantially higher (teacher rating 2; .54 ≤ r ≤ .64) and remained stable for ratings provided at the end of grade 2 (teacher rating 3; .54 ≤ r ≤ .66), indicating high predictive validity.

table 5
correlations between progress monitoring scores and teacher ratings of students' mathematical ability, provided at grade 1 fall, grade 1 summer, and grade 2 summer

progress monitoring  teacher rating 1  teacher rating 2  teacher rating 3
time 1               .39               .60               .60
time 2               .40               .62               .66
time 3               .42               .64               .66
time 4               .37               .59               .62
time 5               .38               .54               .58
time 6               .34               .60               .59
time 7               .29               .54               .58
time 8               .37               .56               .54

note. all correlation coefficients were statistically significant at an alpha level of p < .01.

3.3 usability and practicality

median test time for the first progress monitoring test was 15.48 minutes (sd = 4.81). later test times were considerably lower and declined continuously, from 13.85 minutes for test 2 (sd = 4.37) to 8.20 minutes for test 8 (sd = 3.81). the difference between the first test and all other tests was partly due to the initial introduction to the test (approx. 1 minute) and to the students' unfamiliarity with the system.

in the survey about the progress monitoring tests, students rated the tests highly, with mean scores of 4.28 (sd = 1.05) on the question, "how did you like the tests?" and 4.34 (sd = 1.13) on the item, "would you like to do the tests again next school year?" (on a smiley faces scale from 1, very unhappy, to 5, very happy). 4% and 7% of the students rated the items negatively (scale points 1 or 2), compared with 71% and 78% positive ratings (scale points 4 or 5).

the 10 teachers who participated in study 1 gave similar estimations in the questionnaire provided after completion of the progress monitoring tests. on the 4-point likert scale (disagree to agree), all teachers agreed that "most of the students had fun completing the tests" (m = 3.70).
the same distribution of answers was found for the item, "the students were able to conduct the tests independently". nine teachers stated that the added benefit of the tool was worth the additional time and effort (m = 3.10). moreover, these teachers stated that they would continue to use the system in the next school year (m = 3.60) and recommend the program to colleagues (m = 3.50).

teachers declared that they used the progress monitoring results in diverse ways for classroom purposes. apart from obtaining general performance information at student and class level (100%, 70% agreement, respectively), teachers found the information especially useful when they were previously unsure of a student's performance (70% also used the system for this purpose). most teachers adjusted their estimate of students' performance for some students (80% agreement) and claimed to have at least sometimes given weaker or stronger students adjusted exercises based on progress monitoring test results (70%, 90% agreement for weaker or stronger students, respectively). eight teachers stated that supplementary education for weak students was offered at their schools, and information from the progress monitoring tests was used for designing the supplementary education at six of these schools. a majority of respondents also found the information important for communicating about performances with students, parents and fellow teachers (90% agreement). the main concern of several teachers participating in the study was the two-week time frame per test in that study. they wished for three-week testing intervals to allow more time for analysing and working with the results.

4. results study 2

while study 1 evaluated the test's validity as well as its usability and practicality, study 2 focused on the reliability and sensitivity to learning. in line with the different aims of the two studies, analyses also differed between the studies.
additionally, given the extended test intervals and because some of the test items were adjusted concerning their difficulty for study 2, results differ slightly from study 1.

4.1 alternate-form reliability

we calculated the delayed alternate-form reliability for each pair of adjacent tests (t1 × t2, t2 × t3, … t7 × t8). coefficients ranged from r = .71 to .83 (m = .78), which indicates parallelism across tests. parallelism is also indicated by the pattern of correlations between non-adjacent tests (see table 6), which decreased only slightly with increasing amount of time between the probes (e.g., test 1 × test 4).

table 6
delayed alternate-form reliability of progress monitoring scores, study 2

progress monitoring  1     2     3     4     5     6     7
1. time 1
2. time 2            .71
3. time 3            .65   .74
4. time 4            .68   .76   .81
5. time 5            .67*  .71   .78   .82
6. time 6            .60   .60*  .64   .74   .77
7. time 7            .57   .63   .67*  .74   .77   .79
8. time 8            .59   .67   .69   .69*  .75   .76   .83

note. correlations of same test forms (tests 1 and 5, 2 and 6, 3 and 7, 4 and 8, following the form sequence a-d, a-d) are marked with an asterisk (*). all correlation coefficients were statistically significant at an alpha level of p < .001.

4.2 sensitivity to learning

the test's overall capacity to assess learning gains was determined by calculating growth rates in test scores using linear regression for the eight tests. weekly growth rates were obtained by dividing the resulting slopes by 3 because of the three-week time frame of each test. weekly increases in overall scores of 1.0 percent points could be observed (see table 7; descriptive statistics for study 1 are listed in the appendix), with larger weekly gains for advanced precursors and computation skills than for basic precursors. smaller basic precursors gains are mainly due to the symbolic quantity discrimination task which revealed ceiling effects from the first probe (see figure 2).
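the growth-rate calculation described in section 4.2 (slope of a linear regression over the eight tests, divided by 3 for the three-week interval) can be sketched as follows. the sketch regresses the group means for the overall score reported in table 7 on test number; whether the authors fitted the regression to group means or to individual students is not stated, so this is an assumption, and the function names are hypothetical.

```python
def ols_slope(ys):
    """Least-squares slope of ys regressed on test number 1..len(ys)."""
    n = len(ys)
    xs = range(1, n + 1)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def weekly_growth(ys, weeks_between_tests=3):
    """Convert the per-test slope into a per-week growth rate."""
    return ols_slope(ys) / weeks_between_tests


# Mean overall scores (percentage correct) for tests 1-8 in study 2 (table 7).
overall_means = [62.1, 66.5, 70.8, 74.3, 75.6, 78.1, 82.8, 81.2]
rate = weekly_growth(overall_means)
```

with these means the sketch yields roughly 0.95 percentage points per week, in line with the reported weekly growth of about 1.0; small differences may stem from rounding of the published means or from a different estimation procedure.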
table 7
descriptive statistics and growth rates for competences, study 2

                     overall score  basic precursors  advanced precursors  computation
progress monitoring  m      sd      m      sd         m      sd            m      sd
time 1               62.1   11.7    79.4   13.0       55.9   19.0          46.1   15.4
time 2               66.5   14.0    79.1   12.6       65.6   21.9          50.6   19.1
time 3               70.8   14.5    83.1   12.4       66.6   23.1          59.1   19.5
time 4               74.3   14.8    84.6   10.9       73.0   23.0          61.9   22.9
time 5               75.6   13.6    86.1   10.5       72.5   20.5          65.0   20.7
time 6               78.1   14.5    87.0   11.2       74.3   20.6          70.6   21.5
time 7               82.8   13.4    90.1   9.6        80.2   21.3          76.0   21.3
time 8               81.2   14.4    88.5   11.1       77.8   22.6          75.2   22.6
growth rate          1.0            0.5               1.0                  1.5

note. all scores as percentage correct. growth rates are weekly growth rates, calculated as slopes of linear regressions of the 8 tests divided by 3 (because of the three-week delay between each test in study 2).

figure 2. growth rates for single measures in study 2 (n = 153).

statistical significance of growth rates for overall scores was examined by conducting repeated-measures analyses of variance. mauchly's test revealed a violation of sphericity (p < .001). thus, greenhouse-geisser corrections were used (greenhouse & geisser, 1959). results indicate an effect of time, f(5.50, 836.18) = 137.73, p < .001, η² = .48. there was also a significant effect of time for the three single competences: basic precursors, f(6.22, 945.13) = 35.14, p < .001, η² = .19; advanced precursors, f(6.04, 917.67) = 51.47, p < .001, η² = .25; and computation, f(5.63, 855.82) = 96.95, p < .001, η² = .39. post hoc tests were performed to test for significant increases from test to test. all six increases in total scores from test 1 to test 7 were significant (see table 8). however, scores decreased from test 7 to test 8. for basic precursors and advanced precursors, 4 and 3 of the six increases from test 1 to 7, respectively, were significant (p < .05) as well as all six increases for computation scores.
decreases from test 7 to 8 were significant only for advanced precursors, t(152) = 1.69, p = .049.

table 8
comparisons of mean differences in progress monitoring scores for study 2

comparison        mean score difference (sd)  t      df   p
time 1 - time 2   -2.26 (5.20)                -5.37  152  < .001***
time 2 - time 3   -2.25 (5.32)                -5.23  152  < .001***
time 3 - time 4   -1.80 (4.73)                -4.72  152  < .001***
time 4 - time 5   -0.67 (4.46)                -1.87  152  .031*
time 5 - time 6   -1.33 (5.01)                -3.28  152  .001**
time 6 - time 7   -2.45 (4.77)                -6.18  152  < .001***
time 7 - time 8   0.86 (4.23)                 2.24   152  .014*

5. discussion

the current study extends the research on progress monitoring for young students by using an automated assessment tool that allows frequent tests in regular-education settings and provides educators with detailed information about students' skills. the primary goal of the study was to determine the adequacy of the newly developed progress monitoring tool. first-grade students work independently on the short online tests, so that diagnostic information about students' performance and progress is obtained with minimal instructional time. the tool uses a combination of robust indicator and curriculum sampling approaches to comprehensively assess nine short measures of mathematics performance forming three competences. static scores and longitudinal psychometric properties were investigated alongside feasibility and usefulness for instructional changes.

first, with regard to reliability, the overall scores of the progress monitoring tests showed good internal consistencies within a narrow range. consistencies of individual competence scores, particularly basic precursors and computation, were considerably lower. low coefficients for basic precursors may be due to ceiling effects; computation consistencies were larger for later tests, which may indicate that the three measures within the competence set are distinct skills at first.
the distribution of difficulties (see figure 2) supports this interpretation. correlations between adjacent tests as a measure of delayed alternate-form reliability were strong, which indicates reliable assessment of students' performance despite the young age of the students. increasing adjacent-test correlations after test 3 (see table 6) suggest that frequent tests are advantageous.

second, progress monitoring tests 1 to 4 were closely related to the paper pencil results and teacher ratings at the end of first and second grade (pp2 and pp3). noteworthy is the stability of the predictions over time, which indicates that the progress monitoring tests in the first half of the school year assess skills particularly important for long-term mathematics success. somewhat lower correlations between tests 5 to 8 and the standardised tests pp2 and pp3 may be because, as indicated in figure 2, some children showed ceiling effects at the end of the school year. some ceiling effects are a desired result because test items are designed to represent end-of-year competence goals, which several students typically already reach earlier in the school year. yet, reduced variance of progress monitoring tests is likely to result in a slight reduction of correlations with standardised measures of mathematical competence. progress monitoring results were less closely related to the paper pencil test at the beginning of grade 1, which merely assessed precursor abilities and was only moderately predictive of the results of the later paper pencil achievement tests (see table 4). moderate predictive value was also observed for the first performance ratings by the teachers, who had known their students for about two months at that time (correlations between teacher rating 1 and pp2/pp3 were r = .44 and .43, respectively).
thus, in addition to the detailed results on precursor abilities from standardised tests (e.g., otz), the progress monitoring tests can provide teachers with information about students' abilities vital for long-term learning growth. third, the tests proved to be sensitive to learning growth with increasing scores from progress monitoring test 1 to 8 in all competences. however, some scores decreased in the last test, an occurrence which has also been observed in other progress monitoring research when frequent tests were conducted (förster & souvignier, 2011; hampton et al., 2012). for progress monitoring tests 1 to 7, all test-to-test increases were significant for overall scores and computation. for basic precursors and advanced precursors – skills that were expected to be mastered before or soon after school entrance – higher overall scores than for computation were observed, and only some of the increases were significant. thus, growth patterns of these two single competences should be interpreted with caution and over longer time periods. finally, several measures of feasibility and usefulness of the tool showed adequate results. the time that students needed to complete a test was low, and the students were able to work on the tests independently. the remaining implementation effort was justified in the eyes of the teachers, a precondition for frequent and beneficial use. teachers also stated that they used the results in diverse ways for classroom purposes and individualised instructions, although the exact scope of instructional changes remains unknown. to conclude, the study at hand addresses a number of issues that were discussed in previous research. by including measures from two approaches, robust indicators and curriculum sampling, the progress monitoring tool provided teachers with performance information about tasks which are directly related to classroom work. 
at the same time, the combination of different measures proved to be reliable and highly predictive of students' short- and long-term performance. overall scores increased from test to test for all but the last data point, enabling teachers to judge their students' progress and implement necessary interventions rapidly. low testing times and concise results views provide an adequate basis for use in general education.

5.1 limitations

at least five limitations should be taken into account when generalising the findings of this study. first, although the participating classrooms were selected from rural and urban areas in different school districts, all schools were in the same federal state, and results could differ in other regions of germany. second, the differing test intervals and slightly adjusted test items between study 1 and 2 limit the comparability of results between the studies. third, no direct measure of parallel-forms reliability was obtained because different test forms were not administered at the same time. all test items were designed using detailed algorithms to ensure similar difficulties, and narrow-ranging reliability coefficients (a) for adjacent tests in study 2 and (b) for predictive validity in study 1 suggest some degree of parallelism. nonetheless, parallelism of the test concept should be assumed with caution until direct parallel-forms reliability has been determined. fourth, slightly larger test score increases in the first few progress monitoring tests (when students are still somewhat unfamiliar with the computer tests) may indicate some degree of retest effects. however, large differences in the slopes of different measures (cf. figure 2) and teachers' ratings of the usability of the tests for children suggest that this effect is small. finally, the added value of the basic precursors competence for the majority of students remains questionable.
basic precursors scores showed ceiling effects early, with low internal consistencies and limited increases over time. the competence was included in the test as a measure for skills which students should already have acquired before school entrance. teachers should therefore pay special attention to students who do not reach high basic precursors scores.

5.2 implications for research and practice

several different competences were included in the test concept at hand to provide teachers with detailed information about students' strengths and weaknesses, as recommended by methe (2012). overall scores were highly predictive of the students' long-term learning outcome, and teachers reported using the information for individualised instruction and supplementary education. single competence scores in part showed lower levels of internal consistency and sensitivity to learning growth than desired. teachers should thus prefer overall test scores when making high-stakes educational decisions. results of the nine single measures can be used at individual level to detect specific deficiencies that prevent a student from advancing in other competence areas. all in all, general education teachers can use the progress monitoring tool to reliably and quickly assess different aspects of their students' mathematics performance and its development over time.

a review by stecker et al. (2005) showed that the use of progress monitoring tools resulted in higher learning gains specifically if educators were provided with diverse information about student competences, which they then utilised for individualised instruction. most participating teachers in our study stated that they used the results to adjust their classroom work. however, the extent and success of these adjustments have not been assessed. we recommend two fields of interest for further research in this domain.
first, the specific contribution of single competences for the performance of different groups of students remains to be determined. for low-performing students, certain precursor cut-off scores may provide a more accurate risk estimation of long-term mathematics success than total scores. second, it remains largely unexplored how teachers systematically use progress monitoring information to enhance student learning. although the tool at hand includes several measures that are directly related to the curriculum, the review by stecker et al. (2005) suggests that teachers need additional support with "translating" diagnostic information into improved classroom work.
keypoints
web-based progress monitoring is used for highly automated documentation of learning progress
scores of progress monitoring tests are highly predictive of mathematics performance at the end of first and second grade
first-grade students worked on the tests independently and with high satisfaction
the short tests with nine different measures in three competences were sensitive to learning growth, showing test-to-test increases
teachers reported using progress monitoring results diversely for individualised instruction
references
aunola, k., leskinen, e., lerkkanen, m.-k., & nurmi, j.-e. (2004). developmental dynamics of math performance from preschool to grade 2. journal of educational psychology, 96(4), 699–713. doi:10.1037/0022-0663.96.4.699
baglici, s. p., codding, r. s., & tryon, g. (2010). extending the research on the tests of early numeracy: longitudinal analyses over two school years. assessment for effective intervention, 35(2), 89–102. doi:10.1177/1534508409346053
berch, d. b. (2005). making sense of number sense: implications for children with mathematical disabilities. journal of learning disabilities, 38(4), 333–339. doi:10.1177/00222194050380040901
chard, d. j., clarke, b., baker, s. k., otterstedt, j., braun, d., & katz, r. (2005). using measures of number sense to screen for difficulties in mathematics: preliminary findings. assessment for effective intervention, 30(2), 3–14. doi:10.1177/073724770503000202
clarke, b., baker, s., smolkowski, k., & chard, d. j. (2008). an analysis of early numeracy curriculum-based measurement: examining the role of growth in student outcomes. remedial and special education, 29(1), 46–57. doi:10.1177/0741932507309694
clarke, b., nese, j. f. t., alonzo, j., smith, j. l. m., tindal, g., kame’enui, e. j., & baker, s. k. (2011). classification accuracy of easycbm first-grade mathematics measures: findings and implications for the field. assessment for effective intervention, 36(4), 243–255. doi:10.1177/1534508411414153
clarke, b., & shinn, m. r. (2004). a preliminary investigation into the identification and development of early mathematics curriculum-based measurement. school psychology review, 33(2), 234–248.
clause, c. s., mullins, m. e., nee, m. t., pulakos, e., & schmitt, n. (1998). parallel test form development: a procedure for alternate predictors and an example. personnel psychology, 51(1), 193–208. retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=buh&an=487650&lang=de&site=ehost-live
collins, l. m., schafer, j. l., & kam, c.-m. (2001). a comparison of inclusive and restrictive strategies in modern missing data procedures. psychological methods, 6(4), 330–351. doi:10.1037/1082-989x.6.4.330
connor, c. m., morrison, f. j., & petrella, j. n. (2004). effective reading comprehension instruction: examining child x instruction interactions. journal of educational psychology, 96(4), 682–698. doi:10.1037/0022-0663.96.4.682
dehaene, s. (1992). varieties of numerical abilities. cognition, 44(1-2), 1–42. doi:10.1016/0010-0277(92)90049-n
dehaene, s. (2011). the number sense. how the mind creates mathematics (2nd ed.). new york, ny: oxford university press.
dehaene, s., & cohen, l. (1995). towards an anatomical and functional model of number processing. mathematical cognition, 1(1), 83–120.
deno, s. l. (2003). curriculum-based measures: development and perspectives. assessment for effective intervention, 28(3-4), 3–11. doi:10.1177/073724770302800302
duncan, g. j., dowsett, c. j., claessens, a., magnuson, k., huston, a. c., klebanov, p., … japel, c. (2007). school readiness and later achievement. developmental psychology, 43(6), 1428–1446. doi:10.1037/0012-1649.43.6.1428
foegen, a., jiban, c. l., & deno, s. l. (2007). progress monitoring measures in mathematics. the journal of special education, 41, 121–139.
förster, n., & souvignier, e. (2011). curriculum-based measurement: developing a computer-based assessment instrument for monitoring student reading progress on multiple indicators. learning disabilities: a contemporary journal, 9(2), 65–88.
gersten, r., jordan, n. c., & flojo, j. r. (2005). early identification and interventions for students with mathematics difficulties. journal of learning disabilities, 38(4), 293–304.
greenhouse, s. w., & geisser, s. (1959). on methods in the analysis of profile data. psychometrika, 24(2), 95–112.
hampton, d. d., lembke, e. s., lee, y.-s., pappas, s., chiong, c., & ginsburg, h. p. (2012). technical adequacy of early numeracy curriculum-based progress monitoring measures for kindergarten and first-grade students. assessment for effective intervention, 37(2), 118–126. doi:10.1177/1534508411414151
jordan, n. c., kaplan, d., oláh, l. n., & locuniak, m. n. (2006). number sense growth in kindergarten: a longitudinal investigation of children at risk for mathematics difficulties. child development, 77(1), 153–175. doi:10.2307/3696696
kim, y.-s., petscher, y., schatschneider, c., & foorman, b. (2010). does growth rate in oral reading fluency matter in predicting reading comprehension achievement? journal of educational psychology, 102(3), 652–667. doi:10.1037/a0019643
koponen, t., aunola, k., ahonen, t., & nurmi, j.-e. (2007). cognitive predictors of single-digit and procedural calculation skills and their covariation with reading skill. journal of experimental child psychology, 97(3), 220–241. doi:10.1016/j.jecp.2007.03.001
krajewski, k. (2008). prävention der rechenschwäche. [the early prevention of math problems]. in w. schneider & m. hasselhorn (eds.), handbuch der pädagogischen psychologie (pp. 360–370). göttingen: hogrefe.
krajewski, k., küspert, p., & schneider, w. (2002). demat 1+. deutscher mathematiktest für erste klassen. [german mathematics test for first grades]. göttingen: beltz test.
krajewski, k., liehm, s., & schneider, w. (2004). demat 2+. deutscher mathematiktest für zweite klassen. [german mathematics test for second grades]. göttingen: hogrefe.
krajewski, k., & schneider, w. (2009). early development of quantity to number-word linkage as a precursor of mathematical school achievement and mathematical difficulties: findings from a four-year longitudinal study. learning and instruction, 19(6), 513–526. doi:10.1016/j.learninstruc.2008.10.002
methe, s. a. (2012). innovations and future directions for early numeracy curriculum-based measurement: commentary on the special series, part 2. assessment for effective intervention, 37(2), 67–69. doi:10.1177/1534508411431256
methe, s. a., begeny, j. c., & leary, l. l. (2011). development of conceptually focused early numeracy skill indicators. assessment for effective intervention, 36(4), 230–242. doi:10.1177/1534508411414150
missall, k. n., mercer, s. h., martínez, r. s., & casebeer, d. (2012). concurrent and longitudinal patterns and trends in performance on early numeracy curriculum-based measures in kindergarten through third grade. assessment for effective intervention, 37(2), 95–106. doi:10.1177/1534508411430322
newton, h. j., baum, c., clayton, d., franklin, c., garrett, j. m., gregory, a., … royston, p. (2004). multiple imputation of missing values. the stata journal, 4(3), 227–241. retrieved from http://www.stata-journal.com/article.html?article=st0067
rubin, d. b. (1987). statistical analysis with missing data (4th ed.). new york, ny: wiley.
rubin, d. b. (1996). multiple imputation after 18 + years. journal of the american statistical association, 91(434), 473–489. doi:10.1080/01621459.1996.10476908
schafer, j. l., & graham, j. w. (2002). missing data: our view of the state of the art. psychological methods, 7(2), 147–177. doi:10.1037/1082-989x.7.2.147
seethaler, p. m., & fuchs, l. s. (2011). using curriculum-based measurement to monitor kindergarteners’ mathematics development. assessment for effective intervention, 36(4), 219–229. doi:10.1177/1534508411413566
siegler, r. s., & booth, j. l. (2004). development of numerical estimation in young children. child development, 75(2), 428–444. doi:10.1111/j.1467-8624.2004.00684.x
stecker, p. m., fuchs, l. s., & fuchs, d. (2005). using curriculum-based measurement to improve student achievement: review of research. psychology in the schools, 42(8), 795–819. doi:10.1002/pits.20113
van luit, h., van de rijt, b., & hasemann, k. (2001). osnabrücker test zur zahlbegriffsentwicklung [osnabrück test of number concept development]. göttingen: hogrefe.
frontline learning research vol. 5 no. 1 (2017) 43-57 issn 2295-3159 contact information: sean chorney, 8888 university drive, v5a 1s6, burnaby, canada, email: sean_chorney@sfu.ca doi: http://dx.doi.org/10.14786/flr.v5i1.229
re-animating the mathematical concept: a materialist look at students practicing mathematics with digital technology
sean chorney
simon fraser university, canada
article received 27 november / revised 30 september / accepted 26 october / available online 15 february
abstract
this paper proposes a philosophical approach to the mathematical engagement involving students and a digital tool.
this philosophical proposal aligns with other theories of learning that have been implemented in mathematics education but rearticulates some metaphors so as to promote insight and ideas to further support continued investigations into the learning of mathematics. in particular, this philosophical proposal takes seriously the notion that a priori to activity, there are no objects, which in turn challenges the notions of intention, affordance and/or representation. to exemplify this perspective, two episodes of grade nine students using dynamic geometry software are analysed to elaborate how mathematics can be seen to emerge from working with a tool.
keywords: post-humanism; materialism; mathematics education; theories of learning
chorney | f l r 44
1. introduction
with the proliferation of digital tools in the modern age, the implementation of these tools in mathematics classrooms is becoming ubiquitous. these tools, which include graphing calculators, ipads and iphones, are implemented for the purpose of improving mathematical learning. since technology is changing so fast and new applications are being created every day, different learning theories are necessary to accommodate or adapt to these new technologies. indeed, it is ultimately a philosophical question as to how a material object, a technological/mathematical tool for example, can become part of, contribute to, and help develop mathematical thinking. however, in addressing this philosophical challenge, this paper moves away from the “learner” as central and asserts that mathematics emerges amid the dynamic relations of humans and materials. the highlighting of materials partially aligns with actor network theory, since actor network scholars value the contributions of material objects in the production of knowledge.
however, this study notes that materials are variable and often boundary-less, and as such the perspective taken in this paper takes seriously the positioning exemplified by ingold’s (2011) provocation that there are no materials a priori to an activity. this paper explores a profound philosophical shift in terms of where it locates the "thinking" in a mathematical activity. this shift has been called post-humanist (barad, 2007) as well as new materialist (de freitas & sinclair, 2014). it is common in teaching to describe knowing in a concrete and human-centric way. “she knows the quadratic formula” is a simple phrase that conveys the ownership of a proposition. however, the implication in such statements places the human as the central knower and agent. in education studies, the student is often central, and although this seems appropriate within an educational system with a mandate to educate the individual, new theories are pointing to alternate conceptions that may offer productive new ways of understanding tool-based mathematical activity. i adopt the term post-humanist (barad, 2007) to challenge the isolating tendency or reductionist approach of seeing the human as the central actor. in post-humanist practice, the human is considered to be just one of many “actors” involved in that practice. other actors can include such things as tools, social influences, concepts or even a task itself. i draw on post-humanism not to deny the intangible, experiential aspect of mathematical practice but instead to implicate the dynamic and significant effect of the artefact on practice, and, in this study, how this implicates the emergence of the mathematical. that is, the material aspect of working with concrete tools is more than a matter of mastering a tool to exploit its affordances. a student and a tool might instead be seen as being on more equal footing, so that the tool is not simply subjugated to the all-knowing or all-deciding human.
in this way post-humanism refers to the idea that mind and matter are not ontologically distinct. new materialism aligns with the ontological monism of spinoza and the historical materialism of marx, adopting the important aspects of the political, the ontological and the epistemological into mathematics education. the term “new”, however, in “new materialism” moves into a world of agency and animation and argues for mobility and action as necessary components to generative development. when distinctions of human agency and matter are challenged, particularly in education, different ways of seeing emerge in how the use of tools can generate interesting ways of thinking about learning. indeed, similar to learning theories, many current methods privilege the human actor (e.g. in transcripts, in modes of describing events in terms of being actor-driven). while there are numerous contributory frameworks that address the questions that emerge in an analysis of the relational engagement of a human working with a material, i suggest this study differs from many approaches by adopting both a post-humanist and a new materialist perspective. throughout this paper i draw on how these perspectives come together in one framing. in part, the post-humanism draws attention away from the human and the new materialist approach looks more to the material. it is the combination of both approaches that i suggest differs from previous work in mathematics education. i suggest that to create turbulence in how one “reads” common learning situations is the first step towards new ways of seeing. i do not present insights based on the intrinsic nature of learning but rather a metaphoric re-description (hess, 1980) that in and of itself can lead to different ways of seeing. the question addressed in this paper is not about informing educators about the way things are.
the purpose of introducing a post-humanist and new materialist approach is for the production of potential possibilities and to explore alternative ways of understanding different framings of mathematical activity. following from this re-conception of mathematical practice, the underlying question that emerges in this paper asks what insightful understandings and interventions emerge from a post-humanist/new materialist approach. in the first section, i discuss current approaches to theorizing tool-use in mathematics education. i then contrast these with a post-humanist, new materialist perspective on tool-use, which challenges the human-centric view of activity and argues for a process ontology. i draw upon the work of andrew pickering and articulate his resistance/accommodation model as the way in which scientists and mathematicians create new machines and/or ideas. i also expand on tim ingold’s work and his argument that tools are not distinct entities and should be seen as narratives, that is, in terms of how they may contribute to function. in this study i present two narratives that support a storied knowledge of actual engagement¹. finally, i briefly describe the work of karen barad, who offers the powerful notion of intra-action in her new materialist account of the nature of scientific concepts, in which concepts and tools, for example, are seen as a single entity (a relation) rather than as two interacting things (or relata). i suggest that each of these scholars has identified particular and distinctive approaches to tools that are relevant to the study of tool-use in the mathematics classroom. i finish with the analysis of two episodes of grade nine students using dynamic geometry software in terms of the framework described in this paper.
2. tools in mathematics
the theoretical foundation of tool-use in mathematics education continues to be problematic (waltz, 2006).
at one extreme, mathematics is valued as a mental discipline where concrete tools are dismissed as being a mere aid to learning, but not as constitutive of the knowing or of the mathematics (see balacheff, 1988). another approach is that tools can be considered an essential element to mathematical practice, a position that rotman (2008) encapsulates nicely when he says that mathematics has been, and will continue to be, involved in a two-way, co-evolutionary relationship with machines. here, the implication is that there would be no mathematics without mathematical tools, and vice versa. in mathematics education research, some of the more common frameworks for addressing the challenges of accounting for how tools are brought into and included in mathematical activity are instrumental genesis and semiotic mediation. for example, instrumental genesis suggests the instrumentalizing of an artefact develops over time as people use it for particular purposes, thus transforming it into a tool. this approach distinguishes the instrumentalization of a tool as part artefact, part cognitive schema (artigue, 2002). the individual’s mental schemes together with the artefact’s inherent potential are what make the artefact an instrument. the artefact is a material object, but an instrument is a psychological construct. verilon and rabardel (1995) define this process as one of appropriation, of making the tool one’s own. instrumental genesis does not focus solely on individuals with tools; instead, it also incorporates socio-cultural issues, including institutional meaning, class norms, and teacher’s expectations. according to ruthven (2002), these social factors are integral to the activity of instrumentation. another well-developed framework for thinking about tools within mathematics education is that of semiotic mediation (bartolini-bussi & mariotti, 2008).
the theory of semiotic mediation has been developed specifically within the field of mathematics education with a focus on analysing the semiotic potential of a tool (that is, its potential for linking personal meanings with mathematical ones) and then studying how it can be realized by teachers in classroom interactions. this theory is based on the work of vygotsky and has been significant in socio-cultural and -historical understanding of thinking. in each of these frameworks, the tool plays an important role and influences the material situation of learning.
[footnote 1: ingold seems to use “narrative” and “storied knowledge” interchangeably. however, in the mathematics education literature, story and narrative often have different and distinct meanings (see dietiker, 2013).]
another framework that has recently emerged is proposed by nemirovsky et al. (2013). nemirovsky et al. argue against the dualism of tool-mediated expression and mathematical understanding (and its related dualism of body and mind). in their non-dualist approach, they offer an approach to mathematical thinking that involves the temporal and developing entwining of perception and motor skills, and call the “interpenetration” of perception and motor skills “fluency”, the development of which constitutes mathematical thinking. a mathematical instrument for nemirovsky et al. is material, semiotic and has a set of embodied practices. they contrast their use of the term “instrument” with that of the theory of instrumental genesis, where an instrument is a mixed entity, part artifact (material) and part schema (mental). they see mathematical thinking as the bodily experience of developing fluency on the mathematical instrument, stating, “mathematical activity is constituted by bodily activity” and, correspondingly, mathematical learning as the “transformations in learners’ engagement in mathematical practices” (p. 376). the theory of enactivism based on systems theory (e.g.,
varela, thompson, & rosch, 1991) has also been implemented in mathematics education and goes further in challenging boundaries of human body and environment. goodchild (2014) describes enactivism “as active processes that occur directly through the interaction between the cognizing subject and the environment, rather than as a construction of representations of the environment by the cognizing subject” (p. 210). these theories have contributed insightful approaches and encourage further exploration to enrich perspectives and provide continued generative ways of thinking about education. the approach outlined in this paper aligns in many ways with the previously mentioned frameworks. i suggest, however, that this study deviates from these approaches in two significant ways. first, i draw upon ingold’s notion that there are no objects a priori to activity. i refer to roth (2011) to support this difference; he notes that “kant, piaget, the constructivists, and the embodiment/enactivist theorists all presuppose a subject…” (p. 225). ingold’s provocation goes one step further than roth’s observation by denying any a priori distinctions of individuated entities. this has profound implications for what mathematics is since it cannot be found “in” the cognizing subject. the second deviation is that mathematical concepts are framed as material. in the following section, i describe three different theoretical approaches that challenge the positioning of humans in activity and also in their conceptualizing of tools and of concepts.
3. frameworks to move away from subject object positions
in this section, i look at andrew pickering, who draws attention to the temporal aspect of practice and gives materials a voice. i also look at tim ingold, an anthropologist who proposes that all “things” are snapshots of processes, including tools. i also draw on barad for her approach to concepts as physical apparatus.
finally, i move to de freitas and sinclair’s elaboration of barad’s materialism in a mathematics educational framework. these perspectives each contribute a certain way of looking at data, and i suggest they lead to pressing questions and insights that support a unique way of seeing mathematics emerge from tool use.
4. material agency
pickering (1995) offers the most accessible approach to the interactions of tool and human, arguing for a back-and-forth model. he is concerned with the advancement and progression of scientific practice (including mathematics) in activity. in one of his case studies, he articulates how donald glaser attempts to accumulate data on strange particles. pickering argues that in glaser’s experimentation obstacles were a natural part of interacting with materials. consequently, glaser’s final result was very different from his initial intentions. pickering uses this example to emphasize that materials influence our activities. thus, pickering presents an analysis of scientific advancement in action in which material agency, as he calls it, is attributed to the natural phenomena with which scientists interact. his model can be summed up simply: materials can be understood as having agency when their structures, make-up and design restrict the subject within a context of activity. pickering then argues that scientists will accommodate their actions to overcome obstacles, and identifies accommodation as a response to these obstacles. in this way accommodation can be seen as a readjustment of action. i suggest that material agency has its relevance in mathematics specifically when working with tools. i draw on pickering’s constructs of resistance and accommodation as a way of approaching the phenomenon of an individual interacting with a material, specifically, material tools. for pickering, the construct of resistance can be seen as an obstacle to performing an action.
on the other hand, accommodation is the response to resistance, usually in the form of a readjustment of an act. pickering’s construct of resistance is the important construct and may need some elaboration. it is important to understand in his model that resistance is not a human action but a material action. that is, resistance is a material obstruction to a goal or an intention. but resistance, according to pickering, goes beyond the simple notion of impeding; it is for him both a micro and a macro construct in that it is not only a “challenge” to overcome but also a way that materials, those outside the bounds of social norms and subjective interpretation, determine action. resistance, for example, can be thought of as a “voice” of an entity, one that is not audible but is expressed as a dynamic, in-the-moment emergence of form. in this way, resistance can be thought of as an expression of the material's agency, giving it a dynamic, active “voice”. given that accommodation, or readjustment, on the part of the human depends upon the resistance of the material, and drives the activity in a fundamental way, it highlights that movement does not reside solely in human initiation. although a quick interpretation of this model seems to support a binary divide between the human and the tool, as well as centralizing the construct of accommodation in the human, this reading does not necessarily commit us to an anthropocentric point of view. in fact, one of the goals of pickering is to move away from human-centric activity and he does this by offering the construct of material agency. his model is not about expressing a truth statement (lyotard, 1984) of what is occurring; it is rather a shift of attention that can help illuminate different ways of seeing.
5. becoming as opposed to being
the importance of activity and interaction identified in the previous section leads me to identify a process ontology as a significant part in developing theoretical frames in approaching mathematical activity. i draw from alfred north whitehead (1929/1978), who privileges process as the ontological reality of the world. for any entity to be identified, it is important to consider that there was a preceding activity that brought it to its current “state”. ingold (2011) extends whitehead by describing all “things” as processes, and while “being” implies a state, “becoming” is a process. humans become as they unfold within the weave of the world. he writes, “to move, to know, and to describe are not separate operations that follow one another in series, but rather parallel facets of the same process” (p. xii) implicating not only the practices of a researcher but also the process of learning. in movement then, a person elicits a knowing which is not a “property of knowing” but a “practice of knowing” and where “knowledge is perpetually under construction within the field of relations" (p. 159) in particular material contexts. ingold also values the material experience before the mental act. in the western world, epistemology is seen in terms of ideas and images, but for ingold, meanings of things come from our embodied experience. ingold states, “practical activity brings incorporeal minds into contact with a material world” (p. 21) and such a process defines the actors. the person is seen as a manifestation of a process of becoming, of continuous creation. this process ontology moves us from a noun-oriented understanding of things to a more verb-oriented approach where all “things” are always in motion. the significance of tools is not in their distinct demarcation, but rather in the role these features play (their dynamism) in relations with human actors.
in this study’s analysis, i will mobilize the notion that meaning emerges only in relations. for ingold, the metaphor of lines is very valuable for distinguishing the process of becoming from the state of being. he contrasts the notion of direct lines, which imply transport or a passage from one place to another, with that of wayfaring, which can be thought of as improvisational movement that carves out a path from a starting point. while direct lines require the implementation of intentions, mental images or models in order to get from one place to another (for example, from one van hiele level to the next), wayfaring is first and foremost about the going or the moving. ingold uses the example of looking at art in a gallery to illuminate the difference between direct lines and wayfaring by arguing that looking at art is not a “shuttling back and forth between radically opposed and mutually exclusive domains of mind and world […] but rather to bind mind and world in an ongoing movement” (p. 178). as can be seen in this quotation, ingold is trying to reinscribe thinking within the temporal world of lived experience where there are no station stops to mark new thoughts. if we take looking at art, in ingold’s description, to be like using mathematics tools, then we can see the latter as being less about the development of mental schemes or intentional, efficient deployment, and more about a fused, mind-world movement. in this framing, it becomes important to attend to the largely spontaneous movements of the wayfarer. this can be hard to do in post-hoc assessments of what has changed during mathematical activity, when the direct lines metaphor encourages a view of there having been transport from state a to state b. it is within this framework that ingold (2011) presents a perspective that does not see “things” but adopts emergence in activity as a way of “seeing” reality. 
that is, he argues that one makes a conscious choice to bypass an a priori reflection, and he offers the notion of narrative to describe the process of movement and knowing. it may seem common sense to identify objects when we look at the world, identifying this and that as objects ready-to-use or as things that exist. however, if we are to take seriously a process ontology, identifying objects can misinform. things “exist” and are “present” if we choose to draw a boundary around them and speak and/or think of them differently than the “other things” around them, but this, of course, is less an ontological reality than it is a choice of distinction or classification. according to the linguist deutscher (2010), language affects how one thinks and perceives the world. the act of nominalization that is common in representationalist theories of mind can affect how one sees. english is a noun-based language and its use can suggest that it is things or beings that act. ingold’s metaphor is important in this regard because the line does not connect the subject and object like it might do in pickering’s framing or in actor network theory (latour, 1987). instead the line travels in a wavy pattern back and forth between the “subject” and “object” but the line only moves toward or away. the purpose of this metaphor is that it helps shift attention away from the objects so that theorizing of individual things is curtailed because according to a non-dualist ontology nothing acts alone. it is not just an attempt to break down the ontological divide between being and things but also to shift our gaze of analysis to one of function. ingold elaborates his argument against identifying objects as things in and of themselves by suggesting that tools are not determined by their names but by their “storied past”. he uses his own experience of sawing wood as an example to illustrate how the saw as well as himself are drawn into use and become tool and sawer together in activity.
he argues that it is not an example of a human using an appropriated tool. the functionality of a tool, if one can describe it as such, is not a result of its form or its design alone but is based on its history of use. a distinguishing aspect of ingold’s articulation of tool as narrative is his moving away from an outline of form. for ingold, to draw an outline is to make a distinction between inner and outer; it is to establish a boundary and, for ingold, this boundary is the delineation of a closure in which movement is restricted. it is within all these considerations that ingold challenges the notion of affordance originally put forth by gibson. according to ingold, gibson’s notion of affordance is an elusive quality, one that gibson has trouble reconciling. gibson states, “but, actually, an affordance is neither an objective property nor a subjective property; or it is both if you like” (in ingold, 2011). ingold notes that gibson cannot quite commit to whether an affordance exists in the object or in the relation of use. ingold troubles the notion of affordance, arguing that an affordance cannot exist prior to activity. if, for example, a child is placed in a room with a bunch of toys, one might argue that they will see toys and play with them, but this only shows that a child can listen to instruction and imitate an approach to a world that has already been organized by way of language and interaction. for ingold, these toys would not be toys until they are “toyed” with, and only then is it the playing and the engagement that emerges that “possibly” makes them toys, not the fact that they are called toys in the first place. meta-level descriptions of students working with tools may seem similar in certain ways. for example, one might simply report that the students drew a circle with a compass.
however, in more nuanced and detailed observation the narratives of each student might be quite different: some students had their compass slip, one drew an ellipse, another drew part of the circle off their paper. but even in these examples, there are problems because if we accept the notion that there are no things preceding activity, we should not begin the narrative with a student or a compass. we should begin with a narrative of movement. this approach engages descriptions that do not define a trajectory-like approach to an outcome (e.g., drawing a circle) but engages us in a description of a wayfarer, one who becomes and does in the moment, where an endpoint is not conceived (partly because there is no endpoint).

6. concept as material: mobilizing the concept

in this section i go further with relations in activity and challenge the boundaries of things. barad (2007), a philosopher, feminist and physicist, adopts a post-humanist perspective in analyzing niels bohr’s quantum physics theories emerging out of the early 20th century. she seeks to address bohr’s philosophy that concepts depend upon the arrangement of apparatus. bohr’s contemplations of the two-slit experiment and the resulting wave-particle duality of light motivate barad. she notes that while much of the community debated the nature of light, bohr observed that it was the “actual experiments that displayed the ‘dual’ nature of matter and light” (p. 105). by looking at this example, barad concludes “the nature of observed phenomenon changes with corresponding changes in apparatus” (p. 106). concepts are not ideational; they depend upon physical apparatus.
apparatus is not something that sits on a shelf waiting to serve a particular purpose. barad argues that science experiments have shown throughout history that apparatuses are constituted through practice and are often rearticulated in new reworkings; thus, although the apparatus is usually thought merely to aid in identifying concepts, it is more appropriately thought of as creating them. similar to ingold, barad argues apparatus does not “pre-exist the experiment but rather emerges from it” (p. 142). barad, aligning with ingold, argues “objects are not already there; they emerge through specific practices” (p. 157). to move away from object and subject, barad uses the term intra-action as an alternative to interaction, positing that the prefix “intra-” is more ontologically sound than “inter-” given that things do not come to act together as individual, fixed parts but rather become together depending on their relation. building upon barad’s work, de freitas and sinclair (2014) describe the inseparability of concept and matter. they argue that the performative boundary making in activity animates the mathematical. this perspective is much more temporal and fluid, challenging the traditional approach of “acquiring” mathematical ideas. instead, mathematics becomes more an experience than reflection. de freitas and sinclair, drawing from barad and extending to mathematics via the work of the philosopher of mathematics gilles châtelet (2000), term their approach “inclusive materialism”, indicating that their “theory of matter [...] resists the binary divide between human agency and inert passive matter” (p. 39). revitalizing materials often considered to be passive or inert challenges the idea of concepts, animating them in terms of materiality. de freitas and sinclair theorize mathematics concepts as engaging both the logical and ontological and argue that the fusing and coupling of speech, movement and material items, including the body, is the mathematics.
in this dynamic relation where mathematics is animated by material tools, practice is affected. de freitas and sinclair (2014) draw on speech in a learning environment as an example of becoming in the moment. they argue that speech is not a medium to reflect what the thinker has already articulated in the mind, but instead organically flows in real time, as context, gestures and other factors are all active, changing and influencing. to ascribe speech a meaning of communication and/or representation relies too heavily on the meaning of words and on the intention of the speaker; instead, speech should be seen as one of the many material factors that participate in a material assemblage (footnote 2) of learning. these writers focus on activity, not as a process of acting based on what is already known, but as becoming and learning in movement. the tool and student, together, and what they can do, together, is what can be referred to as coupling (de freitas & sinclair, 2014) or fusing (barad, 2007). i refer to a hyphenated combination “student-tool” to specify this single entity. as such, the focus now is what the new entity can do as opposed to concern over what individuated objects bring to each other. mathematics, tools and students become a single focus, with a singular agency of production. the inclusive materialism outlined by de freitas and sinclair describes the practice of mathematics as a material engagement of student, movement and tool. the mathematical concept is the intra-action of a human subject with the tool. the concept, therefore, partakes of the physical world; the concept is material. although a classroom can be seen as divided up discretely (i.e., projectors, students), all movements, shifts and transformations are continuous and universes are being made known. there is neither origin nor final cause. there is only a weave of becoming. 7.
towards post-human methods

devising a method of observing students intra-acting with a digital tool is challenging when both ingold and barad argue that there are no things prior to activity. it is important to adopt a more accessible way to engage in an analysis, and i draw on pickering, who provides a methodology that identifies subject and object and offers an accessible way to engage in language about things. for example, one can identify resistance by looking for obstacles. such a task is not easy from a non-anthropocentric point of view since obstacles are typically framed in relation to what the student is trying to do. nonetheless, i suggest that it is always possible to counter the anthropocentric perspective by taking into consideration that the student must respond to the resistance—which is to say the material agency—of the tool. one can identify an obstacle by watching for sudden changes in action and attempting to decipher whether there was a significant reason for the change. for example, if a student is using a compass and the graphite piece breaks, and the student subsequently stops using the compass, one can infer the breaking of the tool was an obstacle. to take another example, if in using a ruler to draw a line, the ruler slides slightly so that the angle of the line changes and the student stops drawing the line, then one can identify the sliding ruler as an obstacle. in these examples, the student is not able to perform an action—his or her motions or gestures are restricted. this noticing of obstacles offers a way of identifying the student’s active, in-the-moment experiences in mathematical practice.
2 assemblage is a notion introduced by deleuze and guattari (1980), meaning an emergent unity joining together heterogeneous bodies in a “consistency.”

while pickering’s model will be used as a way of engaging in observation, the other theoretical lenses, materialism (drawing on barad as well as de freitas and sinclair) and a process-ontology (drawing on whitehead and ingold), will be mobilized as well. a process approach attends to how particular aspects of students and the tool come together and their role in performing an action together. one can pair a process approach with a materialist view, noting that it is important not to assign any aspect of movement or action to a hidden phenomenon such as thinking, but rather only to that which is observable. process attends less to nouns and more to verbs; it is the relation and action between fused people-and-things that are the fundamental building blocks of reality (whitehead, 1978).

8. site of data collection

data collection for this study took place at a high school in western canada and involved a grade nine class of approximately thirty students. the grade nine class was chosen because the mathematics curriculum for this grade includes a comprehensive geometric component where students work with rotational and line symmetries, reflections, polygons and circle geometry. the students had studied basic polygons and their properties in previous grades. therefore, the pedagogical purpose of the activity was for the students to experience some common polygons as a form of review and renewal. the episodes outlined in this paper occurred in a computer lab that the students had not visited before and where students were introduced for the first time to the geometer’s sketchpad (gsp) (jackiw, 2001). they were requested to construct a triangle and a square using gsp. consider the following two episodes:

8.1 episode 1: is this a triangle?
two grade nine students, calvin and jonas, constructed a triangle by using the segment tool and also by constructing an interior. they then dragged the triangle all over the screen and when the triangle was half off the screen, one of the boys posed the question, “is this a triangle?” before addressing this interesting question, i will first look at the process of movement involving the body, the tool and the triangle and subsequently identify resistance within that movement.

figure 1. calvin’s triangle being translated around the screen (panels a, b and c)

figures 1a, 1b and 1c are snapshots from a mathematical activity in which calvin moved the triangle around the screen employing different motions, in different directions and at different speeds. these images, when seen as static, hide the previous activity as well as in-between and future movements. in terms of their temporal existence, it might seem like they are static representations of the tool’s affordance; that the student was the agent and the tool passively allowed the student’s predetermined actions. if movement occurs before thought, as ingold insists, then the hand-action that glides the mouse across the desk, while consequently dragging the triangle’s vertex on the screen, fuses human and tool (hand and mouse) in a continuous flow. this flow of movement saw the triangle move upward, downward, to the left, and line up with the edge of the screen. human-centric theorists might describe this event in terms of calvin’s use of a “random” or “wandering” drag mode (arzarello et al., 2002), but this ascribes all the agency and intention to calvin (who may simply want to see what happens or may not know what else to do). in contrast, through a movement lens, the mind is not observing and leading; it is participating, in parallel with the tools (and the size, as well as friction, of the desk). the movements translate the knowing and learning; the knowing is the translated triangle.
after a short period of time, the movement translated into the question “is this a triangle?” to describe the question as separate from the boy’s activity is to miss the process of his movements as well as create a binary divide between thought and action. it theorizes a dual role for the student—one as the mover and the other as the observer—but only one role for the tool: allowing the student to act upon and question it. ingold (2011) reminds us that moving and knowing are the same process; similarly, roth and radford (2011) insist that thinking is not externalized but becomes. it is a challenge to shift from a paradigm that sees the student dragging the shape around the screen and attributing the intention, “oh i can drag the triangle off the screen”, to a paradigm that sees the thinking in the hand-mouse gesture. while calvin is limited in perception by the material tools, including a bordered screen, and in movement, by the shape of his hand and the desk, he nonetheless “feels” the ability of “hiding”, “cutting”, and “dragging off” part of the triangle. doing justice to the movement of a wayfarer requires honouring it as it unfolds, rather than seeing it as a path of transport towards pre-determined ends or as a result of a conscious, deliberate decision. in this episode, the movement seems to give rise to a kind of resistance, or an obstacle, which does not appear until the translating motion of the shape is limited by the edge of the screen. in other words, the edge of the screen only becomes interesting under motion, without it, it may not ever be noticed. when its boundary is approached, reached and then transgressed, it begins to intra-act with the moving hand-mouse, and becomes a participant in occasioning new activity. in this case, the transgression gives rise to a triangle that seems visually truncated at the end of the dragging, while retaining an integrity in time that warrants the name “triangle”. 
the question, “is this a triangle?” highlights an accommodation-like action. the resistance of the edge of the screen, which “hides” part of the triangle, and the question that emerges become part of a story, part of an action. a truncated triangle drawn on a whiteboard would probably not be considered a triangle, especially if the action of the drawing and/or gesturing hand, which may be extending off the edge, is ignored. however, in the gesture-triangle-edge intra-action that produces the shape that used to be a triangle, the resistance cannot just be seen as being imposed by the edge of the screen, but must also be seen in the question itself. indeed, if it was a triangle, it should still be a triangle, and it would also be a triangle if the edge of the window were enlarged. and surely the gesturing hand is still holding the invisible vertex in place, even when moving it around so as to affect the visible sides of the triangle?

8.2 episode 2: the almost-square

students were asked to make a square, but often their shape did not hold under dragging, so the students tried other methods to create more robust squares. what emerged from this process is what i term an almost-square (figure 2). the shapes may well have looked like squares before, but the visible measurements exposed differences in the second decimal place of the unequal side lengths.

figure 2. an almost-square

before using the measure tool, the dragging hand was entangled with the watchful eye, as the quadrilateral changed on the screen. instead of focusing on the student’s intention and the tool’s response, the notion of assemblage draws our attention to the relations among the tool, the student and the concept. once the measuring tool came into play, and the measurements remained visible on the screen, the activity changed. the students began to drag the vertices of the square in order to adjust the measurements so that the side lengths would all match.
but as soon as they dragged one vertex to match two sides, a third side would change in length and therefore no longer match the other sides. in response to this, one student pair then dragged the vertex associated with the distorted side ever so slightly, in an attempt to make all four sides match, only to find in movement that once again this led to a different kind of disfigurement. this back-and-forth dragging continued for a while. now the almost-square emerged out of the actions of the dragging hand, which changed the measurements on the screen, which in turn provoked further dragging. the relation changed as the segments themselves faded away to give primacy to the numbers, which now dictated the extent to which the shape could be considered a square. the measuring became part of (and set in motion) an activity of adjustment, which can be seen as an accommodation that gave rise to a new assemblage. the resistance manifests itself through the unequal measurements, which nonetheless have the potential to be made equal (or so it may seem), and which are related to the known properties of the square. the dragging of the vertex to try to “fix” the almost-square can be seen as an accommodation made by the hand and prompted by the resistance encountered in movement. the properties gain new meanings in that the previous way in which the sides could merely look the same now gets replaced with them all having to be equal in measure, one to the other. it is crucial here to recognize that the dragging is not controlled by the student, but spurred on by the changing measurements. for it was only when the measurements indicated a need for change that the vertex got dragged. accommodation in this episode is not simply overcoming the stubbornness of the tool in an intentional change of strategy; rather, it is about a changing assemblage in which new relations come to the fore, leaving previously central “actors” (like the segments) to recede while new ones (the measurements) emerge.
throughout, the hand, tool and eye are in motion, in continual reconfiguration. the on-going struggle for equilibrium is initiated and promoted by this assemblage of technology and student. the process of how the almost-square came to be involves multiple steps. the software’s response, the movement of the hand–mouse, the pointing and the student’s readings of the measurements all contribute to the movement from which the almost-square emerges. from a process perspective, the almost-square only exists in the activity of movement; it is dependent upon the intra-action with the student in terms of the initial construction and subsequent “adjustments” and “fine-tunings”. if left alone, it might merely be termed a quadrilateral, but in ongoing movement, it remains an almost-square. this is because in a process approach, we read it not from its shape at any particular point in time but through the process in which it is engaged—which, in this case, is asymptotically becoming a square. therefore, within a process ontology, the almost-square does not exist as a noun, but as movement in a becoming state.

8.3 episode 1: resistance as relational

while pickering’s constructs of resistance/accommodation are centered on the tool (which resists) and the user (who accommodates), my reading of the translation of the triangle showed that the resistance (the edge of the screen) can be seen as highlighting a certain relation—a calvin–dragging relation that encounters an unexpected event that would not exist if each were taken in isolation. in other words, the resistance in this case was a result of the activity of the student–tool, not of the student alone nor of the tool alone. in addition, the resistance was less about avoiding or overcoming a problem than it was about dwelling in the problem itself.
as the triangle was dragged around the screen and eventually partly off the screen, the student–tool experience can be viewed as having provided an opportunity for the student–tool to experience something unexpected, which ended up flushing out a question—is this a triangle?—that reconfigured the relations among the triangle, the dragging and calvin. this prompts a question about when resistance might be seen as productive in a learning environment. the teacher could have insisted that the students restrict their dragging to the contours of the screen, in which case the boundary between triangle and not-triangle would have remained unchallenged. but this is the very boundary that the lakatosian practice of concept formation through monster-adjusting and monster-barring seeks to identify and stress. does the fact that a tool is involved make the process somehow less mathematical? or might we come to see calvin’s question as a legitimate one that perturbs boundaries imposed by the visible and the static in school geometry? if so, accommodation is not to be seen as a singular response that fixes a transgression, but rather as a setting off of possibilities for engagement.

8.4 episode 2: from resistance as back-and-forth to resistance as narrative

in contrast to episode 1, the resistance that was identified in the almost-square episode did not seem to be a result of the fused student–tool, but of fusing itself. that is, the resistance aligned more closely with pickering’s model of a back-and-forth between the student and the tool. this leads to the question of whether there are different levels or layers of resistance. while in episode 1 intra-action seemed to be in the student–tool relation, the identification of resistance in this episode highlighted the back-and-forth between tool and student, potentially supporting a perspective that distinguishes between the tool and student.
so the question emerges of how to deal with resistance in this kind of example without assuming an a priori individuation of relata. perhaps the back-and-forth interpretation is not as relevant in the context of instant feedback where the measurements change as the dragging changes. if both are changing simultaneously, then the temporal ordering implied by a back-and-forth interpretation is misleading. it encourages a reading in which the student makes an intentional choice based on the outcome of the dragging, which reasserts the primacy of the student over the tool. the back-and-forth activity does not entail a metaphor of transport, but one of wayfaring. it is not that the student is moving from an almost-square to a square, as might be seen in a transport metaphor, but that in movement, in the back-and-forthing, the almost-square emerges. but what might be gained from focusing on the relation rather than the relata? from a baradian point of view, we are forced to reckon with the impossibility of separating the relata, which, in this case, encourages us to attend to the dragging/measuring process. this leads to different questions: instead of asking whether the student will ever succeed in making a square or whether the tool’s measurements will always prevent a square from happening, we ask what new meanings (about “square”) emerge. or, from an ingoldian perspective, we ask what new narratives emerged. indeed, ingold helps with this relata-versus-relation tension by drawing on the construct of things-as-narrative. from this perspective, there is no student, nor is there a tool, before the back-and-forth activity. the student–tool is, in and of itself, a process according to ingold. to identify movement within the student–tool entity is natural through an ingoldian process lens.

9. conclusions

the post-human starting point of this paper invites us to move away from the tendency to locate learning and thinking solely within the mind of the learner.
because of the tendency to place the learner at the centre of thinking and learning, and to look for changes in how the learner talks, acts and moves, it can be challenging to study mathematical learning situations from a post-humanist point of view. by drawing on some of the main constructs offered in the literature, such as resistance/accommodation, assemblage, intra-action, process and tools as narrative, none of which was explicitly designed for educational research, i have attempted to find out what new insights such post-humanist constructs can provide and what new questions they prompt. de freitas & sinclair (2014) attempt to mobilise the notion of assemblage by referring, for example, to the student-tool-concept as a whole. they also suggest ways both of attending to and of analysing data that re-materialise language and attempt to include broader material dimensions of activity, instead of simply attending to spoken words or to gestures—highlighting other sounds and rhythms that shape the ongoing activity. the construct of resistance/accommodation was fruitful in identifying turning points in the two episodes, but also potentially invited a human-centric view in which the tool ends up being subordinate to the human. the constructs of intra-action and assemblage were then brought in as starting points for overcoming a human-centric perspective. these new constructs drew attention to the changing nature of relations and, in turn, to the evolving assemblages involved in each episode. however, it is challenging to describe mathematical learning in terms of assemblages. we may be able to see difference, but know very little about whether the difference is mathematically relevant. the shift to the physical reconfigures what it means to do mathematics.
as opposed to a theorizing of how the mind is learning, responding or interiorizing, this shift draws back to the “in-the-moment”, embodied experience of tooling, rather than using a tool—much as we talk about walking rather than using our legs. to consider the material and temporal experiences is to study mathematics education in a post-human and materialist way, but one that, as discussed, does have challenges. to think about mathematics in the classroom as involving “tooling” rather than using tools requires avoiding the tendency and tradition of isolating nouns like “student”, “tool” and “concept”, and instead locating meaning in their entanglement. this is particularly difficult when wishing to discuss particular aspects of an assemblage, as the tendency is then to detach that aspect or part from the whole. this approach demands that when discussing any part of an assemblage, the focus should be on the part’s role in the intra-activity, which is to say, how it acts as a verb and how it contributes to the assemblage’s movement. a process ontology, the actions of student as wayfarer, and especially the notion of tool as narrative, draw attention to movement and temporality. movement and temporality were vital in interpreting the exercise involving the triangle and the exercise involving the almost-square; the triangle and the almost-square would not have emerged—would not have meaning—if only the student or only the tool were considered as isolated and inert nouns. it was the doing—the fused nouns participating in the temporal verb of acting—that was the mathematical practice. to look at any single movement of transport from one static point to another within the process of the student–tool assemblage changes the meaning entirely: one might miss the triangle entirely if, for example, it was only seen when it existed partly off screen; one might only see a quadrilateral (noun) instead of the in-action almost-square.
in analysing mathematical practice through a relational and verb-oriented process ontology, in seeing the student–tool as a line of becoming, the significance of the embodied experience and the material world is animated, consequentially changing the understanding of mathematics teaching and learning. this study was motivated and mobilized by a reconfiguration of perspective that challenged presupposed objects and their subsequent interactions. this study did not specifically address the age-old philosophical quandary of how material and mental combine but helped generate a shift to addressing how the fusing of student-tool reaches out into the larger classroom environment with questions like “is this a triangle?” or “is this a well-constructed square?” these questions emerge from a paradigm of intra-action and not solely a mental contemplation. i suggest there is a significant difference between these two perspectives. in this study’s approach, mathematics is less a label for a specific type of practice than it is a name for a specific kind of coupling.

references

artigue, m. (2002). learning mathematics in cas environment: the genesis of a reflection about instrumentation and the dialectics between technical and conceptual work. international journal of computers for mathematical learning, 7:245-274. arzarello, f., olivero, f., paola, d., & robutti, o. (2002). a cognitive analysis of dragging practises in cabri environments. zentralblatt für didaktik der mathematik, 34(3), 66-72. balacheff, n. (1988). aspects of proof in pupils’ practice of school mathematics. in d. pimm (ed.), mathematics, teachers and children. p. 216-235. london: hodder & st. barad, karen. (2007). meeting the universe halfway: quantum physics and the entanglement of matter and meaning. durham, n.c.: duke university press. bartolini bussi, m. g., & mariotti, m. a. (2008). semiotic mediation in the mathematics classroom: artifacts and signs after a vygotskian perspective. in l. d.
english (ed.), bussi, m. b., jones, g. a., lesh, r. a., sriraman, b. (assoc. eds.), handbook of international research in mathematics education. new york: routledge. châtelet, g. (2000). figuring space: philosophy, mathematics and physics. dordrecht, the netherlands: kluwer. de freitas, e. & sinclair, n. (2014). mathematics and the body: material entanglements in the classroom. cambridge university press. deutscher, g. (2010). through the language glass: why the world looks different in other languages. henry holt and co. new york. dietiker, l. (2013). mathematical texts as narrative: rethinking curriculum. for the learning of mathematics, 33(3), 14-19. goodchild, s. (2014). enactivist theories. in s. lerman (ed.), encyclopedia of mathematics education. heidelberg: springer. hesse, m. (1980). revolutions and reconstructions in the philosophy of science. bloomington: indiana university press. ingold, t. (2011). being alive: essays on movement, knowledge and description. london: routledge. latour, bruno. (1987). science in action: how to follow scientists through society. cambridge, ma: harvard university. jackiw, n. (2001). the geometer’s sketchpad. emeryville, ca: key curriculum press. lyotard, jean-françois. (1984). the postmodern condition: a report on knowledge. trans. geoff bennington and brian massumi. minneapolis: university of minnesota. nemirovsky, r., kelton, m. l. & rhodehamel, b. (2013). playing mathematical instruments: emerging perceptuomotor integration with an interactive mathematics exhibit. journal for research in mathematics education, 44(2), 372–415. pickering, a. (1995). the mangle of practice: time, agency, and science. chicago: the university of chicago press. roth, wolff-michael. (2011). passibility: at the limits of the constructivist metaphor. springer, netherlands. roth. w-m., & radford, l. (2011). a cultural historical perspective on mathematics teaching and learning. sense publishers. rotman, b. (2008). 
becoming beside ourselves: the alphabet, ghosts, and distributed human being. duke university press, london. ruthven, k. (2002). instrumenting mathematical activity: reflections on key studies of the educational use of computer algebra systems. international journal of computers for mathematical learning, 7:275-291. varela, f.j. & thompson, e. & rosch, e. (1991). the embodied mind: cognitive science and human experience (first mit press paperback edition, 1993) cambridge, massachusetts, london, england: the mit press. verillon, p. & rabardel, p. (1995). cognition and artifacts: a contribution to the study of thought in relation to instrumented activity. european journal of psychology in education, 9(3): 77-101. waltz, s. (2006). nonhumans unbound: actor-network theory and the reconsideration of “things”. in educational foundations, summer-fall. whitehead, a. n. (1978). process and reality. new york: the free press.

frontline learning research vol. 10 no. 1 (2022) 25-45
issn 2295-3159
corresponding author: kristin nyberg, university of education, institute of psychology, kunzenweg 21, 79117 freiburg, germany, kristin.nyberg@ph-freiburg
doi: https://doi.org/10.14786/flr.v10i1.955

self-effective scientific reasoning? differences between elementary and secondary school students

kristin nyberg (1), susanne koerber (1), christopher osterhaus (2)
(1) university of education freiburg, germany
(2) university of vechta, germany

article received 21 september 2021 / article revised 14 may 2022 / accepted 13 june / available online 24 june

abstract
although scientific reasoning is not a formal, independent school subject, it is an increasingly important skill, especially for student learning in science, technology, engineering, and mathematics (stem) subjects. to promote scientific reasoning effectively, it is important to know its influencing factors.
while cognitive influences have been investigated, affective-motivational factors, particularly self-efficacy, have rarely been considered in studies on scientific reasoning. to examine, for the first time, whether self-efficacy can be measured in a task-specific way and whether self-efficacy correlates with students’ scientific reasoning performance, the study assessed performance in scientific reasoning and self-efficacy (academic and task-specific) in a sample of 140 fourth graders and 148 eighth graders. as expected, higher correlations emerged for task-specific self-efficacy in both grades. a hierarchical cluster analysis showed that the correlational patterns were not the same across grade levels, with differences in self-estimated performance prevailing between the two grade levels: the largest cluster in grade 4 (41%) comprised children who significantly overestimated their performance, whereas the largest cluster in grade 8 (39%) comprised students who gave a realistic estimate of their own performance in scientific reasoning. this cluster was not present in grade 4. additional clusters of students who overestimated or underestimated their performance emerged in both grades. the results support the conclusion that self-efficacy expectations are important to consider when fostering scientific reasoning, and the large number of elementary school students who overestimated their performance suggests that not all students might benefit from interventions targeted at increasing self-efficacy. keywords: scientific reasoning, affective-motivational factor, self-efficacy, elementary school, secondary school nyberg, koerber & osterhaus 26 | f l r 1. introduction fostering science, technology, engineering and mathematics (stem) education in school and integrating technology and engineering into science education is a major challenge in recent years (bybee, 2010). 
nevertheless, stem education should not be reduced to content knowledge but instead should incorporate the development of scientific attitudes and critical, scientific reasoning or thinking (osborne, 2013). this broader conceptualization of science skills is mirrored in the conceptualization of scientific literacy in the pisa studies (oecd, 2006, 2015), which encompasses, apart from science content knowledge, scientific reasoning and personal affective-motivational factors. scientific reasoning can be viewed as a complex construct that includes several components, such as experimentation skills or understanding the nature of science. previous studies showed evidence for a common conceptual core in scientific reasoning (koerber et al., 2015), and validated group tests exist to reliably measure scientific reasoning in elementary school and above (e.g., koerber et al., 2015; osterhaus et al., 2015). to foster scientific reasoning skills, it is important to know the factors that influence scientific reasoning. most research in the area addresses the impact of general cognitive factors like language, intelligence, problem solving, executive functioning, and specific variables (e.g., advanced theory of mind) on the development of scientific reasoning (koerber et al., 2015; for an overview see zimmerman, 2007). research on the influence of affective-motivational factors on scientific reasoning has been scarce, even though affective-motivational constructs like self-beliefs and motivation are essential for academic achievement (cai et al., 2018; pajares, 1996; pajares & valiante, 1997). among affective-motivational variables, the impact of self-efficacy expectations on academic performance is particularly strong, predicting up to 9% of academic performance in university students (richardson et al., 2012). self-efficacy expectations are classified as competence beliefs based on bandura’s social cognitive theory (bandura, 1986; 1997).
self-efficacy expectations are understood as the belief in one’s own ability to cope with a future situation or task. according to bandura (1997), the precise measurement of self-efficacy should include items that measure beliefs about expectations and performance as closely as possible to the identical future tasks or situations, and the more specifically they are assessed, the more accurate the obtained action predictions will be (bandura, 1997; schwarzer & jerusalem, 2002). students’ expectations of self-efficacy are considered to play an important role in the school context. self-efficacy expectations positively influence academic performance, motivational processes, self-regulation, self-perception, and interest (bandura & schunk, 1981; klassen & usher, 2010; pajares & valiante, 1997; schunk, 1995). students with positive self-efficacy expectations spend more time on challenging tasks and situations. hence, they have the chance to receive more feedback than their classmates with lower self-efficacy expectations, who consequently avoid tasks and situations in this domain (britner & pajares, 2001; zeldin & pajares, 2000). when students in grades 6, 7 and 8 are asked how skilled they would be at solving a math problem, they mainly use their experience with similar math problems up to that point as a basis for their judgment (usher & pajares, 2009). therefore, it is expected that the relation between performance and self-efficacy may be stronger at higher grade levels, where students have more experience. researchers agree on the positive correlation between self-efficacy expectations and academic success, mostly referring to typical subjects of the curriculum such as mathematics (siefer et al., 2020) or writing (pajares & valiante, 1997). a meta-analysis by multon and brown (1991) with a total sample of roughly 5000 participants reported a correlation of r = .38 between academic performance and self-efficacy expectations.
this result was corroborated years later by a meta-analysis by honicke and broadbent (2016). they found a moderate positive correlation of r = .33 between the study performance of university students and their self-efficacy across 59 studies. according to bong (2006), the correlation between self-efficacy expectations and academic performance appears to be higher when self-efficacy is measured more specifically, which is consistent with bandura’s (1997) theory that self-efficacy expectations are context specific. in the following study, the assessment of self-efficacy is based on bandura's theory of measuring self-efficacy as specifically as possible, i.e., task-specifically.

given this context-dependent requirement of a self-efficacy measure, generalizing results to other disciplines seems difficult, which consequently requires investigation in other domains, subjects, and skills. research dealing with self-efficacy and skills that are not included as a separate subject in the curriculum but have an important relation to other subjects and skills, like scientific reasoning, is of great interest. liu et al. (2006), for example, found that science-class self-efficacy expectations of sixth-grade students (e.g., “i am confident i can learn the basic concepts taught in this science class”) correlated significantly with understanding science concepts (r = .28). jansen et al. (2015) investigated performance in scientific literacy and self-efficacy. the sample, which originated from the 2006 pisa survey in germany, included roughly 5000 secondary school students, most of whom were in grade 9 at the time of the survey. the self-efficacy scale used in the study included eight items that assessed how confident the students felt about solving a task in scientific literacy (e.g., predict how changes in an environment will affect the survival of certain species).
the items that measured performance comprised real-world science tasks, resembling the conceptualization of scientific literacy of the pisa study (bybee et al., 2010; oecd, 2006). the results of the study suggest that self-efficacy is a significant predictor of scientific literacy (jansen et al., 2015). even though a positive relation between performance and self-efficacy has been found in many domains, it does not necessarily imply that higher self-efficacy is always related to higher performance. students may over- or underestimate their performance, which can hinder good performance (bandura, 1986). students who (strongly) overestimate their performance may not invest in learning the specific task or acquiring metacognitive skills (e.g., strategies to achieve goals), because they are confident that they can master the situation or task without investing more time and effort (hadwin & webster, 2013). identifying these groups of students is particularly important given that not all students might equally benefit from a direct increase in self-efficacy. the present study charts new territory: it investigates whether self-efficacy in scientific reasoning can be measured in a task-specific way. while this approach is similar to what has been shown in a specific area of mathematics (siefer et al., 2020), it is the first time that self-efficacy expectations are assessed in scientific reasoning at two levels of specificity (academic and task-specific). in addition, the study investigates whether there are groups of students with different levels of task-specific self-efficacy and performance in scientific reasoning. a sample of fourth and eighth graders was selected. eighth graders were included because they had had more opportunities to practice scientific reasoning and gain feedback on it, although these opportunities are more limited than in mathematics or sports.
the fourth graders were chosen because previous studies examining self-efficacy expectations had predominantly recruited secondary school and university students. moreover, studies show that perceived academic self-efficacy declines between sixth and eighth grade (harter, 1985). the decline can be described by a tendency of elementary school children to have unrealistically high beliefs about their competence (i.e., to overestimate their competence), whereas their self-judgement increasingly matches external evaluations as they get older (nicholls, 1978; stipek & hofman, 1980; pajares & schunk, 2001). this study intended to shed light on the beginnings of this relation. the study focuses on five key questions: (1) is our self-efficacy scale suitable to measure task-specific self-efficacy in scientific reasoning for grades 4 and 8? (2) what is the relation between students’ performance in scientific reasoning and their self-efficacy expectations, and specifically, is this relation already observable in elementary school? (3) are the correlations higher when measuring task-specific self-efficacy instead of academic self-efficacy? (4) are there differences in the strength of the relation between both grade levels? (5) finally, do diverse clusters exist which categorize students along the task-specific self-efficacy-performance relation (e.g., over- or underestimators and high or low performers)? this last research question pertains to whether the relations within the sample are homogeneous or whether interindividual differences exist.
because both self-efficacy and scientific reasoning can be fostered through intervention or training (sodian et al., 2002; margolis & mccabe, 2006), and because task-specific self-efficacy measurement provides accurate insight into self-assessment and related task performance, this approach makes it possible to focus specifically on what needs to be fostered: self-efficacy or scientific reasoning. improving scientific reasoning skills is especially crucial for performance in scientific literacy and in the important stem subjects.

2. method

2.1 participants
the sample consisted of n = 288 students (155 girls, 133 boys), n = 140 fourth graders (m = 9 years, 2 months, sd = 4 months) and n = 148 eighth graders (m = 13 years, 3 months, sd = 6 months). the students attended 14 schools, mostly middle-class, close to a mid-sized city in southern germany. the eighth graders all attended academic track schools (gymnasium). the data were collected in the first half of the academic year between october 2018 and january 2019. of the 288 children, 68 (23.6%) spoke at least one language other than german at home (the most frequently reported languages were russian, english, and french). student assent and written consent from caretakers were obtained for all participants. institutional review board (irb) approval was not sought for the study because the host institution did not have an established irb.

2.2 materials

2.2.1 scientific reasoning
performance in scientific reasoning was measured with six multiple-select tasks: three tasks testing students’ understanding of the nature of science (nos; cronbach’s α grade 4 = .53, grade 8 = .46) and three tasks testing their metaconceptual understanding of experimentation (unex; cronbach’s α grade 4 = .52, grade 8 = .51).
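the reliability coefficients reported above follow the standard formula for cronbach's α, α = k/(k-1) · (1 - Σ item variances / variance of the sum score), where k is the number of items. the python sketch below illustrates that formula only; the function name `cronbach_alpha` and the example scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes every respondent answered the same k >= 2 items and that the
    sum score is not constant across respondents.
    """
    k = len(scores[0])
    item_columns = list(zip(*scores))                 # one tuple per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical scores of five students on three tasks (0-2 points each).
example_scores = [
    [2, 1, 2],
    [0, 0, 1],
    [1, 1, 1],
    [2, 2, 1],
    [0, 1, 0],
]
print(round(cronbach_alpha(example_scores), 2))
```

with only three tasks per subscale, as in the scales described here, α is sensitive to every single inter-item correlation, which is one reason short scales tend to yield low coefficients.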
since previous studies revealed that the subcomponents of scientific reasoning share a common conceptual core (koerber et al., 2015) and there is a correlation between nos and experimentation skills (osterhaus et al., 2017), the two scientific reasoning scales were collapsed and used as a single performance measure for scientific reasoning in the subsequent analysis (entire scale cronbach’s α grade 4 = .49, grade 8 = .48). further support for the use of the scale comes from the study by osterhaus et al. (2020), who examined a short scale (spr-i(7)) for validity and reliability and found reliabilities similar to those of the six items we used.

nos. three nos tasks (a01 scientist, a03 middle ages and a11 mistakes in science) were selected from the science-p reasoning inventory (koerber et al., 2015; osterhaus et al., 2020). these three nos tasks assessed children’s understanding of what scientists do and the understanding of the hypothesis-evidence relation (see appendix for a sample item). for each task, three answer options were presented to the students (on a naïve level = 0 points, an intermediate level = 1 point, a scientifically advanced level = 2 points), and they were asked to agree or disagree with each of the answer options. the lowest level answer selected was taken as the final score on the entire item. thus, the children could obtain a maximum of two points per task and a maximum sum score of six (see osterhaus et al., 2020 for further coding details).

unex. the unex tasks were taken from the study of osterhaus et al. (2015). the students were given descriptions of three imaginary experiments (trees u1, math textbook u2, classroom u3). the students were asked to decide whether it was a good or bad experiment to test the different assumptions (see appendix for a sample item). children were assigned 2 points when they selected the correct answer and simultaneously rejected the wrong alternatives.
one point was given for selecting the correct answer and the intermediate level but rejecting the naïve level (see coding of the nos tasks).

2.2.2 self-efficacy expectations
the self-efficacy expectations were measured on two levels: academic self-efficacy and task-specific self-efficacy expectations.

academic self-efficacy. the academic self-efficacy expectations were measured with three items taken from jerusalem and satow (1999). the scale (wirkschul) originally comprises 7 items and was validated on 3000 secondary school students. the entire scale was used in our pilot study. from the subsequent interviews and the reliability tests, two items emerged as having poor reliability. in order to achieve a sufficient reliability in both grade levels for the analysis of the main survey, two additional items had to be excluded. the children were asked to indicate their agreement on a 4-point likert scale, ranging from “strongly disagree” (1) to “strongly agree” (4), with the following statements: 1) “i can solve even the difficult tasks in class if i exert myself.” 2) “even if i am sick for a longer time, i am still able to perform well” 3) “if a teacher doubts my skills, i am sure that i can still perform well”. the scale was administered in german, and the item wordings given in the text are our own translations. cronbach’s α was .61 for grade 4 and .55 for grade 8. reliabilities of > .5 can be considered interpretable, especially for scales with few items (e.g., nunnally & bernstein, 1978). the corrected item-total correlations (rit) of .56-.58 also point to the reliability of the scale. criterion validity is supported by the significant relations to task-specific self-efficacy (grade 8 r = .488 and grade 4 r = .360), school self-concept (grade 8 r = .690 and grade 4 r = .446), and interest (grade 8 r = .229 and grade 4 r = .180).

task-specific self-efficacy. in this study a task-specific self-efficacy scale was used.
in contrast to mathematics and other familiar academic areas, scientific reasoning tasks are less familiar, especially to fourth graders; thus, their respective performance might seem less predictable to them, all the more so since this competence is usually not explicitly taught in school as a separate subject. the results of a pilot study with 57 fourth and eighth graders corroborated this impression. based on the participants’ feedback and in accordance with general guidelines on scale construction (bandura, 2006; moosbrugger & kelava, 2012), we designed a highly task-specific self-efficacy scale (3 items), in which the respective scientific reasoning task and the self-efficacy scale were presented together. task-specific self-efficacy was assessed with the following three items, presented together with the scientific reasoning task before the students were asked to solve each of the six scientific reasoning tasks: 1) “i know how to deal with the task” 2) “i am very familiar with such tasks and know how to solve the task” 3) “i would need help to solve the task.” the likert scale ranged from “strongly disagree” (1) to “strongly agree” (4).

2.2.3 control variables
text comprehension was measured with a proficiency test (elfe 1-6; lenhard & schneider, 2006). nonverbal intelligence was measured with a subtest of the culture fair intelligence test (cft; weiß, 2006).

2.3 procedure
in both age groups, the testing was conducted as a whole-class testing procedure, with each person working individually on their own booklet. before answering each scientific reasoning task, the students were asked to fill in the items of the task-specific self-efficacy scale. in a first step, they were instructed to look at the scientific reasoning task for about 25 seconds but not to try to solve it. a pilot study had indicated that 25 seconds gave the students enough time to look over the task but not to try to solve it.
after they rated the task-specific self-efficacy items, they solved the scientific reasoning task. to avoid confounding effects of reading ability, the items were presented in a powerpoint presentation and read aloud by an experimenter. the testing took about 60 minutes.

3. results

3.1 core performance and suitability of the task-specific self-efficacy scale

3.1.1 scientific reasoning
fourth graders scored an average of 5.23 (sd = 3.06) out of 12 points (43.6%), whereas eighth graders performed significantly better, scoring an average of 7.96 (sd = 2.46) out of 12 points (66.3%), t(277) = -6.03, p < .01. according to cohen (1992), the effect size of r = .60 is strong (see figure 1).

3.1.2 self-efficacy
figure 1 shows the mean academic and task-specific self-efficacy expectations transformed to a percentage scale (low to high feelings of self-efficacy). no significant differences in academic self-efficacy were found between grades (t(277) = 1.46, p = .15) or between males and females in grade 8 (t(148) = -.294, p = .77) and grade 4 (t(140) = -1.25, p = .13). in task-specific self-efficacy, the eighth graders had significantly higher values than the fourth graders, t(279) = -3.36, p < .01 (see figure 1). the effect size was r = .04, representing a small effect (cohen, 1992). no significant difference was found between males and females in their task-specific self-efficacy in grade 8, t(146) = 1.92, p = .06, and in grade 4, t(139) = -.113, p = .91.

figure 1. comparison of mean performance in scientific reasoning, task-specific and academic self-efficacy between grade 4 and 8. **p < .01, sr = scientific reasoning, se = self-efficacy

the internal consistency of the task-specific self-efficacy scale was high in both grades (cronbach’s α grade 4 = .89, grade 8 = .88), as were the corrected item-total correlations (rit grade 4 = .59-.77, rit grade 8 = .60-.82), which indicates a good reliability of the scale in both grade levels (table 1; bortz & döring, 2006).
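a corrected item-total correlation relates each item to the sum of all other items, so that the item does not inflate its own correlation with the total. the following python sketch shows this computation under simple assumptions; the function names and the example ratings are hypothetical and do not reproduce the study's data or software.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """For each item, correlate it with the sum of the remaining items."""
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total without item j
        result.append(pearson(item, rest))
    return result


# Hypothetical 4-point likert ratings of six students on three items.
ratings = [
    [4, 3, 4],
    [2, 2, 1],
    [3, 3, 3],
    [1, 2, 2],
    [4, 4, 3],
    [2, 1, 2],
]
print([round(r, 2) for r in corrected_item_total(ratings)])
```

a negatively worded item (such as an item about needing help) would have to be reverse-coded before this step; whether and how the study did so is not stated in the text.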
table 1
corrected item-total correlations (rit) of the task-specific self-efficacy items for grades 4 and 8

scale   item   grade 4   grade 8
nos     1_1    .70       .62
nos     1_2    .64       .60
nos     1_3    .68       .61
nos     2_1    .73       .75
nos     2_2    .73       .81
nos     2_3    .77       .78
nos     3_1    .77       .68
nos     3_2    .76       .73
nos     3_3    .75       .76
unex    4_1    .62       .74
unex    4_2    .59       .76
unex    4_3    .65       .77
unex    5_1    .59       .65
unex    5_2    .63       .60
unex    5_3    .67       .82
unex    6_1    .68       .79
unex    6_2    .64       .80
unex    6_3    .65       .75

note. items are labelled task_item. 1 = a01 scientist, 2 = a03 middle ages, 3 = a11 mistakes in science, 4 = u1 trees, 5 = u2 math textbook, 6 = u3 classroom

content and criterion validity are supported by the process of scale construction and by correlations in line with theoretical findings (marsh, 2018; schukajlow et al., 2012). significant relations were found with academic self-efficacy (grade 4 r = .360, grade 8 r = .488), academic self-concept (grade 4 r = .395, grade 8 r = .329), and interest (grade 4 r = .542, grade 8 r = .467). furthermore, we tested whether there was measurement invariance for the task-specific self-efficacy scale across grades 4 and 8, using r and the lavaan package (rosseel, 2012). the model shows scalar measurement invariance, following the criterion that the maximum change per level in the comparative fit index (cfi) and the root-mean-square error of approximation (rmsea) should not exceed a certain threshold (a decrease in cfi of at most .010 and an increase in rmsea of at most .015; chen, 2007; for further discussions see cheung & rensvold, 2002). fit indices were as follows: configural (cfi = .997; rmsea = .043), metric (cfi = .994; rmsea = .051) and scalar (cfi = .989; rmsea = .066). all fit indices remain in a good range (e.g., hu & bentler, 1999), and the maximum-change thresholds for cfi and rmsea are not exceeded by the scalar model. scalar invariance implies that the factor structure, factor loadings, and intercepts remain invariant across the two grade levels. thus, a comparison of items between participants from different groups (here grade 4 and 8) is possible with regard to the latent variable measured (here task-specific self-efficacy).
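the stepwise comparison of fit indices can be retraced with simple arithmetic. the python sketch below uses the cfi and rmsea values reported above and checks each step against the change thresholds; reading the thresholds as "cfi may drop by at most .010 and rmsea may rise by at most .015" is an assumption, and the names `MAX_CFI_DROP`, `MAX_RMSEA_RISE` and `invariance_steps` are hypothetical. it does not refit any model.

```python
# Fit indices as reported in the text for the three nested models.
fit = {
    "configural": {"cfi": 0.997, "rmsea": 0.043},
    "metric": {"cfi": 0.994, "rmsea": 0.051},
    "scalar": {"cfi": 0.989, "rmsea": 0.066},
}

MAX_CFI_DROP = 0.010    # assumed reading of the cfi change threshold
MAX_RMSEA_RISE = 0.015  # assumed reading of the rmsea change threshold


def invariance_steps(fit, order=("configural", "metric", "scalar")):
    """Return (model, delta_cfi, delta_rmsea, within_thresholds) per step."""
    steps = []
    for previous, current in zip(order, order[1:]):
        # Round to three decimals to avoid floating-point noise in the deltas.
        d_cfi = round(fit[current]["cfi"] - fit[previous]["cfi"], 3)
        d_rmsea = round(fit[current]["rmsea"] - fit[previous]["rmsea"], 3)
        ok = d_cfi >= -MAX_CFI_DROP and d_rmsea <= MAX_RMSEA_RISE
        steps.append((current, d_cfi, d_rmsea, ok))
    return steps


for model, d_cfi, d_rmsea, ok in invariance_steps(fit):
    print(f"{model}: delta cfi = {d_cfi}, delta rmsea = {d_rmsea}, ok = {ok}")
```

under this reading, the metric step changes cfi by -.003 and rmsea by .008, and the scalar step changes cfi by -.005 and rmsea by .015, so the rmsea change at the scalar step sits exactly at the threshold.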
3.2 relation between scientific reasoning and self-efficacy

3.2.1 academic self-efficacy and scientific reasoning
as shown in table 2, academic self-efficacy expectations correlated significantly with scientific reasoning in grade 8 but not in grade 4. no significant differences in the correlation in grade 8 between females and males were found (p = .34). the correlations were controlled for intelligence and reading ability.

3.2.2 task-specific self-efficacy and scientific reasoning
performance in scientific reasoning and task-specific self-efficacy correlated significantly in both grades (table 2). again, no significant differences in the correlations between females and males were observed (grade 4, p = .45; grade 8, p = .24). the correlations were controlled for intelligence and reading ability.

table 2
correlations between scientific reasoning and academic self-efficacy or task-specific self-efficacy. ***p < .001, *p < .05, se = self-efficacy

3.3 cluster analysis
the correlation analysis showed a positive relation between task-specific self-efficacy and performance in scientific reasoning, but the magnitude of the correlation could point to heterogeneous patterns within the correlation. a hierarchical cluster analysis was performed to investigate whether there are clusters of students who underestimated or overestimated their performance in relation to their actual performance. for this purpose, the values of the performance in scientific reasoning and the task-specific self-efficacy were z-standardized. for the performance, the z-standardized residuals were applied to eliminate the common variance between the performance and task-specific self-efficacy. the squared euclidean distance was taken as the proximity measure. as a clustering algorithm, the ward method was selected. ward's method has been shown to give better results with small clusters (e.g., everitt et al., 2001).
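the steps just described (z-standardization, residualization of performance, ward clustering on squared euclidean distances) can be sketched in plain python. this is a minimal illustration under stated assumptions, not the study's analysis: the text does not name the software used for clustering, "z-standardized residuals" is interpreted here as the standardized residuals of a simple regression of performance on self-efficacy, and all names and scores are hypothetical.

```python
from math import sqrt


def zscores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def residuals(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    dims = range(len(points[0]))
    ca = [sum(points[i][d] for i in a) / len(a) for d in dims]
    cb = [sum(points[i][d] for i in b) / len(b) for d in dims]
    sq_dist = sum((p - q) ** 2 for p, q in zip(ca, cb))  # squared euclidean
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Agglomerative clustering: merge the cheapest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        _, i, j = min(
            (ward_cost(points, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters[j]   # j > i, so index j is still valid below
        del clusters[j]
    return clusters


# Hypothetical raw scores for eight students (not data from the study).
performance = [10, 9, 3, 2, 10, 9, 3, 2]
self_efficacy = [3.8, 3.6, 3.7, 3.5, 1.5, 1.7, 1.4, 1.6]

se_z = zscores(self_efficacy)
perf_resid_z = zscores(residuals(zscores(performance), se_z))
points = list(zip(perf_resid_z, se_z))
profiles = ward_clusters(points, k=4)
print(profiles)  # lists of student indices, one list per cluster
```

the naive pairwise search is cubic in the number of students, which is acceptable for samples of this size; a statistics package would normally be used instead, and the number of clusters would be chosen from a dendrogram rather than fixed in advance.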
this procedure was similar to the approach of previous studies (e.g., hallet et al., 2010; siefer et al., 2020). the cluster analysis was conducted separately for grades 4 and 8 because different profiles in the agreement between scientific reasoning and self-efficacy could be expected in each age group. the analysis revealed a four-cluster solution for the fourth graders and a five-cluster solution for the eighth graders. the number of clusters was determined based on dendrograms and cophenetic correlations. overall, no gender differences were found, except in the very small clusters 1 and 5 in grade 4 and grade 8, respectively.

grade 4. cluster 1 (underestimators/good performance) included 18 students (14.5%) who showed the highest performance in scientific reasoning across all clusters. task-specific self-efficacy was also rated high, but it was below their performance. in other words, this group of students slightly underestimated their performance (see figure 2). the cluster contained significantly more females than males, χ2(1, n = 17) = 3.56, p < .05. cluster 2 (overestimators/poor performance) was the most frequent cluster (n = 51, 41.1%). students performed below average, but despite their weak performance, they showed high task-specific self-efficacy, indicating overestimation of their performance (see figure 2). no significant difference between the number of female or male students in this cluster was observed. cluster 3 (strong underestimators/good performance) (n = 24, 19.4%) could best be described as having high performance and low task-specific self-efficacy. the students seemed to largely underestimate their performance (see figure 2). again, no significant difference was found in the number of female and male students. the last cluster, cluster 4 (slight underestimators/poor performance), contained n = 31 (25%) students showing poor performance and low task-specific self-efficacy.
the task-specific self-efficacy was even poorer than the performance, which showed that students underestimated their performance (see figure 2). again, there was no significant difference between the numbers of female and male students.

table 2 (correlation coefficients)
                                academic se   task-specific se
grade 4 scientific reasoning    .076          .194*
grade 8 scientific reasoning    .253***       .290***

figure 2. clusters by performance in scientific reasoning (z-standardized) and task-specific self-efficacy (z-standardized) in grade 4. cluster 2 is the most frequent cluster. se = self-efficacy

a one-way analysis of variance (anova) was performed to statistically confirm the differences between the clusters. in grade 4, cluster assignment had a significant effect on the performance in scientific reasoning, f(4,131) = 90.567, p < .001. performance differed significantly between the clusters, with only the performance in cluster 2 (m = -0.49, sd = 0.56) not being significantly different from the performance in cluster 4 (m = -0.70, sd = 0.50). cluster assignment also significantly affected the level of task-specific self-efficacy (f(4,130) = 91.05, p < .001). all clusters varied significantly in their level of task-specific self-efficacy, except for cluster 1 (m = -0.74, sd = 0.48) and 2 (m = -0.68, sd = 0.67), which were not significantly different.

grade 8. the most frequent cluster in grade 8 was cluster 1 (realistic estimators/good performance) with n = 52 (39%) students. the students showed an above-average performance and a high task-specific self-efficacy, indicating that the students’ self-evaluated self-efficacy was close to their actual performance (see figure 3). no significant difference between the number of female or male students in this cluster was observed.
cluster 2 (strong overestimators/very poor performance) contained n = 9 (6.7%) students with a poor performance and a high task-specific self-efficacy, indicating a high overestimation of their performance (see figure 3). again, no significant difference was found in the number of female and male students. in cluster 3 (strong underestimators/good performance; n = 21, 15.8%), students had a high performance and a poor task-specific self-efficacy. the students were high-performing underestimators (see figure 3). there was no significant difference between the numbers of female and male students in this cluster. students (n = 25, 18.8%) in cluster 4 (overestimators/very poor performance) demonstrated poor performance and average task-specific self-efficacy, indicating below-average overestimation. also, no significant difference between the number of female or male students in this cluster was observed. cluster 5 (underestimators/average performance) with n = 28 (19.5%) included students with an average performance and a poor task-specific self-efficacy. the students underestimated their performance. cluster 5 was the only cluster to have a significant difference between the number of female and male students, with significantly more males in the cluster, χ2(1, n = 25) = 3.85, p < .05.

anovas were also conducted for grade 8 to test for differences between the clusters. cluster assignment had a significant effect on the performance in scientific reasoning, f(5,140) = 79.90, p < .001. the performance of the clusters differed significantly, except between cluster 2 (m = -1.28, sd = 0.42) and cluster 4 (m = 1.31, sd = 0.65). cluster assignment also significantly affected the level of task-specific self-efficacy, f(5,139) = 82.08, p < .001. all clusters varied significantly in their level of task-specific self-efficacy, except between cluster 3 (m = -0.75, sd = 0.53) and 5 (m = -0.88, sd = 0.40).

figure 3.
showing the resulting clusters with performance in scientific reasoning (z-standardized) and task-specific self-efficacy (z-standardized) in grade 8. cluster 1 is the most frequent cluster. se = selfefficacy 4. discussion the present study addressed five questions (1) is our self-efficacy scale suitable to measure task-specific self-efficacy in scientific reasoning for grades 4 and 8? (2) what is the relation between students’ performance in scientific reasoning and their self-efficacy expectations, specifically is this relation already observable in elementary school? (3) are the correlations higher when measuring taskspecific self-efficacy instead of academic self-efficacy? (4) and, are there differences in the strength of the relation between both grade levels? (5) finally, do diverse cluster exist which categorize students along the task-specific self-efficacy-performance relation (e.g., overor underestimate and high and low performer)? 4.1 core performance and suitability of the used scales as expected, performance in scientific reasoning was significantly higher in grade 8 than in grade 4. nonetheless, no ceiling effect was observed in the performance of the eighth graders. this finding suggests that performance in scientific reasoning continues to develop through the elementary school years into the secondary school years and may not be completed by the end of the secondary school years, which is in line with previous findings (bullock et al., 2009; bullock & ziegler, 1999; koerber et al., 2015). when studying students’ self-efficacy, it should be emphasized that self-efficacy could be measured task-specifically for scientific reasoning skills– in elementary and secondary school students. and, in contrast to many other studies, the present study investigated two levels of self-efficacy: nyberg, koerber & osterhaus 35 | f l r academic self-efficacy and task-specific self-efficacy. 
the item characteristics and criterion-related correlations show comparable values for grades 4 and 8 and indicate that the scales can be applied in both grades. the results related to academic self-efficacy should be interpreted carefully, as the scale does not show high internal consistency. nevertheless, (significant) correlations were found, which could have been stronger with a scale having higher internal consistency. for a replication of the results, especially in relation to the academic self-efficacy scale, the low internal consistency should be taken into account. the item characteristics and correlations of the task-specific self-efficacy scale, its high internal consistency, and its scalar measurement invariance across both grade levels suggest the reliable and valid use of the scale for grades 4 and 8.

the two grades revealed similar academic self-efficacy with no significant difference (grade 8: 64% vs. grade 4: 62%). the task-specific self-efficacy, however, differed significantly between the two grades: eighth graders reported significantly higher task-specific self-efficacy than fourth graders. this finding is consistent with the development of self-efficacy expectations as described by bandura (1997), who identified different sources of self-efficacy. two of the crucial sources for building self-efficacy expectations may be previous mastery experiences and vicarious experiences (usher, 2009). mastery experiences occur when students experience feelings of success on a particular task, which engenders the belief that they can succeed in the task again. vicarious experiences refer to observing someone else perform the task and solve it. students learn from this observation that they may also succeed at the task. young students often have little opportunity to benefit from such experiences in building their self-efficacy beliefs in this specific area because they are not yet very familiar with scientific reasoning tasks.
however, eighth graders are more likely to have the opportunity in school to work on science problems in physics or nwt (science and technology) than fourth graders. therefore, it is not surprising that eighth graders rated themselves significantly higher in task-specific self-efficacy.

4.2 relation between scientific reasoning and self-efficacy

a significant correlation between academic self-efficacy and scientific reasoning was found only in grade 8. task-specific self-efficacy and performance in scientific reasoning correlated significantly in both grades 4 and 8, with a higher correlation emerging for eighth graders. compared to other studies, we found correlations that tended to be lower. multon and brown (1991) reported a mean correlation of r = .38 in their meta-analysis, and honicke and broadbent (2016) found a similar result of r = .33. however, these reviews included studies with different samples and heterogeneous instruments. in a study by siefer et al. (2020), task-specific self-efficacy correlated with the performance in a specific mathematical area. they reported a correlation of r = .39 between task-specific self-efficacy and the students’ performance on a test of linear functions in grades 8 and 9. these studies also hint towards a higher agreement between performance and self-efficacy if self-efficacy is measured specifically. the lower correlations in our study may have resulted from scientific reasoning not being an independent subject in the curriculum. receiving feedback and benefiting from mastery experiences is more difficult for scientific reasoning tasks than, for example, for mathematics tasks. consequently, students receive less formal feedback on their performance (e.g., in the form of grades).
fourth-grade students especially have little opportunity to profit from mastery experiences in scientific reasoning because opportunities to work on science problems, as required in the scientific reasoning tests used here, are missing. in general, in contrast to self-efficacy measures concerning science content knowledge (e.g., in the 2015 pisa study; schiepe-tiska et al., 2016), we observed no gender differences in either grade 4 or grade 8 in the (scientific reasoning) task-specific self-efficacy and the correlations. this is consistent with previous findings (e.g., koerber et al., 2015), which also showed no gender differences in scientific reasoning.

4.3 differences in the agreement between scientific reasoning and task-specific self-efficacy

the positive correlations between performance and task-specific self-efficacy in grades 4 and 8 suggested that higher performance is associated with higher task-specific self-efficacy. however, the rather low correlation coefficients could also indicate interindividual differences between students. to examine these differences, a hierarchical cluster analysis was conducted. looking at the clusters found, the most noteworthy result was the largest cluster in grade 8, which contained the students who judged their performance realistically and showed a high performance. this fits with the findings of siefer et al. (2020), who also showed that the largest group of their sample (33%) realistically assessed themselves in the domain of linear functions in grades 8 and 9. the cluster of students who realistically judged their performance, however, was nonexistent in grade 4. many fourth-grade students were assigned to the cluster of overestimators who performed poorly.
this is in line with previous research findings that elementary school students are more likely to overestimate themselves, in contrast to the beginning of secondary school, where self-efficacy appears to decline (harter, 1985; zimmerman, 1995). students in grade 4 seemed to mostly overestimate or underestimate their abilities. being able to realistically judge one’s performance requires the skill to realistically judge the required task at an abstract level, which could be difficult for fourth graders. thus, the result for grade 4 students could reflect a nonrealistic judgment of their performance. for example, a study by kruger and dunning (1999) suggests that the skills needed for a certain task or domain are the same skills needed to evaluate one’s performance. a reasonable assumption is that the fourth graders, who are not as good at scientific reasoning as the eighth graders, may not be as skilled at judging their skills and may consequently overestimate or underestimate their scientific reasoning skills. students become more realistic in their self-efficacy evaluations over time. nevertheless, clusters are more divergent in grade 8 in their levels of performance and self-efficacy.

it is well established in the literature that both great overestimation and underestimation of one’s own abilities can be a barrier to performance (e.g., bandura, 1986; hattie, 2013). if students are overconfident that they can handle the task or situation well, this may lead them not to adopt suitable strategies or take other actions to master the task (hadwin & webster, 2013). in contrast, learners with insufficient confidence may have their cognitive resources so absorbed that they have to invest more effort in achieving the goal than should be necessary (hattie, 2013).
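the labels used in this section (overestimators, underestimators, realistic estimators) can be made concrete with a small python sketch. it z-standardizes performance and task-specific self-efficacy, as in figures 2 and 3, and labels each student by the gap between the two standardized scores. this is only an illustration under stated assumptions: the study itself derived its groups with a hierarchical cluster analysis rather than a fixed rule, and the function names, the 0.5 sd threshold, and the example scores are hypothetical and not taken from the study.

```python
from statistics import mean, pstdev


def z_standardize(values):
    """Return z-scores (mean 0, SD 1) for a list of raw scores."""
    m = mean(values)
    sd = pstdev(values)  # population SD of the sample at hand
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / sd for v in values]


def calibration_labels(performance, self_efficacy, threshold=0.5):
    """Label students by the gap between standardized self-efficacy and performance.

    `threshold` is a hypothetical cut-off in SD units (an assumption made for
    this sketch, not a value reported in the study).
    """
    if len(performance) != len(self_efficacy):
        raise ValueError("both score lists must have the same length")
    z_perf = z_standardize(performance)
    z_se = z_standardize(self_efficacy)
    labels = []
    for perf, se in zip(z_perf, z_se):
        gap = se - perf  # positive gap: self-efficacy exceeds performance
        if gap > threshold:
            labels.append("overestimator")
        elif gap < -threshold:
            labels.append("underestimator")
        else:
            labels.append("realistic")
    return labels


# Hypothetical raw scores for five students (not data from the study).
performance = [2, 9, 5, 8, 3]
self_efficacy = [3.5, 3.4, 2.5, 1.5, 1.6]
print(calibration_labels(performance, self_efficacy))
```

a difference-based rule like this ignores the level of performance, which the clusters in the study also separate (e.g., good versus very poor performers), so it should not be read as a substitute for the cluster analysis.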
the generally high number of underestimators in grades 4 and 8 in this study is a concern: these students could find themselves in a helpless position, which may then result in poorer performance in scientific reasoning, which in turn is critical for performance in scientific literacy and the important stem subjects. this differentiated information is especially relevant for teachers: (1) it is crucial for them to realize that there might be substantial heterogeneity in their class with respect to students’ estimation of their own performance; (2) it helps teachers provide feedback that results in realistic confidence with at most minimal overestimation, as this allows for the best student performance (e.g., bandura, 1977); (3) this, in turn, might support students in their self-regulation behavior; and finally, (4) this opportunity for promotion appears useful given that self-efficacy is a measure that can be improved through intervention or training (sodian et al., 2002; margolis & mccabe, 2006).

4.4 limitations and future directions

the cross-sectional design used in the present study allows for a thorough investigation of self-efficacy expectations and their relation to scientific reasoning skills. when interpreting the results, the rather low reliability of the academic self-efficacy scale should be taken into account. to understand and explain interindividual differences in the agreement between performance in scientific reasoning and task-specific self-efficacy, variables that provide information about the use of feedback from teachers and parents would be important. furthermore, we cannot conclude whether self-efficacy influences scientific reasoning or, conversely, whether scientific reasoning influences self-efficacy. bandura (1997) argued for a reciprocal relation between skills and self-efficacy. lawson et al.
(2007) found support for the hypothesis that reasoning skills are a good predictor of self-efficacy but not the other way around. the fact that results from specific domains cannot simply be transferred across domains was shown in the findings from schöber et al. (2018). for the math domain, the authors found support for a self-enhancement approach (i.e., self-efficacy influences math performance), whereas for writing skills, a skill-development approach was supported (i.e., writing skills influence self-efficacy). only longitudinal studies can reveal the direction and strength of the relations in the context of scientific reasoning skills. a longitudinal design would also allow examination of how self-efficacy and scientific reasoning develop over time. the outcomes could provide relevant implications for a potential intervention. identifying where to target an intervention is critical: whether to focus on scientific reasoning skills or self-efficacy expectations, and which developmental stage may be most sensitive to intervention.

4.5 conclusion

the present study showed, for the first time, that self-efficacy can be measured in a task-specific manner in scientific reasoning and found a positive correlation between (task-specific) self-efficacy and performance in scientific reasoning, already in students at the end of elementary school. this suggests that a precise and task-specific measurement of self-efficacy can detect effects already in elementary school children. in addition, our results show substantial interindividual differences and differences between age groups in the agreement between self-efficacy and scientific reasoning. this is an important outcome, especially regarding the influence of scientific reasoning skills on science content knowledge and the relevant stem subjects. therefore, this is a possible starting point for promoting this academic discipline.
keypoints
- self-efficacy can be measured in a task-specific manner in scientific reasoning for both elementary and secondary school students
- stronger correlation between self-efficacy and scientific reasoning with more precisely measured self-efficacy
- differences in the agreement between scientific reasoning and task-specific self-efficacy among and across grades

references

bandura, a. (1986). social foundations of thought and action: a social cognitive theory. englewood cliffs.
bandura, a. (1997). self-efficacy: the exercise of control. freeman.
bandura, a. (2006). guide to the construction of self-efficacy scales. in f. pajares & t. urdan (eds.), self-efficacy beliefs of adolescents (pp. 307–337). information age.
bandura, a., & schunk, d. h. (1981). cultivating competence, self-efficacy and intrinsic interest through proximal self-motivation. journal of personality and social psychology, 41, 586–598. https://doi.org/10.1037/0022-3514.41.3.586
bong, m. (2006). asking the right question. how confident are you that you could successfully perform these tasks? in f. pajares & t. c. urdan (eds.), self-efficacy beliefs of adolescents (pp. 287–305). information age.
bortz, j., & döring, n. (2006). forschungsmethoden und evaluation für human- und sozialwissenschaftler [research methods and evaluation for social scientists]. springer.
britner, s. l., & pajares, f. (2001). self-efficacy beliefs, motivation, race, and gender in middle school science. journal of women and minorities in science and engineering, 7, 271–285. https://doi.org/10.1615/jwomenminorscieneng.v7.i4.10
bullock, m., sodian, b., & koerber, s. (2009). doing experiments and understanding science: development of scientific reasoning from childhood to adulthood. in w. schneider & m. bullock (eds.), human development from early childhood to early adulthood: findings from a 20 year longitudinal study (pp. 173–197). psychology press.
bullock, m., & ziegler, a. (1999).
scientific reasoning: developmental and individual differences. in f. e. weinert & w. schneider (eds.), individual development from 3 to 12: findings from the munich longitudinal study (pp. 38–54). cambridge university press.
bybee, r. w. (2010). advancing stem education: a 2020 vision. technology and engineering teacher, 70, 30–35.
cai, d., viljaranta, j., & georgiou, g. k. (2018). direct and indirect effects of self-concept of ability on math skills. learning and individual differences, 61, 1–38. https://doi.org/10.1016/j.lindif.2017.11.009
chen, f. f. (2007). sensitivity of goodness of fit indexes to lack of measurement invariance. structural equation modeling, 14, 464–504. https://dx.doi.org/10.1080/10705510701301834
cheung, g. w., & rensvold, r. b. (2002). evaluating goodness-of-fit indexes for testing measurement invariance. structural equation modeling, 9, 233–255. https://doi.org/10.1207/s15328007sem0902_5
cohen, j. (1992). a power primer. psychological bulletin, 112, 155–159. https://doi.org/10.1037/0033-2909.112.1.155
everitt, b. s., landau, s., & leese, m. (2001). cluster analysis (4th ed.). oxford university press.
hadwin, a. f., & webster, e. a. (2013). calibration in goal setting: examining the nature of judgments of confidence. learning and instruction, 24, 37–47. https://doi.org/10.1016/j.learninstruc.2012.10.001
hattie, j. (2013). calibration and confidence. where to next? learning and instruction, 24, 62–66. https://doi.org/10.1016/j.learninstruc.2012.05.009
hallet, d., nunes, t., & bryant, p. (2010). individual differences in conceptual and procedural knowledge when learning fractions. journal of educational psychology, 102, 395–406. https://doi.org/10.1037/a0017486
harter, s. (1985). manual for the self-perception profile for children. university of denver.
honicke, t., & broadbent, j. (2016). the influence of academic self-efficacy on academic performance: a systematic review. educational research review, 17, 63–84.
https://doi.org/10.1016/j.edurev.2015.11.002
hu, l., & bentler, p. m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6, 1–55. https://doi.org/10.1080/10705519909540118
jansen, m., scherer, r., & schroeders, u. (2015). students’ self-concept and self-efficacy in the sciences: differential relations to antecedents and educational outcomes. contemporary educational psychology, 41, 13–24. https://doi.org/10.1016/j.cedpsych.2014.11.002
jerusalem, m., & satow, l. (1999). schulbezogene selbstwirksamkeitserwartung. in r. schwarzer & m. jerusalem (eds.), skalen zur erfassung von lehrer- und schülermerkmalen. freie universität berlin.
klassen, r. m., & usher, e. l. (2010). self-efficacy in educational settings: recent research and emerging directions. in t. c. urdan & s. a. karabenick (eds.), the decade ahead: theoretical perspectives on motivation and achievement (pp. 1–33). emerald.
koerber, s., mayer, d., osterhaus, c., schwippert, k., & sodian, b. (2015). the development of scientific thinking in elementary school: a comprehensive inventory. child development, 86, 327–336. https://doi.org/10.1111/cdev.12298
kruger, j., & dunning, d. (1999). unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. journal of personality and social psychology, 77, 1121–1134. https://doi.org/10.1037/0022-3514.77.6.1121
lawson, a. e., banks, d. l., & logvin, m. (2007). self-efficacy, reasoning ability, and achievement in college biology. journal of research in science teaching, 44, 706–724. https://doi.org/10.1002/tea.20172
lenhard, w., & schneider, w. (2006). elfe 1-6: ein leseverständnistest für erst- bis sechstklässler. hogrefe.
liu, m., hsieh, p., cho, y., & schallert, d. l. (2006).
middle school students’ self-efficacy, attitudes, and achievement in a computer-enhanced problem-based learning environment. journal of interactive learning research, 17, 225–242.
margolis, h., & mccabe, p. p. (2006). improving self-efficacy and motivation: what to do, what to say. intervention in school & clinic, 41, 218–227. https://doi.org/10.1177/10534512060410040401
marsh, h. w., pekrun, r., parker, p. d., murayama, k., guo, j., dicke, t., & arens, a. k. (2018). the murky distinction between self-concept and self-efficacy: beware of lurking jingle-jangle fallacies. journal of educational psychology, 111, 331–353. https://doi.org/10.1037/edu0000281
moosbrugger, h., & kelava, a. (2012). testtheorie und fragebogenkonstruktion [test theory and questionnaire construction]. springer.
multon, k. d., & brown, s. d. (1991). relation of self-efficacy beliefs to academic outcomes: a meta-analytic investigation. journal of counseling psychology, 38, 30–38.
nagengast, b., marsh, h. w., scalas, l. f., xu, m. k., hau, k. t., & trautwein, u. (2011). who took the “x” out of expectancy-value theory? a psychological mystery, a substantive-methodological synergy, and a cross-national generalization. psychological science, 22, 1058–1066. https://doi.org/10.1177/0956797611415540
nicholls, j. g. (1979). development of perception of own attainment and causal attributions for success and failure in reading. journal of educational psychology, 71, 94–99. https://doi.org/10.1037/0022-0663.71.1.94
nunnally, j. c., & bernstein, i. h. (1978). psychometric theory. new york: mcgraw-hill.
oecd (2006). assessing scientific, reading and mathematical literacy: a framework for pisa 2006. paris: oecd.
oecd (2015). oecd science, technology and industry scoreboard 2015: innovation for growth and society. paris: oecd.
osborne, j. (2013). the 21st century challenge for science education: assessing scientific reasoning. thinking skills and creativity, 10, 265–279.
https://doi.org/10.1016/j.tsc.2013.07.006
osterhaus, c., koerber, s., & sodian, b. (2020). the science-p reasoning inventory (spr-i): measuring emerging scientific reasoning skills in primary school. international journal of science education, 42, 1087–1107. https://doi.org/10.1080/09500693.2020.1748251
osterhaus, c., koerber, s., & sodian, b. (2017). scientific thinking in elementary school: children’s social cognition and their epistemological understanding promote experimentation skills. developmental psychology, 53, 450–462. https://doi.org/10.1037/dev0000260
osterhaus, c., koerber, s., & sodian, b. (2015). children’s understanding of experimental contrast and experimental control: an inventory for primary school. frontline learning research, 3, 56–94.
pajares, f. (1996). self-efficacy beliefs in academic settings. review of educational research, 66, 543–578. https://doi.org/10.3102/00346543066004543
pajares, f., & schunk, d. h. (2001). the development of academic self-efficacy. in a. wigfield & j. s. eccles (eds.), development of achievement motivation (pp. 15–31). academic press.
pajares, f., & valiante, g. (1997). influence of self-efficacy on elementary students’ writing. the journal of educational research, 90, 353–360. https://doi.org/10.1080/00220671.1997.10544593
richardson, m., abraham, c., & bond, r. (2012). psychological correlates of university students’ academic performance: a systematic review and meta-analysis. psychological bulletin, 138, 353–387. https://doi.org/10.1037/a0026838
rosseel, y. (2012). lavaan: an r package for structural equation modeling. journal of statistical software, 48, 1–36. https://doi.org/10.18637/jss.v048.i02
schiepe-tiska, a., simm, i., & schmidtner, s. (2016). motivationale orientierungen, selbstbilder und berufserwartungen in den naturwissenschaften in pisa 2015. in k. reiss, c. sälzer, a. schiepe-tiska, e. klieme, & o. köller (eds.), pisa 2015. eine studie zwischen kontinuität und innovation (pp. 99–132). waxmann.
schöber, c., schütte, k., köller, o., mcelvany, n., & gebauer, m. m. (2018). reciprocal effects between self-efficacy and achievement in mathematics and reading. learning and individual differences, 63, 1–11. https://doi.org/10.1016/j.lindif.2018.01.008
schunk, d. h. (1995). self-efficacy and education and instruction. in j. e. maddux (ed.), self-efficacy, adaptation, and adjustment: theory, research, and application (pp. 281–303). plenum press.
schukajlow, s., leiss, d., pekrun, r., blum, w., muller, m., & messner, r. (2012). teaching methods for modelling problems and students’ task-specific enjoyment, value, interest and self-efficacy expectations. educational studies in mathematics, 79, 215–237. https://doi.org/10.1007/s10649-011-9341-2
schwarzer, r., & jerusalem, m. (2002). das konzept der selbstwirksamkeit. zeitschrift für pädagogik, 44, 28–53.
siefer, k., leuders, t., & obersteiner, a. (2020). leistung und selbstwirksamkeitserwartung als kompetenzdimension: eine erfassung individueller ausprägungen im themenbereich linearer funktionen. journal für mathematik-didaktik, 41, 267–299. https://doi.org/10.1007/s13138-019-00147-x
stipek, d. c., & hoffman, j. h. (1980). children’s achievement-related expectancies as a function of academic performance histories and sex. journal of educational psychology, 70, 154–166.
sodian, b., thoermer, c., kircher, e., grygier, p., & günther, j. (2002). vermittlung von wissenschaftsverständnis in der grundschule [teaching understanding the nature of science in elementary school]. zeitschrift für pädagogik, 45, 192–206.
usher, e. l., & pajares, f. (2009). sources of self-efficacy in mathematics: a validation study. contemporary educational psychology, 34, 89–101. https://doi.org/10.1016/j.cedpsych.2008.09.002
weiß, r. h. (2006). grundintelligenztest skala 2 revision (cft 20-r). hogrefe.
zeldin, a. l., & pajares, f. (2000).
against the odds: self-efficacy beliefs of women in mathematical, scientific, and technological careers. american educational research journal, 37, 215–246. https://doi.org/10.3102/00028312037001215
zimmerman, b. (1995). self-efficacy and educational development. in a. bandura (ed.), self-efficacy in changing societies (pp. 202–231). cambridge university press.
zimmerman, c. (2007). the development of scientific thinking skills in elementary and middle school. developmental review, 27, 172–223. https://doi.org/10.1016/j.dr.2006.12.001

appendix 1

1. sample items

1.1 task-specific self-efficacy item and nos task (a03 middle ages; osterhaus et al., 2020)

long ago, in the middle ages, people believed there are witches who could make people sick. a modern-day scientist traveled back to the middle ages with a time machine. scientists in the middle ages thought that witches can make people sick. the modern-day scientist believes that bacteria can make people sick. the modern-day scientist shows the scientist from the middle ages the bacteria under the microscope and explains: “these bacteria are the reason why people get sick!” what will the scientist from the middle ages say to this?

this is not about solving the task, but you are asked to complete the question below in terms of yourself.

                                                                         strongly disagree   disagree   agree   strongly agree
1. i know how to deal with the task.                                             o               o        o           o
2. i am very familiar with such tasks and know how to solve the task.            o               o        o           o
3. i would need help to solve the task.                                          o               o        o           o

now you may solve the task. what will the scientist from the middle ages say to this?

                                                                                                          he would say this   he would not say this
1. “of course, you’re right. bacteria make people sick, not witches.”                                            ☐                     ☐
2. “bacteria could be the witches’ little helpers.”                                                              ☐                     ☐
3. “it may be true that there are bacteria here, but witches are still the ones who make people sick.”           ☐                     ☐

which is the best answer? no. ______
1.2 task-specific self-efficacy item and unex task (u1 trees; osterhaus et al., 2015)

a scientist travels to a faraway planet, planet ogi. there, he observes that all trees are very small. the scientist develops a tree medicine that is supposed to help the trees grow. he calls this medicine supergrow. the scientist wants to find out whether his supergrow works and whether it really makes the trees grow. therefore, he conducts an experiment. he gives his supergrow to all trees on planet ogi. six months later, the scientist travels back to planet ogi. he observes that all trees are huge now. he is convinced: “my supergrow works!” was this a good experiment?

this is not about solving the task, but you are asked to complete the question below in terms of yourself.

                                                                         strongly disagree   disagree   agree   strongly agree
1. i know how to deal with the task.                                             o               o        o           o
2. i am very familiar with such tasks and know how to solve the task.            o               o        o           o
3. i would need help to solve the task.                                          o               o        o           o

now you may solve the task. was this a good experiment? ☐ yes ☐ no

susan, lisa, and vera wonder whether this was a good experiment. who is right and who is not?

                                                                                                                                        is right   is not right
1. susan says: “it was not a good experiment because he does not know how big the trees would have grown without his supergrow.”           ☐            ☐
2. lisa says: “it was a good experiment because he found that all trees have grown huge after receiving his supergrow.”                    ☐            ☐
3. vera says: “it was a good experiment because you can only see whether things work if you test them.”                                    ☐            ☐

which of the three girls has the best answer? no. ______

frontline learning research 2 (2013) 33-52
issn 2295-3159
corresponding author: robert klassen, department of education, university of york, york, uk, yo10 5dd, robert.klassen@york.ac.uk
doi
33 | f l r

measuring teacher engagement: development of the engaged teachers scale (ets)

robert m. klassen a,b, sündüs yerdelen c, tracy l. durksen b
a university of york, uk
b university of alberta, edmonton, canada
c middle east technical university, ankara, turkey and kafkas university, kars, turkey

article received 27 june 2013 / revised 10 december 2013 / accepted 10 december 2013 / available online 20 december 2013

abstract

the goal of this study was to create and validate a brief multidimensional scale of teacher engagement—the engaged teachers scale (ets)—that reflects the particular characteristics of teachers’ work in classrooms and schools. we collected data from three separate samples of teachers (total n = 810), and followed five steps in developing and validating the ets. the result of our scale development was a 16-item, 4-factor scale of teacher engagement that shows evidence of reliability, validity, and practical usability for further research. the four factors of the ets consist of: cognitive engagement, emotional engagement, social engagement: students, and social engagement: colleagues. the ets was found to correlate positively with a frequently used work engagement measure (the uwes) and to be positively related to, but empirically distinct from, a measure of teachers’ self-efficacy (the tses). our key contribution to the measurement of teacher engagement is the novel inclusion of social engagement with students as a key component of overall engagement at work for teachers. we propose that social engagement should be considered in future iterations of work engagement measures in a range of settings.

keywords: teachers; engagement; scale validation; motivation

1. introduction

a recurring theme of recent educational debate in public and research circles is the critical importance of providing all students with access to teachers who are highly engaged in their work (economist intelligence unit, 2012; pianta, hamre, & allen, 2012; rimm-kaufman & hamre, 2010; staiger & rockoff, 2010).

klassen et al.
34 | f l r

although work engagement research in business settings is thriving (bakker, albrecht, & leiter, 2011; sonnentag, 2003), the same attention has not been paid to the construct in education, at least partly due to the absence of context-relevant tools. building an understanding of teachers’ engagement at work is vital: research shows that teachers’ attitudes and motivation levels are transmitted to students (roth, assor, kanat-maymon, & kaplan, 2007). however, the most frequently used measure of work engagement (bakker et al., 2011)—the utrecht work engagement scale (uwes)—is designed for research involving workers in the business sector, and sharply contrasting work environments may demand dimensions of work engagement not currently covered in existing measures. shuck and colleagues noted “an essential first step (to advance development of work engagement research) is a context-specific, conceptual exploration of the construct of employee engagement in relation to other well-researched job attitude(s)” (shuck, ghosh, zigarmi, & nimon, 2013, p. 11). thus, the purpose of this article is to report the design and validation of a teacher engagement scale that reflects the particular context and demands experienced by teachers working in classroom settings, and to explore the scale in relation to teachers’ self-efficacy and to the frequently used work engagement scale, the uwes.

work engagement is a motivation concept that refers to the voluntary allocation of personal resources directed at the range of tasks demanded by a particular vocational role (christian, garza, & slaughter, 2011). two core conceptual dimensions—energy and involvement—underpin work engagement (bakker et al., 2011), with three domains of engagement often posited: physical, emotional, and cognitive (e.g., saks, 2006).
in some cases, these three domains are subsumed under a higher-order engagement construct, whereby the individual domains are experienced simultaneously or holistically (e.g., rich, lepine, & crawford, 2010; sonnentag, 2003). the relationship of engagement to burnout has been debated. in the view of some, engagement is the opposite of burnout, representing the other end of the continuum that stretches from fully engaged (low burnout) to not engaged (high burnout). recent research using the oldenburg burnout inventory (olbi; demerouti, mostert, & bakker, 2010), which simultaneously measures the energy and identification dimensions of engagement/burnout using positively and negatively worded items, provides equivocal results about the relationship of burnout and engagement. the creators of the olbi found that the identification dimension of burnout seemed to be opposite of the dedication dimension of engagement, whereas the energy dimensions of burnout (exhaustion) and engagement (vigour) operated as separate, but related, dimensions. existing engagement measures, such as the olbi and uwes, have the advantage of measuring engagement in a broad variety of settings, but have not been created to examine engagement in specific contexts, like teaching. creating a tailor-made teacher engagement measure offers the advantage of including content that reflects the unique characteristics of teachers and the teaching context.
engagement is considered to be relatively stable, with some fluctuations over time, reflecting both trait-like and state-like components (dalal, brummel, wee, & thomas, 2008; schaufeli, salanova, gonzalez-roma, & bakker, 2002). macey and schneider’s (2008) review of the engagement literature and subsequent conceptualization of the construct suggests work engagement reflects the dispositions (feelings of energy) that lead to engaged behaviours (acting in an energetic fashion).
engagement reflects motivational forces (e.g., intrinsic reasons for behaviour), but is conceptually distinct from these forces and from the ensuing behaviours (schaufeli & salanova, 2011); for example, the related construct of work commitment refers to an attitude of attachment to a job or career (e.g., meyer, allen, & smith, 1993; saks, 2006), but is conceptually separate from the feelings of energy during work time that define engagement. work commitment refers to an attitude about work; work engagement refers to the degree of attention and absorption in work activities (shuck et al., 2013). work engagement has also shown discriminant validity from job attitudes (christian et al., 2011), and job involvement and satisfaction (rich et al., 2010).
engagement has been shown to be related to self-efficacy; that is, beliefs in the capabilities to accomplish tasks in particular domains. xanthopoulou, bakker, demerouti, and schaufeli (2007) found that self-efficacy (along with optimism and organizational-based self-esteem) served as workplace resources that predicted engagement. in education settings, teachers’ self-efficacy has been shown to be a potent motivational force associated with commitment to teaching and (inversely) to quitting intention (klassen & chiu, 2011), and to be robustly related to teacher resilience (gu & day, 2007). although there are close relationships between engagement and other work-related motivation constructs, there is support for empirical and conceptual distinctiveness, and exploring the nomological web of relationships among key related variables results in a more nuanced picture of how people behave in the workplace.
schaufeli and colleagues operationalised work engagement in their creation of the uwes (e.g., schaufeli, bakker, & salanova, 2006), and defined work engagement as an affective-cognitive state, not targeted at any particular work event or task.
however, questions remain about the robustness of its factor structure (e.g., klassen et al., 2012; shimazu et al., 2008; sonnentag, 2003), and its item content may not be relevant for all contexts. for example, although the uwes has been used with teachers (e.g., bakker & bal, 2010; hakanen, bakker, & schaufeli, 2006), the scale content ignores the particular conditions associated with teachers’ work. in particular, the uwes and other work engagement scales do not reflect the dimension of social engagement with students, a dimension which perhaps uniquely defines the act of teaching (jennings & greenberg, 2009).
the work of teaching involves a level of demand for social engagement (energy devoted to establishing relationships) that is rarely found in other professions (e.g., pianta et al., 2012; roorda, koomen, spilt, & oort, 2011) and that is not included in other conceptual definitions of engagement (i.e., the uwes). although workers in many settings must engage socially with colleagues, teaching uniquely emphasises energy spent on the establishment of long-term, meaningful connections with the clients of the work environment (i.e., students) in a way that characterises the job of teaching. in fact, researchers propose that teacher-student relationships may play the primary role in fostering student engagement and positive student outcomes (davis, 2003; klassen, perry, & frenzel, 2012; pianta et al., 2012; wang, 2009). teachers who devote energy to forming warm and nurturing relationships with their students tend to experience higher levels of well-being, and less emotional stress and burnout (jennings & greenberg, 2009). to be sure, workers in other professions such as health (e.g., physicians, nurses, psychologists) or business (e.g., sales representatives) may form deep and meaningful relationships with their patients or clients, but rarely do workers in these fields spend the number of hours that most teachers spend with their students.
like workers in other professions, teachers form social relationships with colleagues during work, but the emphasis on social relationships with students characterises the heart of the work of teaching; in fact, the opportunity to work closely with students is a strong motive for many teachers entering the profession (e.g., watt & richardson, 2007). measuring teachers’ work engagement without capturing social engagement with students ignores one of the most important aspects of teacher engagement.
shuck’s recent review of work engagement (2011) concludes that the construct remains in a state of evolution, with disciplinary bridges needed between disparate communities of research. as educational psychologists, we question the fit of business-oriented work engagement models and measures to educational contexts, and see a clear need for a context-specific engagement measure tailored to the work performed by teachers. in this article, we address this need by creating and testing the engaged teachers scale (ets), in which workplace (i.e., classroom) engagement, comprising context-responsive physical, cognitive, and emotional dimensions (e.g., rich et al., 2010), is combined with social engagement with students and colleagues to represent teachers’ overall engagement.
1.1 current study
the goal of the study was to create and validate a usable (i.e., brief) scale of teacher engagement. we followed five steps involving three samples of teachers (total n = 810) in developing and validating the ets. in step 1 we developed item content, and received critical feedback from a focus group of experts. in steps 2 through 5 we collected data from three independent samples and conducted a series of statistical analyses designed to reduce the item pool, explore the factor structure, and examine the construct validity of the emerging scale.
the result of our five steps is a 16-item, 4-factor scale of teacher engagement that shows evidence of reliability, validity, and usability for future research.
2. step 1
step 1 consisted of creation of an item pool, and generation of feedback about the content of the item pool. to begin, our team of researchers (i.e., the three authors, who represent three disparate backgrounds, namely psychology, education, and educational psychology, and three countries) reviewed the existing literature and created and adapted item content through a process of generation, discussion, and revision. a comprehensive literature search revealed a number of theory-driven work engagement measures (e.g., rich, 2006; saks, 2006; schaufeli et al., 2006; shuck, 2010; thomas, 2006; wang & qin, 2011). theoretical guidance from research by rich et al. (2010), kahn (1990, 1992), and schaufeli et al. (2006) provided the foundation for the dimensions of engagement (physical, cognitive, and emotional; or vigour, absorption, and dedication for the uwes). we also drew from teacher-student relatedness research (davis, 2003; klassen et al., 2012; pianta et al., 2012; wang, 2009) for generation of social engagement items. item development included adaptation of items from existing measures (e.g., “at my work, i feel bursting with energy” was adapted to “when teaching, i feel bursting with energy”), and creation of new items guided by theory (e.g., “in class, i care about the problems of my students” was an item reflecting social engagement: students). the proposed structure of the ets is presented in figure 1, with an over-arching engagement factor, and five second-level dimensions: physical, cognitive, emotional, social: students, and social: colleagues.
after reviewing the literature, an initial survey of 56 items was created and presented to 13 educational psychology graduate students, nine of whom were practicing teachers, during a graduate-level seminar.
following an introduction to the engagement literature (e.g., discussion of the uwes; schaufeli et al., 2006), the students were given instructions to provide feedback on the content, wording, and plausibility of the initial item list. small groups (2-4 students) were formed to provide feedback on one dimension, after which the students participated in a large group discussion of the item content. the items and item content were revised based on the feedback and discussion, with the resulting survey consisting of 48 items representing five factors. figure 1 presents the hypothesised dimensions of the ets, with initial number of items for each dimension, and item examples for each of the five dimensions.
figure 1. hypothesised dimensions for the engaged teachers scale (ets). the number of initial items identified with each dimension is listed in parentheses, followed by an example item.
  physical (7): i devote a lot of energy to teaching.
  cognitive (7): while teaching, i get absorbed in my work.
  emotional (12): i really put my heart into teaching.
  social: students (11): i connect well with my students.
  social: colleagues (11): i am accessible to my colleagues.
3. step 2
in step 2, we administered the emergent 48-item measure to a sample of 224 practicing teachers, and analyzed the data using principal components analysis (pca) for item reduction purposes. although the use of pca has been criticised as a means of extracting factors (e.g., velicer & jackson, 1990), it is a preferred method for item reduction (conway & huffcutt, 2003; henson & roberts, 2006; matsunaga, 2010).
3.1 participants and procedures
data for step 2 were collected at a compulsory teacher conference (footnote 1) in an urban/suburban setting with a population of about 1,000,000 in western canada.
participants were volunteers who were recruited in the exhibition hall of the conference during breaks between professional development sessions. consenting teachers completed the paper-and-pencil survey on-site while research assistants kept notes on any verbal feedback offered during data collection. the sample for step 2 consisted of 224 teachers (74.6% female) between the ages of 23 and 65 years (m = 40.73 years). participants’ highest level of education was reported as: undergraduate degree (73.4%), master’s degree (22.5%), doctorate degree (0.9%), and 3.2% unspecified. most participants were employed full-time (84.8%) in urban (footnote 2; 77.5%), suburban (20.3%), and rural (2.3%) canadian schools. participants’ school settings were elementary (43.3%), middle (17%), secondary (28%), and multiple (9%), with a mean class size of 26.6 students. participants typically rated the socioeconomic status of most students in their class as low to average (67.9%), with 26.7% reported as average-high to high (5.4% varied or unknown). teaching experience ranged from 0 to 38 years, with a mean of 13.42 (sd = 9.79) years of total teaching experience, and a mean of 5.05 years at their current school. the largest group of participants (48.7%) were early career (≤10 years of experience), with 23.7% at mid-career stage (11-20 years), and 25.6% with more than 20 years of experience.
before conducting analyses, we examined item correlations, and subsequently excluded three items from further analysis due to non-significant correlations with the other variables, leaving 45 items. we used pca with promax rotation (kappa set at 4) in order to derive a smaller number of items for subsequent steps.
3.2 step 2 results
results of pca revealed several items that did not load on theoretically consistent components, as well as items that clearly loaded on more than one component.
for example, the item “i burst with energy while teaching” loaded on a component with items characterizing emotional engagement; however, the item was intended to characterise physical engagement. furthermore, items that did not load on components with an adequate number of items (at least three) were excluded. since the purpose of the pca in this step was item reduction rather than exploration of the factor structure, we examined the emergence of principal components and the magnitude of component loadings rather than the number of components, with a minimum component loading set at > .50. after inspecting conceptual fit of the items and the item loadings for each component, six items from each of three components and five items from one component were retained for further analyses. the loadings of these items ranged between .61 and .98. in total, four components were extracted and retained, with a total of 23 items. items on two components, tentatively labelled as cognitive and physical engagement, did not extract separately as initially hypothesised. since we hypothesised physical engagement as an important facet of work engagement, we created two additional items for each of physical and cognitive engagement, resulting in 27 items available for analysis in step 3.
footnote 1: attendance at one of the regional annual two-day teacher conventions is mandatory for all of the approximately 30,000 public school teachers in the province.
footnote 2: the term “urban” in a canadian context typically connotes geographical location (i.e., a large city or town), not sociological context (i.e., socioeconomic status level or ethnicity) as is sometimes the case in u.s.-based research.
4. step 3
in step 3 we administered the emergent 27-item version of the scale to a new sample of 265 teachers and conducted exploratory factor analysis (efa) to test the scale’s factor structure.
4.1 participants and procedures
participants were recruited in a similar fashion to step 2, in a multi-district compulsory teacher conference at a different urban setting (population ~1,100,000) in the same western canadian province. the step 3 sample consisted of 265 teachers (68.7% female) between the ages of 21 and 68 years (m = 40.37 years). demographics (socioeconomic status, teaching level, and teaching experience) were similar to those in step 2, with additional demographic information available from the authors.
4.2 step 3 results
the 27 items from step 2 were analyzed using efa with principal axis factoring and promax rotation (kappa set at 4). results of the efa were first examined in terms of the appropriateness of the existing data for factor analysis. the kaiser-meyer-olkin measure of sampling adequacy was .92, suggesting that the data were appropriate for factor analysis. additionally, bartlett’s test of sphericity, χ²(351) = 4402.20, p < .05, indicated that the population correlation matrix was not an identity matrix and that the data were suitable for factor analysis (field, 2009).
we next followed three approaches to determine the number of factors to be retained. first, we applied kaiser’s eigenvalue > 1.0 rule and scrutinised the scree plot. retaining factors with eigenvalues > 1.0 resulted in five factors that accounted for 66.27% of the variance in respondents’ scores. examination of the scree plot suggested four or five factors. although the eigenvalues > 1.0 rule and scree test are commonly used methods for determining number of factors, both are criticised for lack of reliability (e.g., ledesma & valero-mora, 2007; velicer & jackson, 1990). second, parallel analysis, which is based on statistical rather than mechanical rules, was used as an alternative and more accurate test to determine number of factors (ledesma & valero-mora, 2007; o’connor, 2000; zwick & velicer, 1986). results from the parallel analysis suggested retention of four factors.
third, efa was performed to compare 4- and 5-factor solutions. only the 4-factor solution yielded interpretable factors. with the 5-factor solution, one item, “in class, i am accessible to my students,” created a factor by itself. in the 4-factor solution, this item loaded inappropriately (i.e., in a theoretically unjustifiable way) on the factor that was extracted by cognitive engagement items. therefore, this item was excluded from the scale and the 4-factor solution was retained. as in step 2, cognitive and physical engagement items did not produce separate factors; since cognitive items dominated the content, we labelled the factor cognitive engagement.
examining the factor pattern coefficients with the cut-off point set at .70 resulted in eight more items eliminated from the scale. however, two borderline-case items with coefficients between .50 and .70 were retained since the item content made the factors more representative in terms of the construct being measured. two items with redundant content were considered: “at school, i value the relationships i build with my colleagues,” and “at school, i value spending time with my colleagues.” we excluded the latter item due to lower factor loading (.82 versus .92 for the former item). as a result of these procedures, the scale was reduced to 16 items with four items in each of four factors. table 1 lists the pattern and structure coefficients of items for the related factors. the final version of the ets with item content of each engagement dimension is presented in the appendix.
the efa resulted in four factors accounting for 71.31% of the variance in the respondents’ scores. the first factor was named emotional engagement (ee), accounting for 40.25% of the variance in the correlation matrix. the other three factors were social engagement: colleagues (sec), cognitive engagement (ce), and social engagement: students (ses) accounting for 13.84%, 9.56%, and 7.66% of the variance, respectively.
correlations between factors ranged from .33 to .62. cronbach’s alpha coefficients for the ee, sec, ce and ses factors were .89, .85, .85, and .84, respectively.
table 1
factor pattern and structure coefficients in descending order (efa, promax rotation) for the four-factor model of ets
item  factor  pattern  structure  item content
10    ee      .95      .89        i love teaching
2     ee      .80      .81        i am excited about teaching
5     ee      .72      .83        i feel happy while teaching
13    ee      .70      .76        i find teaching fun
9     sec     .88      .83        at school, i value the relationships i build with my colleagues
7     sec     .83      .83        at school, i am committed to helping my colleagues
12    sec     .79      .82        at school, i care about the problems of my colleagues
1     sec     .57      .58        at school, i connect well with my colleagues
11    ce      .82      .82        while teaching i pay a lot of attention to my work
8     ce      .77      .80        while teaching, i really “throw” myself into my work
15    ce      .76      .76        while teaching, i work with intensity
4     ce      .65      .71        i try my hardest to perform well while teaching
14    ses     .87      .82        in class, i care about the problems of my students
16    ses     .79      .83        in class, i am empathetic towards my students
6     ses     .75      .73        in class, i am aware of my students’ feelings
3     ses     .53      .65        in class, i show warmth to my students
note. each item is listed with its coefficients on its own factor. ee = emotional engagement, sec = social engagement: colleagues, ce = cognitive engagement, ses = social engagement: students.
5. step 4
in steps 4 and 5 we administered the final version of the scale to 321 teachers and analyzed the data using first- and second-order confirmatory factor analyses (cfa) for the purpose of testing construct validity. in particular, step 4 was performed to validate the factor structure of the ets.
5.1 participants and procedures
data were collected at a compulsory teachers’ convention in an adjacent province. demographic information was similar to the samples in steps 2-3 and is available from the authors.
5.2 step 4 results
a series of cfas was performed in step 4 to test the factor structure of the ets. first, we performed cfa on the 16 items and 4 factors (model 1). second, we tested models with and without social engagement by testing models that excluded factors representing social engagement with students (ses, model 2) and social engagement with colleagues (sec, model 3). finally, a second-order cfa was performed to examine whether the four first-order factors could be explained by a second-order teacher engagement (te) factor (model 4). we used lisrel 8.80 (jöreskog & sörbom, 2006) with simplis command language to conduct cfa. we used a series of fit indices to evaluate the model fit in addition to the conventional use of chi-square (see kline, 2005): comparative fit index (cfi), normed fit index (nfi), goodness-of-fit index (gfi), and root mean square error of approximation (rmsea). since the level of missing data was low (1.8%), we replaced missing values with means (tabachnick & fidell, 2007). data were checked for multivariate normality through inspection of univariate and multivariate outliers (kline, 2005), with eight cases excluded as a result. skewness and kurtosis values were checked and absolute values were found within the ranges .40 to 1.0 and .03 to .45, respectively. the maximum likelihood approach was selected to estimate the parameters of the model (chou & bentler, 1995).
5.2.1 model 1: four first-order factors
the 16-item scale was subjected to first-order cfa to test the four-factor structure of ets. results demonstrated a good fit to the data (χ²(98) = 292.67, p < .05; cfi = .97; gfi = .90; nfi = .96; rmsea = .08; 90% ci = .07, .09). standardised parameter estimates for each item of the four-factor ets model are listed in table 2. as presented in the table, all of the standardised estimates (ranging from .66 to .85) were significant and above a cut-off value of .50 (hair, black, babin, anderson, & tatham, 2010).
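the degrees of freedom reported for the cfa models can be checked by counting parameters. the sketch below is an illustration only, not the authors' lisrel setup: it assumes a simple-structure model in which each item loads on exactly one factor and residuals are uncorrelated, and the function name cfa_df is hypothetical.

```python
def cfa_df(n_items: int, n_factors: int, second_order: bool = False) -> int:
    """Degrees of freedom for a simple-structure CFA (assumed: one factor per
    item, uncorrelated residuals, each latent factor's scale fixed)."""
    # Number of unique variances and covariances among the observed items.
    moments = n_items * (n_items + 1) // 2
    # One loading and one residual variance per item.
    free = 2 * n_items
    if second_order:
        # The first-order factor correlations are replaced by one path per
        # first-order factor to the single higher-order factor.
        free += n_factors
    else:
        # Freely estimated correlations among the first-order factors.
        free += n_factors * (n_factors - 1) // 2
    return moments - free


print(cfa_df(16, 4))                     # four first-order factors
print(cfa_df(12, 3))                     # one four-item factor dropped
print(cfa_df(16, 4, second_order=True))  # single second-order factor
```

under these assumptions the counts come out at 98, 51, and 100, which agree with the degrees of freedom reported for the models tested in this step.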
table 3 presents the correlations (phi estimates) among the four factors. as seen in the table, correlations ranged between .49 and .73, and were significant at the p < .01 level. internal consistencies of each subscale of ets were examined, with cronbach’s alpha coefficients at .84, .87, .83, and .79 for ce, ee, ses, and sec, respectively. table 4 presents the means, standard deviations, and reliability coefficients for the four factors. these findings supported our initial prediction of a first-order factor structure for teacher engagement. since we proposed the novel hypothesis that social engagement was a dimension of teacher engagement, we tested models 2 and 3 that examined the validity of including social engagement dimensions with students and colleagues in our model of teacher engagement.
table 2
standardised parameter estimates for the first-order factor solution for the ets (model 1)
item  factor  λ    item content
4     ce      .72  i try my hardest to perform well while teaching
8     ce      .80  while teaching, i really “throw” myself into my work
11    ce      .75  while teaching i pay a lot of attention to my work
15    ce      .74  while teaching, i work with intensity
2     ee      .78  i am excited about teaching
5     ee      .75  i feel happy while teaching
10    ee      .85  i love teaching
13    ee      .80  i find teaching fun
3     ses     .71  in class, i show warmth to my students
6     ses     .69  in class, i am aware of my students’ feelings
14    ses     .74  in class, i care about the problems of my students
16    ses     .81  in class, i am empathetic towards my students
1     sec     .66  at school, i connect well with my colleagues
7     sec     .68  at school, i am committed to helping my colleagues
9     sec     .85  at school, i value the relationships i build with my colleagues
12    sec     .66  at school, i care about the problems of my colleagues
note. ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues. all coefficients were significant, p < .05.
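the subscale reliabilities above are cronbach's alpha coefficients. as a minimal sketch of how such a coefficient is computed from item responses, the python below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). the response data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of responses per item, all of equal length
    (one entry per respondent)."""
    k = len(items)
    # Total score per respondent across the k items.
    totals = [sum(responses) for responses in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five teachers to a four-item subscale.
demo = [
    [5, 4, 6, 3, 5],
    [5, 5, 6, 3, 4],
    [4, 4, 5, 2, 5],
    [6, 4, 6, 3, 5],
]
print(round(cronbach_alpha(demo), 2))  # about 0.95 for this toy data
```

with real data, each inner list would hold one ets item's responses across the whole sample, and the function would be applied once per subscale and once to all 16 items for the composite.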
table 3
factor correlations (phi estimates) of model 1
         2      3      4
1. ce    .73**  .73**  .49**
2. ee           .64**  .53**
3. ses                 .52**
4. sec
note. ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues. **p < .001.
table 4
means, standard deviations, and reliability coefficients for factors of ets
factor          mean  sd   α
te (composite)  5.07  .56  .91
ce              5.16  .65  .84
ee              5.05  .73  .87
ses             5.26  .60  .83
sec             4.80  .80  .79
note. te = teacher engagement, ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues.
5.2.2 model 2: three first-order factors, ses excluded
model 2 was constructed to test if a 3-factor structure without ses provided a better fit to the data than the full 4-factor structure. the purpose of this procedure was to examine the contribution of teachers’ social engagement with students to explaining their general work engagement. this model showed good fit to the data (χ²(51) = 155.65, p < .05; cfi = .97; gfi = .93; nfi = .96; rmsea = .08; 90% ci = .07, .09). model 2 was compared to model 1 using the chi-square difference test. the Δχ² value of 137.02 (Δdf = 47) was significant, indicating that model 2 was a significantly poorer fit for the data than model 1.
5.2.3 model 3: three first-order factors, sec excluded
in model 3, we excluded the social engagement: colleagues (sec) factor from the 4-factor ets. the model was compared with model 1 to test the role of teachers’ relationship with colleagues in teacher engagement. although model 3 showed an adequate fit to the data (χ²(51) = 179.33, p < .05; cfi = .97; gfi = .91; nfi = .96; rmsea = .09; 90% ci = .08, .11), the chi-square difference test between model 1 and model 3 revealed a significantly poorer fit to the data for model 3 (Δχ² = 113.34, Δdf = 47).
thus we concluded that social engagement with students and peers were viable dimensions with which to measure teacher engagement.
5.2.4 model 4: second-order factor
the high reliabilities and intercorrelations found in the first-order factor structure of ets suggested the possibility of a second-order factor. therefore, a second-order cfa was conducted to examine whether the four-factor ets could be represented by a superordinate factor labelled teacher engagement. figure 2 presents the first-order and second-order models in graphic format. the fit indices for the second-order factor (χ²(100) = 296.94, p < .05; cfi = .97; gfi = .89; nfi = .95; rmsea = .08; 90% ci = .07, .09) suggested that the hypothesised model fit the data well. as shown in table 5, all first-order factors significantly loaded on the second-order factor and their standardised coefficients were above the .50 cut-off suggested by hair et al. (2010). a chi-square difference test conducted between models 1 and 4 revealed no significant difference, suggesting the viability of an underlying single factor in addition to valid use of the four subscale scores. a summary of the goodness of fit indices for the four models is presented in table 6. thus, results suggested that using the four-factor or single factor models was viable for measuring teacher engagement.
table 5
standardised parameter estimates for the second-order factor solution for the ets (model 4)
second-order factor  first-order factor  γ
te                   ce                  .88
te                   ee                  .82
te                   ses                 .82
te                   sec                 .61
note. te = teacher engagement, ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues. all coefficients were significant, p < .05.
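the chi-square difference test between models 1 and 4 can be reproduced from the reported fit statistics. the sketch below is an illustration under stated assumptions, not the authors' procedure: the helper names chi2_sf and chi2_difference are hypothetical, and the tail probability uses the closed-form series for integer degrees of freedom so that only the python standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite sum.
    terms = (half ** (k - 0.5) / math.gamma(k + 0.5)
             for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)


def chi2_difference(chi2_a, df_a, chi2_b, df_b):
    """Return (delta chi-square, delta df, p) for two model fits."""
    d_chi2 = abs(chi2_a - chi2_b)
    d_df = abs(df_a - df_b)
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Reported values: model 1 (chi2 = 292.67, df = 98), model 4 (296.94, df = 100).
d_chi2, d_df, p = chi2_difference(292.67, 98, 296.94, 100)
print(f"delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.3f}")
```

for these inputs the difference is 4.27 on 2 degrees of freedom with p of roughly .12, consistent with the non-significant difference reported between the first-order and second-order models.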
table 6
goodness of fit indices for the four models
model    χ²      df   χ²/df  rmsea  cfi  gfi  nfi  comparison  Δχ²       Δdf
model 1  292.67  98   2.99   .08    .97  .90  .96
model 2  155.65  51   3.05   .08    .97  .93  .96  2 vs. 1     137.02**  47
model 3  179.33  51   3.52   .09    .97  .91  .96  3 vs. 1     113.34**  47
model 4  296.94  100  2.97   .08    .97  .89  .95  4 vs. 1     4.27      2
note. model 1 = four first-order factor model; model 2 = three first-order factor model, ses excluded; model 3 = three first-order factor model, sec excluded; model 4 = one second-order factor model. df = degrees of freedom; rmsea = root mean square error of approximation; cfi = comparative fit index, gfi = goodness-of-fit index, nfi = normed fit index. **p < .001.
figure 2. first-order (model 1, four factors) and second-order (model 4, one factor) factor structures for ets (ee = emotional engagement, sec = social engagement: colleagues, ce = cognitive engagement, ses = social engagement: students, te = teacher engagement).
6. step 5
in step 5 we conducted canonical and zero-order correlation analyses to further test the construct validity of the scale. canonical correlation analysis examines commonalities in sets of factors from different variables by providing linear combinations of each set of the factors (hair et al., 2010). we examined correlations between the ets and two related measures: the utrecht work engagement scale (uwes; schaufeli et al., 2006) and the teacher sense of efficacy scale (tses; tschannen-moran & woolfolk hoy, 2001), a teacher motivation variable that taps teachers’ expectancies of success in the classroom. the tses consists of three factors: self-efficacy for student engagement (se), instructional strategies (is), and classroom management (cm). the scale has been shown to be valid in a range of settings and to be related positively to teacher outcomes such as teacher commitment and inversely to quitting intention (klassen & chiu, 2011). schaufeli et al. (2006) found professional efficacy to be strongly related to work engagement across international contexts.
6.1 procedure
the sample consisted of the same 321 participants described in step 4. to start, we used cfa to confirm the factor structure of the uwes and tses with our sample. results for the 3-factor uwes showed adequate fit to the data (χ²(24) = 78.51, p < .05; cfi = .98; gfi = .95; nfi = .97; rmsea = .08; 90% ci = .06, .10). all factor loadings were significant and internal consistencies of each subscale ranged from .74 to .78. results for the 3-factor tses also indicated good model fit (χ²(41) = 112.90, p < .05; cfi = .98; gfi = .94; nfi = .97; rmsea = .07; 90% ci = .06, .09). all factor loadings were significant, with reliability coefficients above .80.
6.2 step 5 results
the relationship of the ets with the tses and uwes scales was assessed through canonical correlation analyses (see table 7). the first canonical analysis (ets and tses) yielded three canonical variate pairs. a canonical correlation of .58 (33% overlapping variance), χ²(12) = 149.02, p < .001, was found for the first canonical variate, and .25 (6% overlapping variance), χ²(6) = 22.03, p < .05, for the second canonical variate. while the first two pairs of canonical variates accounted for the significant relationship, the χ² test was not statistically significant for the third pair. since the overlapping variance for the second variate was very low (i.e., < 10%, see tabachnick & fidell, 2007), only the result of the first pair is reported. as shown in table 7, with a cut-off value set at .30 (tabachnick & fidell, 2007), all variables had significant relationships with the first canonical variate.
thus, the first canonical analysis suggests positive relationships between all teacher engagement variables and teacher self-efficacy variables.

table 7
correlations, standardised canonical coefficients, canonical correlations, percentages of variance, and redundancies between self-efficacy and engagement variables

| first canonical variate | correlation | coefficient |
|---|---|---|
| set 1 (tses) | | |
| student engagement | -.84 | -.57 |
| instructional strategies | -.89 | -.66 |
| classroom management | -.50 | -.13 |
| percent of variance | .59 | |
| redundancy | .19 | |
| set 2 (ets) | | |
| ce | -.87 | -.45 |
| ee | -.75 | -.23 |
| ses | -.89 | -.56 |
| sec | -.39 | -.17 |
| percent of variance | .57 | |
| redundancy | .19 | |
| canonical correlation | .58 | |

note. ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues.

the second canonical correlation analysis was performed between the set of ets variables and the set of uwes variables (vigour, dedication, and absorption). in this second analysis only two of the three variates were significant (see table 8). the first canonical correlation was .73 (i.e., 53% overlapping variance), χ²(12) = 286.92, p < .001. the second canonical correlation was .37 (14% overlapping variance), χ²(6) = 46.37, p < .05. in light of the high overlapping variance of the first correlation and the modest overlapping variance of the second, only the first variate was taken into account. all factors had significant relationships (i.e., above .30 as suggested by tabachnick & fidell, 2007) with the first canonical variate, suggesting a positive relationship between the ets and uwes variables. therefore, based on the results of the two canonical correlation analyses, it can be concluded that teachers with high engagement scores on the ets tend to obtain high scores on the tses and uwes. the zero-order correlation matrix (table 9) confirms this finding, with all pairs of factors showing significant relationships.
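the overlapping-variance and redundancy figures quoted for the canonical analyses can be checked by hand. the sketch below assumes the standard definitions (overlapping variance is the squared canonical correlation; percent of variance is the mean squared correlation of a set's variables with its own variate; redundancy is their product); the paper does not state its formulas, so this is an assumption, and the function names are illustrative. because the inputs are the rounded values printed in the text and in table 7, the results agree with the reported figures only to within about .01.

```python
def overlapping_variance(rc):
    """Squared canonical correlation: variance shared by a pair of variates."""
    return rc ** 2

def variance_extracted(loadings):
    """Mean squared correlation of a set's variables with its own variate."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def redundancy(loadings, rc):
    """Variance extracted from a set, scaled by the squared canonical correlation."""
    return variance_extracted(loadings) * overlapping_variance(rc)

# Canonical correlations quoted in the text, paired with the reported overlap.
reported_rc = {
    "ets-tses, first variate": (0.58, 0.33),
    "ets-tses, second variate": (0.25, 0.06),
    "ets-uwes, first variate": (0.73, 0.53),
    "ets-uwes, second variate": (0.37, 0.14),
}

# Correlations with the first canonical variate, as printed in table 7.
tses_loadings = [-0.84, -0.89, -0.50]
ets_loadings = [-0.87, -0.75, -0.89, -0.39]
rc_first = 0.58

for label, (rc, reported) in reported_rc.items():
    print(f"{label}: rc^2 = {overlapping_variance(rc):.3f} (reported {reported:.2f})")

for name, loadings in (("tses", tses_loadings), ("ets", ets_loadings)):
    print(f"{name}: percent of variance = {variance_extracted(loadings):.3f}, "
          f"redundancy = {redundancy(loadings, rc_first):.3f}")
```

the redundancy values show why a set can load strongly on its own variate and still share only about a fifth of its variance with the other set: the shared portion is capped by the squared canonical correlation.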
cognitive engagement showed the strongest correlations with absorption (r = .63, uwes) and student engagement (r = .48, tses); emotional engagement was most strongly related to dedication (r = .67, uwes) and student engagement (r = .39, tses); social engagement: students showed the strongest relationship with absorption (r = .42, uwes) and instructional strategies (r = .45, tses); and social engagement: colleagues showed the strongest relationship with dedication (r = .37, uwes) and student engagement (r = .26, tses).

table 8
correlations, standardised canonical coefficients, canonical correlations, percentages of variance, and redundancies between the uwes and ets subscales

| first canonical variate | correlation | coefficient |
|---|---|---|
| set 1 (uwes) | | |
| vigour | -.80 | -.09 |
| dedication | -.94 | -.58 |
| absorption | -.89 | -.43 |
| percent of variance | .77 | |
| redundancy | .41 | |
| set 2 (ets) | | |
| ce | -.86 | -.46 |
| ee | -.93 | -.65 |
| ses | -.61 | -.06 |
| sec | -.53 | -.07 |
| percent of variance | .57 | |
| redundancy | .30 | |
| canonical correlation | .73 | |

note. ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues.

table 9
zero-order correlation coefficients between ets variables and uwes and tses variables

| | uwes: vigour | uwes: dedication | uwes: absorption | tses: instructional strategies | tses: student engagement | tses: classroom management |
|---|---|---|---|---|---|---|
| ce | .43** | .54** | .63** | .38** | .48** | .28** |
| ee | .59** | .67** | .55** | .38** | .39** | .22** |
| ses | .32** | .41** | .42** | .45** | .44** | .27** |
| sec | .31** | .37** | .33** | .15** | .26** | .24** |

note. ce = cognitive engagement, ee = emotional engagement, ses = social engagement: students, sec = social engagement: colleagues. **p < .001.

7. discussion

recent discussions about ways to improve social and educational outcomes have focused on the critical role played by teachers. rarely before has so much emphasis been placed on understanding the psychological make-up of effective teachers (rimm-kaufman & hamre, 2010; staiger & rockoff, 2010).
from a psychological viewpoint, effective teaching is dependent on teachers who are motivated: fully engaged in their work, and engaged not just cognitively and emotionally, but also socially. our study‘s aim was to respond to the call for better understanding of teacher engagement by creating a reliable, valid, and usable multi-dimensional measure of work engagement that was specifically targeted at the work carried out by teachers in classrooms and schools. from a measurement perspective, the findings from this research provide support for the reliability and validity of the ets. in particular, the item statistics and reliabilities of the ets are very good, and the four factors represent appropriate measures of the internal structure of teacher engagement. furthermore, the analyses show that the ets factors are discrete, reliable, and valid. in general, the results suggest that measures of teacher engagement should incorporate the component factors of engagement, and that the factors are related to an overarching engagement factor. from a theoretical perspective, the findings show that social engagement with students and with colleagues should be considered as important dimensions of teacher engagement, alongside cognitive and emotional dimensions of engagement. our primary contribution to future research is in the creation of a four-factor teacher engagement measure that is practical (i.e., brief), valid, reliable, and that reflects the context of educational settings. our multiple steps of analyses resulted in a robust measure that correlates positively with a frequently used work engagement measure (the uwes), and is positively related to, but empirically distinct from, teachers‘ self-efficacy. 
the inclusion of social engagement is novel for conceptualizing and measuring work engagement, but the conceptual framework for work engagement is still developing (bakker et al., 2011; shuck et al., 2013), and conceptualizations that challenge how engagement is defined across contexts may contribute to a more general understanding of how the construct operates in diverse vocational settings. we know that social engagement with students is a fundamental aspect of teachers' work (e.g., pianta et al., 2012), and perhaps reflects the key mechanism through which student development is influenced. although conceptualizations of engagement that consist of dimensions of physical, cognitive, and emotional energy and involvement at work have been conventionally proposed, the results from our study suggest that social engagement (with students and with colleagues) forms an important dimension of overall engagement for teachers. we suggest that a dimension representing social engagement is worth considering for future iterations of work engagement measures applied to a wide range of vocational settings.

we failed to find separate domains of physical and cognitive engagement in our samples of teachers, and the question remains whether physical engagement is separable from cognitive, emotional, and social dimensions of teacher engagement. hakanen et al. (2006) proposed vigour (physical engagement) and dedication (emotional engagement) as the core dimensions of engagement in their study of the uwes with a group of teachers, but they did not test the hypotheses by including a cognitive dimension in their analyses. we did not find clear support for the separation of physical and cognitive engagement dimensions, and propose that for teachers, the line between the two is blurred.
for example, we labelled "i try my hardest to perform well while teaching" and "while teaching, i really 'throw' myself into my work" as examples of cognitive engagement, but the demands of individual teachers' classroom work may determine the relevance of particular dimensions for teachers. a teacher of young children may need to physically interact with students (crouching down, tying shoes, performing actions during music sessions) more often than a high school history teacher, thus increasing the salience of the physical engagement dimension for some teachers. hakanen et al. describe the physical job demands and resources that can be associated with engagement, but the level of physical demands for teachers varies as a function of the setting. further work should focus on teasing apart teachers' physical and cognitive engagement by exploring the two dimensions in a wider range of contexts.

more work is needed to understand how engagement is fostered in teachers, and especially how the specific dimensions (emotional, cognitive, social, and perhaps physical engagement) develop through teacher training and into professional practice. research on related constructs such as teacher resilience (gu & day, 2007), self-efficacy (klassen & chiu, 2010, 2011), and commitment (collie, shapka, & perry, 2011) has shown that teacher motivation constructs change in predictable ways over the course of a career. the ets provides a way of measuring individual facets of engagement and how the facets change over time: for example, a teacher may exhibit high levels of social engagement at the beginning of a career but lower levels of cognitive engagement. we know that teacher engagement changes over even brief periods of time: recent research has shown that global teacher engagement shows weekly within-person variability in starting teachers (bakker & bal, 2010; durksen & klassen, 2012), with commitment to the profession mirroring the pattern of change in engagement.
the job demands-resources model (jdr; e.g., bakker, hakanen, demerouti, & xanthopoulou, 2007; hakanen et al., 2006) provides a general way of conceptualizing the drivers of engagement, but details about how job resources in classrooms and schools (supportive climate, transformational leadership, access to information, job control) can be targeted at fostering specific engagement dimensions have not yet been studied. multilevel analyses of teacher engagement may provide insight into how engagement might be shared in a school, and how teachers working together transmit their engagement amongst themselves, and to their students. psychosocial research in a range of vocational contexts has shown that workers regularly share beliefs, emotions, and motivational patterns, and that social interaction influences individual psychology (e.g., bandura, 1997).

7.1 limitations

although we found strong psychometric properties of the ets, and collected data from three independent samples of teachers, there are some clear limitations. the participants were all working in two western provinces in canada, and were largely female, and thus the samples offer only limited representativeness for other populations. the data collected were cross-sectional, and although engagement is said to possess state and trait characteristics (e.g., schaufeli & salanova, 2011), engagement fluctuates over time (e.g., bakker & bal, 2010). external validity of the measure is limited by the correlational nature of the study design, and no objective measure of teaching effectiveness or of student achievement was used as an outcome measure, a clear direction for future research. teacher engagement may lead to positive teacher-student interactions, increased student engagement, and eventually to increased student achievement, but the evidence base needs developing. it must also be considered that the relationships among teacher engagement, teacher-student interactions, student engagement, and student achievement are reciprocal: it is likely that teacher engagement both influences, and is influenced by, positive experiences of teacher-student interaction. although the ets focuses on in-classroom and in-school engagement, for some (but not all) teachers, out-of-school activities involving parents and the community form an important component of their social engagement. other measures, such as the olbi and uwes, measure engagement more broadly, and their use may be preferable for cross-professional comparisons and to capture teachers' non-classroom related engagement.

7.2 conclusions and future research

understanding teacher engagement is critical to understanding the psychological processes underlying effective teaching. our aim was to create a measure of teacher engagement that reflects the particular features of working in classrooms and in schools, and especially the social interactions shared by teachers and students. an important step in understanding effective teaching is to conceptualise and measure teacher engagement, and we hope that the ets can be useful in this regard, but our knowledge of how teachers' self-reports of engagement are reflected by behaviours in real classrooms is limited. although data from observation systems (e.g., the class from pianta et al., 2012) provide some insight into how engaged and effective teachers behave, such methods still leave interpretation of teachers' behaviours to the presence of external observers sitting in classes for relatively brief periods of time. further study is needed to identify the behavioural indicators of teacher engagement, and how these behaviours develop individually and collectively, and change over time.
bakker and bal's 2010 study on weekly fluctuations of teacher engagement provides a useful starting point, but examining work engagement using finer-grained time spans may provide a valuable way forward in understanding teachers and teaching. creation of the ets may be a useful point of departure for better understanding teacher engagement, and by extension, student engagement and learning.

keypoints

we created and validated a 4-factor 16-item measure of teacher engagement: the engaged teachers scale (ets).
the five steps of development resulted in a multidimensional measure that is practical (i.e., brief), valid, and reliable for use in education settings.
the four factors were cognitive engagement, emotional engagement, social engagement: students, and social engagement: colleagues.

acknowledgements

the authors would like to gratefully acknowledge research funding from the social sciences and humanities research council of canada provided to the first author.

references

bandura, a. (1997). self-efficacy: the exercise of control. new york: w.h. freeman.
bakker, a. b., albrecht, s. l., & leiter, m. p. (2011). key questions regarding work engagement. european journal of work and organizational psychology, 20, 4-28. doi:10.1080/1359432x.2010.485352
bakker, a., & bal, m. (2010). weekly work engagement and performance: a study among starting teachers. journal of occupational and organizational psychology, 83, 189-206. doi:10.1348/096317909x402596
bakker, a. b., hakanen, j. j., demerouti, e., & xanthopoulou, d. (2007). job resources boost work engagement, particularly when job demands are high. journal of educational psychology, 99, 274-284. doi:10.1037/0022-0663.99.2.274
christian, m. s., garza, a. s., & slaughter, j. e. (2011). work engagement: a quantitative review and test of its relations with task and contextual performance. personnel psychology, 64, 89-136. doi:10.1111/j.1744-6570.2010.01203.x
chou, c. p., & bentler, p. m. (1995).
estimates and tests in structural equation modeling. in r. h. hoyle (ed.), structural equation modeling: concepts, issues, and applications (pp. 37-55). thousand oaks, ca: sage.
collie, r. j., shapka, j. d., & perry, n. e. (2011). predicting teacher commitment: the impact of school climate and social-emotional learning. psychology in the schools, 48, 1034-1048. doi:10.1002/pits.20611
conway, j. m., & huffcutt, a. i. (2003). a review and evaluation of exploratory factor analysis practices in organizational research. organizational research methods, 6, 147-168. doi:10.1177/1094428103251541
dalal, r. s., brummel, b. j., wee, s., & thomas, l. l. (2008). defining employee engagement for productive research and practice. industrial and organizational psychology, 1, 52-55. doi:10.1111/j.1754-9434.2007.00008.x
davis, h. a. (2003). conceptualizing the role and influence of student-teacher relationships on children's social and cognitive development. educational psychologist, 38, 207-234. doi:10.1207/s15326985ep3804_2
demerouti, e., mostert, k., & bakker, a. b. (2010). burnout and work engagement: a thorough investigation of the independency of both constructs. journal of occupational health psychology, 15, 207-234. doi:10.1037/a0019408
durksen, t. l., & klassen, r. m. (2012). pre-service teachers' weekly commitment and engagement during a final training placement: a longitudinal mixed methods study. educational and child psychology, 29, 32-46.
economist intelligence unit (2012). the learning curve: lessons in country performance in education. retrieved from http://thelearningcurve.pearson.com/
field, a. (2009). discovering statistics using spss (3rd ed.). london: sage.
gu, q., & day, c. (2007). teachers' resilience: a necessary condition for effectiveness. teaching and teacher education, 23, 1302-1316. doi:10.1016/j.tate.2006.06.006
hair, j. f., black, b., babin, b., anderson, r. e., & tatham, r. l. (2010). multivariate data analysis (7th ed.).
upper saddle river, nj: prentice-hall.
hakanen, j. j., bakker, a. b., & schaufeli, w. b. (2006). burnout and work engagement among teachers. journal of school psychology, 43, 495-513. doi:10.1016/j.jsp.2005.11.001
henson, r. k., & roberts, j. k. (2006). use of exploratory factor analysis in published research: common errors and some comment on improved practice. educational and psychological measurement, 66, 393-416. doi:10.1177/0013164405282485
jennings, p. a., & greenberg, m. t. (2009). the prosocial classroom: teacher social and emotional competence in relation to student and classroom outcomes. review of educational research, 79, 491-525. doi:10.2307/40071173
jöreskog, k. g., & sörbom, d. (2006). lisrel 8.80 for windows [computer software]. lincolnwood, il: scientific software international, inc.
kahn, w. a. (1990). psychological conditions of personal engagement and disengagement at work. academy of management journal, 33, 692-724. doi:10.2307/256287
kahn, w. a. (1992). to be fully there: psychological presence at work. human relations, 45, 321-349. doi:10.1177/00187267920450040
klassen, r. m., al-dhafri, s., mansfield, c. f., purwanto, e., siu, a., wong, m. w., & woods-mcconney, a. (2012). teachers' engagement at work: an international validation study. journal of experimental education, 80, 1-20. doi:10.1080/00220973.2012.678409
klassen, r. m., bong, m., usher, e. l., chong, w. h., huan, v. s., wong, i. y., & georgiou, t. (2009). exploring the validity of the teachers' self-efficacy scale in five countries. contemporary educational psychology, 34, 67-76. doi:10.1016/j.cedpsych.2008.08.001
klassen, r. m., & chiu, m. m. (2010). effects on teachers' self-efficacy and job satisfaction: teacher gender, years of experience, and job stress. journal of educational psychology, 102, 741-756. doi:10.1037/a0019237
klassen, r. m., & chiu, m. m.
(2011). the occupational commitment and intention to quit of practicing and pre-service teachers: influence of self-efficacy, job stress, and teaching context. contemporary educational psychology, 36, 114-129. doi:10.1016/j.cedpsych.2011.01.002
klassen, r. m., perry, n. e., & frenzel, a. c. (2012). teachers' relatedness with students: an underemphasized component of teachers' basic psychological needs. journal of educational psychology, 104, 150-165. doi:10.1037/a0026253
kline, r. b. (2005). principles and practice of structural equation modeling (2nd ed.). new york: guilford.
ledesma, r. d., & valero-mora, p. (2007). determining the number of factors to retain in efa: an easy-to-use computer program for carrying out parallel analysis. practical assessment, research & evaluation, 12(2), 2-11. retrieved from http://pareonline.net/pdf/v12n2.pdf
macey, w. h., & schneider, b. (2008). the meaning of employee engagement. industrial and organizational psychology, 1, 3-30. doi:10.1111/j.1754-9434.2007.0002.x
matsunaga, m. (2010). how to factor-analyze your data right: do's, don'ts, and how-to's. international journal of psychological research, 3, 97-110. retrieved from http://www.redalyc.org/articulo.oa?id=299023509007
meyer, j. p., allen, n. j., & smith, c. a. (1993). commitment to organizations and occupations: extension and test of a three-component conceptualization. journal of applied psychology, 78, 538-551. doi:10.1037/0021-9010.78.4.538
o'connor, b. p. (2000). spss and sas programs for determining the number of components using parallel analysis and velicer's map test. behavior research methods, instruments, & computers, 32, 396-402. doi:10.3758/bf03200807
pianta, r. c., hamre, b. k., & allen, j. p. (2012). teacher-student relationships and engagement: conceptualizing, measuring, and improving the capacity of classroom interactions. in s. l. christenson, a. l. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 365-386).
dordrecht, netherlands: springer. doi:10.1007/978-1-4614-2018-7
rich, b. l. (2006). job engagement: construct validation and relationships with job satisfaction, job involvement, and intrinsic motivation (doctoral dissertation, university of florida). retrieved from econis, proquest umi 3228825 dissertation publishing.
rich, b. l., lepine, j. a., & crawford, e. r. (2010). job engagement: antecedents and effects on job performance. the academy of management journal, 53, 617-635. doi:10.5465/amj.2010.51468988
rimm-kaufman, s. e., & hamre, b. k. (2010). the role of psychological and developmental science in efforts to improve teacher quality. teachers college record, 112, 2988-3023. retrieved from http://www.tcrecord.org/library
roorda, d. l., koomen, h. m. y., spilt, j. l., & oort, f. j. (2011). the influence of affective teacher-student relationships on students' school engagement and achievement: a meta-analytic approach. review of educational research, 81, 493-529. doi:10.3102/0034654311421793
roth, g., assor, a., kanat-maymon, y., & kaplan, h. (2007). autonomous motivation for teaching: how self-determined teaching may lead to self-determined learning. journal of educational psychology, 99, 761-774. doi:10.1037/0022-0663.99.4.761
saks, a. m. (2008). the meaning and bleeding of employee engagement: how muddy is the water? industrial and organizational psychology, 1, 40-43. doi:10.1111/j.1754-9434.2007.00005.x
schaufeli, w. b., bakker, a. b., & salanova, m. (2006). the measurement of work engagement with a short questionnaire: a cross-national study. educational and psychological measurement, 66, 701-716. doi:10.1177/0013164405282471
schaufeli, w., & salanova, m. (2011). work engagement: on how to better catch a slippery concept. european journal of work and organizational psychology, 20, 39-46. doi:10.1080/1359432x.2010.515981
schaufeli, w. b., salanova, m., gonzalez-roma, v., & bakker, a. b. (2002). the measurement of engagement and burnout: a two-sample confirmatory factor analytic approach. journal of happiness studies, 3, 71-92. doi:10.1023/a:1015630930326
shimazu, a., schaufeli, w. b., kosugi, s., suzuki, a., nashiwa, h., kato, a., et al. (2008). work engagement in japan: development and validation of the japanese version of the utrecht work engagement scale. applied psychology: an international review, 57, 510-523. doi:10.1111/j.1464-0597.2008.00333.x
shuck, m. b. (2010). employee engagement: an examination of antecedent and outcome variables (doctoral dissertation, florida international university). retrieved from http://digitalcommons.fiu.edu/etd/235
shuck, b. (2011). four emerging perspectives of employee engagement: an integrative literature review. human resource development review, 10, 304-328. doi:10.1177/1534484311410840
shuck, b., ghosh, r., zigarmi, d., & nimon, k. (2013). the jingle jangle of employee engagement: further exploration of the emerging construct and implications for workplace learning and performance. human resource development review, 12, 11-35. doi:10.1177/1534484312463921
sonnentag, s. (2003). recovery, work engagement, and proactive behavior: a new look at the interface between non-work and work. journal of applied psychology, 88, 518-528. doi:10.1037/0021-9010.88.3.518
staiger, d. o., & rockoff, j. e. (2010). searching for effective teachers with imperfect information. journal of economic perspectives, 24, 97-118. doi:10.1257/jep.24.3.97
tabachnick, b., & fidell, l. s. (2007). using multivariate statistics (5th ed.). boston: pearson.
thomas, c. h. (2006). clarifying the concept of work engagement: construct validation and an empirical test (doctoral dissertation, the university of georgia). retrieved from http://hdl.handle.net/10724/9113
tschannen-moran, m., & woolfolk hoy, a. (2001). teacher efficacy: capturing an elusive construct.
teaching and teacher education, 17, 783-805. retrieved from http://www.sciencedirect.com/science/article/pii/s0742051x01000361#
velicer, w. f., & jackson, d. n. (1990). component analysis versus common factor analysis: some further observations. multivariate behavioral research, 25(1), 97-114. doi:10.1207/s15327906mbr2501_12
wang, m.-t. (2009). school climate support for behavioral and psychological adjustment: testing the mediating effect of social competence. school psychology quarterly, 24, 240-251. doi:10.1037/a0017999
wang, y., & qin, j. (2011). the structure of preschool teachers' work engagement survey in china. international conference on social science and humanity, 5, 464-468. retrieved from http://www.ipedr.com/vol5/no2/103-h10248.pdf
watt, h. m. g., & richardson, r. w. (2007). motivational factors influencing teaching as a career choice: development and validation of the fit-choice scale. the journal of experimental education, 75, 167-202. doi:10.3200/jexe.75.3.167-202
xanthopoulou, d., bakker, a. b., demerouti, e., & schaufeli, w. b. (2007). the role of personal resources in the job demands-resources model. international journal of stress management, 14, 121-141. doi:10.1037/1072-5245.14.2.121
zwick, w. r., & velicer, w. f. (1986). comparison of five rules for determining the number of components to retain. psychological bulletin, 99, 432-442. doi:10.1037/0033-2909.99.3.432
frontline learning research special issue vol. 9 no. 2 (2021) 28-49
issn 2295-3159

predicting freshmen's academic adjustment and subsequent achievement: differences between academic and professional higher education contexts

jonas willems¹, tine van daal¹, peter van petegem¹, liesje coertjens² & vincent donche¹
¹university of antwerp, belgium
²uclouvain, belgium

article received 18 march 2020 / revised 28 august / accepted 4 december / available online 12 march 2021

abstract

this study tests an integrative model, which delineates how students' academic motivation, academic self-efficacy and learning strategies (processing strategies and regulation strategies) at the end of secondary education impact academic adjustment in the first semester of the first year of higher education (fyhe) and subsequent academic achievement at the end of the fyhe, in two types of he programmes. more precisely, the present study explores the extent to which the explanatory values of the aforementioned determinants of academic adjustment and academic achievement differ across academic programmes (providing more theoretical and scientific education) and professional programmes (offering more vocational education that prepares students for a particular occupation, such as nursing). to this end, multiple-group sem analyses were carried out on a longitudinal dataset containing 1987 respondents (academic programmes: n=1080, 54.4%; professional programmes: n=907, 45.6%), using mplus 8.3. results indicate differences in the predictive power of the determinants under scrutiny between professional and academic contexts. firstly, learning strategies and motivational variables at the end of secondary education have more predictive power in the prediction of fyhe academic adjustment in the academic programmes than in professional programmes.
secondly, our results indicate that academic adjustment in the first semester of the fyhe influences academic achievement to a bigger extent in professional programmes than in academic programmes. moreover, these differences across he contexts were found after controlling for prior education. implications of the findings are discussed. keywords: learning strategies; motivation; academic adjustment; first-year academic achievement; programme diversity info corresponding author email: jonas.willems@uantwerpen.be doi: https://doi.org/10.14786/flr.v9i2.647 1. introduction over the years, democratisation of higher education (he) around the world has led to a substantial increase and diversification of the student population enrolling in he (schuetze & slowey, 2002). this seems to be accompanied by low study success rates, early drop-out and study delay of students in the first year of higher education (fyhe). for example, in flanders (dutch speaking part of belgium), only 48.6% of freshmen successfully complete their required coursework in the fyhe (declercq & verboven, 2014). this has extensive psychological and financial costs for the individual student, the family and society (oecd, 2013). as such, more insight into factors that facilitate freshmen’s transition process to he can give rise to an increase in the academic achievement of these students (briggs, clark, & hall, 2012). in recent decades, several lines of research have argued that non-cognitive factors such as learning strategies and motivational and adjustment variables are important determinants of students’ academic achievement in the fyhe (e.g. bailey & phillips, 2016; credé & kuncel, 2008; richardson, abraham, & bond, 2012; robbins et al., 2004). 
this body of research, however, has been mostly carried out in academically oriented he programmes (offering more theoretical and scientific education), leaving professionally oriented he contexts (offering more vocationally oriented education that prepares students for a specific occupation) rather underexplored (for an exception, see vanthournout, gijbels, coertjens, donche, & van petegem, 2012). this paucity of research in professional he contexts is certainly remarkable, given that a significant part of adolescents worldwide enrols in professional he programmes (oecd, 2009). in flanders, for example, 54.4% of the he student population participates in professional he (flemish government, 2019). moreover, previous research points out that institutional and disciplinary differences might influence the interrelationships between variables in a predictive model of academic achievement (e.g. de clercq et al., 2013). simply assuming that the aforementioned determinants of academic achievement have the same predictive value in professional and academic fyhe contexts thus seems to neglect this important source of meso-level diversity. therefore, this study sets out to investigate to what extent the predictive power of learning strategies (processing strategies and regulation strategies), motivational variables and academic adjustment in predicting fyhe students' academic achievement differs across academic and professional he contexts, using an integrative, longitudinal research design. in what follows, we first describe how he in flanders is organised, after which we briefly describe the main constructs and expected relationships under study. as we will point out, these have been investigated predominantly in academic he contexts, with little attention to academic adjustment as an intervening variable for academic achievement in the fyhe.

2.
research context: flemish he system as in many other european he systems (such as germany, the netherlands, finland, denmark and portugal), flemish he is provided by two types of institutions: universities and university colleges. universities offer academically oriented he programmes, which provide mainly theoretical and scientific education. they typically prepare students for a succeeding master programme and correspond to the bologna two-cycle programmes (bachelor and master, encompassing a total of 4 or 5 years; the bologna declaration, 1999). university colleges, on the other hand, are specialised institutions that organise so-called ‘professional bachelor programmes’, which are mainly designed for learners to acquire the knowledge, skills and competencies specific to a particular occupation, such as nursing or social work (camilleri, delplace, frankowicz, hudak, & tannhäuser, 2014). these vocational programmes offer direct access to the labour market and are in line with the bologna first cycle programmes (one cycle of 3 years). academic and professional bachelor programmes have different aims and expectations of students, and typically differ from each other with regard to their curricular organisation. in professional programmes, theory and practice are combined through the use of student-centred learning methodologies such as simulations, working with real-life materials and workplace learning settings (e.g. long-term internships, machinery to repair, assignments for translators; see also camilleri et al., 2014). in academic programmes, by contrast, subject matter is more abstract and often less practical. also, the teaching speed is higher, research activities and large-scale lectures are more common, and more independent learning and scientific research attitudes are expected from students in these academic programmes (van rooij et al., 2017). 3.
the pivotal role of academic adjustment in predicting academic achievement academic adjustment is generally described as the extent to which a student successfully copes with the various educational demands and characteristics of the new he environment, and comprises components such as motivation to learn, taking action to meet academic demands, having a clear sense of purpose, management of expectations, and general satisfaction with the academic environment (baker, mcneil, & siryk, 1985; baker & siryk, 1999; gerdes & mallinckrodt, 1994). today, it is well established from a variety of studies that academic adjustment is important in the prediction of students’ academic achievement in the fyhe: students who are more academically adjusted drop out less often (bean, 1980; kuh, kinzie, buckley, bridge, & hayek, 2006) and achieve better grades (bailey & phillips, 2016; petersen, louw, & dumont, 2009; prospero & vohra-gupta, 2007; severiens & wolff, 2008; wintre et al., 2011). considering this importance of academic adjustment in the prediction of freshmen’s academic achievement, it is not surprising that a considerable number of studies on the first-year transition experience treat this construct as an important outcome in its own right (e.g. garriott, love, & tyler, 2008; rice, vergara, & aldea, 2006). this latter body of research has shown that students’ learning strategies and motivational variables, in turn, have considerable impact on first-year academic adjustment (e.g. baker, 2004; cazan, 2012). moreover, previous research in academic he contexts has suggested that academic adjustment is a mediator of the effects of several learning strategies and motivational variables on academic achievement (petersen, louw, & dumont, 2009; van rooij, jansen, & van de grift, 2018), which further highlights the pivotal role of the academic adjustment construct in first-year students’ transition process.
for instance, van rooij and colleagues (2018) found that intrinsic (autonomous) motivation and self-regulated study behaviour did not influence academic achievement directly, but through academic adjustment. however, the work of petersen et al. (2009), who also investigated the mediating role of first-year students’ academic adjustment, suggests that this construct is not a “pure” mediator of effects on academic achievement. indeed, these authors found that the effects of students’ intrinsic motivation and identified regulation (together autonomous motivation) and self-esteem were mediated by adjustment, while extrinsic regulation (controlled motivation) and academic overload (being unable to cope with the academic workload) had a direct impact on academic achievement. this rationale leads us to the integrative conceptual model adopted in the present study, which posits that students’ learning strategies (deep processing, surface processing, self-regulation, lack of regulation), academic motivation, and academic self-efficacy have an impact on academic adjustment in the first semester of the fyhe and subsequent academic achievement (fig. 1). acknowledging that academic adjustment might not be a pure mediator of effects on academic achievement (petersen et al., 2009), the model also includes direct paths between the exogenous variables and academic achievement. furthermore, as it is clear from previous studies that students’ prior secondary education tracks might influence academic adjustment and achievement as well (e.g. de clercq et al., 2013; vermunt, 2005), in the present study, we have included this factor as a control variable. finally, for the design of the present study, we adhere to the suggestion of van rooij et al. (2018) that research on this matter should be conducted longitudinally and should “start measuring motivational and behavioural variables in secondary school and investigate how they relate to adjustment and student success outcomes later in university” (p.
763). figure 1. conceptual model of learning strategies and motivational variables affecting academic adjustment and subsequent academic achievement. from the above, it is clear that one might expect a positive relationship between academic adjustment and academic achievement. the subsequent paragraphs further detail the hypothesised interrelations between motivational and learning related variables on the one hand, and academic adjustment and achievement on the other hand. the hypothesised directions of the associations in the conceptual model (fig. 1) are summarised in table 1. 4. motivational and learning related determinants of academic adjustment and academic achievement 4.1 academic motivation it has previously been observed that fyhe students’ academic motivation (deci & ryan, 2000) is linked to academic adjustment. for instance, clark, middleton, nguyen, & zwick (2014), petersen et al. (2009) and van rooij et al. (2018) reported a positive relation between types of autonomous motivation and academic adjustment. amotivation, on the other hand, has been found to be associated with lower academic adjustment (baker, 2004). research further shows that students who are more autonomously motivated have higher academic achievement than students whose motivation is more controlled or who are more amotivated (e.g. bailey & phillips, 2016; guay, ratelle, roy, & litalien, 2010). finally, a number of studies revealed a negative association between amotivation and students’ academic achievement (e.g. prospero & vohra-gupta, 2007; vanthournout et al., 2012). 4.2 academic self-efficacy academic self-efficacy (further shortened to self-efficacy) is defined as individuals’ beliefs that they can successfully perform given academic tasks at designated levels (schunk, 1991). this construct has repeatedly been identified as one of the strongest determinants of academic achievement in the fyhe (e.g. richardson et al., 2012; robbins et al., 2004).
furthermore, cazan (2012) and chemers, hu, & garcia (2001) demonstrated that self-efficacy was strongly and positively related to academic adjustment. van rooij et al. (2018), however, did not find a significant relationship between self-efficacy and academic adjustment, after controlling for intrinsic motivation, self-regulation and degree programme satisfaction. thus, the specific role of self-efficacy, especially after controlling for additional concepts, remains unclear. 4.3 learning strategies in the learning pattern model, developed by vermunt (1998), learning strategies are described as encompassing both cognitive processing strategies and regulation strategies. firstly, processing strategies refer to those thinking activities and study skills students apply whilst studying (vermunt, 1998). generally, two qualitatively different types of cognitive processing are discerned in educational literature, namely deep and surface processing (e.g. vermunt & donche, 2017; vermunt & vermetten, 2004). deep processing refers to the use of learning activities that lead to meaningful learning and in-depth understanding of the learning content, such as relating and structuring. surface processing refers to the use of learning activities like memorizing that lead to the learning of superficial features of a study task, also described as rote learning (vermunt & vermetten, 2004). traditionally, it has been argued that the use of deep processing strategies leads to high academic achievement, while surface processing strategies entail lower academic achievement (vermunt & donche, 2017). two important meta-analyses in the field corroborate this idea, as they found those relationships to be significant, albeit small (dent & koenka, 2016; richardson et al., 2012). however, findings on the direction of the relationship between cognitive processing and academic achievement are not always conclusive.
for instance, it has been argued that surface learning, in some situations, might be advantageous to the learner (see dinsmore & alexander (2012) for an elaborate exposition). further, associations between cognitive processing strategies and academic adjustment have been less investigated. nevertheless, previous research has demonstrated that students who lack appropriate study skills in he are at risk of having problems with their academic adjustment (abbott-chapman, hughes, & wyld, 1992). therefore, we hypothesise that deep and surface processing strategies will be related to academic adjustment. secondly, regulation strategies are defined as those activities that students use to harness their cognitive processing strategies (schunk & zimmerman, 2012). students who are more self-regulated are able to actively steer their own learning processes through activities such as planning tasks, monitoring progress, and diagnosing problems. lack of regulation, on the other hand, refers to an absence of clarity on how to steer the learning process (vermunt & donche, 2017). several studies have found self-regulation to be positively related to both academic adjustment (cazan, 2012; hurtado et al., 2007; van rooij et al., 2018) and academic achievement (e.g. dent & koenka, 2016; richardson et al., 2012). although lack of regulation has repeatedly been found to affect academic achievement negatively (e.g. donche & van petegem, 2010; vermunt, 2005), to our knowledge, there has been no investigation of the relationship between lack of regulation and academic adjustment in professional and academic programmes. considering the ‘deficit’-character of the construct, we theoretically expect that lack of regulation is negatively associated with academic adjustment. table 1 hypothesised directions of the relationships under study 5.
exploring programme diversity: a meso-level study when reviewing the literature on determinants of academic adjustment and academic achievement, it becomes apparent that relatively few studies have tackled these relationships in the specific setting of professional he. indeed, the vast majority of studies on the relationships under scrutiny (fig. 1) have been carried out in academically oriented programmes (e.g. petersen et al., 2009; van rooij et al., 2018). several scholars, however, have established that disciplinary differences influence the learning environments wherein students reside in terms of, for instance, requirements of students, assessment systems, study goals and teaching methods (becher, 1994; braxton & hargens, 1996; young, 2010). moreover, previous research has suggested that such variations in environments might influence interrelationships between non-cognitive variables and academic achievement. an example of this disciplinary diversity is provided by de clercq et al. (2013), who investigated whether freshmen’s background, study choice process, experience of the university, motivational beliefs, learning strategies, and behavioural engagement had similar predictive power in two university disciplines: science and physical education. the authors found several differences in the effects of those determinants; in physical education courses, for example, self-efficacy was the most powerful predictor of academic achievement, whereas intention to persist was the most powerful determinant in the science discipline. another study by lizzio, wilson, & simons (2002) showed that relationships between university students’ prior achievement, learning strategies and academic achievement varied between faculties of humanities, science, and commerce.
this also concurs with the study by fonteyne, duyck & de fruyt (2017), who found that the predictive power of background, cognitive, personality, metacognitive, self-efficacy and motivational factors on academic achievement varied considerably across various academic study disciplines, such as psychology, criminology, history, and pharmaceutical sciences. however, students’ academic adjustment, as a possible mediator in further understanding the effects of entry characteristics on academic achievement, was not taken into account in these studies. moreover, professionally oriented programmes have distinctive aims and expectations of students and typically adopt different didactical approaches than academically oriented programmes (camilleri et al., 2014; see also “research context: flemish he system”). we therefore expect that such programme diversity could influence relationships in a predictive model of academic achievement as well. determinants of academic adjustment and academic achievement might, thus, have a dissimilar predictive value in both contexts. therefore, the aim of this study is to explore this programme diversity, by comparing the impact of the different determinants depicted in the conceptual model (fig.1), in professionally and academically oriented programmes. the following two research questions are central to this study: rq1: to what extent is the explanatory value of secondary students’ academic motivation, academic self-efficacy and learning strategies (processing and regulation strategies) in the prediction of first-year he academic adjustment, different between professional and academic fyhe programmes? rq2: to what extent is the explanatory value of secondary students’ academic motivation, academic self-efficacy, learning strategies and subsequent first-year academic adjustment, in the prediction of first-year he academic achievement, different between professional and academic fyhe programmes? 6. 
method 6.1 respondents & procedure the data stem from a longitudinal research project on students’ transition from secondary to he in flanders. in this project, students from 32 randomly selected secondary schools (offering a mixture of secondary education (se) tracks: general, arts, technical and vocational) participated and were followed up until the second year of he. at the end of their last year of se, students completed questionnaires (both online and paper and pencil) measuring their academic motivation, self-efficacy and learning strategies (wave 1: may/june 2011, n=2839). informed consent and contact information for future research were obtained from 84.1% of these students (n=2387). data obtained from the flemish government show that 1987 (83.3%) of students who had given their informed consent transitioned to he, constituting the final sample for this study. a small majority of these students (n=1080, 54.4%) started an academic bachelor programme, whereas 907 (45.6%) opted for a professional bachelor programme. in a second wave, during the first semester of the fyhe, students’ academic adjustment was mapped out in an online questionnaire. several communication channels were used to reach respondents (letter, e-mail, sms, phone call after repeated non-response), making this an intensive data collection that extended over three months (october–december 2011). in the second wave of the study, 604 (30.4%; academic programmes: n=331; professional programmes: n=273) of the 1987 students who transitioned to he completed the academic adjustment scale. table 2 shows the proportions of respondents’ prior education tracks in both academic and professional he programmes, and compares these with the corresponding student proportions in the actual flemish population in academic year 2010-2011, as reported by glorieux, laurijssen, & sobczyk (2014).
as no students from the vocational se track completed the academic adjustment scale in he, this group of students is not represented in the present study. table 2 proportion of prior education study tracks, in the sample and the flemish population, across academic and professional he programmes 6.2 measures students’ motivational characteristics and learning strategies at the end of se (wave 1) were mapped out using scales of the self-report questionnaire ‘learning strategy and motivation questionnaire’, which was previously validated in flanders (lemo; donche & van petegem, 2008). academic motivation. controlled motivation was operationalised by six items, of which ‘i am motivated to study, because i am supposed to do this’ is an example (α=.73). autonomous motivation was also measured by six items, for instance, ‘i am motivated to study, because i want to learn new things’ (α=.83). finally, amotivation was measured using three items, such as ‘i am motivated to study…honestly, i don’t know; i feel like i’m wasting my time in school’ (α=.78). five response categories were given, ranging from ‘not important at all’ to ‘very important’. self-efficacy is defined more specifically as a student’s perception of having the necessary knowledge and skills to carry out learning tasks. the self-efficacy scale consists of four items, for instance ‘i have confidence in the way in which i study’ (α=.84). items were scored on a five-point likert scale ranging from ‘completely disagree’ to ‘completely agree’. learning strategies. we opted to measure qualitatively different cognitive processing and regulation strategies. more concretely, surface processing was measured by the ‘memorizing’-scale (e.g. ‘i memorise lists of characteristics of a certain phenomenon’; 4 items; α=.67), and deep processing was measured by the ‘critical processing’-scale (e.g. ‘i try to understand the interpretations of experts in a critical way’; 4 items; α=.73).
on the level of regulation strategies, we measured self-regulation (e.g. ‘in addition to the compulsory subject matter, i read other books or texts that have to do with the subject matter’; 4 items; α=.64), and lack of regulation (e.g. ‘i notice that it is difficult for me to determine whether i have sufficiently mastered the subject matter’; 4 items; α=.70). all items are scored ranging from 1 (i never or hardly ever do this) to 5 (i almost always do this). academic adjustment (wave 2) was measured using the ‘adjustment’-scale (6 items, α=.76), validated by torenbeek and colleagues (2010, 2011). an item example is: ‘i have experienced some difficulties in adjusting to the teaching approach of my current study programme (reverse coded)’. items were scored on a five-point likert scale, ranging from ‘completely disagree’ to ‘completely agree’. academic achievement data from students at the end of the fyhe (wave 3) were obtained from the flemish government. achievement was conceptualised as study progress, which is the ratio of credits (ects study points) earned by a student to the credits attempted by that student. credits are earned when a course is passed, which is when a student scores a minimum of 10 out of 20 on the evaluation for that course. prior education. data on students’ prior education tracks were obtained from the flemish government. flemish se is provided for young people aged 12 to 18 in four tracks: general se, technical se, artistic se, and vocational se. as mentioned above, the present study was not able to include students from the vocational se track. although differences in the educational tracks in general se exist, students from the general se track tend to be more prepared to enrol in an academic he study programme. all students in the flemish educational system are free to access either professional or academic he programmes after se. for use in further analysis, this variable was dummy coded (0=general track, 1=arts/technical track).
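as an illustration of the two derived variables described above, the following minimal python sketch computes study progress as the ratio of earned to attempted credits and applies the dummy coding of prior education. the function names, the example course records and the handling of a student with zero attempted credits are assumptions made for illustration only; they are not taken from the flemish government data or from the authors' own scripts.

```python
from typing import List, Optional, Tuple

PASS_MARK = 10  # a course is passed with a score of at least 10 out of 20


def study_progress(courses: List[Tuple[int, float]]) -> Optional[float]:
    """Ratio of credits earned to credits attempted.

    `courses` holds (ects_credits, score_out_of_20) pairs; this record
    layout is hypothetical. Returns None when no credits were attempted
    (an assumption; the source does not say how such cases are handled).
    """
    attempted = sum(credits for credits, _ in courses)
    if attempted == 0:
        return None
    earned = sum(credits for credits, score in courses if score >= PASS_MARK)
    return earned / attempted


def prior_education_dummy(track: str) -> int:
    """Dummy code the secondary education track: 0 = general, 1 = arts/technical."""
    if track == "general":
        return 0
    if track in ("arts", "technical"):
        return 1
    raise ValueError(f"track not represented in the study: {track!r}")


# hypothetical student: 60 credits attempted, one 6-credit course failed
example = [(6, 12.0), (6, 9.0), (24, 10.0), (24, 15.5)]
print(study_progress(example))             # 0.9
print(prior_education_dummy("technical"))  # 1
```

a score of exactly 10 counts as a pass here, in line with the stated minimum of 10 out of 20.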
6.3 analysis in the present study, multiple-group structural equation modelling (sem; byrne, 2016) is used to examine the fit of the conceptual model illustrated in figure 1, and to conduct cross-group comparisons between university college students and university students (rq1 and rq2). all analyses were carried out in mplus (version 8.3). in all models, the maximum likelihood estimator with robust standard errors (mlr) was used, which is robust to non-normality of observations (muthén & muthén, 2010). this method also handles missing data by using the complete sample, that is, by incorporating data from respondents who did not participate in every wave, which has been found to provide better results in terms of unbiased estimates and statistical power (enders, 2011). global fit of the models is assessed using the ‘comparative fit index’ (cfi) and ‘root mean square error of approximation’ (rmsea). a model has excellent fit when cfi has a value above .95 and rmsea has a value less than .05. a model has acceptable fit when cfi has a value above .90 and rmsea has a value less than .08 (hu and bentler, 1999). a prerequisite to conducting substantive comparisons between groups is the establishment of measurement invariance across those groups (vandenberg & lance, 2000). hence, in a first step, we carried out multiple-group measurement invariance testing (meredith, 1993) to seek evidence that our measurement instrument operates equivalently across the two groups under scrutiny (i.e. do university college students and university students understand all the scales and items in a similar way?). to this end, four steps were undertaken, in each of which more restricted confirmatory factor analysis (cfa) models are estimated (byrne, 2016; vandenberg & lance, 2000): (1) a configural invariance model, wherein only the number of factors and the factor-loading pattern are equivalent across groups.
at this stage, there are no equality constraints imposed on the parameter estimates of the model; (2) a metric invariance model, which requires only that factor loadings are equal across groups; (3) a scalar invariance model, wherein intercepts are constrained as well; and (4) a strict invariance model, which, finally, imposes equality constraints on the error variances across groups (brown, 2014; gregorich, 2006). when metric invariance is established, this means that the different constructs in the measurement model have the same meaning in the two groups. scalar invariance, then, implies that the means of the scales across both groups can be compared. researchers generally agree that assessing scalar invariance is sufficient for establishing measurement invariance (milfont & fischer, 2010). in every subsequent step, the invariance of the factor structure was evaluated by comparing the fit of the more restricted model with the fit of the less restricted model (byrne, 2016). to this end, we examined changes in the following fit indices: cfi and rmsea. a decrease in cfi of .01 or more (cheung & rensvold, 2002) and an increase in rmsea of .015 or more (chen, 2007) were considered evidence that the invariance hypothesis should be rejected¹. if scalar measurement invariance was not attained, the model was tested for partial scalar invariance (byrne, shavelson, & muthén, 1989). to this end, modification indices were examined to identify possible item(s) that induced the lack of equivalence, after which the particular intercepts of these items were allowed to differ between groups. byrne et al. (1989) and steinmetz, schmidt, tina-booh, wieczorek, & schwartz (2009) suggest that a minimum of two intercepts have to be equal across groups to establish partial scalar invariance of a scale. after this initial testing of the measurement models and their equivalence over students from different types of bachelor programmes, the structural model (fig.
1) was, as stated above, tested using a multiple-group sem approach. this allowed us to compare (a) the standardised regression parameter estimates and (b) the explained variances in the endogenous variables in both groups of students. in this study, p < .05 is used as a criterion of statistical significance. further, in order to more accurately compare the predictive power in terms of explained variance of the factors under study on their respective outcomes across academic and professional programmes, we controlled for prior education, and, subsequently, scrutinised the incremental values of these factors over prior education. during analyses, we encountered a multicollinearity problem that was induced by high latent correlations between the ‘lack of regulation’ and ‘self-efficacy’ variables in the academic (r=-.815) as well as in the professional (r=-.651) bachelor programme group of respondents. this multicollinearity issue clearly influenced the analyses, as theoretically inconceivable parameter estimates emerged when both constructs were included in one predictive model. since ‘lack of regulation’ and ‘self-efficacy’ are theoretically clearly distinctive and both concepts have considerable impact on first-year academic adjustment and academic achievement, we opted to retain both variables in the analysis. hence, we preferred to break down the model under scrutiny (fig. 1) into two components containing either learning strategies or motivational factors as determinants of academic adjustment and subsequent academic achievement (fig. 2). figure 2. split of the conceptual models of learning strategies and motivational variables affecting academic adjustment and subsequent academic achievement. 7. results 7.1 measurement invariance multiple-group measurement invariance analyses were conducted on the two conceptual models (see figure 2 in the method section).
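the decision rule applied at each invariance step, as described in the analysis section, can be sketched in python as follows. this is a minimal illustration and not the authors' mplus procedure: the function name is hypothetical, the inputs are assumed to be the drop in cfi and the rise in rmsea when moving from the less restricted to the more restricted model, and the sketch assumes that meeting either cut-off is sufficient to reject invariance (which is consistent with the scalar step reported in section 7.1.1, where only the cfi criterion is met).

```python
CFI_DROP_CUTOFF = 0.01     # cheung & rensvold (2002)
RMSEA_RISE_CUTOFF = 0.015  # chen (2007)


def invariance_rejected(cfi_drop: float, rmsea_rise: float) -> bool:
    """Return True when the more restricted model fits notably worse.

    cfi_drop   : decrease in cfi (positive = worse fit)
    rmsea_rise : increase in rmsea (positive = worse fit)
    Assumption: either criterion alone is treated as sufficient.
    """
    return cfi_drop >= CFI_DROP_CUTOFF or rmsea_rise >= RMSEA_RISE_CUTOFF


# values reported for the learning strategies model (section 7.1.1)
print(invariance_rejected(cfi_drop=0.003, rmsea_rise=0.0))     # metric step: False
print(invariance_rejected(cfi_drop=0.016, rmsea_rise=-0.003))  # scalar step: True
```

when the scalar step is rejected, the text proceeds to partial scalar invariance by freeing individual intercepts; that step depends on modification indices from the fitted model and is not reproduced here.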
in the next paragraphs, we first provide an overview of the results for the learning strategies variables and subsequently for the motivational variables. 7.1.1 learning strategies after adding one error covariance term in the lack of regulation scale, and one in the academic adjustment scale, the configural model for learning strategies showed adequate fit (see table 3). next, inspection of the metric model shows that the hypothesis of invariant factor loadings was not rejected (∆cfi=.003, ∆rmsea=0). after constraining the intercepts (scalar model), however, model fit decreased significantly (∆cfi=.016, ∆rmsea=-.003). we needed to relax constraints on one intercept of the lack of regulation scale, and one of the surface processing scale, to improve model fit sufficiently. subsequently, imposing equality constraints on the error variances across groups did not significantly reduce fit (see table 3). thus, results for this model suggest (1) metric invariance for all scales in the model, (2) scalar invariance for deep processing, self-regulation and academic adjustment, and (3) partial scalar invariance for lack of regulation and surface processing. table 3 results from measurement invariance tests for learning strategies and academic adjustment *** p<.001; a lr=one item of lack of regulation scale freed; b sp=one item of surface processing scale freed; c the reference point for the calculation of these values is the metric model. 7.1.2 motivational variables in order to achieve adequate fit for the configural model of the motivational variables (see table 4), we added three error covariances in the autonomous as well as in the controlled motivation scale. furthermore, one error covariance was added in the self-efficacy scale and one in the academic adjustment scale (the same as in the learning strategies model).
as can be seen in table 4, the results from the measurement invariance tests, thus, provide evidence that strict invariance is established for autonomous and controlled motivation, amotivation, self-efficacy, and academic adjustment. table 4 results from measurement invariance tests for motivational variables and academic adjustment 7.2 multiple-group sem 7.2.1 learning strategies in a next step, a multiple group sem analysis containing the learning strategies component (model 1 in figure 2) provided satisfactory fit (cfi=.911, rmsea=.031). parameter estimates of this model (table 5) demonstrate that students’ first-year academic achievement is related with their prior education (dummy coded; 0=general se) in both the academic (β=-.149, p<.001) and professional (β=-.215, p<.001) contexts, indicating that students from more academically preparing study tracks in secondary education (general education), achieve better in first-year he. further, it appears that academic adjustment is a direct significant (positive) determinant of academic achievement in academic (β=.290, p<.001) programmes, as well as in professional (β=.334, p<.001) programmes. academic adjustment is, in its turn, significantly and negatively associated with lack of regulation in both types of programmes (acad.: β=-.430, p<.001; prof.: β=-.253, p=.004). finally, prior education predicts academic adjustment in professional programmes (β=-.248, p<.001), but not in academic programmes (β=-.096, p=.135). 
table 5 results of the multiple-group sem analysis for learning strategies; standardised parameter estimates and explained variances (r²) in academic and professional programmes a dummy coded: 0=general se in order to accurately compare the predictive power, in terms of explained variance, of learning strategies, motivational variables and academic adjustment on their respective outcomes across academic and professional programmes, we contrasted the incremental values of these factors over prior education, which are calculated in table 6 (i.e. δr² learning strategies in the prediction of academic adjustment, and δr² learning strategies + adjustment in the prediction of academic achievement). results show that the larger regression coefficients of learning strategies in the prediction of academic adjustment in the academic group, are also reflected in the explained variances (δr² learning strategies). learning strategies at the end of se predict over double the amount of variance in first-semester academic adjustment in academic programmes (19.4%) in comparison to professional programmes (7.2%). table 6 further shows that in professional he, 3.1% more variance in academic achievement is explained by learning strategies and academic adjustment (11.1%), compared to the academic context (8%). further analyses (see 7.2.3 incremental value of academic adjustment) will show that this latter difference in incremental value can be completely attributed to the increase in explained variance of academic adjustment, not learning strategies. table 6 calculation of incremental value (δr²) of learning strategies and academic adjustment over prior education a as also reported in table 5. 7.2.2. motivational variables the motivational variables model (model 2 in figure 2) had satisfactory fit as well (cfi=.924, rmsea=.039). 
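the fit cut-offs stated in the analysis section (excellent: cfi above .95 and rmsea below .05; acceptable: cfi above .90 and rmsea below .08) can be expressed as a small python helper. this is only a sketch: the function name and the label 'poor' for models meeting neither set of cut-offs are assumptions introduced here.

```python
def classify_fit(cfi: float, rmsea: float) -> str:
    """Classify global model fit from cfi and rmsea using the stated cut-offs."""
    if cfi > 0.95 and rmsea < 0.05:
        return "excellent"
    if cfi > 0.90 and rmsea < 0.08:
        return "acceptable"
    return "poor"  # hypothetical label; the source names no third category


# fit indices reported for the two multiple-group sem models
print(classify_fit(cfi=0.911, rmsea=0.031))  # learning strategies model: acceptable
print(classify_fit(cfi=0.924, rmsea=0.039))  # motivational variables model: acceptable
```

both models fall in the acceptable band because their cfi values lie between .90 and .95, even though their rmsea values are below .05.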
parameter estimates of this model (table 7) demonstrate that prior education relates to adjustment in academic (β=-.128, p=.043) as well as in professional (β=-.276, p<.001) programmes. next, self-efficacy had a significant positive effect on academic adjustment in the academic programmes (β=.266, p<.001), but not in the professional programme group (β=.088, p=.285). controlled motivation was significantly and negatively related to adjustment in both types of programmes (acad.: β=-.188, p=.024; prof.: β=-.189, p=.028). academic adjustment, subsequently, positively predicted academic achievement in the academic programmes (β=.215, p=.001) as well as in the professional programmes (β=.333, p<.001). further, in addition to the indirect effect through academic adjustment in the academic group, self-efficacy also has a direct positive impact on academic achievement in the academic (β=.134, p=.002) and in the professional group (β=.117, p=.014). finally, prior education was also related to academic achievement in both types of programmes (acad.: β=-.160, p<.001; prof.: β=-.213, p<.001).

table 7. results of the multiple-group sem analysis for motivational variables: parameter estimates and explained variances in academic and professional programmes

similar to the learning strategies model, we observed that the incremental value of motivational variables in the prediction of academic adjustment, over prior education (table 8), is larger in academic programmes (12.9%) relative to professional programmes (9.3%). however, this difference across programmes (3.6%) is smaller than that in the learning strategies model (12.2%). furthermore, motivational variables and academic adjustment together were able to predict 13% of the variance in academic achievement in the professional programmes, over and above prior education. this is 4.1% more variance than was predicted in the academic contexts, where the increase in explained variance of these variables over prior education was 8.9%.
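the programme comparisons reported so far reduce to simple differences between incremental explained variances. the sketch below recomputes them from the δr² values quoted in the text for tables 6 and 8. the dictionary layout and the function name `programme_gap` are illustrative assumptions, not part of the original analysis; a positive gap means the academic programmes show the larger increment.

```python
# Incremental explained variance over prior education (delta R^2, in
# percentage points), as quoted in the text for Tables 6 and 8.
# The data layout and names are illustrative assumptions.
reported = {
    ("learning strategies", "adjustment"): {"academic": 19.4, "professional": 7.2},
    ("learning strategies + adjustment", "achievement"): {"academic": 8.0, "professional": 11.1},
    ("motivational variables", "adjustment"): {"academic": 12.9, "professional": 9.3},
    ("motivational variables + adjustment", "achievement"): {"academic": 8.9, "professional": 13.0},
}


def programme_gap(values):
    """Academic minus professional delta R^2, rounded to one decimal."""
    return round(values["academic"] - values["professional"], 1)


gaps = {key: programme_gap(values) for key, values in reported.items()}

for (predictors, outcome), gap in gaps.items():
    print(f"{predictors} -> {outcome}: {gap:+.1f} percentage points")
```

the signs make the pattern explicit: the increments for academic adjustment favour academic programmes, whereas the increments for academic achievement favour professional programmes.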
again, this latter difference in incremental value is attributable to the increase in explained variance of academic adjustment (see 7.2.3 incremental value of academic adjustment).

table 8. calculation of incremental value (δr²) of motivational variables and academic adjustment over prior education
a as also reported in table 7.

7.2.3 incremental value of academic adjustment in predicting academic achievement, over prior education, learning strategies and motivation

the motivational variables model (table 7) showed that, in addition to academic adjustment, self-efficacy was a significant direct predictor of academic achievement. moreover, the non-significant effects of the determinants of academic achievement presented in tables 5 and 7 also contribute, to a small extent, to the reported explained variances, alongside the significant effects. these considerations raise the question of what the ‘net’ impact of academic adjustment on academic achievement is in terms of explained variance, and to what extent it differs between academic and professional programmes. therefore, we also examined a model containing only the direct impact of the learning strategies and motivational variables on academic achievement (models b in tables 9 and 10), in which academic adjustment was not included as a predictor of achievement. this allowed us to estimate the variance in academic achievement explained (r²) by learning strategies and motivational variables, and consequently the incremental value (δr²) of academic adjustment in the predictive model. our calculations show that the difference in explained variance in academic achievement in the learning strategies model between academic and professional contexts (δr²=3.1%, see table 6) can be completely ascribed to the incremental value of academic adjustment, as can be seen in model c of table 9 (9.8% - 6.7%).
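this attribution can be checked arithmetically. a minimal sketch, assuming that the increment over prior education reported in table 6 splits additively into a learning strategies part and an academic adjustment part, and using only the percentages quoted in the text; the variable names are illustrative.

```python
# Percentages quoted in the text (Table 6 and model C of Table 9).
total_increment = {"academic": 8.0, "professional": 11.1}      # strategies + adjustment
adjustment_increment = {"academic": 6.7, "professional": 9.8}  # adjustment only

# Assumed additive split: the part of the increment not accounted for by
# academic adjustment is attributed to learning strategies.
strategies_increment = {
    group: round(total_increment[group] - adjustment_increment[group], 1)
    for group in total_increment
}

# Professional minus academic difference, for the total and for adjustment alone.
gap_total = round(total_increment["professional"] - total_increment["academic"], 1)
gap_adjustment = round(
    adjustment_increment["professional"] - adjustment_increment["academic"], 1
)

print(strategies_increment)
print(gap_total, gap_adjustment)
```

under this assumption the remainder for learning strategies is the same in both groups, so the whole gap between programmes sits in the academic adjustment increment.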
indeed, learning strategies have identical incremental value over prior education in both academic and professional programmes (model b, table 9: δr²=1.3%).

table 9. calculation of incremental value (δr²) of academic adjustment in predicting academic achievement, over prior education and learning strategies

closer inspection of table 10, then, shows that the difference of 4.1% explained variance in academic achievement in the motivational variables model (see table 7) can also be attributed to the fact that academic adjustment predicts more variance in professional programmes (δr²=9.4%) than in academic programmes (δr²=3.9%), a difference of 5.5%. however, given that motivational variables have a larger incremental value over prior education in academic programmes (δr²=5%) than in professional programmes (δr²=3.6%), a difference of 1.4%, this larger predictive value of academic adjustment in professional programmes is slightly compensated.

table 10. calculation of incremental value (δr²) of academic adjustment in predicting academic achievement, over prior education and motivation

8. discussion and conclusion

this study set out to explore whether the determinants in our conceptual model (fig. 1) have dissimilar predictive power in professional he programmes, in comparison with more academically oriented programmes. more specifically, we examined (1) to what extent the explanatory value of secondary students’ academic motivation, academic self-efficacy and learning strategies in the prediction of first-year he academic adjustment differed between these two types of programmes, and (2) whether secondary students’ academic motivation, academic self-efficacy, learning strategies and subsequent first-year academic adjustment are similarly or differently predictive of academic achievement within the two he programmes.
we examined these relationships and differences in explanatory value of determinants, controlling for students’ prior education track. in what follows, we discuss the resulting parameter estimates of the multi-group sem models in relation to previous research, after which we focus on the differences in predictive power between academic and professional he contexts.

8.1 identified relationships in academic and professional he contexts

our results indicate that academic adjustment in the first semester of the fyhe exerted the largest influence on academic achievement in both academic and professional programmes; students who feel more academically adjusted to their new learning environment in the first semester of he obtain a higher percentage of their credits at the end of their first year. this result corroborates the findings of several previous studies in academic he contexts (bailey & phillips, 2016; petersen et al., 2009; prospero & vohra-gupta, 2007; severiens & wolff, 2008; wintre et al., 2011). the only other direct, positive and significant predictor of academic achievement in both he contexts was self-efficacy, which is also in line with former studies (richardson et al., 2012; robbins et al., 2004). however, as our study went beyond the consideration of separate direct effects by modelling the relationships between variables within integrated models, we could also identify some motivational and learning strategy variables that influenced academic achievement indirectly, through their impact on academic adjustment. firstly, students in academic programmes who had more confidence in their way of studying (self-efficacy) at the end of se felt more academically adapted in the first semester of the fyhe, a finding that contradicts van rooij et al. (2018) but is in line with other research (cazan, 2012; chemers et al., 2001). this latter relationship was non-significant in professional programmes.
further, for both academic and professional he programmes, results confirm the hypotheses that students who have difficulties in steering their own learning process (lack of regulation) and those whose drivers for studying were determined more by external sources (controlled motivation) at the end of se felt less adapted to their new learning environment. a strength of the present study is that all the above relationships were present after controlling for the expected effect of the prior se education track which students followed. in contrast to previous research, several variables were not significantly associated with academic adjustment in either type of he programme: autonomous motivation (clark et al., 2014; petersen et al., 2009; van rooij et al., 2018), amotivation (baker, 2004) and self-regulation (cazan, 2012; hurtado et al., 2007; van rooij et al., 2018). the hypothesis that deep and surface processing at the end of se influence fyhe academic adjustment (abbott-chapman et al., 1992) was not supported by our data either. thus, similar to the study by van rooij et al. (2018), which highlighted the pivotal role of academic adjustment in predicting achievement in university, we found academic adjustment to be an important mediator of the effects of several motivational variables and learning strategies on academic achievement. interestingly, however, while van rooij and colleagues did not find evidence of self-efficacy being related to academic adjustment or to academic achievement, the present study found students’ self-efficacy of studying, especially in academic he programmes, to be positively associated with academic achievement both directly and indirectly, through academic adjustment.
8.2 differences between academic and professional programmes

in line with previous work on disciplinary diversity of he programmes influencing interrelationships between non-cognitive variables and academic achievement (de clercq et al., 2013; fonteyne et al., 2017; lizzio et al., 2002), the present study provides empirical evidence that he programme diversity (academic vs. professional) also influences relationships in predictive models of academic achievement. the fact that the size of regression coefficients varies across academic and professional programmes is an important first indication that learning strategies, motivational variables, and academic adjustment affect their respective outcomes differently in the two he contexts. furthermore, one relationship, that between self-efficacy and academic adjustment, was significant in academic programmes but not in professional programmes. the value of investigating the diversity between he programmes at the meso-level is further evidenced by the differences in explained variance across both groups. firstly, learning strategies and motivational variables at the end of se seem to have more predictive power for fyhe academic adjustment in the academic context (motivational variables: δr²=12.9%; learning strategies: δr²=19.4%) than in the professional context (motivational variables: δr²=9.3%; learning strategies: δr²=7.2%). in this light, lack of regulation and self-efficacy in particular proved to have important differential effects across the two contexts. secondly, our results suggest that academic adjustment in the first semester of he influences academic achievement to a greater extent in professional programmes than in academic programmes.
indeed, the incremental value of academic adjustment on academic achievement in terms of explained variance was relatively larger in the professional programmes (motivational variables model: δr²=9.4%; learning strategies model: δr²=9.8%), compared to the academic programmes (motivational variables model: δr²=3.9%; learning strategies model: δr²=6.7%). finally, after controlling for prior education, students’ learning strategies seem to have identical predictive power regarding academic achievement in both he programmes (δr²=1.3%), while motivational variables, and more specifically self-efficacy, predict academic achievement to a slightly larger extent in academic than in professional programmes (acad.: δr²=5%; prof.: δr²=3.6%).

8.3 implications, limitations and perspectives

the results of this study indicate that scholars investigating students’ transition into he should be attentive to the possible influences of the specific he programme or educational context in which research is carried out. indeed, our study strengthens the idea that predictive models of academic achievement developed in academic programmes should not be applied uncritically in professional programmes, considering that the predictive power of the variables included in this study varies across the two contexts. further research is needed to accurately establish (1) why learning strategies and motivational variables at the end of se seem to affect fyhe academic adjustment to a greater extent in academic programmes than in professional programmes and (2) why academic adjustment seems to have more predictive power for fyhe academic achievement in the professional context relative to the academic context. several theoretical models point to the importance of both the learning environment and student characteristics in the development of students’ learning processes (e.g. biggs, 1987), motivation (e.g. deci & ryan, 2000) and fyhe adjustment (e.g. tinto, 1975).
in this regard, it should be noted that, although we did control for students’ prior education, which is an important student characteristic, it remains unclear whether the observed differences in predictive power of the abovementioned determinants emerge from differences between the he programmes (contextual) or from differences between students within the two systems (individual). with reference to the contextual characteristics, it remains unclear whether and how specific aspects of the learning environments under scrutiny (e.g. aims, expectations of students, assessment methods or didactical approaches) might have moderated the relationships in the predictive model. possibly, compared to students in academic contexts, students in professional contexts get a clear-cut indication of how well they are functioning much earlier in their programme (due to, for instance, different evaluation periods or feedback loops). this would enable professional bachelor students to make a more accurate assessment of their own academic adjustment, which could explain the larger effect of academic adjustment on achievement in professional contexts. another hypothesis we introduce here is that academic he programmes may place more demanding requirements on regulation, so that a higher level of lack of regulation at the end of se impacts academic adjustment to a larger extent in academic he programmes than in professional contexts.

some limitations of the present study need to be highlighted. firstly, although this study adopts a longitudinal research design and thus enables further understanding of the directionality of effects, it does not allow for causal interpretation (cohen, manion, & morrison, 2011). the results and conclusions should therefore be interpreted cautiously. a second limitation concerns the adopted academic adjustment scale.
this measure was developed in an academic context, and therefore we cannot guarantee that it is a valid representation of academic adjustment in professional contexts. indeed, it is conceivable that the academic adjustment construct in professional programmes comprises sub-facets specific to that context, which are not accounted for in the present study. nonetheless, the items of the adopted scale are worded rather generally (another item example is: “i have the impression that i experience too many difficulties for my studies in higher education.”), and measurement invariance analyses have shown that the meaning of the items of the adopted academic adjustment scale is equivalent for students from the two types of programmes (full scalar invariance was established). finally, although this study demonstrated the significance of diversity at the level of study programmes (academic vs. professional) in predicting academic adjustment and achievement, additional studies need to explore the effect of discipline diversity within the professional he context (becher, 1994). this would help to establish a more accurate understanding of the matter at hand.

based upon the results of this study, we suggest that (especially professional) he administrators should be attentive to the pivotal role of freshmen’s academic adjustment in the first semester of the fyhe. in addition to academic adjustment, coaching and guidance initiatives aimed at facilitating the academic transition in the fyhe should particularly target students’ self-efficacy, lack of regulation, and controlled motivation, which, especially in academic he programmes, have considerable impact on academic adjustment. this is promising, as previous longitudinal research has shown that these latter variables are malleable, and not fixed pre-entry characteristics (e.g. vermunt & donche, 2017).

keypoints

the present study provides empirical evidence that he programme diversity (academic vs. professional he programmes) influences the relationships in predictive models of academic achievement.
learning strategies and motivational variables at the end of se have more predictive power for fyhe academic adjustment in academic he programmes, relative to professional he programmes.
academic adjustment in the first semester of the fyhe influences academic achievement to a greater extent in professional programmes than in academic programmes.
these differences across he contexts were found after controlling for prior education, and adopting a longitudinal, integrative study design.

references

abbott-chapman, j. a., hughes, p. w., & wyld, c. (1992). monitoring student progress: a framework for improving student performance and reducing attrition in higher education. hobart: national clearinghouse for youth studies.
bailey, t. h., & phillips, l. j. (2016). the influence of motivation and adaptation on students’ subjective wellbeing, meaning in life and academic performance. higher education research & development, 35(2), 201–216. https://doi.org/10.1080/07294360.2015.1087474
baker, s. r. (2004). intrinsic, extrinsic, and amotivational orientations: their role in university adjustment, stress, well-being, and subsequent academic performance. current psychology, 23(3), 189-202. https://doi.org/10.1007/s12144-004-1019-9
baker, r. w., mcneil, o. v., & siryk, b. (1985). expectation and reality in freshman adjustment to college. journal of counseling psychology, 32(1), 94. https://doi.org/10.1037/0022-0167.32.1.94
baker, r. w., & siryk, b. (1999). student adaptation to college questionnaire (sacq): manual. los angeles: western psychological services.
bean, j. p. (1980). dropouts and turnover: the synthesis and test of a causal model of student attrition. research in higher education, 12(2), 155-187. https://doi.org/10.1007/bf00976194
becher, t. (1994). the significance of disciplinary differences. studies in higher education, 19(2), 151-161.
https://doi.org/10.1080/03075079412331382007
biggs, j. b. (1987). student approaches to learning and studying. research monograph. hawthorn: australian council for educational research ltd.
braxton, j. m., & hargens, l. l. (1996). variation among disciplines: analytical frameworks and research. in j. c. smart (ed.), higher education: handbook of theory and research (vol. 11) (pp. 1-46). new york: agathon press.
briggs, a. r., clark, j., & hall, i. (2012). building bridges: understanding student transition to university. quality in higher education, 18(1), 3-21. https://doi.org/10.1080/13538322.2011.614468
brown, t. a. (2014). confirmatory factor analysis for applied research. new york: guilford publications.
byrne, b. m. (2016). structural equation modelling with amos: basic concepts, applications, and programming. new york: routledge.
byrne, b. m., shavelson, r. j., & muthén, b. (1989). testing for the equivalence of factor covariance and mean structures: the issue of partial measurement invariance. psychological bulletin, 105(3), 456–466. https://doi.org/10.1037/0033-2909.105.3.456
camilleri, a. f., delplace, s., frankowicz, m., hudak, r., & tannhäuser, a. c. (2014). professional higher education in europe. characteristics, practice examples and national differences. malta: knowledge innovation centre.
cazan, a. m. (2012). self-regulated learning strategies predictors of academic adjustment. procedia-social and behavioral sciences, 33(1), 104-108. https://doi.org/10.1016/j.sbspro.2012.01.092
chemers, m. m., hu, l. t., & garcia, b. f. (2001). academic self-efficacy and first year college student performance and adjustment. journal of educational psychology, 93(1), 55-64. https://doi.org/10.1037/0022-0663.93.1.55
chen, f. f. (2007). sensitivity of goodness of fit indexes to lack of measurement invariance. structural equation modeling, 14(3), 464-504. https://doi.org/10.1080/10705510701301834
cheung, g. w., & rensvold, r. b. (2002).
evaluating goodness-of-fit indexes for testing measurement invariance. structural equation modeling, 9(2), 233-255. https://doi.org/10.1207/s15328007sem0902_5
clark, m. h., middleton, s. c., nguyen, d., & zwick, l. k. (2014). mediating relationships between academic motivation, academic integration and academic performance. learning and individual differences, 33(1), 30-38. https://doi.org/10.1016/j.lindif.2014.04.007
cohen, l., manion, l., & morrison, k. (2011). research methods in education (6th edition). london: routledge.
credé, m., & kuncel, n. r. (2008). study habits, skills, and attitudes: the third pillar supporting collegiate academic performance. perspectives on psychological science, 3(6), 425-453. https://doi.org/10.1111/j.1745-6924.2008.00089.x
de clercq, m., galand, b., dupont, s., & frenay, m. (2013). achievement among first-year university students: an integrated and contextualised approach. european journal of psychology of education, 28(3), 641–662. https://doi.org/10.1007/s10212-012-0133-6
deci, e. l., & ryan, r. m. (2000). the "what" and "why" of goal pursuits: human needs and the self-determination of behavior. psychological inquiry, 11(4), 227-268. https://doi.org/10.1207/s15327965pli1104_01
declercq, k., & verboven, f. (2014). enrollment and degree completion in higher education without ex ante admission standards. leuven: faculty of economics and business.
dent, a. l., & koenka, a. c. (2016). the relation between self-regulated learning and academic achievement across childhood and adolescence: a meta-analysis. educational psychology review, 28(3), 425–474. https://doi.org/10.1007/s10648-015-9320-8
dinsmore, d. l., & alexander, p. a. (2012). a critical discussion of deep and surface processing: what it means, how it is measured, the role of context, and model specification. educational psychology review, 24(4), 499-567. https://doi.org/10.1007/s10648-012-9198-7
donche, v., & van petegem, p. (2008).
the validity and reliability of the short inventory of learning patterns. in e. cools, h. van den broeck, c. evans, & t. redmond (eds.), style and cultural differences: how can organisations, regions and countries take advantage of style differences (pp. 49-59). gent, belgium: vlerick leuven gent management school.
donche, v., & van petegem, p. (2010). the relationship between entry characteristics, learning style and academic achievement of college freshmen. in m. poulson (ed.), higher education: teaching, internationalisation and student issues (pp. 277–288). new york: nova science publishers.
enders, c. k. (2010). applied missing data analysis. london: guilford press.
flemish government (2019). hoger onderwijs in cijfers [higher education in numbers]. retrieved 24 march, 2020, from https://onderwijs.vlaanderen.be/nl/hoger-onderwijs-in-cijfers
fonteyne, l., duyck, w., & de fruyt, f. (2017). program-specific prediction of academic achievement on the basis of cognitive and non-cognitive factors. learning and individual differences, 56(1), 34-48. https://doi.org/10.1016/j.lindif.2017.05.003
garriott, p. o., love, k. m., & tyler, k. m. (2008). anti-black racism, self-esteem, and the adjustment of white students in higher education. journal of diversity in higher education, 1(1), 45-58. https://doi.org/10.1037/1938-8926.1.1.45
gerdes, h., & mallinckrodt, b. (1994). emotional, social, and academic adjustment of college students: a longitudinal study of retention. journal of counseling & development, 72(3), 281-288. https://doi.org/10.1002/j.1556-6676.1994.tb00935.x
glorieux, i., laurijssen, i., & sobczyk, o. (2014). de instroom in het hoger onderwijs van vlaanderen: een beschrijving van de huidige instroompopulatie en een analyse van de overgang van secundair onderwijs naar hoger onderwijs [the inflow into higher education in flanders. a description of the current enrolment population and an analysis of the transition from secondary education to higher education].
leuven: steunpunt studie- en schoolloopbanen.
gregorich, s. e. (2006). do self-report instruments allow meaningful comparisons across diverse population groups? testing measurement invariance using the confirmatory factor analysis framework. medical care, 44(11), 78-94. https://doi.org/10.1097/01.mlr.0000245454.12228.8f
guay, f., ratelle, c., roy, a., & litalien, d. (2010). academic self-concept, autonomous academic motivation, and academic achievement: mediating and additive effects. learning and individual differences, 20(6), 644–653. https://doi.org/10.1016/j.lindif.2010.08.001
hu, l. t., & bentler, p. m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1-55. https://doi.org/10.1080/10705519909540118
hurtado, s., han, j. c., sáenz, v. b., espinosa, l. l., cabrera, n. l., & cerna, o. s. (2007). predicting transition and adjustment to college: biomedical and behavioral science aspirants’ and minority students’ first year of college. research in higher education, 48(7), 841-887. https://doi.org/10.1007/s11162-007-9051-x
iacobucci, d. (2010). structural equations modeling: fit indices, sample size, and advanced topics. journal of consumer psychology, 20(1), 90-98. https://doi.org/10.1016/j.jcps.2009.09.003
kuh, g. d., kinzie, j., buckley, j. a., bridges, b. k., & hayek, j. c. (2006). what matters to student success: a review of the literature. commissioned report for the national symposium on postsecondary student success. washington, dc: national postsecondary education cooperative.
lizzio, a., wilson, k., & simons, r. (2002). university students' perceptions of the learning environment and academic outcomes: implications for theory and practice. studies in higher education, 27(1), 27-52. https://doi.org/10.1080/03075070120099359
meredith, w. (1993). measurement invariance, factor analysis and factorial invariance. psychometrika, 58(4),
525-543. https://doi.org/10.1007/bf02294825
milfont, t. l., & fischer, r. (2010). testing measurement invariance across groups: applications in cross-cultural research. international journal of psychological research, 3(1), 111-130. https://doi.org/10.21500/20112084.857
muthén, l. k., & muthén, b. o. (2010). mplus. statistical analysis with latent variables. user’s guide (6th ed.). los angeles: muthén & muthén.
oecd (2009). education at a glance 2009: oecd indicators. paris: oecd publishing.
oecd (2013). education at a glance 2013: oecd indicators. paris: oecd publishing.
petersen, i. h., louw, j., & dumont, k. (2009). adjustment to university and academic performance among disadvantaged students in south africa. educational psychology, 29(1), 99-115. https://doi.org/10.1080/01443410802521066
prospero, m., & vohra-gupta, s. (2007). first generation college students: motivation, integration, and academic achievement. community college journal of research and practice, 31(12), 963–975. https://doi.org/10.1080/10668920600902051
rice, k. g., vergara, d. t., & aldea, m. a. (2006). cognitive-affective mediators of perfectionism and college student adjustment. personality and individual differences, 40(3), 463-473. https://doi.org/10.1016/j.paid.2005.05.011
richardson, m., abraham, c., & bond, r. (2012). psychological correlates of university students’ academic performance: a systematic review and meta-analysis. psychological bulletin, 138(2), 353–387. https://doi.org/10.1037/a0026838
robbins, s. b., lauver, k., le, h., davis, d., langley, r., & carlstrom, a. (2004). do psychosocial and study skill factors predict college outcomes? a meta-analysis. psychological bulletin, 130(2), 261–288. https://doi.org/10.1037/0033-2909.130.2.261
schuetze, h. g., & slowey, m. (2002). participation and exclusion: a comparative analysis of non-traditional students and lifelong learners in higher education. higher education, 44(3-4), 309-327.
https://doi.org/10.1023/a:1019898114335
schunk, d. h. (1991). self-efficacy and academic motivation. educational psychologist, 26(3-4), 207-231. https://doi.org/10.1080/00461520.1991.9653133
schunk, d. h., & zimmerman, b. j. (2012). motivation and self-regulated learning: theory, research, and applications. new york: routledge.
severiens, s., & wolff, r. (2008). a comparison of ethnic minority and majority students: social and academic integration, and quality of learning. studies in higher education, 33(3), 253-266. https://doi.org/10.1080/03075070802049194
steinmetz, h., schmidt, p., tina-booh, a., wieczorek, s., & schwartz, s. h. (2009). testing measurement invariance using multigroup cfa: differences between educational groups in human values measurement. quality & quantity, 43(4), 599-616. https://doi.org/10.1007/s11135-007-9143-x
the bologna declaration (1999). the european higher education area: joint declaration of the european ministers of education, 19 june 1999, bologna.
tinto, v. (1975). dropout from higher education: a theoretical synthesis of recent research. review of educational research, 45(1), 89-125. https://doi.org/10.3102/00346543045001089
torenbeek, m. (2011). hop, skip and jump? the fit between secondary school and university (phd diss.). university of groningen, groningen.
torenbeek, m., jansen, e., & hofman, a. (2010). the effect of the fit between secondary and university education on first‐year student achievement. studies in higher education, 35(6), 659-675. https://doi.org/10.1080/03075070903222625
van rooij, e., brouwer, j., fokkens-bruinsma, m., jansen, e., donche, v., & noyens, d. (2017). a systematic review of factors related to first-year students' success in dutch and flemish higher education. pedagogische studiën, 94(5), 360-405.
van rooij, e. c., jansen, e. p., & van de grift, w. j. (2018). first-year university students’ academic success: the importance of academic adjustment.
european journal of psychology of education, 33(4), 749-767. https://doi.org/10.1007/s10212-017-0347-8
vandenberg, r. j., & lance, c. e. (2000). a review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. organizational research methods, 3(1), 4-70. https://doi.org/10.1177/109442810031002
vanthournout, g., gijbels, d., coertjens, l., donche, v., & van petegem, p. (2012). students’ persistence and academic success in a first-year professional bachelor programme: the influence of students’ learning strategies and academic motivation. education research international, 1–10. https://doi.org/10.1155/2012/152747
vermunt, j. d. (1998). the regulation of constructive learning processes. british journal of educational psychology, 68(2), 149–171. https://doi.org/10.1111/j.2044-8279.1998.tb01281.x
vermunt, j. d. (2005). relations between student learning patterns and personal and contextual factors and academic performance. higher education, 49(3), 205-234. https://doi.org/10.1007/s10734-004-6664-2
vermunt, j. d., & donche, v. (2017). a learning patterns perspective on student learning in higher education: state of the art and moving forward. educational psychology review, 29(2), 269–299. https://doi.org/10.1007/s10648-017-9414-6
vermunt, j. d., & vermetten, y. j. (2004). patterns in student learning: relationships between learning strategies, conceptions of learning, and learning orientations. educational psychology review, 16(4), 359-384. https://doi.org/10.1007/s10648-004-0005-y
wintre, m. g., dilouya, b., pancer, s. m., pratt, m. w., birnie-lefcovitch, s., polivy, j., & adams, g. (2011). academic achievement in first-year university: who maintains their high school average? higher education, 62(4), 467-481. https://doi.org/10.1007/s10734-010-9399-2
young, p. (2010). generic or discipline‐specific?
an exploration of the significance of discipline-specific issues in researching and developing teaching and learning in higher education. innovations in education and teaching international, 47(1), 115-124. https://doi.org/10.1080/14703290903525887
frontline learning research vol. 8 no. 2 (2020) 90-108 issn 2295-3159
individual interest and learning in secondary school stem education
erkka laine b, marjaana veermans a, andreas gegenfurtner b & koen veermans a
a university of turku, finland
b deggendorf institute of technology, germany
article received 27 february 2019 / revised 23 december / accepted 26 march 2020 / available online 29 april
abstract
interest research offers different hypotheses about the association between interest and learning outcomes. the standard hypothesis proposes that interest predicts learning outcomes: people acquire new knowledge about a topic they find interesting. the affective by-product hypothesis assumes that learning predicts interest: by learning something, people develop an interest in this topic. finally, the reciprocal hypothesis states that interest and learning covary. this longitudinal study aimed to test the predictive validity of these three hypotheses in the context of secondary school stem education. the participants were 104 finnish 7th grade students aged 12-14. data were collected at three times during the school year through questionnaires and grade evaluations in mathematics and biology. a partial least squares (pls) path modeling approach was used to determine the relationships between interest and course grades across the three measurement points: at the beginning of the autumn semester, at the beginning of the spring semester, and after the spring semester at the end of the school year. the results differed between the autumn and spring semesters: during the autumn semester, students’ interest predicted their grades, whereas during the spring semester, grades predicted their interest.
these findings indicate that the relationship between students’ individual interest in science and mathematics and their learning varies over the school year. as a practical implication, more attention should be paid to when and what type of performance feedback is given to students with differing interest profiles.
keywords: interest; learning; stem; partial least squares (pls) path modeling
corresponding author email: etlain@utu.fi doi: https://doi.org/10.14786/flr.v8i2.461
1. introduction
previous research has shown that students’ interest in learning science, technology, engineering, and mathematics (stem) varies at different ages (osborne, simon & collins, 2003; hofer, 2010). the most notable change usually takes place during the transition from elementary to secondary school (krapp & prenzel, 2011). during that period, some students seem to start losing interest in investing effort into stem learning, while the interest of others takes a deeper and more long-lasting form. this decline in interest is particularly alarming from the perspective of how modern societies will be able to respond to a multitude of science-related challenges in the future. for example, the european commission’s report “does the eu need more stem graduates” (european commission, 2015) estimates that the need for high-qualification stem jobs will generally increase throughout europe by 2025. this is expected to coincide with a diminishing number of low-qualification jobs due to increased digitalization and robotics. although long-term predictions of this kind are highly uncertain, it is likely that the labour market will continue shifting towards more knowledge-intensive employment and that the demand for highly skilled workers will increase (european commission, 2015). this will pose challenges for educational systems in terms of how to ensure that students not only learn relevant skills and knowledge in school but are also supported in developing a lasting interest in science.
the latter is particularly important because early interest in stem careers has been found to predict student persistence in science and the choice of a science-related major in college (tai, liu, maltese, & fan, 2006). aside from encouraging adolescents to pursue careers in stem professions, stem education is also important from the perspective of giving students the qualifications to become scientifically literate citizens in their adult lives. this includes not only providing adequate knowledge content for different stem topics, but also inspiring a long-lasting interest in it, as well as a basic understanding of the scientific method. to be able to do this means that the teaching provided in schools should be both cognitively satisfying, and at the same time encourage students to adopt a positive and curious mindset towards stem. there are several possible reasons behind the declining trends in students’ interest to learn stem subjects. it may be that the way school education is organized, the curriculum or the quality and type of instruction do not provide enough support for students’ interest to develop further. another explanation relates to some of the psychological demands that adolescents confront in their lives, which may cause them to view academic learning as less important compared to other aspects of life. thirdly, it may be that students’ view of themselves as learners, their ideal self-concept, may become separated from stem domains and hence cause them to become disinterested in investing effort in those topics (krapp & prenzel, 2011). the relationship between interest and learning has been studied widely in the past research literature (krapp & prenzel, 2011) and recently some researchers have turned their focus on examining the directionality of the two concepts. 
while the prevalent view has been that interest is an antecedent of learning, some researchers have taken a different approach and aimed to determine how the acquisition of knowledge on a certain subject influences learners’ interest in it (rotgans & schmidt, 2017b). if acquiring knowledge about a school subject in fact generates and predicts students’ later interest, this would have consequences for how schools should design their pedagogical approaches. it would emphasize the importance of cognitive support in education and perhaps give less weight to constantly trying to come up with new and more exciting ways to get students to engage with the study topic. the aim of this study was to contribute to this discussion by examining how the relationships between students’ interest and learning in stem subjects develop and vary over one school year. although the relationship between students’ interest in stem and their academic achievement has already been studied longitudinally (e.g. köller et al., 2001), the time points at which interest and achievement outcomes were measured have often been spread over many years. in addition, the focus of these studies has mostly been on how interest predicts students’ course choices or college majors in the stem domain, and a more detailed view of what takes place during a single school year is needed. for this reason, the current study concentrated on revealing predictive patterns within a time frame that would be long enough to see changes but at the same time short enough to see interaction patterns.
1.1 the phases of interest development
interest can be conceptualized as a phenomenon that arises from the interaction between a person and his or her environment (hidi & renninger, 2006), and which produces experimental modes that have both positive cognitive and affective qualities (krapp & prenzel, 2011).
the cognitive qualities might include for example personally meaningful goals or viewing the activity or topic as valuable for the person’s future. affective qualities, then, might for example relate to feeling enjoyment when interacting with the activity, or engaging deeply with the topic at hand. theoretical literature usually acknowledges two different types of interest, namely situational and individual interest. these two types differ from each other in how much they are based on affect, knowledge and value, as well as the temporal duration of interest. situational interest is viewed to be more affect based and temporarily fleeting, whereas individual interest is considered to connect more with the individual’s values and acquired knowledge on the subject, and be more stable over time (krapp, hidi & renninger, 1992; hidi & renninger, 2006; renninger & hidi, 2011; rotgans & schmidt, 2017b). both of these forms of interest can be viewed to represent different analytical levels, where situational interest refers to the actual and on-going process of engaging an activity, and individual interest the relatively stable tendency to invest time and effort in the topic of interest (krapp & prenzel, 2011). current interest theory often further divides these two different types of interest into several sub-categories which are thought to reflect their developmental phases. krapp (2002) uses a three-category model, which divides interest into three phases of development, namely emerging situational, stabilised situational, and individual interest. hidi and renninger (2006) further add to this model a fourth phase by dividing individual interest into emerging and well-developed individual interest. 
a person’s interest is theorized to develop through each of these phases consecutively, starting from the situational interest being triggered and leading to well-developed individual interest through continued engagement with the topic, as well as increased value and knowledge acquisition (hidi & renninger, 2006; renninger & hidi, 2011). this study concentrates on individual interest, and on how its emerging or well-developed phases might manifest in students’ self-reported levels of interest in studying mathematics and biology over one school year. in particular, it aims to see how students’ interest levels are reflected in their grades in these subjects, and whether the grades they receive predict their interest levels later on. the period of one school year was chosen as the duration of the study because it represents the natural annual cycle of students’ schoolwork, and at the same time is long enough for changes to take place in their reported interest levels or received grades, which could in turn affect their grades or reported interest levels. the context of this study was closely tied to a real-life setting in the school’s everyday life, and the aim was to acquire a general understanding of what happens to students’ interest during this one year of formal education. following the four-phase model by hidi and renninger (2006), individual interest was conceptualized as consisting of two phases, namely emerging individual interest and well-developed individual interest. these two phases represent the last stages of interest development, when the activity, topic or domain is personally valued, relates to the person’s existing knowledge structures, and is intrinsically motivated (hidi & renninger, 2006; krapp & prenzel, 2011). an emerging individual interest is characterized not only by positive feelings, but also by stored value and knowledge.
the activity itself is of value to the person, and he or she would engage in it anyway if given the choice. this phase is usually viewed as self-generated, although it may at times require external support from peers or experts and can, in the context of education, be affected by instructional conditions or the learning environment. this can then lead to the last phase of the model, namely well-developed individual interest, which is viewed as a psychological state of interest as well as a more or less enduring tendency to engage with the topic or object of interest. it has very much the same attributes as emerging individual interest, but with more stored value and knowledge on the topic or activity. it is also viewed as mostly self-regulated, but can benefit from instructional designs and learning environments that offer opportunities to gain knowledge through challenging tasks and interaction. although the development of interest should be viewed as a continuum, separating different phases in it can have theoretical and practical value. an actually operating interest can be generated either through an already existing disposition, i.e. individual interest, or through the special conditions that take place in a teaching or learning situation, i.e. the interestingness of the situation (krapp & prenzel, 2011). individual interest is affected by the environmental and situational factors that take place in different learning situations. school education, with its separate lessons in different school subjects, can be viewed as a continuum in which a student’s situational interest in the study topic may change from one time to another. a student’s experiences of focused attention and positive emotions in one learning situation may affect his or her interest in another situation later on, and gradually start to develop towards a more sustained form of interest.
meaningfulness of the task and personal involvement are seen as the prerequisites for a person to acquire an individual interest in a domain (hidi & renninger, 2006). anchoring the phases in the different intra-individual processes that take place throughout one’s interest development can, for example, help teachers to adjust their teaching according to the different needs of students (krapp & prenzel, 2011). in this study the focus was on examining the relationships between students’ individual interest in the subjects of mathematics and biology and their knowledge acquisition in those subjects. the aim was to examine how the predictive relationships between interest and knowledge acquisition might vary during the course of one school year. we chose to concentrate on measuring individual interest at the subject level, since this is the dominant level at which students’ learning outcomes are measured in schools. there also exist more finely grained methods to measure individual interest, for example at the sub-domain level (e.g. geometry in mathematics, or photosynthesis in biology). however, by concentrating our analyses on the level of school subjects, we aimed to provide information relevant to the work of educators and teachers.
1.2 interest and learning
interest has been found to have facilitating and mediating effects on learning outcomes. this has been observed in different contexts and settings, such as writing (albin, benton, & khramtsova, 1996), studying psychology (harackiewicz, durik, barron, linnenbrink & tauer, 2008), learning statistics (hay, callingham & carmichael, 2015), and reading science texts (ainley, hidi & berndorff, 2002). in addition, interest has been found to have a positive connection with other motivational factors such as mastery goals (harackiewicz et al., 2008), utility value (gegenfurtner, knogler & schwab, 2020), and self-efficacy (ainley, 2012).
the traditional view in the research literature has seen interest as the prerequisite, or at least the facilitator of learning. in this study, we will call this the standard hypothesis as it is the most common way to define the relationship between interest and learning (rotgans & schmidt, 2017b; 2017c). the main idea behind this is that in order for learning to take place the individual has to become interested in the topic either through the support of one’s individual interest or through arousing situational factors. individual interest can even in its less-developed form help to generate situational interest through its relation to students’ prior knowledge and value for the subject. in school education context students encounter study domains and topics that do not necessarily rank high on their list of individual interests, but they may view the information provided at the lesson as important or realize its value for completing their studies successfully. a student who is able to connect with the study content and develop strategies for working with it is more likely to start developing curiosity questions towards the topic. these curiosity questions in turn increase the student’s sense of possibilities for learning and increase the perceived value of the studied topic, which may in time become realized as more well-developed individual interest (renninger, 2000). many pedagogical approaches, such as inquiry learning (renninger et al., 2014), problem-based learning (rotgans & schmidt, 2011), multi-user virtual environments (chen et al., 2015), and game-based learning (knogler, harackiewicz, gegenfurtner, & lewalter, 2015; rodríguez‐aflecht et al., 2018) have been developed with the aim of increasing learners’ interest towards the studied topic. all these approaches rely, at least in part, on the idea that by making the learning situation more engaging and enjoyable to the student it increases their interest, and ultimately leads to better learning outcomes. 
however, recent research has highlighted that findings on the direct effects of interest on learning are inconsistent, and that the empirical evidence is not as self-evident as previously suggested (köller, baumert, & schnabel, 2001; nieswandt, 2007; tapola, veermans, & niemivirta, 2013). despite a significant amount of empirical research on interest-enhancing practices in educational settings, a considerable portion of the studies report only partial improvements or small effect sizes. this has been especially evident among sub-groups of students with differing levels of pre-existing interest (hulleman & harackiewicz, 2009; renninger et al., 2014; rodríguez-aflecht et al., 2018). this has caused some researchers to question whether the relationship has been viewed from the correct perspective. in their recent work, rotgans and schmidt (2017a; 2017b) raise the question of what would happen if the relationship between interest and learning, or knowledge acquisition, were reversed so that learning precedes interest. this proposition gains some theoretical support from the notion that individual interest may help students to remain situationally interested during learning situations, and that, according to the four-phase model, individual interest is more knowledge-based and less reliant on affective fluctuations (krapp & prenzel, 2011; renninger, 2000). hence it would be plausible to expect that some amount of learning has to take place before the student can start developing a longer-lasting individual interest in the study topic. in this study we call this the affective by-product hypothesis. the third hypothesis stemming from the work of rotgans and schmidt (2017b) posits that interest and learning affect each other’s development reciprocally. the lack of research on the directionality of the interest-learning relationship had already surfaced some years earlier. for example, köller et al.
(2001) already raised the question of whether academic achievement necessarily follows interest, or whether it could be the other way around, so that students who feel more competent in a subject could also generate interest in it more easily. in the reciprocal hypothesis, interest and learning are seen to interact with each other in varying degrees over time, so that their relative emphasis in the learning process differs from time to time. in their analyses, rotgans and schmidt (2017b) did not find support for the so-called standard hypothesis that individual interest would precede learning. instead they found a significant path coefficient (standardized β=0.20, p < .05) between the knowledge measured at the first time point and the interest measured at the second time point, thus supporting the affective by-product hypothesis. this means that the amount of knowledge students had at the beginning of the experiment seemed to influence their interest in the topic at the end. lastly, they did not find support for the reciprocal hypothesis that interest and knowledge acquisition affect each other reciprocally. the purpose of the current study was to test these hypotheses in a classroom environment over the duration of one school year (9.5 months).
2. research question and hypotheses
this study aimed to explore longitudinally how students’ individual interest relates to their learning outcomes in mathematics and biology during a school year. for this purpose, and based on the theoretical discussion presented earlier, this study addressed the following research question:
what is the relationship between interest and learning in mathematics and biology education?
to examine the relationship between students’ interest and learning we formulated three theoretically derived hypotheses. these hypotheses and their relation to the theoretical model can be seen in table 1.
the partial least squares (pls) structural equation model that was constructed to examine the relationship between the variables can be seen in figure 1, along with the three groups of hypotheses.
table 1. the three models and the hypotheses
figure 1. hypothesized pls model for individual interest and grades in mathematics and biology.
in the first group of hypotheses, based on previous research literature (hidi & renninger, 2006; ainley, hidi, & berndorff, 2002), interest was expected to predict learning. for this, we formulated the so-called standard hypotheses. this theoretical approach suggests that interest development precedes learning and, because of this, students’ interest at time points 1 and 2 should predict their grades at time points 2 and 3 respectively.
h1.1: interest at time 1 predicts learning outcomes at time 2.
h1.2: interest at time 2 predicts learning outcomes at time 3.
second, based on the findings by rotgans and schmidt (2017b), learning was expected to predict interest. for this we formulated the so-called affective by-product hypothesis. if individual interest requires some initial knowledge acquisition to be generated, then the students’ grades at time point 2 should predict their interest at the end of the school year at time 3.
h2: learning outcomes at time 2 predict interest at time 3.
the third possible line of reasoning was based on the idea that knowledge and interest may influence each other reciprocally. in other words, interest would facilitate students’ learning, and increased knowledge would in turn cause interest to increase (rotgans & schmidt, 2017b). to test this, we formulated the reciprocal hypothesis, in which interest was expected to be reciprocally related to learning outcomes at the same time points. since pls analysis does not allow testing bidirectional relationships at the same time, we formulated two separate models which we called primary and alternative models.
these two models differed only in terms of the direction of the simultaneous relationship between interest and grades at time points 2 and 3. the idea was that if interest and learning had reciprocal relationships during the autumn and spring semesters, then these relationships would manifest in the pls analyses as predictive connections between the two variables at the simultaneous time points 2 and 3, or be indicated by changes in these predictive relationships during the school year.
h3: interest and learning outcomes predict each other at one time point or over time.
3. method
3.1 participants and study design
the participants were 104 (53 girls, 51 boys) 7th grade students aged 12-14 from six different classes in a lower secondary school in southern finland. the study setting was longitudinal, following the same students through their first year of lower secondary school. formal consent was obtained from parents for the students’ participation, and students without a letter of consent were excluded from the study. since the collection of those letters was organized by the different class teachers, the exact number of students excluded for lack of consent is not known. it can however be estimated that the number was not high, most probably less than 10%. data collection occurred at three time points. time 1 was at the beginning of the autumn semester (0.5 months after starting the school year), time 2 was at the beginning of the spring semester (4.5 months), and time 3 was at the end of the spring semester (9.5 months). students’ interest in mathematics and biology was measured at all three time points, and their learning at time points 2 and 3. in the school, teaching was divided into 5 periods, each lasting about 8 weeks. each class of students received the same amount of teaching during the school year, but the courses may have taken place in different periods.
this also explains why mathematics and biology were chosen as the subjects, since they were the only two stem subjects in which the students received teaching during both the autumn and the spring semester. although no additional demographic data were collected from the participants, schools in finland are generally very homogenous in terms of student population. the majority of them receive their funding through public sources, which could give support for the sample being representative. it is also not customary to collect such demographic data in finnish schools, and given the focus of this study, these types of data were not relevant.
3.2 measures
3.2.1 individual interest in mathematics and biology
an instrument of tapola et al. (2013) was used to assess students’ individual interest in mathematics and biology. to measure interest in mathematics, a single item was used (“how interested are you in mathematics”) with a five-point scale, ranging from 1 (not at all interested) to 5 (very interested). similarly, a single item with the same five-point scale was used to measure students’ individual interest in biology (“how interested are you in biology”). single-item scales have previously been used to measure interest (ainley, 2006; palmer, 2009; tulis & ainley, 2011; tapola, jaakkola, & niemivirta, 2014), and since we were concentrating on students’ interest in these study subjects on a generalised level, we adopted this approach to measurement. the students were asked about their interest outside learning situations during the school day. this arrangement reduced the possibility of students connecting the question to any particular learning situation in their everyday studies.
3.2.2 learning outcomes
students’ learning outcomes were measured by grade level evaluation after each semester at time points 2 and 3. grade evaluation was done by the subject teachers and was based on students’ learning and performance throughout the whole duration of each semester. in the finnish system students’ evaluation is based on both formative and summative assessment. the teachers observe the students’ development throughout the duration of the course and most often also use final tests to evaluate students’ learning outcomes at the end of each course. both of these forms of evaluation are used to determine the course grades for each student. in this study students’ grades were used to indicate their skill and knowledge levels, on a scale from 4 (failed) to 10 (excellent).
3.3 data analysis
correlation analysis and pls modelling were used in the data analysis. the correlation analysis provides a more global view of relations between individual variables, while the pls modelling provides an integrated model combining all variables in one model. descriptive statistics and correlation analyses were carried out using spss version 21 (ibm 2012). the structural equation models were estimated using the warppls software. partial least squares (pls) is a structural equation modeling (sem) technique which can simultaneously test the measurement model through relationships between indicators and their corresponding constructs, and the structural model through relationships between constructs (gil-garcia, 2008). pls is efficient when working with small sample sizes and complex models, and it does not assume the data to be normally distributed (hair, hult, ringle & sarstedt, 2017). rather than assessing overall model fit, pls focuses on predicting relationships in a model, which was the focus of this study.
4. results
4.1 correlational analyses
the correlation analyses presented in table 2 show that students’ individual interest at time 1 was positively correlated with their grades at time 2 and time 3 in both mathematics and biology. in mathematics, time 1 interest and time 2 grade had a moderate positive correlation (r(101) = .41, p < .01). in biology there was a moderate positive correlation between time 1 interest and time 2 grade (r(102) = .28, p < .01). these results are in line with the standard hypothesis h1.1.
table 2. correlation of individual interest and subject grades in mathematics and biology
however, interest at time 2 did not correlate with grade at time 3 in either subject, which is not in line with the standard hypothesis h1.2. mathematics grade at time 2 had a moderate correlation with mathematics interest at time 3 (r(91) = .44, p < .01). a similar correlation was also found in biology (r(92) = .30, p < .01). these results support the affective by-product hypothesis h2. regarding the reciprocity of interest and learning outcomes, interest in mathematics at time 2 was significantly correlated with mathematics grade at time 2 (r(79) = .25, p < .05), although the correlation was rather small. interest at time 3 was also significantly, and more sizeably, correlated with mathematics grade at time 3 (r(91) = .47, p < .01). in biology the correlation between interest at time 2 and grade at time 2 was nonsignificant, but interest at time 3 and grade at time 3 had a positive correlation (r(92) = .39, p < .01). these findings only very weakly support the reciprocal hypothesis h3. students’ individual interest exhibited signs of stability across measurement times. in mathematics, time 1 interest correlated strongly with time 2 interest (r(80) = .53, p < .01) and time 3 interest (r(92) = .66, p < .01). the relationship remained at a similar level from time 2 to time 3 (r(73) = .61, p < .01). in biology the results were similar but the effect sizes were smaller.
interest in biology at time 1 had moderate correlations with time 2 interest (r(80) = .43, p < .01) and time 3 interest (r(92) = .45, p < .01). from time 2 to time 3 there was a strong correlation between the interest measures (r(73) = .52, p < .01). overall these results indicate some level of stability, but also some change. in both mathematics and biology the grades across time were also highly correlated: the correlation between time 2 and time 3 grades was r(100) = .83, p < .01 in mathematics and r(102) = .74, p < .01 in biology.
4.1.1 mean level differences in students’ interests and grades between classes
because the participants were spread across six different classes, the data could be nested within these classes. intraclass correlation coefficients (icc) were therefore estimated for each of the measured variables to check how much of their variance was attributable to differences between the six classes. the icc values were estimated through variance components estimation using the type iii sum of squares. specific cut-off values for icc that would require the use of multilevel methods usually range from 0.10 (e.g., lee, 2000; koo & li, 2016) to 0.25 (e.g., bowen & guo, 2011). the analysis revealed low icc values (icc < .10), indicating negligible between-class clustering, for interest in mathematics, interest in biology, and mathematics grade at all of their measurement points, and for biology grade at time 2. for biology grade at time 3 the icc estimate was 0.11. the low icc values, together with previous literature on the homogeneity of finnish school classes in terms of mathematics learning (brezovszky et al., 2019), indicated that the classes were homogenous enough, and that there was no need for multi-level methods to be used in the analyses.
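the between-class check described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' spss procedure: it uses the one-way anova estimator of the between-class and within-class variance components, truncates a negative between-class estimate to zero (an assumption), and the function name icc_oneway and the toy grades are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """ANOVA-based ICC(1): share of variance attributable to group membership.

    `groups` is a list of lists, one inner list of scores per class.
    Assumption: a negative between-group variance estimate is set to 0.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(n * (mean(g) - grand_mean) ** 2 for g, n in zip(groups, sizes))
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average group size, which handles unequal class sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    var_between = max((ms_between - ms_within) / n0, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Hypothetical grades (scale 4-10) for three made-up classes.
toy_classes = [[7, 8, 9, 8], [6, 8, 9, 7], [8, 9, 7, 10]]
print(f"toy ICC = {icc_oneway(toy_classes):.2f}")
```

an icc near zero means that knowing a student's class says little about his or her score, which is the situation in which single-level methods are usually considered acceptable.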
4.2 partial least squares (pls) modeling

the main difference between pls modelling and more traditional methods, such as cb-sem or regressions based on sum scores, is how they treat the latent variables included in the model. in cb-sem the constructs are considered as common factors that explain the covariation between the indicators associated with the constructs. when estimating the model parameters in cb-sem, the scores of these common factors are not known or needed. in pls-sem the constructs are represented through proxies: weighted composites of the indicator variables belonging to that particular construct. this relaxes the assumption that all the covariation between the sets of indicators is caused by a common factor, and also facilitates accounting for measurement error, which gives it an advantage over multiple regression using sum scores. another advantage of pls-sem is its ability to produce a single specific score for each composite of each observation by establishing weights for each proxy. pls-sem estimates coefficients that aim to maximize the r² values of each target variable, giving it the ability to estimate predictive patterns between the model constructs. hence, pls-sem is a suitable method when the aim of the research is to develop theory and explain variance between the constructs (hair et al., 2017).

for the pls analysis, two sets of hypothesized path models, the primary and the alternative models, were constructed for both mathematics and biology. the models consisted of the individual interest variable (mathematics or biology) at time points 1, 2, and 3, and learning outcome variables (grades) at time points 2 and 3. the primary path models, one in each subject domain, were aimed at clarifying the standard hypotheses h1.1 and h1.2 and the affective by-product hypothesis h2.
in addition, alternative models were constructed in order to complement the analyses with respect to the reciprocal hypothesis h3.

4.2.1 collinearity assessment

to check for collinearity in the structural model, the average collinearity variance inflation factor (avif) values were obtained from the model analyses. this was done by looking at each predictor construct separately and estimating how much its variance is artificially increased by the other predictor constructs in the model. an avif value higher than 5 indicates a critical level of collinearity (hair et al., 2017). the avif values were 1.24 for the primary mathematics model and 1.53 for the alternative mathematics model. in biology the avif values were 1.07 for the primary model and 1.31 for the alternative model. these results showed that collinearity was not a critical issue in any of the structural models.

4.2.2 coefficient of determination

a commonly used measure for evaluating the predictive power of a structural model is the coefficient of determination, r². the coefficient represents the amount of variance that is explained by all of the exogenous constructs in the model that are linked to a certain endogenous construct (hair et al., 2017). in this study all of the exogenous constructs consisted of one-item measures, some of which also functioned as endogenous constructs. therefore, the r² values obtained in the analyses represent the amount of variance in a given construct explained by all of the constructs linked to it in the model. the links between constructs and the hypothesized directions of the predictive effects are indicated by arrows; they are shown in general form in figure 1, and separately for the primary and alternative models in the mathematics and biology domains in figures 2, 3, 4, and 5. the average r² value was .37 for both the primary and the alternative mathematics model. in biology these values were .28 for the primary model and .29 for the alternative model.
among the primary models for mathematics and biology, the lowest r² value was obtained for the biology grade at time 2 construct (r² = .09), meaning that only 9% of its variance was explained by the two constructs linked to it, namely interest in biology at time 1 and time 2. the highest value was obtained for the mathematics grade at time 3 construct, where the r² value was .69. this construct had three explaining constructs linked to it, namely interest in mathematics at time 2 and time 3 and mathematics grade at time 2, which makes a higher r² value understandable, but the difference is still considerable. in the alternative models for mathematics and biology the results were very similar, with the lowest r² value again found for biology grade at time 2 (r² = .08) and the highest for mathematics grade at time 3 (r² = .66).

4.2.3 predictive relevancy of the models

following the recommendation of hair et al. (2017), stone-geisser’s q² values were obtained through a blindfolding procedure in order to examine the models’ predictive relevancies. in pls path modeling, predictive relevancy means that the model also accurately predicts data that has not been used in the model estimation. in the blindfolding procedure, data points are systematically deleted and treated as missing values in the model, which then provides a prediction of their original values. these predictions are compared to the original data in order to determine the prediction error between the predicted data points and the true omitted data points. the sum of squared prediction errors is used to calculate the q² value. q² values higher than 0 suggest that the endogenous construct is relevantly predicted by the model (hair et al., 2017). the q² values for mathematics variables ranged between .19 and .68 in both the primary and the alternative models, and for biology between .08 and .58 in both models, which means that all models had predictive relevancy for the constructs.
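the logic of the blindfolding procedure can be sketched as follows. this is a deliberately simplified illustration, assuming a single predictor and a single outcome fitted by ordinary least squares and an omission distance d; actual pls software applies the procedure to the full path model, so the function names and the toy data here are hypothetical and the numbers are not comparable to those reported above. the statistic is computed as q² = 1 − sse / sso, where sse is the sum of squared prediction errors for the omitted points and sso is the corresponding sum when the omitted points are predicted by the mean of the remaining data.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_blindfold(x: list[float], y: list[float], d: int = 7) -> float:
    """Simplified Stone-Geisser Q2 with omission distance d."""
    sse = 0.0  # squared errors of model predictions for omitted points
    sso = 0.0  # squared errors of mean-only predictions for omitted points
    for g in range(d):
        kept = [i for i in range(len(x)) if i % d != g]
        omitted = [i for i in range(len(x)) if i % d == g]
        slope, intercept = fit_line([x[i] for i in kept], [y[i] for i in kept])
        mean_kept = sum(y[i] for i in kept) / len(kept)
        for i in omitted:
            sse += (y[i] - (intercept + slope * x[i])) ** 2
            sso += (y[i] - mean_kept) ** 2
    return 1.0 - sse / sso


# Toy data (hypothetical): an outcome that follows the predictor closely,
# and an outcome that is essentially unrelated to it.
xs = [float(i) for i in range(30)]
related = [2.0 * i + ((i * 7) % 5 - 2) * 0.1 for i in range(30)]
unrelated = [float((i * 7) % 5) for i in range(30)]
print(q2_blindfold(xs, related), q2_blindfold(xs, unrelated))
```

in this sketch a q² clearly above 0 means that the model predicts the omitted points better than the mean of the retained data does, which is the sense in which the text speaks of predictive relevancy.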
4.3 results from the structural equation models

in this section the results of the partial least squares analyses are presented. the results of the primary pls models can be seen in figure 2 for mathematics and figure 3 for biology. in addition, the results for the alternative models, which aimed to complement the analyses for the reciprocal hypothesis h3 by reversing the direction of the relationships between interest measures and learning outcomes at simultaneous time points, are presented in figure 4 for mathematics and figure 5 for biology.

4.3.1 standard hypotheses

to examine the first research hypotheses, namely whether interest predicts learning outcomes, the focus was on the primary model, which can be seen in figures 2 and 3. students’ interest in mathematics at time 1 predicted their mathematics grade at time 2 (β = 0.42, p < .01). in biology a similar pattern was found, with time 1 interest also predicting biology grade at time 2 (β = 0.26, p < .01). however, the predictive relationship between interest at time 2 and grades at time 3 was found to be nonsignificant in both subjects. these findings supported the standard model hypothesis h1.1 but not h1.2.

figure 2. primary partial least squares model of individual interest in mathematics and mathematics grades.

figure 3. primary partial least squares model of individual interest in biology and biology grades.

4.3.2 affective by-product hypotheses

to examine the second research question, namely whether learning outcomes predict interest, the results were again obtained from the primary model. this choice was made because the direction of the predictive relationships concerning the affective by-product hypothesis h2 did not differ between the primary and the alternative models. results of the analyses showed that mathematics grade at time 2 predicted interest in mathematics at the end of the school year at time 3 (β = 0.32, p < .01).
similarly, biology grade at time 2 predicted interest in biology at time 3 (β = 0.25, p < .01). these findings supported the affective by-product hypothesis h2.

4.3.3 reciprocal hypotheses

results from the primary model analyses revealed that in both mathematics and biology, students’ interest at time 2 did not predict their grades at time 2, as the results were nonsignificant. however, there was a significant predictive effect from interest to grades at time 3 in both mathematics (β = 0.15, p < .05) and biology (β = 0.16, p < .05). based on these results the reciprocal hypothesis h3 was supported only at time 3. as mentioned earlier, pls analysis does not allow testing bidirectional relationships at the same time. this shortcoming was compensated for by formulating an alternative model in which the direction of the relationships between interest and grades was reversed at time 2 and time 3. in the alternative model the results were similar to those of the primary model. in both mathematics and biology, students’ grades at time 2 did not predict their interest at time 2, as the results were nonsignificant. this meant that no support was found for hypothesis h3.1. at time point 3 students’ grades did, however, predict their interest in both mathematics (β = 0.32, p < .01) and biology (β = 0.36, p < .01), again providing partial support for the reciprocal hypothesis h3.

figure 4. alternative partial least squares model of individual interest in mathematics and mathematics grades.

figure 5. alternative partial least squares model of individual interest in biology and biology grades.

5. discussion

the current study examined how students’ individual interest in mathematics and biology related to their learning outcomes in these subjects.
the study objective was to test the predictive validity of three different sets of hypotheses that were introduced by rotgans and schmidt (2017b) in the context of mathematics and biology education in secondary schools: the standard hypothesis, the affective by-product hypothesis, and the reciprocal hypothesis. when looking at the predictive validity of the standard hypothesis, namely whether or not interest was a predictor of learning outcomes, the results differed in the autumn and spring semester. in the autumn semester students’ interest had a predictive effect on learning outcomes in both mathematics and biology. this finding was also supported by the results of the correlational analyses. when looking at the spring semester the results were quite different. in this semester, the predictive effect of interest towards learning outcomes was not found in either of the two subjects, and the correlations also became non-significant. this indicated at best partial support for the standard hypotheses that interest is an antecedent for knowledge acquisition. in the second hypothesis, the framing was reversed, and the focus was on whether students’ learning outcomes predicted their individual interest. in this alternative model both mathematics and biology students’ grades at time 2 predicted their interest at the end of the school year. in addition, the correlation analyses revealed moderate positive correlations on both subjects across the two measurement times. in other words, students who received higher grades at the mid-semester evaluation were more likely to express higher levels of interest in the subject at the end of the school year. these results offer support for the affective by-product hypothesis. similar to the standard hypotheses, the results for reciprocal hypotheses of interest and learning differed between semesters. at time 2, interest did not predict learning outcomes at the same time point in either of the two subjects. 
correlational analyses showed only a weak positive relationship between interest and grade in mathematics, while in biology this was not observed. however, at the end of the school year at time 3, interest did have a predictive effect on grades in both subjects and the correlations were moderate. a similar pattern was also found in the alternative model, where students’ grades at time 2 did not predict their interest at the same time point, but at time 3 they did. when looking over the course of the whole year, however, both models show a path from interest to interest through grades. this seems to indicate that the relationship between students’ interest and learning outcomes did not stay the same throughout the school year, which is in line with the reciprocity view.

to summarize, the standard hypothesis was supported only during the autumn semester, but not throughout the school year. this finding was consistent across the two subjects. the affective by-product hypothesis was also supported, but because it could only be tested across two measurement times, between time 2 and time 3 (namely the spring semester), drawing too strict conclusions from these results would be premature, especially considering that there were indications of reciprocity both at a single time point and across time points.

one explanation for these findings could relate to the differences between the measurement situations. at the time 1 measurement point the students had just started their journey through secondary education and had changed to a new school with new teachers and new classmates. in this kind of environment, new external factors may have affected their interest. this aligns with the idea that in the interplay between interest and learning, interest goes through an evolutionary process. at first, when a student has insufficient knowledge about the topic, situational interest needs to be triggered and re-triggered before individual interest develops.
over time the effect of situational interest decreases, and more value and knowledge based individual interest becomes the decisive factor in learning (rotgans & schmidt, 2017c). changes during this process (e.g. grades) may reinforce or disrupt this development. 5.1. practical implications interest and its relation to other motivational factors as well as learning outcomes have been studied extensively in the past (e.g. ainley et al., 2002; hidi & renninger, 2006; harackiewicz et al., 2008; krapp & prenzel, 2011). previous literature has often concluded that helping students become interested in the topic at hand would also help them to achieve better learning outcomes. however, evidence for interest directly predicting learning outcomes has largely been missing and this has given rise to alternative interpretations of the role interest has in the learning process. following the ideas of rotgans and schmidt (2017b; 2017c) this study tested three sets of hypotheses as regards to the possible relationship that interest and learning might have during secondary school students’ school year. based on the results we present two conclusions that should be taken into consideration in future research. firstly, the relationship between individual interest and learning found in this study underline that this relationship is not stable throughout time, but can exhibit changes, both positive and negative, during the time period of one school year. over longer periods of time the relationship may vary; a student might, for instance, express individual interest towards a topic or a subject but may not be able to achieve the learning outcomes he or she wished for, which may affect interest. from the perspective of educators these findings point out the importance of designing curriculums and learning environments so that they offer experiences of success for each student. 
aiming for high academic standards in schools is of course a desirable goal for education, but it can also lead to some adverse effects if it is seen as the sole purpose of teaching. receiving low grades can have a negative impact on students’ self-perceived ability and may quell their interest towards the study subject (baumert, schnabel & lehrke, 1998). supporting students’ interest in studies should in itself be viewed as an important goal, since it has been found to relate to their career choices later on in life (maltese & harsh, 2015). supporting students’ interest towards mathematics and science is also relevant from the perspective of 21st century skills, since societies are becoming increasingly technology-driven, and navigating them in the future requires skills that are often taught in stem subjects in schools. technology may also offer solutions to the issue of students becoming disengaged from stem learning because of negative performance assessment. digital learning environments can offer more individualized feedback to each learner and at the same time offer more accurate support for learning, as well as learning tasks that are more finely balanced in terms of difficulty.

5.2. theoretical implications

one theoretical contribution of this study is its longitudinal setting, spanning a whole school year, which revealed patterns that are not easy to explain within the existing frameworks. previous research literature has usually focused either on narrow time frames, where it is hard to expect any interest development, or on very long time frames, where development may occur but fluctuations may easily stay out of sight. there is no clear consensus in the research literature on what would constitute an appropriate timeline for interest development. in their study, knogler et al. (2015) carried out a science study intervention over the course of three weeks.
their findings suggest that situational interest is as its name indicates, situational, and does not transfer to other learning situations to great extent. the case with individual interest is less clear and especially the process of how and when situational interest starts to evolve into individual interest is subject to debate. rotgans and schmidt (2017c) criticize the four-phase model of interest development (hidi & renninger, 2006; renninger & hidi, 2011) for being too simplified and vague. their suggestion is that situational and individual interests differ, especially in the way they are connected to knowledge. situational interest arises from a knowledge gap that the person wants to fill, whereas individual interest can only start to develop once the person has acquired some knowledge of the object of interest. within this view situational interest is not something that only precedes individual interest, but always exists, depends on past experiences of interest and knowledge development, and may influence further interest and knowledge development in the situation. the findings of this study seem to indicate that mathematics and biology differ somewhat in terms of stability of interest as well as in relation to learning outcomes. in biology (m = 3.33) students rated their interest, on average, higher than in mathematics (m = 2.88) throughout the school year. this is in line with findings from previous research that biology is the more popular science subject (baram-tsabari, 2015) among school students. however, when looking at the correlations, the students’ interest in biology did not correlate as strongly across time points as it did in mathematics, thus indicating lower level of stability. what is perhaps even more surprising is that students’ interest in biology at the beginning of school year did not correlate with their grade either at time 2 or time 3. 
one explanation for this could be that biology as a subject is less clear than mathematics, causing interest in the subject to be also less stable. 5.3. future directions and limitations the measurement used to measure individual interest in this study consisted of only one item, and although similar one-item instruments have been used in previous studies (tapola et al., 2013; tapola et al., 2014), it still warrants the question of how would a more fine-grained instrument have affected the results. as data for this study was collected as a part of a larger research project the choice to limit the questionnaire items was practical; keeping the questionnaire compact enough not to risk overburdening the participants. in the future, it would be recommended to widen the instrument so that it could take into consideration the value component of interest. combining this with qualitative methods, such as interviews, would provide a better understanding of the processes that affect students’ interest development over time. one limitation could also be that information about students’ learning outcomes were obtained only twice during the school year. although students’ grades are the normal way of evaluating school performance, it may be that the pressure for students to receive a better final grade during the spring semester is greater than during the first half of the school year. because their initial knowledge levels on stem subjects were not controlled at the beginning of the school year, there was no exact way of knowing how much learning had taken place between time points 1 and 2. however, the participants were 7th graders who were on their first year of secondary education that normally lasts for three years. it can be argued that the pressure to perform well increases towards the end of 9th grade, when they need to start making choices about their future education, and the possible negative consequences of that also probably have a bigger effect. 
another limitation relates to the results of the pls analyses. some of the r² values in the model are quite low, which means that the model was able to explain only a relatively small amount of the variance in some of the variables. for example, only 19% of the variance in students’ time 2 grades was explained by their individual interest in both mathematics and biology. this leaves open the question of what other, perhaps latent, variables would account for the rest of the variance. in addition, and perhaps more surprisingly, a large portion of the variance in students’ interest at the end of the school year was left unexplained by this model. this calls for further research to investigate what other factors, external and internal, affect students’ interest development during the school year.

keypoints

the predictive validity of three hypotheses on the relationship between interest and learning during a school year was tested.

interest and learning outcomes had a reciprocal relationship that alternated during the school year.

future studies would benefit from combining a longitudinal setting with more detailed student profiles.

as a practical implication, instead of just grading, offering students more detailed feedback on their performance across learning situations may foster their interest towards stem.

new learning technologies could provide support for teachers to receive information on their students' interest development and offer possibilities for more individualized learning paths.

acknowledgements

this research was partially supported by the finnish cultural foundation.

references

ainley, m. (2012). students’ interest and engagement in classroom activities. in s. christenson, a. reschly, & c. wylie (eds.), handbook of research on student engagement (pp. 283–302). springer. https://doi.org/10.1007/978-1-4614-2018-7_13
ainley, m. d., hidi, s., & berndorff, d. (2002). interest, learning, and the psychological processes that mediate their relationship.
journal of educational psychology, 94(3), 1–17. https://doi.org/10.1037/0022-0663.94.3.545 albin m. l., benton s. l., khramtsova i. (1996). individual differences in interest and narrative writing. contemporary educational psychology, 21(4), 305–324. https://doi.org/10.1006/ceps.1996.0024. baram-tsabari, a. (2015). promoting information seeking and questioning in science. in k. a. renninger, m. nieswandt, & s. hidi (eds.), interest in mathematics and science learning (pp. 135–152). american educational research association. https://doi.org/10.3102/978-0-935302-42-4 baram-tsabari, a., & yarden, a. (2010) quantifying the gender gap in science interest.international journal of science and mathematics education, 9(3), 523–550. https://doi.org/10.1007/s10763-010-9194-7 baumert, j., schnabel, k., & lehrke, m. (1998). learning math in school: does interest really matter? in l. hoffmann, a. krapp, k. a. renninger, & j. baumert (eds.), interest and learning (pp. 327–336). kiel: ipn. bong, m., lee, s. k., & woo, y.-k. (2015). the roles of interest and self-efficacy in the decision to pursue mathematics and science. in k. a. renninger, m. nieswandt, & s. hidi (eds.), interest in mathematics and science learning (pp. 33–48). american educational research association. https://doi.org/10.3102/978-0-935302-42-4 bowen, n. k., & guo, s. (2011). structural equation modeling. oxford university press. https://doi.org/10.1093/acprof:oso/9780195367621.001.0001 brezovszky, b., mcmullen, j., veermans, k., hannula-sormunen, m. m., rodríguez-aflecht, g., pongsakdi, n., laakkonen e., & lehtinen, e. (2019). effects of a mathematics game-based learning environment on primary school students’ adaptive number knowledge. computers & education, 128, 63–74. https://doi.org/10.1016/j.compedu.2018.09.011 chen, j. a., tutwiler, m. s., metcalf, s. j., kamarainen, a., grotzer, t., & dede, c. (2016). 
a multi-user virtual environment to support students’ self-efficacy and interest in science: a latent growth model analysis. learning and instruction, 41, 11–22. https://doi.org/10.1016/j.learninstruc.2015.09.007 european commission (ec)(2015). does the eu need more stem graduates? publications office of the european union. https://doi.org/10.2766/000444 gegenfurtner, a., knogler, m., & schwab, s. (2020). transfer interest: measuring interest in training content and interest in training transfer. human resource development international, 23(2), 146–167. https://doi.org/10.1080/13678868.2019.1644002 gil-garcia, j. r. (2008). using partial least squares in digital government research. in g. d. garson & m. khosrow-pour (eds.), handbook of research on public information technology (vol. 1, pp. 239–253). information science reference. https://doi.org/10.4018/978-1-59904-857-4.ch023 hair, j. f., hult, g. t. m., ringle, c. m., & sarstedt, m. (2017). a primer on partial least squares structural equation modeling (2nd edition.). los angeles: sage. harackiewicz, j. m., durik, a. m., barron, k. e., linnenbrink-garcia, l., & tauer, j. m. (2008). the role of achievement goals in the development of interest: reciprocal relations between achievement goals, interest, and performance. journal of educational psychology, 100(1), 105–122. https://doi.org/10.1037/0022-0663.100.1.105 hay, i., callingham, r., & carmichael, c. (2015). interest, self-efficacy, and academic achievement in a statistics lesson. in k. a. renninger, m. nieswandt, & s. hidi (eds.), interest in mathematics and science learning (pp. 203–224). american educational research association. https://doi.org/10.3102/978-0-935302-42-4 hidi, s., & renninger, a. (2006). the four-phase model of interest development. educational psychologist, 41(2), 111–127. https://doi.org/10.1207/s15326985ep4102_4 hofer, m. (2010). adolescents’ development of individual interests: a product of multiple goal regulation? 
educational psychologist, 45 (3), 149–166. https://doi.org/10.1080/00461520.2010.493469 hulleman, c. s., & harackiewicz, j. m. (2009). promoting interest and performance in high school science classes. science, 326 (5958), 1410–1412. https://doi.org/10.1126/science.1177067 knogler, m., harackiewicz, j. m., gegenfurtner, a., & lewalter, d. (2015). how situational is situational interest? investigating the longitudinal structure of situational interest. contemporary educational psychology, 43, 39–50. https://doi.org/10.1016/j.cedpsych.2015.08.004 koo, t., & li, m. (2016). a guideline of selecting and reporting intraclass correlation coefficients for reliability research. journal of chiropractic medicine, 15(2), https://doi.org/10.1016/j.jcm.2016.02.012. krapp, a., hidi, s., & renninger, k. a. (1992). interest, learning and development. in k. a. renninger, s. hidi, & a. krapp (eds.), the role of interest in learning and development (pp. 3–25). erlbaum. krapp, a., & prenzel, m. (2011). research on interest in science: theories, methods, and findings. international journal of science education, 33(1), 27–50. https://doi.org/10.1080/09500693.2010.518645 köller, o., baumert, j., & schnabel, k. (2001). does interest matter? the relationship between academic interest and achievement in mathematics. journal for research in mathematics education, 32(5), 448–470. https://doi.org/10.2307/749801 lee, v. e. (2000). using hierarchical linear modeling to study social contexts: the case of school effects, educational psychologist, 35(2), 125–141, https://doi.org/10.1207/s15326985ep3502_6 maltese, a. v., & harsh, j. a. (2015). students’ pathways of entry into stem. in k. a. renninger, m. nieswandt, & s. hidi (eds.), interest in mathematics and science learning (pp. 203–224). https://doi.org/10.3102/978-0-935302-42-4 nieswandt, m. (2007). student affect and conceptual understanding in learning chemistry. journal of research in science teaching, 44(7), 908–937. 
https://doi.org/10.1002/tea.20169. osborne, j., simon, s., & collins, s. (2003). attitudes towards science: a review of the literature and its implications. international journal of science education, 25, 1049–1079. https://doi.org/10.1080/0950069032000032199 renninger, k. a. (2000). individual interest and its implications for understanding intrinsic motivation. in c. sansone & j. m. harackiewicz (eds.), intrinsic and extrinsic motivation: the search for optimal motivation and performance (pp. 373–404). academic press. https://doi.org/10.1016/b978-012619070-0/50035-0 renninger, k. a., austin, l., bachrach, j. e., chau, a., emmerson, m. s., king, b. r., riley, k. r., & stevens, s. j. (2014). going beyond whoa! that’s cool! achieving science interest and learning with the ican intervention. in s. karabenick & t. urdan (eds.), motivation-based learning interventions: advances in motivation and achievement series (vol. 18, pp. 107–140). https://doi.org/10.1108/s0749-742320140000018003 renninger, k. a., & hidi, s. (2011). revisiting the conceptualization, measurement, and generation of interest. educational psychologist, 46(3), 168–184. https://doi.org/10.1080/00461520.2011.587723 rodríguez‐aflecht, g., jaakkola, t., pongsakdi, n., hannula-sormunen, m., brezovszky, b., & lehtinen, e. (2018). the development of situational interest during a digital mathematics game. journal of computer assisted learning, 34(3), 259–268. https://doi.org/10.1111/jcal.12239 rotgans, j. i., & schmidt, h. g. (2011). situational interest and academic achievement in the active-learning classroom. learning and instruction, 21, 58–67. https://doi.org/10.1016/j.learninstruc.2009.11.001 rotgans, j. i., & schmidt, h. g. (2017a). how individual interest influences situational interest and how both are related to knowledge acquisition: a microanalytical investigation. the journal of educational research,. 111(5), 530–540. https://doi.org/10.1080/00220671.2017.1310710 rotgans, j. i. & schmidt, h. g. 
(2017b). the relation between individual interest and knowledge acquisition. british educational research journal, 43(2), 350–371. https://doi.org/10.1002/berj.3268 rotgans, j. i. & schmidt, h. g. (2017c). the role of interest in learning: knowledge acquisition at the intersection of situational and individual interest. in p. a. o'keefe & j. m. harackiewicz (eds.) the science of interest (pp. 69–93). cham: springer. tai, r. h., liu, c. q., maltese, a. v., & fan, x. (2006). planning early for careers in science. science, 312(5777), 1143–1144. https://doi.org/10.1126/science.1128690 tapola, a., jaakkola, t., & niemivirta, m. (2014). the influence of achievement goal orientations and task concreteness on situational interest. journal of experimental education, 82(4), 455–479. https://doi.org/10.1080/00220973.2013.813370 tapola, a., veermans, m., & niemivirta, m. (2013). predictors and outcomes of situational interest during a science learning task. instructional science, 41(6), 1047–1064. https://doi.org/10.1007/s11251-013-9273-6 uitto, a., juuti, k., lavonen, j., & meisalo, v. (2006). students’ interest in biology and their out-of-school experiences. journal of biological education, 40(3), 124–129. https://doi.org/10.1080/00219266.2006.9656029

frontline learning research vol. 3 no. 2 (2015) 27–46 issn 2295-3159 corresponding author: johanna fleckenstein, leibniz institute for science and mathematics education, department of educational research, olshausenstr. 62, 24118, kiel, germany. email address: fleckenstein@ipn.uni-kiel.de doi: http://dx.doi.org/10.14786/flr.v3i2.162

what works in school?
expert and novice teachers’ beliefs about school effectiveness

johanna fleckenstein (a), friederike zimmermann (b), olaf köller (a), jens möller (b)
(a) leibniz institute for science and mathematics education, germany
(b) kiel university, germany

article received 4 april 2015 / revised 4 april 2015 / accepted 27 april 2015 / available online 4 june 2015

abstract
in 2009, john hattie first published his extensive metasynthesis concerning determinants of student achievement. it provides an answer to the question: “what works in school?” the present study examines how this question is answered by pre- and in-service teachers, how their beliefs correspond to the current state of research, and whether they differ according to the teachers’ level of expertise. thus, it takes a novel approach, as it draws on data from two sources in the field of education (empirical research and teachers’ beliefs) and examines their similarities and differences. the teachers’ beliefs were elicited by asking n = 729 participants to estimate the effect sizes of several determinants of student achievement. these estimates were compared to the empirical effect sizes found by hattie (2009). profile correlations showed that expert teachers’ beliefs are more congruent with current research findings than those of novice teachers. we further examined where expert and novice teachers’ beliefs differ substantially from each other by using confirmatory factor analysis (cfa) and comparing group means in latent variables. our findings suggest that teachers’ beliefs about school effectiveness are related to professional experience: expert teachers showed a stronger overall congruence with empirical evidence, scoring higher in achievement-related variables and lower in variables concerning surface- and infrastructural conditions of schooling as well as student-internal factors. results are discussed with regard to teacher-education practices that emphasize research findings and challenge existing beliefs of (prospective) teachers.
keywords: teacher beliefs; teacher education; professional competence; school effectiveness

teachers’ beliefs are often guided by subjective experience rather than by empirical data. thus, it is to be expected that they generally diverge from research findings. this assumption also pertains to the specific case of teachers’ beliefs that manifest in their response to the question: “what works in school?” school effectiveness research has tried to answer this question, the latest attempt being hattie’s (2009, 2012) metasynthesis of factors that influence student achievement. however, little is known of the practitioner’s answer to one and the same question: what factors do (expert and novice) teachers believe to influence student achievement? where do their beliefs differ from research findings substantially? these questions are highly relevant as teachers’ beliefs have been shown to influence teaching and learning. if they differ substantially from results of school effectiveness research, we have reason to assume a negative effect on educational outcomes. for example, if a teacher undervalues a particular teaching method or overvalues surface structural aspects like class size, this could be a serious threat to effective teaching. an investigation into teachers’ beliefs concerning school effectiveness can show us where these discrepancies are and thus inform teacher education practice and classroom instruction.

teachers’ beliefs are typically represented as part of a multi-dimensional construct of teachers’ professional competence (baumert & kunter, 2006). they influence teachers’ perceptions and judgments and, consequently, affect their classroom instruction (calderhead, 1996; pajares, 1992). beliefs are subject to change and thus can be expected to differ in expert and novice teachers.
in the present study, we are particularly interested in a specific subset of teachers’ beliefs, namely those regarding the effectiveness of school- and education-related factors. in the last few decades, there has been a lot of research concerning the question of what works in school – and what does not (fraser, walberg, welch & hattie, 1987; walberg, 1986; hattie, 2003; wang, haertel, & walberg, 1990, 1993). the most recent and extensive example of this is hattie’s synthesis of meta-analyses (2009, 2012), in which he examines the influence of 138 factors on student achievement. teachers should be familiar with such findings of school effectiveness research in order to make informed decisions and focus on the most effective interventions. for this kind of evidence-based practice, however, teachers not only have to know about such findings but also have to believe that they are true. hence, we asked pre-service (“novice”) and in-service (“expert”) teachers for their beliefs about the efficacy of certain determinants of student achievement. these ratings of our expert and novice teachers were then compared with each other and contrasted with the findings of hattie (2009).

in the following we will provide a brief theoretical background on teachers’ beliefs. since the body of research on teachers’ beliefs is quite extensive, we concentrate on the following relevant aspects: the theoretical construct of teachers’ beliefs, the influence of teachers’ beliefs on classroom processes and student outcomes, the issue of teachers’ beliefs being guided by subjective experience rather than objective fact, and the general differences in beliefs of novice and expert teachers. subsequently, we locate teachers’ beliefs within a model of teachers’ professional competence, which – in accordance with the prevailing expert(-novice) paradigm – suggests the malleability of all its components.
the beliefs examined here focus on determinants of student achievement; therefore, we will also summarize the most important and recent results of school effectiveness research. as hattie (2009) serves as the basis of our study, particular attention is paid to his comprehensive aggregation of existing meta-analyses (metasynthesis; see zell & krizan, 2014).

1. theoretical background

1.1 teachers’ beliefs

according to barcelos (2003), beliefs are types of thoughts which provide a basis for decisions and actions. harvey (1986) described a belief system as “a set of conceptual representations which signify to its holder a reality or a given state of affairs of sufficient validity, truth or trustworthiness to warrant reliance upon it as a guide to personal thought and action” (p. 146). beliefs in general are thought of as psychologically held understandings, premises, or propositions about the world that a person perceives as being veritable (richardson, 1996). teachers’ beliefs can be seen as a substructure of the general belief system. they consist of beliefs that serve as a guide when dealing with school- and instruction-related situations. in educational settings, haney et al. (2003) defined beliefs as “one’s convictions, philosophy, tenets, or opinions about teaching and learning” (p. 367). as such, teachers’ beliefs may include subjective theories about how students learn, what a teacher should or should not do, and which instructional strategies work effectively.

the last few decades have produced a substantial body of research on the beliefs of teachers (for comprehensive research reviews see calderhead, 1996; fang, 1996; pajares, 1992; nespor, 1987; richardson, 1996; stuart & thurlow, 2000; verloop et al., 2001; wenden, 1999; woods, 1996; zheng, 2009). teachers’ beliefs influence perception and judgment, which in turn guide their actions in the context of school and education (pajares, 1992).
prior research has shown that teachers’ beliefs have a critical impact on the way they teach in the classroom, learn how to teach, and perceive educational reforms (m. borg, 2001; allen, 2002; s. borg, 2003; freeman, 2002; yook, 2010). other studies have shown the importance of teachers’ beliefs for student achievement (peterson, fennema, carpenter & loef, 1989; staub & stern, 2002). an especially well-researched issue in this context is teachers’ self-efficacy beliefs. according to tschannen-moran et al. (1998, p. 233), teacher efficacy is “the teacher’s belief in his or her capability to organize and execute courses of action required to successfully accomplish a specific task in a particular context”. such beliefs have been shown to critically influence a teacher’s performance and motivation (bandura, 1997; ross, 1998; tschannen-moran & woolfolk hoy, 2001; tschannen-moran, woolfolk hoy & hoy, 1998; woolfolk & hoy, 1990; woolfolk, rosoff & hoy, 1990; woolfolk hoy & davis, 2006) as well as his or her students’ achievement in school (bates, latham & kim, 2011; muijs & reynolds, 2001; ross, 1992, 1998).

against this background it becomes evident that the beliefs of teachers are an important issue for teaching and learning. the definitions above show that beliefs are always perceived to be correct by the individual. however, this is not necessarily the case: teachers’ beliefs – as a subgroup of beliefs in general – are especially likely to be flawed in the sense that they contradict empirical evidence. in the following we discuss why this is and in what way it can lead to substantial problems, in particular for novice teachers.

teachers’ beliefs are highly subjective, tend to be persistent and develop at a rather early stage in life (lortie, 1975; pajares, 1992). this is partly due to the long-term experience with schools and classrooms during a teacher’s own time as a student, which serves as the starting point of his or her training and career.
hattie (2009) calls this relatively stable system of beliefs the “grammar of schooling” (p. 5). it contains tacit and simplified notions of what a good teacher is and how students are supposed to behave (clark, 1988; nespor, 1987). as such, this belief system often diverges from empirical findings and can fatally influence the process of teaching (kunter & pohlmann, 2009). understanding and challenging one’s own beliefs is therefore considered an important aspect of teacher qualification (bromme, 1997; woolfolk-hoy, davis & pape, 2006). however, this does not seem to be an easy task, since not even the confrontation with dissonance (e.g., induced by empirical evidence that contrasts one’s beliefs) necessarily leads to a corresponding change in beliefs (hart, 2004; pajares, 1992).

hattie (2009) claims that teachers experience almost everything they do in the classroom to have a positive influence on their students’ learning. the remarkable differences in effectiveness of their efforts are easily overlooked due to the lack of firsthand comparison, since teachers are usually confined to their own classroom. they see that what they do seems to work fine, as (almost) everything a teacher does leads to an increase in students’ achievement. thus, there is always anecdotal evidence for the effectiveness of certain methods, even though the actual extent to which they support students’ learning often differs dramatically. hattie describes this basic principle of teaching as “just leave me alone as i have evidence that what i do enhances learning and achievement” (hattie, 2009, p. 6). admittedly, this is a rather simplified account of teachers’ self-efficacy beliefs. most teachers indeed struggle with the teaching methods they use, try out new things and reflect whether or not they seem to be working. however, they often do this within their own frame of reference, not based on research evidence.
the fact that teachers’ beliefs do not necessarily coincide with empirical findings can be problematic if not reflected on thoroughly. novices in particular can have inadequate notions of what constitutes good teaching. weinstein (1989) found that on average, pre-service teachers overestimate affective and social variables of classroom instruction (such as patience and the ability to relate to children) and underestimate cognitive and academic variables (such as organization and challenging). however, when contrasted with the perspectives of educational policy makers and researchers, in-service teachers’ beliefs seem very similar to those of their less experienced colleagues. while the former two speak of good teaching in terms of outcomes in standardized assessment and direct instruction models (policy makers), as well as ‘masterful teachers’ with a whole set of well-defined professional skills (researchers; e.g. shulman, 1987), the latter two have a notion of ‘good teachers’ that can be described as “warm, caring individuals who enjoy working with children” (weinstein, 1989, p. 59). hogan, rabinowitz, and craven (2003) compared novice and expert teachers and found that student achievement was important for expert teachers, while novice teachers paid more attention to student interest.

1.2 teachers’ professional competence

teacher training and professional development are central issues in the international discussion on teacher effectiveness (bauer & prenzel, 2012; cochran-smith & zeichner, 2005; darling-hammond & bransford, 2005). the development of standards in teacher education requires an explicit analysis of challenges teachers face in their everyday professional life. moreover, it demands a specification of competences necessary to master these challenges.
the quest for the good teacher is not new; however, the specific facets of teachers’ professional competence and the premise that teachers’ cognitions are modifiable by means of training are direct results of a relatively recent objective in teacher(-education) research: the expert(-novice) paradigm (berliner, 2004; bromme, 1997, 2003; ericsson & lehmann, 1996; ericsson, charness, feltovich & hoffman, 2006). accordingly, this paper is based on the central assumptions that (a) good teachers are experts in learning and teaching, and (b) they achieve this expertise in the form of professional competence through continuous teacher education and professional experience. experts are roughly defined as “individuals who exhibit reproducibly superior performance on representative, authentic tasks in their field” (ericsson, 2006, p. 688). it is assumed that teachers’ expertise or professional competence is acquired throughout pre- and in-service training as well as by hands-on experience in the classroom (berliner, 2004).

teachers’ professional competence is usually represented as a multi-dimensional construct. as such, based on the five core propositions of the national board for professional teaching standards (nbpts), baumert and kunter (2006) proposed a model of teachers’ professional action competence with four non-hierarchical dimensions: (1) specific declarative and procedural knowledge, which further distinguishes between content knowledge (ck), pedagogical knowledge (pk), and pedagogical content knowledge (pck) (shulman, 1986, 1987); (2) professional beliefs, values, subjective theories, normative preferences and objectives; (3) motivational orientations; and (4) meta-cognitive skills and professional self-regulation. in line with the expert(-novice) paradigm we assume that all of these competencies are subject to change throughout a teacher’s professional life.
the boundaries between these categories of teachers’ cognition, however, are more or less fuzzy: knowledge – pk and pck in particular – and beliefs are strongly interrelated theoretical constructs, even though they rely on dissimilar epistemological notions, as “belief is based on evaluation and judgment; knowledge is based on objective fact” (pajares, 1992, p. 313). one and the same response to a pedagogical question can either demonstrate well-founded knowledge or it can be based on subjective belief. the two answers may differ in their epistemological status, though their distinction – philosophically speaking – is a mere social construct: leatham (2006) argues that beliefs (things we just believe) and knowledge (things we more than believe) can be viewed as complementary subsets of the things we believe. in comparison to belief, knowledge is characterized by a higher degree of certainty, for example, by being grounded in empirical evidence. in many empirical studies on teacher beliefs, however, the distinction between knowledge and beliefs is rather blurry. it is very difficult to distinguish whether teachers refer to their knowledge or beliefs when they plan, make decisions, or act in the classroom (verloop, van driel, & meijer, 2001). in the present study, we use the concept of teachers’ beliefs to refer to cognitions of teachers that are subjective and normative in nature, while they may or may not coincide with the more objective construct of knowledge.

1.3 school effectiveness research

the present study deals with teachers’ beliefs concerning factors of school effectiveness. thus, in the following we briefly summarize the central findings of school effectiveness research. we focus on proximal vs. distal aspects of schooling as there is a broad consensus concerning this dichotomy in the literature.
the analyses performed in this paper focus on the broad categories school, teaching and student, so we aim at summarizing the comprehensive literature in school effectiveness research on this rather general level. subsequently, we give a more detailed overview of hattie’s 2009 metasynthesis, concentrating on those variables and categories that were used in our questionnaire to elicit teachers’ beliefs.

in the last few decades, increased efforts were made in school effectiveness research to study the importance of a range of determinants of successful schooling. the question of what works in school (and what does not) has been the central issue of a number of meta-analyses and research syntheses (coleman et al., 1966; fraser et al., 1987; hattie, 2003, 2009, 2012; jencks et al., 1973; scheerens & bosker, 1997; seidel & shavelson, 2007; walberg, 1986; wang et al., 1990, 1993). those studies do not always agree on the specific size of an effect; however, there is a general tendency with regard to certain factors of school effectiveness. in general, the majority of these studies suggest that the amount of variance explained by proximal – school- or classroom-related – variables is considerable, and that these variables have a greater influence on student learning than more distal aspects such as school system and educational policy (seidel & shavelson, 2007).

the general emphasis on proximal variables was, for example, shown by scheerens and bosker (1997). they combined the results of three meta-analyses as well as a re-analysis of an international data set and found that school-organizational factors (e.g., monitoring/evaluation, orderly climate), instructional conditions (e.g., opportunity to learn, homework), and aspects of structured teaching (e.g., feedback, cooperative learning) are a better explanation for the differences between the achievement of students than more distal aspects such as resource input factors (e.g., student-teacher ratio, teachers’ salary).
moreover, the rank-ordering presented by wang et al. (1993) put student characteristics and classroom practices ahead of design of program and school demographics. they found particularly strong effects for the variables meta-cognition, classroom management and quantity of instruction. fraser et al. (1987) found the highest correlations with performance tests for variables related to student characteristics (especially cognitive ones), learning strategies, and structured or direct teaching. the results also revealed that open teaching and individualization are less powerful factors, at least when the dependent variable is (cognitive) achievement. similar findings were shown by walberg (1986).

there have been many attempts to find a comprehensive consensus with regard to the effects of certain factors of school effectiveness. scheerens (2004) presented the effectiveness-enhancing conditions of schooling identified in five review studies (cotton, 1995; levine & lezotte, 1990; purkey & smith, 1983; sammons, hillman & mortimore, 1995; scheerens, 1992): a consensus was reached with respect to many instruction-related factors, such as achievement orientation, high expectations, frequent testing/monitoring, professional development, and structured or purposeful teaching.

hattie’s (2009) synthesis of over 800 meta-analyses was one of the most recent milestones in school effectiveness research: 52,637 individual studies with over 83 million students were used in order to determine the relevance of 138 factors for student achievement. for each of these factors he determined cohen’s d as the averaged effect size. as a convention in the school context, effect sizes of d > .40 are considered substantial, since this would imply greater effects than one year of average schooling (köller, 2012); hattie calls this the zone of desired effects. hence, the point of reference for the effectiveness of an innovation is not d = 0, but d = .40.
hattie’s results were largely in line with the prior findings in school effectiveness research as described in the preceding paragraphs. he systematized the individual factors according to six superordinate categories: student, teacher, teaching, curriculum, school, and family. the central results can be summarized as follows: more or less ineffective factors with d < .40 were primarily infrastructural conditions of schooling, such as within- or between-class grouping, finances, and reduction of class size. moreover, aspects of the surface structure of teaching, which is often associated with progressive teaching approaches (e.g. open learning, multi-grade/-age classes, team teaching), did not prove to be very effective either. these results may be surprising considering the socio-political discourse on education; however, against the background of modern classroom research they are to be expected: research has shown that successful learning can be better predicted by the deep structure of teaching and learning than by the surface structure (e.g., seidel & shavelson, 2007). the latter can be observed and described without much effort, while the former requires more elaborate assessment. the use of surface-structure learning methods is not beneficial by itself, but only if it affects the level of deep-structure cognitive processing (e.g., by giving constructive feedback or teaching meta-cognitive strategies).

in line with prior research on school effectiveness, high effect sizes could also be shown for cognitive and emotional student characteristics (e.g. prior knowledge, motivation) and instructional, achievement-related variables such as direct instruction and high expectations of the teacher. in agreement with prior research, hattie’s findings suggest that more distal factors are less important than proximal factors and that the structural conditions of teaching are less important than the process of teaching itself.
the results also highlight the importance of the students’ cognitive and noncognitive prerequisites for learning.

2. the present study

in our study we attempted a direct comparison of the results of school effectiveness research (i.e. the effect sizes from hattie’s study) with the beliefs of novice and expert teachers (i.e. their ratings of effect sizes). this was a rather novel approach; however, wang et al. (1993) adopted a similar strategy when they compared the results of 91 meta-analyses with the ratings of experts in education, namely 61 distinguished educational researchers. the correlation they found between expert ratings and meta-analyses was .59 (p < .01). the authors concluded that there is a general agreement between expert ratings and the meta-analyses regarding the effect of different variables on student learning and their relative strength.

while wang et al. (1993) examined the judgments of experts in educational research, our study dealt with the beliefs of teachers who are either enrolled in a teacher training program or work as teachers and school administrators. the objective of wang et al. was to build a knowledge base in school effectiveness research: they used three different methods – content analyses, expert ratings, and results from meta-analyses – to quantify the importance and consistency of variables that influence student learning. our objective, on the other hand, was to explicitly address the beliefs of those groups that actually are or are going to be working in the field and directly influence classroom processes. prior research on teachers’ beliefs (in general and especially those of novice teachers) has shown that they tend to be very subjective and are rather unlikely to be guided by empirical evidence, so we had reason to assume that this is also the case for beliefs concerning determinants of student achievement.
thus, we expected our pre- and in-service teachers’ beliefs to differ more strongly from the findings of school effectiveness research than the ratings of expert researchers. with the epistemological question in mind that we raised above, one could also argue that wang et al. (1993) examined knowledge while we examined beliefs.

school effectiveness research gives us a good theoretical understanding of what works in school. however, we have reason to assume that teachers’ cognitions are not congruent with these findings, since their beliefs are at risk of being guided by subjective experience rather than by empirical data. the present study focused on teachers’ beliefs about which factors determine their students’ achievement. hence, the central questions were:

a) what are teachers’ beliefs about the impact of the above-mentioned factors on student achievement, and to what extent do these beliefs diverge from findings of empirical research (i.e., the effect sizes of hattie’s research synthesis)?
b) what are the differences in the beliefs of novice and expert teachers on a latent level of meaningful factors of school effectiveness?

3. methods

3.1 sample

the sample comprised n = 729 participants (64% female); n = 358 were in-service (“expert”) teachers and n = 371 pre-service (“novice”) teachers. teachers of the first group were in service at different schools in the federal states schleswig-holstein and hamburg, germany. of these participants 53% were women; the mean age was m = 52.3 years (sd = 8.8), ranging from 28 to 64 years. this subgroup included teachers from different types of schools (primary and secondary). the data from this subsample was collected in the context of professional development lectures for in-service teachers. though attendance was not mandatory, the lectures were open to all teachers from the two states.
the attending teachers can be considered true experts as many of them were in leadership positions at their schools (training supervision, school administration, etc.).

the pre-service teachers were university students enrolled in the first year of a master of education (m.ed.) at a university in the northern part of germany. to illustrate the background of our sample we briefly outline a typical teacher training program in germany: at most german universities teacher training is composed of a three-year bachelor (b.a./b.sc.) program and a two-year master (m.ed.) program. it includes the academic study of two scientific disciplines and didactics for the corresponding school subjects. in addition, students take a variety of courses in educational sciences. after university studies, students transfer to the more practical part of teacher training. they practice teaching in schools for one to two years before they become fully qualified teachers. our pre-service teachers had completed a bachelor program that prepared them for graduate studies in teacher education. they had done practical training in schools for a period of six weeks in total; however, the bachelor program was clearly focused on the theoretical study of the two scientific disciplines. only a small proportion of the degree was dedicated to introductory classes on educational sciences and didactics. thus, we can assume that their prior knowledge concerning these subjects was not very advanced. the percentage of female participants in this group was 71%. their mean age was m = 24.7 years (sd = 2.5), ranging from 23 to 39 years. the data was collected during a lecture on psychology in education, in which all students enrolled in the m.ed. were required to participate.

3.2 procedure

in order to assess teachers’ beliefs about the effectiveness of factors for students’ achievement, a questionnaire was developed based on 16 determinants of student learning (see table 2) selected from hattie (2009).
criteria for the selection of items were the coverage of a large range of effect sizes (d = .01-.73) and the coverage of the a priori categories school, teaching and student from hattie’s study. these categories were chosen as a focus of our study since there seemed to be the highest consensus about the extent of their impact on student learning among school effectiveness studies. moreover, we selected those variables that we assumed even inexperienced university students would be familiar with, as most of them are also a frequent issue in political and academic discourse.

table 1
intervals of effect sizes and their interpretations by hattie (2009) and köller (2012)

range of effect sizes | interpretation by hattie (2009) | interpretation by köller (2012)
d < 0 | reverse effects | harmful
0 ≤ d < 0.15/0.2 | developmental effects | not harmful, not helpful
0.15/0.2 ≤ d < 0.4 | teacher effects | a little helpful
0.4 ≤ d < 0.6 | zone of desired effects | helpful
d ≥ 0.6 | | very helpful

the questionnaire was administered in the context of a lecture on hattie’s study. first of all, participants were introduced to the concept of meta-analysis in general and the design of hattie’s research synthesis in particular. subsequently, they were familiarized with the concept of cohen’s d and its practical implications: the formula $d = (M_{\mathrm{test}} - M_{\mathrm{control}}) / SD_{\mathrm{pooled}}$ was presented to the teachers and explained in detail with the help of examples. in order to give practical significance to the rather abstract notion of effect size, the interpretation of effect sizes as presented in the rightmost column of table 1 was introduced and displayed during the completion of the questionnaire. the participants were asked to estimate the impact of each of the factors on a scale of effect sizes ranging from d = -0.4 to d = 1.0. the precise instruction was: “please estimate the effect of each of the factors below on students’ achievement”, followed by the list of variables.
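to make the effect-size formula and the verbal labels from table 1 concrete, the following is a minimal python sketch. it is an illustration only, not the material used in the lecture: the score lists are hypothetical, the function names `cohens_d` and `interpret` are our own, the pooled standard deviation is assumed to be the usual sample-size-weighted version (the text does not define it), and the lower cut-off is left as a parameter because table 1 lists it as 0.15/0.2.

```python
import math
from statistics import mean, stdev


def cohens_d(test, control):
    """Standardized mean difference: (M_test - M_control) / SD_pooled."""
    n1, n2 = len(test), len(control)
    s1, s2 = stdev(test), stdev(control)  # sample SDs (n - 1 in the denominator)
    # Assumption: weighted pooled SD; the source only names "sd pooled".
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(test) - mean(control)) / pooled


def interpret(d, lower_cut=0.2):
    """Verbal label following the rightmost column of table 1 (koeller, 2012).

    lower_cut may be set to 0.15 or 0.2, since the table gives both values.
    """
    if d < 0:
        return "harmful"
    if d < lower_cut:
        return "not harmful, not helpful"
    if d < 0.4:
        return "a little helpful"
    if d < 0.6:
        return "helpful"
    return "very helpful"


# Hypothetical achievement scores, purely for illustration.
test_group = [52, 55, 61, 58, 64]
control_group = [50, 53, 57, 55, 60]

d = cohens_d(test_group, control_group)
print(f"d = {d:.2f} ({interpret(d)})")
```

under these assumptions, the same `interpret` function can be applied to any effect size on the questionnaire scale from d = -0.4 to d = 1.0.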
participants were briefly familiarized with the 16 factors, that is, short comments were given on what was meant by the factors.

3.3 statistical analyses

first, group means were calculated on the level of the 16 manifest variables and further analyzed by comparing them to the effect sizes of hattie (2009). this was achieved by calculating the correlation coefficient pearson’s r for each person’s rating profile with the distribution of hattie’s effect sizes. the coefficients were transformed by fisher’s z in order to approximate constant variance for all values. the use of fisher’s z transformation is recommended when averaging correlation coefficients as the distribution of r is skewed (silver & dunlap, 1987). the resulting fisher’s z coefficients were aggregated per group (pre- and in-service teachers) and the resulting means (mz) of the two groups were compared using an independent-sample t-test. thus, we could determine the difference between pre- and in-service teachers in terms of congruence with hattie’s results.

second, we examined whether ratings on several individual items could be aggregated on a higher level, that is, in a latent variable model. confirmatory factor analysis (cfa) was carried out in mplus 7 (muthén & muthén, 1998-2012) in order to analyze the underlying latent structure of the data, which then served as a basis for comparisons between pre- and in-service teachers on the reduced number of meaningful categories on a higher level. the assumed factor structure was based on hattie’s (2009) a priori categorization of the variables. the factor teaching contained variables from hattie’s categories teaching and teacher, the factor school contained variables from hattie’s category school, and the factor student contained variables from hattie’s category student. modification indices indicated, however, that the factor teaching
in the final measurement model we specified four latent variables using maximum likelihood estimation (ml). the latent variables were allowed to correlate and some error terms of manifest variables were allowed to covary if considered plausible. missing data (< 5%) were estimated with the help of the full information maximum likelihood (fiml) procedure.

subsequently, in order to allow for meaningful comparisons between the subgroups, measurement invariance was tested using a multiple-group modeling approach (meredith & teresi, 2006). for a comparison of latent means across the two groups (pre- vs. in-service teachers) at least partial scalar invariance is required (byrne, shavelson & muthén, 1989). in multiple group analysis, when the specified model includes a mean structure, both the intercepts and factor loadings of the continuous factor indicators are held equal across groups to specify (scalar) measurement invariance. the intercepts of the factors are fixed at zero in the first group and are free to be estimated in the other groups. thus, differences between the two groups can be determined based on the latent factors.

4. results

4.1 descriptive statistics and item level analysis

table 2 shows descriptive statistics for the two groups – pre- and in-service teachers – on item level as well as hattie’s (2009) research results. the factors are ranked by the size of their effect (d) on student achievement as found by hattie in his metasynthesis. in the following we elaborate on the descriptive results, especially on those variables with the highest and lowest ratings in each group. both groups seemed to believe in the importance of student variables, as they both showed the highest means on the factors motivation and attitude. ranked third by both pre- and in-service teachers was feedback. multi-grade/age classes (in-service group) and direct instruction (pre-service group), respectively, had the lowest ratings of all 16 factors.
pre-service teachers had significantly higher effect sizes for the variables feedback, prior achievement, motivation, attitude, class size, co-/team teaching, within-class grouping, and open learning. in-service teachers’ beliefs showed higher effect sizes for direct instruction, high expectations, and self-concept.

table 2
hattie’s effect sizes (d), group means (m), and standard deviations (sd)

factor | d (hattie) | m in-service (sd) | m pre-service (sd)
feedback | .73 | .55 (.23) | .62 (.24) a
teaching meta-cognitive strategies | .69 | .53 (.25) | .53 (.26)
prior achievement | .67 | .32 (.23) | .39 (.24) a
professional development | .62 | .40 (.23) | .42 (.25)
direct instruction | .59 | .28 (.23) a | .18 (.25)
motivation | .48 | .63 (.24) | .73 (.22) a
expectations | .43 | .36 (.25) a | .23 (.29)
self-concept | .43 | .55 (.21) a | .48 (.24)
attitude | .36 | .56 (.24) | .66 (.21) a
frequent/effects of testing | .34 | .34 (.24) | .34 (.26)
class size | .21 | .34 (.31) | .59 (.30) a
co-/team teaching | .19 | .37 (.27) | .45 (.28) a
within-class grouping | .16 | .48 (.27) | .61 (.24) a
problem-based learning | .15 | .52 (.23) | .52 (.25)
multi-grade/age classes | .04 | .19 (.26) | .22 (.27)
open learning | .01 | .29 (.27) | .37 (.27) a

note. a = statistically significantly (p < .01) higher mean effect size for the respective group (results of two-sample independent t-tests using bonferroni correction to account for the multiple comparisons problem).

we determined bivariate correlations for each teacher’s rating profile on the one hand and hattie’s results on the other by calculating pearson’s r for each person. the coefficients were transformed by fisher’s z and aggregated per group. these means were then compared using an independent-sample t-test. for the pre-service teachers the mean (fisher z-transformed) correlation with hattie’s d was mz = .06 (sd = .32), for the in-service teachers it was mz = .23 (sd = .35).
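the profile-correlation step can be sketched with the values printed in table 2. note the important simplification: the study correlated each individual’s 16 ratings with hattie’s d and then averaged the fisher z values per group, whereas this sketch, lacking the individual data, correlates the group mean profiles with hattie’s d. the resulting numbers are therefore not expected to reproduce the reported mz values; the helper names are hypothetical.

```python
from math import atanh, sqrt

# Values copied from table 2, in row order (feedback ... open learning).
D_HATTIE = [.73, .69, .67, .62, .59, .48, .43, .43, .36, .34, .21, .19, .16, .15, .04, .01]
M_IN_SERVICE = [.55, .53, .32, .40, .28, .63, .36, .55, .56, .34, .34, .37, .48, .52, .19, .29]
M_PRE_SERVICE = [.62, .53, .39, .42, .18, .73, .23, .48, .66, .34, .59, .45, .61, .52, .22, .37]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


# Assumption: group mean profiles stand in for the individual rating profiles.
r_in = pearson_r(M_IN_SERVICE, D_HATTIE)
r_pre = pearson_r(M_PRE_SERVICE, D_HATTIE)
z_in, z_pre = fisher_z(r_in), fisher_z(r_pre)

if __name__ == "__main__":
    print(f"in-service:  r = {r_in:.2f}, z = {z_in:.2f}")
    print(f"pre-service: r = {r_pre:.2f}, z = {z_pre:.2f}")
```

with individual-level data, one would instead compute `fisher_z(pearson_r(ratings, D_HATTIE))` once per participant and compare the two groups’ z values with an independent-sample t-test, as described in section 3.3.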
the group difference in mean fisher z was statistically significant (t[715] = 7.12; p < .001; d = .51), indicating a substantially higher degree of conformity of the experts’ ratings with hattie’s results.

4.2 cfa and multiple-group analysis

a priori, for the item pool selected for this study we assumed three latent factors based on hattie’s categorization of indicators: school, teaching/teacher and student (χ²[95] = 586.36; cfi = .85; rmsea = .08; tli = .81; srmr = .08). empirically, however, a four-dimensional structure resulted in a better model fit (χ²[92] = 340.85; cfi = .92; rmsea = .06; tli = .90; srmr = .05). the factor teaching was split in two, separating the strongly achievement-focused indicators from more specific instructional teaching behaviors. the improvement in goodness-of-fit indices was substantial (δcfi > .01; δrmsea > .015) (cheung & rensvold, 2002), so we decided on the four-dimensional model (see table 3). residual correlations were allowed for some indicators with substantial covariance that was not explained by the latent factor.

all items loaded significantly (p < .001) and almost all items loaded substantially (λ ≥ .4) on one of the latent factors. the only items with factor loadings slightly below the minimum value were prior achievement (λ = .37) and multi-grade/age classes (λ = .37). the former is the only indicator for the factor student that focuses on cognitive rather than motivational aspects of a student’s academic prerequisites. this might explain the low factor loading. the latter may have been a difficult concept for many of the participants, as far from all teachers encounter this instructional challenge throughout their careers.

due to high modification indices, residual correlations were allowed for six item pairs (multi-grade/age classes with open learning and co-/team teaching, class size with co-/team teaching and within-class grouping, teaching meta-cognitive strategies with feedback, motivation with attitude).
the majority of these modifications were performed within the factor structure. they seemed to be theoretically sound as certain infrastructural conditions of schooling (class size, multi-grade/age classes) are strongly associated with or even demand certain surface-structural aspects of learning (open learning; co-/team teaching; within-class grouping). teaching meta-cognitive strategies and feedback are both direct and concrete instructional measures of the teacher; motivation and attitude towards subject refer to very similar student-internal constructs (as opposed to the other respective indicators). the allowed covariances were the same for both models (three and four latent factors) that were tested.

table 3
standardized factor loadings matrix of the cfa

no. | item | teaching | achievement | school | student
1 | feedback | 0.62 | | |
2 | teaching meta-cognitive strategies | 0.68 | | |
3 | professional development | 0.70 | | |
4 | problem-based learning | 0.69 | | |
5 | direct instruction | | 0.52 | |
6 | expectations | | 0.69 | |
7 | frequent/effects of testing | | 0.48 | |
8 | multi-grade/age classes | | | 0.37 |
9 | open learning | | | 0.65 |
10 | class size | | | 0.51 |
11 | co-/team teaching | | | 0.56 |
12 | within-class grouping | | | 0.77 |
13 | motivation | | | | 0.67
14 | prior achievement | | | | 0.37
15 | self-concept | | | | 0.50
16 | attitude towards subject | | | | 0.57

in the following we will explain the four dimensions in more detail: the factor school comprises infrastructural conditions of schooling (class size; multi-grade/age classes) and the surface-structure of learning (open learning; co-/team teaching; within-class grouping). the factor teaching contains manifest variables concerning instructional methods (feedback; teaching meta-cognitive strategies; problem-based learning) and the teacher (professional development), while the factor achievement emphasizes achievement-focused and teacher-centered variables (direct instruction; teacher expectations; frequent/effects of testing).
student-internal prerequisites (motivation; self-concept; prior achievement; attitude) constitute the factor student.

table 4
correlation matrix of latent factor model

 | achievement | school | student
teaching | .41** | .70** | .66**
achievement | | -.16** | .17*
school | | | .78**

note. **p < .01; *p < .05

table 4 shows bivariate correlations of the four latent variables, which were all statistically significant. all coefficients were positive apart from the one between achievement and school, which showed a negative relationship. the intercorrelations were strongest for teaching/school, teaching/student, and school/student. the factor achievement consistently showed the weakest relationships with all the other factors. these results indicate that, in general, teachers had a tendency to believe in either high or low effect sizes for factors of school effectiveness. however, achievement-related variables seemed to be the exception.

table 5
measurement invariance across groups (pre- and in-service teachers)

model | parameters constrained | χ² | df | cfi | tli | rmsea | srmr
1 | none (configural invariance) | 405.22 | 184 | .932 | .912 | .058 | .055
2 | fl (metric invariance) | 428.09 | 196 | .929 | .913 | .057 | .057
3 | fl, il (scalar invariance) | 627.50 | 208 | .872 | .852 | .075 | .069
3b | fl, il (partial scalar invariance) | 448.51 | 205 | .925 | .913 | .058 | .058

note. fl = factor loadings. il = item intercepts. for model identification in model 1 and 2 (item intercepts freely estimated) latent means were fixed to zero. cfi = comparative fit index. tli = tucker-lewis index. rmsea = root mean square error of approximation. srmr = standardized root mean square residual.

the model showed partial scalar invariance (strong measurement invariance; see table 5), which indicated that factor structure and factor loadings as well as most item intercepts were equal for both groups. pre- and in-service teachers attributed the same meaning to the latent constructs and the levels of the underlying items.
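the stepwise reading of table 5 can be made explicit with a small sketch that compares each more constrained model with its less constrained predecessor. the fit values are copied from table 5; the cut-offs (a cfi drop of at most .01 and an rmsea rise of at most .015) are borrowed from the thresholds quoted in section 4.2 and are an assumption here, since the text does not state which criteria were applied to the invariance steps. the function and dictionary names are hypothetical.

```python
# Fit statistics copied from table 5.
MODELS = {
    "configural": {"chi2": 405.22, "df": 184, "cfi": .932, "rmsea": .058},
    "metric": {"chi2": 428.09, "df": 196, "cfi": .929, "rmsea": .057},
    "scalar": {"chi2": 627.50, "df": 208, "cfi": .872, "rmsea": .075},
    "partial_scalar": {"chi2": 448.51, "df": 205, "cfi": .925, "rmsea": .058},
}


def compare(restricted, baseline, max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Change in fit when moving from `baseline` to the more constrained `restricted` model.

    The default cut-offs are assumptions for illustration, not the study's stated criteria.
    """
    r, b = MODELS[restricted], MODELS[baseline]
    # Round to three decimals so that floating-point noise cannot flip a borderline decision.
    delta_cfi = round(r["cfi"] - b["cfi"], 3)
    delta_rmsea = round(r["rmsea"] - b["rmsea"], 3)
    return {
        "delta_chi2": round(r["chi2"] - b["chi2"], 2),
        "delta_df": r["df"] - b["df"],
        "delta_cfi": delta_cfi,
        "delta_rmsea": delta_rmsea,
        "tenable": delta_cfi >= -max_cfi_drop and delta_rmsea <= max_rmsea_rise,
    }


if __name__ == "__main__":
    for restricted, baseline in [
        ("metric", "configural"),
        ("scalar", "metric"),
        ("partial_scalar", "metric"),
    ]:
        print(restricted, "vs", baseline, compare(restricted, baseline))
```

under these assumed cut-offs, constraining the factor loadings barely changes the fit, constraining all item intercepts worsens it clearly, and the partially constrained model 3b stays close to the metric model, which matches the conclusion drawn in the text.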
thus, the proposed factor model can be assumed to represent the belief structure of pre-service as well as in-service teachers.

strong measurement invariance allows for the comparison of latent group means. in the latent mean structure analysis the pre-service teachers were chosen as a reference group so that the difference in means between pre- and in-service teachers on each construct equals the mean of the non-reference group (in-service teachers). the means of the in-service teachers are as shown in table 6. they valued the achievement-related factor considerably higher, while rating the effects of infra-/surface-structure and student-internal variables lower than pre-service teachers. the group differences concerning the factor teacher/teaching were not significant.

table 6
mean group differences in latent variables

factor | m difference | p
teaching | -0.12 | ns
achievement | 0.67 | <.001
school | -0.56 | <.001
student | -0.70 | <.001

note. m pre-service = 0; m in-service = m difference.

5. discussion

5.1 summary

the present paper deals with expert and novice teachers’ beliefs about school effectiveness. we investigated the differences between (a) teachers’ beliefs versus findings of school effectiveness research (cf. hattie, 2009), and (b) expert versus novice teachers’ beliefs. for this purpose pre- and in-service teachers were asked to rate the effect sizes of several determinants of student achievement. profile correlations were aggregated and compared in terms of similarity to recent empirical findings (hattie, 2009). we found significant differences between pre- and in-service teachers, as the latter showed a stronger overall congruence with hattie’s results. subsequently, data were combined by using a four-dimensional cfa model with the latent factors school, teaching, achievement, and student. partial measurement invariance could be established allowing the comparison of latent factor means of the two groups.
in-service teachers showed higher means in achievement-focused variables (e.g., direct instruction, and high expectations) and lower means in variables concerning the infra- and surface-structural conditions of schooling (e.g., class size, co-/team teaching, within-class grouping, and open learning) as well as student-internal variables (e.g., prior achievement, motivation, and attitude).

the structure of teachers’ beliefs concerning school effectiveness seemed to resemble the a priori categorization of relevant research studies. the assumed categorical structure (school, teaching, and student) used by hattie (2009) to organize his metasynthesis was largely met by the data. an exception was the separation of the factor teaching into teaching and achievement. teachers seemed to distinguish two types of instructional preferences: one that foregrounds the support of students’ learning (feedback, teaching meta-cognitive strategies) and one that focuses mainly on cognitive achievement (high expectations, frequent/effects of testing). while school effectiveness research has shown that both have a strong positive influence on student achievement (see chapter 1.3), teachers undervalued instructional choices that point directly and explicitly at academic achievement.

one key finding of this study was that in-service experience and training seem to be associated with teachers’ beliefs about school effectiveness. the beliefs of experienced teachers were more consistent with empirical results than those of novice teachers. this is in line with the current theoretical expert(-novice) paradigm, according to which teachers develop expertise in the course of their education and career. first of all, the ratings of our in-service teachers suggested that they value, more than the pre-service teachers do, a kind of activating, teacher-directed instruction that is supposed to affect the deep structure of classroom learning.
second, in comparison to the novices they believed infra- and surface-structural variables to be not as relevant. this hierarchy is also emphasized by the results of school effectiveness research as outlined in chapter 1.3. this agreement suggests that expert teachers are more competent in assessing the effects of a range of variables than novice teachers. furthermore, it suggests that educational research is not as far from a teacher’s perceived everyday reality as is often suggested.

in turn, we must acknowledge that some of the beliefs concerning influences on student achievement – particularly (but not only) those of pre-service teachers – diverge from empirical findings quite dramatically. similarly to weinstein (1989), we found that affective (e.g. motivation, attitude) and social (e.g. within-class grouping) variables were overestimated, while cognitive variables (e.g. direct instruction, prior achievement) were underestimated, especially by the pre-service teachers, but also – to a lesser extent – by in-service teachers. this insight is quite valuable considering the impact of beliefs on classroom instruction. hence, our results call for a paradigmatic change in the way teachers are trained. in the following we consider the limitations of this study before we conclude with its strengths and practical implications for the field of teacher education.

5.2 limitations

one problem with the interpretation of our results was the epistemological status of the information given by the participants: did we actually elicit their beliefs or rather their theoretical knowledge about what should be correct? as we mentioned above, the confrontation with objective data, such as the findings of empirical investigations, does not necessarily lead to a change in individual beliefs: knowing something does not equal believing in it.
so the response of the teachers to a certain item may not have shown their beliefs but may instead represent the knowledge they have about school effectiveness research. hence, the epistemological distinction between beliefs and knowledge that we addressed above could not be followed through completely. similarly, the issues of tacit knowledge, its accessibility and the reciprocal relationship of implicit and explicit knowledge are highly relevant topics that go beyond the scope of this study.

even if we assume that we are dealing with teachers’ beliefs (not knowledge), we cannot be certain that those beliefs are actually put into practice. in our report of prior research on teachers’ beliefs we pointed out that beliefs are considered to affect classroom processes and, in turn, student outcomes. however, the assumption that teachers apply everything they believe to classroom practice would go too far (belief-behavior gap; cf. sheeran, 2002). so a general identification of “good beliefs” with “good teacher” is too simplistic. this issue requires further research that examines the relevance of teachers’ beliefs (concerning factors of school effectiveness) for instructional choices and student achievement.

another drawback is that in the present study hattie’s findings served as a kind of “quasi-reality”. despite all its merits, as a synthesis of meta-analyses it also poses some methodological difficulties (for a more extensive discussion see terhart, 2011), which limit the interpretability of the discrepancy between “real” and teacher-estimated effect sizes. thus, even though we concentrated on those variables that have shown similar results in other school effectiveness studies, the comparison with hattie’s results is to be interpreted with caution.
the same holds for the comparison of pre- and in-service teachers: as our data was not longitudinal, strictly speaking we cannot interpret the discrepancy between the groups as a development or acquisition of competence. additionally, in order to validate and stabilize the latent factor structure, further investigations with a larger number of variables from hattie’s study are needed.

the generalizability of our results to other countries is limited for two different reasons: firstly, the intercultural generalizability of hattie’s study is already questionable. it mainly relies on research findings from anglophone countries, not all of which are equally relevant for education in germany and for the beliefs of german teachers. secondly, we also have to take into account that our sampling was restricted to the german education system and, moreover, to one specific teacher training program. beliefs are rather likely to differ in terms of structure- and content-related aspects of teacher education in different countries and institutions. furthermore, they are subject to the more general cultural and academic situation in a certain time and place. for example, the acceptance and appreciation of empirical research may differ considerably from country to country and from faculty to faculty. in case of this study, the fact that german teacher training programs generally focus on the study of two scientific disciplines rather than on educational science and field experience might impact the beliefs of teachers. the question of differences in teachers’ beliefs according to differences in their (culture- and program-specific) education and practice would be an interesting topic for future research.

our research was restricted to teachers’ beliefs concerning cognitive achievement. unfortunately, hattie’s work does not identify determinants of motivation, attitude, self-concept or other affective variables.
a study that examines determinants of affective outcomes to the extent that hattie did this for cognitive outcomes is still missing in educational effectiveness research. thus, there is no basis for research on teachers’ beliefs concerning those variables yet.

5.3 strengths and educational implications

especially the beliefs of pre-service teachers differed significantly from the results of hattie’s research synthesis. of course neither hattie’s findings nor the beliefs of expert teachers can be taken as ultimately true or as factual reality. however, they both emphasize similar aspects of schooling: the role of the teacher as an activator (rather than a facilitator), the importance of academic achievement and the comparatively little significance of structural conditions. if teacher educators address these issues explicitly and confront their students with their own beliefs as well as with the findings of school effectiveness research, they can help (prospective) teachers to focus on what has so far been shown to work best in school.

the fact that pre-service teachers’ beliefs diverged from empirical findings more strongly than those of experienced teachers suggests that learning opportunities in the field do make a difference. the experience that teachers gain in their years of classroom practice seems to affect their judgment in a way that is beneficial for their belief systems. one could argue that the constant feedback they get from their students’ performance in terms of (successful and unsuccessful) interventions helps them challenge and adjust their beliefs where necessary. in-service teachers’ focus on instructional strategies and factors that support the deep structure of learning might thus be a reaction to their (more or less systematic) observation and monitoring of what actually works in their own classroom.
pre-service teachers, however, are missing this direct feedback in terms of actual student outcomes as their confrontation with actual classroom situations is very limited. holding on to established beliefs might be a result of them not being challenged by reality in the field. moreover, one could expect novice teachers to be quite overwhelmed by their first tentative efforts in teaching. the demands they have to meet in the classroom are manifold and they need to concentrate on various things at once. in such a state, focusing on the surface structure of learning seems easier than focusing on deep-structural aspects. only with substantial practice and experience, when other processes come to them more naturally and intuitively, do teachers get the opportunity to pay attention to those instructional details that have actually been shown to work in school.

early and regular work experience in school during teacher training might aid the acquisition of the necessary professionalism, as it presents the opportunity to familiarize pre-service teachers with real classroom situations and their role as a teacher. however, this should be realized only with respect to the current state of research, as we have little reason to assume that practical school experiences for pre-service teachers automatically lead to better teaching abilities or a better understanding of the purposes and consequences of teaching (tabachnik et al., 1979-1980). practical experience in school does not automatically make better teachers. this also applies to their beliefs: studies on short-term work experiences during teacher training have shown the resilience of teachers’ beliefs to change (hascher, 2012; richardson, 1996). in order to avoid this misguided process, these early experiences need to be instructed and accompanied by professional teacher trainers.
if planned and exercised carefully, practical teaching experience during teacher training at university can lay a solid foundation for a teacher’s career.

the other central finding of this study was the fundamental discrepancy between teachers’ beliefs and empirical evidence from school effectiveness research. to some extent, this might be due to shortcomings of teacher training programs in conveying to future teachers the importance of evidence-based practice. the australian educational scholar and administrator michele bruniges puts the lack of data usage in the teaching profession into the following words: “a greek philosopher might suggest that evidence is what is observed, rational and logical; a fundamentalist – what you know is true; a post modernist – what you experience; a lawyer – material which tends to prove or disprove the existence of a fact and that is admissible in court; a clinical scientist – information obtained from observations and/or experiments; and a teacher – what they see and hear” (bruniges, 2005; p. 102). while systematic observation and monitoring of students’ learning processes are very desirable actions to be taken by teachers, “seeing and hearing” should not be the only sources for their professional choices and actions. our study supports the claim that teachers rarely rely on available research evidence. their assessment of what actually works in school rather seems to be guided by subjective experiences that are usually gained in the isolation of their own classrooms.

but it would certainly be wrong to lay all the blame on the teachers: what we need is an evidence-based culture of improvement in teaching and learning. in order to achieve this goal, three professions in the field of education need to assume responsibility: researchers, teacher trainers, and teachers themselves. first of all, educational researchers are confronted with the issue of making their findings available to teachers.
more often than it is already done, they should break rather abstract studies down to what is of practical relevance for the field. such efforts may counteract aversion to empirical research on the side of the teachers. with his follow-up book “visible learning for teachers”, hattie sets a good example for this kind of transfer. secondly, those who educate and train pre-service teachers need to make sure their students are familiar with relevant research findings, can interpret them appropriately, and have the necessary skills to implement them in school. in addition to assessing students’ knowledge, they should be attentive to their beliefs and make room for critical discussion of empirical versus anecdotal evidence. explicitly addressing the issue of teachers’ beliefs and confronting (future) teachers with cognitive dissonance might support a critical reflection and examination of existing beliefs. last but not least, it is a necessity for teachers themselves to stay in touch with research communities in order to understand current developments and to constantly reflect on their beliefs in comparison with crucial evidence provided by researchers.

keypoints

teachers’ beliefs diverge from empirical evidence
expert teachers’ beliefs diverge from novice teachers’ beliefs
expert teachers show more congruence with empirical evidence than novice teachers
expert teachers believe in the effectiveness of achievement-related variables
novice teachers believe in the effectiveness of structural and student factors

references

allen, l. q. (2002). teachers’ pedagogical beliefs and the standards for foreign language learning. foreign language annals, 35, 518-529. http://dx.doi.org/10.1111/j.1944-9720.2002.tb02720.x
bandura, a. (1997). self-efficacy: the exercise of control. new york: freeman.
barcelos, a. m. f. (2003). researching beliefs about sla: a critical review. in p. kalaja and a. m. f.
barcelos (eds.), beliefs about sla: new research approaches (pp. 7-33). dordrecht: kluwer academic publishers. http://dx.doi.org/10.1007/978-1-4020-4751-0_1
bates, a. b., latham, n. & kim, j. (2011). linking preservice teachers' mathematics self-efficacy and mathematics teaching efficacy to their mathematical performance. school science and mathematics, 111, 325-333. http://dx.doi.org/10.1111/j.1949-8594.2011.00095.x
bauer, j. & prenzel, m. (2012). european teacher training reforms. science, 336, 1642-1643. http://dx.doi.org/10.1126/science.1218387
baumert, j., & kunter, m. (2006). stichwort: professionelle kompetenz von lehrkräften. zeitschrift für erziehungswissenschaft, 9, 469-520. http://dx.doi.org/10.1007/s11618-006-0165-2
berliner, d. (2004). describing the behavior and documenting the accomplishments of expert teachers. bulletin of science, technology and society, 24, 200-214. http://dx.doi.org/10.1177/0270467604265535
borg, m. (2001). teachers’ beliefs. elt journal, 55, 186-188. http://dx.doi.org/10.1093/elt/55.2.186
borg, s. (2003). teacher cognition in language teaching: a review of research on what language teachers think, know, believe, and do. language teaching, 36, 81-109. http://dx.doi.org/10.1017/s0261444803001903
bromme, r. (1997). kompetenzen, funktionen und unterrichtliches handeln des lehrers. in f. e. weinert (ed.), psychologie des unterrichts und der schule. enzyklopaedie der psychologie (vol. 3, pp. 177-212). goettingen: hogrefe.
bromme, r. (2003). on the limitations of the theory metaphor for the study of teachers' expert knowledge. in m. kompf & p. denicolo (eds.), teacher thinking twenty years on: revisiting persisting problems and advances in education (pp. 283-294). liss, nl: swets & zeitlinger.
bruniges, m. (2005). an evidence-based approach to teaching and learning. http://research.acer.edu.au/research_conference_2005/15.
byrne, b. m., shavelson, r. j., & muthén, b. (1989).
testing for the equivalence of factor covariance and mean structures: the issue of partial measurement invariance. psychological bulletin, 105, 456-466. http://dx.doi.org/10.1037/0033-2909.105.3.456
calderhead, j. (1996). teachers: beliefs and knowledge. in d. berliner & r. calfee (eds.), handbook of educational psychology (pp. 709-725). new york: macmillan.
cheung, g. w., & rensvold, r. b. (2002). evaluating goodness-of-fit indexes for testing mi. structural equation modeling, 9, 235-55. http://dx.doi.org/10.1207/s15328007sem0902_5
clark, c. m. (1988). asking the right questions about teacher preparation: contributions of research on teacher thinking. educational researcher, 17, 5-12. http://dx.doi.org/10.3102/0013189x017002005
cochran-smith, m., & zeichner, k. (eds.). (2005). studying teacher education: the report of the aera panel on research and teacher education. mahwah: lawrence erlbaum associates.
coleman, j. s., campbell, e. q., hobson, c. f., mcpartland, j., mood, a. m., weifeld, f. d., et al. (1966). equality of educational opportunity. washington, d.c.: u.s. government printing office.
cotton, k. (1995). effective schooling practices: a research synthesis. 1995 update. school improvement research series. northwest regional educational laboratory.
darling-hammond, l., & bransford, j. (eds.). (2005). preparing teachers for a changing world. what teachers should learn and be able to do. san francisco: jossey-bass.
ericsson, k. a. (2006). the influence of experience and deliberate practice on the development of superior expert performance. in k. a. ericsson, n. charness, p. feltovich, & r. r. hoffman (eds.), cambridge handbook of expertise and expert performance (pp. 685-706). cambridge, uk: cambridge university press.
ericsson, k. a., & lehmann, a. c. (1996). expert and exceptional performance: evidence of maximal adaptations to task constraints. annual review of psychology, 47, 273-305. http://dx.doi.org/10.1146/annurev.psych.47.1.273
ericsson, k.
a., charness, n., feltovich, p. j., & hoffman, r. r. (2006). the cambridge handbook of expertise and expert performance. cambridge: cambridge university press. http://dx.doi.org/10.1017/cbo9780511816796
fang, z. (1996). a review of research on teacher beliefs and practices. educational research, 38, 47-65. http://dx.doi.org/10.1080/0013188960380104
fraser, b. j., walberg, h. j., welch, w. w., & hattie, j. a. (1987). syntheses of educational productivity research. international journal of educational research, 11, 147-252. http://dx.doi.org/10.1016/0883-0355(87)90035-8
freeman, d. (2002). the hidden side of the work: teacher knowledge and learning to teach. language teaching, 35, 1-13. http://dx.doi.org/10.1017/s0261444801001720
haney, j. j., lumpe, a. t., & czerniak, c. m. (2003). constructivist beliefs about the science classroom learning environment: perspectives from teachers, administrators, parents, community members, and students. school science and mathematics, 103, 366-377. http://dx.doi.org/10.1111/j.1949-8594.2003.tb18122.x
hart, l. c. (2004). beliefs and perspectives of first-year, alternative preparation, elementary teachers in urban classrooms. school science & mathematics, 104, 79-88. http://dx.doi.org/10.1111/j.1949-8594.2004.tb17985.x
harvey, o. j. (1986). belief systems and attitudes toward the death penalty and other punishments. journal of psychology, 54, 143-159. http://dx.doi.org/10.1111/j.1467-6494.1986.tb00418.x
hascher, t. (2012). lernfeld praktikum – evidenzbasierte entwicklungen in der lehrer/innenbildung. [learning setting student teaching – evidence-based developments in teacher education]. zeitschrift für bildungsforschung, 2, http://dx.doi.org/10.1007/s35834-012-0032-6
hattie, j. (2003). teachers make a difference: what is the research evidence? paper presented at the australian council for educational research annual conference on building teacher quality.
hattie, j. (2009). visible learning.
a synthesis of over 800 meta-analyses relating to achievement. london & new york: routledge. hattie, j. (2012). visible learning for teachers. london: routledge. fleckenstein et al f | f l r 44 hogan, t., rabinowitz, m., & craven, j. a. (2003). representation in teaching: inferences from research of expert and novice teachers. educational psychologist, 38, 235-247. http://dx.doi.org/10.1207/s15326985ep3804_3 jencks, c., smith, m. s., ackland, h., bane, m. j., cohen, d., grintlis, h., et al. (1973). inequality: a reassessment of the effect of family and schooling in america. new york: basic books. köller, o. (2012). what works best in school? hatties befunde zu effekten von schulund unterrichtsvariablen auf schulleistungen. [what works best in school? hattie’s findings concerning school and instructional effectiveness on student achievement]. psychologie in erziehung und unterricht, 59, 72-78. http://dx.doi.org/10.2378/peu2012.art06d kunter, m., & pohlmann, b. (2009). lehrer. in j. möller & e. wild (eds.), einführung in die pädagogische psychologie (pp. 261-282). berlin: springer. http://dx.doi.org/10.1007/978-3-540-88573-3_11 leatham, k. (2006). viewing mathematics teachers’ beliefs as sensible systems. journal of mathematics teacher education, 9, 91-102. http://dx.doi.org/10.1007/s10857-006-9006-8 levine, d.k., & lezotte, l.w. (1990). unusually effective schools: a review and analysis of research and practice. madison, wise: nat. centre for effective schools research and development. lortie, d. (1975). schoolteacher: a sociological study. chicago: university of chicago press. meredith, w., & teresi, j. a. (2006). an essay on measurement and factorial invariance. medical care, 44, 69-77. http://dx.doi.org/10.1097/01.mlr.0000245438.73837.89 muijs, r. d., & rejnolds, d. (2001). teachers' beliefs and behaviors: what really matters. journal of classroom interaction, 37, 3-15. muthén, l.k. and muthén, b.o. (1998-2012). mplus user’s guide. seventh edition. 
los angeles, ca: muthén & muthén. national board for professional teaching standards (2002). what teachers should know and be able to do. arlington. nespor, j. (1987). the role of beliefs in the practice of teaching. journal of curriculum studies, 19, 317-328. http://dx.doi.org/10.1080/0022027870190403 pajares, m. f. (1992). teachers' beliefs and educational research: cleaning up a messy construct. review of educational research, 62, 307-332. http://dx.doi.org/10.3102/00346543062003307 peterson, p., fennema, e., carpenter, t. p., & loef, m. (1989). teachers' pedagogical content beliefs in mathematics. cognition and instruction, 6, 1-40. http://dx.doi.org/10.1207/s1532690xci0601_1 purkey, s.c., & smith, m.s. (1983). effective schools: a review. the elementary school journal, 83, 427452. http://dx.doi.org/10.1086/461325 richardson, v. (1996). the role of attitudes and beliefs in learning to teach. in j. sikula, t. buttery, & e. guyton (eds.), handbook of research on teacher education (pp. 137-147). new york: macmillan. ross, j. a. (1992). teacher efficacy and the effect of coaching on student achievement. canadian journal of education, 17, 51-65. http://dx.doi.org/10.2307/1495395 ross, j. a. (1998). the antecedents and consequences of teacher efficacy. in j. brophy (ed.), advances in research on teaching, vol. 7 (pp. 49-74). greenwich, ct: jai press. sammons, p., hillman, j., & mortimore, p. (1995). key characteristics of effective schools: a review of school effectiveness research. london: ofsted. scheerens, j. (1992). effective schooling, research, theory and practice. london: cassell. scheerens, j. (2004). review of school and instructional effectiveness research. paper commissioned for the efa global monitoring report 2005, the quality imperative. unesco, 2005/ed/efa/mrt/pi/44. scheerens, j., & bosker, r. j. (1997). the foundations of educational effectiveness. oxford: elsevier science ltd. seidel, t., & shavelson, r. j. (2007). 
teaching effectiveness research in the past decade: the role of theory and research design in disentangling meta-analysis results. review of educational research, 77, 454499. http://dx.doi.org/10.3102/0034654307310317 sheeran, p. (2002). intention-behavior relations: a conceptual and empirical review. in w. strobe and m. hewstone (eds.) european review of social psychology, vol. 12. chichester: wiley, 1-30. http://dx.doi.org/10.1080/14792772143000003 fleckenstein et al f | f l r 45 shulman, l. s. (1986). those who understand: knowledge growth in teaching. educational researcher, 15, 4-14. http://dx.doi.org/10.3102/0013189x015002004 shulman, l. s. (1987). knowledge and teaching: foundations of the new reform. harvard educational review, 57, 1-22. silver, n. c., & dunlap, w. p. (1987). averaging correlation coefficients: should fisher’s z transformation be used? journal of applied psychology, 72, 146-148. http://dx.doi.org/10.1037/0021-9010.72.1.146 staub, f. c., & stern, e. (2002). the nature of teachers’ pedagogical content beliefs matters for students’ achievement gains: quasi-experimental evidence. journal of educational psychology, 94, 344-355. http://dx.doi.org/10.1037/0022-0663.94.2.344 stuart, c., & thurlow, d. (2000). making it their own: pre-service teachers’ experiences, beliefs, and classroom practices, journal of teacher education, 51(2), 112-121. http://dx.doi.org/10.1177/002248710005100205 tabachnik, b. r., popkewitz, t., & zeichner, k. (1979-1980). teacher education and the professional perspectives of student teachers. interchange, 80, 12-29. http://dx.doi.org/10.1007/bf01810816 terhart, e. (2011). has john hattie really found the holy grail of research on teaching? an extended review of visible learning. journal of curriculum studies, 43(3), 425-438. http://dx.doi.org/10.1080/00220272.2011.576774 tschannen-moran, m., & woolfolk hoy, a. (2001). teacher efficacy: capturing an elusive concept. teaching and teacher education, 17, 783-805. 
http://dx.doi.org/10.3102/00346543068002202 tschannen-moran, m., woolfolk hoy, a., & hoy, w. k. (1998). teacher efficacy: its meaning and measure. review of educational research, 68, 202-248. verloop, n., van driel, j., & meijer, p. (2001). teacher knowledge and the knowledge base of teaching. international journal of education research, 35, 441-461. http://dx.doi.org/10.1016/s08830355(02)00003-4 walberg, h. j. (1986). synthesis of research on teaching. in m. c. wittrock (ed.), handbook of research on teaching. new york: macmillan. wang, m. c., haertel, g. d. & walberg, h. j. (1990). what influences learning? a content analysis of review literature. journal of educational research, 84, 30-43. http://dx.doi.org/10.1080/00220671.1990.10885988 wang, m. c., haertel, g. d. & walberg, h. j. (1993). toward a knowledge base for school learning. review of educational research, 63, 249-294. http://dx.doi.org/10.3102/00346543063003249 weinstein, c. s. (1989). teacher education students' preconceptions of teaching. journal of teacher education, 40, 53-60. http://dx.doi.org/10.1177/002248718904000210 wenden, a. l. (1999). an introduction to metacognitive knowledge and beliefs in language learning: beyond the basics. system, 27, 435-441. http://dx.doi.org/10.1016/s0346-251x(99)00043-3 woods, d. (1996). teacher cognition in language teaching: beliefs, decision-making, and classroom practice. cambridge: cambridge university press. woolfolk, a. e., & hoy,w. k. (1990). prospective teachers' sense of efficacy and beliefs about control. journal of educational psychology, 82, 81-91. http://dx.doi.org/10.1037/0022-0663.82.1.81 woolfolk, a. e., rosoff, b., & hoy, w. k. (1990). teachers' sense of efficacy and their beliefs about managing students. teaching and teacher education, 6, 137-148. woolfolk hoy, a., & davis, h. a. (2006). teacher self-efficacy and its influence on the achievement of adolescents. in f. pajares & t. urdan (eds.), self-efficacy of adolescents (pp. 117-137). 
frontline learning research 6 (2014) 46-66
issn 2295-3159
corresponding author: fang zhao, department of psychology, university of koblenz-landau, germany, email: zhao@unilandau.de
doi: http://dx.doi.org/10.14786/flr.v2i4.98

eye tracking indicators of reading approaches in text-picture comprehension

fang zhao a, wolfgang schnotz a, inga wagner a, robert gaschler a
a university of koblenz-landau, germany

article received 21 march 2014 / revised 4 august 2014 / accepted 21 october 2014 / available online 27 october 2014

abstract
despite numerous studies on reading and multimedia comprehension, the usage of text and picture with different reading strategies has rarely become a focus of research. the current study aims to explore whether the usage of text differs from the usage of picture when readers follow different strategies of knowledge acquisition. in a within-subjects design using eye tracking, seventeen secondary school students comprehended blended text and picture materials with three different strategies.
(1) initial coherence-formation strategy, which requires students to process text and picture unguidedly.
(2) consecutive task-oriented strategy, which requires them to gather information to answer the question (which explains the task) provided after prior experience with text and picture.
(3) initial task-oriented strategy, which requires them to comprehend text and picture to solve the task equipped with the prior information about the question from the beginning.
eye tracking data showed that text and picture play different roles in these processing conditions.
(1) the results are in line with the assumption that text (rather than picture) is more likely used to construct mental models in initial coherence-formation processing of text and picture.
(2) students seem to primarily rely on the picture to answer the question after the prior experience with the material with consecutive task-oriented strategy.
(3) text and picture are both used heavily when the question is presented first, enabling students to selectively process question-relevant aspects of the material at first contact.

keywords: multimedia learning; mental model; eye tracking

after learning to read in primary school, students are required to use their reading skills for learning from written materials. in secondary school, these materials usually include text and different kinds of pictures, such as maps and diagrams. students therefore need skills for integrating text and picture information in order to build the required knowledge structures (ainsworth, 1999; kintsch & van dijk, 1978). to support students’ learning from text and pictures, we need sophisticated knowledge about the usage of text and pictures under different learning conditions. unfortunately, there is so far not much knowledge available about integrated processing of text and pictures.

abundant studies have explored reading strategies in text comprehension (e.g.
anderson & pearson, 2002; frase, 1967, 1968; rothkopf, 1964, 1966; rouet, 2006). the aim of reading can fundamentally influence cognitive processing due to the application of different reading strategies (andre, 1979; hamilton, 1985; rickards, 1979). however, there has not been much research about reading strategies for combined comprehension of text and pictures. the current study aims at exploring whether the usage of texts differs from the usage of pictures when readers follow different strategies of knowledge acquisition.

first, we will introduce a theoretical framework of text-picture comprehension. second, we will formulate research questions and derive hypotheses about the usage of text and pictures under the condition of different reading strategies. third, we will describe a study that was designed to test these hypotheses. fourth, we will present the results in the light of the previously mentioned hypotheses. fifth, we will discuss the empirical findings and analyze their relations to findings from the previous literature.

1. theory

1.1 theories of text-picture comprehension

many theories have focused on text-picture comprehension (tpc; kress & leeuwen, 1996; rouet & britt, 2011; zwaan, 1998). however, there are mainly three theories specifically targeting formats of mental representations involved in tpc: (1) dual coding theory (paivio, 1986), (2) the cognitive theory of multimedia learning (mayer, 2005) and (3) the integrative model of tpc (schnotz & bannert, 2003). the three theories share the assumption of separate channels for text and picture processing but differ on other issues. the dual coding theory focuses on the referential connections between text and pictures and assumes that people can retrieve information better by using two different channels. the cognitive theory of multimedia learning assumes that people process information through an auditory-verbal channel and a visual-pictorial channel of limited capacity.
multimedia learning is assumed to include: (i) selecting relevant words; (ii) selecting relevant images; (iii) organizing the selected words into a verbal mental model; (iv) organizing the selected images into a pictorial mental model; and (v) integrating the verbal model and the pictorial model with prior knowledge into a coherent mental representation. the integrative model of tpc combines the concepts of multiple memory systems, multiple sensory modalities, and two kinds of representations: descriptions (such as natural language and propositional representations) and depictions (such as pictures, visual images and mental models). according to this theory, readers construct only one mental model, which contains verbal and pictorial information. due to the importance of distinguishing descriptive and depictive representations for the analysis of reading strategies, our study is mainly inspired by the integrative model of tpc.

1.2 reading strategies of text-picture comprehension

in order to explain differences between reading strategies when learning from texts, rickards and denner (1978) suggested a distinction between general and specific processing. general processing deals with the global thematic coherence of the text, whereas specific processing focuses on unique information required for specific purposes. there seems to be an inherent conflict between these two kinds of processing, as rickards and denner found that pre-posed questions can lead to highly selective processing, but at the expense of global understanding of the text. to our knowledge, this distinction between two different kinds of processing has not yet been applied to the combined processing of text and pictures. in line with rickards and denner, we differentiate between general coherence-formation processing and selective task-oriented processing of text and pictures. the two kinds of processing are not meant as a strategy dichotomy.
instead, coherence-formation processing and task-oriented processing are strategy components that can be combined. as time and processing resources are limited, however, the two components cannot be both maximized at the same time. instead, they can obtain different emphasis in the process of tpc. thus, there is a continuum with a primarily general coherence-formation processing at the one end and a primarily selective task-oriented processing at the other end. depending on the specific learning situation, different kinds of processing will be combined into a suitable strategy. in the following, we will consider three different learning situations:

(a) if a reader receives a text with pictures without a specific goal in mind, he/she will put the emphasis on general coherence-oriented processing. that is, he/she will try to construct coherent mental representations based on the available information. we will refer to this kind of processing as the initial coherence-formation strategy.

(b) if a reader has first processed a text with pictures without a specific goal in mind (i.e., applied an initial coherence-formation strategy) and then gets access to the specific questions to be answered, he/she will put emphasis on selective task-oriented processing and focus on task-relevant information. we will refer to this kind of processing as the consecutive task-oriented strategy.

(c) if a reader is presented specific questions before receiving a text with pictures to be used for answering these questions, he/she will put more emphasis on selective task-oriented processing. however, although processing is goal-directed from the very beginning on, some coherence-oriented processing is also required because the reader needs some understanding of what the text and the picture are about. we will refer to this kind of processing as the initial task-oriented strategy.

2. research questions and hypotheses

the abovementioned strategies refer to general vs.
specific information processing (rickards & denner, 1978), without specifying the differential roles that text vs. picture might play in these strategies. in order to fill this conceptual gap, we proposed research questions and hypotheses on the usage of text vs. picture taking the described strategies into consideration. we arranged the order of presentation of (a) the question and (b) the material containing text and picture, as well as potential prior exposure to the material, in such a way that these context factors stimulated different kinds of processing. this allowed us to compare the relative amount of text vs. picture processing among the three strategies based on eye tracking data.

as mentioned above, the initial coherence-formation strategy is comparatively general and coherence-driven. as learners have not yet been provided with a question to be solved based on the material, they might process the material in order to construct a coherent mental model covering the general content of the material. the consecutive task-oriented strategy is fairly specific and goal-driven. it is supposed to come into play when learners are provided with a question they should solve based on the material which they have already processed before. conversely, the initial task-oriented strategy is a combination of coherence-driven and goal-driven processing. in this case, learners are provided with the question to be solved before they get access to the material. they can thus selectively process information that is relevant to the question at first contact. as there are two strategies with a higher proportion of question-oriented processing, we will focus (1) on the comparison between initial coherence-formation processing and consecutive task-oriented processing and (2) on the comparison between initial coherence-formation processing and initial task-oriented processing.
our research questions concerned potential differences between the usage of texts and the usage of pictures in the different kinds of processing. more specifically, our first research question was:

(1) does the initial coherence-formation strategy (no question yet) differ from the consecutive task-oriented strategy (question after material) in terms of using texts and pictures?

we hypothesized that text processing differs fundamentally from picture processing with the initial coherence-formation strategy and with the consecutive task-oriented strategy. this may be rooted in different functions of text and pictures in reading comprehension as well as in the effect of reading strategy. reading through a text is a major activity to make meaning of the content (schmidt-weigand et al., 2010), whereas pictures mainly serve as an external representation scaffolding the answering of questions (eitel et al., 2013). initial coherence-formation processing guides readers to comprehend without any task, which is more coherence-driven and general. in order to understand the main content, learners probably pay more attention to text with the initial coherence-formation processing than with the consecutive task-oriented processing. in comparison, consecutive task-oriented processing guides readers to solve the task after their prior knowledge construction, which is rather task-oriented and selective. participants might thus pay more attention to pictures with consecutive task-oriented processing than with initial coherence-formation processing in order to solve the question.

(2) does initial coherence-formation processing (no question yet) differ from initial task-oriented processing (question before material) in terms of using texts and pictures?

we also expected text and pictures to be comprehended differently between the initial coherence-formation strategy and the initial task-oriented strategy.
the initial task-oriented strategy combines coherence formation and selective processing at first contact with the material. when readers engage in initial task-oriented processing, they, on the one hand, know the aim from the start and reading is relatively goal-directed and selective. on the other hand, they need to understand the content to search for the relevant information, which makes reading comparatively coherence-driven. in other words, the initial task-oriented strategy is more goal-driven and less coherence-driven than the initial coherence-formation strategy. as readers primarily use text for understanding, we assumed that participants focus more on text with the initial coherence-formation strategy than with the initial task-oriented strategy. as pictures can assist task solving (larkin & simon, 1987), we hypothesized that participants focus more on pictures and less on texts with the initial task-oriented strategy than with the initial coherence-formation strategy.

3. method

3.1 participants

seventeen participants from secondary schools in germany were included in this study (m = 13 years, sd = 3.4 years). eleven participants were male and six were female. we used the heller and perleth (2000) intelligence tests on spatial ability and verbal ability. participants were marginally above average of the norm sample in spatial ability (average t of 53.65; sd = 7.39) and verbal ability (average t of 55.59; sd = 9.08).

3.2 materials

in a previous pilot study, we had selected 60 text-picture units from textbooks about geography and biology with 288 test questions. the units and questions were tested with 1060 students in grades 5 to 8 according to item-response theory including dif-analyses for gender, grade and school. additionally, we carried out a rational task analysis on questions (schnotz et al., 2011). in order to answer the questions correctly, participants need to process both text and pictures.
as we adopted eye-tracking methodology, we used only a subset of the text-picture units for pragmatic reasons. each participant received six text-picture units. the units were selected in a way that the type of image (realistic pictures vs. graphs) and the level of difficulty (easy vs. medium vs. difficult) were balanced throughout the experiment. participants were randomly distributed to different task orders to control for sequencing effects. the selected units (see appendix) and their average difficulties (beta-values in terms of item-response theory) were as follows:

1) banana trade: easy level (beta = -0.95) containing 95 words; realistic image in geography.
2) legs of insects: easy level (beta = -0.75) containing 59 words; realistic image in biology.
3) auditory: medium level (beta = 0.10) containing 122 words; graph in biology.
4) pregnancy: medium level (beta = 0.39) containing 136 words; realistic image in biology.
5) map of europe: difficult level (beta = 1.37) containing 136 words; map in geography.
6) savannah: difficult level (beta = 1.47) containing 170 words; graph in geography.

3.3 stimulating specific strategies

each participant was instructed to process six text-picture units under different conditions in order to gather eye tracking indicators of text vs. picture processing under three different processing conditions. three units were presented without any information about the task (question) to be solved afterwards. this was expected to stimulate an initial coherence-formation strategy. after this first phase of processing, the task appeared on the screen and participants were asked to solve the task. participants could then re-read the text and re-observe the picture under the guidance of the task. this second phase of processing was expected to stimulate a consecutive task-oriented strategy. the other three units were presented when the participants had the task to be solved already in mind.
participants read the question first. after participants had read the task, the corresponding text and pictures appeared on the screen. therefore, participants could explore the text and the pictures under the guidance of the question from the very beginning on. this was expected to stimulate an initial task-oriented strategy.

figure 1. design overview for the three reading conditions. the timeline indicates the time that text, picture and question appeared on the screen.

3.4 procedure

we conducted the experiments (each taking about 45 min) individually in a lab environment with the permission of students’ parents. materials and instructions were in german (native language of the research participants). after being informed about the purpose of the study, participants took the paper-pencil iq tests and watched an instruction video. participants were seated at 60-65 cm distance from the 24-inch monitor of the eye tracker positioned vertically at the eye level. a 5-point calibration was conducted before participants read the material. once the calibration was successful, the experiment would start. the experiment included a warm-up phase and a main test. the aim of the warm-up phase was to give participants the opportunity to get familiar with the eye tracking system and with using keyboard and mouse for turning pages and answering the questions. we used a tobii xl60 eye tracker to record eye movements at the rate of 60 hz. the system can compensate for head movements and thus provides relatively precise data from young participants and a comfortable testing situation. for each unit, the strategy was indicated by a written instruction presented upfront. finally, participants were thanked and rewarded with € 12 for taking part in our experiment.

3.5 indicators of cognitive processing

past research has established the basic assumption that cognitive processes influence and are mirrored by eye movement indicators (e.g.
gaschler, marewski, & frensch, in press; godau et al., 2014). the positions that a person’s eyes fixate are related to cognitive processing according to the eye-mind hypothesis and the immediacy hypothesis (just & carpenter, 1980). the eye-mind hypothesis assumes that eye movements can reflect information processing. the immediacy hypothesis states that the processing of information is immediate and happens directly after it is perceived. fixation counts and fixation duration (i.e., accumulated fixation counts and the fixation time at a particular area of interest, aoi) are associated with the depth of cognitive processing and spatial distribution of attention (hegarty, 1992; rayner, 1998). visit counts and visit duration (i.e., number of entries into and exits from an aoi and the accumulated duration of visits at an aoi) are also used in detecting reading processing as they mirror the importance of the aoi and the perceived informativeness (jacob & karn, 2003; friedman & liebelt, 1981). time to the first fixation of an aoi (i.e., latency until an aoi is fixated for the first time) indicates readers’ interest in the aoi (goldberg & kotval, 1999). transitions between aois (i.e., frequency of the eye movements from one aoi to the other) can mark the integration of contents presented in the aois (johnson & mayer, 2012).

3.6 scoring

the aois were drawn manually using tobii studio software. each task had three separated aois: the picture, the text and the question. the average picture aoi for the six materials covered 25.14% of the screen; the average text aoi covered 27.72%; and the average question aoi occupied 18.55%. participants obtained one point when they answered a question correctly. they could obtain a maximum score of six points (i.e., accuracy rate of 100%) and a minimum score of zero points.

4. results

participants had an average of 59% accurate answers to the questions (sd = 25%) with initial coherence-formation strategy combined with consecutive task-oriented strategy and an average of 53% accurate answers (sd = 36%) with initial task-oriented strategy, t(16) = 0.65, p = .52, d = 0.19. the average time for comprehending text and pictures and answering the question was 1.14 seconds per word (sd = 0.39 seconds per word), ranging from 0.61 to 1.71 seconds per word.

most importantly, the different experimental conditions of processing strategies differed in eye fixations on text vs. picture. considering the variation of reading speed between participants and the limited number of participants, we decided to use proportion measures for capturing the relative weight of text vs. picture (i.e., proportion of fixation counts on text, proportion of fixation time on text, proportion of visit counts on text and proportion of visit duration on text; holmqvist et al., 2011). for example, the proportion of fixation counts on text was calculated by dividing this count by the sum of fixation counts on text and picture. due to the dependency of text and picture, we only show the data for text with these indicators to avoid redundancy. thus, if we refer to a high proportion of fixations on text, this implies a low proportion on the picture and vice versa.

in order to explore whether the usage of text differs from the usage of picture with different strategies, we conducted two one-way repeated-measures multivariate analyses of variance (manova) with four eye-tracking indicators in three reading conditions. as there were two types of task-driven strategies, two manovas were performed: (1) initial coherence-formation strategy (no question yet) vs. consecutive task-oriented strategy (question after material), and (2) initial coherence-formation strategy vs. initial task-oriented strategy (question before material).
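the proportion measures and the transition count described above can be sketched in python as follows. this is a minimal illustration, not the authors' analysis pipeline (the aois and indicators in the study came from tobii studio): the input format (a time-ordered list of (aoi, duration) pairs), the function names `aoi_indicators` and `proportion_on_text`, the example values, and the choice to skip fixations outside any aoi are all assumptions made for illustration.

```python
from collections import Counter


def aoi_indicators(fixations):
    """Summarise a time-ordered list of (aoi, duration_ms) fixations.

    aoi is "text", "picture", "question", or None for a fixation that
    falls outside every AOI (assumption: such fixations are skipped).
    Returns fixation counts, accumulated durations, and undirected
    transition counts between pairs of AOIs.
    """
    counts = Counter()
    durations = Counter()
    transitions = Counter()
    previous = None
    for aoi, duration_ms in fixations:
        if aoi is None:
            continue
        counts[aoi] += 1
        durations[aoi] += duration_ms
        # A transition is a move from one AOI to a different AOI.
        if previous is not None and aoi != previous:
            transitions[frozenset((previous, aoi))] += 1
        previous = aoi
    return counts, durations, transitions


def proportion_on_text(per_aoi):
    """Value on text divided by the sum of the values on text and picture."""
    total = per_aoi["text"] + per_aoi["picture"]
    return per_aoi["text"] / total if total else float("nan")


# Hypothetical fixation sequence for one participant and one unit.
fixations = [
    ("text", 200), ("text", 250), ("picture", 300), ("text", 150),
    ("question", 100), (None, 80), ("picture", 400),
]
counts, durations, transitions = aoi_indicators(fixations)

print(proportion_on_text(counts))                   # 0.6
print(round(proportion_on_text(durations), 3))      # 0.462
print(transitions[frozenset(("text", "picture"))])  # 2
```

because the proportion on text and the proportion on picture sum to one, reporting only the text value loses no information, which is the redundancy argument made above.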
the eye-tracking indicators included the proportion of fixation counts and fixation duration on text, the proportion of number of visit and visit duration on text, the time to the first fixation on text and picture and the number of transitions between text and picture.

4.1 initial coherence-formation strategy vs. consecutive task-oriented strategy

in order to check whether the initial coherence-formation strategy (no question yet) differs from the consecutive task-oriented strategy (question after material) in fixations on text vs. picture, we performed a manova and univariate analyses (anovas) with the eye tracking indicators listed in table 1. the initial coherence-formation strategy and consecutive task-oriented strategy differed significantly with respect to the relative weight on text (rather than picture) across the eye tracking indicators, f(7, 26) = 14.10, p < .001, ηp² = .79. as reported below, separate anovas confirmed the difference between the two strategies for indicators like fixation counts and fixation duration, visit counts and visit duration and time to the first fixation.

(1) fixation indicators during text-picture comprehension

the anova revealed that the proportion of fixation counts on texts (rather than on picture) was significantly higher with initial coherence-formation processing than with consecutive task-oriented processing, f(1, 32) = 56.51, p < .001, ηp² = .64. proportion of fixation counts on texts and proportion of accumulated fixation duration on texts were correlated by r = .97. thus, unsurprisingly the proportion of accumulated fixation duration on text was higher with initial coherence-formation strategy than with consecutive task-oriented strategy, f(1, 32) = 70.41, p < .001, ηp² = .69.
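the reported effect sizes can be related to the f statistics through the standard identity ηp² = (f · df_effect) / (f · df_effect + df_error). the short python sketch below applies it to the three tests reported above; the function name `partial_eta_squared` is hypothetical, and the identity is offered as a plausibility check rather than as a description of how the authors computed their effect sizes.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in section 4.1.
print(round(partial_eta_squared(14.10, 7, 26), 2))  # 0.79 (manova)
print(round(partial_eta_squared(56.51, 1, 32), 2))  # 0.64 (fixation counts)
print(round(partial_eta_squared(70.41, 1, 32), 2))  # 0.69 (fixation duration)
```

the recomputed values agree with the reported ηp² values to two decimals.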
(2) visit indicators during text-picture comprehension
participants visited the text aoi (rather than the picture aoi) more often with the initial coherence-formation strategy than with the consecutive task-oriented strategy. the proportion of number of visits on text was higher when participants engaged in initial coherence-formation processing rather than in consecutive task-oriented processing, f(1, 32) = 7.93, p = .008, ηp² = .20. participants had a higher proportion of accumulated visit duration on text with initial coherence-formation processing than with consecutive task-oriented processing, f(1, 32) = 14.24, p = .001, ηp² = .31.

(3) time to first fixation on text and picture
participants fixated within a shorter time on texts with the initial coherence-formation strategy than with the consecutive task-oriented strategy, f(1, 32) = 9.43, p = .004, ηp² = .23. they also fixated more quickly on pictures with the initial coherence-formation strategy than with the consecutive task-oriented strategy, f(1, 32) = 4.96, p = .033, ηp² = .13. pictures were fixated slightly, but not significantly, sooner than texts: f(1, 32) = 0.87, p = .358, ηp² = .03 for the initial coherence-formation strategy; f(1, 32) < 1 for the consecutive task-oriented strategy.

(4) transitions between text and picture
the transitions between text and picture did not show a robust difference between initial coherence-formation and consecutive task-oriented processing, f(1, 32) = 2.34, p = .136, ηp² = .07. with the consecutive task-oriented strategy, participants had 27% of transitions (sd = 10.6%) between text and picture, 25% of transitions (sd = 10.4%) between text and question and 48% of transitions (sd = 13.7%) between picture and question. in short, participants shifted their gaze slightly more often between text and picture with the initial coherence-formation strategy than with the consecutive task-oriented strategy.
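the transition percentages above are shares of gaze shifts between pairs of areas of interest (text, picture, question). a minimal sketch of how such shares could be derived from a sequence of visited aois follows; the function name and the example sequence are hypothetical and do not reproduce the study's recordings or software.

```python
from collections import Counter


def transition_shares(aoi_sequence):
    """Share of each unordered AOI pair among all gaze shifts between different AOIs."""
    pairs = Counter()
    for previous, current in zip(aoi_sequence, aoi_sequence[1:]):
        if previous != current:  # consecutive fixations within one AOI are not transitions
            pairs[tuple(sorted((previous, current)))] += 1
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


# Hypothetical sequence of fixated AOIs for one trial (not study data).
sequence = ["text", "text", "picture", "question", "picture",
            "text", "question", "picture"]

for pair, share in sorted(transition_shares(sequence).items()):
    print(pair, f"{share:.0%}")
```

treating pairs as unordered mirrors the way the text reports transitions "between" two areas without distinguishing direction.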
when questions were illustrated, participants mainly transferred their attention between picture and question.

table 1
means and standard deviations of eye tracking indicators in different goal-oriented strategies

eye tracking indicator                                    initial coherence-formation   consecutive task-oriented   initial task-oriented
                                                          m (sd)                        m (sd)                      m (sd)
% fixation counts on text                                 80.11 (12.81)                 47.87 (12.19)               64.95 (16.4)
% fixation time on text                                   81.27 (12.99)                 44.00 (12.73)               66.93 (15.72)
% visit counts on text                                    48.29 (9.13)                  39.64 (8.79)                46.16 (8.47)
% visit time on text                                      66.72 (20.27)                 44.88 (12.61)               69.31 (11.91)
time to first fixation on text (sec)                      2.11 (1.44)                   6.60 (5.85)                 2.21 (1.92)
time to first fixation on picture (sec)                   1.57 (1.92)                   5.02 (6.09)                 0.87 (2.07)
average number of transitions between text and picture    21.59 (13.4)                  14.94 (14.47)               28.53 (16.18)

4.2 initial coherence-formation strategy vs. initial task-oriented strategy
eye tracking indicators also showed that the initial coherence-formation strategy (i.e. general processing without task) differed from the initial task-oriented strategy (i.e. task was presented at the beginning) in terms of text and picture processing. the manova showed a significant effect of goal-orientation on eye tracking indicators, f(7, 26) = 3.49, p = .009, ηp² = .48. separate anovas yielded effects of strategy condition on fixation indicators but not on visit indicators.

(1) fixation indicators during text-picture comprehension
there was a significantly higher proportion of fixation counts on text (rather than on picture) with the initial coherence-formation strategy than with the initial task-oriented strategy, f(1, 32) = 9.03, p = .005, ηp² = .22. as proportion of counts and of duration of fixations were highly correlated (r = .84), this was mirrored by a similar effect on proportion of fixation duration on text, f(1, 32) = 8.41, p = .007, ηp² = .21.
(2) visit indicators during text-picture comprehension
participants had a similar pattern of results in visiting text and picture for both experimental strategy conditions. no difference was detected for the proportion of visit counts and visit duration on text between the initial coherence-formation strategy and the initial task-oriented strategy, fs(1, 32) < 1.

(3) time to first fixation on text and picture
we did not find any difference between the experimental strategies for latency of first fixation on text or picture, fs(1, 32) < 1.03. participants fixated the picture marginally sooner than the text with initial task-oriented processing, f(1, 32) = 3.79, p = .06, ηp² = .11.

(4) transitions between text and picture
transitions between text and picture did not differ when participants followed the initial coherence-formation strategy vs. the initial task-oriented strategy, f(1, 32) = 1.48, p = .23, ηp² = .04. for the initial task-oriented strategy, participants had 45% (sd = 13%) of transitions between text and picture, 20% (sd = 7.6%) between text and question and 35% (sd = 17.5%) between picture and question. in brief, participants shifted their gaze predominantly between text and picture, secondarily between picture and question and lastly between text and question.

5. discussion
the current eye tracking study provides first methodological tools and results to specify the distinction between general and specific processing proposed by rickards and denner (1978) for processing of text vs. picture in mixed material. general processing (i.e., when the question is not known yet) deals with the global thematic coherence of the material. pre-posed questions can lead to a highly selective processing. in order to apply this account to the processing of mixed material (text and picture), the current study examined whether text processing differs from picture processing and whether this difference is moderated by the strategies used by the learner.
specifically, we compared (1) initial coherence-formation strategy (no question yet) with consecutive task-oriented strategy (question after material) and (2) initial coherence-formation strategy with initial task-oriented strategy (question before material). a higher emphasis on text relative to picture was expected for the initial coherence-formation strategy. pictures were assumed to lead to higher values in fixation indicators with the consecutive task-oriented processing. we used eye tracking indicators to reveal the processing of text and picture, according to the eye-mind theory and the immediacy theory. our results confirmed general differences of text vs. picture processing (mcnamara, 2007). importantly, eye tracking indicators of relative emphasis on text (rather than on picture) differed among the experimentally induced processing strategies.

for the initial coherence-formation strategy, participants primarily fixated on text rather than on picture. as fixation count and fixation duration have been linked to the depth of cognitive processing and distribution of attention (rayner, 1998), the results suggest the primary usage of text during mental model construction in initial coherence-formation processing. the same is true for visit counts and visit duration. as visit indicators have been linked to the importance and informativeness of the aois (jacob & karn, 2003), participants possibly consider text to be important and informative with the initial coherence-formation strategy. participants also needed little time to proceed to text and picture (i.e. time to the first fixation) with the initial coherence-formation strategy. this indicator has been linked to participants’ interest in text and picture when reading is general (goldberg & kotval, 1999). in addition, participants had frequent transitions between text and picture with the initial coherence-formation strategy.
according to the integrative model of tpc, participants may establish their mental model by integrating text and picture. this is consistent with our assumption that participants intensely processed the content to establish an initial mental model with the initial coherence-formation strategy.

for the consecutive task-oriented strategy, picture was mainly used to scaffold question solving after the initial construction of the mental model. the results from fixation counts and fixation duration were consistent with the idea that pictures are primarily used when participants need to answer the question with consecutive task-oriented processing (cf. hochpöchler et al., 2012). they had a high amount of visit counts and visit duration on pictures with consecutive task-oriented processing. it seems that participants considered pictures important and informative when they were asked to solve tasks after the initial construction of the mental model. besides, data also revealed that participants perceived pictures sooner than texts with both processing strategies. this can be explained by pictures attracting readers’ attention (mayer, 1989; tversky, 2001; winn, 1989). transition data showed that they focused their main attention on picture and question, less attention on text and picture and the least on text and question when they followed the consecutive task-oriented strategy. participants seemed to primarily use the picture to scaffold question answering. they paid less attention to text and picture because they had already constructed the initial mental model and only needed to further construct or update this mental model in order to answer the question. they paid the least attention to text and question, which implies that the text also helps participants to answer the question. pictures possibly serve as a tool for question solving with consecutive task-oriented processing and text may be used for building and updating the mental model.
these results correspond to the assumption of unequal usage of text and pictures in the integrative model of tpc. according to this model, pictures are mainly processed as an external representation to solve questions.

with the initial task-oriented strategy, participants invested a large amount of time on pictures, possibly because they used pictures as an external tool to scaffold question answering. they visited text more frequently and spent more time on text than on picture. this might be explained by coherence-formation being supported by the text. in our design, participants need to process both text and picture to get the correct answer. although participants were instructed with questions in initial task-oriented processing, they still needed to understand the text and the picture, thus also constructing a mental model. therefore, text and pictures were both used with initial task-oriented processing. also, time to the first fixation suggested that text and pictures drew participants’ interest with initial task-oriented processing. learners might integrate text and picture to build the initial mental model, as assumed in the integrative model of tpc. similar patterns were shown for transitions between text and pictures with the initial task-oriented strategy. participants transferred their attention most frequently between text and picture, less between picture and question and the least between text and question with the initial task-oriented strategy. on the one hand, participants integrated text and picture with the initial task-oriented strategy. on the other hand, more transitions between picture and question than between text and question support our assumption that picture is primarily used to solve the question. this result also corresponds to the assumption that text is more likely to be used for general coherence-formation processing and picture is especially used for specific task-oriented processing.
summarizing our results, we found that text processing and picture processing differ substantially when learners are exposed to text-and-picture material. the differences are moderated by processing strategies triggered by context factors such as presentation of the question prior to vs. after the first exposure to the text-and-picture material. likely, text is primarily used for coherence-formation processing. it assists learners in comprehending the content of the materials, which generates an initial mental model or coherent semantic representation. picture is likely used for task-oriented specific processing. when learners have constructed the initial mental model, picture is mainly used as an external representation to update the mental model and to answer the question. when learners have the tasks beforehand, picture might serve mainly to scaffold the initial mental model construction. future studies should provide more evidence for the link between (a) differences in fixation patterns elicited by different processing strategies and (b) the formation and usage of mental models integrating text and picture information.

in conclusion, returning to the hypotheses posed at the beginning of this study, it is now possible to state that text processing differs fundamentally from picture processing and that this difference is moderated by different reading strategies. more specifically, this study suggested that text is mainly used to build the mental model with the general coherence-formation strategy. comparatively, picture is more likely to guide readers to solve questions with the selective task-oriented strategy. similar to the previous studies on text comprehension, reading strategies also influence the comprehension of text and picture. our findings expand the rickards and denner (1978) account of global vs. selective processing to the domain of mixed (picture and text) materials.
the results suggest that eye tracking indicators can play a major role in assessing and scaffolding text and picture comprehension. eye tracking indicators might be used to assess whether the learner is following an approach suitable to the current context factors (i.e. presentation of the question prior to vs. after the first exposure to the text-and-picture material). based on such assessment, interventions guiding visual attention to areas relevant at the current processing stage can involve salient visual cues presented online during task processing (cf. rouinfar et al., 2014).

keypoints
according to eye tracking indicators, text processing differs from picture processing and this difference is moderated by processing strategies.
a high emphasis on text when processing material before the question is known suggests that text is mainly used to build a mental model in general coherence-formation processing.
picture is more likely to guide readers when they need to solve a question with selective task-oriented processing.

acknowledgments
this study is part of the on-going bite project on text-picture integration, which is funded by the german science foundation (grant no: schn 665/3-1, schn 665/6-1). we appreciate all the help from student participants and student assistants involved in this study. we thank parents, teachers and principals for their cooperation. we also express our gratitude to dr. loredana mihalca, dr. axel zinkernagel and dr. thorsten rasch for their suggestions during setting up the experiment and interpreting the data.

appendix
six materials displayed on tobii eye tracker (translated from german)
[each topic has three levels of questions. they were analysed based on item-response theory with a one-parametric logistic model (rasch model): level 1 (beta = -1.15); level 2 (beta = +0.57); level 3 (beta = +1.39).
we also carried out a rational task analysis, which showed the mapping procedures between text and picture: level 1 (1.25); level 2 (1.50); level 3 (3.00)].

1. banana trade
figure a1. banana trade.
many people like to eat bananas. they are planted in countries like ecuador, costa rica or colombia and then exported to europe. undoubtedly, this is related to costs. the banana that you see in the picture costs just one euro. this euro includes... (a) salary of the farmers; (b) cost of the fertilizer; (c) cost for transportation to the harbour; (d) profit of the plantation owners; (e) tax for bananas; (f) cost for shipping; (g) profit of the wholesalers; (h) cost for storage; (i) profit of the retailers
question level 1
how many cents can a retailer earn from a banana? (3 cent/ 21 cent/ 31 cent/ 7 cent)
question level 2
who do people pay the least if they buy a banana? (profit of the wholesalers/ profit of the plantation owners/ profit of the retailers/ salary of the farmers)
question level 3
if we compare the farmers, retailers and wholesalers, then... (farmers earn the most, wholesalers earn less and retailers earn the least from a banana/ retailers earn the most, wholesalers earn less and farmers earn the least from a banana/ retailers earn the most, farmers earn less and wholesalers earn the least from a banana/ wholesalers earn the most, farmers earn less and retailers earn the least from a banana)

2. legs of insects
figure a2. legs of insects.
the legs of insects presented in figures a to d have the same structure: hip (orange), leg ring (brown), thigh (green), bar (pink), and foot (blue). the legs are primarily organs for movement, which can be used for: running (a), swimming (b) or jumping (c). however, they can also be used for cleaning (d).
question level 1
which insect has a cleaning leg? (ant/ honey bee/ water bug/ grasshopper)
question level 2
which type of leg has the longest leg ring?
(leg for cleaning/ leg for jumping/ leg for running/ leg for swimming)
question level 3
compared to the leg for jumping, does the leg for cleaning have… (a longer bar but a shorter foot/ a thicker thigh but a thinner leg ring/ a longer foot but a shorter bar/ a shorter foot but a longer leg ring)

3. auditory
figure a3. auditory.
tones and sounds are sound-waves. the faster the sound-vibrations, the higher we perceive the sound/tone. the human ear can differentiate sounds/tones with low vibrations (20) and high vibrations (20 000) per second. the number of vibrations per second is called frequency; its unit is hertz (hz). a ten-year-old child is able to hear every sound/tone between the frequencies of 20 and 20 000 hertz. this area is called the hearing range, which is displayed in blue in the picture. furthermore, a ten-year-old child is able to produce sounds/tones, for example by speaking, which are between 70 and 1000 hertz. this area is called the vocal range, which is displayed in pink in the picture. the illustration shows the hearing and vocal ranges of different species.
question level 1
which of the following species are able to perceive tones/sounds at 120 hertz? (dog/ cricket/ cat/ bat)
question level 2
which of the following species has a vocal range for producing the lowest tone? (human being, 45 years old/ cat/ dog/ cricket)
question level 3
which of the following four species is able to hear tones below 100 hertz as well as produce tones below 1000 hertz and above 1500 hertz? (human being, 10 years old/ human being, 45 years old/ cricket/ cat)

4. pregnancy
figure a4. pregnancy.
the child develops in the uterus, or uterine wall (1). from the fourth month of gestation it is called a fetus (2). the fetus is nourished by the placenta (3). it is here where exchange between the blood vessels of the fetus (8) and the blood vessels of the mother (9) takes place (look at zoomed area).
in the blood vessels, nutrients and oxygen (o2) as well as waste products and carbon dioxide (co2) are exchanged. the fetus is connected to the mother by the umbilical cord (4). the amniotic fluid protects the child. it could be called a protective pillow for the fetus, because it helps to cushion the fetus from impact. when the mother’s water membrane (7) has broken, the delivery process is initiated and the fetus will move through the cervix (5) during the labour.
question level 1
what is the name of the pink area? (placenta/ umbilical cord/ uterine wall/ amniotic fluid)
question level 2
which parts do not directly link to each other? (blood vessels of the mother and blood vessels of the child/ cervix and amniotic fluid/ umbilical cord and water membrane/ placenta and uterine wall)
question level 3
which path does the blood of the fetus take after getting nutrition and oxygen (o2) from the mother? it flows … (back through the placenta and by umbilical cord to the fetus/ back through the placenta and by the blood vessels of the fetus to the water membrane/ back through the placenta and by blood vessels of the fetus to the uterine wall/ back through the placenta and by blood vessels of the mother directly to the fetus)

5. map of europe
figure a5. map of europe.
the big map shows the continent of europe. actually, europe is not an independent continent. together with asia, it forms the continent “eurasia”. in the south, west and north, the border of europe is clearly defined by the seas. the delineation in the east is more difficult because there are no natural borders. an agreement set the borders at the ural mountains and further south, so parts of russia and kazakhstan belong to both europe and asia. the states of europe are divided into different districts according to economic and geographic features.
these subspaces are:
northern europe (blue)
western europe (dark green)
central europe (light green)
southern europe (red)
eastern europe (yellow)
in figure a, one box represents one unit of the european area. in figure b, one box represents one unit of the european population.
question level 1
in which subspace are countries located that belong to both europe and asia? (northern europe/ southern europe/ middle europe/ eastern europe)
question level 2
which subspaces have the same number of area units? (eastern europe and northern europe/ southern europe and south-eastern europe/ middle europe and southern europe/ west europe and middle europe)
question level 3
which subspace has one more unit of area, but four fewer units of population, when compared to western europe? (middle europe/ northern europe/ southern europe/ south-eastern europe)

6. savannah
figure a6. savannah.
because there are different rainy seasons, the savannah is differentiated into separate types. the city of enugu is located in the wet savannah; accra and ouagadougou are located in a dry savannah and the city of sinder in a thorn-bush savannah. depending on the amount of rainfall (e.g. ouagadougou 887 mm rainfall per year) different food is cultivated and plants exported in each region and city. the months with enough rain for the respective plants to grow are shown in the diagrams by the areas with blue stripes above the red lines for temperature. if the line for rainfall (blue) is above the line for temperature (red), then there is more rainfall than evaporation. each month is represented by a letter in the lower part of the diagrams, for example f = february. the following plants need different amounts of rainfall per year for ideal growth:
millet: 180 to 700 mm
manioc: 500 to 2000 mm
yams: more than 1500 mm
peanut: 250 to 700 mm
cotton: 700 to 1500 mm
question level 1
what is the amount of rainfall per year in accra?
(887 mm/ 1661 mm/ 787 mm/ 549 mm)
question level 2
which plant can grow well in enugu? (peanut/ yams/ cotton/ millet)
question level 3
which plants will grow well in the most cities, irrespective of the type of savannah? (millet/ manioc/ cotton/ peanut)

references
anderson, r. c., & pearson, p. d. (2002). a schema-theoretic view of basic processes in reading comprehension. in p. d. pearson, b. rebecca, m. l. kamil & p. mosenthal (eds.), handbook of reading research (pp. 255-291). mahwah, nj: lawrence erlbaum.
ainsworth, s. e. (1999). a functional taxonomy of multiple representations. computers and education, 33(2/3), 131-152. doi: 10.1016/s0360-1315(99)00029-9
andre, t. (1979). does answering higher-level questions while reading facilitate productive learning? review of educational research, 49(2), 280-318. doi: 10.3102/00346543049002280
eitel, a., scheiter, k., schüler, a., nyström, m., & holmqvist, k. (2013). how a picture facilitates the process of learning from text: evidence for scaffolding. learning and instruction, 28, 48-63. doi: 10.1016/j.learninstruc.2013.05.002
frase, l. t. (1967). learning from prose material: length of passage, knowledge of results, and position of questions. journal of educational psychology, 58(5), 266-272. doi: 10.1037/h0025028
frase, l. t. (1968). effect of question location, pacing, and mode upon retention of prose material. journal of educational psychology, 59(4), 244-249. doi: 10.1037/h0025947
friedman, a., & liebelt, l. s. (1981). on the time course of viewing pictures with a view towards remembering. in d. f. fisher, r. a. monty & j. senders (eds.), eye movements: cognition and visual perception (pp. 137-155). hillsdale, nj: lawrence erlbaum associates.
gaschler, r., marewski, j. n., & frensch, p. a. (in press). once and for all – how people change strategy to ignore irrelevant information in visual tasks. quarterly journal of experimental psychology.
doi: 10.1080/17470218.2014.961933
godau, c., wirth, m., hansen, s., haider, h., & gaschler, r. (2014). from marbles to numbers – estimation influences looking patterns on arithmetic problems. psychology, 5, 127-133. doi: 10.4236/psych.2014.52020
goldberg, j. h., & kotval, x. p. (1999). computer interface evaluation using eye movements: methods and constructs. international journal of industrial ergonomics, 24(6), 631-645. doi: 10.1016/s0169-8141(98)00068-7
hamilton, r. j. (1985). a framework for the evaluation of the effectiveness of adjunct questions and objectives. review of educational research, 55(1), 47-85. doi: 10.3102/00346543055001047
hegarty, m. (1992). mental animation: inferring motion from static displays of mechanical systems. journal of experimental psychology: learning, memory, and cognition, 18(5), 1084-1102. doi: 10.1037/0278-7393.18.5.1084
heller, k. a., & perleth, c. (2000). kft 4-12+r. kognitiver fähigkeitstest für 4. bis 12. klassen, revision. göttingen: beltz test gmbh.
hochpöchler, u., schnotz, w., rasch, t., ullrich, m., horz, h., mcelvany, n., & baumert, j. (2012). dynamics of mental model construction from text and graphics. european journal of psychology of education, 1(22). doi: 10.1007/s10212-012-0156-z
holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & weijer, j. v. d. (2011). eye tracking: a comprehensive guide to methods and measures. oxford; new york: oxford university press.
jacob, r., & karn, k. s. (2003). commentary on section 4. eye tracking in human-computer interaction and usability research: ready to deliver the promises. in j. hyönä, r. radach, & h. deubel (eds.), the mind’s eye: cognitive and applied aspects of eye movement research (pp. 573-605). oxford: elsevier science.
johnson, c. i., & mayer, r. e. (2012). an eye movement analysis of the spatial contiguity effect in multimedia learning. journal of experimental psychology: applied, 18(2), 178-191. doi: 10.1037/a0026923
just, m. a., & carpenter, p. a.
(1980). a theory of reading: from eye fixations to comprehension. psychological review, 87, 329-354. doi: 10.1037/0033-295x.87.4.329
kintsch, w., & van dijk, t. a. (1978). toward a model of text comprehension and production. psychological review, 85(5), 363-394. doi: 10.1037/0033-295x.85.5.363
kress, g. r., & leeuwen, v. t. (1996). reading images: the grammar of visual design. london; new york: routledge.
larkin, j. h., & simon, h. a. (1987). why a diagram is (sometimes) worth ten thousand words. cognitive science, 11(1), 65-100. doi: 10.1111/j.1551-6708.1987.tb00863
mcnamara, d. s. (ed.). (2007). reading comprehension strategies: theories, interventions, and technologies. mahwah, nj: lawrence erlbaum associates publishers.
mayer, r. e. (1989). models for understanding. review of educational research, 59(1), 43-64.
mayer, r. e. (2005). cognitive theory of multimedia learning. in r. e. mayer (ed.), the cambridge handbook of multimedia learning (pp. 31-48). cambridge, u.k.; new york: cambridge university press.
paivio, a. (1986). mental representations: a dual coding approach. new york: oxford university press.
rayner, k. (1998). eye movements in reading and information processing: 20 years of research. psychological bulletin, 124(3), 372-422. doi: 10.1037/0033-2909.124.3.372
rickards, j. p. (1979). adjunct postquestions in text: a critical review of methods and processes. review of educational research, 49(2), 181-196. doi: 10.3102/00346543049002181
rickards, j. p., & denner, p. r. (1978). inserted questions as aids to reading text. instructional science, 7(3), 313-346. doi: 10.1007/bf00120936
rothkopf, e. z. (1964). learning and the educational process; selected papers from the research conference on learning and the educational process, held at stanford university, june 22-july 31, 1964. in j. d. krumboltz (ed.), research conference on learning the educational process (pp. 193-221). chicago: rand mcnally.
rothkopf, e. z. (1966).
learning from written instructive materials: an exploration of the control of inspection behavior by test-like events. american educational research journal, 3(4), 241-249. doi: 10.3102/00028312003004241
rouet, j.-f. (2006). question answering and document search. in j.-f. rouet (ed.), the skills of document use. from text comprehension to web-based learning (pp. 93-121). mahwah, nj: erlbaum.
rouet, j.-f., & britt, m. a. (2011). relevance processes in multiple document comprehension. in m. t. mccrudden, j. p. magliano & g. schraw (eds.), text relevance and learning from text (pp. 19-52). charlotte, nc, us: iap information age publishing.
rouinfar, a., agra, e., larson, a. m., rebello, n. s., & loschky, l. c. (2014). linking attentional processes and conceptual problem solving: visual cues facilitate the automaticity of extracting relevant information from diagrams. frontiers in psychology, 5:1094. doi: 10.3389/fpsyg.2014.01094
schmidt-weigand, f., kohnert, a., & glowalla, u. (2010). explaining the modality and contiguity effects: new insights from investigating students' viewing behaviour. applied cognitive psychology, 24(2), 226-237. doi: 10.1002/acp.1554
schnotz, w., & bannert, m. (2003). construction and interference in learning from multiple representation. learning and instruction, 13(2), 141-156. doi: 10.1016/s0959-4752(02)00017-8
schnotz, w., ullrich, m., hochpöchler, u., horz, h., mcelvany, n., schröder, s., & baumert, j. (2011). what makes text-picture integration difficult? a structural and procedural analysis of textbook requirements. ricerche di psicologia, 1, 103-135. doi: 10.1007/s10212-011-0078-1
tversky, b. (2001). spatial schemas in depictions. in m. gattis (ed.), spatial schemas and abstract thought (pp. 79-112). cambridge, mass.: mit press.
winn, w. (1989). the role of graphics in training documents: toward an explanatory theory of how they communicate.
ieee transactions on professional communication, 32(4), 300-309. doi: 10.1109/47.44544
zwaan, r., radvansky, g., hilliard, a., & curiel, j. (1998). constructing multidimensional situation models during reading. scientific studies of reading, 2(3), 199-220. doi: 10.1207/s1532799xssr0203_2

frontline learning research 5 special issue 'learning through networks' (2014) 15-37 issn 2295-3159
corresponding author: kaisa hytönen, department of education, 20014 university of turku, finland, sakahy@utu.fi
doi: http://dx.doi.org/10.14786/flr.v2i2.90

cognitively central actors and their personal networks in an energy efficiency training program
kaisa hytönen a, tuire palonen a, kai hakkarainen a
a university of turku, finland
article received 17 february 2014 / revised 31 march 2014 / accepted 29 may 2014 / available online 15 july 2014

abstract
this article aims to examine cognitively central actors and their personal networks in the emerging field of energy efficiency. cognitively central actors are frequently sought for professional advice by other actors and, therefore, they are positioned in the middle of a social network. they often are important knowledge resources, especially in emerging fields where standard knowledge exchange mechanisms are weak. by adopting a personal network approach, we identified the cognitively central participants of a one-year energy efficiency training program, studied the structure and heterogeneity of their personal networks and determined which features were relevant to achieving these cognitively central positions. at the end of the training, the social networking questionnaire was sent to 74 course participants. semi-structured interviews were conducted with the six most central actors, whose personal networks were larger than those of the other participants. these six actors differed from each other in many respects; there did not appear to be a single explanation for why these persons achieved their central positions.
in conclusion, we propose that becoming a cognitively central actor is an intricate process. it cannot be explained solely by, for instance, actors’ educational backgrounds, the level of their previous energy efficiency knowledge or their field of know-how. to understand this phenomenon, we must examine which organizations such people come from and how their expert profiles, which are related to their fields and competences, fit into the wider context of energy efficiency. more research is needed to determine whether the results are only typical of emerging fields.

keywords: personal networks; cognitive centrality; advice seeking; social network analysis; emerging fields; energy efficiency training programme

1. introduction

in rapidly changing and complex environments, with their associated emergent, knowledge-laden global problems and challenges, professionals must share their knowledge and expertise (hakkarainen, palonen, paavola, & lehtinen, 2004) rather than rely on mere individual competencies. this study focuses on examining key experts who have crucial roles in adaptively coping with novel challenges and changing professional requirements emerging from swiftly transforming professional fields. the key experts are often considered to be exceptionally valuable networking partners and collaborators because they have strategic knowledge and competence as well as in-depth meta-level vision regarding a transforming multi-professional field. their knowledge and competence are likely to be seen as valuable by colleagues because they deliberately build personal networks that interconnect heterogeneous social resources, expertise and know-how, reach beyond their immediate peers and bridge professional fields, thereby changing the ecology of their professional learning. as a consequence, the key experts are most often sought for advice and assistance by those struggling with novel professional challenges.
personal networking connections with key professionals and the expert cultures they represent are important in updating the expertise and skills needed for responding to the professional challenges of future working lives, especially in turbulent environments (lehtinen, hakkarainen, & palonen, in press). professionals must be able to solve unforeseen complex problems and to share knowledge and competences, often breaking the boundaries of traditional disciplines. energy efficiency is one of the rapidly developing fields that has emerged through the intersection of several professional domains. therefore, there does not appear to be one unified system to direct professional activity, and the standard knowledge exchange mechanisms are weak. cooperation between professionals from diverse fields, who master varying bodies of expertise and pursue divergent professional tasks and projects, plays an important role in energy efficiency work. extensive professional experience alone does not automatically guarantee a central professional position; deliberate and sustained efforts to work at the edge of competence and cultivate expertise play a critical role as well (bereiter & scardamalia, 1993). although developing efficient energy usage practices and meeting global and national standards and directives regarding energy efficiency are some of the most important challenges of the 21st century, there are no established educational methods and practices for cultivating associated expertise in finland. therefore, efforts to create multifaceted personal expert networks and informal learning seem to play a significant role in professional development and updating expertise (see the similar situation regarding magicians’ expert networks in rissanen, palonen, pitkänen, kuhn and hakkarainen, 2013).
in our previous study (hytönen, palonen, lehtinen, & hakkarainen, 2014), we examined whether a training model that we call academic apprenticeship education, initiated in finland in 2009, could help increase professional networking ties among participants. the study revealed that this energy efficiency training program, organized for actors who were already working on expert-level tasks, did not effectively support comprehensive networking or the creation of a knowledge exchange forum among the participants. however, there were some key professionals who were able to create valuable personal networking connections and contribute to professional collaboration during the training. this paper focuses on them.

1.1 conceptual background

in complex and changing professional environments, targeted knowledge or competence is not always easily found or verified. in order to acquire new knowledge and appropriate novel professional practices as well as find required professional help and advice, professionals have to deliberately build and extend their personal networks (see pataraia, margaryan, falconer, & littlejohn, 2013). resources obtained through personal networks can benefit professional development by providing access to networking partners and associated professional support and opportunities for informal learning. in order to obtain new knowledge, many key experts have to rely on their personal social networks, reaching beyond the boundaries of their workplace organizations rather than relying merely on traditional institutional resources (nardi, whittaker, & schwarz, 2000). however, to benefit from personal professional learning networks, workers must have cultivated networking competencies in terms of having the capability of finding and creating useful connections, as well as maintaining and activating these connections when needed (gruber, lehtinen, palonen, & degner, 2008; rajagopal, joosten-ten brinke, van bruggen, & sloep, 2012).
the factors influencing the choices involved in building, maintaining and activating personal professional networks are related to (a) the trajectories of an actor’s personal professional interests and needs, (b) the features of the contacts, such as the like-mindedness, benevolence and the potential learning and collaboration value of the relationship, and (c) the characteristics of the work environment (rajagopal, joosten-ten brinke, van bruggen, & sloep, 2012). according to the homophily principle, people often interact and create strong ties with those who have similar characteristics to themselves (kleinbaum, stuart, & tushman, 2013; mcpherson, smith-lovin, & cook, 2001; reagans, 2011). it follows that networks are often homogeneous in nature; people are more likely to create contacts with others who share the same gender, age, educational level, professional group and structural position. therefore, homophily often impacts the information people receive from their personal social networks, the attitudes they form and the interactions they experience (lozares, verd, cruz, & barranco, 2013; mcpherson, smith-lovin, & cook, 2001). professionals functioning in such networks often share a great deal of their knowledge and practices, immediately understanding each other (wenger, 1998). homogeneous professional networks do not, however, provide an adequate way of coping with the challenges involved in profound transformations of professional practices extending across multiple fields, such as in the case of energy efficiency work; personal networks are rich repositories of professional knowledge if they involve people with heterogeneously distributed knowledge and expertise and, thus, provide access to the resources embedded in these social relations (lin, 2001).
cultivating strategic competence in complex and extended professional fields, such as energy efficiency, appears to require deliberate efforts of creating networking connections across the boundaries of several fields of professional activity (akkerman, admiraal, simons, & niessen, 2006). such efforts of crossing boundaries between professional cultures are likely to characterize networking activities of key experts, allowing them to mediate knowledge across the borders of different cultures and environments and bridge various fields of expertise with one another. key persons are positioned in the middle of the communication structure and therefore have access to extended pools of knowledge and diverse sources of information. in the literature, actors with strategic networking positions mediating, translating and transmitting knowledge and good practices and creating connections between diverse people between different cultures (meyer, 2010) are referred to as knowledge brokers (sverrisson, 2001), gatekeepers (morrison, 2008), stakeholders (krueger, page, hubacek, smith, & hiscock, 2012; svendsen & laberge, 2005), stars (borgatti, mehra, brass, & labianca, 2009) and hubs (barabasi, 2002). sverrisson (2001) has distinguished between three approaches to knowledge brokering. networking brokerage refers to connecting people, knowledge orientated brokerage relates to translating concepts and theories across disciplines that are critical to applying knowledge in complex projects, and organizational or technological brokerage involves facilitating novelty and innovation. overall, people in the middle of the social network often disseminate knowledge culture by sharing information with people around them and between workplace organizations and their surrounding environments, and by building bridges among people and between bodies of knowledge (burt, 1999). 
to cope with the challenges of rapidly transforming environments of professional activity, key experts have to cultivate practices of adaptive expertise (hatano & inagaki, 1986). such practices involve the cultivation of competency in successfully dealing with challenging, novel and unanticipated professional problems instead of clinging to old routines. adaptive experts are those who deliberately invest resources released by accumulating experience in new learning and seek challenges that assist and elicit their learning and the development of expertise. toward that end, many participants deliberately create novel networking connections and engage in inspiring encounters with heterogeneous networking partners. the creation of versatile networking connections and sustained sharing of professional expertise elicit the development of relational expertise, which is understood as the capability to productively tailor and fine-tune personal expertise to create joint or shared competence within communities and organized groups of experts and professionals (edwards, 2010). people working in emerging fields, where combining different fields of expertise appears to be important, often come from different working sectors and represent various fields of know-how (see mieg, 2006). relational expertise recognizes the importance of resources provided by the different actors and the relevance of generating mutual understanding and shared goals over the borders of different fields of expertise, enabling collaboration (edwards, 2010). one way of assessing key experts’ positions within a social network is the number of networking partners seeking their advice. advice networks are comprised of relations through which participants share resources, supporting the completion of their assignments (sparrowe, liden, wayne, & kraimer, 2001).
who people contact when needing knowledge and advice and the reasons for seeking advice from these people have been studied (creswick & westbrook, 2010; nebus, 2006), as has the kind of knowledge sought in advice networks (cross, 2004; cross, borgatti, & parker, 2001). motivations for asking for professional advice from someone seem to be related to the relevance and value of their information, the level of interpersonal trust (levin & cross, 2004), the advice seeker’s perceptions of the knowledge source’s expertise and credibility, accessibility, the expectations on how the contact will respond, and the assessed value and costs of seeking advice (nebus, 2006). investigations have revealed that information and advice relationships cultivated by people provide several types of knowledge, such as answers to know-what, know-how and know-who questions as well as meta-knowledge concerning where information needed for answering these questions may be found. in addition, knowledge received from advice networks might help people to think differently about problems faced as well as validate and legitimize solutions and plans made (see cross, 2004). attainment of a central networking position in advice networks is often related to personal characteristics, such as an in-depth professional commitment, motivational engagement (aalbers, dolfsma, & koppius, 2013), a high level of professional performance (sparrowe, liden, wayne, & kraimer, 2001) and transformational leadership (bono & anderson, 2005). in this study, we adopt a personal (egocentric) network approach to identify and examine key experts in an energy efficiency training program whose professional knowledge the other course participants frequently sought to share. we call such key experts, whose cognitive achievements are shared by their professional peers, cognitively central actors.
the concept of cognitive centrality is derived from studies on group decision making in a social network framework (kameda, ohtsubo, & takezawa, 1997). kameda, ohtsubo and takezawa (1997) suggested that the more knowledge and competence a person shares with the other group members, the more cognitively central position he or she has in the group. cognitively central group members who contribute intensively in collective problem-solving efforts are more influential in decision-making situations than peripheral members (stasser, abele, & vaughan parsons, 2012). here, the concept of cognitively central actors will be used to refer to course participants who were positioned in the middle of the social network, had valuable, extended and heterogeneous networking connections, and, therefore, provided other participants with new and relevant knowledge, competences and assistance more often than others (see kameda, ohtsubo, & takezawa, 1997; palonen, hakkarainen, talvitie, & lehtinen, 2004). in many cases, they appeared to have a high level of relational expertise in terms of having meta-knowledge regarding the social distribution of relevant knowledge across professional networks (i.e., knowing who knows what in a professional network). traditionally, it is thought that persons who are often sought for professional and work-related advice are more knowledgeable and have more expertise than others. however, this is not necessarily the case in the emerging fields of complex professional activity where expertise needed for solving emerging problems is radically distributed or may not exist to begin with. more symmetric advancement of heterogeneously distributed knowledge (scardamalia, 2002) by different participants may characterize such situations.
under such conditions, participants having a comprehensive vision of the future of their field as well as a high level of discernment, that is, a capability of assessing knowledge relationally in context (facer, 2011), may become cognitively central participants. this paper examines more closely who the cognitively central participants are in the field of energy efficiency, and it attempts to determine the possible reasons or personal features for achieving this kind of important networking position in the emerging field. we aim to understand why certain key persons are contacted and asked for knowledge and advice more often than others. personal networks offer illustrative ways to examine knowledge exchanges and communication in complicated environments by enabling the integration of individual and community level attributes. therefore, they enable the analysis of the properties of the one person “owning” the network (ego) and the properties of people belonging to his or her network (alters), as well as the attributes of ego-alter ties and alter-alter ties (hakkarainen, palonen, paavola, & lehtinen, 2004). as a unit of analysis, personal networks, supplemented by other techniques, enabled us to look at network connections from different angles and across several levels and thus achieve a more accurate picture of multi-faceted and complicated social structures (fuhse & mützel, 2011). in all, we shall examine cognitively central actors’ personal, social and organizational features relevant to achieving a central, strategic position among the participants in the energy efficiency training.

2. the aim of the study

the purpose of this study is to examine the personal networks of those key energy efficiency professionals who are often sought for professional information and advice by other actors working in the field, in other words, the cognitively central actors.
our specific focus is analysing how knowledge and competence sharing regarding energy efficiency issues was organized around particular persons and whether there were some features explaining why certain persons achieved a cognitively central position. the study was carried out in the context of a year-long energy efficiency training program. our hypotheses are as follows:

1) at the overall network level, cognitively central participants can be identified using an advice size indicator, that is, from whom the participants ask advice regarding their energy efficiency related problems.
2) a) at the ego-alter level, the structure of the cognitively central participants’ personal networks differs from that of other course participants’ so that their networks are bigger, denser and they have more broker capacity, that is, they connect the members in their personal networks.
   b) the central participants’ personal networks are expected to be diverse in relation to their members’ genders, university divisions, working sectors, educational backgrounds, previous experience-based knowledge of energy efficiency and the fields of their know-how.
3) at the ego level, the cognitively central participants have certain features that explain their prominent networking position. such features can be expected to relate to their personal attributes and affiliations.

3. methods

3.1 energy efficiency training program

this study was conducted in the context of the one-year academic apprenticeship education program in the field of energy efficiency (hytönen, palonen, lehtinen, & hakkarainen, 2014). it was a pilot educational program organized for the first time in finland in 2010–2011. the energy efficiency training aimed to support the cultivation of energy efficiency expertise in the public and private sectors, promote professional networking between the actors in the field and encourage the sharing of good professional practices.
three technical universities organized the training collaboratively: universities a (n = 29) and b (n = 28) organized education mainly for actors working in the public sector, and university c (n = 30) organized education for actors working in the private sector. fourteen participants working in the private sector participated in the educational training organized by universities a and b because there were not enough spaces for all willing private-sector participants at university c. altogether, 74 of 87 course participants completed the training program; 13 participants dropped out for various reasons.

the energy efficiency training was based on real-life working practices and included theoretical studies and workplace learning. the theoretical studies were organized into seven contact days, including lectures, small group work and discussions. the first three and the last contact days were organized jointly for all course participants, but the remaining three days were organized separately for the public and private sector actors. the three separated contact days involved themes that were relevant especially to either the public or the private sector. the timespan and practices for organizing the contact days were the same for all three universities. about 70–80% of the active time in the training program was assumed to take place in the participants’ workplaces, where the participants conducted a developmental study project. the developmental study project aimed to support participants’ professional development as well as the development of the workplaces’ energy efficiency practices. on the last contact day, each participant presented his or her developmental study project. networking between the participants was supported by small group work. in each university, the course participants were organized into five small groups of five to six members according to their places of residence.
in addition to small group work taking place during the contact days, the small group members were advised to meet at least three times during the training to discuss their developmental study projects and provide peer support. in addition, the course participants were encouraged to use the virtual learning environments provided by each university to support open discussion and knowledge exchange. however, the small group meetings and the use of the virtual learning environments were not controlled by any means. furthermore, each participant was assigned an academic expert advisor on behalf of the universities and a workplace supervisor from his or her workplace organization. their role was to provide professional support for participants in their developmental study projects and the process of workplace learning. the practices of the energy efficiency training are presented in further detail in hytönen, palonen, lehtinen and hakkarainen (2014).

3.2 participants

at the overall level of analysis, all course participants were asked to participate in this study. participation was voluntary, and the energy efficiency training was independent of this research. the participants were engineers, architects and other professionals with a master’s- or bachelor’s-level education and varied lengths of experience in professional practices related to energy efficiency. at the ego-alter level of analysis, the participants were the 40 members (alters) of the central participants’ personal networks in the context of the energy efficiency training. personal networks included only other course participants; the academic expert advisors, the workplace supervisors and other colleagues were not investigated. twenty-four of the alters were male and 16 were female. fifteen of the alters participated in the education organized by university a, 12 participated in the education organized by university b and 13 participated in the education organized by university c.
more detailed information regarding the alters is provided in the results section. at the ego level of analysis, the participants in the study were six cognitively central actors from the energy efficiency training who were identified from all the course participants by analysing advice-seeking in the first section of the analysis. they are described in more detail in the results section.

3.3 social network methods

network data were collected by administering an online social networking questionnaire to all 74 course participants (males, 50; females, 24) at the end of the training, out of whom 52 responded; the response rate was 70%. we also collected networking data in a similar way in the beginning of the training, but this study is based only on the latter data. the results concerning the changes in networking ties during the training are reported elsewhere (hytönen, palonen, lehtinen, & hakkarainen, 2014). the networking questionnaire involved a list of the names of all course participants, and in relation to one another, the respondents were asked to assess the following: 1) from whom they sought advice regarding energy efficiency and 2) with whom they collaborated in terms of energy efficiency activity. to measure the strength of the networking relations, the respondents were asked to rate each of these items on a valued scale of 0 (no connection), 1 (a connection) or 2 (a strong connection).

a social network analysis (sna) was conducted via ucinet 6 (borgatti, everett, & freeman, 2002). we examined both the advice-seeking network, that is, how the participants sought energy efficiency information from one another, and the collaboration network, that is, how the participants collaborated with one another regarding energy efficiency issues. sna was conducted at the overall network level and the ego-alter level.
the different levels of analysis provided complementary dimensions for examining the cognitively central participants’ networking. regarding the overall network, multidimensional scaling (mds) and advice size variables were used. in relation to the ego-alter level, the structure of connections between ego and alters was examined using different networking methods. information about the features of alters was collected by a networking questionnaire that was developed according to earlier studies (palonen, 2003). at the overall network level of analysis, both the advice-seeking and collaboration networks were examined. from these two, the advice-seeking network was used to identify the cognitively central participants in the training because it is asymmetric in nature and does not require reciprocal networking connections. therefore, it functions well as an indicator of a person’s cognitive centrality (palonen, hakkarainen, talvitie, & lehtinen, 2004; sparrowe, liden, wayne, & kraimer, 2001). the cognitive centrality of the course participants was examined by calculating the centrality value (advice size), which indicates the amount of information that a person provides to the other members of the network. this was done using freeman’s in-degree measurement, which revealed how many course participants sought energy efficiency advice from the actor in question, that is, the number of incoming networking linkages based on peer evaluation. the analysis indicated how significant a role an actor’s expertise played in the social network and thereby allowed one to identify the cognitively central actors among the participants. the analysis was conducted for the dichotomized network, so the frequency of communication was not analysed. further, the network cohesion for the overall advice-seeking network was analysed via a density measure that characterized the number of existing networking ties in relation to all possible ties.
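as a minimal illustration of the two overall-network measures described above, the following python sketch dichotomizes a small valued advice matrix and computes freeman's in-degree (advice size) and density. the four participants and their ratings are hypothetical toy data, and the function names are assumptions made for this example; the study itself ran these analyses in ucinet 6, not with this code.

```python
# Toy valued advice-seeking matrix (hypothetical data, not from the study).
# ratings[i][j] is how strongly respondent i reports seeking advice from j:
# 0 = no connection, 1 = a connection, 2 = a strong connection.
labels = ["p1", "p2", "p3", "p4"]
ratings = [
    [0, 2, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 2, 1, 0],
]


def dichotomize(matrix, cut_point=0):
    """Turn valued ties into 0/1 ties: any rating above cut_point counts as a tie."""
    return [[1 if value > cut_point else 0 for value in row] for row in matrix]


def in_degree(binary):
    """In-degree per actor: number of others who name that actor (column sums)."""
    n = len(binary)
    return [sum(binary[i][j] for i in range(n) if i != j) for j in range(n)]


def density(binary):
    """Share of existing directed ties among all n * (n - 1) possible ties."""
    n = len(binary)
    ties = sum(binary[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


binary = dichotomize(ratings)
advice_size = dict(zip(labels, in_degree(binary)))
network_density = density(binary)
```

in this toy network, p2 and p3 each receive three advice ties, and six of the twelve possible directed ties are present, giving a density of 0.5.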
to illustrate the structure of the overall network of all course participants and the structural position of the cognitively central participants, the advice-seeking and collaboration networks were visualized using the spindel visualization tool (see www.spindel.fi) using the participants’ network distances, which were provided by mds techniques. at the ego-alter level, to deepen the analysis, the structure and heterogeneity of the central participants’ personal networks were examined. the advice-seeking and collaboration networks were merged for the following analyses by summing them up, and the merged network was dichotomized (cut point 0). the egocentric network was used as the unit of analysis. the structure of the central participants’ personal networks was analysed by size, density and a brokering index. size indicates the number of alters the ego is directly connected to; central members are expected to have a high number of contacts. density was calculated among the central participants’ network members; the number of ties was divided by the number of pairs multiplied by 100. a high density in the alter network indicates a low brokering or mediating role for a given ego. on the other hand, a low density indicates that the ego’s position in the alter network is crucial. the brokering index is the number of times an ego lies on the shortest path between two alters. it is a parallel indicator for knowledge mediating. an undirected type of ego neighbourhood was used, meaning that all actors connected to and from an ego were considered (borgatti, everett, & freeman, 2002). the mann-whitney u-test was used to analyse whether the structure of the central participants’ personal networks differed from the structure of all other course participants’ personal networks. the heterogeneity of the central participants’ personal networks was analysed by comparing the various properties among alters, as well as the properties between the egos and alters.
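the ego-level structural measures defined in this section (size, density and the brokering index) can be sketched in python as follows. this is an illustrative reading rather than the ucinet procedure itself: the function name, the toy ties and, in particular, the treatment of the brokering index as the number of alter pairs without a direct tie (with each missing directed tie counting as half a pair) are assumptions. the reading rests on the observation that, in an undirected ego network, the ego lies on a shortest path between two alters exactly when those alters are not tied directly.

```python
from itertools import permutations


def ego_measures(ego, ties):
    """Return (size, density_percent, broker) for one ego.

    ties is a set of directed (source, target) pairs taken from the
    dichotomized, merged network. The broker value follows the assumed
    reading described in the text, not a verified UCINET formula.
    """
    # Undirected neighbourhood: everyone tied to or from the ego.
    alters = {t for s, t in ties if s == ego} | {s for s, t in ties if t == ego}
    alters.discard(ego)
    size = len(alters)
    ordered_pairs = size * (size - 1)
    if ordered_pairs == 0:
        return size, 0.0, 0.0
    # Directed ties that exist among the alters themselves.
    alter_ties = sum(1 for a, b in permutations(alters, 2) if (a, b) in ties)
    density_percent = 100.0 * alter_ties / ordered_pairs
    # Alter pairs with no direct tie; a missing directed tie counts as half.
    broker = (ordered_pairs - alter_ties) / 2
    return size, density_percent, broker


# Hypothetical toy ties, not data from the study.
toy_ties = {
    ("ego", "a"), ("b", "ego"), ("ego", "c"), ("d", "ego"),
    ("a", "b"), ("b", "a"), ("c", "d"),
}
size, density_percent, broker = ego_measures("ego", toy_ties)
```

for the toy ego, the four alters form 12 ordered pairs, 3 of which are tied, so the density is 25% and the brokering index is 4.5.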
first, we identified the alters by examining the egos’ neighborhood in advice-seeking and collaboration. second, we classified all participants in terms of the university they belonged to, educational background, working sector, gender, level of previous experience-based knowledge in energy efficiency and field of know-how. the estimation of the alters’ previous energy efficiency knowledge was based on their self-reports. the central participants’ personal networks were visualized using cytoscape. the advice-seeking and collaboration networks were merged for the visualizations.

3.4 semi-structured interviews and qualitative content analysis

semi-structured interviews were conducted with all the cognitively central actors to complement the social networking data at the ego level of analysis. the interviews were carried out to examine the features of the cognitively central participants and the possible reasons they achieved a central networking position among energy efficiency workers. data collection was carried out in two phases. four of the six central participants identified were interviewed, both in the beginning and at the end of the training, as a part of broader data collection. after we conducted sna and identified the cognitively central participants, we complemented the interviews by asking them to assess the possible reasons for their central networking positions. at this stage, the two remaining central participants were interviewed as well.
the interview themes addressed the participants’ educational backgrounds, work experiences, current work assignments and professional roles in relation to energy efficiency; their reasons for attending the training; their views on the energy efficiency field; their networking with the other course participants and other energy efficiency professionals; future prospects of developing energy efficiency expertise; and their own opinions regarding the possible reasons for their cognitive centrality. the interviews were audio recorded and transcribed by the first author. qualitative content analysis was performed using atlas.ti 6.2. the analysis was conducted by identifying expressions related to the themes of adaptive expertise, relational expertise, disseminating knowledge culture and knowledge brokering. content was identified and clustered independently by two researchers.

4. results

4.1 identifying the cognitively central participants at the overall network level

at the overall network level, we identified the cognitively central actors of the energy efficiency training program. the density analysis for the overall network revealed that 5% (sd = 21.8) of all potential networking linkages were present in the advice-seeking network. all course participants’ cognitive centrality was analysed via freeman’s in-degree measure in the advice-seeking network. the measure is based on peer evaluation, and it reveals how many course participants have selected the actor in question as an information source. the cognitively central participants were selected on the basis of their high in-degree value, i.e., a minimum of seven linkages, as compared to the average for all course participants (m = 3.7; sd = 2.0) in the advice-seeking network (see table 1). we selected six actors (a20, a26, b2, b21, c17 and c23) who were most often sought for advice by their peers. there were two central actors from each university.
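the selection rule can be expressed as a simple threshold on in-degree. the python sketch below applies the minimum of seven incoming advice ties to the six in-degree values reported in table 1, plus one hypothetical non-central participant (labelled x1, not from the study) added only to show the filter at work; the function and variable names are assumptions made for this example.

```python
from statistics import mean

# In-degree values of the six central actors as reported in table 1.
# "x1" is a hypothetical non-central participant, not from the study.
in_degree = {"a20": 9, "a26": 10, "b2": 7, "b21": 9, "c17": 7, "c23": 8, "x1": 4}

MIN_IN_DEGREE = 7  # minimum number of incoming advice ties used for selection


def select_central(in_degree_by_actor, minimum=MIN_IN_DEGREE):
    """Keep only the actors whose in-degree reaches the minimum."""
    return {actor: d for actor, d in in_degree_by_actor.items() if d >= minimum}


central = select_central(in_degree)
central_mean = round(mean(central.values()), 1)
```

the filter keeps the six actors named above, and their mean in-degree rounds to 8.3, which agrees with the mean reported in table 1.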
multidimensional scaling (figure 1) representing the overall network of all course participants revealed that the central actors from the public sector universities (a20, a26, b2 and b21) were located in the middle of the network, indicating that they were in close connection with participants from both public sector universities (see the video of figure 1). central participant c17 from university c appeared to be connected mainly with the other private sector participants. however, the other central participant from the private sector (c23) was positioned between the private and public sector universities. overall, the course participants from universities a and b were clustered more closely than the participants from university c.

figure 1. overall network. the mds figure is based on collaboration ties, whereas lines reflect advice ties. the figure, visualized using spindel tools (www.spindel.fi), reveals how the central participants were positioned in the network of all course participants. the colour code in the graphs represents the university that the person comes from: red, university a; green, university b; blue, university c. the central actors are indicated by the large nodes and personal numbers.

4.2 central participants’ personal networks at the ego-alter level

at the ego-alter level of ties, we examined the structure and heterogeneity of the central participants’ personal networks. the structure of the personal networks was assessed using the ego networks’ basic measures, which are reported in table 1. two of the central participants, a26 and b21, did not respond to the networking questionnaire. thus, their measures are based only on information provided by other course participants. we used a mann-whitney u-test to analyse whether the structure of the central participants’ personal networks differed from the structure of all other course participants’ personal networks.
it appeared that there was a statistically significant difference in relation to the size (z = -3.368; p = .001), which was self-evident, and density (z = -2.009; p = .045) of the personal networks, as well as the brokering index (z = -3.275; p = .001). to conclude, in addition to the fact that central members were most often asked for advice (that was the defining criterion), they had larger networks that were relatively sparse, indicating their own mediation role, which was also shown by the broker indicator. a20 had an especially large network, in which her own contribution was important and her brokering role was essential.

video of figure 1: https://www.youtube.com/watch?v=hbajsn_i3ty&feature=youtu.be

table 1. in-degree and ego network measures

         in-degree   size   density (%)   broker
a20          9        19         8         158
a26*        10        10        13          39
b2           7        11        23          42.5
b21*         9         9         8          33
c17          7         9        39          22
c23          8        11        22          43
m            8.3      11.5      18.8        56.3
sd           -         3.8      11.9        50.5

measures for all other course participants (does not include central participants' measures)
m            3.7       5.8      38.7        16.0
sd           2.0       3.7      29.7        29.7

* a26 and b21 did not respond to the networking questionnaire, and, therefore, their measures are based only on information provided by other course participants.

the heterogeneity of the central participants' personal networks was examined by analysing the network alters' university divisions, working sectors, educational backgrounds, genders, previous experience-based knowledge of energy efficiency (self-reported) and field of know-how. in table 2 (see appendix 1), we have provided the frequencies of alters belonging to each central participant's personal network, indicating the heterogeneity of the networks.

figure 2. network members' working sector. the colour code represents the working sector of the participants: green, public sector; blue, private sector. the large spheres represent the cognitively central participants and the small ones represent their network alters.
for every participant, we have provided a personal number and a code identifying the university. the six central participants' personal networks are merged for the visualization. alter-alter ties are not represented in the figure.

figure 2 is a visualization of the central participants' personal networks in terms of their alters' working sectors. all personal networks have been merged into the same figure. the results indicate that the heterogeneity of the central participants' networks varied in terms of alters' home universities and working sectors, that is, whether they came from the public or private sector (see table 2 and figure 2). a20's and c23's personal networks were the most heterogeneous in this respect; they included roughly equal numbers of actors from both the public and private sectors and from all three universities. b2, who worked in the private sector but participated in the public sector education, had contacts only with participants from the public sector universities. apparently, participating in home university activities had more influence than the working sector as such.

figure 3. network members' educational background. the colour code represents the educational background of the participants: orange, engineer; blue, architect; green, other; white, information missing. the large spheres represent the cognitively central participants and the small ones represent their network alters. for every participant, we have provided a personal number and a code identifying the university. the six central participants' personal networks are merged for the visualization. alter-alter ties are not represented in the figure.

furthermore, the central actors from the public sector universities (a20, a26, b2 and b21) had varied educational backgrounds (see figure 3 and table 2), as did their alters, whereas the personal networks of c17 and c23, who worked in the private sector, had low levels of variety in this respect.
with one exception, they included only engineers. the personal networks of a20, b2, b21 and c23 were rather heterogeneous with respect to their alters' know-how, whereas a26's and c17's personal networks were more homogeneous; in the personal network of a26, there were many alters doing either land use planning or construction planning in the public sector, and the majority of c17's alters were industrial planners in the private sector (see figure 4 and table 2).

figure 4. network members' field of know-how. the shape and colour code represent the field of know-how of the participants: circle: red, land use planning; orange, construction planning; brown, environmental surveillance; violet, other; white, information missing. square: blue, planning for industry; green, consultant/surveillance/planning; white, information missing. the large spheres represent the cognitively central participants and the small ones represent their network alters. for every participant, we have provided a personal number and a code identifying the university. the six central participants' personal networks are merged for the visualization. alter-alter ties are not represented in the figure.

figure 5 reveals that in the personal networks of a20, a26, b21 and c23, there were nearly the same number of female and male alters (see also table 2). in the networks of b2 and c17, there were more participants of their own gender. it appeared that in the private sector, the central participants' personal networks were more male-oriented. this could be explained by the fact that in the context of this particular energy efficiency training, males worked in the private sector more often than females.

figure 5. network members' gender distribution. the colour code represents the genders of the participants: blue, male; red, female. the large spheres represent the cognitively central participants and the small ones represent their network alters.
for every participant, we have provided a personal number and a code identifying the university. the six central participants' personal networks are merged for the visualization. alter-alter ties are not represented in the figure.

figure 6 visualizes the central participants' personal networks in terms of their alters' previous self-reported, experience-based energy efficiency knowledge. the figure reveals that two of the central participants (a26 and b2) had little or no previous knowledge of energy efficiency (see also table 2). their central networking position must therefore be explained by something else. overall, in each central participant's personal network, there were alters with varying amounts of previous energy efficiency knowledge. in this respect, the personal networks of inexperienced participants did not differ from those of experienced participants.

figure 6. network members' previous experience-based professional knowledge of energy efficiency. the colour code represents the level of participants' previous knowledge of energy efficiency: green, strong; blue, some; red, minor or none; white, information missing. the large spheres represent the cognitively central participants and the small ones represent their network alters. for every participant, we have provided a personal number and a code identifying the university. the six central participants' personal networks are merged for the visualization. alter-alter ties are not represented in the figure.

4.3 ego level: features for achieving the cognitively central position

at the ego level of analysis, using the interview data, we examined in more detail who the cognitively central actors were and which features were relevant to achieving a central position. the cognitively central actors differed from one another in terms of age, educational background and the length of work experience (see table 3).
in addition, they had different levels of previous energy efficiency-related knowledge in terms of their job description. the interviews revealed that there was not one common explanation as to why these six participants achieved cognitively central positions. instead, various features were emphasized. a central position was evidently achieved not only on the strength of personal characteristics but also on the basis of the kind of information the other participants were requesting from the cognitively central actors. therefore, the features relevant to achieving the central position are related to the nature of the central participants' expertise, their knowledge brokering roles or positions between various fields or cultures, the nature of their employers and their own attitudes towards energy efficiency. in addition, they appeared to be interested in pursuing careers in the energy efficiency field.

central participant a20, "a knowledge-sharing representative of an important organization", had strong and wide-ranging working experience in energy efficiency in both the public and private sectors. in her current workplace, a significant public organization, she worked as an energy efficiency expert. as the organization's "internal help", she was responsible for ensuring that energy efficiency was taken into account in the organization's approaches and decisions, and she advised fellow workers on energy efficiency issues. a20 appeared to have versatile professional connections that supported her daily work, and she emphasized the importance of professional collaboration. her employer functioned as a forerunner in developing and implementing energy-efficient practices and operational models in the public sector. she considered it crucial to openly discuss and share the newest knowledge and experiences among actors working with energy efficiency issues in order to promote the development of the energy efficiency field, energy efficiency consciousness and good operational practices: "i've pretty openly adopted the orientation that i'm just going to talk and give those ideas." in her experience, it is important to freely discuss both successful and unsuccessful undertakings because this benefits the development of the entire domain. a20 herself assessed that her open attitude towards sharing all types of energy efficiency knowledge was the most important reason for her cognitively central position. by performing research, a20 deliberately aimed to expand her own know-how regarding energy efficiency as well as to produce new information. she highlighted the fact that even though plenty of theoretical and technical energy efficiency knowledge and expertise exists, it is important to produce more practical knowledge and real-life examples to help steer the work of actors working with energy efficiency issues. the challenge is also to produce intelligible energy efficiency knowledge for common people: "about 80 percent of the others [populace] don't understand anything about basic facts if you don't translate them into images, and they don't need to, because i don't understand anything about basic medication. it's the doctor who tells me what i have to eat to cope with those symptoms."

table 3. background information for the central participants

       age     gender   education   work experience (years)   job description in relation to energy efficiency
a20    35–39   female   engineer    11–15                     a central part of the job description
a26    55–59   female   architect   36–40                     in the background
b2     40–44   female   engineer    11–15                     in the background
b21    30–34   male     m.sc.       1–5                       about half of the job description
c17    25–29   male     engineer    1–5                       a central part of the job description
c23    30–34   female   m.sc.       1–5                       a central part of the job description

central participant a26, "an experienced worker and 'missionary'", was an architect by training and, like a20, worked at a significant organization in the public sector. she did not have any actual experience in energy efficiency issues before participating in the energy efficiency training but did have a great deal of work experience in her own field. by participating in the energy efficiency training and other available education, a26 aimed to become a kind of "internal energy efficiency consultant" in her employing organization: "it's like i have this kind of a model currently in my mind, or that's developed, about how i can first get this workplace community educated about taking the importance of energy efficiency into consideration." in this way, she wished to be able to raise the awareness of energy efficiency practices and deliver them to her employer; she described herself as "a kind of a missionary", though she reported holding a peripheral position in her workplace, without any support from her superintendent.

central participant b2, "a gatekeeper for electrical engineering", worked in a small private company, although she participated in education that was organized mainly for the public sector actors. she had strong technical know-how related to electrical engineering. as an electrician, she worked with assignments that were not directly related to energy efficiency, and her previous energy efficiency knowledge was minor. however, she highlighted the fact that awareness of energy efficiency matters is increasing in electrical engineering because of changing legislation and the increasing demands of customers; in the future, designs will have to be sustainable in the long term and not "only such easy fixes". b2 emphasized that in electrical engineering, actors are "contemplating their navels" too much instead of collaborating with other domains. participating in the training and networking with the other course participants widened b2's own professional viewpoint and convinced her of the importance of networking and collaboration across the borders of professional fields: "we often considered, together with the planners, before the basic elements of a construction project, how could energy efficiencies be defined and such, even before the building is on the table." the field of electrical engineering was unfamiliar to most of the other course participants, and therefore b2 herself was able to provide them with a new kind of knowledge, presenting a novel perspective on energy efficiency.

central participant b21, "a liaison and eco-man", held an m.sc. his job description and know-how consisted mainly of eco-efficiency, thus including many aspects of energy efficiency: "i am some sort of eco-man, so in a sense, when situations emerge in which i have to take a position on climate or energy issues, then i'm involved in such projects." in his workplace, b21 functioned as a coordinator and knowledge mediator between land-use-planning actors and environmental authorities regarding issues related to energy efficiency: "it is just this kind of role of 'combiner', because of course i don't know about energy issues as much an engineer from city energy [name changed]. on the other hand, he doesn't know anything about land-use planning. still, i'm not such a great land use designer either, so we have several architects, but then again, they don't necessarily know anything about energy efficiency." overall, b21 emphasized that cross-administrative and versatile professional network connections are important in dealing with daily assignments. his employer was a significant public organization that functioned as an example for smaller municipalities.
b21 emphasized that, as a large organization, it has better resources with which to develop energy-efficient operational models than smaller municipalities: "we have really been able to do the kind of development work that not many municipalities can afford or even have time for maybe, so in that sense, we've got a pioneering role." therefore, b21 had valuable energy efficiency-related knowledge and advice that he could share with the other course participants working with similar questions.

central participant c17, "an adaptive expert in the industrial sector", worked in a private company. although he had only three years of working experience, he had developed strong expertise in energy efficiency issues in a particular industrial field to which his daily work assignments were directly related. c17 had acquired his current energy efficiency knowledge through a few years of purposeful and deliberate efforts toward self-development, and further, he aimed to achieve a comprehensive understanding of all kinds of energy efficiency matters: "since i started working in this company, i've tried to find an extensive vision for the industrial air pressure systems, their energy efficiency and industrial energy efficiency in general." c17 highlighted the extreme importance of increasing the awareness of efficient energy usage in the industry so that energy efficient behaviour will become a natural and axiomatic part of daily routines, instead of being "a mandatory chore".
overall, c17 wished for more openness and interaction between those actors dealing with energy efficiency issues in order to promote the diffusion of good ideas and, more generally, the development of the entire energy efficiency field: "my overall opinion, outside of this training in general, is a desire to pursue openness and open communication, like exchanging ideas and not holding back information." he aimed to promote this himself by sharing new energy efficiency knowledge, information and perspectives with his colleagues, as well as with customers and other actors in the industry. he had a mission of "starting, so to speak, to declare our message to our customers and collaborators and other possible parties".

central participant c23, "a bridge between the public and private sector", had a degree in environmental technology. therefore, she had a different educational background than the majority of her colleagues and other course participants, who were mainly engineers, and a less technical perspective on energy efficiency. she had become acquainted with energy efficiency matters in her current workplace, a private company; her job description included consultancy and planning related to various energy efficiency issues and projects. c23's clients were actors and organizations from both the private and public sectors, and therefore, she had gained wide-ranging knowledge and experience in various kinds of energy efficiency issues that could be exploited in industrial and public sector assignments. because of her professional position in the intersection of these two sectors, many participants already knew her before the training: "i work on both the municipal and the industrial sides, which is probably why, in my training, the people on the municipal and industrial sides knew me. i was probably in the middle there." c23 emphasized that networking and collaboration are required in the diverse energy efficiency field; she stated that it is important to have a network of professionals with various kinds of know-how to consult when help and advice are needed, that is, a kind of meta-knowing about who-knows-who-knows-what (borgatti & cross, 2003): "nobody can be an expert in everything, so it's good to know about people who know about some issues and to be able to create such [connections] if you end up working on some projects for customers."

to sum up, becoming a cognitively central actor is an intricate process that cannot be reduced to personal characteristics. it is related to the organizations that the actors represent, the expert profiles or competences that they have and how these complement the wider context. cognitive centrality is therefore not only an individual-level capacity.

5. discussion

in this study, we relied on the personal network approach to examine which features were relevant to achieving a cognitively central networking and knowledge sharing position in the academic apprenticeship education program in the field of energy efficiency. in emerging fields such as energy efficiency, where standard knowledge exchange mechanisms are still weak, cognitively central members, whose professional knowledge is frequently sought by other actors, are expected to be very important knowledge resources for other members in the network in terms of mediating knowledge and creating connections between different professional cultures. the analysis revealed that the six most central participants differed from each other in many respects, including the length of their work experience, educational background, how much they were involved in energy efficiency and what kind of organizations they came from. whatever the reason, these participants were asked for energy efficiency-related information more often than the other participants, and their knowledge mediating role in energy efficiency issues was essential.
thus, the results revealed that there was not a single shared feature that can explain why certain participants became more cognitively central than their peers. according to the homophily principle, people tend to interact more frequently with those who have similar characteristics to themselves, such as those with similar educational levels or members of a joint professional group (mcpherson, smith-lovin, & cook, 2001). the present analysis of the central participants' personal networks, in contrast, revealed that many of the networks were rather heterogeneous in nature, including a rich variety of people with different educational and working backgrounds, as well as professional and energy efficiency-related experiences. in particular, the personal network of central participant a20, who had the most important knowledge sharing position in the training, was outstandingly heterogeneous in nature. such heterogeneous resources are needed for coping with a continuously changing environment. our previous study (hytönen, palonen, lehtinen, & hakkarainen, 2014) indicated that the energy efficiency training did not support participants in comprehensive networking, in creating an occupational knowledge-exchange forum or in using one another's complementary expertise on a large scale. nevertheless, the results of this study revealed that some course participants were able to find valuable new connections with people who had novel perspectives on energy efficiency and to cross the boundaries of their immediate professional fields (akkerman, admiraal, simons, & niessen, 2006). apparently, the cognitively central actors possessed knowledge that other course participants found usable, even though they did not necessarily represent the same professional context or culture (see edwards, 2010).
cognitive centrality is not related only to personal attributes, such as a high level of professional experience, previous energy efficiency-related knowledge or personal characteristics. it is also related to social contexts, for instance, the nature of the operational environments and employing organizations that the participants represented. in addition, the results indicated that the participants' forms of expertise and competence were relationally and contextually assessed (mieg, 2006); their fields of know-how were not necessarily energy efficiency, but they had strategic and special knowledge in some particular area, such as electrical engineering, that was found useful by other course participants. in addition, the participants representing significant public sector organizations appeared to possess advanced and trustworthy knowledge that was valued by the other participants and that they needed in their own professional contexts (levin & cross, 2004).

in advice-seeking networks, help is often asked for from persons presumed to be the most knowledgeable and having the strongest experience in the issue in question (nebus, 2006). cumulative individual experience is expected to increase individual proficiency (reagans, argote, & brooks, 2005). however, this investigation revealed that it is not only lengthy professional experience or strong expertise in energy efficiency that make a person cognitively central. other factors such as personal enthusiasm or energy efficiency awareness were, in some cases, more important than strong professional competency or extensive experience in the field.
it seemed to us that young workers with rather limited working experience may quickly acquire relatively strong expertise and become cognitively central knowledge mediating professionals if they deliberately attempt to increase their expertise and succeed in reaching considerable professional capability (ericsson, 2006; hatano & inagaki, 1986). this can be the case especially in emerging fields, in which there are no strong established paradigms and working cultures and where good operational practices are still developing.

recent changes in the working world highlight the importance of multi-professional collaboration (edwards, 2010) and a role as a boundary-spanning knowledge broker for professionals (johri, 2008). in addition to mediating knowledge, the key persons acting as knowledge brokers often produce a new kind of brokered knowledge that has been assembled based on knowledge collected from different cultures (meyer, 2010). one essential reason for achieving a cognitively central networking position in the energy efficiency training program was bridging the gaps between various professional cultures and working environments, that is, those between the public and private sectors and between disciplines. in these positions, the cognitively central participants were able to process, build and even create new energy efficiency knowledge to be utilized in novel situations and tasks. it appears to us that the three brokering roles introduced by sverrisson (2001) were present at least in some forms in the central participants' personal networks; they connected people working with the energy efficiency issues (networking brokerage); created and translated concepts, theories and new knowledge of energy efficiency (knowledge oriented brokerage); and facilitated innovations and good operational practices and new operational models (brokerage of organizational or technological novelties) in and between the public and private sector organizations.
the results indicated that the knowledge mediating role of the central participants was important both in the energy efficiency training and in their larger working environments in terms of aiming to increase awareness of energy efficiency practices and disseminating them to their workplaces. in addition to efforts towards purposeful and continuous self-development (ericsson, 2006; hatano & inagaki, 1986), some cognitively central participants showed a strong willingness to promote the overall development of the energy efficiency field by systematically creating and sharing knowledge and working for the diffusion of good energy efficiency practices. in this, socially shared professional goals appeared to have an essential role (edwards, 2010). in emerging fields, there is often a lack of a stable knowledge base and formal education, as is the case in the field of energy efficiency in finland, and therefore, professional learning takes place through informal and incidental learning (watkins, marsick, & fernández de álava, 2014; palonen, lehtinen, & boshuizen, 2014).

finally, it is presumably not possible to determine all possible reasons why someone is a hub for communication. the interviews revealed that informal networking connections and collaboration had important roles in professional activities and development. one example of this was found in the context of participants' joint discussions related to everyday energy-efficient practices, such as cooling gardens in the summertime. informal and incidental learning happens without much external facilitation and often occurs unsystematically, and it is therefore difficult to elicit and understand from an outside perspective.

5.1 limitations and further steps

one of the limitations of this study was that two of the central participants did not respond to the networking questionnaire.
therefore, their data were based only on information given by the other course participants, and we were not able to examine those relationships that they themselves may have had with others, that is, outgoing linkages. in addition, only a limited number of the course participants were interviewed, and, therefore, more research is needed to generalize the results. however, this study demonstrates the potential value of the personal network approach in the study of professional knowledge exchange in complex environments. sna provided a useful multi-level approach for determining the cognitively central actors possessing strategic competence in multi-professional fields, studying their role in professional networking and knowledge exchange and examining both the social context and the characteristics of individual actors in these processes.

personal networks are often studied via egocentric network interviews in which the participants (egos) are asked to list the alters belonging to their personal networks and to evaluate the relationship between themselves and the alters as well as between each individual pair of alters. in this study, we used the overall network data to study the cognitively central participants' personal networks. this approach allowed us to use ties incoming from other course participants to estimate cognitive centrality, to analyse the structure of the personal networks (mccarty & govindaramanujam, 2005) and to visualize the networks on both the overall (sociocentric) and personal levels (see mccarty, molina, aguilar, & rota, 2007). our study contributes to professional learning research by elaborating the concept of cognitive centrality and widening its use beyond small-group research. this approach is useful, especially for extension studies. future studies should examine in detail what kind of advice is sought from the cognitively central participants and how it is related to the nature of their expertise.
in addition, more research is needed to better understand the phenomenon of cognitive centrality and to discover whether the results found are typical for emerging fields but not generalizable to other contexts.

keypoints

methods of analysing personal social networks provide a functional unit of analysis for studying the personal and social features of knowledge exchange in complex environments.
this study introduces the concept of cognitive centrality and the diverse reasons behind this phenomenon.
the article explains how cognitively central actors can be identified and how the flow of advice is centralized in the context of professional networks.
this study addresses professional development in an emerging field (i.e., energy efficiency) where the knowledge base is not yet stable or consolidated.
the paper focuses on learning processes at the interface between working life and higher education institutions and explicates the features that are essential there.

acknowledgments

this research was funded by the futurex project, which is part of the european social fund programme, and the finnish ministry of education and culture (asko-project). we would like to thank otto and antti seitamaa for translating transcribed interviews from finnish to english.

references

aalbers, r., dolfsma, w., & koppius, o. (2013). individual connectedness in innovation networks: on the role of individual motivation. research policy, 42, 624–634. doi: 10.1016/j.respol.2012.10.007. akkerman, s., admiraal, w., simons, r. j., & niessen, t. (2006). considering diversity: multivoicedness in international academic collaboration. culture & psychology, 12, 461–485. doi: 10.1177/1354067x06069947. barabási, a.-l. (2002). linked: how everything is connected to everything else and what it means for business, science, and everyday life. cambridge, ma: perseus publishing. bereiter, c., & scardamalia, m. (1993). surpassing ourselves: an inquiry into the nature and implications of expertise.
chicago, il: open court.
bono, j. e., & anderson, m. h. (2005). the advice and influence networks of transformational leaders. journal of applied psychology, 90, 1306–1314. doi: 10.1037/0021-9010.90.6.1306.
borgatti, s. p., & cross, r. (2003). a relational view of information seeking and learning in social networks. management science, 49, 432–445.
borgatti, s. p., everett, m. g., & freeman, l. c. (2002). ucinet 6 for windows. harvard, ma: analytic technologies.
borgatti, s. p., mehra, a., brass, d. j., & labianca, g. (2009). network analysis in the social sciences. science, 323, 892–895. doi: 10.1126/science.1165821.
burt, r. s. (1999). entrepreneurs, distrust, and third parties: a strategic look at the dark side of dense networks. in l. l. thompson, j. m. levine, & d. m. messick (eds.), shared cognition in organizations: the management of knowledge (pp. 213–243). mahwah, nj: erlbaum.
creswick, n., & westbrook, j. i. (2010). social network analysis of medication advice-seeking interactions among staff in an australian hospital. international journal of medical informatics, 79, e116–e125. doi: 10.1016/j.ijmedinf.2008.08.005.
cross, r. (2004). more than an answer: information relationships for actionable knowledge. organization science, 15, 446–462. doi: 10.1287/orsc.1040.0075.
cross, r., borgatti, s. p., & parker, a. (2001). beyond answers: dimensions of the advice network. social networks, 23, 215–235.
edwards, a. (2010). being an expert professional practitioner. london: springer.
ericsson, k. a. (2006). the influence of experience and deliberate practice on the development of superior expert performance. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 683–703). cambridge: cambridge university press.
fuhse, j., & mützel, s. (2011). tackling connections, structure, and meaning in networks: quantitative and qualitative methods in sociological network research.
quality & quantity, 45, 1067–1089. doi: 10.1007/s11135-011-9492-3.
gruber, h., lehtinen, e., palonen, t., & degner, s. (2008). persons in shadow: assessing the social context of high ability. psychology science quarterly, 50, 237–258.
hatano, g., & inagaki, k. (1986). two courses of expertise. in h. stevenson, h. azuma, & k. hakuta (eds.), child development and education in japan (pp. 262–272). new york: freeman.
hakkarainen, k., palonen, t., paavola, s., & lehtinen, e. (2004). communities of networked expertise. professional and educational perspectives. amsterdam: elsevier.
hytönen, k., palonen, t., lehtinen, e., & hakkarainen, k. (2014). does academic apprenticeship increase networking ties among participants? a case study of an energy efficiency training program. higher education. doi: 10.1007/s10734-014-9754-9.
johri, a. (2008). boundary spanning knowledge broker: an emerging role in global engineering firms. proceedings from 38th asee/ieee frontiers in education conference. saratoga springs, ny.
kameda, t., ohtsubo, y., & takezawa, m. (1997). centrality in sociocognitive networks and social influence: an illustration in a group decision-making context. journal of personality and social psychology, 73, 296–309. doi: 10.1037/0022-3514.73.2.296.
kleinbaum, a. m., stuart, t. e., & tushman, m. l. (2013). discretion within constraint: homophily and structure in a formal organization. organization science, 24, 1316–1336. doi: 10.1287/orsc.1120.0804.
krueger, t., page, t., hubacek, k., smith, l., & hiscock, k. (2012). the role of expert opinion in environmental modelling. environmental modelling & software, 36, 4–18. doi: 10.1016/j.envsoft.2012.01.011.
lehtinen, e., hakkarainen, k., & palonen, t. (in press). understanding learning for the professions: how theories of learning explain coping with rapid change. in s. billett, c. harteis, & h. gruber (eds.), international handbook of research in professional and practice-based learning. dordrecht: springer.
levin, d., & cross, r. (2004). the strength of weak ties you can trust: the mediating role of trust in effective knowledge transfer. management science, 50, 1477–1490. doi: 10.1287/mnsc.1030.0136.
lin, n. (2001). social capital. a theory of social structure and action. cambridge: cambridge university press.
lozares, c., verd, j. m., cruz, i., & barranco, o. (2013). homophily and heterophily in personal networks. from mutual acquaintance to relationship intensity. quality & quantity. doi: 10.1007/s11135-013-9915-4.
mccarty, c., & govindaramanujam, s. (2005). a modified elicitation of personal networks using dynamic visualization. connections, 26, 61–69.
mccarty, c., molina, j. l., aguilar, c., & rota, l. (2007). a comparison of social network mapping and personal network visualization. field methods, 19, 145–162. doi: 10.1177/1525822x06298592.
mcpherson, m., smith-lovin, l., & cook, j. (2001). birds of a feather: homophily in social networks. annual review of sociology, 27, 415–444.
meyer, m. (2010). the rise of the knowledge broker. science communication, 32, 118–127. doi: 10.1177/1075547009359797.
mieg, h. a. (2006). social and sociological factors in the development of expertise. in k. a. ericsson, n. charness, p. feltovich, & r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 743–760). cambridge: cambridge university press.
morrison, a. (2008). gatekeepers of knowledge within industrial districts: who they are, how they interact. regional studies, 42, 817–835.
nardi, b. a., whittaker, s., & schwartz, h. (2000). it's not what you know, it's who you know: work in the information age. first monday, 5.
nebus, j. (2006). building collegial information networks: a theory of advice network generation. the academy of management review, 31, 615–637. doi: 10.2307/20159232.
palonen, t. (2003). shared knowledge and the web of relationships. turku: painosalama.
palonen, t., hakkarainen, k., talvitie, j., & lehtinen, e.
(2004). network ties, cognitive centrality, and team interaction within a telecommunication company. in h. p. a. boshuizen, r. bromme, & h. gruber (eds.), professional learning: gaps and transitions on the way from novice to expert (pp. 271–294). dordrecht: kluwer academic publishers.
palonen, t., lehtinen, e., & boshuizen, h. p. a. (2014). how expertise is created in emerging professional fields. in s. billett, t. halttunen, & m. koivisto (eds.), promoting, assessing, recognizing and certifying lifelong learning: international perspectives and practices (pp. 131–149). dordrecht: springer.
pataraia, n., margaryan, a., falconer, i., & littlejohn, a. (2013). how and what do academics learn through their personal networks. journal of further and higher education. doi: 10.1080/0309877x.2013.831041.
rajagopal, k., joosten-ten brinke, d., van bruggen, j., & sloep, p. (2012). understanding personal learning networks: their structure, content and the networking skills needed to optimally use them. first monday, 17, 1–12.
reagans, r. (2011). close encounters: analyzing how social similarity and propinquity contribute to strong network connections. organization science, 22, 835–849.
reagans, r., argote, l., & brooks, d. (2005). individual experience and working together: predicting learning rates from knowing who knows what and knowing how to work together. management science, 51, 869–881. doi: 10.1287/mnsc.1050.0366.
rissanen, o., palonen, t., pitkänen, p., kuhn, g., & hakkarainen, k. (2013). personal social networks and the cultivation of expertise in magic: an interview study. vocations and learning, 6, 347–356. doi: 10.1007/s12186-013-9099-z.
scardamalia, m. (2002). collective cognitive responsibility for the advancement of knowledge. in b. smith (ed.), liberal education in a knowledge society (pp. 67–98). chicago: open court.
sparrowe, r. t., liden, r. c., wayne, s. j., & kraimer, m. l. (2001). social networks and the performance of individuals and groups.
academy of management journal, 44, 316–325. doi: 10.2307/3069458.
stasser, g., abele, s., & vaughan parsons, s. (2012). information flow and influence in collective choice. group processes and intergroup relations, 15, 619–635. doi: 10.1177/1368430212453631.
svendsen, a. c., & laberge, m. (2005). convening stakeholder networks: a new way of thinking, being and engaging. journal of corporate citizenship, 19, 91–104.
sverrisson, á. (2001). translation networks, knowledge brokers and novelty construction: pragmatic environmentalism in sweden. acta sociologica, 44, 313–327. doi: 10.1177/000169930104400403.
watkins, k. e., marsick, v. j., & fernández de álava, m. (2014). evaluating informal learning in the workplace. in t. halttunen, m. koivisto, & s. billett (eds.), promoting, assessing, recognizing and certifying lifelong learning (pp. 59–77). dordrecht: springer.
wenger, e. (1998). communities of practice: learning, meaning, and identity. cambridge: cambridge university press.

appendix 1

table 2.
heterogeneity of personal networks

| participant | university: a | university: b | university: c | working sector: public | working sector: private | education: engineer | education: architect | education: other | education: not known | gender: m | gender: f | previous energy efficiency knowledge: strong | previous energy efficiency knowledge: some | previous energy efficiency knowledge: minor | previous energy efficiency knowledge: not known |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a20 | 10 | 6 | 3 | 12 | 7 | 11 | 3 | 2 | 3 | 10 | 9 | 7 | 4 | 5 | 3 |
| a26 | 7 | 3 | 0 | 10 | 0 | 5 | 4 | 1 | 0 | 4 | 6 | 3 | 1 | 6 | 0 |
| b2 | 7 | 4 | 0 | 10 | 1 | 6 | 2 | 2 | 1 | 2 | 9 | 5 | 1 | 4 | 1 |
| b21 | 4 | 5 | 0 | 7 | 2 | 6 | 2 | 0 | 1 | 5 | 4 | 3 | 1 | 4 | 1 |
| c17 | 0 | 0 | 9 | 0 | 9 | 7 | 0 | 0 | 2 | 8 | 1 | 3 | 2 | 1 | 3 |
| c23 | 2 | 2 | 7 | 4 | 7 | 8 | 0 | 1 | 2 | 7 | 4 | 3 | 2 | 4 | 2 |

field of know-how

| participant | public sector: land use planning | public sector: construction planning | public sector: environmental surveillance | public sector: other | public sector: not known | private sector: industrial planning | private sector: consulting, surveillance, planning | private sector: not known |
|---|---|---|---|---|---|---|---|---|
| a20 | 5 | 3 | 0 | 3 | 1 | 0 | 5 | 2 |
| a26 | 5 | 5* | 1 | 1 | 0 | 0 | 0 | 0 |
| b2 | 5 | 1* | 1 | 3 | 1 | 0 | 1 | 0 |
| b21 | 3 | 3* | 1 | 0 | 1 | 0 | 2 | 0 |
| c17 | 0 | 0 | 0 | 0 | 0 | 5 | 2 | 2 |
| c23 | 1 | 1 | 2 | 0 | 0 | 4 | 1 | 2 |

a the number in each column indicates how many alters the central participants have in their personal network in relation to specific indicators (university, working sector, education, gender, previous energy efficiency knowledge and field of know-how).
b * for land use planning and construction planning, the expertise areas overlap, and 2 persons have been added to both columns.

li et al.
frontline learning research vol. 8 no. 2 (2020) 1–17
issn 2295-3159

peer selection and influence: students’ interest-driven socio-digital participation and friendship networks

shupin li a, noona kiuru b, tuire palonen a, katariina salmela-aro c, kai hakkarainen d
a department of teacher education, university of turku, finland
b department of psychology, university of jyväskylä, finland
c faculty of educational sciences, university of helsinki, finland
d department of education, university of helsinki, finland

article received 6 february 2019 / revised 26 february 2020 / accepted 9 march / available online 10 june

abstract

digital technologies have been increasingly embedded in students’ everyday lives.
interest-driven socio-digital participation (isdp) involves students’ pursuit of interests mediated by computers, social media, the internet, and mobile devices’ integrated systems. isdp is likely to intertwine closely with young people’s social networks, yet this relationship has scarcely been studied quantitatively. to close this gap, the present paper investigated peer selection and influence effects linking the intensity of students’ isdp and their friendship networks. we collected two-wave data by administering a peer nomination to trace students’ friendship networks with peers and a self-reported questionnaire to examine students’ isdp. participants were 100 students in finland (female: 53%; mean age = 13.48, in grade 7 in the first wave). stochastic actor-oriented modelling showed that the students’ friendship ties with peers influenced the intensity of their isdp practices to become more similar. yet, students did not select peers as friends based on similar intensity levels of isdp. building on the influence effect found between students’ isdp and their peer networks, we suggest that connected learning (ito et al., 2013) should be promoted to integrate students’ informal and formal learning in order to bridge the gap between students’ informal interest-related digital practices and formal educational practices.

keywords: interest-driven socio-digital participation; peer friendship; peer selection and influence; social network analysis

corresponding author email: shupin.li@utu.fi
doi: https://doi.org/10.14786/flr.v8i4.457

1. introduction

the purpose of the present investigation was to examine peer selection and influence effects over time in interest-driven socio-digital participation (isdp).
adolescents’ everyday practices are increasingly embedded with socio-digital technologies (e.g., computers, social media, the internet, and mobile devices’ integrated systems), and they socialize using such technologies from the very beginning of their lives (palfrey & gasser, 2011). yet, there appears to be a gap between young people’s digital and educational practices (kumpulainen & sefton-green, 2012; salmela-aro, muotka, alho, hakkarainen, & lonka, 2016), in that students who prefer digital learning become less and less engaged in school. students who prefer to apply digital technologies to developing their interests by learning in virtual communities outside of school appear to disengage from traditional schools. these out-of-school interests mediated by digital technologies are not well recognized (rajala, kumpulainen, hilppö, paananen, & lipponen, 2016). informal learning and knowledge obtained outside of school differ from those within the school environment, so there is a mismatch between learners and formal learning contexts (mcfarlane, 2015). thus, there is an urgent and vital need to research students’ digital practices for pursuing their interests.

ubiquitous socio-digital technologies have blurred the boundaries between the time and space of interactions as well as between the virtual and real worlds (baym & boyd, 2012).
the increasing use of socio-digital technologies enables young people to network pervasively with their peers (conti, passarella, & das, 2017) in three qualitative genres (ito et al., 2010): 1) friendship-driven participation, that is, connecting with friends on social media (e.g., chatting with friends); 2) interest-driven participation, that is, seeking interest-relevant knowledge and socializing on the internet with peers who share similar interests and hobbies (e.g., searching for information, sharing knowledge, and discussing interests); and 3) creative participation, which involves participating in creative production and developing associated digital competences (e.g., creating and modifying media artifacts) with a network of more capable peers, which can help young people develop their career tracks. most finnish young people are engaged in friendship-driven socio-digital participation, which often represents a rather shallow use of socio-digital technologies (e.g., chatting with friends on social media) (hietajärvi, salmela-aro, tuominen, hakkarainen, & lonka, 2019). although there is not a large group of youth who participate in the creative use of technologies through the internet (hietajärvi, seppä, & hakkarainen, 2016), expanding such learning-supportive socio-digital practices across out-of-school and in-school contexts appears to be important for meeting the challenges of the emerging innovation-driven society (ito et al., 2010; hakkarainen, hietajärvi, alho, lonka, & salmela-aro, 2015). on the other hand, in order to pursue interests, adolescents may participate in either closely or loosely bound social networks for discussing and sharing experiences of their hobbies (e.g., discussing sports in online virtual communities). cultivating interests by building such extended social networks beyond the immediate social community enables adolescents to adopt the role of a local expert who shares knowledge and competences with peers.
in this way, adolescents’ interest-related activities are embedded in social networks and involve seamless possibilities for socially sharing interests and intellectual efforts. interest-driven socio-digital participation, in this regard, may be seen as a transition zone between the genres of friendship-driven and creative participation that is highly intertwined with social networks. thus, interest-driven practices may be an important prerequisite for students to engage in creative participation that may possibly lead to further academic and even career opportunities (ito et al., 2010).

in the present study, we focused on examining the influence and selection processes related to the peer social networks involved in students’ isdp. because young people spend much of their time with their peers (subrahmanyam & greenfield, 2008), the norms and characteristics of peer groups have become increasingly important indicators for fitting in. interest-driven digital activities involve both direct contacts between peers (i.e., what they say) and the social modelling of young people’s digital activities (i.e., how they behave) (ito et al., 2010). research confirms that young people within the same peer groups are inclined to be similar across an array of behavioural outcomes (li, lynch, kalvin, liu, & lerner, 2011), and a number of extant studies examine students’ disruptive behaviours (delay, laursen, kiuru, salmela-aro, & nurmi, 2013), academic achievements (fortuin, geel, & vedder, 2016), and school engagement (wang, kiuru, degol, & salmela-aro, 2018). however, rigorous studies of peer effects on isdp remain scarce, and the processes that underlie peer similarity in isdp are unclear. peer effects in social networks are not easy to estimate, and causal interpretations should be undertaken with caution, as individuals choose whom they will associate with (kremer & levy, 2008).
accounting for the peer selection effect is recommended because this effect reveals the extent to which individuals tend to seek peers as friends based on a similar intensity level of interest-related digital activities (manski, 1993). peer influence (christakis & fowler, 2013), on the other hand, causes a distinct shift in the intensity of young people’s interest-related digital participation, making it more similar to that of their connected peers. thus, we simultaneously examined whether adolescents actively selected their peers based on sharing similar levels of intensity in their interest-driven digital activities (i.e., selection effect) and whether youths’ friendship connections with peers contributed to adolescents’ adjusting their isdp to become more similar to their peers’ isdp over time (i.e., influence effect).

1.1 peer selection and peer influence

establishing and maintaining friendships are vital in adolescence because young people spend much more time with their peers during this period than in any other phase of their life spans (witkow & fuligni, 2010). especially after the transition from elementary to secondary school, it is important for youth to have peers as friends and to be with them in their school lives (haynie, 2001). students’ interactions, friendship negotiation, and their peer groups develop mostly with peers in school (farmer, lines, & hamm, 2011). these interactions and negotiations emerge as a tendency for students to become friends with similar peers. such a phenomenon of similarity among friends is typically known as homophily (mcpherson, smith-lovin, & cook, 2001), and it covers a variety of characteristics, for instance, adolescents’ use of tobacco and alcohol (cruz, emery, & turkheimer, 2012; kiuru, burk, laursen, salmela-aro, & nurmi, 2010) and their academic orientation (shin & ryan, 2014; wang et al., 2018).
there are two processes that can underlie homophily among peers: peer selection and peer influence (kandel, 1978; veenstra & steglich, 2012). peer selection refers to the procedure by which people select peers according to pre-existing similar characteristics (byrne, 1971). for instance, early adolescents are likely to interact with peers of the same gender (wang & degol, 2017), as well as with whom they collaborate in and out of school (juvonen, espinoza, & knifsend, 2012; li, palonen, lehtinen, hakkarainen, 2018). peer influence, in contrast, refers to the procedure by which peers become more similar over time because of indirect and direct social influence (kandel, 1978). reinforcement may be one of the main mechanisms in the process of peer influence (see kindermann, 2016, for a comprehensive summary). because friendship is, in nature, reciprocal and dyadic (bagwell & bukowski, 2018), young people attempt to achieve common grounds or establish intimacy with friends by reinforcing certain behaviours of their friends’. research on peer influence has revealed that adolescents’ behaviours are remarkably similar to those of their friends due to peer influence (see a review by christakis and fowler [2013]). because peer selection and influence indicate relationships between peers’ friendship ties and their behaviours in two opposite directions (i.e. students select peers based on similar behaviours as selection effect while peer ties influence behaviours to become more similar as influence effect), researchers have suggested that the two procedures work complementarily to explain the similarity of students’ behaviours with their peers’ (svensson, burk, stattin, & kerr, 2012). adolescents may select peers as friends who are at a similar intensity level of isdp, perhaps because it is consistent with their prior behavioural tendencies (farmer et al., 2011). 
alternatively, it could be because a similar intensity of use of interest-driven socio-digital technologies provides youth with a seamless channel to remain connected with peers. whether or not students select friends with a similar intensity of isdp, connected friends may become more similar over time due to peer influence. adolescence is a developmental period characterized by the desire to fit in with one’s peers (hamm, farmer, lambert, & gravelle, 2014). peer influence occurs not only through modelling or imitation but also through social comparison and behaviour approximation effects. adolescents increasingly invest in their peers as primary sources of social and emotional support while simultaneously using feedback and acceptance from their peers to achieve a sense of self. young people thus engage in behaviours that match the social norms of a valued or desired peer group (brechwald & prinstein, 2011).

youth’s interest-driven socio-digital participation occurs in a social sphere in which their practices are immediately visible to their peers. in most cases, young people’s socio-digital practices are ultra-social in nature in that they call for the engagement of peers. how intensively young people engage in isdp may, thus, be related to their friendship networks with peers. adolescents often attempt to fit in to peer groups in which members share similar activity patterns for interest pursuits, as mentioned above. hence, peer groups may often reach high levels of similarity in their isdp through selection and influence processes. because the selection and influence processes occur complementarily (svensson et al., 2012), it is critical to estimate selection and influence effects simultaneously within behavioural and network dynamics (steglich, snijders, & pearson, 2010).
yet, there is scant research examining the selection and influence effects between youths’ peer friendship networks and their isdp. most extant research shows qualitatively that young people’s pursuit of their interests is highly embedded in their social networks with peers (e.g., penuel, digiacomo, van horne, & kirshner, 2016; wernholm, 2018). however, these previous studies have not specifically distinguished peer selection from peer influence. instead, they merely examine the extent to which students’ interest-driven digital practices are related to their participation in social communities. our paper attempts to close this gap.

it is noteworthy that peer academic support online is likely to be embedded in the co-evolution of students’ isdp and their peer social networks (van rijsewijk, snijders, dijkstra, steglich, & veenstra, 2019). the concept of connected learning (ito et al., 2013) elaborates this issue theoretically. connected learning integrates three contexts for learning: peer-supported, interest-driven and academically oriented. peer-supported collaboration enables young people to use skills acquired in both formal and informal learning contexts. in turn, interest-driven digital learning practices may elicit the inspiration to learn related competences in schools. connected learning appears to link students’ practices related to personal interests with formal learning to empower academic achievements and even career possibilities (ito et al., 2013). as indicated by connected learning research (e.g., deng, connelly, & lau, 2016), peer academic support online intertwines with students’ interest-driven socio-digital participation. therefore, the present study considered peer academic support online as an influencing factor in the co-evolution of students’ friendship and their isdp.
in addition, we treated gender and being in the same classroom as further influencing factors in this co-evolution process because previous research posits that early adolescents are likely to interact with peers of the same gender (wang & degol, 2017) in the same classroom (gremmen et al., 2019) over time.

1.2 objectives

the present study aimed at providing insights into possible selection and influence processes among peers related to isdp within grade networks. we selected the school period of grade 7 as the starting measurement point because it falls at the beginning of secondary education in finland, when students enter into new peer environments. many young people must establish new friendships and find places in the new peer ecology, which we assumed would serve as an ideal context for examining selection and influence effects (altermatt & pomerantz, 2003). the following research questions were addressed:

1. do young people select their friends according to similar intensity of isdp over time (i.e., selection effect)?
2. do adolescents’ friendship network dynamics influence the intensity of their isdp to become more similar to that of their peers (i.e., influence effect)?

2. methods

2.1 participants and procedure

students from five classes at a school in a city in southern finland participated in the present study in the spring of 2013 (time 1 [t1]) and 2014 (time 2 [t2]). at t1, participants were in grade 7, with an average age of 13.48 (sd = 0.55). we simultaneously administered a peer nomination (using the grade roster) and a self-reported questionnaire to all participants during their ordinary class time at both time waves. a total of 103 students were in the grade roster in both time waves. three students, who were nominated by grade peers but did not agree to participate in the present study, were removed from the list when we created friendship networks for both time waves (shin, 2018).
hence, 100 students (male: 47, 47%; female: 53, 53%; mean age = 13.48, sd = 0.55) were in the two-wave networks in this research. the grade included five classes, with 15–25 students in each classroom. sixteen students (16%) appeared only in t2 and not in t1, whereas fifteen students (15%) were present only in t1 and not in t2. altogether, 84 (84%) and 82 (82%) participants responded to the self-reported questionnaire in the two waves, respectively.

2.2 measures

2.2.1 friendship networks (t1 and t2)

we collected the friendship networks within the grade in two waves through peer nomination using a grade roster (scott, 2000). in practice, each respondent received a list of names in the grade, and the respondents could not add any participants outside of this name list. additionally, participants could nominate as many or as few peers as they wished within the grade roster (for similar methods, see cillessen & borch, 2006). we asked the respondents to indicate the existence of each networking relation of “who you spend time with” by marking the name with an “x.” we entered all the responses from the peer nomination into adjacency matrices, yielding one matrix for the peer friendship network at each time wave. we coded “1” for linked ties and “0” to represent situations in which two participants lacked a tie between them. further, we used “na” to code the cells in the matrices that indicate friendship relations for participants who did not appear (ripley, snijders, boda, vörös, & preciado, 2018).

2.2.2 interest-driven socio-digital participation (t1 and t2)

we examined students’ isdp during both time waves by using a self-reported questionnaire. rather than students’ experiences of interest pursuits (maul et al., 2017), we were interested in young people’s digital practices mediated in their interest pursuits.
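the coding of the peer nominations described in section 2.2.1 can be sketched as follows. this is a minimal illustration, not the authors' script: the function name `build_adjacency`, the example ids, and the choice to mark both the rows and the columns of absent students as missing are assumptions made here for clarity.

```python
import numpy as np

def build_adjacency(roster, nominations, absent=()):
    """Code peer nominations as a directed adjacency matrix.

    roster      -- ordered list of student ids on the grade roster
    nominations -- dict: nominator id -> ids that the nominator marked with "x"
    absent      -- ids of students who did not appear in this wave (hypothetical input)
    """
    index = {sid: k for k, sid in enumerate(roster)}
    n = len(roster)
    adj = np.zeros((n, n), dtype=float)           # 0 = no tie
    for nominator, nominees in nominations.items():
        for nominee in nominees:
            # Only names on the roster can be nominated; self-nominations are ignored
            if nominee in index and nominee != nominator:
                adj[index[nominator], index[nominee]] = 1.0   # 1 = tie
    for sid in absent:
        # Assumption: ties sent by and sent to an absent student are both unknown
        adj[index[sid], :] = np.nan
        adj[:, index[sid]] = np.nan
    np.fill_diagonal(adj, 0.0)                    # no self-ties
    return adj

# Hypothetical three-student roster in which s3 was absent
wave1 = build_adjacency(["s1", "s2", "s3"],
                        {"s1": ["s2"], "s2": ["s1", "s3"]},
                        absent=["s3"])
```

one matrix of this kind per wave matches the two-wave design; `np.nan` stands in for the “na” code.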
we relied on earlier measurements of adolescents’ various isdp (e.g., hakkarainen et al., 2000); we also used new items that hietajärvi et al. (2016) developed, representing the relatively recent emergence of internet-related activities in the finnish context. accordingly, the questionnaire included 5 items using a likert-type scale from 1 (“never”) to 7 (“all the time”) to assess the intensity of various interest-driven digital activities (see constructs in appendix, li, hietajärvi, palonen, salmela-aro, & hakkarainen, 2017), including “how often do you search or follow new information about your hobbies or things that interest you?”, “how often do you read blogs or forums?”, “how often do you write and comment in forums?”, “how often do you share pictures and picture updates that you took with your phone?”, and “how often do you share music or ‘mix tapes’ you have made?” in addition, these five items were anchored in the qualitative findings on isdp by ito and colleagues (2010), which involved a one-year-long ethnographic investigation of students’ socio-digital participation. the cronbach’s alphas of these items in the two waves were 0.69 and 0.75, respectively. it is notable that there were 21% (n = 21) and 18% (n = 18) missing values of isdp in t1 and t2, respectively. because the rsiena statistical package (ripley et al., 2018) for modelling network dynamics requires categorical dependent behavioural variables, we used the mean values of isdp rounded to the nearest integer.

2.2.3 covariates

because connected learning integrates three contexts for learning (i.e., peer-supported, interest-driven and academically oriented), as mentioned above, we considered students’ peer academic support, measured in the self-reported questionnaire at t1, as a control variable in the process of interest-driven digital practices.
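the scale construction described for the isdp items can be sketched as follows: the standard cronbach's alpha formula for internal consistency, and a mean score rounded to the nearest integer. the function names and the example responses are hypothetical, and the round-half-up rule is an assumption, since the text does not say how ties at .5 were handled.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; rows = respondents, columns = scale items (complete cases)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)         # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def isdp_category(items):
    """Per-respondent item mean, rounded to the nearest integer (halves round up)."""
    means = np.asarray(items, dtype=float).mean(axis=1)
    # np.round would round halves to the nearest even number, so use floor(x + 0.5)
    return np.floor(means + 0.5).astype(int)

# Hypothetical answers of three students to the five 1-7 items
answers = [[1, 2, 3, 2, 1],
           [4, 4, 5, 4, 4],
           [7, 6, 7, 7, 6]]
alpha = cronbach_alpha(answers)
scores = isdp_category(answers)
```

rounding turns the 1–7 mean into an ordered categorical variable of the kind the text says rsiena requires.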
thus, we also asked participants the following: “how often do you ask for help from friends on school work-related issues?” and “how often do you give help to your friends on school work-related issues?”, using likert scales from 1 (“never”) to 7 (“all the time”). seventy-seven students (77%) replied to each question. we used the mean values of these two items as a measure of participants’ engagement in peer academic support (m = 2.86, sd = 1.52). in addition, we coded gender as 1 = female and 2 = male. we had no missing values for the gender variable. we used whether participants came from the same classroom (1 = yes, 0 = no) as another covariate.

2.3 analytic strategy

we applied multiple imputation (rubin, 1987, 1996) to the missing data of individual variables to impute 20 data sets (van buuren, 2018) (see section 2.4 on the treatment of missing data). we used the imputed data and the original friendship networks to estimate 20 stochastic actor-oriented models (saoms). finally, we combined the results of these 20 models.

2.4 treatment of missing data

a total of 10–20% of the data were missing in the self-reported questionnaire mentioned above, creating difficulty in obtaining model convergence and good model estimates in further dynamic network modelling (ripley et al., 2018, p. 32). multiple imputation is one of the most efficient methods for handling incomplete data in which missing data occur in more than one variable in a data set (van buuren & groothuis-oudshoorn, 2011). we utilized the mice (multivariate imputation via chained equations) 3.3.0 package (van buuren & groothuis-oudshoorn, 2011) in r 3.5.1 (r development core team, 2011) to impute our data on isdp (t1 and t2) and peer academic support online (t1). mice assumes that the missing data are “missing at random” (mar), meaning that the probability that a value is missing depends only on other observed values and can be predicted by using these values based on their linear or correlation relationship.
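the final step of the analytic strategy in section 2.3, combining the results of the 20 models, can be sketched with rubin's rules for multiply imputed data. the text cites rubin (1987) for multiple imputation but does not state which combining rule was applied, so rubin's rules are an assumption here; the function name `pool_rubin` and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed-data analyses with Rubin's rules.

    estimates  -- the parameter estimate from each of the m models
    std_errors -- the corresponding standard errors
    Returns the pooled estimate and its pooled standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()                     # average of the m estimates
    within = np.mean(std_errors ** 2)             # average sampling variance
    between = estimates.var(ddof=1)               # variance between imputations
    total_var = within + (1.0 + 1.0 / m) * between
    return pooled, np.sqrt(total_var)

# Hypothetical estimates of one effect from three imputed data sets
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

the between-imputation term widens the pooled standard error, so uncertainty about the imputed values carries over into the final estimate.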
because of the correlation and linear relationships found between adolescents’ social use of digital technologies and their mathematic achievements (qing & xin, 2010) as well as between youths’ social use of digital technologies and their digital competences (hargittai, 2010), we included digital competences (t1) and mathematic achievements (t1) to impute isdp (t1 & t2). we assessed students’ digital competences through 23 items with a likert-type scale from 1 (“not at all”) to 7 (“proficient”) in a self-reported questionnaire. we adapted measures developed by hakkarainen and colleagues (2000) by adding items that emerged due to recent technological developments. the measures included basic (e.g., “use a text-processing program to search for information on the internet”), moderate (e.g., “edit and modify digital photos”), and advanced skills (e.g., “set up a desktop with components (e.g. processor, sound card, graphic card)” and “programming”). seventy-five participants (75%) responded to all the digital competences items; the cronbach’s alpha of these items was 0.91. further, we obtained information on students’ mathematic achievements in the t1 self-reported questionnaire. the grades were from 4 (lowest) to 10 (highest). self-reported academic achievement had a correlation coefficient of 0.96 with actual achievement among finnish students at the secondary level (holopainen & savolainen, 2005). we used items measuring the sharing of academic materials online (t1) and discussing schoolwork issues online (t1) to impute peer academic support (t1) due to their correlation: sharing academic materials online and asking for schoolwork help online (pearson: 0.36, p < 0.001), sharing academic materials online and giving school work help online (pearson: 0.44, p < 0.001), discussing schoolwork issues and asking for schoolwork help online (pearson: 0.52, p < 0.001), discussing schoolwork issues and giving school help online (pearson: 0.66, p < 0.001). 
although these correlation values are not too high, this might not matter in this case because the amount of imputed missing data is relatively small. for sharing academic materials online, participants responded to the item “how often do you share materials you have created related to your schoolwork (homework, notes, essays) online with your peers?” we used the item “how often do you discuss school work-related issues with your peers online?” to measure students’ discussion of schoolwork online. both items used a likert-type scale from 1 (“never”) to 7 (“all the time”) in the t1 self-reported questionnaire. van buuren (2018) suggested 5–20 imputations “will be enough under moderate missingness”. hence, we imputed 20 sets of isdp (missing values in t1: n = 20, 20%; t2: n = 18, 18%) and peer academic support (t1 missing values: n = 23, 23%). 2.5 stochastic actor-oriented model (saom) our primary analyses included saom (conducted in rsiena 1.2-12) representing network-behaviour dynamics that snijders (2005) and snijders, van de bunt, and steglich (2010) developed. the model consisted of parameters representing friendship changes (i.e., network dynamics) and changes in individual isdp (i.e., behavioural dynamics). we applied a continuous-time markov chain monte carlo procedure to model the sequence of individual events with the highest probability of describing the total amount of change in friendship networks and individual isdp behaviours observed between the two time points (snijders, 2005; snijders et al., 2010). we included 20 imputed isdp data sets in the model (see the treatment of the missing data above). in addition, we included the imputed peer academic support online, gender and whether coming from the same classroom as controlled variables. 
the indicators for model convergence were satisfactory: the absolute value of the t-ratio for each individual parameter was less than 0.1, and the overall t-ratio was less than 0.2, both of which indicated that the model had converged (ripley et al., 2018). we applied the pool function in mice to combine the 20 model outcomes. 2.6 model parameters we described the parameter estimates of the model (see table 1) based on terminology that snijders and colleagues (2010) applied. the model primarily examined selection (the extent to which students selected peers as friends based on their similar levels of isdp) and influence (whether students’ friendships with peers influenced their levels of isdp to become more similar) with the covariates of peer academic support, gender, and being in the same classroom. we assessed both the selection and influence effects in regard to isdp in the models (see de la haye, green, kennedy, pollard, & tucker [2013] for more details on these co-evolution models). the selection effect was represented by “isdp similarity”, that is, the extent to which adolescents selected new connected peers at t2 based on a similar level of isdp at t1. in other words, it captured whether similarity in isdp predicted the formation of new ties. we also estimated “isdp alter” and “isdp ego” effects. “isdp alter” was the effect of being nominated by peers based on isdp; a positive effect meant a higher likelihood of receiving peers’ friendship nominations when adolescents had a higher level of isdp. “isdp ego” was the effect of nominating other peers as friends based on isdp; a positive effect meant that the students with higher values of isdp were more likely to nominate more peers as friends. we also considered how adolescents’ tendency to nominate their grade peers as friends (“peer academic support ego”) and to be nominated as friends (“peer academic support alter”) varied as a function of their academic support with peers.
in addition, we used “peer academic support similarity,” “same gender,” and “same class” to estimate whether students became friends with peers with similar levels of peer academic support, with the same gender, and who were in the same class. we used the parameter of “average similarity” as the influence effect instead of other potential specifications of friend influence because ripley et al. (2018) suggested that it consistently converged well across models. “isdp average similarity” was the tendency of adolescents’ isdp level to become more similar with that of their peers over time. we included the effect of change in isdp as a function of peer academic support, gender and being in the same classroom (“effect from”). moreover, we controlled for important network structural effects that are suggested to be the basic effects included in sao models (veenstra & steglich, 2012): outdegree (density), which is the general tendency of adolescents to selectively nominate their peers as friends; reciprocity, which is the tendency to make reciprocated friendship nominations, and transitive reciprocated triplets, referring to the tendency to reciprocate the nomination of friends of their friends. effects of out-degree (or density) and reciprocity are those always included in a model of rsiena package while effects of transitive triplets and transitive reciprocated triplets attempt to capture the tendency to network closure and they contribute to a good fit of the model (ripley et al., 2018). table 1 shows the effects we included in the rsiena model; for detailed effect descriptions, see ripley et al. (2018). the observed networks’ various measures excluded from the model (i.e., indegree, outdegree, and triad census) were found to be within the distributions of those measurements within 100 simulated networks with the same density of observed networks. this indicated that the model presented in this study was able to capture and represent the observed networks. 
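for readers less familiar with the structural terms above, the following python sketch computes simple descriptive counterparts of density and reciprocity for a directed friendship network, together with the jaccard index that is reported with the descriptive results to compare the two waves. these are descriptive statistics, not the saom effects themselves, and the function names and the three-person toy networks are hypothetical illustrations rather than study data.

```python
def density(adj):
    """Share of possible directed ties (excluding self-ties) that are present."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def reciprocity(adj):
    """Share of existing ties i -> j for which j -> i is also present."""
    n = len(adj)
    ties = mutual = 0
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                ties += 1
                mutual += adj[j][i]
    return mutual / ties if ties else 0.0


def jaccard(adj_t1, adj_t2):
    """Ties present in both waves divided by ties present in at least one wave."""
    n = len(adj_t1)
    both = either = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = adj_t1[i][j], adj_t2[i][j]
            both += 1 if (a and b) else 0
            either += 1 if (a or b) else 0
    return both / either if either else 0.0


# Toy nominations among three students: row i nominates column j.
wave1 = [[0, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
wave2 = [[0, 1, 1],
         [1, 0, 0],
         [0, 0, 0]]

print(density(wave1), density(wave2), jaccard(wave1, wave2))
```

in the toy data two of six possible ties exist at the first wave and three at the second, and the two original ties persist, so the jaccard index is two out of three. a high jaccard value of this kind means that most ties observed at either wave were stable across waves.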
table 1 explanation of parameters in the rsiena model note. isdp = interest-driven socio-digital participation 3. results 3.1 descriptive statistics 3.1.1 friendship networks (t1 and t2) the descriptive results of the developmental networks are shown in table 2. both the density and the average number of ties per participant increased from t1 to t2 within friendship networks, meaning that the students had more friends over time. the friendship networks in the two waves thus showed density values of 8.0% and 10.0%, respectively. the jaccard index between the two-wave networks was 0.36, showing that peer networks did not change rapidly or abruptly (value > 0.30), as per ripley et al. (2018, p. 20). 3.1.2 interest-driven socio-digital participation (t1 and t2) there were 65 participants (65%) who responded in both time waves. twenty-eight students (28%) reported higher frequencies of isdp in t2 than in t1, whereas 11 participants’ (11%) frequencies of isdp decreased in t2 compared to t1. twenty-six students (26%) reported having the same level of isdp in the two-time waves. table 2 shows descriptive statistics in detail. table 2 descriptive statistics of friendship network structure and interest-driven socio-digital participation 3.2 selection and influence related to friendships and isdp table 3 shows the model of peer selection and the influence effects related to peer friendships and isdp. as one of the primary effects of this model, students did not select peers as friends within the same grade based on their similar levels of isdp with peers (“isdp similarity”), meaning that we found no selection effect for isdp in friendship dynamics. by contrast, the model showed that students’ friendship ties influenced each other’s intensity of isdp over time (positive “isdp average similarity”). students’ isdp appeared to become more similar with that of their peers across time. 
in other words, peers’ isdp contributed to the increase or decrease of the intensity of students’ isdp. on the other hand, students with a higher intensity of isdp were likely to nominate more peers as their friends (positive “isdp ego”), whereas isdp did not affect the number of friendship nominations received from peers (nonsignificant “isdp alter”). with regard to the control variable of peer academic support online, adolescents who engaged more in peer academic support online were likely to receive fewer friendship nominations from their peers (negative “peer academic support alter”). moreover, students were likely to have friendships with peers of the same gender and of the same class, as indicated by the positive and significant “same gender” and “same class” parameters in the model. the negative “out-degree” parameter indicated that there are in general costs to establishing ties; that is, young people typically would not nominate an unlimited number of peers as friends. adolescents also tended to reciprocate the nominations they received (“reciprocity”), meaning that connections between two participants were likely to be reciprocal. young people were inclined to form hierarchical triadic relationships with the friends of their friends (positive “transitive triplets” and negative “transitive reciprocated triplets”) in the grade, indicating that a nested structure was statistically significant in adolescents’ friendship networks with peers. table 3 dynamic model of selection and influence in friendship networks: estimates and standard errors (ses) for interest-driven socio-digital participation note. gender coded: female = 1, male = 2. ***p < .001, **p < .01, *p < .05 4.
discussion in this exploratory study within a school in finland, we examined selection and influence effects in the co-evolution of students’ friendship dynamics and the intensity of their isdp, controlling for student gender, being in the same classroom and peer academic support online, as well as several network and behavioural tendencies. by applying stochastic actor-oriented modelling, we found that the levels of students’ isdp became more similar to those of their peers over time, whereas young people did not select peers as their friends based on their similar levels of isdp. one explanation for the lack of a selection effect could be that the data were collected from classrooms that provided already constrained social contexts with limited possibilities of selecting peers as friends. peer influence is a process where a young person affects or is affected by another. influence in behaviours occurs when an adolescent acts in ways that he or she may not otherwise act; it is an effect that is attributed to joint experiences with friends. in particular, the mechanism of reinforcement (bagwell & bukowski, 2018) may be able to explain the influence effect between young people’s friendship with peers and their isdp. mutual friendship may have a powerful reinforcement effect over time on behaviours mediated digitally by mutual interest-driven activities among young people. within friendships, norms for expected behaviours (e.g., mastering digital activities, pursuing shared interests) are created and friends actively push their peers to engage in activities that are in accordance with the shared norms and expectations. in this way, they attempt to achieve common ground of shared interests, deepen their mutual relations or even develop intimacy with friends (gottman, 1983). generally, influence is considered to be “a reflection of engagement” (laursen, 2018).
peers who are intensively engaged in activities that interest the young person are likely to have greater influence than those who are not, especially if the engagement is collaborative (brechwald & prinstein, 2011). interest-driven socio-digital activities are collaborative in nature. various interest-driven digital practices (e.g., seeking information, producing and sharing knowledge with connected peers) enable young people to exert considerable influence on their peers. as young people work to strengthen their friendship ties with their peers, similarity might increase fastest in the early stage of the relationship (laursen, 2018). participants in the present study were in grade 7 at the first measurement and in grade 8 at the second measurement. grade 7 is the first year of lower secondary school in the finnish educational system, right after elementary school. therefore, similarity among the present participants is likely to have increased rapidly during grades 7 and 8, which helps to explain why an influence effect between students’ friendship and isdp was found. in terms of selection, previous research posits that demographic attributes (e.g., gender for early adolescence) appear to play a primary role when young people select networking partners in initial friendship interaction (mcpherson et al., 2001). as friendships grow closer, similarity might continue to increase in private domains (e.g., those related to interests) that were not part of initial social interactions (laursen, 2018). this would explain why we did not find that early adolescents selected peers as their friends based on similar levels of isdp. there has been little research examining peer selection and influence effects related to the intensity of adolescents’ isdp and their friendship network dynamics. our results on the peer influence effect are partly in line with previous studies that did not simultaneously estimate peer selection and influence effects.
escardíbul et al. (2013) found that the intensity of spanish youth playing video games was similar with that of their peers. more recently, amialchuk and kotalik (2016) reported similar results among us male adolescents that students’ intensity of playing games is influenced by their peers to become more similar. while these two investigations examined peer influence on the intensity of young people’s video game playing, our study simultaneously focused on peer selection and peer influence regarding interest-driven activities. present study is unique in terms of examining selection and influence effects at the same time in co-evolution of young people’s friendship with peers and their interest-driven socio-digital practices. yet, we found out that adolescents within a school in finland are not likely to select peers as friends based on their aligned isdp. it is critical to understand young people’s influence on the isdp of their peers as well as how isdp affects the selection of friends in a context where ample knowledge and information are available for students’ learning. educational activities are increasingly mediated by digital practices and social learning with peers, and require students having increasingly more sophisticated socio-digital competences, especially in relation to academic studying and creative production (hietajärvi et al., 2016; li et al., 2017). the fact that students are able to influence their peers’ interest-driven socio-digital activities through their informal interactions provides an option for teachers to capitalize on students’ social, peer-to-peer learning resources. students who are competent in digital technologies could be engaged in tutoring their peers as part of computer-supported collaborative learning activities (riikonen, seitamaa-hakkarainen, & hakkarainen, 2018). 
social learning and peer tutoring play important roles in the type of computer-supported collaborative learning that is becoming more commonplace in finnish educational institutions (korhonen & lavonen, 2017; niemi, kynäslahti, & vahtivuori-hänninen, 2013). through such pedagogies, similar social learning resources that appear to be involved in isdp could be also harnessed for supporting school learning. 5. educational implications and limitations more importantly, because students’ friendship-based peer networks influence their interest-driven socio-digital participation to become more similar, connected learning (ito et al., 2013) should be promoted to integrate informal interest-related activities and formal learning to bridge the gap between students’ informal interests and educational practices. digitally mediated connected learning can be seen as “a social construct that emerges in interaction while learners engage in various social practices mediated by different artefacts” (kumpulainen & sefton-green, 2012); as we mention above, it integrates interest-driven, peer-supported and academically oriented learning contexts. such multi-contextual settings enable students’ learning practices to be production-centered and sharing-grounded across various networked borders. for instance, penuel and colleagues (2016) illustrated a case that jerome (pseudonym) participated in a programme of a science museum for a ninth or tenth grader and served as a docent for the museum visitors; he had opportunities, with peers, to contribute to science investigations by resident scientists. during such connected learning programmes, students are able to engage in interest-driven, peer-supported and academically oriented knowledge practices across multiple contexts (i.e., out of school and in school). 
as some reviews have summarized, such “border crossing” (akkerman & bakker, 2011) knowledge practices between formal and informal learning (bronkhorst & akkerman, 2016; rajala et al., 2016) are simultaneously interest- and network-based. the fact that students’ friendship networks with peers influence their interest-driven digital practices suggests that educational institutions should foster students’ competences in the interest-driven and academic use of digital technologies so that academic and out-of-school knowledge flows and peer-supported communities expand from students’ daily lives to schools and vice versa. in this way, rather than a closed, undialectical or immobile space, school becomes an open, dynamic and multifaceted learning community with different connections (e.g. knowledge, social relationships, learning artefacts) to students’ everyday practices and learning. the limitations of the present study warrant consideration. participants’ self-reports on the intensity of their isdp may have been biased to some extent, with frequencies overestimated or underestimated due to errors in memory or a lack of awareness of how often they actually used socio-digital technologies for their interests. in addition, we examined only the intensity of students’ interest-driven participation. future studies could qualitatively examine youths’ isdp to obtain comprehensive knowledge about what students actively do in relation to their interests when using socio-digital technologies. finally, the present study reports results from one school in southern finland; the small sample provided a relatively small pool of peers with similar levels of isdp for students to connect with. additionally, results may differ in other cultural contexts because of possibly diverse patterns in young people’s friendships with peers and in their isdp. future studies should expand the sample to other areas of finland.
since the present data were collected, the finnish matriculation examination, which is the only high-stakes test in finland, has been digitalized together with nation-wide efforts to support the digitalization of schools; this is likely to have a significant impact on school use of digital technologies for learning and instruction. consequently, it will be critical to obtain more detailed information on young people’s within-school practices of using socio-digital technologies and the associated pedagogical approaches; we have developed refined self-report instruments for that purpose, which also involve collecting social networking data from a larger sample of students. similar instruments are being administered to a sample of teachers so that their perspectives can complement the student data. 6. conclusions the pervasiveness of socio-digital technologies has been increasing rapidly, and young people’s socio-digital practices are constantly transforming from one cohort to another. by applying stochastic actor-oriented modelling to two waves of data on students’ social networks with peers and the intensity of their isdp, we examined peer selection and influence between early adolescents’ friendship with their peers of the same grade and their intensity of isdp in a school in finland. the findings indicated that students did not appear to select peers as friends based on their similar intensity level of isdp. yet, students’ friendship ties with peers led their intensity of isdp to become more similar to that of their friends. in order to bridge the gap between students’ socio-digital participation outside school and educational practices, the results suggest that schools should utilize connected learning (ito et al., 2013) to take into consideration the interests pursued by students outside of school when designing formal learning contexts.
toward that end, the phenomenon-based pedagogy, which characterizes the finnish national curriculum and calls for inviting even primary students to participate in co-designing open-ended technology enhanced study projects, provides opportunities for connected learning. keypoints students’ friendship ties influenced their intensity of interest-driven socio-digital participation to become similar as that of peers’. students did not select peers as friends based on similar intensity levels of interest-driven socio-digital participation. young people with a higher intensity of interest-driven socio-digital participation were likely to nominate more peers as their friends. intensity of interest-driven socio-digital participation did not affect the number of receiving friendship nominations from peers. connected learning should be promoted to integrate informal and formal learning. acknowledgements this study was supported by: finnish cultural foundation (shupin li, 00172381), the academy of finland project “bridging the gaps: affective, cognitive, and social consequences of digital revolution for youth development and education” (katariina salmela-aro, pi, 308351), the academy of finland project “co4-lab” (kai hakkarainen, pi, 286837) and, the strategic research council project of the academy of finland “growing mind: educational transformations for facilitating sustainable personal, social, and institutional renewal in the digital age” (kai hakkarainen, pi, 3125 references akkerman, s. f., & bakker, a. (2011). boundary crossing and boundary objects. review of educational research, 81(2), 132–169. doi: 10.3102/0034654311404435 altermatt, e. r., & pomerantz, e. m. (2003). the development of competence-related and motivational beliefs: an investigation of similarity and influence among friends. journal of educational psychology, 95(1), 111–123. doi: 10.1037/0022-0663.95.1.111 amialchuk, a., & kotalik, a. (2016). do your school mates influence how long you game? 
evidence from the us. plos one, 11(8): e0160664. doi: 10.1371/journal.pone.0160664 bagwell, c. & bukowski, w. (2018). friendship in childhood and adolescence: feature, effects and processes. in w. m., bukowski, b. laursen, & k. h. rubin (eds.), handbook of peer interactions, relationships, and groups (2nd ed.) (pp. 371-390). new york, ny: the guilford press. baym, n. k., & boyd, d. (2012). socially mediated publicness: an introduction. journal of broadcasting & electronic media, 56(3), 320–329. doi: 10.1080/08838151.2012.705200 brechwald, w. a., & prinstein, m. j. (2011). beyond homophily: a decade of advances in understanding peer influence processes. journal of research on adolescence, 21(1), 166–179. doi: 10.1111/j.1532-7795.2010.00721.x bronkhorst, l. h., & akkerman, s. f. (2016). at the boundary of school: continuity and discontinuity in learning across contexts. educational research review, 19, 18-35. doi: 10.1016/j.edurev.2016.04.001 byrne, d. (1971). the attraction paradigm. new york, ny: academic. christakis, n. a., & fowler, j. h. (2013). social contagion theory: examining dynamic social networks and human behavior. statistics in medicine, 32(4), 556–577. doi: 10.1002/sim.5408 cillessen, a. h. n., & borch, c. (2006). developmental trajectories of adolescent popularity: a growth curve modelling analysis. journal of adolescence, 29(6), 935–959. doi: 10.1016/j.adolescence.2006.05.005 conti, m., passarella, a., & das, s. k. (2017). the internet of people (iop): a new wave in pervasive mobile computing. pervasive and mobile computing, 41, 1–27. doi: 10.1016/j.pmcj.2017.07.009 cruz, j. e., emery, r. e., & turkheimer, e. (2012). peer network drinking predicts increased alcohol use from adolescence to early adulthood after controlling for genetic and shared environmental selection. developmental psychology, 48(5), 1390. doi: 10.1037/a0027515 de la haye, k., green, h. d., kennedy, d. p., pollard, m. s., & tucker, j. s. (2013). 
selection and influence mechanisms associated with marijuana initiation and use in adolescent friendship networks. journal of research on adolescence, 23(3), 474–486. doi: 10.1111/jora.12018 delay, d., laursen, b., kiuru, n., salmela-aro, k., & nurmi, j. -e. (2013). selecting and retaining friends on the basis of cigarette smoking similarity. journal of research on adolescence, 23, 464–473. doi: 10.1111/jora.12017 deng, l., connelly, j., & lau, m. (2016). interest-driven digital practices of secondary students: cases of connected learning. learning, culture and social interaction, 9, 45-54. doi: 10.1016/j.lcsi.2016.01.004 escardíbul, j. o., mora, t., & villarroya, a. (2013). peer effects on youth screen media consumption in catalonia (spain). journal of cultural economics, 37(2), 185–201. doi:10.1007/s10824-012-9177-3 farmer, t. w., lines, m. m., & hamm, j. v. (2011). revealing the invisible hand: the role of teachers in children’s peer experiences. journal of applied developmental psychology, 32(5), 247–256. doi:10.1016/j.appdev.2011.04.006 fortuin, j., geel, m. v., & vedder, p. (2016). peers and academic achievement: a longitudinal study on selection and socialization effects of in-class friends. the journal of educational research, 109(1), 1–6. doi:10.1080/00220671.2014.917257 gottman, j. m. (1983). how children become friends. monographs of the society for research in child development, 48(3), 1–86. doi:10.2307/1165860 gremmen, m. c., berger, c., ryan, a. m., steglich, c. e., veenstra, r., & dijkstra, j. k. (2019). adolescents’ friendships, academic achievement, and risk behaviors: same‐behavior and cross‐behavior selection and influence processes. child development, 90(2), e192-e211. doi:10.1111/cdev.13045 hakkarainen, k., ilomäki, l., lipponen, l., muukkonen, h., rahikainen, m., tuominen, t., . . . lehtinen, e. (2000). students’ skills and practices of using ict: results of a national assessment in finland. computers & education, 34(2), 103–117. 
doi:10.1016/s0360-1315(00)00007-5 hakkarainen, k., hietajärvi, l., alho, k., lonka, k., & salmela-aro, k. (2015). socio-digital revolution: digital natives vs digital immigrants. in j. d. wright (ed.), international encyclopedia of the social and behavioral sciences (2nd ed.) (vol. 22, pp. 918-923). amsterdam, the netherlands: elsevier. doi: 10.1016/b978-0-08-097086-8.26094-7 hamm, j. v., farmer, t. w., lambert, k., & gravelle, m. (2014). enhancing peer cultures of academic effort and achievement in early adolescence: promotive effects of the seals intervention. developmental psychology, 50, 216–228. doi: 10.1037/a0032979 hargittai, e. (2010). digital na(t)ives? variation in internet skills and uses among members of the “net generation.” sociological inquiry, 80(1), 92–113. doi: 10.1111/j.1475-682x.2009.00317.x haynie, d. l. (2001). delinquent peers revisited: does network structure matter? american journal of sociology, 106, 1013–1057. doi: 10.1086/320298 hietajärvi, l., salmela-aro, k., tuominen, h., hakkarainen, k., & lonka, k. (2019). beyond screen time: multidimensionality of socio-digital participation and relations to academic well-being in three educational phases. computers in human behavior, 93, 13-24. doi: 10.1016/j.chb.2018.11.049 hietajärvi, l., seppä, j., & hakkarainen, k. (2016). dimensions of adolescents’ socio-digital participation. qwerty, 11(2), 79–98. holopainen, l., & savolainen, h. (2005). unpublished raw data. finland: university of joensuu and university of jyväskylä. ito, m., baumer, s., bittanti, m., boyd, d., cody, r., stephenson, b., . . . tripp, l. (2010). hanging out, messing around, and geeking out: kids living and learning with new media. cambridge, ma: mit press. ito, m., gutiérrez, k., livingstone, s., penuel, b., rhodes, j., salen, k., ... & watkins, s. c. (2013). connected learning: an agenda for research and design. irvine, ca: digital media and learning research hub. juvonen, j., espinoza, g., & knifsend, c. (2012). 
the role of peer relationships in student academic and extracurricular engagement. in s. l. christenson, a. l. reschly & c. wylie (eds.), handbook of research on student engagement (pp. 387–401). boston, ma: springer. doi: 10.1007/978-1-4614-2018-7_18 kandel, d. b. (1978). homophily, selection, and socialization in adolescent friendships. the american journal of sociology, 84, 427–436. doi: 10.1086/226792 kindermann, t. a. (2016). peer group influences on students’ academic motivation. in k.r. wentzel, & g. b. ramani (eds.), handbook of social influences in school contexts (pp. 31–47). new york, ny: routledge. kiuru, n., burk, w. j., laursen, b., salmela-aro, k., & nurmi, j. e. (2010). pressure to drink but not to smoke: disentangling selection and socialization in adolescent peer networks and peer groups. journal of adolescence, 33(6), 801–812. doi: 10.1016/j.adolescence.2010.07.006 korhonen, t., & lavonen, j. (2017). a new wave of learning in finland: get started with innovation! in s. choo, d. sawch, a. villanueva, & r. vinz (eds.), educating for the 21st century: perspectives, policies and practices from around the world (pp. 447–467). singapore: springer. doi: 10.1007/978-981-10-1673-8_24 kremer, m., & levy, v. (2008). peer effects and alcohol use among college students. journal of economic perspectives, 22(3), 189–206. doi: 10.1257/jep.22.3.189 kumpulainen, k., & sefton-green, j. (2012). what is connected learning and how to research it? international journal of learning and media, 4(2), 7–18. doi: 10.1162/ijlm_a_00091. laursen, b. (2018). peer influence. in w. m., bukowski, b. laursen, & k. h. rubin (eds.), handbook of peer interactions, relationships, and groups (2nd ed.) (pp.447-469). new york, ny: the guilford press. li, s., hietajärvi, l., palonen, t., salmela-aro, k., & hakkarainen, k. (2017). adolescents’ social networks: exploring different patterns of socio-digital participation. scandinavian journal of educational research, 61(3), 255–274. 
doi: 10.1080/00313831.2015.1120236 li, s., palonen, t., lehtinen, e., & hakkarainen, k. (2018). face-to-face contacts, facebook connections and academic support: adolescents’ networks between and across gender and culture in finland. young, 27(2), 1–17. doi: 10.1177/1103308818766773 li, y., lynch, a. d., kalvin, c., liu, j., & lerner, r. m. (2011). peer relationships as a context for the development of school engagement during early adolescence. international journal of behavioral development, 35, 329–342. doi: 10.1177/0165025411402578 manski, c. (1993). identification of endogenous social effects: the reflection problem. review of economic studies, 60(3), 531–542. doi: 10.2307/2298123 maul, a., penuel, w. r., dadey, n., gallagher, l. p., podkul, t., & price, e. (2017). measuring experiences of interest-related pursuits in connected learning. educational technology research and development, 65(1), 1-28. doi: 10.1007/s11423-016-9453-6 mcfarlane, a. (2015). authentic learning for the digital generation: realising the potential of technology in the classroom. london, uk: routledge. mcpherson, m., smith-lovin, l., & cook, j. (2001). birds of a feather: homophily in social networks. annual review of sociology, 27, 415–444. doi: 10.1146/annurev.soc.27.1.415 niemi, h., kynäslahti, h., & vahtivuori-hänninen, s. (2013). towards ict in everyday life in finnish schools: seeking conditions for good practices. learning, media and technology, 38(1), 57–71. doi: 10.1080/17439884.2011.651473 palfrey, j., & gasser, u. (2011). reclaiming an awkward term: what we might learn from “digital natives”. in m. thomas (ed.), deconstructing digital natives: young people, technology, and the new literacies (pp. 186–204). london: routledge. penuel, w. r., digiacomo, d. k., van horne, k., & kirshner, b. (2016). a social practice theory of learning and becoming across contexts and time. frontline learning research, 4(4), 30-38. doi: 10.14786/flr.v4i4.205 qing, l., & xin, m. (2010). 
a meta-analysis of the effects of computer technology on school students’ mathematics learning. educational psychology review, 22(3), 215–243. doi: 10.1007/s10648-010-9125-8 r development core team (2011). r: a language and environment for statistical computing. r foundation for statistical computing, vienna, austria. isbn 3-900051-07-0. retrieved from http://www.r-project.org/. rajala, a., kumpulainen, k., hilppö, j., paananen, m., & lipponen, l. (2016). connecting learning across school and out-of-school contexts: a review of pedagogical approaches. in o. erstad, k. kumpulainen, å. mäkitalo, k. p. pruulmann-vengerfeldt, & t. jóhannsdóttir (eds.), learning across contexts in the knowledge society. (pp. 15-35) rotterdam, the netherlands: sense publishers. riikonen, s., seitamaa-hakkarainen, p., & hakkarainen, k. (2018). bringing practices of co-design and making to basic education. presentation of the 13th international conference on the learning sciences. uk: institute of education, university college london. ripley, r. m., snijders, t. a., boda, z., vörös, a., & preciado, p. (may 2018). manual for rsiena version 4.0. oxford, uk: university of oxford department of statistics. retrieved from http://www.stats.ox.ac.uk/siena/. rubin, d. b. (1987). multiple imputation for nonresponse in surveys. new york, ny: john wiley & sons. rubin, d. b. (1996). multiple imputation after 18+ years. journal of the american statistical association, 91(434), 473–489. doi: 10.1080/01621459.1996.10476908 salmela-aro, k., muotka, j., alho, k., hakkarainen, k., & lonka, k. (2016). school burnout and engagement profiles among digital natives in finland: a person-oriented approach. european journal of developmental psychology, 13(6), 704–718. doi: 10.1080/17405629.2015.1107542 scott, j. (2000). social network analysis: a handbook (2nd ed.). london: sage. shin, h. (2018). 
the role of friends in help-seeking tendencies during early adolescence: do classroom goal structures moderate selection and influence of friends? contemporary educational psychology, 53, 135–145. doi: 10.1016/j.cedpsych.2018.03.002 shin, h., & ryan, a. m. (2014). friendship networks and achievement goals: an examination of selection and influence processes and variations by gender. journal of youth and adolescence, 43(9), 1453–1464. doi: 10.1007/s10964-014-0132-9 snijders, t. a. (2005). models for longitudinal network data. in p. carrington, j. scott, & s. wasserman (eds.), models and methods in social network analysis (pp. 215–247). new york, ny: cambridge university press. snijders, t. a., van de bunt, g. g., & steglich, c. e. (2010). introduction to stochastic actor-based models for network dynamics. social networks, 32(1), 44–60. doi: 10.1016/j.socnet.2009.02.004 steglich, c., snijders, t. a., & pearson, m. (2010). dynamic networks and behavior: separating selection from influence. sociological methodology, 40(1), 329–393. doi: 10.1111/j.1467-9531.2010.01225.x subrahmanyam, k., & greenfield, p. (2008). online communication and adolescent relationships. the future of children, 18(1), 119–146. retrieved on january 16, 2019, from www.jstor.org/stable/20053122 svensson, y., burk, w. j., stattin, h., & kerr, m. (2012). peer selection and influence of delinquent behavior of immigrant and nonimmigrant youths: does context matter? international journal of behavioral development, 36(3), 178–185. doi: 10.1177/0165025411434652 van buuren, s. (2018). flexible imputation of missing data (2nd ed.). boca raton, fl: chapman & hall/crc press. retrieved on january 16, 2019, from https://stefvanbuuren.name/fimd/ van buuren, s., & groothuis-oudshoorn, k. (2011). mice: multivariate imputation by chained equations in r. journal of statistical software, 45(3), 1–67. retrieved on january 16, 2019, from http://www.jstatsoft.org/v45/i03/. van rijsewijk, l. g., snijders, t. 
a., dijkstra, j. k., steglich, c., & veenstra, r. (2019). the interplay between adolescents' friendships and the exchange of help: a longitudinal multiplex social network study. journal of research on adolescence, 30(1), 63-77. doi: 10.1111/jora.12501 veenstra, r., & steglich, c. (2012). actor-based model for network and behavior dynamics. in b. laursen, t. d. little, & n. a. card (eds.), handbook of developmental research methods (pp. 598–618). new york, ny: the guilford press. wang, m. t., & degol, j. l. (2017). gender gap in science, technology, engineering, and mathematics (stem): current knowledge, implications for practice, policy, and future directions. educational psychology review, 29(1), 119–140. doi: 10.1007/s10648-015-9355-x wang, m. t., kiuru, n., degol, j. l., & salmela-aro, k. (2018). friends, academic achievement, and school engagement during adolescence: a social network approach to peer influence and selection effects. learning and instruction, 58, 148–160. doi: 10.1016/j.learninstruc.2018.06.003 wernholm, m. (2018). children’s shared experiences of participating in digital communities. nordic journal of digital literacy, 13(04), 38-55. doi: 10.18261/issn.1891-943x-2018-04-04 witkow, m. r., & fuligni, a. j. (2010). in-school versus out-of-school friendships and academic achievement among an ethnically diverse sample of adolescents. journal of research on adolescence, 20, 631–650. doi: 10.1111/j.1532-7795.2010.00653.x

frontline learning research 1 (2013) 3-23 issn 2295-3159
corresponding author: suparna sinha, graduate school of education, rutgers university, 10 seminary place, new brunswick, nj 08901, suparna.sinha@gse.rutgers.edu
http://dx.doi.org/10.14786/flr.v1i1.1

conceptual representations for transfer: a case study tracing back and looking forward

suparna sinha a, steven gray b, cindy e.
hmelo-silver a, rebecca jordan a, catherine eberbach a, ashok goel c, spencer rugaber c

a rutgers university, united states of america
b university of hawaii, united states of america
c georgia institute of technology, united states of america

article received 10 march 2013 / revised 15 may 2013 / accepted 23 may 2013 / available online 27 august 2013

abstract

a primary goal of instruction is to prepare learners to transfer their knowledge and skills to new contexts, but how far this transfer goes is an open question. in the research reported here, we seek to explain a case of transfer by examining the processes by which a conceptual representation used to reason about complex systems was transferred from one natural system (an aquarium ecosystem) to another natural system (human cells and body systems). in this case study, a teacher was motivated to generalize her understanding of the structure, behaviour, and function (sbf) conceptual representation to modify her classroom instruction and teaching materials for another system. this case of transfer was unexpected and required that we trace back through the video and artefacts collected over several years of this teacher enacting a technology-rich classroom unit organized around this conceptual representation. we provide evidence of transfer using three data sources: (1) artefacts that the teacher created, (2) in-depth semi-structured interview data with the teacher about how her understanding of the representation changed over time, and (3) video data over multiple years, covering units on the aquatic ecosystem and on the new system to which the teacher applied the sbf representation, the cell and body. borrowing from interactional ethnography, we traced backward from where the teacher showed transfer to understand how she got there. the use of the actor-oriented transfer and preparation for future learning perspectives provided lenses for understanding transfer.
results of this study suggest that identifying similarities under the lens of sbf and using it as a conceptual tool are some primary factors that may have supported transfer.

keywords: transfer; technology; teacher learning; systems thinking

1. introduction

the aim of transfer research is to identify instructional conditions that prepare learners to apply what they have learned to new contexts. as designers of learning environments, we seek to create tools to facilitate transfer. we argue that one such tool is the use of conceptual representations to organize instruction, which allows students to develop a means of thinking about conceptual elements in a more generalised way (liu & hmelo-silver 2009). in addition, our prior research suggests that use of certain conceptual representations can promote understanding of complex systems.

helping students and their teachers develop an understanding of complex systems is a difficult yet important component of scientific literacy (sabelli 2006). given the ubiquity of complex systems in the natural world, transferring ideas about complex system learning from one context to another is critical for the development of scientific thought. in many cases the behaviour of system components can affect the system’s overall function through emergent processes and localized interactions (jacobson & wilensky 2006). these interactions are often dynamic and invisible, which makes them difficult for learners to understand and presents instructional challenges for teachers (feltovich et al. 2001; hmelo-silver et al. 2007). here we define systems thinking as being able to understand how bounded phenomena arise by considering the interactions and relationships among interdependent structures, behaviours, and functions (hmelo-silver et al., 2007; nrc, 2012).
there is evidence to suggest that students find it especially challenging to: (1) think about the interactions between visible and invisible structures, (2) consider the effect of their dynamic behaviours on overall functions, and (3) extend their thinking beyond direct causality in complex systems (grotzer & bell-basca 2003; hogan 2000; hogan & fisherkeller 1996; jacobson & wilensky, 2006; leach et al. 1996; reiner & eilam 2001).

in the research presented here, we investigate an unexpected case of transfer in which the learner was a teacher who had been involved in a long-term classroom research project and who appropriated the conceptual representation from the researcher-developed units to develop new instruction. this is particularly notable because learning about complex systems is often difficult (hmelo-silver et al., 2007). although our research focuses on the use of conceptual representations as a tool for learners, it also appears that they can be a tool for teachers to deepen their own understanding of complex systems (liu & hmelo-silver 2009; goel et al., 1996). specifically, we discuss how structure-behaviour-function (sbf) served as a conceptual representation that promoted transfer across different complex systems (goel et al., 1996). structures are defined as the components of a system, behaviours as the mechanisms or processes that occur within a system, and functions as system outcomes (goel et al., 1996; machamer et al., 2000). we developed technological tools using the sbf representation that make these features of complex systems salient (hmelo-silver et al., 2007; liu & hmelo-silver 2009; vattam et al. 2011). our study draws attention to a teacher’s journey of understanding sbf as a conceptual tool, using it in the context of a technology-intensive science curriculum, and her initiative to appropriate sbf as a conceptual representation beyond what we designed it for and use it to meet local curricular needs.

2. research goals

this study focuses on two main research questions:

1. how does a middle school science teacher develop her understanding of sbf as a representational tool?
2. how does generalization of sbf prepare her to make sense of a new complex system?

specifically, the focus of this study is to understand the means by which the teacher takes up opportunities to generalise her understanding of sbf as a representational tool to view similarities between two systems: one provided by researchers and one designated by the teacher. to understand the conditions that facilitated transfer, we need to view it through a lens that magnifies this teacher’s learning trajectory. to focus on the dynamic nature of transfer, we did not see a traditional model of transfer as a productive lens. traditional transfer researchers consider decontextualised expert knowledge, independent of how learners construe meaning in situations (cobb & bowers, 1999; greeno, 1997). because our objective was to highlight the processes the teacher used to understand and transfer a conceptual representation, we needed to consider alternative transfer models. such models should illuminate the interactions that were meaningful and engaging for the teacher and that subsequently led her to generalize her learning experience.

2.1 transfer through alternative lenses

we consider transfer from both an actor-oriented transfer approach (aot; lobato 2004, 2006) and a preparation for future learning perspective (bransford & schwartz 1999) to investigate a teacher as a learner applying knowledge in a new curricular unit. lobato (2003, 2006) proposes that shifting from the observer’s (expert’s) perspective to considering how the actor (learner) perceives similarities between new problem scenarios and prior experiences is a useful tool to understand transfer. evidence for transfer from this perspective is found by scrutinizing a given activity for any indication of influence from previous activities.
moreover, we investigate how a greater understanding of sbf representations might have contributed to transfer from a preparation for future learning (pfl; bransford & schwartz, 1999) perspective. the pfl perspective focuses on the strategies used by learners in knowledge-rich environments and their ability “to learn a second program as a function of their previous experiences” (bransford & schwartz, 1999, p. 69). this provides a framework for evaluating the quality of particular kinds of learning experiences and the feedback they provide. feedback is a powerful factor in preparing students to make sense of instructional materials, in helping them construct knowledge and, as a result, in facilitating transfer of the skills needed to unpack novel problems (moreno, 2004; tan & biswas, 2006). as in other alternative perspectives on transfer (e.g., konkola, tuomi-grohn, lambert, & ludvigsen, 2007), the classroom context and activity are important factors in promoting transfer.

we add to the transfer literature by exploring use of the sbf conceptual tool for abstracting systems thinking. that is, the conceptual tool can be used to make sense of complex systems by thinking about macro and micro level connections either independently or at multiple levels of intersection. we make the conjecture that sbf as a conceptual tool can serve as a focusing phenomenon, which makes it suitable for integrating the aot and pfl lenses of transfer, as we describe in the next section. in this study, we investigate how the experiences that led to successful generalization of sbf as a conceptual tool prepared the teacher to keep refining her systems thinking.

2.2 supporting transfer through focusing phenomena

lobato et al. (2003) propose that focusing phenomena support transfer by prompting students to generalize their learning.
they define focusing phenomena as “observable features of the classroom environment that regularly direct attention to certain mathematical properties or patterns” (p. 2). they identify a combination of factors, such as curriculum materials, artefacts, and the teacher’s instructions, as important for directing and focusing students’ attention towards the intended content. in the context of this study, we extend the notion of focusing phenomena to science. we propose that sbf serves as a focusing phenomenon (figure 1) to advance systems thinking. it helps the teacher focus her attention on understanding connections between multiple structures, their functional roles within the complex system and the behaviours they exhibit. here we consider the importance of generalizing sbf as a tool for transfer.

from an aot perspective, sbf as a focusing phenomenon highlights what is similar between two complex systems, i.e., the aquatic ecosystem (introduced by the researcher) and the human digestive system (introduced by the teacher). it helps concretize the idea that biological systems are similar to ecosystems in terms of interacting at multiple levels. using this framework affords the teacher opportunities to focus on the connections that exist between various organs of the digestive system. specifically, it directs the teacher’s attention to the ways that “structure and function in biological systems are causally related through behavioural mechanisms” (hmelo-silver et al., 2007, p. 308). the teacher’s understanding of sbf in the classroom mirrors her understanding of systems thinking. this is important for us, as researchers, as it lets us trace the teacher’s learning trajectory.

from a pfl perspective, thinking in terms of sbf prepares learners to understand that behaviours are mechanisms and processes that enable structures to achieve their functions in biological systems (bechtel & abrahamson, 2005; machamer, darden, & craver, 2000).
in the remainder of the paper, we present a case study that considers how several aspects of the learning environment influenced the teacher’s generalization of sbf as a conceptual tool.

figure 1. sbf as focusing phenomena.

3. a case of transfer: the instructional context

this study is part of a larger research program built around a technology-intensive curriculum unit centred on an aquarium-based aquatic ecosystem. the curriculum provides multiple opportunities for learners to develop and deepen their understanding of sbf as a conceptual tool. in this case, the teacher was able to: (1) use technological tools such as the reptools toolkit (hmelo-silver et al., 2011) and the aquarium construction toolkit (act; vattam et al., 2011), which were designed to help learners think about aquatic ecosystems in terms of structures, the functions they perform within the system and the behaviours they exhibit to perform those functions, (2) teach about the aquarium ecosystem using sbf as a conceptual tool for a period of 4 years, and (3) engage in active discussions about the concept and ways to teach it with the research team, who were present daily in the classroom and at the annual professional development workshops.

3.1 sbf tools

the reptools toolkit includes a function-oriented hypermedia (hmelo-silver et al., 2007; 2009; liu & hmelo-silver, 2009) organized in terms of the sbf representation and netlogo computer simulations (wilensky & reisman, 2006). the hypermedia (figure 2) introduces the aquarium system with a focus on functions and provides linkages between the structural, behavioural and functional levels of aquariums. it is organized around what, how, and why questions, which correspond to structures, behaviours, and functions.

http://www.youtube.com/watch?v=y0n9jectfuu&feature=youtube_gdata http://reptools.rutgers.edu/%20startpage.html

figure 2. aquarium hypermedia.
two netlogo simulations allow learners to explore macroscopic processes of fish reproduction (i.e., the fishspawn simulation, figure 3a) as well as microscopic processes (the nitrification simulation, figure 3b) that represent the chemical and biological processes in the aquarium. the simulations provide a context for learners’ investigation of the aquatic ecosystem. they afford opportunities for designing experiments, manipulating variables, making predictions, and discussing conflicts between predictions and results. each simulation allows learners to explore key features that are relevant to the process of fish spawning or to the nitrification cycle.

figure 3a. macro level: fish spawn simulation. http://reptools.rutgers.edu/revisedfishspawnmodel.html

figure 3b. micro level: nitrification simulation. http://reptools.rutgers.edu/revisednitrificationmodel.html

the second component of the learning environment, act, is designed to promote construction of sbf models (vattam et al., 2011). models can be constructed either in a table (figure 4a) or graph (figure 4b) format. the model table focuses learners’ attention on thinking about various structures in an ecosystem. the three-column table affords the opportunity for learners to think about the structural components, their multiple behaviours and functions. this is valuable because learners get an opportunity to understand both individual mechanisms in the system and the meta-level concepts related to complex systems.

figure 4a. sample act model table.

figure 4b. sample act model graph.

the act model graph is a platform for learners to create models of their evolving understanding of ecosystem processes in terms of sbf. as students read through the hypermedia and generate and test their hypotheses with the simulations, they integrate the critical structures with their behaviours and functions in act models.
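the three-column model table described above maps naturally onto a small data structure. the following is a minimal python sketch, not the act software itself: the class name `SBFRow`, the helper `answer`, and the `QUESTION_TO_COLUMN` mapping are hypothetical names introduced here purely for illustration. the two sample rows paraphrase the fish and ammonia entries from the classroom excerpt reported in section 4.1.1, and the what/how/why mapping follows the organizing questions of the hypermedia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SBFRow:
    """One row of a three-column SBF model table (illustrative only)."""
    structure: str  # "what": a component of the system
    behaviour: str  # "how": a mechanism or process of that component
    function: str   # "why": the outcome or purpose within the system


# Assumed mapping from the hypermedia's question words to table columns.
QUESTION_TO_COLUMN = {"what": "structure", "how": "behaviour", "why": "function"}


def answer(table, structure, question):
    """Return the table entries that answer a what/how/why question
    about the named structure (empty list if the structure is absent)."""
    column = QUESTION_TO_COLUMN[question]
    return [getattr(row, column) for row in table if row.structure == structure]


# Sample rows paraphrased from the classroom excerpt in section 4.1.1.
table = [
    SBFRow("fish", "releases waste (ammonia)", "remove toxins from the body"),
    SBFRow("ammonia", "floats around in the tank", "food for bacteria"),
]

print(answer(table, "fish", "why"))  # ['remove toxins from the body']
```

keeping one row per structure mirrors the table format; a graph format such as the one in figure 4b would additionally need explicit links between rows, which this sketch leaves out.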
3.2 methods

we used a case study approach to characterize how a science teacher, ms. y, appropriated her understanding of sbf as a representational tool and applied it to make sense of a new complex system. case study methodology allowed us to use multiple data sources to study this complex phenomenon in context (e.g., stake, 1998; yin, 2009). borrowing from interactional ethnography (castanheira, green, & yeager, 2009), we began at the end: the sbf hypermedia that ms. y constructed. the unit of analysis for this case is the individual teacher in her classroom context over several years. through this approach, we used multiple sources of data to trace the social and cognitive events that occurred over time and led ms. y to see sbf as a tool she could appropriate for her teaching practice. although this was not an ethnography, we borrowed the logic of this inquiry approach to understand how an individual within a social context constructed particular knowledge over time (bridges, botelho, green, & chau, 2012).

3.3 context

ms. y taught seventh grade science at a public middle school in the northeastern united states. she had been teaching science for 26 years and had a bachelor’s degree in elementary education. this study was part of a larger 4-year study focused on teaching middle school science students about aquatic ecosystems. ms. y participated in annual professional development (pd) workshops. the pd focused on concepts related to aquatic ecosystems, analysis in terms of sbf and the technological tools that she would need to use in her classroom. during the pd, ms. y had the chance to share her pedagogical challenges and experiences, such as difficulties in using the software or teaching about sbf as a conceptual tool. ms.
y had been using the reptools and act in an aquarium curriculum for four years when she informed us that she wanted to develop her own instructional tools using the sbf representation to teach about cell and human body systems. this prompted her to collaborate with her colleague, another science teacher, ms. t. together they used microsoft powerpoint to create a human body system presentation, modelled after the function-centred aquatic hypermedia. we refer to it as the teacher-created hypermedia. because they lacked the technical knowledge to design a hypermedia similar to the one we had created, the teachers hyperlinked key words and follow-up questions in their powerpoint presentation to point to relevant slides.

ms. t also taught seventh grade science in the same school. she was a new teacher with one year of teaching experience. ms. t had a science education background. while she collaborated with ms. y, she also attended the annual pd and implemented the same technology-intensive curriculum on the aquatic ecosystem in her classroom. each teacher taught four diverse seventh grade classes with approximately twenty-five students in each section. during the curriculum implementation the students worked in small heterogeneous groups.

3.4 data sources

we had three primary sources of data. first was the artefact that the teacher created (this indeed was the impetus for our research). second, we conducted an hour-long semi-structured interview with the two teachers, ms. y and ms. t. finally, we collected video data of classroom interactions. these videos were drawn from classroom data from a long-term (i.e., four-year) research project. these helped us to understand: (1) why the teacher transferred her generalizations of sbf representations to new instructional domains and (2) how she transferred these understandings. we interviewed ms. y and ms. t approximately two months after ms. y completed teaching about both systems.
the primary focus of the interview was to understand how she conceived the idea of extending the computer-based representational tools beyond what was expected from her, the influence of her prior knowledge during this process, and her attempts to prepare herself to solve new challenges.

following powell, francisco and maher’s (2003) recommendations for video analysis, we reviewed video data to identify critical events. in an attempt to trace and track the nature of ms. y’s generalizations of sbf, we selected representative clips of critical events from her classroom that demonstrated evidence of her developing understanding and generalization of sbf representations as a tool to teach about another complex system. these video clips included whole class discussions that ms. y had with her students while: (1) introducing the sbf representation for the aquatic ecosystem in year 3 (i.e., the year before she created the digestive system unit), (2) introducing the sbf representation for the aquatic ecosystem in year 4 (i.e., the year she employed the digestive system unit), and (3) explaining sbf representations and modelling the digestive system unit. we viewed a total of nine clips that consisted of three classroom interactions for each of the three kinds of whole class discussions.

3.5 analysis

we examined classroom interactions that highlight ms. y’s learning trajectory with sbf as a representational tool. the video data were analysed using interaction analysis (ia; jordan & henderson 1995), which involved collaborative viewing of video clips by six members of the interdisciplinary research team. we successively conducted nine ia sessions to collaboratively review the selected video clips, describe observations, and generate hypotheses. any differences in opinion were resolved through discussion. this helped ensure the trustworthiness of our interpretations through the initial independent interpretations of the ia session participants and the subsequent discussions.

during the ia sessions we focused our attention on two specific aspects of ms. y’s practice. first, we paid attention to patterns and variations in the ways that she introduced sbf as a conceptual tool in relation to the aquatic ecosystem across the four years. specifically, we examined her explanation of the concept, the analogies she presented and whether or not she sought help from any external resources, such as researchers in the classroom or ms. t. second, we focused on how she introduced sbf as a conceptual tool in the context of the human body unit. here we compared the ways the topic was introduced in the aquatic ecosystem unit and in the human body system unit. we also looked for similarities in terms of analogies. in particular, we wanted to understand if and how her prior knowledge of sbf prepared her to discuss this particular complex system with ease and confidence.

to gain a holistic perspective of the teacher’s journey we also examined the interview transcript. we looked for themes related to the mechanisms by which transfer occurred in the ways in which the teacher constructed similarities between the aquarium and digestive systems. this allowed us to triangulate the teacher’s perspective with the ia and artefact analysis.

4. findings

based on our analysis of the interview and video data we identified themes related to the aot or pfl perspectives. these findings helped strengthen our understanding of the processes ms. y used to generalize sbf as a representational tool and to observe how it prepared her for the transfer. the aot perspective provided a framework to trace ms. y’s evolving understanding of using the sbf lens as a tool to make sense of the aquatic ecosystem. the pfl perspective demonstrated how ms.
y transferred and used her knowledge of sbf to make sense of a complex system that was outside the scope of our research.

4.1 tracing and tracking ms. y’s understanding of sbf from an aot lens

4.1.1 orientation to the sbf representation led by the teacher

ms. y’s journey began with using the act tool. the act technology enabled construction of sbf representations using the model table (figure 4a). the tool introduced the students and ms. y to the language of sbf representations. initial data analysis of the whole class video revealed that the teacher’s introduction of the sbf representation played a critical role in students’ conceptual understanding of the complex system. she presented the idea that the sbf representations captured interconnected entities within a complex system while completing the act table:

1. ms. y: alright, so the first thing yours say is fish right? so, let’s go back and tell me what is the behaviour of the fish?
2. student: releases waste.
3. ms. y: ok. so the fish releases waste. right? alright, so it, it releases what kind of waste?
4. students (in unison): ammonia.
5. ms. y: right, so you have that in there right? now. what is the function?
6. students (in unison): remove toxins from the body.
7. ms. y: okay. so we want to get these things out of the fishes’ body. now next, the next one is what?
8. students: ammonia.
9. ms. y: ammonia. so, put, put ammonia here. alright, so now, what is, what is the behaviour of the ammonia? what’s it do if you look at it in the tank?
10. student: water?
11. ms. y: yeah, it’s just floating around right? what’s its function do? it’s food for bacteria. so it has its purpose right? so the next one on our list which is blank on yours will be what?

in this excerpt, ms. y drew the students’ attention to the functions and behaviours of various structures present in the aquarium. the students identified structures such as fish (turn 1) and ammonia (turn 8).
next she prompted them to think about their behaviours and functions. in turn 2, the students responded that the behaviour of the fish is to release waste. she pushed them to think in detail about the kind of waste (turn 3) and the function or overall purpose of this behaviour (turn 5). in turn 11, she clearly articulated that structures have a function within complex systems. although this is a somewhat mechanical application, it also allowed her to begin to see how the sbf lens might serve as a tool for understanding systems. we speculate that this discussion prepared both the students and ms. y to use the sbf conceptual representation to understand the interconnectivity between various structures within complex systems. this initial understanding of sbf as a representation may have prepared ms. y to appropriate sbf as a tool when she collaborated with her colleague to create a new learning tool (i.e., the teacher-created hypermedia).

4.1.2 teacher-created hypermedia

just as the orientation to sbf was the starting point, the artefact that ms. y created at the other end bound the case study. ms. y, in collaboration with her colleague ms. t, created new hypermedia in the form of an interactive powerpoint of the cell and body systems mirroring the aquarium hypermedia developed by the research team (figures 5a and 5b). the teachers' hypermedia outlined the different structures in the system along with orienting why and how questions. the how questions were directed towards behaviours of system components and the why questions focused on functions. the teachers created this hypermedia as a learning resource to help students connect cell systems to larger body systems. the research team planned neither the body system hypermedia nor the modelling of these systems in the act software; the teachers did this of their own volition.

figure 5a. researcher-developed hypermedia.

figure 5b. teacher-developed hypermedia.
the development of the cell hypermedia demonstrated multiple ways by which ms. y generalized and transferred her understanding of sbf as a conceptual tool. first, understanding the sbf of the aquatic ecosystem prepared her to teach it better in successive years. second, she was able to modify the learning environment (i.e., by changing it physically, from an aquarium hypermedia to a cell hypermedia, and by seeking resources) into something that was more compatible with her current goals.

4.1.3 identifying similarities through sbf representations

ms. y's initiative to extend and appropriate our research and develop additional classroom instruction suggested that the sbf representation was becoming a tool for her to see similarities across complex systems. adopting an aot perspective helped us understand how she constructed similarities between what she had been teaching for several years (the aquatic ecosystem) and the current unit she developed (cell and body systems). this perspective helped us recognize which connections she made, on what basis, and how and why those connections were productive (lobato, 2004). for example, consider ms. y's response when asked about the utility of their hypermedia during the interview session:

right, and it's a hard concept to get. so, what we were thinking about is like the kids actually think when they eat food it breaks down and then leaves the body. they don't get that the food has to go to the cells and the cell actually works and creates energy from this food and then there's a waste and it sends that back to the body for it to be excreted. so we're trying to give them not only the names of the parts and what each part does individually but how it needs to work... and we're doing the behaviour not only of the cell itself but behaviour of all the systems and then the behaviour of the whole body. and the cells are all part of that whole body.

this highlights that ms.
y understood that the cells were an integral part of the body systems and could not be taught in isolation. earlier, she noted that systems in the body are not disconnected and have complex mechanisms that allow for higher order operation. this provided evidence that she now understood how structures within a system perform multiple behaviours in order for it to function effectively. the ia results showed how ms. y introduced the sbf representation and refined her thinking over multiple years.

4.1.4 refining the sbf representation as a conceptual tool

from an aot perspective we needed to track ms. y's transition from her initial naïve ideas about sbf representations to a more expert conception. the results from the ia indicated that ms. y's understanding of the sbf representation as a conceptual tool changed. she used several distinct strategies to introduce the topic of complex systems, ranging from discrete (i.e., in years 1, 2 and 3), to acknowledging complexity (in year 4), and finally providing a systems perspective with her new cell/body unit. in the first three years, she introduced the sbf representation to her students by mentioning the new terminology being used to understand the aquatic system. however, she introduced structures, behaviours, and functions as discrete constructs. in year 4, she espoused a coherent view of sbf representation as a conceptual unit. later that year, while introducing sbf in the context of the unit on cells and body, she explained sbf as a system, complete with nested and interconnected subsystems.

4.1.4.1 year 3: sbf representations

ms. y's early introduction to sbf representation suggested a focus on linear connections. this was shown by the way in which she filled out the act sbf table (figure 4a) in front of the classroom.
as a way to connect ideas about sbf she drew clear conceptual lines between one structure at a time and all the behaviours exhibited by that structure, as the following example shows:

we just named them all yesterday. the heater, the fish, the plants. those things are called the 'structure'. the next word we're gonna use is 'behaviour'. the behaviour is what the fish do. what do the things do in the tank? and the next word we're gonna use is 'function' okay? so what i want to do today is to start with structure and behaviour. so, i made a chart and the first column is the structure, or the parts. so everyone write down one of the things in the fish tank is fish and the second column i wrote was behaviour, and the third column i wrote was function. we're going to start with this second column that is behaviour. when i ask you the behaviour of something, i want to know is what does it do? "what do fish do?" swims, eats, breathes, and poops. okay, all fish swim. that is their behaviour okay. they swim. what else do fish do?

here ms. y described the meaning of the term "behaviour" somewhat superficially as "what fish do" rather than the more expert mechanistic view. she established linear connections between the structure (fish) and the multiple behaviours (swims, eats, breathes, poops) that this structure exhibits. after promoting an understanding of the behaviour exhibited by the structure (fish), she then drew another relationship between each individual behaviour in the last column to indicate the behaviour's function.

4.1.4.2 year 4: sbf representations are interconnected

over time, ms. y's introduction to the sbf representation became richer and more complex. in the excerpt below, taken from a whole class discussion in year 4, she described structures, behaviours and functions as interconnected entities within a system, rather than discrete elements on a worksheet:

1. ms. y: okay, now, let's do the filter.
i'm gonna do the filter with you and then you're gonna do one on your own. all right, so what does the filter do? what does the filter do? jim what does the filter do?
2. jim: um, cleans out the tank
3. ms. y: cleans the tank. or cleans the "what part of the tank?"
4. jim: the water in the tank?
5. ms. y: all right, so the filter will clean the water. okay? now, why does it clean the water?
6. jim: so it can put more oxygen into the water?
7. ms. y: no. that's another thing that it does. it actually, because it's spinning around, because it's spinning like this, it's actually, one of the things it does… is it adds oxygen to the water. now, this part here, why does it do it? first of all, i want to stop right here. the filter is this big grey thing here. right? now, first of all, how does it work? what's this big tube doing? [points to picture of filter on the screen.]
8. pat: sucking up the water
9. ms. y: sucking up the water. then the water comes up here, right? and it gets sucked up and it goes back here and it pours back down. when it flushes back over that's when the oxygen from the air can get pulled back into the water. okay, so how, you said it cleans the water, how does it do this?
10. pat: well, it has the filter. the filter has like chemicals and stuff.
11. ms. y: what do you think is in this bag?
12. pat: bad stuff
13. ms. y: well, eventually the bad stuff is going to get in here, but actually there's charcoal in here, gravel in here. and then when the water flows through it, can it catch all the big chunks? maybe the fish faeces and stuff like that? so, and then see how it spins back down here? water splashes and it's pulling in the oxygen. so now, all right so now, why does it clean the water? what is the point of cleaning the water?

after turn 13, the class went on to discuss the fish and the plants, how the filter aerates the tank and how it affects the whole system. in turns 3 and 5 when ms.
y discussed the behaviours (the mechanism that cleans water in tank) and function of the filter (by collecting faeces from fish) she was guiding students' answers to structure, behaviour, and function simultaneously and filling in the chart appropriately, stressing relationships rather than focusing on any one aspect in isolation. turns 6-12 show that ms. y used student responses to generate more questions that linked what and why questions throughout her classroom discussion, highlighting the system complexity.

4.1.4.3 year 4: sbf representations at multiple levels of complex systems

later in the same year, when introducing her unit on the cells to the class, ms. y emphasized that sbf works as a whole across multiple levels of complex systems. as the next excerpt shows, she did so not directly, but more discreetly through leading questions:

1. ms. y: eventually what we want [the researchers] to do for us is allow us to model systems within systems. what happens if i can click on the cell and zoom in on that and put the cell parts in there? because they don't have the ability to zoom right in on that one part, are there any ideas on how to connect the cell through modelling to the other body systems? because you also want to go and look at the function. what do you think?
2. lucia: umm, what about if you like umm put a picture of the cell.
3. ms. y: yeah but i want to drive everything to the cell because that's, you know, the whole body operates to get things to the cell you know that right? but then i also want to show what the cell does inside once you send the food there. so how can i show that part… on this graph? okay. you know how this is a system. the body parts and the cell is its own little mini system, how can i show the stuff inside the cell? should i circle all the mitochondria right around the cell? or should i pull the cell out and make that part separate? …

these demonstrate how ms.
y refined her thinking about sbf as a conceptual tool. whereas earlier her focus was primarily on working with the aquatic ecosystem, she later introduced a new level of complexity by introducing the idea that there exist multiple 'mini systems' within the human body system. she still focused largely on structures but she also made connections to behaviours and functions. in addition, she helped students understand that one structure may have multiple behaviours and functions (in turns 1 and 3). in contrast to her sbf representation of the aquatic ecosystem in the earlier unit, she presented the cell system to the class as a coherent system rather than discrete sbfs. in addition, when applying the sbf representation to the cell, ms. y introduced a meta-perspective by explicitly explaining that the task was to represent their ideas through modelling (in turn 1). moving away from the isolated task provided earlier (i.e., filling out the table by first listing structure followed by behaviour, and then function), ms. y explained that the students were organizing their knowledge in a model graph. by placing emphasis on the modelling tool and providing students with the starting point of the structure, the cell, ms. y explained that the task was to develop a representation of their ideas about the human body system, using the table to organize their ideas and providing the students with leading questions that she had provided earlier when talking about the sbf representation in the aquarium unit.

this transition suggested that ms. y was an active learner herself. she frequently asked questions of the research team and ms. t to refine her understanding. this practice of asking questions had two effects. first, it helped ms. y identify and address the gaps in her understanding, which prepared her for future learning.
second, it shed light on the processes that she as an actor (learner) used to construct similarities between the aquatic ecosystem and cell system.

4.2 experiences to promote transfer from a pfl perspective

4.2.1 recognition of teacher as a learner

in the interview, ms. y indicated that since the beginning of her involvement in the project, her knowledge continually developed. she explained that she was the primary source by which information was passed from the research team to the students and that over time she felt that she became more competent in this role. in the interview, she acknowledged her lack of mastery over the content and was aware that she refined her ideas of the sbf representation and the aquarium unit, which led to the development of the new unit:

okay, my knowledge of this still develops every year because it's knowledge that [research team leader] had and it, you know, was her angle on something and then i had to try to understand what was going on in her head. so it's taken me many years of practice and talking to [research team leader], talking to [researchers in the room], to kind of get this. and i still do not feel like i'm really solid on it, but i get it more and more each year.

these statements demonstrated that ms. y saw herself as a learner in her classroom as she was looking critically at her current knowledge and beliefs. this experience prepared her to deepen her understanding of the content, and revise her ideas as she gathered new information.

4.2.2 collaboration facilitates generalisation

collaboration was beneficial during the inception, design, and construction of the teacher-created hypermedia. together, ms. y and ms. t went beyond our research agenda by using sbf as a conceptual tool to create a powerpoint presentation of human body systems. the collaboration afforded opportunities for sense making and a focus on critical aspects of complex systems while working with the tools (figure 6). as ms.
y talked about the creation of the cell hypermedia, she revealed that she was highly motivated to do so because of the potential for feedback and interaction with ms. t. for example, when asked how the idea came about and the variables that affected the development of the new tool, ms. y responded:

so then i kind of realized that what i needed to do was give her [ms. t.] my idea and then hear from her what she would add to that and in turn that would, i would take what she added into my lesson, so one of us throws out like a main idea and then the other one builds upon that main idea and then we get a better idea. and that's how i think that the hypermedia came along. because this whole concept has been in my head for a long time, about how kids don't understand the whole body and the cells connection to the body. so i talked about it with ms. t and then she started talking about making a hypermedia and then we went back and forth on how we were going to do it.

figure 6. affordances of the learning environment that promote sbf thinking.

from a pfl perspective, people seeking multiple viewpoints about issues may be one of the most important ways to prepare them for future learning (bransford & schwartz, 1999). it is clear from this excerpt that ms. y felt it useful that she could exchange her ideas and collaborate in the creation of the new hypermedia with ms. t. this finding suggests that ms. y was able to see the possibilities for transferring her understanding of the sbf representation. however, this transfer was dependent on the idea of using hypermedia itself as a way to organize complex content in addition to the sbf representations. our next set of results focuses on elaborating how she used the aquatic hypermedia to guide her thinking about designing for another complex system.

4.2.3 appropriating salient features of the aquarium hypermedia

when asked about what parts of the hypermedia she found useful in her own development, ms.
y felt that working with the same aquarium hypermedia for four years allowed her to incorporate some of the key features in the hypermedia she created. although her hypermedia did not possess the technological and conceptual sophistication of the aquarium hypermedia, it prepared her for refining her understanding along a trajectory of increasing expertise. this process was important from an aot perspective as it enabled her to see the connections between two situations by identifying the salient features from the earlier hypermedia environment (lobato, 2004). it is notable that she transferred other features of the hypermedia structure beyond sbf, including the use of guiding questions as well as the use of short pieces of text accompanied by simple and relevant graphics:

i would say that i definitely liked how each question lead to another question because that's how we modelled ours was every question gave an answer but then lead to another question and another question and another question…. we also used just short pieces of information because i think the kids get bored if you put too much it's overwhelming. we used pictures and then we also had it not only lead to different the next one and the next one but it bounced back sometimes a design in the hypermedia too.

from the interview it is clear that ms. y drew upon relevant features of the aquarium hypermedia. although her rationale for keeping a short text was different from what we had in mind while designing the aquarium hypermedia, this process of experimentation also helped her clarify her own thinking about the concepts that she was placing within the new hypermedia contexts (bransford et al., 1990).

4.2.4 approaching act to model a new system

in addition to appropriating aspects of the aquarium hypermedia, ms. y also appropriated the act tool so that students could model body systems in the same fashion as they had for the aquarium system (figures 7a and 7b).

figure 7a. digestive system act table view.
figure 7b. digestive system act graph view.

the following excerpt highlights ms. y's journey of trying to understand how to use sbf as a conceptual tool and the act technology itself, and to feel comfortable using it to teach by herself:

at first she (research team leader) came and she was just testing the kids' knowledge and that i was not really involved. and then … we originally started talking about the cell and the body as that was an area she worked in, and then she got the idea of the respiratory system because that slowly developed into … the net logo and the hypermedia. back then structure, function and behaviour i think for me was all just disjointed. all the pieces were here and i was just trying to keep up with her. and then … the act program helped a lot because it sort of put everything together for me in the end, like okay, here's all the knowledge that the kids have been getting along the way, here is proof that they got it. and for me it was just a slow process of absorbing everything and you know kind of understanding it until i could you know turnkey it and then we could turn around and together make another hypermedia with it.

this exemplified the importance of the act software as a capstone, allowing students' and ms. y's understanding of the new system to be made explicit. in the interview ms. y recalled that in the beginning of the research program (i.e., years 1 and 2), her understanding of the framework was "disjointed". she credited the act modelling toolkit with preparing her to create the human body system hypermedia. it appeared to help her think about interconnections between structures, their functions and visible behaviours. this example from the interview, and the classroom task of modelling body systems in act, indicated that ms. y possessed the confidence to organize the new ideas generated by her hypermedia into sbf terms using the tool.
additionally, it highlighted her ability to appropriate the act tool as the final classroom task to evaluate knowledge generated by the hypermedia as a way to organize student ideas about complex systems.

4.2.5 preparing to ask sbf oriented questions

a critical aspect of transfer of the sbf framework involved being able to make sense of the new complex system in terms of "what", "how" and "why" questions. the act modelling table (see figure 4a) prepared learners to think about the aquatic ecosystem in terms of sbf by answering questions related to "what", "how" and "why." it was evident that questions related to "what" pertain to visible and invisible structures that determine key variables of the aquatic ecosystem. because the learner had to only identify relevant components in the first column, it involved an important but superficial level of system understanding, unless it led the learner to consider why and how it performs specific actions in the context of the aquatic ecosystem.

video analysis in year 1 revealed that although ms. y discussed the role of functions and behaviours, she was more comfortable labelling the aquatic ecosystem in terms of its relevant components. this was apparent, as she would begin the class with "what" questions. if the students gave her the expected answer she would make an attempt to elaborate on it. but when the students gave incorrect answers, she just ignored the response. as a result the students were not encouraged to share their confusion with the class in terms of why they thought so and how they came to the conclusion. during the year we observed that ms. y consistently asked more "what" questions. this prompted the students to give single word responses. the students also noticed that the teacher expected them to give short answers that did not call for detailed explanations. this indicated that ms.
y was hesitant to open the discussion for an in-depth systems thinking conversation that focused on sbf relations. it was likely that at that stage her idea about complex systems was focused on identifying relevant structures.

we observed a slightly different trend in year 2. although the "what" questions dominated the whole class discussions, students were also asked to think about possible interactions or connections between structures. as the students identified such relationships, ms. y led the discussion on "how" questions by writing down behaviours that connected structures. video analysis indicated that in years 3 and 4, ms. y appeared to be confident in discussing the aquatic ecosystem in terms of a complex system, interconnected by visible and invisible components, as this next example shows:

1. ms. y: yes, anybody have something else, let's put another living thing in there. what do you have?
2. jaden: microorganisms.
3. ms. y: okay. so what are microorganisms?
4. jaden: they clean up the waste.
5. ms. y: what do you mean they clean up waste?
6. jaden: they eat.
7. ms. y: ok, the next problem is function. these particular structures do a particular behaviour and that behaviour fits in a little bit more into the whole picture. think why does it need to do this behaviour for it, why do the fish need to swim?

this excerpt shows that ms. y opened the discussion by asking the class to identify structures connected to the aquatic ecosystem. next she drew their attention to thinking about their behaviours. as soon as the class discussed some behaviours, she asked them to think about behaviours in the context of their functional role in the aquarium-based aquatic ecosystem. ms. y was able to build upon her prior understanding of sbf as a conceptual tool.

5. discussion

as we seek to understand transfer, we must address questions related to the "what" and "how" of transfer. that is, we need to articulate the exact nature of the content or "what" is being transferred.
equally important is identifying the mechanisms or the "how" that is responsible for this transfer to occur. we suggest that we can accomplish these goals through the integration of aot and pfl perspectives on transfer. we used aot to reach backwards and see how the similarities were constructed, whereas pfl allowed a look forward at how applying sbf prepared ms. y for her future learning and practice. the case study findings showcase how different perspectives on transfer allowed us to understand how participation in a research project driven by principles of learning empowered a teacher to appropriate these tools in her own practice, going beyond the research project context.

this case study suggests that sbf as a conceptual tool has potential for making sense of complex systems. we propose that using sbf as a focusing phenomenon (lobato et al., 2003) is a mechanism that facilitates transfer. sbf was a lens through which ms. y could see the relationship between systems and which prepared her to learn about new systems. our findings demonstrated the processes adopted by the teacher to generalize her understanding of sbf. this included an initial superficial engagement with sbf that she deepened and refined over several years and her own reflectiveness in seeing herself as a learner. in addition we discussed the influence of the social environment and technological affordances that appeared to prepare her for transfer. the additional viewpoints of ms. t and the conversations with the research team suggest that collaboration is important in preparing for transfer. having a general-purpose tool that she could repurpose to use for a new unit was instrumental in this process. finally, she was able to use the hypermedia that the research team had created as a worked example that allowed her to explore the content and how sbf could be applied to a new domain.

from a pfl perspective, these results shed light on specific processes and challenges that ms.
y had to overcome. specifically we were keen to understand what it took for a teacher to acquire mastery over using a conceptual tool in one context and be prepared to use it to solve a problem in a different context. the findings indicate that the sbf representation focused the teachers' attention on the behavioural connections and functional roles of components within complex systems. it prepared them to think about the actions or "how" components behave within a complex system in relation to their overall functions. both teachers reported that this was useful when they started working on creating the hypermedia on the digestive system. although the teacher-constructed hypermedia lacked the technical sophistication of the researcher-created hypermedia, the teachers made productive use of a technology they were familiar with, a powerpoint presentation. the teachers also successfully incorporated key features of the aquarium hypermedia such as leading questions, short descriptions and use of images. their interview responses indicated that their prior experience with the aquarium hypermedia drew their attention to these features. this prepared them to be efficient and effective with their own hypermedia design. both these processes (i.e., creation of the new hypermedia and thinking in terms of behaviours, in addition to structures and functions) were vital as ms. y was able to revise her knowledge and beliefs, which set the stage for her to analyse and appreciate critical features of the new information presented to her (bransford et al., 1990; moore & schwartz, 1998). this process of analysing her beliefs and strategies also highlights the active nature of transfer, which is an important part of pfl. the initiative she took in applying her sbf representation understanding to teaching a new unit demonstrates her ability to revise and rethink the current situation to suit her current goals.
from a pfl perspective this is valuable as it reveals the importance of activities and practices that are beneficial for "extended learning" rather than one-shot task performances (bransford & schwartz, 1999).

our study also extends the transfer literature by proposing new ways for understanding teacher learning trajectories. as we observed ms. y's transition over multiple years, our focus was on the processes she followed during this transition rather than assessing mastery over content knowledge. in terms of learning trajectories, our results highlight the fact that ms. y was looking critically at her knowledge and gradually developed a deeper understanding in that content area. data analysis from earlier years revealed a limited understanding of the sbf representation as a conceptual tool. however, she actively sought resources (her colleague ms. t and researchers present in the classroom) to help her understand the interconnections between multiple structures, their functions in the system and visible and invisible behaviours. her increasing confidence in the content area, coupled with collaboration, resulted in her being highly motivated to extend the research tools to other areas of her classroom practice.

this case study provides an existence proof that aot and pfl can be used to explain a single case of transfer. it is important however to consider the limitations of a single case (yin, 2009). although we cannot rule out all possible rival explanations, we triangulated data from multiple data sources and included researchers with a range of disciplinary backgrounds and experience in the interaction analysis. other members of the research team who were not involved in the ia sessions reviewed the examples and interpretations that were presented here. we acknowledge that further research in complex classroom environments is needed in order to generalize these findings. because of the importance of the social interactions and feedback that ms.
y received from teaching her students (e.g., okita & schwartz, in press), it is unlikely that a purely cognitive explanation could account for these results. the analysis presented in this study suggests the possibilities of extending research on alternative approaches to transfer (lobato, 2006; bransford & schwartz, 1999; van oers, 1998). these new approaches to transfer suggest a much more complex and dynamic process than traditional cognitive accounts. our results also suggest that different theoretical frameworks can be productively integrated in providing accounts of transfer. in our case, teacher adoption and appropriation of a learning framework was an exciting by-product of scholarly research because it provides evidence that classroom innovations can be appropriated and sustained.

keypoints

sbf as a focusing phenomenon is a mechanism that facilitates transfer. it acts as a lens through which the learner can see the relationship between systems and prepare them to learn about new systems.

there are possibilities of extending research on alternative approaches to transfer (lobato, 2006; bransford & schwartz, 1999; van oers, 1998). these new approaches to transfer suggest a much more complex and dynamic process than traditional cognitive accounts.

different theoretical frameworks can be productively integrated in providing accounts of transfer.

acknowledgements

this research was funded by institute of education sciences (ies) grant # r305a090210. conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of ies. we thank the participating teachers without whom this work would not be possible. an earlier version of this research was presented at the 2010 international conference of the learning sciences.

references

bechtel, w., & abrahamsen, a. (2005). explanation: a mechanist alternative.
studies in the history and philosophy of biological and biomedical sciences, 36, 421-441.
bransford, j. d., vye, n., kinzer, c., & risko, v. (1990). teaching thinking and content knowledge: toward an integrated approach. in b. f. jones & l. idol (eds.), dimensions of thinking and cognitive instruction: implications for educational reform. hillsdale, nj: erlbaum.
bransford, j. d., & schwartz, d. l. (1999). rethinking transfer: a simple proposal with multiple implications. review of research in education, 24, 61-100.
bridges, s., botelho, m., green, j. l., & chau, a. c. m. (2012). multimodality in problem-based learning (pbl): an interactional ethnography. in s. bridges, c. mcgrath & t. l. whitehill (eds.), problem-based learning in clinical education (pp. 99-120). dordrecht, netherlands: springer.
castanheira, m. l., green, j. l., & yeager, e. (2009). investigating inclusive practices: an interactional ethnographic approach. in k. kumpalainen, c. e. hmelo-silver & m. césar (eds.), investigating classroom interaction: methodologies in action (pp. 145-178). rotterdam: sense publishers.
cobb, p., & bowers, j. s. (1999). cognitive and situated learning perspectives in theory and practice. educational researcher, 28, 4-15.
darling-hammond, l., & mclaughlin, m. w. (1995). policies that support professional development in an era of reform. phi delta kappan, 76, 597-604.
feltovich, p. j., coulson, r. l., & spiro, r. j. (2001). learners’ (mis)understanding of important and difficult concepts. in k. d. forbus & p. j. feltovich (eds.), smart machines in education: the coming revolution in educational technology (pp. 349–375). menlo park, ca: aaai/mit press.
greeno, j. g. (1997). response: on claims that answer the wrong questions. educational researcher, 26, 517.
goel, a. k., gomez de silva garza, a., grué, n., murdock, j. w., recker, m. m., & govinderaj, t. (1996). towards designing learning environments - i: exploring how devices work. in c. fraisson, g. gauthier & a. lesgold (eds.), intelligent tutoring systems: lecture notes in computer science. ny: springer.
grotzer, t. a., & bell-basca, b. (2003). how does grasping the underlying causal structures of ecosystems impact students’ understanding? journal of biological education, 38, 16-28.
hmelo-silver, c. e., jordan, r., honwad, s., eberbach, c., sinha, s., goel, a., rugaber, s., & joyner, d. (2011). foregrounding behaviors and functions to promote ecosystem understanding. proceedings of hawaii international conference on education (pp. 2005-2013). honolulu, hi: hice.
hmelo-silver, c. e., marathe, s., & liu, l. (2007). fish swim, rocks sit, and lungs breathe: expert-novice understanding of complex systems. journal of the learning sciences, 16, 307-331.
hmelo-silver, c. e., liu, l., & jordan, r. (2009). visual representation of a multidimensional coding scheme for understanding technology-mediated learning about complex natural systems. research and practice in technology enhanced learning environments, 4, 253-280.
hogan, k., & fisherkeller, j. (1996). representing students’ thinking about nutrient cycling in ecosystems: bidimensional coding of a complex topic. journal of research in science teaching, 33, 941–970.
hogan, k. (2000). exploring a process view of students’ knowledge about the nature of science. science education, 84, 51-70.
jacobson, m. j., & wilensky, u. (2006). complex systems in education: scientific and educational importance and implications for the learning sciences. journal of the learning sciences, 15, 11-34.
jordan, b., & henderson, a. (1995). interaction analysis: foundations and practice. journal of the learning sciences, 4, 39-103.
konkola, r., tuomi-grohn, t., lambert, p., & ludvigsen, s. (2007). promoting learning and transfer between school and workplace. journal of education and work, 20, 211-238.
leach, j., driver, r., scott, p., & wood-robinson, c. (1996). children’s ideas about ecology 3: ideas about the cycling of matter found in children aged 5–16. international journal of science education, 18, 129-142.
liu, l., & hmelo-silver, c. e. (2009). promoting complex systems learning through the use of conceptual representations in hypermedia. journal of research in science teaching, 9, 1023-1040.
lobato, j., ellis, a. b., & munoz, r. (2003). how focusing phenomena in the instructional environment support individual students’ generalizations. mathematical thinking and learning, 5, 1-36.
lobato, j. (2004). abstraction, situativity, and the “actor-oriented transfer” perspective. in j. lobato (chair), rethinking abstraction and decontextualization in relationship to the “transfer dilemma.” symposium conducted at the annual meeting of the aera, san diego, ca.
lobato, j. (2006). alternative perspectives on the transfer of learning: history, issues, and challenges for future research. the journal of the learning sciences, 15, 431-449.
machamer, p., darden, d., & craver, c. f. (2000). thinking about mechanisms. philosophy of science, 67, 1-25.
moore, j. l., & schwartz, d. l. (1998). on learning the relationship between quantitative properties and symbolic representations. in proceedings of the international conference of the learning sciences (pp. 209-214). mahwah, nj: erlbaum.
moreno, r. (2004). decreasing cognitive load for novice students: effects of explanatory versus corrective feedback in discovery-based multimedia. instructional science, 32, 99-113.
national research council. (2012). a framework for k-12 science education practices, crosscutting concepts, and core ideas. washington, dc.
okita, s., & schwartz, d. l. (in press). learning by teaching human pupils and teachable agents: the importance of recursive feedback. journal of the learning sciences.
powell, a. b., francisco, j., & maher, c. a. (2003). an analytical model for studying the development of learners’ mathematical ideas and reasoning using videotape data. journal of mathematical behaviour, 22, 405-435.
reiner, m., & eilam, b. (2001). conceptual classroom environment: a system view of learning. international journal of science education, 23, 551-568.
sabelli, n. (2006). complexity, technology, science, and education. journal of the learning sciences, 15, 510.
stake, r. e. (1998). case studies. in n. k. denzin & y. s. lincoln (eds.), strategies of qualitative inquiry (pp. 86-109). thousand oaks, ca: sage.
tan, j., & biswas, g. (2006). the role of feedback in preparation for future learning: a case study in learning by teaching environments. intelligent tutoring systems, 2, 370-381.
van oers, b. (1998). from context to contextualizing. learning and instruction, 8, 473-488.
vattam, s., goel, a., rugaber, s., hmelo-silver, c., jordan, r., gray, s., & sinha, s. (2011). understanding complex natural systems by articulating structure-behavior-function models. educational technology & society, 14, 66-81.
wilensky, u., & reisman, k. (2006). thinking like a wolf, a sheep or firefly: learning biology through constructing and testing computational theories – an embodied modelling approach. cognition and instruction, 24, 171-209.
yin, r. k. (2009). case study research: design and methods (fourth ed.). thousand oaks, ca: sage.

frontline learning research vol. 4 no. 5 (2016) 83-105 issn 2295-3159
contact information: charlotte dignath-van ewijk, goethe university frankfurt, theodor-w.-adorno-platz 6, 60323 frankfurt am main, email: dignath@psych.uni-frankfurt.de
doi: http://dx.doi.org/10.14786/flr.v4i5.247

which components of teacher competence determine whether teachers enhance self-regulated learning?
predicting teachers’ self-reported promotion of self-regulated learning by means of teacher beliefs, knowledge, and self-efficacy
charlotte dignath-van ewijk
goethe university frankfurt, germany
article received 10 march / revised 16 august / accepted 9 october / available online 18 january

abstract
in this study, the predictive value of three aspects of teacher beliefs for teachers’ promotion of self-regulated learning (srl) is modelled by means of structural equation modelling. these include teachers’ beliefs on (1) instructing srl, (2) their own self-efficacy towards the promotion of srl, and (3) their epistemological beliefs regarding learning. 173 primary school teachers participated in the study. path analysis revealed that teachers’ beliefs on instructing srl, along with their self-efficacy beliefs regarding the promotion of srl, predicted teachers’ promotion of srl, mostly positively. the results offer new insights into teacher beliefs and how they account for self-reported teacher practice regarding the promotion of srl. this study is particularly innovative as it is the first study in the field of teachers and srl to investigate teacher beliefs and teacher self-efficacy as potential determinants of teachers’ promotion of srl in the classroom. these results can serve to construct a model of teachers’ promotion of srl, as well as provide ideas on how to help teachers support srl. this study is frontline as it appears that no other research has been published on teachers’ beliefs, in particular self-efficacy beliefs, towards promoting srl and how these are related to teachers’ promotion of srl. most research on teachers and srl has so far focused on training teachers, but no model of the professional competence of teachers in this area exists until now.

keywords: self-regulated learning; teacher beliefs; teacher self-efficacy

dignath et al | f l r 84

1.
theoretical background
a large amount of research has been conducted on the impact of self-regulated learning (srl), on factors that determine the use of srl, and on how srl can be fostered in learners. however, there is still a lack of research on what determines teachers’ promotion of srl. when looking at the literature, one can imagine the amount of research in the area of srl as being shaped like a funnel: the numerous studies on the impact that srl has on a learner’s achievement and motivation have drawn the interest of many researchers to investigate, as a next step, how srl in learners can be improved. several meta-analyses on how to promote srl have summarized the considerable number of training studies in the field of srl (see e.g., hattie, biggs & purdie, 1996; dignath, büttner & langfeldt, 2008). when looking further at how teachers can enhance their students’ srl, the amount of research decreases substantially (for a literature review on the role of the teacher in promoting srl, see moos & ringdal, 2012). and finally, only a few studies have explicitly searched for reasons why some teachers promote srl in their classrooms while others do not, in particular with regard to their instruction of srl strategies (e.g., chatzistamatiou, dermitzaki & bagiatis, 2014; dignath-van ewijk, dickhäuser & büttner, 2012). many studies in the past showed that individual factors, such as teaching self-efficacy and value beliefs, are related to and/or affect teachers’ reports regarding several aspects of instructional quality, as well as aspects of supporting students’ autonomy while learning (e.g., kunter, tsai, klusmann, brunner, krauss & baumert, 2008).
however, knowing more about what determines whether teachers promote srl would help to generate ideas on what teachers should develop in order to enhance srl in their students and on how teachers can be supported during teacher training (see kramarski, desoete, bannert, narciss & perry, 2013). to fill this gap, we investigated several potential predictors in order to learn what determines teachers’ self-reported promotion of srl. the aim of the study was to investigate the impact of potential determinants, including teacher beliefs, teacher self-efficacy, and teacher knowledge, on teachers’ self-reported promotion of srl in the classroom, in order to find out which teacher characteristics should be addressed when training teachers in srl. teachers filled out questionnaires regarding their educational beliefs, their self-efficacy beliefs, and their knowledge on how to support students’ srl, and they also rated their own promotion of srl in their teaching. the teacher characteristics which were assessed should be placed in a general model of teaching competence.

1.1 promotion of self-regulated learning
zimmerman (2000) defines self-regulated learners as learners who set themselves goals, plan their actions to pursue these goals, monitor their learning, and finally evaluate their learning process. in terms of a feedback loop, the result of this evaluation influences and regulates the subsequent learning process.
when looking at how teachers can foster srl in students, paris and paris (2001) describe two different ways: directly, by providing students with knowledge and skills on how to self-regulate (teaching them strategies in terms of informed training; brown, campione & day, 1981), and indirectly, by arranging the learning environment in a constructivist way so that students can and have to self-regulate their learning (e.g., by offering choices to students and providing them with situations in which they can take over responsibility for their learning) (pressley, harris & marks, 1992). although some literature can be found on how to promote srl among students using specific interventions (see e.g., dignath, 2009; paris & paris, 2001; perry & vandekamp, 2000; perry, phillips & dowler, 2004; pressley et al., 1992), little research has been published so far about teachers’ actual promotion of srl in the classroom. the few observational studies that have been conducted in order to register the extent to which teachers foster srl among their students have concluded that teachers spend little time on strategy instruction, even if they often design learning environments which require self-regulation (see e.g., bolhuis & voeten, 2001; dignath et al., 2013; hamman, berthelot, saia & crowley, 2000; spruce & bol, 2014; moely et al., 1992). the outcomes of those studies raise the question of why teachers do not invest more in preparing their students for self-regulation. do teachers not support the idea of self-regulated and strategic learning (teacher beliefs), or do they not know how to support it (teacher knowledge)? and how are beliefs and knowledge related?

1.2 teacher beliefs and teacher knowledge
when looking at teacher beliefs, a clear labelling of the constructs of beliefs and knowledge seems necessary.
while beliefs are supposed to be more affective in this distinction, knowledge is supposed to have the higher epistemic status by being more justifiable when compared with beliefs (fenstermacher, 1984). although the terms have been used interchangeably in teacher education literature (hofer & pintrich, 1997), we draw on the terminology used by pajares (1992) when reviewing the literature of teacher beliefs and teacher knowledge: “belief systems, unlike knowledge systems, do not require general or group consensus regarding the validity and appropriateness of their beliefs. individual beliefs do not even require internal consistency within the belief system. this nonconsensuality implies that belief systems are by their very nature disputable, more inflexible, and less dynamic than knowledge systems.” (pajares, 1992, p. 311). beliefs are supposed to include value commitments, epistemological beliefs, subjective theories about learning, and goals. furthermore, motivational orientations cover teachers’ self-referred cognitions – in particular locus of control and self-efficacy – as well as their intrinsic motivation (baumert & kunter, 2006, 2013).

1.3 epistemological beliefs
beliefs on fostering srl could be influenced by more general beliefs on the nature of learning and knowing: epistemological beliefs. epistemological beliefs refer to the assumptions an individual has about the origin, nature and structure of knowledge and knowing (schraw, crippen & hartley, 2006). hofer and pintrich (1997) identified three lines of research on epistemological beliefs. the first line of research has dealt with developmental models of beliefs about knowing and knowledge, building on the initial work of perry (1970). the second line of research has focused on the consequences of epistemological assumptions on the way people think and reason.
the last and most recent line of research has been concerned with the structure of a system of beliefs and how these beliefs influence comprehension and academic achievement (e.g., schommer, 1990; schommer-aikins, 2004). schommer has developed a taxonomy of beliefs consisting of four more or less independent dimensions: innate ability (learning is not changeable, but rather a fixed ability), quick learning (content is either learned quickly or not at all), simple knowledge (most important information is rather simple), and certain knowledge (information does not change over time). these four dimensions were extracted by factor analysis with each factor seen as a continuum. the factor innate ability is probably of central importance for the promotion of srl in our study and will therefore be described later on in more detail than the other three factors, of which detailed descriptions can be found elsewhere (e.g., hofer & pintrich, 1997). several researchers have found significant relationships between epistemological beliefs and srl (see e.g., pieschl, stahl & bromme, 2008 or schommer, 1990 for epistemological beliefs in general; see e.g., bendixen & hartley, 2003 for innate ability). just as this relationship can be found between learners’ epistemological beliefs and their self-regulation, one can assume that such a relationship exists between teachers’ epistemological beliefs and their promotion of self-regulation. if a learner assumes that learning and knowing cannot be changed or influenced, then the learner does not feel capable of self-regulating his or her learning. if a teacher assumes that learning and knowing is not changeable, why would he or she try to teach self-regulation of learning? in a meta-synthesis, hattie (2013) synthesized the results of several meta-analyses on the effect that teacher beliefs have on their effectiveness.
he found a strong effect of teachers’ epistemological beliefs related to their conceptions of teaching and learning (hattie, 2013). several studies on the association of teachers’ epistemological beliefs and their instruction point in this direction, as teachers’ epistemological beliefs affect their curricular and pedagogical decisions (for an overview, see schraw et al., 2006). however, results are sometimes unclear, and other studies found the impact of epistemological beliefs on teaching competence to be mixed (e.g., creemers et al., 2013; schraw & olafson, 2003; sosu & gray, 2012). bell (2006) examined the effects of srl and epistemological beliefs on academic achievement while holding constant the effects of self-efficacy and prior college academic achievement. he found students’ prior academic achievement and their expectancy to be the only significant predictors of academic achievement. he argues that students’ self-regulation, as well as their epistemological beliefs, probably influence students’ expectancy, which in turn influences academic achievement; however, these variables did not show any predictive value of their own in the multiple regression analyses that he ran (bell, 2006). we therefore decided to analyze the impact of epistemological beliefs in another way so that we can account for indirect effects on our outcome variable, as epistemological beliefs might not directly influence teachers’ self-reported promotion of srl, but only indirectly via more specific teacher beliefs.

1.4 beliefs on the promotion of self-regulated learning
to our knowledge, only a few studies thus far have been conducted in order to investigate the relationship between teachers’ educational beliefs and teachers’ promotion of srl.
lombaerts and colleagues have addressed the question of the relationship between flemish primary school teachers’ beliefs and their self-reported teaching practice by developing questionnaires to investigate teacher beliefs on promoting srl (lombaerts, engels, van braak & athanasou, 2009), as well as teachers’ realization of self-regulation during their teaching (lombaerts, engels & athanasou, 2007). they found teacher beliefs about srl in primary school to be a significant predictor of teachers’ self-reported recognition of srl. these teacher beliefs were predicted significantly by beliefs about teacher-level influence on srl, but not by beliefs about school context influence on srl. teacher beliefs about teacher-level influence on srl cover aspects such as beliefs on instructional pedagogy and on innovations in teaching and learning (sample item: “personal insight into how to support self-regulated learning does influence the introduction of self-regulated learning in my classroom.”), while beliefs about school-level influence on srl include beliefs on the stimulation of srl by the school as an organization, collaboration of teachers as part of the school culture, or curriculum changes (sample item: “the level of involvement of our team in the school development plan influences the introduction of self-regulated learning in my classroom.”) (see lombaerts et al., 2009). moreover, another significant predictor of teachers’ self-reported recognition was teacher-level satisfaction, which was in turn predicted by school context satisfaction, which itself had no direct predictive value for self-reported teacher behavior. the main conclusion of lombaerts and his colleagues (2009) is that beliefs about teacher-level influence on srl predict teachers’ self-reported behavior directly, but beliefs about school context influence on srl do not.
vandevelde, vandenbussche and van keer (2012) conducted a study with flemish primary school teachers which revealed that teachers who report developmental educational beliefs reported implementing srl more often than teachers holding transmissive educational beliefs. teachers’ implementation of srl was assessed using the scale that lombaerts and colleagues had developed and used in their study (lombaerts et al., 2007). dignath-van ewijk & van der werf (2012) examined the predictive value of dutch primary school teachers’ educational beliefs and their knowledge about promoting srl for their realization of supporting srl in the classroom. their results showed that teacher beliefs on srl predict teachers’ self-reported implementation of srl, whereas teachers’ general educational beliefs and teachers’ knowledge on promoting srl do not. moreover, they found that teachers were more positive towards the realization of a constructivist learning environment in general than towards the instruction of srl strategies. spruce and bol (2014) examined the relationship between (1) teacher beliefs, (2) knowledge, and (3) the observed classroom practice of ten american primary and middle school teachers in a qualitative case study. they found that these three constructs often diverged: among the ten teachers observed in classrooms, teachers with high knowledge regarding the promotion of srl and rather positive beliefs reached only low observation scores for promoting srl. in other words, a positive attitude towards srl and high knowledge regarding its promotion were not reflected in the promotion of srl that the researchers observed (spruce & bol, 2014). these studies have investigated the predictive value of teacher beliefs and teacher knowledge for teachers’ promotion of srl.
however, none of the studies have considered the impact of teachers’ self-efficacy on implementing srl in the classroom.

1.5 self-efficacy beliefs
when looking at motivational orientations, self-efficacy and locus of control play an important role in determining teacher behavior (see e.g., baumert & kunter, 2006, 2013; kunter, 2013). the feeling of control over the behavior in question arises from self-efficacy theory (bandura, 1977). according to bandura (1977), self-efficacy beliefs represent a judgment of one’s capabilities to reach a certain goal. with regard to teacher efficacy, tschannen-moran and woolfolk (2001) define teachers’ efficacy beliefs as an assessment of “his or her capabilities to bring about desired outcomes of student engagement and learning, even among those students who may be difficult or unmotivated” (tschannen-moran & woolfolk, 2001, p. 783). bandura (1986) proposed that people with high self-efficacy beliefs are more likely to perform better and show a higher frustration tolerance than people with low self-efficacy beliefs. self-efficacy beliefs would, therefore, play an important role in one’s motivation to initiate an effort, the persistence of effort, and the way in which one deals with setbacks (skaalvik & skaalvik, 2010). four factors are supposed to determine a person’s self-efficacy: (1) personal experience of success or failure (also: enactive attainment), whose interpretation is closely related to people’s beliefs and values, (2) vicarious experience (also: modeling), which influences the knowledge of what is “right” and “wrong”, (3) social persuasion in terms of encouragement or discouragement, and (4) the perception of physiological reactions (bandura, 1977). these four factors also affect teachers’ beliefs and knowledge which will, in turn, act as determinants for teachers’ self-efficacy.
these sources of efficacy beliefs are considered to provide the basis for one’s task analysis and appraisal of one’s personal competence. the resulting judgment of the match of task requirements and personal ability creates the teachers’ efficacy belief (tschannen-moran, hoy & woolfolk hoy, 1998). how is the self-efficacy of teachers linked to teacher beliefs, knowledge, and behavior? a large amount of research on teacher efficacy has evolved within the last decades (see klassen, tze, betts & gordon, 2011 for a review). research has shown the relationship between self-efficacy and teacher behavior (see e.g., guo, piasta, justice & kaderavek, 2010; holzberger, phillip & kunter, 2013; tschannen-moran & woolfolk hoy, 2001). studies have found, for example, a significant relationship between teachers’ self-efficacy and teacher beliefs towards instructional innovation (e.g., ghaith & yaghi, 1997; guskey, 1988) and instructional strategies (e.g., tschannen-moran & johnson, 2011; wertheim & leyser, 2002; swars, 2005), and a relationship between teachers’ self-efficacy and a control orientation against teacher control of students (woolfolk & hoy, 1990). with regard to the promotion of srl, chatzistamatiou et al. (2014) found that teachers’ self-reported strategies to enhance students’ srl in mathematics were predicted by their own self-efficacy beliefs. perry, hutchinson, and thauberger (2008) asked teachers about their instruction of strategies and found teachers to be positive towards the idea of fostering their students’ self-regulation, although they did not know how to do this. besides the mere knowledge of how to do so, teachers’ self-efficacy, that is, feeling competent enough to handle this, might play a role.

1.6 hypotheses
in this study, we wanted to investigate determinants of teachers’ promotion of srl when looking at teacher beliefs and teacher knowledge.
although other variables on the institutional level or the teacher level can also have an impact on teachers’ promotion of srl, we focused on variables on the teacher level in our investigation in order to start researching potential determinants on a micro-level first. the relationships between the above-mentioned concepts, on which we base the theoretical model, can be derived from the theory as follows (see figure 1 for a graphical illustration of the predicted relationships):
a) first, general teacher beliefs are assumed to predict more specific teacher beliefs (pajares, 1992). therefore, epistemological beliefs about learning as a fixed or changeable ability (in terms of rather general teacher beliefs) are assumed to predict teachers’ beliefs on the promotion of srl (in terms of rather specific beliefs).
b) second, knowledge is assumed to be predicted by one’s beliefs. the extent to which teachers learn new knowledge, and how new information is integrated into their existing knowledge base, will be predicted by the beliefs that teachers already hold (ertmer, 2005). thus, teacher beliefs are assumed to predict teacher knowledge on the promotion of srl.
c) third, most researchers agree on the strong impact that teacher beliefs have on teaching behavior. most empirical evidence suggests that teacher beliefs have a stronger impact on teaching behavior than does teacher knowledge (see e.g., the reviews of kagan, 1992, and pajares, 1992). we therefore assume that teacher beliefs predict teacher knowledge, and that both teacher beliefs and teacher knowledge predict self-reported teacher practice, with teacher beliefs being a stronger predictor than teacher knowledge.
d) whether an intention is carried out as a certain behavior depends not only on one’s beliefs, but also on one’s perceived behavioral control (ajzen, 1991; bandura, 1977).
the self-efficacy of teachers to promote srl is thus supposed to predict teachers’ self-reported promotion of srl, next to teacher beliefs and teacher knowledge.
e) finally, self-efficacy is supposed to be determined by several factors (see the four sources described in section 1.5), of which the interpretation is influenced by one’s beliefs and knowledge (bandura, 1977). we therefore assume that teacher beliefs and teacher knowledge will predict the self-efficacy of teachers.

2. methods

2.1 sample
primary school teachers from southern germany were contacted via email and telephone and asked to complete a questionnaire on teachers’ knowledge and beliefs towards srl. one hundred and seventy-three primary school teachers participated in the study, which equates to a response rate of 41%. the data from 140 teachers was complete and could be included in the analyses. teachers were mainly female (87.1%), ranged in age from 22 to 64 years (m = 39.6 years, sd = 12.63), and had on average 15.06 years of teaching experience (sd = 12.91).

2.2 procedure
in order to find out whether teacher beliefs and knowledge predict teachers’ self-reported promotion of srl, a path analysis was conducted. as we assumed beliefs and self-efficacy to predict teachers’ self-reported practice, self-reported teacher behavior was regressed on beliefs regarding the promotion of srl, on epistemological beliefs, on knowledge, and on self-efficacy within the path analysis. furthermore, epistemological beliefs, as more general beliefs on learning, were assumed to predict teacher beliefs on the promotion of srl, as those are more specific. figure 1 shows the model that was specified based on the previously described theories.

figure 1. expected model of predictors for teachers’ self-reported promotion of self-regulated learning

the questionnaires were administered to all teachers within a time period of two weeks.
most schools received the questionnaires personally; only schools which were situated more than 40 km away received the questionnaires by post. one week after administration, the completed questionnaires were collected from the schools. the questionnaire took approximately 30 minutes to complete. teachers were told about the purpose of the study. questions from the questionnaire were to be answered with regard to teaching in grade 4 of primary school¹ to allow for comparisons among teachers.

¹ in the german school system, students enter primary school at the age of six (1st grade) and leave primary school after 4th grade.

2.3 instruments
the teacher questionnaire consisted of four self-report scales and one open question regarding teacher knowledge. teachers first had to answer the open question that aimed at assessing their knowledge on promoting srl before possibly being influenced by the content of the items in the questionnaires. next, teachers had to answer questions using the scales measuring teacher beliefs on srl, teacher self-efficacy beliefs, as well as epistemological beliefs. finally, the questionnaire contained a scale assessing teachers’ self-reported promotion of srl. the order of the scales was chosen in a way that minimizes influences from answering one scale to the next.

[figure 1 box labels: epistemological beliefs (“innate”); epistemological beliefs (“changeable”); beliefs on promotion of srl; self-efficacy for the promotion of srl; knowledge on the promotion of srl; self-reported promotion of srl]

2.3.1 teacher epistemological beliefs
teachers’ epistemological beliefs regarding learning were measured using the subscale for fixed ability from a german translation of the epistemological questionnaire by schommer (1990) (schiefele, moschner & husstegge, 2002).
innate, or fixed ability, draws on the concept of individuals’ implicit theories of intelligence by dweck and leggett (1988) who found individuals to differ in their view on intelligence as being a fixed versus a malleable entity. although schommer (1990) treated the five items assessing innate, or fixed ability, as being one scale, we decided to include two subscales: one measuring the belief whether learning skills are innate (sample item: “there exists an innate talent, which determines how quickly one can learn.”), and one measuring the belief whether learning behavior is changeable (sample item: “the ability to learn can hardly be influenced through practice.”). in former research from schommer, the subset of items measuring the belief that the ability to learn is innate was assumed to be part of the factor fixed ability; however, it did not consistently load on this factor, but rather on the factor quick learning (see hofer & pintrich, 1997). schommer (1990) had developed the questionnaire to assess the epistemological beliefs of learners. in our study, we assessed the epistemological beliefs of teachers, who completed the questionnaire primarily from the perspective of a teacher and not of a learner. for teachers, it is natural to assume that learning is changeable; otherwise, their entire profession would come into question. however, there is variance in teachers’ beliefs regarding fixed ability that we wanted to be able to capture. epistemological beliefs were therefore assessed by means of two subscales: the belief whether learning skills are innate, and the belief whether learning behavior is changeable. the teachers rated each item on a six-point likert scale ranging from not true at all to completely true. internal consistency for the subscale innate was .57 and for the subscale changeable, .62.
although .57 is a low coefficient by most reliability standards, results from other studies using the measurement of epistemological constructs have presented reliability coefficients in this range (see e.g., hofer, 2000; schraw et al., 2002). since the problem to be expected from relatively low internal consistency is low statistical power, only the subscale changeable, which consisted of three items and still had an acceptable reliability of α = .62, was kept in the analysis. 2.3.2 teacher beliefs on srl teacher beliefs on instructing srl were assessed with a german version of the self-regulated learning teacher belief scale (lombaerts et al., 2009). the twelve items from this scale focused on teachers’ beliefs about supporting primary school students’ self-regulation of learning using different measures (sample item: “the instruction of learning strategies leads to students being better in evaluating their learning.”) and had to be rated on a five-point likert scale. the internal consistency (cronbach’s α) for the german version of this scale in our sample was .80. 2.3.3 teacher self-efficacy beliefs to measure teachers’ beliefs regarding their own self-efficacy with regard to the promotion of srl, the teacher-self-efficacy scale by schmitz und schwarzer (2000), consisting of ten items, had been tailored to the context of srl. item formulations had been adapted in order to assess specific teacher self-efficacy beliefs concerning the promotion of srl (example: “i know that i manage to help even to the most difficult students to learn the learning content.” [original item] was changed to “i know that i manage to help even to the most difficult students to learn how to learn the learning content themselves.” [adapted item]). for this adapted scale, we found an internal consistency of α = .81 in this sample. 
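the internal consistencies reported for the scales above are cronbach’s α coefficients. as a minimal sketch of how such a coefficient can be obtained from item-level ratings, the following python snippet implements the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the sum score). the function name `cronbach_alpha` and the toy ratings are hypothetical illustrations and are not data or code from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings.

    Hypothetical helper for illustration; assumes complete data (no missing
    ratings) and at least two items and two respondents.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # sample variance of each single item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # sample variance of the sum score across respondents
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# toy ratings (made up): five respondents, three items on a six-point scale
toy_ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [5, 4, 5],
    [6, 5, 6],
]
alpha = cronbach_alpha(toy_ratings)
print(f"alpha = {alpha:.2f}")
```

with few items, as in the three-item subscale changeable, α tends to be lower even when the items correlate reasonably well, because the ratio k/(k−1) and the share of covariance in the sum score both depend on the number of items.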

2.3.4 teacher knowledge

teacher knowledge was assessed by means of an open-ended question developed by lonka, joram, and bryson (1996), asking teachers about the best way to enhance the learning behavior of students by teaching them learning to learn, and why. dignath-van ewijk & van der werf (2012) have developed a coding scheme for this question in order to code the answers of teachers according to models of direct and indirect ways of fostering srl. the teachers’ written answers were transcribed and coded for data analyses according to this coding scheme by two different coders. the coding was based on a theoretical foundation of promoting srl which states that teachers should both provide students with autonomy to regulate their own learning (indirect way of promoting srl) and teach them srl-strategies for dealing with such autonomy (direct way of promoting srl). in terms of scaffolding, the more students are used to self-regulation, the less the teacher has to structure the learning process him- or herself (so he or she can shift from a more direct way of promoting srl to a more indirect way). providing students with srl-strategies without enabling them to self-regulate their learning by means of a learning environment that offers them opportunities to take over responsibility is not supposed to foster students’ srl, as they learn about self-regulation in a theoretical way but do not have the chance to practice it. the same applies the other way around: creating a learning environment that provides students with autonomous learning opportunities, but that does not teach them how to handle such autonomy, does not foster srl, as many students will not be able to take over the responsibility for their learning when they lack srl-strategies.
to this effect, teachers should do both: provide students with learning environments that allow them self-regulation and provide them with strategies to handle these learning environments more effectively (see dignath, 2009; paris & paris, 2001). we therefore coded teacher responses according to whether teachers showed no knowledge of promoting srl (coded as 0: no answer or an answer that does not fit the question, e.g., “caring for a nice atmosphere in which the pupils feel comfortable.”); only partial knowledge, mentioning either the composition of an autonomous learning environment (coded as 1; example: “cooperative learning, as learning with peers leads to discussions and that leads to a better understanding instead of the teacher saying how it has to be and then it is like that.”) or the instruction of srl-strategies (coded as 1; example: “analysing together with the pupils how they learn and make them get to know also other strategies/ways of learning.”); or full knowledge, mentioning both characteristics of an autonomous learning environment and strategy instruction (coded as 2; example: “on the one hand it is important for pupils to be allowed to work on their own ideas. on the other hand, for some pupils with a greater need for structure, it can also be good to first get familiar with certain learning processes and to first learn how to do that.”). teacher answers for the composition of a learning environment that supports self-regulation were coded according to the coding scheme of an observation instrument developed by dignath-van ewijk et al. (2013) in order to assess teachers’ promotion of srl in the classroom. this coding scheme included categories for teaching formats like cooperative learning, discovery learning, providing students with choices, situated learning, and problem-based learning.
teacher answers concerning the instruction of srl-strategies were sorted according to the same coding scheme containing strategy categories like metacognitive strategies (planning, monitoring, evaluating, reflecting), cognitive strategies (organization, elaboration, problem solving), and motivation strategies (e.g., attribution, feedback seeking, cooperation, etc.). for a more precise description of the coding scheme, see dignath-van ewijk et al. (2013) and dignath-van ewijk & van der werf (2012). teachers were allowed to give more than one answer to that question. this is even desirable, as teachers would have to mention at least one direct and one indirect way of fostering srl in order to reach the highest rating. interrater reliability computed with cohen’s kappa revealed an agreement of κ = .83.

2.3.5 teachers’ self-reported promotion of srl

a german version of the self-regulated learning inventory for teachers (lombaerts et al., 2007) was used to assess teachers’ self-reported promotion of srl in their classroom. the questionnaire of lombaerts et al. (2007) consists of 24 items, capturing srl during the three phases of zimmerman’s (2000) model of srl: a planning phase, an action phase, and a self-reflection phase (sample item: “my pupils work on tasks that require them to plan their work themselves towards a deadline.”). we included 16 items from this questionnaire in our questionnaire, as some items did not match our interest in teachers’ promotion of srl, and we additionally included five items regarding the direct instruction of srl strategies (sample item: “i ask my students to work independently without explicitly discussing learning strategies beforehand.”) developed by dignath-van ewijk & van der werf (2012). all items had to be rated on a six-point likert scale ranging from never to always. the reliability estimate (cronbach’s α) for the overall scale in our research was .90.
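the 0/1/2 knowledge coding and the interrater agreement described in section 2.3.4 can be sketched in a few lines of python. this is an illustration under stated assumptions only: the function names `knowledge_score` and `cohens_kappa` and the two lists of codes are hypothetical and do not reproduce the study’s data or its κ = .83; the kappa formula itself is the standard one, κ = (p_observed − p_expected) / (1 − p_expected).

```python
from collections import Counter


def knowledge_score(mentions_environment, mentions_strategies):
    """Score an answer as 0, 1, or 2 (hypothetical helper).

    One point for mentioning an autonomous learning environment (indirect
    promotion of srl), one point for mentioning strategy instruction (direct
    promotion); mentioning both yields the highest rating of 2.
    """
    return int(bool(mentions_environment)) + int(bool(mentions_strategies))


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same answers."""
    n = len(rater_a)
    # proportion of answers on which both coders assigned the same code
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance from each coder's marginal code frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# made-up codes from two coders for ten answers
coder_1 = [0, 1, 1, 2, 2, 1, 0, 2, 1, 1]
coder_2 = [0, 1, 1, 2, 1, 1, 0, 2, 1, 2]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

note that the sketch does not guard against the degenerate case in which both coders use a single identical category throughout, where the chance agreement equals 1 and kappa is undefined.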

table 1 overview of instruments used

| construct | instrument | number of items | example item |
|---|---|---|---|
| teacher beliefs on instructing srl | german version of the self-regulated learning teacher belief scale (lombaerts, engels, van braak & athanasou, 2009) | 12 | the instruction of learning strategies can be realized in primary school. |
| teachers’ epistemological beliefs regarding learning | german translation of the epistemological questionnaire by schommer 1990 (schiefele, moschner & husstegge, 2002) | 5 | there exists an innate talent which determines how quickly one can learn. |
| teachers’ beliefs regarding their own self-efficacy | teacher-self-efficacy scale by schmitz and schwarzer (2000) | 10 | i know that i can teach learning strategies even in difficult situations. |
| teachers’ self-reported promotion of srl | german version of the self-regulated learning inventory for teachers (lombaerts, engels & athanasou, 2007) | 21 | i teach my students how to plan one’s learning. |
| teachers’ knowledge on how to promote srl | open-ended question developed by dignath-van ewijk & van der werf (2012) based on lonka et al. (1996) | 1 | according to you: what is the best way to teach students learning to learn? why? |

table 2 means, standard deviations and cronbach alpha reliabilities

| construct | m | sd | cronbach’s α |
|---|---|---|---|
| teacher beliefs on instructing srl | 2.74 | .42 | .80 |
| teachers’ epistemological beliefs regarding learning, subscale “innate” | 3.54 | .84 | .57 |
| teachers’ epistemological beliefs regarding learning, subscale “changeable” | 1.25 | .72 | .62 |
| teachers’ beliefs regarding their own self-efficacy | 1.78 | .41 | .81 |
| teachers’ knowledge on how to promote srl | | | cohen’s kappa: 83% |
| teachers’ self-reported promotion of srl | 2.58 | .59 | .90 |

2.4 path analysis

we used path modelling as an extension of multiple regression in order to test both direct and indirect (mediator) effects on the dependent variable.
for the model proposed from the data in this study, we included indicators of teacher beliefs (including epistemological beliefs, beliefs on srl, and self-efficacy beliefs) and teacher knowledge, as well as teachers’ self-reported promotion of srl as an outcome measure that is predicted by the other variables in the model. since we did not want to estimate any latent variables, which would include an explicit estimation of measurement errors, we chose path analysis, run in stata, instead of structural equation modelling. path analysis is based on multivariate regression modelling, which allows for more than one dependent variable and the analysis of mediator effects, using one regression equation per endogenous variable derived from the path diagram. the relative strength of the postulated effect is presented in terms of path coefficients, most commonly indicated by the beta coefficient. teachers’ self-reported promotion of srl, as the criterion measure, is considered to be under the influence of all other variables in the model, either directly or mediated through other variables.

3. results

3.1 test for multivariate normality

we conducted the doornik-hansen test for multivariate normality (doornik & hansen, 2008) for the five variables included in the model; the test gave no indication of a departure from multivariate normality (chi2 = 9.34, p = .50).

3.2 scale intercorrelations

table 3 provides an overview of correlations among the scales. results were largely consistent with the hypothesized predictions that can be found in figure 1. scores measuring epistemological beliefs concerning the belief of ability as being changeable correlated negatively with teacher beliefs on srl (r = -.19), teacher self-efficacy beliefs (r = -.21), and self-reported promotion of srl (r = -.21). epistemological beliefs scores did not correlate with teacher knowledge, which was not consistent with our predictions.
teacher beliefs on srl correlated positively with teacher knowledge (r = .37), teacher self-efficacy beliefs (r = .43), and teachers’ self-reported promotion of srl (r = .49). self-efficacy beliefs further correlated positively with teacher knowledge (r = .33) and teachers’ self-reported promotion of srl (r = .56). finally, teacher knowledge also correlated with teachers’ self-reported promotion of srl (r = .36).

table 3 overview of correlations among the scales

| scale | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| (1) teacher knowledge | 1.00 | | | | |
| (2) teacher beliefs on srl | 0.37*** | 1.00 | | | |
| (3) teacher self-efficacy | 0.33** | 0.43*** | 1.00 | | |
| (4) teachers’ self-reported promotion of srl | 0.36*** | 0.49*** | 0.56*** | 1.00 | |
| (5) teachers’ epistemological beliefs “changeable” | 0.09 | -0.19* | -0.21* | -0.20* | 1.00 |

note: *correlation is significant at the .05 level, one-tailed. **correlation is significant at the .01 level, one-tailed. ***correlation is significant at the .001 level, one-tailed. intercorrelations among teacher knowledge, teacher beliefs, teacher self-efficacy, self-reported teacher behavior, and teachers’ epistemological beliefs (n=140)

3.3 path modelling regression analyses

path regression analyses were then conducted according to the hypotheses specified earlier in order to predict teachers’ reported promotion of srl directly through teachers’ knowledge, beliefs and self-efficacy towards the promotion of srl, as well as indirectly through teacher beliefs. theoretically, we assumed that all variables would predict teachers’ self-reported promotion of srl; however, not all of them would have a direct effect. compared to the fully saturated model, we therefore omitted the direct paths between epistemological beliefs and self-efficacy of promoting srl, and between epistemological beliefs and self-reported srl.
we calculated direct, indirect, and total effects for all endogenous variables by applying sewall wright’s multiplication rule as the path tracing rule: each structural equation is multiplied by its predetermined variables (wright, 1934). we used stata 13 to estimate the path model and tested the model fit using maximum likelihood estimation. the likelihood ratio, chi square, and corresponding p value, rmsea and cfi, were calculated to assess the model fit.

the initial path model contained five predictors and had an r2 = .38, indicating that the model explains 38% of the variance in teachers’ self-reported promotion of srl. chi square indicated that the initial model fit the data significantly worse than the fully saturated model and could therefore be improved: χ2 (2, n = 140) = 8.06, p = 0.02. in a next step, we therefore added the two omitted paths to the model. the path from epistemological beliefs to self-efficacy in promoting srl was significant (p = 0.01), while the path from epistemological beliefs to self-reported srl was not significant (p = 0.17). finally, we added the former path to the model. this model did not fit the data significantly worse than the fully saturated model: χ2 (1, n = 140) = 1.85, p = 0.17. compared to the baseline model without any predictors, the revised model was highly significant (likelihood ratio chi square = 144.76; p < .001). as additional measures of fit, the rmsea and cfi were calculated and both indicated an acceptable fit (rmsea = .076; p(rmsea<.05) = .25; cfi = .994).
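the path tracing rule mentioned above can be illustrated with a short python sketch: in a recursive (acyclic) path model, the total effect of one variable on another is the sum, over all directed paths between them, of the products of the path coefficients, and the indirect effect is the total effect minus the direct path. the dictionary below uses the rounded direct effects (b) reported in table 5 for the revised model; the variable names, the `DIRECT` structure, and the helper functions are assumptions made for this illustration and are not the authors’ stata code. because the inputs are rounded, the results match the reported effects only up to rounding.

```python
# direct effects (b) of the revised model as reported in table 5,
# stored as DIRECT[outcome][predictor]; names are illustrative only
DIRECT = {
    "beliefs": {"epistemological": -0.11},
    "knowledge": {"beliefs": 0.68, "epistemological": 0.18},
    "self_efficacy": {"beliefs": 0.30, "knowledge": 0.15,
                      "epistemological": -0.10},
    "promotion": {"beliefs": 0.40, "knowledge": 0.10, "self_efficacy": 0.54},
}


def total_effect(source, target):
    """Sum of coefficient products over all directed paths source -> target.

    Works because the model is acyclic: the effect on `target` is traced back
    through each of its direct predictors until `source` is reached.
    """
    if source == target:
        return 1.0
    return sum(coef * total_effect(source, predictor)
               for predictor, coef in DIRECT.get(target, {}).items())


def indirect_effect(source, target):
    """Total effect minus the direct path (0 if no direct path exists)."""
    return total_effect(source, target) - DIRECT.get(target, {}).get(source, 0.0)


for src in ("beliefs", "knowledge", "self_efficacy", "epistemological"):
    print(f"{src} -> promotion: "
          f"indirect = {indirect_effect(src, 'promotion'):.3f}, "
          f"total = {total_effect(src, 'promotion'):.3f}")
```

for example, the indirect effect of beliefs on self-reported promotion combines three routes: via self-efficacy (.30 × .54), via knowledge (.68 × .10), and via knowledge and then self-efficacy (.68 × .15 × .54).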

as the evidence on the investigated constructs would allow for several theoretical assumptions, we tested five different models and compared their fit indices, as summarized in table 4:

table 4 goodness-of-fit-indices for the compared models

| model | specification | chi2 | rmsea |
|---|---|---|---|
| model 1 | beliefs predicts knowledge and self-efficacy; and self-efficacy is predicted by knowledge and by beliefs | 1.853 | 0.076 |
| model 2 | beliefs predicts knowledge and self-efficacy | 15.960 | 0.219 |
| model 3 | beliefs, knowledge, and self-efficacy do not predict each other | 36.996 | 0.279 |
| model 4 | self-efficacy is predicted by knowledge and beliefs | 21.473 | 0.258 |
| model 5 | knowledge predicts beliefs and self-efficacy | 28.084 | 0.239 |

the goodness-of-fit indices are acceptable only for model 1, in which beliefs predicts knowledge and self-efficacy, while knowledge also predicts self-efficacy. we therefore rejected the four alternative models and worked with the model presented in figure 2.

[figure 2: path diagram with the nodes epistemological beliefs (“changeable”), beliefs on promotion of srl, self-efficacy for the promotion of srl, knowledge on the promotion of srl, and self-reported promotion of srl; path labels in the extracted figure: -.11*, .10t, .54***, .30***, .15***, .68***, .40**, .18*]

figure 2. path model of predictors for teachers’ self-reported promotion of self-regulated learning including estimates (beta weights) of direct and indirect effects and explained variances (adjusted r2 coefficients). *p < .05; **p < .01; ***p < .001; t p < .10.

based on the theoretical assumptions, we expected all variables to predict teachers’ self-reported promotion of srl, though not all variables were assumed to have a direct effect on the promotion of srl. the revised model indicates that teachers’ self-reported promotion of srl is predicted by the following groups of variables: teacher beliefs, teacher knowledge, and teachers’ self-efficacy towards promoting srl. we found direct effects on teachers’ self-reported promotion of srl only for teacher beliefs on fostering srl and for teachers’ self-efficacy towards promoting srl. teachers’ self-efficacy towards fostering srl had the largest direct effect (β = .36, p < .001), followed by teacher beliefs (β = .28, p < .001). we found no significant path from teacher knowledge on the promotion of srl to teachers’ self-reported promotion of srl (β = .13, p = .07). no direct effect had been assumed from the teachers’ epistemological beliefs on teachers’ self-reported promotion of srl. however, teachers’ epistemological beliefs were found to have an indirect effect on teachers’ self-reported behavior in the classroom (see table 5). teachers’ self-efficacy was strongly predicted by teachers’ beliefs (β = .31, p < .001) and teachers’ knowledge on promoting srl (β = .29, p < .001), and negatively by teachers’ epistemological beliefs (β = -.18, p < .05) (teachers who assumed learning behavior to be changeable reported a higher self-efficacy than teachers who assumed learning to be a fixed ability). we found teacher beliefs on the promotion of srl to be a highly significant predictor for teacher knowledge in this field (β = .36, p < .001), but teachers’ epistemological beliefs on the changeability of learning ability also played a significant role (β = .16, p < .05), with teachers believing in learning as a fixed ability showing more knowledge on the promotion of self-regulation strategies than teachers who did not assume learning to be a fixed ability.
finally, teacher beliefs on the promotion of srl were predicted negatively by teachers’ epistemological beliefs, in that teachers who believed in learning ability as not being fixed were more positive towards the promotion of srl (β = -.18, p < .05). table 5 reports regression coefficients of the path regression analyses for direct, indirect, and total effects.

table 5 direct, indirect, and total effects for coefficients of the revised path model

| outcome | predictor | direct effect b (se) | indirect effect b (se) | total effect b (se) |
|---|---|---|---|---|
| beliefs promotion of srl | epistemological beliefs | -.11 (.05)* | 0 (no path) | -.11 (.05)* |
| knowledge promotion of srl | beliefs promotion of srl | .68 (.15)*** | 0 (no path) | .68 (.15)*** |
| knowledge promotion of srl | epistemological beliefs | .18 (.09)* | -.07 (.04)* | .11 (.09) |
| specific self-efficacy beliefs | beliefs promotion of srl | .30 (.07)*** | .10 (.02)*** | .40 (.08)*** |
| specific self-efficacy beliefs | knowledge promotion of srl | .15 (.04)*** | 0 (no path) | .15 (.04)*** |
| specific self-efficacy beliefs | epistemological beliefs | -.10 (.04)* | -.02 (.02) | .12 (.05)** |
| promotion of srl | beliefs promotion of srl | .40 (.11)*** | .28 (.05)*** | .68 (.12)*** |
| promotion of srl | knowledge promotion of srl | .10 (.05)t | .08 (.02)*** | .18 (.06)** |
| promotion of srl | specific self-efficacy beliefs | .54 (.11)*** | 0 (no path) | .54 (.11)*** |
| promotion of srl | epistemological beliefs | 0 (no path) | -.10 (.05)* | -.10 (.05)* |

note: ***p < .001; **p < .01; *p < .05; t p < .10.

4. discussion

4.1 contribution of the study

in contrast to srl itself, which has by now been covered by a large amount of research, teachers’ promotion of srl has been a rather neglected field in this research area, leaving many questions open about how teachers can improve students’ self-regulation and how they can be supported. this study contributes to the field of srl by adding to what we already know about why teachers do or do not promote srl during their teaching.
as research in this area has shown, teachers do not promote strategy instruction very often during their teaching (see e.g., bolhuis & voeten, 2001; dignath-van ewijk et al., 2013; hamman et al., 2000; lombaerts et al., 2007; spruce & bol, 2014; vandevelde et al., 2012). only few studies present results about teacher characteristics that have an impact on teachers’ support for srl during their teaching (dignath-van ewijk & van der werf, 2012; lombaerts et al., 2009; spruce & bol, 2014; vandevelde et al., 2012), and if they do, they only cover single aspects such as teachers’ educational beliefs or their knowledge. in order to build a model of teachers’ promotion of srl, we need to know more about the teacher variables that predict teachers’ promotion of srl and how they might interact with each other. moreover, most studies have investigated single aspects of the promotion of srl without drawing on a model of teachers’ professional competence. the present results provide us with a further insight into the predictors of teachers’ enhancement of srl embedded in a framework on teacher competence. 4.2 summary in this study we investigated the predictive impact of primary school teachers’ epistemological beliefs, their beliefs and their knowledge on the promotion of srl, and teachers’ self-efficacy beliefs as determinants of teachers’ self-reported promotion of srl in the classroom. as the results show, teachers’ self-efficacy towards promoting srl has the strongest direct predictive value on self-reported teacher behavior, while teacher beliefs have a strong direct and indirect impact via teacher self-efficacy and teacher knowledge. teacher knowledge has a direct, as well as an indirect, effect via teacher self-efficacy. 
teachers’ epistemological beliefs towards the changeability of learning ability show a direct effect on teacher beliefs towards the promotion of srl and a direct effect on teachers’ self-efficacy beliefs, in that teachers who do not assume learning to be changeable show lower self-efficacy and less positive beliefs towards the promotion of srl. moreover, we found a direct effect of teachers’ epistemological beliefs on teacher knowledge, with teachers who assume learning not to be changeable showing more knowledge regarding the promotion of srl. the results offer new insights into teacher beliefs and how they might account for teacher (self-reported) behavior regarding the promotion of srl. the more teachers feel capable of instructing self-regulation strategies and of managing self-regulating students, the more they report promoting srl when they teach. furthermore, the more teachers report believing that students can benefit from srl, the more they report supporting their students by supplying them with self-regulation strategies and by offering learning situations that allow for self-regulation. finally, the more knowledge teachers have on how to foster srl, the more they report showing supportive teaching behavior with regard to srl.

4.3 conclusions

the following conclusions can be drawn with regard to our hypotheses:

4.3.1 hypothesis 1: epistemological beliefs about learning as fixed or changeable ability (in terms of rather general teacher beliefs) are assumed to influence teachers’ beliefs on the promotion of srl (as these are more specific).

the more teachers think of learning as not being changeable, the less positive they are towards the promotion of srl. this result is not surprising: why would a teacher who does not believe in the nature of learning as something to change and to develop support the idea of students taking over responsibility for their own learning?
if learning is seen as a fixed ability, learning to self-regulate can hardly make sense. the result also supports the findings of pajares (1992) that more general beliefs predict more specific beliefs.

4.3.2 hypothesis 2: teacher beliefs are assumed to affect teacher knowledge on the promotion of srl.

first, we found that teachers who think of learning as not being changeable report more knowledge than teachers who do not. at first sight, this result seems to be counterintuitive, but when looking at the operationalization of teacher knowledge, the finding makes more sense. teacher answers that included only one of the two aspects of fostering srl – 1. creation of a learning environment that allows for students’ self-regulation or 2. instruction of strategies that help students to handle a self-regulatory learning environment effectively – were ranked lower than teacher answers that included both aspects. however, most teachers who included only one aspect focused on the autonomous learning environment, and not on strategy instruction. this result replicates the results of former studies showing that most teachers who support srl in school particularly associate this with allowing students to have more freedom, but rarely with providing them with the necessary strategies to deal with this autonomy (see e.g., bolhuis & voeten, 2001; de kock, sleegers & voeten, 2005; dignath-van ewijk et al., 2013). the results found here might indicate that teachers who think of learning as not being changeable cannot integrate the idea of providing students with autonomy as a means to support their self-regulation (lower ranked teacher answer) into their concept of knowledge. if at all, they might accept the idea of giving students more autonomy only in combination with first teaching them how to regulate this autonomy (higher ranked teacher answer).
secondly, as assumed, the results showed that teachers who are positive towards the promotion of srl also share the idea of providing students with strategy knowledge, as well as with autonomy.

4.3.3 hypothesis 3: teacher beliefs and teacher knowledge have direct effects on self-reported teacher behavior, with teacher beliefs being a stronger predictor than teacher knowledge.

teachers who know about the importance of creating a learning environment that allows students to self-regulate, and who are aware of the importance of providing students with the necessary strategies to deal with more autonomy, also report implementing these factors in their teaching. furthermore, our results confirm former evidence on the significance of teacher beliefs and teacher knowledge for self-reported teaching behavior. although both can play a role for teachers’ self-reported practice in the classroom, teacher beliefs seem to have a larger impact than does their knowledge (knowledge was not significant at the 5% level): on the one hand, by directly and strongly influencing teachers’ self-reported practice, and on the other hand, by influencing teacher knowledge and teacher self-efficacy, which in turn predict teachers’ self-reported practice as well. this conforms with former evidence on teacher knowledge and teacher beliefs in general (for an overview see e.g., pajares, 1992) and more specifically for the promotion of srl, as well as with the results of spruce and bol (2014). although their findings from ten teachers did not deliver quantitative results, their descriptives showed that those teachers with the highest scores in classroom observations (i.e. teacher behavior) also reached higher scores for teacher beliefs, but not for teacher knowledge. looking at it the other way around, teachers with the highest scores on teacher beliefs also achieved higher observation scores, while teachers with the highest scores on teacher knowledge achieved only low observation scores.
one could therefore assume that teacher beliefs would also be a better predictor for teacher behavior than teacher knowledge (spruce & bol, 2014). teachers’ epistemological beliefs were not found to directly predict teachers’ self-reported practice. this is also in line with former research in which no close alignment between teachers’ epistemological world views and their teaching practice could be found (olafson & schraw, 2006), although the results in this area are mixed (see creemers et al., 2013; schraw & olafson, 2002; sosu & gray, 2012). ravindran, greene and debacker (2005) studied the relationships between several epistemological beliefs and the meaningful cognitive engagement of preservice teachers. although they could find positive relationships between most epistemological beliefs and cognitive engagement, the factor innate ability did not turn out to directly predict the cognitive engagement of preservice teachers (ravindran et al., 2005).

4.3.4 hypothesis 4: teachers’ self-efficacy to promote srl predicts teachers’ self-reported promotion of srl, next to teacher beliefs and teacher knowledge.

the extent to which teachers feel competent enough to foster their students’ self-regulation depends on their beliefs, as well as on their knowledge. this result is also in line with the results of chatzistamatiou et al. (2014) who found teachers’ self-efficacy beliefs to predict teachers’ instruction of srl in mathematics. it is also in line with the teachers’ answers that perry et al. (2008) found, showing that teachers might have a positive attitude towards srl, but that they just do not feel able to support their students with their self-regulation. moreover, self-efficacy seems to play an even bigger role than just being another component in the model, as self-efficacy has the strongest predictive value compared with teacher beliefs and teacher knowledge.

4.3.5 hypothesis 5: teacher beliefs and teacher knowledge predict teachers’ self-efficacy.

as expected, we found that the more positive teachers are towards the promotion of srl in primary school classrooms, and the more they know about supporting their students’ self-regulation, the more competent they feel in handling a learning environment conducive to self-regulation. as the goodness-of-fit indices had suggested, our initial model could be improved to fit the data better. we therefore refitted the model by including another path from teachers’ epistemological beliefs to teachers’ self-efficacy. initially, we did not assume teachers’ epistemological beliefs to predict teacher self-efficacy towards promoting srl directly, but rather only through teachers’ beliefs specifically towards the promotion of srl. however, this added path does make sense theoretically when considering teachers’ self-efficacy as beliefs as well. the self-efficacy beliefs of teachers can likewise be considered specific beliefs that are supposed to be influenced by more general beliefs (pajares, 1992). it is therefore theoretically plausible that there is a direct path from teachers’ beliefs of learning as changeable to their self-efficacy beliefs about feeling able to promote srl in their classrooms.

4.4 limitations

as for all research on srl or its promotion by teachers that is carried out by means of self-report, the following limitation also applies to this study: teachers might have tried to present themselves in a socially desirable way, or, in other words, as more positive (or negative) towards the promotion of srl than they actually are. moreover, questionnaire data carry the risk that teachers might not understand or might misunderstand certain items. finally, as teachers were asked questions retrospectively, their answers might be incorrect due to problems with recall. all variables had been assessed by means of self-reporting.
this problem seems to be smallest with regard to teacher knowledge, since knowledge can be assessed more objectively than beliefs can. the problem is biggest for teacher behavior, which was assessed as the teachers’ self-reported practice, since teachers might have answered in the most socially desirable way here. classroom observations would be more objective and could add to the reliability of the data in order to judge the teachers’ promotion of srl (see e.g., dignath-van ewijk et al., 2013). furthermore, by solely relying on the teachers’ self-report, all answers have been assessed within the same sample. high intercorrelations between all variables could be attributed to teachers following the same answering pattern for all questionnaires. by including an external perspective, e.g. of the students or external classroom observers, this problem could be resolved. moreover, we have limited this investigation to potential determinants on the teacher level. yet, research by lombaerts et al. has shown that variables on the school level and on the student level could also have an impact on teachers’ promotion of srl (lombaerts et al., 2007). future research should also include these aspects in order to broaden the picture. particularly with regard to adaptive teaching, teachers’ reactions to student variables might play an important role. finally, the generalizability of our results is restricted by the limited sample size. as participation in the study was voluntary, it might be that only very motivated teachers or teachers who were already interested in srl agreed to participate. our sample would then not be representative. yet, the results can provide interesting first insights into the interrelation of determinants of the promotion of srl, which should be investigated further with a larger sample.
4.5 implications for future research
4.5.1 implications for intervention research on srl
research on the role of teachers in the promotion of srl is most notably intervention research. however, interventions to help teachers promote srl in their classrooms mainly focus on the instruction of what srl is and how teachers can create learning environments to foster srl. first, when developing and evaluating interventions, researchers should also include information about the potential inheritability of learning abilities in order to correct teachers’ misconceptions. second, as the results show, teachers’ attitudes towards how beneficial srl is for their students, their epistemological beliefs on whether the learning behavior of their students is innate or not, as well as teachers’ own self-efficacy to promote srl, should be addressed in order to integrate not only cognitive, but also motivational aspects in teacher training.
4.5.2 implications for research on teacher knowledge and srl
concerning theoretical contributions to the research of teachers’ promotion of srl, the question of what teacher knowledge on the promotion of srl implies and how it relates to epistemological beliefs has to be investigated further. in this study, teacher knowledge on promoting srl was defined as a twofold construct, including the instruction of strategies plus the design of the learning environment (see dignath-van ewijk et al., 2013; paris & paris, 2001). more detailed analyses of these two aspects and their relation to teachers’ beliefs and behavior are needed to understand the result found in this study. wilson and bai (2010) investigated the relationship between teachers’ metacognitive knowledge and their pedagogical understandings of what is necessary for the teaching of metacognition. they found that teachers’ understanding of metacognition was related to their ideas of instructional strategies for srl (wilson & bai, 2010).
based on their results, it would be interesting to further investigate to what extent teachers’ understanding of srl and metacognition influences their knowledge of how to promote srl among their students, treating these as two steps in their knowledge of promoting srl: 1. the understanding of srl could serve as prior knowledge, and 2. the understanding of teaching srl would then be based on this prior knowledge. future research should take this first step of knowledge on srl into account when investigating further determinants of teachers’ promotion of srl.
4.5.3 implications for research on teacher self-efficacy and srl
former research has shown the negative consequences of low self-efficacy in teachers, e.g. in terms of teacher burnout (skaalvik & skaalvik, 2007) or lower instructional quality (tschannen-moran & johnson, 2011; wertheim & leyser, 2002; swars, 2005). low teacher self-efficacy can have a negative impact on what teachers dare to try out in their classrooms. on the other hand, holzberger et al. (2013) investigated not only the effect of self-efficacy on instructional quality, but, in a longitudinal design, also the impact of instructional quality on teacher self-efficacy in the following school year. although their results also supported the results of former research on the effect of self-efficacy on instructional quality, they additionally found a reciprocal effect (holzberger et al., 2013). their findings suggest that self-efficacy and teacher behavior have a reciprocal relationship rather than just a one-sided one. as holzberger et al. (2013) argue, the experience that teachers gain during teaching is used as feedback from the students about the teachers’ instructional success. this feedback can serve as one of bandura’s four factors that determine a person’s self-efficacy as described earlier: enactive attainment (bandura, 1986).
for our study, this implies that it would be interesting to investigate to what extent teachers’ promotion of srl can predict teachers’ self-efficacy. such a reciprocal relationship between teachers’ self-efficacy and their promotion of srl would not only be interesting for the theoretical understanding of how these aspects of teacher competence are related, but it would also have implications for research on teacher training. future research should investigate to what extent students’ reactions to teachers’ behavior could serve as feedback for the teacher on his or her success in promoting srl among the students, and to what extent this feedback can serve to enhance teachers’ self-efficacy for continuing to foster srl. when implementing innovative teaching in schools, the self-efficacy of teachers should be taken into account in order to succeed with the implementation. this implies close cooperation with teachers, including getting to know what teachers find doable and finding out how to support them.
4.5.4 implications for research in other contexts
finally, it would be interesting to conduct a similar study with high school teachers in order to compare the results found here with teachers from different contexts and different subject matters. we know from meta-analysis that, for the training of srl, different training characteristics are more or less effective for primary versus secondary school students and for different school subjects (dignath et al., 2008). therefore, one can assume that, for the promotion of srl, teachers might have had different experiences with different groups of students or within different subjects that affect their beliefs.
4.5.5 implications for research on building a model on teachers’ promotion of srl
when looking at the literature on srl, one notices the lack of a clear model on teaching srl.
although there are contributions on how srl should be promoted (e.g., paris & paris, 2001; pressley et al., 1992; for an overview, see dignath, 2009), no specific model of teacher competence to foster srl exists yet. future research should connect the findings of studies on determinants of teachers’ promotion of srl (e.g., dignath-van ewijk & van der werf, 2012; lombaerts et al., 2009; spruce & bol, 2014; vandevelde et al., 2012) and merge them with findings on how to assess teachers’ behavior in the classroom with regard to fostering srl by means of more sophisticated methods than self-reporting (see e.g., dignath-van ewijk et al., 2013 for observation methods on promoting srl) in order to collect more information on teacher competence in effectively promoting srl.
keypoints
this study reports research on determinants of teachers’ self-reported promotion of self-regulated learning, an area which remains a gap in research on self-regulated learning.
the investigation includes teacher beliefs on (1) teachers’ self-reported practice on instructing srl, (2) teachers’ self-efficacy with regard to promoting srl, and (3) teachers’ epistemological beliefs regarding learning.
path analyses were conducted and reveal new insights into constructing a model of teachers’ self-reported promotion of srl in the classroom.
a large teacher sample participated in the study in order to provide representative classroom data.
the study is innovative as there is no research yet regarding teacher beliefs and teacher self-efficacy predicting teachers’ self-reported practice of enhancing srl.
acknowledgments
the author thanks the student assistants cornelia haaß, melanie scheuermann, and katharina uhrig for their help with the data collection, as well as oliver dickhäuser for his comments on an earlier draft of this article.
references
ajzen, i. (1991). the theory of planned behavior. organizational behavior and human decision processes, 50, 179–211. bandura, a.
(1977). self-efficacy: toward a unifying theory of behavioral change. psychological review, 84, 191. bandura, a. (1986). social foundations of thought and action: a social cognitive theory. englewood cliffs, nj: prentice-hall. baumert, j. & kunter, m. (2013). the coactiv model of teachers’ professional competence. in m. kunter, j. baumert, w. blum, u. klusmann, s. krauss & m. neubrand (eds.), cognitive activation in the mathematics classroom and professional competence of teachers (pp. 25-48). new york: springer us. baumert, j. & kunter, m. (2006). stichwort: professionelle kompetenz von lehrkräften [keyword: professional competence of teachers]. zeitschrift für erziehungswissenschaft, 9, 469-520. bell, p. d. (2006). can factors related to self-regulated learning and epistemological beliefs predict learning achievement in undergraduate asynchronous web-based courses? perspectives in health information management/ahima, american health information management association, 3, 7. bendixen, l. d. & hartley, k. (2003). successful learning with hypermedia: the role of epistemological beliefs and metacognitive awareness. journal of educational computing research, 28, 15-30. bolhuis, s. & voeten, m. j. m. (2001). toward self-directed learning in secondary schools: what do teachers do? teaching and teacher education, 17, 837–855. brown, a. l., campione, j. c. & day, j. d. (1981). learning to learn: on training students to learn from texts. educational researcher, 10, 14-21. canrinus, e. t., helms-lorenz, m., beijaard, d., buitink, j. & hofman, a. (2012). self-efficacy, job satisfaction, motivation and commitment: exploring the relationships between indicators of teachers’ professional identity. european journal of psychology of education, 27, 115-132. chan, w. y., lau, s., nie, y., lim, s. & hogan, d. (2008). organizational and personal predictors of teacher commitment: the mediating role of teacher efficacy and identification with school.
american educational research journal, 45, 597–630. chatzistamatiou, m., dermitzaki, i. & bagiatis, v. (2014). self-regulatory teaching in mathematics: relations to teachers' motivation, affect and professional commitment. european journal of psychology of education, 29, 295-310. de kock, a., sleegers, p. & voeten, m. j. m. (2005). new learning and choices of secondary school teachers when arranging learning environments. teaching and teacher education, 21, 799–816. dignath-van ewijk, c., dickhäuser, o. & büttner, g. (2013). assessing how teachers enhance self-regulated learning: a multi-perspective approach. journal of cognitive education and psychology, special issue on self-regulated learning, 21, 338-358. dignath-van ewijk, c. & van der werf, g. (2012). what teachers think about self-regulated learning: an investigation of teacher beliefs about enhancing students' self-regulation and how they predict teacher behavior. education research international, doi:10.1155/2012/741713. dignath, c. (2009). different aspects of the promotion of self-regulated learning: a multi-method investigation on the instruction of self-regulated learning at primary and secondary school. dissertation universität frankfurt. dignath, c., büttner, g. & langfeldt, h.-p. (2008). how can primary school students acquire self-regulated learning most efficiently? a meta-analysis on interventions that aim at fostering self-regulation. educational research review, 3, 101-129. doornik, j. a. & hansen, h. (2008). an omnibus test for univariate and multivariate normality. oxford bulletin of economics and statistics, 70, supplement, 927-939. dweck, c. s. & leggett, e. l. (1988). a social-cognitive approach to motivation and personality. psychological review, 95, 256. ertmer, p. a. (2005). teacher pedagogical beliefs: the final frontier in our quest for technology integration? educational technology research and development, 53, 25–39. fenstermacher, g. d. (1994).
the knower and the known: the nature of knowledge in research on teaching. review of research in education, 20, 3-56. fives, h. (2003, april). what is teacher efficacy and how does it relate to teachers’ knowledge? a theoretical review. american educational research association annual conference, chicago. ghaith, g. & yaghi, h. (1997). relationships among experience, teacher efficacy, and attitudes towards the implementation of instructional innovation. teaching and teacher education, 13, 451-458. guo, y., piasta, s. b., justice, l. m. & kaderavek, j. m. (2010). relations among preschool teachers’ self-efficacy, classroom quality, and children’s language and literacy gains. teaching and teacher education, 26, 1094-1103. guskey, t. r. (1988). teacher efficacy, self-concept, and attitudes toward the implementation of instructional innovation. teaching and teacher education, 4, 63-69. hamman, d., berthelot, j., saia, j. & crowley, e. (2000). teachers' coaching of learning and its relation to students' strategic learning. journal of educational psychology, 92, 342. hattie, j. (2013). visible learning: a synthesis of over 800 meta-analyses relating to achievement. london: routledge. hattie, j., biggs, j. & purdie, n. (1996). effects of learning skills interventions on student learning: a meta-analysis. review of educational research, 66, 99-136. hofer, b. k. (2000). dimensionality and disciplinary differences in personal epistemology. contemporary educational psychology, 25, 378-405. hofer, b. k., & pintrich, p. r. (1997). the development of epistemological theories: beliefs about knowledge and knowing and their relation to learning. review of educational research, 67, 88-140. holzberger, d., philipp, a. & kunter, m. (2013). how teachers’ self-efficacy is related to instructional quality: a longitudinal analysis. journal of educational psychology, 105, 774-786. kagan, d. m. (1992). implication of research on teacher belief. educational psychologist, 27, 65–90. klassen, r.
m., tze, v. m., betts, s. m. & gordon, k. a. (2011). teacher efficacy research 1998–2009: signs of progress or unfulfilled promise? educational psychology review, 23, 21-43. kramarski, b., desoete, a., bannert, m., narciss, s. & perry, n. (2013). new perspectives on integrating self-regulated learning at school. education research international, 2013, article id 498214. kramarski, b. & michalsky, t. (2009). investigating preservice teachers’ professional growth in self-regulated learning environments. journal of educational psychology, 101, 161–175. kunter, m. (2013). motivation as an aspect of professional competence: research findings on teacher enthusiasm. in m. kunter, j. baumert, w. blum, u. klusmann, s. krauss & m. neubrand (eds.), cognitive activation in the mathematics classroom and professional competence of teachers (pp. 273-289). new york: springer us. kunter, m., tsai, y. m., klusmann, u., brunner, m., krauss, s. & baumert, j. (2008). enjoying teaching: enthusiasm and instructional behaviors of secondary school mathematics teachers. learning and instruction, 18, 468-482. lombaerts, k., engels, n. & athanasou, j. a. (2007). development and validation of the self-regulated learning inventory for teachers. perspectives in education, 25, 29–47. lombaerts, k., engels, n., van braak, j. & athanasou, j. a. (2009). development of the self-regulated learning teacher belief scale. european journal of psychology of education, 1, 79-96. lonka, k., joram, e. & bryson, m. (1996). “conceptions of learning and knowledge: does training make a difference?” contemporary educational psychology, 21, 240–260. moely, b. e., hart, s. s., leal, l., santulli, k. a., rao, n., johnson, t. & hamilton, l. b. (1992). the teacher's role in facilitating memory and study strategy development in the elementary school classroom. child development, 63, 653-672. moos, d. c., & ringdal, a. (2012).
self-regulated learning in the classroom: a literature review on the teacher’s role. education research international, 2012, article id 423284. muis, k. r. (2007). the role of epistemic beliefs in self-regulated learning. educational psychologist, 42, 173-190. olafson, l. & schraw, g. (2006). teachers’ beliefs and practices within and across domains. international journal of educational research, 45, 71-84. pajares, m. f. (1992). teachers’ beliefs and educational research: cleaning up a messy construct. review of educational research, 62, 307–332. paris, s. g. & paris, a. h. (2001). classroom applications of research on self-regulated learning. educational psychologist, 36, 89-101. perry, n. e., hutchinson, l. & thauberger, c. (2008). talking about teaching self-regulated learning: scaffolding student teachers’ development and use of practices that promote self-regulated learning. international journal of educational research, 47, 97–108. perry, n. e., phillips, l. & dowler, j. (2004). examining features of tasks and their potential to promote self-regulated learning. the teachers college record, 106, 1854-1878. perry, n. e. & vandekamp, k. j. (2000). creating classroom contexts that support young children's development of self-regulated learning. international journal of educational research, 33, 821-843. perry, w. j. r. (1970). forms of intellectual and ethical development in the college years: a scheme. new york: holt, rinehart and winston. pieschl, s., stahl, e. & bromme, r. (2008). epistemological beliefs and self-regulated learning with hypertext. metacognition and learning, 3, 17-37. pressley, m., harris, k. r. & marks, m. b. (1992). but good strategy instructors are constructivists!, educational psychology review, 4, 3–31. ravindran, b., greene, b. a. & debacker, t. k. (2005). predicting preservice teachers’ cognitive engagement with goals and epistemological beliefs. the journal of educational research, 98, 222-232. schiefele, u., moschner, b. 
& husstegge, r. (2002). skalenhandbuch smile-projekt. [scale handbook of the smile project] bielefeld: university of bielefeld. schmitz, g. s. & schwarzer, r. (2000). selbstwirksamkeitserwartung von lehrern: längsschnittbefunde mit einem neuen instrument. [perceived self-efficacy of teachers: longitudinal findings with a new instrument] zeitschrift für pädagogische psychologie, 14, 12-25. schommer-aikins, m. (2004). explaining the epistemological belief system: introducing the embedded systemic model and coordinated research approach. educational psychologist, 39, 19-29. schommer, m. (1990). effects of beliefs about the nature of knowledge on comprehension. journal of educational psychology, 82, 498-504. schraw, g., crippen, k. j. & hartley, k. (2006). promoting self-regulation in science education: metacognition as part of a broader perspective on learning. research in science education, 36, 111-139. schraw, g. & olafson, l. (2003). teachers' epistemological world views and educational practices. journal of cognitive education and psychology, 3, 178-235. shulman, l. s. (1986). those who understand: knowledge growth in teaching. educational researcher, 15, 4-14. sinatra, g. m. & kardash, c. m. (2004). teacher candidates’ epistemological beliefs, dispositions, and views on teaching as persuasion. contemporary educational psychology, 29, 483-498. skaalvik, e. m. & skaalvik, s. (2010). teacher self-efficacy and teacher burnout: a study of relations. teaching and teacher education, 26, 1059-1069. skaalvik, e. m. & skaalvik, s. (2007). dimensions of teacher self-efficacy and relations with strain factors, perceived collective teacher efficacy, and teacher burnout. journal of educational psychology, 99, 611. sosu, e. m. & gray, d. s. (2012). investigating change in epistemic beliefs: an evaluation of the impact of student teachers’ beliefs on instructional preference and teaching competence.
international journal of educational research, 53, 80-92. spruce, r. & bol, l. (2014). teacher beliefs, knowledge, and practice of self-regulated learning. metacognition and learning, 1-33. swars, s. l. (2005). examining perceptions of mathematics teaching effectiveness among elementary preservice teachers with differing levels of mathematics teacher efficacy. journal of instructional psychology, 32, 139-147. tillema, h. h. (1995). changing the professional knowledge and beliefs of teachers: a training study. learning and instruction, 5, 291–318. tschannen-moran, m. & johnson, d. (2011). exploring literacy teachers’ self-efficacy beliefs: potential sources at play. teaching and teacher education, 27, 751-761. tschannen-moran, m. & woolfolk hoy, a. (2001). teacher efficacy: capturing an elusive construct. teaching and teacher education, 17, 783-805. tschannen-moran, m., hoy, a. w. & hoy, w. k. (1998). teacher efficacy: its meaning and measure. review of educational research, 68, 202-248. vandevelde, s., vandenbussche, l. & van keer, h. (2012). stimulating self-regulated learning in primary education: encouraging versus hampering factors for teachers. procedia-social and behavioral sciences, 69, 1562-1571. weinert, f. e. (2001). concept of competence: a conceptual clarification. in rychen, d. s. & salganik., l. h. (eds.), defining and selecting key competencies (pp. 45-65). ashland, oh: hogrefe. wertheim, c. & leyser, y. (2002). efficacy beliefs, background variables, and differentiated instruction of israeli prospective teachers. the journal of educational research, 96, 54-63. wilson, n. s. & bai, h. (2010). the relationships and impact of teachers’ metacognitive knowledge and pedagogical understandings of metacognition. metacognition and learning, 5, 269-288. wright, s. (1934). the method of path coefficients. the annals of mathematical statistics, 5, 161-215. woolfolk, a. e. & hoy, w. k. (1990). prospective teachers’ sense of efficacy and beliefs about control. 
journal of educational psychology, 82, 81-91. yadav, a. & koehler, m. (2007). the role of epistemological beliefs in preservice teachers’ interpretation of video cases of early-grade literacy instruction. journal of technology and teacher education, 15, 335-361. zimmerman, b. j. (2000). attaining self-regulation: a social cognitive perspective. in m. boekaerts, p. r. pintrich & m. zeidner (eds.), handbook of self-regulation (pp. 13-39). san diego, ca, us: academic press.
frontline learning research vol.3 no. 1 (2015) 55-77 issn 2295-3159
corresponding author: johanna seiz, institute of psychology, department of educational psychology, goethe university, theodor-w.-adorno-platz 6, 60629 frankfurt/main, germany. e-mail: seiz@psych.uni-frankfurt.de doi: http://dx.doi.org/10.14786/flr.v3i1.141
when knowing is not enough – the relevance of teachers’ cognitive and emotional resources for classroom management
johanna seiz a, thamar voss b, mareike kunter a
a goethe university of frankfurt, germany
b university of tübingen, germany
article received 22 december 2014 / revised 13 march 2015 / accepted 26 march 2015 / available online 13 may 2015
abstract
this study expands the discussion on teacher competence by investigating the relevance of teachers’ combined cognitive resources and emotional resources for effective classroom management. while research on teacher qualification stresses the importance of knowledge for effective teaching, research on teacher stress focuses on their emotional functioning, often without connection to their in-class behaviour. drawing on findings from health psychology showing that high levels of emotional exhaustion can impair cognitive performance, we hypothesised that teachers’ pedagogical/psychological knowledge would predict their classroom management behaviour only when their level of emotional exhaustion was low.
we administered a test to assess the pedagogical/psychological knowledge of 205 secondary school teachers, measured their emotional exhaustion, and assessed their classroom management using ratings from their 4,672 students obtained one year later. data were analysed using latent moderation analyses, a novel statistical approach that has rarely been employed in research on learning and instruction. our findings confirmed our hypotheses and indicated an interaction between teachers’ cognitive resources and emotional resources, which together predict their classroom management behaviour. thus, the new theoretical and empirical integration of two distinct areas of teacher quality broadens our understanding of teacher resources necessary for effective instruction. we argue that teacher education should acknowledge the interplay of the different resources teachers have and help them develop their emotional resources to ensure effective instruction.
keywords: classroom management; teacher competence; emotional exhaustion; professional knowledge
1. introduction
there has been considerable debate in educational research about which qualities make teachers effective (e.g., roehrig et al., 2012). from a subject-specific perspective, professional knowledge as a cognitive resource is essential (shulman, 1986, 1987); however, some authors (e.g., jennings & greenberg, 2009) stress the importance of teachers’ emotional resources. in this study we expand the discussion on teacher competence by investigating the relevance of teachers’ combined cognitive resources and emotional resources for effective classroom management. teaching is a complex activity (doyle, 2006; helsing, 2007) in two respects. first, classrooms have unique characteristics (doyle, 2006). for instance, the multitude of tasks, which all require an adequate response from the teacher, reflects considerable multidimensionality.
as many tasks occur simultaneously, teachers need appropriate monitoring and management skills. unexpected disruptions can occur in the classroom and put constant pressure on the teacher and the teaching task (doyle, 2006). second, teachers need to employ several skills for effective instruction (e.g. baumert et al., 2010). they must choose instructional tasks and appropriate methods, establish rules and structures to manage the class, and provide students with emotional as well as individual learning support (baumert et al., 2010; pianta & hamre, 2009). all these demands and practices occur simultaneously and are interconnected. some authors argue that efficient classroom management supports learning-related activities as it structures the learning environment (doyle, 2006; ophardt & thiel, 2008). the widely-used observation tool classroom assessment scoring system (class) considers classroom organisation to be one dimension in its framework, as it is believed to be relevant to students’ academic and social development (pianta & hamre, 2009). taking this into account, we focus in this study on classroom management as an important part of instructional quality. evertson and weinstein (2006) defined classroom management as “the actions teachers take to create an environment that supports and facilitates both academic and social-emotional learning” (p. 4). this definition subsumes various dimensions of teacher behaviour. empirically effective and therefore central dimensions of classroom management include monitoring students’ behaviour, preventing disturbances, establishing rules, and quickly intervening during disruptions (marzano, marzano, & pickering, 2003). monitoring involves continuously observing students, which enables the teacher to prevent or detect, and possibly react quickly to, disruptions (doyle, 2006; kounin, 1970). 
establishing rules in the classroom is an important part of classroom management (emmer & evertson, 2013), as these help students to regulate their behaviour. in addition, reacting adequately and quickly to disruptions is crucial (marzano et al., 2003). further dimensions of effective classroom management focus on the quality of student-teacher relationships and the maintenance of instructional flow (doyle, 2006; pianta, 2006). empirical evidence shows that classroom management is crucial for students in various groups and in different domains (wang, haertel, & walberg, 1993). effective classroom management is a strong predictor for students’ academic outcomes (e.g. wang et al., 1993). yet classroom management also is related to non-cognitive outcomes such as students’ motivation and interest in school (fauth, decristan, rieser, klieme, & büttner, 2014; kunter, baumert, & köller, 2007) and satisfaction with school (nie & lau, 2009). further, effective classroom management can result in better student-teacher relationships (de jong et al., 2014). however, effective classroom management is challenging, especially for young teachers, who often do not feel well prepared for the task (liston, whitcomb, & borko, 2006). to summarise, effective classroom management is crucial for students yet challenging for teachers, as it requires pedagogical, social and emotional competence as well as the ability to react quickly and appropriately in critical situations. given the complexity of classroom management, the question arises as to which resources—in the sense of personal prerequisites—teachers need to manage their classrooms effectively.
1.1 necessary resources for effective classroom management
there is considerable discussion about the prerequisites for providing high quality instruction and managing the classroom effectively.
in the following section we introduce two views, one stressing the importance of teachers’ professional knowledge and the other stressing the relevance of teachers’ emotional resources.
1.1.1 professional knowledge – the importance of teachers’ cognitive resources
one particular cognitive resource that often is considered a prerequisite for high quality instruction is professional knowledge (depaepe, verschaffel, & kelchtermans, 2013; shulman, 1986, 1987). within this discussion professional knowledge is understood as specialised knowledge shared within a community of professionals. research has shown that subject matter related knowledge such as subject-specific content knowledge and subject-specific pedagogical content knowledge are important for processing and communicating content related tasks (depaepe et al., 2013; krauss et al., 2008). subject matter related knowledge is an important predictor for cognitive activation and student achievement (baumert et al., 2010; hill, rowan, & ball, 2005). regarding classroom management, subject-unspecific knowledge such as pedagogical/psychological knowledge, meaning the teachers’ knowledge of creating and improving classroom situations and interactions, is of great importance. such knowledge includes that of classroom management strategies, teaching methods, classroom assessment and dealing with students’ heterogeneity (park & oliver, 2008; voss, kunter, & baumert, 2011; voss, kunina-habenicht, & kunter, 2015). pedagogical/psychological knowledge subsumes declarative and procedural knowledge (voss et al., 2011). the importance of this knowledge was indicated in a recent study in which teachers’ pedagogical/psychological knowledge was shown to be associated with the quality of their instruction, including classroom management (voss, kunter, seiz, hoehne, & baumert, 2014). helping teacher candidates develop classroom management skills is therefore an essential part of teacher education (emmer & stough, 2001).
the assumption that teachers need cognitive resources such as professional knowledge for effective instruction thus seems well established. however, considering the great challenge that teaching may present, other researchers have argued that emotional resources are another important asset for teachers.
1.1.2 the importance of teachers’ emotional resources
in their prosocial classroom model, jennings and greenberg (2009) claim that teachers’ social and emotional resources are prerequisites for effective teaching and especially for classroom management. following their model, teachers with sufficient emotional resources are better able to deal with diverse challenges in their classrooms, such as managing them effectively. in the model it is assumed that effective classroom management leads to an optimal classroom climate with positive social, emotional and academic outcomes for students (jennings & greenberg, 2009). teachers’ emotional resources clearly are an important topic to investigate (sutton, 2005; sutton & wheatley, 2003). teaching is an emotionally challenging profession and teachers need to be able to regulate their emotions (roeser et al., 2013). teachers’ emotional resources have often been analysed within health psychology, focusing on how negative emotions evolve; yet few studies have analysed the relationship between teacher emotions and instructional behavior. keller, chang, becker, goetz, and frenzel (2014) showed that emotional exhaustion was related to teachers’ emotional experience in their classrooms. highly exhausted teachers reported increased feelings of anger and less enjoyment during instruction. further, teacher emotions were associated with student-rated instructional quality (frenzel, goetz, stephens, & jacob, 2009).
in a study testing the assumption that teachers’ emotional resources are relevant to their instructional behaviour, klusmann, kunter, trautwein, lüdtke, and baumert (2008) found that teachers who were able to balance their emotional engagement attained better instructional quality and their students reported greater motivation. additionally, students’ and teachers’ emotions seem to be related (becker, goetz, morger, & ranellucci, 2014): students who witnessed their teachers enjoying instruction also felt more enjoyment in class. summing up, teacher emotions are a relevant resource for effective instruction.
1.1.3 combining the perspectives: the interplay of cognitive and emotional resources
to date, researchers have investigated the cognitive and emotional resources of teachers mostly in separate studies stemming from different theoretical traditions (e.g., brouwers & tomic, 1999; depaepe et al., 2013; skaalvik & skaalvik, 2011), neglecting a possible joint relevance for effective teaching, especially for classroom management. in our study, we combine both perspectives. although we agree that cognitive resources such as professional knowledge are crucial for effective instruction, we argue that due to the complexity of teaching, teachers will be able to profit from their cognitive resources only if they also possess a sufficient amount of emotional resources. thus, on the one hand, in this study we consider teachers’ pedagogical/psychological knowledge as an example of their cognitive resources. on the other hand, we consider emotional exhaustion as a central aspect of teachers’ emotional resources (klusmann et al., 2008). emotional exhaustion is the feeling of being drained or experiencing chronic fatigue and a low level of energy (maslach & leiter, 1999; schwarzer, schmitz, & tang, 2000), and it is the core component of burnout syndrome (maslach, schaufeli, & leiter, 2001).
many studies have shown that teachers generally report higher levels of emotional exhaustion than other professionals, although significant differences among teachers exist (e.g., hakanen, bakker, & schaufeli, 2006; unterbrink et al., 2007). the interdependence of cognitive and emotional resources already has been empirically demonstrated in research on health psychology. studies comparing the cognitive functioning of highly exhausted adults and non-exhausted adults have shown that those with high levels of exhaustion had impaired cognitive functioning (kleinsorge, diestel, scheil, & niven, 2014; sandström, rhodin, lundberg, olsson, & nyberg, 2005). in a study by van der linden, keijsers, eling, and schaijk (2005) a non-clinical sample of exhausted teachers performed significantly worse on cognitive performance tasks than a sample of non-exhausted teachers. feuerhahn et al. (2013) investigated the relation between emotional exhaustion and multiple indicators of performance using a sample of teachers with varying degrees of exhaustion. they found that emotional exhaustion was related to cognitive impairment. in a follow-up investigation six months later, emotional exhaustion at the first testing period predicted impairment ratings at the second testing period. however, emotional exhaustion at the second testing period was not predicted by cognitive impairment at the first testing period, which suggests that emotional exhaustion leads to cognitive impairment, rather than the other way around. most of these studies were framed within information processing theory (e.g., feldon, 2007; mayer, 2012; sweller, van merrienboer, & paas, 1998), which assumes that emotional exhaustion limits information processing capacity and thus leads to poorer performance on cognitive performance tasks.
applying this to teachers, who need sufficient information processing capacities to be able to use their cognitive resources in challenging classroom situations (feldon, 2007), one might assume that high levels of emotional exhaustion could drain processing capacities, limiting access to professional knowledge. further theories and approaches can be used to support our argument. ego depletion theory assumes that self-regulation is based on a limited amount of resources (baumeister, gailliot, dewall, & oaten, 2006) and that each act of self-control exhausts these resources and leads to a state of ego depletion. subsequent attempts at self-control or volition will fail due to a lack of available resources. studies supporting ego depletion theory showed that after acts of self-regulation (e.g., emotional or attentional regulation) performance was impaired in tasks demanding high-order cognitive functioning (johns, inzlicht, & schmader, 2008; schmeichel, vohs, & baumeister, 2003). it could be argued that teachers with a high level of emotional exhaustion are in a state of ego depletion because their self-regulatory efforts have used up resources for further acts of volition (e.g., knowledge-based decisions concerning classroom management).
1.2 the present study
in this study we analyse the interaction between teachers’ cognitive resources and emotional resources for classroom management behaviour. we argue that knowledge (as a cognitive resource) and emotional exhaustion (as an emotional resource) are interconnected when it comes to predicting teachers’ behaviour, as emotional exhaustion might limit capacities to process knowledge. classroom management behaviour such as monitoring or preventing disturbances relies on cognitive resources as it requires quick reactions to the unforeseen (e.g., feldon, 2007).
we hypothesise that the successful application of knowledge in challenging classroom situations requires sufficient information processing capacity, but that high emotional exhaustion will reduce these processing capacities, thus limiting teachers’ access to knowledge. only when teachers possess sufficient emotional resources will they have enough capacities to apply knowledge-based strategies to manage the classroom. to our knowledge, this is the first study that combines cognitive and emotional resources of teachers to predict their in-class teaching behaviour.
1.3 hypotheses
methodologically, we thus investigate whether teachers’ emotional exhaustion moderates the relation between their professional knowledge and their classroom management behaviours, as indicated by their prevention of disturbances and their monitoring behaviour. we hypothesise as follows:
[1] pedagogical/psychological knowledge will not predict:
a) classroom disturbances when the level of emotional exhaustion is high.
b) monitoring behaviour when the level of emotional exhaustion is high.
[2] when the level of emotional exhaustion is low, pedagogical/psychological knowledge will relate:
a) negatively to classroom disturbances.
b) positively to monitoring behaviour.
2. method
2.1 design and sample
the data used in this study were derived from a larger longitudinal study investigating the development of secondary school mathematics teacher candidates’ professional competence during and after the practical induction phase. the practical induction phase is mandatory in germany and follows university studies. during this phase teacher candidates are placed in schools where they observe instruction and gradually start their own teaching. in addition, they attend courses on general principles of teaching. two assessments of this study were used for this analysis. the first assessment involved 568 participants and was conducted at the end of the participants’ induction phase.
in this assessment, pedagogical/psychological knowledge and emotional exhaustion were assessed. the aim of the second assessment was to gather data on instructional quality (rated via students) of the participants after they had taken over full teaching responsibilities. therefore, the second assessment was conducted 14 months after the end of the induction phase to ensure that participants were already established as teachers. in this assessment 205 teachers and their students still participated. in our study we aimed at predicting student-rated quality of classroom management using prior teacher resources. therefore we used this subsample of 205 teachers of the second assessment as our sample of analysis. our sample was 61% women and the average age of the participants was 28.4 years (sd = 3.74) at the first assessment. participants had on average 14 months of teaching experience when the data were collected at the second assessment. germany has a tracked school system with a high, an intermediate and a low track. these different school types were represented in the sample; however, the sample was slightly skewed as 61.3% of the participants taught the highest school track. in 2013/2014, 47% of all students in germany attended the highest school track (statistisches bundesamt, 2014). we analysed dropout between the two assessments of the longitudinal study with respect to demographic variables (age, sex and school type) and self-reports on motivation and exhaustion. participants of the second assessment more often taught in the higher school track, were more enthusiastic and satisfied with their jobs, and showed less emotional exhaustion. thus, the generalisability of our results may be somewhat compromised. in addition, 4,672 students from grades 7 to 10 participated in the second assessment and were included in our analyses. on average 12 students rated the classroom management of each teacher.
teachers were allowed to have up to five classes participate in the ratings. however, ratings from all the classes of each teacher were combined, as they revealed high correlations across classes and our focus was on the teacher. a different analysis of this data focussing on the importance of pedagogical/psychological knowledge for general instructional quality based on a different teacher sample already has been published (voss et al., 2014). the focus of this investigation is the relevance of the interplay between different teacher resources and how this interplay affects classroom management, which has not yet been the subject of investigation. including emotional exhaustion as a moderator expands existing research and allows testing more differentiated hypotheses on the relevance of teachers’ professional knowledge.
2.2 measures
we applied confirmatory factor analysis and structural equation modeling. the scales and items described represent the multiple indicators for the latent factors. table 1 provides an overview of the descriptive data and the reliabilities of the measures based on the raw dataset. the remainder of the analysis refers to the latent dimensions of the variables. table 2 provides an overview of the fit indices of the measurement models; appendix a displays information on factor loadings of the indicators on the respective factors.

table 1
psychometric properties of study variables

variables                              items  m      sd     icc1  icc2  adm   α     missing in %
teacher ratings
  pedagogical/psychological knowledge  39     73.37  11.45  ―     ―     ―     .79   19.5
  emotional exhaustion                 4      72.02  1.64   ―     ―     ―     .81   8.3
student ratings
  classroom disturbances               6      72.17  1.74   .33   .92   .70   ―     .6
  monitoring                           3      72.85  1.65   .23   .87   .64   ―     .7

note. student ratings based on teacher mean scores. icc = intraclass correlation, adm = average deviation index, averaged across all classes of each teacher.
2.2.1 pedagogical/psychological knowledge
to assess teachers’ pedagogical/psychological knowledge we employed a test that had been used and validated in previous studies (voss et al., 2011; voss et al., 2014). the test consists of four scales measuring knowledge of classroom management, teaching methods, classroom assessment and students’ heterogeneity. test construction and validation analysis indicated that the scales are well represented by a second order factor expressing general pedagogical/psychological knowledge (voss et al., 2011); thus we used the scales as indicators of one latent factor. altogether, the measure consists of 39 items across the four subscales including multiple-choice, short-answer and video-based items (voss et al., 2011). the multiple-choice items assessed declarative knowledge whereas procedural knowledge also was assessed using video-based items on classroom management. pedagogical/psychological knowledge as measured by this test has proven to be differentiable from discriminant constructs such as general reasoning ability, pedagogical content knowledge and teacher beliefs about mathematics learning and teaching (voss et al., 2011).
2.2.2 emotional exhaustion
we used an established german version (enzmann & kleiber, 1989) of the maslach burnout inventory (maslach, jackson, & leiter, 1996) to assess teachers’ state of emotional exhaustion. the instrument consists of four items and participants rated their agreement with statements (e.g., “i often feel exhausted at school”) on a 4-point response scale (1 = strongly disagree, 4 = strongly agree).
table 2
fit indices of individual and combined measurement models without interaction term

model                                  χ2      df  p     cfi   rmsea  srmr (between)  srmr (within)
teacher ratings
  emotional exhaustion                 8.57    2   .01   .97   .03    .03             ―
  pedagogical/psychological knowledge  6.85    2   .03   .94   .03    .04             ―
student ratings
  classroom disturbance                173.42  18  .00   .98   .04    .03             .03
  monitoring                           .00     0   1.00  1.00  .00    .00             .00
measurement models without interaction term
  model 1                              277.93  94  .00   .98   .02    .05             .03
  model 2                              75.24   49  .01   .98   .01    .06             .00

note. cfi = comparative fit index; rmsea = root-mean-square error of approximation; srmr = standardized root-mean-square residual. dashes indicate nonavailable data. the monitoring model is saturated.

2.2.3 classroom management
there are several methods to assess instructional quality: teacher ratings, student ratings or ratings of external observers (lüdtke, robitzsch, trautwein, & kunter, 2009). we measured the quality of classroom management with students’ ratings to avoid shared method variance (podsakoff, mackenzie, lee, & podsakoff, 2003) and because several studies have indicated that students are a reliable and valid source for judging instructional quality (e.g., fauth et al., 2014; lüdtke, trautwein, kunter, & baumert, 2006). research suggests that teacher and student ratings of classroom management are highly congruent (e.g., kunter & baumert, 2006). students responded to all classroom management items using a 4-point likert scale (1 = strongly disagree, 4 = strongly agree). classroom disturbances were assessed using six items (e.g., “in mathematics class it takes a long time at the beginning of the lesson until the students have settled down and started working”), giving examples of wasting time in class and student disruptions. a high score on this scale represented a high rate of classroom disturbances. monitoring was assessed using three items (e.g., “in mathematics our teacher always knows what is going on in the classroom”).
both scales were developed in a previous project (baumert, gruehn, heyn, köller, & schnabel, 1997) and have been validated in several other studies (e.g., kunter et al., 2007). two-level confirmatory factor analysis and model difference tests revealed a significantly better fit for a two-factor model of the two aspects of classroom management than for a global factor. to estimate whether the individual student ratings can be conceptualised as indicators for behaviour on the teacher level, we followed recommendations by marsh et al. (2009). the reliability and agreement of the ratings on the teacher level were calculated using intra-class correlations (icc) and the average deviation index (adm) of the manifest scales (lüdtke et al., 2006). the icc1 indicated the amount of variance among groups; in our case it reflected differences in classroom management ratings among teachers. the icc2 described the reliability of the group-mean rating of the whole scale, taking into account the number of raters. it can be interpreted in a similar manner as cronbach’s alpha. the adm is a means for assessing agreement within the group. it represents the average individual deviation from the group mean and is expressed in the metric of the original scale. there were substantial differences in ratings of classroom disturbances (icc1 = .33) and monitoring (icc1 = .23) among the teachers in our sample. both scales showed good reliability on the class level (classroom disturbances icc2 = .92; monitoring icc2 = .87). the adms were at .70 for classroom disturbances and at .64 for monitoring, indicating good agreement, with average individual ratings differing less than one point of the scale from the group mean.
2.2.4 control variable
school type was included as a control variable on the teacher level (dummy coded: high track versus lower tracks).
2.3 statistical analysis
our data have a hierarchical structure with students being nested in teachers.
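for nested ratings of this kind, the three indices reported in section 2.2.3 (icc1, icc2 and adm) can be sketched as follows. this is a minimal illustration based on a one-way analysis of variance, not the procedure actually run for this study; the function name `agreement_indices` and the toy ratings are hypothetical, and the adm is simply averaged across groups.

```python
from statistics import mean


def agreement_indices(groups):
    """Return (icc1, icc2, adm) for ratings nested in groups.

    groups: list of lists, one inner list of ratings per teacher.
    Illustrative one-way ANOVA estimators; names are hypothetical.
    """
    k = len(groups)                                   # number of teachers
    n_total = sum(len(g) for g in groups)             # number of raters
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # between- and within-group mean squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # group size adjusted for unequal numbers of raters per teacher
    n_adj = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    # icc1: share of variance located between teachers
    icc1 = (ms_between - ms_within) / (ms_between + (n_adj - 1) * ms_within)
    # icc2: reliability of the teacher-level mean rating
    icc2 = (ms_between - ms_within) / ms_between
    # adm: mean absolute deviation from the group mean, averaged over groups
    adm = mean(mean(abs(x - m) for x in g)
               for g, m in zip(groups, group_means))
    return icc1, icc2, adm


# toy data: three teachers, three student ratings each (4-point scale)
toy_ratings = [[1, 1, 2], [3, 4, 4], [2, 2, 3]]
icc1, icc2, adm = agreement_indices(toy_ratings)
print(round(icc1, 3), round(icc2, 3), round(adm, 3))
```

with equal group sizes, icc2 equals the spearman-brown step-up of icc1 for the number of raters, which is why it can be read like a reliability coefficient for the aggregated rating.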
we therefore analysed the data using multilevel modeling, which overcomes the violation of the independence of observations and produces correct standard errors (hox, 2010). teacher resources were assessed on the teacher level. ratings of classroom management were assessed on the student level. we chose the teacher level and not the class level as our unit of analysis, since our focus is on the relevance of teacher resources. we combined multilevel modeling with structural equation modeling, thus correcting measurement errors. all constructs were estimated as latent factors with multiple indicators using mplus (muthén & muthén, 1998-2010). in our analysis, classroom management was modeled as a latent factor simultaneously on the individual level and the teacher level. with this doubly latent approach we followed the recommendations by marsh et al. (2009), correcting measurement error on both levels as well as sampling error on the teacher level. to test our hypotheses that the relation between pedagogical/psychological knowledge and classroom management is moderated by teachers’ exhaustion, we used the latent moderation structural equation approach (lms; klein & moosbrugger, 2000) implemented in mplus. by using latent predictors and calculating the interaction term of latent predictors we overcame the problem of manifest moderation analysis, in which the multiplicative term is affected particularly by measurement error (klein, 2000). the lms approach corrects measurement error in the predictor terms as well as in the multiplicative interaction term, leading to unbiased estimates for interaction effects. 
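the measurement-error problem of manifest moderation analysis mentioned above can be illustrated with a commonly cited approximation for the reliability of a product of two mean-centred, approximately bivariate-normal scores. the sketch below is an assumption-laden illustration rather than part of the reported analysis: the function name is hypothetical, the reliabilities are the α values from table 1, and the observed correlation between the two scores is set to zero for simplicity.

```python
def product_term_reliability(rel_x, rel_z, r_xz):
    """Approximate reliability of the manifest product term x*z.

    Assumes x and z are mean-centred and roughly bivariate normal.
    rel_x, rel_z: reliabilities of the two scales.
    r_xz: observed correlation between the two scales.
    """
    return (rel_x * rel_z + r_xz ** 2) / (1 + r_xz ** 2)


# alpha values from table 1; r_xz = 0 is a simplifying assumption
rel_product = product_term_reliability(0.79, 0.81, 0.0)
print(round(rel_product, 2))
```

under these assumptions the product term is noticeably less reliable than either scale on its own, which is the attenuation that a latent interaction approach such as lms is meant to avoid.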
following the suggestion of klein and moosbrugger (2000), the latent factors were entered as predictors and then a multiplicative term of these two latent factors was formed, resulting in the following equation for the between-level (schermelleh-engel, kerwer, & klein, 2014):

$$\eta_b = \alpha + \gamma_{1b}\,\xi_{1b} + \gamma_{2b}\,\xi_{2b} + \gamma_{3b}\,\xi_{1b}\,\xi_{2b} + \zeta_b \qquad (1)$$

this analytical approach is relatively new, computationally intensive and has rarely been applied in research on learning and instruction. due to the computational complexity we calculated two separate models, one for each aspect of classroom management. the rate of missing values was acceptable for most variables (0.7 % for student ratings; 8.3 % for emotional exhaustion) except for pedagogical/psychological knowledge (19.5 %; see table 1). this test was conducted only in the first assessment. the high percentage of missing data for the knowledge scores emerged as not all teachers participating in the second assessment (our sample of analysis) had completed the knowledge test in the first assessment. we analysed the selectivity of teacher respondents vs. non-respondents regarding demographic variables and teachers’ emotional exhaustion. because there were no significant differences between these groups and thus no indication for systematic missing values (schafer & graham, 2002), we used the full information maximum likelihood (fiml) algorithm (enders & bandalos, 2001) to handle missing values in the following analysis. all significance testing was undertaken at the .05 level. for the calculation of practical effect sizes of multilevel analysis the following formula was employed (reyes, brackett, rivers, white, & salovey, 2012):

$$\delta = \frac{\gamma}{\sqrt{\tau_{00} + \sigma^{2}}}$$

here γ is the association between predictor and outcome variable, and $\tau_{00}$ and $\sigma^{2}$ are the between- and within-group variances of the outcome variable (from the unconditional model). reyes et al. (2012) state that δ can be interpreted similarly to cohen’s (1988) d.
3. results
3.1 preliminary analysis
we computed zero-order correlations on the teacher level between all latent factors involved in the analysis (see table 3).

table 3
latent standardized correlations of the study variables

variables                                1     2     3     4      5
teacher ratings
  1 pedagogical/psychological knowledge  ―     -.18  -.10  .11    .22*
  2 emotional exhaustion                       ―     .02   .04    -.05
student ratings
  3 classroom disturbances                           ―     -.78*  -.11
  4 monitoring                                             ―      -.29*
  5 school type                                                   ―

note. school type is dummy-coded: high track versus low track. * p < .05.

3.2 results of the latent moderation models
after our preliminary analysis, we calculated two separate models of latent interaction. pedagogical/psychological knowledge and the moderator emotional exhaustion were entered as predictors. then the multiplicative term of the two latent factors pedagogical/psychological knowledge and emotional exhaustion was added as the third predictor. the dependent variables were either students’ ratings of classroom disturbances or their ratings of monitoring. we controlled for school type by including it as an additional predictor in the models. although fit indices for the lms approach have not yet been developed (see table 2 for fit indices for the measurement models without interaction term), it is possible to test the models with interaction effect against models without interaction effect using log likelihood differences, which are χ²-distributed (klein & moosbrugger, 2000). the results of the difference tests revealed that the models with interaction term fit the data significantly better than models without interaction term, indicating that a significant interaction effect existed in both models (see table 4).
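the model comparison, the conditional (simple) slope implied by equation (1) and the effect size δ can be sketched in a few lines. this is an illustrative sketch under stated assumptions, not the mplus analysis itself: all function names and the log-likelihood and variance values are invented for the example, the difference test is the plain unscaled version with one degree of freedom for the single added interaction term, and δ is assumed to divide γ by the square root of the total (between plus within) variance.

```python
import math


def loglik_difference_test(loglik_without, loglik_with):
    """Chi-square difference test (df = 1) for one added interaction term.

    Unscaled version; a scaling correction may be needed with robust
    estimators, which is not covered here.
    """
    d = max(0.0, -2.0 * (loglik_without - loglik_with))
    # survival function of a chi-square variable with 1 degree of freedom
    p = math.erfc(math.sqrt(d / 2.0))
    return d, p


def simple_slope(gamma_1, gamma_3, moderator_value):
    """Slope of the predictor at a given moderator value: g1 + g3 * xi_2."""
    return gamma_1 + gamma_3 * moderator_value


def effect_size(gamma, tau_00, sigma_sq):
    """delta = gamma / sqrt(tau_00 + sigma^2); denominator is an assumption."""
    return gamma / math.sqrt(tau_00 + sigma_sq)


# hypothetical log-likelihoods of the models without / with interaction
d, p = loglik_difference_test(-1002.0, -1000.0)
print(round(d, 2), round(p, 4))

# coefficients .06 and -.10 are the model 2 estimates in table 4;
# moderator values of -1 and +1 are in hypothetical latent units
print(round(simple_slope(0.06, -0.10, -1.0), 2))
print(round(simple_slope(0.06, -0.10, 1.0), 2))

# hypothetical variance components
print(round(effect_size(0.11, 0.16, 0.33), 3))
```

the simple-slope expression follows from equation (1) by collecting the terms in the first predictor: the effect of knowledge is γ1 + γ3·ξ2, so a negative γ3 means the slope shrinks, and may change sign, as the moderator increases.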
table 4
latent regression on teachers’ classroom management behavior with pedagogical/psychological knowledge as predictor and emotional exhaustion as moderator

                                     model 1: classroom disturbances  model 2: monitoring
variable                             b (se)       δ                   b (se)       δ
intercept                            2.24*(.04)                       2.63*(.03)
school type                          -.08*(.08)   -.11                -.22*(.06)   -.63
pedagogical/psychological knowledge  -.03*(.06)   -.04                .06*(.04)    .17
emotional exhaustion                 .06*(.05)    .09                 -.04*(.04)   -.12
ppk x ee                             .11*(.01)    .16                 -.10*(.03)   -.29
r²                                   .07                              .14

note. model fit indices for lms not yet provided by mplus. ppk = pedagogical/psychological knowledge; ee = emotional exhaustion; b = unstandardised regression coefficient; se = standard error; δ = effect size. * p < .05.

the effect sizes for the interaction effects can be considered small. following recommendations by aiken and west (1991) we plotted the interactions using different levels of the moderator. the three slopes represent different levels of the moderator emotional exhaustion (one standard deviation below the mean, the mean, and one standard deviation above the mean; see figures 1 and 2). additionally, we tested whether the simple slopes differed significantly from zero, meaning that the slope for the chosen value of the moderator was significant. since the interactions were disordinal, there can be no valid interpretation of the main effects (aiken & west, 1991). for the prediction of monitoring there also was a significant interaction between knowledge and emotional exhaustion. testing the simple slopes revealed that only the slope for a large amount of knowledge and a low level of emotional exhaustion was significant (see figure 2), indicating that only teachers with a high level of knowledge experiencing a low level of exhaustion showed better monitoring.

figure 1. interaction effect of pedagogical/psychological knowledge (ppk) and emotional exhaustion (ee) on classroom disturbances. [line plot: x-axis ppk (-1 sd to +1 sd); y-axis classroom disturbances (2.00 to 2.40); separate lines for ee at -1 sd, mean and +1 sd. * p < .05.]

figure 2. interaction effect of pedagogical/psychological knowledge (ppk) and emotional exhaustion (ee) on monitoring. [line plot: x-axis ppk (-1 sd to +1 sd); y-axis monitoring (2.40 to 2.80); separate lines for ee at -1 sd, mean and +1 sd. * p < .05.]

4. discussion
the aim of this study was to analyse the joint relevance of teachers’ cognitive and emotional resources for classroom management. by analysing the distinct interplay of these resources we extended research in the area of teacher competence. our results indicate that neither cognitive nor emotional resources alone are linked to students’ ratings of classroom management, as there were no significant bivariate correlations or main effects. still, significant interaction effects illustrate that teachers’ cognitive and emotional resources interact. the results of both interaction models reflect the hypothesised mechanism of interplay between the resources: only the combination of knowledge and a low level of emotional exhaustion is associated with ratings of effective classroom management (a low level of classroom disturbances or a high level of monitoring). these results confirm hypotheses 2a and 2b. however, knowledge does not predict classroom management when the level of emotional exhaustion is high or average (hypotheses 1a and 1b). our results indicate that pedagogical/psychological knowledge alone may not be sufficient for effective classroom management but rather that cognitive and emotional resources are synergistic: only the combination of resources results in better classroom management. these results could be interpreted as potential support for our theoretical argumentation following information processing theory. a high level of emotional exhaustion may influence teachers’ information processing capacity and consequently teachers will not be able to process their knowledge extensively.
in a similar vein the results can be interpreted through the lens of ego depletion theory. teachers experiencing high emotional exhaustion need to intensively regulate their emotions during instruction (näring, briët, & brouwers, 2006). this emotional labour may deplete volitional resources for consecutive higher-order cognitive activities, like applying professional knowledge in challenging classroom situations. no matter which theoretical approach is followed, processing of knowledge fails if teachers are highly exhausted, and classroom management is less effective. our study integrated several innovative aspects. first, we combined two theoretical approaches to teacher competence, which have not yet been brought together empirically. through analysing the interaction of cognitive and emotional resources we aimed to detect relevant psychological processes influencing teacher behaviour. second, with our test of teachers’ pedagogical/psychological knowledge we introduced an objective and direct measure of teachers’ cognitive resources, and thus went beyond subjective or distal measures (e.g., course work) to assess teacher knowledge. third, we applied an advanced methodological approach by using latent moderation analysis with multilevel data, which rarely has been applied in educational research but overcomes problems of measurement error of the multiplication term (klein, 2000).
4.1 limitations and areas for future research
some limitations of this study need to be considered. first, the causal direction of our argumentation and interpretation of our results needs further proof.
due to our longitudinal design and the temporal ordering of our variables we concluded that the interplay between knowledge and emotional exhaustion has an effect on later classroom management and that prior teacher resources cannot be affected by later classroom management problems with the classes that provided the ratings; however, we were not able to control prior levels of classroom management. there are several studies indicating that teacher stress and emotional exhaustion may be a consequence of problems with classroom management, and thus reciprocal effects seem likely (e.g., chaplain, 2008; dicke, parker, marsh, et al., 2014). further, problems with classroom management may also impact students’ functioning and behavior (helmke & renkl, 1993; luckner & pianta, 2011), which then may influence teachers’ in-class experiences and thus affect teachers’ resources in return. the relation of teacher resources, classroom management and student functioning is much more complex and our study was only able to focus on some of these relations. more research and different designs are needed to disentangle the different relations, especially between classroom management and teachers’ emotional resources. studies using cross-lagged designs could help researchers approach this question. as our results remained stable when using ratings of emotional exhaustion of the second assessment, they can also support our argumentation. however, the fact that the time interval between the first and the second assessment was 14 months needs to be considered an additional limitation as pedagogical/psychological knowledge is likely to still change after the induction phase. further, another study based on a different teacher sample showed a direct association between pedagogical/psychological knowledge and classroom management (voss et al., 2014), which contrasts with our findings.
that study assessed teacher knowledge at the beginning of the induction phase, using a slightly different subsample. we conducted several analyses in order to interpret these differences. as participants did not differ substantially and the knowledge test was invariant across measurements, we conclude that the differences in results are conceptual. apparently, over the course of the induction phase and the beginning of regular teaching emotional functioning becomes more important, explaining our findings of interaction effects and our lack of main effects. further, in our study, we combined two approaches to assessing teacher resources focusing on their professional knowledge and emotional exhaustion. however, there are other relevant aspects of teacher competence such as motivational orientations, and other domains of their cognitive resources such as beliefs (baumert & kunter, 2013). some researchers already have approached the question as to how different competence aspects influence each other (dicke et al., 2014; klusmann, kunter, voss, & baumert, 2012). however, we argue that instead of analysing these associations with regard to teacher variables as outcomes it would be interesting to study the additional impact of these interplays on instructional or student outcomes. in general, alternative explanations for the results might be applied. for instance, it would be possible that teachers high in pedagogical/psychological knowledge are very self-efficacious regarding their classroom management. these favourable motivational orientations could also help to apply knowledge during instruction, resulting in effective classroom management (morris-rothschild & brassard, 2006). regarding the generalisability of our results we need to point out some specific characteristics of our sample. first, our sample was not representative. second, our sample consisted of secondary school mathematics teachers and their students.
as our research question was not subject-specific, we would expect similar results in samples of teachers of other subjects. third, our sample included teachers with relatively little teaching experience. since we based our arguments on information processing theory, our results need to be interpreted with caution: research has shown differences in information processing between expert and novice teachers (swanson, o’connor, & cooney, 1990; wolff et al., 2015). more experienced teachers possess more automatised routines and schemas, which claim less information processing capacity (e.g., feldon, 2007). further, experts and novices differ in their perceptions of classroom events, in that novices have problems noticing simultaneous class events (van den bogert, van bruggen, kostons, & jochems, 2014). this could be interpreted to mean that classroom management claims more information processing capacity from novices than from experts (sabers, cushing, & berliner, 1991). given such findings, it is possible that the joint relevance of teacher resources analysed in this study applies especially to teachers with little experience and thus may be overestimated in our sample. also, it could be that experienced teachers’ emotions differ from those of less experienced teachers when reacting to classroom management situations (sutton & wheatley, 2003). future studies should explore this relation with samples of more experienced teachers.

4.2 theoretical and educational implications

teaching is challenging, and our results show that the resources of successful teachers interact in complex ways. we argue that this expanded view of teacher resources is highly relevant for teacher education and pedagogical practice.
teacher education aims to prepare students for their professional career, yet the understanding of teacher resources focuses foremost on teachers’ professional and practical knowledge (korthagen & kessels, 1999). teacher selection programmes also often focus on knowledge, yet our results indicate that knowledge alone might not be sufficient. we argue for a combined approach in teacher education that focuses on the development of professional knowledge as well as on teachers’ emotional resources. several authors highlight the importance of acknowledging teaching as an emotional practice (chang, 2009; sutton & wheatley, 2003). teachers’ emotional resources are also relevant for students. students profit from warm and highly supportive student-teacher interactions with regard to their academic development, self-regulation and executive control (e.g., roorda, koomen, spilt, & oort, 2011; williford, whittaker, vitiello, & downer, 2013). as emotions often emerge in classroom situations, regulation of emotions is especially relevant for effective classroom management (sutton & wheatley, 2003). nevertheless, little is known about teachers’ emotional processes in such situations (chang, 2009). chang (2013) showed that the appraisal of classroom incidents involving problematic student behaviour is related to unpleasant emotions, which are associated with burnout. this association between negative emotions and burnout was in turn mediated by different coping strategies. chang (2009) argues that teachers should learn to regulate their emotions by using reappraisal techniques and coping mechanisms. another way to help teachers deal with their emotions has emerged: mindfulness training programmes equip teachers with techniques to integrate mindfulness skills in the classroom (flook, goldberg, pinger, bonus, & davidson, 2013) and thereby cope with stress more effectively (roeser et al., 2013).
after completing a mindfulness training programme, participants showed fewer burnout symptoms, performed better on attentional tasks and even organised their classrooms better than those in a control group (roeser et al., 2013). it would seem beneficial to incorporate such training on emotion regulation in teacher education and professional development programmes. helping teachers understand their emotions and enhance their competence in regulating them would certainly not replace teachers’ professional knowledge; indeed, knowledge of classroom management strategies may itself help teachers prevent later exhaustion (dicke et al., 2015; klusmann et al., 2012). we argue that teacher education and continuing professional development programmes would profit from broadening their scope and acknowledging the relevance of cognitive and emotional aspects of teacher competence and their potential interplay. helping teachers address their emotions during teacher education and continuously supporting them in doing this through professional development would have two benefits: synergies between teachers’ cognitive and emotional resources may be promoted, enabling teachers to make the most of their knowledge in the classroom; and, in the long run, work-related stress and burnout may be lessened or even avoided.

keypoints
teachers’ cognitive and emotional resources interact.
teachers’ knowledge is not related per se to ratings of classroom management.
teachers’ knowledge predicts classroom management only when emotional exhaustion is low.

acknowledgments
this study used data from the coactiv-r research project, which was funded by the max planck society’s strategic innovation fund (2008–2010). we would like to thank patricia alexander for her helpful comments on a previous version of this paper.

references
aiken, l. s., & west, s. g. (1991). multiple regression: testing and interpreting interactions. newbury park, ca: sage publications.
baumeister, r.
f., gailliot, m., dewall, c. n., & oaten, m. (2006). self-regulation and personality: how interventions increase regulatory success, and how depletion moderates the effects of traits on behavior. journal of personality, 74(6), 1773–1802. doi: 10.1111/j.1467-6494.2006.00428.x
baumert, j., gruehn, s., heyn, s., köller, o., & schnabel, k. u. (1997). bildungsverläufe und psychosoziale entwicklung im jugendalter (biju). dokumentation, band 1. skalen längsschnitt i, welle 1-4 [learning processes, educational careers and psychological development in adolescence and young adulthood (biju). documentation, vol. 1, scales, waves 1-4]. berlin, germany: max planck institute for human development.
baumert, j., & kunter, m. (2013). the coactiv model of teachers’ professional competence. in m. kunter, j. baumert, w. blum, u. klusmann, s. krauss, & m. neubrand (eds.), cognitive activation in the mathematics classroom and professional competence of teachers (vol. 8, pp. 25–48). new york, ny: springer.
baumert, j., kunter, m., blum, w., brunner, m., voss, t., jordan, a., . . . tsai, y.-m. (2010). teachers’ mathematical knowledge, cognitive activation in the classroom, and student progress. american educational research journal, 47(1), 133–180. doi: 10.3102/0002831209345157
becker, e., goetz, t., morger, v., & ranellucci, j. (2014). the importance of teachers' emotions and instructional behavior for their students' emotions: an experience sampling analysis. teaching and teacher education, 43, 15–26. doi: 10.1016/j.tate.2014.05.002
brouwers, a., & tomic, w. (1999). teacher burnout, perceived self-efficacy in classroom management, and student disruptive behaviour in secondary education. curriculum and teaching, 14(2), 7–26. doi: 10.7459/ct/14.2.02
chang, m.-l. (2009). an appraisal perspective of teacher burnout: examining the emotional work of teachers. educational psychology review, 21(3), 193–218. doi: 10.1007/s10648-009-9106-y
chang, m.-l. (2013).
toward a theoretical model to understand teacher emotions and teacher burnout in the context of student misbehavior: appraisal, regulation and coping. motivation and emotion, 37(4), 799–817. doi: 10.1007/s11031-012-9335-0
chaplain, r. p. (2008). stress and psychological distress among trainee secondary teachers in england. educational psychology, 28(2), 195–209. doi: 10.1080/01443410701491858
cohen, j. (1988). statistical power analysis for the behavioral sciences (2nd ed.). hillsdale, nj: lawrence erlbaum associates.
de jong, r., mainhard, t., van tartwijk, j., veldman, i., verloop, n., & wubbels, t. (2014). how preservice teachers' personality traits, self-efficacy, and discipline strategies contribute to the teacher–student relationship. british journal of educational psychology, 84, 294–310. doi: 10.1111/bjep.12025
depaepe, f., verschaffel, l., & kelchtermans, g. (2013). pedagogical content knowledge: a systematic review of the way in which the concept has pervaded mathematics educational research. teaching and teacher education, 34, 12–25. doi: 10.1016/j.tate.2013.03.001
dicke, t., parker, p. d., holzberger, d., kunina-habenicht, o., kunter, m., & leutner, d. (2015). beginning teachers' efficacy and emotional exhaustion: latent changes, reciprocity, and the influence of professional knowledge. contemporary educational psychology, 41, 62–72.
dicke, t., parker, p. d., marsh, h. w., kunter, m., schmeck, a., & leutner, d. (2014). self-efficacy in classroom management, classroom disturbances, and emotional exhaustion: a moderated mediation analysis of teacher candidates. journal of educational psychology, 106(2), 569–583. doi: 10.1037/a0035504
doyle, w. (2006). ecological approaches to classroom management. in c. m. evertson & c. s. weinstein (eds.), handbook of classroom management (pp. 97–125). mahwah, nj: lawrence erlbaum.
emmer, e. t., & evertson, c. (2013). classroom management for middle and high school teachers. boston, ma: pearson.
emmer, e. t., & stough, l. m. (2001). classroom management: a critical part of educational psychology, with implications for teacher education. educational psychologist, 36(2), 103–112. doi: 10.1207/s15326985ep3602_5
enders, c. k., & bandalos, d. l. (2001). the relative performance of full information maximum likelihood estimation for missing data in structural equation models. structural equation modeling: a multidisciplinary journal, 8(3), 430–457. doi: 10.1207/s15328007sem0803_5
enzmann, d., & kleiber, d. (1989). helfer-leiden: stress und burnout in psychosozialen berufen [helpers' suffering: stress and burnout in psycho-social professions]. heidelberg, germany: roland asanger verlag.
evertson, c. m., & weinstein, c. s. (2006). classroom management as a field of inquiry. in c. m. evertson & c. s. weinstein (eds.), handbook of classroom management (pp. 3–15). mahwah, nj: lawrence erlbaum.
fauth, b., decristan, j., rieser, s., klieme, e., & büttner, g. (2014). student ratings of teaching quality in primary school: dimensions and prediction of student outcomes. learning and instruction, 29, 1–9. doi: 10.1016/j.learninstruc.2013.07.001
feldon, d. f. (2007). cognitive load and classroom teaching: the double-edged sword of automaticity. educational psychologist, 42(3), 123–137. doi: 10.1080/00461520701416173
feuerhahn, n., stamov-roßnagel, c., wolfram, m., bellingrath, s., & kudielka, b. m. (2013). emotional exhaustion and cognitive performance in apparently healthy teachers: a longitudinal multi-source study. stress and health, 29(4), 297–306. doi: 10.1002/smi.2467
flook, l., goldberg, s. b., pinger, l., bonus, k., & davidson, r. j. (2013). mindfulness for teachers: a pilot study to assess effects on stress, burnout, and teaching efficacy. mind, brain, and education, 7(3), 182–195. doi: 10.1111/mbe.12026
frenzel, a., goetz, t., stephens, e., & jacob, b. (2009). antecedents and effects of teachers' emotional experiences: an integrated perspective and empirical test. in p. a.
schutz & m. zembylas (eds.), advances in teacher emotions research: the impact on teachers' lives (pp. 129–146). new york, ny: springer.
hakanen, j. j., bakker, a. b., & schaufeli, w. b. (2006). burnout and work engagement among teachers. journal of school psychology, 43, 495–513. doi: 10.1016/j.jsp.2005.11.001
helsing, d. (2007). regarding uncertainty in teachers and teaching. teaching and teacher education, 23(8), 1317–1333. doi: 10.1016/j.tate.2006.06.007
helmke, a., & renkl, a. (1993). unaufmerksamkeit in grundschulklassen: problem der klasse oder des lehrers? [inattention in primary classrooms: problem of the class or of the teacher?]. zeitschrift für entwicklungspsychologie und pädagogische psychologie, 25(3), 185–205.
hill, h. c., rowan, b., & ball, d. l. (2005). effects of teachers’ mathematical knowledge for teaching on student achievement. american educational research journal, 42(2), 371–406. doi: 10.3102/00028312042002371
hox, j. j. (2010). multilevel analysis. techniques and application (2nd ed.). east sussex: routledge.
johns, m., inzlicht, m., & schmader, t. (2008). stereotype threat and executive resource depletion: examining the influence of emotion regulation. journal of experimental psychology: general, 137(4), 691–705. doi: 10.1037/a0013834
jennings, p. a., & greenberg, m. t. (2009). the prosocial classroom: teacher social and emotional competence in relation to student and classroom outcomes. review of educational research, 79(1), 491–525. doi: 10.3102/0034654308325693
keller, m., chang, m.-l., becker, e., goetz, t., & frenzel, g. (2014). teachers’ emotional experiences and exhaustion as predictors of emotional labor in the classroom: an experience sampling study. frontiers in psychology, 5(1442). doi: 10.3389/fpsyg.2014.01442
klein, a. (2000).
moderatormodelle: verfahren zur analyse von moderatoreffekten in strukturgleichungsmodellen [moderator models: methods for the analysis of moderator effects in structural equation models]. hamburg, germany: kovač.
klein, a., & moosbrugger, h. (2000). maximum likelihood estimation of latent interaction effects with the lms method. psychometrika, 65(4), 457–474. doi: 10.1007/bf02296338
kleinsorge, t., diestel, s., scheil, j., & niven, k. (2014). burnout and the fine-tuning of cognitive resources. applied cognitive psychology, 28(2), 274–278. doi: 10.1002/acp.2999
klusmann, u., kunter, m., trautwein, u., lüdtke, o., & baumert, j. (2008). teachers' occupational well-being and quality of instruction: the important role of self-regulatory patterns. journal of educational psychology, 100(3), 702–715. doi: 10.1037/0022-0663.100.3.702
klusmann, u., kunter, m., voss, t., & baumert, j. (2012). berufliche beanspruchung angehender lehrkräfte: die effekte von persönlichkeit, pädagogischer vorerfahrung und professioneller kompetenz [occupational stress of beginning teachers: the effects of personality, pedagogical experience and professional competence]. zeitschrift für pädagogische psychologie, 26, 275–290. doi: 10.1024/1010-0652/a000078
korthagen, f., & kessels, j. p. (1999). linking theory and practice: changing the pedagogy of teacher education. educational researcher, 28(4), 4–17. doi: 10.3102/0013189x028004004
kounin, j. s. (1970). discipline and group management in classrooms. new york, ny: holt, rinehart & winston.
krauss, s., brunner, m., kunter, m., baumert, j., blum, w., neubrand, m., & jordan, a. (2008). pedagogical content knowledge and content knowledge of secondary mathematics teachers. journal of educational psychology, 100(3), 716–725. doi: 10.1037/0022-0663.100.3.716
kunter, m., & baumert, j. (2006). who is the expert? construct and criteria validity of student and teacher ratings of instruction. learning environments research, 9(3), 231–251.
doi: 10.1007/s10984-006-9015-7
kunter, m., baumert, j., & köller, o. (2007). effective classroom management and the development of subject-related interest. learning and instruction, 17, 494–509. doi: 10.1016/j.learninstruc.2007.09.002
liston, d., whitcomb, j., & borko, h. (2006). too little or too much: teacher preparation and the first years of teaching. journal of teacher education, 57(4), 351–358. doi: 10.1177/0022487106291976
luckner, a. e., & pianta, r. c. (2011). teacher–student interactions in fifth grade classrooms: relations with children's peer behavior. journal of applied developmental psychology, 32, 257–266. doi: 10.1016/j.appdev.2011.02.010
lüdtke, o., robitzsch, a., trautwein, u., & kunter, m. (2009). assessing the impact of learning environments: how to use student ratings of classroom or school characteristics in multilevel modeling. contemporary educational psychology, 34(2), 120–131. doi: 10.1016/j.cedpsych.2008.12.001
lüdtke, o., trautwein, u., kunter, m., & baumert, j. (2006). reliability and agreement of student ratings of the classroom environment: a reanalysis of timss data. learning environments research, 9(3), 215–230. doi: 10.1007/s10984-006-9014-8
marsh, h. w., lüdtke, o., robitzsch, a., trautwein, u., asparouhov, t., muthén, b., & nagengast, b. (2009). doubly-latent models of school contextual effects: integrating multilevel and structural equation approaches to control measurement and sampling error. multivariate behavioral research, 44(6), 764–802. doi: 10.1080/00273170903333665
marzano, r. j., marzano, j. s., & pickering, d. j. (2003). classroom management that works: research-based strategies for every teacher. alexandria, va: association for supervision and curriculum development.
maslach, c., jackson, s. e., & leiter, m. p. (1996). maslach burnout inventory manual (3rd ed.). palo alto, ca: consulting psychologists press.
maslach, c., & leiter, m. p. (1999). teacher burnout: a research agenda. in r.
vandenberghe & a. m. huberman (eds.), understanding and preventing teacher burnout: a sourcebook of international research and practice (pp. 295–303). cambridge, uk: cambridge university press.
maslach, c., schaufeli, w. b., & leiter, m. p. (2001). job burnout. annual review of psychology, 52, 397–422. doi: 10.1146/annurev.psych.52.1.397
mayer, r. e. (2012). information processing. in k. r. harris, s. graham, t. urdan, c. b. mccormick, g. m. sinatra, & j. sweller (eds.), apa educational psychology handbook, volume 1: theories, constructs, and critical issues (pp. 85–99). washington, dc: american psychological association.
morris-rothschild, b. k., & brassard, m. r. (2006). teachers’ conflict management styles: the role of attachment styles and classroom management efficacy. journal of school psychology, 44, 105–121. doi: 10.1016/j.jsp.2006.01.004
muthén, l. k., & muthén, b. o. (1998-2010). mplus user's guide. sixth edition. los angeles, ca: muthén & muthén.
näring, g., briët, m., & brouwers, a. (2006). beyond demand–control: emotional labour and symptoms of burnout in teachers. work & stress, 20(4), 303–315. doi: 10.1080/02678370601065182
nie, y., & lau, s. (2009). complementary roles of care and behavioral control in classroom management: the self-determination theory perspective. contemporary educational psychology, 34, 185–194. doi: 10.1016/j.cedpsych.2009.03.001
ophardt, d., & thiel, f. (2008). klassenmanagement als basisdimension der unterrichtsqualität [classroom management as a basic dimension for instructional quality]. in m. k. w. schweer (ed.), lehrer-schüler-interaktion [teacher-student-interaction] (2nd ed., pp. 259–282). wiesbaden, germany: vs verlag für sozialwissenschaften.
park, s., & oliver, j. s. (2008). revisiting the conceptualisation of pedagogical content knowledge (pck): pck as a conceptual tool to understand teachers as professionals. research in science education, 38, 261–284. doi: 10.1007/s11165-007-9049-6
pianta, r. c. (2006).
classroom management and relationships between children and teachers: implications for research and practice. in c. m. evertson & c. s. weinstein (eds.), handbook of classroom management (pp. 685–709). mahwah, nj: lawrence erlbaum.
pianta, r. c., & hamre, b. k. (2009). conceptualization, measurement, and improvement of classroom processes: standardized observation can leverage capacity. educational researcher, 38(2), 109–119. doi: 10.3102/0013189x09332374
podsakoff, p. m., mackenzie, s. b., lee, j. y., & podsakoff, n. p. (2003). common method biases in behavioral research: a critical review of the literature and recommended remedies. journal of applied psychology, 88(5), 879–903. doi: 10.1037/0021-9010.88.5.879
reyes, m. r., brackett, m. a., rivers, s. e., white, m., & salovey, p. (2012). classroom emotional climate, student engagement, and academic achievement. journal of educational psychology, 104(3), 700–712. doi: 10.1037/a0027268
roehrig, a. d., turner, j. e., arrastia, m. c., christesen, e., mcelhaney, s., & jakiel, l. m. (2012). effective teachers and teaching: characteristics and practices related to positive student outcomes. in k. r. harris, s. graham, t. urdan, s. graham, j. m. royer, & m. zeidner (eds.), apa educational psychology handbook, volume 2: individual differences and cultural and contextual factors (pp. 501–527). washington, dc: american psychological association. doi: 10.1037/13274-020
roeser, r. w., schonert-reichl, k. a., jha, a., cullen, m., wallace, l., wilensky, r., . . . harrison, j. (2013). mindfulness training and reductions in teacher stress and burnout: results from two randomized, waitlist-control field trials. journal of educational psychology, 105(3), 787–804. doi: 10.1037/a0032093
roorda, d. l., koomen, h. m., spilt, j. l., & oort, f. j. (2011). the influence of affective teacher–student relationships on students’ school engagement and achievement: a meta-analytic approach.
review of educational research, 81(4), 493–529. doi: 10.3102/0034654311421793
sabers, d. s., cushing, k. s., & berliner, d. c. (1991). differences among teachers in a task characterized by simultaneity, multidimensionality, and immediacy. american educational research journal, 28(1), 63–88. doi: 10.2307/1162879
sandström, a., rhodin, i. n., lundberg, m., olsson, t., & nyberg, l. (2005). impaired cognitive performance in patients with chronic burnout syndrome. biological psychology, 69(3), 271–279. doi: 10.1016/j.biopsycho.2004.08.003
schafer, j. l., & graham, j. w. (2002). missing data: our view of the state of the art. psychological methods, 7(2), 147–177. doi: 10.1037//1082-989x.7.2.147
schermelleh-engel, k., kerwer, m., & klein, a. (2014). evaluation of model fit in nonlinear multilevel structural equation modeling. frontiers in psychology, 5, 1–11. doi: 10.3389/fpsyg.2014.00181
schmeichel, b. j., vohs, k., & baumeister, r. f. (2003). intellectual performance and ego depletion: role of the self in logical reasoning and other information processing. journal of personality and social psychology, 85(1), 33–46. doi: 10.1037/0022-3514.85.1.33
schwarzer, r., schmitz, g., & tang, c. (2000). teacher burnout in hong kong and germany: a cross-cultural validation of the maslach burnout inventory. anxiety, stress and coping, 13, 309–326. doi: 10.1080/10615800008549268
shulman, l. s. (1986). those who understand: knowledge growth in teaching. educational researcher, 15(2), 4–14. doi: 10.2307/1175860
shulman, l. s. (1987). knowledge and teaching: foundations of the new reform. harvard educational review, 57(1), 1–22.
skaalvik, e. m., & skaalvik, s. (2011). teacher job satisfaction and motivation to leave the teaching profession: relations with school context, feeling of belonging, and emotional exhaustion. teaching and teacher education, 27, 1029–1038. doi: 10.1016/j.tate.2011.04.001
statistisches bundesamt (2014). bildung und kultur. allgemeinbildende schulen.
[education and culture. secondary schools]. wiesbaden: statistisches bundesamt.
sutton, r. e. (2005). teachers' emotions and classroom effectiveness: implications from recent research. the clearing house, 78(5), 229–234. doi: 10.2307/30189914
sutton, r. e., & wheatley, k. f. (2003). teachers' emotions and teaching: a review of the literature and directions for future research. educational psychology review, 15(4), 327–358. doi: 10.1023/a:1026131715856
swanson, h. l., o’connor, j. e., & cooney, j. b. (1990). an information processing analysis of expert and novice teachers’ problem solving. american educational research journal, 27(3), 533–556. doi: 10.3102/00028312027003533
sweller, j., van merrienboer, j. j. g., & paas, f. g. w. c. (1998). cognitive architecture and instructional design. educational psychology review, 10(3), 251–296. doi: 10.1023/a:1022193728205
unterbrink, t., hack, a., pfeifer, r., buhl-grießhaber, v., müller, u., wesche, u., . . . bauer, j. (2007). burnout and effort–reward-imbalance in a sample of 949 german teachers. international archives of occupational and environmental health, 80, 433–441. doi: 10.1007/s00420-007-0169-0
van den bogert, n., van bruggen, j., kostons, d., & jochems, w. (2014). first steps into understanding teachers' visual perception of classroom events. teaching and teacher education, 37, 208–216. doi: 10.1016/j.tate.2013.09.001
van der linden, d., keijsers, g. p. j., eling, p., & schaijk, r. v. (2005). work stress and attentional difficulties: an initial study on burnout and cognitive failures. work & stress, 19(1), 23–36. doi: 10.1080/02678370500065275
voss, t., kunina-habenicht, o., & kunter, m. (2015). stichwort pädagogisches wissen von lehrkräften: empirische zugänge und befunde [teachers’ pedagogical knowledge: empirical approaches and findings]. zeitschrift für erziehungswissenschaft. doi: 10.1007/s11618-015-0626-6
voss, t., kunter, m., & baumert, j. (2011).
assessing teacher candidates’ general pedagogical/psychological knowledge: test construction and validation. journal of educational psychology, 103(4), 952–969. doi: 10.1037/a0025125
voss, t., kunter, m., seiz, j., hoehne, v., & baumert, j. (2014). die bedeutung des pädagogisch-psychologischen wissens von angehenden lehrkräften für die unterrichtsqualität [the impact of teachers’ general pedagogical and psychological knowledge on instructional quality]. zeitschrift für pädagogik, 60(2), 184–201.
wang, m. c., haertel, g. d., & walberg, h. j. (1993). toward a knowledge base for school learning. review of educational research, 63(3), 249–294. doi: 10.3102/00346543063003249
williford, a. p., whittaker, j. e. v., vitiello, v. e., & downer, j. t. (2013). children’s engagement within the preschool classroom and their development of self-regulation. early education and development, 24, 162–187. doi: 10.1080/10409289.2011.628270
wolff, c. e., van den bogert, n., jarodzka, h., & boshuizen, h. p. a. (2015). keeping an eye on learning: differences between expert and novice teachers’ representations of classroom management events. journal of teacher education, 66(1), 68–85. doi: 10.1177/0022487114549810

appendix a
standardized factor loadings of latent factors

factors and indicators | factor loadings (within level) | factor loadings (between level)
emotional exhaustion
i often feel exhausted at school. | ― | .79
as a whole, i feel overworked. | ― | .65
i often notice how listless i am at school. | ― | .70
i sometimes feel really depressed at the end of a school day. | ― | .75
pedagogical/psychological knowledge
teaching methods | ― | .72
classroom management | ― | .35
classroom assessment | ― | .56
students’ heterogeneity | ― | .56
classroom disturbance
in mathematics teaching is very often interrupted. | .75 | .99
in mathematics students talk among themselves the whole time. | .76 | .99
in mathematics students mess around the whole time. | .70 | .98
in mathematics it takes a very long time at the start of the lesson until the students have settled down and started working. | .60 | .95
in mathematics a lot of lesson time is wasted. | .63 | .95
in mathematics the lesson often starts late. | .42 | .82
monitoring
in mathematics our teacher always knows what is going on in the classroom. | .44 | .87
in mathematics our teacher always checks our homework thoroughly. | .60 | .68
in mathematics our teacher makes sure that we pay attention. | .45 | .95
note. all loadings were significant at p < .05.

hirt et al
frontline learning research vol. 8 no. 4 (2020) 74–111
issn 2295-3159

types of social help-seeking strategies in different and across specific task stages of a real, challenging long-term task and their role in academic achievement

carmen nadja hirt a, yves karlen a, francesca suter b & katharina maag merki b
a university of applied sciences and arts northwestern switzerland, school of education, switzerland
b university of zurich, institute of education, switzerland

article received 24 february 2020 / revised 9 june / accepted 17 july / available online 5 august

abstract
social help seeking (shs) is an important strategy for successful self-regulated learning at all school levels. the aim of this longitudinal study is threefold: to ascertain the existence of different types of shs strategies in various task stages of creating an individual academic paper, to examine the extent to which these types of shs strategies change in the course of that challenging long-term task and to analyse the extent to which these types are relevant to academic achievement. this examination extends previous studies by adopting a task-specific, person-centred development perspective on shs outside regular classroom instruction.
in particular, we explore shs types in the context of a real, long-term task, use aspects neglected in previous studies (need for help, help sources based on specific issue areas) for type creation and test for differences in academic achievement. three online questionnaires were completed by 603 upper secondary school-level students (62.9% female) with a mean age of 17.3 (sd = .71) within one school year. latent class analyses, latent transition analyses (lta) and non-parametric procedures (kruskal-wallis h test, post hoc dunn-bonferroni test) were performed. different shs types were identified (independents, factual supervisor-focused, factual supervisor-focused and motivational family-focused, motivational family-focused, and factual and motivational family-focused) and found to vary over different task stages. moreover, lta indicated a considerable change between the shs types over time. nevertheless, no significant differences in achievement emerged between the types per task stage, thus reflecting the adage, 'there is more than one way of doing it'.

keywords: social help seeking, longitudinal study, latent class analyses, latent transition analyses, academic achievement

corresponding author email: carmen.hirt@fhnw.ch
doi: https://doi.org/10.14786/flr.v8i4.627

1. introduction

learners usually encounter difficulties while working on a challenging task. one strategy for overcoming a difficulty that hinders further operations involves seeking help (nelson-le gall, 1985; newman, 2000). help seeking (hs) is an important self-regulatory strategy (e.g. newman, 2000; pintrich & zusho, 2002) at all school levels (järvelä, 2011), and it represents an important external resource management strategy for achieving goals (schenke, lam, conley & karabenick, 2015). some authors include seeking help from both personal and non-personal sources in their definition of hs (e.g.
aleven, stahl, schworm, fischer & wallace, 2003); other authors such as zimmerman and moylan (2009) consider hs as "[...] a social form of information seeking" (p. 303). the latter definition emphasises the social aspect of seeking help, which distinguishes this strategy from information seeking (e.g. books, internet and computer-supported, interactive learning environments). the hs strategy is used when a task cannot be completed on one’s own, which is also expressed by nelson-le gall (1985) in her explanation of hs as an "adaptive alternative to individual problem solving" (p. 66). as the focus in the present study is on seeking help from other persons, information seeking from non-personal sources is excluded. the term social help seeking (shs) is therefore used to underline this conceptual distinction. shs differs from other self-regulation strategies (schworm & fischer, 2006) because it requires interaction with other people such as teachers, peers and parents (ellis, 1997; karabenick & newman, 2010; newman, 2000). technologically mediated hs can also be social if the presence of the other individual is real (e.g. phone call, e-mail) (karabenick & newman, 2010). one consequence of the social-interactive character of this form of help seeking is that it makes the shs process vulnerable to a variety of influences (karabenick & berger, 2013). most of the available studies on shs aimed to determine whether help is sought, for what reasons and from whom, and concluded that not every form of shs is equally beneficial to learning (wolters, pintrich & karabenick, 2003). making a distinction between the more and less productive forms of shs is therefore important. thus far, these forms of shs have been studied primarily in the context of regular instruction, with a strong focus on the goals, reasons or orientations of the learners.
this effort has resulted in the forms or types of instrumental shs, executive shs and shs avoidance, with different effects on achievement, both in variable- and person-centred approaches. however, shs does not only occur in the setting of regular instruction. school tasks for completion outside of regular instruction can also lead to difficulties and thus to shs processes, where peers and teachers do not necessarily serve as sources, but other people such as parents are possibly more likely to be available. although the contextual resources can determine the rate and effectiveness of shs (karabenick & gonida, 2018), they have hardly been examined for educational tasks in contexts beyond regular instruction. the present study therefore seeks to examine shs in an educational task outside the classroom environment, which consists in writing a compulsory school leaving certificate paper at the upper secondary school level. this paper is written individually or in groups outside of class during extracurricular time over approximately one year, and it entails three task stages (development, implementation and final stages). in this context, we aim to examine whether we can group shs strategies into certain types at different stages during the entire process of developing the individual academic paper, through the adoption of a person-centred approach to focus on students as a complex system of interacting components. for the creation of these types, we particularly focus on the perceived need for help, the different issue areas that arise, and various contact persons, as these aspects, aside from the goal of shs, have been demonstrated to have a crucial influence on shs (karabenick & gonida, 2018; karabenick & knapp, 1988, 1991; makara & karabenick, 2013).
currently, studies linking shs strategies to specific stages of a complete task process are non-existent, and research with a developmental perspective is sparse, even though a development perspective is especially relevant when considering issues related to the need for help and the selection of shs sources (karabenick & gonida, 2018). therefore, we examine the issue of whether and in what way shs strategies change over a longer period and identify the specific shs strategies that are the most successful in view of the need for help, different issue areas, and various contact persons considering different types. 1.1 social help seeking as a processual learning strategy depending on the author, shs is described as a process that includes five to seven steps. the following seven steps can be defined according to different authors: at the beginning of the process, the student must realise that he or she has a problem (1) and will not make any progress without help (2). ascertaining that help is needed does not automatically induce shs behaviour, as motivational, cognitive and social factors mediate students’ shs behaviour in a specific context (schworm & fischer, 2006). if the decision to seek help is made, the student sets an shs goal (3) that can be executive (i.e. asking for a direct answer or solution) or instrumental (asking for hints) in nature (karabenick & knapp, 1991; karabenick & newman, 2010; schworm, 2018). the next step is to search for potential helpers (4). this decision is influenced by various characteristics of the possible helper on the one hand, and of the person seeking help, on the other (ryan, pintrich & midgley, 2001; ryan & shin, 2011; schworm & fischer, 2006). based on the type of assistance requested (5) and received (6), the learner must ultimately decide whether such assistance has served its purpose or whether he or she should seek further help and in doing so, start a new shs cycle (7) (nelson-le gall, 1985). 
these steps may be conducted in a different order, and individual steps can also be undertaken in parallel (nelson-le gall, 1985). similar to zimmerman's (2002) self-regulation model, in which reflections serve to decide whether students can continue learning in a certain manner or whether adaptations are necessary, a parallel assumption for the shs process is that the final evaluation may engender an adaptation of future shs (karabenick & berger, 2013). for example, if a student could not find the desired help from a particular person, he or she may ask another individual for help in the future (nelson-le gall, 1981). therefore, shs places high demands on the reflection of previous processes and consequently on the decision making of the person seeking help. 1.2 previous types of social help seeking and their role in academic achievement regarding previous research on shs forms and types, a distinction can be made between variable-centred and person-centred approaches to highlight both the theoretical and the empirical backgrounds. variable-centred approaches focus on the relationship of the variables themselves, whereas person-centred approaches underscore the subpopulations of people identified by similar value patterns on a set of variables (finney, barry, horst & johnston, 2018). by examining the pattern of values across variables, a person is typified in a holistic meaning (magnusson, 1998). therefore, the goal of using a person-centred approach is to represent the types (nagin, 2005) and thereby to provide a "view of the person as a system of interacting components" (robins, john & caspi, 1998, p. 135). in contrast, a variable-centred approach does not provide information on the combination of dimensions within the person, but it establishes relationships between aggregated dimensions (finney et al., 2018). 
up to now, shs research within variable-centred approaches mainly distinguishes three forms (reasons or orientations, sometimes also referred to as types) of shs in the classroom context that chiefly emphasise the goals of learners (e.g. butler, 1998, 2006; nadler, 1998; nelson-le gall, 1981; ryan, patrick & shim, 2005). first, instrumental (autonomous, adaptive, appropriate) shs highlights the improvement of individual knowledge and competence, and here in-depth learning transpires (karabenick & newman, 2006). second, executive (expedient, dependent) shs strives to finalise solutions to avoid exertion. the third form is to avoid shs, even if help is needed. the last two forms are less conducive to sustainable learning (karabenick & newman, 2010). thus, shs not only implies an act of dependence but also represents an adaptive and strategically beneficial process (butler, 1998; karabenick, 1998; nelson-le gall, 1981). given the potential benefit of shs, various studies in this context have focused on the determinants of persons and situations (wolters et al., 2003) that affect whether help is sought, for what reasons and by whom (butler, 1998, 2006; butler & neuman, 1995; elliot & church, 1997; elliot & mcgregor, 2001; karabenick, 1998; nelson-le gall, 1981; ryan & pintrich, 1997; ryan et al., 2001). the few studies with person-centred approaches (finney et al., 2018; karabenick, 2003), whereby shs types were analysed, are of particular interest here and are therefore described in more detail. karabenick (2003) examined students using five hs scales, namely, instrumental and executive help seeking, help-seeking avoidance, help-seeking threat, and formal versus informal help seeking, whereby the latter points to the aspect of shs. all items were formulated hypothetically in relation to a possible difficulty that might occur (e.g. "getting help would be one of the first things i would do if i were having trouble in this class" (karabenick, 2003, p. 55)). 
on this basis, a hierarchical cluster analysis was conducted, which yielded four types of student shs: strategic/adaptive/formal (17%), strategic/adaptive/informal (25%), non-strategic (36%) and avoidant (23%). although this outcome broadly confirmed the division into the three forms of shs for the classroom context (e.g. butler, 2006; nadler, 1998; nelson-le gall, 1981), it resulted in an expansion by two specific source levels (i.e. informal versus formal). the types were subsequently examined with regard to their differences in achievement. the most clearly identified cluster was the instrumental shs type (strategic/adaptive/formal) with a preference for help from formal sources (teachers), which had the highest performance level, whereas the avoidant type exhibited the lowest performance level (karabenick, 2003). finney et al. (2018) investigated whether they could replicate karabenick’s (2003) four-type solution if they employed a more general measurement of shs. they initially examined incoming first-year college students using modified items already utilised by karabenick (2003) and related them to the general school context (i.e. across all classes during one semester). similar to karabenick’s (2003) approach, the items were formulated in such a way that the need for help should be controlled by the formulation of the question about what the students would do if they needed help (finney et al., 2018). in addition, the items were verbalised in relation to a hypothetical problem. in contrast to karabenick (2003), finney et al. (2018) produced three shs types and identified only minor differences between the individual types. hence, the four types from the first study could not be replicated, although the authors recognised a possible reason for this case in the more general measurement of shs (i.e. across several classes). these reports argue in favour of examining shs types in a much stronger task-related manner. 
in a second study with upper-class students and the same instruments, finney et al. (2018) indicated that the types had similar profiles to the types of first-year college students. additionally, with regard to dissimilarities in achievement, finney et al. (2018) established that the instrumental and formal source-related shs dimension was positively related to optimal outcomes and negatively related to non-optimal outcomes; the avoidance, threat, and executive shs dimensions had inverse relations. altogether, the strong orientation toward the goals of shs in previous research on shs types in the classroom context becomes clear. this research demonstrates that different forms or types of shs based on the goals make dissimilar contributions to learning and thus to the achievement of the learners, both with variable- and person-centred approaches (e.g. butler, 2006; karabenick, 2003; ryan et al., 2005). thus far, the focus has been primarily on the purpose of shs: to explore the shs forms and types of learners. we intend to broaden this focus through an in-depth analysis of the need for help, different perceived issue areas and various contact persons, as these factors have also emerged as central influencing factors in shs. 1.3 central influencing factors in social help seeking the need for help can be considered as a key element to understand the part played by shs in the learning process (karabenick & knapp, 1991). as awareness of the need for help constitutes the starting point of shs (ryan et al., 2001), shs "should be directly related to the learners' perceived need for help" (karabenick & gonida, 2018, p. 422). when learners are aware of what they need in order to learn or work effectively, they are able to take action to meet the demands of the task (nelson-le gall, 1981). karabenick and knapp (1988) demonstrated the high importance of the learner's need for help in their early analyses of students in various courses.
thus, the need for help was positively related to the frequency of asking for help. moreover, this relationship was curvilinear (karabenick & knapp, 1988) so that students with a high need and students with a very low need for help were the least likely to seek help. karabenick and knapp (1988) attributed the fact that students in great need seek less help to cognitive and emotional hindrances, especially helplessness. this curvilinear relationship was also established for the link between the stated need and the grades expected by learners (karabenick & knapp, 1988). in a later study, karabenick and knapp (1991) again examined the need for help and the shs of students. they concluded that the need for help was strongly associated with shs. if grades were included, the researchers demonstrated that the need for help was inversely related to grades, and that grades were inversely related to shs. in contrast to their previous study, no significant quadratic but a linear trend could be found for the relationship between help seeking and grades. hence, karabenick and knapp (1991) revealed that learners with a low need for help sought less help and overall performed better, which they attributed to these learners' increased use of learning strategies. overall, the results indicate that when learners need help, they are more likely to seek it. however, the authors note that the relationships presented merely demonstrated what learners would do if they were confronted with a problem (karabenick & knapp, 1991). the results of studies with variable-centred approaches illustrate the relationship between the perceived need for help and shs. the inclusion of the perceived need for help in studies of shs also with person-centred approaches is therefore essential. if the help is needed and recognised as such, shs also depends on different contact persons (e.g. makara & karabenick, 2013) who are selected on the basis of the perceived issue area (e.g. boldero & fallon, 1995). 
contextual resources can determine the rate and effectiveness of shs (karabenick & gonida, 2018). within different courses or tasks, challenges in different sub-topics can appear for which help is needed and sought. various early studies indicated that different people are asked for help depending on the issue area for which help is required. for example, family and peers are primarily asked for help with psychological and personal challenges (boldero & fallon, 1995; rickwood, 1995; tinsley, de st. aubin & brown, 1982). however, academic advisors or teachers are more often asked for help with career-related aspects (tinsley et al., 1982). therefore, the characteristics of the person providing help (e.g. confidence [newman, 2000], knowledge [stroebe, hewstone, codol & stephenson, 2013] and care [ryan & shim, 2012]) play a central role. within the challenging task of developing an individual academic paper over a longer period, huber, lehmann and husfeldt (2011) identified different thematic areas that are relevant for writing a school leaving certificate paper and can therefore cause difficulties as well. most of the pupils turned to the supervisor for challenges regarding the content and structure of the paper. a few students asked the supervisor for help with formulating the research question, formal principles (e.g. footnotes, bibliography, citations) and information sources. in terms of working methods, for questions about the timetable and organisation of work and about writing the paper (writing process), even fewer learners turned to the supervisor. students were the least likely to ask their supervisors for help with their choice of topic or with overcoming a crisis. the supervisor is evidently not selected as the first contact person for challenges in all of the issue areas. the authors revealed that other contact persons were also used when difficulties arose, including parents and peers.
however, the specific issue to which the shs referred regarding these sources was unclear. from these results with variable-centred approaches, shs must be viewed in the light of the challenges in different areas, and the choice of the persons providing help is based on what kind of issue is causing difficulty. 1.4 research deficits in summary, shs has a strong social-interactive character, in which the need for help of the person seeking assistance as well as the context (task, issue area) and the contextual resources (contact persons) based on this context can have a crucial impact. previous research on shs types reported similar results across different school levels, with instrumental shs types with a preference for help from formal sources performing better than executive shs types or avoiders. there has been a strong focus to date on the goals of shs and potential helpers in the classroom context, both in person-centred (finney et al., 2018; karabenick, 2003) and variable-centred approaches (butler, 1998, 2006; butler & neuman, 1995; elliot & church, 1997; elliot & mcgregor, 2001; nelson-le gall, 1981; ryan & pintrich, 1997; ryan et al., 2001). however, several research desiderata can be identified. the few studies that dealt with shs types in a person-centred approach usually included only one measurement point in their analyses or merely depicted a cross-section and recorded shs in relation to different university courses across several different tasks. today, few studies with a developmental perspective are available (karabenick & gonida, 2018), and reports argue in favour of examining shs types in a much stronger task-related manner. furthermore, the need for help has a connection with shs. this factor must be considered in the formation of shs types. thus far, this case has hardly occurred in explicit terms. the items in the investigations to date were formulated hypothetically in relation to a possible undefined difficulty that might occur. 
the perceived level of need was not integrated. additionally, shs can vary based on challenges in diverse task and issue areas. therefore, shs must be considered as a task-situated process in relation to the difficulties actually experienced in different issue areas. moreover, most of the analyses especially focus on shs behaviour in the classroom context. as learning transpires not only in the classroom, shs processes should also be examined for educational tasks beyond the context of regular instruction that allow for a high degree of self-regulation; one example is writing an academic paper during extracurricular time, in which teachers or peers are not always present but other contacts may play a role. 1.5 present study: questions and hypotheses the aim of this study is to extend previous investigations by adopting a task-specific, longitudinal perspective and focusing on shs in the context of a real, challenging, long-term, academic task outside of regular classroom instruction. the study extends previous research by examining shs strategies along different types (person-centred approach), focusing on the perceived need for help, various sources and specific areas in which challenges can arise. three questions are investigated. first, do different types of shs strategies exist in each task stage of developing an academic paper outside the classroom context (q1)? due to the theory on shs with dissimilar decision possibilities (nelson-le gall, 1981), different characteristics of influence on the part of the person providing help (e.g. ryan & shim, 2012), and as students are confronted with various challenges in each creation stage (backhaus & tuor, 2008), we expect to find diverse shs types per task stage (hypothesis 1). second, what is the extent to which the types of shs strategies change during the course of the creation process (q2)? following zimmerman (2002), we assume that experiences in one stage can result in a change in shs in a subsequent stage. 
thus, despite the assumptions of finney et al. (2018), our presupposition is the changeability of shs behaviour over time based on different sub-tasks per stage and previous experiences. consequently, we hypothesise that students may transfer between different shs types (hypothesis 2). furthermore, assumptions in an explorative manner can be made regarding possible changes, as previous theoretical and empirical foundations are insufficient: students who received help from their paper supervisor that they found useful in the first stage need less help with the academic paper in the following stages (hypothesis 3); however, if they do not discuss the concept broadly with the supervisor, these students will need help at a later stage (hypothesis 4). we also expect that some students will need help throughout all stages, relating to different issues, and will ask various contact persons for help (hypothesis 5). third, what is the degree of importance of these shs types per task stage in academic achievement (q3)? based on previous findings of both variable-centred (butler, 1998, 2006; karabenick & knapp, 1988, 1991; nadler, 1998; nelson-le gall, 1981; ryan et al., 2005) and person-centred approaches (finney et al., 2018; karabenick, 2003), we assume that the shs types are ultimately related to different academic achievement (hypothesis 6). 2. context and methods 2.1 context in switzerland, where this study was conducted, the upper secondary school level has different tiers that are oriented toward various professional tracks. grammar schools (isced levels 3–4) represent the track with a strong emphasis on academic learning that prepares students for university. toward the end of this track, students must write a compulsory school leaving certificate paper, which is referred to as the matura thesis. 
this academic paper significantly contributes to the grade on the final exit examination, which in turn, if passed, guarantees unrestricted admission (with the exception of medical studies in switzerland) to a university of applied sciences or university (swiss federal council & edk, 1995). after completion of the upper secondary school leaving certificate, students have the ability to access new knowledge, develop their curiosity, imagination and communication skills, and work alone and in groups (swiss federal council & edk, 1995, art. 5). as the matura thesis is a demanding task, it requires self-regulated learning. however, it also has the potential to promote self-regulated learning competencies (huber, husfeldt, lehmann & quesel, 2008). students are given approximately one year's time to complete their papers. the papers are written individually or in groups, outside of class during extracurricular time. the developing process of the matura thesis consists of three task stages, each containing different key activities. the first stage is concept development (t1), in which a research question is identified; the learners have ample freedom to select the topic and plan implementation to answer the research question. the second phase refers to the implementation stage (t2), in which students start creating their paper. in the final stage (t3), students finish their matura thesis, evaluate and revise it, and finally submit it. 2.2 participants all study procedures complied with the human subjects' guidelines of the swiss national science foundation. students had the opportunity to withdraw their participation at any time. this longitudinal study began with 1,250 students (55.9% female) with an average age of 17.5 (sd = .81) at 12 urban and rural upper secondary schools. the number of participants was later reduced based on the criteria below. to determine possible changes between shs types and task stages, we had to reduce the sample based on two arguments. 
first, we included in the analyses only those students who filled out all three online questionnaires (t1, t2, t3) needed for this study. this criterion provided a longitudinal sample of n = 713 (63.0% female) students with an average age of m = 17.4 (sd = .76) over three measurement points. second, the academic paper can be written alone or in a group. to be able to compare the hs of individuals, we were interested in those students who wrote their paper on their own. we consequently excluded those students who wrote their thesis in pairs or groups. this criterion led to the exclusion of 110 students. the final sample of this study consisted of n = 603 students (62.9% female) with an average age of 17.3 (sd = .71). of these learners, 91.7% stated that they were born in switzerland, and 86.4% specified that they most often spoke german/swiss german at home. we compared the sample of this study with the characteristics of the population of students in grammar schools in the german-speaking part of switzerland. the sample reflected the population of this type of school with respect to gender, age, nationality and native language. 2.3 measurements the participants answered all the items in the online self-report questionnaires. for these analyses, the perceived frequency of asking the supervisor, family and peers for help, as well as the issue areas and the related contact persons were quantified at three measurement times, retrospectively for one task stage each. the perceived need for help was assessed at one measurement time, retrospectively for the three task stages of creating the academic paper. this approach resulted in 72 collected shs variables (24 variables per stage). all of the aspects included are explained in more detail in the following sections. 2.3.1.
perceived need for help after the students had submitted their matura thesis (t3), we asked them about their retrospective perceived need for help in the different stages of the creation process. we asked, "to what extent were you reliant on help to continue working through the whole process?" (3 items, 1 item for each stage: "during the [fill in the stage] stage, i was...."). the responses ranged from 1 (not at all reliant on help) to 6 (entirely reliant on help). 2.3.2. perceived frequency of asking for help the perceived frequency of asking for help was measured thrice with the following question during the entire process, retrospectively for each stage (t1, t2, t3): "how often did you request help from the following persons?" participants responded on a 6-point likert scale from 1 (never) to 6 (very often) for supervisor, family and peers, and it was respectively adjusted for each creation stage (3 × 3 items). 2.3.3. issue areas and contact persons to identify the individuals from whom the students asked for help relating to specific issue areas, we asked them the following question retrospectively for each stage (t1, t2, t3) at three different points in time: "who did you ask for help concerning the following issue areas?" we gave the students different possible contact persons (nobody, supervisor (formal), family (informal), peers (informal), and others) whom they could select relating to different given issue areas (5 × 4 × 3 items). most of the issue areas were obtained from huber et al.'s (2011) study, as they proved to be relevant for writing a school leaving certificate paper. 
however, to ensure the comparability of the types across the phases, those issue areas that turned out to be relevant and thus possibly leading to difficulties for all three phases were integrated into the present analyses, namely, information source, working methods, timetable and organisation of the paper (factual issue areas), and motivation and resolving crises (motivational issue areas). for this question, multiple answers were possible (1 = selected, 2 = not selected). 2.3.4. academic achievement academic achievement was also recorded as the students’ grades on their academic papers. the paper supervisors and the second assessors evaluated the papers. we were given access to the students’ official grades, which in switzerland range from 1 to 6, with 6 being the highest grade. 2.4 statistical analyses as a person-centred approach, we conducted latent class analyses (lca) to ascertain the existence of different shs types at each task stage of generating an academic school leaving certificate paper. latent class analysis is a statistical method for classifying individuals into homogeneous subgroups (latent classes) (finch & bronk, 2011). to calculate the results of the lca, we used mplus version 8.1 (muthén & muthén, 1998-2017). to consider missing values, the full information maximum likelihood method (fiml) was used. the proportion of missing values of the variables was on average 1.99% (min = 0.50%, max = 5.14%). the maximum likelihood robust estimator (mlr) was used to account for a possible deviation from the multivariate normal distribution. with the categorical command in mplus, we specified that the issue area and contact person items were ordered categorical (dichotomous in this case) variables, whereas all the other variables (perceived need for help, perceived frequency of asking for help) were treated as metric variables. we were able to conduct a mixed distribution analysis (lca in this case) in mplus by using the command analysis of type = mixture. 
with mixed distribution analysis, we assumed that the population of students was composed of different subgroups (latent classes, hs types). the model quality statistics and parameter estimates of the lca were determined in mplus using the maximum likelihood estimation method. the log-likelihood value is a measure of the probability of the data provided by the model, and it serves as a basis for the calculation of further model fit tests or indices (finch & bronk, 2011). as the final number of classes is unknown beforehand, different models with a changing number of classes have to be tested and compared in terms of various statistical and non-statistical criteria (sample size, interpretability of the classes, average latent class probabilities). the entropy (e) assesses the quality of the measurement as a whole (asparouhov & muthén, 2014), and it should have a value greater than .80 (rost, 2006). furthermore, the values of the akaike information criterion (aic) and of the bayesian information criterion (bic) were included in the analyses, whereby the lower values correspond to the better fitting model (geiser, 2011). the lo-mendell-rubin (lmr) and the bootstrap likelihood ratio test (blr) were also utilised to identify the suitable number of classes. if the lmr and the blr tests are significant, the model with k classes represents the data better than the model with k-1 classes (finch & bronk, 2011). the differences between the classes per stage regarding perceived need for help, perceived frequency of asking for help, issue areas and contact persons, and achievement were analysed using nonparametric methods, namely, the kruskal-wallis test and post hoc dunn-bonferroni test for categorical and metric variables (martens, 2003), as the data contained outlier values that were retained due to inconstancy (sheskin, 2011). the kruskal-wallis h test is based on rank numbers assigned to individual characteristic values. 
finally, the sums of the rank numbers were calculated for each group (shs type per stage) and were analysed for significant differences (martens, 2003). we conducted a latent transition analysis (lta) to examine the changes in the course of the learning process. latent transition analysis can be considered as a longitudinal extension of the lca. it uses an autoregressive relationship to link the latent class variables of several measurement points (muthén & asparouhov, 2011). the latent transition probabilities, which are of particular interest in the analyses of the lta results, indicate the probability of being assigned to a specific class at time t based on the assignment to a class at time t-1 (muthén & asparouhov, 2011). further details can be found in the next section. 3. results 3.1 descriptive statistics table 1 presents the descriptive statistics for all categorical variables. the values represent the percentages of selected contact persons per issue area and the measurement points. table 2 presents the descriptive statistics for all metric variables. table 1 descriptive statistics for the categorical variables note. t1 = concept development stage; t2 = implementation stage; t3 = final stage; n = number of cases; 100% minus the percentage of the selected response option corresponds to the percentage of the not selected response option. table 2 descriptive statistics for the metric variables note. t1 = concept development stage; t2 = implementation stage; t3 = final stage; n = number of cases; m = mean; sd = standard deviation; perceived need for help: 1 = not at all reliant on help, 6 = entirely reliant on help; perceived frequency of asking for help: 1 = never, 6 = very often; academic achievement: 1 = lowest grade, 6 = highest grade. 
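as a minimal sketch of the rank-sum logic behind the kruskal-wallis h test described in section 2.4, the following python code pools all values, assigns average ranks to tied values, and computes h from the rank sums per group. the function names and the example scores are hypothetical and are not data from this study; the tie correction and the post hoc dunn-bonferroni comparisons applied in the actual analyses are omitted.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic from rank sums per group (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


if __name__ == "__main__":
    # Hypothetical scores for three groups (e.g. three shs types).
    example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(round(kruskal_wallis_h(example), 2))
```

with three groups, h is compared against a chi-square distribution with k - 1 = 2 degrees of freedom.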
3.2 identification and description of the help-seeking types per stage (lca) table 3 lists the models that were best suited to the data and the criteria used to statistically evaluate the model fit for the different class solutions generated by the lca. the analyses were based on the aforementioned 72 collected shs variables (i.e. 24 variables per stage), referring to perceived need for help, perceived frequency of asking for help, and the issue areas and contact persons (see section 2.3). table 3 statistical fit indices for the most appropriate class solutions at different measurement times note. t1 = concept development stage; t2 = implementation stage; t3 = final stage; bic = bayesian information criterion; aic = akaike information criterion; e = entropy; be = boundary estimates (logit thresholds that were set at the extreme values -15.000 and 15.000); plmr = significance of the lo-mendell-rubin test; pblr = significance of the bootstrap likelihood ratio test. initially, the 4-class solution for measurement point t1, the 5-class solution for t2 and the 6-class solution for t3 seemed to be ideal (see table 3). however, closer inspection indicated that these solutions contained many boundary estimates, thus making interpretation difficult. even though the bic is the most frequently used decision criterion, it sometimes points to theoretically implausible solutions, especially for large samples, and results in the overestimation of class numbers (specht, luhmann & geiser, 2014). hence, the criteria were only evaluated in combination. the 3-class solution was preferred at all measurement times due to the interpretability and uniqueness of the classes, statistical fit indices, average latent class probabilities and the smallest number of boundary estimates. the following sections describe the shs types in more detail for each stage of the creation process of the matura thesis. first, the differences between the classes per measurement point are listed.
as both metric and binary individual items were included in the lca, the results are then listed separately for each task stage. 3.2.1. social help-seeking types in the concept development stage (t1) the first classes refer to the concept development stage. three classes could be identified, which mainly differ in the categorical and metric variables examined (see table 4). table 4 differences between classes in the concept development stage (t1) note. χ2 = chi-square (df = 2); kruskal-wallis test and post hoc dunn-bonferroni test for categorical and metric variables. figure 1 shows the mean values of the metric class variables for the three latent classes, and table 5 presents the respective probabilities for the categorical class variables. the first class, containing 20.6% (n = 124) of the students surveyed (see figure 1), mostly had values that lay between those of the other classes. with an average value of m = 3.53 (sd = 1.16), they had a self-reported dependence on help that ranked above the second class and below the third class. similarly, the self-reported perceived frequency of asking for help was in the middle range for the supervisor (m = 3.89, sd = 1.08) and for the family (m = 3.01, sd = 1.49) compared to the other classes. they exhibited a slightly higher value (m = 2.78, sd = 1.33) only regarding perceived frequency of asking peers for help. within this class, however, the highest value was for seeking help from the supervisor. this result already indicated that students in this first class mainly asked their supervisor for help, if they asked at all. the results of the categorical variables (see table 5) confirmed this initial assumption and revealed that for the factual issue areas (information source and working methods), the primary person asked for help was the supervisor. these results led us to name this class factual supervisor-focused. figure 1. mean values of the respective metric class variables in the concept development stage (t1, n = 603).
perceived need for help: 1 = not at all reliant on help, 6 = entirely reliant on help. perceived frequency of asking: 1 = never, 6 = very often. class 1 = factual supervisor-focused, class 2 = independents, class 3 = factual supervisor-focused and motivational family-focused. the second class containing 31.3% (n = 189) of the students (see figure 1) had the overall lowest values compared to the other classes in the concept development stage. these students gained an average value of m = 3.22 (sd = 1.16). according to self-reports, the students were minimally dependent on help. the perceived frequency of asking for help from the supervisor revealed the lowest value for all the classes with an average value of m = 3.59 (sd = 1.08). the values for the perceived frequency of asking the family m = 2.98 (sd = 1.49) and peers m = 2.33 (sd = 1.33) for help were the lowest in comparison to the other classes as well. for motivational aspects (motivation and resolving crises), the students did not ask anyone for help (see table 5). these findings explained the class that we named independents. the third class (see figure 1), which contains 48.1% (n = 290) of the students, was the largest of the three classes in the concept development stage. compared to the other two classes, the students in the third class stated on average that they were most dependent on help (m = 3.65, sd = 1.16) and that they sought help most from the supervisor (m = 4.00, sd = 1.08) and the family (m = 4.00, sd = 1.49). this group was found in a middle range (m = 2.77, sd = 1.33) only when seeking help from peers. for factual issue areas (information source, working methods, and timetable and organisation of work, see table 5), the students in this group stated that they turned to supervisors. in contrast, for motivational aspects (motivation and resolving crises), the family was the primary contact person. 
these outcomes clarified the class name factual supervisor-focused and motivational family-focused. table 5 probabilities for the respective categorical class variables in the concept development stage (t1) note. class 1 = factual supervisor-focused; class 2 = independents; class 3 = factual supervisor-focused and motivational family-focused; in bold type = probabilities with a value > 0.500. 3.2.2. social help-seeking types in the implementation stage (t2) three classes could also be found for the implementation stage. table 6 shows the differences between classes in the implementation stage. as in the concept development stage, the metric class variables were examined in more detail before exploring the categorical variables for each of the classes at this stage. figure 2 shows the mean values of the metric class variables for the three latent classes, and table 7 presents the respective probabilities for the categorical class variables. the first class, containing 31.9% (n = 192) of the students (see figure 2), had the lowest overall values compared to the other classes. with an average value of m = 2.98 (sd = 1.02), students in this class had a low self-reported perceived need for help. the mean values for perceived frequency of seeking help for the two contact groups family (m = 2.78, sd = 1.44) and peers (m = 1.92, sd = 1.27) were the lowest overall that could be found at this stage. the mean value for self-reported perceived frequency of asking for help from the supervisor was at a medium level (m = 3.64, sd = 1.02) compared to the others. once again, this finding could be confirmed by the probability table (see table 7), which showed that the students in the first class of this stage had individually indicated that they had not asked anyone for help. this case had already emerged in the concept development stage, and no value above 0.50 could be found for the issue area 'information source'. for this reason, this class was called independents.
table 6 differences between classes in the implementation stage (t2) note. χ2 = chi-square (df = 2). the second class (see figure 2) formed the largest group in the implementation stage with 42.9% of the students (n = 259). it seemed to be highly similar to the third class from the first stage. figure 2 shows that the second class had by far the highest levels of self-reported perceived need for help (m = 3.66, sd = 1.02) and perceived frequency of asking for help from the supervisor (m = 4.04, sd = 1.02), family (m = 3.94, sd = 1.44) and peers (m = 2.55, sd = 1.27) compared to the other two classes at this stage. a closer examination of the issue areas in combination with the contact persons (see table 7) indicated that the students in the second class primarily turned to the supervisor for fact-related issue areas (information source, working methods, timetable and organisation of work). however, they seemed to prefer to ask the family for help in overcoming crises (motivation and resolving crises). we named this class factual supervisor-focused and motivational family-focused. the third class (25.2%, n = 152) constituted the smallest class (see figure 2). here, the students showed an average value of m = 3.25 (sd = 1.02) for self-reported need for help and therefore were in the medium range. the perceived frequency of help seeking was also in the medium range (family, m = 3.44, sd = 1.43; peers, m = 2.18, sd = 1.27), except for perceived frequency of asking the supervisor, which was the lowest in comparison (m = 3.56, sd = 1.02). the students in this class also stated that they had not turned to anyone for help except on motivational issue areas (motivation and resolving crises, see table 7). according to the students, the family was consulted in such cases. this rationale underlies the naming of this class as motivational family-focused. figure 2. mean values of the respective metric class variables of the implementation stage (t2, n = 603).
perceived need for help: 1 = not at all reliant on help, 6 = entirely reliant on help. perceived frequency of asking: 1 = never, 6 = very often. class 1 = independents, class 2 = factual supervisor-focused and motivational family-focused, class 3 = motivational family-focused. table 7 probabilities for the respective categorical class variables in the implementation stage (t2) note. class 1 = independents; class 2 = factual supervisor-focused and motivational family-focused; class 3 = motivational family-focused; in bold = probabilities with a value > 0.500. 3.2.3 social help-seeking types in the final stage (t3) we also found three classes in the last stage of the creation process (about one month before the submission of the paper). the differences between classes in the final stage are presented in table 8. figure 3 includes the mean values of the metric class variables for the three latent classes and table 9 presents the respective probabilities for the categorical class variables. table 8 differences between classes in the final stage (t3) note. χ2 = chi-square (df = 2). an examination of the metric class variables (see figure 3) revealed that the first class (28.4%, n = 171) had the lowest mean values compared to the other two classes. the relatively low dependence on help (m = 3.63, sd = 1.05) indicated by the students also corresponded to perceived frequency of seeking help from other persons (supervisor, m = 3.03, sd = 1.27; family, m = 3.54, sd = 1.31; peers, m = 1.99, sd = 1.39). the probabilities regarding the categorical class variables (see table 9) similarly confirmed this finding, as the students in this class were the most likely not to have asked anyone for help in the issue areas surveyed. hence, we named the class independents. the second class (39.4%, n = 238, see figure 3) had a self-reported need for help of m = 3.98 (sd = 1.05).
the perceived frequency of seeking help from the supervisor was m = 3.20 (sd = 1.27), m = 4.27 (sd = 1.31) for the family, and m = 2.69 (sd = 1.39) for the peers (see figure 3). based on the metric class variables, these values were middle ranged compared to the other two classes. however, if one considers the probabilities of the categorical variables (see table 9), the family was clearly asked for help in particular with the issue area of timetable and organisation of work and also when motivational aspects were involved (motivation and resolving crises). the family thus played a central role in this class, and we therefore called it motivational family-focused. the family was likewise central to the third class (32.2%, n = 194, see figure 3), especially in matters of solving problems relating to issues in timetable and organisation of work and overcoming crises (see table 9). the mean values for the metric class variables (see figure 3) were the highest compared to the other two classes (perceived need for help, m = 4.30, sd = 1.05; perceived frequency of asking supervisor, m = 3.62, sd = 1.27; perceived frequency of asking family, m = 4.79, sd = 1.31; perceived frequency of asking peers, m = 2.75, sd = 1.39). given this family-based orientation, we named this class factual and motivational family-focused. figure 3. mean values of the respective metric class variables in the final stage (t3, n = 603). perceived need for help: 1 = not at all reliant on help, 6 = entirely reliant on help. perceived frequency of asking: 1 = never, 6 = very often. class 1 = independents, class 2 = motivational family-focused, class 3 = factual and motivational family-focused. table 9 probabilities for the respective categorical class variables of the final stage (t3) note: class 1 = independents; class 2 = motivational family-focused; class 3 = factual and motivational family-focused; in bold = probabilities with a value > 0.500. 
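the fit indices used above to compare class solutions (aic, bic and entropy) can be computed from a model's log-likelihood, its number of free parameters and the posterior class probabilities. the following is a minimal sketch, not the mplus implementation: the function names are hypothetical, the log-likelihoods and parameter counts in the example are invented for illustration, and only the sample size n = 603 is taken from the text. the entropy is the usual relative entropy, scaled so that 1 means perfectly separated classes and 0 means completely uncertain assignment.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_cases):
    """Bayesian information criterion; penalises parameters by ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_cases)


def relative_entropy(posteriors):
    """Relative entropy for a list of per-person class-probability rows."""
    n_cases = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # 0 * ln(0) is treated as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_cases * math.log(n_classes))


# Invented log-likelihoods and parameter counts, for illustration only.
candidates = {2: (-9850.0, 49), 3: (-9700.0, 74), 4: (-9660.0, 99)}
n_cases = 603  # sample size reported for the lca
for k, (ll, n_params) in candidates.items():
    print(k, round(aic(ll, n_params), 1), round(bic(ll, n_params, n_cases), 1))

best_by_bic = min(candidates, key=lambda k: bic(*candidates[k], n_cases))
print("lowest BIC:", best_by_bic)
```

as the text notes, the lowest bic alone does not settle the choice; interpretability, boundary estimates and average latent class probabilities were weighed as well.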
3.3 changes in social help-seeking types over time (lta) before proceeding to the results of the lta, several methodological notes need to be provided. the first is that "[if] the same measurement model is used across all time points (e.g., lca) and the same number and type of classes are used, it is reasonable to explore measurement invariance" (nylund, 2007, p. 44). further, nylund (2007) notes that a full measurement invariance is not always plausible, depending on which classes have emerged from the lca. in this investigation, the lca produced the same number but not the same types of classes over time. one of the three classes seemed to occur at all three measurement points (independents), and two classes each occurred at two measurement points (factual supervisor-focused and motivational family-focused, motivational family-focused). therefore, a restricted model (model 2) was constructed, in which the thresholds and intercepts were constrained to be equal over time for the recurring classes. this model was compared with a second model (model 1), in which the thresholds and intercepts were freely estimated over time. due to the large number of cells in the frequency table, the chi-square test could not be computed. a model comparison using the aic and bic provides information on the particular model that is more suitable for the available data, whereby the lower values correspond to the better fitting model (geiser, 2011). as an additional criterion, the entropy, which should have a value greater than .80, was used (rost, 2006). finally, the more suitable model was used for interpreting the latent transition probabilities. table 10 summarizes the values of the model comparison: model 1 had no restrictions, whereas model 2 had restrictions for the classes that recurred over time. the model comparison indicated that the two models only differed marginally from each other. as the entropy value for model 1 was more favourable, this model was used for further analyses.
this decision can also be justified theoretically, because although some classes remained the same over time, this does not signify that all persons remained in the same classes. table 10 statistical fit indices for the most appropriate model: comparison of the freely estimated model (model 1) with the restricted model (model 2) note. bic = bayesian information criterion; aic = akaike information criterion; a = independents, t1/2, t2/1, t3/1; factual supervisor-focused and motivational family-focused, t1/3, t2/2; motivational family-focused, t2/3, t3/2. figure 4 shows the latent transition probabilities for the total sample from t1 to t2 and from t2 to t3. the probabilities of 0.11 and 0.27 (t1 to t2), and 0.41 (t2 to t3) in bold type indicated a relatively low stability of the intraindividual behavioural style of students in the classes of the same name over time. first, the probabilities for a change in the shs type from the concept development stage (t1) to the implementation stage (t2) were considered. persons who belonged to the factual supervisor-focused class (class 1) in the concept development stage (t1) switched to independents (class 1) for the implementation stage (t2) with a 62% probability and to the motivational family-focused class (class 3) with a probability of 16%. as the factual supervisor-focused class only existed at t1, we cannot speak of permanent members of this group. all these students changed to another shs group for the implementation stage (t2): 62% changed to independents, 23% to factual supervisor-focused and motivational family-focused and 16% to motivational family-focused. students who belonged to the independents (class 2) in the concept development stage (t1) were likely to change to the factual supervisor-focused and motivational family-focused (class 2) for the implementation stage (t2) with a 76% probability and to the motivational family-focused (class 3) with a probability of 13%. however, approximately 11% remained in the class of independents.
persons who belonged to the class factual supervisor-focused and motivational family-focused (class 3) in the concept development stage (t1) switched to independents (class 1) for the implementation stage (t2) with a probability of 28% and to the class motivational family-focused (class 3) with a probability of 45%. about 27% remained in the class factual supervisor-focused and motivational family-focused (class 2). second, we examined the probabilities of changing the shs type from the implementation stage (t2) to the final stage (t3). students in the class independents (class 1) had a probability of 41% of remaining in this class for the final stage, and they did not change to motivational family-focused (class 2). however, they were 59% likely to switch to factual and motivational family-focused (class 3). persons who belonged to the factual supervisor-focused and motivational family-focused (class 2) in the implementation stage (t2) did not switch to independents (class 1), but they were 46% likely to join the motivational family-focused (class 2) in the final stage. the remaining 54% moved to the factual and motivational family-focused (class 3). persons who belonged to the motivational family-focused class (class 3) in the implementation stage (t2) neither switched to independents nor stayed in the same class in the final stage. all members of this class (100%) changed to factual and motivational family-focused for the final stage (t3). figure 4. transition probabilities based on the different shs types and measurement points (n = 568). only changes are shown. t1 = concept development stage, t2 = implementation stage, t3 = final stage. (1), (2), (3) = class number per measurement point. in bold type = remained in the same class.
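the transition probabilities reported in this section can be arranged as two matrices, one per transition, with one row per class of origin. the sketch below is an illustration only and not an analysis from the study. it rescales each row to sum to one (the reported percentages are rounded) and multiplies the two matrices to obtain implied t1-to-t3 probabilities. that last step assumes that class membership at t3 depends on t1 only through t2, which is a simplification introduced here. the independents row at t1 uses 0.76 for the move to factual supervisor-focused and motivational family-focused, the value that makes the row sum to one and that the discussion section reports. the function names are hypothetical.

```python
def normalise_rows(matrix):
    """Rescale each row so its probabilities sum to one."""
    return [[p / sum(row) for p in row] for row in matrix]


def compose(first, second):
    """Matrix product of two transition matrices (two-step probabilities)."""
    return [
        [sum(first[i][k] * second[k][j] for k in range(len(second)))
         for j in range(len(second[0]))]
        for i in range(len(first))
    ]


# Rows: class at t1 (1 = factual supervisor-focused, 2 = independents,
# 3 = factual supervisor-focused and motivational family-focused).
# Columns: class at t2 (1 = independents, 2 = factual supervisor-focused and
# motivational family-focused, 3 = motivational family-focused).
T1_TO_T2 = [
    [0.62, 0.23, 0.16],
    [0.11, 0.76, 0.13],
    [0.28, 0.27, 0.45],
]

# Rows: class at t2 (as above). Columns: class at t3 (1 = independents,
# 2 = motivational family-focused, 3 = factual and motivational family-focused).
T2_TO_T3 = [
    [0.41, 0.00, 0.59],
    [0.00, 0.46, 0.54],
    [0.00, 0.00, 1.00],
]

# Implied two-step probabilities under the first-order assumption stated above.
t1_to_t3 = compose(normalise_rows(T1_TO_T2), normalise_rows(T2_TO_T3))
for row in t1_to_t3:
    print([round(p, 2) for p in row])
```

each row of the resulting matrix again sums to one, which is a quick consistency check on the transcribed values.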
3.4 disparities between the social help-seeking types per stage regarding academic achievement to answer the third question, a kruskal-wallis h test was run to determine any differences in grades between the three shs types per stage and therefore ascertain whether the shs type to which the students belonged per stage played a role in academic performance. the distributions of grades were similar for all shs types in all three stages, as assessed by the visual inspection of the box plots. the results indicated that the median values did not differ statistically significantly between the shs types in the concept development stage (t1), χ2(2) = 2.20, p = .333, or in the implementation stage (t2), χ2(2) = 1.36, p = .506, or in the final stage (t3), χ2(2) = 0.54, p = .763 (see table 11). table 11 disparities between shs types per stage regarding academic achievement note. t = measurement point; n = number of cases; mdn = median (1 to 6); χ2 = chi-square; df = degrees of freedom; p = level of significance (asymptotic, two-sided). no multiple comparisons due to non-significant differences between samples in the overall test. 4. discussion 4.1 overall results the main objectives of this study were to adopt a person-centred approach to gain in-depth insights into how young adults seek help in dealing with a real, challenging long-term task; to attempt to break it down into specific types of shs strategies in view of the need for help, different issue areas and various contact persons; and to integrate the role of academic achievement. swiss students at different grammar schools at the upper secondary level participated in the study and thus provided us with insights into their help-seeking behaviour. the aim of this study was threefold. the first aim was to ascertain the existence of different shs types in each task stage of creating an academic paper outside the classroom context (q1). 
the shs process is characterized by many different decision-making options (nelson-le gall, 1981). in addition, shs is dependent on various characteristics of the person giving help (e.g. ryan & shim, 2012). based on these points and the diverse challenges that the learners encounter in each creation phase, we expected to find different shs types in each stage of the long-term task under investigation (hypothesis 1). this hypothesis can be confirmed: not every stage includes all the same shs types. the lcas per stage indicate that one of the three classes occurs in all stages of the paper creation process, namely, independents. this group is referred to as avoiders in karabenick (2003) because of their low values in terms of the quantity of shs. however, we chose to call them independents instead, because in addition to the low perceived frequency of asking for help, they also report less perceived need for help. the members of this class could be students who already have some previous knowledge and therefore encounter fewer challenges. this low perceived need for help and the associated low perceived frequency of asking for help confirm the results of karabenick and knapp (1988, 1991), in which students with a very low need are the least likely to seek help. the extent to which this case is due to competencies in self-regulated learning (karabenick & knapp, 1991) would require further examination. students who are factual supervisor-focused and motivational family-focused only appear in the first two stages. these group members seem to be highly uncertain about what to exactly do. furthermore, they do not seem to know for sure how to motivate themselves in the concept development and the implementation stage, based on their comparatively highest values for both stages regarding perceived need and perceived frequency of seeking help from the supervisor concerning factual issues and from the family regarding motivational issues. 
however, the underlying reasons for these generally increased values would have to be verified in further investigations. in contrast, the motivational family-focused group only appears in the last two stages, which makes sense, as maintaining motivation and persisting is especially relevant for long-term tasks, and challenges in this regard might arise more toward the middle and the end of the process (ulmi, bürki, verhein & marti, 2017). research has also revealed that motivational aspects are preferably discussed with trusted persons (boldero & fallon, 1995; fallon & bowles, 1999; nelson le gall, gumerman & scott-jones, 1983). factual supervisor-focused students only appeared in the concept development stage. motivational challenges do not seem to be an issue for this group. in addition, the factual and motivational family-focused appear in only one of the stages, namely, the final stage. these students mainly report asking the family for help with motivational issues and issues in work organisation. however, a striking aspect of the final stage is that none of the class members has a strong orientation toward supervisors. work on the paper should have further progressed by this time; hence, the inhibition threshold for asking the supervisor might naturally rise, as the supervisor is also the assessor (bonati & hadorn, 2009). another possibility is that the ensuing questions no longer concern the area of competence or responsibility of the supervisor, such as proofreading the work. overall, the supervisors do not seem to be perceived as the contact persons with whom crises or motivational challenges can be discussed. this inference particularly confirms the findings of boldero and fallon (1995), who highlighted that personal issues are most likely to be discussed with family or peers. 
thus, compared to karabenick’s types (2003), we have found a hybrid type in which informal and formal sources are asked for help, namely, the factual supervisor-focused and motivational family-focused. this type appears because we focused on the specific issue areas for certain contact persons, which in turn underlines the relevance of this differentiated perspective. the second aim was to determine the extent to which these shs types change during the course of the creation process of a long-term task (q2). in accordance with zimmerman (2002), we postulated that experiences in one stage can induce a change in self-reported behaviour in the subsequent stage. additionally, different problems can arise in long-term tasks, which in turn can have an influence on shs. we consequently assumed that the students can switch between the shs types and that these types do not have to be stable over time (hypothesis 2), as was expected by finney et al. (2018). this hypothesis can be confirmed, as the lta indicates frequent switching between shs types over time. whether these changes are due to adaptive shs processes or to different key activities within the three stages requires further investigation. regarding the transitions between types, the explorative assumption was made that students who receive help from their supervisor that they find useful in the first stage need less help later on (hypothesis 3). sixty-two percent of the factual supervisor-focused students (t1) change to the class of independents (t2). thus, hypothesis 3 can be largely confirmed. this finding suggests that these students feel prepared to implement the paper concept after a productive concept stage with the supervisor. an effective concept and precise aim specifications facilitate targeted work afterwards (mccardle, webster, haffey & hadwin, 2017; ritschl, weigl & stamm, 2016), which in turn benefits independent work.
thus, we also assumed that without a broad discussion on or clarification of the concept, the students might need help at a later stage (hypothesis 4). seventy-six percent of students from the independents (t1) changed to the factual supervisor-focused and motivational family-focused (t2) type of shs, confirming hypothesis 4. these students seem to have many open factual issues and apparently need help with motivational issues in the implementation stage. one possible explanation is that the concept was not thought through in detail or that certain questions only arise during the implementation stage. questions that are not clarified at the concept development stage can reappear in the implementation stage. another reason could be a relatively large amount of prior knowledge, which can result in the perceived need for help at the concept development stage (t1) being underestimated by overestimating the understanding of the learning material (scardamalia & bereiter, 1992). we also expected to find students who need help throughout all the phases with different issues and who therefore ask different contact persons for help (hypothesis 5). students assigned to the motivational family-focused during the implementation stage switch 100% to the factual and motivational family-focused for the final stage. this large group of students is apparently dependent on help from their families not only with motivational issues but also with factual issues, especially timetable and organisation of the work. hence, hypothesis 5 can be partially confirmed. although these students continuously seek help, the contact persons do not vary considerably: these students strongly involve their families to resolve challenges. however, one conclusion that can be drawn from the results is that issues with content simply no longer exist and that these queries must have been resolved with the supervisor beforehand.
this group does not turn to the supervisor during either the implementation stage or the final stage; thus, a performance goal orientation should also be considered, as these students may not intend to embarrass themselves in front of their assessing supervisor (karabenick, 2003; ryan et al., 2001). the third aim of the present research was to examine the extent to which the shs types found per stage are important for academic achievement (q3). against the background of previous studies (butler, 1998, 2006; finney et al., 2018; karabenick, 2003; karabenick & knapp, 1988, 1991; nadler, 1998; nelson-le gall, 1981; ryan et al., 2005), we had assumed that the shs types differ in their academic performance (hypothesis 6). this hypothesis could not be confirmed, because no significant differences between the shs types per stage can be identified. the students examined here were apparently able to adequately assess their need for help and then seek and receive help in such a way that they received a favourable grade on the paper. however, it is another question whether this help-seeking behaviour enhances the student's understanding in the area for which help was sought. this condition would require closer examination of individual help-seeking/help-giving interactions and behaviour and knowledge in future, similar situations. 4.2 conclusion for theory and practice it should be noted for shs theory that shs types can significantly change during task stages and thus within a learning process as well. we have found some shs types that are addressed by the types discovered in previous investigations. nevertheless, through the identification of the thematic focus of shs, other shs types emerge, such as factual supervisor-focused and motivational family-focused. this inference denotes that future investigations should pay more attention to hs as an issue-specific process. furthermore, shs avoidance should not be stated unless the need for help has also been considered. 
this group of students could be quickly misunderstood and mistakenly classified as a 'risk group'. therefore, shs types should be interpreted with caution because they may also substantially change depending on the context. in the context of preparing an academic paper, the students apparently rarely turn to their paper supervisors for help with motivational issues. nevertheless, paper supervisors in particular could teach suitable motivational regulatory strategies to encourage their students to continue their work (dignath & büttner, 2008). in this case, the supervisors must first be aware that their students are struggling with motivational difficulties. social interactions concerning not only content but also motivational issues should be intensified (wolters, 2003). however, this process requires a strong basis of trust between supervisor and student, a relationship that would also allow students to reveal their weaknesses or difficulties (butler & neuman, 1995; newman, 2000; ryan & pintrich, 1997). in addition, 62% of the students who were in the factual supervisor-focused group at the concept development stage switched to the independents. this result illustrates the relevance of close support during the development of the concept for independent work during the implementation stage, which is also described in the coaching literature (e.g. ulmi et al., 2017). most of the students in this study seem to know how to achieve desired grades through help-seeking processes if they have a perceived need for help. nonetheless, for the help-givers, this help should be provided in a way that enables students to help themselves in similar situations (e.g. by developing appropriate motivational regulation strategies). in other words, helpers such as parents, peers and supervisors should not primarily focus on the students' good grades but should instead direct their support toward sustainability and lifelong learning (eu council, 2002). 
4.3 limitations and suggestions for further research directions in this study, we adopted a longitudinal, person-centred perspective on shs types outside the classroom context and integrated important aspects such as perceived need for help, specific issue areas, and the related sources of help. we could therefore show that shs types can vary based on the different issue areas encountered. however, the data evaluated in this study are based on self-reports regarding shs. measurements that are more objective could probably provide deeper and/or extended insights into the shs processes. this aspect especially relates to measurement of the frequency of seeking help. notably, our response options reflected the individually perceived frequency of help seeking and not objective frequency. the same applies to assessing the need for help. these aspects should be considered in future studies. although the recorded academic achievements were not based on self-reports, the students showed a relatively high average grade, which may have caused insignificant differences between the shs types. a comparison of extreme groups and the resulting shs types could lead to extended results. the backgrounds of the individual class members must also be further analysed to improve the understanding of the shs strategies of the students in these classes; the reason is that individual characteristics such as gender (e.g. nadler, 1998) and goal orientation (e.g. butler, 2006; ryan et al., 2001) can affect shs behaviour. the influence of the parents' support on the shs types is an important aspect that is lacking in research up to now. as newman (2000) was able to confirm that experiences from the parental home can influence help seeking, this area should also be integrated into further analyses of the shs types. finally, how different types of shs can influence performance levels, as well as how performance is influenced by switching between classes, should be taken into consideration. 
the outcomes clearly indicate that without the inclusion of analyses relating to family relationships and the cooperation/relationship between the students and the paper supervisor, only vague assumptions can be made about the reasons for the changes. this factor must be taken into account in further analyses. the present analyses are related to the context of producing a compulsory school leaving certificate paper (matura thesis) at the end of the upper secondary school. further analyses in the same or a similar context would be needed to strengthen these results, which could particularly show that shs should be considered context-specifically.

keypoints
students who do not seek help are not necessarily social help-seeking (shs) avoiders, but can also be students who work independently (i.e. students who do not perceive a need for help).
within a long-term challenging task with various key activities, students can switch between shs types.
for the final grade, the shs type per task stage is unimportant in this context.

references
aleven, v., stahl, e., schworm, s., fischer, f., & wallace, r. (2003). help seeking and help design in interactive learning environments. review of educational research, 73, 277-320. https://doi.org/10.3102/00346543073003277
asparouhov, t., & muthén, b. o. (2014). variable-specific entropy contribution. retrieved from https://www.statmodel.com/download/univariateentropy.pdf
backhaus, n., & tuor, r. (2008). leitfaden für wissenschaftliches arbeiten, 7. überarbeitete und ergänzte auflage [guidelines for scientific work, 7th revised and supplemented edition]. zurich, switzerland: department of geography, university of zurich. https://doi.org/10.5167/uzh-10134
boldero, j., & fallon, b. j. (1995). adolescent help seeking: what do they get help for and from whom? journal of adolescence, 18, 193-209. https://doi.org/10.1006/jado.1995.1013
bonati, p., & hadorn, r. (2009). matura- und andere selbständige arbeiten betreuen.
ein handbuch für lehrpersonen und dozierende. 2., überarbeitete und erweiterte auflage [supervising matura and other independent work. a manual for teachers and lecturers. 2nd, revised and extended edition]. bern, switzerland: hep verlag. butler, r. (1998). determinants of help seeking: relations between perceived reasons for classroom help-avoiding and help-seeking behaviors in an experimental context. journal of educational psychology, 90(4), 630-644. https://doi.org/10.1037/0022-0663.90.4.630 butler, r. (2006). an achievement goal perspective on student help seeking and teacher help giving in the classroom: theory, research, and educational implications. in s. a. karabenick & r. s. newman (eds.), help seeking in academic settings: goals, groups, and contexts (pp. 17-34). new york, ny: erlbaum. butler, r., & neuman, o. (1995). effects of task and ego achievement goals on help-seeking behaviors and attitudes. journal of educational psychology, 87, 261-271. https://doi.org/10.1037/0022-0663.87.2.261 dignath, c., & büttner, g. (2008). components of fostering self-regulated learning among students: a meta-analysis on intervention studies at primary and secondary school level. metacognition learning, 3, 231-264. https://doi.org/10.1007/s11409-008-9029-x elliot, a. j., & church, m. a. (1997). a hierarchical model of approach and avoidance achievement motivation. journal of personality and social psychology, 72(1), 218-232. https://doi.org/10.1037/0022-3514.72.1.218 elliot, a. j., & mcgregor, h. a. (2001). a 2 x 2 achievement goal framework. journal of personality and social psychology, 80, 501-519. https://doi.org/10.1037/0022-3514.80.3.501 ellis, s. (1997). strategy choice in sociocultural context. developmental review, 17, 490-524. https://doi.org/10.1006/drev.1997.0444 eu council. (2002). council resolution of 27 june 2002 on lifelong learning. official journal of the european communities, 9. 
retrieved from https://op.europa.eu/en/publication-detail/-/publication/0bf0f197-5b35-4a97-9612-19674583cb5b fallon, b. j., & bowles, t. (1999). adolescent help-seeking for major and minor problems. australian journal of psychology, 51(1), 12-18. https://doi.org/10.1080/00049539908255329 finch, h. w., & bronk, k. c. (2011). conducting confirmatory latent class analysis using mplus. structural equation modeling: a multidisciplinary journal, 18(1), 132-151. https://doi.org/10.1080/10705511.2011.532732 finney, s. j., barry, c. l., horst, s. j., & johnston, m. m. (2018). exploring profiles of academic help seeking: a mixture modeling approach. learning and individual differences, 61, 158-171. https://doi.org/10.1016/j.lindif.2017.11.011 geiser, c. (2011). datenanalyse mit mplus: eine anwendungsorientierte einführung [data analysis with mplus: a practical introduction]. wiesbaden, germany: vs verlag für sozialwissenschaften. https://doi.org/10.1007/978-3-531-92042-9 huber, c., husfeldt, v., lehmann, l., & quesel, c. (2008). projektteil d2: die qualität von maturaarbeiten in der schweiz [project part d2: the quality of matura work in switzerland.]. in f. eberle, k. gehrer, b. jaggi, m. kottnau, m. oepke, c. pflüger, c. huber, v. husfeldt, l. lehmann, & c. quesel (eds.), evaluation der maturitätsreform 1995 (evamar). schlussbericht zur phase ii (pp. 277-352). bern, switzerland: edi, sbf. retrieved from http://edudoc.ch/record/29677/files/web_evamar-komplett.pdf huber, c., lehmann, l., & husfeldt, v. (2011). unterschiedliche rahmenbedingungen bei der realisierung von maturaarbeiten [differing framework conditions for the realisation of matura theses]. revue suisse des sciences de l’éducation, 33(3), 443-460. retrieved from https://www.pedocs.de/volltexte/2015/10122/pdf/szbw_2011_3_huber_ua_unterschiedliche_rahmenbedingungen.pdf järvelä, s. (2011). how does help seeking help? new prospects in a variety of contexts. learning and instruction, 21, 297-299. 
https://doi.org/10.1016/j.learninstruc.2010.07.006 karabenick, s. a. (1998). strategic help-seeking: implications for learning and teaching. mahwah, nj: erlbaum. karabenick, s. a. (2003). seeking help in large college classes: a person-centered approach. contemporary educational psychology, 28, 37-58. https://doi.org/10.1016/s0361-476x(02)00012-7 karabenick, s. a., & berger, j.-l. (2013). help seeking as a self-regulated learning strategy. in h. bembenutty, t. j. cleary, & a. kitsantas (eds.), applications of self-regulated learning across diverse disciplines: a tribute to barry j. zimmerman (pp. 237-261). charlotte, nc: information age publishing. karabenick, s. a., & gonida, e. n. (2018). academic help seeking as a self-regulated learning strategy: current issues, future directions. in d. h. schunk & j. a. greene (eds.), handbook of self-regulation of learning and performance (pp. 421-433). new york, ny: routlege. karabenick, s. a., & knapp, j. r. (1988). help seeking and the need for academic assistance. journal of educational psychology, 80(3), 406-408. https://doi.org/10.1037/0022-0663.80.3.406 karabenick, s. a., & knapp, j. r. (1991). relationship of academic help seeking to the use of learning strategies and other instrumental achievement behavior in college students. journal of educational psychology, 83(2), 221-230. https://doi.org/10.1037/0022-0663.83.2.221 karabenick, s. a., & newman, r. s. (2006).help seeking in academic settings: goals, groups and contexts. mahwah, n.j.: lawrence erlbaum associates. karabenick, s. a., & newman, r. s. (2010). seeking help as an adaptive response to learning difficulties: person, situation, and developmental influences. in s. järvela (ed.), social and emotional aspects of learning (pp. 244-250). kidlington, oxford, uk: elsevier. https://doi.org/10.1016/b978-0-08-044894-7.00610-2 magnusson, d. (1998). the logic and implications of a person-oriented approach. in r. b. cairns, l. r. bergman, & j. 
kagan (eds.), methods and models for studying the individual: essays in honor of marian radke-yarrow (pp. 33-64). thousand oaks, ca: sage. makara, k. a., & karabenick, s. a. (2013). characterizing sources of academic help in the age of expanding educational technology: a new conceptual framework. in s. a. karabenick & m. puustinen (eds.), advances in help-seeking research and applications: the role of emerging technologies (pp. 37-72). charlotte, nc: information age publishing. martens, j. (2003). statistische datenanalyse mit spss für windows [statistical data analysis with spss for windows]. munich, germany: oldenbourg wissenschaftsverlag. https://doi.org/10.1515/9783486815085 mccardle, l., webster, e. a., haffey, a., & hadwin, a. f. (2017). examining students’ self-set goals for self-regulated learning: goal properties and patterns. studies in higher education, 42(11), 2153-2169. https://doi.org/10.1080/03075079.2015.1135117 muthén, b. o., & asparouhov, t. (2011, 27. july). lta in mplus: transition probabilities influenced by covariates. mplus web notes: no. 13. retrieved from http://www.statmodel.com/examples/ltawebnote.pdf muthén, l. k., & muthén, b. o. (1998-2017). mplus user’s guide (8th ed.). los angeles, ca: muthén & muthén. retrieved from https://www.statmodel.com/download/usersguide/mplususerguidever_8.pdf nadler, a. (1998). relationship, esteem, and achievement perspectives on autonomous and dependent help seeking. in s. karabenick (ed.), strategic help seeking: implications for learning and teaching (pp. 61-93). mahwah, new jersey: erlbaum. nagin, d. s. (2005). group-based modeling of development. cambridge, ma: harvard press. https://doi.org/10.4159/9780674041318 nelson-le gall, s. (1981). help-seeking: an understudied problem-solving skill in children. developmental review, 1(224-246). https://doi.org/10.1016/0273-2297(81)90019-8 nelson-le gall, s. (1985). help-seeking behavior in learning.review of research in education , 12, 55-90. 
https://doi.org/10.3102/0091732x012001055 nelson le gall, s., gumerman, r. a., & scott-jones, d. (1983). instrumental help-seeking and everyday problem-solving: a developmental perspective. in b. m. de-paulo, a. nadler, & j. d. fisher (eds.), new directions in helping (pp. 265-281). new york, ny: academic press. newman, r. s. (2000). social influences on the development of children’s adaptive help-seeking: the role of parents, teachers, and peers. developmental review, 20, 350-404. https://doi.org/10.1006/drev.1999.0502 nylund, k. l. (2007). latent transition analysis: modeling extensions and an application to peer victimization. retrieved from http://www.statmodel.com/download/nylund%20dissertation%20updated1.pdf pintrich, p. r., & zusho, a. (2002). the development of academic self-regulation: the role of cognitive and motivational factors. in a. wigfield & j. s. eccles (eds.), development of achievement motivation (pp. 249-284). san diego, ca: academic press. https://doi.org/10.1016/b978-012750053-9/50012-7 rickwood, d. j. (1995). the effectiveness of seeking help for coping with personal problems in late adolescence. journal of youth and adolescence, 24, 685-703. https://doi.org/10.1007/bf01536951 ritschl, v., weigl, r., & stamm, t. (2016). wissenschaftliches arbeiten und schreiben. verstehen, anwenden, nutzen für die praxis [academic work and writing. comprehension, application and benefit for practice]. berlin, germany: springer. https://doi.org/10.1007/978-3-662-49908-5 robins, r. w., john, o. p., & caspi, a. (1998). the typological approach to studying personality. in r. b. cairns, l. r. bergman, & j. kagan (eds.), methods and models for studying the individual: essays in honor of marian radke-yarrow (pp. 135-160). thousand oaks, ca: sage. rost, j. (2006). latent class analysis. in f. petermann & m. eid (eds.), handbuch der psychologischen diagnostik [manual of psychological diagnostics]. göttingen, germany: hogrefe verlag. ryan, a. 
m., patrick, h., & shim, s.-o. (2005). differential profiles of students identified by their teacher as having avoidant, appropriate, or dependent help-seeking tendencies in the classroom. journal of educational psychology, 97(2), 275-285. https://doi.org/10.1037/0022-0663.97.2.275 ryan, a. m., & pintrich, p. r. (1997). ‘‘should i ask for help?’’ the role of motivation and attitudes in adolescents’ help seeking in math class. journal of educational psychology, 89, 329-341. https://doi.org/10.1037/0022-0663.89.2.329 ryan, a. m., pintrich, p. r., & midgley, c. (2001). avoiding seeking help in the classroom: who and why? educational psychology review, 13(2), 93-114. https://doi.org/10.1023/a:1009013420053 ryan, a. m., & shim, s. s. (2012). changes in help seeking from peers during early adolescence: associations with changes in achievement and perceptions of teachers. journal of educational psychology, 104 (4), 1122-1134. https://doi.org/10.1037/a0027696 ryan, a. m., & shin, h. (2011). help-seeking tendencies during early adolescence: an examination of motivational correlates and consequences for achievement. learning and instruction, 21, 247-256. https://doi.org/10.1016/j.learninstruc.2010.07.003 scardamalia, m., & bereiter, c. (1992). text-based and knowledge-based questioning by children. cognition and instruction, 9(3), 177-199. https://doi.org/10.1207/s1532690xci0903_1 schenke, k., lam, a. c., conley, a. m., & karabenick, s. a. (2015). adolescents’ help seeking in mathematics classrooms: relations between achievement and perceived classroom environmental influences over one school year. contemporary educational psychology, 41, 133-146. https://doi.org/10.1016/j.cedpsych.2015.01.003 schworm, s. (2018). lernen in computerbasierten lernumgebungen: instruktionale unterstützungsmöglichkeiten [learning in computer-based learning environments: instructional support possibilities]. in m. heilemann, h. stöger, & a. ziegler (eds.), lernen im internet (pp. 93-112). 
berlin, germany: lit verlag. schworm, s., & fischer, f. (2006). academic help seeking. in h. mandl & h. f. friedrich (eds.), handbuch lernstrategien (pp. 282-239). göttingen, germany: hogrefe. sheskin, d. j. (2011). handbook of parametric and nonparametric statistical procedures. boca raton, fl: chapman & hall/crc press. specht, j., luhmann, m., & geiser, c. (2014). on the consistency of personality types across adulthood: latent profile analyses in two large-scale panel studies. journal of personality and social psychology, 107, 540–556. https://doi.org/10.1037/a0036863 stroebe, w., hewstone, m., codol, j.-p., & stephenson, g. m. (2013). sozialpsychologie: eine einführung. [social psychology: an introduction]. heidelberg, germany: springer verlag. swiss federal council, & edk. (1995). verordnung des bundesrates/reglement der edk über die anerkennung von gymnasialen maturitätsausweisen (mar) vom 16. januar/15. februar 1995 [ordinance of the federal council/regulation of the edk on the recognition of matura certificates (mar) of 16 january/15 february 1995]. bern, switzerland: schweizerischer bundesrat/edk. retrieved from https://edudoc.educa.ch/static/web/aktuell/medienmitt/vo_mar_1995_d.pdf tinsley, h. e. a., de st. aubin, t., & brown, m. (1982). college students' help-seeking preferences. journal of counselling psychology, 29, 523-533. https://doi.org/10.1037/0022-0167.29.5.523 ulmi, m., bürki, g., verhein, a., & marti, m. (2017).textdiagnose und schreibberatung. fachund qualifizierungsarbeiten begleiten [text diagnosis and writing advice: accompanying specialized and qualification work] (2nd ed.). berlin, germany: barbara budrich. wolters, c. a. (2003). regulation of motivation: evaluating an underemphasized aspect of self-regulated learning. educational psychologist, 38(4), 189-205. https://doi.org/10.1207/s15326985ep3804_1 wolters, c. a., pintrich, p. r., & karabenick, s. a. (2003).assessing academic self-regulated learning. 
paper presented at conference on indicators of positive development: definitions, measures, and prospective validity, washington dc. retrieved from https://www.researchgate.net/profile/stuart_karabenick/publication/225229608_assessing_academic_self-regulated_learning/links/5416daec0cf2bb7347db788a/assessing-academic-self-regulated-learning.pdf zimmerman, b. j. (2002). becoming a self-regulated learner: an overview. theory into practice, 41(2). https://doi.org/10.1207/s15430421tip4102_2 zimmerman, b. j., & moylan, a. r. (2009). where metacognition and motivation intersect. in d. j. hacker, j. dunlosky, & a. c. graesser (eds.), handbook of metacognition in education. new york, ny: routledge. frontline learning research 2 (2013) 12-34 issn 2295-3159 corresponding author: jenna vekkaila, department of teacher education, faculty of behavioural sciences, university of helsinki, p.o. box 9 (siltavuorenpenger 7), 00014 helsinki, finland. e-mail: jenna.vekkaila@helsinki.fi http://dx.doi.org/10.14786/flr.v1i2.43 12 | f l r focusing on doctoral students’ experiences of engagement in thesis work jenna vekkaila a , kirsi pyhältö a , kirsti lonka a a university of helsinki, finland article received 13 june 2013 / revised 18 september 2013 / accepted 19 september 2013 / available online 20 december 2013 abstract little is known about what inspires students to be involved in their doctoral process and stay persistent when facing challenges. this study explored the nature of students’ engagement in the doctoral work. altogether, 21 behavioural sciences doctoral students from one top-level research community were interviewed. the interview data were qualitatively content analysed. the doctoral students described their engagement in terms of experiences of dedication and efficiency. they rarely reported experiences of absorption. the primary sources of their engagement in their thesis work were increased sense of competence and relatedness. 
in addition, three qualitatively different forms of engagement in doctoral work, namely adaptive engagement, agentic engagement and work-life inspired engagement, were identified from the doctoral students’ descriptions. further, the students varied in which forms of engagement they emphasised in different phases of their doctoral studies. this study contributed to the literature on doctoral student engagement by examining the nature of engagement at the interface of studying and working and by shedding light on the dual role of doctoral students as both students and professional researchers. moreover, this study broke down the complexity of engagement by identifying qualitatively different experiences and sources of engagement. the results encourage the design of engaging learning environments that promote doctoral students’ experiences of being competent researchers who are integrated into their scholarly community.

keywords: engagement; doctoral education; doctoral experience; scholarly community

1. introduction

doctoral studies are about learning in terms of research work and becoming an acknowledged researcher in a scholarly community. this takes place at the interface of studying and working. conducting doctoral research can be seen as both academic work and studying. doctoral students take their first steps as professional researchers by carrying out doctoral research and teaching undergraduates, both of which can be considered academic work (brew, boud, & namgung, 2011; golde, 1998; turner & mcalpine, 2011). however, doctoral students also take courses in the role of a student (brew et al., 2011; golde, 1998; turner & mcalpine, 2011).
such a dual role at the interface of studying and working is nowadays also required more generally in lifelong training for various professions in business, industry and government by the wider knowledge economy (boud & tennant, 2006; bourner, bowden, & laing, 2001; park, 2005), where the demand for solving complex, ill-defined problems (alexander, 1992; lonka, 1997) is constantly increasing. although doctoral students are highly competent and successful based on their academic backgrounds, earning the doctorate is always a highly challenging process. for instance, the doctoral education literature has identified students’ experienced distress (e.g., hyun, quinn, madon, & lustig, 2006; kurtz-costes, helmke, & ülküsteiner, 2006; toews et al., 1997) and remarkably high attrition rates, varying from 30% to 50% depending on the context (gardner, 2007; golde, 2005; lovitts, 2001; mcalpine & norton, 2006), as major challenges. especially in the social sciences, high attrition rates among doctoral students are a major concern (lovitts, 2001; lovitts & nelson, 2000; mcalpine & norton, 2006; nettles & millet, 2006). so-called “soft” or “ill-defined” domains such as the social and behavioural sciences are characterised by a relatively loose theoretical structure and target of interest as well as unspecific strategies of inquiry (alexander, 1992; biglan, 1973a, 1973b). in such domains researchers often define and are involved in their own individual projects (e.g., lovitts, 2001). an individualistic research structure may therefore promote the idea of independent thinkers (chiang, 2003). however, it can also entail separation, which, in turn, is likely to promote negative experiences (e.g., chiang, 2003; lovitts, 2001) and consideration of interrupting doctoral studies (e.g., stubb, pyhältö, & lonka, 2011).
in order to find ways to support doctoral student persistence, research on doctoral education has for a long time focused on attrition and negative experiences (e.g., golde, 1998, 2005; lovitts, 2001; vassil & solvak, 2012; vekkaila, pyhältö, & lonka, 2013). research among undergraduate students, however, suggests that by focusing on strengths, positive emotions and full functioning (bresó, schaufeli, & salanova, 2011; krause & coates, 2008; ouweneel, le blanc, & schaufeli, 2011), a better understanding of doctoral students’ engagement can be attained. this understanding provides tools for creating increasingly engaging environments for doctoral students (e.g., pontius & harper, 2006). our study aimed at filling the gap in the doctoral education literature by exploring the nature of doctoral students’ engagement in their thesis work in the domain of behavioural sciences.

1.1 engagement in doctoral work

owing to the dual nature of doctoral research, our study draws both on research on work engagement (e.g., schaufeli, martínez, pinto, salanova, & bakker, 2002a; schaufeli, salanova, gonzález-romá, & bakker, 2002b) and on study engagement (e.g., fredricks, blumenfeld, & paris, 2004; reeve, jang, carrell, jeon, & barch, 2004) to examine doctoral student engagement in doctoral work. engagement refers to a student’s active involvement in a task or an activity at hand (e.g., case, 2008; fredricks et al., 2004; reeve et al., 2004). accordingly, doctoral student engagement entails active involvement in the learning opportunities and practices provided by their environments. engagement is characterised by positive, fulfilling experiences including vigour, dedication and absorption (salanova, schaufeli, martínez, & bresó, 2010; schaufeli et al., 2002a, 2002b). vigour refers to high levels of energy and mental resilience while working, the willingness to invest effort in one’s work, and persistence in the face of difficulties (schaufeli et al., 2002b).
dedication, on the other hand, is characterised by a sense of significance, enthusiasm, inspiration, pride and challenge (schaufeli et al., 2002b). being fully concentrated on and immersed in one’s work characterises absorption (schaufeli et al., 2002b). absorption is close to the flow experience in which an individual is deeply immersed in an activity that is intrinsically enjoyable (csikszentmihalyi, 1990). there is evidence that engaged doctoral students were likely to feel effective and satisfied with their thesis work, and remained determined when encountering challenges (virtanen & pyhältö, 2012). in contrast, students who suffered from disengagement from their doctoral studies were likely to feel less satisfied and more likely to give up (vekkaila et al., 2013). moreover, engaged doctoral students have, for instance, been shown to attain better learning outcomes and relationships within their scholarly community (gardner & barnes, 2007). several factors contribute to engagement (e.g., llorens, schaufeli, bakker, & salanova, 2007; reeve et al., 2004; schaufeli & bakker, 2004).
in previous studies on doctoral education, good quality supervision, support and constructive feedback (e.g., golde, 2005; hoskins & goldberg, 2005) as well as meaningful interaction within the scholarly community (e.g., gardner, 2007; deem & brehony, 2000; lovitts, 2001; pyhältö, stubb, & lonka, 2009; stubb et al., 2011) have been identified as predictors of doctoral students’ satisfaction, study persistence and well-being. for instance, weidman and stein (2003) found a link between the number of faculty-student interactions and students’ involvement in their research projects. moreover, ives and rowley (2005) showed that a constructive supervisory relationship was associated with students’ progress and satisfaction with their doctoral studies, and hence their involvement in their thesis projects.

1.2 engagement and dynamic interplay between doctoral students and their environments

the scholarly community often provides the primary work environment for doctoral students (brew et al., 2011; gardner, 2007; mcalpine & amundsen, 2008; pyhältö et al., 2009). hence, doctoral students’ learning is highly embedded in the practices of a scholarly community. however, this community itself is a complex, multilayered, nested entity (mcalpine & norton, 2006) that can be defined as a discipline such as ‘education’, as a faculty, or as a specific research group (e.g., austin, 2002; pyhältö, nummenmaa, soini, stubb, & lonka, 2012a; white & nonnamaker, 2008). accordingly, the community provides various arenas and forms for student participation such as interaction with faculty, participation in international conferences, peer collaboration, working in a research group and teaching undergraduate students (brew et al., 2011; pyhältö & keskinen, 2012).
further, students’ involvement in the various arenas such as conducting research work, attending courses and participating in research collaboration may promote their experiences of dedication to and vigour in earning the doctorate as well as absorption in conducting research work. the previous findings on doctoral education imply that doctoral student engagement is regulated by a complex, dynamic interplay between the student and the environment rather than by a single individual or environmental attribute (e.g., golde, 2005; virtanen & pyhältö, 2012; vekkaila, pyhältö, hakkarainen, keskinen, & lonka, 2012; vekkaila et al., 2013). this implies that doctoral students’ experiences of engagement are constantly constructed and re-constructed in the student-environment interaction. such interaction entails the students’ prior learning experiences, beliefs, goals, and the practices and culture of the environment. doctoral students’ perceptions, participation and other practices are mediated by their prior experiences and knowledge that have developed during their undergraduate studies and in their other professional careers or personal lives. the culture and practices of the environment, in turn, affect doctoral students’ thinking, actions, and engagement. accordingly, the complex doctoral student-learning environment interrelation mediates students’ engagement in the doctoral process. the dynamic interplay between the learner and learning environment (e.g., lindblom-ylänne & lonka, 2000) contributes not only to whether or not students engage in their studies (e.g., fredricks et al., 2004; leiter & bakker, 2010) but also to the ways in which they engage in their studies. accordingly, the quality of the dynamics between the doctoral student and the environment is likely to contribute to the ways the student engages in doctoral work. dynamics between students and their environment can contribute to students’ sense of relatedness, competence, autonomy (deci & ryan, 2002, 2008) and contribution (eccles, 2008). deci and ryan (2002) have proposed that the experiences of relatedness, competence and autonomy are the prerequisites for individuals’ personally meaningful actions and experiences (see the self-determination theory). the sense of relatedness refers to feeling connected to others, having a sense of belonging both with other individuals and with one’s community, and being integral to and accepted by others (deci & ryan, 2002). the sense of competence, in turn, focuses on feeling effective and confident in one’s on-going actions within the social environment and experiencing opportunities to express and exercise one’s capacities (deci & ryan, 2002). when individuals are autonomous they feel as if they are the source of their own actions and behaviour even when those actions are influenced by outside forces (deci & ryan, 2002). that is, their actions are based on their own personal interests and values. furthermore, it is important to feel a sense of contribution when acting in a personally meaningful way (eccles, 2008). thus, the experiences of belonging, competence, autonomy and contribution are necessary in order to promote doctoral students’ engagement (mason, 2012; virtanen & pyhältö, 2012). for instance, appel and dahlgren (2003) found that doctoral students were inspired in their studies by the opportunities available for intellectual development, feelings of having an internal locus of control and academic freedom as a researcher, and chances to make a difference through their doctoral project. in addition, stubb et al.
(2011) and pyhältö and keskinen (2012) more recently found that the doctoral students who experienced their scholarly community in a positive way, that is, as empowering, or who perceived themselves as active agents, less often reported lack of interest in their own studies and considered interrupting their doctoral process less often than those students who had negative experiences or perceived themselves as passive objects. this indicates that doctoral students can be active in certain interaction arenas of a scholarly community whereas in some other communities they may participate infrequently and be more in the role of an observer. this, in turn, is likely to contribute to their engagement in doctoral work. it follows that students’ engagement, in terms of how agentic (reeve & tseng, 2011) they experience themselves to be in their doctoral work, may also vary. at its best, a doctoral student’s peripheral role gradually evolves towards active, relational agency as the student is involved more intensively in the research group’s shared knowledge creation practices, and develops a sense of ownership of their own doctoral research and identity as a researcher (hakkarainen, hytönen, makkonen, seitamaa-hakkarainen, & white, 2013; hopwood, 2010; pyhältö & keskinen, 2012; pyhältö et al., 2012a; vekkaila et al., 2012). relational agency (edwards, 2005) refers to the capacity of doctoral students to work with other members of their research community in order to better respond to complex research problems (pyhältö & keskinen, 2012). this means that doctoral students are not only influenced by the scholarly community but can, at least to some extent, choose their primary arenas in which to participate and take initiative, direct and re-direct their own activity and learning (pyhältö & keskinen, 2012).
therefore, by adopting different strategies, the students can actively modify their environment, and hence their opportunities to engage in the scholarly community in question (virtanen & pyhältö, 2012) and, further, in their doctoral work. sometimes students’ engagement in doctoral work may be inspired mainly by their work-life experience. mäkinen, olkinuora, and lonka (2004) showed that especially in fields such as teacher education, law, and medicine, where the student aimed at professional development rather than abstract, theoretical understanding, the so-called work-life orientation dominated. in previous studies those university students who expressed a so-called work-life orientation were interested in professional development and saw their studying as training for a certain profession or vocation (lonka & lindblom-ylänne, 1996; mäkinen et al., 2004; vermunt, 1996). they appreciated directly useful, concrete and applicable knowledge (lonka & lindblom-ylänne, 1996; mäkinen et al., 2004; vermunt, 1996). such an orientation to studying was considered to reflect practical interest rather than scientific ambition (lonka & lindblom-ylänne, 1996; mäkinen et al., 2004). brint, cantwell, and hannerman (2008), for instance, found in their study on undergraduate students that the culture of engagement in the arts, humanities and social sciences focused on participation and interest in ideas, whereas the culture of engagement in the natural sciences and engineering focused more on improvement of research skills, collaborative study, and the labour market. in our recent study on natural sciences doctoral students, this was not as straightforward: the students’ inspiration and engagement in the doctoral work was often due to a strengthened sense of belonging and participation in the various practices of their research community (vekkaila et al., 2012).
this suggests that students may have different ways of being engaged in their doctoral process and earning the doctoral degree. the present study focused on exploring behavioural sciences students’ engaging doctoral experiences.

2. the aim of this study

this study is a part of a larger national research project on doctoral education in finland that aims to understand the process of phd education (see pyhältö et al., 2009). the present study aimed at gaining a better understanding of doctoral student engagement in thesis work. in our study, the following research questions were addressed:

1. what kinds of experiences of engagement did the doctoral students describe?
2. what were the sources of engagement in doctoral work?
3. were there qualitatively different forms of engagement?

3. method

3.1 doctoral education in finnish context

in finland, doctoral studies are heavily focused on conducting thesis research. there is no extensive separate course work required before launching the doctoral research project. in fact, the course work included in doctoral studies, worth 40 to 80 european credit transfer and accumulation system (ects) credits depending on the discipline, is usually individually constructed and based on personal study plans that typically include international conferences and some methodological studies. in behavioural sciences, an article compilation with a summary has become the dominant form of thesis (66%) during recent years (pyhältö, stubb, & tuomainen, 2011). the article compilation is more dominant in psychology, whereas in educational sciences the dominant form is the monograph (a book format). the article compilation consists of three to five internationally refereed journal articles often co-authored with the supervisors and a summary that includes an introduction and a discussion bringing together the separate articles.
doctoral supervision is usually based on an apprenticeship, in both research groups and supervisor-student dyads (löfström & pyhältö, 2012). in finland, students can conduct doctoral studies full-time or part-time. the general target duration for full-time studies for the doctorate is four years. however, the completion time for the doctorate is often longer than this. according to a recent survey the average time for completing the degree in behavioural sciences is five to six years (sainio, 2010). however, some sources indicate that the average completion time may be longer, ranging from seven to over ten years (pyhältö et al., 2011). this may be explained by the heavy requirements of earning the doctorate. the articles included in the article compilation need to be published in peer-reviewed journals, students need to write the summary of them, the thesis needs to be examined by two or three pre-reviewers, and students need to defend the thesis publicly before the faculty council decides whether to award the doctoral degree. long completion times may also be explained by the nature of the finnish doctoral education system, that is, doctoral studies are free for the students, the licence to conduct doctoral studies is valid for life and students can conduct their doctoral studies part-time and have other professional full-time jobs. although the doctoral education is publicly funded, the students have to cover their costs of living, which is typically done through personal grants, project funding or wages earned by working outside the university (pyhältö et al., 2011). doctoral education in finland is described in more detail by the international postgraduate student mirror (2006) and pyhältö et al. (2012a).

3.2 participants

the participants were 21 behavioural sciences doctoral students (female: 17; male: 4) from a major research-intensive finnish university.
all the participants were from the same case community participating in the larger national research project on doctoral education in finland (see pyhältö et al., 2009), and all of its doctoral students were invited to participate in the study. participation was voluntary. the case community was chosen because it represented a nationally and internationally well-established research group and was considered to be a good representative of the organisation of doctoral education. eleven of the participants were full-time doctoral students and ten were part-time. six participants were pursuing a monograph, seven a summary of articles; eight participants were unsure of the form their theses would take. all the participants had master’s degrees, typically in educational sciences, and they were in different phases of their doctoral process. according to the participants’ own estimates, twelve of them were in the beginning of the doctoral process, meaning that they were typically launching their research projects, collecting and/or analysing data, or writing their first and/or second article. four were in the middle part of the process, which typically included data analysis and writing the monograph, or writing the third and/or fourth article. four of the participants were in the last part of the process, which typically meant finalising the monograph or the last articles and the summary of the articles. one of the participants had already graduated. all the participants were interviewed on a voluntary basis.

3.3 interviews

semi-structured interview (e.g., kvale, 2007) data were collected in 2007–2008. the interviews were designed to investigate the doctoral students’ experiences of their thesis process and their views of themselves within it (see appendix 1).
at the beginning of the interviews, the students were asked some background information questions about their discipline or subject, time spent on their thesis/studies, the phase of the process and time of graduation, as well as the form of the thesis and whether they were working on it full-time or part-time. the interview focused both on the retrospection of previous experiences of the ph.d. process and on the present situation. (stubb, 2012.) the interview was piloted before the actual data collection. in the first stage, it was tested with four doctoral students in behavioural sciences, and minor modifications to the questions were made. then the interview was tested with seven science students and no further modifications were required. all interviews were conducted by a researcher from the authors’ research group (except one, which was done by a trained research assistant). each interview lasted approximately one hour (ranging from thirty minutes to almost three hours). the interviews were recorded and transcribed. (stubb, 2012.)

3.4 analysis

the interview data were qualitatively content analysed (e.g., patton, 1990) by relying on an abductive strategy (e.g., coffey & atkinson, 1996; morgan, 2007). hence, the data observations and prior understanding based on theories were repeatedly assessed in relation to each other in order to acquire an optimal understanding of the phenomenon (coffey & atkinson, 1996; morgan, 2007), that is, doctoral student engagement, when categorising the data. at the beginning of the first analysis phase, all the text segments in which the doctoral students referred to engaging experiences in terms of their doctoral work were coded into the same hermeneutic category by using a grounded strategy (e.g., harry, sturges, & klingner, 2005; mills, bonner, & francis, 2006).
accordingly, all the text segments referring to engaging doctoral experiences from the 21 interviews were grouped together and formed the ground data for further analysis. the unit of analysis included the totality of thought referring to engaging experiences, ranging from a sentence to a dozen sentences. these text segments included expressions of interest, inspiration, energy, devotion, meaningfulness and positive emotions related to the doctoral thesis.

after this, the analysis focused on what the participants experienced, that is, the different qualities of engaging doctoral experiences. data were coded into three exclusive main categories by relying on research on characteristics of engagement introduced in the literature review (e.g., salanova et al., 2010; schaufeli et al., 2002a, 2002b) as follows: (a) dedication including participants’ experiences where they expressed earning the doctorate, being a doctoral student, and conducting research and studies as personally highly meaningful and significant, and entailing strong devotion and positive emotions such as joy, enthusiasm and inspiration; (b) efficiency including participants’ experiences of having willingness to invest effort in their research work and studies, strengthened self-images of themselves as researchers and having the effective and energetic drive to conduct doctoral work, and (c) absorption including participants’ experiences of intensive situations where they experienced being fully concentrated on and engrossed in their research work and studies. the three main categories reflected the main experiences of engagement in doctoral work. the category labelled “efficiency” came close to vigour (e.g., schaufeli et al., 2002a, 2002b); however, the category was named efficiency because in students’ descriptions experiences of strengthened self-efficacy beliefs and an energetic drive with the research work were emphasised.
at the end of the first phase, the analysis focused on what contributed to students’ experiences of engagement in their doctoral work. the text segments in the categories representing the main experiences of engagement were coded into four basic categories according to the primary sources of engagement, that is, its causes as described by the participants, by relying on deci and ryan’s (2002, 2008) as well as eccles’s (2008) works introduced in the literature review: (a) competence including participants’ descriptions where their experiences of engagement in doctoral work were promoted by development of their academic skills and expertise, learning and developing understanding of the domain and own topic, and gaining insights into their own research; (b) relatedness including participants’ descriptions where their experiences of engagement in doctoral work were strengthened by having dialogues and collaboration with supervisors, other researchers and peers, as well as participating in and becoming a valued part of a scholarly community; (c) autonomy including participants’ descriptions where their experiences of engagement in doctoral work were promoted by being in control of their own research work, and following their own interest in their doctoral process, and (d) contribution including participants’ descriptions where their experiences of engagement in doctoral work were strengthened by producing significant scientific knowledge that makes a difference, and seeing the value of their own research in practice. a visualisation of the first analysis phase is provided in figure 1. the agreement between the two classifiers regarding the independent parallel analysis of 30% (f = 36) of the text segments in relation to the main experiences of engagement was 94% and in relation to the sources of engagement was 97%.
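Agreement between two classifiers is commonly summarised both as raw percent agreement and, corrected for chance, as Cohen's kappa. The sketch below shows how the two measures relate. It is illustrative only: the label lists are hypothetical (they are not the study's codes), and the function names `percent_agreement` and `cohens_kappa` are assumptions introduced for this example.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of segments that both classifiers put in the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both classifiers use one and the same single category throughout.
    """
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two classifiers' marginal proportions,
    # summed over all categories used by either classifier.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for ten text segments (NOT the study's data).
coder_1 = ["dedication"] * 4 + ["efficiency"] * 4 + ["absorption"] * 2
coder_2 = ["dedication"] * 3 + ["efficiency"] * 5 + ["absorption"] * 2

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.3f}")
```

Because kappa subtracts the agreement expected from the marginal category frequencies alone, it is always at most as large as the raw percent agreement, which is why both figures are usually reported together.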
interrater reliability measured with cohen’s kappa (κ) in regard to the main experiences of engagement was 0.91 and in regard to the sources of engagement was 0.95, indicating almost complete agreement. the text segments related to the main experiences and sources of engagement were quantified and the relation between them was analysed with cross-tabulation and χ²-tests.

figure 1. a visualisation of the first analysis phase.

in the second phase of the analysis, a person-oriented analysis strategy was applied. the person-oriented analysis focused on identifying the forms of students’ engagement in doctoral work. in practice, each participant’s engaging experiences that were identified at the very beginning of the analysis from the interview data were grouped together, that is, formed a unit of their own, and were separated from the experiences reported by the other participants. at first, the different forms of engagement presented in each participant’s descriptions were investigated to delineate the initial categories by exploring the patterns, that is, differences and similarities in the main experiences and sources of each individual student’s engaging experiences. also, each participant’s engaging experiences were interpreted within the larger interview context. then, the similarities and differences in the main experiences and sources were explored across all participants’ descriptions of engaging experiences. as a result, the experiences were divided into their own categories based on their differences, following the idea that the experiences presenting a certain form of engagement in one category were mutually similar, while being distinct enough from the other categories.
the categories appeared to differ from each other in terms of how the participants expressed: (1) the dynamics between themselves and their scholarly community in the engaging experiences, and (2) the source of inspiration in their doctoral work in the engaging experiences. from the participants’ descriptions three categories representing the qualitatively different forms of engagement in doctoral work were identified: adaptive form of engagement, agentic form of engagement and work-life inspired form of engagement. in adaptive and work-life inspired forms of engagement the dynamic between the doctoral students and their scholarly community was expressed as being static in nature, that is, providing an arena for adjusting and acquiring knowledge, whereas in the agentic form of engagement the dynamic was expressed as being reciprocal, that is, an arena for dialogue. in the students’ expressions the source of inspiration in doctoral work in adaptive form of engagement was adapting and conforming to the current conditions and acquiring the knowledge and skills that were valued in the scholarly community. in turn, in agentic form of engagement creating new knowledge was more emphasised as a source of inspiration in doctoral work, whereas in work-life inspired form of engagement the students highlighted the importance of applying the new knowledge and skills acquired in the scholarly community in order to solve practical problems and contribute to the work-life outside academia. the qualitatively different forms of engagement were also studied in relation to the phase of studies. study phase was determined based on students’ own evaluation of whether they were at the beginning, middle, or end of their own doctoral process. although in the participants’ descriptions typically at least two of the forms of engagement were present, one of the forms was emphasised in their descriptions.
in the results, we provide quotations of participants’ descriptions that were translated from finnish into english.

4. results

the results suggested that there was variation in the participants’ experiences of engagement. the doctoral students’ descriptions of dedication, efficiency and absorption ranged from experiencing their doctoral work as highly meaningful to having energetic drive while conducting it. moreover, the sources of engagement varied from developing an understanding of one’s own research to belonging to the scholarly community. the students also described qualitatively different forms of engagement.

4.1 main experiences of engagement in doctoral work

the participants emphasised experiences of dedication (53%) in their doctoral work (see table 1). for instance, the students perceived earning the doctorate and training as personally meaningful and significant, and described their strong devotion in their doctoral process and interest in their research. they also expressed extremely positive emotions including pleasure, satisfaction and joy. they were also enthusiastic about being doctoral students and pursuing their phds in the training program. for instance, as one of the students described:

i like this graduate school because every time we have here a seminar, i leave it with a growing zeal. i think that conducting research is the right work for me. participating in this graduate school and its seminars really promote my excitement and inspiration. (p10)

the students also often highlighted a sense of efficiency in conducting their doctoral work (40%). they reported positive, strengthened perceptions of their self-efficacy beliefs as researchers and their clear perceptions of the next steps in their research, and ability to organise and steer their own doctoral process. they were also willing to make efforts for their doctoral work and described having active, efficient and energetic drive when conducting it.
as one of the students shared:

when i present my work in different seminars and receive feedback . . . it has a practical influence on my work and then i really need to get to work with my research; then i know what i have to get working on next . . . it gives me energy to conduct my research further and i try to find time to conduct it . . . then my research moves forward . . . (p17)

the students rarely described experiences of absorption in the doctoral work (7%). in these cases, for example, they described intensive episodes during which they were fully immersed in their work, including data analysis or writing the thesis. they were involved in the doctoral work even to the extent that other activities were brushed aside. as one student described:

then came this very intensive period . . . i was in the field collecting data every day for several months . . . i was immersed in the data collection for several years, because i found the situation in the field really interesting. (p21)

4.2 sources of engagement in doctoral work

the sources that the participants identified as contributing to the engaging experiences varied. however, the engagement was often described in relation to learning and developing as a researcher as well as interacting with other researchers. table 1 shows that the participants emphasised an increased sense of competence (39%) as an important source of their engagement. the students’ sense of competence often emerged as development of understanding or new academic skills. hence, their engagement often stemmed from learning and development as scholars. these experiences included, for instance, deepening their understanding of research work and theories, creating new knowledge, and developing their thinking and learning about their themes in more profound ways, as well as providing new insights in their research. as one student remarked:

i think that finding and learning new knowledge is fun.
my supervisor says that i should not read anymore, but when new research is published, i have to read it. i suppose i like to gain new insights and understanding about my research theme. they are really the best experiences in this work. (p15)

almost as often, the students highlighted their sense of relatedness (37%) as a significant source of engagement in their doctoral work (table 1). characteristic of the situations in which the students’ experiences of relatedness were promoted was that they perceived themselves as being actively involved in their scholarly community, and as having a sense of belonging to it and being valued by others. they also described various participation and interaction arenas including research collaboration, receiving constructive feedback and discussions, and sharing interest and expertise with more experienced researchers, supervisors, and peers on research work in general and especially on their own doctoral research. as one of the students described:

usually i become inspired by our seminars and discussions. the first thing that comes to my mind is professor h’s ways of stating concepts. he somehow makes theories clearer and adds new perspectives. i have also participated in a group where we have discussed the doctoral theses of other advanced doctoral students and through those discussions i have had many new ideas . . . i get the feeling that it is wonderful that i am able to do this and it is amazing to be here, that this work is really fun. (p1)

sometimes the students described their sense of autonomy (13%) as the source of engagement in their doctoral work. they expressed the significance of being able to conduct research work that reflected their personal interests, was based on their own decisions, was under their own control, and was defined on their own terms even though they often worked in research projects with other researchers.
as one student commented:

that seminar began and there we read the central texts related to the theory together in our graduate school group. it was an amazing time and we were given time and space to think . . . it was really nice time . . . [it was a] time when i did not have to limit myself and had the freedom to do and be. (p16)

less often, the students expressed a sense of contribution (11%) to be a source of their engagement. when the students described a sense of contribution, they typically reported the importance of being able to produce original and significant scientific knowledge and to develop an understanding of the research themes that would be valued and make a difference especially in the practical work-life outside academia. as one of the students shared:

it really inspired me that some group with our support would innovate and develop a new way of performing and working and they would begin to apply it in practice. it is inspiring to be involved in those processes. i think that this research is useful and i can have an impact on something larger through this work. (p3)

further investigation showed that there was a relation between the main experiences and sources of engagement (χ² = 13.42, df = 6, p = 0.037). the sources of students’ dedication and efficiency in terms of their doctoral work were typically their strengthened senses of competence and relatedness (table 1).
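The reported test statistic can be checked against the cell counts in table 1. The sketch below is a minimal pure-Python illustration, not the authors' analysis script. It assumes that the blank contribution/absorption cell is a count of zero, and it uses the closed-form chi-square tail probability that holds only for even degrees of freedom (here df = 6). The function names are assumptions introduced for this example.

```python
import math

# Counts from table 1 (rows: competence, relatedness, autonomy, contribution;
# columns: dedication, efficiency, absorption).
# Assumption: the blank contribution/absorption cell is treated as 0.
observed = [
    [19, 24, 4],
    [22, 20, 3],
    [12, 2, 1],
    [11, 2, 0],
]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive, even df")
    half = statistic / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


chi2, df = chi_square_independence(observed)
p_value = chi_square_p_value_even_df(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

Under these assumptions the sketch gives χ² ≈ 13.42 with df = 6 and p ≈ 0.037, in line with the reported values. Several expected cell counts are small (below 5), so the asymptotic p-value should be read with some caution.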
table 1
the main experiences and sources of engagement in doctoral work (based on 120 engaging experiences reported by the participants)

                        experiences of engagement
sources of engagement   dedication f (%)   efficiency f (%)   absorption f (%)   total f (%)
competence              19 (16%)           24 (20%)           4 (3%)             47 (39%)
relatedness             22 (18%)           20 (16%)           3 (3%)             45 (37%)
autonomy                12 (10%)           2 (2%)             1 (1%)             15 (13%)
contribution            11 (9%)            2 (2%)             -                  13 (11%)
total                   64 (53%)           48 (40%)           8 (7%)             120 (100%)

4.3 qualitatively different forms of engagement in doctoral work

our person-oriented analysis showed that the participants’ descriptions included three qualitatively different forms of engagement (see table 2). in each form the dynamics between the doctoral students and their scholarly community as well as the source of inspiration in doctoral work were expressed differently by the students.

the first category was labelled adaptive form of engagement, where the students emphasised their experiences of dedication and efficiency through adapting and adjusting to their scholarly community and its research traditions and practices. such experiences reflected a static, one-directional relation between the students and their scholarly community. the students usually reported their relatedness to their own research community which provided the arena for acquiring knowledge from more experienced researchers, for instance, through supervision and following theoretical discussions. the students expressed adapting and conforming to the current conditions and acquiring the knowledge and skills that were valued in their scholarly community as the significant source of inspiration in doctoral work. such knowledge and skills included, for instance, writing skills and gaining the relevant theoretical understanding. being able to conduct the research according to the community’s framework and criteria was also important.
the students, for instance, described adaptive engagement in relation to their supervision and research as follows:

i got a good feeling when i exchanged a few words with my supervisor. then it was all clear how i should continue my work . . . i learned something relevant or gained insights, because this is a new world for me . . . (p1)

overall this graduate school has been rewarding because it was a new experience to create the research plan but at the same time i could see what others had done and from others’ work i got some hints . . . i made notes and out of that mess i gradually came up with a logical vision and started to lay out my research plan. (p19)

such adaptive form of engagement was most often described by students who were at the beginning of their doctoral process.

in the second category, agentic form of engagement, the students emphasised their experiences of dedication and efficiency through a dialogical relationship between themselves and their scholarly community. such experiences reflected an active and re-forming interplay between the students and their community. the students also perceived their relatedness to both their own research community and the larger scholarly environment including international conferences that provided an arena for sharing research ideas, receiving constructive feedback and collaboration. the students highlighted creation of new knowledge as the important source of their inspiration in doctoral work. this included, for instance, being able to redefine their own research work in relation to their research community’s framework, becoming autonomous and working on their own terms, and being able to argue their own point of view when contributing to their scholarly community. for example, the students expressed their agentic engagement in terms of dialogues with others and their own research work as follows:

the most rewarding for me are the moments when i can share my thinking with others . . . for instance, i have those experiences where there were interesting discussions and i could present my point of view and we can develop some insights . . . i have found pleasure in those encounters in the field, or with my supervisor, when she can follow my ideas and clarify them, or through some e-mail conversations with a colleague. of course, these experiences require that i must also write something and then share it with others. (p16)

at first, i did not know much and i was all at sea about on what theme i should focus my research; it was quite superficial . . . now i have hope . . . i have familiarised with it little by little and now i develop and cherish my own ideas. now i feel that it is my own project, more than before . . . (p12)

the agentic form of engagement was typically reported by those students who were either halfway through or at the end of their process.

the third category was labelled work-life inspired form of engagement, in which the students emphasised the influence of their professional lives on their dedication to their doctoral work. such experiences reflected three-directional relations between the students, their work-life outside academia and their research community. typically, the research community where they were receiving doctoral training provided the arena for acquiring such theoretical knowledge and research skills that extended their understanding of their research questions that had evolved from their work-life contexts. the students emphasised applying the new knowledge and skills in order to solve practical problems and contribute to the work-life outside academia as the significant source of inspiration in doctoral work. the students described their work-life inspired engagement, for instance, as follows:

these were those moments of insight. i really understand my [professional] work now in a more profound way and can combine concepts that i have not previously realised to be related.
i find answers to those questions from practical problems that i have seen in my own work . . . and i have gained a lot from the graduate school seminars where there have been discussions on these ideas . . . now, for instance, i have read a doctoral thesis and then i have gained some new insights into my own data and concepts, and through those concepts i can understand better my data . . . (p4)

actually the inspiring experiences and moments of joy or inspiration related to doctoral studies arise when i lead the groups involved in the project . . . their own zeal also encouraged me to continue and the idea that my research work could make a difference and support these practices in the future. (p10)

such work-life inspired form of engagement was reported by the students in different phases of their doctoral process.

table 2
qualitatively different forms of engagement (based on the person-oriented analysis of the participants’ engaging experiences)

adaptive
  what kind of dynamic exists between the doctoral students and their scholarly community: dedication and efficiency through a one-directional relation where the scholarly community provides the arena for the students to adjust and acquire knowledge
  the source of inspiration in doctoral work: dedication and efficiency through conforming to the current conditions and acquiring the knowledge and skills valued in the scholarly community

agentic
  what kind of dynamic exists between the doctoral students and their scholarly community: dedication and efficiency through a dialogical relation between the students and the scholarly community where both the students and the community re-form
  the source of inspiration in doctoral work: dedication and efficiency through creating new knowledge in relation to the scholarly community’s theoretical framework, being able to work on their own terms and develop their own points of view

work-life inspired
  what kind of dynamic exists between the doctoral students and their scholarly community: dedication through a three-directional relation where the scholarly community provides the arena for the students to acquire knowledge to answer questions that have evolved from their work-life outside academia
  the source of inspiration in doctoral work: dedication through applying the scholarly community’s theoretical knowledge and research skills in order to solve practical problems and contribute to the work-life outside academia

5. discussion

5.1 theoretical reflections and implications

engaging doctoral experience has rarely been explored in either the doctoral education or the engagement literature. hence, our study provided new insight into doctoral student engagement by breaking down the complexity of engagement and identifying qualitatively different experiences and sources of engagement. results showed that the main experiences of engagement in doctoral work were dedication and efficiency. experiences of absorption were rarely reported. our findings were in line with the previous findings of work engagement research carried out in other work-life contexts and among undergraduate students where engagement is explored in terms of dedication, vigour and absorption (e.g., bresó et al., 2011; krause & coates, 2008; ouweneel et al., 2011; salanova et al., 2010; schaufeli et al., 2002a, 2002b). this implies that previous research on work engagement (e.g., salanova et al., 2010; schaufeli et al., 2002a, 2002b) provides a functional framework for exploring students’ engagement in their doctoral work.

further investigations showed that the primary sources for engaging doctoral experiences were increased sense of competence and relatedness. the students sometimes reported a sense of autonomy and contribution as sources for engagement in their doctoral work.
this is in line with previous research suggesting that students’ and workers’ self-motivation, optimal functioning and psychological well-being are fostered when their senses of relatedness, competence, autonomy (e.g., deci & ryan, 2008; niemic & ryan, 2009; see also mason, 2012; virtanen & pyhältö, 2012) and contribution (eccles, 2008; see also virtanen & pyhältö, 2012) are promoted. however, the findings here further clarify the understanding of the different sources of engagement in doctoral work. in our findings the experiences of competence and relatedness were emphasised. this may reflect the development of engagement during the doctoral process. in the present study, the doctoral students’ dedication and sense of efficiency appeared to be strengthened when they developed their competences as researchers and became more related to their scholarly community. it may be that when students perceive themselves as competent and acknowledged members, experiences of having autonomy and making contributions become more salient. in addition, our results confirmed the previous findings suggesting that students’ feelings of belonging and participation in a scholarly community contribute to their positive experiences, wellbeing as well as satisfaction with and persistence in doctoral studies (deem & brehony, 2000; golde, 2005; hoskins & goldberg, 2005; lovitts, 2001; pyhältö et al., 2009; pyhältö, vekkaila, & keskinen, 2012b; stubb et al., 2011). the results of our study also provided new insights by demonstrating how students’ experiences of belonging were significant in terms of their engagement in doctoral work. the significance of experienced belonging among the behavioural science doctoral students may have to do with the nature of the research in their discipline. as part of the soft sciences, the behavioural sciences are sometimes characterised by solitary research work in libraries, archives or in the field (lovitts, 2001).
one would therefore expect that relatedness would not be as important. yet in our participants’ reports, the possibility of experiencing oneself as a valued, acknowledged member of a scholarly community was important. however, some students in these fields may also work in research groups (e.g., austin, 2010), for instance, in archaeology.

in addition, different forms of engagement, including adaptive engagement, agentic engagement and work-life inspired engagement, were identified. to our knowledge, qualitatively different forms of engagement have not been previously reported among university students. hence, this study contributed to the literature on doctoral student engagement by opening up the nature of engagement at the interface of studying and working and by shedding light on the dual role of doctoral students as both students and professional researchers. it is possible that the varying forms of engagement reflect the different meanings of doctoral work that were given by the participants (e.g., meyer, shanahan, & laugksch, 2005; stubb, pyhältö, & lonka, 2012a, 2012b). for instance, to some extent our results resembled the different perceptions of doctoral research found by stubb et al. (2012b). in their research the doctoral students perceived research work as 1) “a personal learning process”, 2) a “job to do”, 3) “making a contribution” and 4) “obtaining qualifications and gaining accomplishments”. the first category and the agentic and work-life inspired forms of engagement overlap with one another, since in all of them the participants emphasised the significance of exploring something that was defined on one’s own terms or was personally interesting. in turn, the second category and the adaptive form of engagement resemble each other, because, for both of these, the participants highlighted doctoral research as an activity in which they follow the traditions and practices of the scholarly community or as a means of fulfilling the community’s requirements for a doctorate.
in addition, in the third category, answering interesting questions that made a difference was viewed as meaningful to the doctoral students, and, hence, has similarities with work-life inspired engagement. however, in work-life inspired engagement, the contribution focused mainly on professional contexts outside academia, whereas in the third category, the contribution focused both on the discipline and society. moreover, the fourth category of “accomplishment” not only included demonstrating one’s excellent performance, but also the creation of new knowledge and, therefore, has similarities with the agentic form of engagement. however, gaining merit and status were also emphasised in this particular category, but were not expressed by the participants in relation to agentic engagement. hence, it may be that the sources of inspiration in doctoral work at least partially reflect the students’ motives, goals and aspirations related to their phds. furthermore, the meanings of doctoral research given by students and their goals for earning the doctorate may affect what kinds of scholarly identities (e.g., pyhältö et al., 2012a) doctoral students construct, for instance, a professionally oriented one, and also their engagement in the doctoral process. if students perceive the meaning of doctoral work to be obtaining qualifications for work-life outside university and construct their identity through their professional careers, this is likely to be reflected in their engagement in the doctoral process. then it may be that doctoral experiences that promote the connection between the doctoral work and professional life, and the practical meaning and value of the doctorate, are likely to enhance students’ engagement in doctoral work. in turn, experiences that do not enable students to make a meaningful connection between the doctorate and their aspirations may reduce their engagement in their doctoral work.
moreover, our findings suggested that the qualitatively different forms of engagement were emphasised differently by the participants in different phases of the doctoral process. adaptive engagement was more often described by the students who were at the beginning of their doctoral process, and agentic engagement by those students who were either halfway through or at the end of their process. work-life inspired engagement was reported by the students in all phases of the doctoral process. a reason for the adaptive form of engagement being emphasised at the beginning of the doctoral process may be that doctoral students’ active agency and participation in their scholarly communities increase over time as they progress in their thesis process (e.g., hakkarainen et al., 2013; hopwood, 2010; pyhältö & keskinen, 2012).

5.2 methodological reflections and its limitations

in this study, semi-structured interview data were collected, and qualitative content analysis relying on an abductive strategy that combined both grounded and theory-guided analyses (e.g., coffey & atkinson, 1996; harry et al., 2005; kvale, 2007; mills et al., 2006; morgan, 2007; patton, 1990) was used to identify the students’ experiences of engagement in doctoral work. engagement has typically been investigated by using quantitative methods (e.g., ouweneel et al., 2011; salanova et al., 2010; schaufeli et al., 2002a, 2002b). the strength of our approach was that it allowed us to explore students’ experiences of engagement in a profound manner and provided insights into the various aspects of engagement in doctoral work. certain challenges are involved in using a retrospective approach (e.g., cox & hassard, 2007). the participants’ experiences and their overall life situations are often difficult to recall and sum up in a single interview (kvale, 2007). accordingly, the retrospection was likely to have affected the data, including a generalisation of experiences.
the retrospective approach and semi-structured interviews also had their advantages (e.g., cox & hassard, 2007). the reflective and process-oriented design gave the participants an opportunity to reflect on their doctoral journey and identify significant experiences in it. this resulted in rich data and ensured that the participants recalled and reported only significant experiences. moreover, we explored the engagement among 21 behavioural sciences doctoral students who were conducting their thesis research in one top-level research community. because of the distinctive features of the discipline (e.g., lindblom-ylänne, trigwell, nevgi, & ashwin, 2006; mccune & hounsell, 2005) and the limited sample size, generalising the results to other disciplines and other countries should be done with caution. however, we have looked at doctoral students’ experiences in other domains, and, for instance, the results here resemble our recent findings (vekkaila et al., 2012) regarding natural sciences students’ significant engaging and disengaging doctoral experiences. further longitudinal studies are needed to explore the development of engagement (e.g., demerouti, bakker, nachreiner, & schaufeli, 2001) in doctoral work among doctoral students from different domains and countries. this may provide a better understanding, for instance, of whether students experience their engagement in their thesis work differently in various domains and at the different phases of the doctoral process. also, the relation between engagement in the doctoral process, the meanings of earning the doctorate given by the students and the development of a scholarly identity is worthy of further investigation.

5.3 educational implications

in terms of developing more engaging learning environments for doctoral students, our findings imply that engagement is not a singular entity; instead it is multidimensional and entails various qualities.
doctoral students may experience engagement in their doctoral work in varying ways, and hence it is one matter to be dedicated to doctoral research and another to experience oneself as an efficient researcher or to experience absorption in the research activities at hand. dedicated doctoral students are likely to be engaged in their doctoral work by their sense of significance, commitment and positive thesis-related emotions. students who feel efficient, in turn, are likely to express their engagement through their positive self-images as researchers and by their energetic actions, whereas absorption is likely to entail students’ full concentration and being totally immersed in their study or research activities for certain periods of time. accordingly, the ways to support doctoral student engagement need to be diverse. our results implied that doctoral students’ engagement in doctoral work can be supported by enhancing their experiences of being competent researchers and integrated into their scholarly community. such experiences can be supported by, for instance, facilitating doctoral students’ participation in collaborative academic practices. an example of a practice that is likely to promote students’ engagement is a learning community formed around certain academic activities (zhao & kuh, 2004) which are designed to strengthen students’ positive self-efficacy beliefs, provide academic challenges, and involve active and collaborative learning techniques, interaction opportunities and social support (bresó et al., 2011; overall, deane, & peterson, 2011; umbach & wawrzynski, 2005). this could be applied in doctoral education both in supervisory meetings and in different academic groups that support, for instance, peer learning, writing processes, dialogues and collaborative problem solving (aitchison & lee, 2006; boud & lee, 2005; lonka, 2003). it is interesting that the doctoral students rarely described experiences of absorption in their doctoral work.
a reason for this may be that the experiences of absorption remain an unidentified or unused resource for supporting students’ engagement in their doctoral work. absorption resembles the flow experience (csikszentmihalyi, 1990); hence, the emergence of such an intrinsically enjoyable experience can be fostered by optimising the balance between the challenges of learning tasks and students’ experienced competence (e.g., inkinen et al., 2013). for instance, in their recent study on university students, inkinen et al. (2013) noted that, although positive and active emotions are only one aspect of the complex flow experience, these kinds of emotions occurred when the perceived challenge and required skills were both very high and in balance. the balance may be reached by providing doctoral students with the resources they need, such as supervision, constructive feedback on their learning and development as a researcher, peer support, and control over their own research. then, when doctoral students experience a balance between their resources and the unique challenges set by the doctoral research and work intensively at the edge of their competences, they are more likely to experience absorption. moreover, based on our results, doctoral students’ engagement in their doctoral work may be facilitated by shared meaning-making among doctoral students and supervisors regarding their goals for the doctorate and the meanings of research work given by both students and supervisors. in practice, this can be supported, for instance, by encouraging supervisors and students to reveal and elaborate on their perceptions in supervisory discussions. such elaborations may provide a tool that supports supervisors and students in constructing a shared understanding of the focus of supervision.
supervisory discussions on the goals and perceptions of doctoral research are especially important at the beginning of the doctoral process, when supervisory relationships are formed and students plan and get started with their doctoral projects. golde (1998), for instance, showed that one of the main reasons for doctoral students leaving their studies during the first year was a mismatch between the students’ and supervisors’ goals, expectations and practices. at the same time, there may be both individual and contextual variations. doctoral students, supervisors and other members of a scholarly community face more and less difficult times. there is also the reciprocal, continuously evolving relation between students and their environments in which engagement is constructed. it follows that both the students and the scholarly community need to be constantly adjusting. the results of our study can be used both by students themselves, for preparing for the doctoral process and considering meaningful and active participation strategies, and by supervisors and other doctoral educators, for supporting their students’ engagement in the best possible ways. designing more engaging learning environments for today’s doctoral students is also an investment in future academics and other knowledge workers. doctoral students’ experiences of engagement are likely to have long-lasting effects. for instance, stubb et al. (2012a) demonstrated a relation between doctoral students’ perceptions of their doctoral project, well-being and engagement. the results showed that participants who perceived their doctoral research as a process (e.g., learning and developing as a researcher) reported less stress, exhaustion, anxiety and lack of interest than students who perceived their research as a product (e.g., career qualification) or both as a process and product.
moreover, those students who reported process-related meaning had less frequently considered interrupting their studies than others. accordingly, students’ experiences of engagement during their doctoral process may function as a basis for their further engagement and well-being.

keypoints

the engaging doctoral experience has rarely been explored in either the doctoral education or the engagement literature.
this study provided new insight into doctoral student engagement by breaking down the complexity of engagement and identifying qualitatively different experiences and sources of engagement.
this study contributed to the literature on doctoral student engagement by opening up the nature of engagement at the interface of studying and working and by shedding light on the dual role of doctoral students as both students and professional researchers.
the results encourage designing engaging learning environments for doctoral students that promote their experiences of being competent researchers and integrated into their scholarly community.
the relation between engagement in the doctoral process, the meanings of earning the doctorate given by the students, and the development of a scholarly identity is worthy of further investigation.

acknowledgements

the work has been supported by a grant from the finnish cultural foundation to the first author, grant 2106008 from the university of helsinki, and grant 1121207 from the academy of finland.

references

aitchison, c., & lee, a. (2006). research writing: problems and pedagogies. teaching in higher education, 11(3), 265–278. doi:10.1080/13562510600680574
alexander, a. (1992). domain knowledge: evolving themes and emerging concerns. educational psychologist, 27(1), 33–51. doi:10.1207/s15326985ep2701_4
austin, a. e. (2002). preparing the next generation of faculty: graduate school as socialization to the academic career. the journal of higher education, 73(1), 94–122. doi:10.1353/jhe.2002.0001
austin, a. e. (2010).
expectations and experiences of aspiring and early career academics. in l. mcalpine & g. åkerlind (eds.), becoming an academic. international perspectives (pp. 18–44). united kingdom: palgrave macmillan.
appel, m., & dahlgren, l. (2003). swedish doctoral students’ experiences on their journey towards a phd: obstacles and opportunities inside and outside the academic building. scandinavian journal of educational research, 47(1), 89–110. doi:10.1080/0031383032000033380
biglan, a. (1973a). the characteristics of subject matter in different academic areas. journal of applied psychology, 57(3), 195–203. doi:10.1037/h0034701
biglan, a. (1973b). relationship between subject matter characteristics and the structure and output of university departments. journal of applied psychology, 57(3), 204–213. doi:10.1037/h0034699
boud, d., & lee, a. (2005). peer learning as pedagogic discourse for research education. studies in higher education, 30(5), 501–516. doi:10.1080/03075070500249138
boud, d., & tennant, m. (2006). putting doctoral education to work: challenges to academic practice. higher education research & development, 25(3), 293–306. doi:10.1080/07294360600793093
bourner, t., bowden, r., & laing, s. (2001). professional doctorates in england. studies in higher education, 26(1), 65–88. doi:10.1080/03075070020030724
bresó, e., schaufeli, w. b., & salanova, m. (2011). can a self-efficacy-based intervention decrease burnout, increase engagement, and enhance performance? a quasi-experimental study. higher education, 61(4), 339–355. doi:10.1007/s10734-010-9334-6
brew, a., boud, d., & namgung, s. u. (2011). influences on the formation of academics: the role of the doctorate and structured development opportunities. studies in continuing education, 33(1), 51–66. doi:10.1080/0158037x.2010.515575
brint, s., cantwell, a. m., & hannerman, r. a. (2008). two cultures: undergraduate academic engagement. uc berkeley: center for studies in higher education.
doi:10.1007/s11162-008-9090-y
case, j. (2008). alienation and engagement: development of an alternative theoretical framework for understanding student learning. higher education, 55(3), 321–332. doi:10.1007/s10734-007-9057-5
chiang, k. (2003). learning experiences of doctoral students in uk universities. international journal of sociology and social policy, 23(1/2), 4–32. doi:10.1108/01443330310790444
coffey, a., & atkinson, p. (1996). making sense of qualitative data. complementary research strategies. thousand oaks, ca: sage.
cox, j. w., & hassard, j. (2007). ties to the past in organization research: a comparative analysis of retrospective methods. organization, 14(4), 475–497. doi:10.1177/1350508407078049
csikszentmihalyi, m. (1990). flow: the psychology of optimal experience. new york: harper-perennial.
deem, r., & brehony, k. j. (2000). doctoral students access to research cultures – are some unequal than others? studies in higher education, 25(2), 149–165. doi:10.1080/713696138
deci, e. l., & ryan, r. m. (2002). an overview of self-determination theory: an organismic-dialectical perspective. in l. deci & r. m. ryan (eds.), handbook of self-determination research (pp. 3–33). rochester: the university of rochester press.
deci, e. l., & ryan, r. m. (2008). facilitating optimal motivation and psychological well-being across life’s domains. canadian psychology, 49(1), 14–23. doi:10.1037/0708-5591.49.1.14
demerouti, e., bakker, a. b., nachreiner, f., & schaufeli, w. b. (2001). the job demands – resources model of burnout. journal of applied psychology, 86(3), 499–512. doi:10.1037/0021-9010.86.3.499
eccles, j. s. (2008). agency and structure in human development. research in human development, 5(4), 231–243. doi:10.1080/15427600802493973
edwards, a. (2005). relational agency: learning to be a resourceful practitioner. international journal of educational research, 43(3), 168–182. doi:10.1016/j.ijer.2006.06.010
fredricks, j. a., blumenfeld, p. c., & paris, a. h.
(2004). school engagement: potential of the concept, state of the evidence. review of educational research, 74(1), 59–109. doi:10.3102/00346543074001059
gardner, s. k. (2007). “i heard it through the grapevine”: doctoral student socialization in chemistry and history. higher education, 54(5), 723–740. doi:10.1007/s10734-006-9020-x
gardner, s. k., & barnes, b. j. (2007). graduate student involvement: socialization for the professional role. journal of college student development, 48(4), 1–19. doi:10.1353/csd.2007.0036
golde, c. m. (1998). beginning graduate school: explaining first-year doctoral attrition. new directions for higher education, 101, 55–64. doi:10.1002/he.10105
golde, c. m. (2005). the role of the department and discipline in doctoral student attrition: lessons from four departments. the journal of higher education, 76(6), 669–700. doi:10.1353/jhe.2005.0039
hakkarainen, k., hytönen, k., makkonen, j., seitamaa-hakkarainen, p., & white, h. (2013). interagency, collective creativity, and academic knowledge practices. in a. sannino & v. ellis (eds.), learning and collective creativity: activity-theoretical and socio-cultural studies. london, england: routledge, taylor & francis group.
harry, b., sturges, k. m., & klingner, j. k. (2005).
mapping the process: an exemplar of process and challenge in grounded theory analysis. educational researcher, 34(2), 3–13. doi:10.3102/0013189x034002003
hopwood, n. (2010). a sociocultural view of doctoral students’ relationships and agency. studies in continuing education, 32(2), 103–117. doi:10.1080/0158037x.2010.487482
hoskins, c. m., & goldberg, a. d. (2005). doctoral student persistence in counselor education programs: student-program match. counselor education and supervision, 44(3), 175–188. doi:10.1002/j.15566978.2005.tb01745.x
hyun, j. k., quinn, b. c., madon, t., & lustig, s. (2006). graduate student mental health: needs assessment and utilization of counseling services. journal of college student development, 47(3), 247–266. doi:10.1353/csd.2006.0030
inkinen, m., lonka, k., hakkarainen, k., muukkonen, h., litmanen, t., & salmela-aro, k. (2013). the interface between core affects and the challenge–skill relationship. journal of happiness studies. an interdisciplinary forum on subjective well-being. doi:10.1007/s10902-013-9455-6
international postgraduate student mirror. (2006). catalonia, finland, ireland and sweden. högskoleverket, swedish national agency for higher education, 29r. retrieved from http://www.ub.edu/depdibuix/ir/0629r-shv_se-catalonia.pdf
ives, g., & rowley, g. (2005). supervisor selection or allocation and continuity of supervision: ph.d. students’ progress and outcomes. studies in higher education, 30(5), 535–555. doi:10.1080/03075070500249161
krause, k-l., & coates, h. (2008). students’ engagement in first-year university. assessment & evaluation in higher education, 33(5), 493–505. doi:10.1080/02602930701698892
kurtz-costes, b., helmke, a. l., & ülkü-steiner, b. (2006). gender and doctoral studies: the perceptions of phd students in an american university. gender & education, 18(2), 137–155. doi:10.1080/09540250500380513
kvale, s. (2007). doing interviews. london: sage publications.
leiter, m. p., & bakker, a. b. (2010).
work engagement: an introduction. in a. b. bakker & m. p. leiter (eds.), work engagement: a handbook of essential theory and research (pp. 1–9). london and new york: psychology press.
lindblom-ylänne, s., & lonka, k. (2000). interaction between learning environment and expert learning. lifelong learning in europe, 5(2), 90–97.
lindblom-ylänne, s., trigwell, k., nevgi, a., & ashwin, p. (2006). how approaches to teaching are affected by discipline and teaching context. studies in higher education, 31(3), 285–298. doi:10.1080/03075070600680539
llorens, s., schaufeli, w., bakker, a., & salanova, m. (2007). does a positive gain spiral of resources, efficacy beliefs and engagement exist? computers in human behavior, 23(1), 825–841. doi:10.1016/j.chb.2004.11.012
lonka, k. (1997). explorations of constructive processes in student learning. a doctoral dissertation. helsinki: university press.
lonka, k. (2003). helping doctoral students to finish their theses. in l. björk, g. bräuer, l. rienecker, g. ruhmann, & p. stray jørgensen (eds.), teaching academic writing across europe (pp. 113–131). dordrecht, the netherlands: kluwer university press.
lonka, k., & lindblom-ylänne, s. (1996). epistemologies, conceptions of learning, and study practices in medicine and psychology. higher education, 31(1), 5–24. doi:10.1007/bf00129105
lovitts, b. e. (2001). leaving the ivory tower: the causes and consequences of departure from doctoral study. lanham, md: rowman and littlefield.
lovitts, b. e., & nelson, g. (2000). the hidden crisis in graduate education: attrition from ph.d. programs. academe, 86(6), 44–50. doi:10.2307/40251951
löfström, e., & pyhältö, k. (2012). the supervisory relationship as an arena for ethical problem-solving. education research international. doi:10.1155/2012/961505
mason, m. m. (2012). motivation, satisfaction, and innate psychological needs. international journal of doctoral studies, 7, 259–277.
retrieved from http://ijds.org/volume7/ijdsv7p259277mason0345.pdf
mcalpine, l., & amundsen, c. (2008). academic communities and developing identity: the doctoral student journey. in p. richards (ed.), global issues in higher education (pp. 57–83), ny: nova publishing.
mcalpine, l., & norton, j. (2006). reframing our approach to doctoral programs: an integrative framework for action and research. higher education research & development, 25(1), 3–17. doi:10.1080/07294360500453012
mccune, v., & hounsell, d. (2005). the development of students’ ways of thinking and practicing in three final-year biology courses. higher education, 49(3), 255–289. doi:10.1007/s10734-004-6666-0
meyer, j. h. f., shanahan, m. p., & laugksch, r. c. (2005). students’ conceptions of research i: a qualitative and quantitative analysis. scandinavian journal of educational research, 49(3), 225–244. doi:10.1080/00313830500109535
mills, j., bonner, a., & francis, k. (2006). the development of constructivist grounded theory. international journal of qualitative methods, 5(1), 25–35. retrieved from http://ejournals.library.ualberta.ca/index.php/ijqm/article/view/4402/3795
morgan, d. l. (2007). paradigms lost and pragmatism regained: methodological implications of combining qualitative and quantitative methods. journal of mixed methods research, 1(1), 48–76. doi:10.1177/2345678906292462
mäkinen, j., olkinuora, e., & lonka, k. (2004). students at risk: students’ general study orientations and abandoning/prolonging the course of studies. higher education, 48(2), 173–188. doi:10.1023/b:high.0000034312.79289.ab
nettles, m. t., & millet, c. m. (2006). three magic letters: getting to ph.d. baltimore: the john hopkins university press.
niemic, c. p., & ryan, r. m. (2009). autonomy, competence, and relatedness in the classroom. applying self-determination theory to educational practice. theory and research in education, 7(2), 133–144. doi:10.1177/1477878509104318
ouweneel, e., le blanc, p. m., & schaufeli, w.
b. (2011). flourishing students: a longitudinal study on positive emotions, personal resources, and study engagement. the journal of positive psychology: dedicated to furthering research and promoting good practice, 6(2), 142–153. doi:10.1080/17439760.2011.558847
overall, n. c., deane, k. l., & peterson, e. r. (2011). promoting doctoral students’ research self-efficacy: combining academic guidance with autonomy support. higher education research & development, 30(6), 791–805. doi:10.1080/07294360.2010.535508
park, c. (2005). new variant phd: the changing nature of the doctorate in the uk. journal of higher education policy and management, 27(2), 189–208. doi:10.1080/13600800500120068
patton, m. q. (1990). qualitative research and evaluation methods (2nd ed.). newbury park, ca: sage publications.
pontius, j. l., & harper, s. r. (2006). principles for good practice in graduate and professional student engagement. new directions for student services, 115, 47–58. doi:10.1002/ss.215
pyhältö, k., & keskinen, j. (2012). doctoral students’ sense of relational agency in their scholarly communities. international journal of higher education, 1(2), 136–149. doi:10.5430/ijhe.v1n2p136
pyhältö, k., nummenmaa, a. r., soini, t., stubb, j., & lonka, k. (2012a). research on scholarly communities and development of scholarly identity in finnish doctoral education. in s. ahola & d. m. hoffman (eds.), higher education research in finland. emerging structures and contemporary issues (pp. 337–357). jyväskylä: jyväskylä university press.
pyhältö, k., stubb, j., & lonka, k. (2009). developing scholarly communities as learning environments for doctoral students. international journal for academic development, 14(3), 221–232. doi:10.1080/13601440903106551
pyhältö, k., stubb, j., & tuomainen, j. (2011). international evaluation of research and doctoral education at the university of helsinki to the top and out to society.
summary report on doctoral students‟ and principal investigators‟ doctoral training experiences. retrieved from http://wiki.helsinki.fi/display/evaluation2011/survey+on+doctoral+training pyhältö, k., vekkaila, j., & keskinen, j. (2012b). exploring the fit between doctoral students‟ and supervisors‟ perceptions of resources and challenges vis-à-vis the doctoral journey. international journal of doctoral studies, 7, 395–414. retrieved from http://ijds.org/volume7/ijdsv7p395414pyhalto383.pdf reeve, j., jang, h., carrell, d., jeon, s., & barch, j. (2004). enhancing students‟ engagement by increasing teachers‟ autonomy support. motivation and emotion, 28(2), 147–169. doi:10.1023/b:moem.0000032312.95499.6f reeve, j., & tseng, c-h. (2011). agency as a fourth aspect of students‟ engagement during learning activities. contemporary educational psychology, 36(4), 257–354. doi:10.1016/j.cedpsych.2011.05.002 sainio, j. (2010). asiantuntijana työmarkkinoille vuosina 2006 ja 2007 tohtorin tutkinnon suorittaneiden työllistyminen ja heidän mielipiteitään tohtorikoulutuksesta [experts for the labour market the employment of doctors who earned their doctoral degree in 2006-2007 and their perceptions of doctoral training]. tampere: kirjapaino hermes oy. retrieved from http://www.aarresaari.net/pdf/asiantuntijana_tyomarkkinoille_netti.pdf salanova, m., schaufeli, w., martínez, i., & bresó, e. (2010). how obstacles and facilitators predict academic performance: the mediating role of study burnout and engagement. anxiety, stress & coping: an international journal, 23(1), 53–70. doi:10.1080/10615800802609965 schaufeli, w. b., & bakker, a. b. (2004). job demands, job resources, and their relationship with burnout and engagement. journal of organizational behavior, 25(3), 293–315. doi:10.1002/job.248 schaufeli, w. b., martínez, i. m., pinto, a. m., salanova, m., & bakker, a. b. (2002a). burnout and engagement in university students. journal of cross-cultural psychology, 33(5), 464–481. 
doi:10.1177/0022022102033005003 schaufeli, w. b., salanova, m., gonzález-romá, v., & bakker, a. b. (2002b). the measurement of engagement and burnout: a two sample confirmatory factor analytic approach. journal of happiness studies, 3(1), 71–92. doi:10.1023/a:1015630930326 stubb, j. (2012). becoming a scholar: the dynamic interaction between the doctoral student and the scholarly community (doctoral dissertation). university of helsinki, faculty of behavioural sciences, department of teacher education, research report 336. retrieved from http://urn.fi/urn:isbn:978952-10-6867-6 stubb, j., pyhältö, k., & lonka, k. (2011). balancing between inspiration and exhaustion: phd students‟ experienced socio-psychological well-being. studies in continuing education, 33(1), 33–50. doi:10.1080/0158037x.2010.515572 stubb, j., pyhältö, k., & lonka, k. (2012a). the experienced meaning of working with a phd thesis. scandinavian journal of educational research, 56(4), 439–456. doi:10.1080/00313831.2011.599422 stubb, j., pyhältö, k., & lonka, k. (2012b). conceptions of research: the doctoral student experience in three domains. studies in higher education. 1–14. ifirst article. doi:10.1080/03075079.2011.651449 toews, j. a., lockyer, j. m., dobson, d. j. g., simpson, e., brownell, a. k. w., brenneis, f., macpherson, k. m., & cohen, g. s. (1997). analysis of stress levels among medical students, residents, and graduate students at four canadian schools of medicine. academic medicine, 72(11), 997–1002. retrieved from http://journals.lww.com/academicmedicine/abstract/1997/11000/analysis_of_stress_levels_among_m edical_students,.19.aspx turner, g., & mcalpine, l. (2011). doctoral experience as researcher preparation: activities, passion, status. international journal for researcher development, 2(1), 46–60. doi:10.1108/17597511111178014 umbach, p. d., & wawrzynski, m. r. (2005). faculty do matter: the role of college faculty in student learning and engagement. 
research in higher education, 46(2), 153–184. doi:10.1007/s11162-0041598-1 http://wiki.helsinki.fi/display/evaluation2011/survey+on+doctoral+training http://www.aarresaari.net/pdf/asiantuntijana_tyomarkkinoille_netti.pdf http://www.tandfonline.com/action/dosearch?action=runsearch&type=advanced&result=true&prevsearch=%2bauthorsfield%3a%28mart%c3%adnez%2c+isabel%29 http://www.tandfonline.com/action/dosearch?action=runsearch&type=advanced&result=true&prevsearch=%2bauthorsfield%3a%28bres%c3%b3%2c+edgar%29 http://www.tandfonline.com/loi/gasc20?open=23#vol_23 http://www.tandfonline.com/action/dosearch?action=runsearch&type=advanced&result=true&prevsearch=%2bauthorsfield%3a%28mart%c3%adnez%2c+isabel%29 j. vekkaila et al. 33 | f l r vassil, k., & solvak, m. (2012). when failing is the only option: explaining failure to finish phds in estonia. higher education, 64(4), 503–516. doi:10.1007/s10734-012-9507-6 vekkaila, j., pyhältö, k., hakkarainen, k., keskinen, j., & lonka, k. (2012). doctoral students‟ key learning experiences in the natural sciences. international journal for researcher development, 3(2), 154–183. doi:10.1108/17597511311316991 vekkaila, j., pyhältö, k., & lonka, k. (2013). experiences of disengagement – a study of doctoral students in the behavioral sciences. international journal of doctoral studies, 8, 61–81. retrieved from http://ijds.org/volume8/ijdsv8p061-081vekkaila0402.pdf vermunt, j. (1996). metacognitive, cognitive and affective aspects of learning styles and strategies: a phenomenographic analysis. higher education, 31(1), 25–50. doi:10.1007/bf00129106 virtanen, v., & pyhältö, k. (2012). what engages doctoral students in biosciences in doctoral studies? psychology, 3(12a), 1231–1237. doi:10.4236/psych.2012.312a182 weidman, j. c., & stein, e. l. (2003). socialization of doctoral students to academic norms. research in higher education, 44(6), 641–656. doi:10.1023/a:1026123508335 white, j., & nonnamaker, j. (2008). belonging and mattering. 
how science doctoral students experience community. naspa journal, 45(3), 350–372. doi:10.2202/1949-6605.1860 zhao, c., & kuh, g. d. (2004). adding value: learning communities and student engagement. research in higher education, 45(2), 115–138. doi:10.1023/b:rihe.0000015692.88534.de appendix 1 doctoral student interview discipline or subject: been as a phd student since: i‟m doing a monograph/collection of articles: i‟m female/male: i‟m doing my thesis full-time/ part-time: phase of my study: 1. how did you become a phd student? what is the topic of your phd work? how did you come up with this topic? does it relate to the work of others in your group? 2. what motivates you to do your phd research? 3. describe in your own words, how has your phd process gone so far? 4. describe some situation, event or episode from your phd studies that has really influenced your own thoughts about doing phd research or something else related to that. what happened? why? what did you think of and how did you feel? 5. at the moment, do you have some question/challenge that you are wondering about? if so, what? why? 6. what is the most enjoyable thing in postgraduate studies? what is the hardest? 7. describe a situation that gave you inspiration. what happened? why do you think it happened? what did you do, think and feel? describe a situation in your phd process that was in some way negative. what happened? why do you think it happened? what did you do, think and feel? 8. what kind of supervision have you gotten in your phd process? what kind of supervision would you hope for? 9. do you get support to your work from somewhere else? what kind of support? would you need something more? 10. describe a situation in your phd process where you felt that your supervisor especially succeeded. what happened and why was that situation meaningful to you? 11. what kind of role do other researchers and phd students have in your process? 12. 
in your opinion, how should postgraduate education be developed?
13. what kind of advice would you give to a student who is considering phd studies? why?
14. is there still something you would like to tell?
15. what would you have wished to be asked about?

frontline learning research vol. 5 no. 1 (2017) 1–15 issn 2295-3159

the relationship between social participation and social skills of pupils with an intellectual disability: a study in inclusive classrooms

ariana garrote¹
university of zurich, switzerland

article received 5 august / revised 31 october / accepted 1 november / available online 26 january

abstract
researchers claim that a lack of social skills might be the main reason why pupils with special educational needs (sen) in inclusive classrooms often experience difficulties in social participation. however, studies that support this assumption are scarce, and none include pupils with an intellectual disability (id). this article seeks to make an important contribution to this discussion. the social skills and social participation of pupils with id and their typically developing (td) peers in 38 general education classrooms were assessed with multidimensional instruments.
the analyses indicate that the majority of pupils with id were not popular but were socially accepted and had friends. additionally, no significant relationship was found between social skills and the social participation of pupils with id, although such pupils had lower levels of social skills compared with their td peers. thus, it appears that pupils with id do not require high levels of social skills to be befriended or accepted by classmates. in contrast, social skills were associated with popularity and social acceptance within the group of td pupils. in fact, popular td pupils had the highest level of social skills. these findings support the assumption that in addition to low levels of social skills, there must be other mechanisms that influence the social participation of pupils with id in inclusive classrooms. keywords: social skills; social participation; special educational needs; sociometric status; friendship 1 contact information: ariana garrote, department of educational sciences, university of zurich, freiestrasse 36, 8032 zürich, switzerland. e-mail: agarrote@ife.uzh.ch doi: http://dx.doi.org/10.14786/flr.v5i1.266 garrote | f l r 2 1. introduction children learn a wide range of social behaviours and skills in their social interactions with peers. their experiences within this social context influence their socio-emotional development and their later adjustment during adulthood (rubin, bukowski, & parker, 2006). this phenomenon is one reason why the un convention recommends the social participation of pupils with disabilities in the community and the classroom (united nations, 2006). consequently, an increasing tendency to include pupils with special educational needs (sen) in general education schools is observed internationally (koster, nakken, pijl, & van houten, 2009; pijl, frostad, & flem, 2008; ruijs & peetsma, 2009). 
however, the mere presence of these pupils in general education classrooms does not automatically result in successful social participation. pupils with sen are less involved in social interactions with peers, less accepted and more frequently rejected than their typically developing (td) peers (avramidis, 2013; estell et al., 2008; feldman, carter, asmus, & brock, 2015; garrote, 2016; grütter, meyer, & glenz, 2015; huber, 2006; koster, pijl, nakken, & van houten, 2010; nepi, fioravanti, nannini, & peru, 2015; pijl & frostad, 2010; rotheram-fuller, kasari, chamberlain, & locke, 2010). on one hand, studies demonstrate that td pupils do not like to work with their low-achieving peers, including peers with sen (huber & wilbert, 2012; krull, wilbert, & henneman, 2014; monchy, pijl, & zandberg, 2004). thus, teachers tend to avoid mixing pupils with sen and their td peers for certain activities, such as group work, and therefore, pupils with sen lack shared learning experiences. in fact, feldman et al. (2015) found that due to their teacher’s planning students with severe disabilities, included in general education classrooms, were not present in most of the classes and were not physically close enough to their peers if present. therefore, pupils with sen not only miss opportunities to interact and to become involved in social relationships but also do not receive the chance to learn and practice social skills with their td peers. on the other hand, researchers hypothesize that the difficulties pupils with sen experience with social participation may be due to their lack of social skills (e.g., avramidis, 2013; huber & wilbert, 2012; pijl et al., 2008; schwab, gebhardt, krammer, & gasteiger-klicpera, 2015). however, the relationship between the social participation of pupils with sen and their social skills has not been thoroughly studied, and the concepts of social participation and sen vary from study to study. 
typically, the samples of pupils with sen are heterogeneous and include pupils with different needs and problems (e.g., children with behavioural problems, with learning disabilities, or with physical disabilities). therefore, valid knowledge on the topic is lacking.

this study contributes to closing this research gap. it investigates to what extent the social participation of pupils with sen in general education classrooms is associated with their social skills. social participation and social skills were measured in a large sample of pupils with a diagnosed intellectual disability (id) and their td peers. this study contributes the following novelties to the literature. first, although there are a number of studies investigating the social participation of pupils with sen enrolled in inclusive classrooms, there is little knowledge about influencing factors. thus, this study focusses on social skills, which are assumed to influence the social participation of pupils with sen. second, the heterogeneity of the group of pupils with sen was decreased by limiting the sample to pupils with id. third, social participation was measured using a multidimensional approach that included aspects of social relationships (i.e., friendships) and social acceptance (i.e., popularity and rejection), which are two important dimensions of social participation (bossaert, colpin, pijl, & petry, 2013; koster et al., 2009). fourth, to measure social skills, an empirically well-studied scale with a theoretical foundation was used. fifth, until now, no findings have been reported regarding the relationship between the social participation and the social skills of pupils with id in inclusive classrooms. finally, this study is intended to give new insights into the processes influencing social participation in inclusive classrooms, in order to derive implications for further research and for the development of suitable interventions.

1.1. social skills

there is no commonly accepted concept of social skills (or competences). however, there is consensus regarding the connection between social skills and successful social interactions as well as the ability to establish and maintain positive social relationships. socially competent individuals are described as being able to use social interactions to satisfy their goals and needs while considering the needs and goals of others (groeben, perren, stadelmann, & klitzing, 2011; perren, forrester-knauss, & alsaker, 2012; rose-krasnor, 1997). this definition differentiates between social skills that are important for the self and social skills that are oriented towards others. malti and perren (2016) term these two dimensions of social skills self- and other-oriented. initiating and maintaining social interactions, leadership skills and the ability to set limits with peers are considered to be self-oriented skills because they aim at satisfying one’s own needs. other-oriented social skills, such as helping, caring, and cooperating, are based on considering the interests and benefits of others in social interactions. whereas deficits in self- and other-oriented social skills have been associated with negative peer relations, peer rejection, and victimization (bellini, peters, benner, & hopf, 2007; perren et al., 2012; malti & perren, 2016; henricsson & rydell, 2006), engaging in prosocial behaviour and being able to initiate social interactions can help children become positively involved with peers (fabes, martin, & hanish, 2009; henricsson & rydell, 2006; perren, argention-groeben, stadelmann, & klitzing, 2016; rubin et al., 2006).

1.2. social skills of pupils with sen

there is evidence that certain groups of pupils with sen are less socially skilled than td children (gresham & macmillan, 1997).
for example, pupils with autism spectrum disorders (asd) have difficulties with self-oriented social skills, such as initiating social interactions, which places them more at risk of being socially isolated (bellini et al., 2007). however, intervention studies have demonstrated that these children can benefit from the supportive behaviour of their td peers with respect to the development of social interaction skills (camargo et al., 2014; whalon, conroy, martinez, & werch, 2015). in addition, engaging in other-oriented social skills, such as cooperative and prosocial behaviour, can also have a positive impact on the social participation of pupils with asd (kamps et al., 2002). similar results have been found for pupils with intellectual disabilities (goldstein, english, shafer, & kaczmarek, 1997), learning disabilities (kavale & forness, 1996), and behavioural problems (frederickson & turner, 2003). however, few studies have investigated the relationship between social skills and the social participation of pupils with sen, and the findings are ambiguous. for example, frostad and pijl (2007) found a weak relationship between the social skills (cooperative behaviour and empathy) and the social position (acceptance, friendships and membership in a subgroup) of pupils with sen. however, a separate examination of the group of pupils with learning difficulties and the group of pupils with behaviour problems changed the results. while there was no relation between the social position and the social skills of pupils with learning difficulties, there was a significant relationship for pupils with behavioural problems. the authors concluded that a low level of empathy (i.e., exhibiting concern and respect for the feelings and viewpoint of others) might be an explanation of social difficulties only for pupils with behavioural problems. further, schwab et al. (2015) described a link between self-rated social participation and prosocial behaviour. 
students with sen (not specified) in secondary schools felt less socially included and reported lower levels of prosocial behaviour than their td peers. thus, the authors interpreted that the poor social participation of pupils with sen might be associated with their reported low levels of prosocial behaviour.

this study attempts to contribute to the clarification of the relationship between the social skills and the social participation of pupils with sen in general education classrooms. to obtain more focused findings, the sample of pupils with sen consisted only of children with a diagnosed id. these pupils were compared with selected td peers with respect to social acceptance, social relationships, and self- and other-oriented social skills. the following research questions were addressed:

a) how does the social participation of pupils with id in general education classrooms appear in terms of social relationships and social acceptance?
b) how do peers and teachers rate the social skills of pupils with id compared with popular, rejected, and isolated td pupils?
c) is there a relationship between the social participation of pupils with id and their social skills?

2. method

2.1. sample

the sample consisted of 38 inclusive primary classrooms in the german- and french-speaking parts of switzerland. a total of 692 first- to fourth-graders (aged m = 97.92; sd = 10.51) participated in the study. only general education classrooms that included pupils with id were allowed in the study.² therefore, in each classroom at least one pupil was officially diagnosed by a school psychologist as having id. based on this diagnosis, special education resources were allocated to support these pupils.³ in a first step, all 692 participants were individually interviewed regarding their social involvement in the classroom and the social skills of their peers.
in a second step, popular, rejected, and isolated pupils were identified based on the collected data (for a detailed description, see the analyses). finally, social skills and peer relationships were estimated by teachers for the selected td pupils (n = 89; 51.6% female) and for the sample of pupils with id (n = 43; 39.5% female).

2.2. measures

social acceptance, social relationships, and social skills were assessed with teacher questionnaires and by individual pupil interviews. the instruments used for the interviews were developed and piloted to be suitable for pupils with id.

2.2.1 social acceptance and social relationships

to assess social acceptance, the rating method was applied. all pupils rated how much they liked to play with each classmate on a five-point scale using smileys (1 = ☹ = “i do not like to play with this classmate at all” to 5 = ☺ = “i like to play with this classmate a lot”). each classmate’s name was read to the participants and presented on an individual card. the social relationships of pupils within the classroom were assessed using the nomination procedure. all participants were asked to nominate their regular playmates in the classroom (“with whom in your classroom do you play the most?”). the number of same- and cross-gender nominations was unlimited. teachers also estimated the social acceptance and relationships of the pupils. the teacher version of the self- and other-oriented social competences questionnaire (socomp; see perren et al., 2012) includes the subscale positive peer relationships (5 items; α = 0.89). the items in this subscale primarily address friendships (e.g., “has at least one good friend”) and social acceptance (e.g., “is generally popular among peers”).

² pupils with id were full-time members of the general education classrooms. in each classroom, a special education teacher was present for four to 14 hours per week (m = 7.22, sd = 3.36).
³ the school psychologists applied the criterion iq < 75 to diagnose id at the time of the data collection. here, it is important to mention that iq tests have their limitations when it comes to assessing children whose first language differs from the test language or who are in a difficult affective state. in addition, a dichotomous categorisation of td pupils and pupils with id does not reflect disability as a social construct. nevertheless, this categorisation is used in the present study because it is common in practice and affects processes of social participation.

2.2.2 social skills

the social skills were rated by teachers and peers. the teachers were requested to estimate the social skills of their pupils with id and selected td pupils (popular, rejected, and isolated; for criteria, see 2.3.1) using the socomp questionnaire. the teachers were not informed regarding the selection criteria of the td pupils they rated, which means they were unaware of their sociometric status (popular, rejected, or isolated). all of the socomp items were estimated on a three-point scale (0 = “not true at all” to 2 = “definitely true”). two dimensions of social skills were assessed: self-oriented social skills (α = 0.88) with the three subscales leadership (3 items; e.g., “organizes, suggests play activities to peers”), setting limits (3 items; e.g., “refuses unreasonable requests from others”), and social participation (4 items; e.g., “converses with peers easily”), and other-oriented social skills (α = 0.88), including the two subscales prosocial (5 items; e.g., “frequently helps other children”) and cooperative behaviour (5 items; e.g., “compromises in conflicts with peers”). the other-oriented social skills of the pupils were also estimated by peers with two questions regarding cooperative and prosocial behaviour (α = 0.83).
all of the participants rated four randomly selected classmates on a five-point scale with smileys (1 = ☹ = “i do not agree at all” to 5 = ☺ = “i totally agree”) with respect to how well they could work with them and how helpful they were.

2.3. analyses

2.3.1 social status

popular, rejected, and isolated pupils were selected for the study based on individual interviews with all of the participants. to identify popular and rejected pupils, the rating data were analysed. pupils who received the highest score on the scale (5) from most of their peers (at least one standard deviation above average) were categorized as popular. those pupils who received the lowest score on the scale (1) from most of their peers (at least one standard deviation above average) were categorized as rejected. to identify isolated pupils, indegree and outdegree scores were calculated based on the nomination data with ucinet (borgatti, everett, & freeman, 2002). children who were not nominated by any classmate and did not nominate anyone as a playmate (indegree = 0 = outdegree) were categorized as isolated. however, pupils were only identified as isolated if at least 80% of the pupils of the classroom had participated in the study. this cut-off criterion was established to respect the interdependency of network data (huisman & steglich, 2008; robins, pattison, & woolcock, 2004).

table 1 shows the distribution of sociometric status among td pupils and pupils with id. in 38 classrooms, 89 td pupils were identified as popular, rejected, or isolated. in the case of 14 pupils, there was an overlap of the sociometric statuses “rejected” and “isolated”. for 46.5% of the pupils with id, a classification into status groups was possible. while most of these pupils were in the category rejected, five were identified as isolated and only one as popular. the rest of the pupils with id (53.5%) were “average”, which means they were not rejected, isolated, or popular.
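the classification rules described above can be sketched in code. the following python sketch is illustrative only and is not the procedure or software used in the study (indegree and outdegree were computed there with ucinet). the function names, the data layout (`ratings`, `nominations`), and the reading of "at least one standard deviation above average" as a cut-off on the number of 5s (or 1s) a pupil received, relative to the classroom mean, are assumptions. the 80% rule is applied here to the share of interviewed pupils, which is likewise an assumption.

```python
from statistics import mean, pstdev


def count_received(ratings, pupils, score):
    """Count how often each pupil received `score` from classmates."""
    counts = {p: 0 for p in pupils}
    for rater, given in ratings.items():
        for ratee, value in given.items():
            if ratee != rater and value == score:
                counts[ratee] += 1
    return counts


def above_one_sd(counts):
    """Pupils whose count lies at least one SD above the classroom mean."""
    values = list(counts.values())
    sd = pstdev(values)
    if sd == 0:  # no variation in the classroom: nobody stands out
        return set()
    cutoff = mean(values) + sd
    return {p for p, c in counts.items() if c >= cutoff}


def classify_status(pupils, ratings, nominations, min_participation=0.8):
    """Assign hypothetical sociometric labels to every pupil in one classroom.

    ratings:     {rater: {ratee: score 1..5}}, one entry per interviewed pupil
    nominations: {nominator: set of nominated playmates}
    """
    popular = above_one_sd(count_received(ratings, pupils, 5))
    rejected = above_one_sd(count_received(ratings, pupils, 1))

    indegree = {p: 0 for p in pupils}
    for nominees in nominations.values():
        for nominee in nominees:
            indegree[nominee] += 1

    isolated = set()
    # Isolation is only assessed if enough of the classroom took part.
    if len(ratings) / len(pupils) >= min_participation:
        isolated = {
            p for p in pupils
            if indegree[p] == 0 and not nominations.get(p)  # indegree = 0 = outdegree
        }

    groups = (("popular", popular), ("rejected", rejected), ("isolated", isolated))
    status = {}
    for p in pupils:
        labels = [name for name, members in groups if p in members]
        status[p] = labels or ["average"]
    return status
```

a pupil can carry two labels at once in this sketch (for example rejected and isolated), which mirrors the overlap of statuses reported for some pupils.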
table 1
sociometric status of pupils with id and their td peers

sample          n   popular  rejected  isolated  isolated and rejected  average
pupils with id  43  1        14        4         1                      23
td pupils       89  38       19        18        14                     -

2.3.2 social acceptance and social relationships

several scores for social acceptance (popularity and social rejection) and social relationships (number of friendships and at least one friend) were computed and standardized for each participant. for this purpose, the rating data, the nomination data, and the subscale on positive peer relationships of the teacher questionnaire were analysed.

the rating data were used to calculate the scores for social acceptance. the popularity score is the number of highest ratings on the scale (5) a pupil received from classmates, whereas the social rejection score is the number of lowest ratings on the scale (1) received from classmates. in addition, a mean score of social acceptance was calculated for each pupil from all ratings (1 to 5) received from peers.

for each pupil, two social relationship scores were calculated. one was the number of reciprocated friendships. the other was a dichotomized score, which represents whether the pupil had at least one friend or none. according to common practice, reciprocal nominations were defined as friendships (hymel, vaillancourt, mcdougall, & renshaw, 2004). because social network data are strongly influenced by the participant ratio, the values were divided by the maximum number of possible friendships in the classroom (n(n - 1)/2). the teacher’s view of the social relationships and social acceptance of the pupils was represented with standardized sums of the socomp dimension of positive peer relationships.

2.3.3 social skills

the social skills were estimated by teachers and peers. for the teacher’s perspective, the standardized sums of the two dimensions self- and other-oriented social skills of the socomp questionnaire were calculated for each pupil.
based on the peer ratings regarding the cooperative and prosocial behaviour of classmates, a standardized mean score was calculated for each pupil. this score represents the other-oriented social skills from the peer’s perspective.

3. results

3.1. social acceptance and social relationships

practically all popular td children had at least one friend and on average the highest number of friends compared with the other pupils (table 2). in addition, more than half of the rejected td pupils had friends. a majority of the pupils with id (63%) had at least one reciprocal friend. interestingly, this was the case for most of the rejected and for most of the average pupils with id but not for the only popular pupil in this sample.

table 2
friendships of pupils with id and their td peers

group           n   friendships m (sd)  at least one friend, n (%)
pupils with id  43  .98 (1.08)          27 (63)
  popular       1   -                   -
  rejected      14  .71 (.47)           10 (71)
  average       23  1.39 (1.27)         17 (74)
td pupils       89  1.18 (1.48)         48 (54)
  popular       38  2.32 (1.53)         35 (92)
  rejected      19  .89 (.83)           13 (68)

isolated pupils are not represented in this table.

3.2. social skills

the relationship between the social skills of pupils with id, their sociometric status, and their friendships were analysed with nonparametric tests and correlations. in a first step, comparisons within the status groups of pupils with id and of td pupils were performed. in a second step, status groups of pupils with id were compared with status groups of td pupils.

3.2.1 pupils with id

according to teacher reports, popular and accepted pupils with id (n = 24) did not differ significantly from rejected and isolated pupils with id (n = 19) in their social skills (table 3). in addition, only weak correlations were found between the mean acceptance score of pupils with id and their self-oriented (r = 0.119) and other-oriented (r = 0.197) social skills.
however, differences were found with respect to their positive peer relationships (u = 103.50, z = -3.07, p = 0.002). consistent with their sociometric status, rejected and isolated pupils with id were estimated by teachers to be less accepted and less popular among their peers. additionally, these pupils were perceived by their peers as having lower levels of other-oriented social skills (u = 140.00, z = -2.15, p = 0.031) than the popular and accepted pupils with id. in line with this result, the correlation between the other-oriented social skills rated by peers and the mean acceptance score was high (r = 0.662). however, rejected and isolated pupils with id did not have significantly fewer friends than popular and accepted pupils with id. for further analyses, the sample of pupils with id was split into two groups: pupils with at least one friend (n = 27; 62% accepted and 37% rejected) and pupils without friends (n = 16; 62% accepted and/or isolated, 31% rejected, and 6% popular). regarding social acceptance and rejection, the two groups did not differ significantly. in addition, based on peer and teacher estimations, no significant differences were found regarding self-oriented or other-oriented skills. surprisingly, the groups were also similar in their positive peer relationships reported by teachers. there was only a significant difference regarding the item “has at least one good friend” (u = 127.00, z = -2.52, p = 0.012), which indicates that teachers notice when pupils with id do not have friends. 
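the group comparisons above report u, z, and p values from nonparametric tests. as a hedged illustration of how such values arise, the sketch below computes a mann-whitney u statistic with the normal approximation (tie-corrected, without continuity correction). the source does not state which software or which variant of the test was used, so the function name, the choice to report u for the first group, and the example scores are assumptions; the numbers do not reproduce the study's data.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U for group x, z, two-sided p) via the normal approximation."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(x + y)

    # Assign average ranks so that tied scores share the same rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(rank_of[v] for v in x)
    u = rank_sum_x - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all scores are identical; z is undefined")

    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal curve
    return u, z, p


# Hypothetical teacher-rated scores for two small groups.
group_a = [8, 9, 10, 7, 6]
group_b = [12, 14, 11, 13, 15]
u, z, p = mann_whitney_u(group_a, group_b)
print(f"u = {u:.2f}, z = {z:.2f}, p = {p:.3f}")
```

with this sign convention, a u below n1·n2/2 yields a negative z for the first group, which matches the pattern of u and z values reported in the text.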
table 3
social skills mean values (sd) of pupils with id

                              teacher-reported                    peer-reported
group                    n    self-oriented     other-oriented    other-oriented
pupils with id           43   9.35 (3.92)       11.53 (4.39)      3.08 (.77)
  popular & average      24   10.29 (4.20)      12.42 (4.37)      3.35 (.62)
  rejected & isolated    19   8.58 (3.42)       10.42 (4.27)      2.74 (.82)*
  with friends           27   9.52 (3.98)       11.63 (4.85)      3.09 (.71)
  without friends        16   9.56 (3.97)       11.38 (3.63)      3.06 (.89)

the values displayed are not standardised. the teacher-reported values range from 0 to 20, and the peer-reported values range from 1 to 5. the indicated significant differences refer to a comparison with the value in the row above (* p < 0.05).

3.2.2 td pupils

according to the teacher reports, popular td pupils (n = 38) had significantly higher levels of self-oriented social skills (u = 588.50, z = -3.16, p = 0.002), other-oriented social skills (u = 305.50, z = -5.53, p < 0.001), and positive peer relationships (u = 142.50, z = -6.98, p < 0.001) than rejected and isolated td pupils (n = 51). additionally, rejected and isolated td pupils had fewer friends (u = 312.50, z = -6.60, p < 0.001) and were estimated by peers as exhibiting a lower level of other-oriented social skills (u = 107.00, z = -7.15, p < 0.001) than popular td pupils (table 4). in addition, high correlations were found between the mean acceptance score and the other-oriented social skills reported by teachers (r = 0.616) and by peers (r = 0.727). however, the correlation between the mean acceptance score and the self-oriented social skills was low (r = 0.254).

there were significant differences between td pupils with at least one friend (n = 48; 73% popular and 27% rejected) and without friends (n = 41; 48% rejected, 43% isolated, and 7% popular). td pupils with friends were estimated by teachers as having higher self-oriented (u = 1406.50, z = 3.49, p < 0.001) and other-oriented social skills (u = 1449.00, z = 3.85, p < 0.001).
similar results emerged regarding peer-reported other-oriented social skills (u = 1504.50, z = 4.28, p < 0.001). thus, according to teacher and peer reports, td pupils without friends tend to be less socially competent than td pupils with friends. in addition, the teacher reports displayed differences regarding positive peer relationships (u = 1578.50, z = 4.98, p < 0.001). this outcome indicates that teachers notice when pupils do or do not have friends. additionally, td pupils with and without friends differed significantly with respect to social acceptance (u = 1611.00, z = 5.16, p < 0.001) and rejection (u = 626.00, z = -2.82, p = 0.005). thus, popular td pupils tend to have friends, whereas rejected td pupils tend to be friendless. these accentuated results might have appeared because the subsample consists of pupils of extreme status groups: the most popular and the most rejected pupils in the classroom. this fact must be considered when interpreting the results.

table 4
social skills mean values (sd) of td pupils

                              teacher-reported                    peer-reported
group                    n    self-oriented     other-oriented    other-oriented
td pupils                89   13.02 (5.42)      14.03 (5.15)      3.60 (.97)
  popular                38   15.08 (4.56)      17.50 (3.11)      4.35 (.56)
  rejected & isolated    51   11.49 (5.54)**    11.45 (4.86)***   3.04 (.82)***
  with friends           48   14.90 (4.57)      15.92 (4.58)      3.97 (.83)
  without friends        41   10.83 (5.55)***   11.78 (4.91)***   3.17 (.94)***

the values displayed are not standardised. the teacher-reported values range from 0 to 20, and the peer-reported values range from 1 to 5. the indicated significant differences refer to a comparison with the value in the row above (*** p < 0.001. ** p < 0.01).

3.2.3 comparison of pupils with id with td pupils

compared with rejected td pupils (n = 19), rejected pupils with id (n = 14) had a significantly lower level of self-oriented social skills (u = 54.50, z = -2.87, p = 0.003; table 5).
however, they did not differ with respect to their other-oriented social skills and positive peer relationships reported by their teachers. in addition, a comparison of all pupils with id (n = 43) with the rejected and/or isolated td pupils (n = 51) revealed no significant differences regarding social skills, whereas a difference in favour of pupils with id was found regarding social relationships. the teachers rated the positive peer relationships of pupils with id higher than those of rejected and/or isolated td pupils (u = 777.00, z = -2.44, p = 0.015). in addition, pupils with id had significantly more friends than rejected and/or isolated td pupils (u = 660.50, z = -3.68, p < 0.001).

table 5
social skills mean values (sd) of pupils with id and td pupils

                                        teacher-reported                    peer-reported
group                              n    self-oriented     other-oriented    other-oriented
rejected pupils with id vs.        14   7.93 (3.45)       10.93 (4.51)      2.71 (.78)
  rejected td pupils               19   13.26 (5.34)**    10.79 (4.76)      2.95 (.77)
pupils with id vs.                 43   9.35 (3.92)       11.53 (4.39)      3.08 (.77)
  popular td pupils                38   15.08 (4.56)***   17.50 (3.11)***   4.35 (.56)***
  rejected & isolated td pupils    51   11.49 (5.54)      11.45 (4.86)      3.04 (.82)

the values displayed are not standardised. the teacher-reported values range from 0 to 20, and the peer-reported values range from 1 to 5. *** p < 0.001. ** p < 0.01.

a different set of results emerged from the comparison of pupils with id (n = 43) with popular td pupils (n = 38). significant differences in all aspects were found. the pupils with id were estimated to have lower levels of self-oriented social skills (u = 295.00, z = -4.95, p < 0.001), other-oriented social skills (u = 212.00, z = -5.75, p < 0.001), positive peer relationships (u = 169.00, z = -6.29, p < 0.001), and peer-reported other-oriented social skills (u = 99.00, z = -6.79, p < 0.001) than popular td pupils.
in addition, pupils with id had significantly fewer friends than popular td pupils (u = 162.50, z = -6.19, p < 0.001).

finally, a fisher's z-test was performed to compare pupils with id (n = 43) with their td peers (n = 89) regarding the correlations between social acceptance and social skills. only the correlation between social acceptance and the other-oriented social skills rated by the teachers was significantly higher (z = 2.711, p = 0.003) in the group of td pupils (r = 0.616) than in the group of pupils with id (r = 0.197). this result indicates that pupils with id are not less accepted if they have lower levels of other-oriented social skills. in contrast, socially accepted td pupils have higher levels of other-oriented social skills. in addition, the correlations between other-oriented social skills rated by peers and social acceptance were high for both groups and in turn did not differ significantly. this outcome means that, for pupils with id as well as for td pupils, being socially accepted was related to being perceived by peers as helpful and cooperative. in contrast, for both groups, the self-oriented social skills were weakly correlated with social acceptance.

4. discussion

until now, little was known regarding the social participation of pupils with id enrolled in general education classrooms. in addition, no studies have investigated the role that social skills may play in social participation. this study contributes to clarifying the relationship between the social skills and the social participation of pupils with id in inclusive classrooms. therefore, the social relationships and social acceptance of these pupils were analysed in relation to their self- and other-oriented social skills. in addition, a comparison of social skills was performed between pupils with id and td pupils experiencing a more or less positive social participation.
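as a side note on the fisher's z-test reported at the end of section 3.2.3, the comparison of two independent correlations can be sketched in a few lines. the sketch below is an illustration only, not the author's analysis script: the function name fisher_z_compare is hypothetical, and the standard formula (r-to-z transformation with standard error sqrt(1/(n1 - 3) + 1/(n2 - 3))) is assumed. with the published values (r = 0.616, n = 89 for td pupils; r = 0.197, n = 43 for pupils with id) it gives a z value of about 2.71, in line with the reported z = 2.711; the reported p = 0.003 appears to correspond to the one-tailed value, which is an inference from the numbers rather than something stated in the text.

```python
import math
from statistics import NormalDist


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two independent correlations (hypothetical helper).

    Uses Fisher's r-to-z transformation; returns the z statistic and
    the one-tailed and two-tailed p-values under a standard normal.
    """
    z1 = math.atanh(r1)  # Fisher transformation of the first correlation
    z2 = math.atanh(r2)  # Fisher transformation of the second correlation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p_one = 1.0 - NormalDist().cdf(abs(z))  # one-tailed p-value
    return z, p_one, 2.0 * p_one


# Values taken from the text: td pupils (r = 0.616, n = 89)
# versus pupils with id (r = 0.197, n = 43).
z, p_one, p_two = fisher_z_compare(0.616, 89, 0.197, 43)
print(f"z = {z:.3f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```

the same helper could be applied to the peer-rated correlations, provided the correlation for each group is available separately.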
this study reveals that most pupils with id enrolled in inclusive classrooms were not popular but were accepted by their peers. in addition, a majority of these pupils, including those who were rejected, had reciprocal friends. these findings agree with studies which report that not all pupils with sen are at risk of being isolated or rejected in general education classrooms (e.g., avramidis, 2013; frostad, mjaavatn, & pijl, 2011; frostad & pijl, 2007; grütter et al., 2015; koster et al., 2010; schwab, 2015). regarding social skills, in this study, pupils with id were generally less socially competent than their td peers. however, no significant association was found with having friends. pupils with id who had friends exhibited a similar level of social skills as pupils with id who were isolated or did not have reciprocal friendships. thus, it seems that their ability to form and maintain friendships was not influenced by a lack of social skills, and they even appear to possess skills that benefit interaction with peers. according to the literature, children require a number of basic social skills to form friendships (gest, graham-bermann, & hartup, 2001; sebanc, 2003). thus, it seems that the majority of the pupils with id in this sample must possess these basic social skills. this finding is promising because peer relationships can positively contribute to the socio-emotional adjustment of children (gifford-smith & brownell, 2003; murray & greenberg, 2006). in addition, the social skills of pupils with id were not always related to their social acceptance. from the perspective of peers, rejected pupils with id were estimated as being less cooperative and prosocial than accepted pupils with id. however, from the perspective of teachers, the social skills of rejected pupils with id did not differ from those of accepted pupils with id. 
similar results were presented in a study by frostad and pijl (2007), who found no significant relationship between the social skills of pupils with learning disabilities and their social acceptance or their social relationships. in addition, the two variables were only weakly related when the entire sample of pupils with and without sen was analysed. however, pupils with sen had lower levels of social skills than their td peers.

in this study, pupils with id were also compared regarding their social skills with td pupils experiencing more or fewer difficulties in their social participation in the classroom. while the social participation of popular td pupils seemed to be satisfactory, rejected and isolated td pupils experienced difficulties in their social participation. in a first step, the rejected and isolated td pupils were compared with the entire sample of pupils with id. as expected, there were no differences between the two groups with respect to social skills. however, pupils with id had fewer difficulties in building and maintaining social relationships than rejected and isolated td pupils. this outcome means that although pupils with id had levels of social skills similarly low to those of their rejected and isolated td peers, they tended to have more friends. however, the latter finding could be biased because isolated pupils are friendless by definition. thus, only rejected pupils with id and rejected td pupils were compared. these two groups displayed similarities in their peer relationships and their other-oriented social skills. however, rejected pupils with id had lower levels of self-oriented social skills than their rejected td peers. based on these results, the obvious conclusion is that pupils with id can experience more positive social relationships than some of their td peers but tend to have more difficulties in setting limits, initiating social interactions and leading than their rejected td peers.
otherwise, the pupils with id have levels of social skills similarly low to those of their rejected and isolated td peers. this result means that in general education classrooms not only pupils with sen but also some of their td peers lack social skills and can therefore also be at risk of social exclusion. consequently, interventions to foster social participation should not only focus on pupils with sen but should also involve the entire class. in fact, peer-mediated learning activities that involve all pupils can positively influence the social interactions in inclusive classrooms (fuchs, fuchs, mathes, & martinez, 2002; jacques, wilton, & townsend, 1998).

in a second step, a comparison between pupils with id and popular td pupils was performed. significant differences between the two groups were found. pupils with id had fewer friends and exhibited lower levels of self- and other-oriented social skills than popular td pupils. this contrast between the two groups was perhaps because practically none of the pupils with id were popular. consequently, it could be argued that pupils with id are not as popular as some of their td peers because of their lack of social skills. on one hand, a strong positive relationship was found between the level of other-oriented social skills and the social acceptance of td pupils. this outcome could indicate that, for td pupils, providing particular consideration to the needs and goals of others can positively influence their social acceptance or popularity in a group. a similar finding was reported by gest et al. (2001). in their study, children who were perceived by teachers and peers as socially skilled were more popular among peers. on the other hand, among pupils with id, there was a weak relationship between social skills and social acceptance. more specifically, only peer-rated other-oriented social skills were related to social acceptance, but not the self- and other-oriented social skills reported by teachers.
that is, if pupils with id were rejected by their peers, it was probably not only because of their low level of social skills. thus, there must be other mechanisms influencing the social participation of pupils with id in inclusive classrooms, for example, the achievement level of these pupils. in fact, krull et al. (2014) and nepi et al. (2015) found a relationship between low academic achievement levels and low social acceptance by peers. further, on a group level, classroom composition and group norms can also play a crucial role regarding the social participation of individuals in inclusive classrooms (garrote, 2016; grütter et al., 2015). however, additional studies are required to support these findings.

the findings of this study are a contribution to current knowledge on the social participation and the social skills of pupils with id in inclusive classrooms. nevertheless, several limitations of this study should be mentioned. first, a specific concept of social skills was chosen. while such a choice makes the findings more conclusive within a study, comparisons with other studies using different concepts are challenging. second, when interpreting the results, it must be considered that correlations between aspects of social participation and social skills might appear because of overlapping concepts. for example, the aspect of social interactions can be found (in a different function) in relation to the assessment of social skills and of social participation. third, the high correlations between social acceptance and peer-rated other-oriented social skills might result from the assessment method. participants were requested to rate how much they liked to play with their classmates. subsequently, they rated how helpful their classmates were and how well they could work with them.
if we consider the cognitive process of dissonance (festinger, 1957), it is expected that pupils will rate their classmates consistently or that the two variables will correlate highly. fourth, in this study, groups were compared regarding social skills. the question of which basic social skills pupils with sen require to have friends or be accepted by their peers remains unanswered. fifth, pupils with id were compared to td pupils of extreme status groups: the most popular and the most rejected pupils in the classroom. this was due to the study design, in which detailed data collection was restricted to a number of pupils in the sample. sixth, a dichotomous categorisation of pupils with id and td pupils was used to investigate possible hindering factors in the social participation of pupils with id. although this categorisation does not reflect disability as a social construct, it is commonly used in practice to allocate special education resources to support individual pupils. in addition, how special educational support is implemented (e.g., in a resource room) can enhance the perception of pupils with id as an out-group and influence processes of social participation in inclusive classrooms. further studies are needed to shed light on the social processes influenced by this dichotomy used in practice.

in conclusion, the findings support the assumption that social skills are only one possible explanation why certain pupils with id are more at risk of being less socially involved in their classrooms than their td peers. having friends or being rejected did not seem to depend on the low level of social skills of pupils with id. thus, there must be other factors that have a stronger influence on the social participation of these pupils. possible factors may be identified on the individual level (e.g., having the label “id”). however, group processes should be considered as well.
focusing on group processes rather than on individual characteristics has been a promising approach regarding the development and evaluation of interventions to foster social interactions among pupils with and without sen (whalon et al., 2015). indeed, facilitating social participation requires the effort and engagement of all, including peers and teachers (farmer, mcauliffe lines, & hamm, 2011; garrote, sermier dessemontet, & moser opitz, 2017; gest & rodkin, 2011). this change of perspective could also be beneficial for research on social participation. this approach requires researchers to focus more on variables at the classroom or group level rather than solely on individual characteristics. in fact, social participation of individuals in inclusive classrooms is very likely to vary as a function of group and individual factors.

keypoints

most pupils with id have friends and are accepted by classmates in inclusive classrooms.
generally, social skills might play a role in the social participation of pupils but are not the only influencing factor.
for pupils with id in primary classrooms, high levels of self-oriented and other-oriented social skills are not necessary to be socially accepted and have friends.

acknowledgements

this work was supported by the swiss national science foundation [grant number 146086].

references

avramidis, e. (2013). self-concept, social position and social participation of pupils with sen in mainstream primary schools. research papers in education, 28(4), 421–442. doi:10.1080/02671522.2012.673006
bellini, s., peters, j. k., benner, l., & hopf, a. (2007). a meta-analysis of school-based social skills interventions for children with autism spectrum disorders. remedial and special education, 28(3), 153–162. doi:10.1177/07419325070280030401
borgatti, s. p., everett, m. g., & freeman, s. f. (2002). ucinet 6 for windows: software for social network analysis. natick: analytic technologies, inc.
bossaert, g., colpin, h., pijl, s. j., & petry, k. (2013). truly included? a literature study focusing on the social dimension of inclusion in education. international journal of inclusive education, 17(1), 60–79. doi:10.1080/13603116.2011.580464
camargo, s., rispoli, m., ganz, j., hong, e., davis, h., & mason, r. (2014). a review of the quality of behaviorally-based intervention research to improve social interaction skills of children with asd in inclusive settings. journal of autism and developmental disorders, 44(9), 2096–2116. doi:10.1007/s10803-014-2060-7
estell, d. b., jones, m. h., pearl, r., van acker, r., farmer, t. w., & rodkin, p. c. (2008). peer groups, popularity, and social preference: trajectories of social functioning among students with and without learning disabilities. journal of learning disabilities, 41(1), 5–14. doi:10.1177/0022219407310993
fabes, r. a., martin, c. l., & hanish, l. d. (2009). children's behaviors and interactions with peers. in k. h. rubin, w. m. bukowski, & b. p. laursen (eds.), social, emotional, and personality development in context. handbook of peer interactions, relationships, and groups (pp. 45–62). new york: guilford press.
farmer, t. w., mcauliffe lines, m., & hamm, j. v. (2011). revealing the invisible hand: the role of teachers in children's peer experiences. journal of applied developmental psychology, 32(5), 247–256. doi:10.1016/j.appdev.2011.04.006
feldman, r., carter, e. w., asmus, j., & brock, m. e. (2015). presence, proximity, and peer interactions of adolescents with severe disabilities in general education classrooms. exceptional children, 82(2), 192–208. doi:10.1177/0014402915585481
festinger, l. (1957). a theory of cognitive dissonance. stanford, california: stanford university press.
frederickson, n., & turner, j. (2003). utilizing the classroom peer group to address children's social needs: an evaluation of the circle of friends intervention approach. the journal of special education, 36(4), 234–245.
frostad, p., mjaavatn, p. e., & pijl, s. j. (2011). the stability of social relations among adolescents with special educational needs (sen) in regular schools in norway. london review of education, 9(1), 83–94. doi:10.1080/14748460.2011.550438
frostad, p., & pijl, s. j. (2007). does being friendly help in making friends? the relation between the social position and social skills of pupils with special needs in mainstream education. european journal of special needs education, 22(1), 15–30. doi:10.1080/08856250601082224
fuchs, d., fuchs, l. s., mathes, p. g., & martinez, e. a. (2002). preliminary evidence on the social standing of students with learning disabilities in pals and no–pals classrooms. learning disabilities research & practice, 17(4), 205–215. doi:10.1111/1540-5826.00046
garrote, a. (2016). soziale teilhabe von kindern in inklusiven klassen. empirische pädagogik, 30(1), 67–80.
garrote, a., sermier dessemontet, r., & moser opitz, e. (2017). facilitating the social participation of pupils with special educational needs in mainstream schools: a review of school-based interventions. educational research review, 20, 12–23. doi:10.1016/j.edurev.2016.11.001
gest, s. d., graham-bermann, s. a., & hartup, w. w. (2001). peer experience: common and unique features of number of friendships, social network centrality, and sociometric status. social development, 10(1), 23–40. doi:10.1111/1467-9507.00146
gest, s. d., & rodkin, p. c. (2011). teaching practices and elementary classroom peer ecologies. journal of applied developmental psychology, 32(5), 288–296. doi:10.1016/j.appdev.2011.02.004
gifford-smith, m. e., & brownell, c. a. (2003). childhood peer relationships: social acceptance, friendships, and peer networks. journal of school psychology, 41(4), 235–284. doi:10.1016/s0022-4405(03)00048-7
goldstein, h., english, k., shafer, k., & kaczmarek, l. (1997).
interaction among preschoolers with and without disabilities: effects of across-the-day peer intervention. journal of speech, language & hearing research, 40(1), 33–48. retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=c8h&an=1998076502&site=ehost-live
gresham, f. m., & macmillan, d. l. (1997). social competence and affective characteristics of students with mild disabilities. review of educational research, 67(4), 377–415. retrieved from http://www.jstor.org/stable/1170514
groeben, m., perren, s., stadelmann, s., & klitzing, k. (2011). emotional symptoms from kindergarten to middle childhood: associations with self- and other-oriented social skills. european child & adolescent psychiatry, 20(1), 3–15. doi:10.1007/s00787-010-0139-z
grütter, j., meyer, b., & glenz, a. (2015). sozialer ausschluss in integrationsklassen: ansichtssache? psychologie in erziehung und unterricht, 62(1), 65. doi:10.2378/peu2015.art05d
henricsson, l., & rydell, a.-m. (2006). children with behaviour problems: the influence of social competence and social relations on problem stability, school achievement and peer acceptance across the first six years of school. infant and child development, 15(4), 347–366. doi:10.1002/icd.448
huber, c. (2006). soziale integration in der schule?!: eine empirische untersuchung zur sozialen integration von schülern mit sonderpädagogischem förderbedarf im gemeinsamen unterricht. marburg: tectum-verlag.
huber, c., & wilbert, j. (2012). soziale ausgrenzung von schülern mit sonderpädagogischem förderbedarf und niedrigen schulleistungen im gemeinsamen unterricht. empirische sonderpädagogik, (2), 147–165.
huisman, m., & steglich, c. (2008). treatment of non-response in longitudinal network studies. social networks, 30(4), 297–308. doi:10.1016/j.socnet.2008.04.004
hymel, s., vaillancourt, t., mcdougall, p., & renshaw, p. (2004). peer acceptance and rejection in childhood. in p. k. smith & c. h.
hart (eds.), blackwell handbook of childhood social development. oxford, uk: blackwell publishing ltd.
jacques, n., wilton, k., & townsend, m. (1998). cooperative learning and social acceptance of children with mild intellectual disability. journal of intellectual disability research, 42(1), 29–36. doi:10.1046/j.1365-2788.1998.00098.x
kamps, d., royer, j., dugan, e., kravits, t., gonzalez-lopez, a., garcia, j., . . . garrison kane, l. (2002). peer training to facilitate social interaction for elementary students with autism and their peers. council for exceptional children, 68(2), 173–187.
kavale, k. a., & forness, s. r. (1996). social skill deficits and learning disabilities: a meta-analysis. journal of learning disabilities, 29(3), 226–237. doi:10.1177/002221949602900301
koster, m., nakken, h., pijl, s. j., & van houten, e. (2009). being part of the peer group: a literature study focusing on the social dimension of inclusion in education. international journal of inclusive education, 13(2), 117–140. doi:10.1080/13603110701284680
koster, m., pijl, s. j., nakken, h., & van houten, e. (2010). social participation of students with special needs in regular primary education in the netherlands. international journal of disability, development and education, 57(1), 59–75. doi:10.1080/10349120903537905
krull, j., wilbert, j., & hennemann, t. (2014). soziale ausgrenzung von erstklässlerinnen und erstklässlern mit sonderpädagogischem förderbedarf im gemeinsamen unterricht. empirische sonderpädagogik, 6(1), 59–75.
malti, t., & perren, s. (eds.). (2016). soziale kompetenz bei kindern und jugendlichen: entwicklungsprozesse und förderungsmöglichkeiten (2nd ed.). stuttgart: kohlhammer.
monchy, m. de, pijl, s. j., & zandberg, t. (2004). discrepancies in judging social inclusion and bullying of pupils with behaviour problems. european journal of special needs education, 19(3), 317–330. doi:10.1080/0885625042000262488
murray, c., & greenberg, m. t. (2006).
examining the importance of social relationships and social contexts in the lives of children with high-incidence disabilities. the journal of special education, 39(4), 220–233. doi:10.1177/00224669060390040301
nepi, l. d., fioravanti, j., nannini, p., & peru, a. (2015). social acceptance and the choosing of favourite classmates: a comparison between students with special educational needs and typically developing students in a context of full inclusion. british journal of special education, n/a. doi:10.1111/1467-8578.12096
perren, s., argention-groeben, m., stadelmann, s., & klitzing, k. von. (2016). selbst- und fremdbezogene soziale kompetenzen: auswirkungen auf das emotionale befinden. in t. malti & s. perren (eds.), soziale kompetenz bei kindern und jugendlichen. entwicklungsprozesse und förderungsmöglichkeiten (2nd ed., pp. 91–110). stuttgart: kohlhammer.
perren, s., forrester-knauss, c., & alsaker, f. d. (2012). self- and other-oriented social skills: differential associations with children’s mental health and bullying roles. journal for educational research online / journal für bildungsforschung online, 4(1), 99–123. retrieved from http://www.j-e-r-o.com/index.php/jero/article/view/306
pijl, s. j., & frostad, p. (2010). peer acceptance and self-concept of students with disabilities in regular education. european journal of special needs education, 25(1), 93–105. doi:10.1080/08856250903450947
pijl, s. j., frostad, p., & flem, a. (2008). the social position of pupils with special needs in regular schools. scandinavian journal of educational research, 52(4), 387–405. doi:10.1080/00313830802184558
robins, g., pattison, p., & woolcock, j. (2004). missing data in networks: exponential random graph (p∗) models for networks with non-respondents. social networks, 26(3), 257–283. doi:10.1016/j.socnet.2004.05.001
rose-krasnor, l. (1997).
the nature of social competence: a theoretical review. social development, 6(1), 111–135. doi:10.1111/j.1467-9507.1997.tb00097.x
rotheram-fuller, e., kasari, c., chamberlain, b., & locke, j. (2010). social involvement of children with autism spectrum disorders in elementary school classrooms. journal of child psychology and psychiatry, 51(11), 1227–1234. doi:10.1111/j.1469-7610.2010.02289.x
rubin, k. h., bukowski, w. m., & parker, j. g. (2006). peer interactions, relationships, and groups. in n. eisenberg (ed.), handbook of child psychology (6th ed., pp. 571–645). hoboken, n.j: john wiley & sons.
ruijs, n. m., & peetsma, t. t. d. (2009). effects of inclusion on students with and without special educational needs reviewed. educational research review, 4(2), 67–79. doi:10.1016/j.edurev.2009.02.002
schwab, s. (2015). social dimensions of inclusion in education of 4th and 7th grade pupils in inclusive and regular classes: outcomes from austria. research in developmental disabilities, 43-44, 72–79. doi:10.1016/j.ridd.2015.06.005
schwab, s., gebhardt, m., krammer, m., & gasteiger-klicpera, b. (2015). linking self-rated social inclusion to social behaviour. an empirical study of students with and without special education needs in secondary schools. european journal of special needs education, 30(1), 1–14. doi:10.1080/08856257.2014.933550
sebanc, a. m. (2003). the friendship features of preschool children: links with prosocial behavior and aggression. social development, 12(2), 249–268. doi:10.1111/1467-9507.00232
united nations (2006). convention on the rights of persons with disabilities and optional protocol. new york: united nations.
whalon, k. j., conroy, m. a., martinez, j. r., & werch, b. l. (2015). school-based peer-related social competence interventions for children with autism spectrum disorder: a meta-analysis and descriptive review of single case research design studies. journal of autism and developmental disorders, 45(6), 1513–1531.
doi:10.1007/s10803-015-2373-1

frontline learning research vol. 4 no. 5 (2016) 106-119
issn 2295-3159

corresponding author: kalypso iordanou, university of central lancashire, 12-14 university avenue, pyla, 7080 larnaka, cyprus. email address: kiordanou@uclan.ac.uk
doi: http://dx.doi.org/10.14786/flr.v4i5.252

from theory of mind to epistemic cognition. a lifespan perspective

kalypso iordanou
university of central lancashire

article received 25 april / revised 17 august / accepted 5 september / available online 19 january

abstract

although a sizeable body of research now exists in epistemic cognition, it tends to stand apart from other aspects of cognition and cognitive development. here it is proposed to situate epistemic cognition in a context of its roots and development as a dimension of cognitive development more generally. the present paper draws a strong continuous link between the earliest understanding of other minds, examined under the theory of mind, and the tasks that confront adults throughout the lifespan – that of interpreting evidence and coordinating it with what they already take to be true. the primary focus is the how question of knowledge change. to gain insight into this question, it is proposed to focus on epistemic activity in action. it is suggested here that the standards for knowledge formation and revision, which are closely connected with epistemic understanding of theory-evidence coordination, change developmentally. another major change proposed is that the process increasingly comes under conscious control.

keywords: epistemic cognition, cognitive development, argumentation, theory of mind

iordanou | f l r 107

1. problem

how do people know? how do people form beliefs? how do people revise beliefs? are there developmental differences in this regard? these questions have long been a concern of psychologists, philosophers and educators.
their answers can be found in writing on the topic of epistemic cognition (bendixen & rule, 2004; greene, muis, & pieschl, 2010; greene, sandoval, & bråten, 2016; muis, bendixen, & haerle, 2006; perry, 1970). following chinn, buckland, and samarapungavan (2011) and greene et al. (2016), epistemic cognition is defined here as “cognition of or relating to knowledge” (greene et al., 2016, p. 3). although a sizeable body of research now exists under this heading, it tends to stand apart, with few connections to other aspects of cognition and cognitive development. here i propose a broader view, situating epistemic cognition in a context of its roots and development as a dimension of cognition and cognitive development more generally. my primary focus is on the mechanism question, the ‘how’ question of knowledge change. i propose that this change only comes about through application of one's epistemic cognition in practice, which consists of forming and revising claims. this is a continuous process through life and there is reason to think that the nature of the process changes, with mechanisms and standards for knowledge formation and revision changing developmentally. a major change, i propose, is that the process increasingly comes under conscious control. to gain insight into the how question of knowledge change, i propose focusing on epistemic activity in action – the application of epistemic cognition. available models of the developmental progression of epistemic cognition offer a very general stage-like description of this progression with little attention to mechanism. they began with perry’s (1970) study of harvard undergraduate students, hardly a broad sample of the population. for many years, research following perry’s work continued the study of adolescent and adult samples, with no reference to the developmental origins of their thought.
an assumption that epistemic beliefs emerge abruptly in adolescence and remain unchanged thereafter – a non-developmental account – seems unwarranted, considering the cognitive development that occurs along so many other dimensions during the years between early childhood and adolescence. researchers are thus left with few answers to the key questions of how epistemic conceptions emerge and how they continue to develop. although researchers have gone on to address many other important questions, such as how epistemic cognition is related to academic performance (muis, kendeou, & franco, 2011; strømsø, bråten, & britt, 2011), little work has been done to further our understanding of its development. the conclusion drawn by bendixen and rule in 2004 still largely stands today: “currently there is neither a unified model of epistemological understanding to guide research, nor a single model that clearly articulates the relationship between personal epistemology and how epistemological beliefs change and develop” (p. 69). a fuller developmental account is essential not only for expanding understanding at a theoretical level but also for its educational implications, by identifying means to support development of sophisticated epistemic cognition. it is not the case that little is known about cognitive development in the first decade of life, and more specifically development potentially relevant to epistemic cognition. in particular, there is now an extensive literature, particularly in the field of developmental psychology, on children’s theory of mind (tom), addressed to how young children understand their own and others’ minds. however, like epistemic cognition research, tom research has been confined to a particular age range – in the case of tom, the first years of life – with very little work addressed to older children.
research on older children’s higher-order tom tends to have an atheoretical quality, placing more emphasis on the application of second-order understanding and how it affects other aspects of development rather than on the development of a comprehensive theory addressing issues such as how change occurs (miller, 2012). in understanding developing knowledge about knowing, then, there exists a conspicuous gap consisting of the decade between early childhood and adolescence. my aim is to fill this gap by identifying a continuous development and in doing so to examine the nature of the process of change.
2. a model of development of epistemic cognition: what develops?
drawing on chinn et al.’s (2011) model of epistemic cognition, i propose that the epistemic standards that individuals employ change developmentally. epistemic processes refer to strategies and other activities by which one can achieve knowledge. epistemic standards refer to the standards used to evaluate knowledge claims (chinn et al., 2011). the literature on epistemic cognition has focused on examining people's beliefs about knowledge and knowing – known as their epistemic beliefs (greene et al., 2016; kitchener, 2002). the model proposed here extends the literature significantly by proposing the examination of application of people’s epistemic beliefs (epistemic activity in action), rather than focusing on epistemic beliefs themselves. agreeing with sandoval (2005) that students’ beliefs about their own knowing may differ from their beliefs about scientists’ knowing, i extend this idea by proposing that the application of students’ epistemic beliefs in practice may differ from their epistemic beliefs. i further advocate that the distance between epistemic activity in action and epistemic cognition decreases as one acquires increasing awareness and control of each.
given the focus of the present work on epistemic activity in action, the development of epistemic standards is a focus, although it is acknowledged that other components of epistemic cognition (i.e., epistemic processes, values) also develop and interact with epistemic standards (clement et al., 2015). insights from research in tom, testimony and argumentation contribute to addressing the question of what develops in the epistemic realm and supports knowledge change.
2.1. epistemic standards change developmentally
2.1.1. epistemic standards in early childhood
the origins of epistemic cognition are identifiable in the early childhood achievements examined under the theory-of-mind literature (kuhn, cheney & weinstock, 2000). even young children form and revise beliefs. they are just not aware of doing so. for example, preschoolers have the tendency to report they have always known information they have just learned (taylor, esbensen & bennett, 1994). what influences young children to adopt and revise beliefs? a growing literature on testimony provides insights regarding young children’s standards in judging the credibility of the source of new information. standards that young children employ include an informant’s expertise, age, power, group membership and relationship with the child: children show a preference for informants who are experts in the domain to which the information relates, for older informants, for authority figures and for those who are closely familiar to them or belong to the same group (harris & corriveau, 2011; mills, 2013). for example, children prefer to seek and endorse information from native-accented speakers (kinzler, corriveau & harris, 2011). other epistemic standards employed by young children include an informant’s record of accuracy (harris & corriveau, 2011) and an informant’s confidence about their knowledge (jaswal & malone, 2007).
research examining young children’s reasoning with peers also offers insights regarding standards that young children employ in modifying their beliefs. this line of research shows that even three-year-olds use evidence (e.g., this is ice) to justify their claims (köymen, rosenbaum & tomasello, 2014). notably, research shows that young children’s standards change with age. remarkable differences have been reported between the ages of three and four. while three-year-olds show preference for egocentric standards, such as familiarity with the informant (corriveau, harris, et al., 2009), four-year-olds prefer more objective and more germane standards, such as the informant’s history of reliability. for example, four-year-olds show preference toward informants who have been reliable in their past performance, even when they have to reject familiar individuals who weren’t reliable in recent judgments in favour of reliable strangers (corriveau & harris, 2009). children by the age of four show greater sensitivity to the number and kind of errors made by an informant (mills, 2013) and are able to distinguish experts based on their domain of expertise, showing preference for one expert over another depending on the issue they are dealing with and experts’ domain of expertise, compared to three-year-olds (koenig & jaswal, 2011; sobel & corriveau, 2010). for example, when children were presented with a new dog, they preferred to ask the dog expert rather than a novice about the name of the dog (koenig & jaswal, 2011). furthermore, five-year-olds show better understanding of how relevant facts can be used to affect knowledge change in others, as evidenced by the production of more justifications in their dialogues. importantly, they are also more open to changing their knowledge than three-year-olds (köymen, rosenbaum & tomasello, 2014).
this change during the third to fourth year of life takes place at the same time as major developmental milestones are observed in children’s cognitive development, as manifested in their achievements in the false belief task. this coincidence supports the more general position proposed here that development of epistemic cognition should be situated in cognitive development more generally.
2.1.2. epistemic standards in middle childhood
the epistemic standards employed by elementary-school children remain predominantly egocentric. barzilai and zohar (2012), examining via think-alouds how elementary school students judged the trustworthiness of websites, found that the predominant epistemic standard employed was personal authority – asking, for example, their mom. more objective and rigorous epistemic standards, such as website author’s expertise, scientific evidence or author biases were very rarely employed. yet, during elementary school years, there is a developing appreciation of the epistemic standard of judging epistemic products (e.g., arguments, models) on the basis of their fit to evidence. pluta, chinn, and duncan (2011) asked elementary school students to generate a list of criteria to evaluate scientific models and found that a quarter reported criteria relating to model fit with evidence (although other criteria were more commonly reported). although elementary-school students show a developing appreciation for data to support their claims, they show preference for data from their own knowledge or experience rather than more objective scientific evidence (amsel & brock, 1996; anderson, chinn, chang, waggoner, & yi, 1997; kuhn & moore, 2015). kuhn and moore (2015) examined how elementary-school students used evidence in their dialogues to convince their peers to change beliefs about a social science and a physical science topic.
they found that, even though a list of relevant shared evidence was available, about 90% of the evidence that students employed came from their personal knowledge and experience. similar results were observed in research examining how elementary-school students deal with evidence that disconfirms their prior beliefs. amsel and brock (1996) examined children’s behaviour when, in their experimentation, they encountered findings that contradicted their prior beliefs; they found that children of elementary-school age failed to use the new evidence to change their beliefs. students typically are biased in evaluating evidence; they tend to ignore evidence that contradicts their knowledge or distort evidence to fit their existing theories (chinn & brewer, 1993). much of the research on scientific reasoning reports similar results (lehrer & schauble, 2015; sandoval, sodian, koerber, & wong, 2014).
2.1.3. epistemic standards in adolescence
in adolescence, attention to objective data increases, although data based on personal knowledge remains a more predominant epistemic standard. subjective epistemic standards, such as agreement with one’s own knowledge, are predominant in adolescents’ judgments about the trustworthiness of sources (mason, boldrin, & ariasi, 2010) and about the veracity of knowledge claims (mason, ariasi, & boldrin, 2011). iordanou and constantinou (2015) examined how 15- and 16-year-olds argue with peers who hold opposing views on a socio-scientific issue, in a knowledge-rich learning environment. participants’ dialogue transcripts were analyzed in terms of the overall use of evidence, the amount of evidence per argument and per counterargument, the function of evidence use and the accuracy of the evidence employed. only a quarter of adolescents’ dialogue units contained evidence. adolescents employed evidence most of the time to support their own position rather than to weaken the opposing position.
in terms of the epistemic standards employed, these older adolescents, like middle-school students (kuhn & moore, 2015), used undocumented evidence claims from personal knowledge to support their claims. eighty percent of adolescents’ dialogue units made claims based on personal knowledge. besides limited use of evidence in argument production, limited employment of rigorous epistemic standards regarding evidence-claim coordination has been documented during argument evaluation. iordanou, muis and kendeou (2014) examined, using the think-aloud methodology, the processes that adolescents engage in when reading a text, focusing particularly on on-line processing of evidence. adolescents rarely judged the credibility of evidence using epistemic standards such as the number of empirical studies which support a particular finding, the methodology used to produce a finding (e.g. whether the scientific method was used) or the fit of a claim to evidence.
2.1.4. epistemic standards in adulthood
the growing literature examining college students’ judgments of trustworthiness of different information sources provides some insight regarding adults’ epistemic standards. adults show the ability to evaluate experts from different disciplines (e.g. biologist, chemist, earth scientist), by estimating the extent to which they might possess relevant knowledge about a specific science topic (bromme & thomm, 2015) – an ability not shown by young children. undergraduate students consider official documents as more credible sources of information than newspapers (bråten, strømsø & salmerón, 2011). however, they place unwarranted faith in textbooks (wineburg, 1991). examining undergraduate students’ dialogues with peers to gain insight into their epistemic activity in action, iordanou and constantinou (2014) have observed that even adults do not employ evidence consistently to support their claims.
in iordanou and constantinou’s (2014) study, only 25% of adults’ evidence use functioned to support their own claims, and even less, 18%, functioned to weaken others’ claims. kuhn (2016), in an effort to gain a better understanding of the factors underlying the limited use of evidence in argumentation, examined whether individuals’ limitations in conceptions of both evidence and causality may constrain their potential to employ evidence in argumentation. in that study, adults were presented with a scenario and were asked to choose among three options the one that could serve as the strongest evidence against an opponent’s claim. findings show that half of the adult participants chose the option which included no evidence and simply made a contrasting causal assertion, showing limitations in appreciation and application of epistemic standards pertaining to evidence-claim coordination, even in adulthood. similarly, kuhn et al. (2000) found that only half of a group of adults consisting of undergraduate students, college students and professionals reached an evaluativist way of thinking, that is, an understanding that knowledge evolves through coordination of theory with data. the only exception was the group of experts, all of whom exhibited an evaluativist mode of thinking. limitations in reasoning about evidence have been observed not only in laypersons’ reasoning but also in scientists’ reasoning, such as confirmation bias in interpreting evidence in order to provide support to favourite theories. yet, despite these limitations, experts’ epistemic standards are more in line with the rigorous standards employed in formal science.
scientists employ rigorous epistemic standards and practices (e.g., peer review, statistics), while they also reflect and revise those standards that make the distinction of the strongest theories at a particular time and the growth of knowledge possible (chinn & buckland, 2012). self-reflection on the way of knowing and the standards that one employs is, according to habermas, the most comprehensive way of knowing, and one of the highest criteria employed by doctoral examiners to judge thesis quality, compared to either the empirical-analytical way of knowing, which places emphasis on facts, the objective elements of knowing, or the historical-hermeneutic way of knowing, which stresses interpretation, the more subjective elements of knowing (clement et al., 2015).
2.2. epistemic understanding of evidence and theory-evidence coordination both develop
underlying age changes observed in epistemic standards is changing epistemic understanding regarding evidence and its coordination with theory, which also undergoes development. three-year-olds make highly subjective judgments (wildenger, hofer & burr, 2010) and regard thinking as a reflection of external reality. one of the landmarks in this developmental progression of epistemic understanding of evidence and evidence-theory coordination is the understanding that evidence is different from a claim, which is reflected in pre-schoolers’ success in the false belief task by the age of four (perner & davies, 1991). this success reflects understanding that different information leads to different beliefs. this understanding also entails the understanding that evidence is different from information; information only becomes evidence in relation to a claim. during middle childhood, the understanding that one piece of evidence is amenable to different interpretations is achieved (lalonde & chandler, 2002).
the understanding that different individuals can assign different meanings to the same stimulus (carpendale & chandler, 1996) reflects achievement of more mature understanding than earlier success in tom tasks, since it involves an understanding that different beliefs could result from the same input, not different inputs. in other words, middle-school children realize not only that people can form different beliefs when they have access to different information, as was the case with tom tasks, but also when they have access to the same information. middle school students also showed a better understanding of evidence than pre-schoolers, reflected in their ability to distinguish between causes and reasons. astington, pelletier and homer (2002) found that seven-year-olds exhibited better ability in distinguishing between the cause of a situation and a person’s reason for believing it than pre-schoolers; this ability was related to second-order false-belief understanding, that is, their awareness that people have beliefs about the content of others’ minds. yet, understanding of human knowing is not yet fully developed, as it is not applied consistently nor with appropriate justification (eisbach, 2004). also, even though middle-school children are able to understand multiple interpretations of simple stimuli, which offer clear-cut dual interpretations, such as ambiguous pictures (lalonde & chandler, 2002), nonetheless, when explicitly asked to respond to stimuli that do not offer any facilitative, perceptual cues of the existence of alternative interpretations, as is the case in most real-life situations, they are not able to do so. sandoval and millwood (2005), examining high school students’ written explanations for problems on natural selection, found that adolescents made noninterpretive references to data (e.g., the graph shows x).
this finding suggests that even high school students believe that “claims are not distinct from data but are somehow embodied in them, that a particular graph or table or other inscription directly represents some aspect of the natural world and consequently has but one meaning” (sandoval & millwood, 2005, p. 49). children’s limited understanding of the fact that physical or other phenomena are not self-explanatory, but rather are amenable to different interpretations, is also reflected in their preference for direct observation as a means for knowing. when elementary school students were asked to explain how they could become more certain about what happened in a historical event or about the cause of frog deformities, for which there are contradictory accounts, most students reported that eyewitness accounts (e.g. talk to anyone who was around at that time) would be sufficient to provide an explanation. only a few elementary school students reported that investigation and interpretation of evidence can provide insights into what happened or what the cause of the problem is, respectively (iordanou, 2016; kuhn, iordanou, pease & wirkala, 2008). the understanding that evidence supports claims has its roots in early childhood – even young children draw on evidence from their personal experience to support or contradict claims (köymen, rosenbaum & tomasello, 2014; wildenger, hofer & burr, 2010). yet, this understanding is not fully developed even by adulthood. research in the area of argumentation shows that the epistemic understanding that evidence can be employed to offer support to theories precedes the development of the understanding that evidence also plays the important role of weakening claims (iordanou & constantinou, 2014, 2015; kuhn, zillmer, crowell, & zavala, 2013).
the understanding that evidence can be used to weaken claims is related to understanding that evidence can have different interpretations and that evidence can have different functions in relation to different claims.
2.3. epistemic beliefs about standards vs. application of epistemic standards
examining the development of epistemic cognition reveals two paradoxes. the first is the commonly encountered one of a discrepancy between beliefs and their expression in action. in other words, there appears to be a discrepancy between individuals’ beliefs regarding epistemic standards and the application of epistemic standards in practice. for example, although children might appear to endorse the epistemic belief that experts in a domain are more reliable than non-experts, as seen in their preference between experts when there are clear differences between them concerning the degree of their prior knowledge, children generally adopt in non-problematic fashion, during epistemic action, information from experts in different knowledge domains (harris & koenig, 2006). elementary school students appear to adopt the epistemic belief, when asked, that the epistemic standard of model fit with evidence is useful to evaluate scientific models (pluta, chinn, & duncan, 2011); nonetheless, there is evidence that they do not employ this epistemic standard in action (iordanou, muis, & kendeou, 2014; sandoval et al., 2014). in tom tasks, when judging others’ mental states, adults underestimated the probability that a more ignorant other would search incorrectly as a result of holding a false belief, even though they were aware of the difference between their own and the other’s perspective (zhang et al., 2010; birch & bloom, 2007).
in addition, in tasks entailing evaluation of texts, strømsø, bråten and britt (2011) found that, although some undergraduate students reported that they endorse the epistemic belief of justification of knowledge based on evidence, when asked to indicate the criteria on which they have based their judgments of trustworthiness in epistemic action, they reported both advanced criteria – content – but also less advanced ones – their own opinion. similarly, iordanou, muis, and kendeou (2014) reported a discrepancy between adolescents’ and adults’ epistemic knowledge and their epistemic activity in action. in that study, although some individuals acknowledged that they endorse the epistemic belief that evaluation and interpretation of evidence is central for knowing, when directly asked how they could become more certain about their knowledge, they did not engage spontaneously in evaluation of evidence during epistemic action, when they were reading a text. focusing on a particular age, we also observe lack of consistency in the application of epistemic standards. for example, pre-schoolers do not show consistency in using the epistemic standard of an informant’s history of errors, including the number and kind of errors made when choosing informants (mills, 2013); neither do they show consistency in assigning test questions correctly to different experts (aguiar, stoess & taylor, 2012). in the examples presented above an inconsistency between individuals’ epistemic beliefs about standards and the application of those epistemic standards has been observed, as well as an inconsistency in the application of epistemic standards. individuals’ epistemic action is not always consistent with their epistemic beliefs. the second paradox appears in examining epistemic activity across the lifespan. although very young children show competence with respect to a particular epistemic criterion, older individuals exhibit limitations in the application of the same criterion.
for example, even though some research findings show that pre-schoolers are able to judge an informant’s credibility based on the quality of the informant’s argument rather than on his or her power (castelain, bernard, van der henst, & mercier, 2015), other findings show that most undergraduate students do not engage in evaluation of arguments, examining, for example, whether scientific evidence supports a knowledge claim while researching information on the web (mason, boldrin, & ariasi, 2010) or when reading a text (iordanou, muis, & kendeou, 2014). it is proposed here that the mechanism behind development of epistemic cognition, which explains the two paradoxes described above, is the development of individuals’ epistemic awareness of their epistemic beliefs and conscious control of application of epistemic standards, an issue that we discuss below.
2.4. understanding of epistemic standards and control of their application develop and support epistemic cognition in action
studying students engaging in dialogic argumentation over time offers insights regarding how both knowledge and epistemic cognition change. iordanou and constantinou (2015), employing the micro-genetic method, a powerful method for understanding epistemic cognitive development (sandoval, 2014), examined how students use evidence to influence the beliefs of their peers. eleventh graders, working with a partner, engaged in electronic argumentative dialogues with classmates who held an opposing view on the topic and in some evidence-focused reflective activities, based on transcriptions of their dialogues. another sixteen 11th graders, who studied the data base in the learning environment for the same amount of time as experimental-condition students but did not engage in an argumentative discourse activity, served as a comparison condition.
the findings of this study were consistent with findings of other studies (iordanou & constantinou, 2014; kuhn & moore, 2015) in showing that after extensive engagement in argumentative activities, students exhibited a shift from presenting their “right”, self-evident theories of how things are, without providing any data to support their argument beyond presenting their personal opinions, to employing data to support their positions and offering alternative interpretations for a particular piece of evidence. in addition, students developed an appreciation of the epistemic understanding that evidence can be used to weaken others’ claims, which appears to be a more challenging developmental achievement than understanding that evidence can be used to support one’s own claims. finally, students made more specific reference to evidence and its source after sustained engagement in argumentative activities, a finding which is also consistent with other studies (iordanou & constantinou, 2014), suggesting that the process of coordinating evidence with claims, and the awareness of the need to do so, came under increasing conscious control over time. the analysis of participants’ dialogues over the course of the intervention provided further support to this suggestion. in particular, the micro-genetic analysis showed that, in addition to the increase observed in the use of evidence and the function of evidence employed, an increase was observed in students’ meta-level statements regarding evidence (e.g., “give us some evidence”, “you have not provided evidence”) over the course of the intervention, revealing a developing epistemological understanding of the epistemic standard of evaluating a theory based on its fit to evidence.
similar results were observed in chinn, duschl, duncan, buckland, and pluta’s (2008) study, where middle school students engaged in argumentation and reflective activities aimed at constructing, revising, and evaluating scientific models on the basis of evidence, over the course of an academic year. by the end of the intervention, students in the experimental condition exhibited greater advances not only in their ability to effectively coordinate models and evidence, but also in their understanding of epistemic criteria. a shift was observed from non-evidential criteria (e.g., have words and pictures) to evidential criteria, linking models to evidence. the findings of iordanou and constantinou’s (2015) and chinn et al.’s (2008) studies have two important implications. the first implication is that dialogic argumentation can offer a suitable setting for studying students’ epistemic activity in action and gaining a better understanding of how epistemic cognition changes. the second implication is that argumentation appears to be a promising pathway to support the development of epistemic cognition (iordanou, 2016; iordanou, kendeou, & beker, 2016; sandoval, 2005). engagement in argumentation is a fruitful way of making tacit epistemic beliefs, reflected first in epistemic action, explicit, as well as of changing epistemic beliefs (iordanou, 2016). the work of iordanou (2016) showed that engagement in dialogic argumentative activities supported the development of more evaluativist epistemic beliefs, that is, an understanding that knowledge evolves through coordination of theory with data and that, through evaluation, the position found to be best supported by argument and evidence is judged to have more merit than alternative positions (kuhn et al., 2000).
the increasing acquisition of awareness and conscious control of application of epistemic standards proposed here, and reflected in iordanou and constantinou’s (2015) findings as well as in findings from studies on testimony and tom (corriveau & harris, 2009; koenig & jaswal, 2011; mills, 2013; sobel & corriveau, 2010), is in line with other findings in cognitive development showing a developing metacognitive monitoring from childhood to adolescence (kitsantas & zimmerman, 2002; roderer & roebers, 2014; van der stel & veenman, 2010). tom research examining adults’ eye movements, while they were following a director’s instructions for moving items, shows that adults initially interpreted the director’s instructions egocentrically, just like children, but were faster and more effective in correcting a wrong interpretation (epley, morewedge & keysar, 2004). findings like this one suggest that adults’ better metacognitive control is what enables them to “correct” their egocentric errors and exhibit behaviour superior to that of children (apperly, warren, andrews, grant, & todd, 2011).
2.5. specificity of epistemic standards
there is ample evidence pointing to the domain-specificity of epistemic cognition (muis, bendixen, & haerle, 2006). individuals’ epistemic cognition differs across domains (kuhn et al., 2000) and advancement in epistemic cognition in one domain does not necessarily transfer to other domains (iordanou, 2010, 2016; hofer, 2004). in iordanou’s (2016) study, notable differences were observed in the epistemic standards employed between a social science topic and a physical science topic.
when elementary school students were asked to justify their knowledge of a physical science topic (dinosaurs’ extinction) and a social science topic (home-schooling), the majority of the students reported scientific evidence to justify their knowledge in the physical science topic, while they employed claims from general knowledge or personal experience (e.g., “you don’t have friends at home”) to justify their knowledge in the social science topic. domain differences were also observed in both participants’ epistemic beliefs and epistemic activity in action in iordanou et al.’s (2014) study. young adolescents and adults in that study engaged in more epistemic processing of evidence in the history domain than in the science domain. in particular, they engaged more in judging the credibility of evidence while reading a text in the history domain than in the science domain. behind this domain-specificity of epistemic cognition reside domain-specific challenges regarding the development of epistemic cognition. kuhn et al. (2008) have suggested that in the social domain the major challenge in achieving sophisticated epistemic cognition is different from the challenge in the science domain. in brief, in the social domain, the challenge is to come to terms with the concern that human interpretation plays an unmanageable, overpowering role, while in the science domain, the major challenge is to recognize that human interpretation plays any role at all. in the science domain, the entry of human interpretation into what was previously regarded as direct perception of a single reality must be recognized and come to be understood in positive terms. human construction of alternative possibilities (multiple representations of truth, or theories) needs to be coordinated with empirical evidence, in an ongoing process that constitutes scientific work.
in the social domain, in contrast, human interpretation is more readily recognized and the danger is one of a permanent stall in radical relativism, with the evil of subjectivity seen as overpowering the quest for any knowledge beyond subjective opinion. epistemic standards also must be examined as a function of context. students’ epistemic standards differ when reflected in essays versus dialogues. in the kuhn and moore (2015) study, middle-school students used more evidence from their own personal knowledge and experience in their dialogues (about 90%) than in their essays (40%), suggesting the dialogue was a more authentic experience for them. finally, specific content also introduces variation in standards. here, more research is needed. bråten, strømsø, and salmerón (2011) found that readers with low topic knowledge failed to employ the most appropriate epistemic standards, whereas bromme and thomm (2015) found that adults’ judgments regarding reliable informants were not related to participants’ prior knowledge, general science knowledge or their study subject. similarly, mason, boldrin and ariasi (2010) found that prior knowledge was not related to epistemic activity in action, whereas iordanou, muis, and kendeou (2014) found that individuals’ prior knowledge predicted their epistemic cognition in action.
3. conclusions and future research
the question of how knowledge changes as individuals progress through the lifespan requires better answers. research findings point to differences between children and adults in the way they make judgments. for example, tenney, small, kondrad, jaswal, and spellman (2011) found that adults take into consideration information regarding informants’ calibration, that is, how well one’s confidence matches one’s likelihood of being correct, whereas children ignore this information and tend to rely more on an informant’s confidence.
the review presented here proposes that epistemic understanding of theory-evidence coordination develops gradually and that different forms of understanding develop at different ages. the present paper also presents evidence showing that there is a discrepancy between epistemic beliefs and their expression in action. with age and expertise, understanding of epistemic standards and control of their application develop and support epistemic cognition in action. there is a need for more developmental research, especially longitudinal studies, to enhance our understanding of the forms that epistemic cognitive development takes and to explain why epistemic development occurs or fails to occur. to satisfy the quest for a better understanding of epistemic cognitive development, there is a need for new measures that would allow us to examine more deeply and thoroughly what develops (chinn et al., 2011). dynamic instruments that examine individuals’ epistemic cognition as a dynamic, complex construct need to be employed. some promising measures are think-aloud protocols (hofer, 2004), eye-tracking techniques, collaborative discussions and computer-based learning environments (greene, muis, & pieschl, 2010), all of them employed in microgenetic investigations. there is also a need for a better understanding of the specificity of epistemic cognition. research suggests that epistemic cognition has both general and context-specific elements (muis, bendixen, & haerle, 2006; sinatra, kienhues, & hofer, 2014). some aspects of epistemic cognition, such as the appreciation of evidence, transfer across contexts (iordanou & constantinou, 2014; 2015), whereas other aspects, such as the epistemic criteria that individuals employ for adopting and revising claims, appear to be domain specific (iordanou, 2016; kuhn et al., 2008).
taking into account the complex and multifaceted nature of epistemic cognition, future research needs to examine the specificity question of epistemic cognition at a more fine-grained level, addressing questions such as how epistemic standards vary across conditions and why this is the case. finally, there is a need for future research to examine how the development of epistemic cognition in action can be supported. engagement in dialogic argumentation appears to be a promising pathway for supporting the understanding that there is no single self-evident truth and that multiple interpretations may exist of the same phenomenon, as the human mind plays an active role in ascribing meaning to the world (carpendale & lewis, 2006; iordanou & constantinou, 2015; moshman, 2004; walker, wartenberg, & winner, 2012). engagement in explicit reflection about the role of evidence in reasoning and about epistemic standards is also a promising means for supporting an appreciation of the role of evidence in forming and revising knowledge (chinn & buckland, 2012; iordanou & constantinou, 2014; 2015). future research should examine such methods further. in summary, the purpose of the present paper has been to draw a strong continuous link between the earliest understanding of other minds and the task that confronts adults throughout the life span: interpreting evidence and coordinating it with what they already take to be true, in a manner over which they exercise conscious control. adults continue to do so imperfectly, to be sure (kuhn, 2016), but their skill has developed from earlier levels and has the potential to continue to develop. i propose that epistemic cognition builds on increasing awareness and epistemic understanding of theory-evidence coordination and of the role of the human mind in interpreting reality.
the standards for knowledge formation and revision are closely connected with epistemic understanding of theory-evidence coordination, and both these standards and control of their application change across the lifespan. competence in understanding theory-evidence coordination and the role of the human mind in knowing has its roots in early tom achievements and proceeds gradually from there towards more and more mature and complete understanding, in a process that ideally never ends. future research should go beyond a focus on what people believe and increase attention not only on how people choose what to believe, but also on how these standards for choice themselves evolve and are applied within real-life contexts. lastly, addressing the question of how researchers and educators can best support individuals’ development in these respects promises to have profound consequences for people’s lives.
keypoints
the how question of knowledge change is examined
evidence from the tom and evidence-theory coordination literatures is examined
a focus on epistemic activity in action is proposed
the standards for knowledge formation and revision change developmentally
the process increasingly comes under conscious control
references
aguiar, n. r., stoess, c. j., & taylor, m. (2012). the development of children’s ability to fill the gaps in their knowledge by consulting experts. child development, 83(4), 1368-81. doi: http://dx.doi.org/10.1111/j.1467-8624.2012.01782.x amsel, e., & brock, s. (1996). the development of evidence evaluation skills. cognitive development, 11, 523-550. anderson, r. c., chinn, c., chang, j., waggoner, m., & yi, h. (1997). on the logical integrity of children’s arguments. cognition and instruction, 15(2), 135-167. doi: http://dx.doi.org/10.1207/s1532690xci1502_1 apperly, i. a., warren, f., andrews, b. j., grant, j., & todd, s. (2011). developmental continuity in theory of mind: speed and accuracy of belief-desire reasoning in children and adults.
child development, 82(5), 1691-1703. doi: http://dx.doi.org/10.1111/j.1467-8624.2011.01635.x astington, j. w., pelletier, j., & homer, b. (2002). theory of mind and epistemological development: the relation between children's second-order false-belief understanding and their ability to reason about evidence. new ideas in psychology, 20(2), 131-144. doi: http://dx.doi.org/10.1016/s0732118x(02)00005-3 barzilai, s., & zohar, a. (2012). epistemic thinking in action: evaluating and integrating online sources. cognition and instruction, 30(1), 39-85. doi: http://dx.doi.org/10.1080/07370008.2011.636495 bendixen, l., & rule, d. (2004). an integrative approach to personal epistemology: a guiding model. educational psychologist, 39(1), 69-80. doi: http://dx.doi.org/10.1207/s15326985ep3901_7 birch, s. a. j., & bloom, p. (2007). the curse of knowledge in reasoning about false beliefs. psychological science, 18(5), 382-386. doi: http://dx.doi.org/10.1111/j.1467-9280.2007.01909.x bråten, i., britt, m. a., strømsø, h. i., & rouet, j. (2011). the role of epistemic beliefs in the comprehension of multiple expository texts: toward an integrated model. educational psychologist, 46(1), 48-70. doi: http://dx.doi.org/10.1080/00461520.2011.538647 bråten, i., strømsø, h. i., & salmerón, l. (2011). trust and mistrust when students read multiple information sources about climate change. learning and instruction, 21(2), 180-192. doi: http://dx.doi.org/10.1016/j.learninstruc.2010.02.002 bromme, r., & thomm, e. (2015). knowing who knows: laypersons’ capabilities to judge experts’ pertinence for science topics. cognitive science, 38(8) 1-12. doi: http://dx.doi.org/10.1111/cogs.12252 carpendale, j. i., & chandler, m. j. (1996). on the distinction between false belief understanding and subscribing to an interpretive theory of mind. child development, 67(4), 1686-1706. doi: http://dx.doi.org/10.1111/j.1467-8624.1996.tb01821.x carpendale, j., & lewis, c. (2006). 
how children develop social understanding. oxford: blackwell. castelain, t., bernard, s., van der henst, j.-b., & mercier, h. (2015). the influence of power and reason on young maya children's endorsement of testimony. developmental science, 18(1), 1-10. doi: http://dx.doi.org/10.1111/desc.12336 chinn, c. a., buckland, l. a., & samarapungavan, a. (2011). expanding the dimensions of epistemic cognition: arguments from philosophy and psychology. educational psychologist, 46(3), 141-167. doi: http://dx.doi.org/10.1080/00461520.2011.587722 chinn, c. a., & brewer, w. f. (1993). the role of anomalous data in knowledge acquisition: a theoretical framework and implications for science instruction. review of educational research, 63, 1-49. doi: http://dx.doi.org/10.2307/1170558 chinn, c. a., & buckland, l. a. (2012). model-based instruction: fostering change in evolutionary conceptions and in epistemic practices. in k. s. rosengren, s. k. brem, e. m. evans, & g. m. sinatra (eds.), evolution challenges: integrating research and practice in teaching and learning about evolution (pp. 211-232). oxford: oxford university press. doi: http://dx.doi.org/10.1093/acprof:oso/9780199730421.003.0010 chinn, c. a., duschl, r. a., duncan, r. g., buckland, l. a., & pluta, w. j. (2008). a microgenetic classroom study of learning to reason scientifically through modeling and argumentation. in g. kanselaar, j. van merriënboer, p. kirschner, & t. de jong (eds.), international perspectives in the learning sciences: creating a learning world. proceedings of the 8th international conference for the learning sciences (vol. 3, pp. 14-15). utrecht, the netherlands: international society of the learning sciences. clement, n., lovat, t., holbrook, a., kiley, m., bourke, s., paltridge, b., ... & mcinerney, d. m. (2015). exploring doctoral examiner judgements through the lenses of habermas and epistemic cognition. in theory and method in higher education research (pp. 213-233).
emerald group publishing limited. doi: http://dx.doi.org/10.1108/s2056-375220150000001010 corriveau, k. h., & harris, p. l. (2009). preschoolers continue to trust a more accurate informant 1 week after exposure to accuracy information. developmental science, 12, 188–193. doi: http://dx.doi.org/10.1111/j.1467-7687.2008.00763.x corriveau, k.h., harris, p.l., meins, e., ferneyhough, c., arnott, b., elliott, l., liddle, b., hearn, a., vittorini, l. & de rosnay, m. (2009). young children’s trust in their mother’s claims: longitudinal links with attachment security in infancy. child development, 80(3), 750-761. doi: http://dx.doi.org/10.1111/j.1467-8624.2009.01295.x eisbach, a. o. (2004). children’s developing awareness of diversity in people’s trains of thought. child development, 75(6), 1694-1707. doi: http://dx.doi.org/10.1111/j.1467-8624.2004.00810.x epley, n., morewedge, c. k., & keysar, b. (2004). perspective taking in children and adults: equivalent egocentrism but differential correction. journal of experimental social psychology, 40, 760-768. doi: http://dx.doi.org/10.1016/j.jesp.2004.02.002 greene, j. a., muis, k. r., & pieschl, s. (2010). the role of epistemic beliefs in students’ self-regulated learning with computer-based learning environments: conceptual and methodological issues. educational psychologist, 45(4), 245-257. doi: http://dx.doi.org/10.1080/00461520.2010.515932 greene, j. a., sandoval, w. a., bråten, i. (eds.). handbook of epistemic cognition. new york, ny: routledge. harris, p. l., & corriveau, k. h. (2011). young children's selective trust in informants. philosophical transactions of the royal society b: biological sciences, 366(1567), 1179-1187. doi: http://dx.doi.org/10.1098/rstb.2010.0321 harris, p. l., & koening, m. a. (2006). trust in testimony: how children learn about science and religion. child development, 77(3), 505-534. doi: http://dx.doi.org/10.1111/j.1467-8624.2006.00886.x hofer, b. k. (2004). 
epistemological understanding as a metacognitive process: thinking aloud during online searching. educational psychologist, 39(1), 43-55. doi: http://dx.doi.org/10.1207/s15326985ep3901_5 iordanou, k. (2010). developing argument skills across scientific and social domains. journal of cognition and development, 11(3), 293-327. doi: http://dx.doi.org/10.1080/15248372.2010.485335 iordanou, k. (2016). developing epistemological understanding through argumentation in scientific and social domains. zeitschrift für pädagogische psychologie, 30(2-3), 109-119. doi: http://dx.doi.org/10.1024/1010-0652/a000172 iordanou, k., & constantinou, c. p. (2014). developing pre-service teachers’ evidence-based argumentation skills on socio-scientific issues. learning and instruction, 34, 42-57. doi: http://dx.doi.org/10.1016/j.learninstruc.2014.07.004 iordanou, k., & constantinou, c. p. (2015). supporting use of evidence in argumentation through practice in argumentation and reflection in the context of socrates learning environment. science education, 99, 282-311. doi: http://dx.doi.org/10.1002/sce.21152 iordanou, k., kendeou, p., & beker, k. (2016). argumentative reasoning. in w. sandoval, j. greene, & i. bråten (eds.), handbook of epistemic cognition (pp. 39-53). new york, ny: routledge. doi: http://dx.doi.org/10.4324/9781315795225 iordanou, k., muis, k., & kendeou, p. (2014). epistemic understanding and meta-level processing of evidence when reading a text. paper presented at the earli sig2 conference. amsterdam, the netherlands. jaswal, v. k., & malone, l. s. (2007). turning believers into skeptics: 3-year-olds' sensitivity to cues to speaker credibility. journal of cognition and development, 8(3), 263-283. doi: http://dx.doi.org/10.1080/15248370701446392 kinzler, k. d., corriveau, k. h., & harris, p. l. (2011). children’s selective trust in native-accented speakers. developmental science, 14(1), 106-111.
doi: http://dx.doi.org/10.1111/j.14677687.2010.00965.x kitsantas, a., & zimmerman, b. j. (2002). comparing self-regulatory processes among novice, non-expert, and expert volleyball players: a microanalytic study. journal of applied sport psychology, 14, 91–105. doi: http://dx.doi.org/10.1080/10413200252907761 koenig, m. a., & jaswal, v. k. (2011). characterizing children’s expectations about expertise and incompetence: halo or pitchfork effects? child development, 82(5), 1634-1647. doi: http://dx.doi.org/10.1111/j.1467-8624.2011.01618.x köymen, b., rosenbaum, l., & tomasello, m. (2014). reasoning during joint decision-making by preschool peers. cognitive development, 32, 74-85. doi: http://dx.doi.org/10.1016/j.cogdev.2014.09.001. kuhn, d. (2016). a role for reasoning in a dialogic approach to critical thinking. topoi, 1-8. doi: http://dx.doi.org/10.1007/s11245-016-9373-4 kuhn, d., cheney, r., & weinstock, m. (2000). the development of epistemological understanding. cognitive development, 15, 309–328. doi: http://dx.doi.org/10.1016/s0885-2014(00)00030-7 kuhn, d., iordanou, k., pease, m., & wirkala, c. (2008). beyond control of variables: what needs to develop to achieve skilled scientific thinking? cognitive development, 23, 435–451. doi: http://dx.doi.org/10.1016/j.cogdev.2008.09.006 kuhn, d., & moore, w. (2015). argumentation as core curriculum. learning: research and practice, 1(1), 66-78. doi: http://dx.doi.org/10.1080/23735082.2015.994254 kuhn, d., zillmer, n., crowell, a., & zavala, j. (2013). developing norms of argumentation: metacognitive, epistemological, and social dimensions of developing argumentive competence. cognition and instruction, 31(4), 456-496. doi: http://dx.doi.org/10.1080/07370008.2013.830618 lalonde, c. e., & chandler, m. j. (2002). children’s understanding of interpretation. new ideas in psychology, 20(2-3), 163-198. doi: http://dx.doi.org/10.1016/s0732-118x(02)00007-7 lehrer, r., & schauble, l. (2015). 
the development of scientific thinking. handbook of child psychology and developmental science, 2(16), 1-44. (edited work) doi: http://dx.doi.org/10.1002/9781118963418.childpsy216 mason, l., ariasi, n., & boldrin, a. (2011). epistemic beliefs in action: spontaneous reflections about knowledge and knowing during online information searching and their influence on learning. learning and instruction, 21, 137-151. doi: http://dx.doi.org/10.1016/j.learninstruc.2010.01.001 mason, l., boldrin, a., & ariasi, n. (2010). searching the web to learn about a controversial topic: are students epistemically active? instructional science, 38, 607-633. doi: http://dx.doi.org/10.1007/s11251-008-9089-y miller, s. a. (2012). theory of mind: beyond the preschool years. new york, ny: psychology press. mills, c. m. (2013). knowing when to doubt: developing a critical stance when learning from others. developmental psychology, 49(3), 404-418. doi: http://dx.doi.org/10.1037/a0029500 moshman, d. (2004). from inference to reasoning: the construction of rationality. thinking and reasoning, 10(2), 221-239. doi: http://dx.doi.org/10.1080/13546780442000024 muis, k. r., bendixen, l. d., & haerle, f. c. (2006). domain-generality and domain-specificity in personal epistemology research: philosophical and empirical reflections in the development of a theoretical framework. educational psychology review, 18(1), 3-54. doi: http://dx.doi.org/10.1007/s10648-006-9003-6 muis, k. r., kendeou, p., & franco, g. m. (2011). consistent results with the consistency hypothesis? the effects of epistemic beliefs on metacognitive processing. metacognition and learning, 6, 45-63. doi: http://dx.doi.org/10.1007/s11409-010-9066-0 perner, j., & davies, g. (1991). understanding the mind as an active information processor: do young children have a “copy theory of mind”? cognition, 39, 51-69. doi: http://dx.doi.org/10.1016/0010-0277(91)90059-d perry, w. g. (1970).
forms of intellectual and ethical development in the college years: a scheme. new york: holt, rinehart and winston. pluta, w. j., chinn, c. a., & duncan, r. g. (2011). learners’ epistemic criteria for good scientific models. journal of research in science teaching, 48(5), 486-511. doi: http://dx.doi.org/10.1002/tea.20415 roderer, t., & roebers, c. m. (2014). can you see me thinking (about my answers)? using eye-tracking to illuminate developmental differences in monitoring and control skills and their relation to performance. metacognition and learning, 9(1), 1-23. doi: http://dx.doi.org/10.1007/s11409-013-9109-4 sandoval, w. a. (2005). understanding students’ practical epistemologies and their influence on learning through inquiry. science education, 89, 634-656. doi: http://dx.doi.org/10.1002/sce.20065 sandoval, w. a. (2014). science education's need for a theory of epistemological development. science education, 98(3), 383-387. doi: http://dx.doi.org/10.1002/sce.21107 sandoval, w. a., & millwood, k. a. (2005). the quality of students' use of evidence in written scientific explanations. cognition and instruction, 23(1), 23-55. doi: http://dx.doi.org/10.1207/s1532690xci2301_2 sandoval, w. a., sodian, b., koerber, s., & wong, j. (2014). developing children's early competencies to engage with science. educational psychologist, 49(2), 139-152. doi: http://dx.doi.org/10.1080/00461520.2014.917589 sinatra, g. m., kienhues, d., & hofer, b. k. (2014). addressing challenges to public understanding of science: epistemic cognition, motivated reasoning, and conceptual change. educational psychologist, 49(2), 123-138. doi: http://dx.doi.org/10.1080/00461520.2014.916216 sobel, d. m., & corriveau, k. h. (2010). children monitor individuals’ expertise for word learning. child development, 81(2), 669-679. doi: http://dx.doi.org/10.1111/j.1467-8624.2009.01422.x stømsø, h. i., bråten, i., & britt, m. a. (2011). 
do students’ beliefs about knowledge and knowing predict their judgement of texts’ trustworthiness? educational psychology, 31 (2), 177-206. doi: http://dx.doi.org/10.1080/01443410.2010.538039 taylor, m., esbensen, b. m., & bennett, r. t. (1994). children's understanding of knowledge acquisition: the tendency for children to report that they have always known what they have just learned. child development, 65, 1581-1604. doi: http://dx.doi.org/10.1111/j.14678624.1994.tb00837.x tenney, e. r., small, j. e., kondrad, r. l., jaswal, v. k., & spellman, b. a. (2011). accuracy, confidence, and calibration: how young children and adults assess credibility. developmental psychology, 47(4), 1065-1077. doi: http://dx.doi.org/10.1037/a0023273 van der stel, m., & veenman, m. v. (2010). development of metacognitive skillfulness: a longitudinal study. learning and individual differences, 20(3), 220-224. doi: http://dx.doi.org/10.1016/j.lindif.2009.11.005 walker, c. m., wartenberg t. e., & winner e. (2012). engagement in philosophical dialogue facilitates children's reasoning about subjectivity. developmental psychology, 2(1), 1-10. doi: http://dx.doi.org/10.1037/a0029870 wildenger, l. k., hofer, b. k., & burr, j. e. (2010). epistemological development in very young knowers. in l. d. bendixen, & f. c. fleucht (eds.), personal epistemology in the classroom: theory, research and implications for practice (pp. 220-257). cambridge: cambridge university press. doi: http://dx.doi.org/10.1017/cbo9780511691904.008 wineburg, s. s. (1991). on the reading of historical texts: notes on the breach between school and academy. american educational research journal, 28(3), 495-519. doi: http://dx.doi.org/10.3102/00028312028003495 zhang, t., zheng, x., zhang, l., sha, w., deák, g., & li, h. (2010). older children’s misunderstanding of uncertain belief after passing the false belief test. cognitive development, 25, 158-165. 
doi: http://dx.doi.org/10.1016/j.cogdev.2009.12.001
frontline learning research vol. 11 no. 1 (2023) 57-93 issn 2295-3159
corresponding author: emilie prast, faculty of social and behavioral sciences, institute of education and child studies, leiden university, wassenaarseweg 52, 2333ak leiden, the netherlands. e.j.prast@fsw.leidenuniv.nl. doi: https://doi.org/10.14786/flr.v11i1.1079
what do students think about differentiation and within-class achievement grouping?
emilie j. prast a, kim stroet a, arnout koornneef a, & tom f. wilderjans b,c,d,e
a educational sciences program group, institute of education and child studies, faculty of social and behavioral sciences, leiden university, leiden, the netherlands
b methodology and statistics research unit, institute of psychology, faculty of social and behavioral sciences, leiden university, leiden, the netherlands
c leids universitair medisch centrum (lumc), leiden institute for brain and cognition (libc), leiden, the netherlands
d research group of quantitative psychology and individual differences, faculty of psychology and educational sciences, ku leuven, leuven, belgium
e department of clinical psychology, faculty of behavioural and human movement sciences, vrije universiteit amsterdam, amsterdam, the netherlands
article received 7 april 2022 / revised 19 february 2023 / accepted 22 february 2023 / available online 23 march 2023
abstract
differentiation and achievement grouping are frequently implemented practices to adapt education to students’ varying educational needs based on achievement level. potential didactical and socioemotional advantages and disadvantages of these practices have been discussed in the literature. however, little is known about the perspective of students themselves.
this study examined how dutch students (n = 428) perceived differentiation and within-class homogeneous achievement grouping in primary mathematics education, with attention to potential differences between students of diverse achievement levels. students of grades 1, 3 and 5 completed a questionnaire about various differentiated mathematics activities and (if applicable) within-class achievement grouping. in line with the didactical perspective on differentiation, extended instruction and less difficult tasks were appreciated most by low-achieving students whereas more difficult tasks were appreciated most by high-achieving students. students of all achievement groups had largely positive attitudes about achievement grouping and about their own achievement group. however, some differences between achievement groups were found, with less favourable results for students placed in low achievement groups. students’ responses to open-ended questions provided additional insights into the reasons behind students’ evaluations of differentiation and achievement grouping. differences between grade levels were also explored.
keywords: differentiation; ability grouping; student perspective; mixed methods; mathematics education.
1. introduction
many teachers strive to adapt education to their students’ diverse educational needs by implementing differentiation (‘an approach by which teaching is varied and adapted to match students’ abilities using systematic procedures for academic progress monitoring and data-based decision-making.’; roy, guay, & valois, 2013, p. 1187). however, differentiation is a controversial topic in the literature, particularly when it is organised by grouping students of a similar achievement level.
researchers have discussed potential didactical and socioemotional advantages and disadvantages of various types of differentiation and achievement grouping (e.g., campbell, 2021; francis et al., 2017; marks, 2013; mcgillicuddy & devine, 2020; tieso, 2003; tomlinson et al., 2003; van geel et al., 2018). in this debate, students’ voices have not often been heard. given that the aim of differentiation is to adapt education to students’ needs, it is important to examine whether students themselves perceive the adaptations as successful in meeting their educational and socioemotional needs. therefore, this study investigates what students think about differentiation and within-class achievement grouping in primary mathematics education.
1.1 differentiation and achievement grouping
differentiation based on students’ current academic achievement level (also called readiness-based or cognitive differentiation) entails two related processes: (1) monitoring students’ progress to determine their current achievement level and educational needs, and (2) adapting learning goals, instruction and practice to students’ current level of knowledge and skills and their corresponding educational needs (prast et al., 2015; roy et al., 2013). differentiation may be convergent or divergent (blok, 2004). in convergent differentiation, all students work towards the same goals, but the way in which students reach these goals is differentiated (e.g., with additional instruction). in divergent differentiation, students of different achievement levels work towards different learning goals. one frequently used way to organise differentiation is to group students based on achievement level. such groups may be homogeneous (similar achievement) or heterogeneous (mixed achievement), and within-class or between-class (see tieso, 2003 for an overview of grouping practices).
this paper focuses on within-class homogeneous grouping, that is, subgroups of students with a similar achievement level within a class which includes a broad range of achievement levels. we use the term achievement grouping rather than ability grouping since recent guidelines for grouping and differentiation do not assume that students have a fixed ability level and instead emphasise that grouping arrangements should be flexible and responsive to changes in students’ educational needs (prast et al., 2015; tomlinson et al., 2003; van geel et al., 2018). achievement grouping can be used to differentiate instruction (e.g., additional instruction for subgroups with similar instructional needs) and practice (e.g., with tiered tasks for low-achieving, average-achieving and high-achieving students) (prast et al., 2015). this paper examines not only students’ views on the grouping itself, but also their views on differentiated mathematics activities, which may or may not take place in achievement groups.
1.2 mathematics education in the netherlands
since the implementation of differentiation relies heavily on domain-specific pedagogical content knowledge (prast et al., 2015; van geel et al., 2018; vogt & rogalla, 2009), teachers’ implementation and students’ perceptions of differentiation are likely to be domain-specific. to study students’ perceptions of differentiation and achievement grouping in sufficient depth, this study focuses on one domain and context, namely primary mathematics education in the netherlands. this is a relevant context, because of the increased focus on data-based decision making and, accordingly, on progress monitoring and instructional adaptations in the subject of mathematics in the netherlands over the past decade (prast et al., 2018; van geel et al., 2016, 2018; visscher, 2015).
a recent review (prast & hickendorff, in press) about the implementation of differentiation in primary mathematics education in the netherlands indicated that most teachers differentiate instruction and practice based on students’ achievement level at least to some extent. a mathematics lesson typically starts with a whole-class instruction. subsequently, additional instruction is often provided to low-achieving students, whereas practice tasks are frequently differentiated at three levels. in the lower grades, differentiation is largely convergent since students work towards the same learning goals, but from grade 4 onwards the learning goals may also be differentiated (expertgroep doorlopende leerlijnen, 2008). differentiation is frequently organised using within-class achievement groups (i.e., subgroup instruction and tiered tasks), but alternatives such as individualised differentiation of the practice tasks using software are also used (prast & hickendorff, in press). 1.3 perspectives on differentiation and achievement grouping various theoretical perspectives on differentiation and achievement grouping can be taken. in the current study, we focus on two perspectives: a didactical perspective (concerning the teaching and learning of mathematical content, with a focus on cognitive processes) and a socioemotional perspective (concerning the social and emotional processes that may be involved when differentiated activities and achievement grouping are used). we formulated our hypotheses based on these two perspectives. while other perspectives might also be taken (e.g., a sociological perspective concerning implications of differentiation and achievement grouping at a societal level), we feel that these two perspectives are most relevant in the context of the current study, because they are closely related to students’ daily experiences with differentiation and achievement grouping (and therefore, relevant and understandable for students).
1.4 a didactical perspective on differentiation and achievement grouping from a didactical perspective, the rationale for readiness-based differentiation is that adapting instruction and practice to students’ current skill level enhances learning (tomlinson et al., 2003). according to this view, learning tasks should be at a moderate difficulty level in relation to a student’s current skills (csikszentmihalyi, 1990; murray & arroyo, 2002). when tasks are too easy, this may result in boredom and withdrawal, while confronting a student with tasks that are too difficult may lead to frustration and anxiety (csikszentmihalyi, 1990; murray & arroyo, 2002). when tasks are designed to be just within reach based on the skill level of the student, this may enhance students’ motivation and achievement (arroyo et al., 2014; csikszentmihalyi, 1990). more generally, aptitude-treatment interaction theory predicts that students need different instructional treatments, dependent upon their aptitude (readiness for learning based on current achievement level) (cronbach & snow, 1977; kalyuga, 2007). for example, direct explicit instruction may be highly effective for students with low prior knowledge but not for students with high prior knowledge (kalyuga, 2007; kirschner et al., 2006). in the literature, research relating differentiation and achievement grouping to student achievement is described. for example, a meta-analysis (deunk et al., 2018) in primary school found positive effects of interventions in which software was used to assist teachers in implementing differentiation, by continuously monitoring students’ achievement level and providing (suggestions for) differentiated instruction and practice. programmes in which differentiation was part of a broader school reform also had positive effects. 
within-class grouping had no overall effect on achievement (for all students together), but there was a negative effect on the achievement of students placed in low achievement groups. however, the effects of achievement grouping were difficult to interpret, because the original studies provided little information on whether and how instruction and practice were differentiated in the achievement groups. a more recent study comparing within-class grouping and whole-class teaching in the uk (jerrim, 2021) did not find clear evidence for effects of achievement grouping on student achievement. from a didactical perspective, achievement grouping is merely an organisational format that can be used to implement differentiation, provided that the groups are actually used to adapt instruction to students’ needs. however, achievement grouping may also have negative didactical consequences, for example if the grouping arrangements do not correspond accurately to students’ current achievement level (e.g., because the groups are insufficiently flexible) or if the quality of differentiation is limited (e.g., with insufficiently challenging learning materials for lower achievement groups). more generally, if the learning goals and tasks are differentiated, low-achieving students may not get the opportunity to reach the same learning goals as their high-achieving peers (divergent differentiation; blok, 2004; see also hart, 1992). 1.5 a socioemotional perspective on achievement grouping from a socioemotional perspective, achievement grouping is not just a format to organise differentiation, but an educational approach that may affect socioemotional processes within a class. first, achievement grouping may affect social comparison processes, with potential effects on students’ academic self-concept.
qualitative case studies have indicated that, even when neutral names are used for the achievement groups, primary school students are largely aware of the hierarchical grouping structure, especially in the higher grades (eder, 1983; gripton, 2020; marks, 2013; mcgillicuddy & devine, 2020). campbell (2021) describes two possible mechanisms for effects of achievement grouping on academic self-concept: labelling effects and reference group effects. in the case of labelling effects, students would internalise the achievement label belonging to their achievement group, with positive effects on the self-concept of students placed in high achievement groups and negative effects on the self-concept of students placed in low achievement groups. in the case of reference group effects, students would start to compare themselves to the other students placed in their achievement group rather than to the whole class, with positive effects on the self-concept of students placed in low achievement groups and negative effects on the self-concept of students placed in high achievement groups (the big-fish-little-pond effect; marsh, 1984, 1987). in a large-scale study about the effects of within-class grouping on students’ self-concept, campbell (2021) found more evidence for labelling effects than for reference group effects. in contrast, jerrim (2021) found no effects of within-class achievement grouping compared to whole-class teaching on students’ self-concept. more generally, the use of achievement groups may affect the social dynamics within a class. qualitative case studies have provided indications that placement in a high achievement group may be associated with a higher social status than placement in a low achievement group (marks, 2013; mcgillicuddy & devine, 2020). 
for example, students placed in high achievement groups have been described by their peers as ‘smart’, ‘good’ or ‘liked’, whereas students placed in low achievement groups have been described as ‘dumb’, ‘bad’ or ‘not liked’ (mcgillicuddy & devine, 2020). in a study (hargreaves et al., 2021) about peer relations of students placed in low achievement groups (including both within-class and between-class grouping systems), there was generally little evidence that troubles in peer relations were related to placement in a low achievement group. however, in some cases, students experienced feelings of exclusion that were related to their low achievement status. in a study including various types of between-class and within-class achievement grouping, students placed in low and high achievement groups reported achievement-related teasing (hallam et al., 2004). besides potential effects on peer interactions, achievement grouping might also affect teacher-student interactions. for example, teachers may implicitly or explicitly display different expectations about students placed in low versus high achievement groups (mcgillicuddy & devine, 2020; rubie-davies, 2014; van den bergh, 2018). taken together, this socioemotional perspective indicates that within-class achievement grouping may affect various socioemotional processes in the classroom, which may be experienced differently by students placed in low, average or high achievement groups. however, as described above, the direction of effects is not always clear and empirical research about the socioemotional aspects of within-class achievement grouping is relatively scarce. 1.6 students’ perspective on differentiation and achievement grouping when researching differentiation it makes sense to include students’ perspective, since students’ motivation and engagement will be shaped by their perceptions. little is known about students’ views on differentiation and within-class achievement grouping. 
since the goal of differentiation is to adapt education to students’ educational needs, it is relevant to know whether students feel that their educational needs are met by the adaptations made by teachers. besides, students can be an important source of information regarding potential socioemotional side-effects of differentiation and achievement grouping. questions may be raised regarding the validity of student perceptions as an indicator of the “best” practice in terms of student outcomes such as achievement or motivation: we cannot expect students to oversee all implications of within-class differentiation and achievement grouping. accordingly, the goal of this study is not to give a complete overview of all potential effects, but to zoom in on one perspective that has not received much attention so far: that of the students themselves. previous research about this topic is scarce, and mostly focused on the grouping itself rather than on specific differentiation practices. first, there are studies comparing different types of grouping (e.g., between-class, within-class, whole-class teaching). such studies have reported negative experiences with between-class achievement grouping (boaler et al., 2000), and have presented mixed-achievement classes as a favourable alternative (hallam et al., 2004; tereshchenko et al., 2019). however, these studies did not focus on the differentiation practices that might be implemented within mixed-achievement classes. second, the small-scale qualitative studies that we have described in the previous section (eder, 1983; gripton, 2020; marks, 2013; mcgillicuddy & devine, 2020) provided insights into students’ experiences with within-class achievement grouping, with a focus on social-emotional aspects related to the grouping. these studies did not directly ask students about their preferences regarding grouping or differentiation.
such studies are very scarce, but a third line of studies did ask students directly about their preferences regarding adaptations for students with special educational needs included in general education classrooms. generally, students with and without special educational needs had positive attitudes towards many of these adaptations, although they wanted everybody to have the same homework (vaughn et al., 1995; vaughn, schumm, niarhos, & daugherty, 1993; vaughn, schumm, niarhos, & gordon, 1993). these three lines of research have provided initial indications that students’ experiences or preferences may differ depending on their achievement level, although the direction of effects is not always consistent across studies. for example, one study found that low-achieving students had the most positive attitudes towards mixed-achievement classes, while high-achieving students also perceived disadvantages such as a lack of challenge (tereshchenko et al., 2019). in contrast, another study found that low-achieving students tended to prefer homogeneous grouping whereas high-achieving students tended to prefer heterogeneous grouping (vaughn et al., 1995). 1.7 the current study: research questions and hypotheses the originality of the current study lies in the following. we zoomed in on students’ perspective on differentiation and grouping practices within primary school classes. first, we made the abstract concept of differentiation more concrete by asking students about various mathematics activities and relating their evaluations to students’ scores on a standardised mathematics achievement test. second, we investigated students’ opinions on within-class grouping using a mixed-methods approach combining quantitative ratings with qualitative reasons. throughout the study, we attended to both didactical and socioemotional considerations (rather than focusing on only one of them), and to the perspectives of students of diverse achievement levels and grade levels.
the first aim of this study was to investigate whether students’ evaluations of various mathematics activities are dependent upon their achievement level (regardless of the use of achievement grouping in their class). within the didactical perspective on differentiation, the idea of aptitude-treatment interactions is central: students are supposed to have different educational needs depending on their current achievement level (cronbach & snow, 1977; prast et al., 2015; tomlinson et al., 2003). for example, the same activity may be appropriately challenging for some students and too difficult or too easy for other students. note that previous research on aptitude-treatment interactions has typically focused on the outcome of student achievement, whereas students’ perceived frequency, liking and learning from activities are the outcome variables in this study. thus, the first research question was: (1) do different students evaluate various mathematics activities differently, depending on an interaction between the type of activity and the achievement level of the student? in accordance with guidelines for differentiation (prast et al., 2015), we made a distinction between general activities for all students (whole-class instruction, working at mathematics tasks independently, working at mathematics tasks together), activities intended to serve the educational needs of low-achieving students (less difficult tasks and additional instruction in a subgroup or individually), and activities intended for high-achieving students (more difficult tasks and additional instruction about these enrichment tasks, in a subgroup or individually). based on the didactical perspective on differentiation, we expected that students’ perceptions of these activities would interact with their achievement level. first, we expected that the frequency of activities as perceived by students would be dependent on achievement level.
this would be in line with previous teacher self-report and observational studies indicating that many dutch teachers adapt instruction and practice activities to the achievement level of their students, for example by providing additional instruction to low-achieving students and more challenging tasks to high-achieving students (prast & hickendorff, in press). second, we expected that students’ reported liking of and learning from activities would be dependent upon students’ achievement level. based on the idea of aptitude-treatment interactions, the most probable direction of such an interaction effect would be that activities intended for low-achieving students (such as less difficult tasks) are evaluated more positively by low-achieving students whereas activities intended for high-achieving students (such as more difficult tasks) are evaluated more positively by high-achieving students. however, given the more critical views on differentiation that have also been described in the literature (e.g., hart, 1992), as well as the innovative character of this study, it remains to be seen whether these interaction effects are indeed present and whether the effects are in the hypothesised direction. the second aim of this study was to investigate students’ perceptions of within-class achievement grouping in primary school, with attention to potential differences between students placed in low, average and high achievement groups. we were not only interested in quantitative evaluations but also in the reasons behind students’ evaluations. this led to the following research questions: (2a) how do students placed in within-class achievement groups evaluate their own achievement group and achievement grouping in general, and do these evaluations differ between students placed in low, average and high achievement groups? (2b) which reasons do students provide for their evaluations?
based on the indications for potentially different experiences of students placed in low, average and high achievement groups in the literature reviewed above, we expected that students’ evaluations would differ between achievement groups. however, given the scarce and inconsistent previous findings on student perceptions of within-class achievement grouping, we did not make specific predictions regarding the direction of those effects. since we expected that students’ reasons behind the quantitative evaluations might include socioemotional as well as didactical considerations (since grouping is typically used to differentiate tasks or instruction), we asked questions that probed both of these aspects. 2. method 2.1 participants and procedure data were collected in the fall of 2018 in the context of the research project ‘differentiation and motivation in primary mathematics education’, which was approved by the local ethics committee (project number ecpw-2018/210). fifty classes from 18 primary schools in the netherlands participated. after obtaining active informed consent from teachers and students, data were collected by students in the final year of academic teacher training, mostly at the school where they also did a teaching internship. the schools were diverse in terms of school size, location, and pedagogical-didactical school characteristics (e.g., public schools, schools with a religious background, montessori, etc.). we recruited one class of grades 1, 3 and 5 (in which students are typically 6-7, 8-9, and 10-11 years old) in each school to have a spread in grade levels while retaining a substantial number of classes per grade. in multigrade classes (nine classes, 18%), only students from the grades selected for our study participated. if a class had multiple teachers (common since 67% of teachers worked part-time), the teacher who most often taught that class participated. 
the average class size was 23 students (range 13 – 34, including students who did not participate in the research). teachers had an average of fifteen years of teaching experience (range 0 – 42 years). most teachers (n = 40, 80%) were female, reflecting the general dutch population of primary school teachers. in the context of the overarching research project, the participating teachers were interviewed and completed a questionnaire about their differentiation and achievement grouping practices. this yielded the following background information which we provide to assist in interpreting the findings of the current study. thirty-two teachers (64%) reported that the use of achievement groups was fully integrated in their mathematics teaching routine. these teachers would typically start a lesson with a whole-class instruction, followed by independent practice at three difficulty levels as provided by the curricular method. simultaneously, extended instruction would be provided to a subgroup of low-achieving students. another fourteen teachers (28%) reported using achievement groups partly. these teachers would for example provide extended instruction to a subgroup of students who needed it, but would either provide little differentiation in the tasks or differentiate tasks in a different way, for example using software. four teachers (8%) worked with achievement groups hardly or not at all. the use of achievement groups was within-class, with one exception (in one school, mathematics was taught in separate classes for low-achieving and average-achieving students). of the teachers using achievement groups (partly or fully), fifteen teachers (30%) indicated that they created or updated grouping arrangements approximately every two to six weeks based on students’ scores on the end-of-chapter tests from the mathematics textbook.
another eleven teachers (22%) reported making new grouping arrangements twice per year based on the results of a standardised mathematics achievement test. eight teachers (16%) indicated that they worked with flexible groups, created per lesson or per week based on the teachers’ observations, educational software or students’ own view on whether they needed additional instruction. the remaining teachers created new groups 3 to 4 times per year (6 teachers, 12%), did not change the groups (1 teacher, 2%), created grouping arrangements in a different way (3 teachers, 6%) or had missing responses (2 teachers, 4%). across the various methods of grouping, some teachers indicated that the grouping arrangements could be adapted per lesson based on students’ needs, and that other sources of information such as students’ daily work were also used.

table 1
overview of student characteristics in the full sample

characteristic                                             n
grade level
    grade 1                                               45
    grade 3                                              199
    grade 5                                              184
gender
    boy                                                  214
    girl                                                 212
    missing                                                2
achievement level on standardised test (a)
    i (highest)                                           97
    ii                                                    57
    iii                                                   60
    iv                                                    55
    v (lowest)                                            41
    missing: test scores not available                   118
within-class achievement group (b)
    high                                                 108
    average                                               93
    low                                                   54
    not placed in single within-class achievement
    group during past 3 weeks                            173

(a) see section 2.2.1. (b) see section 2.2.2. students not placed in a single within-class achievement group include students who switched between achievement groups during the past three weeks, students placed in between-class achievement groups, and students from classes without achievement grouping.

for the current study, the following data were collected: a student questionnaire, student achievement group placement and student achievement on a standardised mathematics achievement test. the grouping and achievement data were collected from the teacher. the student questionnaire was administered during school hours (maximum duration: 45 minutes).
in grades 3 and 5, this questionnaire was administered to all students for whom informed consent had been obtained (n = 383). after an explanation and practice of the answering format, students completed the questionnaire independently under supervision of the research assistant. in grade 1, the same questionnaire was administered individually due to the students’ young age (typically six years). the research assistant read the questions out loud, after which the student could point to the answer (when applicable, see section 2.2.3) or say his or her answer, which was written down by the research assistant. since this individual administration was too resource-intensive to include all students of grade 1, we randomly selected one low-achieving, one average-achieving and one high-achieving student from the students with informed consent in each class (n = 45). thus, the total sample consisted of 428 students with a mean age of 8 years (range 5 – 12 years). an overview of student characteristics is provided in table 1. 2.2 measures 2.2.1 mathematics achievement test mathematics achievement was measured with the nationally administered cito mathematics achievement tests, of which the validity and reliability have been demonstrated (janssen, verhelst, et al., 2005; koerhuis & keuning, 2011). various grade level versions of the test are available, including a version for kindergarteners (janssen, scheltens, et al., 2005; koerhuis, 2010). each grade level version covers multiple mathematics domains, appropriate for the grade level of the students (janssen, verhelst, et al., 2005; koerhuis & keuning, 2011). if available (administration of the test is not mandatory), the most recent test scores obtained at the end of the previous schoolyear were collected from the teacher. 
to ensure the comparability of scores across various grade-level versions of the test, we used the achievement level scores which reflect students’ achievement level relative to a nationally representative sample: i = 80th – 100th percentile, ii = 60th – 80th percentile, iii = 40th – 60th percentile, iv = 20th – 40th percentile, v = 0 – 20th percentile. in the analyses, these scores were recoded (and centered on the middle group) such that the highest value represents the highest achievement level (v = -2, iv = -1, iii = 0, ii = 1, i = 2). 2.2.2 achievement group placement if teachers used within-class achievement grouping, teachers were asked to indicate for each participating student in which group(s) the student had been placed during the past three weeks. the answering options were low, average, high and other (e.g., when students had switched between groups). we asked for a period of three weeks because this was long enough to experience relatively stable placement in an achievement group, but not so long that most of the students in the sample would have changed groups within that period. since comparisons between students placed in low, average and high achievement groups might be confounded when students had switched between groups, only students placed in a single within-class achievement group during the past three weeks were included in the analyses about achievement grouping. while students’ achievement group placement was generally related to their achievement on the mathematics achievement test, this correspondence was not perfect (see appendix 1 in the supplementary materials). 2.2.3 student questionnaire about differentiated activities and achievement grouping the student questionnaire was developed for this study by the first and second author, based on a model for differentiation in mathematics that is frequently implemented in the netherlands (prast et al., 2015).
prior to large-scale administration, a small-scale pilot was conducted by administering it one-on-one to two students of grades 1, 3 and 5 to check whether students understood the questions and the answering format. the first part of the questionnaire asked students about nine mathematics activities (based on prast et al., 2015) representing three categories of activities: general activities (whole-class instruction, working on tasks independently, and working on tasks together), differentiated activities intended for low-achieving students (working on less difficult tasks, extended instruction in a subgroup, and individual extended instruction) and differentiated activities intended for high-achieving students (working on enrichment tasks, subgroup instruction about enrichment tasks, and individual instruction about enrichment tasks). based on the teacher interview, the names of the activities as used in the students’ own class were used (e.g., if enrichment tasks were called “mathematics tigers”, that term would be used rather than “more difficult tasks”). for each of the nine activities, students were asked how often they were engaged in this activity, how much they liked this activity, and how much they learned from it (for a total of 27 questions). the answering format was a five-point scale represented by dots as shown in figure 1 (adapted from park et al., 2016). students had to indicate the dot that corresponded to their answer: the smallest dot corresponded to the smallest magnitude (e.g., never receiving whole-class instruction) and the largest dot corresponded to the largest magnitude (e.g., receiving whole-class instruction every lesson).

[figure 1 image not reproduced: the sample item “do you ever get whole-class instruction?” is shown with a row of five dots of increasing size, anchored by “never” and “always”.]
figure 1. sample item with answering format adapted from park et al. (2016).

the second part of the questionnaire (consisting of 4 closed-ended and 5 open-ended questions) was only administered to students whose teachers used achievement groups.
students were asked how much they liked to be in their achievement group (answered on the dot-likert scale described above), what they liked (open-ended) and did not like (open-ended) about being in that achievement group, how much they learned from being in that achievement group (dot-likert scale), why they learned much or little in that achievement group (open-ended), whether and why they would rather be in a different achievement group (yes or no followed by an open-ended explanation), and whether and why they would prefer a system without achievement groups (yes or no followed by an open-ended explanation). note that there is no strict separation between the differentiated activities (part 1) and achievement grouping (part 2): in many classes, within-class achievement grouping was used to organise the differentiated activities. however, the first part focused on differentiated activities, regardless of whether achievement groups were used in that class, while the second part focused on students’ perceptions of achievement grouping (only if relatively fixed achievement groups were used in their class). 2.2.4 analyses data were analysed in two parts, corresponding to the questionnaire and our research questions. part 1 focused on students’ ratings of the various activities, relative to their achievement level as measured with the cito mathematics achievement test. thus, students from classes without relatively fixed achievement groups were also included in these analyses, given that differentiated activities could also be organised in different ways, whereas students for whom cito test scores were not available were excluded from these analyses. part 2 focused on students’ perceptions regarding achievement grouping and therefore included only students who had been placed in a single within-class achievement group. data were analysed in r with multilevel models to take into account the nesting of students within classes and schools.
to enhance readability, we focus on the most important steps here while additional statistical details are provided in the supplementary materials (see appendix 2). to answer the first research question (do different students evaluate various mathematics activities differently, depending on an interaction between the type of activity and the achievement level of the student?), we analysed whether there was a significant interaction effect between the type of activity (e.g., whole-class instruction, independent work, etc.) and students’ achievement level in predicting students’ self-reported frequency of being engaged with the activities, students’ liking of the activities and students’ perceived learning from the activities. for each of these three outcome variables separately, we estimated four-level regression models with activity ratings (e.g., liking of whole-class instruction, independent work, etc.) nested in students (i.e., repeated measures), who were nested in classes nested within schools. to evaluate the significance of the interaction effect, we compared the fit of a full model including main effects of activity and achievement as well as an interaction between these variables to the fit of a reduced model without the interaction effect using a likelihood ratio test (lrt). a significant lrt indicates that the full model fits the data significantly better. significant interaction effects were followed up with post-hoc tests evaluating the effect of achievement on the ratings of each activity. finally, potential interactions with grade level were explored (see section 3.2.1). as described in the introduction, we expected that the interaction effects would be significant.
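to make the model comparison concrete, the following is a minimal sketch of how a likelihood ratio test between a reduced and a full nested model can be computed once both models have been fitted. it is written in python purely for illustration (the analyses in this study were run in r), and it is not the authors' code. the function names `chi2_sf` and `likelihood_ratio_test`, the log-likelihood values, and the number of additional parameters (8, assuming nine activities and achievement entered as a single numeric predictor) are all assumptions made for the example.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularised lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, extra_params: int):
    """Return the LRT statistic and its asymptotic chi-square p-value.

    extra_params is the number of parameters the full model adds to the reduced one
    (here: the interaction terms).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods, for illustration only (not values from this study).
stat, p = likelihood_ratio_test(loglik_reduced=-4012.6, loglik_full=-3998.1, extra_params=8)
print(f"LRT statistic = {stat:.2f}, df = 8, p = {p:.4f}")
```

the statistic is twice the difference in log-likelihood between the two models and is compared against a chi-square distribution whose degrees of freedom equal the number of parameters added by the interaction. a small p-value would indicate that the model with the activity-by-achievement interaction fits the data better than the model with main effects only.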
to answer the second research question, we performed two types of analyses: quantitative analyses to answer question 2a (how do students placed in within-class achievement groups evaluate their own achievement group and achievement grouping in general and do these evaluations differ between students placed in low, average and high achievement groups?), and qualitative analyses to answer question 2b (which reasons do students provide for their evaluations?). the closed-ended questions were analysed using multilevel models, thereby taking the clustering of students within classes and schools into account. achievement group was specified as a predictor of the outcome variable of interest: liking of achievement group, learning from achievement group, wanting to be in a different achievement group or preferring to work without achievement groups. as the latter two outcomes are dichotomous, a multilevel version of logistic regression was used. likelihood ratio tests were used to determine whether this model fitted the data significantly better than a model without achievement group as a predictor. as described in the introduction, we hypothesised that there would be differences between students placed in low, average and high achievement groups, but did not have specific hypotheses regarding the direction of effects.

the analyses of the open-ended questions were exploratory and intended to give more meaning to the quantitative results. an inductive approach, in which students’ answers rather than theoretical expectations formed the starting point, was taken (linneberg & korsgaard, 2019). based on an initial review of students’ answers, the first author developed various lower-order codes (about 10 to 20 codes per question) which specified (aspects of) answers that were given by multiple students.
since several codes were related to similar themes, these lower-order codes were then classified as belonging to one of the following higher-order themes, which were largely similar across questions: (1) answers about independent work and its difficulty level, (2) answers about (social interactions within) the achievement group, (3) answers related to instruction and the teacher, (4) answers about learning and understanding, and (5) general and other answers (mostly unspecific, e.g. “i just like it”). the coding scheme which was thus developed by the first author was discussed with the second author and revised accordingly. a random sample of 50 cases (21% of the 238 cases with achievement grouping data) was coded by both authors to determine interrater agreement. cohen’s kappa for the lower-order codes ranged between .70 and .86 for the various questions, and percentage agreement between 74.0% and 87.0%, indicating fair to good interrater agreement. the most frequent reason for non-agreement was that one of the authors had coded a statement with an unspecific code (i.e., general or other), whereas the other author had coded it with a specific code. after reviewing the cases of non-agreement, the second author understood the choices the first author had made and mostly agreed with them. the full sample was coded by the first author.

3. results

3.1 part 1: students’ perceptions of differentiated activities

the analyses for part 1 included 310 students who provided data on both the questionnaire and the achievement test (for some analyses, n is slightly smaller due to missing data on single items in the questionnaire). students’ achievement test scores were distributed as follows: i (highest achievement) = 97 students, ii = 57 students, iii = 60 students, iv = 55 students, v = 41 students. in accordance with our sampling procedure, most students were in grade 3 (n = 139) and 5 (n = 150).
only n = 21 students of grade 1 were included in these analyses, since achievement data of the previous year (when they were still in kindergarten) were not available for many students (see also section 3.1.1). descriptive statistics and results from the model building stage can be found in the supplementary materials (appendix 2).

in line with our hypothesis, there were significant interaction effects of activity by achievement for all outcome variables. that is, the full models including the interaction term had a significantly better fit than reduced models without this interaction term and this was the case for students’ ratings of the frequency (χ²(8) = 182.36, p < .001), liking (χ²(8) = 161.75, p < .001), and amount of learning from these activities (χ²(8) = 95.74, p < .001). these interaction effects are visualised in figure 2, which displays the estimated means based on the full models for students’ self-reported frequency (left column), liking (middle column), and learning (right column) of activities, split by achievement level (for visual clarity, the achievement levels ii and iv are not displayed, but these follow the same linear regression). the figure shows that some activities were rated more highly by low-achieving students compared to high-achieving students, whereas this pattern was reversed for other activities. for example, easier tasks (printed in red) were rated more highly by low-achieving students (red dots), whereas enrichment tasks (printed in blue) were rated more highly by high-achieving students (blue squares). in the following paragraphs, these interaction effects are interpreted further based on post-hoc tests evaluating the significance of the effect of students’ achievement level on students’ ratings for each activity.

first, we consider students’ reported frequency of being engaged in the activities.
the general activities of whole-class instruction and working together were reported equally frequently by students of all achievement levels. however, high-achieving students reported working independently more often (although this difference seems small in the figure, it was significant). as hypothesised, the activities intended for low-achieving students (less difficult tasks, extended instruction in a subgroup, and individual extended instruction) were reported more frequently by low-achieving students. regarding the activities for high-achieving students, high-achieving students reported working on enrichment tasks more frequently, as hypothesised. however, high-achieving students did not report receiving more instruction about enrichment tasks, either in a subgroup or individually: these activities were generally reported infrequently, regardless of the achievement level of the students.

second, we consider students’ liking of the activities. regarding the general activities, whole-class instruction was liked somewhat more by low-achieving students, independent work was liked somewhat more by high-achieving students, whereas working together was appreciated equally by students of all achievement levels. as expected, the activities intended for low-achieving students (less difficult tasks, extended instruction in a subgroup, and individual extended instruction) were liked more by low-achieving students. regarding the activities for high-achieving students, enrichment tasks were liked more by high-achieving students, in line with our hypothesis. however, students’ liking of instruction about enrichment tasks (in a subgroup or individually) did not depend on students’ achievement level. this should be viewed in light of the relatively low reported frequency of instruction about enrichment tasks.

third, we consider students’ reported amount of learning from the activities.
students’ reported learning from the general activities (whole-class instruction, independent work and working together) was not dependent upon their achievement level. as expected, low-achieving students reported to learn more from the activities intended for low-achieving students (less difficult tasks, extended instruction in a subgroup, and individual extended instruction) compared to high-achieving students. note, however, that even the lowest-achieving students rated working on easier tasks lower than the general activity of working independently (as can be seen in figure 2). similar to the results regarding frequency and liking, high-achieving students did report to learn more from enrichment tasks, but did not report to learn more from instruction about enrichment tasks compared to students of other achievement levels.

figure 2. estimated means of student-reported frequency, liking and learning from activities split by achievement level based on the final multilevel models. general activities are printed in black, activities intended for low-achieving students are printed in red, and activities intended for high-achieving students are printed in blue. error bars represent 95% confidence intervals.

3.1.1 differences between grade levels in students’ perceptions of differentiated activities

we explored whether the above results varied between grade levels, by adding grade level as a variable to the analyses and examining whether there were three-way interactions of activity by achievement level by grade level. again, the significance of the interactions was determined using likelihood ratio tests comparing the model with three-way interaction to the model without three-way interaction. the three-way interaction was significant for students’ reported frequency of activities (χ²(16) = 65.20, p < .001) and learning from activities (χ²(16) = 39.55, p = 0.001), but not significant for liking of activities (χ²(16) = 25.473, p = 0.062).
follow-up analyses split by grade level indicated that the activity by achievement level interaction was not significant in any of the grade 1 analyses. however, this result should be interpreted with caution given the small sample size in grade 1.¹ within grades 3 and 5, all activity by achievement level interactions were significant.

¹ we repeated the analyses with teachers’ estimation of students’ achievement level rather than the standardised achievement test as an independent variable, thereby increasing the sample size to n = 40. this yielded similar results, namely no significant activity by achievement level interaction effects in grade 1.

figure 3. estimated means of student-reported frequency, liking and learning from activities split by achievement level and grade level based on the final multilevel models. general activities are printed in black, activities intended for low-achieving students are printed in red, and activities intended for high-achieving students are printed in blue. error bars represent 95% confidence intervals.

these similarities and differences between grade levels are illustrated in figure 3. in the left column representing grade 1, students’ ratings of the frequency (upper row), liking (middle row), and learning (lower row) of activities have broad confidence intervals which mostly overlap between students of diverse achievement levels. this illustrates that there were no significant differences between students of diverse achievement levels, although the pattern of ratings for liking does suggest systematic differences in the expected direction. in grades 3 (middle column) and 5 (right column), the pattern of effects was similar to the overall analyses, and the differences between low-achieving and high-achieving students tended to become more pronounced between grades 3 and 5.
for example, regarding enrichment tasks, it can be seen that the pattern of higher ratings by higher-achieving students was already present in grade 3 (for frequency, liking and learning of the activities), but these differences became more pronounced in grade 5, as indicated by the larger distance between the scores of low-achieving students and high-achieving students. for easier tasks and extended instruction in a subgroup, the differences between low-achieving students and high-achieving students also seemed to increase (partially) between grade 3 and grade 5. however, for individual extended instruction, the difference between achievement levels did not seem to increase between grade 3 and grade 5, nor for the remaining activities (general activities and instruction about enrichment tasks), for which no (big) differences between achievement levels were present in the total sample. summing up, students’ ratings of activities tended to become more strongly related to students’ achievement level in higher grades.

3.2 part 2: students’ perceptions of achievement grouping

the analyses of part 2 included 240 students who had been placed in the same within-class achievement group during the past three weeks (low: n = 52, average: n = 87, high: n = 101). for an overview of all codes that resulted from the qualitative analyses and their frequency, see appendix 3 in the supplementary materials.

3.2.1 students’ liking of their achievement group

first, we asked students how much they liked to be in their achievement group. the likelihood ratio test indicated that the model with achievement group as a predictor of liking did not fit the data significantly better than the reduced model without achievement group as a predictor (χ²(2) = 5.21, p = .074). the raw means indicated that students placed in low (m = 3.94, sd = 1.24), average (m = 4.16, sd = 1.26) and high achievement groups (m = 4.35, sd = 0.98) generally liked to be in their achievement group.
while the raw means increased with achievement level, the likelihood ratio test indicated that these differences were not significant.

most responses to the open-ended question about what students liked about their achievement group were related to the higher-order theme of independent work and its difficulty level (see table a8). while comments about this theme were made frequently by students from all achievement groups, the content of the answers differed between achievement groups. many students from high achievement groups mentioned that they liked challenges and difficult tasks (“because if i have this work, it’s a challenge”). in contrast, students from low achievement groups tended to appreciate that tasks were not too difficult, and some also mentioned that they liked to have fewer tasks, enabling them to finish their work. students from average achievement groups mentioned that the difficulty level was appropriate (not too difficult and not too easy) and sometimes explicitly mentioned that “it matches my level”. a second frequently mentioned higher-order theme comprised comments related to the achievement group itself, including its members and the social interactions between them. across achievement groups, students made positive comments about the members of the group (“the children in this group are kind”) and about being able to help each other (“because if you don’t know something the other children can help you”). a few students from average and high achievement groups made explicit comments about liking to be in a higher group: “then i think sometimes that i am a bit smarter together with the other children and that gives me confidence”. the remaining answers were related to the higher-order themes of learning and understanding (e.g., “then you learn more of it”), to instruction and the teacher (e.g., “that you get more help”), or belonged to the category of general and other answers (e.g., “i just like it”).
when asked what they did not like about their achievement group, fewer than half of the students mentioned a specific aspect (see table a9). in fact, many students replied that they liked everything, although this was mentioned somewhat less frequently by students from low achievement groups. most of the specific answers were related to the higher-order themes of independent work and its difficulty and the achievement group. negative aspects mentioned by students placed in low achievement groups included too easy tasks, boredom and wanting to be in a higher group. negative aspects mentioned by students placed in high achievement groups included needing to work hard or fast, distraction (by other students but also by the teacher explaining to another group), difficult work, and stress. for students placed in average achievement groups, some of the answers resembled those of students in low achievement groups (e.g., too easy, wanting to be in a higher group) whereas others were similar to those of students in high achievement groups (e.g., too difficult, needing to work too hard). relatively few students made negative comments related to the higher-order themes of learning and understanding (e.g., “i don’t learn so much and i want to get better”) and instruction and the teacher (e.g., “you get additional explanation when you understand it already”).

3.2.2 students’ learning in their achievement group

second, we asked students how much they learned in their achievement group. the likelihood ratio test indicated a significant effect of achievement group (χ²(2) = 7.17, p = .028), indicating that students’ perceived amount of learning differed between achievement groups. students from high achievement groups (m = 4.40, sd = 0.96) perceived that they learned more than students from average (m = 4.21, sd = 1.03) and low (m = 3.98, sd = 1.13) achievement groups.
note, however, that all these means are relatively high on a five-point scale.

students’ responses to the question why they learned much or little in that achievement group were frequently related to the higher-order theme of independent work and its difficulty (see table a10). this theme was most prominent for students from high achievement groups, who mentioned frequently that they learned much because of the higher difficulty level of the tasks: “you learn much because you also get more difficult sums”. when students from low and average achievement groups referred to this theme, their answers were more mixed: some mentioned an appropriate difficulty level as a reason for learning much, whereas others indicated that they did not learn so much because of an inappropriate difficulty level (too easy or too difficult). the question why students learned much or little also provoked relatively many comments related to the higher-order theme of learning and understanding. students from average and high achievement groups tended to explain why they learned or understood more (“you learn new goals every time”), whereas students placed in low achievement groups referred to learning and understanding both positively (“i learn much because i understand it better now”), and negatively (“because i don’t understand it at all”). the higher-order theme of instruction and the teacher was mentioned by students across achievement groups, mostly in a positive way (“because the teacher gives you more explanations and does more sums with you”). some students also provided answers related to the higher-order theme of the achievement group, which were similar to the answers described as reasons for liking or not liking the achievement group. again, a substantial proportion (about one-third) of the comments was classified as general or other (“i learn much because i learn much from it”).
3.2.3 students’ preference for an achievement group

third, we asked students whether they would prefer to be in a different achievement group. the likelihood ratio test indicated a significant main effect of achievement group (χ²(2) = 24.68, p < .001). as can be seen in table 2, about half of the students currently placed in a low achievement group would prefer to be in a different achievement group, compared to only 11% of students currently placed in a high achievement group. for students from low achievement groups, reasons for wanting to stay in the same achievement group included appropriate difficulty in the current group, and positive comments about the group members and interaction in the current group (see table a11). in contrast, other students placed in low achievement groups wanted to move to another group because they wanted more difficult tasks, enrichment tasks, or more challenge, because they thought they would learn more or get better at mathematics in another group, because of the members of the desired group or for the sake of being in a higher group. for students placed in high achievement groups, reasons for wanting to stay in the same group were mainly related to appreciating the current difficulty level of the tasks and specifically enrichment tasks (“enrichment tasks are fun, so i want to keep those”), although a few students wanted to move to a different group because they thought that the current material was too difficult. for students from average achievement groups, comments related to the higher-order theme of tasks and their difficulty level were among the most frequent reasons (besides general comments) to want to stay in the same group, but also to switch achievement groups.
table 2
students’ preference for being in another achievement group

current achievement group    preference for other group    wish to stay in current group
low (n = 52)                 25 (48.1%)                    27 (51.9%)
average (n = 84)             30 (35.7%)                    54 (64.3%)
high (n = 97)                11 (11.3%)                    86 (88.7%)
total (n = 233)              66 (28.3%)                    167 (71.7%)

3.2.4 students’ preference for working with or without achievement groups

finally, we asked students whether they would prefer to work without achievement groups. the likelihood ratio test indicated no significant main effect of achievement group (χ²(2) = 1.25, p = 0.535). across achievement groups, about 75% of students wanted to retain the achievement groups (see table 3). the most frequent reasons for wanting to retain achievement groups included general positive comments about the current grouping system, as well as between-student differences or appropriate difficulty in the current system (see table a12). some students explicitly described differences between students as an argument for differentiation: “well, if some students find it difficult and others find it hard and everybody does the same, i just don’t think it’s handy”. reasons for preferring to work without groups included the opportunity to learn more in a system without achievement groups and general negative comments about achievement groups. a few students explicitly mentioned equality, stating that they would like everybody to get the same tasks or to be equal. while all of these reasons for and against achievement grouping were mentioned by students of all achievement groups, the general tendency was that students from high achievement groups relatively frequently mentioned an appropriate difficulty level and the opportunity to learn more as reasons for retaining the grouping system.
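as a rough plausibility check of the two dichotomous outcomes, the counts in tables 2 and 3 can be submitted to a simple likelihood ratio (g) test of independence. this is a simplification under stated assumptions rather than the analysis reported in this paper: it ignores the nesting of students within classes and schools that the multilevel logistic models account for, so the statistics need not match the reported values. the function name `g_test_three_groups` is illustrative.

```python
import math


def g_test_three_groups(rows):
    """Likelihood ratio (G) test of independence for a 3 x 2 table.

    rows holds one (count_a, count_b) pair per achievement group.
    With (3 - 1) * (2 - 1) = 2 degrees of freedom, the chi-square
    tail probability is simply exp(-G / 2).
    """
    if len(rows) != 3 or any(len(r) != 2 for r in rows):
        raise ValueError("expected three rows with two counts each")
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    g2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if observed > 0:   # empty cells contribute nothing
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2, math.exp(-g2 / 2.0)


# Counts copied from table 2: (prefers other group, wishes to stay).
table_2 = [(25, 27), (30, 54), (11, 86)]
# Counts copied from table 3: (without groups, with groups).
table_3 = [(15, 36), (18, 66), (22, 77)]

for label, table in (("table 2", table_2), ("table 3", table_3)):
    g2, p_value = g_test_three_groups(table)
    print(f"{label}: G(2) = {g2:.2f}, p = {p_value:.3g}")
```

under these assumptions, a small p-value for table 2 and a large one for table 3 would point in the same direction as the multilevel results.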
table 3
students’ preference for working with or without achievement groups

achievement group    without achievement groups    with achievement groups
low (n = 51)         15 (29.4%)                    36 (70.6%)
average (n = 84)     18 (21.4%)                    66 (78.6%)
high (n = 99)        22 (22.2%)                    77 (77.8%)
total (n = 234)      55 (23.5%)                    179 (76.5%)

3.2.5 differences between grade levels in students’ perceptions of achievement grouping

to explore potential differences between grade levels in students’ answers to the closed-ended questions, interactions with grade level were added to the analyses. none of these interactions were significant, indicating that the results were similar across grade levels. an extensive analysis of between-grade level differences in the open-ended answers is beyond the scope of this paper. we did explore the relative frequency of the answering categories across grade levels and found that students in higher grades tended to give relatively more specific answers (i.e., fewer general and other answers) than students in lower grades. in addition, relatively more of the answers of students in higher grades tended to be related to tasks and difficulty. in students’ explanations of why they would prefer to work with or without groups, most of the answers referring to either equality as an argument for no grouping or between-student differences as an argument for grouping were made by students from the highest grade.

4. discussion

while potential didactical and socioemotional advantages and disadvantages of differentiation and achievement grouping have been discussed in the literature, few studies have asked the opinion of students themselves. the current study extends the literature by exploring students’ perspective on differentiated activities and within-class achievement grouping, with attention for potential differences between students of diverse achievement levels.
4.1 students’ perceptions of differentiated activities

our first research question was whether different students evaluate various types of mathematics activities differently, depending on their achievement level. as hypothesised, there were significant interactions between the type of activity and students’ achievement level for all three outcome variables: perceived frequency, liking and learning of the activities. in line with guidelines for differentiation and with previous studies in which teachers reported their use of differentiation strategies (prast et al., 2015; roy et al., 2013; van geel et al., 2018), low-achieving students reported receiving extended instruction and less difficult tasks more frequently whereas high-achieving students worked on more difficult tasks more frequently. the infrequent occurrence of specific instruction for high-achieving students is not recommended, but also corresponds with previous findings (inspectorate of education, 2019; prast & hickendorff, in press). regarding students’ liking and learning of the various activities, we found that activities intended for low-achieving students such as less difficult tasks and extended instruction were rated more highly by low-achieving students, whereas more difficult tasks were rated more highly by high-achieving students. these are examples of perceived aptitude-treatment interactions (cronbach & snow, 1977; kalyuga, 2007).

however, the following observations should be kept in mind. first, scores for general activities such as whole-class instruction were also high across achievement groups. second, students’ liking and learning from activities seemed to be related to students’ reported frequency of engaging in these activities. this might imply that students simply like, and perceive to learn from, activities to which they are used.
nevertheless, if students’ experiences with activities had been strongly negative, it seems unlikely that a higher frequency of an unpleasant activity would increase students’ liking of that activity. third, students generally reported to learn less from less difficult tasks, although this was less pronounced for low-achieving students than for high-achieving students. this is related to the issue of convergent versus divergent differentiation (blok, 2004). in the higher grades of primary school, the tasks in the lowest tier of many mathematics textbooks lead towards lower end-of-school learning goals than the tasks in the highest tier (expertgroep doorlopende leerlijnen, 2008). thus, it may be true that students in the higher grades learn less from less difficult tasks in the sense of covering less content (regardless of the degree of understanding of that content). finally, while these results are largely in line with our hypotheses based on the didactical perspective, students’ ratings of liking and learning from activities may also have been influenced by socioemotional factors. these are discussed in the following section.

4.2 students’ perceptions of achievement grouping

our second research question was how students of diverse achievement levels evaluate their own achievement group and achievement grouping in general. our results provide partial support for the hypothesis that students’ perceptions of achievement grouping would differ between students placed in low, average and high achievement groups. generally, the average scores for liking and learning from one’s own achievement group were quite high. students’ liking of their own achievement group did not differ significantly across groups, but students’ perceived degree of learning did vary between achievement groups, with lower scores for students placed in low achievement groups.
overall, about 70% of the students were satisfied with their achievement group placement, which is more than has been reported in previous studies about between-class grouping (boaler et al., 2000; hallam et al., 2004). however, this question revealed the most pronounced differences between achievement groups: around 50% of students placed in low achievement groups would prefer to be in a different group, compared to only 10% of students placed in high achievement groups. nevertheless, around 75% of students across achievement groups wanted to retain the grouping system (although this could also reflect a general desire for things to stay as they are; hallam et al. (2004) also found that most students did not want to change anything about the grouping practices in their school). taken together, these results suggest quite positive attitudes towards grouping in general, but somewhat less positive experiences with placement in a low achievement group compared to a high achievement group.

in addition to these quantitative ratings, we investigated which reasons students provided for their evaluations. as expected, students’ answers to the open-ended questions included socioemotional as well as didactical considerations, although didactical considerations seemed to be more prominent. students clearly evaluated the use of achievement grouping in relation to the use of differentiated activities. in line with the didactical perspective on differentiation, many students mentioned didactical advantages including the appropriate amount and difficulty level of independent work (mentioned by students of all achievement groups), challenge (mainly mentioned by students placed in high achievement groups) and the possibility to get additional instruction and to understand the material better (mainly mentioned by students placed in low and average achievement groups). however, some students also mentioned didactical disadvantages.
some students from low achievement groups perceived the work as too easy, did not appreciate additional instruction, wanted more challenge or thought that they would learn more in a higher group. in contrast, some students from high achievement groups felt that the material was too difficult or that they needed to work too fast, which was sometimes stressful. some students from average achievement groups made similar comments in both directions (too difficult or too easy). ideally, differentiation should ensure that the tasks and instruction are appropriately challenging for the students in each achievement group (prast et al., 2015). students’ answers indicate that, while many students perceived the difficulty level as appropriate, other students did not. moreover, challenge seemed to be viewed by many students as something belonging exclusively to the high achievement group. enrichment tasks were highly valued by students from high achievement groups, but some students from low and average achieving groups also expressed the desire to work on enrichment tasks. compared to previous studies on between-class achievement grouping (boaler et al., 2000; hallam et al., 2004), the students in our sample generally seemed to be more positive about the didactical advantages of achievement grouping, but some of the perceived disadvantages resembled those mentioned in previous studies (e.g., a lack of challenge in low achievement groups, needing to work too fast in high achievement groups). as expected, students’ answers also included socioemotional considerations, but these were not always related to the socioemotional perspective on achievement grouping as described in the introduction (i.e., based on social comparisons of achievement level). 
many students made comments about their achievement group that did not seem to be related directly to the achievement level of that group, such as (dis-)liking the members of the group or having positive or negative interactions within the group. this importance of peer interactions in general in students’ perceptions of schooling echoes previous findings by hargreaves et al. (2021). partly in line with previous studies (eder, 1983; marks, 2013; mcgillicuddy & devine, 2020), we found some indications of social comparisons based on achievement group placement. a few students mentioned that they liked to know their own level, or the level of the other students. students occasionally mentioned the fact that their group was low (also: “bad”) or high (also: “the best” or “smart”) as a negative or positive aspect of being in that group. while such comments were relatively infrequent, they support the idea that within-class achievement groups may strengthen social comparison processes by making students more aware of whose achievement is low, average or high compared to the class average (i.e., labelling effects; campbell, 2021). based on students’ spontaneous answers to our open-ended questions, explicit teasing or stigmatisation based on achievement group placement did not seem to play a major role in the current sample. of course, these findings do not exclude the possibility of implicit stigmatisation or social status associated with achievement group placement (see marks, 2013; mcgillicuddy & devine, 2020; van den bergh, 2018). taken together, students’ answers provide some support for potential socioemotional side effects of within-class achievement grouping, including social comparisons of achievement level, but do not indicate pervasive negative social effects of placement in a low achievement group or of achievement grouping in general.
this might be partly explained by differences between countries in the way in which achievement grouping is implemented. since the achievement groups in this study were typically used only part of the time (besides whole-class activities) and were relatively flexible, it could be that this reduced the potential negative socioemotional effects of achievement grouping (education endowment foundation, 2018). if grouping arrangements are sufficiently flexible to respond to students’ current achievement level and corresponding educational needs, as recommended (prast et al., 2015), students might evaluate them more positively than when students perceive themselves to be stuck in a (low) achievement group. note, however, that the degree of flexibility of the grouping arrangements differed substantially between teachers in the current sample (see section 2.1).

4.3 differences between grade levels

we also explored whether students’ views on differentiation and achievement grouping differed between grade levels. we emphasise that our findings in grade 1 should be viewed as exploratory, given the small sample size. by and large, there seemed to be a trend towards more pronounced opinions in the higher grades. students’ reported liking and learning from activities were more strongly related to student achievement level in higher grades. the quantitative ratings of achievement grouping were similar across grades, but students in higher grades gave relatively more specific answers to the open-ended questions. this may have several reasons. first, due to maturation, older students may have been better able to express their opinions. this is likely to have affected the open-ended questions more strongly than the closed-ended questions. second, older students may have developed more pronounced opinions about differentiation due to more experience with differentiation.
third, through socialisation, older students may have endorsed the values of an educational system which assumes that lessons should be adapted to between-student differences in achievement level (raveaud, 2005). in future research, it would be interesting to follow a group of students longitudinally from grade 1 onwards to examine how students’ views on differentiation and achievement grouping develop over time.

4.4 limitations, conclusions and implications

students’ perceptions of differentiation and achievement grouping were central to this study. we do not claim that students always know what is best for them in terms of learning outcomes (see kirschner & van merriënboer, 2013). nevertheless, we feel that it is important to consider what students themselves think about the degree to which differentiation is meeting their educational needs, even if only to explain teachers’ choices better to students who fundamentally disagree with the approach taken. this did not seem to be the case: by and large, students had quite positive attitudes towards differentiation. with a self-report questionnaire, there is always a risk of socially desirable answers. however, our general impression is that students responded quite frankly, maybe also due to their young age (e.g., “i don’t understand a shit of it”). our method of data collection offered several advantages. by asking students to quantitatively rate specific mathematics activities (and relating this to students’ achievement level in the analyses), we could investigate the complex construct of differentiation in a way that was easy to understand for students as well as relatively quick and standardised. this enabled data collection on a larger scale and, therefore, provided more opportunities to quantify and generalise differences (or similarities) between students of diverse achievement levels than is typically possible in small-scale qualitative studies.
by combining the quantitative ratings of activities and achievement grouping with open-ended questions, we also gained insights into students’ reasons behind their quantitative evaluations, although small-scale qualitative studies can study this in more depth. this study did not examine whether the way in which differentiation and achievement grouping were implemented affected students’ perceptions. for example, the quality of differentiation (i.e., the degree to which adaptations are carefully matched to students’ educational needs) may also affect students’ perceptions. in the current study, students’ achievement group placement did not always correspond with their achievement on the standardised achievement test administered at the end of the previous school year. while teachers may have created the achievement groups based on other and more recent achievement information (e.g., curriculum-based tests, daily mathematics work), it might also mean that some students were placed in an achievement group that was not appropriate for their achievement level. this might partially explain why some students perceived the work in their achievement group as too easy or too difficult. in addition, the flexibility of the grouping arrangements might affect students’ perceptions (education endowment foundation, 2018). finally, the use of multigrade classes may have implications for teachers’ practices (for example, if teachers create achievement groups within each grade level, they will need to divide their attention over more achievement groups than a teacher teaching a single-grade class) which may in turn affect students’ perceptions of differentiation and achievement grouping. these would be interesting issues to explore in future research. this study focused on differentiation in a specific context, namely primary mathematics education in the netherlands.
due to substantial differences between countries and content areas in the traditions and practices of differentiation and achievement grouping, these results may not be directly generalisable to other countries or other content areas. in the netherlands, for example, differentiation practices seem to be somewhat similar for reading, in the sense that teachers might offer more or less difficult reading materials or more or less instruction to different subgroups of students, based on their achievement level (although the way in which instruction and practice are adapted to the needs of low-achieving or high-achieving students may be qualitatively different in reading compared to mathematics, because of the different content area and belonging didactical models). however, for other subjects such as science, teachers seem to use different approaches to differentiation (slim et al., 2022). compared to other countries, the achievement grouping practices in the netherlands may be relatively flexible, which might partially explain the relatively positive evaluations compared to previous studies (e.g., eder, 1983; gripton, 2020; marks, 2013; mcgillicuddy & devine, 2020). future research could not only study the generalisability of the findings across contexts, but could also use naturally occurring differences in the implementation of differentiation and achievement grouping between various countries and domains to investigate how these differences in implementation might affect students’ perceptions. from the current findings in the context of primary mathematics education in the netherlands, we conclude that students had largely positive attitudes about differentiation and achievement grouping. students appreciated it when the amount and difficulty of tasks and instruction were adapted to their current achievement level, and did not like either too easy or too difficult work. 
while the majority of students across achievement groups wanted to retain the achievement grouping system and reported high liking of their achievement group, students placed in low achievement groups reported learning less from their group and more often wanted to be in a different group. didactical considerations such as wanting to learn more or wanting to be challenged seemed to be more prominent in students’ reasoning than socioemotional considerations such as the social status associated with an achievement group. our findings have the following implications. many students displayed positive attitudes to learning: they liked to learn more, did not like to be distracted by other students, and wanted to be challenged. many students specifically mentioned that they liked or wanted to have enrichment tasks. to retain this positive attitude towards learning, we think that it would be helpful to encourage rather than discourage students who want to try more difficult tasks. this relates to the topic of self-regulation, which has been receiving increasing attention in the differentiation literature: ideally, students should be able to co-decide (in collaboration with the teacher) whether they need additional instruction and which tasks they should do (van geel et al., 2018). this might reduce negative experiences with work that is perceived as too hard or too easy. future research could also examine ways in which the perceived benefits of adapting education to students’ achievement level can be retained, while reducing potential socioemotional or didactical disadvantages of placement in a low achievement group.
this might include ways to adapt instruction and practice to students’ educational needs more flexibly based on students’ current understanding of specific mathematical content (perhaps also using adaptive educational software to assist the teacher in making these choices), as well as variation of grouping arrangements (for example by using heterogeneous groups in situations where students of diverse achievement levels can learn from each other). in such future research, the perspective of students themselves should not be overlooked.

keypoints

- students’ voices have not often been heard in the debate on differentiation and achievement grouping
- in this mixed-methods study, primary school students (n = 428) evaluated differentiated activities and within-class achievement grouping
- didactical and socioemotional aspects, and potential differences between students of diverse achievement levels were considered
- students had largely positive views on differentiated activities and achievement grouping
- but students also perceived disadvantages of placement in a low achievement group

acknowledgments

we thank all students and teachers who were involved in this study, as well as the reviewers of this manuscript, for their valuable contributions.

references

arroyo, i., woolf, b. p., burleson, w., muldner, k., rai, d., & tai, m. (2014). a multimedia adaptive tutoring system for mathematics that addresses cognition, metacognition and affect. international journal of artificial intelligence in education, 24(4), 387–426. https://doi.org/10.1007/s40593-014-0023-y
bates, d., maechler, m., bolker, b., & walker, s. (2015). fitting linear mixed-effects models using lme4. journal of statistical software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
blok, h. (2004). adaptief onderwijs: betekenis en effectiviteit [adaptive education: meaning and effectivity]. pedagogische studiën, 81(1), 5–27.
boaler, j., wiliam, d., & brown, m. (2000).
students’ experiences of ability grouping - disaffection, polarisation and the construction of failure. british educational research journal, 26(5), 631–648. https://doi.org/10.1080/713651583
campbell, t. (2021). in-class ‘ability’-grouping, teacher judgements and children’s mathematics self-concept: evidence from primary-aged girls and boys in the uk millennium cohort study. cambridge journal of education. https://doi.org/10.1080/0305764x.2021.1877619
cronbach, l. j., & snow, r. e. (1977). aptitudes and instructional methods: a handbook for research on interactions. irvington.
csikszentmihalyi, m. (1990). flow: the psychology of optimal experience. harper perennial.
deunk, m. i., smale-jacobse, a. e., de boer, h., doolaard, s., & bosker, r. j. (2018). effective differentiation practices: a systematic review and meta-analysis of studies on the cognitive effects of differentiation practices in primary education. educational research review, 24, 31–54. https://doi.org/10.1016/j.edurev.2018.02.002
eder, d. (1983). ability grouping and students’ academic self-concepts: a case study. the elementary school journal, 84, 149–161. https://doi.org/10.2307/1001307
education endowment foundation. (2018). sutton trust-education endowment foundation teaching and learning toolkit. education endowment foundation. https://educationendowmentfoundation.org.uk/resources/teaching-learning-toolkit
expertgroep doorlopende leerlijnen. (2008). over de drempels met rekenen: consolideren, onderhouden, gebruiken en verdiepen. expertgroep doorlopende leerlijnen.
faber, j. m., & visscher, a. j. (2016). de effecten van snappet: effecten van een adaptief onderwijsplatform op leerresultaten en motivatie van leerlingen [the effects of snappet: effects of an adaptive educational platform on student achievement and motivation]. universiteit twente.
francis, b., connolly, p., archer, l., hodgen, j., mazenod, a., pepper, d., sloan, s., taylor, b., tereshchenko, a., & travers, m. c. (2017).
attainment grouping as self-fulfilling prophesy? a mixed methods exploration of self confidence and set level among year 7 students. international journal of educational research, 86, 96–108. https://doi.org/10.1016/j.ijer.2017.09.001
gripton, c. (2020). children’s lived experiences of ‘ability’ in the key stage one classroom: life on the ‘tricky table.’ cambridge journal of education, 50(5), 559–578. https://doi.org/10.1080/0305764x.2020.1745149
hallam, s., ireson, j., & davies, j. (2004). primary pupils’ experiences of different types of grouping in school. british educational research journal, 30(4), 515–533. https://doi.org/10.1080/0141192042000237211
hargreaves, e., buchanan, d., & quick, l. (2021). “look at them! they all have friends and not me”: the role of peer relationships in schooling from the perspective of primary children designated as “lower-attaining.” https://doi.org/10.1080/00131911.2021.1882942
hart, s. (1992). differentiation. part of the problem or part of the solution? the curriculum journal, 3(2), 131–142. https://doi.org/10.1080/0958517920030203
inspectorate of education. (2019). reken- en wiskundeonderwijs aan potentieel hoogpresterende leerlingen [mathematics education for potentially high-achieving students]. inspectie van het onderwijs.
janssen, j., scheltens, f., & kraemer, j. m. (2005). rekenen-wiskunde groep 3-8: handleidingen [mathematics test grade 1-6: manuals]. cito.
janssen, j., verhelst, n., engelen, r., & scheltens, f. (2005). wetenschappelijke verantwoording van de toetsen lovs rekenen-wiskunde voor groep 3 tot en met 8 [scientific justification of the mathematics tests for grade 1 through 6]. cito.
jerrim, j. (2021). the association between within-class grouping and children’s achievement in mathematics during year 2, year 5 and year 9. school choices report. education endowment foundation.
https://educationendowmentfoundation.org.uk/public/files/within_class_grouping_report_-_final.pdf
kalyuga, s. (2007). expertise reversal effect and its implications for learner-tailored instruction. educational psychology review, 19(4), 509–539. https://doi.org/10.1007/s10648-007-9054-3
kirschner, p. a., sweller, j., & clark, r. e. (2006). why minimal guidance during instruction does not work: an analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. educational psychologist, 41(2), 75–86. https://doi.org/10.1207/s15326985ep4102_1
kirschner, p. a., & van merriënboer, j. g. (2013). do learners really know best? urban legends in education. educational psychologist, 48, 169–183. https://doi.org/10.1080/00461520.2013.804395
koerhuis, i. (2010). rekenen voor kleuters [mathematics for kindergarteners]. cito.
koerhuis, i., & keuning, j. (2011). wetenschappelijke verantwoording van de toetsen rekenen voor kleuters voor groep 1 en 2 [scientific justification of the tests mathematics for kindergarteners]. cito.
lenth, r. (2020). emmeans: estimated marginal means, aka least-squares means. r package version 1.5.2-1. https://cran.r-project.org/package=emmeans
linneberg, m. s., & korsgaard, s. (2019). coding qualitative data: a synthesis guiding the novice. qualitative research journal, 19(3), 259–270. https://doi.org/10.1108/qrj-12-2018-0012
marks, r. (2013). “the blue table means you don’t have a clue”: the persistence of fixed-ability thinking and practices in primary mathematics in english schools. forum, 55(1), 31. https://doi.org/10.2304/forum.2013.55.1.31
marsh, h. w. (1984). self-concept, social comparison, and ability grouping: a reply to kulik and kulik. american educational research journal, 21(4).
marsh, h. w. (1987). the big-fish-little-pond effect on academic self-concept. journal of educational psychology, 79(3), 280–295.
https://doi.org/10.1037//0022-0663.79.3.280
mcgillicuddy, d., & devine, d. (2020). ‘you feel ashamed that you are not in the higher group’: children’s psychosocial response to ability grouping in primary school. british educational research journal, 46(3), 553–573. https://doi.org/10.1002/berj.3595
murray, t., & arroyo, i. (2002). toward measuring and maintaining the zone of proximal development in adaptive instructional systems. lncs, 2363, 749–758.
park, d., tsukayama, e., gunderson, e. a., levine, s. c., & beilock, s. l. (2016). young children’s motivational frameworks and math achievement: relation to teacher-reported instructional practices, but not teacher theory of intelligence. journal of educational psychology, 108(3), 300–313. https://doi.org/10.1037/edu0000064
prast, e. j., & hickendorff, m. (in press). how do dutch teachers implement differentiation in primary mathematics education? in r. maulana, m. helms-lorenz, & r. m. klassen (eds.), effective teaching around the world: theoretical, empirical, methodological and practical insights. springer.
prast, e. j., van de weijer-bergsma, e., kroesbergen, e. h., & van luit, j. e. h. (2015). readiness-based differentiation in primary school mathematics: expert recommendations and teacher self-assessment. frontline learning research, 3(2), 90–116. https://doi.org/10.14786/flr.v3i2.163
prast, e. j., van de weijer-bergsma, e., kroesbergen, e. h., & van luit, j. e. h. (2018). differentiated instruction in primary mathematics: effects of teacher professional development on student achievement. learning and instruction, 54. https://doi.org/10.1016/j.learninstruc.2018.01.009
raveaud, m. (2005). hares, tortoises and the social construction of the pupil: differentiated learning in french and english primary schools. british educational research journal, 31(4), 459–479. https://doi.org/10.1080/01411920500148697
roy, a., guay, f., & valois, p. (2013).
teaching to address diverse learning needs: development and validation of a differentiated instruction scale. international journal of inclusive education, 17(11), 1186–1204. https://doi.org/10.1080/13603116.2012.743604
rubie-davies, c. m. (2014). becoming a high expectation teacher: raising the bar. routledge.
singer, j. d., & willett, j. b. (2003). applied longitudinal data analysis: modeling change and event occurrence. oxford university press.
slim, t., van schaik, j., hotze, a., & raijmakers, m. (2022, july). differentiatie in het wetenschap & technologie-onderwijs: overtuiging en praktijk van aankomende en expertleerkrachten [differentiation in science & technology education: attitudes and practices of pre-service teachers and expert teachers]. poster presented at the onderwijs research dagen [educational research days], hasselt, belgium.
tereshchenko, a., francis, b., archer, l., hodgen, j., mazenod, a., taylor, b., pepper, d., & travers, m. c. (2019). learners’ attitudes to mixed-attainment grouping: examining the views of students of high, middle and low attainment. research papers in education, 34(4). https://doi.org/10.1080/02671522.2018.1452962
tieso, c. l. (2003). ability grouping is not just tracking anymore. roeper review, 26(1), 29–36.
tomlinson, c. a., brighton, c., hertberg, h., callahan, c. m., moon, t. r., brimijoin, k., conover, l. a., & reynolds, t. (2003). differentiating instruction in response to student readiness, interest, and learning profile in academically diverse classrooms: a review of literature. journal for the education of the gifted, 27(2–3), 119–145. https://doi.org/10.1177/016235320302700203
van den bergh, l. (2018). waarderen van diversiteit in het onderwijs [valuing diversity in education]. fontys opleidingscentrum speciale onderwijszorg.
van geel, m., keuning, t., frèrejean, j., dolmans, d., van merriënboer, j., & visscher, a. j. (2018). capturing the complexity of differentiated instruction.
school effectiveness and school improvement, 30(1), 51–67. https://doi.org/10.1080/09243453.2018.1539013
van geel, m., keuning, t., visscher, a. j., & fox, j. p. (2016). assessing the effects of a school-wide data-based decision-making intervention on student achievement growth in primary schools. american educational research journal, 53(2), 360–394. https://doi.org/10.3102/0002831216637346
vaughn, s., schumm, j. s., klingner, j., & saumell, l. (1995). students’ views of instructional practices: implications for inclusion. learning disability quarterly, 18(3), 236–248. https://doi.org/10.2307/1511045
vaughn, s., schumm, j. s., niarhos, f. j., & daugherty, t. (1993). what do students think when teachers make adaptations? teaching and teacher education, 9(1), 107–118. https://doi.org/10.1016/0742-051x(93)90018-c
vaughn, s., schumm, j. s., niarhos, f. j., & gordon, j. (1993). students’ perceptions of two hypothetical teachers’ instructional adaptations for low achievers. the elementary school journal, 94(1), 87–102.
visscher, a. j. (2015). over de zin van opbrengstgericht(er) werken in het onderwijs [about the value of (more) data-based decision making in education]. gion.
vogt, f., & rogalla, m. (2009). developing adaptive teaching competency through coaching. teaching and teacher education, 25(8), 1051–1060. https://doi.org/10.1016/j.tate.2009.04.002
wickham, h. (2016). ggplot2: elegant graphics for data analysis. springer verlag new york.
supplementary materials

overview of appendices:
appendix 1: correspondence between achievement test scores and achievement group placement
appendix 2: additional information about the analyses in part 1
appendix 3: additional information about the qualitative analyses in part 2

appendix 1: correspondence between achievement test scores and achievement group placement

table a1 correspondence between achievement test scores and achievement group placement

| achievement test score | low achievement group | average achievement group | high achievement group |
|---|---|---|---|
| i (highest) | 2 | 14 | 59 |
| ii | 3 | 22 | 21 |
| iii | 13 | 30 | 4 |
| iv | 17 | 10 | 9 |
| v (lowest) | 17 | 4 | 6 |

table a1 displays the (imperfect) correspondence between students’ achievement group placement and their scores on the achievement test. it is based on 231 students who had been placed in a single within-class achievement group during the past three weeks and for whom achievement test scores were available. note that the achievement test scores were collected at the end of the previous school year. reasons for noncorrespondence may include the assignment to achievement groups based on other measures and more recent sources of information (e.g., curriculum-based tests, students’ responses during mathematics lessons).

appendix 2: additional information about the analyses in part 1

descriptive statistics of the outcome variables are provided in tables a2 (frequency), a3 (liking), and a4 (learning).
table a2 means and standard deviations of student-reported frequency of activities split by achievement level (cells show m (sd))

| activity | v (lowest) | iv | iii | ii | i (highest) | overall |
|---|---|---|---|---|---|---|
| whole-class instruction | 4.22 (1.01) | 3.96 (0.96) | 4.07 (0.99) | 3.89 (0.99) | 3.83 (1.16) | 3.96 (1.05) |
| working independently | 4.02 (0.94) | 3.80 (1.21) | 3.95 (1.11) | 4.23 (0.93) | 4.28 (0.86) | 4.09 (1.01) |
| working together | 3.66 (1.22) | 3.18 (1.11) | 3.05 (1.06) | 3.16 (1.00) | 3.20 (1.21) | 3.22 (1.14) |
| easier tasks | 3.41 (1.41) | 2.65 (1.39) | 2.70 (1.43) | 2.14 (1.44) | 2.25 (1.56) | 2.54 (1.51) |
| subgroup extended instruction | 3.20 (1.36) | 3.05 (1.39) | 2.63 (1.40) | 1.98 (1.09) | 1.86 (1.10) | 2.42 (1.36) |
| individual extended instruction | 3.10 (1.32) | 2.64 (1.21) | 2.61 (1.35) | 2.30 (1.24) | 2.04 (1.31) | 2.44 (1.33) |
| enrichment tasks | 2.58 (1.65) | 2.64 (1.60) | 2.85 (1.62) | 3.62 (1.57) | 4.09 (1.30) | 3.31 (1.64) |
| subgroup enrichment instruction | 2.80 (1.56) | 2.22 (1.34) | 2.15 (1.44) | 2.26 (1.32) | 2.45 (1.34) | 2.36 (1.39) |
| individual enrichment instruction | 2.62 (1.61) | 2.18 (1.45) | 2.19 (1.41) | 2.37 (1.37) | 2.42 (1.37) | 2.35 (1.42) |
| overall | 3.30 (1.46) | 2.93 (1.43) | 2.91 (1.47) | 2.88 (1.47) | 2.93 (1.53) | |

note. students rated the frequency of activities on a five-point scale (5 = highest).
table a3 means and standard deviations of student-reported liking of activities split by achievement level (cells show m (sd))

| activity | v (lowest) | iv | iii | ii | i (highest) | overall |
|---|---|---|---|---|---|---|
| whole-class instruction | 3.80 (1.19) | 3.58 (1.27) | 3.27 (1.16) | 3.30 (1.15) | 3.22 (1.27) | 3.39 (1.23) |
| working independently | 3.44 (1.43) | 3.45 (1.58) | 3.53 (1.33) | 3.89 (1.32) | 3.78 (1.27) | 3.65 (1.38) |
| working together | 4.20 (1.17) | 4.05 (1.37) | 4.30 (1.01) | 3.98 (1.17) | 3.89 (1.37) | 4.05 (1.25) |
| easier tasks | 3.80 (1.38) | 3.25 (1.57) | 3.10 (1.54) | 2.46 (1.66) | 2.38 (1.58) | 2.88 (1.63) |
| subgroup extended instruction | 3.88 (1.36) | 3.27 (1.46) | 3.29 (1.50) | 2.61 (1.50) | 2.35 (1.27) | 2.95 (1.50) |
| individual extended instruction | 3.50 (1.55) | 2.84 (1.51) | 2.58 (1.42) | 2.45 (1.33) | 2.34 (1.30) | 2.65 (1.44) |
| enrichment tasks | 2.52 (1.57) | 3.20 (1.59) | 3.10 (1.64) | 4.11 (1.30) | 4.15 (1.25) | 3.56 (1.56) |
| subgroup enrichment instruction | 3.15 (1.67) | 3.02 (1.50) | 2.83 (1.53) | 3.16 (1.33) | 3.05 (1.35) | 3.04 (1.45) |
| individual enrichment instruction | 2.88 (1.62) | 2.89 (1.52) | 2.41 (1.43) | 2.88 (1.36) | 2.74 (1.40) | 2.75 (1.45) |
| overall | 3.47 (1.52) | 3.29 (1.52) | 3.16 (1.49) | 3.20 (1.48) | 3.10 (1.49) | |

note. students rated their liking of activities on a five-point scale (5 = highest).
table a4 means and standard deviations of student-reported learning from activities split by achievement level (cells show m (sd))

| activity | v (lowest) | iv | iii | ii | i (highest) | overall |
|---|---|---|---|---|---|---|
| whole-class instruction | 4.15 (1.06) | 3.93 (1.26) | 3.48 (1.30) | 3.75 (1.18) | 3.70 (1.11) | 3.77 (1.19) |
| working independently | 3.85 (1.20) | 3.84 (1.23) | 4.18 (0.91) | 4.23 (0.87) | 4.10 (0.88) | 4.06 (1.00) |
| working together | 4.10 (1.09) | 3.80 (1.21) | 3.92 (1.06) | 3.74 (1.19) | 3.82 (1.10) | 3.86 (1.13) |
| easier tasks | 3.27 (1.45) | 2.89 (1.50) | 2.69 (1.51) | 2.09 (1.34) | 2.20 (1.43) | 2.54 (1.50) |
| subgroup extended instruction | 4.17 (1.18) | 4.09 (1.14) | 3.72 (1.20) | 3.33 (1.50) | 3.43 (1.37) | 3.69 (1.33) |
| individual extended instruction | 3.95 (1.20) | 4.00 (1.22) | 3.71 (1.30) | 3.51 (1.35) | 3.57 (1.37) | 3.71 (1.31) |
| enrichment tasks | 3.33 (1.61) | 4.24 (1.04) | 4.12 (1.14) | 4.52 (1.04) | 4.57 (0.66) | 4.25 (1.12) |
| subgroup enrichment instruction | 3.74 (1.52) | 3.69 (1.36) | 3.73 (1.28) | 3.72 (1.24) | 3.82 (1.28) | 3.75 (1.31) |
| individual enrichment instruction | 3.65 (1.53) | 3.71 (1.33) | 3.49 (1.32) | 3.68 (1.31) | 3.65 (1.31) | 3.64 (1.34) |
| overall | 3.80 (1.35) | 3.80 (1.30) | 3.67 (1.29) | 3.62 (1.38) | 3.65 (1.33) | |

note. students rated their learning from activities on a five-point scale (5 = highest).

analytical strategy

data were analysed in r with multilevel models to take into account the nested data structure (e.g., scores of students within a class/school being correlated). for all analyses, in order to take the dependencies in the data due to the nested data structure into account, we used random intercepts at the various levels (i.e., student, class and school). as a four-level random intercept model is already quite complex for our data, we decided not to include random slopes in the model. the models were fitted with the package lme4 (bates et al., 2015), post-hoc analyses were conducted with the package emmeans (lenth, 2020), and figures were plotted with ggplot2 (wickham, 2016). effect coding was used.
effect coding differs from dummy coding in the sense that weights other than 0 and 1 (i.e., standard dummy coding) can be assigned to the various levels of a categorical variable, which facilitates the interpretation of the fixed effects. first, we estimated an empty model (also called an unconditional means model; singer & willett, 2003) to investigate the amount of variance at the various levels. second, main effects of activity and achievement level were added to the model. third, the interaction between activity and achievement level was added. to evaluate the significance of main effects and interaction effects, likelihood ratio tests (lrt) were used to compare the fit of the full model (i.e., including the effect of interest) to the fit of a reduced model without that main effect or interaction.

empty models

the empty models indicated that by far the most variance was at the level of the various activities rated by the same students (i.e., repeated measures, see table a5). the variance at the student level was somewhat larger for the degree to which students perceived to learn from activities (13.1%) than for the other outcome variables. the amount of variance at the class and school level was quite small (0.5 – 2.2%), but these levels were retained in the analyses anyway to correct for any clustering effects at these levels.

table a5 distribution of the outcome variance across the different levels in the data

| level | frequency | liking | learning |
|---|---|---|---|
| activity (within-student) | 92.7% | 90.6% | 84.8% |
| student | 4.0% | 5.8% | 13.1% |
| class | 1.4% | 2.2% | 1.6% |
| school | 1.9% | 1.4% | 0.5% |

likelihood ratio tests comparing model fit

table a6 provides an overview of the results of the likelihood ratio tests comparing model fit. as can be seen in the table, the model including the interaction between activity and achievement level had the best fit compared to a reduced model for all outcome variables.
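As a rough reading aid for likelihood ratio tests of this kind, the p-value can be recovered from a reported χ2 statistic and its degrees of freedom. The sketch below is not the authors' R code; it uses only the Python standard library, the function name `chi2_sf` is hypothetical, and the closed-form expressions it relies on hold only for df = 1 or an even df (which covers the df = 1 and df = 8 comparisons reported in table a6).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Illustrative helper (hypothetical name). Uses closed forms that exist
    only for df == 1 and for even df.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # For df = 1, X is the square of a standard normal variable.
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # For df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports only df == 1 or even df")


# (label, chi2, df) taken from the values reported in table a6
reported = [
    ("achievement added, frequency", 4.55, 1),
    ("achievement added, liking", 6.36, 1),
    ("achievement added, learning", 2.05, 1),
    ("interaction added, learning", 95.74, 8),
]
for label, chi2, df in reported:
    print(f"{label}: p = {chi2_sf(chi2, df):.3f}")
```

In R, the same comparison is usually obtained directly by passing the reduced and the full fitted model to `anova()`; the sketch only shows how the χ2 statistic, the degrees of freedom and the p-value relate to each other.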
table a6 outcomes of likelihood ratio tests comparing model fit

| model comparison | χ2 (frequency) | df (frequency) | p (frequency) | χ2 (liking) | df (liking) | p (liking) | χ2 (learning) | df (learning) | p (learning) |
|---|---|---|---|---|---|---|---|---|---|
| main effect of activity added to empty model | 621.79 | 8 | <.001 | 248.03 | 8 | <.001 | 326.19 | 8 | <.001 |
| main effect of achievement added to previous model | 4.55 | 1 | 0.03 | 6.36 | 1 | 0.01 | 2.05 | 1 | 0.152 |
| interaction effect added to previous model | 182.36 | 8 | <.001 | 161.75 | 8 | <.001 | 95.74 | 8 | <.001 |

post-hoc tests for the interaction effects

table a7 provides an overview of the post-hoc tests for the interaction effect. a significant effect means that students’ achievement level predicts their ratings for that activity. since a score of 1 on the achievement tests reflects the highest achievement and 5 the lowest, a positive value for the effect indicates a negative effect of achievement level (i.e., activity ratings are higher for low-achieving students) whereas a negative value indicates a positive effect of achievement level (i.e., activity ratings are higher for high-achieving students).
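When reading the columns of table a7, it may help to recall that a t-value is the estimate divided by its standard error, and that with more than 2000 degrees of freedom the two-sided p-value is close to the one given by the standard normal distribution. The sketch below illustrates this with the Python standard library only. It is an approximation under that large-df assumption, not the emmeans computation used in the study, and both function names are hypothetical. Because the reported estimates and standard errors are rounded to two decimals, recomputed t-values will differ slightly from the reported ones.

```python
import math


def t_statistic(estimate: float, se: float) -> float:
    """t-value as estimate divided by its standard error."""
    return estimate / se


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value using the standard normal distribution.

    Assumption: df is large (here > 2000), so the normal distribution
    is a close approximation to the t distribution.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


# whole-class instruction, liking: values as reported in table a7
estimate, se, t_reported = -0.12, 0.06, -2.22
print("t from rounded estimate and se:", round(t_statistic(estimate, se), 2))
print("p from reported t:", round(two_sided_p_normal(t_reported), 3))
```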
table a7 post-hoc tests of the interaction effect: the effect of achievement on students’ reported frequency, liking and learning for each activity separately

activity and outcome variable | β | se | df | t | p
whole-class instruction
frequency | -0.07 | 0.05 | 2491 | -1.50 | 0.13
liking | -0.12 | 0.06 | 2514 | -2.22 | 0.03
learning | -0.08 | 0.05 | 2126 | -1.62 | 0.11
working independently
frequency | 0.11 | 0.05 | 2488 | 2.16 | 0.03
liking | 0.12 | 0.06 | 2511 | 2.15 | 0.03
learning | 0.08 | 0.05 | 2126 | 1.64 | 0.10
working together
frequency | -0.06 | 0.05 | 2491 | -1.23 | 0.22
liking | -0.07 | 0.06 | 2511 | -1.22 | 0.22
learning | -0.04 | 0.05 | 2131 | -0.86 | 0.39
easier tasks
frequency | -0.25 | 0.05 | 2488 | -4.88 | <.001
liking | -0.34 | 0.06 | 2512 | 6.11 | <.001
learning | -0.27 | 0.05 | 2142 | 5.08 | <.001
subgroup extended instruction
frequency | -0.37 | 0.05 | 2488 | -7.23 | <.001
liking | -0.36 | 0.06 | 2515 | -6.43 | <.001
learning | -0.11 | 0.05 | 2142 | 5.08 | <.001
individual extended instruction
frequency | -0.23 | 0.05 | 2496 | -4.63 | <.001
liking | -0.23 | 0.06 | 2522 | -4.16 | <.001
learning | -0.11 | 0.05 | 2148 | -2.31 | <.001
enrichment tasks
frequency | 0.43 | 0.05 | 2496 | 8.40 | <.001
liking | 0.41 | 0.06 | 2521 | 7.37 | <.001
learning | 0.25 | 0.05 | 2142 | 5.08 | <.001
subgroup enrichment instruction
frequency | -0.02 | 0.05 | 2495 | -0.44 | 0.66
liking | 0.01 | 0.06 | 2517 | 0.27 | 0.79
learning | 0.03 | 0.05 | 2156 | 0.61 | 0.54
individual enrichment instruction
frequency | 0.01 | 0.05 | 2494 | 0.15 | 0.88
liking | -0.01 | 0.06 | 2520 | -0.15 | 0.88
learning | 0.01 | 0.05 | 2137 | 0.15 | 0.88

appendix 3: additional information about the qualitative analyses in part 2

tables a8 through a11 provide an overview of the answering categories (lower-order codes organised by higher-order themes) for each question, as well as the number of times these categories were mentioned by students placed in low, average and high achievement groups.

table a8 students’ responses to the question: what do you like about being in your achievement group?
frequency of comments a answering category example low (n = 59) average (n = 97) high (n = 110) total (n = 266) answers about work / difficulty 17 (28.8%) 32 (33.0%) 54 (49.1%) 103 (38.7%) appropriate difficulty you don’t need to do too difficult or too easy stuff 8 (13.6%) 15 (15.5%) 29 (26.4%) 52 (19.6%) challenge because if i have this work, it’s a challenge 1 (1.7%) 2 (2.1%) 13 (11.8%) 16 (6.0%) working more / doing many tasks so you can start to work immediately 3 (5.1%) 6 (6.2%) 4 (3.6%) 13 (4.9%) tasks / activities are fun you get nice tasks 1 (1.7%) 1 (1.0%) 7 (6.4%) 9 (3.4%) fewer tasks and/or finish earlier i need to do less and then i finish earlier 4 (6.8%) 3 (3.1%) 1 (0.9%) 8 (3.0%) my level it matches my level 0 (0.0%) 5 (5.2%) 0 (0.0%) 5 (1.9%) answers about group 17 (28.8%) 15 (15.5%) 23 (20.9%) 55 (20.7%) positive about group members / dynamics the children in this group are kind 8 (13.6%) 7 (7.2%) 13 (11.8%) 28 (10.5%) helping each other / working together because if you don’t know something the other children can help you 5 (8.5%) 4 (4.1%) 5 (4.6%) 14 (5.3%) concentrate / no distraction because it’s a quiet group and i can concentrate better 4 (6.8%) 2 (2.1%) 2 (1.8%) 8 (3.0%) group rank: being in a high(er) / smart(er) group then i think sometimes that i am a bit smarter together with the other children and that gives me confidence 0 (0.0%) 2 (2.1%) 3 (2.7%) 5 (1.9%) answers about learning 7 (11.9%) 12 (12.4%) 11 (10.0%) 30 (11.3%) learn more / faster / new things then you learn more of it 4 (6.8%) 8 (8.3%) 9 (8.2%) 21 (7.9%) understanding better then you understand better 2 (3.4%) 3 (3.1%) 0 (0.0%) 5 (1.9%) getting smarter / better at math / higher level then you get super smart 1 (1.5%) 1 (1.0%) 2 (1.8%) 4 (1.5%) answers about instruction / teacher 5 (8.5%) 11 (11.3%) 8 (7.3%) 24 (9.0%) (more) explanation / that you get more help 4 7 3 14 88 | f l r instruction / help (6.8%) (7.2%) (2.7%) (5.3%) positive about teacher the teacher thinks of a 
nice way 0 (0.0%) 2 (2.1%) 4 (3.6%) 6 (2.3%) no (additional) instruction / explanation you don’t need to listen to another explanation 1 (1.7%) 2 (2.1%) 1 (0.9%) 4 (1.5%) general and other answers 13 (22.0%) 27 (27.8%) 14 (12.7%) 54 (20.3%) unspecific / don’t know / other i just like it 10 (17.0%) 17 (17.5%) 9 (8.2%) 36 (13.5%) nothing is nice / negative comment i don’t like it at all 3 (5.1%) 7 (7.2%) 3 (2.7%) 13 (4.9%) everything is nice everything about this group is nice 0 (0.0%) 3 (3.1%) 2 (1.8%) 5 (1.9%) a number and percentage of comments within that achievement group. the total number of comments exceeds the number of students since some answers belonged to two categories. table a9 students’ responses to the question: what don't you like about being in your achievement group? frequency of comments a answering category example low (n = 55) average (n = 89) high (n = 105) total (n = 249) answers about work / difficulty 9 (16.4%) 15 (16.9%) 19 (18.1%) 43 (17.3%) needing to work much / longer / fast you need to finish quickly 1 (1.8%) 3 (3.4%) 11 (10.5%) 15 (6.0%) inappropriate difficulty: too hard sometimes it’s difficult 1 (1.8%) 5 (5.6%) 6 (5.7%) 12 (4.8%) boring / takes a long time because it takes a super long time. i find it super boring. 
2 (3.6%) 4 (4.5%) 1 (1.0%) 7 (2.8%) inappropriate difficulty: too easy because it’s too easy now 4 (7.3%) 1 (1.1%) 1 (1.0%) 6 (2.4%) want more challenge because i like to be challenged 1 (1.8%) 2 (2.3%) 0 (0.0%) 3 (1.2%) answers about group 9 (16.4%) 8 (9.0%) 20 (19.1%) 37 (14.9%) distraction i am distracted when the others talk 4 (7.3%) 3 (3.4%) 12 (11.4%) 19 (7.6%) negative about group members / dynamics we quarrel sometimes 2 (3.6%) 2 (2.3%) 4 (3.8%) 8 (3.2%) group rank: want to be in higher group / not nice to be in low group of course, i would rather be in the plus-group [= highest group] 3 (5.5%) 3 (3.4%) 0 (0.0%) 6 (2.4%) stress about high achievement group the stress 0 (0.0%) 0 (0.0%) 4 (3.8%) 4 (1.6%) answers about learning 3 (5.4%) 5 (5.6%) 1 (1.0%) 9 (3.6%) not understanding when i don’t understand 1 (1.8%) 3 (3.4%) 1 (1.0%) 5 (2.0%) 89 | f l r learning less / not much i don’t learn so much and i want to get better 1 (1.8%) 1 (1.1%) 0 (0.0%) 2 (0.8%) being (called) bad / stupid it’s not nice to be so bad 1 (1.8%) 1 (1.1%) 0 (0.0%) 2 (0.8%) answers about instruction / teacher 2 (3.6%) 2 (2.3%) 4 (3.8%) 8 (3.2%) negative about instruction / teacher you get additional explanation when you understand it already 2 (3.6%) 2 (2.3%) 4 (3.8%) 8 (3.2%) general and other answers 32 (58.2%) 59 (66.3%) 61 (58.1%) 152 (61.0%) everything is nice / positive comment there’s nothing about this group that i don’t like 12 (21.8%) 34 (38.2%) 41 (39.1%) 87 (35.0%) unspecific / don’t know / other i just don’t like it 20 (36.4%) 25 (28.0%) 20 (19.0%) 65 (26.1%) a number and percentage of comments within that achievement group. the total number of comments exceeds the number of students since some answers belonged to two categories. table a10 students’ responses to the question: why do you learn much or little of being in your achievement group? 
frequency of comments a answering category example low (n = 52) average (n = 90) high (n = 104) total (n = 246) answers about work / difficulty 9 (17.3%) 17 (18.9%) 42 (40.4%) 68 (27.6%) difficult (positive) / challenging you learn much because you also get more difficult sums 2 (3.9%) 4 (4.4%) 30 (28.9%) 36 (14.6%) working independently i can do it by myself 1 (1.9%) 4 (4.4%) 3 (2.9%) 8 (3.3%) appropriate for my level it’s my level 0 (0.0%) 4 (4.4%) 3 (2.9%) 7 (2.9%) medium or varying difficulty sometimes it’s easy and sometimes it’s difficult 2 (3.9%) 2 (2.2%) 2 (1.9%) 6 (2.4%) too easy / not challenging / want more challenge now it is sometimes a bit too easy 1 (1.9%) 3 (3.3%) 1 (1.0%) 5 (2.0%) too difficult …and i don’t learn much because it’s sometimes difficult 1 (1.9%) 0 (0.0%) 2 (1.9%) 3 (1.2%) easy/not too difficult because the plus-group [highest group] would be too difficult 2 (3.9%) 0 (0.0%) 1 (1.0%) 3 (1.2%) answers about learning 12 (23.1%) 13 (14.4%) 12 (11.5%) 37 (15.0%) learn more / specific math content / getting more sums you learn new goals every time 3 (5.8%) 7 (7.8%) 6 (5.8%) 16 (6.5%) understand more / better i learn much because i understand it better now 3 (5.8%) 5 (5.6%) 3 (2.9%) 11 (4.5%) 90 | f l r get smarter / better at math because i get smarter 1 (1.9%) 1 (1.1%) 2 (1.9%) 4 (1.6%) not understanding because i don’t understand it at all 3 (5.8%) 0 (0.0%) 1 (1.0%) 4 (1.6%) learn less / fewer tasks because you do get fewer sums 2 (3.9%) 0 (0.0%) 0 (0.0%) 2 (0.8%) answers about instruction / teacher 7 (13.5%) 17 (18.9%) 12 (11.5%) 36 (14.6%) positive about instruction / teacher because the teacher gives you more explanations and does more sums with you 7 (13.5%) 15 (16.7%) 12 (11.5%) 34 (13.8%) negative about instruction / teacher i learn more from myself because the teachers confuse me 0 (0.0%) 2 (2.2%) 0 (0.0%) 2 (0.8%) answers about group 5 (9.6%) 7 (7.8%) 7 (6.7%) 19 (7.7%) helping each other / working together when you work together 
you also learn from it 3 (5.8%) 1 (1.1%) 3 (2.9%) 7 (2.9%) distraction i learn a bit less because sometimes children talk and draw attention 2 (3.9%) 1 (1.1%) 3 (2.9%) 6 (2.4%) positive about group members / dynamics because it’s nice in a nice group 0 (0.0%) 3 (3.3%) 0 (0.0%) 3 (1.2%) concentrate / no distraction quiet so more concentration 0 (0.0%) 2 (2.2%) 1 (1.0%) 3 (1.2%) general and other answers 19 (36.5%) 35 (38.9%) 31 (29.8%) 86 (35.0%) unspecific / don’t know / other i learn much because i learn much from it 17 (32.7%) 34 (37.8%) 29 (27.9%) 80 (32.5%) fun because i think it’s fun 2 (3.9%) 1 (1.1%) 2 (1.9%) 6 (2.4%) a number and percentage of comments within that achievement group. the total number of comments exceeds the number of students since some answers belonged to two categories. table a11 students’ responses to the question: why would you (not) prefer to be in a different achievement group? frequency of comments a answering category example low (n = 54) average (n = 87) high (n = 106) total (n =247) reasons for preferring to stay in the same group answers about work / difficulty 6 (11.1%) 15 (17.2%) 36 (34.0%) 57 (23.1%) difficulty [appropriate in current group] because i don’t do too difficult or too easy work 5 (9.3%) 9 (10.3%) 22 (20.8%) 36 (14.6%) tasks or activities in the current group are fun / want to keep enrichment tasks are fun, so i want to keep those 1 (1.9%) 3 (3.5%) 5 (4.7%) 9 (3.6%) 91 | f l r enrichment tasks my level because this is my level 0 (0.0%) 3 (3.5%) 3 (2.8%) 6 (2.4%) challenge because otherwise i don’t have enough challenge 0 (0.0%) 0 (0.0%) 6 (5.7%) 6 (2.4%) answers about group 6 (11.1%) 3 (3.5%) 6 (5.7%) 15 (6.1%) group members / dynamics / no distraction they are kind and not loud 5 (9.3%) 3 (3.5%) 5 (4.7%) 12 (4.9%) group rank: being in a high(er) / smart(er) group it’s the best group 1 (1.9%) 0 (0.0%) 1 (0.9%) 2 (0.8%) answers about learning 1 (1.9%) 4 (4.6%) 5 (4.7%) 10 (4.1%) learn more / faster / new things 
because i learn most in this group 1 (1.9%) 4 (4.6%) 3 (2.8%) 8 (3.2%) getting smarter / better at math because i get smarter 0 (0.0%) 0 (0.0%) 2 (1.9%) 2 (0.8%) answers about instruction / teacher 0 (0.0%) 2 (2.3%) 4 (3.8%) 6 (2.4%) positive about instruction / teacher because now i get the explanation that i need 0 (0.0%) 2 (2.3%) 4 (3.8%) 6 (2.4%) general and other answers 14 (25.0%) 34 (39.1%) 44 (41.5%) 92 (37.2%) this group is nice/fun because it’s nice in this group 6 (11.1%) 23 (26.4%) 20 (18.9%) 49 (19.8%) unspecific / don’t know / other i find it hard to explain 8 (14.8%) 11 (12.6%) 24 (22.6%) 43 (17.4%) reasons for preferring to be in a different group answers about work / difficulty 6 (11.1%) 8 (9.2%) 4 (3.8%) 18 (7.3%) difficulty [would be more appropriate in another group] more difficult sums: because i like that 3 (5.6%) 4 (4.6%) 4 (3.8%) 11 (4.5%) tasks or activities in the other group are fun / want to get enrichment tasks because i want enrichment tasks 2 (3.7%) 2 (2.3%) 0 (0.0%) 4 (1.6%) challenge more challenge 1 (1.9%) 2 (2.3%) 0 (0.0%) 3 (1.2%) answers about group 7 (13.0%) 5 (5.8%) 2 (1.9%) 14 (5.7%) group members / dynamics /distraction because i have many friends in the other group 5 (9.3%) 2 (2.3%) 1 (0.9%) 8 (3.2%) group rank: being in a high(er) / because i want to be in a higher group 2 (3.7%) 3 (3.5%) 1 (0.9%) 6 (2.4%) 92 | f l r smart(er) group answers about learning 5 (9.3%) 2 (2.3%) 0 (0.0%) 7 (2.8%) learn more / faster / new things because i learn much 1 (1.9%) 0 (0.0%) 0 (0.0%) 1 (0.4%) getting smarter / better at math / higher level then i think i will get better at math 4 (7.4%) 2 (2.3%) 0 (0.0%) 6 (2.4%) answers about instruction / teacher 1 (1.9%) 1 (1.2%) 0 (0.0%) 2 (0.8%) instruction / teacher then the teacher can help me when i find it difficult 1 (1.9%) 1 (1.2%) 0 (0.0%) 2 (0.8%) general and other answers 8 (14.8%) 13 (14.9%) 5 (4.7%) 26 (10.5%) other group is nice/fun or current group is not nice it’s not as much fun as 
another group 4 (7.4%) 2 (2.3%) 1 (0.9%) 7 (2.8%) unspecific / don’t know / other i can’t really explain 4 (7.4%) 11 (12.6%) 4 (3.8%) 19 (7.7%) a number and percentage of comments within that achievement group. the total number of comments exceeds the number of students since some answers belonged to two categories. table a12 students’ responses to the question: why would you prefer to work with or without achievement groups? frequency of comments a answering category example low (n = 52) average (n = 87) high (n = 102) total (n = 241) reasons for preferring to retain groups 37 (71.2%) 67 (77.0%) 80 (78.4%) 184 (76.4%) positive about groups / like it as it is because i like to work in a group 13 (25.0%) 29 (33.3%) 18 (17.7%) 60 (24.9%) between-student differences / appropriate difficulty because if you all work at the same level some people don’t get explanation and others don’t get challenge 6 (11.5%) 14 (16.1%) 24 (23.5%) 44 (18.3%) learning / working better with groups because you can learn better like this 2 (3.9%) 8 (9.2%) 13 (12.8%) 23 (9.5%) possibility to get more instruction i like it when you can choose whether you want explanation 4 (7.7%) 2 (2.3%) 3 (2.9%) 9 (3.7%) know your level it’s much more fun when everybody knows in which star [=group] they are 1 (1.9%) 1 (1.5%) 1 (1.0%) 3 (1.2%) unspecific / don’t know / other i don’t know how without [groups] 11 (21.2%) 13 (14.9%) 21 (20.6%) 45 (18.7%) reasons for preferring to work without groups 15 (28.9%) 20 (23.0%) 22 (21.6%) 57 (23.7%) learning / working better then you can do all tasks and then you get smarter 4 (7.7%) 7 (8.0%) 4 (3.9%) 15 (6.2%) 93 | f l r without groups negative about groups it’s so complicated now with all those groups 2 (3.9%) 2 (2.3%) 4 (3.9%) 8 (3.3%) equality, everybody should be/do the same because i think that everybody should get the same tasks 2 (3.9%) 3 (3.5%) 1 (1.0%) 6 (2.5%) unspecific / don’t know / other because i want that 7 (13.5%) 8 (9.2%) 13 (12.8%) 28 (11.6%) a 
number and percentage of comments within that achievement group. the total number of comments exceeds the number of students since some answers belonged to two categories.

frontline learning research vol. 5 no. 3 special issue (2017) 81–93
issn 2295-3159
*corresponding author: laura helle, department of teacher education, university of turku, assistentinkatu 5, 20014 turun yliopisto, finland. lhelle@utu.fi
doi: http://dx.doi.org/10.14786/flr.v5i3.254

prospects and pitfalls in combining eye-tracking data and verbal reports

laura helle
university of turku, finland

article received 30 april / revised 30 january / accepted 23 march / available online 14 july

abstract

it is intuitively appealing to try to combine eye-tracking data and verbal reports when investigating medical image interpretation. however, before collecting such data, important decisions must be made, including exactly when and how to collect the verbal reports. the purpose of this methodological article is to reflect on the pros and cons of different solutions and to offer some guidelines to investigators. we start by exploring the ontology of vision and speech production and the epistemology of eye movements to grasp what fixations and verbal reports actually reflect. we are also interested in the major constraints of the two systems. second, we elaborate on two dominant investigational approaches to verbal accounts: concurrent think-aloud and chi’s explanations. later, we move on to other approaches. third, we present and critically evaluate studies from the literature on medical image interpretation, specifically ones that have sought to contrast or integrate eye-movement data and verbal reports. fourth, we conclude with some practical guidelines and suggestions for further research.

keywords: eye tracking; gaze tracking; verbal reports; think-aloud; medical images; clinical reasoning

1.
introduction

the study of medical expertise in visual domains, such as radiology and dermatology, is firmly rooted in two distinct investigational approaches, both of which serve certain purposes: (a) the study of visual search or perception using eye-tracking methods (e.g., berbaum et al., 1998; kundel, nodine, & carmody, 1978; krupinski et al., 2006; rubin et al., 2014) and (b) the study of clinical reasoning, usually employing verbal reports (azevedo, faremo, & lajoie, 2007; lesgold, feltovich, glaser, & wang, 1981; morita et al., 2008; van der gijp et al., 2015). before the advent of commercial eye trackers, verbal reports were basically the only way to gain insight into diagnostic reasoning. even today, some type of verbal report is needed because one cannot deduce from, for example, dwell times whether a viewer actually “sees” a lesion (berbaum, franken, dorfman, caldwell, & krupinski, 2000). a crucial part of the perceptual process is assigning meaning to what one sees (nodine & kundel, 1987). as for the value of eye tracking, krupinski (2006) argued that eye tracking may be useful for developing individual eye movement profiles and for understanding the difference in performance between novices and experts. in addition, it can be useful for developing new visual search strategies.

although studies following both lines of investigation have yielded important insights, one can question whether either of the approaches alone is sufficient to answer important research questions. it is hard to see how medical image perception investigators can meet the expectations of modeling, for instance, search strategies by relying on eye movement metrics alone. it is also hard to see how process models can be justified based on only one source of data. as for the protocol analysts, it is odd that, for example, van der gijp et al.
(2014) conceptualized the interpretation of radiological images as a process of perception, analysis, and synthesis but methodologically relied on concurrent think-aloud techniques without the use of eye tracking. in fact, it has been argued in the context of occupational psychology that complex cognitive work tasks should be studied by integrating various sources of information, including eye movement data, when appropriate, with verbal reports (patrick & james, 2004; gegenfurtner et al., 2017). patrick and james (2004) stressed that process tracing involves four stages, and important decisions have to be made in each stage. the stages are the following: (1) collection of data, (2) transcription, integration, and segmentation of the data into a time-lined account, (3) coding, and (4) further analysis of the data from stage 3 and representation of the data. in the data collection stage, one of the most critical decisions involves the timing of data collection, because verbal accounts can be collected concurrently with task performance or retrospectively. as for the transcription phase of verbal reports, the authors present the integrated actions of a person in a single table. alternatively, one could think of either a data matrix containing a time-lined account of actions that is obtained through eye-tracking software or a set of time-stamped, transcribed videos. stage 3 involves coding of the transcribed data either based on theoretical categories or done in a bottom-up fashion. the authors stressed that when categories are derived from a bottom-up approach, independent raters should refine categories iteratively, with some form of reliability being reported. in stage 4, the analyst filters or expands the data using the newly acquired codes from stage 3, and subjects the data to further analysis, whereby certain aspects of cognition are made more salient. 
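as a minimal sketch of what a time-lined account in stage 2 could look like, the example below aligns transcribed utterances with the fixations that overlap them in time. the record types, the millisecond timestamps and the area-of-interest (aoi) labels are hypothetical; they are not taken from patrick and james (2004) or from the export format of any particular eye-tracking software.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fixation:
    start_ms: int
    end_ms: int
    aoi: str  # hypothetical area-of-interest label


@dataclass(frozen=True)
class Utterance:
    start_ms: int
    end_ms: int
    text: str  # transcribed segment of the verbal report


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the two half-open time intervals share any time."""
    return a_start < b_end and b_start < a_end


def build_timeline(fixations: List[Fixation],
                   utterances: List[Utterance]) -> List[Dict]:
    """One row per utterance, listing the AOIs fixated while it was spoken."""
    ordered_fixations = sorted(fixations, key=lambda f: f.start_ms)
    rows = []
    for utt in sorted(utterances, key=lambda u: u.start_ms):
        aois = [f.aoi for f in ordered_fixations
                if overlaps(f.start_ms, f.end_ms, utt.start_ms, utt.end_ms)]
        rows.append({"start_ms": utt.start_ms, "end_ms": utt.end_ms,
                     "utterance": utt.text, "aois": aois})
    return rows


# Invented example data, for illustration only.
fixations = [
    Fixation(0, 300, "upper left lobe"),
    Fixation(300, 900, "lesion"),
    Fixation(900, 1200, "heart"),
]
utterances = [
    Utterance(400, 1000, "there is an opacity here"),
    Utterance(1500, 2000, "otherwise normal"),
]

timeline = build_timeline(fixations, utterances)
for row in timeline:
    print(row)
```

rows in which an utterance has no overlapping fixation remain in the table with an empty aoi list, so gaps in either data source stay visible when the rows are coded in stage 3.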
the authors stressed two points: (a) a minimum level of further analysis is to establish whether a worker’s response or solution is correct; (b) there is a need to capture and represent at a global level a person’s reasoning during a scenario in relation to changes in the task and work situation.

the purpose of this methodological article is to reflect on the pros and cons of different solutions and to offer some guidelines for investigators. this endeavor stretches the frontiers of the field, because gegenfurtner, siewiorek, lehtinen, and säljö (2013) reported in their systematic review that combining eye tracking and verbal reports remains unexplored. note also that this article is not a literature review. for review articles other than the one by gegenfurtner et al. (2017), see al-moteri, symmons, plummer, and cooper (2017); blondon, wipfli, and lovis (2015); and van der gijp et al. (2016). we start by exploring the ontology of vision and speech production and the epistemology of eye movements to grasp the major constraints involved in visual processing and speech production. second, we elaborate on two dominant investigational approaches to verbal accounts and introduce alternative approaches. third, we present and evaluate studies from the literature on medical image interpretation that have sought to contrast or integrate eye movement data and verbal reports. fourth, we conclude with practical solutions and some suggestions for further research.

2. nature of the visual system: what do fixations actually reflect?

the visual system is the part of the nervous system that allows organisms to see. it interprets information from the environment to build a representation of the surrounding world. the visual system has the complex task of reconstructing a three-dimensional world from a two-dimensional retinal representation of that world. the performance of the visual system in a constantly changing visual environment is remarkable.
the price, however, is that approximately one-third of the cortex is needed to process visual information (vanni, 2004).

how, then, does the visual system operate? information from the eyes flows into the brain through the optic nerve. information from the right visual field travels to the left optic tract. information from the left visual field travels to the right optic tract. each optic tract terminates in the lateral geniculate nucleus (lgn) in the thalamus. the cortical region that receives information directly from the lgn is called v1 (the primary visual cortex). the macaque monkey has over 30 cortical regions, and it is estimated that humans have approximately as many (vanni, 2004). these areas are connected to each other by intricate wiring containing both feedforward and feedback connections (vanni, 2004). according to the ventral-dorsal model introduced by goodale and milner (1992), information flows in two directions from the primary visual cortex: (a) to the posterior parietal cortex through the dorsal stream and (b) to the inferotemporal cortex through the ventral stream. the dorsal pathway has been characterized as the action stream, a pathway concerned with converting visual inputs into motor outputs, whereas the ventral pathway provides a visual perception of objects and events in the world. goodale (1998) stressed, however, that even a simple action such as picking up a cup of coffee requires activity in both pathways.

there are several bottlenecks in the visual system that stem from constraints in anatomy, attention, and working memory. first, high visual acuity is limited to the fovea, a spot on the retina. the fovea is employed for accurate vision in the direction where it is pointed. visual acuity decreases dramatically in the parafoveal area and periphery. in eye-movement research, it is possible to capture the target of foveal inspection through fixations.
second, object recognition is capacity-limited and often attention-demanding because one cannot recognize multiple objects with more than one feature simultaneously (such as a letter t containing green and purple). object recognition requires more than 100 ms per item, which refers to processing time rather than presentation time (wolfe, võ, evans, & greene, 2011). third, there is a limit to focusing and shifting one’s attention: people tend to move their eyes two to four times per second when reading and conducting most visual search tasks (salthouse & ellis, 1980). the gaze, however, can be trained to make the best of the few fixations: the novice’s gaze is often drawn by salient, bottom-up features, whereas experts more often focus on top-down, task-relevant features, as evidenced by bertram, helle, kaakinen, and svedström (2013; see also wolfe, evans, drew, aizenman, & josephs, 2015). fourth, although information flows into the system incessantly, working-memory capacity is limited to approximately four “chunks,” or combinations of items, at a time (cowan, 2010). in addition, information in one’s working memory is lost quickly: according to ericsson (2006), for tasks with response latencies of 5–10 seconds, people are able to recall their sequences of thoughts quite accurately.

for the most part, the human brain processes low-level information patterns in the environment automatically (vanni & heikkinen, 2015). studies adhering to a flash-view paradigm (i.e., presenting images to participants for 20–250 ms) have shown that people can partially infer a scene without even fixating on the scene (e.g., kirschner & thorpe, 2006). only a small part of the information reaching the cortex is processed further, with storage capacity representing yet another filter. thus, visual information processing reaching awareness is only the tip of the iceberg.
people use fixations to purposively sample information from their surroundings to reconstruct a representation of the surrounding world. however, studies adhering to a flash-view paradigm have shown that people can partially infer a scene without even fixating on the scene. thus, there appear to be two visual pathways, which wolfe et al. (2011) termed a selective pathway, involving purposive sampling, and a nonselective pathway. to answer the question in the section title, the sequence of fixations can be seen as reflecting a visual search through the selective pathway (i.e., attentional guidance).

3. speech production and verbal reports: what do verbal reports actually reflect?

3.1 speech production

how and why people produce speech is usually taken for granted. speech has many social and cultural functions, such as signifying group identity, social grooming, settling disputes, teaching, and entertainment. naturally, the function of each act of speech shapes speech production, which is a rather complex process. according to levelt (1989, pp. 4–14), speech production involves four stages (originally conceptualized as “processing components”) that depend heavily on “knowledge stores”: (a) conceptualization (i.e., preverbal message generation relying on situational knowledge and content knowledge); (b) formulation, including grammatical and phonological encoding relying on lexical knowledge; (c) articulation (i.e., execution of the phonetic plan by three sets of muscles involving up to 100 different muscles) resulting in overt speech; and (d) self-monitoring (i.e., the components of normal language comprehension relying on lexical knowledge). interestingly, the model includes the notion of inner speech, which is the product of the second stage. the model does not include writing as an alternative to articulation, but speech can be encoded into the visual or tactile form in addition to the auditory form.
as a result, people manage to produce two to three words per second as a part of fluent conversation, and overtly naming a clear picture of an object can be initiated within 600 ms after the appearance of the picture (levelt, roelofs, & meyer, 1999). in fact, the generation of inner speech may be somewhat ahead of articulation. to cope with this asynchrony, it is necessary for the phonetic plan to be stored. the storage mechanism is referred to as the articulatory buffer. it is important to note that these actions tax the speaker’s information-processing capacity, including working memory; in addition, speech production is delayed compared to recognition by the visual system. (recall that object recognition requires more than 100 ms of processing time per item.)

3.2 two dominant approaches to verbal reports

in a research context, verbal reports are heavily shaped by the context in which they are produced, and verbal reports serve various functions in different research traditions. this can be highlighted by comparing two dominant approaches to verbal reports: ericsson’s protocol analysis and chi’s explanations. ericsson and simon’s “verbal reports as data” (1980), with over 13,800 google citations and 1,619 web of science citations, appears to be the most influential piece of work on verbal reports. based on google scholar, ericsson has been the most active author on verbal reports over the last 30 years. second to ericsson and simon’s article is an article by micheline chi: “quantifying qualitative analysis of verbal data: a practical guide.” this paper has over 1,490 google citations and over 480 web of science citations. these two approaches are also frequently used in the context of medical image interpretation. therefore, ericsson’s protocol analysis and chi’s explanations deserve sections of their own.

according to ericsson (2006, p.
227), the central assumption of protocol analysis is that it is possible to instruct people “to verbalize their thoughts in a manner that does not alter the sequence and content of thoughts mediating the completion of a task and therefore should reflect immediately available information during thinking”. using levelt’s terminology, ericsson is after “inner speech”. in other words, the purpose is to elicit concurrent, nonreactive reports of thinking to understand expert reasoning and performance. according to the expert performance approach, the best way to obtain valid and complete traces of expert thought is to strive to produce laboratory conditions that capture “the essence of expertise,” where participants perform tasks that are representative of the studied phenomenon and where verbalizations directly reflect the participants’ spontaneous thoughts that are generated while completing the task. the instructions can be as follows (ericsson & simon, 1993, p. 376): “in this experiment, we are interested in what you say to yourself as you perform some tasks that we give you. in order to do this we will ask you to talk aloud as you work on the problems. what i mean by talk aloud is that i want you to say out loud everything that you say to yourself silently. just act as if you are alone in the room speaking to yourself. if you are silent for any length of time i will remind you to keep talking aloud.” in contrast, the goal of chi’s explanations (1997) is to figure out what a learner knows based on what a learner says or does and how that knowledge influences the learner’s reasons. chi avoided giving detailed instructions on how verbal reports should be elicited. instead, she gave detailed instructions on the analysis of such reports. she stressed that one must first determine “what” the learner said (e.g., a set of propositions or concepts). 
however, after that, to determine the overall structure of knowledge representations, one must assess the relations between the set. for example, a learner with naïve conceptions can hold pieces of unrelated knowledge, or a learner’s knowledge set can be theory-like, meaning that the reasoning can be captured by a few principles. according to chi (1997), the method of coding and analysing verbal data consists of the following eight steps:
1. reducing or sampling the protocols
2. segmenting the reduced or sampled protocols (sometimes optional)
3. developing a coding scheme or formalism
4. operationalizing evidence in the coded protocols
5. depicting the mapped formalism (optional)
6. seeking a pattern in the mapped formalism
7. interpreting the patterns
8. repeating the entire process, perhaps adopting a different grain size (optional).

according to chi (1997), there are five key differences between ericsson’s protocol analysis and her verbal analysis. first, there is a clear juxtaposition in the way the verbal reports are collected. ericsson and simon (1993) underlined that research participants are simply verbalizing the information they attend to while generating an answer to a problem instead of describing, explaining, justifying, or rationalizing their actions. second, there is a difference in focus. ericsson and simon (1993) were concerned with tapping the online process of problem solving or decision making, whereas chi was interested in capturing the participants’ knowledge representations. she even argued that the goal of protocol analysis is to test the a priori model rather than to uncover what the participants are actually doing. the third difference has to do with analytical procedures and workloads. according to chi (1997), in protocol analysis, coming up with the ideal template, which requires a cognitive task analysis, represents the majority of the workload.
in contrast, in verbal analysis, the referents are unknown; in self-explanation data, one must determine what the participant is talking about (e.g., an inference, plan, or inquiry). fourth, the method of validation or testing is different for the two methods. in ericsson’s protocol analysis, the sequence of verbal utterances is simply compared to the ideal template. the validation of the protocol analysis is “the degree of match” between these two. in the verbal analysis method, validation is achieved by using statistical testing. for example, qualitatively different knowledge representations of different groups of participants can be checked against the answers to some subject-specific questions. ericsson (2006) pointed out that task analysis can be applied to the analysis of think-aloud protocols. however, he added that it is also possible to examine the convergent validity established by different types of data, including reaction times, error rates, patterns of brain activation, and sequences of eye movements.

ericsson’s protocol analysis has several advantages. an obvious advantage is that ericsson provided detailed instructions on how to collect data using the method. another advantage is that ericsson and simon (1993) provided a wealth of evidence indicating that the method is not reactive (i.e., it does not alter the course of cognitive processing). the main disadvantages are the following: (a) as morita et al. (2008) noted, medical image interpretation involves an implicit process that is difficult to verbalize; (b) thinking aloud generally slows down performance, which may disrupt the execution of dynamic tasks in particular; (c) in certain tasks, it has been shown to alter accuracy (russo, johnson, & stephens, 1989). the disadvantage of prompting for explanations in the middle of an activity is that it has been repeatedly shown to affect behaviour in multiple ways (ericsson & simon, 1993).
a more recent study exploring visual search behaviour on different sets of web pages showed that prompting for explanations not only prolonged the task, but also led to more general distributed visual behaviour and the issuing of more commands to navigate within and between the web pages. in addition, mental workload increased (hertzum, hansen, & andersen, 2009). another disadvantage of chi’s explanations is the lack of clear instructions on how to collect “explanations.” if explanations are required, some form of retrospective reporting should be seriously considered.

3.3 retrospective reporting

in fact, people can provide quite accurate retrospective reports for short tasks that take 5–10 seconds (ericsson, 2006). the instruction can be as follows: “can you please tell me what you were thinking during problem solving?” (van gog, paas, van merriënboer, & witte, 2005). in the context of medical image interpretation, it is also common to ask the participants to report on the findings and final diagnosis either orally or in writing. as patrick and james (2004) pointed out, ideally, verbal reports should be collected immediately after task completion while the participant’s short-term memory still holds relevant information. according to the authors, when there is a need to rely on the participant’s long-term memory, some type of retrieval cues should be designed. in the case of medical images, which require more than 5–10 seconds to interpret, showing the participants a dynamic presentation of their eye movements (and keyboard movements when applicable) would seem to be a viable cuing solution.

there have been some noteworthy efforts to compare concurrent think-aloud with retrospective reports and cued reporting. in the context of troubleshooting electric circuits, van gog, paas, van merriënboer and witte (2005) conjectured that the methods would extract different types of information regarding process tracing.
the authors did not report the duration of the troubleshooting tasks, but it seems safe to assume that the task durations exceeded the critical limit of 5–10 seconds. thus, it was not surprising that the concurrent think-aloud and cued retrospective reporting involving showing the participants their eye movements and keyboard strokes resulted in more information than retrospective reporting. the remarkable finding was that the cued retrospective method resulted in fewer theoretical meaning-making verbalizations (why utterances), whereas the concurrent method resulted in fewer metacognitive utterances. thus, in addition to the context and population, one needs to consider carefully the type of information one is seeking.

the advantage of retrospective reporting is that the method of verbalization does not interfere with task completion. if the task is of a very short duration (under 10 seconds), accurate reports of thinking processes can be expected. the advantage of cued retrospective reporting is that it does not interfere with task performance. however, it may be that the cues are not sufficient to recover all task-related information.

it is worth noting that morita et al. (2008) showed that it can be worthwhile to triangulate verbal reports obtained through different methods. their results indicated that experts use more conceptual words in thinking-aloud through a visual task, but they use more perceptual words when compared to novices in the writing of the report. the interpretation of the finding was that the development of expertise is based on an ability to build connections between percepts and concepts.

4.
critical examination of studies combining eye tracking and verbal reports in the context of medical image perception

although there are many arguments for promoting the combination of eye tracking and verbal reports, combining eye tracking and verbal reports is easier said than done, as can be seen from the following studies employing concurrent think-aloud. concurrent think-aloud attempts to capture nonreactive verbal reports of thinking (ericsson, 2006). the notion of “nonreactivity” means that the execution of the primary task is not affected, except for the fact that it may be prolonged. the participants are asked to perform a task while uttering briefly what spontaneously comes to mind. in other words, it aims to “vocalize inner speech.” ericsson emphasized numerous times that participants should be talking to themselves, not explaining what they are doing or why because it has been repeatedly shown that the act of explaining can seriously interfere with the task the investigators are trying to model.

the first efforts to triangulate different sources of data obtained from different studies date back to the year 2000. berbaum et al. (2000) were interested in conducting a congenially designed laboratory experiment to determine if satisfaction of search is because of recognition error or because of decision error by two different methods (eye tracking versus protocol analysis). the design involved inserting artificial lesions in an image to see if it decreases the detection of native lesions, indicating satisfaction of search (sos). an earlier study employing eye tracking had indicated that inserting artificial lesions to certain images decreased the reporting of native lesions on those images. in the new experiment, berbaum et al. (2000) discovered two important things: first, the think-aloud condition served to eliminate the satisfaction of search effect.
second, the two methods provided contradictory results: the eye-tracking study suggested sos was because of decision error, whereas the think-aloud study suggested that sos was because of recognition error. the authors concluded that protocol analysis is limited in its ability to differentiate between search error and recognition error. on the other hand, there are perils in assuming that a lesion has been recognized based on dwell time alone. thus, it was hard to reconcile the fact that the two studies produced contradictory findings. also, the fact that the think-aloud procedure affected performance on the primary task casts doubts on the integrity of the entire study: it is difficult to argue that the think-aloud procedure was nonreactive. we speculate that the reactivity was because of the instructions given; the observers were instructed to use a finger to point to where they were looking and to verbalize the structures and the features they were looking for. these early efforts highlight the difficulties investigators can experience in applying concurrent think-aloud and method triangulation.

in fact, the first fundamental issue to consider is whether the concurrent think-aloud condition interferes with the primary task. to our knowledge, only a single study has been conducted on this issue in the context of medical image interpretation. littlefair, brennan, reed, williams, and pietrzyk (2012) explored whether the think-aloud condition affects pulmonary nodule detection; they did this using a within-subjects design with seven participants, two viewing sessions with a “wash-out period” separating them, and a set of 30 two-dimensional radiographs. half of the radiographs contained a single artificial nodule, and the rest were non-nodular. the participants were informed that the radiographs may contain a single nodule. no time limit was set for viewing.
performance was evaluated in terms of sensitivity, specificity, and roc measures, including multicase multireader roc auc analysis. in addition, the participants’ eye movements were tracked to compare, for example, fixations of areas of interest and time to fixate on areas of interest. results indicated that only half of the nodules ended up correctly localized, indicating an absence of ceiling effects. there were no differences in performance under the two conditions, with the exception of confidence ratings (in the ta condition, the subjects were less confident) and task duration. the latter result was statistically significant. the results are well-aligned with ericsson and simon’s (1993) theoretical account.

concurrent think-aloud has also been used, rather surprisingly, in the context of dynamic stimuli. an alternative approach situated in the context of fish locomotion can be seen in jarodzka, scheiter, gerjets, and van gog (2010). balslev et al. (2012) used think-aloud in the context of viewing films depicting infants with seizures and conditions resembling seizures. balslev et al. (2012) had their participants (medical students, residents, and experts) think aloud while diagnosing the infant seizures presented in the short films, which lasted anywhere from 26–49 seconds. the films were looped and repeated until the observer wished to stop viewing. not surprisingly, the experts scored higher in diagnostic accuracy and spent relatively more time viewing task-relevant features. a content analysis of the verbal accounts revealed that experts engaged more, in relative terms, in exploring the material and spent more time building and evaluating hypotheses. this pattern, in turn, explained why the experts returned to the areas of interest. this study showed how the combined use of eye movements and verbal reports can lead to a better understanding of medical image interpretation.
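
To make the performance measures reported for the Littlefair et al. (2012) study more concrete, the following minimal Python sketch shows how sensitivity and specificity could be computed for a single reader from binary ground truth and binary responses. The function name, the variable names, and the example data are illustrative assumptions and do not reproduce the authors' analysis; the reader below is invented, and the multireader ROC AUC analysis mentioned in the text is not covered.

```python
def sensitivity_specificity(truth, responses):
    """Return (sensitivity, specificity) for one reader.

    truth[i]:     True if case i contains a nodule (ground truth).
    responses[i]: True if the reader reported a nodule in case i.
    """
    if len(truth) != len(responses):
        raise ValueError("truth and responses must have the same length")
    pairs = list(zip(truth, responses))
    tp = sum(1 for t, r in pairs if t and r)          # hits
    fn = sum(1 for t, r in pairs if t and not r)      # misses
    tn = sum(1 for t, r in pairs if not t and not r)  # correct rejections
    fp = sum(1 for t, r in pairs if not t and r)      # false alarms
    # Guard against division by zero when one class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical data (not from the study): 30 cases, 15 of them with a nodule.
truth = [True] * 15 + [False] * 15
# Invented reader: reports 8 of the 15 nodules and raises 3 false alarms.
responses = [True] * 8 + [False] * 7 + [True] * 3 + [False] * 12

sens, spec = sensitivity_specificity(truth, responses)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In a within-subjects comparison such as the one described, the same computation would be run once per reader and per condition (think-aloud versus silent), and the resulting values compared across conditions.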
finally, li, pelz, alm, and haake (2012) attempted to integrate eye movement information completely with the concurrent verbalizations of a group of dermatologists who differed in their level of training; the groups were asked to observe 42 two-dimensional dermatological images. subsequently, they developed a hierarchical probabilistic framework to extract unique and common eye movement patterns among multiple subjects within each expertise group. the idea was to map specific eye movement patterns to certain cognitions, such as identifying the primary morphology. although the study is a remarkably ambitious endeavor to integrate eye tracking and concurrent verbalizations, as a process-tracing study, the work suffers from the implementation of the concurrent verbalizations. the novices were requested to provide a detailed description of the materials “as if describing to their doctors over the phone.” the medical professionals were instructed to examine and describe the findings to students “as if teaching.” (ibid., p. 395) these are clear violations of the principle of focusing on inner thought, and asking these questions may have seriously interfered with the primary task of image interpretation. therefore, the mapping solution presented may have limited value. retrospective verbalization is a suitable option for very short tasks where there is simply not enough time to verbalize. jaarsma, jarodzka, nap, van merriënboer, and boshuizen (2014) applied a heavily time-constrained research design. the authors presented two-dimensional microscopic images of colon tissue to a group of clinical pathologists, pathology residents, and medical students. the viewing of the images was constrained to 2 seconds. the participants’ eye movements were registered, along with their post hoc verbal accounts of what they had seen. (the authors did not use the expression “retrospective think-aloud”; instead, they referred to post hoc verbalizations.) 
the investigators analyzed the two sources of data separately. the verbal accounts were analyzed through an elaborate content analysis. the most interesting findings related to the differences between the clinical pathologists and the residents: in their search, the clinical pathologists tended to rely on what they had already seen, further studying the image for other abnormalities, whereas the residents tended to double-check their initial findings. in their post hoc verbalizations, the clinical pathologists focused on the typicality of the tissue, whereas the residents concentrated on naming pathologies. this study showed that important insights can be gleaned by combining eye movements and a form of retrospective verbal reports.

interestingly, not a single study could be found where the authors reported collecting the data by using chi’s explanations as the method. instead, going through the literature revealed several studies where the investigators collected the data using concurrent think-aloud and then referred to chi (1997) in the analysis phase (azevedo et al., 2007; van der gijp et al., 2014; van der gijp et al., 2015). it is not unusual to find studies using concurrent verbalizations, which end up gathering explanations (e.g., li et al., 2012; van der gijp et al., 2015).

5. conclusions

the integration of eye tracking and verbal reports is intuitively appealing, and as illustrated in this methodological article, interesting insights can be gleaned by adopting a mixed-methods approach. however, before collecting such data, important decisions must be made. the first critical decision is timing, that is, whether to collect concurrent or retrospective data. in the case of concurrent think-aloud, a decision must be made whether to follow ericsson and simon’s or chi’s advice. in the case of retrospective reports, one must decide whether to play back the eye movements to the observer to aid the retrieval of information from long-term memory.
based on this methodological analysis, some methodological issues appear to be solved, whereas others require further investigation. we argue that two issues are solved: first, retrospective reporting without cuing is suitable for perceptual tasks of a very short duration (<10 seconds). the advantages are the following: (a) one can be certain that verbalization does not interfere with the primary task; (b) the observer’s verbalization is not constrained to the speed of the visual system; and (c) it is safe to assume that information is still available in short-term memory. second, there exists a compelling body of literature indicating that for the purposes of process tracing, ericsson’s nonreactive method is superior to the idea of soliciting direct explanations from the observers during task execution because soliciting explanations tends to interfere with the primary task. it is hard to see what the purpose of process tracing would be if the research method results in a substantial change in the primary activity. other issues remain underexplored. first, when tasks are longer than 10 seconds, the pros and cons of ericsson’s concurrent think-aloud versus cued retrospective reporting need to be weighed against each other. in the study by van gog et al. (2005) in the context of troubleshooting electric circuits, the concurrent think-aloud condition produced more theoretical expressions, which were to some extent lost in the stimulated recall condition. we emphasize that this issue has not been explored in the context of medical image interpretation. more fundamentally, there is a need for more studies to be conducted to show that perceptual tasks, such as viewing an x-ray, are not affected by the think-aloud condition. the study by russo et al. (1989) showed that even when experimenters stick meticulously to ericsson and simon’s instructions, the think-aloud condition may affect task performance. russo et al. (1989, p. 
758) concluded that “protocol validity should be based on an empirical check rather than theory-based assurances”. what is proposed is a research agenda with two goals: (a) to further explore if think-aloud affects performance in a range of image interpretation tasks; (b) to compare the type of information obtained by concurrent think-aloud and cued retrospective reporting with different types of material (two-dimensional, volumetric, video). it would also be useful to include observers with varying levels of experience.

keypoints

before attempting to combine eye-tracking data and verbal accounts, important decisions must be made regarding the timing of the verbalizations and possible cuing.
ericsson’s concurrent think-aloud is deemed superior to eliciting explanations from the observers during task performance.
retrospective think-aloud is suitable for tasks of a very short duration (<10 seconds).
a research agenda is proposed for investigating the methodological issues that remain unsolved.

acknowledgements

the author wishes to thank dr. raymond bertram particularly for advice on writing about speech production. in addition, she is grateful to the two anonymous reviewers for providing exceptionally insightful and constructive feedback on the first version of the manuscript.

references

al-moteri, m. o., symmons, m., plummer, v., & cooper, s. (2017). eye tracking to investigate cue processing in medical decision-making: a scoping review. computers in human behavior, 66, 52-66. doi.10.1016/j.chb.2016.09.022
azevedo, r., faremo, s., & lajoie, s. p. (2007, january). expert-novice differences in mammogram interpretation. proceedings of the cognitive science society, 29(29).
balslev, t., jarodzka, h., holmqvist, k., de grave, w., muijtjens, a. m., eika, b., ... scherpbier, a. j. (2012). visual expertise in paediatric neurology. european journal of paediatric neurology, 16(2), 161-166. doi.10.1016/j.ejpn.2011.07.004
berbaum, k. s., franken, e.
a., dorfman, d. d., caldwell, r. t., & krupinski, e. a. (2000). role of faulty decision making in the satisfaction of search effect in chest radiography. academic radiology, 7(12), 1098-1106. doi.10.1016/s1076-6332(00)80063-x
berbaum, k. s., franken, e. a., dorfman, d. d., miller, e. m., caldwell, r. t., kuehn, d. m., & berbaum, m. l. (1998). role of faulty visual search in the satisfaction of search effect in chest radiography. academic radiology, 5(1), 9-19. doi.10.1016/s1076-6332(98)80006-8
bertram, r., helle, l., kaakinen, j. k., & svedström, e. (2013). the effect of expertise on eye movement behaviour in medical image perception. plos one, 8(6), e66169. doi.10.1371/journal.pone.0066169
blondon, k., wipfli, r., & lovis, c. (2015). use of eye-tracking technology in clinical reasoning: a systematic review. studies in health technology and informatics, 210, 90-94. doi.10.3233/978-1-61499-512-8-90
chi, m. t. (1997). quantifying qualitative analyses of verbal data: a practical guide. the journal of the learning sciences, 6(3), 271-315. doi.10.1207/s15327809jls0603_1
cowan, n. (2010). the magical mystery four: how is working memory capacity limited, and why? current directions in psychological science, 19(1), 51-57. doi.10.1177/0963721409359277
ericsson, k. a. (2006). protocol analysis and expert thought: concurrent verbalizations of thinking during experts’ performance on representative tasks. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 223-241). cambridge, ma: cambridge university press.
ericsson, k. a., & simon, h. a. (1980). verbal reports as data. psychological review, 87(3), 215.
ericsson, k. a., & simon, h. a. (1993). protocol analysis. verbal reports as data. cambridge, ma: mit press.
gegenfurtner, a., kok, e., geel, k., bruin, a., jarodzka, h., szulewski, a., & merriënboer, j. j. (2017). the challenges of studying visual expertise in medical image diagnosis.
medical education, 51(1), 97-104. doi.10.1111/medu.13205
gegenfurtner, a., siewiorek, a., lehtinen, e., & säljö, r. (2013). assessing the quality of expertise differences in the comprehension of medical visualizations. vocations and learning, 6(1), 37-54. doi.10.1007/s12186-012-9088-7
goodale, m. a. (1998). visuomotor control: where does vision end and action begin? current biology, 8(14), r489-r491. doi.10.1016/s0960-9822(98)70314-8
goodale, m. a., & milner, a. d. (1992). separate visual pathways for perception and action. trends in neurosciences, 15(1), 20-25.
hertzum, m., hansen, k. d., & andersen, h. h. (2009). scrutinising usability evaluation: does thinking aloud affect behaviour and mental workload? behaviour & information technology, 28(2), 165-181. doi.10.1080/01449290701773842
jaarsma, t., jarodzka, h., nap, m., merrienboer, j. j., & boshuizen, h. (2014). expertise under the microscope: processing histopathological slides. medical education, 48(3), 292-300. doi.10.1111/medu.12385
jarodzka, h., scheiter, k., gerjets, p., & van gog, t. (2010). in the eyes of the beholder: how experts and novices interpret dynamic stimuli. learning and instruction, 20(2), 146-154. doi.10.1016/j.learninstruc.2009.02.019
kundel, h. l., nodine, c. f., & carmody, d. (1978). visual scanning, pattern recognition and decision making in pulmonary nodule detection. investigative radiology, 13(3), 175-181.
kirchner, h., & thorpe, s. j. (2006). ultra-rapid object detection with saccadic eye movements: visual processing speed revisited. vision research, 46(11), 1762-1776. doi.10.1016/j.visres.2005.10.002
krupinski, e. a., tillack, a. a., richter, l., henderson, j. t., bhattacharyya, a. k., scott, k. m., … weinstein, r. s. (2006). eye-movement study and human performance using telepathology virtual slides. implications for medical education and differences with experience. human pathology, 37(12), 1543-1556. doi.10.1016/j.humpath.2006.08.024
lesgold, a. m., feltovich, p.
j., glaser, r., & wang, y. (1981). the acquisition of perceptual diagnostic skill in radiology (no. lrdc-81/pds-1). pittsburgh university learning research and development center.
levelt, w. j. m. (1989). speaking. from intention to articulation. cambridge, ma: mit press.
levelt, w. j., roelofs, a., & meyer, a. s. (1999). a theory of lexical access in speech production. behavioral and brain sciences, 22(1), 1-38.
li, r., pelz, j., shi, p., alm, c. o., & haake, a. r. (2012, march). learning eye movement patterns for characterization of perceptual expertise. in proceedings of the symposium on eye tracking research and applications (pp. 393-396). acm.
littlefair, s., brennan, p., reed, w., williams, m., & pietrzyk, m. w. (2012, february). does the thinking aloud condition affect the search for pulmonary nodules? in spie medical imaging (pp. 83181a-83181a). bellingham, wa: international society for optics and photonics.
morita, j., miwa, k., kitasaka, t., mori, k., suenaga, y., iwano, s., ... ishigaki, t. (2008). interactions of perceptual and conceptual processing: expertise in medical image diagnosis. international journal of human-computer studies, 66(5), 370-390. doi.10.1016/j.ijhcs.2007.11.004
nodine, c. f., & kundel, h. l. (1987). using eye movements to study visual search and to improve tumor detection. radiographics, 7(6), 1241-1250.
patrick, j., & james, n. (2004). process tracing of complex cognitive work tasks. journal of occupational and organizational psychology, 77(2), 259-280.
rubin, g. d., roos, j. e., tall, m., harrawood, b., bag, s., ly, d. l., ... choudhury, r. k. (2014). characterizing search, recognition, and decision in the detection of lung nodules on ct scans: elucidation with eye tracking. radiology, 274(1), 276-286. doi.10.1148/radiol.14132918
russo, j. e., johnson, e. j., & stephens, d. l. (1989). the validity of verbal protocols. memory & cognition, 17(6), 759-769.
salthouse, t. a., & ellis, c. l. (1980).
determinants of eye-fixation duration. the american journal of psychology, 207-234.
van der gijp, a., van der schaaf, m. f., van der schaaf, i. c., huige, j. c. b. m., ravesloot, c. j., van schaik, j. p. j., & ten cate, t. j. (2014). interpretation of radiological images: towards a framework of knowledge and skills. advances in health sciences education, 19(4), 565-580. doi.10.1007/s10459-013-9488-y
van der gijp, a., ravesloot, c. j., van der schaaf, m. f., van der schaaf, i. c., huige, j. c., vincken, k. l., ... van schaik, j. p. (2015). volumetric and two-dimensional image interpretation show different cognitive processes in learners. academic radiology, 22(5), 632-639. doi.10.1016/j.ejrad.2014.12.015
van der gijp, a., ravesloot, c. j., jarodzka, h., van der schaaf, m. f., van der schaaf, i. c., van schaik, j. p. j., & ten cate, t. j. (2016). how visual search relates to visual diagnostic performance: a narrative systematic review of eye-tracking research in radiology. advances in health sciences education, doi.10.1007/s10459-016-9698-1
van gog, t., paas, f., van merriënboer, j. j., & witte, p. (2005). uncovering the problem-solving process: cued retrospective reporting versus concurrent and retrospective reporting. journal of experimental psychology: applied, 11(4), 237. doi.10.1037/1076-898x.11.4.237
vanni, s. (2004). näkötiedon käsittely aivokuoressa [processing of visual data in the cerebral cortex]. duodecim, 120, 2653-2662.
vanni, s., & heikkinen, h. (2015). onko aivoissamme käyttämätöntä kapasiteettia? [is there unused capacity in our brain?] duodecim, 131, 1644-1649.
wolfe, j. m., evans, k. k., drew, t., aizenman, a., & josephs, e. (2015). how do radiologists use the human search engine? radiation protection dosimetry, 501. doi.10.1093/rpd/ncv501
wolfe, j. m., võ, m. l. h., evans, k. k., & greene, m. r. (2011). visual search in scenes involves selective and nonselective pathways. trends in cognitive sciences, 15(2), 77-84.
doi.10.1016/j.tics.2010.12.001

frontline learning research vol. 5 no. 3 special issue (2017) 14-30
issn 2295-3159

neural correlates of visual perceptual expertise: evidence from cognitive neuroscience using functional neuroimaging

andreas gegenfurtner a,1, ellen m. kok b, koos van geel b, anique b. h. de bruin b, bettina sorger c

a institut für qualität und weiterbildung, technische hochschule deggendorf, germany
b school of health professions education, maastricht university, the netherlands
c department of cognitive neuroscience, maastricht university, the netherlands

article received 7 may / revised 23 march / accepted 23 march / available online 14 july

abstract

functional neuroimaging is a useful approach to study the neural correlates of visual perceptual expertise. the purpose of this paper is to review the functional-neuroimaging methods that have been implemented in previous research in this context. first, we will discuss research questions typically addressed in visual expertise research. second, we will describe which kinds of stimuli are employed and which functional-neuroimaging techniques are implemented in this kind of research, with a special focus on electroencephalography (eeg) and functional magnetic resonance imaging (fmri). third, we will summarize the outcomes of recent studies that addressed the neural correlates of visual expertise and will particularly focus on studies that examined the neural correlates of visual expertise in medical image diagnosis. finally, the review closes with a discussion of the benefits, caveats, and future directions of cognitive-neuroscience research for studying visual expertise.

keywords: perceptual expertise; eeg; fmri; n170; ffa.

1 corresponding author: andreas gegenfurtner, institut für qualität und weiterbildung, technische hochschule deggendorf, dieter-görlitz-platz 1, 94469 deggendorf, germany.
email: andreas.gegenfurtner@th-deg.de
doi: http://dx.doi.org/10.14786/flr.v5i3.259

gegenfurtner et al | f l r 15

1. introduction

expertise can be defined as maximal adaptations to task constraints (ericsson & lehmann, 1996; gruber, jansen, marienhagen, & altenmueller, 2010) which can take many forms, including, among others, motor expertise, memory expertise, or perceptual expertise (ericsson & lehmann, 1996). perceptual expertise can be further categorized as visual, auditory, tactile, olfactory, vestibular, or gustatory expertise. visual expertise is evident, for example, when bird experts classify a passing little bird as an oriole or a cardinal (tanaka & curran, 2001) or when clinicians diagnose digitized slides of human tissue as pathologically normal or abnormal (helle et al., 2011). assuming that individual differences in visual perceptual expertise should be reflected in differences in the brain, the following question arises: can we reliably measure/objectify neural correlates of visual expertise with currently available functional-neuroimaging methods and therewith explain inter-individual behavioral differences with respect to visual perceptual expertise?

in line with the overall goal of this special issue to introduce and discuss methodological approaches in visual expertise research (gegenfurtner & van merriënboer, 2017), the purpose of the present methodological review is to reflect on the promises and pitfalls of cognitive-neuroscience methods in the study of visual perceptual expertise. while the review can offer input for discussions among scholars experienced in conducting neuroscientific studies, the manuscript is mainly written to inform scholars who are unfamiliar with the methodological repertoire of functional neuroimaging and its use in expertise research.
in this review, we will particularly address expertise in medical image diagnosis, which can be defined as the inspection and interpretation of a visual representation of the human anatomy or its functions (gegenfurtner, kok, van geel, de bruin, jarodzka, szulewski, & van merriënboer, 2017); but because this body of research is still limited and in its infancy, we will extend our review to other content domains with the aim of offering a more useful overview of current methodological decisions in the visual perceptual expertise literature. there are already several systematic reviews available on the neural aspects of visual perceptual expertise (for example, richler & gauthier, 2014, for face perception or gegenfurtner, siewiorek, lehtinen, & säljö, 2013, for medical image diagnosis). the present review has a particular emphasis on the application of cognitive-neuroscience (especially functional-neuroimaging) methods to visual perceptual expertise, and will follow four steps. first, we will start with a short discussion of typically addressed research questions. second, we will describe which kinds of stimuli are employed and which functional-neuroimaging methods are used, with a special focus on the frequently implemented methods eeg and fmri. third, we will summarize the outcomes of studies that addressed the neural correlates of visual perceptual expertise. and finally, we close this review with a discussion of the benefits, caveats, and future directions of cognitive-neuroscience research for studying visual expertise.

2. research questions

research on visual perceptual expertise has focused on a wide range of different research questions. these research questions can be clustered in three distinct types: contrastive, developmental, and conditional. naturally, research questions strongly correspond with the research design. for example, contrastive research questions ask how participants of different levels of expertise vary in different neural measures.
in a classic study, haller and radue (2005) were interested in examining “neuronal activations during processing of radiologic and non-radiologic images by experienced radiologists and non-radiologist subjects by using event-related functional magnetic resonance (mr) imaging” (p. 983). this is a representative example of the first type of research questions (contrastive research questions). a second type of research questions, developmental research questions, asks how participants neurally adapt to visual perceptual training. these studies typically employ a paradigm in which inexperienced participants are trained over the course of several weeks. for example, gauthier and colleagues (1998) were interested in examining if increased experience with so-called ‘greebles’, artificially created stimuli (see figure 1), would lead to an increase of fmri activation in a particular brain region, the so-called ‘fusiform face area’ (the outcomes of this study are presented and discussed below). finally, the third type of research questions, conditional research questions, addresses the extent to which expertise effects – be they contrastive or developmental – are contingent on task conditions such as the duration of stimulus presentation or different manipulations of the presented stimuli. for example, bilalić and colleagues (2016) were interested in unravelling if expertise effects are moderated by the orientation of the presented stimulus, in their case, x-ray films shown either in a normal, upright position or in an inverted position (rotated by 180°). comparability across studies depends on the research question and design used. when designing a cognitive-neuroscience study, one can follow a single research question or several research questions even from different research-question types (see above).
conditional research questions that address the moderating effect of stimulus or task conditions are often combined with contrastive or developmental research designs.

3. methodology in visual perceptual expertise research

in this section, we review established methods implemented in cognitive-neuroscience studies in the field of visual perceptual expertise. we first describe frequently used artificial and naturalistic stimuli. we then look at the methodology of fmri and eeg, in particular at what kind of information can be derived from fmri and eeg signals, and we also outline other, less frequently used techniques in cognitive neuroscience.

figure 1. examples of smoothie, spikie, and cubie objects (open access from op de beeck et al., 2006).

3.1. stimuli

3.1.1. artificial stimuli

artificially created stimuli are objects that have no common reference in the real world. this is a deliberate choice to avoid any confounding effects that may be induced from familiarity with the object. several groups of artificial stimuli have been introduced. some of those used more frequently are ‘smoothies’, ‘spikies’, and ‘cubies’, and, perhaps most prominently, greebles (mentioned above). smoothies, spikies, and cubies are matlab-generated classes of objects that “were designed to have different shape properties and to seem novel (i.e., they did not immediately suggest associations with everyday object categories)” (op de beeck, baker, dicarlo, & kanwisher, 2006, p. 13025). figure 1 shows example smoothies, spikies, and cubies. these artificial stimuli were created with variations of different dimensions, so that participants need to process more than one location of the object to attain high rates of discrimination. greebles are objects specifically constrained to be similar to faces along several dimensions. figure 2 shows example greebles.
greebles are photo-realistically rendered, three-dimensional, computer-generated objects that all share a common configuration. as gauthier, williams, tarr, and tanaka (1998) explain: “each greeble is made up of a vertically-oriented ‘body’ with four protruding ‘appendages’, from top to bottom, two ‘boges’, a ‘quiff’ and a ‘dunth’” (p. 2402). greebles come in two different genders (called “glip” and “plok”) and five families (called “galli”, “osmit”, “radok”, “samar”, and “tasio”). greebles have been used in a range of studies using both fmri and eeg.

figure 2. examples of greeble objects in their two genders and five families (open source from: https://commons.wikimedia.org/wiki/category:greeble).

3.1.2. real-world stimuli

in contrast to these artificial stimuli that share little to no resemblance with naturally occurring objects, researchers also use real-world stimuli. these stimuli are called “real-world” to indicate that these objects are not researcher-generated. associated with real-world stimuli is the assumption that there are real-world experts that have developed visual skills related to these objects (shen, mack, & palmeri, 2014), so these materials are used in an attempt to create ecologically valid domain-specific tasks. real-world stimuli can be classified as faces and non-face objects. first, photographs of faces (or of parts of faces) are extensively used as stimuli in visual perceptual expertise research because we have so much exposure to faces that this makes us all experts in face recognition (bentin, allison, puce, perez, & mccarthy, 1996; richler & gauthier, 2014).
second, photographs of non-face objects include cars (gauthier, skudlarski, gore, & anderson, 2000), different animal species such as birds (tanaka, curran, & sheinberg, 2005) or dogs (tanaka & curran, 2005), and also letters such as japanese (maurer, zevin, & mccandliss, 2008) and chinese characters (fan, chen, zhang, qi, jin, wang, et al., 2015; qi, wang, hao, zhu, he, & luo, 2016). researchers also use representations of chess positions (bilalić, langner, ulrich, & grodd, 2011) and medical images (haller & radue, 2005). in many studies, these real-world stimuli are presented either in original form or in inverted, rotated, or otherwise artificially distorted form. the rationale behind these artificial manipulations is to complicate and change pattern recognition for expert participants. typically, real-world stimuli in these studies are static and two-dimensional. if we assume that the comprehension of visualizations is moderated by variations in dimensionality and dynamics (for a meta-analysis testing this assumption, see gegenfurtner, lehtinen, & säljö, 2011), then it seems surprising that the literature on the neural correlates of real-world visual perceptual expertise has not yet systematically compared how the brain processes of experts and novices differ when they view static vs. dynamic stimuli or two-dimensional vs. three-dimensional visualizations.

3.2. apparatus

while viewing different kinds of stimuli, participants’ neural correlates can be measured with cognitive-neuroscience techniques. the choice of technique depends on the study interests and research questions. typically, if researchers are interested in the temporal aspects of image processing, they use electroencephalography (eeg). conversely, if researchers are interested in the spatial aspects of image processing, they use functional magnetic resonance imaging (fmri).
in addition to eeg and fmri, there are also several other measurement techniques, including magnetoencephalography (meg), positron emission tomography (pet), and functional near-infrared spectroscopy (fnirs). offering detailed descriptions of each of these techniques is beyond the scope of this review. ward (2006) and squire and colleagues (2013) offer easy-to-understand introductions. but it is informative here to briefly describe the two most frequently used techniques, eeg and fmri, to illustrate how they work and what they measure.

3.2.1 eeg

neurons communicate through electrical signals transmitted along axons and dendrites. when populations of neurons that are oriented in parallel are synchronously active, their electrical signals can be measured with electrodes placed on the scalp. electroencephalography (eeg) records and amplifies these electrical signals over time. when we perceive a picture, particular populations of neurons in our brain respond to this picture. this response is measurable as a change in voltage at the scalp before, during, and after seeing the picture. if we average the recorded eeg signal across many trials, random brain activity that is unrelated to the neural processing of the picture is cancelled out. the relevant (stimulus-related) signal is preserved and called the ‘event-related potential’ (erp). when recording eeg from participants while they looked at pictures of faces, bentin and colleagues (1996) found a negative event-related potential (n) that reached its maximum at approximately 172 ms (n170) after picture onset. since this pioneering study, the n170 has become a widely studied eeg component in cognitive neuroscience. eeg measures have a high temporal resolution and are therefore time-sensitive. thus, eeg is especially suited to investigating temporal patterns of brain activity.
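the trial-averaging step described above can be illustrated with a minimal sketch. it is not taken from any of the reviewed studies: the simulated signal (a negative deflection of -4 microvolts peaking at 170 ms), the noise level, the number of trials, and the function names simulate_trial and average_trials are all assumptions chosen for illustration.

```python
import math
import random


def simulate_trial(n_samples, rng, noise_sd=5.0):
    """Return one simulated single-trial epoch in microvolts (1 sample per ms).

    Assumption for illustration: the stimulus-related signal is a negative
    Gaussian deflection of -4 uV peaking 170 ms after picture onset, buried
    in random background activity with a standard deviation of noise_sd.
    """
    trial = []
    for t_ms in range(n_samples):
        evoked = -4.0 * math.exp(-((t_ms - 170) ** 2) / (2 * 15.0 ** 2))
        trial.append(evoked + rng.gauss(0.0, noise_sd))
    return trial


def average_trials(trials):
    """Average sample by sample across trials to obtain the erp."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
trials = [simulate_trial(400, rng) for _ in range(500)]
erp = average_trials(trials)

# Latency (ms after onset) of the most negative point of the averaged signal.
peak_latency_ms = min(range(len(erp)), key=lambda t: erp[t])
print(f"most negative deflection: {erp[peak_latency_ms]:.2f} uV at {peak_latency_ms} ms")
```

in a single simulated trial the deflection is smaller than the background noise, but averaging across n trials shrinks the standard deviation of the noise by a factor of about the square root of n, which is why the stimulus-related component becomes visible in the average.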
however, eeg has a relatively low spatial resolution, meaning that the localization of the eeg signal source (i.e., the location of the specific neuronal populations evoking the electrical brain activity) cannot be ascertained with high precision.

3.2.2 fmri

fmri indirectly measures neural activity through its vascular response: following (e.g., visual) stimulation, neuronal activity in particular (e.g., visual) brain regions increases, which results in enhanced local oxygen consumption. neuronal tissue gets new oxygen from the oxygenated hemoglobin in the blood. within a few seconds, the blood flow and the concentration of oxygenated hemoglobin in the blood increase in the particular brain region. this increase is called the hemodynamic response. since oxygenated and deoxygenated hemoglobin have different magnetic properties, the hemodynamic (or the blood oxygenation level-dependent, bold) response can be imaged using fmri. note that the hemodynamic response is considerably delayed and extended in time, which puts some constraints on the design of fmri experiments. compared to eeg, the temporal resolution of fmri is rather low (one data point is normally obtained within 1-2 s). however, the spatial resolution is considerably higher (in the mm³ range), meaning that fmri can provide specific information about the origin of the brain signal and thus about which part of the brain is involved in a particular activity (e.g., visual perception). note that fmri only measures relative (and not absolute) changes of the oxygenation level and that fmri visualizations are actually representations of statistical differences of the fmri signal across different experimental conditions.

in summary, eeg has a very high temporal resolution and is therefore a suitable method to investigate the timing of brain activity.
fmri has a much higher spatial resolution than eeg and is an appropriate method to indicate which brain regions are involved in a particular (e.g., perceptual) task.

4. results of visual expertise research

this section presents the outcomes of studies addressing neural correlates of visual perceptual expertise. how does the development of expertise change temporal and spatial aspects of information processing? first, we summarize the findings of eeg research. second, we review the fmri findings. and finally, in a special section, we zoom in on the relatively new field of cognitive-neuroscience research applied to medical image diagnosis.

4.1. eeg research: the n170

using erps based on eeg measurements, cognitive-neuroscience research has provided strong support for the idea that a particular early erp component, namely the n170 (introduced above), plays a significant role when participants process photographs and pictures of faces (bentin et al., 1996; for a meta-analysis of this research, see hinojosa, mercado, & carretié, 2015). interestingly, it could be demonstrated that patients who suffer from face blindness, also called prosopagnosia (the inability to recognize faces), did not show this larger magnitude of the n170 component when processing faces (for reviews, see richler & gauthier, 2014; towler, fisher, & eimer, 2017). this body of evidence on face processing has inspired research on perceptual expertise because of the assumption, in part, that all humans are ‘face experts’. if the n170 was such a stable neurophysiological marker in face perception, would the enhanced n170 also reflect expert processing of other familiar, domain-specific objects? a pioneering study by tanaka and curran (2001) confirmed this hypothesis. eeg was recorded while participants viewed photographs of cars or birds.
approximately 164 ms after stimulus onset, participants who were car experts showed a larger n170 component for cars compared to birds, and participants who were bird experts showed a larger n170 component for birds compared to cars. tanaka and curran (2001) carefully controlled for stimulus artefacts including image properties and task instruction, and also for group effects in that the same participants viewed photos of cars and birds and were thus expert and novice in different trials of the experiment. in summary, this study revealed that visual perceptual expertise is associated with an enhanced n170 component and thus with very early stages of visual information processing. in recent years, this effect has been replicated with both car (gauthier & curby, 2005; scott, tanaka, sheinberg, & curran, 2008) and bird stimuli (scott, tanaka, sheinberg, & curran, 2006; tanaka et al., 2005). research has also shown the expertise effect on the n170 using artificial stimuli including blobs (curran, tanaka, & weiskopf, 2002) and greebles (rossion, gauthier, goffaux, tarr, & crommelinck, 2002; rossion, kung, & tarr, 2004) and with non-object letter symbols, including japanese (maurer et al., 2008) and chinese characters (fan et al., 2015; qi et al., 2016). in summary, eeg studies suggest that, similar to face perception (hinojosa et al., 2015; richler & gauthier, 2014; towler et al., 2017), visual expertise modifies the temporal aspects of information processing, as reflected in an enhanced n170 component for trained or domain-specific objects.

4.2. fmri research: revealing the functional role of the ffa

in 1997, kanwisher and colleagues located a brain region in the fusiform gyrus that is strongly activated when humans view faces. this region was called the fusiform face area (ffa).
two years later, in 1999, gauthier, tarr, anderson, skudlarski, and gore demonstrated that the ffa is not only activated when viewing faces but also indicates the level of expertise with artificial objects (in this case greebles). the assumption was that the selectivity of ffa reflects a more generalized form of visual perceptual expertise that is not intrinsically specific or restricted to processing face stimuli (tarr & gauthier, 2000). this assumption was confirmed with bird (gauthier et al., 2000), car (gauthier et al., 2000), and artificial stimuli (gauthier et al., 1999). however, an early criticism of these studies was that these stimuli were similar to faces: indeed, parts of greebles evoke resemblance to faces, birds have faces, and also cars, at least in three-quarter frontal views, resemble faces (kanwisher, 2000; grill-spector, knouf, & kanwisher, 2004). the conclusion was, thus, that ffa activation was more likely the result of face similarity than object expertise. to minimize the effect of faces, xu (2005) used side view photographs of birds and cars, and reported that visual perceptual expertise was still associated with ffa activation. since then, a wealth of fmri studies has supported gauthier’s initial assumption that visual expertise in object perception was associated with activation in the ffa independent of face similarity (bilalić et al., 2011; bukach, gauthier, & tarr, 2006; palmeri & gauthier, 2004; righi, tarr, & kingon, 2013; but see bartlett, boggan, & krawczyk, 2013, for a study that did not find differences between experts and novices in ffa activation in a chess task. in that study, artificially inverted and distorted chess stimuli were used, so it is a matter of debate if these stimuli were suitable to trace chess expertise).
in recent years, the discussion around face selectivity has tended to be replaced with a discussion of whether ffa was the only region relevant for processing familiar objects or whether visual perceptual expertise was associated with the interaction between different brain regions (e.g., bilalić, langner, campitelli, turella, & grodd, 2015; harel, kravitz, & baker, 2013; wong & wong, 2014). in short, there seems to be broad consensus in the field that the processing of objects involves more than just ffa. specifically, wong and wong (2014, p. 308) explain that “perceptual expertise researchers have been considering the interaction between perceptual and cognitive processing as an important component in understanding perceptual expertise for different objects. it is therefore unnecessary to create the debate between the so-called “perceptual view” and “interactive view” of expert object recognition, as the interaction between perceptual and cognitive processing has been well accommodated in perceptual expertise research.” overall, studies using fmri demonstrated that when experts view domain-specific stimuli, the ffa and other brain regions are activated. the precise location of these “other” brain regions and their particular interaction patterns with ffa, however, are still under investigation.
table 1
studies examining neural correlates of expertise differences with medical images as stimuli

first author (year) | participants | stimulus | task (apparatus) | main findings
bilalić (2016) | 16 radiologists, 15 students | upright or inverted chest x-ray films, photographs of faces, rooms, and tools | viewing task: 1-back task (fmri) | expertise effect in ffa
fiorio (2010) | 8 clinicians, 10 students | photographs and videos of healthy and dystonic writing movement | decision task: judgment if and to what extent writing was dystonic (tms) | corticospinal activation in students, but not in experts
haller (2005) | 12 radiologists, 12 laypersons | original and manipulated radiologic images | detection task: finding manipulations on the image (fmri) | increased activation in temporal and frontal gyri in radiologists
harley (2009) | 7 radiologists, 6 4th-yr residents, 7 1st-yr residents | normal and abnormal chest x-ray films | detection task: finding nodules (fmri) | positive correlation of ffa activity with expertise
melo (2011) | 25 radiologists | chest x-ray films that included lesions, animals, or letters | detection task: finding lesions, animals, or letters (fmri) | activation in left inferior frontal sulcus and posterior cingulate cortex
ribas (2013) | 29 radiologists | veterinary x-ray films | decision task: choose between four diagnosis options (eeg) | positive correlation of expertise with electrode activity c4, f3, f8, oz, t6

4.3. visual expertise in medical image diagnosis

gegenfurtner, siewiorek, lehtinen, and säljö (2013) reviewed the literature on visual expertise in relation to medical image diagnosis and identified three of 21 studies that examined neural correlates of expert-novice differences when inspecting medical visualizations (haller & radue, 2005; fiorio et al., 2010; harley et al., 2009). since the review of gegenfurtner et al.
(2013), two additional studies have been published that addressed the neural basis of visual perceptual expertise in medicine (bilalić et al., 2016; ribas et al., 2013). we briefly review these studies here, together with an additional paper (melo et al., 2011) that examined the neural correlates of radiologists’ diagnoses. although melo et al. (2011) did not analyse the effect of expertise, the study is relevant in the current context as it discusses the involvement of brain regions when deriving diagnoses from medical visualizations. table 1 offers an overview of the six studies. bilalić and colleagues (2016) asked radiologists and medical students to indicate if the current stimulus they were seeing was the same as the previous one. stimuli were chest x-ray films that were either presented in upright position or rotated by 180° (inverted), as well as stimuli including photographs of faces, rooms, and tools. the findings suggest that the ffa of radiologists compared to medical students was more sensitive in differentiating upright or rotated x-ray films from the photographs showing rooms and tools. bilalić et al. (2016) conclude that the ffa activation was likely associated with the participants’ level of expertise. harley and colleagues (2009) also found a positive correlation between ffa activation and the visual expertise of radiologists. however, harley et al. (2009) also reported that activity in the right ffa did not differ between radiologists and first-year residents looking at radiological images. haller and radue (2005) presented radiologic images (computed tomography scans, magnetic resonance images, and ultrasound pictures) that were either original or manipulated to radiologists and non-radiologists. the participants were asked to decide if the presented images were original or manipulated.
the group of radiologists showed significantly stronger activation than the group of non-radiologists in the bilateral middle and inferior temporal gyrus, bilateral medial and middle frontal gyrus, and left superior and inferior frontal gyrus, regions that are reportedly associated with visual attention and memory retrieval (wager & smith, 2003). haller and radue’s (2005) study is interesting because it is the first to indicate that different brain regions interact when experts visually process medical images. the findings of melo et al. (2011) and ribas et al. (2013) further support this notion. in particular, using eeg, ribas and colleagues (2013) report that participants with higher levels of expertise showed more electrical activity compared to participants with lower levels of expertise. fiorio et al. (2010) used transcranial magnetic stimulation (tms) to examine how participants differed when viewing photographs and short video sequences of healthy and dystonic writing. briefly, in tms, a magnetic field generator is placed in close proximity to the head of a participant in order to evoke electric currents in brain areas (for introductions to tms, see walsh & cowey, 2000; ward, 2006). the authors showed that “observation of pathological actions differently modulates the viewer’s motor resonant system, depending on previous knowledge, visual expertise, and ability to recognize sub-optimal movement kinematics” (p. 698). fiorio and colleagues (2010) used dynamic stimuli, which is still rare in the field of cognitive-neuroscience methods applied to medical diagnosis. on the basis of the studies reviewed here, it seems safe to conclude that expertise in medical diagnosis cannot be located in and isolated to a single brain area but, instead, expertise seems to be associated with changes in activation in a multitude of neural regions as a function of experience, amount of training, and knowledge structures.
we should note, however, that this interpretation is contingent on the level of task complexity in the original studies. it seems likely that more brain regions are activated when the task is more complex, while many studies employ simplified versions of the task of medical image diagnosis. the six studies reviewed in table 1 differ in their task complexity. the complexity of the employed tasks was categorized following the four-level model of task complexity in the comprehension of visualizations (gegenfurtner et al., 2011) shown in table 2. this model defines task complexity on the basis of contextual demands that differ as a function of the number of desired outcomes, the multiplicity of paths to attain desired outcomes, and the coordinative complexity of informational cues in the task material while moving toward task completion. the reviewed studies include one viewing task, in which participants had to say if they had just seen the same image (bilalić et al., 2016); three detection tasks, in which participants were asked to search for an abnormality or specific target within the image (haller & radue, 2005; harley et al., 2009; melo et al., 2011); and two decision tasks, in which the participants had to choose among a given set of options (fiorio et al., 2010; ribas et al., 2013). the study by fiorio et al. (2010) is the only one using tms and the study by ribas et al. (2013) the only one using eeg; thus, findings from these two studies cannot easily be compared to the other four studies using fmri. somewhat surprisingly, to date, no study has asked participants to produce a full diagnosis from presented visual material, perhaps because tasks inside a magnetic resonance scanner are kept deliberately simple and diagnostic problem-solving tasks would be too complex; even a simple blink causes severe artifacts in electroencephalograms. typing is practically impossible, and speaking might be hard to record due to the noise made by the scanner.
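the categorization described above can be written down as a small lookup structure. the following sketch is only an illustration: the ratings are transcribed from table 2 and the task types and apparatus from table 1, while the class name TaskType and the helper high_dimensions are hypothetical and not part of the model by gegenfurtner et al. (2011).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskType:
    solution_paths: str    # multiplicity of solution paths
    desired_outcomes: str  # number of desired outcomes
    coordination: str      # coordinative complexity

    def high_dimensions(self):
        """Hypothetical helper: count the dimensions rated 'high'."""
        levels = (self.solution_paths, self.desired_outcomes, self.coordination)
        return sum(level == "high" for level in levels)


# Ratings transcribed from table 2 (four-level model of task complexity).
TASK_MODEL = {
    "viewing": TaskType("low", "low", "low"),
    "detection": TaskType("low", "low", "high"),
    "decision": TaskType("low", "high", "high"),
    "problem-solving": TaskType("high", "high", "high"),
}

# Task type and apparatus of each study, transcribed from table 1.
STUDIES = {
    "bilalić (2016)": ("viewing", "fmri"),
    "fiorio (2010)": ("decision", "tms"),
    "haller (2005)": ("detection", "fmri"),
    "harley (2009)": ("detection", "fmri"),
    "melo (2011)": ("detection", "fmri"),
    "ribas (2013)": ("decision", "eeg"),
}

# Tally the reviewed studies per task type and list task types nobody used.
task_counts = Counter(task for task, _ in STUDIES.values())
uncovered = [task for task in TASK_MODEL if task_counts[task] == 0]

for task, spec in TASK_MODEL.items():
    print(f"{task}: {task_counts[task]} studies, "
          f"{spec.high_dimensions()} dimensions rated high")
print("task types without any study:", uncovered)
```

counting the dimensions rated high is just one convenient way to order the four levels; the tally makes explicit that none of the six studies used a problem-solving task.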
furthermore, if the task is too complex in comparison to the control task, this might lead to differences in brain activation that are so widespread that it might no longer be possible to interpret them meaningfully. this perhaps explains the scarcity of cognitive-neuroscience studies in medical image diagnosis relative to its wide application in the visual perceptual expertise literature.

table 2
four-level model of task complexity in the comprehension of visualizations (adapted from gegenfurtner, lehtinen, & säljö, 2011)

task type | multiplicity of solution paths | number of desired outcomes | coordinative complexity | example
viewing task | low | low | low | looking at medical images
detection task | low | low | high | searching for lung nodules
decision task | low | high | high | deciding between given options
problem-solving task | high | high | high | generating a diagnosis

if we compare the studies using medical images as experimental stimuli (reviewed in table 1) with the wider visual perceptual expertise literature and their findings of ffa activation and the enhanced n170 component, it is evident that the expertise effect in ffa was partially confirmed with medical images (bilalić et al., 2016; harley et al., 2009). an increased n170 component has not yet been systematically addressed. it is very encouraging that, since our review some years ago (gegenfurtner et al., 2013), more and more cognitive-neuroscience studies using fmri or eeg have emerged that address medical image diagnosis. we do expect that future research will proliferate in this area in an attempt to replicate ffa activation and n170 enhancement as neural correlates of visual expertise in the medical domain. these studies will help us understand how medical expertise changes temporal and spatial brain activation patterns associated with the diagnosis of medical visualizations.

5.
discussion

after reviewing typical research questions and experimental stimuli, describing eeg and fmri, and reporting the current state of the neural correlates of visual perceptual expertise, this section will now elaborate on the advantages and limitations of cognitive-neuroscience research in the current context. what are the benefits of using fmri and eeg? what are the caveats of these methodologies? and what are directions for future research that originate from this review?

5.1. benefits

the benefits of applying cognitive-neuroscience methods in research on visual perceptual expertise relate to the extension of behavioural research, high temporal and spatial sensitivity, and high levels of control. we elaborate on each of these benefits in turn.

first, cognitive neuroscience can extend behavioural research. in particular, cognitive neuroscience affords different units and levels of analysis; these, in turn, make visible some of the neural correlates underlying cognitive processes that would not be accessible with behavioural measures (ansari, de smedt, & grabner, 2012). framing this triangulation, stern and schneider (2010) introduced the metaphor of a digital road map: with cognitive neuroscience, researchers can zoom in to the neural levels of cognition and perception and examine processes inside the human brain. if researchers are interested in these processes, eeg and fmri offer measures that can unveil neural activation as the basis for observable expertise differences (gegenfurtner et al., 2013; gruber et al., 2010).

another advantage of many cognitive-neuroscience methods is their very high temporal and spatial resolution. more precisely, eeg takes measures in the range of milliseconds.
thus, if we are interested in the temporal aspects of expert performance, then time-sensitive eeg is a well-suited method, especially if measured in parallel with pupillometry (szulewski, gegenfurtner, howes, sivilotti, & van merriënboer, 2017) and eye tracking (holmqvist, nyström, anderson, dewhurst, jarodzka, & van de weijer, 2011; jarodzka, jaarsma, & boshuizen, 2015). in contrast, fmri has a unique capability in locating very precisely the brain regions that are active, e.g., when participants of varying levels of expertise interpret complex images. if research aims to uncover where and when neural activity occurs during expert performance, then eeg and fmri (ideally in combination) are two very powerful, non-invasive methodological tools.

finally, because of the extremely high sensitivity of eeg (temporal) and fmri (spatial), experiments in cognitive neuroscience are typically very controlled. these levels of control afford high levels of external validity (ansari et al., 2012; de smedt, 2014). because researchers invest a considerable amount of time and energy in securing experimental control, including a strict selection of participants (for example: only right-handed people) and carefully filtered stimulus materials (as exemplified by the huge effort of creating greebles), findings from eeg and fmri often result in stable, generalizable inferences. these generalizable inferences can inform researchers when developing theories of visual expertise. because cognitive-neuroscience methods have high levels of temporal and spatial sensitivity, as well as experimental control, neural correlates of visual expertise can be used in theory testing and development (bilalić et al., 2015). neuroscientific findings thus have the potential to inform expertise research in two ways.
first, they can be used to test the predictive validity of existing models and theories, for example on how expertise develops in novices (kok, de bruin, robben, & van merriënboer, 2012; van geel, kok, dijkstra, robben, & van merriënboer, 2017), intermediates (boshuizen & schmidt, 1992; ericsson & lehmann, 1992), and experts (gegenfurtner, 2013; gegenfurtner, nivala, lehtinen, & säljö, 2009). second, they can be used to develop novel theories to account for expertise differences revealed by cognitive-neuroscience methods, differences that would have remained unobservable with behavioral methods alone (bilalić et al., 2015). 5.2. caveats no method comes without limitations. powerful and elegant as cognitive neuroscience may appear, its methodology also carries costs that can compromise the available evidence. caveats include the temporal and spatial resolution, ecological validity, a reductive bias, and limited implications for educational practice. first, and perhaps surprisingly, the extreme sensitivity of eeg and fmri measures, an advantage on the one hand, also introduces a number of limitations to the experimental setup. for example, in eeg research, even the slightest motions, like a blink or moving the nostrils, create severe data artifacts. research projects thus often lose a considerable amount of data because participants were not motionless enough while their neural activity was recorded (ansari et al., 2012; de smedt, 2014). this is particularly detrimental if we consider the financial costs of data collection and that cognitive neuroscience often works with small sample sizes. to compensate for this data loss, even stricter experimental controls are developed and implemented. the fact that eeg and fmri are so sensitive to small motions causing artifacts means that ecological validity is easily compromised: the sensitivity to motion restricts the possibilities for ecologically valid experiments.
this is related to compromises in the ecological validity of cognitive neuroscience experiments. experts are not typically motionless, nor do they work inside the tube of a 3-tesla mri scanner. it is thus a matter of debate to what extent cognitive neuroscience can reflect the complexity of processes and practices associated with real-world expertise. do we force experts to act in too artificial ways? can we capture how experts diagnose a patient case when we show them a chest x-ray in a rotated, blurred, or otherwise distorted mode, for the duration of only a few seconds? the high level of experimental control, which is clearly a benefit of cognitive neuroscience, comes at the same time with limitations to ecological validity. the limited ecological validity is also associated with the tasks that are typically used. instead of complex problem-solving tasks that would reflect medical diagnosis, many of the reviewed studies employed lower levels of task complexity (gegenfurtner et al., 2011). furthermore, study participants are asked to complete these tasks repeatedly in longer sessions to get readable signals, which can further compromise ecological validity. this is in line with de smedt’s (2014) observation that tasks used in neuroscience “need to be very elementary, because the larger the number of cognitive processes in a particular task, the more difficult it will be to disentangle these cognitive processes physiologically.” cognitive neuroscience is interested in the neural correlates of behavior, cognition, emotion etc. (squire et al., 2013; stern & schneider, 2010; ward, 2006). epistemologically, from a neuroscience perspective, visual perceptual expertise tends to be reduced to changes in electrical activity or blood flow. while this can render fascinating findings, other important ingredients of expert performance are ignored. certainly, all research is reductionist (lehtinen, 2012; säljö, 2009).
one must make decisions about what to measure because we simply cannot account for all relevant aspects in a single study, as interesting as these aspects may be (damşa et al., 2017). focusing on neural levels does not imply that we uncover the basis of human learning. one could easily argue that the basis is the social context within which we are situated (gegenfurtner & szulewski, 2016; säljö, 2009). in describing this reductive bias, lehtinen (2012) notes: “because of the impressive technical development of brain research during the last two decades (…) many neuroscientists have quite a strong tendency towards downwards reductionism (emphasis in the original). this reductionism stems from the idea that research registering brain processes with complex technical tools finally opens up a real scientific approach to learning research.” cognitive neuroscientists are well aware that eeg and fmri are just two among the many other methods of learning research (e.g., de smedt, 2014). this reductive bias is not only associated with limitations in how expertise is measured and methodically approximated; it also signals limitations in how expertise and performance are theorized and conceptually framed (lehtinen, 2012; säljö, 2009; siewiorek & gegenfurtner, 2010). more specifically, theories of visual expertise that are exclusively grounded in neuroscientific evidence risk de-emphasizing other facets of how expert performance is enacted and displayed in real-world activities and practices (gibson, 1986; goodwin, 1994; see also de bruin, 2017; gegenfurtner et al., 2017). this risk is of course inherent in all mono-method designs, largely because single-method studies capture a limited number of units of analysis.
conversely, the combination of approaches in mixed-method or multi-method designs allows for the triangulation of units of analysis, which can inform a theory of visual perceptual expertise that encompasses different analytic levels beyond what is evident from single-method approaches like eeg, tms, or fmri. while the benefits of bridging methods of expertise research are clear, methods are always part of a scientific community. these communities have agency as political actors and “defend” their methods against the influences of concurrent academic realms (al lily, foland, stoloff, gogus, erguvan, awshar, et al., 2017), so it will be interesting to see whether and to what extent expertise researchers will (continue to) embrace methodological triangulations and combine cognitive neurosciences with other methodological approaches in their studies for the purpose of theory development. related to that is a false belief that findings from eeg and fmri would be directly applicable and informative for re-designing learning environments and curricula. educational neuroscientists work hard to temper the hopes that many practitioners have when they read that finally, once we understand the brain, we understand how learning, expertise, and education “work”. ansari and colleagues (2012) write that “the most obvious question a teacher may ask is, ‘how will i be able to apply this knowledge?’ there is, in our view, no reason to expect that neuroimaging research, will determine directly how teaching should take place. this is considered by many ‘a bridge too far’”. thus, cognitive or educational neuroscience may have a very limited impact on educational practices. does this mean we should not conduct this kind of research? certainly not; but we should, perhaps, rethink our expectations about what neuroscience measures can do (ansari et al., 2012; de smedt, 2014; lehtinen, 2012; säljö, 2009; stern & schneider, 2010). 5.3.
directions for future research examining the neural correlates of visual expertise is a fascinating endeavour. this review has identified a small, still limited number of studies that examined how visual perceptual expertise in medical image diagnosis correlates with eeg, fmri, and tms measures. what are directions for future research that follow from this review? first, all but one of the reviewed studies in table 1 used static pictures. only fiorio and colleagues (2010) used video sequences as stimuli. we thus recommend exploring and testing if and to what extent neuroscience-based visual perceptual expertise research can use dynamic stimuli. second, future research can make more use of eeg as well as other neuroscience approaches such as tms or meg to study the neural correlates of visual expertise in medical image diagnosis. a third possible direction for future research is the combination of neuroscience methods with other online measures of expertise, including eye tracking and pupillometry (holmqvist et al., 2011; gegenfurtner & seppänen, 2013; kok et al., 2012; szulewski et al., 2017), provided that the constraints of different temporal scales can be accommodated. such combinations would be interesting theoretically as a means to inquire how eye movements and neural activity correlate in expert diagnostic reasoning. fourth, implications of cognitive-neurosciences for education and training need to be explored. to what extent can clinical practitioners and medical educators benefit from neuroscientific measures? this is a question that applies to the field of medical image perception more generally and is not exclusive to cognitive-neurosciences; for example, eye tracking, too, used to be criticized for not being relevant enough to medical education and training, but has demonstrated its benefits in the form of eye movement modeling examples (jarodzka, balslev, holmqvist, nyström, scheiter, gerjets, et al., 2012; seppänen & gegenfurtner, 2012).
it remains to be seen in future research if, and how, a similar approach can be developed for functional imaging. we should note, however, that cognitive-neuroscience methods are a useful addition to instructional design studies: while design studies reveal what works, neural correlates can indicate why it works (gegenfurtner et al., 2013; kok, van geel, van merriënboer, & robben, 2017). finally, eeg and fmri provide measures of the temporal and spatial configurations of visual perceptual expertise. these measures should be incorporated into existing theoretical frameworks of visual perceptual expertise to advance our conceptual understanding of how experts, intermediates, or novices comprehend medical visualizations. 6. conclusion as noted at the outset, if we assume that individual differences in visual expertise are reflected in differences in the brain, then cognitive neuroscience methods can be used to examine the neural correlates of the experts’ visual skills. these methods can complement other methodologies interested in how experts in medical disciplines form their diagnoses. this review summarized research on visual perceptual expertise and described which research questions were typically asked, which stimuli and functional neuroimaging methods were frequently used, and how experts and novices differ in their neural representations (e.g., with respect to activation within the ffa). we also outlined some of the benefits, limitations, and future directions of cognitive-neuroscience research as they apply to the comprehension of (medical) visualizations. this methodological review closes with the hope that interested researchers, who are perhaps as yet inexperienced with cognitive neuroscience, will find this paper a useful introduction to the neural correlates of visual expertise.
keypoints
cognitive neuroscience can uncover the neural correlates of visual perceptual expertise
electroencephalography can reveal temporal adaptations of expertise (e.g., n170)
functional magnetic resonance imaging can reveal spatial adaptations of expertise (e.g., ffa)
cognitive neuroscience examining expertise in medical image diagnosis is promising but still in its infancy
eeg and fmri can complement and extend each other as well as other methodologies in expertise research
references
al lily, a., foland, d., stoloff, d., gogus, a., erguvan, i. d., awshar, m. t., et al. (2017). academic domains as political battlegrounds: a global enquiry by 99 academics in the fields of education and technology. information development. doi:10.1177/0266666916646415 ansari, d., de smedt, b., & grabner, r. (2012). neuroeducation – a critical overview of an emerging field. neuroethics, 5, 105-117. doi:10.1007/s12152-011-9119-3 bartlett, j., boggan, a. l., & krawczyk, d. c. (2013). expertise and processing distorted structure in chess. frontiers in human neuroscience, 7, 825. doi:10.3389/fnhum.2013.00825 bentin, s., allison, t., puce, a., perez, e., & mccarthy, g. (1996). electrophysiological studies of face perception in humans. journal of cognitive neuroscience, 8, 551-565. doi:10.1162/jocn.1996.8.6.551 bilalić, m., grottenthaler, t., nägele, t., & lindig, t. (2016). the faces in radiologic images: fusiform face area supports radiological expertise. cerebral cortex, 26, 1004-1014. doi:10.1093/cercor/bhu272 bilalić, m., langner, r., campitelli, g., turella, l., & grodd, w. (2015). editorial: neural implementation of expertise. frontiers in human neuroscience. doi:10.3389/fnhum.2015.00545 bilalić, m., langner, r., ulrich, r., & grodd, w. (2011). many faces of expertise: fusiform face area in chess experts and novices. journal of neuroscience, 31, 10206-10214. doi:10.1523/jneurosci.5727-10.2011 boshuizen, h. p. a., & schmidt, h. g. (1992).
on the role of biomedical knowledge in clinical reasoning by experts, intermediates, and novices. cognitive science, 16, 153-184. doi:10.1207/s15516709cog1602_1 bukach, c. m., gauthier, i., & tarr, m. j. (2006). beyond faces and modularity: the power of an expertise framework. trends in cognitive sciences, 10, 159-166. doi:10.1016/j.tics.2006.02.004 curran, t., tanaka, j. w., & weiskopf, d. m. (2002). an electrophysiological comparison of visual categorization and recognition memory. cognitive, affective, and behavioral neuroscience, 2, 1-18. doi:10.3758/cabn.2.1.1 damşa, c. i., froehlich, d. e., & gegenfurtner, a. (2017). reflections on empirical and methodological accounts of agency at work. in m. goller & s. paloniemi (eds.), agency at work: an agentic perspective on professional learning and development. new york: springer. de bruin, a. b. h. (2017). the potential of neuroscience for health sciences education: towards convergence of evidence and resisting seductive allure. advances in health sciences education. doi:10.1007/s10459-016-9733-2 de smedt, b. (2014). advances in the use of neuroscience methods in research on learning and instruction. frontline learning research, 6, 7-14. doi:10.14786/flr.v2i4.115 fan, c., chen, s., zhang, l., qi, z., jin, y., wang, q., et al. (2015). n170 changes reflect competition between faces and identifiable characters during early visual processing. neuroimage, 110, 32-38. doi:10.1016/j.neuroimage.2015.01.047 fiorio, m., cesari, p., bresciani, m. c., & tinazzi, m. (2010). expertise with pathological actions modulates a viewer’s motor system. neuroscience, 167, 691-699. doi:10.1016/j.neuroscience.2010.02.010 gauthier, i., & curby, k. m. (2005). a perceptual traffic jam on highway n170. current directions in psychological science, 14, 30-33. doi:10.1111/j.0963-7214.2005.00329.x gauthier, i., skudlarski, p., gore, j. c., & anderson, a. w. (2000). expertise for cars and birds recruits brain areas involved in face recognition.
nature neuroscience, 3, 191-197. doi:10.1038/72140 gauthier, i., tarr, m. j., anderson, a. w., skudlarski, p., & gore, j. c. (1999). activation of the middle fusiform ’face area’ increases with expertise in recognizing novel objects. nature neuroscience, 2, 568-573. doi:10.1038/9224 gauthier, i., williams, p., tarr, m. j., & tanaka, j. (1998). training ’greeble’ experts: a framework for studying expert object recognition processes. vision research, 38, 2401-2428. doi:10.1016/s0042-6989(97)00442-2 gegenfurtner, a. (2013). transitions of expertise. in j. seifried & e. wuttke (eds.), transitions in vocational education (pp. 305-319). opladen: budrich. gegenfurtner, a., kok, e., van geel, k., de bruin, a., jarodzka, h., szulewski, a., & van merriënboer, j. j. g. (2017). the challenges of studying visual expertise in medical image diagnosis. medical education, 51, 97-104. doi:10.1111/medu.13205 gegenfurtner, a., lehtinen, e., & säljö, r. (2011). expertise differences in the comprehension of visualizations: a meta-analysis of eye-tracking research in professional domains. educational psychology review, 23, 523-552. doi:10.1007/s10648-011-9174-7 gegenfurtner, a., nivala, m., säljö, r., & lehtinen, e. (2009). capturing individual and institutional change: exploring horizontal versus vertical transitions in technology-rich environments. in u. cress, v. dimitrova, & m. specht (eds.), learning in the synergy of multiple disciplines. lecture notes in computer science (pp. 676-681). berlin: springer. doi:10.1007/978-3-642-04636-0_67 gegenfurtner, a., & seppänen, m. (2013). transfer of expertise: an eye-tracking and think-aloud study using dynamic medical visualizations. computers & education, 63, 393-403. doi:10.1016/j.compedu.2012.12.021 gegenfurtner, a., siewiorek, a., lehtinen, e., & säljö, r. (2013). assessing the quality of expertise differences in the comprehension of medical visualizations. vocations and learning, 6, 37-54.
doi:10.1007/s12186-012-9088-7 gegenfurtner, a., & szulewski, a. (2016). visual expertise and the quiet eye in sports – comment on vickers. current issues in sport science, 1, 108. doi:10.15203/ciss_2016.108 gegenfurtner, a., & van merriënboer, j. j. g. (2017). methodologies for studying visual expertise. frontline learning research. gibson, j. (1986). the ecological approach to visual perception. new york: psychology press. goodwin, c. (1994). professional vision. american anthropologist, 96, 606-633. doi:10.1525/aa.1994.96.3.02a00100 grill-spector, k., knouf, n., & kanwisher, n. (2004). the fusiform face area subserves face perception, not generic within-category identification. nature neuroscience, 7, 555-562. doi:10.1038/nn1224 gruber, h., jansen, p., marienhagen, j., & altenmüller, e. (2010). adaptations during the acquisition of expertise. talent development & excellence, 2, 3-15. haller, s., & radue, e. w. (2005). what is different about a radiologist’s brain? radiology, 236, 983-989. doi:10.1148/radiol.2363041370 harel, a., kravitz, d., & baker, c. i. (2013). beyond perceptual expertise: revisiting the neural substrates of expert object recognition. frontiers in human neuroscience, 7, 885. doi:10.3389/fnhum.2013.00885 harley, e. m., pope, w. b., villablanca, j. p., mumford, j., suh, r., mazziotta, j. c., et al. (2009). engagement of fusiform cortex and disengagement of lateral occipital cortex in the acquisition of radiological expertise. cerebral cortex, 19, 2746-2754. doi:10.1093/cercor/bhp051 helle, l., nivala, m., kronqvist, p., gegenfurtner, a., björk, p., & säljö, r. (2011). traditional microscopy instruction versus process-oriented virtual microscopy instruction: a naturalistic experiment with control group. diagnostic pathology, 6, s81-s89. doi:10.1186/1746-1596-6-s1-s8 hinojosa, j. a., mercado, f., & carretié, l. (2015). n170 sensitivity to facial expression: a meta-analysis. neuroscience & biobehavioral reviews, 55, 498-509.
doi:10.1016/j.neubiorev.2015.06.002 holmqvist, k., nyström, m., andersson, r., dewhurst, r., jarodzka, h., & van de weijer, j. (2011). eye tracking: a comprehensive guide to methods and measures. oxford: oxford university press. jarodzka, h., balslev, t., holmqvist, k., nyström, m., scheiter, k., gerjets, p., et al. (2012). conveying clinical reasoning based on visual observation via eye-movement modelling examples. instructional science, 40, 813-827. doi:10.1007/s11251-012-9218-5 jarodzka, h., jaarsma, t., & boshuizen, h. p. a. (2015). in my mind: how situation awareness can facilitate expert performance and foster learning. medical education, 49, 854-856. doi:10.1111/medu.12791 kanwisher, n. (2000). domain specificity in face perception. nature neuroscience, 3, 759-763. doi:10.1038/77664 kanwisher, n., mcdermott, j., & chun, m. m. (1997). the fusiform face area: a module in human extrastriate cortex specialized for face perception. journal of neuroscience, 17, 4302-4311. kok, e. m., de bruin, a. b. h., robben, s. g. f., & van merriënboer, j. j. g. (2012). looking in the same manner but seeing it differently: bottom-up and expertise effects in radiology. applied cognitive psychology, 26, 854-862. doi:10.1002/acp.2886 kok, e. m., van geel, k., van merriënboer, j. j. g., & robben, s. g. f. (2017). what we do and do not know about teaching medical image interpretation. frontiers in psychology, 8, 309. doi:10.3389/fpsyg.2017.00309 lehtinen, e. (2012). learning of complex competences: on the need to coordinate multiple theoretical perspectives. in a. koskensalo, j. smeds, r. de cillia, & á. huguet (eds.), language: competencies change contact (pp. 13-27). berlin: lit. maurer, u., zevin, j. d., & mccandliss, b. d. (2008). left-lateralized n170 effects of visual expertise in reading: evidence from japanese syllabic and logographic scripts. journal of cognitive neuroscience, 20, 1878-1891. doi:10.1162/jocn.2008.20125 melo, m., scarpin, d.
j., amaro, e., passos, r. b., sato, j. r., friston, k. j., et al. (2011). how doctors generate diagnostic hypotheses: a study of radiological diagnosis with functional magnetic resonance imaging. plos one, 6, e28752. doi:10.1371/journal.pone.0028752 nishimura, m., & maurer, d. (2008). the effect of categorisation on sensitivity to second-order relations in novel objects. perception, 37, 584-601. doi:10.1068/p5740 op de beeck, h. p., baker, c. i., dicarlo, j. j., & kanwisher, n. g. (2006). discrimination training alters object representations in human extrastriate cortex. journal of neuroscience, 26, 13025-13036. doi:10.1523/jneurosci.2481-06.2006 qi, z., wang, x., hao, s., zhu, c., he, w., & luo, w. (2016). correlations of electrophysiological measurements with identification levels of ancient chinese characters. plos one, 11, e0151133. doi:10.1371/journal.pone.0151133 palmeri, t. j., & gauthier, i. (2004). visual object understanding. nature reviews neuroscience, 5, 291-303. doi:10.1038/nrn1364 ribas, l. m., rocha, f. t., siqueira ortega, n. r., freitas de rocha, a., & massad, e. (2013). brain activity and medical diagnosis: an eeg study. bmc neuroscience, 14, 109. doi:10.1186/1471-2202-14-109 richler, j. j., & gauthier, i. (2014). a meta-analysis and review of holistic face processing. psychological bulletin, 140, 1281-1302. doi:10.1037/a0037004 righi, g. r., tarr, m., & kingon, a. (2013). category-selective recruitment of the fusiform gyrus with chess. in j. staszewski (ed.), expertise and skill acquisition: the impact of william g. chase (pp. 261-280). new york: taylor & francis. rossion, b., gauthier, i., goffaux, v., tarr, m. j., & crommelinck, m. (2002). expertise training with novel objects leads to left-lateralized facelike electrophysiological responses. psychological science, 13, 250-257. doi:10.1111/1467-9280.00446 rossion, b., kung, c.-c., & tarr, m. j. (2004).
visual expertise with nonface objects leads to competition with the early perceptual processing of faces in the human occipitotemporal cortex. proceedings of the national academy of sciences, 101, 14521-14526. doi:10.1073/pnas.0405613101 säljö, r. (2009). learning, theories of learning, and units of analysis in research. educational psychologist, 44, 202-208. doi:10.1080/00461520903029030 scott, l. s., tanaka, j. w., sheinberg, d. l., & curran, t. (2006). a reevaluation of electrophysiological correlates of expert object processing. journal of cognitive neuroscience, 18, 1453-1465. doi:10.1162/jocn.2006.18.9.1453 scott, l. s., tanaka, j. w., sheinberg, d. l., & curran, t. (2008). the role of category learning in the acquisition and retention of perceptual expertise: a behavioral and neurophysiological study. brain research, 1210, 204-215. doi:10.1016/j.brainres.2008.02.054 seppänen, m., & gegenfurtner, a. (2012). seeing through a teacher’s eyes improves students’ imaging interpretation. medical education, 46, 1113-1114. doi:10.1111/medu.12041 sergent, j., ohta, s., & macdonald, b. (1992). functional neuroanatomy of face and object processing. a positron emission tomography study. brain, 115, 15-36. doi:10.1093/brain/115.1.15 shen, j., mack, m. l., & palmeri, t. j. (2014). studying real-world perceptual expertise. frontiers in psychology, 5, 857. doi:10.3389/fpsyg.2014.00857 siewiorek, a., & gegenfurtner, a. (2010). leading to win: the influence of leadership style on team performance during a computer game training. in k. gomez, l. lyons, & j. radinsky (eds.), learning in the disciplines (vol. 1, pp. 524-531). chicago, il: international society of the learning sciences. squire, l. r., berg, d., bloom, f. e., du lac, s., ghosh, a., & spitzer, n. c. (2013). fundamental neuroscience (4th ed.). oxford: academic press. stern, e., & schneider, m. (2010).
a digital road map analogy of the relationship between neuroscience and educational research. zdm the international journal on mathematics education, 42, 511-514. doi:10.1007/s11858-010-0278-1 szulewski, a., gegenfurtner, a., howes, d., sivilotti, m., & van merriënboer, j. j. g. (2017). measuring physician cognitive load: validity evidence for a physiologic and a psychometric tool. advances in health sciences education. doi:10.1007/s10459-016-9725-2 tanaka, j. w., & curran, t. (2001). a neural basis for expert object recognition. psychological science, 12, 43-47. doi:10.1111/1467-9280.00308 tarr, m. j., & gauthier, i. (2000). ffa: a flexible fusiform area for subordinate-level visual processing automatized by expertise. nature neuroscience, 3, 764-769. doi:10.1038/77666 towler, j., fisher, k., & eimer, m. (2017). the cognitive and neural basis of developmental prosopagnosia. the quarterly journal of experimental psychology, 72, 316-344. doi:10.1080/17470218.2016.1165263 van geel, k., kok, e. m., dijkstra, j., robben, s. g. f., & van merriënboer, j. j. g. (2017). teaching systematic viewing to final-year medical students improves systematicity but not coverage or detection of radiologic abnormalities. journal of the american college of radiology, 14, 235-241. doi:10.1016/j.jacr.2016.10.001 walsh, v., & cowey, a. (2000). transcranial magnetic stimulation and cognitive neuroscience. nature reviews neuroscience, 1, 73-80. ward, j. (2006). the student's guide to cognitive neuroscience. new york: psychology press. wong, a. c.-n., palmeri, t. j., & gauthier, i. (2009). conditions for facelike expertise with objects. becoming a ziggerin expert – but which type? psychological science, 20, 1108-1117. doi:10.1111/j.1467-9280.2009.02430.x wong, a. c.-n., & wong, y. k. (2014). interaction between perceptual and cognitive processing well acknowledged in perceptual expertise research. frontiers in human neuroscience, 8, 308. doi:10.3389/fnhum.2014.00308 xu, y. (2005).
revisiting the role of the fusiform face area in visual expertise. cerebral cortex, 15, 1234-1242. doi:10.1093/cercor/bhi006
frontline learning research vol. 5 no. 3 special issue (2017) 94-122 issn 2295-3159 corresponding author information: margje w. j. van de wiel, dep. of work and social psychology, faculty of psychology and neuroscience, maastricht university, p.o. box 616, 6200 md maastricht, the netherlands. email: m.vandewiel@maastrichtuniversity.nl, phone: +31-43-3882171, fax: +31-43-3884211. doi: http://dx.doi.org/10.14786/flr.v5i3.257
examining expertise using interviews and verbal protocols margje w. j. van de wiel maastricht university, the netherlands article received 4 may / revised 2 march / accepted 23 march / available online 14 july
abstract to understand expertise and expertise development, interactions between knowledge, cognitive processing and task characteristics must be examined in people at different levels of training, experience, and performance. interviewing is widely used in the initial exploration of domain expertise. work and cognitive task analysis chart the knowledge, skills, and strategies experts employ to perform effectively in representative tasks. interviewing may also shed light on the learning processes involved in acquiring and maintaining expertise and the way experts deal with critical incidents. interviews may focus on specific tasks, events, scenarios, and examples, but they do not directly tap the representations involved in task performance. methods that collect verbal protocols during and immediately after task performance better probe the ongoing processes in representing problems and accomplishing tasks. this article provides practical guidelines and examples to help researchers to prepare, conduct, analyse, and report expertise studies using interviews and verbal protocols that are derived from thinking aloud, dialogues or group discussions, free recall, explanation, and retrospective reports.
in a multi-method approach, these methods and other techniques need to be combined to fully grasp the nature of expertise. this article shows how the cognitive processes in data collection constrain data quality and highlights how research questions guide the development of coding schemes that enable meaningful interpretation of the rich data obtained. it focuses on professional expertise and provides examples from medicine including visual tasks. this comprehensive review of qualitative research methods aims to contribute to the advancement of expertise. keywords: expertise, interviews, verbal protocols, cognitive processing, analysis 1. introduction for this special issue on “methodologies for studying visual expertise”, the present article will discuss the qualitative research methods of interviews and verbal protocols to examine expertise and expertise development. this article aims to guide students, practitioners, and researchers new to the field of expertise research, or to these types of qualitative research, on when and how to use these methods to answer their research questions. starting from a theoretical framework of expertise and cognitive processing in task performance, this article provides practical guidelines so that researchers can prepare, conduct, analyse, and report expertise studies using interviews and verbal protocols. the rationale behind the methods, as well as their strengths and weaknesses, are explained to understand how procedures should be designed to maximise the quality of the data. although these guidelines for research are applicable to all expertise domains, the focus here is on professional expertise. most examples will be drawn from medical expertise research, as it has a long-standing tradition of using diverse methods of verbal protocols, and includes various areas of visual expertise.
this comprehensive overview of qualitative research methods contributes to the literature by showing why and how the methods can best be used to deliver valid, high-quality verbal data when examining expertise. the literature is reviewed from an analytical and practical perspective to connect different research traditions that shed light on the nature and origins of expertise. expertise research may add to the advancement of any domain, as careful analysis of task characteristics, performance, and underlying knowledge and cognitive processing is at the heart of improving current practices. this review aims to provide researchers and practitioners who want to embark on this endeavour with fundamental insights into theory and methods that help to further develop the level of expertise in their domain of interest. the article is organised into five further sections. first, expertise is defined in terms of outcomes, underlying knowledge and processes, and their interaction with task and domain characteristics. second, the steps in preparing expertise research are discussed, starting with the research questions, familiarisation with the domain of expertise and the specific tasks at hand, the selection of experts and other participants, and the main criteria for choosing between interviews or verbal protocols. third, the interview method is described and placed within the context of research on work and expertise. fourth, the characteristics of interview and verbal protocol methods are described in light of the cognitive processing involved in task performance and data collection. moreover, five methods used to gather verbal protocols to reveal expert task performance are discussed and illustrated in more detail. finally, a conclusion is provided that summarises the main issues to be considered when designing studies that examine expertise using the qualitative research methods of interviews and verbal protocols. 2. 
expertise
two dominant perspectives on expertise can be distinguished in the literature. the expert-performance approach (ericsson, 1996, 2004, 2015; ericsson & smith, 1991) characterises expertise as the capability to demonstrate reproducible superior performance on representative tasks in a specific domain. the highest expertise level is achieved when individuals are able to go beyond mastery and contribute their creative ideas and innovations to the task at hand. although years of practice and experience are needed to become an expert, skilled performance and experience alone are not enough. routine behaviour and full automaticity should be counteracted by gaining high-level control of performance that allows further improvements to be made. in the expert-novice research approach (chi, glaser, & farr, 1988; chi, 2006a), expertise has been characterised by differences in performance and underlying knowledge between groups with increasing levels of experience in a particular domain. experts have a large and well-developed knowledge base that is tuned to the tasks performed and the problems encountered, and allows fast and accurate performance in routine situations. in more complex situations, they can apply their knowledge flexibly when trying to understand the situation and decide upon further actions.
the obvious similarity in these characterisations of expertise is that they both emphasise routine, automatic versus controlled, deliberate performance that is adapted to the task at hand. both approaches explain how the development of knowledge and skills underlies expert performance (feltovich, prietula, & ericsson, 2006). simply said, expertise is the result of activating the right knowledge at the right time (anderson, 1996). experts have developed rich and coherent knowledge structures that allow immediate access to the relevant knowledge, strategies, skills, and control mechanisms.
domain-specific task performance is mediated by evolving representations of the task and problem they attend to. this enables experts to perform effectively and efficiently, coordinating automatic thoughts and actions with deliberate thinking. problem representations guide them in selectively focusing on relevant information and features that novices are not aware of. moreover, they help experts to carefully monitor and adapt their performance in an ongoing process. figure 1 illustrates how both incoming information and the experts’ knowledge in long-term memory continuously interact to determine the content of working memory in task performance. the capacity of working memory is enhanced by retrieval cues that directly access the relevant parts of experts’ knowledge in long-term working memory (ericsson & kintsch, 1995), enabling them to coordinate thoughts and actions in cognitive processing. the evolving mental representations in task performance reflect the content of working memory. experts update their knowledge and skills by means of study, practice and experience. they enhance learning from their experiences by seeking feedback and reflecting upon their performance to find weak aspects in processes and outcomes that might be improved. expertise development is a gradual process in which the knowledge and skills needed to plan, monitor and evaluate performance are refined during practice. this requires the motivation to improve performance and invest effort in deliberate practice (ericsson, krampe, & tesch-römer, 1993; ericsson & pool, 2016).
figure 1. information processing in task
although both approaches define expertise in relative terms and focus on tasks representative of the domain, one important difference between them relates to the standards of performance.
whereas the expert-performance approach (ericsson, 1996, 2015) focuses on top-performance that can be objectively measured, the expert-novice approach (chi et al., 1988, 2006a) is more pragmatic in comparing novices and students with intermediate levels of training to experienced performers within a particular domain. in professional domains, such as medicine, auditing, law, teaching, software engineering and psychotherapy, it is not as easy to objectively measure performance as it is in domains, such as sports and games, in which clear outcomes (e.g., time, points gained) are available. the experience of the professional and the presence of professional criteria, such as degrees, licenses, memberships of professional organisations, prizes, and teaching experience usually work well to identify experts (evetts, mieg, & felt, 2006; hoffman, shadbolt, burton, & klein, 1995; mieg, 2006). there is a notable absence of a ‘gold standard’ of professional performance that is based on a validated objective outcome measure (ericsson, 2004; weiss & shanteau, 2003; shanteau, weiss, thomas, & pounds, 2001) and one best solution or approach to a problem may not even exist (tracey, wambold, lichtenberg, & goodyear, 2014). in medicine, for example, physicians have to make decisions under conditions of uncertainty when they encounter more complex and rare patient problems. in many other professions, experts are confronted with uncertainty and new situations that require performance on the edge of what they may accomplish based on their knowledge and skills (klein, 2008; salas & klein, 2001). experts, furthermore, play an important role in the advancement of their domain and in setting (new) standards for performance (boshuizen & van de wiel, 2014; ericsson, 2009, 2015; evetts et al., 2006; lesgold, 2000). in summary, how expert performance can best be defined depends on several factors including the domain, the tasks, and the type of problems to be solved.
the accumulated body of knowledge and skills available in a domain constrains the level of performance that can be acquired by individuals. shanteau (1992) found in his analysis that performance is better in structured domains in which incoming information is static, problems are predictable, conditions are similar, tasks are repetitive, and objective analysis, feedback and decision aids are available. in these structured domains, individuals have more chances to learn and improve their performance as compared to less structured domains, which do not share these task characteristics. kahneman and klein (2009) discuss how intuitive expertise, i.e., automatic accurate judgment, can only be developed in high-validity domains in which the environment is predictable via the recognition of a set of cues. if professionals are given adequate opportunity to practice, they can learn the causal structure and/or the statistical regularities that enable this recognition process. classical examples of ill-structured or low-validity domains include wine tasting, stock broking, and clinical psychology, all domains in which judgment is inconsistent. ericsson (2014, 2015; ericsson & pool, 2016) argues that professional performance can only be improved by searching for and identifying reproducible superior outcomes, which can then be used to guide deliberate practice. libraries of problem situations with known outcomes and simulators enable intensifying practice with immediate feedback for problems that are uncommon or have high-stakes in real practice. research on expertise contributes to the development of a domain and the performance levels that can be achieved by professionals on essential tasks. 3. 
examining expertise
when examining expertise, first the research question needs to be clearly formulated: “what do you want to know about expertise?” as expertise is based on the development of a well-organised body of knowledge that determines the processes and strategies used in task performance, the most obvious questions are “what knowledge and skills underlie expert performance and how do they develop?”. the research question can focus on the representations of the problem to be solved, or the task to be performed and how these differ between novices and experts (e.g., chi, feltovich, & glaser, 1981; van de wiel, boshuizen, & schmidt, 2000), or change as a result of practice (boshuizen, van de wiel, & schmidt, 2012). but it may also focus on the knowledge and strategies used in problem solving and in performing a task (e.g., boshuizen & van de wiel, 1999; diemers, van de wiel, scherpbier, baarveld, & dolmans, 2015; gilhooly et al., 1997; lesgold et al., 1988; kok et al., 2015). it can focus on the learning processes and how teaching and instruction can help novices to become experts (e.g., chi, bassok, lewis, reimann, & glaser, 1989; kok, de bruin, robben, & van merriënboer, 2013). it may also focus on the activities experts engage in to learn from their experience, further develop and maintain their expertise (e.g., ericsson et al., 1993; van de wiel, van den bossche, janssen, & jossberger, 2011), and the self-regulation skills they apply to plan, control and evaluate their performance. finally, it may be important to investigate the ways in which the knowledge of experts can fall short by focusing on biases and near-errors and how these might be overcome (e.g., chi, 2006a; hashem, chi, & friedman, 2003; elstein & schwarz, 2002). the general themes addressed by these research questions will require further specification depending on previous research, the domain of expertise, and the research interests.
in the professional domain of medicine, there is a long tradition of expertise research that started with the seminal work of elstein, shulman, & sprafka in 1978. while searching for general problem solving skills, studies in the early years consistently showed that experts and novices used the same strategy of generating and testing hypotheses in diagnostic problem solving, but that experts generated diagnostic hypotheses faster and more accurately (elstein et al., 1978; norman, eva, brooks, & hamstra, 2006; neufeld, norman, feightner, & barrows, 1981). the accuracy of physicians’ diagnoses, however, was found to be case-specific and tied to the domain of clinical experience (elstein et al., 1978). research was then directed at uncovering the nature and organisation of knowledge underlying physicians’ performance in interaction with the patient cases diagnosed. results have shown that their large and well-developed knowledge base enables physicians to automatically retrieve the relevant knowledge in routine cases, as well as to analytically process cases that are difficult or evoke a sense of alarm (elstein & schwarz, 2002; stolper et al., 2010). elucidating the nature and acquisition of medical expertise is still an active research field that contributes to safe patient care and medical education. studying visual expertise in medicine is a rapidly growing field, as exemplified by this special issue. imaging techniques are important diagnostic tools that develop quickly and require complex knowledge and skills that need to be learned and assessed (gegenfurtner, siewiorek, lehtinen, & säljö, 2012). while examining images, bottom-up and top-down processes continuously interact and determine whether significant features are recognised and correctly interpreted. their knowledge ultimately determines whether experts see what can be detected, and understand what they see. a broad array of research questions is open to investigation in this field.
having established what you want to know about expertise, the second question that needs to be answered in expertise research is “who are the experts?” as expertise manifests itself in the context of the tasks that experts engage in, this question is intricately intertwined with the question: “what tasks are critical to the domain?” depending on the type of performance outcomes available, it may be more or less straightforward to identify experts that consistently show superior performance on representative tasks. a thorough familiarisation with the domain which focuses on the tasks performed is needed to find out what may characterise expert behaviour. if objective outcome measures do not exist, professional criteria that provide social recognition, such as experience, degrees, licenses, job titles, status, and prizes, might be used to define experts (evetts, mieg, & felt, 2006; hoffman et al., 1995; mieg, 2006), as well as peer judgments that ask professionals to identify the best performers in their field, or those whom they would go to for advice (ericsson, 2006a; kahneman & klein, 2002; shanteau, 2002). other groups of participants must be included to examine in what way experts differ from those who are less experienced within the domain, or those who have worked as long in the profession but are considered to have less expertise. to study the development of expertise, groups with different levels of experience in the domain (ranging from naïve, novice, intermediate, and advanced to expert) are compared to each other. a group of trainees can also be followed on their developmental path towards becoming a professional. another crucial question to be addressed in preparing expertise research is “what research method(s) are most suitable to investigate the research questions in this field?”. 
in this article, the focus is on qualitative research methods of interviews and verbal protocols as they play a crucial role in uncovering the characteristics and origins of expertise in any domain. interviewing is a very straightforward way to initially explore a specific domain of expertise. it can be used to gather information on the relevant tasks undertaken, the knowledge and skills needed to perform these tasks and solve problems, the learning processes involved in education and continuous development, and the pitfalls associated with expertise that need to be dealt with. interviews deliver verbal protocols as data resulting from the answers to the interview questions. however, to better capture the cognitive processing of experts in task performance, verbal protocols must be gathered that are directly related to the task-specific processes. expertise researchers have, therefore, developed methods that study, in addition to behaviour and outcomes of representative task performance, the thinking processes involved. they do so by probing the experts’ underlying representations, knowledge, and reasoning (chi, 2006b, 1997; ericsson & simon, 1980, 1993; feltovich et al., 2006; hoffman et al., 1995). these methods provide insight into the content of working memory during, or immediately after, task performance (see figure 1). two common methods used to assess online thinking by gathering verbal data during task performance are thinking aloud and discourse analysis of dialogues and group discussions. three common methods used to gather verbal protocols after task processing are free recall protocols, explanation protocols, and retrospective reports. table 1 provides an overview of the qualitative research methods discussed in the present article, and how they are related to both the domain and the task when examining expertise.
table 1
overview of qualitative methods used to examine expertise in relation to the domain and the task

exploration of domain expertise | cognitive processing related to a specific task
                                | during task performance         | after task performance
interviews                      | thinking aloud                  | free recall
focus groups                    | dialogues and group discussions | explanation
                                |                                 | retrospective reports

to guarantee the quality of the data obtained by interviews and verbal protocols in expertise research, careful preparation is required. the most critical steps in preparing studies using these qualitative research methods are summarised in table 2. in addition to the steps outlined above, how the protocols will be coded, and how the study will be communicated to the participants, are two important considerations for both interviews and verbal protocols. in relation to interviews, the emphasis is on preparing the interaction with the interviewee, whereas for verbal protocols the emphasis is on selecting and designing tasks that may differentiate expert and novice behaviour. these tasks should reflect the same goal-directed processing as required in real-world tasks. in the following sections, the methods are discussed to provide practical guidelines for developing, conducting, and analysing expertise studies that deliver valid, high-quality verbal data, which can then be reported in a transparent way. in addition, the strengths and weaknesses of these particular methods are highlighted and compared, and specific issues related to expertise research are explained and illustrated.
table 2
steps in preparing studies using interviews and verbal protocols to examine expertise

formulating the research question
familiarisation with the domain and tasks
determining the experts and participants
interviews:
    developing the interview guide
    preparing for the role of interviewer
verbal protocols:
    selecting and analysing the task
    developing the task materials and instructions
developing coding schemes for analysing the verbal protocols
framing the communication to participants

4. interviews
interviewing is one of the most common methods used to gather information about a given topic. it is a very natural process of inquiry that is used in everyday communication; just think about how often we engage in asking questions and receiving answers. the key to all good interviews is to clearly ask what you want to know, and to make sure that you receive the answer that allows you to know what you want to know. this sentence describes interviewing in a nutshell, highlighting the importance of the research questions, the interview guide, and the role of the interviewer in asking questions and evaluating answers. in relation to expertise research, it is important that this characterisation of interviewing also shows a realistic perspective on research. it assumes that we can come to understand a topic by obtaining relevant information in an objective way by interviewing a representative sample of participants (emans, 2004; king & horrocks, 2010). the interviewer wants to reveal what interviewees know, do, think, feel, believe, intend to do, want, or need, and assumes that the interviewee can communicate this during the interview. the basic processes involved in interviewing for research purposes are well explained by emans (2004) and summarised in figure 2.
the goal is to reveal the cognitions of the interviewee, as related to a certain topic, i.e., the interviewee’s mental processes and the products of these processes, usually in the form of information, knowledge, thoughts, feelings and ideas about the topic. the task of the interviewer is to create a situation and ask questions that motivate an interviewee to connect to these cognitions and verbalise them in a reliable manner. the interviewer must carefully listen to the answers provided, and check whether these answers include the information necessary to answer the research questions. if not, further questions need to be asked. to obtain data for subsequent analysis, the whole interview needs to be recorded and transcribed verbatim, thus without any interpretation of the data.
figure 2. processes in interviewing (adapted from emans, 2004).
in the context of work, interviewing is the most common method used to interact with experienced practitioners as subject-matter experts to gather information about all kinds of aspects of the work they are engaged in and the vocabulary they use. in job analysis, cognitive task analysis, and knowledge elicitation, interviews are used to yield primary insights into the tasks experts perform, the knowledge and skills underlying their performance, and the conditions that shape their performance. job analysis focuses on work activities, worker attributes, and/or work context and is used to inform human resource management practices, such as personnel selection, training, and performance management (bartram, 2008; sanchez & levine, 2012). job analysis is also a first step in job (re)design, workplace and equipment design and organising team work and provides an overview to determine what tasks need to be further scrutinised by task analysis or cognitive task analysis (chipman, schraagen & shalin, 2000; dubois & shalin, 2000).
detailed analysis of tasks in terms of goals, actions, and thought processes helps to articulate what is not directly observable but which may be expressed by experts when they are sufficiently guided. in knowledge engineering, knowledge elicitation techniques are specifically designed for this purpose in order to develop expert systems and knowledge management systems (hoffman et al., 1995; hoffman & lintern, 2006; shadbolt & smart, 2015). interviews in different forms have been applied in a wide variety of professional domains to elicit knowledge from experts. the unstructured interview is often employed in an initial exploratory phase in which investigators familiarise themselves with the domain in an informal setting. in later phases, more structured interviews can scaffold the knowledge elicitation process by focusing on specific events, such as critical incidents in the critical incident technique (flanagan, 1954) and critical decisions made in unusual and challenging cases in the critical decision method (hoffman & lintern, 2006; shadbolt & smart, 2015). experts can also be asked to respond to an evolving scenario or to specific probe questions in order to systematically unravel their task representations (shadbolt & smart, 2015). the literature on job analysis, cognitive task analysis, and knowledge elicitation also emphasises the need for research methods to be combined to achieve a thorough understanding of the job and the tasks under study. as complex tasks often require teamwork, interviewing teams in addition to individual experts can provide further insight into how experts from different disciplines work together, build a shared understanding of tasks and situations, and coordinate and distribute tasks amongst themselves (salas, rosen, burke, goodwin, & fiore, 2006).
the interview method provides relatively quick access to information from several teams within a domain, as compared to observation in the field or simulation of task performance. furthermore, as jobs, tasks, equipment, and work roles are not stable but rather continuously developing, experts in the field par excellence may provide valuable perspectives on future developments and innovations, and insights into novel problems and how to deal with them (boshuizen & van de wiel, 2014; lesgold, 2000; sanchez & levine, 2012). in fact, experts shape the advancement of their field, and may share their ideas about these developments, and how to support them by individual and organisational learning, in future-oriented interviews (bartram, 2008; evetts et al., 2006; lesgold, 2000). whereas less structured, informal interviews are very suitable for an initial exploration of a domain or topic, semi-structured interviews are most useful in expertise research as they result in more objective data that also allow for comparisons to be made between different expertise groups. these interviews provide enough guidance to structure the conversation, but also enable the acquisition of meaningful information, as the interviewer may interact with the interviewee after an initial open question (emans, 2004). focus groups that investigate the opinions and experiences of people while they are interacting in groups are valuable tools to examine expertise for both exploratory and comparative purposes. in focus groups, initial questions guide the interview by inviting participants to share their views. in preparing interviews and focus groups for expertise research, the steps outlined in table 2 must be kept in mind. in the following sections, the development of the interview guide, the preparation for the tasks of the interviewer, and the analysis of the data are described and illustrated.
although face-to-face interviews are most commonly used, and taken as a starting point in the descriptions, the guidelines are largely applicable to interviews conducted in groups and by telephone or skype (deakin & wakefield, 2014; emans, 2004). in a subsequent section, the focus group method is explained in more detail.
4.1 the interview guide
an interview guide is a script that helps to ensure that the interview is conducted in a standardised way. it consists of an introduction to the study, the body of the interview outlining the main questions and possible follow-up questions, the transitions between questions, and a conclusion (emans, 2004; king & horrocks, 2010; skopec, 1986). in the recruitment phase, participants already receive information about the study that may influence their contributions, and thus, the quality of the data gathered. therefore, it is good practice to compose this information as the first step of the interview guide. for researchers new to the field of the interview method, table 3 provides an overview of the topics advised to be addressed in the interview guide.
table 3
general outline of the interview guide

information sheet:
    provide context and purpose of the research; invite the participant; explain the procedure and expectations; explain the costs and benefits for participants; emphasise that participation is voluntary and may be withdrawn at any time; explain how data will be used; ask if there are any remaining questions; provide details of the research team and contact information (check compliance to ethical standards)
introduction of the interview:
    introduce yourself as interviewer; explain the purpose of research; show appreciation for participation; give the information letter and ask for informed consent; give a brief outline of the interview topics and indicate the time involved; clarify the roles and expectations of both interviewer and interviewee; ask permission to record the interview; ask if there are any remaining questions
body of the interview:
    introduction theme 1
        question 1 + (probes) + follow-up questions
        question 2 + (probes) + follow-up questions
        question 3 + (probes) + follow-up questions
        et cetera
    introduction theme 2
        question 1 + (probes) + follow-up questions
        question 2 + (probes) + follow-up questions
        et cetera
conclusion of the interview:
    indicate the end of the interview; summarise the main points; thank the interviewee for the valuable contribution; reiterate how the data will be used; provide a debriefing of the study; offer to provide a summary of the results after data processing has taken place; ask if there are any questions or comments

developing the questions for the interview is an iterative process that is based on the research goals and the specific research questions (emans, 2004). as described above, the goal is to have a clear idea of what you want to know, and then to ask questions that elicit this information from the interviewees.
in interview studies, it is also important to make the need for information explicit by defining the variables to be examined. the choice of variables should reflect a thorough understanding of the domain of study and, whenever possible, should be grounded in theory and based on previous research. formulating the possible outcomes of these variables assists in the phrasing of the interview questions and the analysis of the data. in fact, the possible outcomes are the type of answers you would expect to receive from the participants in the interview and, thus, provide clear guidelines for formulating the interview questions. an example of a study on the development of medical expertise in professional practice may illustrate this approach. in this study, our exploratory research questions were: “how do physicians learn in, from, and for their daily work, and how deliberate is this learning process?” (van de wiel et al., 2011). in addition, we wanted to examine differences in workplace learning between three groups of physicians. if we had asked our research questions directly, we would have cued the participants’ connotations about learning. this may have led to the physicians limiting their answers based on their interpretation of the concept, and focusing their attention too much on deliberate processes. based on research on workplace learning, deliberate practice, and self-regulated learning, we identified several relevant work-related activities from which physicians could learn: problem solving, consultation of colleagues, having differences of opinion, explaining to others, seeking and receiving feedback, evaluating performance, professional development activities, and participation in research. these work-related activities allowed us to formulate more specific research questions that helped us to decide what variables to focus on in the interviews.
table 4 shows how specific research questions, variables, possible outcomes, and interview questions are related. for example, the specific research question formulated for the work-related activity of problem solving concerns what physicians can learn from the problems they encounter. these problems provide a chance to learn because they need to be solved, or dealt with as part of the job. the two most important variables to examine, therefore, are the types of problems encountered (i.e., what topics can physicians learn about?) and how they solve these problems (i.e., in what way do they learn?). some answers can already be anticipated and listed as possible outcomes in order to guide the formulation of the interview questions on this topic. in our study, an additional question asked for an example of a specific problem and how it was solved in order to illustrate and corroborate the previous answers. a subsequent research question that we addressed was what physicians can learn from solving problems by consulting colleagues. the variables of interest follow up on the problem solving strategies previously mentioned by the interviewees. as we expected the consultation of colleagues to be an important strategy, we first allowed participants to mention it spontaneously, and then elaborated on the topic in the next interview question. in this study, we started with general questions and moved on to more specific learning activities. moreover, we were careful to introduce the study using the term professional development and not learning, as interview questions prime the way in which participants think and answer.

table 4
relationships between specific research questions, variables, possible outcomes, and interview questions.
examples are taken from a study on physicians’ learning in the workplace (van de wiel, et al., 2011)

specific research question: what can physicians learn from the problems encountered in their practice?
    variable: types of problems encountered
        possible outcomes: problems in diagnosing patients; problems in deciding on the treatment; problems in interacting with patients
        interview question: 1. what problems do you usually encounter when diagnosing and treating patients?
    variable: types of strategies in problem solving
        possible outcomes: asking colleagues for advice; searching the literature
        interview question: 1a. what strategies do you use to solve these problems? (if needed can you explain why?)
        possible outcomes: example 1; example 2; et cetera
        interview question: 1b. can you give an example of a problem and how you solved it?
specific research question: what can physicians learn from consulting colleagues?
    variable: kind of situations in which advice is asked
        possible outcomes: lack of knowledge; being uncertain
        interview question: 2. in what situations do you choose to ask a colleague for advice?
    variable: frequency of advice seeking
        possible outcomes: daily; weekly; monthly
        interview question: 2a. can you indicate how often you ask for advice? (ask to specify)
    variable: reasons for advice seeking
        possible outcomes: direct need for patient care; learning; et cetera
        interview question: 2b. why do you ask for advice? (probe for purpose/goal)

the type of variables tapped into by the interview questions constrains the answers that can be expected. with closed questions, answers are restricted to a closed set, such as years of experience, age, or frequency (e.g., question 2a in table 4). in bipolar questions, this set is restricted to the answers yes or no. these questions are mostly asked in order to get a clear answer that can be followed up. in open questions, the interviewees are invited to share their perspective on a topic (e.g., questions 1, 1a, 1b, 2, and 2b in table 4). these questions must be unambiguously formulated, ask about one specific topic at a time, and should not be leading, i.e., should not suggest a particular answer option.
interviewers must be neutral and not introduce bias by showing their own opinions, ideas, or feelings, or by suggesting answer options by providing examples or disclosing expectations. the goal is to obtain valid objective data and these types of suggestive questions may influence the thought processes of the interviewee. in answering open questions, interviewees must be encouraged to bring forward what is most prominent in their mind from their own frame of reference. the research team must critically review the interview guide and pilot the interviews with representatives of the target group. this will optimise the informational value of the data collected and prevent the use of restrictive and leading questions.

4.2 the tasks of the interviewer

interviewing is a complex task in which the interviewer has to obtain the required information as efficiently as possible. the interviewer must create an interview situation in which the interviewee feels encouraged to speak up, but at the same time, the interviewer must remain in control by asking questions, evaluating the answers, and probing for more elaborate or meaningful answers if necessary (emans, 2004; skopec, 1986). this means that the interviewer must build up a positive relationship with the interviewee to make him or her feel at ease, and be well-prepared to guide the interviewee through the questions in an unobtrusive manner. while it is vital to standardise the situation over different interviews to obtain objective and comparable data, a natural conversation will usually occur if the interviewer is clear, kind, and genuinely interested in a professional way. the interviewer observes the interviewee and listens carefully to the answers while keeping the final goal in mind: gathering valid, complete, relevant and clear answers that reflect the interviewee’s cognitions and provide the information needed for the study. the interaction with the interviewee is steered by verbal and non-verbal probing.
nondirective probes serve to fuel the conversation by encouraging interviewees to continue without interrupting their line of thought. effective nondirective probes include: silence accompanied by non-verbal signs of attention; neutral phrases, such as “um hmm”, “oh”, “yes”, “interesting”; rephrasing part of the answer; reflection of feelings that cannot be neglected by showing understanding; making brief summaries of what has been said to check the main points; and general elaborations such as, “can you tell me more?” and, “could you explain further?”. directive probes more actively intervene in the flow of the conversation and are used to focus attention on specific topics. common directive probes include: elaborations that ask for specifications; clarification questions when answers given are imprecise or not well understood; repetition of a question when it appears that the interviewee has not understood the question or avoids answering it; and confrontation when answers seem inconsistent. the interviewer must take the lead and coordinate questioning and probing with appropriate non-verbal behaviour in terms of eye-contact, facial expressions, gestures, and position, to obtain the information required and maintain a positive atmosphere throughout the interview.

4.3 analysing and reporting the data

analysing interviews can be relatively straightforward, if during the translation of the specific research questions into variables and interview questions in the preparation phase (see table 4), the anticipated answers were matched to the intended results (emans, 2004). a good match guarantees the internal validity of the study (neuendorf, 2002). the analysis starts with transcribing the interviews verbatim and proceeds by completing the list of possible answers anticipated while preparing the interview with the answers actually given by the interviewees. the more open the questions, the more diverse the answers can be.
the more limited the possible answer set, the easier it is to create an overview of the results. for closed questions asking for numerical data, such as work experience, age, and frequency, the results can be processed as they are in quantitative studies, and groups can be compared. for the open questions, researchers need to categorise the answers per variable using content, thematic, or template analysis (neuendorf, 2002; king & horrocks, 2010; brooks, mccluskey, turley, & king, 2015). categorisation starts by developing a coding scheme indicating all emergent answer categories per variable. coding is usually an ongoing and iterative process. a team of researchers reads and codes the transcripts and discusses these codes as new themes and subthemes in the answers emerge and are agreed upon. it is vital to code all relevant parts of participants’ answers. irrelevant answers, i.e., those that do not relate to the research questions, may be categorised under a separate code. this helps the researchers to check those answers and be open-minded to the possibility of finding unexpected emergent themes in the analysis, related to questions answered throughout the interview. the coding can be done manually or can be facilitated by software programs for qualitative analysis. after the coding of all transcripts is complete, the frequencies per code can be listed to indicate the most common answers to each question, as well as the exceptional ones. a good overview shows the main themes and subthemes and helps to uncover patterns in the data as well as any differences between participant groups. in essence, the analysis and reporting of interview data is a process of data reduction and summarisation. illustrative quotes from participants’ answers enrich reporting by giving insight into how participants typically expressed themselves. the description so far may be very abstract, but an example will show that the approach is actually very pragmatic.
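the step of listing frequencies per code, described above, can be sketched in a few lines of python. this is only a minimal illustration under stated assumptions: the group labels, variable names, and codes below are invented for the sketch and are not data from any study, and the function name `code_frequencies` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coded transcript segments: (participant group, variable, code).
# All labels are invented for illustration; they are not study data.
coded_segments = [
    ("resident", "problem_type", "diagnosis"),
    ("resident", "problem_type", "interaction_with_patients"),
    ("resident", "problem_type", "diagnosis"),
    ("experienced", "problem_type", "diagnosis"),
    ("experienced", "problem_type", "organisational_issues"),
    ("experienced", "strategy", "asking_colleagues"),
    ("resident", "strategy", "asking_colleagues"),
    ("resident", "strategy", "searching_literature"),
]


def code_frequencies(segments):
    """Count how often each code occurs, per variable and per participant group."""
    table = defaultdict(Counter)
    for group, variable, code in segments:
        table[(variable, group)][code] += 1
    return table


frequencies = code_frequencies(coded_segments)
for (variable, group), counts in sorted(frequencies.items()):
    # most_common() puts frequent answers first; the tail holds the exceptional ones.
    print(variable, group, counts.most_common())
```

keeping the group label in the key makes it straightforward to set the two participant groups side by side for each variable, which is the kind of overview the text recommends.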
in the study we conducted about physicians’ workplace learning (van de wiel et al., 2011), we started by categorising the answers per interview question and later grouped them per variable and per specific research question (see table 4) in order to meaningfully report the data. for example, regarding the interview question about what problems participants encountered when diagnosing and treating patients (question 1 in table 4), we found that most participants encountered problems that could be categorised as problems with diagnosis, choosing diagnostic tools and treatments, interaction with patients, and practical organisational issues. participants gave specific examples of some problems, and these were summarised. the ways in which they solved these problems were also categorised and summarised. the answers to these questions were often intermingled, and just as the interviewer had to make sure that all topics were addressed, the researchers had to combine all answers in the analysis. as theory and previous research provided the basis for our specific research questions and variables, we chose to report the data by presenting these as themes and subthemes in the results section. our intention was to summarise what had been said by the participants and to indicate to what extent they concurred and differed on themes and subthemes. in our reporting of results, we also referred to characteristic quotes. analysis was an iterative process in which two coders consecutively categorised sets of data. these categories were reviewed and critically discussed by the research team. in a follow-up study we wanted to quantify to what extent the physicians were deliberately engaged in the work-related learning activities and relate this to other variables (van de wiel, & van den bossche, 2013). in a second step, we therefore analysed the data used in the van de wiel et al. (2011) study from this perspective. 
the analysis approach taken in this study illustrates how rich verbal interview data are, and how these data can be analysed in different ways depending on the specific research questions. the themes reported in the results of the qualitative analysis of the first study (i.e., work-related learning activities in medical practice) were used as variables to be coded to explore the extent of deliberate engagement in these activities in the second study. in an iterative process, three researchers coded a subset of the interviews until they could reliably distinguish three levels of deliberate practice for each variable: (0) not engaged in learning activity, (1) engaged in learning activities inherent to the job, such as solving a problem, and (2) engaged in deliberate practice as indicated by showing greater motivation and effort for learning to improve competence. clear definitions of codes guided the continuation of coding by one researcher, who consulted the others when in doubt. two themes that pervaded the entire interview were added as variables: reflection on diagnosis and treatment and planning learning activities. the categorisations were reported in a table to allow comparison between the medical residents and the experienced physicians participating in the study. the table displayed the frequencies of both groups’ learning activities, representing the ten variables at each of the three levels of deliberate practice. the outcomes were described in the text and illustrated by quotes.

4.4 focus groups

focus groups are the most common method used to investigate the opinions and experiences of carefully selected groups of people with regard to all kinds of topics and across a wide range of fields (morgan, 1996; krueger & casey, 2015; stalmeijer, mcnaughton, & van mook, 2014).
in relation to expertise research, this method is a valuable tool that can be used to gain insight into how different people experience a task, situation, or phenomenon. it can be used both to explore a topic and to compare different groups. only a few initial questions are needed to open up the discussion and encourage participants to share their view on the topic at hand. the advantage of this group interview method is that as people interact, they discuss and analyse the topic from different perspectives, ask each other questions, and may refine their views. a moderator guides the discussion and, as in individual interviews, is responsible for making the participants feel at ease while at the same time probing them to specify their contributions in order to obtain relevant data. a co-moderator usually assists in this process. the discussions are transcribed and then analysed in a bottom-up way by identifying themes and subthemes throughout the text that are relevant to the specific research questions. the researchers also look for relationships between these themes. at least two coders analyse the data independently and then critically review the coding scheme until they reach agreement. the researchers write a summary of the findings that may then be sent to participants to check whether they have suggestions for adjustments that would better represent their discussion. the summary also pinpoints issues that need further clarification and can be brought up at the next focus group. usually 3-4 rounds of focus group discussions with 5-8 participants in the target group are needed. extra groups are added as long as new information emerges in the discussions, i.e., until data saturation has been achieved. a synthesis of all themes and subthemes coded in the different groups is the basis for reporting the data. quotes can illustrate characteristic utterances.
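the stopping rule mentioned above (adding groups as long as new information emerges) can be made concrete with a small sketch. it assumes, purely for illustration, that each focus group has already been coded into a set of theme labels and that saturation is operationalised as "the most recent group contributed no new codes"; this is one simple reading, not a definition taken from the literature cited here. the theme labels and function names are hypothetical.

```python
def new_codes_per_group(groups):
    """For each focus group, in order, return the codes not seen in any earlier group."""
    seen = set()
    new_per_group = []
    for codes in groups:
        fresh = set(codes) - seen
        new_per_group.append(fresh)
        seen |= fresh
    return new_per_group


def saturation_reached(groups):
    """True when the most recent group contributed no new codes (an assumed criterion)."""
    new_per_group = new_codes_per_group(groups)
    return bool(new_per_group) and not new_per_group[-1]


# Hypothetical themes coded in three consecutive focus groups.
focus_groups = [
    {"workload", "supervision", "feedback"},
    {"feedback", "peer_support"},
    {"workload", "peer_support"},
]

for number, fresh in enumerate(new_codes_per_group(focus_groups), start=1):
    print(f"group {number}: {len(fresh)} new code(s)", sorted(fresh))
print("saturation reached:", saturation_reached(focus_groups))
```

in practice researchers may want a stricter criterion, for example requiring two consecutive groups without new codes; the sketch only shows how the bookkeeping could be done.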
the focus group method has been frequently applied in medicine, often for purposes of medical education (stalmeijer et al., 2014), for example to gain insight into how to improve study arrangements (e.g., de leng, dolmans, van de wiel, muijtjens, & van der vleuten, 2007; van de wiel, schaper, scherpbier, van der vleuten, & boshuizen, 1999), and how the transition from medical school to clinical practice is perceived and can be supported (prince, van de wiel, van der vleuten, boshuizen, & scherpbier, 2004). in medical expertise research, this method has been used to understand how general practitioners approach the diagnostic task and what role non-analytical reasoning plays in their diagnostic process (stolper et al., 2009).

5. verbal protocols

verbal protocols add another dimension to the examination of expertise as they deliver verbal data in relation to cognitive processing either during or directly after task performance. interviews are limited in that they provide self-reports by eliciting cognitions about task performance and expert behaviour in a general way. this may induce participants to interpret their own processes from an evaluative perspective and lead to reconstruction and generalisation of their memories of specific task performance (ericsson & simon, 1980; 1993; van someren, barnard, & sandberg, 1994). to minimise these effects, verbal protocol methods aim to capture the processes in performing representative domain tasks by tapping the content of working memory during or immediately after task processing (see figure 1). this diminishes participants’ opportunity to theorise and rationalise what they do and keeps the time delay in processing and verbalising to a minimum. however, requesting participants to verbalise their thoughts while they engage in a task may interfere with natural processing. interference may also occur when participants know beforehand that they will be asked to report back on their task performance.
it is necessary, therefore, that studies using verbal protocol methods are carefully designed to capture the natural task processes and problem representations. the formulation of task instructions and selection of tasks and problem situations are critical to ensure that expert behaviour can be demonstrated, and contrasted to novice behaviour, in goal-directed, realistic task performance. table 5 shows how the different methods discussed in this article score on the most important criteria in determining whether verbalisation of cognitions impacts the validity of the verbal data gathered.

table 5 advantages and disadvantages of the qualitative methods of interviews and verbal protocols in relation to task processing criteria that may impact the validity of verbal data

columns: method | specific task performance | interference with task processing | inducing interpretation | reflection of working memory

interviews: +/
focus groups: +/
during task
  think aloud: + +/+
  dialogues and group discussions: + +
  probing: + + +/+/
after task
  free recall: + +
  explanation: + +
  retrospective reports: + +
  probing: + +/+/

two methods that lie at the intersection of interviews and verbal protocols probe participants’ cognitive processing by means of questions during or after specific task performance. as depicted in table 5, one advantage of these probing methods is that cognitions can be examined in direct relation to the task at hand, yielding more specific and precise information than interviews. the disadvantage of probing during task performance is that questioning interrupts participants’ thoughts and actions, altering their normal task processing. this may even encourage participants to adopt an interpretative mind-set (ericsson & simon, 1980, 1993; van someren et al., 1994).
probing after task performance overlaps with interviews that focus on specific tasks, events, scenarios, and examples as used in knowledge elicitation techniques (hoffman et al., 1995; hoffman & lintern, 2006; shadbolt & smart, 2015), as well as with some types of explanation protocols and retrospective reports. this method can deliver very valuable information regarding the research questions. as discussed in the section on interviews, this is dependent upon the way the questions are phrased and embedded within the interview guide. in expertise research, data gathered by interviews and verbal protocols may complement each other as participants’ overall cognitions about their expertise domain can be combined with assessments of task-specific processing and outcomes. as expertise is domain and task specific, the selection of participants, tasks, and particular problems to solve are crucial steps in setting up verbal protocol studies that examine expertise (see table 2). both task characteristics and the experts’ knowledge and experience determine the cognitive processes and evolving representations in task performance (see figure 1). if a representative domain task has been chosen, e.g., diagnosis in medicine, the next step is to decide upon the problems to be solved and the presentation format. in medicine, patient cases that reflect a consultation with a physician can, for example, be summarised in a brief description and might be supplemented with information from the patients’ record to mimic the situation in real practice. most important here is that the experimental situation captures the essentials of the task and the problem under investigation. in order to identify characteristics of expert behaviour, cognitive processing, and outcomes, as well as differences with other groups of participants, problems may be presented at various levels of difficulty. 
task materials and conditions may also be manipulated to investigate the effects of changing normal processing. in routine problems, experts are expected to automatically activate the right knowledge, but in more difficult problems and under special conditions, they will coordinate automatic thoughts with analytical thinking. the amount of problem information presented and the time in which the information becomes available also impact cognitive processing: the more information and time involved, the more coordination is required, and the more elaborate and deliberate thinking will be. factors such as these that are related to the nature of the task performed are also emphasised in the cognitive continuum theory (custers, 2013; hamm, 1988; hammond, hamm, grassia, & pearson, 1987). this theory situates most thinking somewhere in between intuition and analysis, which are conceptualised as two ends of a continuum of cognitive processing. where thinking falls on the continuum depends on specific task characteristics. this theory provides a framework for analysing tasks in relation to processing requirements. verbalisation theory, in addition, provides a framework for analysing in what way verbalisation of the content in working memory influences task processing (ericsson & simon, 1980, 1993). if information and knowledge are already represented in a verbal format, they only have to be vocalised. if, however, they are represented in a visual or motor modality, encoding of the representations into verbal format is required. this demands extra cognitive processing that may alter the way in which the task normally proceeds. in preparing verbal protocol studies, both the task and problem characteristics need to be well thought out, and piloted to ascertain that the research questions can be answered. these characteristics may also be manipulated to test specific hypotheses.
when combining verbal protocol methods in one study, researchers need to be careful that instructions given and procedures followed do not influence subsequent task processing. in conducting verbal protocol studies, task performance must be monitored and verbalisations need to be recorded and transcribed verbatim (chi, 1997; ericsson & simon, 1980, 1993; van someren et al., 1994). the outcome measures of the task comprise the dependent variables that are to be used to assess differences in expertise between groups or improvement over time. these may be complemented with quantitative processing measures, such as time used to solve or to explain a problem. the verbal protocols provide rich data on the underlying cognitions in task performance that need to be coded and interpreted to answer the research questions. analysing the data from a clear perspective enhances the acquisition of valuable, objective information and may reduce the workload. a coding scheme depicting the variables (e.g., knowledge used) and the related coding categories (e.g., biomedical and clinical knowledge), including definitions and examples of utterances per category, needs to be developed to guide data analysis (chi, 1997; van someren et al., 1994). the coding scheme can be based on theory, previous research, and/or cognitive task analysis or can be developed in a bottom-up way (chi, 1997; hsieh & shannon, 2005; van someren et al., 1994). an important issue to decide upon when developing the coding scheme is the unit of analysis used, as this may vary from a word, single unit of information or proposition, to a clause, reasoning chain, or turn in a discussion, and is to be guided by the research goals. particularly in theory-driven research, it is good practice to develop the coding scheme using a set of pilot protocols and to test the hypotheses on another sample of protocols.
in more exploratory research, the development of coding schemes may lead to theory-building and hypothesis generation. during analysis, the audio- or video-recordings can be listened to or watched, if it is necessary to improve understanding and interpretation. the coding of verbal protocols is an iterative process, in which coders must seek agreement in order to obtain reliable data. although the analysis of verbal protocols is qualitative in nature, the data may be quantitatively described by tallying the number of utterances per coding category (chi, 1997; krippendorff, 2012; neuendorf, 2002). such quantitative descriptions help to create an overview, reduce subjectivity in interpretation, and find and report meaningful patterns in the data, while protocol fragments show examples of the coded categories in relation to each variable. recapitulating the steps to be taken in preparing expertise research using qualitative methods (see table 2) clearly shows how important it is that the study design is guided by the research questions and by the strengths and weaknesses of the methods employed. understanding the ways in which various methods affect cognitive task processing in data collection helps to establish what interview or verbal protocol method to use. in designing verbal protocol studies, the next critical step is to choose task characteristics and requirements in a manner that can reveal expert behaviour under conditions reflecting or manipulating the essence of the task. this step is closely tied to the development of a coding scheme in which it is anticipated how each variable of interest can be measured in the verbal protocols. the final step of communicating the study to participants must comply with general research guidelines, as described in the outline of the interview guide (see table 3).
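the text above asks coders to seek agreement but does not prescribe a statistic for it. one common choice, introduced here as an assumption rather than as the method used in the studies discussed, is cohen's kappa, which corrects the observed proportion of agreement between two coders for the agreement expected by chance. the sketch below computes it for two hypothetical coders who each assigned one category to the same six utterances; the category labels echo the biomedical/clinical example in the text, but the codings themselves are invented.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one category per unit."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same, non-empty set of units")
    n = len(codes_a)
    # Proportion of units on which the two coders chose the same category.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's marginal category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codings of the same six utterances by two coders.
coder_a = ["biomedical", "clinical", "clinical", "biomedical", "clinical", "other"]
coder_b = ["biomedical", "clinical", "biomedical", "biomedical", "clinical", "other"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

a low value would send the coders back to the definitions and examples in the coding scheme, which fits the iterative process described above; what counts as acceptable agreement is a judgement for the research team and is not fixed by this sketch.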
in the following sections, the five verbal protocol methods of thinking aloud, dialogues or group discussions, free recall, explanation, and retrospective reports will be discussed. drawing from research on medical and visual expertise, examples will be provided of specific aspects of the design and analysis of each method.

5.1 think-aloud protocols

the think-aloud method has been frequently used to investigate the knowledge and processes in task performance and is well-documented in the literature (chi, 1997; ericsson, 2006b; ericsson & simon, 1980, 1993; hassebrock & prietula, 1992; van someren et al., 1994; shadbolt & smart, 2015). participants are asked to say everything that comes to their mind while they engage in a task. the rationale behind this method is that the verbalised thoughts reflect the evolving mental representations in working memory during task performance (see figure 1). verbalisation of thoughts does not change the sequence of actions, and usually does not disturb the cognitive processes engaged in, but merely slows down these processes. in some tasks, however, verbal encoding and vocalisation of information interferes with natural task performance. this is because the increased load on working memory makes it difficult to keep up with the flow of information that needs to be attended to, i.e., cognitive processing cannot be slowed down to successfully accomplish the task. verbalising is easiest when thoughts are already verbally represented, and requires extra cognitive effort in visual and motor tasks. in highly skilled and expert performance, cognitive processes are largely automated and think-aloud protocols will only reveal those thoughts that consciously come to mind. this method, then, shows what knowledge is activated, which parts of cognitive processing are automated, and when deliberate, analytical thinking is involved in specific groups, tasks and problem situations.
natural thinking in cognitive tasks can be easily disturbed by task instructions. it is therefore important that these instructions are formulated in such a way that participants are not tempted to explain or justify what they do to the experimenter. practicing the think-aloud procedure with participants maximises the chance of obtaining valid data. when verbal materials are presented, these need to be read out loud to facilitate expressions of thoughts during information intake and signal the cues that trigger these thoughts. the data can be systematically analysed and reported in many different ways, depending on the research goals. in medicine, the think-aloud procedure has, for the most part, been applied in diagnostic tasks to reveal the knowledge and reasoning involved. hassebrock & prietula (1992) made a detailed analysis of diagnostic reasoning, focusing on knowledge states, conceptual operations, and lines of reasoning to explore different types of diagnosis, the cognitive activities engaged in (e.g., data examination, data explanation, hypothesis evaluation, meta-reasoning), and the links between patient cues, pathophysiological conditions and hypotheses, respectively. the coding scheme they used was very elaborate allowing for precise statements on the knowledge representations and the (causal) lines of reasoning in diagnosing cases of congenital heart disease. these statements could be compared with expert models of reasoning on the topic. it is a good illustration of how cognitive task analysis may guide the interpretation of qualitative data. however, although it is tempting to carry out this type of detailed analysis when data are so rich, it might be more practical to focus on some of the main inferences made, as in a study conducted by gilhooly, et al. (1997). they asked participants to diagnose eight ecg traces while thinking aloud. 
a computer program first listed all of the technical terms used by participants and an expert categorised these words into three coding categories. subsequently, the program counted the number of words indicating trace characteristics, clinical inferences and biomedical inferences. this method enabled the researchers to compare the knowledge used in visual diagnosis between different expertise groups and across ecg traces varying in difficulty, as well as between the think-aloud protocols and explanation protocols that were collected a week later. in this special issue, helle (2017) discusses the relationship between eye tracking data and verbal data.

5.2 discourse analysis of dialogues and group discussions

recording the natural discourses of collaborating experts or of students and teachers in education is a valuable method that can be used to gain insight into online group decision making, problem solving, and learning (chi, 1997; salas et al., 2006). it shows what issues participants attend to, how they regulate their discussions and task processes, how they interact, what kinds of knowledge and strategies they use, what they might learn, and what they can improve. the main task of researchers is to select a representative sample of meetings in which participants engage in knowledge sharing as part of their work or training, audio- or video-record these meetings, transcribe them verbatim, and then analyse what has been said. participants might notice that they are being observed in the very beginning, but as soon as they start their tasks, they will proceed as usual. this method delivers very rich data, and reducing the data to manageable proportions in coding is a challenge, and a process that will be guided by the research goals.
analysing and describing the data at different levels of detail, for example, in terms of the sequence of actions taken in discussing a patient, the type of patient problems discussed, and the content of a sample of these discussions, helps to create overview and pinpoint the issues of interest. patel and colleagues investigated team problem solving and decision making in the complex environment of hospital intensive care units. in their research, work domain analysis and communication patterns between attending physicians, residents, nurses, and consulted specialists provided an invaluable framework that could be used to analyse the content of the individual contributions and understand the processes in context (patel & arocha, 2001; patel, kaufman, & magder, 1996). the researchers particularly focused on discussions that took place during morning rounds. during these rounds the team discusses the patients on the ward, evaluating each patient in detail and planning future actions. they enriched their analysis by examining complementary data from patient charts, recordings of morning lectures, and interviews with participants. they segmented the protocols into episodes that distinguished between subsequent phases in the discussions, and further segmented these phases into thematic idea units. categorisation of these units classified the contributions at four knowledge levels: (1) observations of patient signs, (2) findings referring to clinically significant clusters of observations, (3) facets referring to pathophysiological states or broad categories of disease, and (4) diagnoses referring to clinical conclusions. they also categorised three types of decisions at the level of: findings, actions to be taken in patient management, and assessments of the overall state. 
the results showed how contributions were distributed over the different participants, how content changed over three consecutive days of patient care, how interactions, content, and reasoning differed in two types of intensive care units, and what episodes were particularly useful for expertise development. the data were both qualitatively and quantitatively described. this research in the tradition of naturalistic decision making (klein, 2008) provides a good example of how expertise can be examined when it is distributed over multiple agents in a complex, real-life, dynamic environment. two other studies using this method have analysed the discussions in learning situations to examine the application of biomedical and clinical knowledge in problem-based learning tutorials with real-patients (diemers, van de wiel, scherpbier, heineman, & dolmans, 2011), and the topics discussed in tutorial dialogues on diagnostic reasoning of trainees and their supervisors in general practice (stolper et al., 2015). in the study conducted by diemers and colleagues, a purposive sample of tutorial group discussions was divided into a preparation and a reporting phase. based on a technique of proposition analysis for medical protocols (patel & groen, 1986), all transcripts were segmented into small meaningful information units or propositions. propositions connect two concepts by a qualifier, such as “a pseudo polyp is characteristic of ulcerative colitis”. guided by previous research and pilot interviews, we coded the propositions as patient information, formal clinical knowledge, biomedical knowledge, informal clinical knowledge, procedural information, and other information, and also indicated if the proposition was put forward by the tutor or a student. in this way, we could compare both the number of propositions per coding category, and the number of propositions contributed by tutors and students in both phases. 
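the proposition-based comparison described above lends itself to a simple data structure. the sketch below is an illustration under stated assumptions, not the procedure used by diemers and colleagues: the `Proposition` class and the `tally` helper are hypothetical names, and apart from the pseudo polyp example quoted in the text, the propositions are invented placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """Two concepts connected by a qualifier, plus the codes assigned to it."""
    concept_a: str
    qualifier: str
    concept_b: str
    category: str  # e.g. "biomedical knowledge"
    speaker: str   # "tutor" or "student"
    phase: str     # "preparation" or "reporting"


# The first entry uses the example quoted in the text; the rest are invented.
propositions = [
    Proposition("pseudo polyp", "is characteristic of", "ulcerative colitis",
                "formal clinical knowledge", "student", "reporting"),
    Proposition("patient", "reports", "abdominal pain",
                "patient information", "student", "preparation"),
    Proposition("inflammation", "damages", "mucosa",
                "biomedical knowledge", "tutor", "reporting"),
    Proposition("patient", "reports", "weight loss",
                "patient information", "student", "preparation"),
]


def tally(props, *fields):
    """Count propositions per combination of the given attribute names."""
    return Counter(tuple(getattr(p, f) for f in fields) for p in props)


by_category = tally(propositions, "category")
by_speaker_and_phase = tally(propositions, "speaker", "phase")

print(by_category.most_common())
print(by_speaker_and_phase.most_common())
```

because `tally` accepts any combination of attributes, the same coded set can be summarised per coding category, per speaker, or per phase without recoding, which mirrors the two comparisons mentioned in the text.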
to analyse the function of biomedical knowledge, we categorised what the biomedical lines of reasoning in the protocols explained, and found that they mostly linked underlying mechanisms of disease to clinical features of patients, as intended by the educational format. in the study conducted by stolper and colleagues, a representative sample of tutorial dialogues was taken and segmented based on turns in the conversation and also on content changes. the coding proceeded in a bottom-up and iterative way, but was informed by the researchers’ knowledge of diagnostic reasoning and research goals, which helped to characterise the topics of discussion. as trainees usually presented several patient cases for discussion, we differentiated between a reporting and an analysis phase. segments were double-coded to indicate the contributions of trainees and supervisors. the number of words per code was counted so that we could examine to what extent the different topics were discussed and by whom. in line with the research questions, the data for all main coding categories were reported in tables, and specified in more detail for the categories of diagnostic reasoning and gut feelings. in addition, the tutorial dialogues, diagnostic reasoning, and the way in which gut feelings featured in the dialogues were described in qualitative terms. in both studies, the methods of analysis yielded rich but precise data that could be used to quantitatively compare the attention paid by the participant groups to the coding categories and the variables of interest, and qualitatively describe, interpret and illustrate these variables.

5.3 free recall protocols

free recall is a classical method used to study problem representation underlying task performance (chi, 2006b; chi et al., 1988; feltovich et al., 2006).
the most well-known example of a free recall study is the work on chess expertise of de groot (1946/1978), who asked participants to think of a move in an actual game position before they had to recall the position. chess masters selected the best moves and recalled more chess pieces than other players. this was explained as being the result of their better overall conception of the problem. masters could grasp the problem at a high level in a very short time showing their expert knowledge of chess. the memory paradigm used, i.e., asking the chess players to recall briefly presented chess positions, has since been adopted to chart the problem representations and underlying knowledge structures by reviewing how individual chess pieces are chunked together in memory (chase & simon, 1973). the rationale behind the free recall measure is that the content of working memory is retrieved immediately after task performance (see figure 1). therefore, matching the goals of processing in the experimental situation to the real task is a prerequisite consideration in the development of task instructions (ericsson & smith, 1991; ericsson, patel & kintsch, 2000). this is underlined by levels of processing theory showing that meaningful processing gives the best results due to the richer connections in memory (craik, 2002). the use of the free recall method in medical expertise research clearly shows that results are influenced by task processing conditions. in contrast to the superiority of expert memory for meaningful materials found in a wide variety of domains (chi, 2006b; feltovich et al., 2006), in medicine, findings have been less straightforward (norman, et al., 2006). in most circumstances, experienced physicians appear to represent patient cases in a more condensed way than advanced students, if they process them for diagnosis (schmidt & boshuizen, 1993; de bruin, van de wiel, rikers, & schmidt, 2005). 
a sequence of recall studies manipulating case materials and task instructions suggests that clinical case processing of experts using patient descriptions is rather robust. however, memorisation instructions, perceiving the task as a memory task, or instructions for elaborate processing may enhance their recall (de bruin et al., 2005; van de wiel, schmidt, & boshuizen, 1998; van de wiel, ploegh, boshuizen, & schmidt, 2005; wimmers, schmidt, verkoeijen, & van de wiel, 2005). if lab data were processed without further patient information and under elaborate problem formulation conditions, experts outperformed students in recalling the lab data after this analytical diagnostic task (norman et al., 1989; wimmers et al., 2005). if medical students knew in advance that they would be asked to recall a patient case (intentional recall condition) they recalled more case information than if they did not know this (incidental recall condition) (van de wiel et al., 2005). in conclusion, the method has some drawbacks that are hard to control in experimental research, if only because more than one patient case should be presented. a lesson learned is that diagnostic tasks should be presented in a realistic way by aligning processing goals and providing the patient information that would be available in practice. for research in visual expertise this means that both the image and the information physicians have before interpreting the image should be presented (hatala, norman, & brooks, 1999; kulatunga-moruzi, brooks, & norman, 2004). to capture the nature of expertise, the information provided in patient case descriptions needs to be presented in a standard order and phrased as it is communicated in practice, either in the words of patients or of colleagues, and interpretation must be left to the participants. the manipulation of task performance conditions can be an important strategy to further uncover expert knowledge structures.
a good example is constraining the time in which participants have to process case materials, as it might be expected that this will have a lower impact on experts than on less advanced participants (schmidt & boshuizen, 1993; van de wiel et al., 1998). in addition, the method of free recall should be supplemented with other verbal protocols in order to corroborate findings. representations of patient cases, for example, might be investigated by asking physicians how they would summarise or characterise a case or describe what information was critical for their diagnosis. in visual domains, recall protocols of the images presented can be obtained by asking participants to describe what they saw, and this can be supplemented with instructions to indicate the relevant features on the image, or draw the image (e.g., lesgold et al., 1988; gilhooly et al., 1997). in this way, the representation of features recognised can be separated from the interpretation of the pattern of features in diagnosis. analysis of free recall protocols, characteristic summaries, or protocols with critical cues can proceed in a straightforward way using proposition analysis. the number of propositions in the protocol that match the propositions in the case materials is counted. the number of summaries, i.e., inferences referring to more than one case proposition, may be counted separately to show to what extent participants represent the case information at a higher interpretative level. an example of a summary encompassing four propositions is “auscultation reveals mitral valve insufficiency”, which summarises the more detailed information, “auscultation reveals a holosystolic murmur at the apex radiating towards the axilla” (van de wiel et al., 1998). the summaries can be provided at different levels of detail, varying from interpretation of lab data to the encapsulation of data into pathophysiological mechanisms or diagnostic labels. 
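the counting step of this proposition analysis can be sketched as follows. the sketch assumes that a human coder has already linked every recalled statement to the case propositions it refers to; the function name `score_recall`, the proposition ids, and the choice to report literal matches and summaries as two separate counts are assumptions made for illustration.

```python
def score_recall(case_props, protocol):
    """Tally a coded free recall protocol.

    case_props: ids of the propositions in the case materials.
    protocol:   one set per recalled statement, holding the ids of the
                case propositions that the coder linked it to
                (an empty set means the statement matched nothing).
    """
    valid = set(case_props)
    matching = 0   # statements matching exactly one case proposition
    summaries = 0  # inferences referring to more than one case proposition
    for covered in protocol:
        covered = set(covered) & valid
        if len(covered) == 1:
            matching += 1
        elif len(covered) > 1:
            summaries += 1
    return {"matching_propositions": matching, "summaries": summaries}


# Hypothetical case with six propositions; the second recalled statement
# is a summary that encompasses four of them.
case_props = ["p1", "p2", "p3", "p4", "p5", "p6"]
protocol = [{"p1"}, {"p2", "p3", "p4", "p5"}, set(), {"p6"}]
scores = score_recall(case_props, protocol)
print(scores)
```

whether summaries should also count towards the total of matched propositions depends on the research question, so that decision is left to the analyst.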
moreover, the order of the information recalled may also reveal how the problem and underlying knowledge are represented in memory (e.g., claessen & boshuizen, 1985). the research questions will determine what to focus on in designing the study and analysing the data.

5.4 explanation protocols

in medical expertise research, the post-hoc explanation method was introduced by feltovich and barrows (1984), and further developed by patel and groen (1986) to gain insight into the knowledge used in the diagnostic process and the causal lines of reasoning. participants are asked to provide a pathophysiological explanation of the signs and symptoms in a patient case that they have processed for diagnosis. just as in free recall, it is assumed that the knowledge activated in case processing will be retrieved when providing the explanation (see figure 1). although, in general, this seems to be the case for expert processing, explanations may be influenced by the knowledge available to understand the case materials. as the knowledge of novices and intermediates, for example, is fragmented and less coherently organised, they tend to elaborate on their knowledge when explaining the recalled case features (boshuizen & schmidt, 1992). it is clear, however, that explanations of patient cases can reveal the knowledge participants use in linking case information, pathophysiological mechanisms, and disease. explanations allow for analysis of both the content and the structure of knowledge, i.e., how knowledge elements are related (chi, 1997). the explanation method is easy to use, but may be labour-intensive to analyse. based on two studies in which we applied this method to compare the knowledge structures of medical students with those of experienced physicians (van de wiel et al., 2000), and to examine the knowledge development of medical students over the period of their course (diemers et al., 2015), a detailed account of how to proceed with analysis and reporting is provided.
for these research goals of comparing knowledge between groups and over time, the first step in analysis is to develop a model explanation. a model explanation links the signs and symptoms in a case to the diagnosis via the most important biomedical and clinical concepts explicating the disease processes in a network representation much like a concept map. the model explanation represents a causal model of a disease and must be developed in close collaboration with experts. concept mapping is a particularly useful tool to elicit experts’ knowledge for this purpose (hoffman & lintern, 2006; shadbolt & smart, 2015). the participants’ written explanation protocols are then translated into networks of linked concepts and compared to the model explanation network. the network representations enable an assessment of explanation quality in terms of correctness of both concepts and links. in our studies, the variables we coded in the protocols included the total number of concepts used in explanations, specified as the number of model concepts, alternative concepts, detailed concepts, and wrong concepts, as well as the total number of links, specified as model links, alternative links, detailed links, wrong links and shortcuts in reasoning. as measures of quality, we used the percentage of model concepts (relative to the total number of concepts used in the protocols) and the percentage of model links (relative to the total number of links used in the protocols). in the diemers et al. study, we also counted the number of biomedical and clinical concepts used. the outcomes of the variables were depicted in graphics or tables to support interpretation. the method provides detailed insight into the quality of knowledge structures, (causal) lines of reasoning, and the nature of the knowledge used. moreover, it allows meaningful comparisons between groups and conditions. the combination of variables allows consistent patterns to be found in the data.
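the two quality measures described above reduce to simple percentages once a protocol has been coded. the following sketch shows the arithmetic under stated assumptions: the counts are invented, the function name `percentage_model` is hypothetical, and shortcuts in reasoning are assumed to count towards the total number of links.

```python
def percentage_model(counts):
    """Return the share of 'model' items as a percentage of all coded items."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty protocol: no concepts or links were coded
    return 100.0 * counts.get("model", 0) / total


# Hypothetical counts for one participant's explanation protocol.
concepts = {"model": 12, "alternative": 3, "detailed": 4, "wrong": 1}
links = {"model": 9, "alternative": 2, "detailed": 3, "wrong": 1, "shortcut": 3}

pct_model_concepts = percentage_model(concepts)  # 12 of 20 concepts
pct_model_links = percentage_model(links)        # 9 of 18 links
print(pct_model_concepts, pct_model_links)
```

computed per participant, these percentages can then be averaged per expertise group or per measurement moment to support the group and time comparisons mentioned above.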
explanation protocols provide a rich source of clear examples of expert and flawed knowledge and reasoning. two studies asking for explanations in the visual domain demonstrate the use of a keyword technique to characterise the type of utterances made by participants (gilhooly et al., 1997; jaarsma, jarodzka, nap, van merriënboer, & boshuizen, 2014). gilhooly and colleagues were interested in whether cardiologists had more biomedical knowledge available to explain ecg traces than less experienced participants. participants could look at the ecg traces while providing the explanations. a computer program analysed how many words in the protocols indicated trace characteristics, clinical inferences, and biomedical inferences (similar to the analysis of the ecg think-aloud protocols that were collected a week before the explanations). this procedure resulted in very clear data which revealed the expected expertise effect. in the study conducted by jaarsma and colleagues, participants were asked to give an explanation for their diagnosis after they had seen a microscopic image for two seconds. two coders categorised the words in the combined protocols of 20 explanations. some of these categories were based on previous research and others emerged from the data. the number of words in each coding category characterised patterns of reasoning processes and the types of knowledge used by each of the three expertise groups. these examples show that explanation protocols can be reliably used to examine expertise differences in the visual domain, and that indicative words can be used as practical units of analysis. providing explanations can be a task in its own right (chi, 1997), as participants can for example be asked to explain a concept. this use of explanation protocols overlaps with knowledge elicitation techniques.
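the keyword technique with indicative words as units of analysis can be illustrated with a small word-counting sketch. it is not the program used in the studies cited: the keyword lists, the example protocol, and the function name `count_indicative_words` are hypothetical, and only the three category names follow the description of the ecg study.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real study would derive them from
# previous research and from the protocols themselves.
KEYWORDS = {
    "trace characteristic": {"wave", "interval", "segment", "elevation"},
    "clinical inference": {"infarction", "ischaemia", "arrhythmia"},
    "biomedical inference": {"depolarisation", "conduction", "node"},
}


def count_indicative_words(protocol):
    """Count how many words in a protocol fall into each keyword category."""
    words = re.findall(r"[a-z]+", protocol.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in KEYWORDS.items():
            if word in vocabulary:
                counts[category] += 1
    return counts


example = ("the st segment shows elevation, which suggests infarction; "
           "conduction through the node looks delayed")
counts = count_indicative_words(example)
print(dict(counts))
```

exact word matching is the simplest possible design; it ignores inflected forms and multi-word terms, which a real analysis would have to handle explicitly.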
explanations collected in medical expertise research can be analysed from a psychological perspective to reveal differences between expertise groups in terms of the content and organisation of their knowledge. for example, in a study on the explanation of clinical concepts, we analysed the elaborateness, quality, and fluency with which explanations were provided (van de wiel, boshuizen, schmidt, & schaper, 1999). the researchers told participants that they were interested in what they knew about certain concepts, and asked them to explain 20 concepts to the experimenter in approximately 2 minutes. the instructions were carefully crafted to ensure that participants communicated all they knew during the full time slot, indicated when they were not sure, and also explained how they could recognise a particular concept in patients. throughout the experiment, the experimenter guided the elicitation process according to this procedure in order to increase data quality. for each concept, a model explanation was constructed. this highlighted model concepts and referred to a definition of the concept, the major causes and clinical consequences, and the essential pathophysiological mechanisms of disease. the model explanations were based on medical literature and checked by medical specialists. the participants’ transcribed explanation protocols were segmented into meaningful information units that were then coded based on content. elaborateness of the protocols was measured by counting the total number of medical concepts used and specified per category: definition, cause, clinical knowledge, pathophysiological knowledge, and therapeutic knowledge. quality of the explanations was measured by comparing the explanation protocols to the model explanation revealing the number of model concepts and imprecise expressions used, and the number of clinical concepts that were unknown.
accessibility of knowledge was operationalised by the fluency with which participants provided their explanations and measured by the number of times they abruptly changed the subject, used thinking pauses, stumbled, thought aloud, or referred to their lack of knowledge. coding was very precise and laborious and, as a result, reliable. the results could be clearly depicted in a table and gave good insights into the availability and accessibility of knowledge in the three expertise groups. the data were also interpreted in a qualitative manner for medical education purposes to illustrate major misconceptions in concepts that were weakly explained. this method may be particularly useful to chart knowledge and misunderstandings in complex domains, including visual expertise (e.g., the feature description test as used by kok et al., 2013), and to effectively design instruction. if explanations are requested in a written format, the method is more feasible and shows availability of knowledge but not the fluency with which it is accessed. the method can also be used for assessment of and feedback to students.

5.5 retrospective reports

the questions asked to participants in retrospective reports of problem solving are similar to those that can be asked in interviews. examples of such questions are: “how did you solve the problem?” or “what did you think while problem solving?”. the important difference is that the delay between problem solving and answering these questions is minimised in retrospective reports that are generated immediately after task performance. this delay can undermine the validity of the answers because people tend to present their thought processes as being more coherent and intelligent than they are, and reconstruct their memories of the problem-solving processes based on the outcomes (ericsson & simon, 1980, 1993; van someren et al., 1994).
the effect of delay may also exist in retrospective reports if the task lasts longer than 10 seconds, as, after that time, the sequence of thoughts is no longer readily available in working memory (ericsson & simon, 1993). another threat to the validity of the data is if, during questioning, researchers probe participants to give post-hoc rationalisations of what they did, e.g., by asking: “can you explain how you proceeded in solving the problem?”, “what general approach did you use in problem solving?” or “why did you solve the problem in this way?” if, for example, participants did not use a general, structured approach, but rather solved the problem by trial and error, they may feel embarrassed to say so. however, when interviews are well-prepared and conducted, the interviewer can tap into relevant knowledge and strategies the participants may have used by emphasising that there are no right or wrong answers, encouraging participants to think aloud, and probing. as demonstrated by knowledge elicitation practices, researchers can scaffold participants in a collaborative process to articulate what they know, even if this has not been articulated before (hoffman et al., 1995; hoffman & lintern, 2006). a good example of such a method is cued or stimulated recall, in which participants are walked through the task by watching and/or listening to a recording of the problem-solving process while expressing what they remember of their thoughts at specific points. this method may be particularly useful if concurrent thinking aloud is not possible because the incoming information that needs to be attended to is presented faster than it can be verbalised. the recording then provides cues in working memory that trigger retrieval of the cognitive processes and mental representations involved in task performance (see figure 1). for this reason, the method might be well suited to examine processing and interpretation of visual data. 
to obtain valid data, retrospective questions or reports should be gathered in representative samples of participants and validated by the results of other methods, such as thinking aloud. to analyse the data, researchers can develop a coding scheme (just as in interviews or think-aloud protocols) and report the data in both qualitative and quantitative ways. the following two examples will show how the method of stimulated recall can be effectively applied to gather additional data on participants’ thoughts while accomplishing a task. in a study on problem analysis in a tutorial group, the recorded discussion was presented to each individual group member immediately after the meeting to elicit their memories of their thinking during the discussion (de grave, boshuizen, & schmidt, 1996). participants could stop the videotape at any time to recall what they had in mind while the others were talking. the research goal was to investigate whether problem-based learning leads to conceptual change when students participate in small group discussions. the data showed that this stimulated recall procedure provided further insight into the prior knowledge invoked, the changes in reasoning based on the group’s theory building process, and the metacognitive reflections engaged in, as well as participants’ thoughts about the group process. a coding scheme guided analysis of both the group interaction and the stimulated recall protocols, enabling comparisons between the number of clauses in each coding category across the two types of protocols. a temporal analysis of the categories of theory building and meta-reasoning showed how these thinking processes interacted and how conceptual change came about. this was illustrated with the stimulated recall protocol of one student. this example shows that stimulated recall can be a very informative means of elucidating covert thinking processes in group discussions.
in the other example, retrospective reports of electrical circuit problem solving were elicited by presenting the participant’s eye fixations and mouse-keyboard operations on the computer screen in the original task (van gog, paas, van merriënboer, & witte, 2005). this may be a very good way of capturing the cognitive processing in perceptual-motor tasks. the goal of the study was to compare the results collected when the different methods of thinking aloud, retrospective reporting, and cued retrospective reporting were used. participants were asked to report what they were thinking during the task while watching the record of their eye movements and actions. unfortunately, the record was replayed at the same speed as in task performance, and participants could not stop the recording to share their thoughts. this was probably the reason why cued retrospective reporting did not deliver more information than thinking aloud for three of four coding categories. task performance can slow down during thinking aloud, but this was not possible during the cued retrospective reporting procedure used in this study. in conclusion, the examples show that stimulated recall may provide valuable information about the knowledge and reasoning involved in task performance when it is well-designed and implemented.

6. conclusion

the methods of interviewing and collecting verbal protocols provide rich data to examine expertise and expertise development from different perspectives in all kinds of domains. interviews are needed in the exploratory phase of research to gather information on the tasks and the problem situations to be investigated. they can also be used to obtain objective data to compare different expertise groups on targeted topics. in verbal protocol studies, knowledge, mental representations, and reasoning are examined in direct relation to representative tasks in which experts should outperform less advanced or experienced participants.
the protocols collected as a result of thinking aloud, dialogues, and group discussions tap into the concurrent cognitive processes of experts and students during task performance, whereas verbal protocols of free recall, explanations, and retrospective reports are gathered after the task has been performed. all methods have their advantages and disadvantages in terms of the validity of the data obtained. to grasp the full nature of domain expertise, different methods should be applied in order to complement one another. sometimes, different verbal protocols can be gathered in the same study, for example, thinking aloud while interpreting patient information and making a diagnosis, incidental recall of the last case presented, and explanation of the case materials a week later (e.g., gilhooly et al., 1997). the selection of tasks and case materials is crucial and should be well-aligned with practices in real-life settings in order to capture expertise. varying task difficulty is an important manipulation that can be used to investigate in what situations, and for which participant groups, automatic processing falls short and deliberate thinking is involved. the interactions between knowledge, cognitive processing, and task characteristics are key to the understanding of expertise. interviews and cognitive task analysis play an important role in identifying the relevant interaction patterns, before research using verbal protocols can be designed. the formulation of interview questions and task instructions needs careful attention in order to safeguard data validity when examining the underlying cognitions of expertise. analysis of interviews and verbal protocols is usually labour-intensive, and reducing analysis to manageable proportions is best guided by the specific research goals.
quantifying the qualitative variables identified in developing the coding schemes provides an overview that helps the researchers to find, interpret, and report main patterns in the data. all methods described in this article may contribute to further developing different domains of expertise. they can also be used to examine visual expertise, as professionals are usually able to communicate what they see, what conclusions they come to, and what they plan to do when collaborating with colleagues and teaching students.

keypoints

the research questions, available theory, and previous findings are the starting point for coding schemes used to analyse interviews and verbal protocols.
interviews can be used to obtain relevant information in an objective way to compare groups with different levels of expertise.
interviews provide indispensable insight into the tasks and materials to be selected for research gathering verbal protocols.
as expertise emerges from expert knowledge in interaction with the task and the problems to be solved, the selection of participants and materials as well as the design of task instructions are critical in preparing verbal protocol studies.
different methods of research, such as interviews and verbal protocols, need to be combined to fully grasp expertise in complex domains, such as visual expertise.

acknowledgments

i am very grateful for the valuable feedback i received from my colleagues, els boshuizen and fleurie nievelstein, and two anonymous reviewers on earlier drafts of this article.

references

anderson, j. r. (1996). act: a simple theory of complex cognition. american psychologist, 51(4), 355. doi:10.1037/0003-066x.51.4.355
bartram, d. (2008). work profiling and job analysis. in n. chmiel (ed.), an introduction to work and organizational psychology: a european perspective (2nd ed.) (pp. 3-28). oxford, uk: blackwell.
brooks, j., mccluskey, s., turley, e., & king, n. (2015).
the utility of template analysis in qualitative psychology research. qualitative research in psychology, 12(2), 202-222. doi:10.1080/14780887.2014.955224
boshuizen, h. p. a., & van de wiel, m. w. j. (1998). multiple representations in medicine: how students struggle with it. in m. w. van someren, p. reimann, h. p. a. boshuizen, & t. de jong (eds.), learning with multiple representations. amsterdam, the netherlands: elsevier.
boshuizen, h. p. a., van de wiel, m. w. j., & schmidt, h. g. (2012). what and how advanced medical students learn from reasoning through multiple cases. instructional science, 40(5), 755-768. doi:10.1007/s11251-012-9211-z
boshuizen, h. p. a., & van de wiel, m. w. j. (2014). expertise development through schooling and work. in a. littlejohn, & a. margaryan (eds.), technology-enhanced professional learning: processes, practices and tools (pp. 71-84). new york, ny: taylor & francis/routledge.
chase, w. g., & simon, h. a. (1973). perception in chess. cognitive psychology, 4(1), 55-81.
chi, m. t. h. (1997). quantifying qualitative analyses of verbal data: a practical guide. the journal of the learning sciences, 6(3), 271-315. doi:10.1207/s15327809jls0603_1
chi, m. t. h. (2006a). two approaches to the study of experts’ characteristics. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 21-30). new york, ny: cambridge university press.
chi, m. t. h. (2006b). laboratory methods for assessing experts’ and novices’ knowledge. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 167-184). new york, ny: cambridge university press.
chi, m. t. h., bassok, m., lewis, m. w., reimann, p., & glaser, r. (1989).
self-explanations: how students study and use examples in learning to solve problems. cognitive science, 13(2), 145-182. doi:10.1016/0364-0213(89)90002-5
chi, m. t. h., feltovich, p. j., & glaser, r. (1981). categorization and representation of physics problems by experts and novices. cognitive science, 5, 121-152. doi:10.1207/s15516709cog0502_2
chi, m. t. h., glaser, r., & farr, m. j. (1988). the nature of expertise. hillsdale, nj: lawrence erlbaum.
chipman, s. f., schraagen, j. m., & shalin, v. l. (2000). introduction to cognitive task analysis. in j. m. schraagen, s. f. chipman, & v. l. shalin (eds.), cognitive task analysis (pp. 3-23). mahwah, nj: lawrence erlbaum.
claessen, h. f. a., & boshuizen, h. p. a. (1985). recall of medical information by students and doctors. medical education, 19(1), 61-67. doi:10.1111/j.1365-2923.1985.tb01140.x
craik, f. i. (2002). levels of processing: past, present... and future? memory, 10(5-6), 305-318. doi:10.1080/09658210244000135
custers, e. j. (2013). medical education and cognitive continuum theory: an alternative perspective on medical problem solving and clinical reasoning. academic medicine, 88(8), 1074-1080. doi:10.1097/acm.0b013e31829a3b10
deakin, h., & wakefield, k. (2014). skype interviewing: reflections of two phd researchers. qualitative research, 14(5), 603-616. doi:10.1177/1468794113488126
de bruin, a. b. h., van de wiel, m. w. j., rikers, r. m. j. p., & schmidt, h. g. (2005). examining the stability of experts’ clinical case processing: an experimental manipulation. instructional science, 33, 251-270. doi:10.1007/s11251-005-3598-8
de grave, w. s., boshuizen, h. p. a., & schmidt, h. g. (1996). problem based learning: cognitive and metacognitive processes during problem analysis. instructional science, 24(5), 321-341. doi:10.1007/bf00118111
de groot, a. d. (1978). thought and choice in chess (2nd ed.). the hague, the netherlands: mouton. (original work published in 1946)
de leng, b., dolmans, d., van de wiel, m. w.
j., muijtjens, a., & van der vleuten, c. (2007). how video cases should be used as authentic stimuli in problem-based medical education. medical education, 41, 181-188. doi:10.1111/j.1365-2929.2006.02671.x
diemers, a. d., van de wiel, m. w. j., scherpbier, a. j., baarveld, f., & dolmans, d. h. (2015). diagnostic reasoning and underlying knowledge of students with preclinical patient contacts in pbl. medical education, 49(12), 1229-1238. doi:10.1111/medu.12886
diemers, a. d., van de wiel, m. w. j., heineman, e., scherpbier, a. j. j. a., & dolmans, d. h. j. m. (2011). pre-clinical patient contacts and the application of biomedical and clinical knowledge. medical education, 45, 280-288. doi:10.1111/j.1365-2923.2010.03861.x
dubois, d., & shalin, v. l. (2000). describing job expertise using cognitively oriented task analyses (cota). in j. m. schraagen, s. f. chipman, & v. l. shalin (eds.), cognitive task analysis (pp. 41-55). mahwah, nj: lawrence erlbaum.
elstein, a. s., & schwarz, a. (2002). clinical problem solving and diagnostic decision making: selective review of the cognitive literature. british medical journal, 324, 729-732.
elstein, a. s., shulman, l. s., & sprafka, s. a. (1978). medical problem solving: an analysis of clinical reasoning. cambridge, ma: harvard university press.
emans, b. (2004). interviewing: theory, techniques and training. groningen, the netherlands: wolters-noordhoff.
ericsson, k. a. (1996). the acquisition of expert performance: an introduction to some of the issues. in k. a. ericsson (ed.), the road to excellence: the acquisition of expert performance in the arts and sciences, sports and games. mahwah, nj: lawrence erlbaum.
ericsson, k. a. (2004). deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. academic medicine, 79(10 suppl), s70-81. doi:00001888-20041000100022
ericsson, k. a. (2006a).
an introduction to the cambridge handbook of expertise and expert performance: its development, organization, and content. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 3-19). new york, ny: cambridge university press. ericsson, k. a. (2006b). protocol analysis and expert thought: concurrent verbalisations of thinking during experts’ performance on representative tasks. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 223-241). new york, ny: cambridge university press. ericsson, k. a. (2009). development of professional expertise: toward measurement of expert performance and design of optimal learning environments. new york, ny: cambridge university press. ericsson, k. a. (2015). acquisition and maintenance of medical expertise: a perspective from the expertperformance approach with deliberate practice. academic medicine, 90(11), 1471-1486. doi:10.1097/acm.0000000000000939 ericsson, k. a. (2014). how to gain the benefits of the expert performance approach in domains where the correctness of decisions are not readily available: a reply to weiss and shanteau. applied cognitive psychology, 28(4), 458-463. doi:10.1002/acp.3029 ericsson, k. a., & kintsch, w. (1995). long-term working memory. psychological review, 102(2), 211245. doi:10.1037/0033-295x.102.2.211 ericsson, k. a., krampe, r. t., & tesch-römer, c. (1993). the role of deliberate practice in the acquisition of expert performance. psychological review, 100(3), 363-406. doi:10.1037/0033-295x.100.3.363 ericsson, k. a., patel, v., & kintsch, w. (2000). how experts' adaptations to representative task demands account for the expertise effect in memory recall: comment on vicente and wang (1998). psychological review, 107(3), 578-592. doi:10.1037/0033-295x.107.3.578 ericsson, k. a., & r. poole (2016). 
peak: secrets from the new science of expertise. london, uk: the bodley head. ericsson, k. a., & simon, h. a. (1980). verbal reports as data. psychological review, 87(3), 215-251. doi:10.1037/0033-295x.87.3.215 ericsson, k. a., & simon, h. a. (1993). protocol analysis: verbal reports as data. cambridge, ma: mit press. ericsson, k. a., & smith, j. (1991). prospects and limits of the empirical study of expertise: an introduction. in k. a. ericsson, & j. smith (eds.), toward a general theory of expertise: prospects and limits (pp. 138). cambridge: cambridge university press. evetts, j., mieg, h. a., & felt, u. (2006). professionalization, scientific expertise, and elitsm: a sociological perspective. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 105-123). new york, ny: cambridge university press. feltovich, p. j., & barrows, h. s. (1984). issues of generality in medical problem solving. in h. g. schmidt, & m. l. de volder (eds.), tutorials in problem-based learning. new directions in training for the health professions, (pp. 128-142). assen/maastricht: van gorcum. feltovich, p. j., prietula, m. j., & ericsson, a. (2006). studies of expertise from psychological perspectives. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 41-67). new york, ny: cambridge university press. http://psycnet.apa.org/doi/10.1037/0033-295x.102.2.211 http://psycnet.apa.org/doi/10.1037/0033-295x.107.3.578 http://dx.doi.org/10.1037/0033-295x.87.3.215 http://dx.doi.org/10.1037/0033-295x.87.3.215 van de wiel | f l r 119 flanagan, j. c. (1954). the critical incident technique. psychological bulletin, 51(4), 327. doi:10.1037/h0061470 gegenfurtner, a., siewiorek, a., lehtinen, e., & säljö, r. (2013). assessing the quality of expertise differences in the comprehension of medical visualizations. 
vocations and learning, 6(1), 37-54. doi:10.1007/s12186-012-9088-7. gilhooly, k. j., mcgeorge, p., hunter, j., rawles, j. m., kirby, i. k., green, c., & wynn, v. (1997). biomedical knowledge in diagnostic thinking: the case of electrocardiogram (ecg) interpretation, european journal of cognitive psychology, 9(2), 199-223. doi:10.1080/713752555. hamm, r. m. (1988). clinical intuition and clinical analysis: expertise and the cognitive continuum. in j. dowie, & a. elstein (eds.), professional judgment: a reader in clinical decision making, (pp.78-105). cambridge, ma: cambridge university press. hammond, k. r., hamm, r. m., grassia, j., & pearson, t. (1987). direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. transactions on systems, man and cybernetics, ieee, 17(5), 753-770. doi:10.1109/tsmc.1987.6499282 hashem, a., chi, m. t., & friedman, c. p. (2003). medical errors as a result of specialization. journal of biomedical informatics, 36(1), 61-69. doi:10.1016/s1532-0464(03)00057-1 hassebrock, f., & prietula, m. j. (1992). a protocol-based coding scheme for the analysis of medical reasoning. international journal of man-machine studies, 37(5), 613-652. doi:10.1016/00207373(92)90026-h. hatala, r., norman, g. r., & brooks, l. r. (1999). impact of a clinical scenario on accuracy of electrocardiogram interpretation. journal of general internal medicine, 14(2), 126-129. doi:10.1111/j.1525-1497.1999.tb00008.x helle, l. (2017). prospects and pitfalls in combining eye-tracking data and verbal reports. frontline learning research, 5(3), 1-12. doi:10.14786/flr.v5i3.254 hoffman, r. r., & lintern, g. (2006). eliciting and representing the knowledge of experts. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 203-222). new york, ny: cambridge university press. hoffman, r. r., shadbolt, n. r., burton, a. m., & klein, g. (1995). 
eliciting knowledge from experts: a methodological analysis. organizational behavior and human decision processes, 62(2), 129-158. hsieh, h. f., & shannon, s. e. (2005). three approaches to qualitative content analysis. qualitative health research, 15(9), 1277-1288. doi:10.1177/1049732305276687 jaarsma, t., jarodzka, h., nap, m., merrienboer, j. j. g., & boshuizen, h. p. a. (2014). expertise under the microscope: processing histopathological slides. medical education, 48(3), 292-300. doi:10.1111/medu.12385. kahneman, d., & klein, g. (2009). conditions for intuitive expertise: a failure to disagree. american psychologist, 64(6), 515-526. doi:10.1037/a0016755 king, n., & horrocks, c. (2010). interviews in qualitative research. london, uk: sage. klein, g. (2008). naturalistic decision making. human factors, 50(3), 456-460. doi:10.1518/001872008x288385. kok, e. m., jarodzka, h., de bruin, a. b. h., binamir, h. a. n., robben, s. g. f., & van merriënboer, j. j. g. (2015). systematic viewing in radiology: seeing more, missing less?. advances in health sciences education, 1-17. doi:10.1007/s10459-015-9624-y. kok, e. m., de bruin, a. b. h., robben, s. g. f., & van merriënboer, j. j. g. (2013). learning radiological appearances of diseases: does comparison help? learning and instruction, 23, 90-97. doi:10.1016/j.learninstruc.2012.07.004. krippendorff, k. (2012). content analysis: an introduction to its methodology (3rd ed.). newsbury park, ca: sage. krueger, r. a., & casey, m. a. (2015). focus groups: a practical guide for applied research (5 th edition). thousand oaks, ca: sage. http://dx.doi.org/10.1109/tsmc.1987.6499282 http://dx.doi.org/10.1016/s1532-0464(03)00057-1 http://dx.doi.org/10.1016/0020-7373%2892%2990026-h http://dx.doi.org/10.1016/0020-7373%2892%2990026-h http://dx.doi.org/10.1016/j.learninstruc.2012.07.004 van de wiel | f l r 120 kulatunga-moruzi, c., brooks, l. r., & norman, g. r. (2004). using comprehensive feature lists to bias medical diagnosis. 
journal of experimental psychology: learning, memory, and cognition, 30(3), 563572. doi:10.1037/0278-7393.30.3.563 lesgold, a., rubinson, h., feltovich, p., glaser, r., klopfer, d., wang, y., et al. (1988). expertise in a complex skill: diagnosing x-ray pictures. in m. t. h. chi, r. glaser, & m. j. farr (eds.), the nature of expertise (pp. 311-342). hillsdale, nj: lawrence erlbaum. lesgold, a. (2000). on the future of cognitive task analysis. in j. m. schraagen, s. f. chipman, & v. l. shalin (eds.), cognitive task analysis (pp. 451-465). mahwah, nj: lawrence erlbaum. mieg, h. a. (2006). social and sociological factors in the development of expertise. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 743-760). new york, ny: cambridge university press. morgan, d. l. (1996). focus groups as qualitative research (2nd ed.). thousand oaks, ca: sage. neuendorf, k. a. (2002). the content analysis guidebook. thousand oaks, ca: sage. neufeld, v. r., norman, g. r., feightner, j. w., & barrows, h. s. (1981). clinical problem-solving by medical students: a cross-sectional and longitudinal analysis. medical education, 15(5), 315-322. doi:10.1111/j.1365-2923.1981.tb02495.x norman, g. r., brooks, l. r., & allen, s. w. (1989). recall by expert medical practitioners and novices as a record of processing attention. journal of experimental psychology: learning, memory, and cognition, 15(6), 1166-1174. doi:10.1037/0278-7393.15.6.1166 norman, g. r., eva, k., brooks, l., & hamstra, s. (2006). expertise in medicine and surgery. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 339-354). new york, ny: cambridge university press. patel, v. l., & arocha, j. f. (2001). the nature of constraints on collaborative decision making in health care settings. in e. salas, & klein, g. 
(eds.), linking expertise and naturalistic decision making (pp. 383-405). mahwah, nj: lawrence erlbaum. patel, v. l., & groen, g. j. (1986). knowledge based solution strategies in medical reasoning. cognitive science, 10(1), 91-116. doi:10.1207/s15516709cog1001_4 patel, v. l., kaufman, d. r., & magder, s. a. (1996). the acquisition of medical expertise in complex dynamic environments. in k .a. ericcson (ed.), the road to excellence: the acquisition of expert performance in the arts and sciences, sports and games, (pp. 127-165). mahwah, nj: lawrence erlbaum. prince, k. j. a. h., van de wiel, m. w. j., van der vleuten, c. p. m., boshuizen, h. p. a., & scherpbier, a. j. j. a. (2004). junior doctors' opinions about the transition from medical school to clinical practice: a change of environment. education for health, 17(3), 323-331. doi:10.1080/13576280400002510 salas, e., & klein, g. a. (eds .) (2001). linking expertise and naturalistic decision making. mahwah, nj: lawrence erlbaum. salas, e., rosen, m. a., burke, c. s., goodwin, g. f., & fiore, s. m. (2006). the making of a dream team: when expert teams do best. in k. a. ericsson, n. charness, p. j. feltovich, & r. r. hoffman (eds.), the cambridge handbook of expertise and expert performance (pp. 439-453). cambridge, uk: cambridge university press. sanchez, j. i., & levine, e. l. (2012). the rise and fall of job analysis and the future of work analysis. annual review of psychology, 63, 397-425. doi: 10.1146/annurev-psych-120710-100401. schmidt, h. g., & boshuizen, h. p. a. (1993). on the origin of intermediate effects in clinical case recall. memory and cognition, 21, 338 351. doi:10.3758/bf03208266 shanteau, j. (1992). competence in experts: the role of task characteristics. organizational behavior and human decision processes, 53(2), 252-266. shanteau, j., weiss, d. j., thomas, r. p., & pounds, j. c. (2002). performance-based assessment of expertise: how to decide if someone is an expert or not. 
european journal of operational research, 136(2), 253-263. skopec, e. w. (1986). situational interviewing. prospects heights, ii: waveland press. van de wiel | f l r 121 stalmeijer, r. e., mcnaughton, n., & van mook, w. n. k. a. (2014). using focus groups in medical education research: amee guide no. 91. medical teacher, 36(11), 923-939. doi:10.3109/0142159x.2014.917165 stolper, e., van bokhoven, m., houben, p., van royen, p., van de wiel, m. w. j., van der weijden, t., & dinant, g. j. (2009). the diagnostic role of gut feelings in general practice. a focus group study of the concept and its determinants. bmc family practice, 10(1), 17. doi:10.1186/1471-2296-10-17. stolper, e., van de wiel, m. w. j., hendriks, r. h. m., van royen, p., van bokhoven, m., van der weijden, t., & dinant, g. j. (2015). how do gut feelings feature in tutorial dialogues on diagnostic reasoning in gp traineeship? advances in health sciences education, 20, 499-513. doi:10.1007/s10459-014-9543-3. stolper, e., van de wiel, m. w. j., van bokhoven, m., van royen, p., van der weijden, t., & dinant, g. j. (2011). gut feelings as a third track in general practitioners’ diagnostic reasoning. journal of general internal medicine, 26, 197-203. doi:10.1007/s11606-010-1524-5. shadbolt, n. r., & smart, p. r. (2015) knowledge elicitation. in j. r. wilson, & s. sharples (eds.), evaluation of human work (4th ed.). boca raton, fl: crc press. tracey, t. j., wampold, b. e., lichtenberg, j. w., & goodyear, r. k. (2014). expertise in psychotherapy: an elusive goal? american psychologist, 69(3), 218229. doi:10.1037/a0035099 van de wiel, m. w. j., boshuizen, h. p. a., & schmidt, h. g. (2000). knowledge restructuring in expertise development: evidence from pathophysiological representations of clinical cases by students and physicians. european journal of cognitive psychology, 12(3), 323-355. doi:10.1080/09541440050114543 van de wiel, m. w. j., boshuizen, h. p. a., schmidt, h. g. & schaper, n. c. (1999). 
the explanation of medical concepts by expert physicians, clerks and advanced students. teaching and learning in medicine, 11(3), 153-163. doi:10.1207/s15328015tl110306 van de wiel, m. w. j., ploegh, k., boshuizen, h. p. a., & schmidt, h. g. (2005). the influence of diagnosis and memorization instructions on clinical case processing by students and physicians. paper presented at the annual meeting of the american educational research association 2005. montreal, canada, april 11-15. van de wiel, m. w. j., schaper, n. c., scherpbier, a. j. j. a., van der vleuten, c. p. m., & boshuizen, h. p. a. (1999). students' experiences with real patient tutorials in a problem-based curriculum. teaching and learning in medicine, 11(1), 12-20. doi:10.1207/s15328015tlm1101_5 van de wiel, m. w. j., & schmidt, h. g., boshuizen, h. p. a. (1998). a failure to reproduce the intermediate effect in clinical case recall. academic medicine, 73(8), 894-900. van de wiel, m. w. j., & van den bossche, p. (2013). deliberate practice in medicine: the motivation to engage in work-related learning and its contribution to expertise. vocations and learning, 6(1), 135158. doi:10.1007/s12186-012-9085-x. van de wiel, m. w. j., van den bossche, p., janssen, s., & jossberger, h. (2011). exploring deliberate practice in medicine: how do physicians learn in the workplace? advances in health sciences education. 16(1), 81-95. doi:10.1007/s10459-010-9246-3. van de wiel, m. w. j., van den bossche, p., & koopmans, r. p. (2011). deliberate practice, the high road to expertise: k.a. ericsson. in dochy, f., gijbels, d., segers, m., & van den bossche, p. (eds.), theories of learning for the workplace: building blocks for training and professional development programs (pp. 1-16). london, uk: routledge. van gog, t., paas, f., van merriënboer, j. j., & witte, p. (2005). uncovering the problem-solving process: cued retrospective reporting versus concurrent and retrospective reporting. 
journal of experimental psychology: applied, 11(4), 237-244. doi:10.1037/1076-898x.11.4.237 van someren, m. v., barnard, y. f., & sandberg, j. a. (1994). the think aloud method: a practical approach to modelling cognitive processes. london, uk: academic press. weiss, d. j., & shanteau, j. (2003). empirical assessment of expertise. human factors, 45(1), 104-116. http://psycnet.apa.org/doi/10.1037/a0035099 van de wiel | f l r 122 wimmers, p. f., schmidt, h. g., verkoeijen, p. p. j. l., & van de wiel, m. w. j. (2005). inducing expertise effects in clinical case recall through the manipulation of processing. medical education, 39, 949-957. doi:10.1111/j.1365-2929.2005.02250.x